Matthias Boehm is a BMK-endowed professor for data management at Graz University of Technology,
Austria, and a research area manager for data management at the co-located Know-Center GmbH.
His cross-organizational research group focuses on high-level, data science-centric abstractions as well as systems and tools to execute
data science tasks in an efficient and scalable manner. Prior to joining TU Graz in 2018, he was a research staff member at
IBM Research - Almaden, CA, USA, with a major focus on compilation and runtime
techniques for declarative, large-scale machine learning in Apache SystemML.
Matthias received his Ph.D. from Dresden University of Technology, Germany, in 2011 with a
dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting
as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award, a 2016 SIGMOD Research Highlight
Award, a 2016 IBM Pat Goldberg Memorial Best Paper Award, and the 2021 SIGMOD DS&E Best Paper Award.
Apache SystemDS (an open-source ML system for the end-to-end data science lifecycle),
ExDRa (exploratory data science and federated ML over raw data, w/ Siemens, DFKI, and TU Berlin),
DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines, w/ AVL, DLR, ETH Zurich, HPI Potsdam, ICCS, Infineon, Intel, ITU Copenhagen, KAI, TU Dresden, Uni Maribor, Uni Basel), and
ReWaste F (recycling and recovery of waste for future, 4 scientific and 14 industrial partners).
The DAMSLab (data management for data science laboratory) is a cross-organizational research group uniting the data management group of TU Graz and the research area data management of the co-located Know-Center.
We're looking for motivated PhD, master, and bachelor students to join our team. Our research focuses on building ML systems and tools for simplifying the data science lifecycle – from data integration over model training to deployment and scoring – via high-level language abstractions and specialized compiler and runtime techniques. If you're interested, please contact me directly via email.
Open Bachelor/Master Thesis Topics
This publication list covers the last six years. For a full list, see DBLP and Google Scholar.
- Svetlana Sagadeeva, Matthias Boehm: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging. SIGMOD 2021. [paper]
- Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Marian Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch: ExDRa: Exploratory Data Science on Federated Raw Data. SIGMOD 2021. [paper, slides, ACM DL (OpenAccess)]
- Arnab Phani, Benjamin Rath, Matthias Boehm: LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems. SIGMOD 2021. [paper]
- Prithviraj Sen, Marina Danilevsky, Yunyao Li, Siddhartha Brahma, Matthias Boehm, Laura Chiticariu, Rajasekar Krishnamurthy: Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification. EMNLP 2020.
- Matthias Boehm: Technical Perspective: Declarative Recursive Computation on an RDBMS. SIGMOD Record 2020 49(1). [paper]
- Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthör, Kevin Innerebner, Florijan Klezin, Stefanie Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqi, Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle. CIDR 2020. [paper, slides]
- Johanna Sommer, Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald, Peter J. Haas: MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions. SIGMOD 2019. [paper, slides, poster]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. Commun. ACM 2019 62(5). [paper, link]
- Matthias Boehm, Arun Kumar, Jun Yang: Data Management in Machine Learning Systems. Synthesis Lectures on Data Management 11 (1), Morgan & Claypool Publishers 2019. [book]
- Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald: Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning. BTW 2019. [paper, slides]
- Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare: On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML. PVLDB 2018 11(12). [paper, slides, poster]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. VLDB Journal 2018 27(5). [paper, link]
- Matthias Boehm: Apache SystemML – Declarative Large-Scale Machine Learning. Encyclopedia of Big Data Technologies 2018. [paper]
- Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen: Deep Learning with Apache SystemML. SysML 2018. [paper]
- Arun Kumar, Matthias Boehm, Jun Yang: Data Management in Machine Learning: Challenges, Techniques, and Systems. SIGMOD 2017. [paper, slides, video]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Scaling Machine Learning via Compressed Linear Algebra. SIGMOD Record 2017 46(1). [paper]
- Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning. CIDR 2017. [paper, slides]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 2016 9(12). [paper, slides, poster]
- Matthias Boehm, Michael Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick Reiss, Prithviraj Sen, Arvind Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. PVLDB 2016 9(13). [paper, slides]
- Matthias Boehm, Alexandre V. Evfimievski, Niketan Pansare, Berthold Reinwald: Declarative Machine Learning - A Classification of Basic Properties and Types. CoRR 2016 abs/1605.05826. [paper]
- Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On Optimizing Machine Learning Workloads via Kernel Fusion. PPOPP 2015. [paper]
- Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD 2015. [paper, slides, poster]
- Matthias Boehm: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs. CoRR 2015 abs/1503.06384. [paper]
- Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 2014 37(3). [paper]
- Matthias Boehm, Dirk Habich, Wolfgang Lehner: On-Demand Re-Optimization of Integration Flows. Inf. Syst. 2014 45. [paper]
- Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, J. Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshop ParLearning 2014. [paper, slides]
- Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 2014 7(7). [paper, slides, poster]
This list summarizes PC memberships and review activities, again of the last six years.
- Program Committee PVLDB 2022, EDBT 2022, CIDR 2022
- Program Committee SIGMOD 2021, SIGMOD 2021 Industry, CIDR 2021, ICDE 2021 Demo, PVLDB PhD 2021; Journal Reviewer SIGMOD Record 2021, Datenbankspektrum 2021; Workshop Co-Chair DEEM 2021, Track Chair (Data Science) BTW 2021
- Program Committee SIGMOD 2020, PVLDB 2020, ICDE 2020, CIDR 2020, DEEM 2020, PVLDB PhD 2020; Journal Reviewer VLDBJ 2020; GI working group Data Science (since 03/2020)
- Program Committee SIGMOD 2019, PVLDB 2019, ICDE 2019, EDBT 2019, DEEM 2019, AIDB 2019; Journal Reviewer TKDE 2019
- Program Committee PVLDB 2018, EDBT 2018 Industry, DEEM 2018, WebDB 2018, EBDVF 2018
- Program Committee ICDE 2017 Demo, DEEM 2017; Journal Reviewer SIGMOD Record 2017/18
- Journal Reviewer TKDE 2016/17, ACM Computing Surveys 2016, IBM Journal R&D 2016; External Reviewer CIKM 2016
- Program Committee SSDBM 2015; Journal Reviewer Information Systems 2015; External Reviewer SIGMOD Record 2015
Our research group is grateful for funding support from BMK, the BMK/FFG program "ICT of the Future", the European Union's Horizon 2020 research and innovation program, TU Graz, AVL LIST, Infineon Technologies Austria, Magna Steyr Fahrzeugtechnik, voestalpine Stahl Donawitz, and Know-Center.