Matthias Boehm is a full professor for large-scale data engineering at Technische Universität Berlin
and the BIFOLD center of excellence for AI research. His research group focuses on high-level,
data science-centric abstractions as well as systems and tools to execute these tasks in an efficient and scalable manner.
From 2018 through 2022, Matthias was a BMK-endowed professor for data management at Graz University of Technology,
Austria, and a research area manager for data management at the co-located Know-Center GmbH.
Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA,
with a major focus on compilation and runtime techniques for declarative, large-scale machine learning in Apache SystemML.
Matthias received his Ph.D. from Dresden University of Technology, Germany in 2011 with a
dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting
as well as in-memory indexing and query processing.
Current Projects:
Apache SystemDS (An open source ML system for the end-to-end data science lifecycle),
DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines; w/ AVL, DLR, ETH Zurich, HPI Potsdam, ICCS, Infineon, Intel, ITU Copenhagen, KAI, TU Dresden, Uni Maribor, Uni Basel), and
FONDA II (Foundations of Workflows for Large-Scale Scientific Data Analysis; w/ BAM, Charite, FU Berlin, GFZ, HU Berlin, MDC, TU Darmstadt, Uni Potsdam, Zuse-Institut)
Completed Projects:
ExDRa (06/2019-08/2022, exploratory data science and federated ML over raw data; w/ Siemens, DFKI, TU Berlin, and TU Graz),
ReWaste F (04/2021-08/2022, digital platform for the Austrian recycling economy;
w/ 4 scientific and 14 industrial partners)
Team
The DAMS Lab (data management for data science laboratory) is a cross-organizational research group uniting the chair for big data engineering at TU Berlin and external members from multiple universities and industry.
Secretary:
- Sarah Hashmi (since 09/2023): N/A
Postdocs:
PhD students:
Undergrad research assistants:
- Rene Richard Enjilian (since 09/2023): N/A
- Maximilian Schreff (since 10/2023): N/A
- Alexander Richard Terschlüsen (since 11/2023): N/A
- Elias Strauß (since 05/2024): N/A
- Eric Theiler (since 06/2024): N/A
- Samin Bassiri (since 06/2024): N/A
- Dominic Arne Stöcker (since 07/2024): N/A
PhD Theses (completed):
Master Theses (completed): Svetlana Sagadeeva (2020), Simon Kysela (2021), Florijan Klezin (2022), Florian Lackner (2022), Pooja Veeresh Yeli (2022), Mito Kehayov (2022), Michael Hofer (2023), Vlad-Andrei Dumitru (2023), Christina Dionysio (2023), Philipp Ortner (2023), Damian Dinoiu (2024), Marlon Adam (2024), Moneer Martini (2024), Obeidah Awni Salim Smadi (2024), Linus Bruckner (2024), Pablo Uxo Castillo (2024), Louis Le Page (2024), Ann-Sophie Messerschmid (2024), Lotta Fagel (2024), Sujitkumar Suresh Gavali (2024), Ahmed Boulila (2024)
Bachelor Theses (completed): Benjamin Rath (2019), Sandro Letter (2020), Valentin Edelsbrunner (2021), Tobias Rieger (2021), Kevin Innerebner (2021), Thomas Krametter (2022), Thomas Wedenig (2022), Jonathan Resch (2022), Olga Ovcharenko (2022), David Weissteiner (2022), Dževad Ćoralić (2022), Lukas Erlbacher (2022), Jonathan Haberl (2023), Gabriel Alexandru Muresan (2023), Emil Winterleitner (2023), Mario Schwaiger (2023), Richard Bendler (2023), Mark Paranskij (2023), Fares Kataf (2023), Danial Alnicola (2023), Elias Strauß (2024), Andreas Kreppold (2024), Kubilay Eren (2024), Kristiyan Blagov (2024), Maltrim Ebipi (2024), Tessa Heidkamp (2024), Marvin Seidel (2024), Eduard Chalovski (2024), Kasem Celebi (2024), Cenk Özdaglar (2024), Thanh Nguyen (2024), Frederic Caspar Zoepffel (2024), Paul Olaf Theodor Pohlitz (2024), Anton Simon Horeis (2024), Marcel Scholand (2024), Yoana Tsoneva (2024)
We're looking for motivated PhD, master, and bachelor students to join our team.
Our research focuses on building ML systems and tools for simplifying the data science lifecycle –
from data integration over model training to deployment and scoring – via high-level language
abstractions and specialized compiler and runtime techniques. If you're interested in working with us, please contact us
directly via email to jobs@dams.tu-berlin.de.
New Master/Bachelor Theses:
The DAMS Lab currently receives a large number of requests for supervising master and bachelor theses.
In order to make the process of matching topics to students more scalable, while ensuring high-quality
supervision, as of Jan 2024, we are introducing the following process:
- Open thesis topics are listed at TU Berlin (I)STROD (direct link)
- If interested in a particular topic, contact the listed supervisor (PhD student, postdoc, or me) directly.
- We are also open to joint topics with industrial partners or your own project proposal, but note that TU Berlin
Faculty IV does not allow confidential tags (Sperrvermerke). For such projects, please directly contact me.
- Basic administrative aspects of writing a thesis with us include:
(1) the thesis must be written in English using LaTeX,
(2) you need to agree to defend the thesis between thesis submission and grading
(but without major impact on grading) in front of the DAMS Lab (30min talk, 15min Q&A),
(3) we usually have regular one-on-one meetings (~ every three weeks) to talk about progress and problems, and
(4) the thesis is officially registered about 1-2 months after starting the project (once there is
mutual agreement about the scope and approach for addressing the given problem or research question).
- If you have successfully completed other courses of our group, please indicate
that in your application because we preferably accept students with a matching background.
Publications
This publication list covers the time after my PhD and postdoc.
For a full list see
DBLP and
Google Scholar.
My ORCID is
0000-0003-1344-3663.
For additional interactive rankings see
CSRankings and
Influential DB Papers.
2025
- Arnab Phani, Matthias Boehm: MEMPHIS: Holistic Lineage-based Reuse and
Memory Management for Multi-backend ML Systems, EDBT 2025.
2024
- Vlad Dumitru, Matthias Boehm, Martin Hagmueller, Barbara Schuppler: Version Control for Speech Corpora, KONVENS 2024.
- Matthias Boehm, Nesime Tatbul: Special issue on Machine learning and Databases, VLDB Journal 2024 33(4).
- Matthias Boehm: Contribution to Reminiscences on Influential Papers: Database Cracking, SIGMOD Record 2024 53(2). [paper]
- David Justen, Daniel Ritter, Campbell Fraser, Andrew Lamb, Nga Tran, Allison Lee, Thomas Bodner, Mhd Yamen Haddad, Steffen Zeuch, Volker Markl, Matthias Boehm: POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance, PVLDB 2024 17(6). [paper]
- Jiwon Chang, Christina Dionysio, Fatemeh Nargesian, Matthias Boehm: Plutus: Understanding Data Distribution Tailoring for Machine Learning (Demo), SIGMOD 2024. [paper]
- Shafaq Siddiqi, Roman Kern, Matthias Boehm: SAGA: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications, SIGMOD 2024. [paper]
2023
- Matthias Boehm, Matteo Interlandi, Chris Jermaine: Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques (Tutorial), SIGMOD 2023. [paper, slides (pdf, pptx)]
- Saeed Fathollahzadeh, Matthias Boehm: GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example, SIGMOD 2023. [paper, slides, poster, repro]
- Sebastian Baunsgaard, Matthias Boehm: AWARE: Workload-aware, Redundancy-exploiting Linear Algebra, SIGMOD 2023. [paper, slides, poster, repro]
- Matthias Boehm, Madelon Hulsebos, Shreya Shankar, Paroma Varma: Seventh Workshop on Data Management for End-to-End Machine Learning (DEEM), DEEM@SIGMOD 2023. [paper]
- Manisha Luthra, Andreas Kipf, Matthias Boehm: A Tutorial Workshop on ML for Systems and Systems for ML, WoLS@BTW 2023.
- Patrick Damme, Matthias Boehm: Enabling Integrated Data Analysis Pipelines on Heterogeneous Hardware through Holistic Extensibility, NoDMC@BTW 2023. [paper]
2022
- Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, Florian Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner and Sebastian Benjamin Wrede: Federated Data Preparation, Learning, and Debugging in Apache SystemDS (Demo), CIKM 2022. [paper, poster, ACM DL (OpenAccess)]
- Arnab Phani, Lukas Erlbacher, Matthias Boehm: UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads, PVLDB 2022 15(11). [paper]
- Matthias Boehm, Paroma Varma, Doris Xin: DEEM'22: Data Management for End-to-End Machine Learning, DEEM@SIGMOD 2022. [paper]
- Patrick Damme, Marius Birkenbach, Constantinos Bitsakos, Matthias Boehm, Philippe Bonnet, Florina Ciorba, Mark Dokter, Pawel Dowgiallo, Ahmed Eleliemy, Christian Faerber, Georgios Goumas, Dirk Habich, Niclas Hedam, Marlies Hofer, Wenjun Huang, Kevin Innerebner, Vasileios Karakostas, Roman Kern, Tomaž Kosar, Alexander Krause, Daniel Krems, Andreas Laber, Wolfgang Lehner, Eric Mier, Marcus Paradies, Bernhard Peischl, Gabrielle Poerwawinata, Stratos Psomadakis, Tilmann Rabl, Piotr Ratuszniak, Pedro Silva, Nikolai Skuppin, Andreas Starzacher, Benjamin Steinwender, Ilin Tolovski, Pınar Tözün, Wojciech Ulatowski, Yuanyuan Wang, Izajasz Wrosz, Aleš Zamuda, Ce Zhang, Xiao Xiang Zhu: DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines, CIDR 2022. [paper, slides, video]
2021
- Svetlana Sagadeeva, Matthias Boehm: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging, SIGMOD 2021. [paper, repro]
- Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Marian Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch: ExDRa: Exploratory Data Science on Federated Raw Data, SIGMOD 2021. [paper, slides, ACM DL (OpenAccess), repro]
- Arnab Phani, Benjamin Rath, Matthias Boehm: LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems, SIGMOD 2021. [paper, repro]
2020
- Prithviraj Sen, Marina Danilevsky, Yunyao Li, Siddhartha Brahma, Matthias Boehm, Laura Chiticariu, Rajasekar Krishnamurthy: Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification, EMNLP 2020.
- Matthias Boehm: Technical Perspective: Declarative Recursive Computation on an RDBMS, SIGMOD Record 2020 49(1). [paper]
- Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthör, Kevin Innerebner, Florijan Klezin, Stefanie Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqi, Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle, CIDR 2020. [paper, slides]
2019
- Johanna Sommer, Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald, Peter J. Haas: MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions, SIGMOD 2019. [paper, slides, poster]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, Commun. ACM 2019 62(5). [paper, link]
- Matthias Boehm, Arun Kumar, Jun Yang: Data Management in Machine Learning Systems. Synthesis Lectures on Data Management 11 (1), Morgan & Claypool Publishers 2019. [book]
- Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald: Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning, BTW 2019. [paper, slides]
2018
- Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare: On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML, PVLDB 2018 11(12). [paper, slides, poster]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, VLDB Journal 2018 27(5). [paper, link]
- Matthias Boehm: Apache SystemML – Declarative Large-Scale Machine Learning, Encyclopedia of Big Data Technologies 2018. [paper]
- Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen: Deep Learning with Apache SystemML, SysML 2018. [paper]
2017
- Arun Kumar, Matthias Boehm, Jun Yang: Data Management in Machine Learning: Challenges, Techniques, and Systems (Tutorial), SIGMOD 2017. [paper, slides, video]
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Scaling Machine Learning via Compressed Linear Algebra, SIGMOD Record 2017 46(1). [paper]
- Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning, CIDR 2017. [paper, slides]
2016
- Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, PVLDB 2016 9(12). [paper, slides, poster]
- Matthias Boehm, Michael Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick Reiss, Prithviraj Sen, Arvind Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark, PVLDB 2016 9(13). [paper, slides]
- Matthias Boehm, Alexandre V. Evfimievski, Niketan Pansare, Berthold Reinwald: Declarative Machine Learning - A Classification of Basic Properties and Types, CoRR 2016 abs/1605.05826. [paper]
2015
- Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On Optimizing Machine Learning Workloads via Kernel Fusion, PPOPP 2015. [paper]
- Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning, SIGMOD 2015. [paper, slides, poster]
- Matthias Boehm: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs, CoRR 2015 abs/1503.06384. [paper]
2014
- Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs, IEEE Data Eng. Bull. 2014 37(3). [paper]
- Matthias Boehm, Dirk Habich, Wolfgang Lehner: On-Demand Re-Optimization of Integration Flows. Inf. Syst. 2014 45. [paper]
- Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, J. Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning, IPDPS Workshop ParLearning 2014. [paper, slides]
- Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML, PVLDB 2014 7(7). [paper, slides, poster]
Service
Reviewing: This list summarizes PC memberships and review activities, again after my PhD and postdoc.
- Program Committee EDBT 2026
- Program Committee SIGMOD 2025, PVLDB 2025 (SPC), EDBT 2025 (SPC); SIGMOD 2025 Finance Chair; EDBT 2025 Workshop Co-Chair
- Program Committee SIGMOD 2024, PVLDB 2024 (SPC and best paper), SoCC 2024; DBML@ICDE 2024, DEEM@SIGMOD 2024, GUIDE-AI@SIGMOD 2024, DEARING@ECMLPKDD 2024; Datenbankspektrum 2024
- Program Committee SIGMOD 2023, PVLDB 2023 Industry, EDBT 2023 (SPC), SoCC 2023; SIGMOD 2023 SRC, SIGMOD 2023 Reproducibility, BTW 2023 Reproducibility, DE4DS@BTW 2023; Journal Reviewer VLDBJ 2023, TKDE 2023, Information Systems 2023, Datenbankspektrum 2023; Workshop Co-Chair DEEM@SIGMOD 2023, WoLS@BTW 2023, EVEREST+DAPHNE 2023; VLDB 2023 Workshop Co-Chair; VLDBJ 2023 Special Issue Editor
- Program Committee PVLDB 2022 (PC and SPC), PVLDB 2022 Industry, EDBT 2022, EDBT 2022 Industry, CIDR 2022, SIGMOD 2022 SRC, LWDA 2022; SIGMOD 2022 Reproducibility; Workshop Co-Chair DEEM@SIGMOD 2022; Journal Reviewer VLDBJ 2022, JMLR OSS track 2022, Datenbankspektrum 2022
- Program Committee SIGMOD 2021, SIGMOD 2021 Industry, CIDR 2021, ICDE 2021 Demo, PhD@PVLDB 2021, DBAI@NeurIPS 2021; Journal Reviewer VLDBJ 2021, SIGMOD Record 2021, Datenbankspektrum 2021;
Workshop Co-Chair DEEM@SIGMOD 2021, Track Chair (Data Science) BTW 2021
- Program Committee SIGMOD 2020, PVLDB 2020, ICDE 2020, CIDR 2020, DEEM@SIGMOD 2020, PhD@PVLDB 2020; Journal Reviewer VLDBJ 2020; GI working group Data Science (since 03/2020)
- Program Committee SIGMOD 2019 (SPC), PVLDB 2019, ICDE 2019, EDBT 2019, DEEM@SIGMOD 2019, AIDB@PVLDB 2019; Journal Reviewer TKDE 2019
- Program Committee PVLDB 2018, EDBT 2018 Industry, DEEM@SIGMOD 2018, WebDB 2018, EBDVF 2018
- Program Committee ICDE 2017 Demo, DEEM@SIGMOD 2017; Journal Reviewer SIGMOD Record 2017/18
- Journal Reviewer TKDE 2016/17, ACM Computing Surveys 2016, IBM Journal R&D 2016; External Reviewer CIKM 2016
- Program Committee SSDBM 2015; Journal Reviewer Information Systems 2015; External Reviewer SIGMOD Record 2015
Professor Hiring Committee Memberships (completed): Artificial Intelligence (TU Graz, 2019), Remote Sensing (TU Graz, 2021),
Data Science Processes (TU Berlin, 2023), Information Integration (TU Berlin, 2023), Data Systems (HPI, 2023),
Data Systems (ITU Copenhagen, 2023), Big Data Infrastructures (University of St.Gallen, 2023),
Systems for Data Analysis (KTH Stockholm, 2023/2024), Computer Science (Aalborg University, 2024)
PhD Committee Memberships (completed): Andreas Kunft (TU Berlin DIMA, 2019), Joseph Vinish D'Silva (McGill University, 2020),
Shaoduo Gan (ETH Zurich, 2021), Gábor Gévay (TU Berlin DIMA, 2022), Clemens Lutz (TU Berlin DIMA, 2022),
Alexander Renz-Wieland (TU Berlin DIMA, 2022), Gencer Sümbül (TU Berlin RSiM, 2023), Martino Ciaperoni (Aalto University, 2023),
Philipp Grulich (TU Berlin DIMA, 2023), Viktor Rosenfeld (TU Berlin DIMA, 2023), Lisa Raithel (TU Berlin QU, 2024),
Francesco Tosoni (University of Pisa, 2024), Aikaterini Katsarou (TU Berlin SNET, 2024), Sören Becker (TU Berlin DOS, 2024),
Vera Schmitt (TU Berlin QU, 2024), Binger Chen (TU Berlin D2IP, 2024), Felix Neutatz (TU Berlin D2IP, 2024),
Thorsten Wittkopp (TU Berlin DOS, 2024), Dennis Treder-Tschechlov (University of Stuttgart, 2024)
Acknowledgements
Our research group is grateful for funding support from
- 2022-present: BMBF, TU Berlin, BIFOLD, the European Union's Horizon 2020 research and innovation program, and the German Research Foundation (DFG) program for collaborative research centers.
- 2018-2022: BMK, the BMK/FFG program "ICT of the Future", the European Union's Horizon 2020 research and innovation program, TU Graz, AVL LIST, Infineon Technologies Austria, Magna Steyr Fahrzeugtechnik, voestalpine Stahl Donawitz, and Know-Center.