Matthias Boehm

Technische Universität Berlin
Faculty IV - Electrical Engineering and Computer Science
Berlin Institute for the Foundations of Learning and Data (BIFOLD)
Big Data Engineering (DAMS Lab) Group
Office: Room TEL-0815, Ernst-Reuter-Platz 7, 10587 Berlin

Matthias Boehm is a full professor for large-scale data engineering at Technische Universität Berlin and the BIFOLD center of excellence. His cross-organizational research group focuses on high-level, data science-centric abstractions as well as systems and tools to execute these tasks in an efficient and scalable manner. From 2018 through 2022, Matthias was a BMK-endowed professor for data management at Graz University of Technology, Austria, and a research area manager for data management at the co-located Know-Center GmbH. Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA, with a major focus on compilation and runtime techniques for declarative, large-scale machine learning in Apache SystemML. Matthias received his Ph.D. from Dresden University of Technology, Germany, in 2011 with a dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award, a 2016 SIGMOD Research Highlight Award, a 2016 IBM Pat Goldberg Memorial Best Paper Award, and the 2021 SIGMOD DS&E Best Paper Award.

Current Projects: Apache SystemDS (an open-source ML system for the end-to-end data science lifecycle), and DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines, w/ AVL, DLR, ETH Zurich, HPI Potsdam, ICCS, Infineon, Intel, ITU Copenhagen, KAI, TU Dresden, Uni Maribor, Uni Basel)

Completed Projects: ExDRa (06/2019-08/2022, exploratory data science and federated ML over raw data, w/ Siemens, DFKI, TU Berlin, and TU Graz)


The DAMS Lab (data management for data science laboratory) is a cross-organizational research group uniting the chair for big data engineering at TU Berlin and external members from multiple universities and industry.


  • Sarah Hashmi (since 09/2023): N/A


PhD students:

Undergrad research assistants:

  • Rene Richard Enjilian (since 09/2023): N/A
  • Maximilian Schreff (since 10/2023): N/A
  • Alexander Richard Terschlüsen (since 10/2023): N/A
  • Elias Strauß (since 05/2024): N/A

Master Theses (completed): Svetlana Sagadeeva (2020), Simon Kysela (2021), Florijan Klezin (2022), Florian Lackner (2022), Pooja Veeresh Yeli (2022), Mito Kehayov (2022), Michael Hofer (2023), Vlad-Andrei Dumitru (2023), Christina Dionysio (2023), Philipp Ortner (2023), Damian Dinoiu (2024), Marlon Adam (2024), Moneer Martini (2024), Obeidah Awni Salim Smadi (2024)

Bachelor Theses (completed): Benjamin Rath (2019), Sandro Letter (2020), Valentin Edelsbrunner (2021), Tobias Rieger (2021), Kevin Innerebner (2021), Thomas Krametter (2022), Thomas Wedenig (2022), Jonathan Resch (2022), Olga Ovcharenko (2022), David Weissteiner (2022), Dževad Ćoralić (2022), Lukas Erlbacher (2022), Jonathan Haberl (2023), Gabriel Alexandru Muresan (2023), Emil Winterleitner (2023), Mario Schwaiger (2023), Richard Bendler (2023), Mark Paranskij (2023), Fares Kataf (2023), Danial Alnicola (2023), Elias Strauß (2024), Andreas Martin Krepphold (2024)

We're looking for motivated PhD, master, and bachelor students to join our team. Our research focuses on building ML systems and tools for simplifying the data science lifecycle – from data integration over model training to deployment and scoring – via high-level language abstractions and specialized compiler and runtime techniques. If you're interested in working with us, please contact us directly via email to

New Master/Bachelor Theses:
The DAMS Lab currently receives a large number of requests for supervising master and bachelor theses. To make the matching of topics to students more scalable while ensuring high-quality supervision, we have introduced the following process as of January 2024:

  • Open thesis topics are listed at TU Berlin (I)STROD (direct link)
  • If interested in a particular topic, contact the listed supervisor (PhD student, postdoc, or me) directly.
  • We are also open to joint topics with industrial partners or your own project proposal, but note that TU Berlin Faculty IV does not allow confidentiality notices (Sperrvermerke). For such projects, please contact me directly.
  • Basic administrative aspects of writing a thesis with us include: (1) the thesis must be written in English using LaTeX, (2) you need to agree to defend the thesis in front of the DAMS Lab (30min talk, 15min Q&A) between thesis submission and grading (without major impact on grading), (3) we usually have regular one-on-one meetings (roughly every three weeks) to discuss progress and problems, and (4) the thesis is officially registered about 1-2 months after starting the project (once there is mutual agreement about the scope and approach for addressing the given problem or research question).
  • If you have successfully completed other courses of our group, please indicate that in your application, because we preferably accept students with a matching background.


This publication list covers the time after my PhD and postdoc. For a full list see DBLP and Google Scholar. My ORCID is 0000-0003-1344-3663. For additional interactive rankings see CSRankings and Influential DB Papers.
  • David Justen, Daniel Ritter, Campbell Fraser, Andrew Lamb, Nga Tran, Allison Lee, Thomas Bodner, Mhd Yamen Haddad, Steffen Zeuch, Volker Markl, Matthias Boehm: POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance, PVLDB 2024 17(6). [paper]
  • Jiwon Chang, Christina Dionysio, Fatemeh Nargesian, Matthias Boehm: Plutus: Understanding Data Distribution Tailoring for Machine Learning (Demo), SIGMOD 2024.
  • Shafaq Siddiqi, Roman Kern, Matthias Boehm: SAGA: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications, SIGMOD 2024. [paper]
  • Matthias Boehm, Matteo Interlandi, Chris Jermaine: Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques (Tutorial), SIGMOD 2023. [paper, slides (pdf, pptx)]
  • Saeed Fathollahzadeh, Matthias Boehm: GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example, SIGMOD 2023. [paper, slides, poster, repro]
  • Sebastian Baunsgaard, Matthias Boehm: AWARE: Workload-aware, Redundancy-exploiting Linear Algebra, SIGMOD 2023. [paper, slides, poster, repro]
  • Matthias Boehm, Madelon Hulsebos, Shreya Shankar, Paroma Varma: Seventh Workshop on Data Management for End-to-End Machine Learning (DEEM), DEEM@SIGMOD 2023. [paper]
  • Manisha Luthra, Andreas Kipf, Matthias Boehm: A Tutorial Workshop on ML for Systems and Systems for ML, WoLS@BTW 2023.
  • Patrick Damme, Matthias Boehm: Enabling Integrated Data Analysis Pipelines on Heterogeneous Hardware through Holistic Extensibility, NoDMC@BTW 2023. [paper]
  • Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, Florian Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner, Sebastian Benjamin Wrede: Federated Data Preparation, Learning, and Debugging in Apache SystemDS (Demo), CIKM 2022. [paper, poster, ACM DL (OpenAccess)]
  • Arnab Phani, Lukas Erlbacher, Matthias Boehm: UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads, PVLDB 2022 15(11). [paper]
  • Matthias Boehm, Paroma Varma, Doris Xin: DEEM'22: Data Management for End-to-End Machine Learning, DEEM@SIGMOD 2022. [paper]
  • Patrick Damme, Marius Birkenbach, Constantinos Bitsakos, Matthias Boehm, Philippe Bonnet, Florina Ciorba, Mark Dokter, Pawel Dowgiallo, Ahmed Eleliemy, Christian Faerber, Georgios Goumas, Dirk Habich, Niclas Hedam, Marlies Hofer, Wenjun Huang, Kevin Innerebner, Vasileios Karakostas, Roman Kern, Tomaž Kosar, Alexander Krause, Daniel Krems, Andreas Laber, Wolfgang Lehner, Eric Mier, Marcus Paradies, Bernhard Peischl, Gabrielle Poerwawinata, Stratos Psomadakis, Tilmann Rabl, Piotr Ratuszniak, Pedro Silva, Nikolai Skuppin, Andreas Starzacher, Benjamin Steinwender, Ilin Tolovski, Pınar Tözün, Wojciech Ulatowski, Yuanyuan Wang, Izajasz Wrosz, Aleš Zamuda, Ce Zhang, Xiao Xiang Zhu: DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines, CIDR 2022. [paper, slides, video]
  • Svetlana Sagadeeva, Matthias Boehm: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging, SIGMOD 2021. [paper, repro]
  • Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Marian Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch: ExDRa: Exploratory Data Science on Federated Raw Data, SIGMOD 2021. [paper, slides, ACM DL (OpenAccess), repro]
  • Arnab Phani, Benjamin Rath, Matthias Boehm: LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems, SIGMOD 2021. [paper, repro]
  • Prithviraj Sen, Marina Danilevsky, Yunyao Li, Siddhartha Brahma, Matthias Boehm, Laura Chiticariu, Rajasekar Krishnamurthy: Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification, EMNLP 2020.
  • Matthias Boehm: Technical Perspective: Declarative Recursive Computation on an RDBMS, SIGMOD Record 2020 49(1). [paper]
  • Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthör, Kevin Innerebner, Florijan Klezin, Stefanie Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqi, Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle, CIDR 2020. [paper, slides]
  • Johanna Sommer, Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald, Peter J. Haas: MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions, SIGMOD 2019. [paper, slides, poster]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, Commun. ACM 2019 62(5). [paper, link]
  • Matthias Boehm, Arun Kumar, Jun Yang: Data Management in Machine Learning Systems. Synthesis Lectures on Data Management 11 (1), Morgan & Claypool Publishers 2019. [book]
  • Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald: Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning, BTW 2019. [paper, slides]
  • Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare: On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML, PVLDB 2018 11(12). [paper, slides, poster]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, VLDB Journal 2018 27(5). [paper, link]
  • Matthias Boehm: Apache SystemML – Declarative Large-Scale Machine Learning, Encyclopedia of Big Data Technologies 2018. [paper]
  • Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen: Deep Learning with Apache SystemML, SysML 2018. [paper]
  • Arun Kumar, Matthias Boehm, Jun Yang: Data Management in Machine Learning: Challenges, Techniques, and Systems (Tutorial), SIGMOD 2017. [paper, slides, video]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Scaling Machine Learning via Compressed Linear Algebra, SIGMOD Record 2017 46(1). [paper]
  • Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning, CIDR 2017. [paper, slides]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, PVLDB 2016 9(12). [paper, slides, poster]
  • Matthias Boehm, Michael Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick Reiss, Prithviraj Sen, Arvind Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark, PVLDB 2016 9(13). [paper, slides]
  • Matthias Boehm, Alexandre V. Evfimievski, Niketan Pansare, Berthold Reinwald: Declarative Machine Learning - A Classification of Basic Properties and Types, CoRR 2016 abs/1605.05826. [paper]
  • Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On Optimizing Machine Learning Workloads via Kernel Fusion, PPoPP 2015. [paper]
  • Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning, SIGMOD 2015. [paper, slides, poster]
  • Matthias Boehm: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs, CoRR 2015 abs/1503.06384. [paper]
  • Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs, IEEE Data Eng. Bull. 2014 37(3). [paper]
  • Matthias Boehm, Dirk Habich, Wolfgang Lehner: On-Demand Re-Optimization of Integration Flows, Inf. Syst. 2014 45. [paper]
  • Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, J. Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning, IPDPS Workshop ParLearning 2014. [paper, slides]
  • Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML, PVLDB 2014 7(7). [paper, slides, poster]


Summer 2024
  • Architecture of Machine Learning Systems (Lecture/Exercises). [course website]
  • Programmierpraktikum: Datensysteme (Practicals). [course website]
  • Large-scale Data Engineering (Seminar/Project by Patrick Damme). [course website]
  • Joint Seminar on Machine Learning and Data Management (with ML group). [course website]
Winter 2023/24
  • Data Integration and Large-Scale Analysis (Lecture/Exercises). [course website]
  • Programmierpraktikum: Datensysteme (Practicals). [course website]
  • Architecture of Database Systems (@ TU Graz). [course website]
  • Large-scale Data Engineering (Seminar/Project by Patrick Damme). [course website]
  • Joint Seminar on Machine Learning and Data Management (with ML group). [course website]
Summer 2023
  • Architecture of Machine Learning Systems (Lecture/Exercises). [course website]
  • Large-scale Data Engineering (Seminar/Project by Patrick Damme). [course website]
  • Joint Seminar on Machine Learning and Data Management (with ML group). [course website]
Winter 2022/23
  • Architecture of Machine Learning Systems (@ Aalborg University). [course website]
  • Joint Seminar on Machine Learning and Data Management (with ML group). [course website]
Summer 2022
Winter 2021/22
Summer 2021
Winter 2020/21
Summer 2020
Winter 2019/20
Summer 2019


Reviewing: This list summarizes PC memberships and review activities, again after my PhD and postdoc.
  • Program Committee SIGMOD 2025, PVLDB 2025 (SPC), EDBT 2025 (SPC); SIGMOD 2025 Finance Chair; EDBT 2025 Workshop Co-Chair
  • Program Committee SIGMOD 2024, PVLDB 2024 (SPC), SoCC 2024; DBML@ICDE 2024, DEEM@SIGMOD 2024, GUIDE-AI@SIGMOD 2024
  • Program Committee SIGMOD 2023, PVLDB 2023 Industry, EDBT 2023 (SPC), SoCC 2023; SIGMOD 2023 SRC, SIGMOD 2023 Reproducibility, BTW 2023 Reproducibility, DE4DS@BTW 2023; Journal Reviewer VLDBJ 2023, TKDE 2023, Information Systems 2023, Datenbankspektrum 2023; Workshop Co-Chair DEEM@SIGMOD 2023, WoLS@BTW 2023, EVEREST+DAPHNE 2023; VLDB 2023 Workshop Co-Chair; VLDBJ 2023 Special Issue Editor
  • Program Committee PVLDB 2022 (PC and SPC), PVLDB 2022 Industry, EDBT 2022, EDBT 2022 Industry, CIDR 2022, SIGMOD 2022 SRC, LWDA 2022; SIGMOD 2022 Reproducibility; Workshop Co-Chair DEEM@SIGMOD 2022; Journal Reviewer VLDBJ 2022, JMLR OSS track 2022, Datenbankspektrum 2022
  • Program Committee SIGMOD 2021, SIGMOD 2021 Industry, CIDR 2021, ICDE 2021 Demo, PhD@PVLDB 2021, DBAI@NeurIPS 2021; Journal Reviewer VLDBJ 2021, SIGMOD Record 2021, Datenbankspektrum 2021; Workshop Co-Chair DEEM@SIGMOD 2021, Track Chair (Data Science) BTW 2021
  • Program Committee SIGMOD 2020, PVLDB 2020, ICDE 2020, CIDR 2020, DEEM@SIGMOD 2020, PhD@PVLDB 2020; Journal Reviewer VLDBJ 2020; GI working group Data Science (since 03/2020)
  • Program Committee SIGMOD 2019 (SPC), PVLDB 2019, ICDE 2019, EDBT 2019, DEEM@SIGMOD 2019, AIDB@PVLDB 2019; Journal Reviewer TKDE 2019
  • Program Committee PVLDB 2018, EDBT 2018 Industry, DEEM@SIGMOD 2018, WebDB 2018, EBDVF 2018
  • Program Committee ICDE 2017 Demo, DEEM@SIGMOD 2017; Journal Reviewer SIGMOD Record 2017/18
  • Journal Reviewer TKDE 2016/17, ACM Computing Surveys 2016, IBM Journal R&D 2016; External Reviewer CIKM 2016
  • Program Committee SSDBM 2015; Journal Reviewer Information Systems 2015; External Reviewer SIGMOD Record 2015

Professor Hiring Committee Memberships (completed): Artificial Intelligence (TU Graz, 2019), Remote Sensing (TU Graz, 2021), Data Science Processes (TU Berlin, 2023), Information Integration (TU Berlin, 2023), Data Systems (HPI, 2023), Data Systems (ITU Copenhagen, 2023), Big Data Infrastructures (University of St.Gallen, 2023), Systems for Data Analysis (KTH Stockholm, 2023/2024), Computer Science (Aalborg University, 2024)

PhD Committee Memberships (completed): Andreas Kunft (TU Berlin, 2019), Joseph Vinish D'Silva (McGill University, 2020), Shaoduo Gan (ETH Zurich, 2021), Gábor Gévay (TU Berlin, 2022), Clemens Lutz (TU Berlin, 2022), Alexander Renz-Wieland (TU Berlin, 2022), Gencer Sümbül (TU Berlin, 2023), Martino Ciaperoni (Aalto University, 2023), Philipp Grulich (TU Berlin, 2023), Viktor Rosenfeld (TU Berlin, 2023), Lisa Raithel (TU Berlin, 2024), Francesco Tosoni (University of Pisa, 2024), Aikaterini Katsarou (TU Berlin, 2024), Sören Becker (TU Berlin, 2024)


  • Distinguished Reviewer Awards at SIGMOD 2024, SIGMOD 2023, EDBT 2023, PVLDB 2022, SIGMOD 2020, SIGMOD 2019, PVLDB 2019, ICDE 2019.
  • SIGMOD 2024 Best Paper Honorable Mention for our paper on "SAGA: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications".
  • SIGMOD 2021 Best Paper Award (track Data Science and Engineering) for our paper on "SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging".
  • VLDB 2016 Best Paper Award for our paper on "Compressed Linear Algebra for Large-Scale Machine Learning", followed by the ACM SIGMOD Research Highlight Award, the ACM CACM Research Highlight Award, and the IBM Pat Goldberg Memorial Best Paper Award.
  • IBM A-Level Accomplishment for contributions to the SystemML project, 2015.
  • Finalist and 2nd place at the 1st SIGMOD Programming Contest 2009 (contest challenge: transactional main-memory index), published at BTW 2011.
  • Award for the best diploma thesis at German universities of applied sciences (FBTI award), and best graduate of the year at HTW Dresden in the business informatics study program, 2007.


Our research group is grateful for funding support from
  • 2022-present: BMBF, TU Berlin, BIFOLD, as well as the European Union's Horizon 2020 research and innovation program.
  • 2018-2022: BMK, the BMK/FFG program "ICT of the Future", the European Union's Horizon 2020 research and innovation program, TU Graz, AVL LIST, Infineon Technologies Austria, Magna Steyr Fahrzeugtechnik, voestalpine Stahl Donawitz, and Know-Center.