Architecture of ML Systems SS2022
(VU, 706.550 Architecture of Machine Learning Systems)

AMLS is a 5 ECTS master course, applicable to the master catalogs 'Data Science', 'Machine Learning', 'Software Technology', and 'Interactive and Visual Information Systems'. This course covers the architecture and essential concepts of modern ML systems for both local and large-scale machine learning (ML). These architectures include systems for data-parallel execution (e.g., Spark, Mahout, SystemML), Parameter Servers (e.g., TensorFlow, MXNet, PyTorch), ML lifecycle systems, and the integration of ML into database systems. The covered topics focus primarily on a microscopic view of internal compilation, execution, and data management techniques, but also include a macroscopic view of entire ML pipelines.


Lectures

In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures, which take place Fridays at 12:15pm in HS-i5 or virtually.

A: Overview and ML System Internals

  • 01 Introduction and Overview [Mar 04, pdf, pptx]
  • 02 Languages, Architectures, and System Landscape [Mar 11, pdf, pptx]
  • 03 Size Inference, Rewrites, and Operator Selection [Mar 18, pdf, pptx]
  • 04 Operator Fusion and Runtime Adaptation [Apr 01, pdf, pptx]
  • 05 Data- and Task-Parallel Execution [Apr 08, pdf, pptx]
  • 06 Parameter Servers [Apr 29, pdf, pptx]
  • 07 Hybrid Execution and HW Accelerators [May 06, pdf, pptx]
  • 08 Caching, Partitioning, Indexing and Compression [May 13, pdf, pptx]

B: ML Lifecycle Systems

  • 09 Data Acquisition, Cleaning, and Preparation [May 20, pdf, pptx]
  • 10 Model Selection and Management [Jun 03, pdf, pptx]
  • 11 Model Debugging, Fairness, and Explainability [Jun 10, pdf, pptx]
  • 12 Model Serving Systems and Techniques [Jun 17, pdf, pptx]


Project / Exercises

The lectures are accompanied by mandatory programming projects (to the extent of 2 ECTS, i.e., roughly 50 working hours), preferably in Apache SystemDS (an open source ML system for the end-to-end data science lifecycle) or DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines, OSS release 03/2022).
A list of project proposals and details on alternative exercises (programming contest or ML pipeline) are available here:


Organization

  • Lecturer: Univ.-Prof. Dr.-Ing. Matthias Boehm, ISDS
  • Teaching Assistants: M.Sc. Shafaq Siddiqi, M.Sc. Sebastian Baunsgaard, ISDS
  • Final oral/written exam: TBD
  • Grading: 30% project, 70% final exam