Architecture of ML Systems SS2020
(VU, 706.550 Architecture of Machine Learning Systems)

AMLS is a 5 ECTS master's course, applicable in the catalog 'Knowledge Technologies' as well as the upcoming catalogs 'Data Science', 'Machine Learning', and 'Interactive and Visual Information Systems'. This course covers the architecture and essential concepts of modern machine learning (ML) systems for supporting large-scale ML. These architectures include systems for data-parallel execution (e.g., Spark, Mahout, SystemML), Parameter Servers (e.g., TensorFlow, MXNet, PyTorch), ML lifecycle systems, and the integration of ML into database systems. The covered topics focus primarily on a microscopic view of internal compilation, execution, and data management techniques, but also include a macroscopic view of entire ML pipelines.


Lectures

In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures.

A: Overview and ML System Internals

  • 01 Introduction and Overview [Mar 06, pdf, pptx]
  • 02 Languages, Architectures, and System Landscape [Mar 13, pdf, pptx]
  • 03 Size Inference, Rewrites, and Operator Selection [Mar 20, pdf, pptx]
  • 04 Operator Fusion and Runtime Adaptation [Mar 27, pdf, pptx]
  • 05 Data- and Task-Parallel Execution [Apr 03]
  • 06 Parameter Servers [Apr 24]
  • 07 Hybrid Execution and HW Accelerators [May 08]
  • 08 Caching, Partitioning, Indexing and Compression [May 15]

B: ML Lifecycle Systems

  • 09 Data Acquisition, Cleaning, and Preparation [May 29]
  • 10 Model Selection and Management [Jun 05]
  • 11 Model Debugging Techniques [Jun 12]
  • 12 Model Serving Systems and Techniques [Jun 19]
  • 13 Trends and Research Directions 2020 [Jun 26]


Project

The lectures are accompanied by a mandatory open source programming project for gaining practical experience (to the extent of 2 ECTS, i.e., roughly 50 working hours). We'll make project topic suggestions in the context of SystemDS (an open source ML system for the end-to-end data science lifecycle), but your own project proposals (potentially in other open source systems) are welcome as well.


Organization

  • Lecturer: Univ.-Prof. Dr.-Ing. Matthias Boehm, ISDS
  • Teaching Assistant: M.Sc. Sebastian Baunsgaard, ISDS
  • Final oral exam: TBD
  • Grading: 40% project, 60% final exam