Architecture of ML Systems Fall 2022
(VU, 706.550 Architecture of Machine Learning Systems)

AMLS is a 5 ECTS master's course at Graz University of Technology, which is given at Aalborg University in reduced form as a two-day blocked course. This course covers the architecture and essential concepts of modern ML systems for both local and large-scale machine learning (ML). These architectures include systems for data-parallel execution (e.g., Spark, Dask, SystemDS), parameter servers and similar distribution strategies (e.g., TensorFlow, MXNet, PyTorch), ML lifecycle systems, and the integration of ML into database systems. The covered topics take both a microscopic view of internal compilation, execution, and data management techniques, and a macroscopic view of entire ML pipelines.


Lectures

In detail, the course covers the following topics, listed in the order of the timeline. The individual lectures will take place August 29-30, 8am-5pm in room 0.2.13 (Selma Lagerlöfs Vej 300, 9220 Aalborg).

A: ML Lifecycle Systems

  • 01 Introduction and System Landscape [Aug 29, 8am; pdf, pptx]
  • 02 Data Preparation, Cleaning, and Augmentation [Aug 29, 10.15am; pdf, pptx]
  • 03 Model Selection, Debugging/Explainability, and Fairness [Aug 29, 12.45pm; pdf, pptx]
  • Discussion/Implementation Programming Projects [Aug 29, 3pm]
  • 04 Model Deployment and Serving [Aug 29, 3.30pm; pdf, pptx]

B: ML System Internals

  • 05 Compilation and Optimization Techniques [Aug 30, 8am; pdf, pptx]
  • 06 Execution and Parallelization Strategies [Aug 30, 10.15am; pdf, pptx]
  • 07 HW Accelerators and Data Access Methods [Aug 30, 12.45pm; pdf, pptx]
  • Discussion/Implementation Programming Projects [Aug 30, 3pm]


Project / Exercises

The lectures are accompanied by mandatory programming projects or exercises. For this blocked course, we recommend the exercise, which touches upon many aspects of ML pipelines. Alternatively, we allow programming projects in Apache SystemDS (an open source ML system for the end-to-end data science lifecycle) or DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines, OSS release 03/2022).
Exercise details and a list of project proposals are available here:


Organization

  • Lecturer: Univ.-Prof. Dr.-Ing. Matthias Boehm (ISDS)
  • Teaching Assistants: M.Sc. Sebastian Baunsgaard, M.Tech. Arnab Phani (ISDS)
  • Grading: pass/fail (completed project)