Data Integration and Large-Scale Analysis WS2021/22
(VU, 706.520 Data Integration and Large-Scale Analysis)

DIA is a 5 ECTS bachelor and master course, applicable to the bachelor programs in computer science or software engineering and management, as well as to the master catalog 'Data Science'. This course covers major data integration architectures, key techniques for data integration and cleaning, as well as methods for large-scale, i.e., distributed, data storage and analysis.


In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures, which take place Fridays at 3pm in HS-i5 or virtually.

A: Data Integration and Preparation

  • 01 Introduction and Overview [Oct 08, pdf, pptx]
  • 02 Data Warehousing, ETL, and SQL/OLAP [Oct 15, pdf, pptx]
  • 03 Message-oriented Middleware, EAI, and Replication [Oct 22, pdf, pptx]
  • 04 Schema Matching and Mapping [Oct 22, pdf, pptx]
  • 05 Entity Linking and Deduplication [Nov 05, pdf, pptx]
  • 06 Data Cleaning and Data Fusion [Nov 12, pdf, pptx]
  • 07 Data Provenance and Blockchain [Nov 19, pdf, pptx]

B: Large-Scale Data Management and Analysis

  • 08 Cloud Computing Fundamentals [Nov 26, pdf, pptx]
  • 09 Cloud Resource Management and Scheduling [Dec 03, pdf, pptx]
  • 10 Distributed Data Storage [Jan 07, pdf, pptx]
  • 11 Distributed, Data-Parallel Computation [Jan 14, pdf, pptx]
  • 12 Distributed Stream Processing [Jan 21, pdf, pptx]
  • 13 Distributed Machine Learning Systems [Jan 28, pdf, pptx]


The lectures are accompanied by mandatory programming projects (to the extent of 2 ECTS, i.e., roughly 50 working hours), preferably in Apache SystemDS (an open source ML system for the end-to-end data science lifecycle).

A list of project proposals (and the details on an alternative exercise) can be found at the beginning of Lecture 03.


  • Lecturer: Univ.-Prof. Dr.-Ing. Matthias Boehm, ISDS
  • Teaching Assistant: M.Sc. Shafaq Siddiqi, ISDS
  • Final written exams: Feb 04 (DIA), Mar 25 (DIA)
    (additional oral exam slots on demand, e.g., for international students)
  • Grading: 30% project, 70% final exam