Data Integration and Large-scale Analysis WiSe2023/24
(VL/UE, 41112 Data Integration and Large-scale Analysis)

DIA is a 6 ECTS module, applicable to the bachelor's and master's programs in computer science, computer engineering, information systems management, and electrical engineering, as well as to the study areas data and software engineering, cognitive systems, and distributed systems and networks. This course covers major data integration architectures, key techniques for data integration and cleaning, as well as methods for large-scale, i.e., distributed, data storage and analysis.


In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures, which take place on Thursdays, 4pm-6pm, in H 0107 and virtually via Zoom (link).

A: Data Integration and Preparation

  • 01 Introduction and Overview [Oct 19]
  • 02 Data Warehousing, ETL, and SQL/OLAP [Oct 26]
  • 03 Message-oriented Middleware, EAI, and Replication [Nov 02]
  • 04 Schema Matching and Mapping [Nov 09]
  • 05 Entity Linking and Deduplication [Nov 16]
  • 06 Data Cleaning and Data Fusion [Nov 23]
  • 07 Data Provenance and Catalogs [Nov 30]

B: Large-Scale Data Management and Analysis

  • 08 Cloud Computing Fundamentals [Dec 07]
  • 09 Cloud Resource Management and Scheduling [Dec 14]
  • 10 Distributed Data Storage [Jan 11]
  • 11 Distributed, Data-Parallel Computation [Jan 18]
  • 12 Distributed Stream Processing [Jan 25]
  • 13 Distributed Machine Learning Systems [Feb 01]

Project / Exercises

The lectures are accompanied by mandatory programming projects (to the extent of 3 ECTS, i.e., roughly 80-90 working hours), preferably in Apache SystemDS (an open source ML system for the end-to-end data science lifecycle) or DAPHNE (an open and extensible system infrastructure for integrated data analysis pipelines).
A list of project proposals and details on alternative exercises (distributed data and ML pipeline) are available here:

  • Apache SystemDS: TBD
  • Alternative Exercise: TBD


  • Lecturer: Univ.-Prof. Dr.-Ing. Matthias Boehm, DAMS
  • Teaching Assistant: M.Tech. Arnab Phani, DAMS
  • Final written exams (preliminary dates): Feb 08, 4pm and Feb 15, 4pm (additional oral exam slots on demand, e.g., for international students)
  • Grading: 100% final exam, project as prerequisite