DIA is a 5 ECTS bachelor and master course, applicable to the bachelor programs Computer Science and Software Engineering and Management, as well as to the master catalog 'Data Science'. The course covers major data integration architectures, key techniques for data integration and cleaning, and methods for large-scale (i.e., distributed) data storage and analysis.
In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available before the respective lectures.
The lectures will start on October 08 (not October 01) in hybrid form: in person in HS i5 (under 3G rules, with video recording), plus Webex for remote live questions.
A: Data Integration and Preparation
B: Large-Scale Data Management and Analysis
The lectures are accompanied by mandatory programming projects (amounting to 2 ECTS, i.e., roughly 50 working hours), preferably in Apache SystemDS (an open-source ML system for the end-to-end data science lifecycle).
A list of project proposals (along with details on an alternative exercise) can be found at the beginning of Lecture 03.