DIA is a 6 ECTS module, applicable to the bachelor and master study courses computer science, computer engineering, information systems management, and electrical engineering, as well as the study areas data and software engineering, cognitive systems, and distributed systems and networks. This course covers major data integration architectures, key techniques for data integration and cleaning, as well as methods for large-scale, i.e., distributed, data storage and analysis.
In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures, which take place Thursdays, 4pm-6pm (start 4.15pm), in H 0107 and virtually via Zoom (link). Furthermore, we offer weekly office hours, which take place Wednesdays, 4.30pm-6pm, in TEL 0811 and virtually via Zoom (call-in: office hour, starting Nov 01). Lecture attendance is optional, and videos of the recorded Zoom sessions will be made available a few days after the individual lectures.
A: Data Integration and Preparation
B: Large-Scale Data Management and Analysis
The lectures are accompanied by mandatory programming projects
(to the extent of 3 ECTS, i.e., roughly 80-90 working hours), preferably in
Apache SystemDS
(an open-source ML system for the end-to-end data science lifecycle), or
DAPHNE
(an open and extensible system infrastructure for integrated data analysis pipelines).
A list of project proposals and details on alternative exercises
(local and distributed entity resolution pipeline) are available here: