Prospectus

Information Theoretic Data Mining

Course
2024-2025

Admission requirements

Recommended prior knowledge

Advanced course, recommended for students in their second or third semester. Prior knowledge of data mining, algorithms, probability theory, and statistics is recommended, as are good scientific writing and presentation skills, in line with the programme recommendations.

Description

How can we gain insight from data? How can we discover and explain structure in data if we don't know what to expect? What is the optimal model for our data? How do we develop principled algorithms for exploratory data mining? To answer these questions, we study and discuss the state of the art in the research area of information theoretic data mining. We focus on theory, problems, and algorithms rather than on implementation and experimentation.

Over the last decade, information-theoretic methods for model selection have become popular in the academic data mining community. This course provides an overview of the use of information theory for exploratory data mining, with a focus on pattern-based modelling. This covers the theoretical foundations, modelling and model selection, and algorithms.

In particular, the course covers concepts from Shannon's information theory, such as entropy and mutual information, and more advanced topics from algorithmic information theory (AIT), such as Kolmogorov complexity. We show how the Minimum Description Length (MDL) principle and the Maximum Entropy (MaxEnt) principle can be used for exploratory data analysis, and we discuss problems, models, and algorithms that have recently been proposed.

The course has one two-hour meeting per week. The first part of the course consists of regular lectures and seminars; in the seminars we discuss the material covered in the lectures and additional reading material (scientific articles). During the second part, students write a scientific essay on an assigned topic (based on a scientific article) and give a presentation. In this phase, students can receive individual tutoring.

Note that there is a strict limit on the capacity of this course; see Registration and Remarks for the details.

Course objectives

At the end of the course, students are able to:

  • Paraphrase and explain the theory (e.g., Shannon’s information theory, Kolmogorov complexity, the Minimum Description Length principle) underlying the field of information theoretic data mining.

  • Illustrate and explain algorithms for data mining based on information theory, with a focus on pattern-based modelling (e.g., Krimp, Slim, Translator).

  • Analyse and categorise scientific publications from top journals and proceedings in the field of information theoretic data mining.

  • Analyse a scientific publication chosen from the state of the art in information theoretic data mining, and relate it to theory and algorithms discussed in the course (in the form of a presentation).

  • Evaluate (e.g., discuss, interpret, and criticise) a scientific publication chosen from the state of the art in information theoretic data mining (in a scientific essay).

Timetable

The most recent timetable can be found on the Computer Science (MSc) student website.

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can link MyTimetable to a calendar app on your phone, so that schedule changes are automatically updated in your calendar. You can also choose to receive email notifications about schedule changes; enable them in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Mode of instruction

  • Lectures

  • Seminars

  • Tutoring

  • Essay

  • Presentation

Assessment method

Attendance of all course meetings is mandatory. Students work individually in this course. The final grade is the weighted average of the grades for:

  • Three assignments on the content of the lectures and associated literature (5% each, together 15%);

  • Presentation (including Q&A) on a scientific publication chosen by the student from the state of the art in information theoretic data mining (25%);

  • Scientific essay evaluating a scientific publication chosen by the student from the state of the art in information theoretic data mining (the same publication as for the presentation) (60%).

If an assignment is not completed, the grade for that assignment is 0. There are no retakes for the assignments or the presentation; there is a retake for the essay, in the same form as the initial essay. The final grade can only be sufficient if (1) both the presentation and the essay have been completed, and (2) the grade for the scientific essay is at least 5.5.
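The weighting and the conditions above can be illustrated with a small Python sketch. The function `final_grade` and its parameters are hypothetical, and two assumptions go beyond this description: a presentation or essay that was not completed is counted as 0 in the average (the description states this only for assignments), and rounding of the final grade is not modelled. The second return value only reports whether the stated conditions for a sufficient grade are met.

```python
from typing import Optional, Sequence, Tuple

# Weights taken from the assessment description: 3 x 5% + 25% + 60% = 100%.
ASSIGNMENT_WEIGHT = 0.05
PRESENTATION_WEIGHT = 0.25
ESSAY_WEIGHT = 0.60
MIN_ESSAY_GRADE = 5.5


def final_grade(assignments: Sequence[Optional[float]],
                presentation: Optional[float],
                essay: Optional[float]) -> Tuple[float, bool]:
    """Hypothetical helper: return (weighted average, conditions met).

    None means "not completed". Uncompleted assignments count as 0 (as
    stated in the description); treating an uncompleted presentation or
    essay as 0 is an assumption of this sketch. Rounding is not modelled.
    """
    if len(assignments) != 3:
        raise ValueError("expected exactly three assignment grades")

    # Each assignment contributes 5%; a missing one counts as 0.
    total = sum(ASSIGNMENT_WEIGHT * (g if g is not None else 0.0)
                for g in assignments)
    total += PRESENTATION_WEIGHT * (presentation if presentation is not None else 0.0)
    total += ESSAY_WEIGHT * (essay if essay is not None else 0.0)

    # Sufficient only if presentation and essay are both completed
    # and the essay grade is at least 5.5.
    conditions_met = (presentation is not None
                      and essay is not None
                      and essay >= MIN_ESSAY_GRADE)
    return total, conditions_met


if __name__ == "__main__":
    grade, ok = final_grade([7.0, 8.0, 6.0], presentation=7.5, essay=6.0)
    print(f"weighted average: {grade:.3f}, conditions met: {ok}")
```

With these example grades the weighted average is 6.525 and the essay condition is met; lowering the essay grade below 5.5 makes the second value False regardless of the other grades.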

The lecturers will inform students how the inspection and follow-up discussion of the essays will take place.

Reading list

The literature list, including mandatory and optional reading material, and lecture slides will be made available on Brightspace.

Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

  • Enrolment for the fall opens in July

  • Enrolment for the spring opens in December

See this page for more information about deadlines and enrolling for courses and exams.

Note:

  • It is mandatory to enrol for all activities of a course that you are going to follow.

  • Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

  • Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

Contact

Lecturers: dr. Francesco Bariatti and dr. Matthijs van Leeuwen
Website: Website ITDM

Remarks

Important: because of the format of the course, the number of participants is strictly limited to 20 students.

Software

Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.