
Advances in Data Mining


Admission requirements

Assumed/Recommended prior knowledge
Knowledge of some Machine Learning techniques (classification, regression, clustering). Elementary knowledge of data structures (hash functions, dictionaries, graphs), calculus, statistics. Basic Python programming skills.


Traditional data mining techniques focus mainly on classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:

  • real-time analysis of data streams (internet traffic, sensor data, electronic transactions, etc.),

  • searching for similar pairs of objects such as texts, images, songs, etc., in huge collections of such objects,

  • finding anomalies in data,

  • clustering of massive sets of records,

  • recommendation systems,

  • reduction of data dimensionality.

During the course you will learn several techniques, algorithms and tools for addressing these new and challenging data mining problems.

Course objectives

After completing the course, the students should be able to:

  • list the basics of data mining

  • execute a chosen algorithm in Python and Scikit-learn

  • explain how classification works, and how Support Vector Machines, Random Forests and XGBoost work

  • apply Support Vector Machines, Random Forests and XGBoost to data using Python

  • explain how Locality-Sensitive Hashing works

  • explain how collaborative filtering and recommender systems work

  • build a recommender system based on UV-decomposition

  • communicate about a developed piece of software and report the quality of the results it produces

  • apply Subgroup Discovery using SubDisc and pySubDisc

  • apply basic anomaly detection methods


The most recent timetable can be found at the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

  • Lectures

  • Practical assignments

  • Self-evaluated homework

Assessment method

The final mark is composed of:

  • multiple choice exam (60%)

  • practical assignment (40%)

In order to pass the course, marks for both components must be at least 5.5.

Reading list

  • Selected chapters from the book "Mining of massive datasets"

  • Additional papers published on the internet


Every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.


Lecturer: Dr. Arno Knobbe