Study guide

Advances in Data Mining

Course
2024-2025

Admission requirements

Assumed/Recommended prior knowledge
Knowledge of some Machine Learning techniques (classification, regression, clustering). Elementary knowledge of data structures (hash functions, dictionaries, graphs), calculus, and statistics. Basic Python programming skills.

Description

Traditional data mining techniques focus mainly on solving classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:

  • real-time analysis of data streams (internet traffic, sensor data, electronic transactions, etc.),

  • searching for similar pairs of objects such as texts, images, songs, etc., in huge collections of such objects,

  • finding anomalies in data,

  • clustering of massive sets of records,

  • recommendation systems,

  • reduction of data dimensionality.

During the course you will learn several techniques, algorithms and tools for addressing these new and challenging data mining problems.

Course objectives

After completing the course, the students should be able to:

  • list the basics of data mining

  • execute a chosen algorithm using Python and scikit-learn

  • explain how classification works, and how Support Vector Machines, Random Forests and XGBoost work

  • apply Support Vector Machines, Random Forests and XGBoost to data using Python

  • explain how Locality-Sensitive Hashing works

  • explain how collaborative filtering and recommender systems work

  • build a recommender system based on UV-decomposition

  • communicate about a developed piece of software and report the quality of the results it produces

  • apply Subgroup Discovery using SubDisc and pySubDisc

  • apply basic anomaly detection methods

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Mode of instruction

  • Lectures

  • Practical assignments

  • Self-evaluated homework

Assessment method

The final mark is composed of:

  • multiple choice exam (60%)

  • practical assignment (40%)

To pass the course, the mark for each of the two components must be at least 5.5.
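
The weighting and the pass rule above can be illustrated with a minimal Python sketch. The function name `final_mark` is hypothetical, and the sketch assumes marks on the usual 1-10 scale. Rounding of the final mark and resit rules are not stated here, so the code leaves the weighted mark unrounded.

```python
# Weights taken from the assessment description above (percentages).
EXAM_WEIGHT = 60
ASSIGNMENT_WEIGHT = 40
# Minimum mark required for each component separately.
MIN_COMPONENT_MARK = 5.5


def final_mark(exam: float, assignment: float) -> tuple[float, bool]:
    """Return the weighted final mark and whether the course is passed.

    Assumption: no rounding is applied, since the rounding rule
    is not specified in the course description.
    """
    mark = (EXAM_WEIGHT * exam + ASSIGNMENT_WEIGHT * assignment) / 100
    # Both components must individually reach the minimum mark.
    passed = exam >= MIN_COMPONENT_MARK and assignment >= MIN_COMPONENT_MARK
    return mark, passed


if __name__ == "__main__":
    # Two students with the same weighted mark but different outcomes.
    for exam, assignment in [(7.0, 8.0), (9.0, 5.0)]:
        mark, passed = final_mark(exam, assignment)
        print(f"exam={exam}, assignment={assignment}: "
              f"mark={mark:.1f}, passed={passed}")
```

The second case shows why the per-component rule matters: a high exam mark does not compensate for a practical assignment below 5.5.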

Reading list

  • Selected chapters from the book "Mining of Massive Datasets"
    http://www.mmds.org/#book

  • Additional papers published on the internet

Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

  • Enrolment for the fall opens in July

  • Enrolment for the spring opens in December

See this page for more information about deadlines and enrolling for courses and exams.

Note:

  • It is mandatory to enrol for all activities of a course that you are going to follow.

  • Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

  • Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

Contact

Lecturer: Dr. Arno Knobbe

Remarks

Software
Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.