Prospectus

Advances in Data Mining

Course
2023-2024

Admission requirements

Assumed/Recommended prior knowledge
Knowledge of Machine Learning algorithms (classification, regression, clustering). Elementary knowledge of data structures (hash functions, dictionaries, graphs), calculus, statistics. Basic Python programming skills; familiarity with the scikit-learn package.

Description

Traditional data mining techniques focus mainly on classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:

  • processing huge data sets (terabytes or petabytes in size),

  • real-time analysis of data streams (internet traffic, sensor data, electronic transactions, etc.),

  • searching for similar pairs of objects such as texts, images, songs, etc., in huge collections of such objects,

  • finding anomalies in data,

  • clustering of massive sets of records,

  • recommendation systems,

  • reduction of data dimensionality,

  • applications of deep learning to data mining.

During the course you will learn several techniques, algorithms and tools for addressing these new and challenging data mining problems:

  • Recommender Systems: Collaborative Filtering, Matrix Factorization

  • Algorithms for dimensionality reduction: LLE, t-SNE, UMAP

  • Random Forest and XGBoost: among the most popular tree-based ensemble algorithms for classification and regression

  • Algorithms for detecting anomalies in data

  • Locality Sensitive Hashing (LSH): a general technique for finding similar items in huge collections of items

  • Algorithms for mining data streams: sampling, filtering (Bloom filters), probabilistic counting

  • Applications of deep learning to data mining

  • Distributed Processing of Massive Data: Hadoop, MapReduce, Spark

Course objectives

After completing the course, the students should:

  • know the most successful algorithms and techniques used in data mining;

  • have gained hands-on experience with several algorithms for mining complex data sets;

  • be able to apply the acquired knowledge and skills to new problems.

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the MyTimetable tool (login required). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the help page in MyTimetable. Please note: Leiden/Delft Joint Degree students have to merge their two separate timetables into one. This video explains how to do this.

Mode of instruction

  • Lectures

  • Computer Labs

  • Practical assignments

  • Self-evaluated homework

Assessment method

The final mark is composed of:

  • written exam (40%)

  • practical assignments (60%)

In order to pass the course, marks for both components must be at least 5.5.
The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.

Reading list

  • Selected chapters from the book "Mining of Massive Datasets"
    http://www.mmds.org/#book

  • Additional papers published on the internet

Registration

From the academic year 2022-2023 onwards, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

Lecturers: Dr. Wojtek Kowalczyk, Dr. Arno Knobbe

Remarks

None.