Admission requirements
Assumed/Recommended prior knowledge
Knowledge of Machine Learning algorithms (classification, regression, clustering). Elementary knowledge of data structures (hash functions, dictionaries, graphs), calculus, statistics. Basic Python programming skills; familiarity with the scikit-learn package.
Description
Traditional data mining techniques focus mainly on classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
processing huge data sets (terabytes or petabytes in size),
real-time analysis of data streams (internet traffic, sensor data, electronic transactions, etc.),
searching for similar pairs of objects such as texts, images, songs, etc., in huge collections of such objects,
finding anomalies in data,
clustering of massive sets of records,
recommendation systems,
reduction of data dimensionality,
applications of Deep Learning to data mining.
During the course you will learn several techniques, algorithms and tools for addressing these new and challenging data mining problems:
Recommender Systems: Collaborative Filtering, Matrix Factorization
Algorithms for dimensionality reduction: LLE, t-SNE, UMAP
Random Forest and XGBoost: popular tree-based algorithms for classification and regression
Algorithms for detecting anomalies in data
Locality Sensitive Hashing (LSH): a general technique for finding similar items in huge collections of items
Algorithms for mining data streams: sampling, filtering (Bloom filters), probabilistic counting
Applications of Deep Learning to data mining
Distributed Processing of Massive Data: Hadoop, MapReduce, Spark
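As an illustration of the stream-filtering topic listed above, the following is a minimal Python sketch of a Bloom filter. It is not course material: the class name, the bit-array size, the number of hash functions and the example strings are all assumptions chosen for readability, and the hash positions are derived by salting SHA-256 from the standard library rather than by any method prescribed in the course.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical stream of identifiers.
stream = ["a.example/1", "a.example/2", "b.example/1"]
seen = BloomFilter()
for url in stream:
    seen.add(url)

print("a.example/1" in seen)  # an added item is always reported as present
print("c.example/9" in seen)  # an unseen item is usually, not always, rejected
```

The filter uses a fixed amount of memory regardless of how many items pass through the stream, at the cost of occasional false positives; this trade-off is what makes it suitable for filtering data streams.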
Course objectives
After completing the course, the students should:
know the most successful algorithms and techniques used in Data Mining;
gain some hands-on experience with several algorithms for mining complex data sets;
be able to apply the acquired knowledge and skills to new problems.
Timetable
The most recent timetable can be found at the Computer Science (MSc) student website.
You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.
MyTimetable allows you to integrate your timetable with calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).
For more information, watch the video or go to the help page in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.
Mode of instruction
Lectures
Computer Labs
Practical assignments
Self-evaluated homework
Assessment method
The final mark is composed of:
written exam (40%)
practical assignments (60%)
In order to pass the course, marks for both components must be at least 5.5.
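The weighting and the pass rule above can be combined as in the following minimal Python sketch. The function name and the example marks are hypothetical, and the sketch assumes that no rounding is applied to the weighted mark, since the rounding rule is not stated here.

```python
def final_mark(exam, practical, pass_threshold=5.5):
    """Return (weighted mark, passed) for the 40% exam / 60% practical split."""
    weighted = 0.4 * exam + 0.6 * practical
    # Both components must individually reach the threshold.
    passed = exam >= pass_threshold and practical >= pass_threshold
    return weighted, passed


# Hypothetical marks: both components are sufficient.
print(final_mark(exam=6.0, practical=8.0))

# Hypothetical marks: a high weighted mark, but the exam is below 5.5.
print(final_mark(exam=5.0, practical=9.0))
```

The second call shows that a weighted average above 5.5 is not enough on its own: a component below 5.5 means the course is not passed.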
The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.
Reading list
Selected chapters from the book "Mining of Massive Datasets": http://www.mmds.org/#book
Additional papers published on the internet
Registration
From the academic year 2022-2023 onwards, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.
Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.
Extensive FAQs on MyStudyMap can be found here.
Contact
Lecturers: Dr. Wojtek Kowalczyk, Dr. Arno Knobbe
Remarks
None.