Admission requirements
Assumed/Recommended prior knowledge
Knowledge of some Machine Learning techniques (classification, regression, clustering). Elementary knowledge of data structures (hash functions, dictionaries, graphs), calculus, statistics. Basic Python programming skills.
Description
Traditional data mining techniques mainly focus on solving classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
real-time analysis of data streams (internet traffic, sensor data, electronic transactions, etc.),
searching for similar pairs of objects such as texts, images, songs, etc., in huge collections of such objects,
finding anomalies in data,
clustering of massive sets of records,
recommendation systems,
reduction of data dimensionality.
During the course you will learn several techniques, algorithms and tools for addressing these new and challenging data mining problems.
Course objectives
After completing the course, the students should be able to:
list the basics of data mining
execute a chosen algorithm in Python and Scikit-learn
explain how classification works, and how Support Vector Machines, Random Forests and XGBoost work
apply Support Vector Machines, Random Forests and XGBoost to data using Python
explain how Locality-Sensitive Hashing works
explain how collaborative filtering and recommender systems work
build a recommender system based on UV-decomposition
communicate about a developed piece of software and report the quality of the results it produces
apply Subgroup Discovery using SubDisc and pySubDisc
apply basic anomaly detection methods
Timetable
The most recent timetable can be found at the Computer Science (MSc) student website.
You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login required). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.
MyTimetable allows you to integrate your timetable with calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).
For more information, watch the video or go to the help page in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.
Mode of instruction
Lectures
Practical assignments
Self-evaluated homework
Assessment method
The final mark is composed of:
multiple choice exam (60%)
practical assignment (40%)
In order to pass the course, marks for both components must be at least 5.5.
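As an illustration of the weighting above, the following minimal Python sketch computes the weighted final mark and checks the pass condition. The function names are hypothetical, and the rounding of the final mark is not specified here, so none is applied.

```python
# Weights and threshold as stated in the assessment method.
EXAM_WEIGHT = 0.6        # multiple choice exam (60%)
PRACTICAL_WEIGHT = 0.4   # practical assignment (40%)
PASS_THRESHOLD = 5.5     # minimum mark required for each component


def final_mark(exam: float, practical: float) -> float:
    """Weighted average of the two components (no rounding applied)."""
    return EXAM_WEIGHT * exam + PRACTICAL_WEIGHT * practical


def passes(exam: float, practical: float) -> bool:
    """Both components must individually reach the threshold."""
    return exam >= PASS_THRESHOLD and practical >= PASS_THRESHOLD


if __name__ == "__main__":
    # Hypothetical marks, for illustration only.
    exam, practical = 7.0, 6.0
    print(f"final mark: {final_mark(exam, practical):.1f}, pass: {passes(exam, practical)}")
```

Note that a high exam mark does not compensate for a practical assignment below 5.5, and vice versa.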
Reading list
Selected chapters from the book "Mining of Massive Datasets" (http://www.mmds.org/#book)
Additional papers published on the internet
Registration
Every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.
Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.
Extensive FAQs on MyStudyMap can be found here.
Contact
Lecturer: Dr. Arno Knobbe
Remarks
None.