Advances in Data Mining, 2022-2023 - Prospectus

Admission requirements

Assumed/Recommended prior knowledge
Elementary knowledge of Machine Learning algorithms (classification, regression, clustering, forecasting), common data structures (hash functions, hash tables, dictionaries), and statistics. Basic programming skills in Python (including NumPy, Pandas, and plotting packages).

Description

During the course we will cover the most popular algorithms for common data mining tasks: data visualization (PCA, MDS, LLE, t-SNE); classification and regression (Random Forest, XGBoost, SVMs); anomaly detection (LOF, Isolation Forest, generative models, etc.); recommender systems (Matrix Factorization); and others. We will also discuss recent developments in automatic tuning of ML algorithms, mining very big data on distributed systems (Hadoop, Spark), and mining data streams. Finally, we will discuss in depth the Kaggle platform as a great source of knowledge and inspiration for future Data Scientists.

Course objectives

After completing the course, the students should:

know the most successful algorithms and techniques used in Data Mining;
have gained some hands-on experience with several algorithms for mining complex data sets;
be able to apply the acquired knowledge and skills to new problems.

Timetable

The most recent timetable can be found at the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login required). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of each change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

Lectures
Computer Lab
Practical assignments
Self-evaluated homework

Assessment method

The final mark is composed of

written exam (40%)
practical assignments (60%)

In order to pass the course, marks for both components must be at least 5.5.
The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.
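
As an illustration only, the weighting and the pass condition above could be computed as in the following Python sketch. The function names `final_mark` and `passes` are hypothetical, and the sketch assumes unrounded component marks; the prospectus does not specify how marks are rounded.

```python
def final_mark(exam: float, practical: float) -> float:
    """Weighted final mark: 40% written exam, 60% practical assignments."""
    return 0.4 * exam + 0.6 * practical


def passes(exam: float, practical: float, threshold: float = 5.5) -> bool:
    """Both components must reach the threshold (5.5) to pass the course."""
    return exam >= threshold and practical >= threshold


if __name__ == "__main__":
    # Hypothetical marks, chosen only to show the calculation.
    exam, practical = 6.0, 8.0
    print(f"final mark: {final_mark(exam, practical):.1f}")
    print(f"passed: {passes(exam, practical)}")
```

Note that a high weighted average does not compensate for a component below 5.5: under this reading, an exam mark of 5.0 fails the course regardless of the practical mark.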

Reading list

Several papers and books published on the internet
(optional) The Kaggle Book: Data analysis and machine learning for competitive data science (https://www.packtpub.com/product/the-kaggle-book/9781801817479)

Registration

From the academic year 2022-2023 onwards, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

Lecturer: Dr. Wojtek Kowalczyk.

Remarks

None.