Recommended prior knowledge
Elementary knowledge of data structures (sparse matrices, hash tables, dictionaries, graphs), statistics (binomial distribution) and combinatorics (permutations, combinations). Basic programming skills in Python.
Traditional data mining techniques focus mainly on classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
- recommending items to visitors of internet shops,
- fast searching for similar objects, such as documents, images, songs, routes, etc., in collections of millions or billions of such objects,
- clustering of massive data sets,
- real-time analysis of data streams (e.g., electronic transactions, internet traffic),
- reduction of dimensionality and visualisation of big data sets,
- processing huge data sets (terabytes or petabytes in size).
During the course we will focus on these areas. The practical part of the course will consist of several programming assignments (in Python) and writing reports.
After completing the course, the students should:
- have a general knowledge of the recent developments in the field of Data Mining,
- have detailed knowledge of selected techniques and their applications,
- gain some hands-on experience with several algorithms for mining big data sets,
- be able to apply the acquired knowledge and skills to new problems.
The most recent timetable can be found at the students' website.
Mode of instruction
- Computer Lab
- Practical assignments
- Self-evaluated homework
Total hours of study: 168 hrs. (= 6 EC)
Lectures: 26:00 hrs.
Practical work: 64:00 hrs.
Reporting: 42:00 hrs.
Exam preparation: 36:00 hrs.
The final grade is a weighted combination of grades for:
(1) the exam (40%),
(2) the practical assignments (60%).
To pass the course, both grades (for the exam and the practicals) must be at least 5.5.
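The grading rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `final_grade` is hypothetical, and no rounding is applied because the course description does not specify any.

```python
def final_grade(exam: float, practicals: float, pass_mark: float = 5.5):
    """Return (weighted grade, passed) for the 40% exam / 60% practicals rule.

    Assumption: grades are combined without rounding; the course
    description does not state a rounding rule.
    """
    weighted = 0.4 * exam + 0.6 * practicals
    # Both component grades must reach the pass mark, regardless of the
    # weighted result.
    passed = exam >= pass_mark and practicals >= pass_mark
    return weighted, passed
```

For example, under these assumptions an exam grade of 5.0 with a practicals grade of 9.0 gives a weighted grade above 5.5 but still does not pass, because the exam grade alone is below 5.5.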
A. Rajaraman, J. Leskovec, J. Ullman, Mining of Massive Datasets.
- You have to sign up for classes and examinations (including resits) in uSis. Check this link for more information and activity codes.
- Please also register for the course in Blackboard.
Lecturer: dr. Wojtek Kowalczyk