Admission requirements
Recommended prior knowledge
Elementary knowledge of data structures (sparse matrices, hash tables, dictionaries, graphs), statistics (binomial distribution) and combinatorics (permutations, combinations). Basic programming skills in Python.
Description
Traditional data mining techniques focus mainly on solving classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of massive data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
recommending items to visitors of internet shops,
fast searching for similar objects, such as documents, images, songs, routes, etc., in collections of millions or billions of such objects,
clustering of massive data sets,
real-time analysis of data streams (e.g., electronic transactions, internet traffic),
reduction of dimensionality and visualisation of big data sets,
processing huge data sets (terabytes or petabytes in size).
During the course we will focus on these areas. The practical part of the course will consist of several programming assignments (in Python) and writing reports.
Course objectives
After completing the course, the students should:
have a general knowledge of the recent developments in the field of Data Mining,
have detailed knowledge of selected techniques and their applications,
have gained hands-on experience with several algorithms for mining big data sets,
be able to apply the acquired knowledge and skills to new problems.
Timetable
The most recent timetable can be found at the students' website.
Mode of instruction
Lectures
Computer Lab
Practical assignments
Self-evaluated homework
Course load
Total hours of study: 168 hrs. (= 6 EC)
Lectures: 26:00 hrs.
Practical work: 64:00 hrs.
Reporting: 42:00 hrs.
Exam preparation: 36:00 hrs.
Assessment method
The final grade is a weighted combination of grades for:
(1) the exam (40%),
(2) the practical assignments (60%).
To pass the course, both grades (for the exam and for the practical assignments) must each be at least 5.5.
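As an illustration of the weighting above, the calculation can be sketched in Python. This is only a sketch: the function name `final_grade` and the rounding to two decimals are assumptions made for the example, and the official rounding rules and the grade recorded when one component is below 5.5 are not specified here.

```python
def final_grade(exam, practicals, pass_mark=5.5):
    """Return (weighted grade, passed) for the 40%/60% weighting.

    Hypothetical helper for illustration only; rounding to two
    decimals is an assumption, not an official rule.
    """
    # Weighted combination: exam counts for 40%, practicals for 60%.
    weighted = 0.4 * exam + 0.6 * practicals
    # Both component grades must reach the pass mark.
    passed = exam >= pass_mark and practicals >= pass_mark
    return round(weighted, 2), passed


if __name__ == "__main__":
    print(final_grade(7.0, 8.0))
    print(final_grade(5.0, 9.0))
```

Note that in the second call the weighted grade is above 5.5, yet the course is not passed because the exam grade alone is below the threshold.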
Reading list
A. Rajaraman, J. Leskovec, J. Ullman, Mining of Massive Datasets.
Registration
You have to sign up for classes and examinations (including resits) in uSis. Check this link for more information and activity codes.
Please also register for the course in Blackboard.
Contact information
Lecturer: dr. Wojtek Kowalczyk