Elementary knowledge of data structures (sparse matrices, hash tables, graphs), statistics (binomial distribution) and combinatorics (permutations, combinations).
Traditional data mining techniques focus mainly on solving classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of (massive) data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:
processing huge data sets (terabytes or petabytes in size),
fast searching for similar objects (such as documents, images, DNA sequences) in collections of millions or billions of objects,
recommending items to visitors of internet shops,
web advertising and dynamic auctions,
analyzing large (network) graphs, such as web sites, social networks and collaboration networks,
real-time analysis of data streams (internet traffic, sensor data, electronic transactions).
During the course we will discuss all of these areas, with a focus on four of them: similarity search (Locality Sensitive Hashing), mining data streams (random sampling, counting, estimating moments), analyzing data from social networks, and processing huge data sets on distributed computers (Hadoop and the MapReduce framework).
After completing the course, students should:
have a general knowledge of recent developments in the field of data mining
have detailed knowledge of a few selected techniques and their applications
have gained hands-on experience with some algorithms and tools for complex data sets
be able to apply their knowledge and experience to new problems
have gained practical knowledge of mining big data sets on a cluster computer
The most recent timetable can be found on the LIACS website.
Mode of instruction
The final mark is composed of:
(1) a written exam (40%)
(2) a practical assignment (60%)
A. Rajaraman, J. Leskovec, J. Ullman, Mining of Massive Datasets
You have to sign up for classes and examinations (including resits) in uSis. Check this link for more information and activity codes.
There is limited capacity for students from outside the Computer Science master's programme. Please contact the study advisor.
Study coordinator Computer Science, Riet Derogee