Prospectus


Advances in Data Mining

Course
2013-2014

Admission requirements

Elementary knowledge of data structures (sparse matrices, hash tables, graphs), statistics (binomial distribution) and combinatorics (permutations, combinations).

Description

Traditional data mining techniques focus mainly on solving classification, regression and clustering problems. However, recent developments in ICT have led to the emergence of new kinds of (massive) data sets and related data mining problems. Consequently, the field of data mining has rapidly expanded to cover new areas of research, such as:

  • processing huge data sets (terabytes or petabytes in size),

  • fast searching for similar objects (such as documents, images, DNA-sequences) in collections of millions or billions of objects,

  • recommending items to visitors of internet shops,

  • web advertising and dynamic auctions,

  • analyzing big (network) graphs, such as web sites, social networks and collaboration networks,

  • real-time analysis of data streams (internet traffic, sensor data, electronic transactions).

During the course we will discuss all of these areas, with a focus on four of them: similarity search (Locality Sensitive Hashing), mining data streams (random sampling, counting, estimating moments), analyzing data from social networks, and processing huge data sets on distributed computers (Hadoop and the MapReduce framework).
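
To give a flavour of the "random sampling" topic under mining data streams, the following is a minimal sketch of reservoir sampling, a standard way to keep a uniform random sample of fixed size from a stream whose length is not known in advance. It is not taken from the course material; the function name `reservoir_sample` and its parameters are assumptions made for illustration only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Illustrative sketch only: the stream is read once and only k items
    are kept in memory, regardless of how long the stream is.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical usage: sample 5 values from a stream of one million integers.
sample = reservoir_sample(range(1_000_000), 5, random.Random(42))
```

If the stream holds fewer than k items, the sketch simply returns all of them.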

Course objectives

After completing the course, the students should:

  • have a general knowledge of the recent developments in the field of Data Mining

  • have detailed knowledge of a few selected techniques and their applications

  • gain some hands-on experience with some algorithms and tools for complex data sets

  • be able to apply their knowledge and experience to new problems

  • gain some practical knowledge of mining big data sets on a cluster computer

Timetable

The most recent timetable can be found on the LIACS website.

Mode of instruction

  • Lectures

  • Computer Lab

  • Practical assignments

  • Self-evaluated homework

Assessment method

The final mark is composed of:
(1) a written exam (40%)
(2) a practical assignment (60%)

Blackboard

See Blackboard

Reading list

A. Rajaraman, J. Leskovec, J. Ullman, Mining of Massive Datasets

Registration

You have to sign up for classes and examinations (including resits) in uSis. Check this link for more information and activity codes.

There is limited capacity for students from outside the Computer Science master's programme. Please contact the study advisor.

Contact information

Study coordinator Computer Science, Riet Derogee