This course gives an overview of techniques for automated learning from ill-understood data, for which it is hard or impossible to formulate a model that is even approximately correct. Here “learning” means finding structure, patterns and regularities in the data and using these to predict future data. The field is closely related to the area within computer science called “machine learning”; indeed, many contributions to it originate in computer science (pattern recognition, artificial intelligence).
Main topics in the course will be (1) supervised learning (regression and classification, with a strong focus on the latter); (2) model selection and model averaging; (3) basic clustering. The methods discussed will include various classical and state-of-the-art classification methods: LDA (1930s), naive Bayes, perceptrons (1960s), neural networks, decision trees (1980s), logistic regression, boosting, support vector machines and other kernel approaches (2000s). We explain interrelations between these methods and analyze their large-sample behaviour. As for model selection and averaging, we again consider both classical and state-of-the-art methods, including AIC, BIC, Bayes factor model averaging, Minimum Description Length (MDL), various forms of cross-validation, shrinkage, Ridge, Lasso and other L1 methods. As for clustering, we consider the classic k-means and EM methods.
An introduction to Statistical Learning
For the course days, course location and class hours, check the Time Table 2014-15 under the tab “Masters Programme” at http://www.math.leidenuniv.nl/statscience
Mode of Instruction
Lectures and practicals (partly computer practicals, partly exercises).
- A written open-book exam (50%)
- Two assignments (each 25%)
Both homework assignments involve setting up some experiments in R, running them, and writing a short report about the results. Discussing the problems in the group is encouraged, but every participant must carry out the experiments and write the report individually.
The dates of the exam and resit can be found in the Time Table 2014-15 pdf document under the tab “Masters Programme” at http://www.math.leidenuniv.nl/statscience. The exams take place in the Snellius building. The room will be announced on the electronic billboard opposite the entrance; its content can also be viewed online at http://info.liacs.nl/math/
If the exam does not take place in the Snellius building, an announcement will be sent via Blackboard.
- T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd edition, 2009.
Handouts of some (very few) papers
Enroll in Blackboard for the course materials and course updates.
To be able to obtain a grade and the ECTS for the course, sign up for the (re-)exam in uSis ten calendar days before the actual (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and use the first exam opportunity.
Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.
Peter [dot] Grunwald [at] cwi [dot] nl
- This is an elective course in the Master’s programme of the specialisation Statistical Science for the Life & Behavioural sciences.