Statistical learning, 2018-2019 - Prospectus

Please note that this course description is preliminary. The final course description will be released in June 2018.

Admission Requirements

Familiarity with least squares linear regression
Ability to program in R or in Python

Description

This course gives an overview of techniques for automated learning from ill-understood data, for which it is hard or impossible to formulate a model that is even approximately correct. Here "learning" means finding structure, patterns and regularities in the data and using these to predict future data. Statistical Learning is very similar to an area within computer science called "machine learning", since many of its methods have their origin in computer science (pattern recognition, artificial intelligence).

Main topics in the course will be (1) supervised learning (regression and classification, but with a strong focus on the latter); (2) model selection; (3) basic clustering; (4) basic optimization. The methods discussed will include various classical and state-of-the-art classification methods: LDA (1930s), naive Bayes, perceptrons (1960s), decision trees (1980s), logistic regression, boosting and support vector machines (2000s), neural networks and deep learning. We explain the interrelations between these methods and analyze their behaviour. As for model selection, we again consider both classical and state-of-the-art methods, including various forms of cross-validation, Ridge regression, and Lasso and other L1 methods. As to clustering, we consider the classic k-means and EM methods. For optimization, we will cover stochastic gradient descent, which is the most widely used method to train neural networks.

See www.timvanerven.nl/teaching/statlearn2018/ for detailed course information.

Course objectives

An introduction to Statistical Learning.

Time Table

See the Leiden University students' website for the Statistical Science programme -> Schedules 2018-2019

Mode of Instruction

Lectures and computer practicals.

Assessment method

A written open-book exam (50%). You are allowed to bring any information on paper to the exam, and it is recommended to bring the book. However, digital copies of the book will not be allowed.
Two homework assignments (25% each). The final homework grade is the average of the grades for the two assignments, without any rounding.

A passing score is required both for the assignments and for the exam. This means at least a 5.5 average for the assignments and at least a 5.5 for the exam.
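As an illustration only, the weighting and pass rule above can be written out as a short Python sketch. The function name final_grade and its return format are hypothetical, and the sketch applies no rounding to the overall grade, since the rounding of the final course grade is not specified here.

```python
def final_grade(exam, hw1, hw2, pass_mark=5.5):
    """Combine grades as 50% exam and 25% per homework assignment.

    Hypothetical helper: returns (weighted_grade, passed). The weighted
    grade is left unrounded, which is an assumption of this sketch.
    """
    # Homework grade: plain average of the two assignments, no rounding.
    homework = (hw1 + hw2) / 2
    # Overall grade: exam counts for 50%, each assignment for 25%.
    weighted = 0.5 * exam + 0.25 * hw1 + 0.25 * hw2
    # Both the exam and the homework average must reach the pass mark.
    passed = exam >= pass_mark and homework >= pass_mark
    return weighted, passed


if __name__ == "__main__":
    # Made-up grades, purely for illustration.
    print(final_grade(6.0, 5.0, 7.0))
    print(final_grade(8.0, 5.0, 5.0))
```

Note that a high exam grade does not compensate for a homework average below 5.5, and vice versa: both conditions are checked separately.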

Both homework assignments involve setting up some experiments in R or Python, experimenting, and writing a short report about the results. Discussing the problems with other students is encouraged, but every participant must do their own experiments and write a report on their own.

The dates of the exam and resit can be found in the Time Table pdf document under the tab "Masters Programme" at http://www.math.leidenuniv.nl/statscience. The room and building for the exam will be announced on the electronic billboard opposite the entrance; its content can also be viewed at http://info.liacs.nl/math/.

Reading list

T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd edition, 2009.
Handouts of some (very few) papers and handouts about optimization

Registration

Make sure to enroll in Blackboard for grades, course materials and course updates.

To be able to obtain a grade and the ECTS for the course, sign up for the (re-)exam in uSis ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and are therefore expected to register for and use the first exam opportunity.

Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.

Contact information

Tim van Erven: tim [at] timvanerven [dot] nl

Remarks

This is an elective course of the Master Statistical Science for the Life and Behavioural Sciences / Data Science.