Statistical learning, 2019-2020 - Prospectus

Admission requirements

Familiarity with least squares linear regression
Ability to program in R or in Python

Description

This course gives an overview of techniques for automated learning from ill-understood data, for which it is hard or impossible to formulate a model that is even approximately correct. Here "learning" means finding structure, patterns and regularities in the data, and using these patterns to predict future data. Statistical learning is closely related to the area of computer science called "machine learning", and many of its methods originated in computer science (pattern recognition, artificial intelligence).

Main topics in the course will be (1) supervised learning (regression and classification, with a strong focus on the latter); (2) model selection; (3) basic clustering; and (4) basic optimization. The methods discussed will include various classical and state-of-the-art classification methods: LDA (1930s), naive Bayes, perceptrons (1960s), decision trees (1980s), logistic regression, boosting and support vector machines (2000s), neural networks and deep learning. We explain the interrelations between these methods and analyze their behaviour. For model selection, we again consider both classical and state-of-the-art methods, including various forms of cross-validation, ridge regression (an L2 penalty), the lasso and other L1-penalized methods. For clustering, we consider the classic k-means and EM methods. For optimization, we will cover stochastic gradient descent, which is the most widely used method for training neural networks.


Course objectives

An introduction to Statistical Learning

Time Table

See the Leiden University students' website for the Statistical Science programme, under "Schedules".

Mode of Instruction

Lectures and computer practicals.

Assessment method

A written open-book exam (50%). You may bring any information on paper to the exam, and it is recommended to bring the book. However, digital copies of the book are not allowed.
Two homework assignments (25% each). The final homework grade is the average of the grades for the two assignments, without any rounding.

A passing score is required both for the assignments and for the exam. This means an average of at least 5.5 for the assignments and at least 5.5 for the exam.
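The weighting and pass rule described in this section can be summarised in a short Python sketch. It assumes the weighted grade is simply 0.5 times the exam grade plus 0.25 times each assignment grade; how the final course grade is rounded is not stated in this prospectus, so no rounding is applied. The function names are hypothetical and chosen for illustration only.

```python
from typing import Tuple


def homework_grade(hw1: float, hw2: float) -> float:
    """Average of the two homework grades, without any rounding."""
    return (hw1 + hw2) / 2


def course_result(exam: float, hw1: float, hw2: float) -> Tuple[float, bool]:
    """Return the weighted grade and whether both pass requirements hold.

    Assumption: weighted grade = 0.5 * exam + 0.25 * hw1 + 0.25 * hw2,
    i.e. 0.5 * exam + 0.5 * homework average. Final rounding of the
    course grade is not specified here and is not applied.
    """
    hw = homework_grade(hw1, hw2)
    weighted = 0.5 * exam + 0.5 * hw
    # Both components must reach 5.5 on their own, regardless of the weighted grade.
    passed = hw >= 5.5 and exam >= 5.5
    return weighted, passed


if __name__ == "__main__":
    # Hypothetical grades, for illustration only.
    print(course_result(exam=8.0, hw1=5.0, hw2=5.9))
```

For example, an exam grade of 8.0 with assignment grades 5.0 and 5.9 gives a homework average of 5.45. Because the average is not rounded up to 5.5, the assignment requirement is not met, even though the weighted grade is well above 5.5.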

Both homework assignments involve setting up some experiments in R or Python, experimenting, and writing a short report about the results. Discussing the problems with other students is encouraged, but every participant must do their own experiments and write a report on their own.

Reading list

T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd edition, 2009. (A paper copy of this book is required for the open-book exam.)
B. Efron, T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence and Data Science, 2016. (We will use selected parts from Chapter 18. It is not required to buy this book.)
Handouts of a small number of papers, and handouts on optimization.

Registration

Enroll in Blackboard for the course materials and course updates.

To be able to obtain a grade and the ECTS for the course, sign up for the (re-)exam in uSis ten calendar days before the actual (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme, and therefore to use, and register for, the first exam opportunity.

Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.

Contact information

Tim van Erven: tim@timvanerven.nl

Remarks

This is an elective course in the Master’s programme of the specialisation Statistical Science for the Life & Behavioural sciences.