Study guide


Foundations of Statistics and Machine Learning

Course
2024-2025

Admission requirements

Prerequisites are basic probability theory (including laws of large numbers, central limit theorem) and statistics (maximum likelihood, least squares). Knowledge of machine learning or more advanced probability/statistics may be useful but is not essential. In particular, all stochastic process/martingale/game theory that is needed will be developed from scratch.

Description

A large fraction (some claim more than half) of published research in top journals in applied sciences such as medicine and psychology is irreproducible. In light of this 'replicability crisis', classical statistical methods, most notably testing based on p-values, have recently come under intense scrutiny. Indeed, p-value based tests, as well as other methods such as confidence intervals and Bayesian methods, were mostly developed in the 1930s, and they are poorly suited to many 21st century applications of statistics.

In this course, we describe statistics based on the e-value, a recently developed alternative to the p-value that does fit 21st century requirements. This is cutting-edge research: the e-value was introduced in its modern form only in 2019. By now there are hundreds of papers about it, by top statistics groups at Berkeley, Stanford and the like, and your teacher has obtained a prestigious ERC Advanced Grant from the European Union to develop e-values further.

Most importantly, classical methods do not deal well with situations in which new data can keep coming in. For example, based on the results of existing trials, one decides to do a new study of the same medication in a new hospital; or, whenever you type in new search terms, Google can adjust the model that decides which advertisements to show you. E-values handle such situations without problems.

At its core, the e-value is based on viewing statistics as a betting game in which you "bet against a null hypothesis". If you make a lot of money in this bet, you have evidence that the hypothesis is false. This connects the e-value to the very foundations of probability theory: part of the course will be about replacing the standard measure-theoretic foundations of probability by game-theoretic ones. The idea of 'probability as limiting frequency' is replaced by 'probability as fair price', which puts many existing results in a different light and avoids hard-to-interpret 'almost-sure' statements. The game-theoretic, "betting" treatment will serve as an organizing principle throughout the class.
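
To make the betting picture concrete, here is a minimal sketch in Python; it is an illustration under stated assumptions, not the course's own material or the R package used in the homework. It assumes a simple null hypothesis (a fair coin, probability of heads 0.5) and a bettor who stakes according to a fixed alternative (probability of heads 0.7). The function names and parameter values are hypothetical. Each round the bettor's wealth is multiplied by a likelihood ratio whose expectation under the null is 1, so the wealth is a nonnegative test martingale; by Ville's inequality, the probability under the null that the wealth ever reaches 1/alpha is at most alpha, which is why the data may be monitored as they come in.

```python
import random


def betting_wealth(outcomes, q=0.7, p0=0.5):
    """Wealth path from betting against H0: P(heads) = p0.

    Wealth starts at 1. Each round it is multiplied by q/p0 on heads (1)
    and by (1 - q)/(1 - p0) on tails (0). Under H0 each factor has
    expectation 1, so the bet is fair if the null hypothesis is true.
    q and p0 are illustrative choices, not values from the course.
    """
    wealth = 1.0
    path = []
    for x in outcomes:
        factor = q / p0 if x == 1 else (1 - q) / (1 - p0)
        wealth *= factor
        path.append(wealth)
    return path


def first_rejection(path, alpha=0.05):
    """Return the first round (1-based) at which wealth >= 1/alpha, else None."""
    threshold = 1 / alpha
    for i, w in enumerate(path, start=1):
        if w >= threshold:
            return i
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data from a biased coin; the null hypothesis is false here.
    biased = [1 if rng.random() < 0.7 else 0 for _ in range(200)]
    path = betting_wealth(biased)
    print("final wealth:", path[-1])
    print("first round with wealth >= 20:", first_rejection(path))
```

The stopping rule in `first_rejection` may be applied after every new observation, which is the sense in which such a test remains valid when data keep arriving.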

In addition to introducing e-value based methods (tests, confidence intervals etc.), we will review classical approaches and discuss what each of them can and cannot achieve. These include Fisherian testing, Neyman-Pearson testing and Jeffreys-Bayesian testing (all from the 1930s), sequential testing (1940s) and pure likelihood-based approaches (1960s). We will also treat approaches from the 1980s and 1990s based on data-compression ideas.

Course Objectives

  • Understand the notion of likelihood and its application in the classical statistical paradigms (frequentist, Bayesian, sequential)

  • Understand the notions of e-value, e-process, nonnegative test martingale and testing by betting, and their application in always-valid testing and estimation

  • Understand the strengths and limitations of existing statistical methods

Timetable

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

Weekly lectures.
Homework consisting of (a) math exercises and (b) a project involving a few experiments with an R package.
Bi-weekly exercise sessions in which the homework of type (a) is discussed.

Assessment method

The final grade consists of homework (40%) and a written (retake) exam (60%). To pass the course, the grade for the (retake) exam should be at least 5 and the (unrounded) weighted average of the two partial grades at least 5.5. No minimum grade is required for the homework in order to take the exam or to pass the course. The homework counts as a practical and there is no retake for it; it consists of at least 5 written assignments, of which the lowest grade is dropped, as well as a small programming assignment.
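
As a worked illustration of the pass rule above, the following Python sketch computes the unrounded weighted average and checks both conditions. The function names are hypothetical, and the example grades assume the usual 1 to 10 scale, which this page does not state explicitly. How the individual assignments combine into the homework grade is not specified here, so the sketch takes the overall homework grade as given.

```python
def final_grade(homework, exam):
    """Unrounded weighted average: 40% homework, 60% (retake) exam."""
    return 0.4 * homework + 0.6 * exam


def passes(homework, exam):
    """Pass rule as stated above: exam at least 5 and weighted average at least 5.5.

    There is no minimum for the homework grade itself.
    """
    return exam >= 5 and final_grade(homework, exam) >= 5.5


if __name__ == "__main__":
    # (homework, exam) pairs chosen for illustration only.
    for hw, ex in [(8, 5), (9, 4.9), (3, 6), (0, 10)]:
        print(hw, ex, final_grade(hw, ex), passes(hw, ex))
```

For instance, a strong homework grade cannot compensate for an exam grade below 5, while a low homework grade can be offset by a sufficiently high exam grade.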

Reading list

Parts of

  • R. Royall, Statistical Evidence: A Likelihood Paradigm (Chapman & Hall/CRC, 1999)

  • P. Grünwald, The Minimum Description Length Principle (MIT Press, 2007, freely available on the internet)

  • handouts that will be made available during the lectures

Registration

Please register for the course in MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December.

Please note that it is compulsory to register your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Not being registered for an exam means your grade will not be processed.

Contact

Send an email to the teaching assistant, Tyron Lardy: t.d.lardy@math.leidenuniv.nl
