# Foundations of Statistics and Machine Learning

Course
2024-2025

Prerequisites are basic probability theory (including laws of large numbers, central limit theorem) and statistics (maximum likelihood, least squares). Knowledge of machine learning or more advanced probability/statistics may be useful but is not essential. In particular, all stochastic process/martingale/game theory that is needed will be developed from scratch.

## Description

A large fraction (some claim more than half) of published research in top journals in applied sciences such as medicine and psychology is irreproducible. In light of this 'replicability crisis', classical statistical methods, most notably testing based on p-values, have recently come under intense scrutiny. Indeed, p-value based tests, but also other methods such as confidence intervals and Bayesian methods, were mostly developed in the 1930s, and they are poorly suited to many 21st-century applications of statistics.

In this course, we describe statistics based on the e-value, a recently developed alternative to the p-value that does fit 21st-century requirements. This is cutting-edge research: the e-value was introduced in its modern form only in 2019, and by now there are hundreds of papers about it, by top statistics groups in Berkeley, Stanford and the like. Your teacher has obtained a prestigious ERC Advanced Grant from the European Union to develop e-values further.

Most importantly, classical methods do not deal well with situations in which new data can keep coming in. For example, based on the results of existing trials, one decides to do a new study of the same medication in a new hospital; or, whenever you type in new search terms, Google can adjust the model that decides which advertisements to show you. E-values handle such situations without problems.

At its core, the e-value is based on viewing statistics as a betting game in which you "bet against a null hypothesis". If you make a lot of money in this bet, you have evidence that the hypothesis is false. This connects it to the very foundations of probability theory: part of the course will be about replacing the standard measure-theoretic foundations of probability by game-theoretic ones. The idea of 'probability as limiting frequency' is replaced by 'probability as fair price', which puts many existing results in a different light and avoids hard-to-interpret 'almost-sure' statements. The game-theoretic, "betting" treatment will serve as an organizing principle throughout the class.
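
To make the betting picture concrete, here is a minimal Python sketch of betting against the null hypothesis that a coin is fair. It is an illustration only, not the R package used in the course: the function names, the fixed betting fraction of 0.4 and the simulation sizes are all assumptions chosen for this example. You start with wealth 1 and stake a fixed fraction of your current wealth on heads in every round, at even odds. Under the null the expected wealth never grows, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, no matter when you decide to stop.

```python
import random


def betting_wealth(flips, bet_fraction=0.4):
    """Running wealth when betting on heads against H0: P(heads) = 0.5.

    Each round the wealth is multiplied by 1 + bet_fraction after heads (1)
    and by 1 - bet_fraction after tails (0). Under H0 the expected
    multiplier is exactly 1, so the wealth is a nonnegative martingale.
    """
    wealth = 1.0
    path = []
    for x in flips:
        wealth *= 1 + bet_fraction if x == 1 else 1 - bet_fraction
        path.append(wealth)
    return path


def rejects(flips, alpha=0.05, bet_fraction=0.4):
    """Reject H0 if the wealth reaches 1/alpha at any point in time."""
    return any(w >= 1 / alpha for w in betting_wealth(flips, bet_fraction))


def rejection_rate(p_heads, n_flips=300, n_runs=1000, seed=0):
    """Fraction of simulated data streams on which H0 is rejected."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_runs):
        flips = [1 if rng.random() < p_heads else 0 for _ in range(n_flips)]
        count += rejects(flips)
    return count / n_runs


# Hypothetical settings: a fair coin (H0 true) and a biased coin (H0 false).
null_rate = rejection_rate(p_heads=0.5, seed=0)
alt_rate = rejection_rate(p_heads=0.8, seed=1)
print("rejection rate, fair coin:  ", null_rate)
print("rejection rate, biased coin:", alt_rate)
```

Because the wealth is checked after every single flip, the simulation mimics data that "keeps coming in" with the freedom to stop at any moment. A fixed betting fraction is the simplest possible strategy; choosing the bets well is a separate question that this sketch does not address.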

On top of introducing e-value based methods (tests, confidence intervals etc.) we will review classical approaches and discuss what each of them can and cannot achieve. These include Fisherian testing, Neyman-Pearson testing, Jeffreys-Bayesian (all from the 1930s), sequential testing (1940s) and pure likelihood-based (1960s) approaches. We will also treat approaches from the 1980s and 1990s based on data-compression ideas.

## Course Objectives

• Understand the notion of likelihood and its application in the classical statistical paradigms (frequentist, Bayesian, sequential)

• Understand the notions of e-value, e-process, nonnegative test martingale and testing by betting, and their application in always-valid testing and estimation

• Understand the powers and limitations of existing statistical methods

## Timetable

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

## Mode of instruction

Weekly lectures.
Homework consisting of (a) math exercises and (b) a project involving a few experiments with an R package.
Bi-weekly exercise sessions in which homework of type (a) is discussed.

## Assessment method

The final grade consists of homework (40%) and a written (retake) exam (60%). To pass the course, the grade for the (retake) exam should be at least 5 and the (unrounded) weighted average of the two partial grades at least 5.5. No minimum grade is required for the homework in order to take the exam or to pass the course. The homework counts as a practical and there is no retake for it; it consists of at least 5 written assignments, of which the lowest grade is dropped, as well as a small programming assignment.
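
The pass rule above can be written out as a small calculation. This is an illustrative sketch, not an official grading tool: the function name `final_grade` is hypothetical, grades are assumed to be on a 1-10 scale, and the homework grade is taken as given, since the way the individual assignments are combined is not specified here.

```python
def final_grade(homework, exam):
    """Return (unrounded weighted average, passed) for the stated rule.

    Weights: homework 40%, written (retake) exam 60%.
    Passing requires exam >= 5 and weighted average >= 5.5.
    There is no minimum requirement on the homework grade itself.
    """
    weighted = 0.4 * homework + 0.6 * exam
    passed = exam >= 5.0 and weighted >= 5.5
    return weighted, passed


# Hypothetical students (grades assumed to be on a 1-10 scale).
print(final_grade(homework=8.0, exam=5.0))  # exam just sufficient, average well above 5.5
print(final_grade(homework=9.0, exam=4.5))  # high average, but exam below 5
print(final_grade(homework=4.0, exam=6.0))  # exam sufficient, average below 5.5
```

The second case shows why the exam threshold matters: a strong homework grade cannot compensate for an exam grade below 5.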

## Reading list

Parts of:

• R. Royall, Statistical Evidence: A Likelihood Paradigm (Chapman & Hall/CRC, 1999)

• P. Grünwald, The Minimum Description Length Principle (MIT Press, 2007, freely available on the internet)

• handouts that will be made available during the lectures

## Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

• Enrolment for the fall opens in July

• Enrolment for the spring opens in December

Note:

• It is mandatory to enrol for all activities of a course that you are going to follow.

• Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

• Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

## Contact

Send an email to the teaching assistant, Tyron Lardy: t.d.lardy@math.leidenuniv.nl

## Remarks

Software
Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.