# Foundations of Statistics and Machine Learning

Course
2020-2021

Prerequisites are basic probability theory (including laws of large numbers, central limit theorem) and statistics (maximum likelihood, least squares). Knowledge of machine learning or more advanced probability/statistics may be useful but is not essential. In particular, all stochastic process/martingale theory that is needed will be developed from scratch.

Description
A large fraction (some claim more than half) of the research published in top journals in applied sciences such as medicine and psychology is irreproducible. In light of this 'replicability crisis', classical statistical methods, most notably testing based on p-values, have recently come under intense scrutiny. Indeed, p-value based tests, but also other methods such as confidence intervals and Bayesian methods, were mostly developed in the 1930s, and they are poorly suited to many 21st-century applications of statistics. Most importantly, they do not deal well with situations in which new data can keep coming in. For example, based on the results of existing trials, one decides to do a new study of the same medication in a new hospital; or, whenever you type in new search terms, Google can adjust the model that decides which advertisements to show you.
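To make the problem with incoming data concrete, the following is a minimal simulation sketch, not course material. It assumes a two-sided z-test for a normal mean with known unit variance, data generated under the null hypothesis, and an analyst who recomputes the p-value after every new observation and stops at the first p < 0.05. The function names and the parameters `max_n`, `runs` and `seed` are illustrative choices.

```python
import math
import random


def z_test_p_value(total, n):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = abs(total) / math.sqrt(n)
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def rejects_with_peeking(rng, max_n=500, alpha=0.05):
    """Re-test after every observation; report whether any look gives p < alpha."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)  # data are generated under H0
        if z_test_p_value(total, n) < alpha:
            return True
    return False


def false_rejection_rate(runs=2000, seed=0):
    """Fraction of simulated studies that (wrongly) reject H0 at some look."""
    rng = random.Random(seed)
    return sum(rejects_with_peeking(rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print("false rejection rate with peeking:", false_rejection_rate())
```

Although every individual look uses a valid level-0.05 test, the fraction of simulated studies that reject at some point is far larger than 0.05, because the analyst gets many chances to stop.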

In this class we first review the classical approaches to statistical testing, estimation and uncertainty quantification (confidence) and discuss what each of them can and cannot achieve. For testing, these include the Fisherian, Neyman-Pearson and Jeffreys-Bayesian approaches (all from the 1930s), sequential testing (1940s) and pure likelihood-based approaches (1960s). For confidence, they include classical (Neyman-Pearson) confidence intervals, Fisher's fiducial distributions and Bayesian posteriors. For each of these we treat the underlying mathematical results (such as complete class theorems and the 'law of likelihood') and give examples of common settings in which they are misused. All these approaches, while quite different and achieving different goals, run into difficulties in the modern age, in which "optional continuation" is the rule rather than the exception. We will also treat approaches from the 1980s and 1990s based on data-compression ideas.

We will then treat the one approach that seems more suitable for the modern context: the always-valid confidence sets of Robbins, Darling and Lai (late 1960s), which have their roots in sequential testing (Wald, 1940s). The always-valid approach has recently been reinvigorated and extended. The mathematics behind it involves martingale-based techniques such as Doob's optional stopping theorem, advanced concentration inequalities such as a finite-time law of the iterated logarithm, and information-theoretic concepts such as the relative entropy.
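As an illustration of the always-valid idea, here is a minimal sketch under assumptions that are not taken from the course: coin flips, a null hypothesis of a fair coin, and a single fixed alternative with success probability 0.6. The running likelihood ratio is a nonnegative martingale under the null with starting value 1, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, no matter when or why one stops. Under the alternative, its expected logarithmic growth per observation equals the relative entropy between the alternative and the null. All names and parameter values below are hypothetical.

```python
import random


def ever_rejects(rng, p_true, p_null=0.5, p_alt=0.6, alpha=0.05, max_n=500):
    """Track the likelihood ratio of p_alt against p_null on coin flips.

    Returns True if the ratio reaches 1 / alpha at any time up to max_n.
    """
    ratio = 1.0
    for _ in range(max_n):
        heads = rng.random() < p_true
        if heads:
            ratio *= p_alt / p_null
        else:
            ratio *= (1.0 - p_alt) / (1.0 - p_null)
        if ratio >= 1.0 / alpha:
            return True  # reject H0; valid at any data-dependent stopping time
    return False


def rejection_rate(p_true, runs=2000, seed=0):
    """Fraction of simulated sequences on which H0 is rejected at some time."""
    rng = random.Random(seed)
    return sum(ever_rejects(rng, p_true) for _ in range(runs)) / runs


if __name__ == "__main__":
    print("rejection rate when H0 is true:", rejection_rate(p_true=0.5))
    print("rejection rate when the alternative is true:", rejection_rate(p_true=0.6))
```

In contrast to monitoring a p-value after every observation, the rejection rate under the null stays below alpha here even though the ratio is checked after every flip, while under the alternative most sequences reject well before `max_n`.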

The central organizing principle in our treatment is the concept of likelihood and its generalization, nonnegative supermartingales. We will also discuss the close connections between the always-valid statistical methods and modern machine learning methods based on 'multi-armed bandits', which are used by e.g. Google to decide which ads to show you.

Course Objectives

• Understand the notion of likelihood and its application in the classical statistical paradigms (frequentist, Bayesian, sequential)

• Understand the notion of nonnegative test martingale and its application in always-valid testing and estimation

• Understand the strengths and limitations of existing statistical methods

Mode of instruction
Weekly lectures. Bi-weekly exercise sessions in which homework of type (a) is discussed.
Homework consisting of (a) math exercises and (b) a project involving doing a few experiments with an R package.

Assessment method
homework (a): math exercises 20%
homework (b): analyzing data via an R-package 20%
written open-book exam 60%
To pass the course, both the total grade and the grade for the open-book exam must be > 5.5.

Literature
Parts of

• R. Royall, Statistical Evidence: A Likelihood Paradigm (Chapman & Hall/CRC, 1999)

• P. Grünwald, The Minimum Description Length Principle (MIT Press, 2007, freely available on the internet)

• handouts that will be made available during the lectures

Website
Both Brightspace and a course website will be used. The course website is available via www.safestatistics.com.

Contact
By email to the teacher: pdg[at]cwi.nl