# High-dimensional data analysis

Course
2024-2025

Basic knowledge of statistics and probability, linear algebra (e.g., matrices, eigenvalues and eigenvectors, singular value decomposition), generalized linear models (linear regression, logistic regression) and Bayesian methods is required.

## Description

Modern-day high-throughput techniques characterize many traits (easily thousands) of an individual simultaneously. Often the resulting data are available for a comparatively small number of individuals. This imbalance between the number of covariates and the sample size is typical of high-dimensional data. Such data arise in genomics, where genetic information is measured for many thousands of genes simultaneously, but also in economics and psychometrics. Analysis of high-dimensional data requires adjustments to well-known statistical methods and the introduction of several novel concepts.

The course teaches students the adjustments to classical statistical methodology necessary to analyse high-dimensional data. This encompasses estimation methods, testing procedures, and shrinkage. More specifically, the course covers a) model-based inference for Gaussian and count data (classical and Bayesian methods); b) multiple testing (family-wise error rate and false discovery rate control); c) penalized regression (lasso and ridge); and d) shrinkage. Several types of high-dimensional data will be discussed and used during the course.
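As a rough illustration of why classical methods need adjusting, the sketch below simulates data with many more covariates than individuals and fits a ridge regression. It is a minimal example, not course material: the dimensions, the simulated data, the penalty values, and the helper name `ridge` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 20 individuals, 200 covariates (p >> n).
n, p = 20, 200
X = rng.standard_normal((n, p))

# Simulated truth: only the first five covariates have an effect.
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# X has rank at most n, and X^T X (p x p) has the same rank, so it is
# singular: the least-squares normal equations have no unique solution.
rank = np.linalg.matrix_rank(X)


def ridge(X, y, lam):
    """Ridge estimate: solve (X^T X + lam * I) beta = X^T y, with lam > 0."""
    n_cov = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_cov), X.T @ y)


# A positive penalty makes the system solvable; a larger penalty
# shrinks the coefficient vector more strongly towards zero.
beta_small = ridge(X, y, lam=1.0)
beta_large = ridge(X, y, lam=100.0)

print("rank of X:", rank, "with p =", p)
print("norm of estimate, lam = 1:  ", np.linalg.norm(beta_small))
print("norm of estimate, lam = 100:", np.linalg.norm(beta_large))
```

The penalty term is what makes the estimate well defined when the sample size is smaller than the number of covariates; how to choose its value is a separate question that this sketch does not address.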

## Course objectives

At the end of the course, the student
1) is familiar with the pros and cons of the novel/adjusted statistical techniques.
2) can apply the novel/adjusted techniques to data and calculate, e.g., a prediction.
3) can reflect on the suitability of, e.g., a multiple testing procedure for the situation at hand.
4) can motivate why certain methods, e.g., shrinkage, are beneficial in high-dimensional settings.
5) can discuss the limitations of the conclusions drawn from the results generated by, e.g., penalized regression.

## Timetable

## Mode of Instruction

The course consists of a series of lectures and practicals (partly computer practicals, partly exercises).

## Assessment method

Hand-in bonus assignments + written exam

Literature will be specified during the course; no books are required.

## Contact

mark.vdwiel@vumc.nl and w.n.van.wieringen@vu.nl

## Remarks

Software
