High-dimensional data analysis

This course gives an overview of statistical methods for analyzing high-dimensional data sets, in which many variables (often thousands) have been measured for a limited number of subjects. This type of data arises in genomics, where genetic information is measured for many thousands of genes simultaneously, but also in functional MRI of the brain. The course covers the most important statistical issues in this field, which include: a) initial processing of the data; b) model-based differential expression analysis for Gaussian and count data (classical and Bayesian methods); c) multiple testing (family-wise error rate and false discovery rate control); d) penalized regression (lasso and ridge); e) shrinkage; and f) graphical models for constructing networks. Several specific types of high-dimensional data will be discussed and used during the course. Philosophy: to teach students the adjustments to classical statistical methodology that are necessary to tackle high-dimensional data.

Students should be able to perform and understand the most common types of analysis on several types of high-dimensional data, and be familiar with the specific issues that arise in important types of high-dimensional data sets.

The course consists of a series of lectures and practicals (partly computer practicals, partly exercises).

