High-dimensional data analysis, 2010-2011 - Prospectus

This course gives an overview of statistical methods for analyzing high-dimensional data sets, in which many variables (often thousands) have been measured for a limited number of subjects. This type of data arises in genomics, where genetic information is measured for many thousands of genes simultaneously, but also in functional MRI (fMRI) of the brain. The first part of the course covers the most important statistical issues in this field: multiple testing (empirical Bayes methods, familywise error rate (FWER) and false discovery rate (FDR) control), gene set testing, and prediction methods in high dimensions (penalized regression methods, principal components regression, cross-validation). The second part explores issues specific to particular types of high-dimensional data (gene expression studies, DNA copy number analysis, proteomics and fMRI).

Philosophy: To teach students the adjustments to classical statistical methodology that are necessary to tackle high-dimensional data.

Goals: Students should be able to perform and understand the most common types of analysis: limma, FDR and FWER control methods, clustering, the global test, ridge regression, the lasso, and principal components regression with cross-validation. They should also be familiar with the specific issues in important types of high-dimensional data sets.

Examination
Grades will be based on a written exam and a practical assignment.