Studiegids

High-dimensional data analysis

Course
2023-2024

Admission requirements

Basic knowledge of statistics and probability, linear algebra (e.g., matrices, eigenvalues and eigenvectors, singular value decomposition), generalized linear models (linear regression, logistic regression) and Bayesian methods is required.

Description

Modern-day high-throughput techniques characterize many traits (easily thousands) of an individual simultaneously. Often the resulting data are available for a comparatively small number of individuals. This imbalance between the number of covariates and the sample size is typical of high-dimensional data. Such data arise in genomics, where genetic information is measured for many thousands of genes simultaneously, but also in economics and psychometrics. Analysis of high-dimensional data requires adjustments to well-known statistical methods, and the introduction of several novel concepts.

The course teaches students the adjustments to classical statistical methodology necessary to analyse high-dimensional data. This encompasses estimation methods, testing procedures, and shrinkage. More specifically, it covers a) model-based inference for Gaussian and count data (classical and Bayesian methods); b) multiple testing (family-wise error rate and false discovery rate control); c) penalized regression (lasso and ridge); and d) shrinkage. Several types of high-dimensional data will be discussed and used during the course.
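As a rough illustration of why such adjustments are needed, the sketch below simulates data with far more covariates than samples and fits a ridge regression. It is not course material: the data are simulated, the function name ridge_fit, the penalty value and the dimensions are all assumptions chosen for the example, and only NumPy is used.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) beta = X'y for beta."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)      # fixed seed; simulated data only
n, p = 20, 200                      # many more covariates than samples
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                 # hypothetical: only five covariates matter
y = X @ beta_true + rng.standard_normal(n)

# X'X is p x p but has rank at most n, so it is singular when p > n
# and the ordinary least-squares estimate is not unique.
gram_rank = np.linalg.matrix_rank(X.T @ X)

# Adding lam * I makes the system solvable; a larger lam shrinks
# the coefficients more strongly towards zero.
beta_ridge = ridge_fit(X, y, lam=1.0)
beta_ridge_strong = ridge_fit(X, y, lam=10.0)

print("rank of X'X:", gram_rank, "out of", p)
print("norm at lam=1:", np.linalg.norm(beta_ridge))
print("norm at lam=10:", np.linalg.norm(beta_ridge_strong))
```

The penalty is what makes the estimate well defined here; how to choose it, and how the lasso differs, is left open in this sketch.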

Course objectives

At the end of the course, the student
1) is familiar with the pros and cons of the novel/adjusted statistical techniques.
2) can apply the novel/adjusted techniques to data and calculate, e.g., a prediction.
3) can reflect on the suitability of, e.g., the multiple testing procedure, to the situation at hand.
4) can motivate why certain methods, e.g. shrinkage, are beneficial in high-dimensional settings.
5) can discuss the limitations of the conclusions drawn from the results generated by, e.g., penalized regression.

Timetable

See the Leiden University students' website for the Statistical Science programme -> Schedules

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of Instruction

The course consists of a series of lectures and practicals (partly computer practicals, partly exercises).

Assessment method

Hand-in bonus assignments + written exam

Reading list

Literature will be specified during the course; no books are required.

Registration

It is the responsibility of every student to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

mark.vdwiel@vumc.nl and w.n.van.wieringen@vu.nl

Remarks