Statistical Learning and Prediction, 2016-2017 - Prospectus

Entry requirements

Open only to Master’s and Research Master’s students in Psychology.

In this course, we will work exclusively with the R software. Students who have not attended the “Introduction in R and Statistical Computing” course should spend a few hours before the course starts: 1) studying and working through the examples and exercises in the first chapter of Beaujean’s book; 2) going through the examples in the introduction to R at http://data.princeton.edu/R/default.html.

Description

Statistical learning refers to a vast set of tools for understanding data. Two classes of such tools can be distinguished: “supervised” and “unsupervised”. Supervised statistical learning involves building a statistical model for predicting an output (response, dependent) variable from one or more input (predictor) variables. Such predictive questions arise in many areas of psychology, for example finding early markers for Alzheimer’s disease or other diseases, selecting personnel or students, and predicting treatment outcomes. In unsupervised statistical learning, there are only input variables and no supervising output (dependent) variable; nevertheless, we can learn relationships and structure from such data using cluster analysis and dimension reduction methods. In this course we aim to give students a firm theoretical basis for understanding and evaluating statistical learning techniques, and to teach them the skills needed to apply these techniques in empirical research.

Course objectives

Upon completion of this course, students will:

Know the difference between explanation and prediction, and understand the bias-variance trade-off and the concept of “learners”.
Have a good understanding of several important classes of learning techniques and be able to apply them to data in R: linear regression and classification methods, nonlinear models (splines, GAM), tree-based and ensemble methods (regression/classification trees, bagging, random forest, boosting), support vector machines, and unsupervised learning methods (dimension reduction and clustering).
Know how to evaluate the performance of a statistical learning method using resampling methods (validation set approach, cross-validation, bootstrap) and be able to apply these methods to empirical data in R.

Timetable

For the timetables of your lectures, work groups and exams, please select your study programme in:
Psychology timetables

Lectures

Registration

Course

Students need to enroll for lectures and work group sessions.
Master’s course registration

Mode of instruction

The course consists of 7 lectures (4 hours each) that alternate between theory and practice, plus an additional question-and-answer session (2 hours). In a final session (7 hours), every student gives an oral presentation and the students ask each other questions about their presentations. For each lecture, students receive a list of online videos (and the associated parts of the book by James et al., 2013, see below) that should be studied at home beforehand (about 3 hours of work per lecture).

Assessment method

The final grade is based on (each with a weight of 1/3):

a written structured assignment (individual, halfway through the course); estimated preparation time: 6 hours
a written structured assignment (individual, at the end of the course); estimated preparation time: 6 hours
an oral presentation on the analysis of a data set of the student’s own choice (individual, at the end of the course); estimated preparation time: 10 hours

Students receive feedback on the assignments and the oral presentation.

The Faculty of Social and Behavioural Sciences requires instructors to use software for the systematic detection of plagiarism in students’ written work. In case of fraud, disciplinary action will be taken. Please see the information concerning fraud.

Reading list

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. New York: Springer. A free copy of the book and accompanying tutorials are available online.

Additional suggested material (not required):

Berk, R. A. (2008). Statistical learning from a regression perspective. New York: Springer. (A PDF is available via Leiden University Library.)
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. New York: Springer. (A PDF is available via Leiden University Library.)

Contact information

Dr. Tom F. Wilderjans
<t.f.wilderjans@fsw.leidenuniv.nl>