Prospectus


Statistical Learning and Prediction

Course
2024-2025

Entry requirements

Only open to Master’s students in Psychology with the specialisation Methodology and Statistics in Psychology, and to Research Master’s students in Psychology.

This course is offered twice a year.

In this course, we will work with the R software. Students who have not taken the “Introduction to R” or “Computational Statistics with R” course should spend a few hours on the following before the course starts: 1) studying the first chapter of Beaujean’s book (see below) and working through its examples and exercises; and 2) going through the examples in the Introduction to R, or working through the RStudio Primers on this website.

Description

Statistical learning refers to a vast set of tools for understanding data. Two classes of such tools can be distinguished: “supervised” and “unsupervised”. Supervised statistical learning involves building a statistical model for predicting an output (response, dependent) variable based on one or more input (predictor) variables. There are many areas of psychology where such a predictive question is of interest. Examples include finding early markers for Alzheimer’s disease or other diseases, selection studies in personnel or educational settings, and predicting treatment outcomes. In unsupervised statistical learning, there are only input variables and no supervising output (dependent) variable. Nevertheless, relationships and structures can be learned from such data using cluster analysis and dimension-reduction methods. In this course, we aim to give students a firm theoretical basis for understanding and evaluating statistical learning techniques, and to teach them the skills needed to apply these techniques in empirical research. The course (and its assessment) is fully in English.

Course objectives

Upon completion of this course, students can:

  1. Explain key concepts from statistical learning (e.g., the bias-variance trade-off, the difference between explanation and prediction, learners);
  2. Identify several important classes of learning techniques (e.g., regression and classification methods, nonlinear models, ensemble methods, support vector machines and unsupervised learning methods);
  3. Indicate how to evaluate the performance of a statistical learning method and how to choose an appropriate model using resampling methods (e.g., the validation-set approach, cross-validation, the bootstrap);
  4. Apply important classes of learning techniques and resampling methods with R to empirical data;
  5. Apply several data manipulation steps in R to empirical data;
  6. Select a suitable analysis method for given data (also including methods not discussed in the course), apply it and interpret the results correctly;
  7. Clearly report on their analytic choices, pipelines and results to colleagues in the field and to non-experts in a company or organisation.

Timetable

For the timetable of this course, please refer to MyTimetable.

Registration

Education

Students must register themselves for all course components (lectures, tutorials and practicals) they wish to follow. You can register up to 5 days prior to the start of the course.

Mode of instruction

The course consists of 14 lectures (2 hours each) that alternate between theory and practice (exercises in R). In a final 7-hour session, all students give an oral presentation and ask each other questions about their presentations.

For each pair of lectures, students receive a list of online videos and the associated parts of the book by James et al. (2013; see below). These should be studied at home beforehand (about 3 hours of work per pair of lectures).

The course is fully in English (lectures, assignments, presentation).

Assessment method

The final grade is based on:

1) a written structured assignment (individual, halfway through the course, in English only); estimated preparation time: 6 hours
2) a written structured assignment (individual, at the end of the course, in English only); estimated preparation time: 6 hours
3) an oral presentation on the analysis of a data set of the students’ own choice (in groups, at the end of the course, in English only); estimated preparation time: 10 hours

Students receive feedback on the assignments and the oral presentation during the lectures.

The Institute of Psychology follows the policy of the Faculty of Social and Behavioural Sciences to systematically check student papers for plagiarism with the help of software. All students are required to take and pass the Scientific Integrity Test with a score of 100% in order to learn about the practice of integrity in scientific writing. Students are given access to the quiz via a module on Brightspace. Disciplinary measures will be taken when fraud is detected. Students are expected to be familiar with and understand the implications of this fraud policy.

Reading list

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. New York: Springer. A free copy and tutorials are available online.

Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. New York: Routledge.

Additional suggested material (not required):

  • Berk, R. A. (2008). Statistical learning from a regression perspective. Springer. (a PDF is available via Leiden University Library)

  • Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer. (a PDF is available via Leiden University Library)

Contact information

Dr. Tom F. Wilderjans t.f.wilderjans@fsw.leidenuniv.nl