Admission requirements
- Familiarity with least squares linear regression
- Ability to program in R (preferred) or in Python
- Basic knowledge of university-level probability theory, calculus, and linear algebra
Description
Statistical learning refers to a vast set of tools for understanding data. These techniques are used in a wide range of industries and research fields, for example to recommend products and movies, predict disease status, detect fraudulent bank transactions, and identify genes associated with specific diseases. This course provides a basis for understanding statistical learning techniques and teaches the skills to apply and evaluate them.
The course will cover both supervised and unsupervised learning methods:
Supervised statistical learning involves building a model for predicting an outcome (response, dependent) variable based on one or more input (predictor) variables. The supervised learning methods discussed will include classical and state-of-the-art classification methods: regularized regression (Ridge, Lasso), naive Bayes, linear and quadratic discriminant analysis, decision trees, support vector machines, generalized additive models, random forests and gradient boosting. We explain the interrelations between these methods and analyze their behaviour. We will also discuss model selection, where we consider both classical and state-of-the-art methods, including cross-validation.
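As an illustration of regularized regression combined with cross-validation, here is a minimal sketch in plain Python. It is not taken from the course material. The simulated data, the penalty grid, and the function names `ridge_slope` and `cv_mse` are assumptions made for this example, and the model is deliberately reduced to one predictor without an intercept so that the ridge estimate has a simple closed form.

```python
import random


def ridge_slope(xs, ys, lam):
    """Ridge estimate of the slope for one predictor, no intercept.

    Minimises sum((y - b*x)^2) + lam * b^2, which gives b = Sxy / (Sxx + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean squared prediction error of ridge, estimated by k-fold CV."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        b = ridge_slope(train_x, train_y, lam)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in test_idx)
    return total / n


# Simulated data (hypothetical): y = 2x + noise
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(50)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

# Candidate penalties (an arbitrary grid chosen for illustration)
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)

print("CV error per penalty:", scores)
print("selected penalty:", best_lam)
```

The penalty is chosen on held-out observations only, so the selected value reflects predictive performance rather than fit to the training data. In practice one would typically use an established package in R or Python rather than hand-written code like this.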
In unsupervised statistical learning, there are only input variables and no supervising outcome (dependent) variable; nevertheless, we can learn relationships and structures from such data. We will consider methods for clustering (e.g., the classic k-means and hierarchical clustering) and dimension reduction methods (such as principal component analysis, PCA).
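To make the idea of clustering without an outcome variable concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from the course material. The toy points, the choice of starting centres, and the helper names `dist2`, `nearest`, and `kmeans` are assumptions made for this example.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the centre closest to point p."""
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))


def kmeans(points, centers, n_iter=10):
    """Alternate between assigning points and updating centres."""
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: each centre moves to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy data (hypothetical): two well-separated groups of three points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, centers=[points[0], points[3]])

print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

No labels are supplied to the algorithm; the grouping is recovered from the input variables alone. Results of k-means generally depend on the starting centres, which are fixed here to keep the example deterministic.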
Course Objectives
After the course, the student can:
- Explain the key concepts and techniques of supervised and unsupervised learning methods.
- Reason about the relative strengths and weaknesses of different statistical learning methods and their resulting suitability for real-world data problems.
- Select appropriate models and performance metrics for a given statistical learning task.
- Create an experiment to select the optimal model parameters.
- Apply the chosen model to the dataset and evaluate its performance using appropriate metrics.
- Evaluate the importance of features and their relationships to the outcome by interpreting the model.
Timetable
In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.
Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.
Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.
Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.
Mode of Instruction
Lectures and computer practicals. We will use Brightspace to share all course material.
Assessment method
The final grade is based on three components, each with a weight of 1/3:
1. a written structured assignment (individual, halfway through the course)
2. a written structured assignment (individual, at the end of the course)
3. an oral presentation on the analysis of a data set of the students’ own choice (in groups, at the end of the course)
Students receive feedback on the assignments and the oral presentation during the lecture.
Reading list
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: with applications in R. New York: Springer. A free copy and tutorials are available online.
- Beaujean, A. A. (2014). Latent variable modeling using R. A step by step guide. New York: Routledge.
Additional resources:
1. Berk, R. A. (2008). Statistical learning from a regression perspective. Springer. (a PDF is available via Leiden University Library)
2. Kuhn, M. & Johnson, K. (2013). Applied predictive modelling. Springer. (a PDF is available via Leiden University Library)
3. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd edition). (available for free at https://web.stanford.edu/~hastie/Papers/ESLII.pdf)
4. Bishop, C. M. (2006). Pattern recognition and machine learning (1st edition). Springer.
5. Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
Registration
As a student, you are responsible for enrolling on time through MyStudyMap.
In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.
There are two enrolment periods per year:
- Enrolment for the fall opens in July
- Enrolment for the spring opens in December
See this page for more information about deadlines and enrolling for courses and exams.
Note:
- It is mandatory to enrol for all activities of a course that you are going to follow.
- Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.
- Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.
Contact
Anikó Lovik: a.lovik@fsw.leidenuniv.nl
Remarks
Software
Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.