Applied Data Science and Explainable AI, 2026-2027 - Studiegids

Admission requirements

No formal requirements.

The course builds on concepts from the bachelor courses Statistics (Ba2) and Machine Learning (Ba3). Students are expected to program in Python.

Description

Applied Data Science and Explainable AI places data mining, machine learning, (explainable) AI and statistics in context, both experimentally and socially. If you want to deploy artificial intelligence techniques correctly, you must be able to translate a (broadly formulated) question from a customer or a co-worker into an experimental set-up, to make the right choices for the methods you use, and to process the data into the right form to apply those methods. After performing your experiments, you should be able not only to evaluate the results but also to interpret them and translate them back to the original question (e.g. by visualization and explainable AI). Socially, data science and AI are of great importance because the media simplify many data-driven results and statistical research, often making mistakes. Thus, a lot of misinformation reaches us and it is up to you, the data scientists of the future, to recognize, explain and correct that misinformation. This course is a combination of lectures and practical sessions, in which you take a hands-on approach to solving real-world data science problems.

Course objectives

A. Knowledge

You can explain basic machine learning concepts: supervised learning, unsupervised learning, classification, regression.
You know and can explain the following experimental and statistical principles: bias, overfitting, cross validation, high-dimensional data, sparseness, dimensionality reduction, feature extraction, class imbalance.
You can explain the purpose and principles of feature extraction from semi-structured data, text data, image data and sensor data.
You can explain the difference between engineered features and raw features, in content and in dimensionality.
You can explain different types of missing data and how to handle them.
You can explain the use and importance of measuring the quality and reliability of human-labeled data.
You can explain the most important evaluation measures: Accuracy, Mean Squared Error, Precision, Recall, F1 and Mean Average Precision.
You can explain the benefits and challenges of big data.
You can explain the principles of responsible data science.
You can explain and use the four different types of Explainable AI (XAI): local, global, model-agnostic and model-intrinsic.
You can explain different feature attribution methods such as SHAP and LIME.
You can explain the principles of Mechanistic Interpretability.
You understand the importance of transparency in AI, particularly in business and scientific contexts, and how XAI can improve trust and decision-making.
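
To illustrate why the knowledge objectives list class imbalance next to several evaluation measures, here is a minimal sketch in plain Python that computes Accuracy, Precision, Recall and F1 from scratch. The function names and the toy labels are hypothetical and are not taken from the course materials.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention chosen for this sketch: an undefined ratio counts as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced toy data: 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
prec, rec, f1 = precision_recall_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

On this toy data the accuracy is high even though half of the positive samples are missed, which is the kind of effect that Precision, Recall and F1 make visible.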

B. Skills

You can recognize, explain and correct statistical misinformation in the media and misleading visualizations.
You can apply different explainable AI methods to assess AI model performance for classification problems.
You can apply different types of XAI methods to derive different types of explanations, such as counterfactual examples, saliency maps, global feature rankings, local feature rankings and others.
You can apply the most important evaluation measures: Accuracy, Mean Squared Error, Precision, Recall, F1 and Mean Average Precision.
You can apply imputation methods to handle missing values.
You can perform preprocessing steps such as feature extraction and dimensionality reduction on different data types.
You can communicate insights from data analyses and machine learning models to both technical and non-technical audiences, with a focus on explainability.
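
As one possible illustration of the imputation skill listed above, the sketch below applies mean imputation to a single numeric column in plain Python. The helper `impute_mean` and the `ages` column are hypothetical. Mean imputation is only one simple option among several, and it reduces the variance of the column.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing values, marked as None.
ages = [20, None, 30, 40, None]
imputed = impute_mean(ages)
print(imputed)
```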

After completing the course, you can independently take the steps to set up and execute an experiment within data science, given a (broadly formulated) question:

Task definition: You can create a clear definition of a task based on a general description of that task, consisting of (a) the research question, (b) whether the task is supervised or unsupervised, (c) whether it is a classification, regression or ranking task (or something else), (d) what the data are and (e) what the labels are.
Data collection: If the data required to answer the question are not given, you can define what data you need and how to collect them. If you need explicit labels, you can set up a data annotation task for human raters.
Data exploration: You can collect and visualize statistics about the data. You can calculate and interpret the inter-annotator agreement for annotated data.
Pre-processing and feature extraction: You can write a Python script to read and process the data, extract features and store the feature vectors. You know how to engineer a low-dimensional feature set.
Model learning: You can apply unsupervised and supervised models to your data. You know how to make an informed decision on the type of classifier given the feature set. You can generate output for unseen data.
Evaluation: You can correctly set up your model evaluation with a train / test split and cross validation if necessary. You can evaluate your output against human data. You know which evaluation measures you should use given the type of data and model. You can perform significance testing. You can do a sensible error analysis and feature analysis.
Explanation: You can explain the machine learning models both globally and for several specific samples (locally) using XAI methods.
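
The data exploration step above mentions inter-annotator agreement. One common measure for two raters is Cohen's kappa. The source does not say which agreement measure the course uses, so the choice of kappa here is an assumption, and the function name and rater labels are hypothetical. A minimal sketch:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each rater's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of ten items by two raters.
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "neg"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa={kappa:.3f}")
```

In this example the raters agree on 8 of 10 items, but kappa is noticeably lower than 0.8 because part of that agreement is expected by chance.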

Schedule

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Teaching method

14 lectures, 2x45 minutes
- 1st 45 minutes: lecture
- 2nd 45 minutes: practical session (working on data science problems in Python)

Assessment method

The assessment of the course consists of a (mostly multiple choice) exam (60% of course grade) and a practical part (40% of course grade). The practical part is subdivided into three assignments. The weights for assignments 1 to 3 are 10%, 10% and 20% of the total grade, respectively. The grade for the written exam should be 5.5 or higher in order to complete the course. The weighted average grade for the practical assignments should also be 5.5 or higher in order to complete the course. If one of the tasks is not submitted, the grade for that task is 0.
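
The weighting above can be worked through with a small sketch. It assumes that the weighted average of the practical part uses the same 10:10:20 proportions as the course grade, and it applies no rounding, since the text does not specify a rounding rule. The function name `course_result` and the example grades are hypothetical.

```python
def course_result(exam, assignments):
    """Return (final_grade, practical_average, passed).

    assignments: grades for assignments 1 to 3, with None meaning
    "not submitted", which counts as 0.
    """
    exam_weight = 60
    assignment_weights = (10, 10, 20)  # percentages of the course grade
    grades = [0.0 if g is None else g for g in assignments]
    practical_points = sum(w * g for w, g in zip(assignment_weights, grades))
    # Assumption: the practical average is weighted 10:10:20.
    practical_average = practical_points / sum(assignment_weights)
    final_grade = (exam_weight * exam + practical_points) / 100
    passed = exam >= 5.5 and practical_average >= 5.5
    return final_grade, practical_average, passed


# Hypothetical student: exam 6.0, assignment 2 not submitted.
final_grade, practical_average, passed = course_result(6.0, (8.0, None, 8.0))
print(final_grade, practical_average, passed)
```

Both thresholds are checked separately, so a high practical average cannot compensate for an exam grade below 5.5, and vice versa.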

The teacher will inform the students how the inspection and follow-up discussion of the exams will take place.

Resit, review & feedback

Each assignment has a normal deadline and a resit deadline. When an assignment is submitted as a resit, a maximum grade of 6.0 can be obtained for that assignment.

Reading list

You are expected to read 3 research papers during the course and a few chapters from an open-source book about XAI. The papers and chapters are announced during the lectures and will be published on Brightspace. The other materials are the course slides and the practical session instructions.
