Admission Requirements
This course is part of the minor "Data Science & Artificial Intelligence" and as such has the same admission requirements.
Students are also expected to follow the course "Basic Programming for AI" from the same minor concurrently, or to otherwise reach the level of Python proficiency taught in that course at the same pace.
Description
Data is everywhere, available in large quantities and for many fields: collected from businesses, healthcare, social media, government, scientific research, and more. Even more data is continuously produced by users and processes. However, this raw data can very rarely be used directly in (AI) models. Most available data contains errors and missing information, and different data sources present the same conceptual information in different shapes and forms. Additionally, different models and applications have their own expectations about how data should be fed to them: should it be normalised? Discretised and/or binned? Should it be binary or continuous? These terms and limitations need to be understood in order to use data and models effectively.
In this course, students will learn how real-life data can be cleaned, transformed, structured, and generally "massaged" into a coherent shape that allows analysis and integration in automated models, both in principle and in practice (e.g., via Python scripting).
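As an illustration of the kind of preparation described above, the following minimal Python sketch drops missing values, normalises a numerical feature to the range [0, 1], and discretises it into equal-width bins. It is not taken from the course material: the function names, the example values, and the choice of min-max normalisation and equal-width binning are assumptions made for this example only.

```python
def drop_missing(values):
    """Remove missing entries (represented here as None)."""
    return [v for v in values if v is not None]


def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just outside the last bin, so cap the index.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical raw feature with one missing value.
raw_ages = [20, None, 30, 40, 60]

ages = drop_missing(raw_ages)
normalised = min_max_normalise(ages)
binned = equal_width_bins(ages, n_bins=4)

print(ages)
print(normalised)
print(binned)
```

Dropping missing values is only one possible strategy; depending on the data and the model, filling them in may be more appropriate.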
Feeding data to models is, however, not the only task of a data scientist! Science is based on communication, and as such scientists need to be able to visualise their data and the results of their models. Clear visualisations allow humans to easily make sense of large amounts of data, and to present their observations and convince others of their validity.
In this course, students will learn the principles behind effective, clear and fair visualisation of data, and how to use visualisation techniques to craft a compelling and informative narrative. Students will be able to put these principles into practice with data from their own fields of interest.
Course Objectives
At the end of the course, students are able to:
Recognise the different types of data available to a data scientist, from classic tabular data to time series, spatial data, and beyond, as well as the different types of features present in the data: categorical, ordinal, and numerical (continuous).
Use the visualisation tools and techniques that are appropriate for each type of data and feature.
Choose which features of a dataset should be included in a model and/or visualisation, and which ones should be left out to avoid unnecessary clutter and complexity.
Construct a clean and prepared dataset from raw data, such that it can be successfully fed to (AI) models, both manually and in an automated Python pipeline.
Understand basic techniques for feature selection and dimensionality reduction.
Make use of visualisations to present the content and key characteristics of large datasets, and to convey a convincing narrative to an audience in a scientific context.
Mode of Instruction
Lectures and workgroups.
Students can work either alone or in pairs during workgroups.
Students are expected to bring their own computers to the workgroups.
Assessment Method
The final grade is composed of the following parts (all graded on a 0 to 10 scale):
60% Written exam. A minimum grade of 5.5 in this component is required to pass the course.
30% Group project. Late submissions are penalised by 1 point (on the 10-point scale) for each started 24-hour period of delay, starting from 5 minutes after the deadline (e.g., submitting 10 minutes late: -1 point; submitting 25 hours late: -2 points). Deadline extensions are only possible in special circumstances: they are granted by the lecturer on an individual basis, and only if the student(s) contact the lecturer about this at least 2 days prior to the deadline.
10% Completion of the workgroup (lab session) exercises. This will be evaluated during the workgroups themselves.
The final grade is the weighted average of the aforementioned components, rounded to the nearest 0.5.
In case the student does not pass the course, they can participate in a retake exam. The grade of the retake exam will be their final grade for the course.
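The grading rules above (excluding the retake) can be sketched in Python as follows. This is an unofficial illustration, not the lecturer's grading procedure. The function names are hypothetical, and several details the text leaves open are assumptions: penalty periods are counted from the deadline itself, a delay of exactly 24 hours counts as one period, the penalised project grade cannot drop below 0, and ties in rounding (e.g., 7.25) are rounded up.

```python
import math
from datetime import timedelta

GRACE_PERIOD = timedelta(minutes=5)   # delays shorter than this are not penalised
PENALTY_PERIOD = timedelta(hours=24)  # one point per started period of this length


def late_penalty(delay):
    """Points deducted from the project grade for a given submission delay."""
    if delay < GRACE_PERIOD:
        return 0
    # Assumption: periods are counted from the deadline, and every started
    # 24-hour period costs one point (10 minutes -> 1, 25 hours -> 2).
    return math.ceil(delay / PENALTY_PERIOD)


def round_to_half(grade):
    """Round to the nearest 0.5. Assumption: ties are rounded up."""
    return math.floor(grade * 2 + 0.5) / 2


def final_grade(exam, project, workgroups, project_delay=timedelta(0)):
    """Return (final grade, whether the exam minimum of 5.5 is met)."""
    # Assumption: the penalised project grade is floored at 0.
    penalised_project = max(0, project - late_penalty(project_delay))
    weighted = (60 * exam + 30 * penalised_project + 10 * workgroups) / 100
    return round_to_half(weighted), exam >= 5.5


# Hypothetical student: exam 7, project 8 (on time), workgroups 10.
print(final_grade(7, 8, 10))
# Same student, but the project was submitted 25 hours late.
print(final_grade(7, 8, 10, project_delay=timedelta(hours=25)))
```

The second return value only reports whether the exam requirement is met; the text does not state a pass mark for the final grade, so none is assumed here.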
Registration
Application period
Applications are submitted via EduXchange. The application period opens on Thursday 15 May 2025 at 13:00h.
For minor students, TU Delft, Erasmus and LDE students: Thursday 15 May 13.00h until 30 June
More information about the application procedure can be found on this website: