Prospectus

Essentials for Data Science

Course
2024-2025

Admission requirements

Elementary statistical skills and elements of linear algebra.

Description

The course offers a practical introduction to a few programming languages and tools currently used in data science:

  • Python is a general-purpose, high-level, easy-to-learn programming language. It offers a large number of data science libraries (e.g. for machine learning, neural networks, data manipulation, and data visualization).

  • SQL is a standard language used to create, query, update and manage relational databases. Such databases are used, for example, to store large tables of experimental results.

  • Git is a tool for tracking changes in files during program development. It is the current standard for collaborative code development.

During the course you will develop Python programs of growing complexity.
You will use state-of-the-art Python data science libraries for data manipulation and visualization (e.g. pandas, Matplotlib).
After the course you will be able to program simple, reproducible data analyses (consisting of data reading, cleaning, simple modelling, and reporting steps).
You will also learn the fundamentals of relational databases and of the SQL language, and you will practice this knowledge on an example SQLite database.
You will access the database both through direct SQL statements and through a high-level, object-oriented Python library (SQLAlchemy).
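
As an illustration of accessing a database through direct SQL statements, here is a minimal sketch that uses Python's standard-library sqlite3 module. The table name, column names, and values are hypothetical and not taken from the course materials; the database lives in memory, so nothing is written to disk.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement ("
        "id INTEGER PRIMARY KEY, sample TEXT NOT NULL, value REAL)"
    )
    # Made-up example rows; placeholders (?) keep values out of the SQL text.
    rows = [("A", 1.5), ("A", 2.5), ("B", 4.0)]
    conn.executemany(
        "INSERT INTO measurement (sample, value) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def mean_per_sample(conn):
    """Return a dict mapping each sample name to its mean value."""
    cursor = conn.execute(
        "SELECT sample, AVG(value) FROM measurement "
        "GROUP BY sample ORDER BY sample"
    )
    return dict(cursor.fetchall())


conn = build_demo_db()
# Update one row, then query the aggregated result.
conn.execute("UPDATE measurement SET value = ? WHERE sample = ?", (5.0, "B"))
conn.commit()
means = mean_per_sample(conn)
print(means)
conn.close()
```

The sketch covers the create, update, and query steps named above. An object-oriented library such as SQLAlchemy wraps the same operations in Python classes.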

First, you will practice code development on your own.
Later, you will practice collaborative code development in groups.
You will be asked to use Git to track changes in your code and to share it with other students through GitHub.

Finally, the relevance of data stewardship and FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.

Course Objectives

During the course you will practice writing Python code.
After the course you will be able to:

  • Write Python code using collections (‘list’, ‘tuple’, ‘set’, ‘dict’), flow control statements (‘if’, ‘for’, ‘while’, exceptions), and context managers (‘with’).

  • Develop user-defined functions.

  • Use Python classes (instance variables, methods, inheritance).

  • Combine functions from the Python standard library (reading/writing files in different formats; ‘math’, ‘statistics’, ‘random’) into your own code.

  • Analyse example data with common data science libraries (NumPy, pandas, Matplotlib).

  • Understand relational databases and apply the SQL language to create, query, and update a relational database.

  • Understand the basics of SQLAlchemy for object-oriented database access from Python.

  • Practice Python programming by running several machine learning algorithms.

  • Practice individual and collaborative code development using Git and GitHub.

  • Explain the relevance of data stewardship and FAIR principles for scientific research.
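
To give a flavour of the language features listed above, the following minimal sketch combines a ‘dict’ and a ‘list’, a ‘for’ loop, exception handling, a ‘with’ context manager, user-defined functions, a small class, and the standard ‘statistics’ module. The file layout (one "name,value" pair per line), the class name, and the function names are assumptions made for illustration and do not come from the course materials.

```python
import statistics
import tempfile
from pathlib import Path


class Sample:
    """A named group of numeric values (hypothetical example class)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        return statistics.mean(self.values)


def parse_line(line):
    """Parse 'name,value' into (str, float); raise ValueError on bad input."""
    name, raw = line.strip().split(",")
    return name, float(raw)


def read_samples(path):
    """Read a file of 'name,value' lines; return (samples, skipped count)."""
    grouped = {}
    skipped = 0
    with open(path, encoding="utf-8") as handle:  # file is closed on exit
        for line in handle:
            try:
                name, value = parse_line(line)
            except ValueError:
                skipped += 1  # malformed line: count it and move on
                continue
            grouped.setdefault(name, []).append(value)
    samples = [Sample(name, values) for name, values in grouped.items()]
    return samples, skipped


# Usage with a temporary file containing made-up data and one bad line.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "values.csv"
    path.write_text("A,1.0\nA,3.0\nB,oops\nB,4.0\n", encoding="utf-8")
    samples, skipped = read_samples(path)

means = {sample.name: sample.mean() for sample in samples}
print(means, skipped)
```

Skipping malformed lines while counting them is one possible design choice; a stricter analysis might stop at the first bad line instead.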

Timetable

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Mode of instruction

Lectures and practical sessions.

Assessment method

Two homework assignments (each 10% of the final grade), a group assignment (20%), and a final written exam (60%). In addition, a data stewardship quiz must be passed.

Reading list

Materials from last year: https://github.com/LUMC/EfDS.

Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

  • Enrolment for the fall opens in July

  • Enrolment for the spring opens in December

See this page for more information about deadlines and enrolling for courses and exams.

Note:

  • It is mandatory to enrol for all activities of a course that you are going to follow.

  • Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

  • Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

Contact

email: Szymon M. Kiełbasa

Remarks

Software
Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.