Study guide

Essentials for Data Science

Course
2024-2025

Admission requirements

Elementary statistical skills and elements of linear algebra.

Description

The course offers a practical introduction to a few programming languages and tools currently used in data science:

  • Python is a general-purpose, high-level, easy-to-learn programming language. It offers a large number of data science libraries (e.g. for machine learning, neural networks, data manipulation and data visualization).

  • SQL is a standard language used to create, query, update and manage relational databases. For example, such databases are used to store large tables with results of experiments.

  • Git is a tool for tracking changes in files during program development. It is the current standard for collaborative code development.

During the course you will develop Python programs of growing complexity.
You will use state-of-the-art Python data science libraries for data manipulation and visualization (e.g. pandas, Matplotlib).
After the course you will be able to program simple reproducible data analyses (consisting of data reading, cleaning, simple modelling, and reporting steps).
You will also learn the fundamentals of relational databases and of the SQL language, and you will practice this knowledge on an example database (SQLite).
You will access the database both through direct SQL statements and through a high-level, object-oriented Python library (SQLAlchemy).
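
As an illustration of access through direct SQL statements, here is a minimal sketch using the `sqlite3` module from the Python standard library. The table `measurement`, its columns and the data are made up for this example and are not taken from the course materials; access through SQLAlchemy is not shown.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding results of experiments.
conn.execute(
    "CREATE TABLE measurement ("
    "id INTEGER PRIMARY KEY, sample TEXT NOT NULL, value REAL)"
)

# Insert rows using parameter placeholders rather than string formatting.
rows = [("A", 1.5), ("A", 2.5), ("B", 4.0)]
conn.executemany("INSERT INTO measurement (sample, value) VALUES (?, ?)", rows)

# Update one row, then query the mean value per sample.
conn.execute("UPDATE measurement SET value = 5.0 WHERE sample = 'B'")
means = conn.execute(
    "SELECT sample, AVG(value) FROM measurement GROUP BY sample ORDER BY sample"
).fetchall()

conn.commit()
conn.close()

for sample, mean_value in means:
    print(sample, mean_value)
```

The example covers the four operations named in the description (create, query, update, manage) on a single table; a real experiment database would typically have several related tables.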

First, you will work alone and practice code development.
Later, you will practice shared code development in groups.
You will be asked to use Git to track changes in your code and to share your code with other students through GitHub.

Finally, the relevance of data stewardship and FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.

Course Objectives

During the course you will practice writing Python code.
After the course you will be able to:

  • Create Python code using collections (‘list’, ‘tuple’, ‘set’, ‘dict’), flow control statements (‘if’, ‘for’, ‘while’, exceptions) and context managers (‘with’).

  • Develop user functions.

  • Use Python classes (instance variables, methods, inheritance).

  • Combine functions from the Python standard libraries (reading/writing files in different formats; ‘math’, ‘statistics’, ‘random’) into your own code.

  • Analyse example data with common data science libraries (NumPy, pandas, Matplotlib).

  • Understand relational databases and apply the SQL language to create, query, and update a relational database.

  • Understand the basics of SQLAlchemy for object-oriented database access from Python.

  • Practice Python programming by running several machine learning algorithms.

  • Practice individual and collaborative code development using Git and GitHub.

  • Explain the relevance of data stewardship and FAIR principles for scientific research.
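
To give an impression of the level the first objectives aim at, the following sketch combines a class, a user function, a ‘dict’, flow control with exception handling, a context manager and the ‘statistics’ module in one small read-clean-report script. All names (`Sample`, `read_samples`) and the data are hypothetical and only serve as an illustration; they do not come from the course materials.

```python
import csv
import io
import statistics


class Sample:
    """A named group of numeric measurements."""

    def __init__(self, name):
        self.name = name
        self.values = []  # instance variable: list of floats

    def add(self, value):
        self.values.append(value)

    def mean(self):
        return statistics.mean(self.values)


def read_samples(handle):
    """Read rows of 'name,value' and skip values that are not numeric."""
    samples = {}  # dict mapping sample name to Sample object
    for row in csv.DictReader(handle):
        try:
            value = float(row["value"])
        except ValueError:
            continue  # cleaning step: drop measurements that cannot be parsed
        if row["name"] not in samples:
            samples[row["name"]] = Sample(row["name"])
        samples[row["name"]].add(value)
    return samples


# Made-up data; io.StringIO stands in for a file opened with open(...).
raw = "name,value\nA,1.0\nA,3.0\nB,n/a\nB,4.0\n"
with io.StringIO(raw) as handle:
    samples = read_samples(handle)

# Reporting step: mean value per sample.
report = {name: sample.mean() for name, sample in samples.items()}
print(report)
```

With a real data file, the `io.StringIO(raw)` call would be replaced by `open(path, newline="")`, where `path` is the location of a CSV file with the same two columns.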

Timetable

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudymap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

Lectures and practical sessions.

Assessment method

Two homework assignments (each 10% of the final grade), a group assignment (20%) and the final written exam (60%). In addition, a data stewardship quiz must be passed.

Reading list

Materials from last year: https://github.com/LUMC/EfDS.

Registration

Every student must register for courses with the new enrollment tool MyStudymap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to register for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam.

An extensive FAQ on MyStudymap can be found here.

Contact

email: Szymon M. Kiełbasa: smkielbasa@lumc.nl

Remarks