Study guide

Essentials for Data Science

Course
2023-2024

Admission requirements

Elementary statistical skills and elements of linear algebra.

Description

The course offers a practical introduction to a few programming languages and tools currently used in data science:

  • Python is a general-purpose, high-level, easy-to-learn programming language. A large number of data science libraries are available for it (e.g. machine learning, neural networks, data manipulation, data visualization).

  • SQL is a standard language used to create, query, update and manage relational databases. For example, such databases are used in data science to store large tables with results of experiments.

  • Git is a tool that tracks changes in files during program development. It is the current standard for collaborative code development.

During the course, students will write Python programs of increasing complexity (from basic coding examples to fitting a machine learning model). After the course, students will be able to program simple reproducible data analyses (consisting of data reading, cleaning, simple modelling, and reporting steps). State-of-the-art Python libraries for data manipulation and visualization (pandas, Matplotlib) and for data science will be discussed.
The fundamentals of relational databases and of the SQL language will be presented in the context of an example database (SQLite). The database will be accessed through direct SQL statements and through a high-level, object-oriented Python library (SQLAlchemy).
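
As a rough illustration of the "direct SQL statements" route, the sketch below uses Python's standard sqlite3 module with an in-memory SQLite database. The table name, column names, and values are hypothetical and are not taken from the course materials. The SQLAlchemy route is not shown here.

```python
import sqlite3

# In-memory SQLite database; table, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")

# Create a table.
conn.execute(
    "CREATE TABLE measurement (id INTEGER PRIMARY KEY, sample TEXT, value REAL)"
)

# Insert rows, using placeholders instead of string formatting.
conn.executemany(
    "INSERT INTO measurement (sample, value) VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)

# Update existing rows.
conn.execute("UPDATE measurement SET value = value * 2 WHERE sample = ?", ("b",))
conn.commit()

# Query: mean value per sample.
rows = conn.execute(
    "SELECT sample, AVG(value) FROM measurement GROUP BY sample ORDER BY sample"
).fetchall()

conn.close()
```

Here `rows` is a list of `(sample, mean value)` tuples, one per sample.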

First, students will work alone and practice individual code development. Later, shared code development will be practiced in groups. Students will be asked to use Git to track changes in their code and to share their code with other students through GitHub.

Finally, the relevance of data stewardship and the FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.

Course Objectives

During the course you will practice writing Python code. After the course you will be able to:

  • Use Python collections (list, tuple, set, dict).

  • Use Python flow control statements (if, for, while, exceptions), context managers (with) and define user functions.

  • Understand Python classes (instance variables, methods, inheritance).

  • Use Python standard libraries (reading/writing files in different formats; math, statistics, random).

  • Use common data science libraries (NumPy, pandas, Matplotlib).

  • Understand relational databases and use SQL to create, query, and update a database.

  • Understand the basics of SQLAlchemy for object-oriented database access from Python.

  • Understand how to run several machine learning algorithms.

  • Use Git and GitHub for individual and collaborative code development.

  • Explain the relevance of data stewardship and FAIR principles for scientific research.
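
To give a flavour of the first few objectives, here is a minimal sketch that combines a list and a dict, flow control with exception handling, a context manager, a user-defined function, and the standard statistics module. The file name, file contents, and the `summarize` helper are hypothetical and chosen only for illustration.

```python
import statistics
import tempfile
from pathlib import Path


def summarize(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


# Write a small hypothetical data file, then read it back line by line.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "values.txt"
    path.write_text("2.0\n4.0\nnot-a-number\n6.0\n")

    values = []
    with open(path) as handle:  # the file is closed automatically
        for line in handle:
            try:
                values.append(float(line))
            except ValueError:
                continue  # skip lines that cannot be parsed as numbers

summary = summarize(values)
```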

Timetable

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with calendar apps on your smartphone, such as Outlook, Google Calendar and Apple Calendar. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

Lectures and practical sessions.

Assessment method

Three homework assignments (each 10% of the final grade), a group assignment (30%), and a final written exam (40%). In addition, a data stewardship quiz must be passed.
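
The weights above add up to 100%. As a rough sketch only, a weighted average based on them could be computed as follows. The component names and example grades are hypothetical. Rounding rules, minimum grades per component, and the quiz requirement are not modelled, because they are not specified here.

```python
# Weights as stated in the assessment description; component names are hypothetical.
WEIGHTS = {
    "homework_1": 0.10,
    "homework_2": 0.10,
    "homework_3": 0.10,
    "group_assignment": 0.30,
    "written_exam": 0.40,
}


def weighted_grade(grades):
    """Return the weighted average of the component grades."""
    return sum(weight * grades[name] for name, weight in WEIGHTS.items())


# Hypothetical example grades on a 1-10 scale.
example_grades = {
    "homework_1": 8.0,
    "homework_2": 7.0,
    "homework_3": 9.0,
    "group_assignment": 7.5,
    "written_exam": 6.5,
}
final = weighted_grade(example_grades)
```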

Reading list

Last year's materials: https://github.com/LUMC/EfDS

Registration

It is the responsibility of every student to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. If you are not registered for a course, you are not allowed to participate in its final exam. Confirming your exam participation is possible until ten days before the exam.
Extensive FAQs on MyStudyMap can be found here.

Contact

Email: Szymon M. Kiełbasa (smkielbasa@lumc.nl)

Remarks