Prospectus

Essentials for Data Science

Course
2021-2022

Admission requirements

  • knowledge of basic linear algebra and linear models

Description

Python and R are the programming languages most frequently used in data science.
After this course the student will be able to program reproducible analyses in Python.
An analysis consists of data reading, cleaning, modelling and reporting steps.
The implementation is based on state-of-the-art Python data science libraries.
Data manipulation methods will be shown, with a brief reference to the SQL language.
Students will practise using the machine learning algorithms introduced earlier.
In addition, the relevance of data stewardship and the FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.
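
As an illustration of the four analysis steps named above (reading, cleaning, modelling, reporting), here is a minimal sketch that uses only the Python standard library. The inline data, the column names `dose` and `response`, and the helper function names are hypothetical and are not taken from the course material. The course itself is based on dedicated data science libraries.

```python
import csv
import io

# Hypothetical inline data standing in for a file on disk.
RAW = """dose,response
1,2.1
2,3.9
3,
4,8.2
"""


def read_rows(text):
    """Reading: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: drop rows with missing values and convert to floats."""
    return [
        (float(r["dose"]), float(r["response"]))
        for r in rows
        if r["dose"] and r["response"]
    ]


def fit_line(points):
    """Modelling: ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def report(intercept, slope, n):
    """Reporting: format the fitted model as a one-line summary."""
    return f"n={n} intercept={intercept:.3f} slope={slope:.3f}"


points = clean(read_rows(RAW))
intercept, slope = fit_line(points)
summary = report(intercept, slope, len(points))
print(summary)
```

Keeping each step in its own function makes the analysis easy to rerun on new data, which is one practical aspect of reproducibility.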

Course Objectives

After the course you will be able to:

  • write and execute a Python program or Python-notebook script/report

  • read/write data stored in standard tabular/hierarchical formats

  • perform data manipulation operations (table filtering, merging, wide/long conversion)

  • visualise histograms, scatter plots, etc.

  • execute several machine learning algorithms

  • explain the relevance of data stewardship for scientific research

  • properly handle research data during the complete data life cycle (planning research, collecting data, processing & analyzing data, preserving data, giving access to data, re-using data)

  • apply the FAIR principles (Findable, Accessible, Interoperable, Reusable)
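
The data manipulation operations listed in the objectives above (table filtering, merging, wide/long conversion) can be sketched with pandas, one of the libraries named in the literature list. The tables, column names and values below are hypothetical and serve only as an illustration. They are not course material.

```python
import pandas as pd

# Hypothetical example tables; names and values are illustrative only.
samples = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "group": ["control", "treated", "treated"],
})
measurements = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "gene_a": [1.0, 2.5, 3.0],
    "gene_b": [0.5, 0.7, 0.9],
})

# Merging: join the two tables on their shared key column.
merged = samples.merge(measurements, on="sample_id", how="inner")

# Filtering: keep only the rows that belong to the treated group.
treated = merged[merged["group"] == "treated"]

# Wide to long: one row per (sample, gene) pair.
long = treated.melt(
    id_vars=["sample_id", "group"],
    var_name="gene",
    value_name="expression",
)

# Long to wide: back to one column per gene.
wide = long.pivot(
    index="sample_id", columns="gene", values="expression"
).reset_index()

print(long)
print(wide)
```

The long layout is usually more convenient for grouping and plotting, while the wide layout is closer to how measurement tables are typically delivered.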

Mode of instruction

Assessment method

To be announced

Literature

  • Python tutorial: https://docs.python.org/3/tutorial/index.html

  • Current tutorials for Python Libraries for Data Science: NumPy, SciPy, Pandas, Matplotlib, TensorFlow

Contact

Szymon M. Kiełbasa, email: smkielbasa@lumc.nl