Essentials for Data Science, 2021-2022 - Prospectus

Admission requirements

knowledge of basic linear algebra and linear models

Description

Python and R are the most frequently used programming languages in data science.
After this course, students will be able to program reproducible analyses in Python.
Such an analysis consists of data reading, cleaning, modelling and reporting steps.
The implementation will be based on state-of-the-art Python data science libraries.
Data manipulation methods will be demonstrated, with a brief reference to the SQL language.
Students will be asked to practise using the machine learning algorithms introduced earlier.
Moreover, the relevance of data stewardship and the FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.
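
To make the four steps of such an analysis concrete, here is a minimal sketch using only the Python standard library. Several parts of it are assumptions for illustration and are not taken from the course material:

- The in-memory CSV text is invented.
- The column names `x` and `y` are invented.
- The helper functions `read_data`, `clean_data`, `fit_line` and `report` are hypothetical names.
- A least-squares straight line stands in for the "model".

The course itself builds on dedicated data science libraries rather than hand-written code like this.

```python
import csv
import io

# Hypothetical input data; a real analysis would read a file from disk.
RAW_CSV = """x,y
1,2.1
2,3.9
3,
4,8.2
5,9.8
"""


def read_data(text):
    """Reading step: parse CSV text into a list of dicts (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_data(rows):
    """Cleaning step: drop rows with a missing value and convert the rest to floats."""
    cleaned = []
    for row in rows:
        if row["x"] and row["y"]:
            cleaned.append((float(row["x"]), float(row["y"])))
    return cleaned


def fit_line(points):
    """Modelling step: ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def report(points, intercept, slope):
    """Reporting step: return a short text summary of the fitted model."""
    return f"n={len(points)}, intercept={intercept:.3f}, slope={slope:.3f}"


if __name__ == "__main__":
    points = clean_data(read_data(RAW_CSV))
    intercept, slope = fit_line(points)
    print(report(points, intercept, slope))
```

Because the input is fixed and no step uses randomness, rerunning the script always gives the same report. This is the simplest form of the reproducibility the course aims for.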

Course Objectives

After the course you will be able to:

write and execute a Python program, or a Python notebook used as a script/report
read/write data stored in standard tabular/hierarchical formats
perform data manipulation operations (table filtering, merging, wide/long conversion)
visualise data using histograms, scatter plots, etc.
execute several machine learning algorithms
explain the relevance of data stewardship for scientific research
properly handle research data during the complete data life cycle (planning research, collecting data, processing and analysing data, preserving data, giving access to data, re-using data)
apply the FAIR principles (Findable, Accessible, Interoperable, Reusable)
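
As a rough illustration of the data handling objectives, here is a standard-library-only sketch. It covers reading a tabular format (CSV) and a hierarchical format (JSON, chosen here as an example), then table filtering, merging, and wide/long conversion, then writing the result back to CSV.

The course literature points to Pandas for these tasks. Everything in the example is hypothetical:

- the tables and their column names;
- the nested JSON layout;
- all helper function names.

```python
import csv
import io
import json

# Hypothetical example data: a tabular CSV table and a hierarchical JSON document.
MEASUREMENTS_CSV = """patient,visit,weight
p1,1,70.5
p1,2,71.0
p2,1,82.3
p2,2,81.9
"""
PATIENTS_JSON = '{"patients": [{"id": "p1", "group": "A"}, {"id": "p2", "group": "B"}]}'


def read_tables():
    """Read the CSV table and the nested JSON document into lists of dicts."""
    measurements = list(csv.DictReader(io.StringIO(MEASUREMENTS_CSV)))
    patients = json.loads(PATIENTS_JSON)["patients"]
    return measurements, patients


def filter_rows(rows, column, value):
    """Table filtering: keep only rows whose column equals value."""
    return [row for row in rows if row[column] == value]


def inner_merge(left, right, left_key, right_key):
    """Inner join of two lists of dicts on left[left_key] == right[right_key]."""
    index = {r[right_key]: r for r in right}  # assumes unique keys in `right`
    merged = []
    for row in left:
        match = index.get(row[left_key])
        if match is not None:
            combined = dict(row)
            combined.update({k: v for k, v in match.items() if k != right_key})
            merged.append(combined)
    return merged


def long_to_wide(rows, id_col, key_col, value_col):
    """Long to wide: one row per id, one column per distinct key value."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        record[f"{value_col}_{row[key_col]}"] = row[value_col]
    return list(wide.values())


def wide_to_long(rows, id_col, value_cols, key_name, value_name):
    """Wide to long: one row per (id, column) pair."""
    return [
        {id_col: row[id_col], key_name: col, value_name: row[col]}
        for row in rows
        for col in value_cols
    ]


def write_csv(rows, fieldnames):
    """Write a list of dicts back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    measurements, patients = read_tables()
    merged = inner_merge(measurements, patients, "patient", "id")
    group_a = filter_rows(merged, "group", "A")
    wide = long_to_wide(measurements, "patient", "visit", "weight")
    print(len(group_a), "rows in group A")
    print(write_csv(wide, ["patient", "weight_1", "weight_2"]))
```

In Pandas, the same operations would typically be done with DataFrame methods rather than hand-written loops. The loops are kept here only so that each operation stays visible.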

Mode of instruction

Assessment method

tba

Literature

Python tutorial: https://docs.python.org/3/tutorial/index.html
Current tutorials for Python data science libraries: NumPy, SciPy, Pandas, Matplotlib, TensorFlow

Contact

Szymon M. Kiełbasa, email: smkielbasa@lumc.nl