Essentials for Data Science

Course
2022-2023

Admission requirements

  • knowledge of basic linear algebra and linear models

Description

Python and SQL are among the most frequently used programming languages in data science.
After this course, students will be able to program simple, reproducible analyses in Python.
An analysis consists of data reading, cleaning, simple modelling, and reporting steps.
State-of-the-art Python libraries for data manipulation, visualization, and data science will be discussed.
Students will be asked to write Python programs of growing complexity (from basic coding examples to fitting a machine learning model).
The fundamentals of relational databases and of the SQL language will be discussed in the context of an example database (SQLite).
We will practice database usage through direct SQL statements and through a high-level, object-oriented Python library (SQLAlchemy).
Finally, students will be asked to work in groups and practice shared code development (GitHub).
In addition, the relevance of data stewardship and the FAIR principles (Findable, Accessible, Interoperable, Reusable) will be discussed.

Course Objectives

After the course you will be able to:

  • Develop and execute a Python program or Python-notebook script/report.

  • Read/write data stored in standard formats or extract data from a relational database.

  • Perform data manipulation operations (table filtering, joining/merging) in Python and SQL.

  • Visualise histograms, scatter plots, etc.

  • Execute several machine learning algorithms.

  • Properly handle research data during the complete data life cycle (planning research, collecting data, processing & analyzing data, preserving data, giving access to data, re-using data).

  • Apply the FAIR principles (Findable, Accessible, Interoperable, Reusable).
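
As an illustration of the table filtering and joining objective, the sketch below runs direct SQL statements against an in-memory SQLite database through Python's built-in `sqlite3` module. The tables, columns, and values are hypothetical examples and are not taken from the course materials; the object-oriented SQLAlchemy access layer mentioned in the description is not shown here.

```python
import sqlite3


def build_example_db():
    """Create a small in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE measurements (
            patient_id INTEGER REFERENCES patients(id),
            marker TEXT NOT NULL,
            value REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO patients (id, name) VALUES (?, ?)",
        [(1, "A"), (2, "B"), (3, "C")],
    )
    conn.executemany(
        "INSERT INTO measurements (patient_id, marker, value) VALUES (?, ?, ?)",
        [(1, "crp", 2.5), (1, "crp", 12.0), (2, "crp", 30.0)],
    )
    conn.commit()
    return conn


def high_values(conn, marker, threshold):
    """Join both tables and keep measurements of one marker above a threshold."""
    query = """
        SELECT p.name, m.value
        FROM measurements AS m
        JOIN patients AS p ON p.id = m.patient_id
        WHERE m.marker = ? AND m.value > ?
        ORDER BY m.value
    """
    # Placeholders (?) keep user-supplied values out of the SQL text.
    return conn.execute(query, (marker, threshold)).fetchall()


if __name__ == "__main__":
    connection = build_example_db()
    for name, value in high_values(connection, "crp", 10.0):
        print(name, value)
    connection.close()
```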

Timetable

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

  • 14 sessions (4 hours each), each combining a lecture with a computer practical.

  • 4-6 compulsory small hand-in assignments (homeworks).

  • One large group assignment.

Assessment method

The final grade will be calculated based on:

  • The grades of small individual Python programming assignments (homeworks; with the possibility to improve grades from earlier submissions).

  • The grade of the exam (covering Python data manipulation, visualization, and simple modeling topics).

  • The grade of the group assignment (implementation of a small relational database with SQL and object-oriented access methods, accompanied by some reports and delivered as a well-documented GitHub repository).

  • The grade of the data stewardship quiz.

To pass the course, all grades must be above a certain threshold; the final grade will be calculated as a weighted average. The threshold, the weights, and the rounding strategy will be specified before the start of the course.

Reading list

The detailed lecture/practical materials will be made available before each session. Moreover:

  • Python tutorial: https://docs.python.org/3/tutorial/index.html.

  • Current tutorials for Python Libraries for Data Science: NumPy, SciPy, Pandas, Matplotlib.

  • SQL tutorial: https://www.w3schools.com/sql/, SQLite Python tutorial: https://www.sqlitetutorial.net/sqlite-python/, SQLAlchemy tutorial: https://www.sqlalchemy.org/.

  • Git and GitHub introduction: https://www.w3schools.com/git/git_intro.asp.

Registration

From the academic year 2022-2023 onward, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.

Contact

email: Szymon M. Kiełbasa: smkielbasa@lumc.nl
