Study guide

Exploratory Data Analysis

Course
2024-2025

Admission requirements

There are no entry requirements for the course. However, we assume that students are acquainted with the contents of the following courses of the Statistics & Data Science program:

  • Linear algebra

  • Linear and generalized linear models

  • Statistical computing with R

Description

Studying the relationship between two (or maybe three) variables is easy: you can visualise them in two-dimensional (or maybe three-dimensional) graphs. However, when you are interested in the relationships among more than three variables, the human brain often falls short and you need specialised methods to gain more insight into the dependencies between either cases or variables, especially if the relations are not linear.

Many of these specialised methods include the option to transform data in order to explore any non-linear relationships and to reduce the dimensionality. Additionally, these methods allow for a mix of different types of variables, continuous or categorical. These models are generally data-driven and therefore descriptive in nature, but some statistical inference can be done.

The techniques that will be covered in class are:

  • Linear and optimal scaling regression analysis (including catreg)

  • Linear and optimal scaling principal components analysis (including catpca)

  • Multiple correspondence analysis

  • Classical scaling analysis (including pcoa and isomap)

  • Multidimensional scaling analysis (including Sammon mapping)

  • Nonlinear dimension reduction (including t-SNE and UMAP)

  • Clustering (including k-means and hierarchical clustering)
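
To give a flavour of what one of these techniques does, here is a minimal sketch of k-means clustering (Lloyd's algorithm) together with the loss function it minimises, the within-cluster sum of squared distances. It is an illustration only and not course material: it is written in Python rather than the R used in the course, the function name `kmeans`, the toy data, and the choice to start from the first k points are all assumptions made for this example.

```python
from math import dist


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Assumption: the first k points serve as initial centers."""
    centers = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    # Loss: within-cluster sum of squared Euclidean distances.
    loss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, loss


# Hypothetical toy data: two well-separated pairs of points.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centers, loss = kmeans(toy, k=2)
print(labels, centers, loss)
```

The two alternating steps (assign, then update) each lower or preserve the loss, which is why the procedure converges, although possibly to a local rather than a global minimum.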

Note: In this course the focus is on models to explore associations, not on data visualisation. Of course, these models may visualise their output to ease interpretation, but if you want to know more about data exploration through visualisation alone, the course Data Visualization is recommended.

Course objectives

By the end of the course, students can:
1. justify which technique is suitable to explore or answer a research question about a particular dataset;
2. discuss the differences in the assumptions and objectives of the techniques covered in the course;
3. identify the different parts of the loss functions of techniques covered in the course;
4. program some of the algorithms in R;
5. analyse data using the various techniques discussed in the course and evaluate the results.

Timetable

In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.

Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.

Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.

Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.

Mode of Instruction

The course consists of one course day per week, in which we focus on exercises to understand the workings of the various techniques. Students are expected to prepare for each exercise class by, for example, reading the literature, watching the lecture videos, and doing preparatory exercises.

Bring a laptop to each lecture with SPSS version 27 or higher and the latest versions of R and RStudio installed (for details see Brightspace).

Assessment method

  • Two partial exams: each exam counts for 1/3 of the final grade, and each exam grade must be at least 5.0 to pass the course;

  • Four home assignments: the mean of the four assignment grades counts for 1/3 of the final grade and must be at least 5.0 to pass the course.
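
As a worked illustration of this weighting, the sketch below computes a final grade from two exam grades and four assignment grades. It is an informal example, not an official calculator: the function name `final_grade` and the example grades are hypothetical, and because the rounding rules and the pass mark for the final grade itself are not stated here, the code only checks the 5.0 minimum for each component.

```python
def final_grade(exam1, exam2, assignments):
    """Return (unrounded final grade, whether every component is at least 5.0)."""
    assignment_mean = sum(assignments) / len(assignments)
    components = [exam1, exam2, assignment_mean]
    grade = sum(components) / 3  # each component counts for 1/3
    # Assumption: only the stated 5.0 minimums are checked, nothing else.
    meets_minimums = all(c >= 5.0 for c in components)
    return grade, meets_minimums


# Hypothetical grades: exams 6.0 and 7.0, assignment mean 7.0.
grade, ok = final_grade(6.0, 7.0, [8.0, 7.0, 6.0, 7.0])
print(round(grade, 2), ok)

# A high average does not compensate for one exam below 5.0.
grade_low, ok_low = final_grade(4.5, 9.0, [8.0, 8.0, 8.0, 8.0])
print(round(grade_low, 2), ok_low)
```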

Resit opportunities:

  • Partial exams: there are resit exams for each partial exam;

  • Home assignments: there are no resit opportunities for the individual home assignments, but there is one resit assignment that can replace the lowest grade of the four home assignments.

Reading List

Reading material will be announced at the start of the course via Brightspace and is available via Leiden University Library.

Registration

As a student, you are responsible for enrolling on time through MyStudyMap.

In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.

There are two enrolment periods per year:

  • Enrolment for the fall opens in July

  • Enrolment for the spring opens in December

See this page for more information about deadlines and enrolling for courses and exams.

Note:

  • It is mandatory to enrol for all activities of a course that you are going to follow.

  • Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.

  • Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.

Contact

Course coordinator: Dr. Sanne Willems (s.j.w.willems@fsw.leidenuniv.nl)

Remarks

Software
Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.