# Statistics and Data Analysis

Course
2021-2022

Prerequisites: familiarity with least-squares analysis (Praktische Sterrenkunde) and basic Python skills, such as making figures, working with functions, writing for-loops, and executing scripts (Programmeermethoden NA).

## Description

Once you have conducted observations and reduced them into what we call a dataset, an important question follows: what can you learn from these data? Perhaps you have a hypothesis that needs to be tested. Or perhaps you have stumbled on a potential correlation between two observables in your sample. For each of these scenarios, a set of tools is available to assess the significance of your observations. In Statistics and Data Analysis you will become familiar with these tools. By creating your own simulated datasets you will understand how and why they work, and also discover their limitations. Finally, you will work with real astronomical datasets and apply what you have learned in practice.

## Course objectives

You will learn how to simulate data using a Monte Carlo approach. You will also test the boundaries of statistical methods, and so learn how to avoid common problems such as “overfitting” and the “look-elsewhere effect”.
After this course, you will be able to:

• Simulate data using Monte Carlo methods.

• Apply two different statistical tests (Pearson’s r and Kendall’s tau) to measure the correlation strength between two variables.

• Apply two different statistical tests to examine the difference between two distributions (Kolmogorov-Smirnov and Anderson-Darling).

• Explain how these tests work and under which circumstances they can be applied.

• Explain the difference between a correlation and a causal connection.

• Recognize the risk of confounding factors.

• Identify and correct for Malmquist bias in astronomical data.

• Identify when the “look-elsewhere” effect is important in your data analysis.

• Quantify when you are “overfitting” the data.
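
To give a flavour of the first two objectives, the sketch below simulates a toy dataset with a Monte Carlo approach, measures the correlation strength with Pearson’s r and Kendall’s tau, and estimates how often an uncorrelated sample would produce an equally strong r by chance. It is an illustration only and not part of the course material: the toy model (a straight line plus Gaussian noise), the function names, and all parameter values are assumptions made for this example, and it uses only the Python standard library. In practice, `scipy.stats.pearsonr` and `scipy.stats.kendalltau` are the usual choices.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson's r: covariance of x and y, normalised by their spreads."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; ties add zero."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


def simulate_sample(n, slope, noise, rng):
    """Draw n points from the assumed toy model y = slope * x + Gaussian noise."""
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, noise) for xi in x]
    return x, y


def monte_carlo_p_value(statistic, observed, n, noise, rng, trials=2000):
    """Fraction of uncorrelated simulated samples with |statistic| >= |observed|."""
    hits = 0
    for _ in range(trials):
        x, y = simulate_sample(n, 0.0, noise, rng)  # slope 0: no correlation
        if abs(statistic(x, y)) >= abs(observed):
            hits += 1
    return hits / trials


rng = random.Random(42)  # fixed seed so the run is reproducible
x, y = simulate_sample(n=50, slope=1.0, noise=0.3, rng=rng)
r = pearson_r(x, y)
tau = kendall_tau(x, y)
p_r = monte_carlo_p_value(pearson_r, r, n=50, noise=0.3, rng=rng)
print(f"Pearson r = {r:.2f}, Kendall tau = {tau:.2f}, Monte Carlo p-value = {p_r:.4f}")
```

Lowering `slope` or raising `noise` weakens the simulated correlation, which is one way to explore where these tests stop detecting it.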

## Soft skills

• Visualizing key properties of a dataset in a clear figure.

• Structured thinking, including computational thinking and programming.

• Summarizing the properties of a dataset in a written report.

## Mode of instruction

• Lectures

• Exercise classes
All exercise classes involve writing and running scripts in Python. Bringing a laptop with a working Python environment is recommended.

## Assessment method

• Two homework sets (50%)

• Written report for final assessment (50%)

## Brightspace

Instructions and course material can be found on Brightspace. Registration for Brightspace occurs automatically when students enroll via uSis by registering for a class activity using a class number.

Background material will be made available during the course.

## Registration

Register via uSis. More information about signing up for classes and exams can be found here. Exchange and Study Abroad students, please see the Prospective students website for information on how to register. For a la carte and contract registration, please see the dedicated section on the Prospective students website.

## Contact information

Lecturer: Sjoert van Velzen
Assistants: Dr. Helgi Hrodmarsson, Stan Barmentloo, Eliot Schwander, Puck Rooijakkers