## Admission requirements

Not applicable.

## Description

This course gives students the knowledge base and general overview of the Data Science field that they need for this master's specialization. It consists of four elements: Intro R, Statistical Theory, Data Science Guest Lectures / Company Visits, and Statistical Computing.

In the Intro R section, students are quickly introduced to the R programming language commonly used in Data Science and Statistics. A solid background in computer programming (e.g. in Python or C++) is assumed. This section prepares the students for the Statistical Computing and the Statistical Theory sections.

Statistics is the art and science of uncertainty, for instance in the prediction and classification tasks that are pivotal to modern Data Science. Working with practitioners to select plausible models and communicating results from real data is the art; the mathematics behind statistical methods and algorithms is the science. The Statistical Theory section covers the basic tools that prepare Data Science students for later courses in the curriculum such as Linear Models, Multivariate Analysis and Statistical Learning Theory.

The Statistical Computing section trains students in popular algorithms and builds on the theory by challenging students to program and visualize statistical routines themselves. This section prepares students for the Advanced Statistical Computing course taken in year 2.

Finally, in a more informal and social setting, students are brought up to date on developments in the Data Science field through guest lectures and company visits.

## Course objectives

In addition to an introduction to the field of Data Science through guest lectures and company visits, the course objectives include knowledge of the following topics:

Probabilities, distributions, sampling, law of large numbers, central limit theorem, maximum likelihood estimation, hypothesis testing, confidence intervals, prediction, functions and objects in R, visualization, R Markdown, bootstrapping, cross-validation, Monte Carlo routines, optimization.

## Mode of instruction

This course combines self-study, plenary lectures, tutorials, computer labs and guest lectures. The guest lectures, company visits and the Statistical Computing section are followed jointly with the Data Science specialization students from the Master in Statistical Science.

## Course load

Total hours of study: 168 hrs.

Lectures 0:00 hrs.

Practical work 0:00 hrs.

Tutoring 0:00 hrs.

Examination 0:00 hrs.

Other 0:00 hrs.

## Timetable

The most recent timetable can be found on the student website. Changes are communicated through Blackboard.

## Assessment method

The examination of the Statistical Theory and Intro R sections consists of three homework assignments. Their average is denoted by E1.

The examination of the Statistical Computing section consists of one homework assignment (H) and one written exam (W). Their weighted average is denoted by E2 = (1/3) H + (2/3) W.

The final grade is the average of the two examinations, (E1 + E2) / 2. In order to pass the course, students must score at least 5.5 on both E1 and E2.
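As an illustration of the grading arithmetic above, the following Python sketch computes E1, E2 and the final grade from individual marks. The function name `final_grade` and the example marks are hypothetical; rounding rules and the attendance requirement are not modelled, since the text above does not specify how grades are rounded.

```python
def final_grade(homework_marks, h, w, pass_mark=5.5):
    """Combine marks following the rules stated in this section.

    homework_marks: the three homework marks that make up E1
    h: homework mark for the Statistical Computing section (H)
    w: written exam mark for the Statistical Computing section (W)
    Returns (E1, E2, final, passed). No rounding is applied (assumption).
    """
    e1 = sum(homework_marks) / len(homework_marks)  # plain average
    e2 = h / 3 + 2 * w / 3                          # weights 1/3 and 2/3
    final = (e1 + e2) / 2                           # equal weight for E1 and E2
    passed = e1 >= pass_mark and e2 >= pass_mark    # both parts must be passed
    return e1, e2, final, passed


# Hypothetical marks: both parts are sufficient.
print(final_grade([7.0, 8.0, 6.0], h=6.0, w=7.5))

# Hypothetical marks: the average looks sufficient, but E2 is below 5.5.
print(final_grade([8.0, 8.0, 8.0], h=5.0, w=5.0))
```

The second call shows why the per-part threshold matters: the combined average is about 6.5, yet the course is not passed because E2 is about 5.0.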

Resits of E1 and E2 are possible after active participation in the regular exam opportunities. The resit of E1 consists of one homework assignment (final weight 50%). The resit of E2 consists of one online written exam (final weight 50%). The resit date will be announced via Blackboard.

Attendance at guest lectures and company visits is mandatory. Students who are not present at all guest lectures and company visits will fail the course, unless extenuating circumstances have been reported to the course coordinator.

## Non-compulsory reading list

Compulsory material will be provided through lecture slides. The following books are optional literature:

Mathematical Statistics and Data Analysis. John A. Rice. Duxbury Press (3rd ed., 2007)

The Art of R Programming. Norman Matloff. No Starch Press (2011)

The Elements of Statistical Learning. Hastie, Tibshirani & Friedman. Springer Series in Statistics (2009)

Resampling Methods: A Practical Guide to Data Analysis. Phillip I. Good. Birkhäuser Boston (3rd ed., 2006)

Permutation, Parametric and Bootstrap Tests of Hypotheses. Phillip I. Good. Springer Series in Statistics (2005)

## Course registration

Enroll in Blackboard for course materials and updates.

To obtain a grade and the ECTS for the course, sign up in uSis for the (re-)exam of the second part (E2) no later than ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in the programme, and therefore to register for and take part in the first exam opportunity.

## Contact information

## Remarks

This is a mandatory course in the Master's programme Data Science: Computer Science.