Students should be familiar with Chapters 1-8 and 10 of the book “The Art of R Programming” by Norman Matloff (2011).
Bring a laptop to each lecture with SPSS version 21 or higher and the latest versions of R and RStudio installed (for details, see Blackboard).
This course covers a wide variety of methods for multivariate analysis and multidimensional data analysis. The first part (seven course-days) deals with the analysis of measurements for N objects (persons) on P variables (attributes), where we typically wish to understand the relationships between those objects and variables. The data are usually given in one or more multivariate data matrices. The course extends classical approaches to multivariate analysis in two ways. We will deal not only with numeric data, but also with categorical (both nominal and ordinal) multivariate data. In addition, we will be able to deal with nonlinear relationships between variables. Both extensions are part of the same optimal quantification/nonlinear transformation framework. Key concepts are dimension reduction and visualization (in principal components analysis and correspondence analysis), and prediction and regularization (in multiple regression analysis).
The second part of the course (two course-days) is about a very important group of multidimensional techniques for the analysis of proximity data between objects (given in one or more N by N matrices) and preference data between row objects and column objects (in one or more N by M matrices). For the analysis of proximities and preferences, we use the terms multidimensional scaling and multidimensional unfolding, respectively. Here dimension reduction and visualization are of utmost importance by definition, while nonlinear transformations also play an important part.
The third part of the course (five course-days) will focus on classification methods. Here the interest is primarily in the question of whether we can predict, given a set of explanatory variables, which class from a predefined set an object (subject, person) belongs to. Three methods will be presented in detail: discriminant analysis, multinomial logistic regression and cluster analysis. Students will also learn how to program some of these methods in R. In addition to R, the first two parts of the course will also use the IBM-SPSS package CATEGORIES, which has been developed in Leiden.
See the Leiden University students' website for the Statistical Science programme (Schedules 2018-2019).
Mode of Instruction
The course consists of two course-days per week. Each course-day consists of a two-hour lecture and a two-hour practical.
Assessment will be based on a written exam (60%) and four home assignments (40%). The minimum required grade for the written exam is a 5. The minimum required average grade for the home assignments is a 5.
The dates of the exam and resit can be found in the Time Table. The room and building for the exam will be announced on the electronic billboard opposite the entrance. The content of the billboard can also be viewed at http://info.liacs.nl/math/.
The written exam is a closed-book exam. Books, laptops, internet access and any other external sources of information are not allowed during the exam.
Reading material will be announced at the start of the course.
Enroll in Blackboard for the course materials and course updates.
To be able to obtain a grade and the EC for the course, sign up for the (re-)exam in uSis ten calendar days before the (re-)exam takes place. Note that students are expected to participate actively in all activities of the programme and therefore to register for and use the first exam opportunity.
Exchange and Study Abroad students, please see the Prospective students website for information on how to apply.
elise.dusseldorp [at] math [dot] leidenuniv [dot] nl
- This is a compulsory course of the Master Statistical Science for the Life and Behavioural Sciences / Data Science.