
Reinforcement Learning


Admission requirements

Assumed prior knowledge
1. At least one Bachelor-level course on Artificial Intelligence, Machine Learning, Data Science, or Data Mining.
2. Bachelor level proficiency in the Python programming language.
3. Good familiarity with deep learning is required.


Deep reinforcement learning is a field of Artificial Intelligence that has attracted much attention following impressive achievements in robotics, Atari games, and, most recently, Go, where human world champions were defeated by computer players. These results build upon a combination of deep learning and the rich history of reinforcement learning research.
This course teaches the field of deep reinforcement learning: how does it work, why does it work, and on which reinforcement learning methods are the successes in robotics and of AlphaGo based? By the end of the course you should have acquired a good understanding of the field of deep reinforcement learning.

The defining characteristic of reinforcement learning is that agents learn through interaction with an environment, much as humans learn by doing. Instead of being told which action to take, the agent works out for itself which actions maximize a reward signal. Reinforcement learning is a powerful technique for solving sequential decision problems.
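
The interaction loop described above can be illustrated with a minimal sketch of tabular Q-learning, the first algorithm in the topic list below. Everything specific here is an assumption made for illustration and is not taken from the course material: the five-state chain environment, the function names (step, train, greedy_policy), and the hyperparameter values. The agent starts in state 0, receives a reward of 1 only when it reaches state 4, and is never told that moving right is the correct action.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal (hypothetical toy environment)
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: apply the move and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                best = max(q[state])             # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned table."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))  # → [1, 1, 1, 1]
```

The learned policy moves right in every state, although the agent only ever observed rewards. Deep value-based methods such as DQN replace the table q with a neural network so that the same idea scales to large state spaces.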

The defining characteristic of deep learning is that the model generalizes: it builds a hierarchy of abstract features from its inputs.

Prominent reinforcement learning problems occur, amongst others, in games and robotics. In this course you will learn the necessary theory to apply reinforcement learning to realistic problems from the field of computer game playing.
The following topics and algorithms are planned to be discussed:

  • Tabular Value-based Reinforcement Learning, such as Q-learning

  • Deep Value-based Reinforcement Learning, such as DQN

  • Policy-based Reinforcement Learning, such as PPO

  • Model-based Reinforcement Learning

  • Two-Agent Self-Play (AlphaGo)

  • Multi-Agent Reinforcement Learning (Poker, StarCraft)

  • Hierarchical Reinforcement Learning

  • Meta-Learning, such as MAML

  • Brief Summary of Deep Supervised Learning

In addition, the role of reinforcement learning in artificial intelligence and its relation to psychology (human learning) will be discussed.
This is a hands-on course in which you will be challenged to build working game-playing programs with different reinforcement learning methods. The course is demanding, and proficiency in Python and deep learning libraries (such as Keras and PyTorch) is important.
All assignments must be completed in Python.

Course objectives

After completing the reinforcement learning course, the students should be able to:

  • Understand the key features and components of deep reinforcement learning;

  • Know the theoretical foundations of basic and advanced deep reinforcement learning techniques;

  • Understand the scientific state-of-the-art in the field of deep reinforcement learning.


The most recent timetable can be found at the Computer Science (MSc) student website.

You will find the timetables for all courses and degree programmes of Leiden University in the tool MyTimetable (login required). Any teaching activities that you have successfully registered for in MyStudyMap will automatically be displayed in MyTimetable. Any timetables that you add manually will be saved and automatically displayed the next time you sign in.

MyTimetable allows you to integrate your timetable with your calendar apps such as Outlook, Google Calendar, Apple Calendar and other calendar apps on your smartphone. Any timetable changes will be automatically synced with your calendar. If you wish, you can also receive an email notification of the change. You can turn notifications on in ‘Settings’ (after login).

For more information, watch the video or go to the 'help-page' in MyTimetable. Please note: Joint Degree students Leiden/Delft have to merge their two different timetables into one. This video explains how to do this.

Mode of instruction

  • Literature (see below). The relevant chapters should be read before the corresponding lecture.

  • Lectures

  • Computer lab

Course load
Hours of study: 168 hrs (= 6 EC)
Lectures: 26:00 hrs
Seminars: 26:00 hrs
Practical assignments: 70:00 hrs
Examination and preparation: 46:00 hrs

Assessment method

Assignments (2-4) and theory exam.
The final grade is a combination of grades for: (1) the written exam (50%, mandatory) and (2) the reports for the assignments (50%, mandatory).
Completed assignments are valid for one year. Failing the course means redoing all assignments the following year.
There is no retake for the assignments; there is a retake for the exam.

The teacher will inform the students how the inspection of and follow-up discussion of the exams will take place.

Reading list


  • A. Plaat, Deep Reinforcement Learning, Springer 2022. Freely available here.


  • R. Sutton and A. Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press, 2018. Freely available here.


From the academic year 2022-2023 onwards, every student has to register for courses with the new enrollment tool MyStudyMap. There are two registration periods per year: registration for the fall semester opens in July and registration for the spring semester opens in December. Please see this page for more information.

Please note that it is compulsory to both preregister and confirm your participation for every exam and retake. Not being registered for a course means that you are not allowed to participate in the final exam of the course. Confirming your exam participation is possible until ten days before the exam.

Extensive FAQs on MyStudyMap can be found here.



There is limited space for students who are not enrolled in the Computer Science programme or one of the Data Science specialisations (Data Science: Computer Science and Astronomy and Data Science). Please contact the programme coordinator/study advisor if you are an external student.