Admission requirements
Basic prior knowledge of data structures, machine learning, probability theory, text representation (embeddings), and linear algebra (vector spaces) is recommended. Students starting in February are advised to take this course in the second year of the master's programme rather than in their first semester.
Description
The internet, search engines, and large language models have drastically changed the way humans deal with information. Whereas in the previous century librarians were still classifying books and articles using subject codes, nowadays search technology is available to everyone, everywhere, and in many different contexts. We all use search engines on a daily basis to find relevant information, not only on the web (e.g. with a search engine or ChatGPT), but also on e-commerce websites, social media and news platforms, travel portals, video/music apps, and in our email clients. This course covers both the theory and practice of the field of Information Retrieval, with a focus on textual content.
Outline:
Fundamentals:
week 1. Introduction
week 2. Boolean retrieval, indexing and compression
week 3. Evaluation and test collections
Models:
week 4. Vector space model
week 5. Neural IR and Transformers for ranking 1
week 6. Probabilistic IR
week 7. Language Modeling for IR
week 8. Neural IR and Transformers for ranking 2
week 9. (student presentations: critical review of a research paper)
Applications:
week 10. Web search and recommender systems
week 11. User interaction and conversational search
week 12. IR in practice (guest lecture)
week 13. IR in the age of LLMs
Course objectives
After successful completion of this course, students are able to:
explain the theoretical underpinnings and implementation of information retrieval models, in particular Boolean retrieval, probabilistic models, vector space models, and transformer models
apply, analyse, and discuss IR models for a given problem setting
explain and apply the common evaluation methods and metrics for IR systems
explain and perform compression algorithms for indexing
list and discuss challenges and models in IR applications, such as web search and conversational search
discuss and evaluate a scientific IR publication
experiment with IR models and evaluate and analyse the outcome
Timetable
The most recent timetable can be found at the Computer Science (MSc) student website.
In MyTimetable, you can find all course and programme schedules, allowing you to create your personal timetable. Activities for which you have enrolled via MyStudyMap will automatically appear in your timetable.
Additionally, you can easily link MyTimetable to a calendar app on your phone, and schedule changes will be automatically updated in your calendar. You can also choose to receive email notifications about schedule changes. You can enable notifications in Settings after logging in.
Questions? Watch the video, read the instructions, or contact the ISSC helpdesk.
Note: Joint Degree students from Leiden/Delft need to combine information from both the Leiden and Delft MyTimetables to see a complete schedule. This video explains how to do it.
Mode of instruction
Lectures, homework exercises, literature, assignments (no lab sessions).
Assessment method
The course grade will be computed as follows:
- Homework (weekly exercises, individual) – 10%
- Critical review of a scientific paper (in groups) – 10%
- Practical assignment (in groups) – 20%
- Final written exam (closed book) – 60%
The homework grade is based on the number of completed exercises: n_completed / n_total × 10. Because the purpose is practice, not testing, the homework exercises are not marked for correctness, only checked for completion.
Completion of exercises and assignments is not mandatory for passing the course, but the grade for exercises or assignments that are not submitted is 0.
Group work is an integral part of the course. You will be expected to complete the assignments together with a teammate.
To pass the course, both the exam grade and the weighted average of all components must be at least 5.5. The exam has a regular written re-sit opportunity.
The teacher will inform the students how the exam inspection and follow-up discussion will take place.
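As an illustration only, the grading rules above can be sketched in a few lines of Python. The function and variable names are hypothetical, all component grades are assumed to be on a 0-10 scale, and no rounding is applied because the course description does not specify any; the official grade is the one determined by the teacher.

```python
# Component weights as listed under "Assessment method".
WEIGHTS = {"homework": 0.10, "review": 0.10, "practical": 0.20, "exam": 0.60}
PASS_MARK = 5.5


def homework_grade(n_completed: int, n_total: int) -> float:
    """Homework grade: fraction of completed exercises, scaled to 0-10."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_completed / n_total * 10


def course_result(homework: float, review: float,
                  practical: float, exam: float) -> tuple[float, bool]:
    """Return (weighted average, passed).

    Components that were not submitted should be passed in as 0.0.
    Passing requires both the exam grade and the weighted average
    to be at least 5.5.
    """
    grades = {"homework": homework, "review": review,
              "practical": practical, "exam": exam}
    weighted = sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)
    passed = exam >= PASS_MARK and weighted >= PASS_MARK
    return weighted, passed


if __name__ == "__main__":
    hw = homework_grade(n_completed=9, n_total=12)
    average, passed = course_result(hw, review=8.0, practical=7.0, exam=6.0)
    print(f"homework={hw:.2f} average={average:.2f} passed={passed}")
```

Note that a high weighted average does not compensate for an exam grade below 5.5: in this sketch, `course_result(10, 10, 10, 5.0)` yields an average of about 7.0 but `passed` is `False`.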
Reading list
Both textbooks are publicly available online.
Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan (2008): Introduction to information retrieval. Cambridge University Press. ISBN: 978-0521865715 https://nlp.stanford.edu/IR-book/
Jimmy Lin, Rodrigo Nogueira, and Andrew Yates (2021): Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool https://arxiv.org/abs/2010.06467
Additional literature will be distributed on Brightspace.
Registration
As a student, you are responsible for enrolling on time through MyStudyMap.
In this short video, you can see step-by-step how to enrol for courses in MyStudyMap.
Extensive information about the operation of MyStudyMap can be found here.
There are two enrolment periods per year:
Enrolment for the fall opens in July
Enrolment for the spring opens in December
See this page for more information about deadlines and enrolling for courses and exams.
Note:
It is mandatory to enrol for all activities of a course that you are going to follow.
Your enrolment is only complete when you submit your course planning in the ‘Ready for enrolment’ tab by clicking ‘Send’.
Not being enrolled for an exam/resit means that you are not allowed to participate in the exam/resit.
Contact
Lecturers:
Remarks
Software
Starting from the 2024/2025 academic year, the Faculty of Science will use the software distribution platform Academic Software. Through this platform, you can access the software needed for specific courses in your studies. For some software, your laptop must meet certain system requirements, which will be specified with the software. It is important to install the software before the start of the course. More information about the laptop requirements can be found on the student website.