Due to the coronavirus measures, the mode of instruction or assessment may deviate from what is described here. For current information, see the relevant course pages on Brightspace.

Text Mining

Course
2021-2022

Admission requirements

Assumed prior knowledge

A Bachelor's degree in AI or Computer Science is recommended for this course, as well as experience with programming in Python.

Description

Text mining, also known as 'knowledge discovery from text', is a research and development field that has received increasing attention over the past two decades, drawing researchers from data science, natural language processing, and machine learning. Key applications are text categorization, information extraction, social media mining, and automatic summarization. This course gives an overview of the field from both a theoretical angle (underlying models) and a practical angle (applications, challenges with data). In addition to the lectures, students work on practical assignments.

Outline:
week 1. Introduction
week 2. Text processing
week 3. Vector semantics
week 4. Text categorization
week 5. Data collection and annotation
week 6. Neural NLP and transfer learning
week 7. Information extraction
week 8. Text summarization
week 9. Sentiment analysis
week 10. Biomedical text mining
week 11. Industrial text mining
week 12. Conclusions

Course objectives

After successful completion of this course, students have an understanding, both at the conceptual and the technical level, of the application of natural language processing (NLP) for the purpose of text mining. Students can build models for a text mining task using machine learning algorithms and language data, and they can evaluate and report on the developed models and modules. Also, students understand, from a theoretical perspective, which tools are applicable in which situations, and which real-world challenges prevent the application of certain techniques (such as language variation and noise due to document processing errors).

Timetable

The most recent timetable can be found on the Computer Science (MSc) student website.

Mode of instruction

Lectures.

Assessment method

  • a written exam (50% of course grade)

  • practical assignments (50% of course grade)

    • two assignments (10% each) during the course
    • one more substantial assignment (30%) at the end of the course

To complete the course, the grade for the written exam must be 5.5 or higher, and the average grade for the practical assignments must also be 5.5 or higher. If one of the tasks is not submitted, the grade for that task is 0.
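
As an illustration of the arithmetic behind these rules, the following Python sketch combines the weights listed above with the two pass conditions. It is not an official grade calculator: the function name `course_result` and the constants are hypothetical, and it assumes that the "average grade for the practical assignments" is weighted by each assignment's share of the course grade (10%, 10%, 30%), which the text does not specify. Rounding conventions for the final grade are also left out.

```python
# Weights taken from the assessment list above.
EXAM_WEIGHT = 0.5
ASSIGNMENT_WEIGHTS = (0.1, 0.1, 0.3)  # two small assignments, one substantial
PASS_MARK = 5.5


def course_result(exam, assignments):
    """Return (weighted_course_grade, passed) for one student.

    `assignments` holds three grades in the order of ASSIGNMENT_WEIGHTS;
    use None for an assignment that was not submitted (counted as 0).
    """
    if len(assignments) != len(ASSIGNMENT_WEIGHTS):
        raise ValueError("expected exactly three assignment grades")

    # A task that is not submitted is graded 0.
    scores = [0.0 if a is None else float(a) for a in assignments]

    assignment_points = sum(w * s for w, s in zip(ASSIGNMENT_WEIGHTS, scores))
    weighted_grade = EXAM_WEIGHT * exam + assignment_points

    # Assumption: the assignment average is weighted by course share.
    assignment_avg = assignment_points / sum(ASSIGNMENT_WEIGHTS)

    # Both components must reach 5.5; round to avoid floating-point noise.
    passed = exam >= PASS_MARK and round(assignment_avg, 2) >= PASS_MARK
    return weighted_grade, passed


if __name__ == "__main__":
    print(course_result(7.0, [8.0, 6.0, 7.0]))
    print(course_result(8.0, [8.0, None, 5.0]))
```

Under these assumptions, a student with a high exam grade can still fail to complete the course if a missing assignment pulls the assignment average below 5.5.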

The teacher will inform the students how the exam inspection and the follow-up discussion will take place.

Reading list

The literature will be distributed on Brightspace. Most of the chapters come from this book: Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed.), December 2020, https://web.stanford.edu/~jurafsky/slp3/

Registration

  • You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.

  • Due to limited capacity, external students can only register after consultation with the programme coordinator/study adviser (mastercs@liacs.leidenuniv.nl).

Contact

Lecturer: dr. S. Verberne
Website: Course website

Remarks