Study Guide

Text Mining

Course
2019-2020

Admission requirements

Assumed prior knowledge

A Bachelor's degree in Computer Science is recommended for this course.

Description

Text mining, also known as 'knowledge discovery from text', is an ICT research and development field that has attracted increasing attention over the last decade, drawing researchers from data science, computational linguistics, and machine learning. Key applications include text categorization, information extraction, social media mining, and automatic summarization. This course gives an overview of the field from both a theoretical angle (underlying models) and a practical angle (applications). In addition to attending the lectures, students work on practical assignments.

Outline:
week 1. introduction
week 2. text processing
week 3. vector semantics
week 4. text categorization
week 5. data collection and annotation
week 6. neural NLP and transfer learning
week 7. information extraction
week 8. text summarization
week 9. opinion mining and sentiment analysis
week 10. biomedical text mining
week 11. authorship attribution
week 12. industrial text mining (guest lecture)
week 13. conclusions

Course objectives

After successful completion of this course, students have an understanding, both at the conceptual and the technical level, of the application of natural language processing (NLP) in the text mining area. Students can build models for a text mining task using machine learning algorithms and language data, and they can evaluate and report on the developed models and modules. Also, students understand, from a theoretical perspective, which tools are applicable in which situations, and which real-world challenges prevent the application of certain techniques (such as language variation and noise due to document processing errors).

Timetable

The most recent timetable can be found on the students' website.

Mode of instruction

Lectures.

Course load

Total hours of study: 168 hrs.

lectures: 26 hrs
literature reading: 26 hrs
studying for exam: 30 hrs
examination: 6 hrs
practical exercises: 40 hrs
assignments: 40 hrs

Assessment method

  • a written exam (50% of course grade)

  • practical assignments (50% of course grade)

    • three assignments (10% each) during the course
    • one more substantial assignment (20%) at the end of the course
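
As an illustration of how these weights combine, the sketch below computes a weighted course grade. Only the weights come from the list above. The function name `course_grade`, the 1-10 grade scale in the usage note, and the absence of rounding or minimum-grade rules are assumptions made for the example, not stated course policy.

```python
# Weights taken from the assessment list above; everything else is illustrative.
EXAM_WEIGHT = 0.50
SMALL_ASSIGNMENT_WEIGHT = 0.10  # applies to each of the three assignments
FINAL_ASSIGNMENT_WEIGHT = 0.20


def course_grade(exam, small_assignments, final_assignment):
    """Return the weighted average of the component grades.

    Hypothetical helper: it applies no rounding and no minimum-grade
    requirement, since the assessment description does not specify any.
    """
    if len(small_assignments) != 3:
        raise ValueError("expected exactly three small assignment grades")
    return (
        EXAM_WEIGHT * exam
        + SMALL_ASSIGNMENT_WEIGHT * sum(small_assignments)
        + FINAL_ASSIGNMENT_WEIGHT * final_assignment
    )
```

For example, assuming a 1-10 scale, `course_grade(7.0, [8.0, 6.0, 7.0], 9.0)` combines 3.5 points from the exam, 2.1 from the three small assignments, and 1.8 from the final assignment, for roughly 7.4.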

Reading list

  • 4 chapters from: Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft, 2018).

  • 6 chapters from: ChengXiang Zhai and Sean Massung. Text Data Management and Analysis. A Practical Introduction to Information Retrieval and Text Mining (2016).

  • Additional literature.

Registration

  • You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.

  • Please also register for the course in Blackboard as soon as the lecturer has made the course page available.

  • Due to limited capacity, external students can only register after consultation with the programme coordinator/study adviser (mastercs@liacs.leidenuniv.nl).

Contact information

Lecturers: dr. S. Verberne and H.P. de Vos MA
Website: Course website