
Constructing Digital Language Toolkits


Admission requirements

MA students from the Master Asian Studies, the Master Linguistics and the Master Middle Eastern Studies can join unconditionally. Other Master's students interested in the course should contact the instructor (contact details below).


Description

This course introduces humanities researchers to digital techniques for analyzing languages. It requires no previous experience with programming, script writing or digital humanities as a discipline. Students should have a specific language in mind that they wish to investigate. Each student will learn to build a toolkit for that language by adapting existing software, writing simple scripts, queries and programs, and then deploying the result as a usable application. We will consider a variety of common tasks, including: converting traditional dictionaries into relational databases; corpus searching and corpus linguistics; text alignment; automated transliteration; and optical character recognition (OCR). We also examine options for deploying custom software locally (e.g., on a desktop or laptop computer), in combination with web servers (e.g., as a searchable online dictionary), and across clusters of machines (e.g., a server farm or the cloud). The course is suitable for anyone who has a set of texts to analyze digitally but no idea where to begin.

Course objectives

  • overcome fear of technology and understand its potential for language research questions

  • create a linguistics toolkit for a language of the student’s choice using open source software
    and custom modules

  • understand the current state of digital humanities (DH) with regard to language research in the
    student’s chosen language and to computational research in general, and learn how to stay up to
    date with developments in DH

  • understand deployment options for software solutions


Timetable

Visit MyTimetable.

Mode of instruction

  • Seminar (online). The seminar combines lecturing with individual guidance as students practice and develop their own toolkits.

Course Load

The course load is calculated as follows:

  • Total course load (number of EC x 28 hours): 140 hours for a 5 EC course, 280 hours for a 10 EC course. The breakdown below is for 10 EC.

  • 13 hours lecture (1 hour lecture each week x 13 weeks)

  • 13 hours lab (1 hour computer lab exercises each week x 13 weeks)

  • 114 hours reading

  • 50 hours homework exercises (10 weekly assignments x 5 hours each)

  • 50 hours independent research

  • 40 hours presentation preparation

Total = 280 hours

Assessment method

Assessment (10 EC)


  • 20% Lecture attendance

  • 20% Computer lab participation

  • 30% Short, weekly homework assignments

  • 20% Final project (student language toolkit)

  • 10% Poster session (class presentation)

The final mark for the course is the weighted average of these components, with the additional requirement that the mark for the final toolkit must always be sufficient (5.5 or higher).


To pass the course, students must obtain an overall mark of 5.50 (which rounds to 6) or higher. Students who fail may submit a revised and improved version of the toolkit; the deadline for this version will be set in consultation with the instructor.
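To make the two pass conditions concrete, here is a minimal Python sketch, offered only as an illustration and not as an official grading tool. It applies the percentage weights from the assessment list and checks both the weighted average and the toolkit mark against 5.5. The component names are hypothetical labels, marks are assumed to be on the Dutch 1 to 10 scale, and the rounding of final marks is not modelled.

```python
from decimal import Decimal

# Percentage weights copied from the assessment list.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "lecture_attendance": 20,
    "lab_participation": 20,
    "homework": 30,
    "final_toolkit": 20,
    "poster_presentation": 10,
}
PASS_MARK = Decimal("5.5")


def final_result(marks: dict[str, float]) -> tuple[Decimal, bool]:
    """Return (weighted average, passed) for component marks on the 1-10 scale."""
    if set(marks) != set(WEIGHTS):
        raise ValueError("expected exactly one mark for each of the five components")
    # Convert through str() so that e.g. 6.7 becomes exactly Decimal("6.7"),
    # avoiding binary floating-point error right at the 5.5 threshold.
    exact = {name: Decimal(str(mark)) for name, mark in marks.items()}
    average = sum(WEIGHTS[name] * exact[name] for name in WEIGHTS) / 100
    # Both conditions must hold: the weighted average AND the toolkit mark reach 5.5.
    passed = average >= PASS_MARK and exact["final_toolkit"] >= PASS_MARK
    return average, passed
```

For example, a student with an 8 for every component except a 5 for the toolkit reaches a weighted average of 7.4 but still does not pass, because the toolkit mark is insufficient.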


How and when a review takes place will be determined by the examiner.

Reading list

To be announced in Brightspace.


Registration

Enrolment through uSis is mandatory.
General information about uSis is available in English and Dutch.


Contact

Dr. Christopher Handy