During this seminar, the fundamentals of audio processing and indexing will be studied. Applications in speech recognition and understanding, audio synthesis, and content-based audio retrieval will be discussed. State-of-the-art work on speech recognition and content-based audio retrieval will be studied and presented by the participants.
The seminar starts with several lectures and accompanying assignments in the form of workshops, followed by literature selection, study, and presentations by all students. It ends with final project demos and presentations.
At the end of the seminar, students:
Have a clear understanding of the fundamentals of speech recognition and understanding, and of audio processing and indexing.
Are able to apply the basic audio processing algorithms to sets of audio files and databases.
Have experienced and studied the general setup of a scientific experiment in the field of content-based audio retrieval.
Are able to acquire the necessary knowledge of state-of-the-art scientific methods in the field of audio indexing by studying scientific publications from journals and proceedings.
Are able to design, implement, execute and report on a scientific audio processing or indexing experiment.
The most recent timetable can be found at the students' website.
Mode of instruction
Hours of study: 168 (= 6 EC)
Practical work: 62
Presentations and Project (60% of grade). Class discussions, attendance, and workshops (40% of grade).
Lecture slides and further materials will be made available on the website of the course.
List of recommended books:
Discrete-Time Speech Signal Processing: Principles and Practice by T.F. Quatieri, Prentice Hall PTR; ISBN 013242942X, 2002.
Fundamentals of Speech Recognition by Lawrence Rabiner and Biing-Hwang Juang (Hardcover, 507 pages; Publisher: Pearson Education POD; ISBN: 0130151572; 1st edition, April 12, 1993)
Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, Raj Reddy (Hardcover, 980 pages; Publisher: Prentice Hall PTR; ISBN: 0130226165; 1st edition, April 25, 2001)
Dong Yu, Li Deng, Automatic Speech Recognition: A Deep Learning Approach (Signals and Communication Technology), Springer; 2015 edition (November 11, 2014).
- You have to sign up for courses and exams (including retakes) in uSis. Check this link for information about how to register for courses.