Speech Technology (Specialization Module)

  • Instructor: Timo Baumann
  • Summer term 2016
  • Wednesdays, 12:00-16:00, room F-334
  • Description in Stine
  • this course will be taught in English unless all participants agree on a different language (most likely German)


  • course time will be split into lecture (50%) and seminar (50%);
    additional time will be required for practical Labs (in groups) and self-study (see below for details)
  • grading is based on active participation, Lab poster presentation, seminar presentation, seminar paper, and oral exam (see below for details)

Learning Outcomes

The learning outcomes consist of three building blocks:
  1. learning to work in a scholarly manner,
  2. learning about speech & language processing, and
  3. acquiring skills in using and evaluating NLP/speech software:
  • students have an overview of the speech technology field: tasks, challenges, foundational techniques
  • students are able to analyze and classify central problems of speech processing and are able to deliberate about solutions and their alternatives
    • levels of competency: knowledge, understanding
  • students are able to explain and discuss selected aspects of speech processing in detail and to illustrate their consequences for applications
    • levels of competency: knowledge, understanding, application, analysis, evaluation
  • in group projects, students have developed skills in using and experimenting with existing speech technology and the corresponding evaluation methodology
    • levels of competency: understand, apply, evaluate, present
    • competencies: theoretical understanding, practical skills, teamwork and collaboration
  • students are able to reflect on their scholarly behaviour
  • students are able to autonomously study specialization areas that are similar to speech technology (in AI, CS, or linguistics), find and digest relevant scientific literature and discuss findings and further questions with colleagues

Commented collection of possible topics/questions for the oral exam.
Comments on the community of practice in speech processing. See also: isca-speech.org, aclweb.org, sigdial.org, ...

Expected Workload

  • 6 credit points (LP) -> expected workload ~150-180h
  • active participation in the course and exam: 13×3.5h + 0.5h = 46h
  • preparation and follow-up of the course topics: 14×1h = 14h
  • practical work in lab groups:
    • learning to use the chosen application, understanding the application domain: 15h
    • coordination in the group: 5h
    • jointly propose experiments in the chosen domain, hypothesize outcomes: 5h
    • perform and document experiments: 15h
    • jointly design poster+presentation of domain, application, experiments and results: 5h
  • development of the chosen seminar topic incl. literature research: 15h
  • preparation of the seminar talk: 15h
  • writing the term paper (incl. revisions): 15h
  • peer review of 2 other term papers: 5h
  • preparation for the oral exam: 20h
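As a quick sanity check, the individual items above can be summed and compared against the stated ~150-180h for 6 credit points. The following is a minimal Python sketch; the item labels are paraphrased from the list, and the totals are computed here rather than stated in the original.

```python
# Workload items in hours, taken from the "Expected Workload" list (labels paraphrased).
workload_hours = {
    "course and exam (13 x 3.5h + 0.5h)": 13 * 3.5 + 0.5,
    "preparation and follow-up of course topics (14 x 1h)": 14 * 1,
    "lab: learning the application and its domain": 15,
    "lab: coordination in the group": 5,
    "lab: proposing experiments, hypothesizing outcomes": 5,
    "lab: performing and documenting experiments": 15,
    "lab: poster and presentation": 5,
    "seminar topic incl. literature research": 15,
    "seminar talk preparation": 15,
    "term paper incl. revisions": 15,
    "peer review of 2 term papers": 5,
    "oral exam preparation": 20,
}

# Sum all items, and separately the practical lab work.
total = sum(workload_hours.values())
lab_total = sum(h for item, h in workload_hours.items() if item.startswith("lab:"))

print(f"lab work: {lab_total:g}h")
print(f"total: {total:g}h")
# Compare with the expected range for 6 credit points.
print("within 150-180h:", 150 <= total <= 180)
```

Running it prints 45h of lab work and a total of 175h, which lies within the expected range.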


Schedule (part 1: 12-14, part 2: 14-16; L = lecture, S = seminar)

  1. 2016-04-06
    • part 1 (S): description of the specialization module
    • part 2 (L): layered communication; slides, discussion notes; presentation of Lab choices → please see lab topics below!
  2. 2016-04-13
    • part 1 (L): spoken dialogue systems as examples of modular complex systems; preliminary slides, discussion notes
    • part 2 (S): presentation of seminar topics
  3. 2016-04-20
    • part 1 (L): acoustic phonetics; preliminary slides
    • part 2 (L): speech synthesis I; preliminary slides, discussion notes
  4. 2016-04-27
    • part 1 (L): speech parametrization and the source-filter model; preliminary slides
    • part 2 (L): speech recognition I; preliminary slides
  5. 2016-05-04
    • part 1 (L): pronunciation and language modelling; preliminary slides
    • part 2 (L): speech recognition II; preliminary slides
  6. 2016-05-11
    • part 1 (L): speech synthesis II; preliminary slides
    • part 2 (L): realtime behaviour with incremental processing; preliminary slides
  -- 2016-05-18: half-term break / study period
  7. 2016-05-25 (S): reading assignment (Timo not present); time for Lab group discussions
  8. 2016-06-01: seminar talks and discussion: Benedikt, Khooshal, Bente, Phil, Julian, Liisa
  9. 2016-06-08: seminar talks and discussion: Ibrahim, Tim, Ahmed, Katinka, Konstantin, Cuong
  10. 2016-06-15: seminar talks and discussion: Erik, Max, Morteza, Waleed, Chi
  11. 2016-06-22: seminar talks and discussion: Abtin, Yiming, Nam, Sebastian, Quan, Thomas, Kolja
  12. 2016-06-29
    • part 1 (S): wrap-up/interrelation of the individual talks
    • part 2 (S): how to write a term paper
  13. 2016-07-06: Lab group poster presentations and discussion
  14. 2016-07-13
    • part 1 (S): discussion of term paper outline
    • part 2 (L): closing remarks, wrap-up

  • submission of Lab experiment proposal: 13. May
  • due date for Lab experiment poster: 4. July
  • submission of the term paper draft: 28. August
  • review phase: 1.-20. September (your review needs to be in my inbox by 20. September; originally 16. September)
  • submission of the final term paper: 17. October (postponed from 30. September and then 10. October due to delays in the reviewing process)
  • exam dates: 18./19. July, 27./28. September, 11. October

Seminar topics

If your topic is largely based on a textbook chapter or on an article referenced below, you are still required to search for and find other articles/papers on the topic, to describe this related work, and to pick one of them for description and discussion in your term paper! Please send me the results of your literature search and your finalised topic by 22. April so that I can comment on them (probably within a week) and coordinate the topics into presentation groups.

  • seminar talks will be 20 minutes plus 5 minutes of interaction/discussion (which may be integrated into the presentation, and may also take longer!), plus 5 minutes of feedback/meta-discussion.

  • Turn-taking: foundational theory: Sacks, Schegloff and Jefferson (1974); many current papers on different ways of detecting the end of the current speaker's turn Benedikt, Khooshal
    • Wilson and Wilson (2005): An oscillator model of the timing of turn-taking;
    • Chao and Thomaz (2016): Timed Petri nets for fluent turn-taking over multimodal interaction resources in human-robot collaboration
    • I didn't find much on neural networks/deep learning, but you will certainly do better.
  • Grounding / finding Common Ground: Bente
    • Clark (1996), Schegloff (1968), many current papers on alignment and entrainment
    • Poesio and Traum (2002) on the units of understanding in dialog interaction
  • Dialogue Management: background in Jokinen and McTear (2010), chapters 2-4;
    • many papers on rule-based systems; present and compare different approaches in practical systems (at least StateChartXML, not just VoiceXML)
    • the ISU (information-state-update) approach (Staffan Larsson and colleagues) Morteza
    • MDP/POMDP-based ((partially observable) Markov decision processes): describe the general idea and describe reinforcement learning Max, Waleed
    • hybrid (rule-based/statistical) approaches to dialog management (e.g. Lison 2014) Erik
    • handling errors/miscommunication: e.g. Skantze (2007) Chi
  • Natural Language Understanding: basics: Jurafsky and Martin (2009), chapters 17/18 Ibrahim, Tim
    • semantic frame-based NLU: e.g. Tur and De Mori (2011), chapter 3, and one small research paper
  • Natural Language Generation: e.g. Reiter and Dale (2000): Building natural language generation systems; Stent and Bangalore (2014): NLG in Interactive Systems Konstantin
    • Reiter's chapter 20 in The Handbook of Computational Linguistics and Natural Language Processing.
    • work by Nina Dethlefs on reinforcement learning for NLG: Cuong
    • generating referring expressions (Stent and Bangalore, chapters 5/6) Katinka
  • Important applications / data collections: Switchboard, Verbmobil Nam
  • Important applications / data collections: the CMU Let's Go dialogue system(s) Abtin
  • The Dialog State Tracking Challenge: present overall idea and one interesting solution (why is it interesting?) Thomas, Quan
  • Evaluating Dialogue Systems: Jokinen and McTear (2010), chapter 6, the PARADISE paradigm (or related evaluation methodologies) Kolja
  • Multi-modal dialogue systems
    • sensory integration
    • Embodied dialog systems in robots
    • Intelligent virtual agents
  • Multi-party dialogue and multi-party dialogue systems (e.g. Branigan (2006): Perspectives on multi-party dialogue, Traum (2004): Issues in Multiparty Dialogues, ...) Phil (have you been able to access the paper?)
  • Applied systems: Lewis (2011): Practical Speech User Interface Design Julian
    • look for recent advances in applied dialogue/IVR technology
    • deficiencies of current-day dialogue systems, how can they be measured and avoided?
  • Applied systems: Siri/Google Now: collect available (scientific) papers and discuss Sebastian
  • Applied systems: Paek and Pieraccini (2008): Automating spoken dialogue management design using machine learning: An industry perspective (and possibly other literature). Liisa

Lab topics

Poster hints:

  • have a very clear message and get this message across as best as possible. It often helps to state this message explicitly.
  • use bold font, color, arrows/visual help to get your message across!
  • the university corporate identity guidelines (https://www.uni-hamburg.de/beschaeftigtenportal/services/oeffentlichkeitsarbeit/corporate-design/manual/corporate-manual-2016.pdf#page=42) aren't too bad; they propose a minimum font size of 22pt (18pt for captions)
  • use bullet lists rather than long sentences; would a visual representation simplify complex algorithms/ideas/relationships?
  • for drawings: specify a line width (at least 1 mm, maybe even 3 or 5 mm); otherwise, the drawing will disappear when viewed from a distance
  • your main results should be somewhere in the center of the poster, not only at the bottom!
  • include a bibliography (the bottom right is probably the best place: at least if you are right-handed, that part of the poster matters least while you explain things)
