SLP16 WebHome (27 Sep 2016, TimoBaumann)

Speech Technology (Specialization Module)

Instructor: Timo Baumann
Summer term 2016
Wednesdays, 12-16 F-334
Description in Stine
this course will be taught in English unless all participants agree on a different language (most likely German)

Approach
Learning Outcomes
Expected Workload
Sessions
Seminar topics
Lab topics
- Poster hints:

Approach

course time will be split into lecture (50%) and seminar (50%);
additional time will be required for practical Labs (in groups) and self-study (see below for details)
grading is based on active participation, Lab poster presentation, seminar presentation, seminar paper, and oral exam (see below for details)

Learning Outcomes

The learning outcomes consist of three building blocks:

learning to work in a scholarly manner,
learning about speech & language processing, and
acquiring skills in using and evaluating NLP/speech software:

students have an overview of the speech technology field: tasks, challenges, foundational techniques
students are able to analyze and classify central problems of speech processing and are able to deliberate about solutions and their alternatives
- levels of competency: knowledge, understanding
students are able to explain and discuss selected aspects of speech processing in detail and to illustrate their consequences for applications
- levels of competency: knowledge, understanding, application, analysis, evaluation
in group projects, students have developed skills in using and experimenting with existing speech technology and the corresponding evaluation methodology
- levels of competency: understand, apply, evaluate, present
- competencies: theoretical understanding, practical skills, teamwork and collaboration
students are able to reflect on their scholarly behaviour
students are able to autonomously study specialization areas that are similar to speech technology (in AI, CS, or linguistics), find and digest relevant scientific literature and discuss findings and further questions with colleagues

Commented collection of possible topics/questions for the oral exam.
comments on the community of practice in speech processing. See also: isca-speech.org, aclweb.org, sigdial.org, ...

Expected Workload

6 credit points (LP) -> expected workload ~150-180h
active participation in the course and exam: 13×3.5h+.5h=46h
preparation and post-processing of course topic: 14×1h=14h
practical work in lab groups:
- learning to use the chosen application, understanding the application domain: 15h
- coordination in the group: 5h
- jointly propose experiments in the chosen domain, hypothesize outcomes: 5h
- perform and document experiments: 15h
- jointly design poster+presentation of domain, application, experiments and results: 5h
development of the chosen seminar topic incl. literature research: 15h
preparation of the seminar talk: 15h
writing the term paper (incl. revisions): 15h
peer review of 2 other term papers: 5h
preparation for the oral exam: 20h
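
As a rough cross-check, the itemized hours above can be summed and compared against the ~150-180h range. The following is only a minimal sketch in Python; the dictionary keys are shorthand labels introduced here for illustration, not official names of the workload items.

```python
# Itemized workload in hours, copied from the list above.
# The key names are illustrative shorthand, not official labels.
workload_hours = {
    "participation and exam": 13 * 3.5 + 0.5,
    "preparation and post-processing": 14 * 1,
    "lab: learning the application": 15,
    "lab: group coordination": 5,
    "lab: proposing experiments": 5,
    "lab: performing experiments": 15,
    "lab: poster and presentation": 5,
    "seminar topic development": 15,
    "seminar talk preparation": 15,
    "term paper": 15,
    "peer review": 5,
    "oral exam preparation": 20,
}

# Sum all items and check the result against the stated range.
total_hours = sum(workload_hours.values())
within_range = 150 <= total_hours <= 180
print(f"total: {total_hours:g}h, within 150-180h: {within_range}")
```

Under these figures the items add up to 175h, which lies inside the expected range for 6 credit points.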

Sessions

#	date	part1 12-14	part2 14-16
1.	2016-04-06	S description of the specialization module slides	L layered communication slides, discussion notes; presentation of Lab choices → please see lab topics below!
2.	2016-04-13	L spoken dialogue systems as examples of modular complex systems preliminary slides, discussion notes	S presentation of seminar topics
3.	2016-04-20	L acoustic phonetics preliminary slides	L speech synthesis I preliminary slides, discussion notes
4.	2016-04-27	L speech parametrization and the source-filter model preliminary slides	L speech recognition I preliminary slides
5.	2016-05-04	L pronunciation and language modelling preliminary slides	L speech recognition II preliminary slides
6.	2016-05-11	L speech synthesis II preliminary slides	L realtime behaviour with incremental processing preliminary slides
--	2016-05-18	half-term break / study period
7.	2016-05-25	S reading assignment (Timo not present); time for Lab group discussions
8.	2016-06-01	seminar talks and discussion	Benedikt, Khooshal, Bente, ~~Phil~~, Julian, Liisa
9.	2016-06-08	seminar talks and discussion	Ibrahim, Tim, Ahmed, Katinka, Konstantin, Cuong
10.	2016-06-15	seminar talks and discussion	Erik, Max, Morteza, ~~Waleed~~, Chi
11.	2016-06-22	seminar talks and discussion	Abtin, ~~Yiming~~, ~~Nam~~, Sebastian, Quan, Thomas, Kolja
12.	2016-06-29	S Wrap-up/interrelation of the individual talks	S how to write a term paper
13.	2016-07-06	Lab group poster presentations and discussion
14.	2016-07-13	S discussion of term paper outline	L closing remarks, wrap up

submission of Lab experiment proposal: 13. May
due date for Lab experiment poster: 4. July
submission of the term paper draft: 28. August
review phase: 1.-20. September (your review needs to be in my Inbox by ~~16.~~20. September)
submission of the final term paper: ~~30. September 10. October~~17. October (due to delays in the reviewing process)
exam dates: 18./19. July, 27./28. September, 11. October

Seminar topics

Even if your topic largely consists of material from a textbook chapter or a referenced article below, you are still required to search for and find other articles/papers on the topic, to describe this related work, and to pick one for description and discussion in your term paper! Please send me the results of your literature search and the finalisation of your topic by 22. April so that I can comment on it (probably within a week) and coordinate topics into presentation groups.

seminar talks will be 20 minutes plus 5 minutes interaction/discussion (may be integrated into the presentation and maybe more!), plus 5 minutes of feedback/meta-discussion.

Turn-taking: foundational theory: Sacks, Schegloff and Jefferson (1974), many current papers on different ways of finding the current speaker's end-of-turn Benedikt, Khooshal
- Wilson and Wilson (2005): An oscillator model of the timing of turn-taking;
- Chao and Thomaz (2016): Timed Petri nets for fluent turn-taking over multimodal interaction resources in human-robot collaboration
- I didn't find much on neural networks/deep learning, but you will certainly do better.
Grounding / finding Common Ground: Bente
- Clark (1996), Schegloff (1968), many current papers on aligning and entrainment
- Poesio and Traum (2002) on the units of understanding in dialog interaction
Dialogue Management: background in Jokinen and McTear (2010), chapter 2-4;
- many papers on rule-based systems; present and compare different approaches in practical systems (at least StateChartXML, not just VoiceXML)
- the ISU (information-state-update) approach (Staffan Larsson and colleagues) Morteza
- MDP/POMDP-based ((partially observable) Markov decision processes): describe the general idea, describe reinforcement learning Max, Waleed
- hybrid (rule-based/statistical) approaches to dialog management (e.g. Lison 2014) Erik
- handling errors/miscommunication: e.g. Skantze (2007) Chi
Natural Language Understanding: basics: Jurafsky and Martin (2009), chapter 17/18 Ibrahim, Tim
- semantic frame-based NLU: e.g. Tur and De Mori (2011), chapter 3 and one small research paper
Natural Language Generation: e.g. Reiter and Dale (2000): Building natural language generation systems; Stent and Bangalore (2014): NLG in Interactive Systems Konstantin
- Reiter's chapter 20 in The handbook of computational linguistics and natural language processing.
- work by Nina Dethlefs on reinforcement learning for NLG: Cuong
- generating referring expressions (Stent and Bangalore, chapters 5/6) Katinka
Important applications / data collections: Switchboard, Verbmobil Nam
Important applications / data collections: the CMU Let's Go dialogue system(s) Abtin
The Dialog State Tracking Challenge: present overall idea and one interesting solution (why is it interesting?) Thomas, Quan
Evaluating Dialogue Systems: Jokinen and McTear (2010), chapter 6, the PARADISE paradigm (or related evaluation methodologies) Kolja
Multi-modal dialogue systems
- sensory integration
- Embodied dialog systems in robots
- Intelligent virtual agents
Multi-party dialogue and multi-party dialogue systems (e.g. Branigan (2006): Perspectives on multi-party dialogue, Traum (2004): Issues in Multiparty Dialogues, ...) Phil (have you been able to access the paper?)
Applied systems: Lewis (2011): Practical Speech User Interface Design Julian
- look for recent advances in applied dialogue/IVR technology
- deficiencies of current-day dialogue systems, how can they be measured and avoided?
Applied systems: Siri/Google Now: collect available (scientific) papers and discuss Sebastian
Applied systems: Paek and Pieraccini (2008): Automating spoken dialogue management design using machine learning: An industry perspective (and possibly other literature). Liisa

Lab topics

PitchTracking: Erik, Julian, Bente, Khooshal POSTER
Phonemisation: Benedikt, Tim POSTER
SpeechDecoding: Max, Liisa, Ahmed, Morteza POSTER
LanguageModelling with SRILM: Chi, Cuong, Quan POSTER
LanguageModelling with RNNLM: Thomas, Abtin, Sebastian POSTER
SignificanceTesting: Kolja, Ibrahim POSTER
IncrementalProcessing: Katinka, Konstantin POSTER

Poster hints:

have a very clear message and get this message across as well as possible. It often helps to state this message explicitly.
use bold font, color, arrows/visual help to get your message across!
the university [[https://www.uni-hamburg.de/beschaeftigtenportal/services/oeffentlichkeitsarbeit/corporate-design/manual/corporate-manual-2016.pdf#page=42][corporate identity guidelines]] aren't too bad
- they propose a minimum font size of 22pt (18pt for captions)
use bullet lists rather than long sentences; would a visual representation simplify complex algorithms/ideas/relationships?
for drawings: specify the width of each line (at least 1mm wide, maybe even 3 or 5 mm); otherwise, the drawing will disappear when viewed from a distance
your main results should be somewhere in the center of the poster, not only at the bottom!
include a bibliography (bottom right part is probably best; at least if you are right-handed, the bottom right is the least important while you explain things)


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.