- Instructors: User.WolfgangMenzel, User.TimoBaumann, User.ArneKoehn
- Mixed project (MSc/BSc)
- Thursdays, 12:30-17:45 (lunch break 13:15-13:45, plus another 15-minute break)
- F-429 (and F-432), we'll usually start in F-429 and then continue in the Lab
- grading is based on active participation, spot talks, the final project report, and self-management within the group
Proposed project outline
Subtitles for videos are crucial for the hard-of-hearing and can also help hearing viewers in noisy environments, when consuming video in a foreign language, or when digesting complex descriptions. Automatic subtitling requires multiple steps (speech recognition, text preparation and segmentation, time-alignment of text) to produce meaningful subtitles. In a previous project we worked on the time-alignment of text; in this project we would like to focus on text preparation and segmentation based on speech recognition output.
Aspects of good and bad subtitles
- presence of hesitations/fillers
- structure: reparandum, repair marker (quatsch/ähm/falsch/...), reparans/alteration
- different types
- fresh start of sentence
- 'abridged repairs' (repair marker only)
- mark reparandum instead of deleting it?
- "de-normalize" numbers (precondition for text-processing software to work correctly)
- chunking of subtitles with respect to syntactic chunks
- focus on sentence structure instead of speech timing
- rephrase/simplify/"finalize" the structure
- punctuation (in particular sentence boundaries)
- length of subtitles
- one "idea" per subtitle
- prefer one-line subtitles (what if the "idea" is too long?)
- cut overly long lines into two lines
- use colors to mark importance? → helpful for debugging?
- precise timing of subtitles → less important, as it does not matter much to our target audience(s)
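As a rough illustration of the item on cutting long subtitles into two lines, the following Python sketch splits a subtitle at the word boundary that yields the most balanced pair of lines. The function name `split_subtitle` and the 42-character limit are assumptions made for this example (the notes do not fix a limit), and the split ignores syntactic chunks, which the list above would prefer as break points.

```python
def split_subtitle(text, max_len=42):
    """Return the subtitle as one line, or as two lines of similar length.

    max_len is an assumed per-line limit, not a value from the project.
    """
    words = text.split()
    joined = " ".join(words)
    if len(joined) <= max_len or len(words) < 2:
        return [joined]

    best = None  # (imbalance, [first_line, second_line])
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        imbalance = abs(len(first) - len(second))
        if best is None or imbalance < best[0]:
            best = (imbalance, [first, second])
    return best[1]


if __name__ == "__main__":
    for line in split_subtitle(
        "in this project we would like to focus on text preparation"
    ):
        print(line)
```

The function does not guarantee that both resulting lines fit within `max_len`; a subtitle that is still too long after splitting would have to be distributed over several subtitles instead.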
|| Everyone adds a milestone for their workgroup
|| Infrastructure: Repo + Redmine
|| Get an evaluation framework running with videos + subtitles
|| parser in old project should be fixed / replaced
|| Buffer + Evaluation Phase
Status 2016-05-12 (before the session)
- post-processing with Language Models (Kolja): in progress, but unsure whether fixable
- post-processing with LSTMs (Theresa): in progress, not working yet
- de-normalization (Felix): using MaryTTS stuff, works well; next: multi-token abbreviations
- extend the main readme of the prosub code (Felix): works
- chunkify module in Python (Michael): works only partially (for English); searches for the optimal solution
- chunkify module in Java (Tore): works only partially (for German); doesn't find the optimal solution?
- evaluation framework (Björn): web evaluation framework is running
- repairs/fillers (Khooshal): simple algorithms work; context-dependent ones are more difficult (German)
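To make the repair structure (reparandum, repair marker, reparans) more concrete, here is a minimal Python sketch of one simple heuristic. It is not the algorithm used in the project; the function name, the filler inventory, and the look-back window are assumptions for illustration. Fillers are dropped. At a lexical repair marker, the sketch looks back a few tokens for the word with which the reparans restarts and deletes everything from there up to and including the marker. If no such word is found, only the marker is removed.

```python
FILLERS = {"äh", "ähm", "uh", "um"}      # assumed filler inventory
REPAIR_MARKERS = {"quatsch", "falsch"}   # lexical markers from the notes;
                                         # "ähm" is treated as a filler here


def clean_disfluencies(tokens, window=4):
    """Drop fillers and resolve simple retracing repairs.

    window is an assumed limit on how far back a reparandum may start.
    """
    tokens = [t for t in tokens if t.lower() not in FILLERS]
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() in REPAIR_MARKERS:
            # First word of the reparans (None if the marker is last).
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
            start = max(0, len(out) - window)
            # Search backwards for the word the speaker restarts with.
            for j in range(len(out) - 1, start - 1, -1):
                if out[j].lower() == nxt:
                    del out[j:]  # remove the reparandum
                    break
            i += 1  # skip the marker itself
            continue
        out.append(tok)
        i += 1
    return out


if __name__ == "__main__":
    sentence = "wir treffen uns am Montag quatsch am Dienstag"
    print(" ".join(clean_disfluencies(sentence.split())))
```

This only covers repairs in which the reparans repeats a word of the reparandum. Fresh starts and repairs that need wider context are out of its reach, and a word such as "falsch" used in its ordinary meaning would wrongly be treated as a marker.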
|| Fillers + Repairs || Khooshal, Wladimir ||
|| Punctuation, Capitalization || Kolja, Theresa ||
|| Framework, old project (Parser ...) || Michael, Felix ||
|| || Tore, Antonia ||
|| Optional: shortening subtitles, tagging meta-information (audience questions) || ||
Work items towards good subtitles
- a framework for web-based subtitle evaluation!!
- Jasper takes a look at BeaqleJS (or alternatives)
- set up some framework of interaction (codepaths and data structures)
- getting everyone to become familiar with Git, decide on programming language(s)?
- finding/fixing repairs (vs. simply skipping all fillers)
- search for literature, existing implementations, and/or algorithmic approaches (Kolja and Khooshal add links)
- Papers we found in the first session
- punctuation recovery
- Theresa looks at the literature on punctuation recovery (the stuff cited by the sphinx postprocessing framework)
- Kolja continues the effort on language modelling (i.e., getting the existing tool to work)
- de-normalizing numbers, abbreviations, etc.
- Felix will work on that (probably based on MaryTTS stuff)
- finding the right length for subtitles (mostly syntactically motivated)
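The work item on de-normalizing numbers can be illustrated with a small Python sketch that turns runs of spelled-out number words into digits. It is a toy example for English number words below one million and does not use MaryTTS, on which the project's implementation is based. All names (`UNITS`, `TENS`, `words_to_number`, `denormalize`) are made up for this example.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred", "thousand"}


def words_to_number(words):
    """Convert a run of lower-case number words to an int."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        elif w == "thousand":
            total += max(current, 1) * 1000
            current = 0
    return total + current


def denormalize(text):
    """Replace each run of number words in text by its digit form."""
    out, run = [], []
    for tok in text.split() + [None]:  # None flushes a trailing run
        if tok is not None and tok.lower() in NUMBER_WORDS:
            run.append(tok.lower())
            continue
        if run:
            out.append(str(words_to_number(run)))
            run = []
        if tok is not None:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(denormalize("the lecture has twenty three slides"))
```

The sketch treats every number word as part of a number, so "one" used as a pronoun would also be converted, and a spoken digit sequence such as "one two three" would be summed rather than concatenated. Abbreviations, including the multi-token ones, are not covered.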
Good for whom/for what situation?
- hearing-impaired persons? → they won't know the congruence anyway
- [irrelevant discussion of the weather] → mark that, but still spell it out
- ask Benjamin Kuffel about his preferences/requirements?
- Antonia and Theresa will ask whether he's interested in an interview (and prepare the interview)
- turn lecturer's speech into a textual form → will improve anyway
- translation of subtitles is probably much easier than translation of transcripts
- searchability of a lecture → display of search results
- second-language listeners / dialectal differences → stronger congruence between timing/what is said and subtitles
- display subtitles to "normal" students → NOPE, they already use their visual channel for the slides