Projekt Sprachtechnologie


  • Instructors: Wolfgang Menzel, Timo Baumann, Arne Koehn
  • Mixed project (MSc/BSc)
  • Thursdays, 12:30-17:45 (lunch break 13:15-13:45, another 15' break)
  • F-429 (and F-432), we'll usually start in F-429 and then continue in the Lab
  • grading based on active participation, spot talks, the final project report, and self-management within the group
  • Attendance list

Proposed project outline

Subtitles for videos are crucial for the hard-of-hearing and can also help listeners with normal hearing in noisy environments, when consuming video in a foreign language, or when digesting complex descriptions. Automatic subtitling requires multiple steps (speech recognition, text preparation and segmentation, time-alignment of text) to produce meaningful subtitles. A previous project worked on time-alignment of text; in this project we would like to focus on text preparation and segmentation based on speech recognition output.

Aspects of good and bad subtitles

  • presence of hesitations/fillers
  • repairs/hesitations
    • structure: reparandum, repair marker (quatsch/ähm/falsch/...), reparans/alteration
    • different types
      • modification
      • fresh start of sentence
      • 'abridged repairs' (repair marker only)
    • approaches
    • mark reparandum instead of deleting it?
  • "de-normalize" numbers (precondition for text-processing software to work correctly)
  • chunking of subtitles with respect to syntactic chunking
    • focus on sentence structure instead of speech timing
    • rephrase/simplify/"finalize" the structure
  • punctuation (in particular sentence boundaries)
  • length of subtitles
    • one "idea" per subtitle
    • prefer one-line subtitles (what if the "idea" is too long?)
    • split overly long lines into two lines
  • use colors to mark importance? → helpful for debugging?
  • precise timing of subtitles → low priority, as it matters little to our target audience(s)
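
The "length of subtitles" items above can be illustrated with a minimal sketch. It is not the project's chunkify code: the function name `split_subtitle` and the 40-character limit are assumptions made for illustration only. A subtitle that fits the limit stays on one line; otherwise it is split at the word boundary that gives the most balanced pair of lines.

```python
def split_subtitle(text, max_chars=40):
    """Return the subtitle as one line if it fits, otherwise as two lines.

    max_chars is an assumed limit, not a value taken from the project.
    """
    words = text.split()
    joined = " ".join(words)
    if len(joined) <= max_chars:
        return [joined]

    # Try every word boundary; keep the split whose longer line is shortest.
    best = None
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        cost = max(len(first), len(second))
        if best is None or cost < best[0]:
            best = (cost, [first, second])

    # A single over-long word cannot be split at a word boundary.
    return best[1] if best else [joined]


if __name__ == "__main__":
    for line in split_subtitle("one two three four", max_chars=10):
        print(line)
```

The sketch only balances line lengths. It does not guarantee that both lines respect the limit when the "idea" is too long for two lines, and it ignores syntactic chunk boundaries, which the notes above treat as the more important criterion.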

Literature and other links

Various Documents


| Date | What | Who |
| 28.4. | Everyone adds a milestone for their workgroup | everyone |
| 28.4. | Infrastructure: Repo + Redmine | ? |
| 5.5. | Get an evaluation framework running with videos + subtitles | Jasper? |
| 5.5. | Parser in old project should be fixed / replaced | Michael |
| 19.5. | Pentecost break (Pfingstferien) | everyone |
| 30.6. | Buffer + evaluation phase | everyone |
| 14.7. | Presentation | everyone |

Status 2016-05-12 (before the session)

  • post-processing with Language Models (Kolja): in progress, but unsure whether fixable
  • post-processing with LSTMs (Theresa): in progress, not working yet
  • de-normalization (Felix): using MaryTTS stuff, works well; next: multi-token abbreviations
  • extend the main readme of the prosub code (Felix): works
  • chunkify module in Python (Michael): works only partially (for English); searches for an optimal solution
  • chunkify module in Java (Tore): works only partially (for German); doesn't find an optimal solution?
  • evaluation framework (Björn): web evaluation framework is running
  • repairs/fillers (Khooshal): simple algorithms work for repairs; context-dependent ones are more difficult (German)
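
As an illustration of what a "simple algorithm" for fillers and repairs might look like, here is a minimal sketch. It is not Khooshal's implementation. The word lists, the function names, and the look-back window of 4 tokens are assumptions. The heuristic: drop filler tokens, then, at each explicit repair marker, look back a few tokens for the word that follows the marker and delete everything from that earlier occurrence onward (the reparandum). If no match is found, only the marker is dropped (an "abridged repair").

```python
FILLERS = {"äh", "ähm", "hm"}            # assumed filler list
REPAIR_MARKERS = {"quatsch", "falsch"}   # explicit markers named in the notes


def remove_fillers(tokens):
    """Drop tokens that are pure fillers."""
    return [t for t in tokens if t.lower() not in FILLERS]


def resolve_repairs(tokens, window=4):
    """Delete reparandum and repair marker, keeping the reparans."""
    out = []
    for i, tok in enumerate(tokens):
        # A marker at the very end has no reparans, so keep it as a normal word.
        is_marker = tok.lower() in REPAIR_MARKERS and i + 1 < len(tokens)
        if not is_marker:
            out.append(tok)
            continue
        first_of_reparans = tokens[i + 1].lower()
        start = max(0, len(out) - window)
        # Search backwards for the word the speaker restarts from.
        for j in range(len(out) - 1, start - 1, -1):
            if out[j].lower() == first_of_reparans:
                del out[j:]  # remove the reparandum
                break
        # The marker itself is never appended, so it is dropped either way.
    return out


def clean(text):
    return " ".join(resolve_repairs(remove_fillers(text.split())))


if __name__ == "__main__":
    print(clean("wir treffen uns am Montag quatsch am Dienstag"))
```

Because "ähm" is treated as a filler here, repairs marked only by "ähm" are not resolved, and a marker word used in its literal sense mid-sentence would be deleted by mistake. Those cases are examples of the context-dependent repairs that the status entry above describes as more difficult.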


| What | Who |
| Fillers + Repairs | Khooshal, Wladimir |
| Punctuation, Capitalization | Kolja, Theresa |
| Denormalization | Felix? |
| Framework, old project (Parser ...) | Michael, Felix |
| Chunking | Tore, Antonia |
| Optional: shortening subtitles, tagging meta-information (audience questions) | |

Work items towards good subtitles

Good for whom/for what situation?
  • hearing-impaired persons? → they won't notice whether the subtitles are congruent with the speech anyway
    • [irrelevant discussion of the weather] → mark that, but still spell it out
    • ask Benjamin Kuffel about his preferences/requirements?
      • Antonia and Theresa will ask whether he's interested in an interview (and prepare the interview)
  • turn lecturer's speech into a textual form → will improve anyway
    • translation of subtitles is probably much easier than translation of transcripts
  • searchability of a lecture → display of search results
  • second-language listeners / dialectal differences → stronger congruence between timing/what is said and subtitles
  • display subtitles to "normal" students → NOPE, they already use their visual channel for the slides
