Projekt Sprachtechnologie

Formalia

  • Instructors: Wolfgang Menzel, Timo Baumann, Arne Koehn
  • Mixed project (MSc/BSc)
  • Thursdays, 12:30-17:45 (lunch break 13:15-13:45, plus another 15-minute break)
  • F-429 (and F-432); we'll usually start in F-429 and then continue in the Lab
  • grading based on active participation, spot talks, the final project report, and self-management in the group
  • attendance list

Proposed project outline

Subtitles for videos are crucial for the hard-of-hearing and can also help hearing viewers in noisy environments, when consuming video in a foreign language, or when digesting complex descriptions. Automatic subtitling requires multiple steps (speech recognition, text preparation and segmentation, time-alignment of text) to produce meaningful subtitles. A previous project worked on the time-alignment of text; in this project we would like to focus on text preparation and segmentation based on speech recognition output.

Aspects of good and bad subtitles

  • presence of hesitations/fillers
  • repairs/hesitations
    • structure: reparandum, repair marker (quatsch/ähm/falsch/...), reparans/alteration
    • different types
      • modification
      • fresh start of sentence
      • 'abridged repairs' (repair marker only)
    • approaches
    • mark reparandum instead of deleting it?
  • "de-normalize" numbers (precondition for text-processing software to work correctly)
  • chunking of subtitles with respect to syntactic chunks
    • focus on sentence structure instead of speech timing
    • rephrase/simplify/"finalize" the structure
  • punctuation (in particular sentence boundaries)
  • length of subtitles
    • one "idea" per subtitle
    • prefer one-line subtitles (what if the "idea" is too long?)
    • cut too long lines into two lines
  • use colors to mark importance? → helpful for debugging?
  • precise timing of subtitles → low priority, as it matters little to our target audience(s)
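
The "length of subtitles" items above (prefer one line, cut lines that are too long into two) could be sketched roughly as follows. This is only an illustration, not the project's chunkify module: the function name split_subtitle and the limit of 42 characters per line are assumptions, and the break point is chosen purely by line balance at word boundaries, not by syntactic chunks.

```python
def split_subtitle(text, max_chars=42):
    """Return the subtitle as one line if it fits, otherwise as two lines.

    max_chars is an assumed per-line limit, not a value from the project.
    """
    words = text.split()
    joined = " ".join(words)
    if len(joined) <= max_chars:
        return [joined]  # preferred case: a one-line subtitle

    best = None  # (imbalance, [first_line, second_line])
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > max_chars or len(second) > max_chars:
            continue  # this break point leaves one line too long
        imbalance = abs(len(first) - len(second))
        if best is None or imbalance < best[0]:
            best = (imbalance, [first, second])

    if best is None:
        # The "idea" does not fit into two lines and needs several subtitles.
        raise ValueError("text does not fit into two lines")
    return best[1]


if __name__ == "__main__":
    for line in split_subtitle("aaa bbb ccc ddd", max_chars=8):
        print(line)
```

A syntax-aware variant would restrict the candidate break points to chunk boundaries delivered by a parser instead of trying every word boundary.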

Literature and other links

Various Documents

Timeline

Date    What                                                          Who
14.7.   Presentation                                                  everyone
19.5.   Pentecost break (Pfingstferien)                               everyone
5.5.    parser in old project should be fixed / replaced              Michael
28.4.   Infrastructure: Repo + Redmine                                ?
5.5.    Get an evaluation framework running with videos + subtitles   Jasper?
28.4.   Everyone adds a milestone for their workgroup                 everyone
30.6.   Buffer + Evaluation Phase                                     everyone

Status 2016-05-12 (before the session)

  • post-processing with Language Models (Kolja): in progress, but unsure whether fixable
  • post-processing with LSTMs (Theresa): in progress, not working yet
  • de-normalization (Felix): using MaryTTS stuff, works well; next: multi-token abbreviations
  • extend the main readme of the prosub code (Felix): works
  • chunkify module in Python (Michael): works only partially (for English); searches for the optimal solution
  • chunkify module in Java (Tore): works only partially (for German); doesn't find the optimal solution?
  • evaluation framework (Björn): web evaluation framework is running
  • repairs/fillers (Khooshal): simple algorithms work for repairs; context-dependent ones are more difficult (German)
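
To illustrate what a "simple algorithm" for repairs and fillers could look like, here is a minimal sketch. It is not Khooshal's implementation: the function name handle_repairs, the filler list, the window size, and the retrace heuristic (the reparans starts with a word that already occurred shortly before the repair marker) are all assumptions made for this example. The repair markers are the ones listed on this page.

```python
FILLERS = {"äh", "ähm", "hm"}            # assumed filler list
REPAIR_MARKERS = {"quatsch", "falsch"}   # repair markers named on this page


def handle_repairs(tokens, window=4, mark=False):
    """Drop fillers and resolve simple repairs in a list of tokens.

    If the word after a repair marker also occurs among the last `window`
    kept tokens, everything from that earlier occurrence up to the marker
    is treated as the reparandum. With mark=True the reparandum and marker
    are kept as one bracketed token instead of being deleted.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        low = tok.lower()
        if low in FILLERS:
            i += 1
            continue
        if low in REPAIR_MARKERS:
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
            start = None
            if nxt is not None:
                lowest = max(len(out) - window, 0)
                for j in range(len(out) - 1, lowest - 1, -1):
                    if out[j].lower() == nxt:
                        start = j
                        break
            if start is not None:
                reparandum = out[start:]
                del out[start:]
                if mark:
                    out.append("[" + " ".join(reparandum + [tok]) + "]")
            # No retrace found: treat as abridged repair, drop the marker only.
            i += 1
            continue
        out.append(tok)
        i += 1
    return out


if __name__ == "__main__":
    tokens = "wir treffen uns ähm am Montag quatsch am Dienstag".split()
    print(" ".join(handle_repairs(tokens)))
    print(" ".join(handle_repairs(tokens, mark=True)))
```

The sketch has no notion of context: it would also delete "falsch" in "das ist falsch", where the word is content and not a repair marker. That is exactly the context-dependent case that needs more than a word list.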

Workgroups

What                                   Who
Fillers + Repairs                      Khooshal, Wladimir
Punctuation, Capitalization            Kolja, Theresa
Denormalization                        Felix?
Framework, old project (Parser ...)    Michael, Felix
Chunking                               Tore, Antonia
Optional: shortening subtitles, tagging meta-information (audience questions)

Work items towards good subtitles

Good for whom/for what situation?
  • hearing-impaired persons? → they won't notice a lack of congruence with what is said anyway
    • [irrelevant discussion of the weather] → mark that, but still spell it out
    • ask Benjamin Kuffel about his preferences/requirements?
      • Antonia and Theresa will ask whether he's interested in an interview (and prepare the interview)
  • turn lecturer's speech into a textual form → will improve anyway
    • translation of subtitles is probably much easier than translation of transcripts
  • searchability of a lecture → display of search results
  • second-language listeners / dialectal differences → stronger congruence between timing/what is said and subtitles
  • display subtitles to "normal" students → NOPE, they already use their visual channel for the slides

 