Acoustic models are at the core of speech recognition (and modern speech synthesis) technology. Models are trained on large amounts of data. While "cleaner" data is ...
This (incomplete, constantly extended/revised) list contains your ideas for exam questions (slightly edited) with my comments as to how well I think they cover the ...
Incremental processing has its own set of evaluation metrics that measure not just overall processing quality but also the temporal aspects of processing. (It may be ...
Language models are needed to describe the probability of words following each other (select previous word to see). You will use tools (SRILM) to build N-gram models ...
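As a rough illustration of what "the probability of words following each other" means, here is a minimal sketch of an unsmoothed bigram model in plain Python. It is not SRILM and does not reproduce its behaviour; the function names and the toy corpus are hypothetical, and a real N-gram toolkit would add smoothing and back-off for unseen word pairs.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences.

    Sentence boundary markers <s> and </s> are added so that
    sentence-initial and sentence-final words get probabilities too.
    """
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts


def bigram_prob(prev, cur, history_counts, bigram_counts):
    """Maximum-likelihood estimate of P(cur | prev), without smoothing."""
    if history_counts[prev] == 0:
        return 0.0  # unseen history: a smoothed model would back off here
    return bigram_counts[(prev, cur)] / history_counts[prev]


# Hypothetical toy corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
hist, bi = train_bigram(corpus)
print(bigram_prob("the", "cat", hist, bi))
```

In this toy corpus "the" is followed by "cat" in two of its three occurrences, so the estimate for P(cat | the) is 2/3. Any pair that never occurs gets probability zero, which is exactly the problem smoothing is meant to address.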
Phonemisation is the task of turning text into sound sequences (grapheme to phoneme conversion, G2P). Phonemisation is the linking point between text and speech. While ...
Pitch tracking is the task of determining the fundamental frequency of the speech sounds in vocalisation. A speaker's pitch changes over time and some stretches of ...
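One simple way to estimate the fundamental frequency of a single frame is to look for the lag at which the frame best correlates with itself. The sketch below assumes this autocorrelation approach; the function name, the search range of 75 to 400 Hz, and the voicing threshold of 0.3 are illustrative assumptions rather than values taken from any particular tool. Frames without clear periodicity are reported as `None`.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0, voicing_threshold=0.3):
    """Estimate F0 of one frame via autocorrelation; return None if no clear pitch."""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return None  # silent frame

    # Candidate pitch periods, expressed as lags in samples.
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)

    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value

    # Treat weakly periodic frames as having no pitch (assumed threshold).
    if best_lag == 0 or best_value / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


# Synthetic test signal: a 50 ms, 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
print(estimate_f0(tone, sample_rate))
```

A full pitch tracker would apply this frame by frame and smooth the resulting contour over time; this sketch only covers the per-frame estimate and is prone to octave errors on real speech.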
Significance testing is crucial to determine whether the results of two different recognizer settings/recognizers/training conditions are just randomly different or ...
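To make the idea concrete, here is a minimal sketch of one of the simplest paired tests, the exact two-sided sign test, applied to per-utterance error counts from two systems evaluated on the same test set. The choice of test, the function name, and the numbers are assumptions for illustration; they are not results from any real recognizer.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Exact two-sided sign test on paired per-utterance error counts.

    Ties are discarded. Under the null hypothesis, each remaining
    utterance is equally likely to favour either system.
    """
    wins_a = sum(a < b for a, b in zip(errors_a, errors_b))
    wins_b = sum(b < a for a, b in zip(errors_a, errors_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # the systems never differ
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical error counts for eight utterances.
errors_a = [1, 0, 2, 1, 0, 1, 0, 1]
errors_b = [2, 1, 3, 2, 1, 2, 1, 2]
print(sign_test(errors_a, errors_b))
```

Here system A makes fewer errors on all eight utterances, so the p-value is 2/256, or about 0.008. The sign test ignores how large each difference is, which makes it easy to understand but less powerful than tests that use the magnitudes.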
Decoding is the task of applying speech recognition models to a speech stream (audio file, microphone, ...). Common recognizers for which we have models readily available ...
You will find and test tools for visualizing speech sounds and speech samples from larger corpora. Your primary outcome will not be an evaluation of various techniques ...
Speech parametrization reduces the dimensionality of the speech signal to a manageable size. Based, for example, on the tool SimpleFFT, you will design an interactive tutorial that ...
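As a small illustration of the first step of many parametrizations, the sketch below turns one frame of samples into a magnitude spectrum using a Hann window and a naive discrete Fourier transform. It is not the code of SimpleFFT; the function name and the test tone are hypothetical, and a real front end would use a fast FFT and further reduce the spectrum, for example into a few frequency bands.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return the magnitudes of DFT bins 0..N/2 of a Hann-windowed frame."""
    n = len(frame)
    # Hann window tapers the frame edges to reduce spectral leakage.
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    # Naive O(N^2) DFT; only the non-redundant half is kept for real input.
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


# Synthetic frame: 64 samples of a 1000 Hz sine at 8 kHz sampling rate.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print(len(spectrum), peak_bin, peak_bin * sample_rate / len(frame))
```

The 64 samples become 33 magnitudes, and the strongest bin is the one whose centre frequency matches the 1000 Hz tone.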