Acoustic models are at the core of speech recognition (and modern speech synthesis) technology. These models are trained on large amounts of data. While "cleaner" data is generally better, the key insight is that "there is no better data than more data". Data from the Spoken Wikipedia is available upon request.
Your task is to train acoustic models for speech recognition and to investigate how the size and quality of the training data affect recognition performance. Use either SphinxTrain or Kaldi. No programming is required, but a good command of Linux/POSIX tools is crucial. You will likely need to use our computing cluster via SSH.
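Although no programming is required, a small sketch may help to make the experimental setup concrete. The Python below is only an illustration under stated assumptions: it is not part of SphinxTrain or Kaldi, and the function names `nested_subsets` and `word_error_rate` are hypothetical. It shows one way to draw nested training subsets of increasing size (so that each larger set contains the smaller ones) and to score recognition output with word error rate (WER), a commonly used measure of recognition performance.

```python
import random


def nested_subsets(utterance_ids, fractions, seed=0):
    """Return {fraction: ids} where every smaller subset is contained in the larger ones.

    Hypothetical helper: shuffles the ids once with a fixed seed and takes
    prefixes, so results for different training sizes stay comparable.
    """
    shuffled = list(utterance_ids)
    random.Random(seed).shuffle(shuffled)
    subsets = {}
    for fraction in sorted(fractions):
        count = max(1, round(fraction * len(shuffled)))
        subsets[fraction] = shuffled[:count]
    return subsets


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

In practice one would train one model per subset with the chosen toolkit and compare the WER of each model on the same held-out test set; the toolkits ship their own scoring tools, so this function is only meant to show what is being measured.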
Datasets:
http://www.openslr.org/
http://www.voxforge.org/
-- TimoBaumann - 06 Apr 2016