Acoustic models are at the core of speech recognition (and modern speech synthesis) technology. They are trained on large amounts of data. While "cleaner" data is better, the key insight is that "there is no better data than more data": a larger corpus tends to help more than a more carefully curated one. Data from the Spoken Wikipedia is available upon request.

Your task is to train acoustic models for speech recognition and to investigate how the size and quality of the training data affect recognition performance. Use either SphinxTrain or Kaldi. No programming is required, but a good command of Linux/POSIX tools is crucial. You will likely need to use our computing cluster via SSH.
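
Although no programming is required, a small script can help organize the data-size experiment. The following is a minimal Python sketch, not part of SphinxTrain or Kaldi. It assumes that utterances are identified by string IDs and that recognition performance is measured as word error rate (WER), a common metric that this page does not prescribe. The function names `nested_subsets` and `word_error_rate` are hypothetical helpers introduced here for illustration.

```python
import random


def nested_subsets(utterance_ids, fractions, seed=0):
    """Map each fraction to a list of utterance IDs.

    Every smaller subset is contained in every larger one, so differences
    between runs reflect the amount of data rather than a different sample.
    """
    shuffled = sorted(utterance_ids)  # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)
    total = len(shuffled)
    return {f: shuffled[: max(1, round(f * total))] for f in sorted(fractions)}


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # reference word deleted
                curr[j - 1] + 1,                      # extra word inserted
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `nested_subsets(ids, [0.1, 0.5, 1.0])` yields three training lists of increasing size, each of which could be used for a separate training run and then scored on the same held-out test set. For a whole test set, WER is usually computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.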

Datasets: http://www.openslr.org/, http://www.voxforge.org/

-- TimoBaumann - 06 Apr 2016
 