Phonemisation, or grapheme-to-phoneme conversion (G2P), is the task of turning text into sequences of speech sounds (phonemes). It is the link between text and speech. Pronunciation dictionaries can be used, but they are never complete, because new words are created all the time. One crucial aspect is the internal structure of words, i.e., their morphology. Morpheme boundaries play an important role in phonemisation: compare German "Comicheldin" with "Comic-Heldin" ("comic heroine"). Without the boundary, the letters "ch" may be read as the German fricative, whereas across the boundary they stand for /k/ followed by /h/.
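A minimal sketch of one way a segmentation could be passed to a G2P model: the morphs are joined with a boundary symbol, so the boundary becomes part of the input spelling the model learns from. The `+` symbol and the helper names `mark_boundaries` and `strip_boundaries` are hypothetical, and the segmentation is given by hand here; in the project it would come from a segmenter such as Morfessor.

```python
# Sketch: encode a morph segmentation as a single grapheme string with an
# explicit boundary symbol that a G2P model can learn from.
# The "+" marker is an illustrative assumption, not a convention required
# by Sequitur G2P or Morfessor.
BOUNDARY = "+"


def mark_boundaries(morphs: list[str], boundary: str = BOUNDARY) -> str:
    """Join morphs into one lowercase token with boundary marks between them."""
    return boundary.join(m.lower() for m in morphs)


def strip_boundaries(token: str, boundary: str = BOUNDARY) -> str:
    """Recover the plain spelling from a boundary-marked token."""
    return token.replace(boundary, "")


if __name__ == "__main__":
    plain = "Comicheldin".lower()
    marked = mark_boundaries(["Comic", "Heldin"])
    print(plain)   # comicheldin
    print(marked)  # comic+heldin
    # The marker splits the "ch" letter pair that a boundary-unaware model
    # might read as a single fricative.
    print("ch" in plain, "ch" in marked)  # True False
    assert strip_boundaries(marked) == plain
```

Whatever symbol is chosen must not occur in ordinary spellings, and it has to be used consistently in both the training lexicon and the words given to the model at test time.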

Your task is to analyze G2P performance with and without morpheme boundaries for English, to create a morpheme segmentation for German (using Morfessor or similar software), and to study the effect of the learned boundaries on G2P. The proposed G2P tool is Sequitur G2P; Morfessor can be used to train the morpheme model. To train the G2P model, you should use CMUdict as the lexicon (we also have a German dictionary available on request).
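For the English baseline, CMUdict entries have to be turned into a training lexicon, and the G2P output has to be scored. The sketch below assumes the classic CMUdict layout (`;;;` comment lines, alternative pronunciations written as `WORD(1)`) and writes each entry as a headword followed by space-separated phonemes, which is assumed to match the plain-text lexicon format Sequitur G2P reads; check the Sequitur documentation before relying on it. Output is scored with phoneme error rate, i.e. the Levenshtein distance between reference and hypothesis phoneme sequences divided by the total reference length. All helper names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Trailing variant index, e.g. the "(1)" in "READ(1)".
VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str], keep_stress: bool = True) -> Iterator[tuple[str, list[str]]]:
    """Yield (word, phonemes) pairs from CMUdict-style lines.

    Assumes ';;;' comment lines, an optional ' #' inline comment, and
    alternative pronunciations written as WORD(1), WORD(2), ...
    """
    for line in lines:
        line = line.split(" #", 1)[0].strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        word = VARIANT.sub("", word).lower()
        if not keep_stress:
            # ARPAbet vowels carry stress digits 0, 1 or 2.
            phones = [p.rstrip("012") for p in phones]
        yield word, phones


def to_lexicon_line(word: str, phones: list[str]) -> str:
    """Headword followed by space-separated phonemes (assumed Sequitur layout)."""
    return " ".join([word, *phones])


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(pairs: Iterable[tuple[list[str], list[str]]]) -> float:
    """Total edit distance divided by total reference length."""
    errors = total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0


if __name__ == "__main__":
    sample = [";;; comment", "READ  R EH1 D", "READ(1)  R IY1 D"]
    for word, phones in parse_cmudict(sample):
        print(to_lexicon_line(word, phones))
    # read R EH1 D
    # read R IY1 D
    # One substitution over three reference phonemes:
    print(phoneme_error_rate([(["R", "EH1", "D"], ["R", "IY1", "D"])]))
```

To measure the effect of boundaries, the same scoring can be applied to the output of a model trained on plain spellings and one trained on boundary-marked spellings. Stress digits can be dropped with `keep_stress=False` if stress errors are not of interest.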

-- TimoBaumann - 06 Apr 2016
