This (incomplete, constantly extended/revised) list contains your ideas for exam questions (slightly edited) with my comments as to how well I think they cover the contents of the course.

01. Speech as a Communication System

  • Describe the Shannon-Weaver model of communication.
    • the sort of factual knowledge that I may ask about the contents of the first lecture;
  • Give a practical example of the Shannon-Weaver model of communication and its effectiveness in modelling the communication process.
    • I would probably split the question in two and ask the second part only if your first answer relates to speech communication.
  • Give an example of layered communication that you know and specify at what layer speech processing is involved.
    • I will probably also ask about other processing areas/modules of oral communication beyond "speech processing"; e.g., I might ask about the many layers and modules that are involved in determining who is to speak when in a dialogue. (We haven't covered this yet in the first lecture.)
  • Give an example of oral mis-communication and indicate which language domain(s) (phonology, morphology, syntax, semantics or pragmatics) the error belongs to.
  • Where can errors come from during the communication process? (referring to the scenario when A was talking to B to get the saltshaker)
  • Which factors can disturb communication?
    • These are good questions. Based on the example that you give, I might ask you to elaborate how error recovery could work, what domains could help to fix the error, and ask about "partial" errors.
  • What do you know about speech processing? Does it include text processing? (this should be an open question; I think someone might ask)
    • This question could go in two directions. Either we talk about the differences or the commonalities of speech and text processing. Of course, text is an excellent representation for linguistic information. Thus, many kinds of text processing techniques will often be appropriate in a speech-based application.
  • What are the different conventional divisions of linguistics? And which divisions are not covered (e.g. prosody)?
    • yes, more on this in the upcoming lecture
  • What does Descartes' linguistic model look like, and which model from computer science is it related to?
    • I may ask about the scientific principle as coined by Descartes and how it relates to the common subdivision of linguistics. But please note that Descartes was a philosopher and mathematician, not a linguist!
  • What are the forms of knowledge according to Bloom?
    • Well, this would be a question about you remembering a fact (according to Bloom's taxonomy). I am more likely to ask about the two dimensions (not their names but what they describe) and what kinds of knowledge you acquired most in the class (or which kinds of knowledge you like to acquire most). Notice that these questions are about you evaluating your procedural knowledge of learning, rather than remembering facts of learning theory. Later in the semester, we may also relate both dimensions to kinds of processes used in speech and language processing.

02. Spoken Dialogue, a Complex Interactive System

  • Why is the simple model of communication insufficient for modelling dialogue and what extensions are required?
    • and then try to order extensions by how much we need them to get towards dialogue
  • describe the general architecture of the chain model of communication.
    • nice, could be answered by drawing and then later be used as reference during later stages of the exam
  • What are key aspects of dialogue?
    • I don't know if I would ask this question as it is really open-ended. I would probably rather ask how to differentiate dialogue from other means of communication (such as meetings, interviews or e-mailing). Do you think chatting is dialogue or not?
  • What can go wrong in human communication/interaction?
    • and what methods are there to avoid full chaos?
  • How do we use a single (acoustic) channel for dialog?
    • and would there be alternatives?
  • What is turn-taking?
    • sufficiently narrow and at the same time open enough for you to steer me into a comfortable direction
  • what turn-taking signals are there?
    • ... and on what layer(s) of communication can they be placed?
  • Why is turn-taking important?
  • How do systems manage turn-taking? Why is it hard to model?
  • What happens when a dialog system takes very long to respond? What's the "right" time for a dialog system to wait before it takes the turn?
  • What are common indicators to know whether the speaker's turn has ended? What would be important linguistic categories involved in making this decision?
    • I like these questions, first asking about phenomena and then asking to group them.
  • key differences between pipeline-based and blackboard-based architectures; strengths and weaknesses
    • I'll probably use the terms "reductionist" and "connectionist"
  • describe a simple dialogue system/simplest model of dialog system
    • e.g. pipeline-based, what modules are involved/common, what are their interfaces
    • where would you locate "turn-taking" in such a system?
  • what are pipeline-based or blackboard-based architectures? How do they compare, what are strengths, weaknesses, and limits?
  • what is a dialogue manager?
    • please also refer to the corresponding student presentations in your answer :-)
  • what do the terms attraction and emergence mean in a complex system?
  • what differentiates dialog from "simple" one-way speech utterances?
    • Context, of course. But on what levels can the context be classified?
  • can a pipeline-based agent communicate with a blackboard-based one?

03. Acoustic Phonetics

  • Describe the difference between Phonetics and Phonology. Is the distinction important?
  • Give an overview of how sounds (phones) are produced with our vocal organs. Is there a difference between vowels and consonants? What distinguishes voiced from unvoiced sounds?
  • How can vowel frequency be analyzed (formants)?
  • What is meant by the phonemic system of a language? What differences are there between languages? What is phonotactics?
  • What does a speech signal look like (components)? How can you analyze it? How does Fourier Synthesis work? What does a Spectrogram show?
  • phones/phonemes/phonotactics: what do we have to take into account when we are trying to describe speech? (or something along those lines)
  • How does a human/system process audio/speech information?
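
For the Fourier and spectrogram questions, a minimal sketch may help: it computes the magnitude spectrum of one analysis frame with a naive discrete Fourier transform and reads off the strongest component. The function names, the 8 kHz sample rate, the 256-sample frame and the two-tone test signal are illustrative assumptions, not material from the lecture.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the frame with a complex sinusoid of k cycles per frame.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin in the frame."""
    magnitudes = dft_magnitudes(samples)
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return k * sample_rate / len(samples)


# Hypothetical test signal: a strong 500 Hz tone plus a weaker 1500 Hz tone.
SAMPLE_RATE = 8000
FRAME_LENGTH = 256
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
        for t in range(FRAME_LENGTH)]

if __name__ == "__main__":
    print(dominant_frequency(tone, SAMPLE_RATE))
```

A spectrogram is obtained by repeating this analysis on successive (usually overlapping and windowed) frames and plotting the magnitudes over time. A practical implementation would use an FFT rather than this quadratic-time loop.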

04. Speech Synthesis (the TTS Problem)

  • What is the main problem/challenge of speech synthesis? Which information is lost in written language?
  • What is the goal of Information Structure? What are its main notions (topic, focus, given/new information)?
  • Can you name some suprasegmental properties of speech? How is accentuation related to focus? What options do computer systems have?
  • Sketch a process diagram of speech synthesis and explain the main components.
  • Which different approaches are there for Waveform Synthesis (formant-based, pattern-based, model-based)? What would be an example for pattern-based synthesis (Diphone Synthesis)? Advantages and disadvantages?
  • What are challenges when synthesizing speech? How can we try to solve them?
  • the process diagram of speech synthesis
    • also be able to discuss alternatives, such as leaving out modules or connecting them differently

05. Speech Parametrization

  • What is meant by primary signal and vocal filtering?
    • Yes. I may even dig a little deeper into speech production/the ways of articulation/the kinds of primary signal (at least in the model), and so on
    • How can we separate the two in a speech signal and why is this useful? -> reasonable followup question
  • What is Mel-binning and how does it work?
    • I'll have to trust you to read up on some details, e.g. in Taylor (2009).
  • What are potential problems with the sliding window technique (for detecting phonemes)? Are there alternatives?
    • the interesting followup could be: why doesn't anyone use alternatives?
  • Explain how humans produce speech sounds
    • yes and then see next question.
  • Why is phonemic information largely contained in lower quefrency components?
    • very specific question that begs for an answer that explains cepstral processing and requires understanding about what differentiates phonemic information from other parts of the speech signal (irrelevant or not). Good!
  • What are MFCCs? Explain the pros and cons.
    • I do expect you to know the processing steps, but discussing the reasons why they are there (i.e., understanding the topic) is more important
  • Are MFCCs optimal features?
    • if so: optimal with respect to what criteria? Does this optimality (or any other) matter in real life?
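
As a starting point for reading up on Mel-binning, here is a minimal sketch under stated assumptions: it uses one widely used approximation of the mel scale (other variants exist), places triangular bands equally spaced in mel, and sums a power spectrum into those bands. All function names, the number of bands and the flat test spectrum are hypothetical choices for illustration.

```python
import math


def hz_to_mel(f_hz):
    """One common mel-scale approximation (an assumption; variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_bands, f_min, f_max):
    """num_bands + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_bands + 2)]


def triangular_weight(f, left, center, right):
    """Weight of frequency f in a triangle rising to 1 at center."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


def mel_bin(power_spectrum, sample_rate, num_bands):
    """Sum a power spectrum (bins 0 Hz .. Nyquist) into mel-spaced bands."""
    bin_hz = (sample_rate / 2) / (len(power_spectrum) - 1)
    edges = mel_band_edges(num_bands, 0.0, sample_rate / 2)
    energies = []
    for b in range(num_bands):
        left, center, right = edges[b], edges[b + 1], edges[b + 2]
        energies.append(sum(triangular_weight(k * bin_hz, left, center, right) * p
                            for k, p in enumerate(power_spectrum)))
    return energies


if __name__ == "__main__":
    flat_spectrum = [1.0] * 129  # hypothetical frame: equal power in every bin
    print(mel_bin(flat_spectrum, 8000, 10))
```

The bands get wider towards higher frequencies, so a flat spectrum yields larger band energies at the top; in the usual MFCC chain these energies would then be log-compressed and decorrelated with a DCT.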

06. Speech Recognition Decoding

  • State Bayes' Theorem and explain why it's important
    • in particular: can you describe the parts P(O|W) and P(W)? Why is P(O) less important and are there implications to dropping it?
  • What are Hidden Markov Models?
  • What are the approaches to the search problem for Hidden Markov Models?
    • I think I would split this in two: why is the HMM a good model for speech recognition (it reduces data sparsity and search space issues) and how is the search organized.
  • What does P(W|O) mean and how do you solve the problem associated with it?
    • I'll probably rather ask about the fundamental problem in speech recognition, why it is problematic, and how Bayesian reasoning helps.
  • What is the difference between word error rate and sentence error rate? What does this mean? If WER is more suitable than SER, is it also "optimal"?
  • Describe the token passing algorithm.
  • Describe strategies for search space limitation.
  • what is the main difference between recognition and synthesis?
    • the obvious answer is of course not interesting. What I mentioned in class is that recognition is a process of reducing a very rich ambiguous signal to a parametric representation (e.g. text) and speech synthesis is the opposite: you start from text (at least in text-to-speech) and want to come up with a rich natural signal.
  • How can we measure the confidence/certainty that a recognizer has in its own results?
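
To make the word error rate versus sentence error rate question concrete, here is a minimal sketch, assuming WER is computed as word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words, and SER as the fraction of sentences containing at least one error. The function names and the two example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total word-level edit distance divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def sentence_error_rate(refs, hyps):
    """Fraction of sentences that are not recognized exactly."""
    wrong = sum(r.split() != h.split() for r, h in zip(refs, hyps))
    return wrong / len(refs)


# Hypothetical reference transcripts and recognizer outputs.
refs = ["please pass the salt", "thank you very much"]
hyps = ["please pass the salt", "thank you very match"]

if __name__ == "__main__":
    print(word_error_rate(refs, hyps))      # 1 error in 8 words -> 0.125
    print(sentence_error_rate(refs, hyps))  # 1 of 2 sentences wrong -> 0.5
```

A single substituted word gives a WER of 12.5% but an SER of 50%, which illustrates why the two measures can diverge strongly. Neither measure says whether the error matters for the application.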