Topics for Lectures and Essays

Below you will find the topics for the lectures to be prepared and held by the participants. Note that, on purpose, most topics deal with techniques of NLP, not applications. The rationale behind this decision is that most of the techniques can be, and actually are, used within quite different applications. To develop a fundamental understanding of NLP solutions, we should highlight these interconnections instead of hiding them behind the surface of different application scenarios.

Please select the topic you are most interested in and prepare a teaching-oriented presentation for it. If you want to propose a topic not listed below, don't hesitate to do so. If it meets with general interest, we can easily take it into consideration.

For each topic I have compiled a list of questions that could and should be answered by the lecture. These lists are by no means complete. You can adapt them to your individual preferences. But you should be aware that my proposals give a good intuition of what somebody who is new to the field wants to know. One standard question which I did not include, but which is relevant for many topics, is the one about the state of the art in this particular subfield. To answer it you need to retrieve typical benchmark results from the literature.

Most of the literature recommendations refer to

  • Daniel Jurafsky and James H. Martin: Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd ed. (draft). Pearson Education.

which can be found under https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.

Additional information about neural networks in general can be found in

  • Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning. MIT Press, 2016.

available under https://www.deeplearningbook.org/. Another good source of information about neural networks in NLP is Chris Manning's lecture CS224N: NLP with Deep Learning at Stanford University on YouTube.

List of topics

  1. Classification
    • What's the purpose of a classification procedure?
    • What kind of data are necessary to train a classifier?
    • What are typical classification tasks in NLP?
    • What is a probability and a conditional probability?
    • How can probabilities be estimated?
    • What's a naive Bayesian classifier and how can it be trained? (see the sketch below)
    • What's a multi-layer perceptron and how can it be applied to a classification task?
    • How can a multi-layer perceptron be trained?
    • How can a classifier be evaluated?
    • Starting point: Almost every book on machine learning or data mining
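    To give a first impression, here is a minimal sketch of a naive Bayesian classifier with add-one smoothing in Python; the toy sentiment data are invented for illustration and not taken from any benchmark.

      import math
      from collections import Counter, defaultdict

      # Toy training data, invented for illustration.
      train = [("good great fun", "pos"), ("boring bad plot", "neg"),
               ("great acting", "pos"), ("bad boring", "neg")]

      class_counts = Counter(label for _, label in train)
      word_counts = defaultdict(Counter)
      vocab = set()
      for text, label in train:
          for w in text.split():
              word_counts[label][w] += 1
              vocab.add(w)

      def classify(text):
          # argmax over classes of log P(c) + sum over words of log P(w|c),
          # with add-one (Laplace) smoothing of the word probabilities.
          scores = {}
          for c in class_counts:
              total = sum(word_counts[c].values())
              scores[c] = math.log(class_counts[c] / len(train)) + sum(
                  math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                  for w in text.split())
          return max(scores, key=scores.get)

      print(classify("great fun"))  # -> 'pos'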
  2. Non-determinism and the string edit distance
    • What does it mean for an algorithm to be non-deterministic?
    • What kinds of algorithms are necessary to solve non-deterministic problems?
    • What are metrics for string similarity?
    • How can the string edit distance be computed? (see the sketch below)
    • Why is a non-deterministic formulation of the string edit distance algorithm advantageous?
    • What are possible search strategies? Which one fits the problem best?
    • How expensive is the computation of the minimum string edit distance?
    • Can simplifying assumptions be made to reduce the search effort?
    • Can the error model be extended to consider different error probabilities?
    • What are typical applications for string similarity metrics?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 2
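    A minimal dynamic-programming sketch of the minimum edit distance in Python; the unit costs are an illustrative assumption and can be replaced by a proper error model.

      def min_edit_distance(s, t, ins=1, dele=1, sub=1):
          # Dynamic-programming table: D[i][j] is the cheapest way
          # to turn s[:i] into t[:j].
          D = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
          for i in range(1, len(s) + 1):
              D[i][0] = i * dele
          for j in range(1, len(t) + 1):
              D[0][j] = j * ins
          for i in range(1, len(s) + 1):
              for j in range(1, len(t) + 1):
                  cost = 0 if s[i-1] == t[j-1] else sub
                  D[i][j] = min(D[i-1][j] + dele,    # delete s[i-1]
                                D[i][j-1] + ins,     # insert t[j-1]
                                D[i-1][j-1] + cost)  # substitute or match
          return D[len(s)][len(t)]

      print(min_edit_distance("intention", "execution"))  # 5 with unit costs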
  3. Finite-state models
    • What's the difference between a finite-state automaton and a finite-state transducer?
    • How can finite-state automata be represented (written down)? (see the sketch below)
    • What's the difference between deterministic and non-deterministic automata?
    • What kind of regularities can be modeled with finite-state automata?
    • What are the limitations of finite-state machines?
    • How expensive is computing with finite-state machines?
    • What are application areas for finite-state machines, e.g. in corpus data preparation?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 2
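    One possible machine-readable representation is a transition table; here is a minimal sketch in Python for a deterministic automaton accepting the toy "sheep language" baa+! (the example language is an illustrative choice).

      # DFA as a transition table; states are integers,
      # missing entries mean rejection.
      transitions = {(0, 'b'): 1, (1, 'a'): 2, (2, 'a'): 3,
                     (3, 'a'): 3, (3, '!'): 4}
      accepting = {4}

      def accepts(word):
          state = 0
          for ch in word:
              if (state, ch) not in transitions:
                  return False
              state = transitions[(state, ch)]
          return state in accepting

      print(accepts("baaa!"))  # True
      print(accepts("ba!"))    # False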
  4. Finite-state morphology
    • Which algebraic operations can be applied to finite-state machines?
    • What does it mean to minimize a finite-state model?
    • Which morphological phenomena can be modeled with finite-state machines?
    • Which algebraic operations are used for this purpose?
    • How can the root-pattern morphology of Semitic languages be modeled by finite-state techniques?
    • Starting points:
      • Daniel Jurafsky and James H. Martin: Speech and Language Processing, as above, but 2nd edition, ch. 2 and 3. If you cannot find the 2nd edition, please ask me.
      • Kenneth R. Beesley, Arabic Morphology Using Only Finite-State Operations, in Proceedings Coling-1998 workshop on Computational Approaches to Semitic Languages.
  5. Markov Chains
    • What is a probability and a conditional probability?
    • How can probabilities be estimated?
    • What's a Markov chain?
    • What kinds of Markov chains can be distinguished?
    • Which ones are used for NLP?
    • What kind of information is captured by a Markov chain?
    • How can a Markov chain be used?
    • What are application areas for Markov chains?
    • What are limitations of Markov chains?
    • How can the probabilities of a Markov model be estimated?
    • What's Zipf's law? How does it affect the probability estimation?
    • What are smoothing and backoff? (see the sketch below)
    • How can the quality of a (probabilistic) language model be measured?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 3
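    A minimal bigram language model with add-one smoothing and a perplexity computation in Python; the two-sentence corpus is invented for illustration.

      import math
      from collections import Counter

      # Toy corpus; <s> and </s> mark sentence boundaries.
      corpus = [["<s>", "the", "cat", "sleeps", "</s>"],
                ["<s>", "the", "dog", "sleeps", "</s>"]]

      unigrams, bigrams = Counter(), Counter()
      for sent in corpus:
          unigrams.update(sent)
          bigrams.update(zip(sent, sent[1:]))
      V = len(unigrams)  # vocabulary size

      def p(w, prev):
          # Add-one (Laplace) smoothed bigram probability P(w | prev).
          return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

      def perplexity(sent):
          logp = sum(math.log2(p(w, prev)) for prev, w in zip(sent, sent[1:]))
          return 2 ** (-logp / (len(sent) - 1))

      print(perplexity(["<s>", "the", "cat", "sleeps", "</s>"]))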
  6. Hidden Markov models
    • What is a hidden Markov model?
    • What are typical applications of HMMs in NLP?
    • What are typical tasks that can be solved by means of an HMM?
    • What kinds of algorithms are required to solve these tasks? (see the Viterbi sketch below)
    • What kind of training data are required?
    • How can a hidden Markov model be trained?
    • How do the models for tagging and speech recognition differ?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 8 and Appendix A
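    A minimal Viterbi decoder for a toy two-state HMM in Python; all states and probabilities are invented for illustration.

      states = ["N", "V"]
      start = {"N": 0.6, "V": 0.4}
      trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
      emit  = {"N": {"fish": 0.6, "sleep": 0.4},
               "V": {"fish": 0.3, "sleep": 0.7}}

      def viterbi(obs):
          # v[s] is the probability of the best path so far ending in s.
          v = {s: start[s] * emit[s][obs[0]] for s in states}
          back = []
          for o in obs[1:]:
              prev_v, step, v = v, {}, {}
              for s in states:
                  best = max(states, key=lambda p: prev_v[p] * trans[p][s])
                  v[s] = prev_v[best] * trans[best][s] * emit[s][o]
                  step[s] = best
              back.append(step)
          # Follow the backpointers from the best final state.
          path = [max(states, key=lambda s: v[s])]
          for step in reversed(back):
              path.append(step[path[-1]])
          return list(reversed(path))

      print(viterbi(["fish", "sleep"]))  # -> ['N', 'V']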
  7. Context-free grammars
    • How is a context-free grammar defined?
    • What kinds of context-free grammars can be distinguished?
    • What kind of information is captured by a context-free grammar?
    • How does a context-free grammar differ from a finite-state model?
    • How can a context-free grammar be used?
    • What are application areas of context-free grammars?
    • Are there context-free grammars for Ethiopian languages?
    • Have they been evaluated on real data? If so, how?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 12
  8. Chart parsing
    • What's parsing?
    • What makes parsing a non-deterministic procedure?
    • What's a chart?
    • What's the major benefit of a chart?
    • How to parse a sentence with a chart? (see the CKY sketch below)
    • How expensive is chart-parsing?
    • How can different parsing strategies and algorithms be implemented with a chart?
    • How can rule derivation probabilities be included in the parsing process?
    • How can the quality of parsing output be measured?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 13
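    A minimal sketch of CKY recognition with a chart in Python; the grammar and lexicon are a toy example in Chomsky normal form, invented for illustration.

      from collections import defaultdict
      from itertools import product

      # Toy grammar: maps a pair of right-hand-side symbols to the left-hand side.
      grammar = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
      lexicon = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

      def cky(words):
          n = len(words)
          # chart[(i, j)] holds the nonterminals covering words[i:j].
          chart = defaultdict(set)
          for i, w in enumerate(words):
              chart[(i, i + 1)].add(lexicon[w])
          for span in range(2, n + 1):
              for i in range(n - span + 1):
                  j = i + span
                  for k in range(i + 1, j):  # all split points
                      for b, c in product(chart[(i, k)], chart[(k, j)]):
                          if (b, c) in grammar:
                              chart[(i, j)].add(grammar[(b, c)])
          return "S" in chart[(0, n)]

      print(cky("the dog sees the cat".split()))  # True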
  9. Dependency parsing
    • What's a dependency structure?
    • What are properties of a dependency structure?
    • Which kinds of dependency structures are used for NLP?
    • What are universal dependencies?
    • Which parsing algorithms are used for dependency parsing?
    • How expensive are they?
    • How can dependency parsing be extended to probabilistic models?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 14
  10. Learning of word representations
    • What's a vector representation for a word?
    • What's a multi-layer perceptron?
    • How to train a multi-layer perceptron?
    • How to apply multi-layer perceptrons to language modeling?
    • How to extract word representations from a multi-layer perceptron?
    • Which simplifying assumptions have been made for word2vec?
    • What are typical properties of word vector representations? (see the sketch below)
    • How can the quality of a word vector representation be measured?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 5 and 6
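    A minimal illustration of comparing word vectors with cosine similarity in Python; the three-dimensional vectors are invented, real embeddings such as word2vec have hundreds of dimensions.

      import math

      # Toy vectors, invented for illustration.
      vectors = {"king":  [0.8, 0.3, 0.1],
                 "queen": [0.7, 0.4, 0.2],
                 "apple": [0.1, 0.9, 0.8]}

      def cosine(u, v):
          # Cosine of the angle between u and v: dot product over norms.
          dot = sum(a * b for a, b in zip(u, v))
          return dot / (math.sqrt(sum(a * a for a in u)) *
                        math.sqrt(sum(b * b for b in v)))

      print(cosine(vectors["king"], vectors["queen"]))  # high similarity
      print(cosine(vectors["king"], vectors["apple"]))  # lower similarity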
  11. Recurrent neural networks (RNN)
    • How can the network of word2vec be modified for n-gram language modeling?
    • What does a typical architecture of a recurrent neural language model look like? (see the sketch below)
    • What are the advantages of recurrent network models?
    • How can a recurrent model be extended to sequence-to-sequence transformation?
    • What are applications of sequence-to-sequence models?
    • What are the limitations of recurrent language models?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 9
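    A minimal sketch of the recurrence in an Elman-style network in Python/NumPy; the dimensions and random weights are illustrative assumptions, not a trained model.

      import numpy as np

      rng = np.random.default_rng(0)
      d_in, d_hid = 4, 3
      W_xh = rng.normal(size=(d_hid, d_in))   # input -> hidden
      W_hh = rng.normal(size=(d_hid, d_hid))  # hidden -> hidden (the recurrence)

      def run(inputs):
          h = np.zeros(d_hid)
          for x in inputs:             # one time step per input vector
              h = np.tanh(W_xh @ x + W_hh @ h)
          return h                     # the final hidden state summarizes the sequence

      sequence = [rng.normal(size=d_in) for _ in range(5)]
      print(run(sequence))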
  12. History-sensitive RNNs
    • What's the problem of vanishing gradients in recurrent neural nets?
    • What's a long short-term memory (LSTM)? What's a gated recurrent unit (GRU)?
    • What's attention?
    • Why are these network architectures needed? Which benefits do they provide?
    • How can they contribute to making a sequence-to-sequence model less opaque?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 5 and ch. 9
  13. Learning of sentence representations
    • What is self-attention? (see the sketch below)
    • How can self-attention be used for representation learning?
    • What are pretrained representations?
    • What are multi-layer architectures for representation learning?
    • Why are multi-layer architectures superior?
    • What are typical applications for pretrained representations?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 9 and ch. 10/11 (machine learning)
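    A minimal sketch of scaled dot-product self-attention in Python/NumPy; the random projection matrices stand in for learned parameters, and the sequence length and dimension are arbitrary choices.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 4, 8                              # 4 tokens, model dimension 8
      X = rng.normal(size=(n, d))              # token representations
      W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

      Q, K, V = X @ W_q, X @ W_k, X @ W_v
      scores = Q @ K.T / np.sqrt(d)            # similarity of every token pair
      weights = np.exp(scores)
      weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
      output = weights @ V                     # each token mixes in its context
      print(output.shape)                      # (4, 8)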
  14. Subword-based neural models
    • Which problem is addressed by training models based on subword units?
    • How can neural models take morphological information into consideration?
    • What is byte-pair encoding? What is wordpiece encoding? (see the sketch below)
    • Can these approaches be used to deal with Semitic languages?
    • Starting point: Stanford CS 224N, Lecture 12 (on YouTube)
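    A minimal sketch of the byte-pair encoding merge loop in Python, using the well-known low/lower/newest/widest toy corpus; the number of merges is an arbitrary choice.

      from collections import Counter

      # Words as space-separated symbol sequences with their corpus frequencies;
      # </w> marks the end of a word.
      words = Counter({"l o w </w>": 5, "l o w e r </w>": 2,
                       "n e w e s t </w>": 6, "w i d e s t </w>": 3})

      for _ in range(4):                       # perform 4 merges
          pairs = Counter()
          for word, freq in words.items():
              symbols = word.split()
              for a, b in zip(symbols, symbols[1:]):
                  pairs[(a, b)] += freq
          (a, b), _freq = pairs.most_common(1)[0]
          print("merge:", a, b)
          # Naive string replace; adequate for this toy example.
          words = Counter({w.replace(f"{a} {b}", f"{a}{b}"): f
                           for w, f in words.items()})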
  15. Machine translation
    • What's the traditional architecture of rule-based and stochastic MT systems?
    • Which kind of architecture is used in neural MT?
    • What's the advantage of neural architectures for MT?
    • How can MT systems be evaluated?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 10/11
  16. Semantic role labeling
    • What are semantic roles? Why are they needed?
    • Why are semantic roles called a shallow semantic representation?
    • Which sets of semantic roles are commonly used? What are their advantages and drawbacks?
    • How can semantic roles be assigned to sentence constituents?
    • Which other semantic representations do exist?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 19
  17. Coreference resolution
    • What's the task of coreference resolution?
    • How does it differ from entity linking?
    • Which applications can potentially profit from coreference resolution?
    • What kinds of referring expressions can be distinguished?
    • How are they related to their antecedents?
    • How are training data for coreference resolution annotated?
    • Which approaches to coreference resolution exist?
    • How reliable are they?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 21
  18. Question answering
    • What's the task of question answering?
    • How does it break down into subtasks?
    • Which approaches have been developed for question answering?
    • How can the quality of a question answering system be evaluated?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 23
  19. Neural network models for speech recognition
    • Which knowledge sources are combined for speech recognition?
    • What are neural architectures for speech processing?
    • How does speech processing differ from other sequence-to-sequence tasks?
    • Why is compression needed? Can it be trained?
    • Are there alternatives to compression?
    • Why is a separate language model needed?
    • How can a speech recognition system be evaluated?
    • Starting point: Jurafsky/Martin 3rd ed., ch. 26

-- WolfgangMenzel - 07 Apr 2022
 