UHH > Informatik > NatS > Addis2022 Web > TopicsFoundations (29 May 2022, WolfgangMenzel)

- Daniel Jurafsky and James H. Martin: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd ed. (draft). Pearson Education.

- Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning. MIT Press, 2016.

- Classification
- What's the purpose of a classification procedure?
- What kind of data are necessary to train a classifier?
- What are typical classification tasks in NLP?
- What is a probability and a conditional probability?
- How can probabilities be estimated?
- What's a naive Bayesian classifier and how can it be trained?
- What's a multi-layer perceptron and how can it be applied to a classification task?
- How can a multi-layer perceptron be trained?
- How can a classifier be evaluated?
- Starting point: Almost every book on machine learning or data mining
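To make the classifier questions concrete, here is a minimal Python sketch of a naive Bayesian text classifier; the mini-corpus and the choice of add-one smoothing are illustrative, not part of any prescribed solution:

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    prior = Counter(labels)
    counts = defaultdict(Counter)        # counts[c][w]: frequency of w in class c
    vocab = set()
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
        vocab.update(doc)
    return prior, counts, vocab

def classify_nb(doc, prior, counts, vocab):
    """Return argmax_c of log P(c) + sum_w log P(w|c), with add-one smoothing."""
    n_docs = sum(prior.values())
    best, best_lp = None, float("-inf")
    for c in prior:
        lp = math.log(prior[c] / n_docs)
        total = sum(counts[c].values())
        for w in doc:
            if w in vocab:               # skip words never seen in training
                lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = [["good", "great"], ["bad", "awful"], ["good", "bad", "good"]]
labels = ["pos", "neg", "pos"]
model = train_nb(docs, labels)
print(classify_nb(["good", "great"], *model))   # -> pos
```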

- Non-determinism and the string edit distance
- What does it mean for an algorithm to be non-deterministic?
- What kinds of algorithms are necessary to solve non-deterministic problems?
- What are metrics for string similarity?
- How can the string edit distance be computed?
- Why is a non-deterministic formulation of the string edit distance algorithm advantageous?
- What are possible search strategies? Which one fits the problem best?
- How expensive is the computation of the minimum string edit distance?
- Can simplifying assumptions be made to reduce the search effort?
- Can the error model be extended to consider different error probabilities?
- What are typical applications for string similarity metrics?
- Starting point: Jurafsky/Martin 3rd ed., ch. 2
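The dynamic-programming solution can be sketched compactly; the example below assumes unit costs for all three operations and uses the "intention"/"execution" pair familiar from Jurafsky/Martin:

```python
def edit_distance(s, t):
    """Wagner-Fischer dynamic programming for minimum string edit distance
    (insertion, deletion, substitution all cost 1)."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                              # deletion
                d[i][j - 1] + 1,                              # insertion
                d[i - 1][j - 1] + (s[i - 1] != t[j - 1]),     # (mis)match
            )
    return d[m][n]

print(edit_distance("intention", "execution"))   # -> 5
```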

- Finite-state models
- What's the difference between a finite-state automaton and a finite-state transducer?
- How can finite-state automata be represented (written down)?
- What's the difference between deterministic and non-deterministic automata?
- What kind of regularities can be modeled with finite-state automata?
- What are the limitations of finite-state machines?
- How expensive is computing with finite-state machines?
- What are application areas for finite-state machines, e.g. in corpus data preparation?
- Starting point: Jurafsky/Martin 3rd ed., ch. 2
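A deterministic automaton is easy to simulate with a transition table; the sketch below encodes the "sheeptalk" language /baa+!/ used as an introductory example in Jurafsky/Martin:

```python
def accepts(transitions, start, finals, s):
    """Run a deterministic finite-state automaton over the string s."""
    state = start
    for ch in s:
        if (state, ch) not in transitions:
            return False                 # no transition: reject
        state = transitions[(state, ch)]
    return state in finals

# DFA for /baa+!/ : states 0..4, state 4 is final
trans = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,                         # self-loop allows arbitrarily many a's
    (3, "!"): 4,
}
print(accepts(trans, 0, {4}, "baaa!"))   # -> True
print(accepts(trans, 0, {4}, "ba!"))     # -> False
```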

- Finite-state morphology
- Which algebraic operations can be applied to finite-state machines?
- What does it mean to minimize a finite-state model?
- Which morphological phenomena can be modeled with finite-state machines?
- Which algebraic operations are used for this purpose?
- How can the root-pattern morphology of Semitic languages be modeled by finite-state techniques?
- Starting points:
- Daniel Jurafsky and James H. Martin: Speech and Language Processing, as above, but 2nd edition! Ch. 2 and 3. If you cannot find the 2nd edition, please ask me.
- Kenneth R. Beesley, Arabic Morphology Using Only Finite-State Operations, in Proceedings Coling-1998 workshop on Computational Approaches to Semitic Languages.

- Markov Chains
- What is a probability and a conditional probability?
- How can probabilities be estimated?
- What's a Markov chain?
- What kinds of Markov chains can be distinguished?
- Which ones are used for NLP?
- What kind of information is captured by a Markov chain?
- How can a Markov chain be used?
- What are application areas for Markov chains?
- What are limitations of Markov chains?
- How can the probabilities of a Markov model be estimated?
- What's Zipf's law? How does it affect the probability estimation?
- What's smoothing and backoff?
- How can the quality of a (probabilistic) language model be measured?
- Starting point: Jurafsky/Martin 3rd ed., ch. 3
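A bigram model with smoothing and perplexity can be sketched as follows; the toy corpus and the choice of add-one (k = 1) smoothing are illustrative:

```python
from collections import Counter
import math

def train_bigram(sents):
    """Collect unigram and bigram counts with sentence boundary markers."""
    uni, bi = Counter(), Counter()
    for sent in sents:
        toks = ["<s>"] + sent + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi, set(uni)

def prob(w, prev, uni, bi, vocab, k=1.0):
    """Add-k smoothed conditional probability P(w | prev)."""
    return (bi[(prev, w)] + k) / (uni[prev] + k * len(vocab))

def perplexity(sent, model):
    """Perplexity = 2 ** (- average log2 probability per bigram)."""
    uni, bi, vocab = model
    toks = ["<s>"] + sent + ["</s>"]
    lp = sum(math.log2(prob(w, p, uni, bi, vocab)) for p, w in zip(toks, toks[1:]))
    return 2 ** (-lp / (len(toks) - 1))

corpus = [["the", "dog", "barks"], ["the", "cat", "meows"], ["the", "dog", "sleeps"]]
model = train_bigram(corpus)
print(prob("dog", "the", *model) > prob("cat", "the", *model))   # -> True
```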

- Hidden Markov models
- What is a hidden Markov model?
- What are typical applications of HMMs in NLP?
- What are typical tasks that can be solved by means of an HMM?
- What kinds of algorithms are required to solve these tasks?
- What kind of training data is required?
- How can a hidden Markov model be trained?
- What kinds of algorithmic approaches are needed to solve the different HMM tasks?
- What are typical application areas of HMMs?
- How do the models for tagging and speech recognition differ?
- Starting point: Jurafsky/Martin 3rd ed., ch. 8 and Appendix A
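The decoding task (finding the most likely hidden state sequence) is solved by the Viterbi algorithm; below is a sketch on a weather/ice-cream style HMM in the spirit of Jurafsky/Martin's Appendix A, with made-up toy probabilities:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden state sequence for an observation sequence."""
    # V[t][s] = (probability of the best path ending in s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r][0] * trans_p[r][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                       prev)
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):          # follow the backpointers
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ["HOT", "COLD"]
start = {"HOT": 0.8, "COLD": 0.2}
trans = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
emit = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}
print(viterbi([3, 1, 3], states, start, trans, emit))  # -> ['HOT', 'COLD', 'HOT']
```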

- Context-free grammars
- How is a context-free grammar defined?
- What kinds of context-free grammars can be distinguished?
- What kind of information is captured by a context-free grammar?
- How does a context-free grammar differ from a finite-state model?
- How can a context-free grammar be used?
- What are application areas of context-free grammars?
- Are there context-free grammars for Ethiopian languages?
- Have they been evaluated on real data? If so, how?
- Starting point: Jurafsky/Martin 3rd ed., ch. 12

- Chart parsing
- What's parsing?
- What makes parsing a non-deterministic procedure?
- What's a chart?
- What's the major benefit of a chart?
- How to parse a sentence with a chart?
- How expensive is chart-parsing?
- How can different parsing strategies and algorithms be implemented with a chart?
- How can rule derivation probabilities be included in the parsing process?
- How can the quality of parsing output be measured?
- Starting point: Jurafsky/Martin 3rd ed., ch. 13
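As one concrete chart-based algorithm, here is a sketch of CKY recognition for a grammar in Chomsky normal form; the tiny lexicon and rule set are invented for illustration:

```python
from collections import defaultdict

def cky(words, lexicon, rules):
    """CKY recognition for a CNF grammar.
    chart[(i, j)] holds all nonterminals that can span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] = {A for A, t in lexicon if t == w}
    for span in range(2, n + 1):                  # fill shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # try every split point
                for A, (B, C) in rules:
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        chart[(i, j)].add(A)
    return chart

lexicon = [("Det", "the"), ("N", "dog"), ("N", "cat"), ("V", "chased")]
rules = [("NP", ("Det", "N")), ("VP", ("V", "NP")), ("S", ("NP", "VP"))]
chart = cky("the dog chased the cat".split(), lexicon, rules)
print("S" in chart[(0, 5)])    # -> True: the sentence is recognized
```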

- Dependency parsing
- What's a dependency structure?
- What are properties of a dependency structure?
- Which kinds of dependency structures are used for NLP?
- What are universal dependencies?
- Which parsing algorithms are used for dependency parsing?
- How expensive are they?
- How can dependency parsing be extended to probabilistic models?
- Starting point: Jurafsky/Martin 3rd ed., ch. 14

- Learning of word representations
- What's a vector representation for a word?
- What's a multi-layer perceptron?
- How to train a multi-layer perceptron?
- How to apply multi-layer perceptrons to language modeling?
- How to extract word representations from a multi-layer perceptron?
- Which simplifying assumptions have been made for word2vec?
- What are typical properties of word vector representations?
- How can the quality of a word vector representation be measured?
- Starting point: Jurafsky/Martin 3rd ed., ch. 5 and 6
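Typical properties of word vectors (similarity, analogies) can be probed with cosine similarity; the 3-dimensional vectors below are invented toy values — trained word2vec vectors have hundreds of dimensions:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Invented 3-d toy vectors standing in for learned embeddings
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.1, 0.9],
}
# The classic analogy test: king - man + woman should land near queen
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
nearest = max(("queen", "man", "woman"), key=lambda w: cosine(target, vec[w]))
print(nearest)   # -> queen
```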

- Recurrent neural networks (RNN)
- How can the network of word2vec be modified for n-gram language modeling?
- What does a typical architecture of a recurrent neural language model look like?
- What are the advantages of recurrent network models?
- How can a recurrent model be extended to sequence-to-sequence transformation?
- What are applications of sequence-to-sequence models?
- What are the limitations of recurrent language models?
- Starting point: Jurafsky/Martin 3rd ed., ch. 9
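The core recurrence of a simple (Elman-style) RNN can be sketched without any library; the weight matrices below are arbitrary toy values, and a real model would learn them and add bias terms and an output layer:

```python
import math

def rnn_step(x, h, W_xh, W_hh):
    """One Elman-RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1})."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(row_x, x)) +
                      sum(wh * hi for wh, hi in zip(row_h, h)))
            for row_x, row_h in zip(W_xh, W_hh)]

def rnn_forward(xs, W_xh, W_hh):
    """Run the recurrence over an input sequence; return all hidden states."""
    h = [0.0] * len(W_hh)                # initial hidden state: zeros
    hs = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh)
        hs.append(h)
    return hs

# Arbitrary toy weights: 2-d inputs, 2-d hidden state
W_xh = [[0.5, -0.3], [0.8, 0.2]]
W_hh = [[0.1, 0.4], [-0.5, 0.2]]
hidden = rnn_forward([[1.0, 0.0], [0.0, 1.0]], W_xh, W_hh)
```

Because the second hidden state depends on the first, the same input vector would produce different states at different positions — this is the "history sensitivity" that distinguishes RNNs from feed-forward n-gram models.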

- History-sensitive RNNs
- What's the problem of vanishing gradients in recurrent neural nets?
- What's a long short-term memory (LSTM)? What's a gated recurrent unit (GRU)?
- What's attention?
- Why are these network architectures needed? Which benefits do they provide?
- How can they contribute to making a sequence-to-sequence model less opaque?
- Starting point: Jurafsky/Martin 3rd ed., ch. 5 and ch. 9

- Learning of sentence representations
- What is self-attention?
- How can self-attention be used for representation learning?
- What are pretrained representations?
- What are multi-layer architectures for representation learning?
- Why are multi-layer architectures superior?
- What are typical applications for pretrained representations?
- Starting point: Jurafsky/Martin 3rd ed., ch. 9 and ch. 10/11 (machine learning)
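The heart of such architectures is scaled dot-product self-attention; the sketch below simplifies by using the inputs directly as queries, keys, and values, whereas a real transformer layer applies learned projection matrices:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """Simplified self-attention with Q = K = V = X:
    each output is a similarity-weighted average over all input vectors."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)        # weights are positive and sum to 1
        out.append([sum(w * v[i] for w, v in zip(weights, X)) for i in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X)
```

Each output row mixes information from every position in the input, which is why stacking such layers yields context-sensitive sentence representations.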

- Subword-based neural models
- Which problem is addressed by training models on subword units?
- How can neural models take morphological information into consideration?
- What is byte-pair encoding? What is wordpiece encoding?
- Can these approaches be used to deal with Semitic languages?
- Starting point: Stanford CS 224N, Lecture 12 (on youtube)
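Byte-pair encoding itself is a simple greedy procedure: repeatedly fuse the most frequent adjacent symbol pair. The sketch below uses a toy word-frequency corpus in the style of the original BPE examples (no end-of-word marker, so the learned merges differ slightly from textbook versions):

```python
from collections import Counter

def merge_word(syms, pair):
    """Fuse every adjacent occurrence of `pair` in a symbol sequence."""
    out, i = [], 0
    while i < len(syms):
        if i < len(syms) - 1 and (syms[i], syms[i + 1]) == pair:
            out.append(syms[i] + syms[i + 1])
            i += 2
        else:
            out.append(syms[i])
            i += 1
    return tuple(out)

def bpe_train(words, num_merges):
    """Learn BPE merges from a list of word tokens."""
    vocab = Counter(tuple(w) for w in words)     # word as symbol tuple -> freq
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for syms, freq in vocab.items():
            for p in zip(syms, syms[1:]):
                pairs[p] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)         # most frequent adjacent pair
        merges.append(best)
        vocab = Counter({merge_word(s, best): f for s, f in vocab.items()})
    return merges

corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = bpe_train(corpus, 3)
print(merges)
```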

- Machine translation
- What's the traditional architecture of rule-based and stochastic MT systems?
- Which kind of architecture is used in neural MT?
- What's the advantage of a neural architecture for MT?
- How can MT systems be evaluated?
- Starting point: Jurafsky/Martin 3rd ed., ch. 10/11

- Semantic role labeling
- What are semantic roles? Why are they needed?
- Why are semantic roles called a shallow semantic representation?
- Which sets of semantic roles are commonly used? What are their advantages and drawbacks?
- How can semantic roles be assigned to sentence constituents?
- Which other semantic representations exist?
- Starting point: Jurafsky/Martin 3rd ed., ch. 19

- Coreference resolution
- What's the task of coreference resolution?
- How does it differ from entity linking?
- Which applications can potentially profit from coreference resolution?
- What kinds of referring expressions can be distinguished?
- How are they related to their antecedents?
- How are training data for coreference resolution annotated?
- Which approaches to coreference resolution exist?
- How reliable are they?
- Starting point: Jurafsky/Martin 3rd ed., ch. 21

- Question answering
- What's the task of question answering?
- How does it break down into subtasks?
- Which approaches have been developed for question answering?
- How can the quality of a question answering system be evaluated?
- Starting point: Jurafsky/Martin 3rd ed., ch. 23

- Neural network models for speech recognition
- Which knowledge sources are combined for speech recognition?
- What are neural architectures for speech processing?
- How does speech processing differ from other sequence-to-sequence tasks?
- Why is compression needed? Can it be trained?
- Are there alternatives to compression?
- Why is a separate language model needed?
- How can a speech recognition system be evaluated?
- Starting point: Jurafsky/Martin 3rd ed., ch. 26
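The standard evaluation metric, word error rate, reuses the string edit distance at the word level; a minimal sketch, assuming whitespace tokenization:

```python
def word_error_rate(ref, hyp):
    """WER: word-level edit distance (sub/ins/del, unit costs)
    divided by the length of the reference transcript."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                       # deletion
                          d[i][j - 1] + 1,                       # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))
    return d[len(r)][len(h)] / len(r)

# One substitution ("the" -> "a") out of six reference words
print(round(word_error_rate("the cat sat on the mat",
                            "the cat sat on a mat"), 3))   # -> 0.167
```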

