Natural Language Processing
Outline
The course covers the models, formalisms, and algorithms used to develop systems that process text, for both analysis and generation. Techniques include traditional grammar-based methods and the more recent statistical/corpus-based methods.
Topics
1 Introduction
- Applications of NLP techniques (MT, grammar checkers, dictation, document generation, NL interfaces)
- The different analysis levels used for NLP (morpho-lexical, syntactic, semantic, pragmatic)
- Recursive and augmented transition networks
2 Lexical level
- Error-tolerant lexical processing (spelling error correction)
- Transducers for the design of morphological analyzers
- Towards syntax: Part-of-speech tagging (Brill, HMM)
- Efficient representations for linguistic resources (lexica, grammars, ...): tries and finite-state automata
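As an illustration of the last topic above, here is a minimal sketch of a trie used to store a lexicon. It is not taken from the course material: the class name `Trie`, its methods, and the sample words are assumptions chosen for the example.

```python
class Trie:
    """Minimal trie (prefix tree) for storing a lexicon (illustrative only)."""

    def __init__(self):
        self.children = {}    # maps a character to the child node
        self.is_word = False  # True if the path from the root spells a word

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def with_prefix(self, prefix):
        """Return all stored words starting with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Hypothetical sample lexicon
lexicon = Trie()
for w in ["parse", "parser", "parsing", "part"]:
    lexicon.insert(w)
```

Words that share a prefix share nodes, so lookup time depends on the length of the word rather than on the size of the lexicon. For example, `lexicon.with_prefix("pars")` collects every stored word under the node reached by that prefix.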
3 Syntactic level
- Grammars (e.g. Formal/Chomsky hierarchy, DCGs, systemic, case, unification, stochastic)
- Parsing (top-down, bottom-up, chart (Earley algorithm), CYK algorithm)
- Automated estimation of probabilistic model parameters (inside-outside algorithm)
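As an illustration of the parsing topic above, here is a minimal sketch of a CYK recognizer for a grammar in Chomsky normal form. The function name `cyk_recognize`, the rule dictionaries, and the toy grammar are assumptions made for the example, not course material.

```python
from itertools import product


def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical_rules: dict mapping a word to the set of nonterminals A with A -> word
    binary_rules:  dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical_rules.get(w, ()))
    for span in range(2, n + 1):          # length of the substring
        for i in range(n - span + 1):     # start position
            j = i + span - 1              # end position
            for k in range(i, j):         # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical toy grammar in Chomsky normal form
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}
BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

accepted = cyk_recognize("the dog sees the cat".split(), LEXICAL, BINARY)
rejected = cyk_recognize("dog the sees".split(), LEXICAL, BINARY)
```

The table is filled bottom-up by substring length, giving a running time cubic in the sentence length. This sketch only decides membership; building parse trees would require storing back-pointers in each cell.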
4 Semantic level
- Semantic networks and parsers
5 Pragmatic level
6 Natural language generation
7 Other approaches
- Statistical/corpus-based NLP
Prerequisites
The course is designed to be self-contained. However, some prior experience with probability and with programming concepts such as abstract data types and computational complexity would help in understanding the formal parts quickly.