The main goal of the DEREKO corpus is to provide a large, general-purpose resource for the German language. A linguist using such a resource expects information that is both detailed and reliable. The state of the art in syntactic annotation, however, shows that beyond the level of chunks, automatic annotation faces rapidly increasing syntactic ambiguity, and the quality of the annotation declines accordingly.
For DEREKO, a finite-state approach to parsing was adopted. It addresses both the need for speed, given the size of the corpus, and the accuracy problem outlined above. First, finite-state grammars can be applied efficiently, so huge volumes of text can be processed quickly. Second, the phenomena that finite-state grammars can describe coincide with those syntactic phenomena that are only moderately ambiguous, so they can be annotated reliably. Although it stops short of full parsing, annotation with finite-state grammars is still very useful for linguistic research: it reduces the overall syntactic ambiguity, and further annotation can build directly on it. Please browse the documentation for details about the linguistic markup and about the implementation of the robust and efficient annotation system.
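To illustrate the kind of phenomenon a finite-state grammar captures, here is a minimal, hypothetical sketch of a noun-chunk recognizer. It is not the DEREKO grammar: the tag-to-symbol mapping, the chunk pattern, and the function name chunk_nps are assumptions made for this example, and the input is assumed to be already tagged with STTS part-of-speech tags. The chunk pattern (optional determiner, any number of adjectives, one noun) is a regular expression and could therefore be compiled into a finite-state automaton; Python's re module is used here only for brevity.

```python
import re

# Hypothetical mapping from STTS part-of-speech tags to one-letter symbols.
# Any tag not listed is mapped to "x", i.e. it can never be part of a chunk.
TAG_SYMBOLS = {
    "ART": "D",     # article
    "PPOSAT": "D",  # attributive possessive pronoun
    "ADJA": "A",    # attributive adjective
    "NN": "N",      # common noun
    "NE": "N",      # proper noun
}

# A regular (finite-state) pattern for simple noun chunks:
# an optional determiner, any number of adjectives, then a noun.
NP_PATTERN = re.compile(r"D?A*N")


def chunk_nps(tagged):
    """Return (start, end) token spans of noun chunks in a list of (word, tag) pairs."""
    symbols = "".join(TAG_SYMBOLS.get(tag, "x") for _, tag in tagged)
    # Each token contributes exactly one symbol, so match offsets are token indices.
    # finditer reports non-overlapping matches from left to right.
    return [(m.start(), m.end()) for m in NP_PATTERN.finditer(symbols)]


sentence = [
    ("Der", "ART"), ("alte", "ADJA"), ("Mann", "NN"),
    ("liest", "VVFIN"), ("eine", "ART"), ("Zeitung", "NN"), (".", "$."),
]

for start, end in chunk_nps(sentence):
    print(" ".join(word for word, _ in sentence[start:end]))
```

Run on the example sentence, this prints "Der alte Mann" and "eine Zeitung". A pattern like this commits to local, low-ambiguity structure only; deciding how such chunks attach to one another (for instance, PP attachment) is the kind of highly ambiguous decision it deliberately leaves open.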