+ Finite State Query

The canonical tool for searching the NEGRA corpus and other corpora in its format is TIGERSearch. Unfortunately it has many limitations, such as a weird and useless concept of precedence and the lack of existential quantifiers.

An alternative tool is fsq (Finite State Query), which surpasses all other available corpus query tools by providing full first-order logic. This means that you can search for things such as "How many clauses have no subject?". Here's how to express this query:

(E x (& 
      (cat x S) 
      (! (fct x CJ)) 
      (E y (& (> x y) (fct y HD) (cat y VVFIN))) 
      (! (E y (& (> x y) (fct y SB))))))

That is, find [S] nodes that are not co-ordinated, have a finite full verb as the head, and no subject.
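To make the logic of the query concrete, here is a minimal Python sketch that evaluates the same four conditions on a toy tree. It is not how fsq works internally; the `Node` class, the node names, and the example tree are all hypothetical, and the sketch assumes that `(> x y)` means "x directly dominates y".

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One node of a toy syntax tree (hypothetical, not fsq's data model)."""
    name: str   # identifier used only to report matches
    cat: str    # category label, e.g. "S", "NP", "VVFIN"
    fct: str    # function label of the edge to the parent, e.g. "SB", "HD"
    children: List["Node"] = field(default_factory=list)


def all_nodes(root: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order."""
    yield root
    for child in root.children:
        yield from all_nodes(child)


def subjectless_clauses(root: Node) -> List[str]:
    """Return the names of all nodes x that satisfy the query."""
    hits = []
    for x in all_nodes(root):
        if (
            x.cat == "S"                                  # (cat x S)
            and x.fct != "CJ"                             # (! (fct x CJ))
            and any(y.fct == "HD" and y.cat == "VVFIN"    # (E y (& (> x y) ...))
                    for y in x.children)
            and not any(y.fct == "SB"                     # (! (E y (& (> x y) (fct y SB))))
                        for y in x.children)
        ):
            hits.append(x.name)
    return hits


# Toy example: a main clause with a subject and an embedded clause without one.
tree = Node("s1", "S", "--", [
    Node("np1", "NP", "SB"),
    Node("v1", "VVFIN", "HD"),
    Node("s2", "S", "OC", [
        Node("v2", "VVFIN", "HD"),
        Node("np2", "NP", "OA"),
    ]),
])

print(subjectless_clauses(tree))
```

Each `E y` in the query corresponds to an `any(...)` over the children of `x`, and each `!` to a `not`.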

Unfortunately, the advantages end here. FSQ has a bizarre Lisp-like query syntax that can only appeal to me. Results are written to the filesystem as a list of sentence numbers. No easy interoperation with a tree viewer seems possible. No hint of the variable bindings is given (i.e., which of the [S] phrases in that sentence lacks the subject). The GUI is written in Java. Incorrect queries yield unhelpful Java exceptions.

Still, for the queries that do need its power, it is useful to have FSQ around. Call it as /opt/bin/fsq, click on "Corpus" (it looks like a textbox label, but is a button), select negra-corpus.cdat, click "Form::New" and compose your query.

You can run the system in batch mode like this:

/opt/bin/fsq-batch '(E x (tok x Schnee))'
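Since a malformed query only produces a Java exception, it can help to check the parentheses before handing the query to the batch tool. The following is a rough Python sketch of such a wrapper. The function names are made up for illustration, only the invocation shown above (one query string as the sole argument) is used, and nothing is assumed about what the batch tool prints or where it writes its results.

```python
import subprocess
from typing import Optional

FSQ_BATCH = "/opt/bin/fsq-batch"  # path as given on this page


def paren_error(query: str) -> Optional[str]:
    """Return None if the parentheses balance, otherwise a short message."""
    depth = 0
    for pos, ch in enumerate(query):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unexpected ')' at position {pos}"
    if depth > 0:
        return f"{depth} unclosed '('"
    return None


def run_query(query: str) -> subprocess.CompletedProcess:
    """Check the query, then call fsq-batch with it as the only argument."""
    error = paren_error(query)
    if error is not None:
        raise ValueError(f"malformed query: {error}")
    # The output format of fsq-batch is not described here, so the raw
    # result is returned without any attempt to parse it.
    return subprocess.run([FSQ_BATCH, query], capture_output=True, text=True)
```

This catches only unbalanced parentheses, not unknown predicates or wrong argument counts, but in a syntax like this one that is the easiest mistake to make.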

Documentation: User's manual, EACL report.

-- KilianAFoth - 22 Apr 2003
 