+ Finite State Query
The canonical tool for searching the NEGRA corpus and other corpora in
its format is TIGERSearch. Unfortunately it has many limitations, such
as a weird and useless concept of precedence and the lack of
existential quantifiers.
An alternative tool is fsq (Finite State Query), which surpasses all
other available corpus query tools by providing full first-order
logic. This means that you can search for things such as "How many
clauses have no subject?". Here's how to express this query:
(E x (&
   (cat x S)
   (! (fct x CJ))
   (E y (& (> x y) (fct y HD) (cat y VVFIN)))
   (! (E y (& (> x y) (fct y SB))))))
That is, find S nodes that are not co-ordinated, have a finite full
verb as the head, and have no subject.
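To make the logic of the query concrete, here is a minimal Python sketch that evaluates the same four conditions over a toy tree. It is not how fsq works internally; the Node class, the example tree and the function names are made up for illustration, and it assumes that (> x y) means "x directly dominates y".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a toy syntax tree (hypothetical, not fsq's data model)."""
    name: str
    cat: str                      # category label, e.g. S, NP, VVFIN
    fct: Optional[str] = None     # edge label to the parent, e.g. HD, SB, CJ
    children: List["Node"] = field(default_factory=list)


def descendants_and_self(node):
    """Yield the node itself and everything below it, depth first."""
    yield node
    for child in node.children:
        yield from descendants_and_self(child)


def subjectless_clauses(root):
    """Return every node x that satisfies the query (the bindings of x)."""
    hits = []
    for x in descendants_and_self(root):
        is_clause = x.cat == "S"                  # (cat x S)
        not_conjunct = x.fct != "CJ"              # (! (fct x CJ))
        # (E y (& (> x y) (fct y HD) (cat y VVFIN)))
        # Assumption: > is direct dominance, so y ranges over x's children.
        has_finite_head = any(
            y.fct == "HD" and y.cat == "VVFIN" for y in x.children
        )
        # (E y (& (> x y) (fct y SB))), negated below
        has_subject = any(y.fct == "SB" for y in x.children)
        if is_clause and not_conjunct and has_finite_head and not has_subject:
            hits.append(x)
    return hits


# Toy tree: a main clause with a subject that embeds a subjectless clause.
embedded = Node("s2", "S", "OC", [Node("v2", "VVFIN", "HD")])
main = Node("s1", "S", None, [
    Node("np1", "NP", "SB"),
    Node("v1", "VVFIN", "HD"),
    embedded,
])

matches = [n.name for n in subjectless_clauses(main)]
print(matches)  # prints ['s2']
```

Only s2 is reported: s1 is excluded by its SB child, and the other nodes are not of category S.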
Unfortunately, the advantages end here. FSQ has a bizarre Lisp-like
query syntax that can only appeal to me. Results are written to the
filesystem as a list of sentence numbers. No easy interoperation with
a tree viewer seems possible. No hint of the variable bindings is
given (i.e., which of the S phrases in that sentence lacks the
subject). The GUI is written in Java. Incorrect queries yield
unhelpful Java exceptions.
Still, for the queries that do need its power, it is useful to have
FSQ around. Start it with /opt/bin/fsq, click on "Corpus" (it looks
like a textbox label, but is a button), select negra-corpus.cdat,
click "Form::New" and compose your query.
You can run the system in batch mode like this:
/opt/bin/fsq-batch '(E x (tok x Schnee))'
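Since malformed queries only produce Java exceptions, it can help to check the bracketing before handing a query to the batch tool. The following Python sketch does that; paren_error and run_query are hypothetical helper names, and the sketch assumes only what the command above shows, namely that fsq-batch takes the query as its single argument.

```python
import subprocess


def paren_error(query):
    """Describe the first bracketing problem in query, or return None."""
    depth = 0
    for pos, ch in enumerate(query):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unexpected ')' at position {pos}"
    if depth > 0:
        return f"{depth} unclosed '('"
    return None


def run_query(query, fsq_batch="/opt/bin/fsq-batch"):
    """Check the brackets, then run fsq-batch on the query."""
    problem = paren_error(query)
    if problem is not None:
        raise ValueError(f"malformed query: {problem}")
    # Assumption: the query is the only argument, as in the command above.
    return subprocess.run([fsq_batch, query], check=True)
```

For example, paren_error('(E x (tok x Schnee)') reports one unclosed '(', while the same query with a second closing bracket passes the check. The check covers bracketing only; misspelled predicates would still reach fsq.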
Documentation: User's manual, EACL report.
--
KilianAFoth - 22 Apr 2003