University of Hamburg - Computer Science - NATS - Ingo's home page

Programming stuff of Ingo Schröder



Overview

This page is about some programming I did. Please note that the status of the individual projects vary to a large degree. Some are just hacked together while others are carefully designed. Use at your own risk.


Constraint parser

Last updated 2000/12/14

This is by far my largest software project; it's the most ambigous and the most challenging. It's also the most complicated.

I started, designed and implemented major parts of the constraint parsing system of our project DAWAI. However, the system grew over time and a lot of people contributed to the code. Right now, we have more than 55,000 lines of C code and more than 27,000 lines of Tcl/Tk code (which I didn't write). Find more information on the software page of the DAWAI project.

ICOPOST: POS taggers

Last updated 2001/08/21

ICOPOST is a collection of free implementations of POS taggers. It has its own page.

A container library

Last updated 2000/12/14

A Container Library (ACL) is an off-spring of the CDG system. It is a C library for different data containers and consists of code covering the following items:
  • hash tables
  • lists
  • vectors
  • arrays
  • bitstrings
  • bitvectors
  • strings
  • memory management
The manual is available online. It works under UNIX (tested on Solaris and Linux) and Windows. Get an tar archive dated 2000-12-14 (acl-2000-12-14.tar.gz) but this (acl-latest.tar.gz) might be newer (not older).

Non deterministic tokenizer for natural language parsing

Last updated 1999/03/09

Nothing yet...


An experimental travelling sales person problem solver

Last updated 2001/02/21

tsp.c is a small C program to solve euclidic travelling sales person problems. It has the following search methods implemented: local search with node exchange, local search with edge exchange, dpeth first systematic search, nearest neighbor heuristics and successive extension heuristics. It's not well commented or tested. Be warned.

Two small Perl scripts may help you while experimenting: compute-distance.pl computes the distance of a given path and generate-euclidic-tsp.pl can generate random instances of the euclidic TSP problem.


PP-attachment learner

Last updated 2000/12/14

This is something I did in 1998 in order to better understand two approaches to the PP attachment problem: transformation-based learning and back-off models. In case you don't know the PP attachment problem it deals with question how one can automatically decide the question where a prepositional phrase should be attached to. For instance, in a sentence like He knocked the burglar with the telescope. the prepositional phrase with the telescope can belong to the noun burglar or to verb knocked.

The first program pp-tbl.pl re-implements the transformation-based learning method described in the article
Eric Brill and Philip Resnik: A Rule-Based Approach To Prepositional Phrase Attachment Disambiguation, COLING 94
It achieves an accuracy of 80.2%/79.2% (210/1780 rules) on the test set (500 instances) and 99.55% on the training set (12266 instances).

The second program pp-backed-off.pl is simpler and is described in the article
Michael John Collins and James Brooks, Prepositional Phrase Attachment through a Backed-Off Model, Proceedings of the 3rd Workshop on Very Large Corpora, 1995
It achieves an accuracy of 81.2% on the test set and 99.55% on the training set.

Both programs have been implemented in Perl with the emphasis on rapid prototyping and simplicity.


TeX hyphenation algorithm

Last updated 2002/03/08 (added archive only)

This program (hyphenation-0.0.8) re-implements (in C) the (La)TeX hyphenation algorithm which is described in The TeXbook by Donald Knuth, Appendix H. Although simple, it works quite well and processes more than 15,000 words per second.

I wrote this program just for myself. Don't expect well-documented source code or a user manual. Sorry. I'll try to answer mails though. Feel free to fix bugs ;-)

Here're some difficult German examples:
+Gas+ab+sperr+hahn
+Blu+men+top+fer+de
Ur+in+stink+t
A+tom
+Rhyth+mus
+Stau+ecken
erb+lich
Wachtraum
+Bein+hal+tung
    
Here's a link to a page where I describe an experiment in transformation-based learning (see PP attachment) of rules for hyphenation that we did with some students some time ago.

Creating images from tree descriptions

Last updated 2001/08/24

Phrase structure trees for natural language utterances can be visualized on demand of WWW requests. The logical description of the trees takes the following form (given in EBNF):
Tree         -> Node
Node         -> LeafNode
                |  InternalNode
LeafNode     -> Label
Label        -> sequence of letters, digits and underscore
InternalNode -> Label ( Nodes )
Nodes        -> Node
                |  Node Nodes
	
This script tree.pl can be used as a CGI program or from the shell command line. It uses the Perl module GD.pm which in turn capsulates the gd library. The following call created the image (PNG, GIF) to the right:
http://host.yourdomain.net
  /cgi-bin/tree.pl?
    S(NP(DET(ART(the))
         NBAR(AP(AG(A(blue)))
              NBAR(N(ball))))
      VP(VBAR(AUX1(AUX(is))
              VBAR(V(lying)))
         PP(PG(P(on))
            NP(DET(ART(the))
               NBAR(N(floor))))))
	



Creating images from NEGRA tree descriptions

Last updated 2001/08/24

The NEGRA corpus is a corpus of GERMAN newspaper text with a discontinuous phrase structure annotation.

This script negratree.pl uses the Perl module GD.pm which in turn capsulates the gd library and creates small PNG images (such as the one to the right) of the tree.

The following description belongs to the image (PNG, GIF) to the right:
Wege                    NN      --              NK      502
,                       $,      --              --      0
die                     PRELS   --              SB      501
im                      APPRART --              AC      500
Blauen                  NN      --              NK      500
enden                   VVFIN   --              HD      501
#500                    PP      --              MO      501
#501                    S       --              RC      502
#502                    NP      --              --      0
	



Converter from GIF to HTML

Last updated 2000/12/14

The program gif2html.c takes a GIF image as input and outputs an HTML page that looks like the image. This page (100KB) was made from the image to the right.

Here's the usage information: gif2html GIFFILE produces an HTML page with a giant table that mimics the image and gif2html GIFFILE TEXTFILE uses colored characters from TEXTFILE instead of a table. Here's an example for the second form (500KB).


Fractal images toolkit

Last updated 2000/12/14

I wrote a set of small programs that can generate different fractal images. The following page shows a zoom into the mandelbrot set as an animated GIF (ca. 1.1MB). In order to create this I wrote a small C program fractal-zoom.c that computed a sequence of 600 images; you'll need the GD library. I then used gifmerge to combine the single images into an animated GIF.

Here's a link to my Java fractal image generator.


Nine puzzle as CGI script

Last updated 1996/09/21

Phee, puzzle.pl is an old script that can run as a CGI script so that you can play the nine puzzle game in your browser. Be careful, I don't do security checks!
10 8 1 7
15 9 3 4
14   2 5
11 13 12 6

Something wrong with these web pages? Please mail me.
last revised 14/12/2000 Ingo Schröder
http://nats-www.informatik.uni-hamburg.de/~ingo/prg/