Corpus Linguistics

Kerstin Fischer, University of Bremen



This page collects links to additional information and to publicly available tools and resources for doing corpus linguistics.

Concordancing:

The following is Michael Barlow's web site with free concordancing programs for many different kinds of corpora:

http://www.ruf.rice.edu/~barlow/corpus.html#search

Below is a selection of the links displayed on Michael Barlow's page; the following concordancing programs are free, quick, and easy to use:

CobuildDirect Corpus Sampler: http://titania.cobuild.collins.co.uk/form.html

British National Corpus Sample Queries: http://sara.natcorp.ox.ac.uk/lookup.html

Texts by 'great authors': http://www.concordance.com/

More texts by 'great authors': http://www.dundee.ac.uk/english/wics/wics.htm

Below is a link to an online concordancing program for business and personal letters, letters by historical figures, various literary texts, and some journalistic texts. The search facility handles left and right sorting, length of context, and display of the source very conveniently:

http://isweb9.infoseek.co.jp/school/ysomeya/

Unfortunately, the server does not always respond.
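If the server is unavailable, the core idea of such a concordancer (keyword in context, with sorting by left or right context) can be tried out locally. The following is only a minimal sketch in Python, not the program behind the link above: the function name kwic, its parameters, the sample sentences, and the simple word-based tokenization are all assumptions made for illustration.

```python
import re

# Invented sample text; replace it with your own corpus file contents.
SAMPLE = "The cat sat on the mat. A dog saw the cat run. The cat slept."

def kwic(text, keyword, width=3, sort_by=None):
    """Return keyword-in-context hits as (left, match, right) tuples.

    width   -- number of context words kept on each side
    sort_by -- None (text order), "left", or "right"
    """
    # Very rough tokenization: lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, token, right))
    if sort_by == "left":
        # Compare the word nearest to the keyword first, then move outwards.
        hits.sort(key=lambda hit: hit[0][::-1])
    elif sort_by == "right":
        hits.sort(key=lambda hit: hit[2])
    return hits

def show(hits):
    """Print one hit per line with the keyword in square brackets."""
    for left, match, right in hits:
        print(" ".join(left).rjust(25), "[" + match + "]", " ".join(right))

show(kwic(SAMPLE, "cat", width=2, sort_by="right"))
```

Changing width alters the length of the context, and sort_by="left" groups the hits by the word immediately preceding the keyword, which is often the quickest way to spot recurring patterns.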

The Verbmobil appointment scheduling dialogues can be queried at the link below. They are available in English, German, Japanese, 'Denglish' (Germans speaking English), and translated German-English:

http://www.ims.uni-stuttgart.de/projekte/verbmobil/Dialogs/

At the same link you can also find information about word frequencies and about the tag sets used, which allow querying for syntactic patterns. The best search program is the advanced search; do not attempt what they call a linguistic search.
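Word frequency information of the kind mentioned above can also be computed for any text you have at hand. The following is a minimal sketch in Python; the function name word_frequencies and the sample dialogue are invented for illustration and are not taken from the Verbmobil data, and the tokenization is deliberately crude.

```python
import re
from collections import Counter

# Invented appointment-scheduling snippet, not actual corpus data.
DIALOGUE = "Can we meet on Monday? Monday is fine. We can meet at ten."

def word_frequencies(text):
    """Count how often each lowercased word form occurs in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

freqs = word_frequencies(DIALOGUE)

# Print the word forms from most to least frequent.
for word, count in freqs.most_common():
    print(count, word)
```

Because the counts are based on plain word forms, inflected variants such as "meet" and "meets" are counted separately; tagged or lemmatized corpora avoid this.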

Another useful link is WebCorp: http://www.webcorp.org.uk/index.html

Here you can query the texts available on the internet, together with their URLs; that is, the corpus you are using is the World Wide Web itself. You get the most readable results when you have them sent to you by e-mail.



Further corpus linguistics links:

Texts, text centres, resources and programs on the Web, compiled by Knut Hofland: http://www.hd.uib.no/text.htm

Michael Barlow's page: http://www.ruf.rice.edu/~barlow/corpus.html

The EAGLES Text Corpora Working Group: http://www.ilc.pi.cnr.it/EAGLES96/tcwg.html

A corpus linguistics tutorial by C. N. Ball: http://www.georgetown.edu/cball/corpora/tutorial.html

Tim Johns's Data-Driven Learning Page: http://web.bham.ac.uk/johnstf/timconc.htm

Stanford University's statistical NLP links: http://www-nlp.stanford.edu/links/statnlp.html

Textmining page by Henrik Heine: http://nats-www.informatik.uni-hamburg.de/~henrik/textmining/

Schlobinski's annotated link list: http://www.fbls.uni-hannover.de/sdls/schlobi/text-ton/korpora.htm

ICAME (an international organization of linguists and information scientists working with English corpora) page: http://www.hit.uib.no/icame.html



Further free corpora:

Trains Corpus: http://www.cs.rochester.edu/research/speech/93dialogs/

Online Speech Bank: http://www.americanrhetoric.com/speechbank.htm

English human-computer dialogues (e401, e403, e405, e406 are male; e402, e404, e407, and e408 are female speakers)

IViE Corpus: http://www.phon.ox.ac.uk/~esther/ivyweb/Beta_Version.html



Automatic Corpus Annotation:

IntraText Service: http://www.intratext.com/SelfServer/



Class Notes

There are some tasks for getting started with corpus queries here. You can solve these tasks by using the freely available concordancing programs listed above.