Corpus Linguistics

Kerstin Fischer, University of Bremen

On this page, I have collected some links to additional information and to publicly available tools and resources for doing corpus linguistics.


The following is Michael Barlow's web site with free concordancing programs for many different kinds of corpora:

Below is a selection of the links displayed on Michael Barlow's page; the following concordancing programs are free, quick, and easy to use:

CobuildDirect Corpus Sampler:

British National Corpus Sample Queries:

Texts by 'great authors':

More texts by 'great authors':

Below is a link to an online concordancing program for business and personal letters, letters by historical figures, and various literary texts, as well as some journalistic texts. The search facility is very convenient: it offers left and right sorting, adjustable context length, and display of the source:

Unfortunately, their server does not always respond.
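In case the terms are unfamiliar, here is a minimal Python sketch of what left and right sorting and length of context mean in a concordance. It is only an illustration and not the way any of the programs listed here are implemented; the function name `kwic`, its parameters, and the sample text are my own assumptions.

```python
import re

# Made-up sample text, used only for illustration.
SAMPLE = ("Thank you for your letter. I received your invoice today. "
          "Please send your letter soon.")

def kwic(text, keyword, width=3, sort_by="left"):
    """Return (left context, keyword, right context) for each hit.

    width   -- number of context words kept on each side
    sort_by -- "left" sorts by the words before the keyword (nearest first),
               "right" sorts by the words after the keyword
    """
    tokens = re.findall(r"\w+", text.lower())
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, token, right))

    if sort_by == "left":
        # Reverse the left context so the word nearest the keyword counts first.
        hits.sort(key=lambda hit: hit[0][::-1])
    elif sort_by == "right":
        hits.sort(key=lambda hit: hit[2])

    return [(" ".join(left), key, " ".join(right)) for left, key, right in hits]

if __name__ == "__main__":
    for left, key, right in kwic(SAMPLE, "your", width=3, sort_by="right"):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>25}  [{key}]  {right}")
```

Sorting on the right context groups together what follows the search word (here, both instances of "your letter" end up next to each other), while sorting on the left context groups what precedes it.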

The Verbmobil appointment-scheduling dialogues (English, German, Japanese, 'Denglish', i.e. Germans speaking English, and translated German-English) can be queried at:

At the same link you can also get information about word frequencies and about the tag sets used, which allow you to query for syntactic patterns. The best search program is the advanced search; do not attempt what they call a linguistic search.
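To give an idea of what word frequencies and a tag-based query for a syntactic pattern amount to, here is a small Python sketch. Everything in it is assumed for illustration: the tagged tokens are invented, the tags (`DET`, `ADJ`, `NOUN`, and so on) are not the Verbmobil tag set, and the function names are hypothetical.

```python
from collections import Counter

# Invented (word, tag) pairs; the tags are illustrative only
# and are NOT the Verbmobil tag set.
TAGGED = [
    ("the", "DET"), ("next", "ADJ"), ("meeting", "NOUN"),
    ("is", "VERB"), ("on", "PREP"), ("the", "DET"),
    ("first", "ADJ"), ("monday", "NOUN"), ("in", "PREP"),
    ("the", "DET"), ("month", "NOUN"),
]

def word_frequencies(tagged):
    """Count how often each word form occurs."""
    return Counter(word for word, _ in tagged)

def find_tag_pattern(tagged, pattern):
    """Return every word sequence whose tags match `pattern` exactly."""
    n = len(pattern)
    matches = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if [tag for _, tag in window] == list(pattern):
            matches.append(" ".join(word for word, _ in window))
    return matches

if __name__ == "__main__":
    print(word_frequencies(TAGGED).most_common(3))
    print(find_tag_pattern(TAGGED, ["DET", "ADJ", "NOUN"]))
```

The pattern is matched against the tags rather than the words, which is why a single query such as determiner + adjective + noun retrieves different word sequences.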

Another useful link is:

where you can query texts available on the internet together with their URLs; that is, the corpus you are using is the World Wide Web itself. The results are most readable when you have them sent to you by e-mail.

Further corpus linguistics links:

Texts, text centres, resources and programs on the Web, compiled by Knut Hofland:

Michael Barlow's page:

The EAGLES Text Corpora Working Group:

A corpus linguistics tutorial by C. N. Ball:

Tim Johns' Data-Driven Learning Page:

Stanford University:

Textmining page by Henrik Heine:

Schlobinski's commented link list:

ICAME (an international organization of linguists and information scientists working with English corpora) page:

Further free corpora:

Trains Corpus:

Online Speech Bank:

English human-computer dialogues (e401, e403, e405, and e406 are male speakers; e402, e404, e407, and e408 are female speakers)

IViE Corpus

Automatic Corpus Annotation:

IntraText Service:

Class Notes

There are some tasks for getting started with corpus queries here. You can solve these tasks by using the freely available concordancing programs listed above.