Kerstin Fischer, University of Bremen
On this page, I have put together some links where you can find
additional information and publicly available tools and resources
for doing corpus linguistics.
The following is Michael Barlow's web site with free concordancing programs for many different kinds of corpora:
http://www.ruf.rice.edu/~barlow/corpus.html#search
Below is a selection of the links displayed on Michael Barlow's page; the following concordancing programs are free, quick, and easy to use:
CobuildDirect Corpus Sampler: http://titania.cobuild.collins.co.uk/form.html
British National Corpus Sample Queries: http://sara.natcorp.ox.ac.uk/lookup.html
Texts by 'great authors': http://www.concordance.com/
More texts by 'great authors': http://www.dundee.ac.uk/english/wics/wics.htm
Below is a link to an online concordancing program for business and personal letters, letters by historical figures, various literary texts, and some journalistic texts. The search facility is very convenient with regard to left and right sorting, length of context, and display of the source:
http://isweb9.infoseek.co.jp/school/ysomeya/
Unfortunately, the server does not always respond.
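If you want to see what left and right sorting and length of context mean in practice, the following is a minimal sketch of a keyword-in-context (KWIC) concordancer in Python. It is not the program behind the link above; the function names (kwic, sort_left, sort_right, format_hit) and the simple word-based tokenization are my own assumptions for illustration.

```python
import re


def kwic(text, keyword, width=3):
    """Collect (left context, keyword, right context) for every hit.

    `width` is the length of context in words on each side.
    Tokenization is a simplifying assumption: lowercase word characters only.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, token, right))
    return hits


def sort_right(hits):
    """Right sorting: order hits by the words following the keyword."""
    return sorted(hits, key=lambda hit: hit[2])


def sort_left(hits):
    """Left sorting: order hits by the words preceding the keyword,
    starting with the word closest to it."""
    return sorted(hits, key=lambda hit: hit[0][::-1])


def format_hit(hit, column=30):
    """Align the keyword in one column, as concordancers usually do."""
    left, keyword, right = hit
    return f"{' '.join(left):>{column}}  [{keyword}]  {' '.join(right)}"


if __name__ == "__main__":
    sample = (
        "Thank you for your letter. We received the letter yesterday. "
        "A letter of thanks was sent."
    )
    for hit in sort_right(kwic(sample, "letter", width=2)):
        print(format_hit(hit))
```

Right sorting groups the hits by what follows the keyword, which makes patterns such as "letter of" easy to spot; left sorting does the same for what precedes it.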
The Verbmobil appointment scheduling dialogues (English, German, Japanese, 'Denglish', i.e. Germans speaking English, and translated German-English) can be queried at:
http://www.ims.uni-stuttgart.de/projekte/verbmobil/Dialogs/
At the same link you can also get information about word frequencies and the tag sets used (which allow querying for syntactic patterns). The best search option is the advanced search; do not attempt what they call a linguistic search.
Another useful link is: http://www.webcorp.org.uk/index.html
Here you can query the texts available on the internet, together with their URLs; that is, the corpus you are using is the World Wide Web itself. You get the most readable results when you have them sent to you by e-mail.
Texts, text centres, resources and programs on the Web, compiled by Knut Hofland: http://www.hd.uib.no/text.htm
Michael Barlow's page: http://www.ruf.rice.edu/~barlow/corpus.html
The EAGLES Text Corpora Working Group: http://www.ilc.pi.cnr.it/EAGLES96/tcwg.html
A corpus linguistics tutorial by C. N. Ball: http://www.georgetown.edu/cball/corpora/tutorial.html
Tim Johns's Data-Driven Learning Page: http://web.bham.ac.uk/johnstf/timconc.htm
Stanford University: http://www-nlp.stanford.edu/links/statnlp.html
Text mining page by Henrik Heine: http://nats-www.informatik.uni-hamburg.de/~henrik/textmining/
Schlobinski's commented link list: http://www.fbls.uni-hannover.de/sdls/schlobi/text-ton/korpora.htm
ICAME (an international organization of linguists and information scientists working with English corpora) page: http://www.hit.uib.no/icame.html
Trains Corpus: http://www.cs.rochester.edu/research/speech/93dialogs/
Online Speech Bank: http://www.americanrhetoric.com/speechbank.htm
English human-computer dialogues (e401, e403, e405, e406 are male; e402, e404, e407, and e408 are female speakers)
IViE Corpus: http://www.phon.ox.ac.uk/~esther/ivyweb/Beta_Version.html
IntraText Service: http://www.intratext.com/SelfServer/
There are some tasks for getting started with corpus queries here. You can solve these tasks by using the freely available concordancing programs listed above.