Corpus Preparation and Corpus Methods
Querying corpora: Getting started
John Bateman and Kerstin Fischer
17 April, 2002
1. Try answering some of the questions below by means of corpus queries:
What's wrong with `allow to investigate...'?
Is `would' in if-clauses always wrong?
What's the spelling of the Italian vegetable beginning with `bro...'?
Is it `wholistic/wholism' or `holistic/holism'?
Is `disinterested' or `uninterested' correct?
Is it `a book/paper/article/lecture on' or `about' something?
What is the difference between `sheer' and `pure'? Some answers may be available here.
(What you should consider in answering such questions:
What does the question involve, what could the problems be?
What could be a suitable corpus or suitable corpora for answering this query?
What kinds of topics should my corpus deal with? For instance, dialogues on appointment scheduling may not be particularly useful for answering questions about vegetables.
How old should the texts in my corpus be? For instance, can Shakespeare's plays help me determine whether to write `wholism' or `holism'?
Is written or spoken language more suitable or is it of no relevance? For instance, `oh' may have very different functions in spoken and written language, whereas checking a transcribed corpus for the spelling of a particular word may tell us only something about transcription conventions, not about language use.
Do I need frequency information? In this case a concordance program that yields only 40 examples may not be useful.
Do I need syntactic information? In this case I may want to choose a tagged corpus.
How should I formulate my query? For instance, where a word has several possible spellings, it may be useful to run several queries or to use a query syntax that allows alternatives.
How valid are my results? Is there good reason to suppose that querying more corpora would be necessary for a well-founded answer?)
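Two of the considerations above, frequency information and a query syntax that allows alternatives, can be illustrated with a minimal sketch in Python. It is not the concordance program used in this course; the toy corpus, the function name `query` and the context width are assumptions made up for illustration, and the invented sentences say nothing about actual usage. A single regular expression covers all four spellings, and the function returns both a count per form and simple keyword-in-context lines.

```python
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only. In practice this
# would be text loaded from a corpus suited to the question
# (topic, period, written vs. spoken).
CORPUS = (
    "A holistic approach treats the system as a whole. "
    "Holism is often contrasted with reductionism. "
    "One reviewer wrote wholistic where the editor expected holistic."
)

def query(pattern, text, width=30):
    """Return (frequency counts per matched form, keyword-in-context lines)."""
    regex = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    lines = []
    for m in regex.finditer(text):
        # Count each matched form, ignoring case.
        counts[m.group(0).lower()] += 1
        # Keep up to `width` characters of context on either side.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return counts, lines

# One query with alternatives: an optional initial "w" and either ending.
counts, kwic = query(r"\bw?holis(?:tic|m)\b", CORPUS)

for form, n in counts.most_common():
    print(form, n)
for line in kwic:
    print(line)
```

Because the counts cover every match rather than a fixed number of examples, the same sketch also shows why a tool that returns only the first 40 hits is not enough when relative frequency is what the question asks for.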
Now try to formulate some questions yourself.