CorpusSurvey

Interesting Products of the LinguisticDataConsortium

| Pos | Name | ID | Membership Year(s) | Distribution | Non-Member Price | Short Description |
| 1 | AclDciCorpus | LDC93T1 | 1993 | 1 CD | $100 | grammatically tagged and parsed materials from the Penn Treebank |
| 2 | TipsterCorpus | LDC93T3A | 1993 | 3 CDs | $250 | data similar to MUC; SGML, Q&A, information retrieval |
| 3 | CelexLexicon | LDC96L14 | 1995, 1996 | 1 CD | $100 | lexical databases of English (41,000 lemmas), Dutch, and German (51,728 lemmas) |
| 4 | MucSixAddonCorpus | LDC96T10 | 1996 | download | $100 | additional training data, tagged but not annotated; needs Muc6Corpus to replicate the evaluation results |
| 5 | BllipCorpus | LDC2000T43 | 2000 | 2 CDs | $100 | WSJ text statistically parsed by the Charniak parser |
| 6 | MucSevenCorpus | LDC2001T02 | 2001 | download | $100 | evaluation data for the Message Understanding Conference no. 7 |
| 7 | MucSixCorpus | LDC2003T13 | 2003 | download | $100 | 318 annotated Wall Street Journal articles used in the MUC6 evaluation |
| 8 | AceTwoCorpus | LDC2003T11 | 2003 | download | $500 | automatic content extraction, information detection |
| 9 | EnglishGigawordCorpus | LDC2003T05 | 2003 | 1 DVD | $2500 | |
| 10 | SaidCorpus | LDC2003T10 | 2003 | download | $200 | A Syntactically Annotated Idiom Dataset |
| 11 | AmericanNationalCorpus | LDC2003T20 | 2003 | 1 CD | $75 | first release; see also BritishNationalCorpus |

Others

Related Topics: IngosCorporaMail, HeiseGrammar, ExternalLinks, WeightedConstraintDependencyGrammars
 