Pos | Name | ID | Membership Year(s) | Distribution | Non-Member Price | Short Description |
---|---|---|---|---|---|---|
1 | AclDciCorpus | LDC93T1 | 1993 | 1 CD | $100 | grammatically tagged and parsed materials from the Penn Treebank |
2 | TipsterCorpus | LDC93T3A | 1993 | 3 CDs | $250 | data similar to MUC, SGML, Q&A, information retrieval |
3 | CelexLexicon | LDC96L14 | 1995, 1996 | 1 CD | $100 | lexical databases of English (41,000 lemmas), Dutch, and German (51,728 lemmas) |
4 | MucSixAddonCorpus | LDC96T10 | 1996 | download | $100 | additional training data, tagged but not annotated; needs MucSixCorpus to replicate evaluation results |
5 | BllipCorpus | LDC2000T43 | 2000 | 2 CDs | $100 | WSJ text statistically parsed by the Charniak parser |
6 | MucSevenCorpus | LDC2001T02 | 2001 | download | $100 | evaluation data for the Message Understanding Conference no. 7 |
7 | MucSixCorpus | LDC2003T13 | 2003 | download | $100 | 318 annotated Wall Street Journal articles used in MUC6 evaluation |
8 | AceTwoCorpus | LDC2003T11 | 2003 | download | $500 | automatic content extraction, information detection |
9 | EnglishGigawordCorpus | LDC2003T05 | 2003 | 1 DVD | $2500 | |
10 | SaidCorpus | LDC2003T10 | 2003 | download | $200 | A Syntactically Annotated Idiom Dataset |
11 | AmericanNationalCorpus | LDC2003T20 | 2003 | 1 CD | $75 | first release, see also BritishNationalCorpus |