NAIST Text Corpus
(version 1.5)

2010-08-23(Mon) Changed the converted format for ease of use. Instead of the XML format, we adopted KyotoCorpus- and CabochaOutput-style formats.
Added tags of anaphoric relations of determiners and pronouns.
Added tags of noun classes of event nouns.
2010-08-23(Mon) Added tags of anaphoric relations of determiners and pronouns.
2006-11-20(Mon) Fixed crossing elements in XML.
2006-10-17(Tue) Added instructions for converting raw data into XML format to the README.
2006-10-06(Fri) First beta release of the corpus tagged with predicate-argument and coreference relations.

Fill out the form below and press the submit button; the download starts immediately.

Name
Affiliation
E-mail

We annotated the same portion of the Mainichi Shimbun newspaper that is used for the Kyoto Text Corpus. It contains all articles (ca. 20,000 sentences) from 1 January 1995 to 17 January 1995, and all editorial articles (ca. 20,000 sentences) from January to December 1995. We annotated the corpus with predicate-argument relations (surface cases: nominative, accusative, and dative), event nouns and their argument relations (surface cases: nominative, accusative, and dative), and coreference information.


Instructions for converting raw data into KyotoCorpus- or CabochaOutput-style format

We distribute only the tag information and its offsets. To obtain KyotoCorpus- or CabochaOutput-formatted data, you need

The instructions are as follows:

You will need a UNIX system to convert the NAIST Text Corpus into the above formats.
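
Because only tag information and offsets are distributed, the annotations have to be re-attached to the original newspaper text before they can be used. The general idea of such offset-based (standoff) annotation can be sketched in Python as below. This is only an illustration: the record layout (start, end, label), the use of character offsets, the label names, and the function name apply_standoff_tags are all assumptions made for this example, not the actual NAIST Text Corpus file format or conversion tool.

```python
from typing import List, Tuple

# Hypothetical standoff record: (start offset, end offset, label).
# Offsets are character positions, end-exclusive. This layout is an
# assumption for illustration, not the NAIST distribution format.
Tag = Tuple[int, int, str]


def apply_standoff_tags(text: str, tags: List[Tag]) -> str:
    """Return the text with each tagged span wrapped as [label:span]."""
    pieces = []
    pos = 0
    for start, end, label in sorted(tags):
        # Reject spans that overlap an earlier span or fall outside the text.
        if start < pos or end > len(text) or start > end:
            raise ValueError(f"invalid or overlapping span: {(start, end, label)}")
        pieces.append(text[pos:start])                 # untagged text before the span
        pieces.append(f"[{label}:{text[start:end]}]")  # the tagged span itself
        pos = end
    pieces.append(text[pos:])                          # remaining untagged text
    return "".join(pieces)


# Made-up sentence and labels, used only to show the mechanics.
raw = "太郎が本を読んだ"
tags = [(0, 2, "GA"), (3, 4, "WO"), (5, 8, "PRED")]
print(apply_standoff_tags(raw, tags))
```

The sketch fails loudly on overlapping or out-of-range spans, since offsets that do not line up with the raw text usually mean the text and the tag file come from different sources or encodings.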



Please send any questions, suggestions and comments to ryu-i@cl.cs.titech.ac.jp.