$Id: index.html 14 2006-12-05 15:21:13Z nldata $
NAIST Text Corpus
(version 1.4β)

2006-11-20(Mon) Fixed crossing elements in XML.
2006-10-17(Tue) Added instructions for converting raw data into XML format to the README.
2006-10-06(Fri) First beta release of the corpus tagged with predicate-argument and coreference relations.

Fill out the form below and press the submit button; the download will start immediately.

Name
Affiliation
E-mail

We annotated the same portion of the Mainichi Shimbun newspaper that is used for the Kyoto Text Corpus. It contains all articles (ca. 20,000 sentences) published from 1 January 1995 to 17 January 1995, and all editorial articles (ca. 20,000 sentences) from January to December of the same year. We annotated the corpus with predicate-argument relations (surface cases: nominative, accusative, and dative), event nouns and their arguments (surface cases: nominative, accusative, and dative), and coreference information.

References


Instructions for converting raw data into XML format

We distribute only the anaphor- and coreference-tagged corpus. To obtain the full XML-formatted data, you need

The instructions are as follows:

The XML-formatted data (UTF-8 encoded) will reside in the xml/ directory.

You will need a UNIX system to convert the NAIST Text Corpus into XML format.
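
Once the conversion has finished, a quick sanity check on the contents of the xml/ directory can be useful. The following is a minimal Python sketch, not part of the distribution. It assumes that the converted files carry a .xml extension, which this page does not state, and it makes no assumption about element names: it simply parses each file and counts how often each tag occurs. The function name count_tags is hypothetical.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_tags(xml_dir):
    """Count element tag names across all *.xml files found under xml_dir.

    Assumption: converted files end in ".xml". No element names are assumed;
    whatever tags the files contain are tallied as-is.
    """
    counts = Counter()
    for path in sorted(Path(xml_dir).rglob("*.xml")):
        # ElementTree reads the file as bytes and honours the encoding
        # declared in the XML prolog (UTF-8 when none is declared).
        tree = ET.parse(path)
        for element in tree.iter():
            counts[element.tag] += 1
    return counts
```

Calling count_tags("xml") from the directory that contains xml/ returns a Counter mapping tag names to occurrence counts. A file that is not well-formed raises xml.etree.ElementTree.ParseError, which makes a failed conversion easy to spot.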



Please send any questions, suggestions and comments to ryu-i@is.naist.jp.