| 2006-11-20 (Mon) | Fixed crossing (improperly nested) elements in the XML output. |
| 2006-10-17 (Tue) | Added instructions for converting the raw data into XML format to the README. |
| 2006-10-06 (Fri) | First beta release of the corpus tagged with predicate-argument and coreference relations. |
Fill out the form below and press the submit button; the download will start
immediately.
We annotated the same portion of the Mainichi Shimbun newspaper that is used in the Kyoto Text Corpus. It consists of all articles dated from 1 January to 17 January 1995 (ca. 20,000 sentences) and all editorial articles from January to December 1995 (ca. 20,000 sentences). We annotated this text with predicate-argument relations (surface cases: nominative, accusative, and dative), event nouns and their argument relations (the same three surface cases), and coreference relations.
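To make the three annotation layers described above concrete, here is a minimal sketch of how one might hold them in memory after loading the corpus. Everything in it is a hypothetical illustration: the class names, the `kind` labels, and the case keys `ga`/`o`/`ni` (the Japanese case particles for nominative, accusative, and dative) are assumptions, not the corpus's actual XML schema, and the two-sentence example is invented rather than taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical keys for the three annotated surface cases.
SURFACE_CASES = ("ga", "o", "ni")  # nominative, accusative, dative


@dataclass
class Mention:
    id: str
    text: str


@dataclass
class Predicate:
    text: str
    kind: str  # hypothetical labels: "pred" (verb/adjective) or "event_noun"
    args: Dict[str, str] = field(default_factory=dict)  # surface case -> mention id


@dataclass
class Document:
    mentions: Dict[str, Mention]
    predicates: List[Predicate]
    coref_chains: List[List[str]]  # each chain lists the ids of coreferent mentions

    def argument(self, pred: Predicate, case: str) -> Optional[str]:
        """Return the text of the mention filling `case` for `pred`, or None."""
        if case not in SURFACE_CASES:
            raise ValueError(f"unknown surface case: {case}")
        mention_id = pred.args.get(case)
        return self.mentions[mention_id].text if mention_id else None

    def chain_of(self, mention_id: str) -> List[str]:
        """Return the texts of all mentions coreferent with `mention_id`."""
        for chain in self.coref_chains:
            if mention_id in chain:
                return [self.mentions[m].text for m in chain]
        return [self.mentions[mention_id].text]


def example_document() -> Document:
    """Invented example: 太郎は本を買った。彼はそれを読んだ。 (Taro bought a book. He read it.)"""
    mentions = {m.id: m for m in [
        Mention("m1", "太郎"), Mention("m2", "本"),
        Mention("m3", "彼"), Mention("m4", "それ"),
    ]}
    predicates = [
        Predicate("買った", "pred", {"ga": "m1", "o": "m2"}),
        Predicate("読んだ", "pred", {"ga": "m3", "o": "m4"}),
    ]
    return Document(mentions, predicates, coref_chains=[["m1", "m3"], ["m2", "m4"]])
```

For example, `example_document().argument(...)` on the second predicate with case `"ga"` gives `彼`, and following the coreference chain of `m3` links it back to `太郎`. Keeping arguments as mention ids rather than strings is what lets the predicate-argument and coreference layers be combined this way.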
References
- Ryu Iida, Mamoru Komachi, Kentaro Inui, and Yuji Matsumoto. Annotating a Japanese Text Corpus with Predicate-Argument and Coreference Relations. ACL Workshop "Linguistic Annotation Workshop", pp. 132-139, 2007.
Instructions for converting the raw data into XML format
We distribute only the anaphora and coreference annotations; the newspaper text itself is not included. To obtain the full XML-formatted data, you need:
- the Mainichi Shimbun Newspaper 1995 CD-ROM
- Perl 5.8.6 or higher
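Before converting, it can help to confirm that the installed Perl meets the 5.8.6 minimum. The sketch below is an assumption-laden helper, not part of the distribution: it asks `perl` for its numeric version variable `$]` (which prints, for example, `5.008006` for Perl 5.8.6) and compares the parsed triple against the minimum.

```python
import shutil
import subprocess
from typing import Tuple

MIN_PERL = (5, 8, 6)  # minimum version stated in this README


def parse_perl_version(numeric: str) -> Tuple[int, int, int]:
    """Convert Perl's numeric $] value (e.g. "5.008006") into (5, 8, 6)."""
    major, _, frac = numeric.strip().partition(".")
    frac = (frac + "000000")[:6]  # pad short values such as "5.01" to "010000"
    return int(major), int(frac[:3]), int(frac[3:6])


def installed_perl_version() -> Tuple[int, int, int]:
    """Ask the perl found on PATH for its version; raise if perl is missing."""
    perl = shutil.which("perl")
    if perl is None:
        raise RuntimeError("perl not found on PATH")
    out = subprocess.run([perl, "-e", "print $]"],
                         capture_output=True, text=True, check=True)
    return parse_perl_version(out.stdout)


def perl_is_new_enough(version: Tuple[int, int, int],
                       minimum: Tuple[int, int, int] = MIN_PERL) -> bool:
    """Tuple comparison orders versions component by component."""
    return tuple(version) >= minimum
```

Usage: `perl_is_new_enough(installed_perl_version())` returns `True` when the local Perl is recent enough. Comparing integer tuples avoids the pitfall of comparing version strings, where `"5.10"` would sort before `"5.8"`.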
The conversion steps are as follows:
- Mount the Mainichi Shimbun Newspaper 1995 CD-ROM.
- Run ./auto_conv -d /mnt/cdrom (replace /mnt/cdrom with the mount point of the CD-ROM).
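The two steps above can be wrapped in a small script that fails early with a readable message. This is a hypothetical sketch: it assumes `auto_conv` sits in the corpus root (the README invokes it as `./auto_conv`, so the script runs it from that directory), uses only the `-d` option documented here, and checks merely that the mount point is a directory, not that the disc is actually mounted.

```python
import subprocess
from pathlib import Path
from typing import List


def auto_conv_command(mountpoint: str) -> List[str]:
    """Build the command documented in this README: ./auto_conv -d <mountpoint>."""
    return ["./auto_conv", "-d", mountpoint]


def run_conversion(corpus_dir: str, mountpoint: str) -> int:
    """Run auto_conv from `corpus_dir` (hypothetical layout) and return its exit code."""
    corpus = Path(corpus_dir)
    mount = Path(mountpoint)
    if not (corpus / "auto_conv").is_file():
        raise FileNotFoundError(f"auto_conv not found in {corpus}")
    if not mount.is_dir():
        # Only checks for a directory; os.path.ismount would be a stricter test.
        raise NotADirectoryError(f"{mount} is not a directory; is the CD-ROM mounted?")
    # cwd=corpus so that the relative path ./auto_conv resolves as in the README.
    result = subprocess.run(auto_conv_command(str(mount)), cwd=corpus)
    return result.returncode
```

For example, `run_conversion(".", "/media/cdrom")` runs the converter from the current directory against a disc mounted at `/media/cdrom`; a non-zero return value means `auto_conv` reported a failure.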
The UTF-8-encoded XML data will be written to the xml/ directory.
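After conversion, a quick sanity check is to confirm that every output file is valid UTF-8 and well-formed XML; crossing elements like those fixed in the 2006-11-20 release would surface here as parse errors. This sketch uses only the Python standard library and assumes the output files end in `.xml`, which this README does not state.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Optional


def xml_problem(data: bytes) -> Optional[str]:
    """Return a description of the first problem found in `data`, or None if it is fine."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8: {exc}"
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        # Crossing (improperly nested) elements are reported as mismatched tags.
        return f"not well-formed: {exc}"
    return None


def check_xml_dir(xml_dir: str = "xml", pattern: str = "*.xml") -> Dict[str, str]:
    """Map each failing file under `xml_dir` to its problem; empty means all passed."""
    problems = {}
    for path in sorted(Path(xml_dir).rglob(pattern)):
        problem = xml_problem(path.read_bytes())
        if problem is not None:
            problems[str(path)] = problem
    return problems
```

Running `check_xml_dir()` from the corpus root after `auto_conv` finishes should return an empty dictionary; any entries point at files worth reporting.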
A UNIX system is required to convert the NAIST Text Corpus into XML format.
Please send any questions, suggestions and comments to ryu-i@is.naist.jp.