Code and Data

Code

AROW++
Yet another AROW (Adaptive Regularization of Weights) tool.
CaboCha
A Japanese dependency parser (Japanese page).
ChaIME
A stochastic input method editor with a Google Japanese n-gram language model.
ChaSen
A morphological analysis system (Japanese page).
Coordinate structure analysis code
An implementation of the coordinate structure analysis and learning method presented in Hara et al. (2009).
lda
A Latent Dirichlet Allocation package written in MATLAB and C.
MeCab
Yet another morphological analysis system (Japanese page).
SynCha
A Japanese predicate-argument analysis system.
TinySVM
An implementation of Support Vector Machines.
VisualMorphs
A corpus annotation tool with a graphical user interface.
YamCha: Yet Another Multipurpose CHunk Annotator
A generic text chunker suitable for diverse NLP tasks.

Data

NAIST Text Corpus
A Japanese corpus annotated with co-reference and predicate-argument relations.
NLP citation network data
A network of 3000 citations extracted from papers on Natural Language Processing.
Lang-8 Learner Corpora
Language learner corpora extracted from Lang-8, a language exchange social networking site.