The Ubuntu NLP Repository v6.06
Powered by Falcon

Component nlp

Language-independent NLP tools.

To install these packages with apt, add the following lines to /etc/apt/sources.list and run sudo apt-get update to enable downloads from this component.

Don't forget to read the notice on the front page!

deb http://cl.naist.jp/~eric-n/ubuntu-nlp dapper nlp
deb-src http://cl.naist.jp/~eric-n/ubuntu-nlp dapper nlp
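Each of the two lines above follows the one-line sources.list format: an entry type (deb for binary packages, deb-src for source packages), the repository URI, the suite (here dapper), and one or more components (here nlp). The Python sketch below splits such a line into those fields. It assumes the plain form without an [option=value] block or trailing comment; apt's own parser handles more cases than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceEntry:
    kind: str               # "deb" (binary packages) or "deb-src" (source packages)
    uri: str                # base URI of the repository
    suite: str              # distribution codename, e.g. "dapper"
    components: List[str]   # components to enable, e.g. ["nlp"]


def parse_sources_line(line: str) -> SourceEntry:
    """Split a simple one-line sources.list entry into its fields.

    Assumes the plain form "TYPE URI SUITE COMPONENT...": no "[option=value]"
    block, no trailing comment, and at least one component.
    """
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("deb", "deb-src"):
        raise ValueError(f"not a simple deb/deb-src line: {line!r}")
    kind, uri, suite, *components = fields
    return SourceEntry(kind, uri, suite, components)


entry = parse_sources_line("deb http://cl.naist.jp/~eric-n/ubuntu-nlp dapper nlp")
print(entry.suite, entry.components)  # dapper ['nlp']
```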

Packages

giza-pp
Version:1:1.0.3-3nlp1~0dapper1
Source (dsc):giza-pp_1.0.3-3nlp1~0dapper1.dsc
Source (tar.gz):giza-pp_1.0.3-3nlp1~0dapper1.tar.gz
giza++
Description:A tool for training statistical alignment models

GIZA++: Training of statistical translation models.

GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT),
which was developed by the Statistical Machine Translation team during the
summer workshop in 1999 at the Center for Language and Speech Processing
at Johns Hopkins University (CLSP/JHU). GIZA++ adds many features to
GIZA. The extensions of GIZA++ were designed and written by Franz Josef
Och.

About GIZA++

The program includes the following extensions to GIZA:

* IBM Model 4;
* IBM Model 5;
* Alignment models depending on word classes;
* An implementation of the HMM alignment model: Baum-Welch training, Forward-Backward
algorithm, empty word, dependency on word classes, transfer to fertility
models;
* A variant of Model 3 and Model 4 that allows training of the
parameter p_0;
* Various smoothing techniques for fertility and distortion/alignment parameters;
* Significantly more efficient training of the fertility models;
* A correct implementation of pegging as described in Brown et al. (1993), plus a
series of heuristics that make pegging sufficiently efficient.
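The standard GIZA++ training schedule begins with IBM Model 1 before moving on to the HMM model and Models 3 to 5. The Python sketch below illustrates the EM training this model family shares, using Model 1 with the empty (NULL) word. It is a minimal illustration, not GIZA++'s code: Model 1 ignores word order, and the later models add alignment, fertility and distortion parameters on top of the lexical table t(f | e). The three-sentence corpus is made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NULL = "<NULL>"  # the empty word: lets a foreign word align to nothing


def train_ibm_model1(corpus: List[Tuple[List[str], List[str]]],
                     iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM training of IBM Model 1 lexical probabilities t[(f, e)] = p(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Initialise every co-occurring pair (including NULL) uniformly.
    t = {(f, e): 1.0 / len(f_vocab)
         for fs, es in corpus for f in fs for e in [NULL] + es}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count c(f, e)
        total = defaultdict(float)  # expected count c(e)
        for fs, es in corpus:
            es = [NULL] + es
            for f in fs:
                # E-step: posterior probability that f was generated by each e
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalise the expected counts into probabilities
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus, iterations=10)
```

After ten iterations on this toy corpus, t[("Haus", "house")] is larger than t[("das", "house")], because "das" is increasingly explained by "the".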

For more information, consult the following publication:

@ARTICLE{och03:asc,
AUTHOR = {Franz Josef Och and Hermann Ney},
TITLE = {A Systematic Comparison of Various Statistical Alignment Models},
JOURNAL= {Computational Linguistics},
NUMBER = 1,
VOLUME = 29,
YEAR = 2003,
PAGES = {19--51}}

or the GIZA++ project homepage <http://www.fjoch.com/GIZA++.html>

Package:giza++_1.0.3-3nlp1~0dapper1_i386.deb
Package:giza++_1.0.3-3nlp1~0dapper1_amd64.deb
mkcls
Description:A tool for training word classes

mkcls: word class training with a maximum-likelihood criterion.

mkcls is a tool to train word classes using a maximum-likelihood criterion.
The resulting word classes are especially suited for language models or
statistical translation models. The program mkcls was written by Franz Josef
Och.
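As a rough illustration of what a maximum-likelihood criterion for word classes can look like, the sketch below scores a clustering by the log-likelihood of a class bigram model, p(w_i | w_{i-1}) = p(C(w_i) | C(w_{i-1})) p(w_i | C(w_i)), and improves it with a simple greedy exchange search. This is a textbook formulation assumed for illustration; mkcls's own optimization procedure may differ, and recomputing the likelihood from scratch for every candidate move is far too slow for real corpora.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def class_bigram_loglik(tokens: List[str], cls: Dict[str, int]) -> float:
    """Log-likelihood of tokens under a class bigram model with ML estimates."""
    bigrams = list(zip(tokens, tokens[1:]))
    cc = Counter((cls[a], cls[b]) for a, b in bigrams)  # class bigram counts
    left = Counter(cls[a] for a, _ in bigrams)          # class as predecessor
    right = Counter(cls[b] for _, b in bigrams)         # class as successor
    word_right = Counter(b for _, b in bigrams)         # word as successor
    ll = 0.0
    for a, b in bigrams:
        ca, cb = cls[a], cls[b]
        ll += math.log(cc[(ca, cb)] / left[ca])         # p(C(b) | C(a))
        ll += math.log(word_right[b] / right[cb])       # p(b | C(b))
    return ll


def exchange(tokens: List[str], num_classes: int,
             max_rounds: int = 10) -> Tuple[Dict[str, int], float]:
    """Greedy exchange: move each word to whichever class maximises the log-likelihood."""
    vocab = sorted(set(tokens))
    cls = {w: i % num_classes for i, w in enumerate(vocab)}  # arbitrary initial clustering
    best = class_bigram_loglik(tokens, cls)
    for _ in range(max_rounds):
        improved = False
        for w in vocab:
            keep = cls[w]
            for c in range(num_classes):
                if c == keep:
                    continue
                cls[w] = c
                ll = class_bigram_loglik(tokens, cls)
                if ll > best + 1e-12:
                    best, keep, improved = ll, c, True
            cls[w] = keep  # settle on the best class found for w
        if not improved:
            break
    return cls, best


tokens = "the cat runs . the dog runs . a cat sleeps . a dog sleeps .".split()
classes, ll = exchange(tokens, num_classes=3)
```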

For more information, consult the following publication:

* Franz Josef Och: "An Efficient Method for Determining Bilingual Word
Classes"; pp. 71-76, Ninth Conf. of the Europ. Chapter of the Association
for Computational Linguistics; EACL'99, Bergen, Norway, June 1999.

or the mkcls project homepage <http://www.fjoch.com/mkcls.html>

Package:mkcls_1.0.3-3nlp1~0dapper1_i386.deb
Package:mkcls_1.0.3-3nlp1~0dapper1_amd64.deb

mgiza++
Version:0.1-1nlp3~0dapper1
Source (dsc):mgiza++_0.1-1nlp3~0dapper1.dsc
Source (tar.gz):mgiza++_0.1-1nlp3~0dapper1.tar.gz
mgiza++
Description:A multi-threaded tool for training statistical alignment models

Multi-threaded GIZA++ (MGIZA++) is an extension of the GIZA++ word alignment
tool, written by Qin Gao <qing@cs.cmu.edu> of CMU. On machines with more than
one CPU it trains much faster than the original GIZA++. In addition, it fixes
some bugs in GIZA, and the final alignment perplexity is generally lower
than with the original GIZA++.
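Expected counts in the EM E-step are sums over sentence pairs, so one way to use several CPUs is to split the corpus into shards, compute partial counts per shard in parallel, and add them up before the M-step. The Python sketch below shows this for IBM Model 1; it is an assumption about the general approach, not MGIZA++'s code. In CPython the global interpreter lock prevents a real speed-up for this pure-Python loop; the point is that merging per-shard counts gives the same table as a single pass, up to floating-point rounding.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

NULL = "<NULL>"  # the empty word
Pair = Tuple[str, str]
Corpus = List[Tuple[List[str], List[str]]]


def uniform_table(corpus: Corpus) -> Dict[Pair, float]:
    """Uniform t[(f, e)] over every co-occurring pair, including the empty word."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    return {(f, e): 1.0 / len(f_vocab)
            for fs, es in corpus for f in fs for e in [NULL] + es}


def expected_counts(shard: Corpus, t: Dict[Pair, float]):
    """IBM Model 1 E-step over one shard of the corpus."""
    count, total = Counter(), Counter()
    for fs, es in shard:
        es = [NULL] + es
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    return count, total


def em_iteration(corpus: Corpus, t: Dict[Pair, float], workers: int = 4) -> Dict[Pair, float]:
    """One EM iteration: E-step per shard in parallel, then merged counts and M-step."""
    shards = [corpus[i::workers] for i in range(workers)]
    count, total = Counter(), Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for shard_count, shard_total in pool.map(lambda s: expected_counts(s, t), shards):
            count.update(shard_count)  # Counter.update adds, so partial counts sum up
            total.update(shard_total)
    return {(f, e): c / total[e] for (f, e), c in count.items()}


corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t0 = uniform_table(corpus)
parallel = em_iteration(corpus, t0, workers=3)
```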

GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT),
which was developed by the Statistical Machine Translation team during the
summer workshop in 1999 at the Center for Language and Speech Processing
at Johns Hopkins University (CLSP/JHU). GIZA++ adds many features to
GIZA. The extensions of GIZA++ were designed and written by Franz Josef
Och.

About GIZA++

The program includes the following extensions to GIZA:

* IBM Model 4;
* IBM Model 5;
* Alignment models depending on word classes;
* An implementation of the HMM alignment model: Baum-Welch training, Forward-Backward
algorithm, empty word, dependency on word classes, transfer to fertility
models;
* A variant of Model 3 and Model 4 that allows training of the
parameter p_0;
* Various smoothing techniques for fertility and distortion/alignment parameters;
* Significantly more efficient training of the fertility models;
* A correct implementation of pegging as described in Brown et al. (1993), plus a
series of heuristics that make pegging sufficiently efficient.

For more information, consult the following publication:

@ARTICLE{och03:asc,
AUTHOR = {Franz Josef Och and Hermann Ney},
TITLE = {A Systematic Comparison of Various Statistical Alignment Models},
JOURNAL= {Computational Linguistics},
NUMBER = 1,
VOLUME = 29,
YEAR = 2003,
PAGES = {19--51}}

or the GIZA++ project homepage <http://www.fjoch.com/GIZA++.html>
or Qin Gao's homepage <http://www.cs.cmu.edu/~qing/>

Package:mgiza++_0.1-1nlp3~0dapper1_i386.deb
Package:mgiza++_0.1-1nlp3~0dapper1_amd64.deb

moses
Version:20090831svn-1nlp1
Source (dsc):moses_20090831svn-1nlp1.dsc
Source (tar.gz):moses_20090831svn-1nlp1.tar.gz
moses
Description:Moses: a factored phrase-based beam-search decoder for machine translation

Moses is a statistical machine translation system that allows you to automatically train translation
models for any language pair. All you need is a collection of translated texts (parallel corpus).
* beam-search: an efficient search algorithm that quickly finds a high-probability translation
among the exponential number of choices
* phrase-based: the state-of-the-art approach in statistical machine translation, which translates
short text chunks as units
* factored: words may have factored representation (surface forms, lemma, part-of-speech,
morphology, word classes...)
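The beam-search and phrase-based ideas above can be sketched as a tiny monotone decoder: hypotheses are grouped into stacks by how many source words they cover, each stack is pruned to its best beam_size entries, and a hypothesis is extended by translating the next source phrase. The phrase table and bigram language-model scores are made-up toy values, and real Moses decoding also handles reordering, future-cost estimation and hypothesis recombination, none of which this sketch attempts.

```python
import math
from typing import Dict, List, Optional, Tuple

Phrase = Tuple[str, ...]
PhraseTable = Dict[Phrase, List[Tuple[Phrase, float]]]  # source phrase -> [(target phrase, log p)]
BigramLM = Dict[Tuple[str, str], float]                 # (previous word, word) -> log p


def beam_decode(src: List[str], table: PhraseTable, lm: BigramLM,
                beam_size: int = 5, max_phrase_len: int = 3,
                lm_floor: float = -10.0) -> Optional[Tuple[str, float]]:
    """Monotone phrase-based beam search; stacks[i] holds hypotheses covering src[:i]."""
    # A hypothesis is (log score, last target word, target words so far).
    stacks: List[List[Tuple[float, str, Phrase]]] = [[] for _ in range(len(src) + 1)]
    stacks[0].append((0.0, "<s>", ()))
    for i in range(len(src)):
        # Histogram pruning: expand only the beam_size best hypotheses.
        stacks[i] = sorted(stacks[i], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, last, out in stacks[i]:
            for j in range(i + 1, min(len(src), i + max_phrase_len) + 1):
                for tgt, tm_logp in table.get(tuple(src[i:j]), []):
                    s, prev = score + tm_logp, last
                    for w in tgt:  # add the bigram LM score word by word
                        s += lm.get((prev, w), lm_floor)
                        prev = w
                    stacks[j].append((s, prev, out + tgt))
    finals = [(s + lm.get((last, "</s>"), lm_floor), out) for s, last, out in stacks[-1]]
    if not finals:
        return None  # no phrase-table path covers the whole sentence
    score, out = max(finals, key=lambda h: h[0])
    return " ".join(out), score


table = {
    ("das",): [(("the",), math.log(0.7)), (("that",), math.log(0.3))],
    ("Haus",): [(("house",), math.log(0.8)), (("home",), math.log(0.2))],
    ("das", "Haus"): [(("the", "house"), math.log(0.5))],
}
lm = {("<s>", "the"): math.log(0.5), ("<s>", "that"): math.log(0.2),
      ("the", "house"): math.log(0.4), ("the", "home"): math.log(0.05),
      ("that", "house"): math.log(0.1), ("house", "</s>"): math.log(0.5),
      ("home", "</s>"): math.log(0.3)}
print(beam_decode(["das", "Haus"], table, lm)[0])  # the house
```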

Features
* Moses is a drop-in replacement for Pharaoh, the popular phrase-based decoder, with many extensions.
* Moses allows the decoding of confusion networks, enabling easy integration with ambiguous
upstream tools, such as automatic speech recognizers
* Moses features novel factored translation models, which enable the integration of linguistic and
other information at many stages of the translation process
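An upstream speech recognizer's uncertainty can be kept as a confusion network rather than a single transcript: a sequence of slots, each listing alternative words with posterior probabilities. The Python sketch below shows that data structure and the path a pipeline would pass on if it committed to the most probable alternative in each slot; a decoder that reads the whole network can instead weigh every alternative with its translation and language models. The representation (Python lists, None for an empty alternative) is an assumption for illustration, not Moses' input format.

```python
import math
from typing import List, Optional, Tuple

# Each slot holds (word, posterior) alternatives; None means "no word here"
# (a convention of this sketch only).
Slot = List[Tuple[Optional[str], float]]
ConfusionNetwork = List[Slot]


def num_paths(cn: ConfusionNetwork) -> int:
    """Number of paths through the network (one alternative chosen per slot)."""
    return math.prod(len(slot) for slot in cn)


def consensus_path(cn: ConfusionNetwork) -> Tuple[List[str], float]:
    """Pick the most probable alternative in each slot; return the words and their log probability."""
    words, logp = [], 0.0
    for slot in cn:
        word, p = max(slot, key=lambda alt: alt[1])
        logp += math.log(p)
        if word is not None:
            words.append(word)
    return words, logp


cn = [[("I", 0.9), ("eye", 0.1)],
      [("scream", 0.6), ("cream", 0.4)],
      [(None, 0.7), ("too", 0.3)]]
print(num_paths(cn))          # 8
print(consensus_path(cn)[0])  # ['I', 'scream']
```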

For more information, visit <http://www.statmt.org/moses/>

Package:moses_20090831svn-1nlp1_i386.deb
Package:moses_20090831svn-1nlp1_amd64.deb
moses-doc
Description:Documentation for Moses

Moses is a statistical machine translation system that allows you to automatically train translation
models for any language pair. All you need is a collection of translated texts (parallel corpus).
* beam-search: an efficient search algorithm that quickly finds a high-probability translation
among the exponential number of choices
* phrase-based: the state-of-the-art approach in statistical machine translation, which translates
short text chunks as units
* factored: words may have factored representation (surface forms, lemma, part-of-speech,
morphology, word classes...)

Features
* Moses is a drop-in replacement for Pharaoh, the popular phrase-based decoder, with many extensions.
* Moses allows the decoding of confusion networks, enabling easy integration with ambiguous
upstream tools, such as automatic speech recognizers
* Moses features novel factored translation models, which enable the integration of linguistic and
other information at many stages of the translation process

This package contains additional documentation for Moses.

Package:moses-doc_20090831svn-1nlp1_all.deb

mosesmake
Version:0.0.20091215hg-3nlp2~0dapper1
Source (dsc):mosesmake_0.0.20091215hg-3nlp2~0dapper1.dsc
Source (tar.gz):mosesmake_0.0.20091215hg-3nlp2~0dapper1.tar.gz
mosesmake
Description:Makefile utilities for rapid deployment of Moses SMT systems

Moses Make is a set of makefiles and utilities for automatic setup of Moses SMT systems.

Moses Make tokenizes data and annotates it with part-of-speech (POS), lemma, and morphology factors.
Currently, Moses Make supports English, Italian, Japanese, and Spanish, but it can easily be extended to support any language with a POS tagger and morphological analyzer.
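Moses reads factored data as space-separated tokens whose factors are joined with a vertical bar, for example dogs|dog|NNS. The Python sketch below builds and splits such lines from per-token annotations. The factor order (surface, lemma, POS) and the sample tags are assumptions for illustration; the order Moses Make actually produces may differ.

```python
from typing import List, Sequence

FACTOR_SEP = "|"  # separator between the factors of one token


def to_factored(tokens: Sequence[Sequence[str]]) -> str:
    """Join per-token annotations, e.g. (surface, lemma, POS), into one factored line."""
    out = []
    for factors in tokens:
        for f in factors:
            # An empty factor, the separator itself, or whitespace would corrupt the line.
            if not f or FACTOR_SEP in f or any(ch.isspace() for ch in f):
                raise ValueError(f"factor {f!r} cannot be encoded")
        out.append(FACTOR_SEP.join(factors))
    return " ".join(out)


def from_factored(line: str) -> List[List[str]]:
    """Split a factored line back into per-token factor lists."""
    return [token.split(FACTOR_SEP) for token in line.split()]


line = to_factored([("dogs", "dog", "NNS"), ("bark", "bark", "VBP")])
print(line)  # dogs|dog|NNS bark|bark|VBP
```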

For more information, see Moses Make's homepage at http://cl.naist.jp/~eric-n/hg/mosesmake/

Package:mosesmake_0.0.20091215hg-3nlp2~0dapper1_all.deb

python-nltk
Version:0.9.2-1nlp2~0dapper1
Source (dsc):python-nltk_0.9.2-1nlp2~0dapper1.dsc
Source (tar.gz):python-nltk_0.9.2-1nlp2~0dapper1.tar.gz
python-nltk
Description:Natural Language Toolkit

NLTK — the Natural Language Toolkit — is a suite of open source
Python modules, data and documentation for research and development
in natural language processing.

NLTK contains code supporting dozens of NLP tasks, along with
40 popular corpora and extensive documentation, including a 375-page
online book.

For more information, see the project homepage:
<http://nltk.org>

This package is an empty dummy package that always depends on
a package built for Debian's default Python version.

Package:python-nltk_0.9.2-1nlp2~0dapper1_all.deb
python2.4-nltk
Description:Natural Language Toolkit

NLTK — the Natural Language Toolkit — is a suite of open source
Python modules, data and documentation for research and development
in natural language processing.

NLTK contains code supporting dozens of NLP tasks, along with
40 popular corpora and extensive documentation, including a 375-page
online book.

For more information, see the project homepage:
<http://nltk.org>

Package:python2.4-nltk_0.9.2-1nlp2~0dapper1_i386.deb
Package:python2.4-nltk_0.9.2-1nlp2~0dapper1_amd64.deb

python-nltk-data
Version:0.9.2-1nlp2~0dapper1
Source (dsc):python-nltk-data_0.9.2-1nlp2~0dapper1.dsc
Source (tar.gz):python-nltk-data_0.9.2-1nlp2~0dapper1.tar.gz
python-nltk-data
Description:Natural Language Toolkit Data

NLTK — the Natural Language Toolkit — is a suite of open source
Python modules, data and documentation for research and development
in natural language processing.

NLTK contains code supporting dozens of NLP tasks, along with
40 popular corpora and extensive documentation, including a 375-page
online book.

For more information, see the project homepage:
<http://nltk.org>

This package contains data including corpora for use with NLTK.

Package:python-nltk-data_0.9.2-1nlp2~0dapper1_all.deb

python-nltk-doc
Version:0.9.2-2nlp1~0dapper1
Source (dsc):python-nltk-doc_0.9.2-2nlp1~0dapper1.dsc
Source (tar.gz):python-nltk-doc_0.9.2-2nlp1~0dapper1.tar.gz
python-nltk-doc
Description:Natural Language Toolkit Documentation

NLTK — the Natural Language Toolkit — is a suite of open source
Python modules, data and documentation for research and development
in natural language processing.

NLTK contains code supporting dozens of NLP tasks, along with
40 popular corpora and extensive documentation, including a 375-page
online book.

For more information, see the project homepage:
<http://nltk.org>

This package contains documentation and examples for NLTK.

Package:python-nltk-doc_0.9.2-2nlp1~0dapper1_all.deb

srilm
Version:1.5.9-1nlp1~0dapper1
Source (dsc):srilm_1.5.9-1nlp1~0dapper1.dsc
Source (tar.gz):srilm_1.5.9-1nlp1~0dapper1.tar.gz
srilm
Description:The SRI Language Model Toolkit

SRILM is a toolkit for building and applying statistical language models (LMs),
primarily for use in speech recognition, statistical tagging and segmentation.
It has been under development in the SRI Speech Technology and Research
Laboratory since 1995.

SRILM consists of the following components:

* A set of C++ class libraries implementing language models, supporting data
structures and miscellaneous utility functions.
* A set of executable programs built on top of these libraries to perform
standard tasks such as training LMs and testing them on data, tagging or
segmenting text, etc.
* A collection of miscellaneous scripts facilitating minor related tasks.
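The two standard tasks named above, training an LM and testing it on data, can be sketched with a bigram model and perplexity. The Python sketch below uses add-one (Laplace) smoothing purely for brevity; SRILM offers far more refined discounting and backoff methods, such as Good-Turing and Kneser-Ney, and this sketch does not reserve probability mass for out-of-vocabulary words. The training sentences are made up.

```python
import math
from collections import Counter
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"


class AddOneBigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: Iterable[List[str]]):
        self.history = Counter()  # c(prev): how often a word is used as history
        self.bigrams = Counter()  # c(prev, w)
        self.vocab = {EOS}        # words that can be predicted
        for s in sentences:
            self.vocab.update(s)
            toks = [BOS] + s + [EOS]
            for prev, w in zip(toks, toks[1:]):
                self.bigrams[(prev, w)] += 1
                self.history[prev] += 1

    def logprob(self, prev: str, w: str) -> float:
        """log p(w | prev) = log((c(prev, w) + 1) / (c(prev) + V))."""
        return math.log((self.bigrams[(prev, w)] + 1) /
                        (self.history[prev] + len(self.vocab)))

    def perplexity(self, sentences: Iterable[List[str]]) -> float:
        """exp of the average negative log probability per predicted token (including </s>)."""
        total, n = 0.0, 0
        for s in sentences:
            toks = [BOS] + s + [EOS]
            for prev, w in zip(toks, toks[1:]):
                total += self.logprob(prev, w)
                n += 1
        return math.exp(-total / n)


train = [s.split() for s in ["the cat sat", "the dog sat", "a cat ran"]]
lm = AddOneBigramLM(train)
print(lm.perplexity([["the", "cat", "sat"]]) < lm.perplexity([["sat", "cat", "the"]]))  # True
```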

For more information, visit <http://www.speech.sri.com/projects/srilm/>

Package:srilm_1.5.9-1nlp1~0dapper1_i386.deb
Package:srilm_1.5.9-1nlp1~0dapper1_amd64.deb
srilm-dev
Description:The SRI Language Model Toolkit

SRILM is a toolkit for building and applying statistical language models (LMs),
primarily for use in speech recognition, statistical tagging and segmentation.
It has been under development in the SRI Speech Technology and Research
Laboratory since 1995.

SRILM consists of the following components:

* A set of C++ class libraries implementing language models, supporting data
structures and miscellaneous utility functions.
* A set of executable programs built on top of these libraries to perform
standard tasks such as training LMs and testing them on data, tagging or
segmenting text, etc.
* A collection of miscellaneous scripts facilitating minor related tasks.

This package contains headers and other files used for development with SRILM.

Package:srilm-dev_1.5.9-1nlp1~0dapper1_i386.deb
Package:srilm-dev_1.5.9-1nlp1~0dapper1_amd64.deb
srilm-doc
Description:Documentation for the SRI Language Model Toolkit

SRILM is a toolkit for building and applying statistical language models (LMs),
primarily for use in speech recognition, statistical tagging and segmentation.
It has been under development in the SRI Speech Technology and Research
Laboratory since 1995.

SRILM consists of the following components:

* A set of C++ class libraries implementing language models, supporting data
structures and miscellaneous utility functions.
* A set of executable programs built on top of these libraries to perform
standard tasks such as training LMs and testing them on data, tagging or
segmenting text, etc.
* A collection of miscellaneous scripts facilitating minor related tasks.

This package contains additional documentation for SRILM.

Package:srilm-doc_1.5.9-1nlp1~0dapper1_i386.deb
Package:srilm-doc_1.5.9-1nlp1~0dapper1_amd64.deb

treetagger
Version:3.2-3nlp2~0dapper1
Source (dsc):treetagger_3.2-3nlp2~0dapper1.dsc
Source (tar.gz):treetagger_3.2-3nlp2~0dapper1.tar.gz
treetagger
Description:A language-independent part-of-speech tagger

The TreeTagger is a tool for annotating text with part-of-speech and
lemma information which has been developed within the TC project at
the Institute for Computational Linguistics of the University of
Stuttgart. The TreeTagger has been successfully used to tag German,
English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek,
Portuguese, Chinese and Old French texts and is easily adaptable to
other languages if a lexicon and a manually tagged training corpus
are available.
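TreeTagger's part-of-speech and lemma annotations are typically consumed as one token per line with tab-separated columns for the word, its tag and its lemma. The Python sketch below reads that layout, assuming the tagger was run with options that print all three columns and that unlemmatized words carry the marker <unknown>; check both assumptions against the documentation for your TreeTagger version. The sample lines are made up.

```python
from typing import List, NamedTuple, Optional


class TaggedToken(NamedTuple):
    word: str
    pos: str
    lemma: Optional[str]  # None when the tagger could not lemmatize the word


def parse_tagger_output(text: str, unknown_marker: str = "<unknown>") -> List[TaggedToken]:
    """Parse 'word<TAB>pos<TAB>lemma' lines; blank lines are skipped."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue
        word, pos, lemma = line.split("\t")  # raises ValueError on malformed lines
        tokens.append(TaggedToken(word, pos, None if lemma == unknown_marker else lemma))
    return tokens


sample = "The\tDT\tthe\ndogs\tNNS\tdog\nbark\tVVP\tbark\n"
print([t.lemma for t in parse_tagger_output(sample)])  # ['the', 'dog', 'bark']
```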

This package downloads and installs the TreeTagger binaries and
helper scripts. The source code for the TreeTagger has not been
released, but its license permits free use "for research purposes."
Installation of this package implies consent to its terms. For the
full text of the license, see
http://www.ims.uni-stuttgart.de/~schmid/Tagger-Licence or the
TreeTagger's homepage at
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Package:treetagger_3.2-3nlp2~0dapper1_i386.deb
Package:treetagger_3.2-3nlp2~0dapper1_amd64.deb

treetagger-english
Version:3.1-1nlp2~0dapper1
Source (dsc):treetagger-english_3.1-1nlp2~0dapper1.dsc
Source (tar.gz):treetagger-english_3.1-1nlp2~0dapper1.tar.gz
treetagger-english
Description:English language parameter files for TreeTagger

The TreeTagger is a tool for annotating text with part-of-speech and
lemma information which has been developed within the TC project at
the Institute for Computational Linguistics of the University of
Stuttgart. The TreeTagger has been successfully used to tag German,
English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek,
Portuguese, Chinese and Old French texts and is easily adaptable to
other languages if a lexicon and a manually tagged training corpus
are available.

This package downloads and installs the parameter files necessary for
tagging English data. The source code for the TreeTagger has not been
released, but its license permits free use "for research purposes."
Installation of this package implies consent to its terms. For the
full text of the license, see
http://www.ims.uni-stuttgart.de/~schmid/Tagger-Licence or the
TreeTagger's homepage at
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Package:treetagger-english_3.1-1nlp2~0dapper1_all.deb

treetagger-italian
Version:3.1-1nlp1~0dapper1
Source (dsc):treetagger-italian_3.1-1nlp1~0dapper1.dsc
Source (tar.gz):treetagger-italian_3.1-1nlp1~0dapper1.tar.gz
treetagger-italian
Description:Italian language parameter files for TreeTagger

The TreeTagger is a tool for annotating text with part-of-speech and
lemma information which has been developed within the TC project at
the Institute for Computational Linguistics of the University of
Stuttgart. The TreeTagger has been successfully used to tag German,
English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek,
Portuguese, Chinese and Old French texts and is easily adaptable to
other languages if a lexicon and a manually tagged training corpus
are available.

This package downloads and installs the parameter files necessary for
tagging Italian data. The source code for the TreeTagger has not been
released, but its license permits free use "for research purposes."
Installation of this package implies consent to its terms. For the
full text of the license, see
http://www.ims.uni-stuttgart.de/~schmid/Tagger-Licence or the
TreeTagger's homepage at
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Package:treetagger-italian_3.1-1nlp1~0dapper1_all.deb

treetagger-spanish
Version:3.1-1nlp1~0dapper1
Source (dsc):treetagger-spanish_3.1-1nlp1~0dapper1.dsc
Source (tar.gz):treetagger-spanish_3.1-1nlp1~0dapper1.tar.gz
treetagger-spanish
Description:Spanish language parameter files for TreeTagger

The TreeTagger is a tool for annotating text with part-of-speech and
lemma information which has been developed within the TC project at
the Institute for Computational Linguistics of the University of
Stuttgart. The TreeTagger has been successfully used to tag German,
English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek,
Portuguese, Chinese and Old French texts and is easily adaptable to
other languages if a lexicon and a manually tagged training corpus
are available.

This package downloads and installs the parameter files necessary for
tagging Spanish data. The source code for the TreeTagger has not been
released, but its license permits free use "for research purposes."
Installation of this package implies consent to its terms. For the
full text of the license, see
http://www.ims.uni-stuttgart.de/~schmid/Tagger-Licence or the
TreeTagger's homepage at
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Package:treetagger-spanish_3.1-1nlp1~0dapper1_all.deb