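Overview
This is a study group on machine translation.
Time and place
- Time: Thursdays from 15:15
- Location: A707
Research theme
- Machine translation
Main members
Schedule
Academic year 2009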
- 09/12/24
- Paper presentation: Chiang et al. 11,001 New Features for Statistical Machine Translation. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL (2009) pp. 218–226 (eric-n)
- 09/12/3
- Paper presentation: NTT Statistical Machine Translation for IWSLT 2006. Watanabe et al. IWSLT 2006. (eric-n)
- http://www.mt-archive.info/IWSLT-2006-Watanabe.pdf
- 09/11/26
- Paper presentation: Minimum Error Rate Training in Statistical Machine Translation (jessic-r)
- 09/11/19
- Paper presentation: A Hierarchical Phrase-Based Model for Statistical Machine Translation (eric-n)
- 09/11/12
- Paper presentation: A Hierarchical Phrase-Based Model for Statistical Machine Translation (eric-n)
- 09/11/5
- Paper presentation (continued): Dependency Treelet Translation: Syntactically Informed Phrasal SMT. Quirk et al. ACL 2005. (ryuta-k)
- 09/10/29
- Paper presentation: Dependency Treelet Translation: Syntactically Informed Phrasal SMT. Quirk et al. ACL 2005. (ryuta-k)
- http://www.mt-archive.info/ACL-2005-Quirk.pdf
- 09/10/05
- Paper presentation: Automatic Learning of Parallel Dependency Treelet Pairs. Ding and Palmer. IJCNLP 2004. (naonori-a)
- http://ydsite.googlepages.com/epd_final.pdf
- 09/7/9
- Paper presentation: Pharaoh: a Beam Search Decoder for Phrase-based Statistical Machine Translation Models. Koehn. AMTA 2004. (ryuta-k)
- 09/6/25
- Paper presentation: IBM Models (continued) (eric-n)
- 09/6/18
- Paper presentation: IBM Model 3- (eric-n)
- Progress report: (jessic-r)
- 09/6/11
- Paper presentation: IBM Model 1,2- (eric-n)
- Progress report: (eric-n)
- 09/6/3
- kickoff
- 09/5/28
- Progress report: (eric-n)
- 09/5/14
- Paper presentation (continued): Quirk et al. Dependency Treelet Translation: Syntactically Informed Phrasal SMT. ACL 2005. (eric-n)
- 09/4/15
- Paper presentation: Quirk et al. Dependency Treelet Translation: Syntactically Informed Phrasal SMT. ACL 2005. (eric-n)
- 09/4/8
- Introduction to the study group (eric-n)
Related work
- The Basics
- IBM word-alignment models
- A Statistical Approach to Machine Translation. Brown et al. Computational Linguistics, pp. 79-85. 1990.
- The mathematics of statistical machine translation: parameter estimation. Brown et al. Computational Linguistics, Volume 19, Issue 2, 1993.
- Phrasal alignment
- Statistical Phrase-Based Translation. Koehn et al. HLT-NAACL 2003, pp. 48-54.
- Phrase-based decoding
- N-gram based automatic evaluation
- BLEU: a Method for Automatic Evaluation of Machine Translation. Papineni et al. ACL 2002, pp. 311-318.
- Automatic parameter tuning
- Minimum Error Rate Training in Statistical Machine Translation. Och. ACL 2003, pp. 160-167.
- Alignment
- Dependency tree based
- Automatic Learning of Parallel Dependency Treelet Pairs. Ding and Palmer. IJCNLP 2004.
- Dependency Treelet Translation: Syntactically Informed Phrasal SMT. Quirk et al. ACL 2005.
- Phrase structure tree based
- Robust Language Pair-Independent Sub-Tree Alignment. Tinsley et al. MT Summit 2007.
- Decoding
- Phrasal
- Improvements in Dynamic Programming Beam Search for Phrase-based Statistical Machine Translation. Zens and Ney. IWSLT 2008.
- Efficient Speech Translation through Confusion Network Decoding. Bertoldi et al. IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 9, pp. 1696-1705, 2008.
- Hierarchical
- A Hierarchical Phrase-Based Model for Statistical Machine Translation. Chiang. ACL 2005.
- NTT Statistical Machine Translation for IWSLT 2006. Watanabe et al. IWSLT 2006.
- Evaluation
- N-gram based precision with stemming and synonyms
- METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Banerjee and Lavie. MT Summit 2005.
Resources
- Corpora
- Europarl Parallel Corpus <http://people.csail.mit.edu/koehn/publications/europarl/>
- Europarl Parallel Corpus (updated version, with sentence-level alignment) <http://www.statmt.org/europarl/>
- FreeLing (English, Spanish tokenization, POS tagging, NE recognition, dependency parsing) <http://www.lsi.upc.es/~nlp/freeling/>
- OPUS Corpus (software manuals; includes Spanish-Japanese data) <http://logos.uio.no/opus/>
- JRC-Acquis (translations of EU law) <http://langtech.jrc.it/JRC-Acquis.html>
- Tanaka Corpus <http://www.csse.monash.edu.au/%7Ejwb/tanakacorpus.html>
- Dictionaries
- EDICT (Japanese-English / English-Japanese dictionary) <http://www.csse.monash.edu.au/~jwb/j_edict.html>
- Spanish-Japanese dictionary (about 4,500 words, downloadable) http://aulex.ohui.net/es-ja/?idioma=ja
- Spanish dictionary http://hp.vector.co.jp/authors/VA016777/link/dict.html#10
- Online Spanish-Japanese dictionary http://www.k3.dion.ne.jp/~sugiura/diccSJ.htm
- Nabeta dictionary (possibly also available as raw data?) http://www1.udn.ne.jp/~yoiko/nabeta/spain_torikomi.html
- Spanish-Japanese dictionary data for PDIC (shareware) http://www.vector.co.jp/soft/data/writing/se277556.html
- Older Spanish-Japanese dictionary data for PDIC (freeware) http://www1.udn.ne.jp/~yoiko/nabeta/espana08e2_utf8.zip
- SMT tools
- Alignment
- GIZA++ <http://www.fjoch.com/GIZA++.html>
- Decoders
- Pharaoh (non-open-source phrase-based beam-search decoder) <http://www.isi.edu/publications/licensed-sw/pharaoh/>
- Moses (open source factored phrase-based beam-search decoder) <http://www.statmt.org/moses/>
- ISI ReWrite Decoder (IBM model 4 greedy decoder) <http://www.isi.edu/licensed-sw/rewrite-decoder/>
- Cubit (a decoder written in Python that implements cube pruning; Pharaoh-compatible) <http://www.cis.upenn.edu/~lhuang3/cubit/>
- Language models
- SRILM (n-gram language models) <http://www.speech.sri.com/projects/srilm/>
- Statistical Language Modeling Toolkit (CMU) <http://mi.eng.cam.ac.uk/%7Eprc14/toolkit.html>
- IRST LM Toolkit <http://sourceforge.net/projects/irstlm> (a language modeling toolkit with open development; also usable from Moses)
- Palmkit <http://palmkit.sourceforge.net/> (command-level compatible with the CMU LM Toolkit; permissive license)
- mkcls (word class training; used by many decoders) <http://www.fjoch.com/mkcls.html>
- Evaluation
- mteval (BLEU, NIST score calculation) <http://www.nist.gov/speech/tests/mt/resources/scoring.htm>
- Moses SMT systems
- thyme.naist.jp:/work/mt/moses
- Log in with your Matsumoto Lab account