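Overview
This group studies machine translation.
Date and Time
- Time: Thursdays, from 13:30
- Location: A707
Research Topics
- We do research on machine translation, with a particular focus on machine translation that uses syntactic information.
- We are also reading P. Koehn, Statistical Machine Translation, 2010 (CUP) together as a group.
- We will participate in the NTCIR-9 Patent Translation Task.
Main Members
- 林, 雨宮, 近藤, 水本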
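Plans
Schedule
11/01/17 15:10- Seminar presentation rehearsal: 近藤 (note the time)
11/01/13 Master's thesis progress report: 近藤
10/12/16 Paper: 林; master's thesis outline presentation rehearsal: 雨宮, 近藤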
- Zhifei Li, et al. Variational Decoding for Statistical Machine Translation, ACL 2009.
10/12/09 Paper: 水本; progress: 近藤
- Chris Brockett, William B. Dolan, and Michael Gamon. Correcting ESL Errors Using Phrasal SMT Techniques. COLING/ACL 2006.
10/12/02 Paper: 近藤; progress: 雨宮
- "Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions", Zhongqiang Huang, Martin Cmejrek, Bowen Zhou, EMNLP 2010.
10/11/30 Presentation rehearsal: 林, 近藤
10/11/25 Paper: 雨宮; progress: 林
- Bing Xiang et al., Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages, ACL. 2010.
10/11/18 Cancelled because of the NL研 (Natural Language Processing SIG) meeting
10/11/11 Paper: 林; progress: 水本
- Hui Zhang et al. Fast Translation Rule Matching for Syntax-based Statistical Machine Translation. EMNLP 2009.
10/11/04 Paper: 水本; progress: 近藤
- George Foster and Cyril Goutte and Roland Kuhn. Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation. EMNLP-2010.
10/10/29 Seminar 2 presentation rehearsal: 雨宮
10/10/28 Paper: 近藤; progress: 雨宮
- Steve DeNeefe and Kevin Knight. Synchronous Tree Adjoining Machine Translation. EMNLP 2009.
10/10/21 Paper: 雨宮; progress: 林
- Chang Liu and Daniel Dahlmeier and Hwee Tou Ng. TESLA: Translation Evaluation of Sentences with Linear-programming-based Analysis. Proc of the Joint Workshop on Statistical Machine Translation and MetricsMATR, pp.354-359, 2010.
10/10/14 Paper: 林; progress: 小町 (survey of shared tasks)
- Zhifei Li and Sanjeev Khudanpur, "Forest Reranking for Machine Translation with the Perceptron Algorithm", 2009.
10/10/07 kickoff meeting
- Deciding the upcoming schedule
10/09/09 近藤
- Sudoh et al., "Divide and Translate: Improving Long Distance Reordering in Statistical Machine Translation", WMT 2010.
10/09/02 林
- Haitao Mi, Liang Huang, Qun Liu, "Machine Translation with Lattices and Forests", COLING 2010.
10/07/22 林
- 北 研二,川端 豪,斎藤 博昭, "HMM音韻認識と拡張LR構文解析を用いた連続音声認識". 情報処理学会論文誌. Vol.31. No.3. 1990.
10/07/15 林, 小町
林
- Aria Haghighi and Percy Liang and Taylor Berg-Kirkpatrick and Dan Klein. Learning Bilingual Lexicons from Monolingual Corpora. ACL. 2008.
小町
- Jason Riesa and Daniel Marcu. Hierarchical Search for Word Alignment. ACL. 2010.
10/07/08 林, 近藤
林
- Haitao Mi and Qun Liu. Constituency to Dependency Translation with Forests. ACL. 2010.
近藤 (SMT book)
- Chapter 4.2.4-
10/07/01 林, 近藤
林
- Liang Huang. K-best Knuth algorithm. unpublished. 2005.
近藤 (SMT book)
- Chapter 4-4.2.3
10/06/24 小町
- Michel Galley and Christopher D. Manning. Accurate Non-Hierarchical Phrase-Based Translation. NAACL 2010.
10/06/17 ジェシカ
- Maja Popovic and Hermann Ney. Syntax-oriented evaluation measures for machine translation output. WMT-09.
- Preslav Nakov. Improving English-Spanish Statistical Machine Translation: Experiments in Domain Adaptation, Sentence Paraphrasing, Tokenization, and Recasing. WMT-08.
10/06/10 林
- Liang Huang, David Chiang: Forest Rescoring: Faster Decoding with Integrated Language Model, ACL, 2007
10/06/03 松本
- Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, Ignacio Thayer: Scalable Inference and Training of Context-Rich Syntactic Translation Models. ACL 2006
10/05/27
- No meeting because of the NL研 (Natural Language Processing SIG) meeting
10/05/20 小町
- Christopher Dyer, Aaron Cordoba, Alex Mont and Jimmy Lin. Fast, Easy, and Cheap: Construction of Statistical Machine Translation Models with MapReduce. WMT 2008. (komachi)
- 機械翻訳最新事情 : (上)統計的機械翻訳入門
- (See also) 機械翻訳最新事情 : (下)評価型ワークショップの動向と日本からの貢献
10/05/13 林
- Liang Huang, Kevin Knight and Aravind Joshi. Statistical Syntax-Directed Translation with Extended Domain of Locality. AMTA 2006
Related Work
- The Basics
- IBM word-alignment models
- A Statistical Approach to Machine Translation. Brown et al. Computational Linguistics, pp. 79-85. 1990.
- The mathematics of statistical machine translation: parameter estimation. Brown et al. Computational Linguistics, Volume 19 , Issue 2 , 1993.
- Phrasal alignment
- Statistical Phrase-Based Translation. Koehn et al. HLT-NAACL 2003, pp. 48-54.
- Phrase-based decoding
- N-gram based automatic evaluation
- BLEU: a Method for Automatic Evaluation of Machine Translation. Papineni et al. ACL 2002, pp. 311-318.
- Automatic parameter tuning
- Minimum Error Rate Training in Statistical Machine Translation. Och. ACL 2003, pp. 160-167.
- Alignment
- Dependency tree based
- Automatic Learning of Parallel Dependency Treelet Pairs. Ding and Palmer. IJCNLP 2004.
- Dependency Treelet Translation: Syntactically Informed Phrasal SMT. Quirk et al. ACL 2005.
- Phrase structure tree based
- Robust Language Pair-Independent Sub-Tree Alignment. Tinsley et al. MT Summit 2007.
- Decoding
- Phrasal
- Improvements in Dynamic Programming Beam Search for Phrase-based Statistical Machine Translation. Zens and Ney. IWSLT 2008.
- Efficient Speech Translation through Confusion Network Decoding. Bertoldi et al. IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 9, pp. 1696-1705, 2008.
- Hierarchical
- A Hierarchical Phrase-Based Model for Statistical Machine Translation. Chiang. ACL 2005.
- NTT Statistical Machine Translation for IWSLT 2006. Watanabe et al. IWSLT 2006.
- Evaluation
- N-gram based precision with stemming and synonyms
- METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Banerjee and Lavie. MT Summit 2005.
Resources
- Corpora
- Europarl Parallel Corpus <http://people.csail.mit.edu/koehn/publications/europarl/>
- Europarl Parallel Corpus (updated version, with sentence-level alignment) <http://www.statmt.org/europarl/>
- FreeLing (English, Spanish tokenization, POS tagging, NE recognition, dependency parsing) <http://www.lsi.upc.es/~nlp/freeling/>
- OPUS Corpus (software manuals; includes Spanish-Japanese data) <http://logos.uio.no/opus/>
- JRC-Acquis (translations of EU law) <http://langtech.jrc.it/JRC-Acquis.html>
- Tanaka Corpus <http://www.csse.monash.edu.au/%7Ejwb/tanakacorpus.html>
- Dictionaries
- EDICT (Japanese-English / English-Japanese dictionary) <http://www.csse.monash.edu.au/~jwb/j_edict.html>
- Spanish-Japanese dictionary (about 4,500 words, downloadable) http://aulex.ohui.net/es-ja/?idioma=ja
- Spanish dictionaries http://hp.vector.co.jp/authors/VA016777/link/dict.html#10
- Online Spanish-Japanese dictionary http://www.k3.dion.ne.jp/~sugiura/diccSJ.htm
- Nabeta dictionary (possibly also available in data form?) http://www1.udn.ne.jp/~yoiko/nabeta/spain_torikomi.html
- Spanish-Japanese dictionary data for PDIC (shareware) http://www.vector.co.jp/soft/data/writing/se277556.html
- Older Spanish-Japanese dictionary data for PDIC (freeware) http://www1.udn.ne.jp/~yoiko/nabeta/espana08e2_utf8.zip
- SMT Tools
- Alignment
- GIZA++ <http://www.fjoch.com/GIZA++.html>
- Decoders
- Pharaoh (closed-source phrase-based beam-search decoder) <http://www.isi.edu/publications/licensed-sw/pharaoh/>
- Moses (open source factored phrase-based beam-search decoder) <http://www.statmt.org/moses/>
- ISI ReWrite Decoder (IBM model 4 greedy decoder) <http://www.isi.edu/licensed-sw/rewrite-decoder/>
- Cubit (a decoder written in Python that implements cube pruning; Pharaoh-compatible) <http://www.cis.upenn.edu/~lhuang3/cubit/>
- Language Models
- SRILM (n-gram language models) <http://www.speech.sri.com/projects/srilm/>
- Statistical Language Modeling Toolkit (CMU) <http://mi.eng.cam.ac.uk/%7Eprc14/toolkit.html>
- IRST LM Toolkit <http://sourceforge.net/projects/irstlm> (an openly developed language modeling toolkit; can also be used from Moses)
- Palmkit <http://palmkit.sourceforge.net/> (command-level compatible with the CMU LM Toolkit; permissive license)
- mkcls (word class training; used by many decoders) <http://www.fjoch.com/mkcls.html>
- Evaluation
- mteval (BLEU, NIST score calculation) <http://www.nist.gov/speech/tests/mt/resources/scoring.htm>