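Overview
A study group for people interested in knowledge acquisition from large-scale data such as Web documents.
Main discussion topics
- How to use the Google Japanese N-gram and Google Web 1T 5-gram datasets, plus overviews of related papers and implementations
- Learning and configuring Hadoop/HBase, and natural language processing with Hadoop
- Knowledge acquisition from Kawahara-san's 500-million-sentence Web corpus
- Acquiring large-scale categorized named-entity knowledge from Wikipedia
- Building dictionaries from Hatena Keyword
- Applying large-scale corpora to statistical kana-kanji conversion
- Applying large-scale corpora to statistical machine translation
- Applying large-scale corpora to predicate-argument structure analysis
Papers we want to read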
- Towards Terascale Knowledge Acquisition
  - In Proceedings of the COLING conference
  - Patrick Pantel, Deepak Ravichandran, and Eduard Hovy
  - 2004
- Automatically Labeling Semantic Classes
  - In Proceedings of the HLT-NAACL conference
  - Patrick Pantel and Deepak Ravichandran
  - 2004
- Learning Surface Text Patterns for a Question Answering system
  - In Proceedings of the 40th ACL conference
  - Deepak Ravichandran and Eduard Hovy
  - 2002
- Organizing and Searching the World Wide Web of Facts - Step One: the One-Million Fact Extraction Challenge
  - In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06)
  - Marius Pasca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, Alpa Jain
  - 2006
- Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web
  - In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-2005)
  - Marius Pasca, Peter Dienes
  - 2005
- Using Contexts of One Trillion Words for WSD
  - In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007)
  - Tobias Hawker
  - http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/36/36_Paper_meta.pdf
- Large-Scale Supervised Models for Noun Phrase Bracketing
  - In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007)
  - David Vadas and James R. Curran
  - http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/53/53_Paper_meta.pdf
- Minimising semantic drift with Mutual Exclusion Bootstrapping
  - In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007)
  - James Curran, Tara Murphy and Bernhard Scholz
  - http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/84/84_Paper_meta.pdf
- WiQA: Question Answering using Wikipedia
- Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering
  - In Proceedings of ACL, Ann Arbor, MI, 2005
  - Deepak Ravichandran, Patrick Pantel, and Eduard Hovy
  - http://www.isi.edu/natural-language/people/ravichan/papers/clustering.pdf
- An Integrated Approach to Measuring Semantic Similarity between Words Using Information available on the Web
  - In Proceedings of NAACL/HLT 2007
  - Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka