Natural Language Processing Laboratory

Nara Institute of Science and Technology
Yuji Matsumoto Laboratory

Terascale Knowledge Acquisition Study Group

Overview

A study group for people interested in knowledge acquisition from large-scale data such as Web documents.

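Main discussion topics

How to use the Google Japanese N-gram and Google Web 1T 5-gram data, with introductions to related papers and implementations
Studying and configuring Hadoop/HBase, and natural language processing with Hadoop
Knowledge acquisition from Kawahara's 500-million-sentence Web corpus
Acquiring large-scale, category-labeled named entity knowledge from Wikipedia
Dictionary construction from Hatena Keyword
Applying large-scale corpora to statistical kana-kanji conversion
Applying large-scale corpora to statistical machine translation
Applying large-scale corpora to predicate-argument structure analysis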

Papers to read

Towards Terascale Knowledge Acquisition
In Proceedings of the COLING conference
Patrick Pantel, Deepak Ravichandran, and Eduard Hovy
2004

Automatically Labeling Semantic Classes
In Proceedings of the HLT-NAACL conference
Patrick Pantel and Deepak Ravichandran
2004

Learning Surface Text Patterns for a Question Answering System
In Proceedings of the 40th ACL conference
Deepak Ravichandran and Eduard Hovy
2002

Organizing and Searching the World Wide Web of Facts - Step One: the One-Million Fact Extraction Challenge
In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06)
Marius Pasca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, Alpa Jain
2006

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web
In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-2005)
Marius Pasca, Peter Dienes
2005

Using Contexts of One Trillion Words for WSD
In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007)
Tobias Hawker
http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/36/36_Paper_meta.pdf

Large-Scale Supervised Models for Noun Phrase Bracketing
In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007)
David Vadas and James R. Curran
http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/53/53_Paper_meta.pdf

Minimising semantic drift with Mutual Exclusion Bootstrapping
In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007)
James Curran, Tara Murphy and Bernhard Scholz
http://mandrake.csse.unimelb.edu.au/pacling2007/files/final/84/84_Paper_meta.pdf

WiQA: Question Answering using Wikipedia
http://ilps.science.uva.nl/WiQA/

Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering.
In Proceedings of ACL, Ann Arbor, MI. 2005.
Deepak Ravichandran, Patrick Pantel, and Eduard Hovy.
http://www.isi.edu/natural-language/people/ravichan/papers/clustering.pdf

An Integrated Approach to Measuring Semantic Similarity between Words Using Information Available on the Web
In Proceedings of NAACL/HLT 2007
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka