Keihanna Natural Language Study Group

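We hold a natural language study group at irregular intervals, rotating among ATR Interpreting Telecommunications Research Laboratories (directions), Nara Institute of Science and Technology (directions), and NTT Communication Science Laboratories (directions). Each meeting is an informal gathering centered on a single speaker, with one to two hours for open discussion of any topic. Anyone is welcome to attend.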

To subscribe to the mailing list, send an email to keihanna-ctl@is.naist.jp with the following line in the body:

subscribe Your Name

(Note: replace Your Name with your own name written in half-width Roman letters. Example: ○ Taro Yamada, × 山田太郎)

77th Meeting

Speaker: Prof. Anoop Sarkar (Simon Fraser University, Canada)
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Talk 1: Title: Interactive visualization of facts extracted from natural language text
Date: 2015/09/11 (Fri) 13:30 - 15:00
Abstract: In natural language processing, the summarization of information in a large amount of text has typically been viewed as a type of natural language generation problem, e.g. "produce a 250 word summary of some documents based on some input query". An alternative view, which will be the focus of this talk, is to use natural language parsing to extract facts from a collection of documents and then use information visualization to provide an interactive summarization of these facts. The first step is to extract detailed facts about events from natural language text using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic roles in order to create a predicate-centric ontology for entities which is used to create a knowledge base of facts about entities and their relationship with other entities. The next step is to use information visualization to provide a summarization of the facts in this knowledge base. The user can interact with the visualization to find summaries that have different granularities. This enables the discovery of extremely uncommon facts easily, unlike large scale "macro-reading" approaches to information extraction. We have used this methodology to build an interactive visualization of events in human history by machine reading Wikipedia articles (available on the web at http://lensingwikipedia.cs.sfu.ca).
Talk 2: Title: Simultaneous translation for hierarchical phrase-based machine translation
Date: 2015/09/14 (Mon) 10:00 - 11:30
Abstract: Hierarchical phrase-based machine translation (Chiang, CL 2005) (Hiero) is a prominent approach for statistical machine translation which uses synchronous context-free grammars. Hiero typically uses a bottom-up (CKY) decoding algorithm which requires the entire input sentence before decoding begins. Left-to-right (LR) decoding (Watanabe, ACL 2006) is a promising decoding algorithm for Hiero that produces the output translation in left to right order. I will briefly summarize how we have extended the LR decoding approach for Hiero (which we call LR-Hiero) and then focus on simultaneous translation in this framework. In simultaneous translation, translations are generated incrementally as source language speech input is processed. We propose a novel approach for incremental translation by integrating segmentation and decoding in LR-Hiero. We compare two incremental decoding algorithms for LR-Hiero and present translation quality scores (BLEU) and the latency of generating translations for both decoders for audio lectures from the TED collection.
Speaker bio: Anoop Sarkar is Professor of Computer Science at Simon Fraser University in British Columbia, Canada where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca). His research uses machine learning methods applied to natural language processing, specifically statistical machine translation between all human languages, and the summarization of information in language using a combination of visualization and semantic parsing algorithms. He sometimes dreams about a computational decipherment of ancient scripts and mysterious manuscripts. He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars. http://www.cs.sfu.ca/~anoop


76th Meeting

Date: 2015/4/30 (Thu) 13:45 - 15:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: New Approaches to Learn on Histograms using Fast Optimal Transport
Speaker: Marco Cuturi (Kyoto University)
Abstract: Optimal transport distances (a.k.a. earth mover's distances or Wasserstein distances) can define a geometry on histograms of features (e.g. bags-of-words) when a metric on the features (e.g. a metric between words) is known. After reviewing the basic concepts of the optimal transport problem, we will show how an adequate regularization of that problem can result in substantially faster computations [a]. I will then show how this regularization can enable several applications of optimal transport to compute average histograms [b,c,d,e] as well as carry out dictionary learning/topic modeling for text using the earth mover's distance [f].

[a] MC, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, NIPS 2013.

[b] MC, A. Doucet, Fast Computation of Wasserstein Barycenters, ICML 2014.

[c] J.D. Benamou, G. Carlier, MC, L. Nenna, G. Peyré, Iterative Bregman Projections for Regularized Transportation Problems, to appear in SIAM Journ. on Scientific Computing.

[d] J. Solomon, F. de Goes, G. Peyré, MC, A. Butscher, A. Nguyen, T. Du, L. Guibas. Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains, SIGGRAPH 2015.

[e] A. Gramfort, G. Peyré, MC, Fast Optimal Transport Averaging of Neuroimaging Data, IPMI 2015.

[f] MC, G. Peyré, A. Rolet, A Smoothed Dual Formulation for Variational Wasserstein Problems, arxiv:1503.02533, 2015.
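
As a rough illustration of the regularized optimal transport mentioned in the abstract, the following is a minimal plain-Python sketch of Sinkhorn-style iterations for entropy-regularized transport between two histograms. It is not the speaker's implementation; the function name `sinkhorn`, the parameter names, and the default values of `reg` and `n_iters` are assumptions chosen for this example.

```python
import math


def sinkhorn(a, b, cost, reg=1.0, n_iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    a, b  : lists of positive weights, each summing to 1
    cost  : cost[i][j] is the ground cost of moving mass from bin i to bin j
    reg   : regularization strength (larger values give a smoother plan)
    Returns (plan, transport_cost), where plan[i][j] is the mass moved.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of the plan match a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of the plan match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(
        plan[i][j] * cost[i][j] for i in range(n) for j in range(m)
    )
    return plan, transport_cost
```

For example, `sinkhorn([0.7, 0.3], [0.4, 0.6], [[0.0, 1.0], [1.0, 0.0]], reg=0.5)` returns a strictly positive transport plan whose row and column sums approximate the two input histograms. Very small values of `reg` make the kernel entries tiny and can slow convergence or cause underflow in this naive version.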

Speaker bio

I received my Ph.D. in applied maths in 11/2005 from the Ecole des Mines de Paris under the supervision of Jean-Philippe Vert after receiving my undergraduate degree from the ENSAE and my master's degree from ENS Cachan. I worked as a post-doctoral researcher at the Institute of Statistical Mathematics, Tokyo, between 11/2005 and 03/2007. Between 04/2007 and 09/2008 I worked in the financial industry. After working at the ORFE department of Princeton University between 02/2009 and 08/2010 as a lecturer, I joined the Graduate School of Informatics in 09/2010 as a G30 associate professor, and since 11/2013 I have been the associate professor of the Yamamoto-Cuturi lab.


75th Meeting

Date: 2015/3/3 (Tue) 13:30 - 14:30
Venue: NAIST Information Science Building, Room B707 (Wing B, 7F, Nakamura Laboratory)
Title: Recent trends in far-field speech recognition
Speaker: Dr. Shinji Watanabe
Abstract: With the success of voice search applications using mobile devices, the application area for speech recognition has widened from close-talk to distant-talk scenarios. However, far-field speech recognition is a significantly harder problem, because speech signals are distorted by noise, reverberation, and attenuation. In such scenarios, the performance of current speech recognition systems degrades drastically due to their lack of robustness. In this presentation, we introduce several research trends in far-field speech recognition, in particular revolving around the CHiME speech separation and recognition challenge series, the REVERB challenge, and the ASpIRE challenge. We also introduce several promising techniques used in our systems which showed their effectiveness in these challenges, including non-negative matrix factorization, long short-term memory networks for speech enhancement, and discriminative techniques for acoustic modeling.
Speaker bio: Shinji Watanabe is a Senior Principal Researcher at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA. He received his Ph.D. from Waseda University, Tokyo, Japan, in 2006. From 2001 to 2011, he was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan. In 2009, he was a visiting scholar at the Georgia Institute of Technology, Atlanta, GA. His research interests include Bayesian machine learning, and speech and language processing. He has published more than 100 papers in journals and conferences, and received several awards including the best paper award from IEICE in 2003. He is currently an Associate Editor of the IEEE Transactions on Audio Speech and Language Processing, and a member of several committees including the IEEE Signal Processing Society Speech and Language Technical Committee (SLTC).


74th Meeting

Date: 2014/11/11 (Tue) Talk 1: 13:30-15:00; Talk 2: 15:20-16:50
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Speaker: Prof. Robert Kowalski (Emeritus Professor and Distinguished Research Fellow at Imperial College, London)
Speaker bio: Robert Kowalski studied at the University of Chicago, the University of Bridgeport, Stanford University, the University of Warsaw, and the University of Edinburgh, where he completed his PhD in 1970. Kowalski has been an advisor to the UNDP Knowledge Based Systems Project in India and to DFKI, the German Institute for Artificial Intelligence. He co-ordinated the European Community Basic Research Project, Compulog, and was the founder of the European Compulog Network of Excellence. Since 2009, he has been an advisor to the Department of Immunization, Vaccines and Biologicals, of the World Health Organization in Geneva. Kowalski is a Fellow of the Association for the Advancement of Artificial Intelligence, the European Co-ordinating Committee for Artificial Intelligence, and the Association for Computing Machinery. He received the IJCAI (International Joint Conference of Artificial Intelligence) award for Research Excellence in 2011.
Talk 1: Title: Towards a Science of Computing
Abstract: As a scientific discipline, the field of Computing today lacks a unifying framework. It consists, instead, of diverse languages, tools and techniques in the mostly disjoint areas of programming, databases, and artificial intelligence. But, despite this diversity, it is possible to identify a number of similar features lying beneath the surface. These features include such notions as states and state transitions, declarative and procedural representations, external events and internally generated actions, active versus goal-oriented behaviour, and hierarchical organisation of structures and procedures. In my talk, I will highlight some of the similarities in such different frameworks for Computing as logic programming, production systems, agent-oriented programming, active databases, action languages in AI, abstract state machines and other models of computation. I will argue that it is possible to unify many of the most important features of these frameworks, and to combine them in a single logic-based framework that can be applied to all areas of Computing.
Talk 2: Title: Computational Logic and its Relationship with Guidelines for English Writing Style
Abstract: Formal Logic is a natural candidate for representing computer-intelligible knowledge extracted from natural language texts on the WWW. I will argue that the logic of natural language texts is normally not visible, but is hidden beneath the surface, and that it can be uncovered more easily by studying texts that are designed to be as clear and easy to understand as possible. I will support my argument in two ways: by giving examples of English language texts and their hidden logic, and by interpreting guidelines for English writing style in computational logic terms. I will also argue that the kind of logic that is most useful for representing natural language is both simpler and richer than classical logic. It is simpler because it has a simpler syntax in the form of conditionals, and it is richer because it distinguishes between the logic of beliefs and the logic of goals.


73rd Meeting

Date: 2014/8/1 (Fri) 14:00 - 15:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Algorithms for Large-Scale Semantic Knowledge Graphs
Speaker: Gerard de Melo (Tsinghua University)
Abstract: In recent years, there has been a growing conviction, both in academia and at companies like Google, that so-called knowledge graphs will play an important role in improving natural language processing, Web search, and artificial intelligence. Edges in such graphs reflect relationships between entities, e.g. people, places, or words and their meanings. In this talk, I provide an overview of recent advances on collecting such knowledge from the Web using novel information extraction and joint link prediction methods. I will highlight some semantic applications of these methods, e.g. for taxonomy induction and adjective intensities. I will also present new resources like Lexvo.org, WebChild, and UWN/MENTA, which is currently the largest multilingual knowledge taxonomy and covers over 100 languages.
Speaker bio: Gerard de Melo is an Assistant Professor at Tsinghua University, where he is heading the Web Mining and Language Technology group. Previously, he was a post-doctoral researcher at UC Berkeley working in the ICSI AI/FrameNet group, and a doctoral candidate at the Max Planck Institute for Informatics. He has published over 30 research papers on Web Mining and Natural Language Processing, winning Best Paper Awards at conferences like CIKM and ICGL. For more information, please visit http://gerard.demelo.org/.


72nd Meeting

Date: 2014/04/24 (Thu) 14:00 - 15:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Composition in Syntactico-Semantic Tensor Space
Speaker: William Blacoe (Univ. of Edinburgh)
Abstract: Distributional semantics has proven successful in many tasks at the level of single words and short phrases. Typically vectors are used to represent the meaning of words and phrases, where the dimensions can come from syntactic, semantic or other features. One major challenge in this field is to combine lexical meaning to obtain the meaning of full sentences. After comparing different combinations of representation and composition we conclude that a natural match between the two is essential. We present a new unsupervised model for capturing and composing syntactico-semantic information in tensors. The method for composing these tensors emerges straightforwardly from their representational structure. Strengths and weaknesses of our model are analysed by comparing it with other unsupervised models, and evaluating them against the tasks of word similarity, two-word composition and textual entailment recognition.
Speaker bio: William Blacoe is a PhD student at the Institute for Language, Cognition and Computation at the University of Edinburgh. His PhD advisor is Mirella Lapata. While he studied computer science at the University of Frankfurt, Germany, his minor was Cognitive Linguistics. In Edinburgh he is combining the two in the form of Computational Semantics, where he is working on unifying the paradigms of distributional and compositional semantics in unsupervised ways.


71st Meeting

Date: 2014/1/17 (Fri) 13:30 - 15:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Better NLP with Topic Models and Better Topic Models with NLP
Speaker: Prof. Tim Baldwin (Univ. of Melbourne)
Abstract: Latent Dirichlet Allocation (LDA) provides a means of learning the latent structure of documents and document collections, by modelling each document as a mixture of topics, and each topic as a mixture of words. It was originally proposed in the machine learning community, largely independently of NLP, but has seen strong adoption within NLP circles. In this talk, I will discuss the interaction between LDA-based topic modelling and NLP, in two parts. First, I will describe work where we apply topic modelling to the NLP tasks of word sense induction and novel word sense detection, achieving state-of-the-art results over both tasks. Second, I will describe attempts to improve topic modelling through more NLP-informed document tokenisation, based on collocations and named entities. I will explore the impact of n-gram tokenisation on LDA topic models, and demonstrate that a richer document representation enhances topic model quality. I will also discuss recent work on evaluating topic model quality.
Speaker bio: Prof. Timothy Baldwin is a Professor in the Department of Computing and Information Systems, The University of Melbourne, an Australian Research Council Future Fellow, and a contributed research staff member of the NICTA Victoria Research Laboratories. He has previously held visiting positions at the University of Washington, University of Tokyo, Saarland University, and NTT Communication Science Laboratories. His research interests include text mining of social media, computational lexical semantics, information extraction and web mining, with a particular interest in the interface between computational and theoretical linguistics. Current projects include web user forum mining, text mining of Twitter, and intelligent interfaces for Japanese language learners. He is currently Secretary of the Australasian Language Technology Association and a member of the Executive Committee of the Asian Federation of Natural Language Processing, and was PC Chair of EMNLP 2013. Tim completed a BSc(CS/Maths) and BA(Linguistics/Japanese) at The University of Melbourne in 1995, and an MEng(CS) and PhD(CS) at the Tokyo Institute of Technology in 1998 and 2001, respectively. Prior to joining The University of Melbourne in 2004, he was a Senior Research Engineer at the Center for the Study of Language and Information, Stanford University (2001-2004).


70th Meeting

Date: 2013/10/10 (Thu) 4:50pm
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Principled induction of Translation Grammars for Hierarchical Phrase-based Models
Speaker: Dr. Baskaran Sankaran (Simon Fraser University)
Abstract: Hierarchical phrase-based translation (Chiang 2007) has gained wide acceptance within the MT community. It has been shown to be more effective than the equivalent phrase-based models for language pairs involving long-distance reordering. We propose a Bayesian model for extracting Hiero rules by reasoning over the derivations of phrase-pairs. Our model, employing scalable Variational Bayesian inference, extracts a sparse Hiero grammar with better discriminative power. We evaluate our model across three different language pairs, demonstrating improvements in small-data settings and competitive performance on large-scale datasets. We then take a step back to consider the Hiero training pipeline in its entirety, where the alignments and Hiero rules are learned in disconnected steps. We propose a novel unified-cascade framework for jointly learning the alignments and Hiero rules in distinct but iterative steps. Using two distinct models for the two components, we demonstrate the effectiveness of our framework for the translation task across two language pairs.
Speaker bio: Baskaran Sankaran recently completed his PhD at Simon Fraser University. His current research interests include machine learning and statistical NLP applied to machine translation, language modelling and word sense disambiguation. He was previously affiliated with AU-KBC Research Center and Microsoft Research Labs. In the past he has worked extensively on developing and commercializing language technology products for Indian languages. He has also contributed to several projects including Indian-language MT and designing a common part-of-speech tagset framework.


69th Meeting

Date: 2013/10/4 (Fri) 3:10pm
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Embracing Data and Noise through Interactive Systems and Applications
Speaker: Dr. Koji Yatani (Microsoft Research Asia)
Abstract: We are surrounded by data. As sensors are embedded in various devices, systems can constantly collect data about users and their environment from these sensors. Text data on the Web can contain useful information for the user. Although these data can be huge and are usually noisy, we can discover interesting approaches which lead to new interfaces and applications by looking at the data from different perspectives. In this talk, I present three demonstrations exploiting data and noise from wearable sensors and online text data. I then discuss how they have helped me broaden my research agenda. I also briefly introduce my HCI group and internship program at MSRA.
Speaker bio: Dr. Koji Yatani (http://yatani.jp) is an Associate Researcher in the Human-Computer Interaction Group at Microsoft Research Asia. He is also a Visiting Associate Professor in the Graduate School of Information Science and Technology at The University of Tokyo. His main research interests lie in Human-Computer Interaction (HCI) and its intersections with Ubiquitous Computing and Computational Linguistics. More specifically, he is interested in designing new forms of interaction with mobile devices, and developing new hardware and sensing technologies to support user interactions in mobile/ubiquitous computing environments. He is also interested in developing interactive systems and exploring new applications using computational linguistics methods. He received his B.Eng. and M.Sci. from the University of Tokyo in 2003 and 2005, respectively, and his Ph.D. in Computer Science from the University of Toronto in 2011. In November 2011, he joined the HCI group at Microsoft Research Asia in Beijing. In October 2013, he started to work as a Visiting Associate Professor in the Graduate School of Information Science and Technology at The University of Tokyo. He received the Best Paper Award at CHI 2011, and has served on conference or program committees of major international conferences in the fields of HCI, Ubiquitous computing and Haptics, including CHI, Ubicomp, UIST, MobiSys, and WHC.


68th Meeting

Date: 2013/5/21 (Tue) 13:30 - 15:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Building better Multilingual Wordnets
Speaker: Francis Bond (Nanyang Technological University)
Abstract: In this talk I show how we can extend semantic networks (wordnets) to more languages using Wiktionary, and discuss some of the issues involved with linking different languages, in particular issues of orthography. We link existing wordnets for twenty languages to Wiktionary, and produce a semantic network with over 2 million nodes. This wordnet has reasonable coverage for many languages, but is still lacking in various ways. I discuss how we can start to address some of these gaps: definitions and sense distributions, non-English concepts and more.
Speaker bio: Francis Bond is an Associate Professor at the Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore. He received a BA in 1988, a BEng (1st) in 1990 and a PhD in 2001, all from the University of Queensland. He worked on machine translation and natural language understanding at Nippon Telegraph and Telephone Corporation from 1991 to 2006. From 2006 to 2009 he worked at the National Institute of Information and Communications Technology in Japan, where his focus was on open source natural language processing. He is an active member of the Deep Linguistic Processing with HPSG Initiative (DELPH-IN) and the Global WordNet Association. His main research interest is in natural language understanding. Francis has developed and released wordnets for Japanese, Malay and Indonesian and coordinates the open multilingual wordnet. He is secretary-general of the Asian Federation for Natural Language Processing.


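67th Meeting

Date: 2013/01/10 (Thu) 16:00 - 17:30
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: The Cerebral Cortex and Bayesian Networks (大脳皮質とベイジアンネット)
Speaker: Yuuji Ichisugi (一杉裕志; AIST)
Abstract: The cerebral cortex is the most important part of the brain governing intelligence, and an understanding of its information-processing principles is strongly desired. Recently, several computational neuroscientists have proposed very promising models that explain the essential mechanisms of the cerebral cortex in terms of Bayesian networks. The vast body of findings about the cerebral cortex is rapidly being unified by models built around Bayesian networks, and the emergence of robots with human-like high intelligence in the not-too-distant future is no longer out of the question.
Speaker bio: Completed the master's program in information science at the Graduate School of Tokyo Institute of Technology in 1990 and the doctoral program in information science at the Graduate School of the University of Tokyo in 1993. Doctor of Science. Joined the Electrotechnical Laboratory (the National Institute of Advanced Industrial Science and Technology since 2001) in the same year, working on programming languages and software engineering. Has worked on computational neuroscience since 2005.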


66th Meeting

Date: 2012/11/22 (Thu) 18:30 - 19:30 (the end time may vary)
Venue: NAIST Information Science Building, Lecture Room L1
Title: Recursive Deep Learning in Natural Language Processing and Computer Vision
Speaker: Richard Socher (Stanford University)
Abstract: Hierarchical and recursive structure is commonly found in different modalities, including natural language sentences and scene images. I will introduce several models based on recursive neural networks that can learn compositional meaning vector representations for phrases or images. These models obtain state-of-the-art performance on a variety of semantic language tasks such as analyzing the sentiment of movie reviews or social data, paraphrase detection and relation classification for extracting knowledge from the web. Because no language-specific assumptions are made, the same architectures can be used for visual scene understanding and object classification. Besides the good performance, the models capture interesting phenomena in language such as compositionality. For instance the models learn that “not good” has worse sentiment than “good”, which in turn is less positive than “very good”. Furthermore, unlike most machine learning approaches that rely on human designed feature sets, features are learned as part of the model. In the last section, I will introduce a novel dataset that allows fully supervised training and a novel evaluation of the compositional power of various recursive models.
Speaker bio: Richard Socher is a PhD student at Stanford working with Chris Manning and Andrew Ng. His research interests are machine learning for NLP and vision. He is interested in techniques that learn useful and accurate features, capture recursive and hierarchical structure in multiple modalities and perform well across multiple tasks. Most recently he developed several recursive deep learning models. In 2011, he was awarded the Yahoo! Key Scientific Challenges Award, the Distinguished Application Paper Award at ICML and a Microsoft Research PhD Fellowship.


65th Meeting

Date: 2012/07/20 (Fri) 15:00 - 16:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Studying Collective Memories and Predictions using Large Scale Text Mining
Speaker: Adam Jatowt
Abstract: This talk will present the results of our study on how the past is collectively remembered and the future collectively predicted through mining of large document collections. First, we show that the analysis of references to the past in news articles allows us to gain insight into the collective memories and societal views of different countries. Our work demonstrates how computational tools can assist in studying history by revealing interesting topics and correlations. Second, we analyze collective images of the future by extracting and summarizing future-related information from news articles and web pages. Such information could allow people to recognize possible future scenarios and be better prepared for future events.
Speaker bio: Adam Jatowt is an Associate Professor at the Department of Social Informatics in Kyoto University. His research interests include temporal information extraction, computational history, content readability, and information access to web archives. He has co-organized four international workshops on web content credibility (WICOW 2008/2009/2010 and WebQuality 2011/2012) at CIKM and WWW conferences. He has also served as PC member of SIGIR, JCDL, HT, COLING, DASFAA and AIRS conferences.


64th Meeting

Date: 2012/4/27 (Fri) 13:30 - 15:00
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Structures in Statistical Machine Translation
Speaker: Taro Watanabe (NICT)
Abstract: Recent advances in statistical machine translation have used more complicated latent structures in representing bilingual correspondences. Starting from a word-based model, SMT employs phrasal structures and tree structures with an approximated beam search technique in order to manage the NP-complete inference problem. This tutorial introduces fundamental topics in phrase-based and tree-based models used in recent SMT, covering the problems of training, search and optimization. We will also discuss the use of more sophisticated structures in system combination. Unlike a confusion network based approach in which multiple hypotheses are encoded as a lattice structure, we propose a confusion forest approach in which multiple hypotheses are syntactically combined. Experiments indicate comparable performance to the conventional confusion network based method with smaller space.
Speaker bio: Taro Watanabe received the B.E. and M.E. degrees in information science from Kyoto Univ., Kyoto, Japan in 1994 and 1997, respectively, and obtained the Master of Science degree in language and information technologies from the School of Computer Science, Carnegie Mellon University in 2000. In 2004, he received the Ph.D. in informatics from Kyoto Univ., Kyoto, Japan. After serving as a researcher at ATR and NTT, Dr. Watanabe is a senior researcher at National Institute of Information and Communications Technology. His research interests include natural language processing, machine learning and statistical machine translation.


63rd Meeting

Date: 2012/03/30 (Fri) 15:00 - 16:00 (the end time may vary)
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Parsing Spoken Language
Speaker: Mari Ostendorf (University of Washington, USA)
Abstract: Parsing is a core natural language processing technology, but most parsing work has been developed on written text. Spoken language poses challenges for systems trained on text due to the presence of disfluencies, recognizer errors, and differences in word choice and grammatical style. With increasing interest in language processing applied to spoken documents, there are now several applications that show a benefit from parsing speech, apart from simply improving transcription accuracy. This talk surveys examples of applications that directly benefit from parsing speech, and illustrates the impact on transcription of parsing language models. We also discuss ways in which parsers can be modified to be more effective with spoken language.
Speaker bio: Mari Ostendorf is a Professor of Electrical Engineering at the University of Washington, currently also serving as the Associate Dean for Research and Graduate Studies in the College of Engineering. After receiving her PhD in electrical engineering from Stanford University, she worked at BBN Laboratories, then Boston University, and then joined the University of Washington (UW) in 1999. She has also been a visiting researcher at the ATR Interpreting Telecommunications Laboratory and at the University of Karlsruhe. At UW, she is currently an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. Prof. Ostendorf's research interests are in dynamic and linguistically-motivated statistical models for speech and language processing. Her work has resulted in over 200 publications and 2 paper awards. Prof. Ostendorf has served as co-Editor of Computer Speech and Language, as the Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing, and is currently the VP Publications for the IEEE Signal Processing Society and a member of the ISCA Advisory Council. She is a Fellow of IEEE and ISCA and a recipient of the 2010 IEEE HP Harriett B. Rigas Award.


62nd Meeting

Date: 2011/09/08 (Thu) 15:00 - 16:00 (the end time may vary)
Venue: NAIST Information Science Building, Room A707 (Wing A, 7F, Yuji Matsumoto Laboratory)
Title: Biomedical Event Extraction, Joint Inference and Dual Decomposition
Speaker: Sebastian Riedel (University of Massachusetts Amherst, USA)
Abstract: The cell is the core building block of life, and the subject of a large and ever-growing body of research publications. For life scientists it is hence becoming increasingly difficult to keep track of all information relevant to the cell processes of their interest. This in turn reduces the pace of progress in this field. In this work we show how information about cell processes, or so-called biomedical events, can be automatically extracted from literature. While this task has gathered much recent attention, most work has either used a pipeline of classifiers that is prone to cascading errors, or joint models for which inference is slow and which so far have failed to yield competitive results. We present novel joint models of biomedical event extraction that address the cascading error problem and are very efficient. This is achieved through framing event extraction as a global optimization problem, and solving this problem through dual decomposition. This technique allows us to decompose the optimization problem into several tractable subproblems for which fast optimization sub-routines can be designed. Our proposed method achieves the best results on the current benchmark datasets for the task. It is also the basis of a joint UMass-Stanford entry to the 2011 Biomedical Event Extraction Shared Task. This entry ranked 1st in 3 of the 4 tasks it was submitted to.

61st Meeting

Date: 2010/09/21 (Tue) 11:00 - 12:00 (the end time may vary)
Venue: NAIST Information Science Building, Room A707 (Building A, 7F, 松本裕治 Laboratory)

Title: Knowing the Unseen: Estimating Vocabulary Size over Unseen Samples
Speaker: Suma Bhat (Beckman Institute, University of Illinois, Urbana-Champaign, USA)
Abstract: Empirical studies on corpora involve measuring several quantities for the purpose of comparing corpora, creating language models, or making generalizations about specific linguistic phenomena in a language. Quantities such as average word length are stable across sample sizes and hence can be reliably estimated from large enough samples. However, quantities such as vocabulary size change with sample size. Thus measurements based on a given sample will need to be extrapolated to obtain their estimates over larger unseen samples. In this work, we propose a novel nonparametric estimator of vocabulary size. Our main result is to show the statistical consistency of the estimator -- the first of its kind in the literature. Finally, we compare our proposed estimator with the state-of-the-art estimators (both parametric and nonparametric) on large standard corpora; apart from showing the favorable performance of our estimator, we also see that the classical Good-Turing estimator consistently underestimates the vocabulary size.
Speaker bio: Suma Bhat received the Ph.D. degree in ECE from the University of Illinois at Urbana-Champaign in May 2010. Since then she has been a post-doctoral researcher in the Beckman Institute at the University of Illinois, Urbana-Champaign. Her research interests lie in the area of speech and natural language processing. Her doctoral research work covered techniques of automatic language assessment and theoretical analysis of natural language corpora. She also serves as a consultant with the Educational Testing Service, Princeton, USA and the Central Institute of Indian Languages, Mysore, India.
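
The extrapolation problem described in the abstract above can be made concrete with a small sketch. It does not implement the nonparametric estimator proposed in the talk, whose definition is not given here. Instead it shows two classical quantities computed from frequency-of-frequency counts: the Good-Turing estimate of the unseen probability mass (N1 / N) and the Chao1 lower bound on the total vocabulary size. The function names and the toy sample are illustrative assumptions.

```python
from collections import Counter


def frequency_of_frequencies(tokens):
    """Map r -> number of distinct word types that occur exactly r times."""
    type_counts = Counter(tokens)
    return Counter(type_counts.values())


def good_turing_unseen_mass(tokens):
    """Good-Turing estimate of the chance that the next token is a new type: N1 / N."""
    if not tokens:
        return 0.0
    return frequency_of_frequencies(tokens)[1] / len(tokens)


def chao1_vocabulary_size(tokens):
    """Chao1 lower-bound estimate of the total vocabulary (seen plus unseen types)."""
    fof = frequency_of_frequencies(tokens)
    observed = sum(fof.values())
    n1, n2 = fof[1], fof[2]
    if n2 > 0:
        return observed + n1 * n1 / (2 * n2)
    # Bias-corrected form, used when no type occurs exactly twice.
    return observed + n1 * (n1 - 1) / 2


# Toy sample (hypothetical): 4 observed types, 2 of them seen only once.
sample = "a a a b b c d".split()
print(good_turing_unseen_mass(sample))
print(chao1_vocabulary_size(sample))
```

Both quantities depend only on how many types were seen once or twice, which is why estimates of this kind change as the sample grows.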

60th Meeting

Date: 2009/3/27 (Fri) 14:30 - 17:00 (the end time may vary)
Venue: NAIST Information Science Building, Room A707 (Building A, 7F, 松本裕治 Laboratory)

This meeting features two talks.

Talk 1: Description and analysis of several recent kernels on a graph, with application to collaborative recommendation and semisupervised classification
Speaker: François Fouss (Facultés Universitaires Catholiques de Mons / Université Catholique de Louvain, Belgium)

Abstract: We will first introduce seven graph kernels and two related graph matrices, namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. Graph kernels compute proximity measures between nodes that help to study the structure of the graph. The power of the kernel-on-a-graph approach will then be illustrated by comparing the nine kernel-based algorithms on a collaborative-recommendation task as well as on a semisupervised classification task on several databases.
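
As a rough illustration of how a kernel on a graph yields a proximity measure between nodes, the sketch below computes the exponential diffusion kernel on a three-node path graph. It assumes the common definition K = exp(alpha * A), where A is the adjacency matrix and alpha > 0 a diffusion parameter; the abstract gives no formulas, so this definition, the truncated power series, and all names are assumptions rather than the speaker's implementation.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def exponential_diffusion_kernel(adjacency, alpha, terms=30):
    """Approximate exp(alpha * A) with the first `terms` terms of its power series."""
    size = len(adjacency)
    scaled = [[alpha * value for value in row] for row in adjacency]
    kernel = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    term = [row[:] for row in kernel]  # holds (alpha * A)^k / k!
    for k in range(1, terms + 1):
        product = mat_mul(term, scaled)
        term = [[value / k for value in row] for row in product]
        for i in range(size):
            for j in range(size):
                kernel[i][j] += term[i][j]
    return kernel


# Path graph 0 - 1 - 2 (hypothetical toy example).
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
kernel = exponential_diffusion_kernel(path_graph, alpha=0.5)
print(kernel[0][1], kernel[0][2])  # adjacent nodes score higher than the two end nodes

# Sanity check on a single edge, where exp(alpha * A) has a closed form.
edge = exponential_diffusion_kernel([[0, 1], [1, 0]], alpha=0.5)
print(edge[0][0], math.cosh(0.5))
```

For larger graphs a numerical library would normally be used for the matrix exponential; the pure-Python series keeps the sketch dependency-free.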

Talk 2: The randomized shortest-paths approach and its applications
Speaker: Marco Saerens (Université Catholique de Louvain, Belgium)

Abstract: This presentation will introduce the concept of randomized shortest-paths (RSP) and some of its applications. Designing an RSP strategy means computing the transition probabilities of a finite Markov chain (the policy) so as to minimize the expected cost of reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). The global level of randomness of the policy is quantified by the expected Shannon entropy spread throughout the network, and is provided a priori by the designer. By adopting a sum-over-paths statistical physics framework, it is shown that the unique optimal policy (transition probabilities) can be obtained by solving a simple linear system of equations.
Applications of this technique are currently being investigated, for instance:

o Defining an RSP edit-distance taking all alignments into account and favoring good alignments;

o Defining an RSP dissimilarity between nodes of a graph intermediate between the shortest-path distance and the commute-time (or resistance) distance;

o Defining an RSP covariance between nodes of a graph where two nodes are correlated when they often co-occur on the same paths;

o Defining a re-estimation method for hidden Markov models intermediate between the Viterbi and the Baum-Welch algorithms.

Speaker bios: François Fouss received the B.S. degree in management sciences in 2001, the M.S. degree in information systems in 2002, and the Ph.D. degree in management sciences in 2007, all from the "Université catholique de Louvain" (UCL), Belgium. In 2007, he joined the "Facultés Universitaires Catholiques de Mons" (FUCaM), Belgium, as an assistant professor in computing science. His main research interests include data mining and machine learning (more precisely, collaborative recommendation, link-analysis, and network analysis).

Marco Saerens received the B.Sc. degree in physics engineering and the M.Sc. degree in theoretical physics, both from the Université Libre de Bruxelles (ULB). After graduation, he joined the IRIDIA Laboratory (the artificial intelligence laboratory of the Université Libre de Bruxelles (ULB), Belgium) as a research assistant and completed a Ph.D. degree in applied sciences. While remaining a part-time researcher at IRIDIA, he then worked as a senior researcher in the R&D department of various companies, mainly in the fields of speech recognition, data mining, and artificial intelligence. In 2002, he joined the Université catholique de Louvain (UCL) as a professor in computer science. His main research interests include artificial intelligence, machine learning, data mining, pattern recognition, and speech/language processing.

         59th Meeting

Title: Enhanced Information Access to Troubleshooting-oriented Web User Forum Data
Speaker: Timothy Baldwin (Melbourne University)
Date: 2008/12/12 (Fri) 16:00 - 17:00
Venue: ATR Large Conference Room

Abstract:

The ILIAD (Improved Linux Information Access by Data Mining) Project is an
attempt to apply language technology to the task of Linux troubleshooting by
analysing the underlying information structure of a multi-document text
discourse and improving information delivery through a combination of
filtering, term identification and information extraction techniques. In this
talk, I will outline the overall project design and present results for a
variety of thread-level filtering tasks.

         58th Meeting

Title: Large-Scale Automatic Set Expansion
Speaker: Patrick Pantel (Yahoo and USC ISI)
Date: 2008/10/06 (Mon) 13:30 - 15:00
Venue: NAIST, Graduate School of Information Science, Building A, Room A707

Abstract:

Similarity modeling is a key task in computational lexical semantics
for finding word senses, concepts, paraphrases, topics, and
distributional synonyms, just to name a few. In this talk, we present
a flexible Map/Reduce infrastructure for very large-scale unsupervised
and semi-supervised learning and show its application to the task of
automatic set expansion using corpus statistics from a large web crawl.
A detailed empirical study is presented supporting the following
claims: i) corpus size matters -- larger corpora yield significantly
better expansion performance; ii) corpus quality matters -- Wikipedia,
a high quality text corpus, yields comparable performance to a lower
quality Web crawl of 60 times the size; iii) seed selection matters --
choosing various seed sets of a fixed size yields highly varying
performance; and iv) seed size matters -- no more than 5 to 20 seeds
are needed to achieve high expansion recall, whereas seed set sizes of
1 and 2 lead to unpredictable performance.

          57th Meeting

Title: Modelling and Managing Dialogue -- Approaches and Challenges
Speaker: David Schlangen (Potsdam University, Germany)
Date: 2004/09/22 (Wed) 16:00 - 17:30
Venue: ATR, basement meeting room 01 (contact: genichiro.kikui@atr.jp)

Abstract:
Arguably, the most natural setting for language use is direct
face-to-face dialogue. While there always has been interest
in computer systems that can enter into "dialogues" with human users,
only recently have such practical considerations been combined with
renewed theoretical efforts to model the dynamics of dialogue,
mostly through the development of the "information-state update"
framework.

In this talk, I will first briefly review the unique challenges that
the task of modelling natural dialogue entails, and then review the
three main approaches to dialogue management that have been taken:
structured dialogues, plan-based reasoning, and information-state
update. In this context I will discuss work I have been doing with
colleagues at the University of Edinburgh and the University of
Potsdam.

I will describe the "PotBot" tourism information system that
is under development at the University of Potsdam, which roughly
falls under the "structured dialogue" rubric, but is novel insofar as
it uses the structure inherent in the topic that is being discussed
to structure the space of possible dialogues. Then I will describe
the information-state update system "RUDI", which is an experimental
dialogue system based on an implementation of the dynamic semantics
theory SDRT (Asher and Lascarides 2003), that handles (in a certain
domain) context sensitive aspects of interpretation in dialogue such
as bridging references (computing the references of definite
descriptions) and resolution of fragments (i.e., non-sentential
utterances like B's in 'A: Who came to the party? -- B: Peter.'). The
system also deals with one aspect of grounding, namely detecting and
signalling understanding problems.

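             56th Meeting

Venue: NAIST, Graduate School of Information Science, Building A, 7F, seminar room A707
Date: 2004/07/02 (Fri) 17:00 - 18:30
Speaker: 和泉 絵美 (NiCT)
Title: The NiCT JLE Corpus:
       Building a Corpus of English Learners and Its Use in English Education
Abstract:
This talk gives an overview of "The NiCT JLE Corpus", a corpus of
spoken English by Japanese learners that was built at NiCT over three
years and is scheduled for public release this autumn, and discusses
how it might be used in English education. The corpus consists of
error-tagged transcriptions of speech from about 1,300 Japanese
learners of English. Error tagging was carried out with an error tag
set designed for grammatical and lexical errors. Analyzed together
with the nine-level proficiency information attached to each learner's
data, this error tag information should help clarify the developmental
stages of learner language. The talk first reports on how the corpus
was built, focusing on the error tagging. It then introduces several
analyses and experiments under way at NiCT ahead of the public
release: as basic analyses of learner language, studies of learners'
article acquisition and collocation usage; and, as an exploration of
machine processing of learner language and of future learning-support
systems, an automatic error detection experiment using machine
learning. Through these analyses and experiments, the talk examines
how the corpus can be used for second language acquisition research,
educational system development, and language processing research.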

             55th Meeting

Venue: NTT Communication Science Laboratories, 2F lounge corner
Date: 2004/06/04 (Fri) 16:00 - 17:30
Speaker: 野本忠司 (National Institute of Japanese Literature)
Title: Confidence Models for Multi Engine Machine Translation
Abstract:
With an increasing number of vendors putting their MT systems on the
market, I will explore the interesting question of whether it is
possible to somehow combine them to get a super MT system that beats
every one of them. In particular I am interested in working with
proprietary MTs to build a composite MT system, which obviously means
that we are not permitted access to the algorithmic details of how
each MT operates.

Much of the past work on multi engine machine translation (MEMT)
concentrates on working with "glass box" MTs, or those that allow you
some glimpse into the system. Frederking and Nirenburg (1994) develop
an MEMT system which operates by combining outputs from three
different engines, based on the knowledge it has about the inner
workings of each of the component engines; Brown and Frederking (1995)
take this further by adding an n-gram-based mechanism for translation
selection. By contrast, Nomoto (2003) sets out to pursue a line of
research whose goal is to use black box MTs for MEMT.

In the talk, I will review the approach by Nomoto (2003) and discuss
how we might improve on it by the extensive use of language models.

(The talk is in Japanese.)

             54th Meeting

Venue: ATR, basement meeting room 01
Date: 2004/05/07 (Fri) 17:00 - 18:30
Speaker: Timothy Baldwin (CSLI, Stanford; visiting researcher at NTTCS)
Title: Acquisition of Multiword Lexical Entries from Corpora
Abstract:
I will discuss a series of techniques to identify English
verb-particle constructions (VPCs) in the British National Corpus, and
classify them according to the lexical hierarchy of the English
Resource Grammar (ERG). VPC identification is based on the outputs of
shallow pre-processors, including a suite of taggers, a chunker, a
chunk grammar and a dependency grammar. I form a preliminary
classification of lexical type based on these pre-processors, and then
validate the hypotheses according to the ERG, based on full parse
analyses and partial parse data contained in the chart. I then use
classifier combination and annotated data from the Wall Street Journal
and Brown Corpora to make the final determination of VPC type.

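             53rd Meeting

Venue: NAIST, Graduate School of Information Science, Building A, 7F, seminar room A707
Date: 2003/10/23 (Thu) 17:00 - 18:30
Speaker: 渡辺太郎 (ATR Spoken Language Communication Research Laboratories)
Title: Statistical Machine Translation Based on Example Retrieval
Abstract:
Decoders for statistical machine translation generally build
translation hypotheses word by word or phrase by phrase and search for
the solution with the best combined translation-model and
language-model score. For language pairs with very different
grammatical structure, such as Japanese and English, however, the
complex word alignments have made it difficult to reach the optimal
solution. This talk describes a decoder based on example retrieval: it
retrieves from a parallel corpus the sentence closest to the input,
takes the translation of that sentence as a "seed sentence", and
modifies it so as to optimize the score. In bidirectional experiments
among four languages (English, Japanese, Chinese, and Korean), it gave
better results than a beam search that generates translations word by
word.
The talk also includes a tutorial on statistical machine translation
and a report on recent research trends.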
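
The retrieval step of the decoder described above (find the corpus sentence closest to the input and take its translation as the seed) can be sketched as follows. This is only an illustration under stated assumptions: closeness is measured with the standard library's `difflib.SequenceMatcher` ratio over word tokens, which is not necessarily the measure used in the talk, the toy parallel corpus is invented, and the subsequent step of modifying the seed to optimize the model score is not shown.

```python
from difflib import SequenceMatcher


def sentence_similarity(tokens_a, tokens_b):
    """Similarity in [0, 1] between two token lists (difflib's matching ratio)."""
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def retrieve_seed(input_tokens, parallel_corpus):
    """Return the (source, translation) pair whose source side is closest to the input."""
    return max(parallel_corpus,
               key=lambda pair: sentence_similarity(input_tokens, pair[0]))


# Hypothetical toy parallel corpus: (source tokens, translation tokens).
corpus = [
    (["where", "is", "the", "station"], ["eki", "wa", "doko", "desu", "ka"]),
    (["where", "is", "the", "hotel"], ["hoteru", "wa", "doko", "desu", "ka"]),
    (["how", "much", "is", "this"], ["kore", "wa", "ikura", "desu", "ka"]),
]

source, seed = retrieve_seed(["where", "is", "my", "hotel"], corpus)
print(source)
print(seed)
```

In a full system the seed translation would then be edited (words inserted, deleted, or replaced) while the translation-model and language-model scores are re-evaluated.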

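               52nd Meeting

Venue: ATR, basement meeting room 01
Date: 2003/06/10 (Tue) 17:00 - 18:30
Speaker: 成山 重子 (University of Melbourne / NAIST)
Title: Implicatures of subject ellipsis in informal English
       (口語英語に現れる主語省略の意図理解)
Summary:
Perhaps because traditional English grammars describe mainly the
written language, it is not widely known that subject ellipsis also
occurs frequently in spoken English. Intuitively it is thought of as
	'(I) got in late' (first-person subject ellipsis in a declarative)
	'(You) got in late?' (second-person subject ellipsis in an interrogative)
but this is only part of the picture.
Based on a corpus analysis, this paper examines subject ellipsis with
a focus on its patterns of occurrence and the speaker's intentions.
Notably, sentences with an omitted subject carry implications that the
corresponding sentences without ellipsis do not. Moreover, although
anaphora such as ellipsis and pronouns is said to increase the
cohesion of a text, subject ellipsis in English can also signal an
intention to refuse to continue the conversation.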

Abstract:
Despite the common belief, perhaps by virtue of little study on the
issue, subject ellipsis (implicit subject) in finite clauses occurs
rather frequently in informal English, particularly in dialogues and
casual letters. Based on a small corpus analysis, this paper
delineates various features, constraints, and connotations
surrounding subject ellipsis in English from the viewpoint of
understanding the speaker's intention behind its use. The analysis
shows that subject ellipsis operates in a more complicated way than
what is commonly believed; namely, it is not restricted to the type,
as in '(I) got in late' versus '(You) got in late?' with a rising
interrogative intonation. The analysis shows that subject ellipsis
occurs in the casual register (in terms of the content of the speech
and the speech participants) with first person as the predominant
referent, and is triggered by anaphora and conventional expressions.
Moreover, sentences with subject ellipsis give rise to connotations
different from those given by the corresponding full sentences,
implying evasive and dismissive motives and therefore discouraging
responses. Hence, although it is often claimed that the use of
anaphors, of which ellipsis is a part, creates discourse cohesion,
this study concludes that discourse cohesion is often suspended by
the presence of subject ellipsis.

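             51st Meeting

Venue: NTT Communication Science Laboratories, 2F lounge corner
Date: 2003/05/15 (Thu) 17:00 - 18:30
Speaker: 伊藤敬彦 (NAIST)
Title: Identifying References Using Multiple Measures

Abstract:
Information on which other documents a given document cites can be
obtained by identifying, in a bibliographic database (a collection of
records that describe documents by author, title, venue, and so on),
the document that each entry of its reference list (a reference
string) points to. This identification cannot be done by simple exact
string matching between reference strings and database records,
because reference strings contain notational variation and errors.
This talk proposes a method for acquiring citation information
automatically. First, candidates are coarsely narrowed down using a
single vector space and a similarity defined on it. Next, whether a
reference string and a candidate denote the same document is decided
using similarities based on multiple measures as features. Since
weighting each of the many measures by hand is impractical, we used a
support vector machine to compute the optimal weight of each measure
automatically, and obtained an F-measure of 0.992.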

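             50th Meeting

Venue: ATR, basement meeting room 01
Date: 2003/04/17 (Thu) 16:30 - 18:00

Speaker: 藤田 早苗 (NTT Communication Science Laboratories)
Title: Methods for Extending a Valency Dictionary

Abstract:
Valency dictionaries, which contain subcategorization frames and
selectional restrictions, are useful in almost every area of natural
language processing. However, building one by hand takes time and
money, while automatic acquisition cannot guarantee accuracy. We are
therefore studying methods that take a hand-built valency dictionary
as a seed and extend it automatically or semi-automatically. This talk
describes (1) a semi-automatic extension method that uses a bilingual
dictionary and (2) an automatic extension method that uses alternation
information. With these methods we increased the number of distinct
verb headwords in an existing valency dictionary roughly 1.5-fold,
from about 5,000 to about 7,600.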

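             49th Meeting

Venue: NAIST, Graduate School of Information Science, Building A, 7F, seminar room A707
Date: 2003/02/20 (Thu) 17:00 - 18:30

Speaker: 丸山岳彦 (ATR Spoken Language Communication Research Laboratories)
Title: An Analysis of 「ですね」 (desu ne) in Spoken Japanese
Abstract:
The expression 「ですね」 typically appears at the end of a sentence as
a copula. In spontaneous speech, however, it can also appear in the
middle of a sentence. Sentence-medial 「ですね」 causes problems at
various stages of processing. For example, it is difficult to perform
morphological analysis, syntactic analysis, and translation correctly
on sentences that contain many sentence-medial occurrences of
「ですね」. Also, when 「ですね」 occurs frequently in a corpus that
has no sentence-final punctuation (for example, the Corpus of
Spontaneous Japanese, 「日本語話し言葉コーパス」), it is hard to locate
sentence ends mechanically. Using several spoken-language corpora,
this talk analyzes the two kinds of 「ですね」 in terms of frequency
distribution, usage characteristics, position of occurrence, and
differences from fillers. It shows, among other things, that the
positional distributions of the two kinds differ greatly, that
frequency varies widely across speakers, and that sentence-medial
「ですね」 has a positional distribution very similar to that of
fillers.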

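      Announcement of the 48th Meeting

Venue: NTT Communication Science Laboratories, 2F lounge corner
       (organizer: 前田 (maeda@cslab.kecl.ntt.co.jp))
Date: 2002/11/21 (Thu) 17:00 - 18:30
Speaker: 持橋大地 (NAIST)
Title: A Probabilistic Space of Meaning

Abstract:
As Saussure noted, meaning has a syntactic side and a lexical side,
but probabilistic modeling of lexical meaning, despite its growing
importance, lags behind probabilistic syntactic analysis.
In this work we use the PLSI method (Hofmann 1999) to represent the
meaning (concept) of a word as a probability distribution over hidden
classes, and estimate the parameters with the EM algorithm. Semantic
similarity between words can then be expressed naturally with the KL
divergence.
Experiments on the Penn Treebank corpus show that this model
outperforms the vector-space model of LSI (Deerwester et al. 1990).
Because choosing the number of dimensions is a major issue here, we
describe how to choose the optimal number with AIC, and also touch on
how performance varies with the definition of the co-occurrence
window.
When word meaning is viewed as a probability distribution, the
semantic predictive distribution under a context prior can be treated
in statistical-mechanical terms from the Maximum Entropy standpoint.
Using this predictive distribution, we describe a semantic language
model that is currently under study.
When word meaning is viewed as a distribution over latent classes, its
matrix representation lets us consider a Shannon model of semantic
communication. Viewing the model as probabilistic PCA, "expression"
and "understanding" can then be regarded as semantic encoding and
decoding, respectively.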
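
To make the similarity measure mentioned in the abstract concrete, the sketch below computes the KL divergence between the latent-class distributions of two words. The three-class distributions are invented for illustration (in the talk they would be estimated with EM under PLSI), and the symmetrized variant is an assumption: the abstract does not say which direction or form of the divergence is used.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same latent classes."""
    total = 0.0
    for p_k, q_k in zip(p, q):
        if p_k == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if q_k == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_k * math.log(p_k / q_k)
    return total


def symmetric_kl(p, q):
    """Symmetrized divergence KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical P(class | word) for three words over three hidden classes.
bank = [0.7, 0.2, 0.1]
money = [0.6, 0.3, 0.1]
river = [0.1, 0.1, 0.8]

print(symmetric_kl(bank, money))  # small: similar class profiles
print(symmetric_kl(bank, river))  # large: different class profiles
```

A smaller divergence corresponds to a higher semantic similarity, so a ranking of neighbours for a word can be obtained by sorting the vocabulary by this value.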

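              47th Meeting

Venue: ATR, basement meeting room 01
Date: 2002/10/17 (Thu) 17:00 - 18:30

Speaker: 平尾努 (NTT)
Title: Toward a High-Performance Method for Important Sentence Extraction
Abstract: We describe an important-sentence extraction method based on
          SVMs and report evaluation results on the TSC corpus. We
          also show that the features that are effective for
          important-sentence extraction differ with the genre of the
          document.

Speaker: 賀沢秀人 (NTT)
Title: Is Important Sentence Extraction Possible with Ranking Learning?
Abstract: We justify the SVM-based method for learning a ranking
          function proposed by 平尾 et al. [1] from the viewpoint of
          generalized order statistics. As an improvement on that
          method, we also describe a new ranking learning method
          called OrderSVM and report the results of applying it to the
          TSC important-sentence extraction task [2].

          [1] Hirao et al., "Extracting Important Sentences with
              Support Vector Machines," COLING-2002

          [2] 賀沢 et al., "順位づけ学習問題：順位つきサンプルを用いた順序関係推定"
              (The ranking learning problem: estimating order relations
              from ranked samples), IEICE PRMU workshop, 2001/9/19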

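                46th Meeting
Venue: NAIST, Graduate School of Information Science, Building A, Room 707
Date: 2002/09/19 (Thu) 17:00 - 18:30
Speaker: 馬　青 [MA Qing] (CRL)
Title: Self-Organization of Japanese and Chinese Semantic Maps

Abstract:
Techniques for computing semantic similarity between words can be
applied to many natural language processing tasks, starting with word
sense disambiguation. Many practical applications such as information
retrieval need not only similarities between words but also a global
ordering of words. This has traditionally been done with various
clustering techniques, but clustering has drawbacks such as the poor
visibility of its results. What is needed instead of clustering is a
technique that maps words into a continuous semantic space whose
distance reflects semantic similarity (that is, a space in which
semantically close words are placed near each other and semantically
distant words far apart). We call the result of such a mapping a
semantic map. This talk presents the self-organization of semantic
maps for Japanese and Chinese nouns.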

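                45th Meeting
Venue: NTT Communication Science Laboratories
Date: 2002/07/18 (Thu) 17:00 - 18:30
Speakers: 新保 仁, 堀部 史郎 (NAIST)
Title: Automatic Estimation of Citation Information in Papers
       -- Toward Flexible Paper Search and Paper Evaluation Systems --

Abstract:
Identifying which papers a paper cites and building a paper database
that holds this citation information is useful for flexible,
citation-based paper search, and also helps in computing objective
evaluation measures for papers such as citation indexes. We are
currently investigating experimentally how far cited papers can be
estimated automatically by measuring the closeness between the
OCR-scanned reference sections of papers and the bibliographic records
of papers in an existing database. This talk reports on the aims and
current state of this research.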

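                44th Meeting
Venue: ATR, basement meeting room 01
Date: 2002/06/20 (Thu) 16:00 - 18:00
Speaker: 松本　賢司 (ATR Spoken Language Communication Research Laboratories)
Title: A Method for Automatically Aligning Japanese and English Articles
       Using a Machine Translation System and Company Information

Abstract:

With the aim of building a large-scale parallel corpus usable for
Japanese-English machine translation research, we identify translation
correspondences between Japanese newspaper articles and their English
translations. The Japanese and English articles in question are fairly
faithful translations of each other, but no alignment information is
available. We align the articles automatically using a commercial
machine translation system and information obtained from the articles,
such as company names. We compare the results of alignment experiments
on newspaper articles and on broadcast news articles and discuss how
the newspaper and broadcast corpora differ in character. Finally, we
report on the current state of our Japanese-English parallel corpus
construction.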

                43rd Meeting
Venue: NAIST, Graduate School of Information Science, Building A, Room 707
Date: 2001/10/19 (Fri) 15:15 - 17:00
Speaker: Stuart C. Shapiro (University at Buffalo)
Title: SNePS: A Logic for Natural Language Understanding
       and Commonsense Reasoning
Abstract:
My colleagues, students, and I have been engaged in a long-term
project to build a natural-language-using intelligent agent.  While
our approach to natural language understanding (NLU) and commonsense
reasoning (CSR) has been logic-based, we have thought that the logics
developed for metamathematics are not the best ones for our purpose.
Instead, we have designed new logics, better suited for NLU and CSR.
The current version of these logics constitutes the formal language
and inference mechanism of the knowledge representation/reasoning
(KRR) system, SNePS 2.5.  SNePS is a constantly evolving system that
implements our evolving theory of how to build a computational,
natural-language-using, rational agent that does commonsense
reasoning.  In this talk, I will survey several ways in which the
SNePS logic has been designed to be more appropriate for NLU and CSR
than the standard First Order Predicate Logic.

(This talk is based on Stuart C. Shapiro, "SNePS: A Logic for Natural
Language Understanding and Commonsense Reasoning," in Lucja Iwanska &
Stuart C. Shapiro, Eds., Natural Language Processing and Knowledge
Representation: Language for Knowledge and Knowledge for Language,
AAAI Press/The MIT Press, Menlo Park, CA, 2000, 175-195.)

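                42nd Meeting
Venue: ATR Spoken Language Translation Research Laboratories, Meeting Rooms 1 and 2
Date: 2000/07/28 (Fri) 16:00 - 18:00
Speakers: 工藤 拓, 宮田 高志 (NAIST)
Title: Japanese and English Dependency Analysis with Support Vector Machines
Abstract:
We report on dependency analysis of Japanese and English using Support
Vector Machines, a kind of machine learning algorithm. Like a
perceptron, a Support Vector Machine is a classifier that takes as
training data a set of positive and negative examples represented as
n-dimensional real vectors and finds a hyperplane in that
n-dimensional space that separates them. Through the technique known
as the kernel method, it can learn even when there are dependencies
among the dimensions, and because its learning strategy is margin
maximization, it is resistant to overfitting.

In this talk we use almost exactly the features of [内元ら1999] and
report results obtained with the Kyoto University Corpus 2.0 as
training data for Japanese and, for English, the Penn Treebank
converted into dependency trees.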
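
The abstract compares the Support Vector Machine to a perceptron: both take positive and negative examples as real-valued vectors and return a separating hyperplane. The sketch below implements only the simpler of the two, a plain perceptron, to show what "finding a separating hyperplane" means in code. It does not perform margin maximization and uses no kernel, so it is not an SVM; the toy data (a logical AND) and all names are illustrative assumptions.

```python
def train_perceptron(examples, epochs=100):
    """Learn (weights, bias) from (feature_vector, label) pairs, label in {+1, -1}."""
    dimension = len(examples[0][0])
    weights = [0.0] * dimension
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified or on the hyperplane
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every example is on the correct side
            break
    return weights, bias


def classify(weights, bias, features):
    """Return +1 or -1 according to the side of the hyperplane."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1


# Linearly separable toy data: positive only when both features are 1.
training_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(training_data)
print(weights, bias)
print([classify(weights, bias, x) for x, _ in training_data])
```

An SVM would choose, among all separating hyperplanes, the one with the largest margin, which is the property the abstract credits for its resistance to overfitting.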

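                41st Meeting
Venue: NAIST, Graduate School of Information Science, Building A, Room 707
Date: 2000/06/28 (Wed) 16:00 - 18:00
Speaker: 熊野 正 (ATR Spoken Language Translation Research Laboratories)
Title: Research and Development of a Translation Support System at NHK
Abstract:
NHK has been conducting research on a translation support system to
assist with the large volume of Japanese-to-English news translation
that arises every day. As an interim result of this research, an
"example presentation system" was introduced into practical use at the
translation desk and went into operation in June 2000. Using a
database of previously translated Japanese-English news script pairs,
it supports translation work by retrieving articles that contain
sentences with expressions similar to the input expression and
presenting their translations in units of article pairs. This talk
explains the development policy and technical characteristics of the
system and describes its current operation and users' reactions.

It also explains the concept of the "integrated translation support
environment" that this research is aiming for, in which a more
advanced support environment is built around a "bilingual editor" by
integrating and unifying access to the various information resources
that are useful for translation work.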

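                40th Meeting
Venue: ATR Spoken Language Translation Research Laboratories, Meeting Rooms 1 and 2
Date: 2000/05/24 (Wed) 16:00 - 18:00
Speaker: 浅原 正幸 (NAIST)
Title: A Dictionary Building Tool for Statistical Morphological Analyzers
Abstract:
This talk gives an overview of a dictionary building tool for
statistical morphological analysis that we are currently developing.
Hidden Markov models are one of the common approaches to statistical
morphological analysis. A plain hidden Markov model cannot handle
exceptional linguistic phenomena, among other limitations, so there is
still a need for improvement. To handle exceptional linguistic
phenomena, the tool adds several extensions, including the use of
word-level statistics, grouping that differs from case to case, and a
selective tri-gram model.
Setting the parameters of these extended models requires identifying
the exceptional linguistic phenomena. To address this, we also report
on our ongoing work on error-driven model extension.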

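                39th Meeting
Venue: NAIST, Graduate School of Information Science, Building A, Room 707
Date: 2000/04/19 (Wed) 16:00 - 18:00
Speaker: 佐々木 裕 (NTT Communication Science Laboratories)
Title: Learning Information Extraction Rules with Inductive Logic Programming
Abstract:
This talk reports on a method for applying inductive logic programming
(ILP) to the learning of information extraction rules, together with
experimental results. Information extraction is highly
domain-dependent, and the extraction rules have had to be rebuilt
every time the target domain changes. To solve this problem, learning
information extraction rules from examples (AutoSlog, LIEP, RAPIER,
etc.) has been actively studied. In this work, in order to learn rules
at an appropriate level of generality, we developed an ILP system that
handles hierarchical sorts (or types) efficiently, and we use it to
learn information extraction rules from newspaper articles that have
been converted into case frames, one per simple sentence.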

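                38th Meeting
Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Rooms 1 and 2
Date: 2000/02/16 (Wed) 16:00 - 18:00
Speaker: 山本 薫 (NAIST, Graduate School of Information Science)
Title: Extracting Translation Pairs Using Statistical Dependency Information
Abstract:
This talk reports on a method for extracting bilingual expressions
(translation pairs) using the output of statistical dependency
analysis, together with experimental results. Extraction of bilingual
expressions was actively studied in the 1990s. There were top-down
methods, which parse each of the aligned sentences and match their
structures, and bottom-up methods, which build n-grams from morphemes
and extract corresponding words.

With improvements in parsing technology, statistical dependency
parsers have become available. We therefore tried a method that
focuses on partial dependency relations within sentences and extracts
bilingual expressions at the bunsetsu (phrase) level. We compare it
with the experimental results of a bottom-up method and also report on
the problems of our method.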

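                37th Meeting
Venue: NAIST, Graduate School of Information Science, Building A, Room 707
Date: 2000/01/19 (Wed) 16:00 - 18:00
Speaker: 塚田 元 (ATR Interpreting Telecommunications Research Laboratories, Department 1)
Title: Statistical Machine Translation with Head Transducers and Its Evaluation
Abstract:
We describe a statistical machine translation method based on
probabilistic head transducers and evaluation experiments on a system
that implements it. The method automatically learns synchronized
dependency trees from a parallel corpus, which makes statistical
translation possible between languages that differ greatly in syntax,
such as Japanese and English. Using an evaluation procedure that runs
both closed and open tests while increasing the amount of training
data, we determine experimentally the performance limit of the system
and the amount of training data it needs. The results show that, for a
restricted task, fairly high translation quality can be obtained with
a realistic amount of training data.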

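                36th Meeting
Venue: NTT Communication Science Laboratories
Date: 1999/12/15 (Wed) 16:00 - 18:00
Speakers: 松本裕治, 浅原正幸, 山下達雄 (NAIST)
Title: A Statistical Learning System for Morphological Analysis, and
       Improving Corpora and Models with Boosting
Abstract:
For learning the parameters of statistical morphological analysis, we
are building a training program that incorporates variable-length
Markov models (currently a mixture of bigrams and trigrams), grouping
of parts of speech, smoothing of words and parts of speech, and other
features. We give an overview of it. In addition, to deal with corpus
errors and data sparseness, which are problems in statistical
learning, we ran experiments that use boosting to pick out problematic
spots effectively, and we report those results as well.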

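                35th Meeting

Venue: NAIST, Graduate School of Information Science, Building A, Room 707
Date: 1999/11/19 (Fri) 16:00 - 18:00
Speaker: 岩本 秀明 (ATR Interpreting Telecommunications Research Laboratories, Department 4)
Title: Spoken Language Processing Using Word Co-occurrence Across Utterances
Abstract:
In this work we aim to improve speech recognition accuracy with a
language model that statistically learns word co-occurrences between
utterances in a dialogue (hereafter, the next-utterance prediction
model).
The perplexity of the next-utterance prediction model was a little
over half that of the conventional language model used in speech
recognition. In a preliminary experiment on recognition results, we
weighted the scores given by the two models appropriately and reduced
the error rate by 3%. The results also showed that the improvement was
large for topic words.
We report on this experiment and, based on an analysis of its results,
discuss the possibility of topic-transition analysis using the
recognition model and the prediction model.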

                34th Meeting

Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Room 1
Date: 1999/10/20 (Wed) 16:00 - 18:00
Speaker: Md Maruf Hasan (NAIST)
Title: Document Clustering: Before and After
       Singular Value Decomposition
Abstract:
To facilitate efficient navigation through the documents in an
archive, clustering is often used to represent and visualize the
document collection. On the other hand, to retrieve relevant documents
for a particular query, Term-Document-Matrix of tf.idf (or the like)
is commonly used to represent both documents and queries. Retrieval is
done by computing similarity between queries and documents.

In Latent Semantic Indexing (LSI) based retrieval,
Term-Document-Matrix is further decomposed into a reduced
dimension. Decomposition is done using Singular Value Decomposition
(SVD). SVD performs terms-to-concepts mapping.

The Information Need (IN) of a user can be better addressed by
providing the user with support for both navigation and retrieval. We
addressed clustering of a document collection by using the
Term-Document-Matrix. A clustering algorithm is applied (before and
after the Singular Value Decomposition) to the
Term-Document-Matrix. The ultimate goal is to explore efficient ways
of visualization, navigation and retrieval of documents in an archive.

Keywords:
Information Retrieval, Clustering, Document Representation and
Visualization
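
The retrieval setting in the abstract above (documents and queries represented by tf.idf weights, compared by similarity) can be sketched in a few lines. The weighting tf * log(N / df) and cosine similarity are one common choice, assumed here because the abstract only says "tf.idf (or the like)"; the toy documents are invented, and the SVD step of LSI is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn token lists into sparse {term: tf * log(N / df)} vectors."""
    doc_count = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append({term: tf * math.log(doc_count / doc_freq[term])
                        for term, tf in term_freq.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection; each document is one column of the term-document matrix.
documents = [
    ["svd", "matrix", "decomposition"],
    ["document", "clustering", "navigation"],
    ["svd", "document", "retrieval"],
]
vectors = tfidf_vectors(documents)
print(cosine(vectors[0], vectors[2]))  # share the term "svd"
print(cosine(vectors[0], vectors[1]))  # no terms in common
```

The same pairwise similarities can feed a clustering algorithm; applying SVD first would replace these sparse term vectors with lower-dimensional concept vectors.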

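                33rd Meeting

Venue: NEC Kansai Research Laboratories, Community Hall
Date: 1999/09/16 (Thu) 16:00 - 18:00
Speaker: 平　博順 (NTT Communication Science Laboratories)
Title: Automatic Text Categorization with Support Vector Machines
Abstract:
This talk describes automatic text categorization with Support Vector
Machines (SVMs) and the associated feature selection. A problem in the
automatic acquisition of text categorization rules is the large number
of word features. Using a large number of keyword features to achieve
high classification accuracy makes overfitting likely and increases
computation time.
SVMs are one of the margin-based learning methods that have attracted
attention recently; they have been shown in fields such as character
recognition to resist overfitting even with a large number of input
features, and are therefore expected to give high accuracy in text
categorization. We ran categorization experiments with SVMs on
Japanese text and obtained higher accuracy than conventional learning
methods. Furthermore, to improve accuracy, we performed feature
selection based on part of speech and showed that it improves
classification accuracy.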

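                32nd Meeting

Venue: NTT Communication Science Laboratories, meeting room
Date: 1999/07/13 (Tue) 16:00 - 18:00
Speaker: 竹内 和広 (NAIST)
Title: Text Structure Analysis Experiments with Automatic Summarization in View
Abstract:
Attempts to create summaries automatically by computer have been
active in recent years. One such approach analyzes the structure of
the text to be summarized as preprocessing for automatic summarization
and makes use of the resulting information.
To examine what kind of structural analysis is appropriate for
Japanese when considering summarization that uses such text structure,
we have built a prototype tag set and tagging system for text
structure analysis and are running experiments in which humans
actually analyze text structure.
As an interim report, this talk presents the prototype tags, an
evaluation based on inter-annotator agreement in tagging, and the
insights gained from them.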

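                31st Meeting

Venue: NAIST, Graduate School of Information Science, Room A707
Date: 1999/06/16 (Wed) 16:00 - 18:00
Speaker: 石川 開 (ATR Interpreting Telecommunications Research Laboratories, Department 3)
Title: A Speech Recognition Error Correction Method Using a Text Corpus
Abstract:
We describe a method for correcting speech recognition errors. Humans
are thought to correct errors by "fixing the misheard part using an
expression they are used to hearing". We propose an error correction
method that corrects a word sequence containing an erroneous part on
the basis of a collection of example sentences and judges the validity
of the correction from the viewpoints of meaning and phonology.
The method (1) locates the erroneous part of the input speech
recognition result by semantic distance calculation, (2) makes a
partial correction using phonologically similar example sentences, and
(3) judges the validity of the correction by recomputing the semantic
distance for the whole corrected sentence.
We incorporated the proposed method into a speech translation system
and evaluated it. On 337 travel conversation sentences, the accuracy
of the recognition results and of the corrected results was 65% and
75%, respectively, confirming the effectiveness of the proposed
method.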

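                30th Meeting

Venue: NAIST, Graduate School of Information Science, Room A707
Date: 1999/03/30 13:00 -
Speaker: 李 航 (NEC C&C Media Research Laboratories)
Title: Natural Language Processing with Stochastic Decision Lists Based on ESC
Abstract:
This talk introduces a method we recently developed for learning
stochastic decision lists based on ESC, and methods for natural
language processing using stochastic decision lists. A stochastic
decision list is a representation made up of IF-THEN rules that
express discrimination or classification decisions. The main feature
of our decision-list learning method is that it uses minimization of a
quantity called ESC (Extended Stochastic Complexity) as the model
selection criterion. The ESC minimization criterion was proposed as an
extension of the MDL principle and is better suited to learning for
discrimination and classification. Decision lists learned with our
method can be applied to many natural language processing problems,
such as word sense disambiguation, text categorization, and
information extraction, and can achieve high accuracy. The first half
of the talk attempts a formulation of natural language learning
problems and introduces MDL and ESC. The second half explains the
ESC-based method for learning stochastic decision lists.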

                29th Meeting

Venue: NAIST, Graduate School of Information Science, Room A707
Date: 1998/04/07 14:00 -
Presenter: Mark Keane (University of Dublin, Trinity College, Dublin, Ireland)
Title: The Constraint Theory of Concept Combination
Abstract:
Costello (1996; Costello & Keane, 1997) has proposed a constraint
theory of conceptual combination. This theory tries to explain how
mental representations for novel noun-noun combinations are built
(e.g., horse vehicle, grass stone). In English, over half of the
new terms entering the language are based on such novel combinations
of existing concepts (e.g., laptop computer, soccer moms). In this
talk, I outline the details of our computational model of this
phenomenon (explaining how it beats the intractable combinatorics of
the task) and outline the psychological evidence we have found in
favour of it. I also show how the theory and model advance beyond
current theories in the literature.

Costello, F. (1996) Polysemy in Conceptual Combination.
Unpublished PhD Thesis.

Costello, F. & Keane, M.T. (1997). Polysemy in conceptual
combination: Testing the constraint theory of combination,
Proceedings of the Nineteenth Annual Conference of the Cognitive
Science Society. Hillsdale, NJ: Erlbaum.

                28th Meeting (special session)

Venue: NAIST, Graduate School of Information Science, Room A707
Date: 1997/09/01 (Mon) 11:00 -
Presenter: Remi Zajac (New Mexico State University)
Title: Ontologies and Machine Translation

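                27th Meeting

Venue: NAIST, Graduate School of Information Science, Room A707
Date: 1997/05/08 (Thu) 11:00 - 12:30
Presenter: 関根聡 (New York University)
Title:
Outline:
  Corpus-based English parser (Apple pie parser)            30 min
  Corpus-based Japanese parser (Cherry pie parser)          15 min
  Brief introduction to information extraction (MUC)        20 min
  Brief introduction to activities at New York University    5 min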

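                26th Meeting

Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
Date: 1997/02/18 (Tue) 4:00 - 6:00

Presenter: 古瀬 蔵 (ATR Interpreting Telecommunications Research Laboratories)
Title: Language Translation Research at ATR
Abstract: A brief introduction to some of the problems involved in
          handling spoken language and multiple languages in ATR's
          language translation research.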

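                25th Meeting

Date: 1997/01/27 (Mon) 16:00 - 18:00
Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Room 2
Presenter: 宮田高志 (NAIST)
Title: A Study On Inference Control In Natural Language Processing
       (自然言語処理における推論の制御に関する研究)
Abstract:
A natural language dialogue system needs to proceed by making flexible
use of a variety of constraints from syntax, semantics, pragmatics,
and so on. In place of the conventional procedural approach used for
this purpose, which "prepares a large number of control heuristics in
advance", we propose a new approach that "describes the system
declaratively by constraints and controls processing according to a
general principle based on probability", and show its effectiveness.
The talk covers the Probabilistic Horn Constraint System that we
implemented, an evaluation experiment using parsing that involves
PP-attachment ambiguity, and the results of parameter learning for
that program.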

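                24th Meeting

Date: 1996/12/10 (Tue) 15:00 - 17:00
Venue: NAIST, Graduate School of Information Science, Room A707
Presenter: 吉田佐治子 (Bunkyo University)
Title: On the Function of the Particle ハ (wa) in Japanese Syntax
Abstract: We report an experiment that examined, from a psychological
standpoint, 三上's claims about the function of the particle ハ in
Japanese syntax and largely confirmed them. In addition, from a series
of experimental results on the relation between ハ and the
interpretation of ambiguous negative sentences, we show that ハ is an
effective means of resolving ambiguity, and that this function of ハ
is related to the attention ハ receives when a sentence is read.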

                Meeting 23

     Date: 1996/10/22 (Tue) 16:00 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
  Speaker: Yves Lepage (ATR Interpreting Telecommunications Research Laboratories, Department 3)
    Title: Analogy: phenomenon, tentative explanation, application and
           experiments
 Abstract: In his "Cours de linguistique generale" (Course in General
           Linguistics), Saussure discusses a phenomenon of tremendous
           importance in language: analogy. For example, given the series
           of three terms "walk", "walked" and "look", human beings
           naturally coin the fourth missing term "looked". This
           phenomenon is usually completely neglected by natural language
           processing.

           I shall give a tentative explanation of analogy in terms of
           edit distances, thus paving the way to computational
           applications. I shall discuss some of the linguistic facts
           captured by this explanation. An application is analysis by
           analogy, which constitutes a significant improvement over the
           so-called "example-based" paradigm in natural language
           processing. I shall present promising results of experiments
           carried out on an excerpt of the U-Penn treebank.

           Analysis by analogy is currently being integrated into a
           text-and-tree editor designed for treebanking purposes, as one
           can legitimately expect a snowball effect on the speed of
           indexing and an improvement in the consistency of the data.

           In conclusion, I shall discuss future directions and some
           exciting open questions raised by analogy.
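
The abstract does not spell out how edit distances explain analogy, so the following is only a minimal sketch under an assumed reading: a candidate fourth term d is kept when the Levenshtein distances satisfy dist(a, b) = dist(c, d) and dist(a, c) = dist(b, d). The function names and the candidate list are hypothetical, and the check is a filter over given candidates, not the speaker's method for generating the fourth term.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def consistent_fourth_terms(a, b, c, candidates):
    """Keep candidates d for which a : b :: c : d passes the distance check."""
    return [d for d in candidates
            if edit_distance(a, b) == edit_distance(c, d)
            and edit_distance(a, c) == edit_distance(b, d)]


# Hypothetical candidate list for the example from the abstract.
print(consistent_fourth_terms("walk", "walked", "look",
                              ["looks", "looked", "looking"]))
# prints ['looked']
```

The two distance equalities are used here only as a necessary condition; several unrelated strings can satisfy them, so a real system would need a stronger criterion.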

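                Meeting 22

     Date: 1996/09/26 (Thu) 16:00 -
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Room 2
  Speaker: 藤尾 正和 (NAIST)
    Title: Dependency analysis based on statistical information
 Abstract: We perform dependency analysis based on statistical
           information, without complicated rules of the kind only
           linguists can write. First, bunsetsu boundaries and attributes
           are determined from the output of JUMAN. The definitions of
           bunsetsu and attributes are not fixed; users can set them
           freely, for example with regular expressions.
           The basic attributes are the modifier attribute (係り属性),
           the head attribute (受け属性), the head word (主辞), and the
           dependency relation (係り関係). The modifier and head
           attributes look only at whether a bunsetsu is adnominal (連体)
           or adverbial (連用). They serve as constraints that narrow
           down the candidate heads before statistical processing. From
           the remaining candidates, the analysis that maximizes the
           dependency probability of the whole sentence is selected as
           the answer.
           Dependency probabilities are computed from trigram frequencies
           of the dependency relation, the modifier-side head word, and
           the head-side head word.
           We also describe the results of an experiment that used 10,000
           sentences automatically extracted from EDR as correct examples.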
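
The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation: the add-alpha smoothed relative-frequency estimate, the non-crossing and head-to-the-right constraints, the exhaustive enumeration, and all names and counts are assumptions made for the example. Pruning of candidates by modifier and head attributes is omitted.

```python
from collections import Counter
from itertools import product


def dep_prob(counts, rel, mod_head, head_head, alpha=1.0):
    """P(head_head | rel, mod_head) from trigram counts, add-alpha smoothed (assumed estimator)."""
    context = sum(c for (r, m, _), c in counts.items()
                  if r == rel and m == mod_head)
    vocab = len({h for (_, _, h) in counts}) or 1
    return (counts[(rel, mod_head, head_head)] + alpha) / (context + alpha * vocab)


def crosses(heads):
    """True if any two dependency arcs cross (heads[i] is the head of bunsetsu i)."""
    for i, hi in enumerate(heads):
        for j in range(i + 1, hi):
            if heads[j] > hi:
                return True
    return False


def best_parse(bunsetsu, counts):
    """bunsetsu: list of (relation, head_word); the last bunsetsu is the root."""
    n = len(bunsetsu)
    best, best_p = None, -1.0
    # Every non-final bunsetsu picks exactly one head to its right.
    for heads in product(*(range(i + 1, n) for i in range(n - 1))):
        if crosses(heads):
            continue
        p = 1.0
        for i, h in enumerate(heads):
            rel, mod_head = bunsetsu[i]
            p *= dep_prob(counts, rel, mod_head, bunsetsu[h][1])
        if p > best_p:
            best, best_p = heads, p
    return best, best_p


# Hypothetical trigram counts: (relation, modifier head word, head word).
counts = Counter({("ga", "Taro", "kau"): 3,
                  ("rentai", "akai", "hana"): 5,
                  ("wo", "hana", "kau"): 4})
sentence = [("ga", "Taro"), ("rentai", "akai"), ("wo", "hana"), ("end", "kau")]
heads, prob = best_parse(sentence, counts)
print(heads)  # prints (3, 2, 3)
```

Exhaustive enumeration is exponential in sentence length and is used here only to keep the sketch short; a practical parser would use dynamic programming.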

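                Meeting 21 (extraordinary?)

     Date: 1996/07/17 (Wed) 10:00 - 12:00
    Venue: NAIST, Graduate School of Information Science, Room A707
  Speaker: 中澤恒子 (NTT Information and Communication Systems Laboratories)
    Title: Does the causative auxiliary "sase" (させ) constitute a leaf
           of the structure tree?
 Abstract: Syntactic theories based on traditional transformational
           grammar have analyzed the causative auxiliary "sase" as a
           phrasal affix (a verb that combines with a verb phrase)
           (e.g. Kuroda 1965). In phonology, on the other hand, "sase"
           forms a phonological word together with the verb stem (not
           the verb phrase). This talk argues that in syntactic
           structure, too, "sase" should combine directly with the verb
           rather than with the verb phrase. Taking into account the
           recent debate within the phrase structure grammar framework
           (e.g. Gunji forthcoming, Iida et al. forthcoming), I give
           syntactic, semantic, and pragmatic evidence and introduce an
           analysis that uses lexical rules in HPSG. As an alternative
           to lexical rules, I then present an analysis of "sase" that
           uses syntactic rules and Argument Composition (Hinrichs and
           Nakazawa 1989, 1994). Comparing the two analyses, I raise the
           question of whether Japanese auxiliaries should in general be
           analyzed by lexical rules or by syntactic rules.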

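                Meeting 20

     Date: 1996/06/18 (Tue) 16:00 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
  Speaker: 角所 考 (Kakusho Ko) (Research Associate, Institute of Scientific and Industrial Research, Osaka University)
    Title: Linguistic expression and understanding of geographic
           information for mutual conversion between linguistic and
           pattern information
 Abstract: Among the research at the Kitahashi Laboratory of Osaka
           University aimed at mutual conversion between linguistic
           information and pattern information, and ultimately at
           multimedia information processing, this talk introduces work
           on mobile robot navigation and on map information systems.

           The former concerns observation and action planning for
           moving to a destination through an unknown environment on the
           basis of linguistic route instructions such as "turn left at
           the second intersection...". The latter concerns a media-mix
           map information system that presents the result of a route
           search between a start point and a destination specified on a
           map, with omission and deformation applied at a level chosen
           by the user, as a sketch map or as a linguistic expression
           like that in the former case.

           Both relate to the correspondence between pattern
           information, such as sensor data and map images, and
           geographic concept labels such as "intersection" contained in
           linguistic expressions, and to the problems of ambiguity and
           subject dependence in pattern interpretation that arise in
           making that correspondence. The talk will discuss the
           concrete content of the research mainly from this viewpoint.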

                Meeting 19

     Date: 1996/05/21 (Tue) 16:00 - 18:00
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Rooms 1 and 2
  Speaker: Wide Roeland Hogenhout (NAIST)
    Title: Gradual Statistical Grammar Induction
 Abstract: Building a grammar is very expensive, and this easily
           frustrates work on parsing of natural language. For this
           reason there is a strong interest in automatic grammar
           induction. One of the methods for automatic induction of a
           grammar is statistical training with the inside-outside
           algorithm.

           The most successful experiment in this direction was carried
           out by Schabes et al. It had strong limitations, however, as
           can be seen easily by trying to imagine a grammar under those
           conditions. The problem is that the number of nonterminals
           has to be very low, since the number of rules grows
           exponentially.

           We suggest an alternative method for grammar induction, which
           allows for more nonterminals and further refines the grammar
           using a treebank. It is based partly on the inside-outside
           algorithm and partly on information theory. Since we are only
           just beginning experiments, experimental results will not be
           available.

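                Meeting 18

     Date: 1996/04/23 (Tue) 16:00 - 18:00
    Venue: NAIST, Graduate School of Information Science, Room A707
  Speaker: 柏岡 秀紀 (ATR Interpreting Telecommunications Research Laboratories)
    Title: Methods for building an English treebank and related research
 Abstract: This talk describes a method for building the treebanks
           needed for statistical processing. To build a treebank with a
           fine-grained scheme in which parts of speech include semantic
           categories, we developed a support tool that performs
           part-of-speech tagging and parsing semi-automatically and
           makes corrections easy. We report on this tool and describe
           statistical analysis methods that use the resulting treebank.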

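                Meeting 17

     Date: 1996/02/27 (Tue) 16:00 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
  Speaker: 森 信介 (Graduate School of Engineering, Kyoto University)
    Title: Superposition of Markov models and its application to
           Japanese morphological analysis
 Abstract: This talk describes methods for improving Japanese
           morphological analysis based on Markov models. The proposed
           improvements are: 1) lexicalization of each morpheme,
           2) registration of function-word sequences (附属語列), and
           3) superposition of Markov models. We built a morphological
           analyzer based on these ideas and report the results of
           experiments.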

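                Meeting 16

     Date: 1996/01/23 (Tue) 16:00 - 18:00
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Rooms 1 and 2
  Speaker: 春野雅彦 (NTT Communication Science Laboratories) or 松本裕治 (NAIST)
    Title: Improving the accuracy of Japanese morphological analysis
           using context trees
 Abstract: This talk describes parameter tuning for a Japanese
           morphological analysis system that uses context trees, a
           technique from the field of data compression. Context trees
           make it possible to vary the length of the context considered
           as needed, which yields more accurate morphological analysis
           than existing methods. We describe a method for obtaining
           optimal parameter values from a large analyzed corpus.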

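                Meeting 15

     Date: 1995/11/28 (Tue) 16:00 - 18:00
    Venue: NAIST, Graduate School of Information Science, Room A707
  Speaker: 坪香 英一 (Matsushita Electric Industrial Co., Ltd., Central Research Laboratories)
    Title: HMMs based on fuzzy vector quantization
 Abstract: An HMM is a model that represents time-series patterns
           probabilistically. It can model a speech signal in a way that
           takes its various kinds of variation into account, and there
           are algorithms that efficiently estimate HMM parameters from
           sample data "by a universal method with a mathematical
           basis". Given sufficient sample data, HMMs achieve high
           recognition accuracy, and, like DP matching, they extend
           easily to continuous speech recognition. For these reasons
           HMMs are widely used in speech recognition, and various
           improvements aimed at further performance gains are being
           made.
           Improvements to HMMs fall broadly into two groups: those
           concerning the structure (topology) of the HMM, and those
           concerning how the likelihood of generating an observation
           vector in each state is represented. This talk addresses the
           latter and explains the HMM based on fuzzy vector
           quantization that we developed to obtain accurate results at
           low computational cost.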

                     Meeting 14

     Date: 1995/10/24 (Tue) 16:00 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories, Large Conference Room B
  Speaker: 潮田 明 (ATR Interpreting Telecommunications Research Laboratories)
    Title: Data-Driven Word Clustering and Word-Bit Generation
 Abstract: This talk describes a data-driven word-clustering method
           in which a large vocabulary of English words (70,000 words)
           is clustered bottom-up, with respect to corpora ranging
           in size from 5 million to 50 million words, using a
           "greedy" algorithm that tries to minimize average loss
           of mutual information.  The same organizing principle,
           mutual information loss, is then used to form a bit-string
           representation of (i.e. to form "word bits" for) all the
           words in the vocabulary.  Preliminary performance results
           obtained employing this method will be presented.
           Evaluation of the word bits and word clusters constructed
           is carried out via two measures: (a) the error rate of
           the ATR Decision-Tree Part-Of-Speech Tagger and (b) the
           perplexity measure, applied to standard class-based
           trigram models of the UPenn Wall Street Journal corpus.
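
The greedy bottom-up step can be illustrated with a toy sketch. It assumes the usual bigram formulation of average mutual information between adjacent word classes and recomputes it from scratch for every candidate merge, which is far too slow for a 70,000-word vocabulary. The function names and the toy bigram counts are hypothetical, and the word-bit construction is not covered.

```python
import math
from collections import Counter
from itertools import combinations


def average_mi(bigrams, cls):
    """Average mutual information (in bits) between adjacent word classes."""
    joint = Counter()
    for (w1, w2), c in bigrams.items():
        joint[(cls[w1], cls[w2])] += c
    total = sum(joint.values())
    left, right = Counter(), Counter()
    for (c1, c2), c in joint.items():
        left[c1] += c
        right[c2] += c
    return sum((c / total) * math.log2(c * total / (left[c1] * right[c2]))
               for (c1, c2), c in joint.items())


def greedy_cluster(bigrams, num_classes):
    """Merge classes pairwise, each time keeping the merge that loses the least MI."""
    words = sorted({w for pair in bigrams for w in pair})
    cls = {w: w for w in words}          # every word starts in its own class
    while len(set(cls.values())) > num_classes:
        best = None
        for a, b in combinations(sorted(set(cls.values())), 2):
            merged = {w: (a if c == b else c) for w, c in cls.items()}
            mi = average_mi(bigrams, merged)
            if best is None or mi > best[0]:   # highest remaining MI = smallest loss
                best = (mi, merged)
        cls = best[1]
    return cls


# Toy counts in which "cat" and "dog" occur in identical contexts.
bigrams = Counter({("the", "cat"): 2, ("the", "dog"): 2,
                   ("cat", "runs"): 2, ("dog", "runs"): 2,
                   ("runs", "the"): 2})
classes = greedy_cluster(bigrams, num_classes=3)
print(classes["cat"] == classes["dog"])  # prints True
```

In this toy corpus merging "cat" and "dog" loses no mutual information, so it is the first merge chosen.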

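                     Meeting 13

     Date: 1995/09/29 (Fri) 16:00 - 18:00
    Venue: ATR Interpreting Telecommunications Research Laboratories (Meeting Rooms 1 and 2)
  Speaker: 大石 亨 (NAIST)
    Title: Acquisition and description of lexical knowledge of Japanese
           verbs
 Abstract: This talk describes a method for obtaining thematic knowledge
           by classifying Japanese verbs according to the meaning of the
           whole verb phrase including its cases, a method for obtaining
           aspectual knowledge from co-occurring adverbs, and ways of
           describing such knowledge.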

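                     Meeting 12

     Date: 1995/07/25 (Tue) 16:00 - 18:00
    Venue: NAIST, Graduate School of Information Science, Room A707
  Speaker: 村田 稔樹 (Oki Electric Industry Co., Ltd., Research and Development Group, Kansai Laboratory)
    Title: W3-PENSEE: a machine translation system for the WWW
 Abstract: In recent years, with the spread of computer networks
           typified by the Internet, information is being published from
           all over the world. Most of it is in English, and in Japan,
           where Japanese is the native language, handling it is often
           inconvenient. Machine translation systems are therefore used,
           but the translation quality of current systems does not match
           that of human translation. What is needed, then, is research
           on how translation systems are used. Targeting the
           translation of information on the World Wide Web (WWW), and
           building on findings from a study of user models, we have
           developed W3-PENSEE, a machine translation system for the WWW
           characterized by a communication-channel architecture
           (通信路方式), advance and stored translation functions, and
           translation that preserves tag layout. We report on this
           system.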

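                     Meeting 11

     Date: 1995/06/27 (Tue) 16:00 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
  Speaker: 赤峯 享 (ATR Interpreting Telecommunications Research Laboratories)
    Title: An incremental translation method for English-to-Japanese
           dialogue translation
 Abstract: Conventional language translation systems have processed
           input sentence by sentence. However, when the input arrives
           as a time series, as in an automatic interpretation system,
           sentence-level processing cannot start until the utterance
           ends, which delays the conveyance of the utterance. As a
           result, problems arise such as 1) cohesion between utterances
           breaks down and 2) processing time increases. To resolve
           these problems, this talk presents a
           simultaneous-interpretation-style method that translates in
           units smaller than a sentence. Using delimiting expressions
           such as conjunctions as triggers, the method converts the
           input utterance from English to Japanese in units such as
           phrases and clauses, and generates a Japanese sentence that
           makes sense as a whole while fixing the Japanese expressions
           of the partial conversion results one after another.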

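                     Meeting 10

     Date: 1995/05/23 (Tue) 16:00 - 18:00
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Rooms 1 and 2
  Speaker: 今一 修 (NAIST)
    Title: An Integrated Framework for Processing Grammatically
           Ill-Formed Inputs
 Abstract: This talk presents a method that uses syntactic, semantic,
           and contextual information in an integrated way to process
           grammatically ill-formed sentences. Processing is divided
           into 1) processing within a single sentence and 2) processing
           that takes context into account. In 1), when an ill-formed
           sentence is detected, the partial parse results produced
           during analysis are used. To select from these the ones
           suitable for recovering the ill-formed sentence, we describe
           a method that uses as measures a) the degree of syntactic and
           semantic ill-formedness and b) the degree of contribution to
           recovery. In 2), fragments, ellipsis, and other phenomena
           that cannot be handled within a single sentence are processed
           using contextual information. The talk mainly covers 1).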

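                     Meeting 9

     Date: 1995/04/28 (Fri) 16:00 - 18:00
    Venue: NAIST, Graduate School of Information Science, Room A707
  Speaker: 川越 睦 (Matsushita Electric Industrial Co., Ltd.)
    Title: Design of Logic-based Tree to Tree Transducer
 Abstract: This talk introduces a description language for writing
           Japanese grammars. The language aims in particular to make
           scrambling, ellipsis, and idioms easy to describe. We also
           describe how a grammar written in this language is converted
           into a Prolog program. The resulting program analyzes
           Japanese by bottom-up breadth-first search.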

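                     Meeting 8

     Date: 1995/03/24 (Fri) 16:00 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories, Large Conference Room A
  Speaker: 田代 敏久 (ATR Interpreting Telecommunications Research Laboratories)
    Title: A parsing toolkit for spoken language processing
 Abstract: Aiming at a syntactic and semantic analysis mechanism that is
           robust, accurate, and efficient, we are developing a parsing
           toolkit that can make effective use of many kinds of
           linguistic knowledge and link easily with external modules.
           This report gives an overview of the toolkit and presents the
           results of preliminary language analysis experiments.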

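                     Meeting 7

     Date: 1995/02/24 (Fri) 16:00 - 18:00
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Room 2
  Speaker: 竹内孔一 (NAIST, Graduate School of Information Science)
    Title: Parameter learning for Japanese morphological analysis using
           HMMs
 Abstract: JUMAN, the Japanese morphological analysis system distributed
           by our laboratory, assigns cost values to the connection
           rules between Japanese morphemes and to individual words, and
           narrows down ambiguity on the basis of these costs. At
           present, however, the costs are assigned by hand and there is
           no mechanism for optimizing them. In this talk we therefore
           propose a system that constructs an HMM (Hidden Markov Model)
           corresponding to the JUMAN system and re-estimates the
           parameters (cost values) using a corpus. We also report the
           results of experiments currently in progress.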

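                     Meeting 6

     Date: 1995/01/27 (Fri) 16:00 - 18:00
    Venue: NAIST, Graduate School of Information Science, Room A707
           (Take the 15:33 bus bound for Takayama Science Town from
           Gakuenmae Station on the Kintetsu Nara Line and get off at
           「大学院大学前」)
  Speaker: 平井 誠 (Matsushita Electric Central Research Laboratories)
    Title: Speech recognition and natural language processing
 Abstract: One challenge in realizing spoken dialogue is the integration
           of speech recognition, natural language processing, dialogue
           management, and so on. Natural language processing has in
           fact been applied to speech recognition, but not yet with
           sufficient effect. One basic problem in integrating the two
           is that they start from different points. This report
           examines that problem from two angles, 1) the lack of an
           intermediate structure and 2) the mismatch of likelihoods,
           and points out several challenges for spoken dialogue.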

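                     Meeting 5

     Date: 1994/11/21 (Mon) 16:30 - 18:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
  Speaker: 巖寺 俊哲 (ATR Interpreting Telecommunications Research Laboratories)
    Title: Recognizing the interaction structure and topics of dialogue
 Abstract: We propose a method for recognizing the interaction structure
           and topics of a dialogue, and show its validity by applying
           it to dialogues about inquiries concerning an international
           conference. The method consists of the following processes:
           (1) segmentation into utterance units,
           (2) identification and labeling of surface speech acts using
               surface expressions,
           (3) structuring of the dialogue using ideas from discourse
               analysis,
           (4) recognition of topic transitions using this structure and
               a "topic duration" hypothesis.
           The labels used to structure the dialogue are independent of
           the domain, which makes the method robust.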

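                     Meeting 4

     Date: 1994/10/21 (Fri) 16:30 - 18:30
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Room 2
  Speaker: 平沢 純一 (NAIST, Graduate School of Information Science)
    Title: Selection and determination of contextual information in the
           framework of relevance theory
 Abstract: Handling context remains one of the most difficult problems
           in natural language understanding. To truly understand
           discourse, interpretation must be supplemented with
           contextual information, such as world knowledge and
           background knowledge, that is not stated explicitly in the
           utterance. In general, however, such contextual knowledge is
           vast, and it is hard to select and determine appropriately
           the knowledge needed for interpretation.
           This talk introduces the basic ideas of relevance theory
           (Sperber & Wilson, 1986), which has recently attracted
           attention in some quarters, and describes, with computational
           realization also in view, how the problem of "selecting and
           determining contextual information" is treated within that
           framework.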

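                     Meeting 3

     Date: 1994/09/12 (Mon) 17:00 - 18:30
    Venue: NAIST, Graduate School of Information Science, Room A707
  Speaker: 杉村 領一 (Matsushita Electric Industrial Co., Ltd., Information and Communications Research Laboratory)
    Title: "Overview of Matsushita's Japanese-to-English machine
           translation system"
 Abstract: Based on the Matsushita technical report
           (松下テクニカルレポート), the talk introduced the
           architecture and features of the Japanese-to-English machine
           translation system, focusing on the technology that supports
           its high speed.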

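                     Meeting 2

     Date: 1994/07/29 (Fri) 17:00 - 19:00
    Venue: Matsushita Electric Industrial Co., Ltd., Central Research Laboratories
  Speaker: 田代 敏久 (ATR Interpreting Telecommunications Research Laboratories)
    Title: A method for semi-automatic acquisition of morpheme
           adjustment rules
 Abstract: Corpus-based natural language processing has attracted
           attention in recent years. Among the various kinds of
           corpora, corpora annotated with morphological information
           (tagged corpora) are the most basic and important resource.
           However, because morphological information schemes
           (part-of-speech systems, word segmentation criteria, and so
           on) vary widely, it is often difficult to make effective use
           of corpora that follow different schemes. This report
           therefore proposes a method for semi-automatically acquiring
           rules that reconcile two different morphological information
           schemes (morpheme adjustment rules), so that corpora with
           different schemes can be used with as little effort as
           possible. We also report the results of an experiment in
           rewriting a corpus with the acquired rules.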

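                     Meeting 1

     Date: 1994/06/17 (Fri) 16:30 - 18:00
    Venue: ATR Interpreting Telecommunications Research Laboratories, Meeting Room 2
  Speaker: 宇津呂 武仁 (NAIST)
    Title: Efficient retrieval of similar examples by generating
           retrieval queries from similarity
 Abstract: In example-based natural language processing, retrieving
           similar examples has conventionally required computing the
           similarity between the input and every example in the example
           database (Full Retrieval). The computational cost of
           retrieval then grows in proportion to the number of examples
           in the database, which has been a serious problem. This talk
           describes an efficient retrieval method that avoids full
           retrieval: it generates a retrieval query for finding
           examples of a suitable similarity and retrieves the examples
           that satisfy the query (Query Generation Retrieval). With
           this method, retrieval time stayed nearly constant as the
           number of examples in the database increased.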

Natural Language Processing Laboratory (自然言語処理学研究室)