Study group:
Applied Natural Language Processing (NLP.app for short)
Date:
Monday 15:10-
Group Description:
This group focuses on natural language processing techniques for applications. Our main target is computational semantics using large-scale data. Our interests include:
- distributed NLP
  - the Data-Intensive Text Processing with MapReduce tutorial at NAACL-HLT 2009
- predicate-argument structure analysis
  - semantic role labeling (SRL)
- information extraction
  - relation extraction
- information retrieval
- graph mining
- word sense disambiguation (WSD)
In addition to the regular reading-group sessions, we will work through one assignment per week from Jimmy Lin's course "Data-Intensive Information Processing Applications" (Spring 2010) at the University of Maryland.
Schedule (reading group/short talk)
- 4/26 (Mon) Kickoff meeting: Data-Intensive Information Processing Applications: Introduction (komachi)
- 5/3 (Mon) "Golden Week" holidays
- 5/10 (Mon)
  - Assignment 1-1: Getting started on Hadoop (komachi)
  - Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. "Large Language Models in Machine Translation". EMNLP-2007. (komachi)
  - Joshua Goodman. "The State of the Art in Language Modeling".
- 5/17 (Mon)
  - Assignment 2, Part I: Bigram Counts, Q1-Q4 (yasuhiro-r)
  - Jakob Uszkoreit and Thorsten Brants. "Distributed Word Clustering for Large Scale Class-Based Language Modeling in Machine Translation". ACL-2008. (komachi)
- 5/24 (Mon)
  - Assignment 2, Part II: Bigram Counts, Q1-Q3 (yasuhiro-r)
  - Delip Rao and David Yarowsky. "Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce". In Proc. of TextGraphs-4, 2009. (komachi)
- 5/31 (Mon) canceled (exam week)
- 6/10 (Wed)
  - Assignment 2, Parts III-IV: Bigram Counts, Q3-Q4 of each part (yasuhisa-y)
  - Cheng T. Chu, Sang K. Kim, Yi A. Lin, Yuanyuan Yu, Gary R. Bradski, Andrew Y. Ng, and Kunle Olukotun. "Map-Reduce for Machine Learning on Multicore". NIPS-2006. (komachi)
- 6/14 (Mon) canceled (IBISML, http://ibisml.org/001/)
- 6/21 (Mon)
  - Assignment 3: Inverted Indexing and Boolean Retrieval, Part I: inverted index exercise (tomoya-m)
  - Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang. "PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications". In Proc. of the Fifth International Conference on Algorithmic Aspects in Information and Management (AAIM 2009), pages 301-314, 2009. (shuhei-k)
- 6/28 (Mon)
  - Assignment 3: Inverted Indexing and Boolean Retrieval, Part II: Boolean retrieval exercise (yuta-h)
  - Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. "Google News Personalization: Scalable Online Collaborative Filtering". WWW-2007. (yasuhiro-r)
- 7/5 (Mon)
  - Joseph E. Gonzalez, Yucheng Low, and Carlos Guestrin. "Residual Splash for Optimally Parallelizing Belief Propagation". AISTATS-2009. (jordi-p)
- 7/12 (Mon)
  - Code reading: CRF++ 0.54, CRFsuite 0.10 (komachi)
  - Thomas Lavergne, Olivier Cappé, and François Yvon. "Practical Very Large Scale CRFs". ACL-2010 (to appear). (komachi)
- 7/19 (Mon) public holiday
- 7/26 (Mon)
Material
- Jimmy Lin and Chris Dyer. "Data-Intensive Text Processing with MapReduce" (forthcoming in mid-2010).
  - http://www.umiacs.umd.edu/~jimmylin/book.html (a PDF of the manuscript is available)