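Overview
Reading group: presentations of papers related to data mining and text mining.
Academic year 2003
From June onward, held jointly with the SVM study group.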
01/21 Presenter: 工藤
- Yasemin Altun and Thomas Hofmann.
- Large Margin Methods for Label Sequence Learning.
- In Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH 2003), 2003.
- http://www.cs.brown.edu/people/altun/pubs/AltunHofmann-EuroSpeech2003.pdf
Gives a meta-architecture for general sequential labeling problems such as chunking, POS tagging and NER, and uses it to explain the following algorithms in a unified way: HMM (Hidden Markov Model), CRF (Conditional Random Field), MRF (Marginalized Random Field), HMM-SVM (Hidden Markov SVM) and Label Sequence AdaBoost.
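The models listed above score a labeling in different ways. With first-order (Markov) label dependencies, however, they can all be decoded the same way: dynamic programming finds the highest-scoring label sequence. The following is a minimal Viterbi sketch. It assumes additive per-position label scores `emit` and label-transition scores `trans` (log-probabilities for an HMM or CRF, linear scores for a Hidden Markov SVM). The function name and the toy scores are hypothetical, and the paper's own formulation may differ.

```python
from typing import List, Sequence


def viterbi(emit: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]]) -> List[int]:
    """Return the label sequence maximizing total emission + transition score.

    emit[t][y]  -- score of label y at position t (hypothetical input)
    trans[y][z] -- score of label y followed by label z (hypothetical input)
    """
    n_pos, n_lab = len(emit), len(emit[0])
    # best[t][y]: best score of a labeling of positions 0..t that ends in y
    best: List[List[float]] = [list(emit[0])]
    # back[t][z]: previous label on the best path ending in z at position t
    back: List[List[int]] = [[0] * n_lab]
    for t in range(1, n_pos):
        row: List[float] = []
        ptr: List[int] = []
        for z in range(n_lab):
            y_best = max(range(n_lab), key=lambda y: best[t - 1][y] + trans[y][z])
            row.append(best[t - 1][y_best] + trans[y_best][z] + emit[t][z])
            ptr.append(y_best)
        best.append(row)
        back.append(ptr)
    # trace back from the best-scoring final label
    y = max(range(n_lab), key=lambda k: best[-1][k])
    path = [y]
    for t in range(n_pos - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]
```

With zero transition scores the decoder picks the best label at each position independently. A large penalty on label changes makes it prefer a constant labeling instead.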
01/06 Presenter: 東
- Charles Elkan.
- Using the triangle inequality to accelerate k-means.
- In Proc. 20th International Conference on Machine Learning (ICML), 2003.
- http://www.cs.ucsd.edu/users/elkan/kmeansicml03.pdf
12/19 Presenter: 小林(の)
- Avrim Blum and Tom Mitchell.
- Combining labeled and unlabeled data with co-training.
- In Proc. Conference on Computational Learning Theory (COLT-98), 1998.
- http://citeseer.nj.nec.com/blum98combining.html
12/09 Presenter: 工藤
- 工藤 拓.
- ME, SVM and boosting.
- Unpublished manuscript.
- http://cl.aist-nara.ac.jp/~taku-ku/handouts/svmme.pdf
12/02 Presenter: 伊藤
- Robert E. Schapire.
- A brief introduction to boosting.
- In Proc. 16th International Joint Conference on Artificial Intelligence (IJCAI-99), 1999.
11/28 Presenter: 藤田(篤)
- Wee S. Lee and Bing Liu.
- Learning with positive and unlabeled examples using weighted logistic regression.
- In Proc. 20th International Conference on Machine Learning (ICML), pp.448-455, 2003.
- http://www.hpl.hp.com/conferences/icml2003/papers/122.pdf
11/21 Presenter: 徳永
- Thomas Hofmann.
- Probabilistic latent semantic indexing.
- In Proc. 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), pp.50-57, 1999.
- http://citeseer.nj.nec.com/article/hofmann99probabilistic.html
11/14 Presenter: 新保
- David Eppstein.
- Finding the k shortest paths.
- SIAM J. Computing, Vol.28, No.2, pp.652-673, 1998.
11/07 Presenter: 伊藤
- Scott White and Padhraic Smyth.
- Algorithms for discovering relative authority in graphs.
- In Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2003), 2003.
10/28 Presenter: 松本(裕)
- Thorsten Joachims.
- Transductive learning via spectral graph partitioning.
- In Proc. 20th International Conference on Machine Learning (ICML'03), pp.290-297, 2003.
- http://www.hpl.hp.com/conferences/icml2003/papers/355.pdf
See also:
- Inderjit S. Dhillon.
- Co-clustering documents and words using bipartite spectral graph partitioning.
- In Proc. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2001), pp.269-274, 2001.
- http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf
10/21 Presenter: 工藤
- Xifeng Yan and Jiawei Han.
- gSpan: graph-based substructure pattern mining.
- In Proc. IEEE International Conference on Data Mining (ICDM), 2002.
- http://citeseer.nj.nec.com/yan02gspan.html
See also:
- http://www.cs.uiuc.edu/Dienst/Repository/2.0/Body/ncstrl.uiuc_cs/UIUCDCS-R-2002-2296/pdf
10/14 Presenter: 鈴木
- Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi.
- Marginalized kernels between labeled graphs.
- In Proc. 20th International Conference on Machine Learning (ICML 2003), pp.321-328, 2003.
- http://www.hpl.hp.com/conferences/icml2003/papers/150.pdf
A kernel that measures the similarity between two graphs based on random walks.
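The following is a minimal sketch of a random-walk (marginalized) graph kernel. It makes several assumptions:

- graphs are vertex-labeled and given as label lists and 0/1 adjacency matrices;
- walks start uniformly and stop with a constant probability `p_stop` (always, at an isolated vertex);
- walks move to a uniformly chosen neighbour;
- vertex labels are compared by exact match, and edge labels are ignored.

Under these assumptions the kernel sums the product of walk probabilities over all pairs of equal-length walks whose label sequences match. The infinite sum is obtained by solving one linear system on the product graph. The function and parameter names are hypothetical, and the paper's formulation is more general (arbitrary label kernels, edge labels).

```python
import numpy as np


def _walk_probs(adj: np.ndarray, p_stop: float):
    """Start, stop and transition probabilities of the random walk on one graph."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    start = np.full(n, 1.0 / n)                  # uniform start (assumption)
    stop = np.where(deg > 0, p_stop, 1.0)        # isolated vertices always stop
    trans = np.zeros((n, n))
    has_nb = deg > 0
    # move to a uniformly chosen neighbour with total probability 1 - p_stop
    trans[has_nb] = (1.0 - p_stop) * adj[has_nb] / deg[has_nb][:, None]
    return start, stop, trans


def marginalized_kernel(labels1, adj1, labels2, adj2, p_stop: float = 0.1) -> float:
    """Sum over equal-length walk pairs with matching label sequences."""
    a1 = np.asarray(adj1, dtype=float)
    a2 = np.asarray(adj2, dtype=float)
    s1, q1, t1 = _walk_probs(a1, p_stop)
    s2, q2, t2 = _walk_probs(a2, p_stop)
    n1, n2 = len(labels1), len(labels2)
    # match[i * n2 + j] = 1 if vertex i of G1 and vertex j of G2 carry the same label
    match = np.array([[float(u == v) for v in labels2] for u in labels1]).ravel()
    start = np.kron(s1, s2) * match          # both walks start on matching vertices
    stop = np.kron(q1, q2)                   # both walks stop at (i, j)
    step = np.kron(t1, t2) * match[None, :]  # both walks move to matching vertices
    # R = stop + step @ R sums over all continuations, so solve (I - step) R = stop
    r = np.linalg.solve(np.eye(n1 * n2) - step, stop)
    return float(start @ r)
```

Because every non-isolated vertex keeps stopping probability `p_stop`, the row sums of `step` stay below 1. The linear system is therefore always solvable.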
07/29 Presenter: 伊藤
- Andrew Y. Ng, Alice X. Zheng and Michael I. Jordan.
- Stable algorithms for link analysis.
- In Proc. SIGIR'01, 2001.
- http://robotics.stanford.edu/~ang/papers/sigir01-stablelinkanalysis.pdf
See also:
- Andrew Y. Ng, Alice X. Zheng and Michael I. Jordan.
- Link analysis, eigenvectors and stability.
- In Proc. 17th International Joint Conference on Artificial Intelligence (IJCAI-01), 2001.
- http://robotics.stanford.edu/~ang/papers/ijcai01-linkanalysis.pdf
07/22 Presenter: 小林(の)
- Ramakrishnan Srikant and Rakesh Agrawal.
- Mining generalized association rules.
- In Proc. 21st International Conference on Very Large Data Bases (VLDB'95), pp.407-419, 1995.
- http://citeseer.nj.nec.com/srikant95mining.html
07/01 Presenter: 工藤
- Jun Sese and Shinichi Morishita.
- Answering the most correlated N association rules efficiently.
- In Proc. 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02), pp.410-422, 2002.
- http://platinum.ims.u-tokyo.ac.jp/~moris/paper/pkdd20022.pdf
A follow-up to the same authors' PODS 2000 paper (presented on 05/27). The search is sped up by using a vertical representation of the candidates.
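In a vertical (tid-list) layout, each item is stored together with the set of ids of the transactions that contain it. Adding one item to an itemset then costs a single set intersection instead of a scan of the database. The following is a generic sketch of that idea. It is not necessarily the exact candidate representation used in the paper, and the function names and toy database are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def vertical_layout(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to its tid-list: the ids of transactions containing it."""
    tids: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    return tids


def extend(parent_tids: Set[int], item: str, tids: Dict[str, Set[int]]) -> Set[int]:
    """Tid-list of (parent itemset + item): one set intersection, no database scan."""
    return parent_tids & tids.get(item, set())


def support(itemset: Iterable[str], tids: Dict[str, Set[int]], n_transactions: int) -> int:
    """Support of an itemset, built by extending the empty itemset one item at a time."""
    current: Set[int] = set(range(n_transactions))  # tid-list of the empty itemset
    for item in itemset:
        current = extend(current, item, tids)
    return len(current)
```

During a depth-first search, each child candidate reuses its parent's tid-list, so the cost of counting shrinks as the itemsets grow.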
06/24 Presenter: 新保
- Nicola Cancedda, Eric Gaussier, Cyril Goutte and Jean-Michel Renders.
- Word-sequence kernels.
- J. Machine Learning Research, Vol.3, pp.1059-1082, 2003.
- http://www.jmlr.org/papers/volume3/cancedda03a/cancedda03a.pdf
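Applies the subsequence kernel of Lodhi et al. at the word level instead of the character level. The decay factor and weights are tuned, and the kernel is compared with a polynomial (bag-of-words) kernel under conditions kept as equal as possible. Conclusion: word order and locality do not seem to help much.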
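The word-level subsequence kernel can be sketched with the dynamic-programming recursion of Lodhi et al., applied to lists of words instead of characters. Every common subsequence of length `n` contributes `lam` raised to the total span it covers in both sequences, so gaps are penalized. This sketch assumes a single decay factor `lam` for all words and no per-word weights, whereas Cancedda et al. also consider symbol-dependent decay and weighting. The function names are hypothetical.

```python
import math
from typing import Sequence


def word_sequence_kernel(s: Sequence[str], t: Sequence[str], n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel of order n between two word sequences.

    Each common word subsequence u of length n contributes lam ** (span in s + span in t).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ls, lt = len(s), len(t)
    # kp[i][a][b] = K'_i(s[:a], t[:b]) from the Lodhi et al. recursion
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0  # K'_0 = 1 for all prefixes
    for i in range(1, n):
        for a in range(1, ls + 1):
            for b in range(1, lt + 1):
                total = lam * kp[i][a - 1][b]
                for j in range(1, b + 1):
                    if t[j - 1] == s[a - 1]:
                        total += kp[i - 1][a - 1][j - 1] * lam ** (b - j + 2)
                kp[i][a][b] = total
    # K_n(s, t): sum over the positions where the last word of u is matched
    k = 0.0
    for a in range(1, ls + 1):
        for j in range(1, lt + 1):
            if t[j - 1] == s[a - 1]:
                k += kp[n - 1][a - 1][j - 1] * lam ** 2
    return k


def normalized_word_sequence_kernel(s, t, n, lam):
    """Cosine-normalized kernel: K(s, t) / sqrt(K(s, s) * K(t, t))."""
    k_ss = word_sequence_kernel(s, s, n, lam)
    k_tt = word_sequence_kernel(t, t, n, lam)
    if k_ss == 0.0 or k_tt == 0.0:
        return 0.0
    return word_sequence_kernel(s, t, n, lam) / math.sqrt(k_ss * k_tt)
```

For `n = 1` the kernel reduces to `lam ** 2` times the number of matching word pairs, which is a bag-of-words count. This is why the comparison with a bag-of-words kernel is natural.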
06/17 Presenter: 上出
- Gideon S. Mann and David Yarowsky.
- Unsupervised personal name disambiguation.
- In Proc. Conference on Computational Natural Language Learning (CoNLL-2003), 2003.
- http://cnts.uia.ac.be/conll2003/pdf/03340man.pdf
06/03 Presenter: 伊藤
- Vinayak Borkar, Kaustubh Deshmukh and Sunita Sarawagi.
- Automatic segmentation of text into structured records.
- In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD-2001), pp.175-186, 2001.
05/27 Presenter: 工藤
- Shinichi Morishita and Jun Sese.
- Traversing itemset lattice with statistical metric pruning.
- In Proc. ACM Symposium on Principles of Database Systems (PODS-2000), pp.226-236, 2000.
- http://citeseer.nj.nec.com/article/morishita00traversing.html
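A market-basket mining method (AprioriSMP) for rule-selection criteria that do not satisfy anti-monotonicity. Applicable to measures such as the χ² test statistic.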
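The pruning idea can be sketched for the χ² statistic of a rule's 2×2 table. Let there be `n` transactions, `m` of them positive. Let `x` be the number of transactions containing itemset I, and `y` the number of those that are positive. Every superset J satisfies 0 ≤ y_J ≤ y and 0 ≤ x_J - y_J ≤ x - y, which is a parallelogram in the (x, y) plane. χ² is convex in (x, y), so its maximum over that region is attained at a corner, and the largest corner value bounds χ² for every superset. If this bound is below the current threshold, the whole branch can be skipped even though χ² itself is not anti-monotone. This is a sketch under these assumptions; the names are hypothetical, and the paper may state its bound and search strategy differently.

```python
def chi2(x: int, y: int, n: int, m: int) -> float:
    """Chi-square statistic of the 2x2 table (itemset present/absent x positive/negative).

    n: transactions, m: positive transactions,
    x: transactions containing the itemset, y: positives among them.
    Returns 0 in the degenerate cases x == 0 or x == n.
    """
    if x == 0 or x == n:
        return 0.0
    return n * (n * y - m * x) ** 2 / (m * (n - m) * x * (n - x))


def chi2_upper_bound(x: int, y: int, n: int, m: int) -> float:
    """Upper bound of chi2 over every superset of an itemset with counts (x, y)."""
    # Supersets satisfy 0 <= y' <= y and 0 <= x' - y' <= x - y.  chi2 is convex,
    # so its maximum over this parallelogram is at a corner; corner (0, 0) gives 0.
    return max(chi2(x, y, n, m), chi2(y, y, n, m), chi2(x - y, 0, n, m))
```

A search would compute `chi2_upper_bound` for each candidate and stop expanding it when the bound falls below the χ² of the current N-th best rule.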
05/20 Presenter: 山崎
- Charu C. Aggarwal and Philip S. Yu.
- A new framework for itemset generation.
- In Proc. ACM Symposium on Principles of Database Systems (PODS-1998), pp.18-24, 1998.
- http://citeseer.nj.nec.com/aggarwal98new.html
05/13 Presenter: 新保
- C. Faloutsos and King-Ip (David) Lin.
- FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets.
- In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'95), pp.163-174, 1995.
- http://www.msci.memphis.edu/~linki/_mypaper/fastmap.ps.gz
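Given a set of objects in a high-dimensional space and the distances between them, maps the objects into a low-dimensional space while preserving those distances as well as possible.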
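The projection step of FastMap can be sketched as follows:

1. Pick two far-apart pivot objects a and b.
2. Place every object i on the line through them using the cosine law: x_i = (d(a,i)² + d(a,b)² - d(b,i)²) / (2 d(a,b)).
3. Repeat on the residual distances d'(i,j)² = d(i,j)² - (x_i - x_j)² to obtain the next coordinate.

This sketch assumes a full distance matrix given as a NumPy array. It chooses pivots with a single farthest-point pass and clamps negative residuals (which can occur for non-Euclidean distances) to zero. The function name is hypothetical.

```python
import numpy as np


def fastmap(dist: np.ndarray, k: int) -> np.ndarray:
    """Embed n objects into k dimensions from an n x n distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2   # work with squared distances
    n = d2.shape[0]
    coords = np.zeros((n, k))
    for col in range(k):
        # pivot heuristic: farthest object a from object 0, then farthest b from a
        a = int(np.argmax(d2[0]))
        b = int(np.argmax(d2[a]))
        dab2 = d2[a, b]
        if dab2 <= 0.0:
            break  # all remaining distances are zero; later coordinates stay 0
        # cosine law: position of every object on the line through a and b
        x = (d2[a] + dab2 - d2[b]) / (2.0 * np.sqrt(dab2))
        coords[:, col] = x
        # squared distances in the hyperplane orthogonal to that line
        d2 = np.maximum(d2 - (x[:, None] - x[None, :]) ** 2, 0.0)
    return coords
```

Each coordinate costs O(n) distance lookups for the pivots plus one pass over the distance matrix to update the residuals. For Euclidean data whose intrinsic dimension is at most `k`, the original distances are reproduced up to rounding.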
Academic year 2002
12/06 Presenter: 上出
- Dayne Freitag and Nicholas Kushmerick.
- Boosted wrapper induction.
- In Proc. AAAI/IAAI-2000, pp.577-583, 2000.
- http://citeseer.nj.nec.com/freitag00boosted.html
11/22 Presenter: 工藤
- Ion Muslea, Steven Minton and Craig A. Knoblock.
- Hierarchical wrapper induction for semistructured information sources.
- Autonomous Agents and Multi-Agent Systems, Vol.4, Nos.1/2, pp.93-114, 2001.
- http://citeseer.nj.nec.com/muslea01hierarchical.html
Academic year 2001
12/14 Presenter:
- Roberto J. Bayardo Jr.
- Efficiently Mining Long Patterns from Databases.
- In Proc. SIGMOD-98, pp.85-93, 1998.
12/05 Presenter: 坪井
- J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu.
- PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth.
- In Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
- http://www.cs.sfu.ca/~peijian/personal/publications/span.pdf
See also:
- ICDE'01 presentation (in PowerPoint), April 2001.
- ftp://ftp.fas.sfu.ca/pub/cs/han/slides/icde01_prefixspan.ppt
- A paper review appears on p.913 of the Journal of the Japanese Society for Artificial Intelligence, Vol.16, No.6 (November 2001).
11/22 Presenter: 新保
- T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa.
- Efficient substructure discovery from large semi-structured data.
- Technical report DOI-TR-200 (October 2001), Department of Informatics, Kyushu University.
- http://www.i.kyushu-u.ac.jp/doitr/trcs200.ps.gz
11/16 Presenter: 松田
- Shinichi Shimozono, Hiroki Arimura, and Setsuo Arikawa.
- Efficient discovery of optimal word-association patterns in large text databases.
- New Generation Computing, Vol.18, pp.49-60, 2000.
- http://www.i.kyushu-u.ac.jp/~arim/papers/arimura-shimozono-ngc01.ps.gz
11/09 Presenter: 山本
- Frank Smadja.
- Retrieving Collocations from Text: Xtract.
- Computational Linguistics, Vol.19, No.1, pp.143-177, 1993.
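10/30 Presenter:
- 有村 et al.
- Fast data mining from text data (in Japanese).
- Journal of the Japanese Society for Artificial Intelligence, Vol.15, No.4, pp.618-628, July 2000.
- http://www.i.kyushu-u.ac.jp/~arim/papers/dis_9b3_abe_arimura.ps.gz
- Data mining based on optimal pattern discovery (presentation slides, ppt).
- 有村博紀, "Statistical mathematics and data mining, Discovery Science 2" (in Japanese), ISM Cooperative Research Report 142, pp.13-24, The Institute of Statistical Mathematics, March 2001. (ISM cooperative research project 12-共研-4001, PI: 今井浩, workshop held November 2000.)
- http://www.i.kyushu-u.ac.jp/~arim/jtalks/ism-nov00.pdf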
10/19 Presenter:
- 松澤裕史.
- Extraction of frequent structured patterns from large databases (in Japanese).
- IPSJ Transactions on Databases, Vol.42, No.SIG8 (TOD10), July 2001.
10/12 Presenter: 工藤
- Rakesh Agrawal, Tomasz Imielinski, and Arun Swami.
- Mining Association Rules between Sets of Items in Large Databases.
- In Proc. 1993 ACM SIGMOD International Conference on Management of Data, pp.207-216, 1993.
- http://citeseer.nj.nec.com/agrawal93mining.html
Book list
- B767: Pieter Adriaans and Dolf Zantinge. Data Mining. Addison-Wesley, 1996. ISBN 0-201-40380-3, 158 pages, 4,150 yen.
- B799: Ryszard S. Michalski, Ivan Bratko, and Miroslav Kubat. Machine Learning and Data Mining: Methods and Applications. John Wiley & Sons, 1998. ISBN 0-471-97199-5, 456 pages.
- B1035-1: Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, 1999. ISBN 1-55860-552-5, 371 pages.
- B1035-2: Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, 2000. ISBN 1-55860-489-8, 550 pages.
- B1091: Robert Groth. Data Mining: Building Competitive Advantage. Prentice Hall PTR, 2000. ISBN 0-13-086271-1, 266 pages.
- B1164-1: Oded Maimon and Mark Last. Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Massive Computing (MACO), Vol.1. Kluwer Academic Publishers, 2001. ISBN 0-7923-6647-6, 171 pages.