A Genetic Programming Framework for Topic Discovery from Online Digital Library

Authors: Yinxing Li, Ning Li

Journal: International Journal of Information Technology and Computer Science (IJITCS)

Issue: Vol. 2, No. 1, 2010.

Various topic extraction techniques for digital libraries have been proposed over the past decade. Topic extraction systems generally require a large number of features and complex lexical analysis. While these features and analyses effectively represent the statistical characteristics of a document, they do not capture its high-level semantics. In this paper, we present a new approach to topic extraction that combines users' click-stream data with traditional lexical analysis. In our view, users' click streams directly reflect human understanding of the high-level semantics of a document. Furthermore, we propose a simple yet effective piecewise-linear model of topic evolution and apply a genetic algorithm to estimate the model and extract topics. Experiments on a set of documents from the US Congress digital library demonstrate that our approach achieves higher topic extraction accuracy than traditional methods.
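The abstract does not specify the form of the piecewise-linear model, the fitness function, or the genetic operators, so the following is only a minimal sketch under stated assumptions. It assumes a topic's popularity over time (for example, weekly click counts) is a piecewise-linear function with fixed, known knot times, and uses a basic genetic algorithm (truncation selection with elitism, one-point crossover, Gaussian mutation) to estimate the function values at the knots by minimizing squared error. All names (`piecewise_linear`, `sse`, `fit_ga`) and parameter values are hypothetical and are not taken from the paper.

```python
import random


def piecewise_linear(t, knots, values):
    """Evaluate a piecewise-linear curve at time t.

    Assumption: knots are sorted times and values[i] is the curve's value at
    knots[i]; outside the knot range the end values are held constant.
    """
    if t <= knots[0]:
        return values[0]
    if t >= knots[-1]:
        return values[-1]
    for i in range(len(knots) - 1):
        t0, t1 = knots[i], knots[i + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)  # linear interpolation weight
            return (1 - w) * values[i] + w * values[i + 1]
    return values[-1]


def sse(values, knots, series):
    """Fitness (lower is better): squared error against series[t] at t = 0, 1, ..."""
    return sum((piecewise_linear(t, knots, values) - y) ** 2
               for t, y in enumerate(series))


def fit_ga(series, knots, pop_size=60, generations=200,
           mutation_rate=0.3, mutation_scale=0.5, seed=0):
    """Estimate knot values with a simple genetic algorithm (hypothetical settings)."""
    rng = random.Random(seed)
    lo, hi = min(series), max(series)
    k = len(knots)
    # Each individual is one candidate vector of knot values.
    pop = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda v: sse(v, knots, series))
        elite = pop[: pop_size // 4]          # truncation selection
        children = [list(v) for v in elite]   # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]         # one-point crossover
            child = [g + rng.gauss(0, mutation_scale)
                     if rng.random() < mutation_rate else g
                     for g in child]          # Gaussian mutation
            children.append(child)
        pop = children
    return min(pop, key=lambda v: sse(v, knots, series))


if __name__ == "__main__":
    knots = [0, 5, 10]
    # Synthetic click counts: a topic that rises, peaks, then fades.
    series = [piecewise_linear(t, knots, [0.0, 10.0, 2.0]) for t in range(11)]
    best = fit_ga(series, knots)
    print("estimated knot values:", [round(v, 2) for v in best])
    print("squared error:", round(sse(best, knots, series), 4))
```

Because the elite is copied unchanged into each new generation, the best squared error never increases from one generation to the next. Fixing the knot times keeps the search space small; a fuller model might also evolve the knot positions, which this sketch does not attempt.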

Keywords: Genetic Algorithms, Non-linear Matrix Factorization, Web-click Data, Convex Optimization, Interior Point Method

Short URL: https://sciup.org/15011571

IDR: 15011571
