Mining Data Streams using Option Trees

Authors: B. Reshma Yusuf, P. Chenna Reddy

Journal: International Journal of Computer Network and Information Security (IJCNIS)

Published in: Vol. 4, No. 8, 2012.

Free access

In today's applications, evolving data streams are stored in very large databases that grow without limit at a rate of several million records per day. Data streams are ubiquitous and have become an important research topic over the last two decades. Mining these continuous data streams brings unique opportunities, but also new challenges. For predictive nonparametric analysis of such streams, Hoeffding-based trees are often the method of choice because they can deliver predictions at any time. One of their main problems, however, is that learning is delayed when several attributes are equally discriminative. Options are a natural way to deal with this problem. This paper presents option trees, which build upon regular trees by adding splitting options in the internal nodes to improve accuracy and stability and to reduce ambiguity. Results on the accuracy and processing speed of the algorithm under various memory limits are presented, and the accuracy of the Hoeffding option tree is compared with that of Hoeffding trees under various conditions.
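To make the delay caused by equally discriminative attributes concrete, the following is a minimal sketch of the standard Hoeffding bound split test used by Hoeffding trees. It is not the paper's implementation: the function names, the gain values, and the choice of delta are hypothetical and serve only to illustrate the idea.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split criterion, delta is the
    allowed probability of error, and n is the number of examples seen.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split only when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains: two attributes that are almost equally discriminative.
# After 1,000 examples the 0.01 gap is still smaller than epsilon, so the
# leaf keeps waiting; only after far more examples does the test pass.
print(can_split(0.30, 0.29, value_range=1.0, delta=1e-7, n=1_000))    # no split yet
print(can_split(0.30, 0.29, value_range=1.0, delta=1e-7, n=100_000))  # gap now resolved
print(can_split(0.30, 0.10, value_range=1.0, delta=1e-7, n=1_000))    # clear winner
```

In this sketch a plain Hoeffding tree simply waits while the two candidates remain within epsilon of each other. An option node, as described in the abstract, would instead allow more than one candidate split to be kept at the same internal node.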


Data streams, Hoeffding trees, option trees, large databases

Short URL: https://sciup.org/15011111

IDR: 15011111

Research article