Mining Interesting Infrequent Itemsets from Very Large Data based on MapReduce Framework

Authors: T Ramakrishnudu, R B V Subramanyam

Journal: International Journal of Intelligent Systems and Applications (IJISA)

Issue: No. 7, Vol. 7, 2015.

Mining frequent and infrequent itemsets from a given dataset is one of the most important tasks in data mining. When frequent and infrequent itemsets are mined simultaneously, the infrequent itemsets become especially important because they yield many valuable negative association rules. Mining frequent itemsets is expensive when the minimum support threshold is low, whereas mining infrequent itemsets is expensive when the threshold is high. When the dataset is very large, mining infrequent itemsets is costly in both memory usage and computation, and a single processor's memory and CPU resources are not sufficient to handle it. Parallel and distributed computing are effective approaches for handling large datasets. In this paper, we propose a method based on the Hadoop MapReduce model that can mine infrequent itemsets from massive datasets. Experiments were performed on an 8-node cluster with a synthetic dataset. The performance study shows that the proposed method handles very large datasets efficiently.
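
To make the map/shuffle/reduce structure concrete, here is a minimal single-process Python sketch of support counting in the MapReduce style. It assumes a simplified definition: an itemset is infrequent when its relative support falls below the minimum support threshold. The helper names (`map_phase`, `shuffle_phase`, `reduce_phase`, `split_by_support`) and the `max_size` cap on itemset size are hypothetical, chosen only for illustration. This is not the authors' algorithm, it does not run on Hadoop, and it does not model any additional interestingness measures used to select "interesting" infrequent itemsets.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Itemset = FrozenSet[str]


def map_phase(transaction: Iterable[str], max_size: int) -> Iterator[Tuple[Itemset, int]]:
    """Mapper: emit (itemset, 1) for every itemset of size 1..max_size in one transaction."""
    items = sorted(set(transaction))
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            yield frozenset(combo), 1


def shuffle_phase(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    grouped: Dict[Itemset, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: Itemset, values: List[int]) -> Tuple[Itemset, int]:
    """Reducer: sum the partial counts to obtain the itemset's support count."""
    return key, sum(values)


def split_by_support(
    transactions: List[List[str]], min_support: float, max_size: int = 2
) -> Tuple[Dict[Itemset, int], Dict[Itemset, int]]:
    """Count supports MapReduce-style, then partition into frequent and infrequent itemsets.

    Assumption: only itemsets occurring in at least one transaction are emitted by
    the mappers, so itemsets with zero support appear in neither output.
    """
    intermediate = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items())
    n = len(transactions)
    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    infrequent = {s: c for s, c in counts.items() if c / n < min_support}
    return frequent, infrequent


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    transactions = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["beer"]]
    frequent, infrequent = split_by_support(transactions, min_support=0.5)
    for itemset, count in sorted(infrequent.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
    # prints:
    # ['beer'] 1
    # ['butter', 'milk'] 1
```

Because summation is associative, the same reducer could also serve as a combiner on each mapper's output to cut shuffle traffic. Enumerating every subset per transaction is exponential in transaction length, which is why the `max_size` cap exists here; a practical distributed miner would instead generate candidates level by level.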

Keywords: Data Mining, Association Rule, Frequent Itemset, Infrequent Itemset, Hadoop, MapReduce

Short URL: https://sciup.org/15010732

IDR: 15010732

Research article