Analysis of Amazon Product Reviews Using Big Data - Apache Pig Tool

Authors: Amrit Pal Singh, Gurvinder Singh

Journal: International Journal of Information Engineering and Electronic Business (IJIEEB)

Issue: vol. 11, no. 1, 2019.


We live in an era of digital technologies in which data is growing at a very high rate. Such data is popularly termed 'Big Data' because of its volume, velocity, variety and veracity. Because it is heterogeneous, it may be structured, semi-structured or unstructured. In this work, we assess various categories of the Amazon Product Reviews datasets, which together contain around 144 million reviews. The datasets consist of product reviews collected from Amazon in 11 different categories, each described by a number of attributes. The aim of this work is to find and compare product ratings over the lifespan of the product reviews. A further goal is to help Amazon with the listing of products in its database. The work relates users' ratings to their reviews to determine how useful and good a product is [6]. User ratings are collected and analyzed per category (dataset), giving insight into which products perform well and which problems are associated with a poorly performing product.
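The rating analysis described above amounts to grouping reviews by category and product and aggregating their star ratings, roughly what a Pig `GROUP ... BY` followed by the built-in `AVG` and `COUNT` functions would compute. The paper's own Pig scripts are not reproduced here, so the following is only a minimal in-memory Python sketch of that aggregation. The `Review` record and its field names (`category`, `asin`, `overall`, assumed to mirror the public Amazon review data), as well as the `low_rated` threshold and minimum-review cut-off, are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Review(NamedTuple):
    """Hypothetical minimal review record; field names are assumptions."""
    category: str   # dataset / product category, e.g. "Electronics"
    asin: str       # product identifier
    overall: float  # star rating given by the reviewer (1-5)


def rating_summary(reviews: Iterable[Review]) -> dict[tuple[str, str], tuple[float, int]]:
    """Group reviews by (category, product) and return (mean rating, review count)."""
    sums: dict[tuple[str, str], float] = defaultdict(float)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for r in reviews:
        key = (r.category, r.asin)
        sums[key] += r.overall
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


def low_rated(summary, threshold: float = 3.0, min_reviews: int = 2) -> list[tuple[str, str]]:
    """List products whose mean rating falls below a hypothetical threshold,
    ignoring products with too few reviews to judge."""
    return sorted(
        key for key, (mean, n) in summary.items()
        if mean < threshold and n >= min_reviews
    )


if __name__ == "__main__":
    reviews = [
        Review("Electronics", "B001", 5.0),
        Review("Electronics", "B001", 4.0),
        Review("Electronics", "B002", 2.0),
        Review("Electronics", "B002", 1.0),
        Review("Books", "B003", 3.0),
    ]
    summary = rating_summary(reviews)
    print(summary[("Electronics", "B001")])  # (4.5, 2)
    print(low_rated(summary))                # [('Electronics', 'B002')]
```

At the scale mentioned above (about 144 million reviews), this aggregation would run as a distributed job on Hadoop rather than in a single Python process; the sketch only illustrates the grouping logic.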


Section I focuses on the Hadoop framework, its nodes and its architecture. Section II explains the work carried out by McAuley and how big data is used to solve industry problems. Section III explains the working model step by step and briefly describes the datasets. Section IV presents the results and outcomes of the work. The conclusion is given in Section V.


Short URL: https://sciup.org/15016159

IDR: 15016159   |   DOI: 10.5815/ijieeb.2019.01.02

References

  • Dean, J., & Ghemawat, S. (2010). MapReduce: a flexible data processing tool. Communications of the ACM, 53(1), 72-77.
  • Mehine, J. (2011). Raamistiku Apache Pig kasutamine suuremahulises andmeanalüüsis [Using the Apache Pig framework in large-scale data analysis] (Doctoral dissertation, Tartu Ülikool).
  • Jopson, B. (2011). Amazon urges California referendum on online tax. The Financial Times, 4.
  • McAuley, J., Pandey, R., & Leskovec, J. (2015, August). Inferring networks of substitutable and complementary products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.
  • McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015, August). Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 43-52). ACM.
  • McAuley, J., & Yang, A. (2016, April). Addressing complex and subjective product-related queries with customer reviews. In Proceedings of the 25th International Conference on World Wide Web (pp. 625-635). International World Wide Web Conferences Steering Committee.
  • Mohanty, S., Nath Rout, K., Barik, S., & Das, S. K. A Study on Evolution of Data in Traditional RDBMS to Big Data Analytics.
  • Singh, S., Mandal, V., & Srivastava, S. The Big Data Analytics with Hadoop.
  • Apache Hadoop. http://hadoop.apache.org
  • Shobana, R., & Saranya, D. Hadoop on Big Data Analysis. International Journal of Advanced Research Trends in Engineering and Technology.
  • Dhawan, S., & Rathee, S. (2013). Big data analytics using Hadoop components like Pig and Hive. American International Journal of Research in Science, Technology, Engineering & Mathematics, 88, 13-131.
  • Pig Latin Reference Manual 2. https://pig.apache.org/docs/r0.8.1/piglatin_ref2.html
  • Rathi, S. A Brief Study of Big Data Analytics using Apache Pig and Hadoop Distributed File System.
  • Lydia, E. L., & Swarup, M. B. Analysis of Big Data through Hadoop Ecosystem Components like Flume, MapReduce, Pig and Hive.
Research article