Optimization of the number of databases in the big data processing

Authors: Akhatov A.R., Renavikar A., Rashidov A.E.O., Nazarov F.M.

Journal: Problems of Informatics (Проблемы информатики)

Section: Applied Information Technologies

Issue: No. 1 (58), 2023.

Today, many organizations and companies increasingly need Big Data to increase their income, strengthen their competitiveness, and study customer interests. However, most approaches to real-time processing and analysis of Big Data rely on several cooperating servers, and using multiple servers is out of reach for many organizations and companies because of cost, management, and other factors. This paper presents an approach to real-time processing and analysis of Big Data on a single server using a distributed computing engine; research results indicate that the approach is efficient in terms of cost, reliability, integrity, network independence, and manageability. To further improve the efficiency of the approach, a methodology for optimizing the number of databases on a single server was developed. The methodology combines the MinMaxScaler, StandardScaler, RobustScaler, MaxAbsScaler, QuantileTransformer, and PowerTransformer scaling functions with the machine learning regression algorithms Linear Regression, Random Forest Regression, Multiple Linear Regression, Polynomial Regression, and Lasso Regression. The results were analyzed, and the effectiveness of each regression algorithm and scaling function on the experimental data was determined.
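The abstract names the scaling functions and regression algorithms but not how they were combined. Because the names match scikit-learn classes, the following is a minimal sketch, assuming scikit-learn, of how every scaler/regressor pair might be fitted and ranked by R^2 on a held-out split. Everything else is hypothetical: the synthetic data generator `make_synthetic_data`, its two features, the target formula, the hyperparameters (Lasso `alpha`, polynomial degree, forest size), and the use of R^2 as the metric. None of it is the authors' experimental setup or results. In scikit-learn, Linear Regression and Multiple Linear Regression are both `LinearRegression` (it becomes multiple regression when `X` has more than one column), so they share one entry here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    PolynomialFeatures,
    PowerTransformer,
    QuantileTransformer,
    RobustScaler,
    StandardScaler,
)

# Factories so that every combination gets fresh, unfitted objects.
SCALERS = {
    "MinMaxScaler": MinMaxScaler,
    "StandardScaler": StandardScaler,
    "RobustScaler": RobustScaler,
    "MaxAbsScaler": MaxAbsScaler,
    # n_quantiles lowered to stay below the training-set size (avoids a warning).
    "QuantileTransformer": lambda: QuantileTransformer(n_quantiles=100),
    "PowerTransformer": PowerTransformer,
}

REGRESSORS = {
    # With several input columns this is multiple linear regression.
    "LinearRegression": LinearRegression,
    "RandomForestRegressor": lambda: RandomForestRegressor(n_estimators=50, random_state=0),
    # Polynomial regression = polynomial feature expansion + ordinary least squares.
    "PolynomialRegression(deg=2)": lambda: make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "Lasso": lambda: Lasso(alpha=0.1),  # hypothetical regularization strength
}


def make_synthetic_data(n_samples=300, seed=0):
    """Hypothetical workload data; NOT the paper's experimental data."""
    rng = np.random.default_rng(seed)
    # Two made-up features, e.g. incoming records per second and mean record size (KB).
    X = rng.uniform(low=[100.0, 1.0], high=[10_000.0, 50.0], size=(n_samples, 2))
    # Made-up target: a database count that grows with total throughput, plus noise.
    y = 1.0 + 0.00004 * X[:, 0] * X[:, 1] + rng.normal(0.0, 0.5, n_samples)
    return X, y


def evaluate_combinations(X, y, seed=0):
    """Fit every scaler/regressor pair and return (scaler, regressor, R^2), best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    results = []
    for scaler_name, make_scaler in SCALERS.items():
        for regressor_name, make_regressor in REGRESSORS.items():
            # The scaler is fitted on the training split only, inside the pipeline.
            model = make_pipeline(make_scaler(), make_regressor())
            model.fit(X_train, y_train)
            score = r2_score(y_test, model.predict(X_test))
            results.append((scaler_name, regressor_name, score))
    results.sort(key=lambda row: row[2], reverse=True)
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for scaler, regressor, r2 in evaluate_combinations(X, y)[:5]:
        print(f"{scaler:20s} {regressor:28s} R^2 = {r2:.3f}")
```

Wrapping each scaler and regressor in one pipeline keeps the scaling statistics (min/max, mean/std, quantiles) computed from the training split alone, so the held-out data used for ranking does not leak into the preprocessing.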

Keywords: Big data, real-time processing, single-server distributed computing engine, architecture, machine learning, regression algorithms, scaling

Short URL: https://sciup.org/143180995

IDR: 143180995   |   DOI: 10.24412/2073-0667-2023-1-33-47

Article type: research article