Clustering Russian Federation regions according to the level of socio-economic development with the use of machine learning methods

Authors: Ketova Karolina V., Kasatkina Ekaterina V., Vavilova Diana D.

Journal: Economic and Social Changes: Facts, Trends, Forecast

Section: Regional economy

Issue: vol. 14, no. 6, 2021

Open access

The paper addresses the problem of clustering the regions of the Russian Federation by level of socio-economic development, taking into account the sectoral structure of gross regional product (GRP). Classical machine learning methods serve as the tool for solving the clustering problem. The object of the study is the differentiation of regions across various socio-economic indicators. The subject of the study is the practice of using machine learning methods to cluster objects. The initial database comprises actual statistical data on the socio-economic development of RF constituent entities and the sectoral structure of their GRP as of 2019. Clusters of regions are identified with machine learning methods implemented in Python using data-processing libraries such as Pandas, Sklearn (scikit-learn), and SciPy. The initial data were preprocessed: categorical data were encoded numerically, indicators were converted to specific values, and indicators were standardized. The initial data set for 2019 contains 5,525 records covering 65 indicators of socio-economic development for 85 regions of the Russian Federation. Principal component analysis was used to identify 15 basic indicators of a region's socio-economic development.
Based on these indicators, k-means clustering identified five regional clusters. The first cluster is characterized by a high share of wholesale and retail trade, real estate transactions, and professional, scientific and technological activities in the GRP structure. The second cluster specializes in manufacturing, wholesale and retail trade, real estate transactions, and agriculture and forestry. The third cluster has a mixed economy, with values of the main socio-economic indicators close to the averages for the Russian Federation. Regions of the fourth cluster show a high level of unemployment and a high share of public administration, military security and social security. The fifth cluster specializes in mining.
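The pipeline described in the abstract (standardization, selection of 15 indicators by principal component analysis, k-means with five clusters) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the data are synthetic random numbers with the same shape as the study's data set (85 regions by 65 indicators), and the rule for picking indicators from the PCA loadings, as well as the choice of 10 components, are assumptions, since the abstract does not state how the 15 indicators were chosen.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real statistics: 85 regions x 65 indicators.
rng = np.random.default_rng(0)
n_regions, n_indicators = 85, 65
X = rng.normal(size=(n_regions, n_indicators))

# Standardize each indicator to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Fit PCA, then score every indicator by its absolute loadings weighted by
# the explained variance ratio of each component.
# ASSUMPTION: this selection rule and n_components=10 are hypothetical.
pca = PCA(n_components=10, random_state=0).fit(X_std)
weights = pca.explained_variance_ratio_[:, None]
scores = (np.abs(pca.components_) * weights).sum(axis=0)
selected = np.sort(np.argsort(scores)[-15:])  # indices of 15 top indicators

# Cluster the regions on the selected indicators with k-means (k = 5).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_std[:, selected])

print("selected indicator indices:", selected)
print("regions per cluster:", np.bincount(labels, minlength=5))
```

With real data, X would be loaded from the regional statistics (for example with Pandas) after encoding categories and converting to specific values, and the cluster centers in kmeans.cluster_centers_ could then be inspected to describe each cluster's sectoral profile.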


Keywords: socio-economic indicators, industry structure, gross regional product, machine learning, cluster analysis, principal component analysis

Short URL: https://sciup.org/147236306

IDR: 147236306   |   DOI: 10.15838/esc.2021.6.78.4

Research article