Development of a subsystem for automating the application of dynamic load balancing algorithms in the LuNA system

Authors: Malyshkin Victor Emmanuilovich, Perepelkin Vladislav Aleksandrovich, Chmil Alexander Vladimirovich

Journal: Проблемы информатики (Problems of Informatics) @problem-info

Section: Applied Information Technologies

Issue: 4 (57), 2022.

Free access

Computational load imbalance is one of the key problems of parallel programming. It arises from both hardware and software causes. Hardware imbalance stems from the heterogeneity of computing system resources. Software imbalance is associated with factors such as the dynamics of the simulated phenomenon and an inefficiently parallelized program that contains excessive communication, a poor distribution of computations among nodes, and so on. Implementing large numerical models on supercomputers in a way that requires dynamic load balancing across computing nodes is a complex task of system parallel programming, and solving it requires specific qualifications. Ordinary supercomputer users in scientific numerical modeling usually lack such qualifications, which makes supercomputers difficult to use for the relevant tasks. The problem is partially solved by specialized software in which dynamic load balancing is already implemented. However, such software cannot always be used, especially for new numerical models. Dynamic load balancing is a relatively well-developed topic with many publications: there are algorithms, methods, software implementations, and studies of their properties. Nevertheless, ensuring efficient and balanced use of supercomputer resources remains time-consuming and requires relevant qualifications. Even a “simple” adjustment of the parameters of a load balancing algorithm can become an insurmountable problem in practice. The problem is especially relevant for modern supercomputers of the petaflops and exaflops ranges, since fully loading the computing resources of such machines is not trivial even for simple tasks. Eliminating the imbalance is a non-trivial task for which there is no single method.
There are many algorithms aimed at eliminating the imbalance, but none of them is universal. The principal solution is to automate dynamic load balancing. Automation here means that various methods, algorithms, and programs that perform dynamic load balancing are accumulated in a database or library in a form that allows them to be applied automatically. The user writes a program in such a way that the corresponding methods, algorithms, and programs are applied automatically, without the user having to understand dynamic load balancing in depth. A special case is the support for dynamic load balancing in software such as Charm++ or PICADOR. The general case is a programming system that is not specialized, in which all the knowledge about dynamic load balancing accumulated by researchers is available in a library and applied automatically. Significantly, no universal dynamic balancing algorithm exists because of the algorithmic complexity of the problem in its general formulation. Therefore, the field studies various particular and heuristic methods and algorithms used in various practical tasks. Accordingly, the automation of dynamic load balancing is based on accumulating these particular and heuristic algorithms, together with information about their appropriate use and about how to determine which of them is most suitable in a particular situation. The LuNA system of automatic design of parallel programs is being developed with this circumstance in mind. One of the tasks of the system is to ensure the accumulation and automatic application of knowledge about dynamic load balancing. The paper examines how well suited the LuNA system is, in principle, to provide this accumulation and automatic application, and reports the extent to which this approach is currently implemented.
In particular, the paper presents the results of an experimental study of the performance characteristics of LuNA programs under various dynamic load balancing algorithms.
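
The idea of accumulating balancing algorithms together with information about when each is appropriate can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `LoadSnapshot`, `Balancer`, `LIBRARY`, and `auto_balance`, the imbalance thresholds, and the three toy strategies are hypothetical and do not reflect the actual LuNA implementation or its API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LoadSnapshot:
    """Per-node load measurements (hypothetical structure, arbitrary units)."""
    node_loads: List[float]

    def imbalance(self) -> float:
        """Ratio of maximum to mean load; 1.0 means perfectly balanced."""
        mean = sum(self.node_loads) / len(self.node_loads)
        return max(self.node_loads) / mean if mean > 0 else 1.0


@dataclass
class Balancer:
    """A library entry: an algorithm plus a rule saying when it applies."""
    name: str
    applicable: Callable[[LoadSnapshot], bool]
    rebalance: Callable[[LoadSnapshot], List[float]]


def keep_as_is(s: LoadSnapshot) -> List[float]:
    """Do nothing: the load is already close to balanced."""
    return list(s.node_loads)


def diffusion_step(s: LoadSnapshot, alpha: float = 0.25) -> List[float]:
    """One step of nearest-neighbour diffusion on a ring of nodes."""
    loads, n = s.node_loads, len(s.node_loads)
    return [
        loads[i]
        + alpha * (loads[(i - 1) % n] - loads[i])
        + alpha * (loads[(i + 1) % n] - loads[i])
        for i in range(n)
    ]


def even_split(s: LoadSnapshot) -> List[float]:
    """Centralized redistribution: every node receives the mean load."""
    n = len(s.node_loads)
    return [sum(s.node_loads) / n] * n


# The "library": entries are tried in order; thresholds are arbitrary assumptions.
LIBRARY = [
    Balancer("none", lambda s: s.imbalance() < 1.1, keep_as_is),
    Balancer("diffusion", lambda s: s.imbalance() < 2.0, diffusion_step),
    Balancer("centralized", lambda s: True, even_split),
]


def auto_balance(s: LoadSnapshot) -> Tuple[str, List[float]]:
    """Pick the first applicable algorithm and apply it, with no user input."""
    for balancer in LIBRARY:
        if balancer.applicable(s):
            return balancer.name, balancer.rebalance(s)
    raise RuntimeError("no applicable balancer in the library")
```

In this sketch the user code only calls `auto_balance`; which algorithm runs is decided by the applicability rules stored alongside the algorithms, so adding a new heuristic means adding one entry to `LIBRARY` rather than changing the user program.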


LuNA system

Short URL: https://sciup.org/143179782

IDR: 143179782   |   DOI: 10.24412/2073-0667-2022-4-107-119

Research article