Processing-in-memory: current directions in the development of the technology

Author: Snytnikova T.V.

Journal: Problems of Informatics

Section: Applied information technologies

Published in issue: 3 (60), 2023.

Free access

Data movement between the central processor and main memory is a first-order obstacle to improving the performance, scalability, and energy efficiency of modern systems. Computer systems use a range of techniques to reduce the overhead of data movement, from traditional mechanisms to newer techniques such as processing-in-memory (PIM). PIM techniques fall into two broad classes: processing-near-memory (PNM), where computation is performed in dedicated processing elements, and processing-using-memory (PUM), where computation is performed inside the memory array itself by exploiting the internal analog operating properties of the memory device. The paper examines the PIM architectural paradigm and surveys PUM architectures based on parallel DRAM operations and associative processors.
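As a rough illustration of the PUM class described above, the sketch below models one commonly described in-DRAM mechanism: activating three rows at once leaves every bit position holding the majority of the three stored values, so fixing the third row to all zeros or all ones yields a bulk AND or OR of the other two. This is a toy software model assumed for illustration, not code from the article; the function names and the 8-bit row width are hypothetical, and real rows are thousands of bits wide.

```python
# Toy model of processing-using-memory (PUM): rows are Python ints,
# and "triple-row activation" is modelled as a bitwise majority.
# All names and the row width here are illustrative assumptions.

ROW_BITS = 8                      # hypothetical row width for the demo
ALL_ONES = (1 << ROW_BITS) - 1    # control row filled with ones
ALL_ZEROS = 0                     # control row filled with zeros


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows: a bit is 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def pum_and(a: int, b: int) -> int:
    """Majority with an all-zeros control row reduces to bitwise AND."""
    return majority(a, b, ALL_ZEROS)


def pum_or(a: int, b: int) -> int:
    """Majority with an all-ones control row reduces to bitwise OR."""
    return majority(a, b, ALL_ONES)


row_a = 0b11001010
row_b = 0b10100110

# The whole row is processed in one step, without moving data to the CPU.
and_row = pum_and(row_a, row_b)
or_row = pum_or(row_a, row_b)
print(f"{and_row:08b} {or_row:08b}")
```

In a PNM design the same two operations would instead run on a dedicated processing element placed next to the memory, which still reads the rows out of the array before computing on them.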


Processing-in-memory architectures, associative processors, pdram

Short URL: https://sciup.org/143181005

IDR: 143181005 | DOI: 10.24412/2073-0667-2023-3-37-54



Research article