DISTIL: A Distributed Spiking Neural Network Accelerator on 2.5D Chiplet Systems

By Pramit Kumar Pal, Harsh Sharma, Abhishek Moitra, and Partha Pratim Pande 
School of Electrical Engineering and Computer Science, Washington State University, USA

Abstract

Spiking Neural Networks (SNNs) implemented on in-memory computing (IMC) based architectures offer a promising solution for energy-efficient inference. However, the area and memory required to store the temporal neuronal state, i.e., the membrane potentials updated by leaky integrate-and-fire (LIF) activation functions, grow with the complexity of SNN models. Chiplet-based 2.5D architectures provide scalability, but deploying SNNs on such systems introduces a critical design trade-off: a single global LIF module minimizes area but increases inter-chiplet communication latency, while dedicating an LIF module to each layer reduces latency at the cost of excessive memory overhead. Existing approaches do not adequately address this trade-off or the placement of LIF modules on the interposer, leading to either large area overhead or communication bottlenecks on the Network-on-Interposer (NoI). This paper proposes DISTIL, a design and optimization framework for a high-performance, area-efficient multi-chiplet architecture for SNN inference. DISTIL performs a design-space exploration (DSE) to jointly optimize the grouping of neural layers into shared sets of LIF tiles and their physical placement on the interposer to reduce inter-chiplet traffic. Our experimental results show that DISTIL achieves up to 4.3× higher throughput per unit area (TOPS/mm²) than state-of-the-art SNN accelerators while reducing LIF memory overhead by 60-90%.
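For context, the per-neuron state that the abstract refers to can be summarized by the standard discrete-time LIF update; this is the textbook formulation rather than an equation taken from the paper, and the symbols (V_i for membrane potential, λ for the leak factor, w_ij for synaptic weights, s_j[t] for input spikes, V_th for the firing threshold) are conventional notation assumed here for illustration:

\[
V_i[t] = \lambda\, V_i[t-1] + \sum_j w_{ij}\, s_j[t], \qquad
s_i[t] =
\begin{cases}
1, & V_i[t] \ge V_{\mathrm{th}} \\
0, & \text{otherwise,}
\end{cases}
\]

with \(V_i[t]\) reset (e.g., to zero) whenever neuron \(i\) fires. Because \(V_i[t]\) must persist across time steps for every neuron, the storage for this state scales with model size, which is the memory overhead that DISTIL's grouping and placement of shared LIF tiles aims to contain.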

Index Terms — Spiking neural networks, 2.5D multi-chiplet systems, in-memory computing, design-space exploration
