System-Technology Co-Optimization for Dense Edge Architectures using 3D Integration and Non-Volatile Memory

By Leandro M. Giacomini Rocha ¹; Mohamed Naeim ¹,⁵,⁶; Guilherme Paim ³,⁴; Moritz Brunion ¹; Priya Venugopal ¹; Dragomir Milojevic ⁵; James Myers ²; Mustafa Badaroglu ⁷; Marian Verhelst ⁴; Julien Ryckaert ¹; and Dwaipayan Biswas ¹
¹ imec, Leuven, Belgium
² imec-UK, Cambridge, United Kingdom
³ INESC-ID, Lisbon, Portugal
⁴ KU Leuven, Leuven, Belgium
⁵ Université Libre de Bruxelles, Brussels, Belgium
⁶ Cadence Design Systems, San Jose, CA, USA
⁷ Qualcomm, San Diego, CA, USA

High-performance edge artificial intelligence (Edge-AI) inference applications demand high energy efficiency, high memory density, and a small form factor, which requires design space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technology. In this paper, we present a system-technology co-optimization (STCO) framework that links workload-driven system scaling challenges with physical design-enabled technology offerings. The framework is built on three engines: a physical design characterization engine, a dataflow mapping optimizer, and a system efficiency predictor. It uses a systolic array accelerator to provide the design-technology characterization points, implemented in the advanced imec A10 nanosheet CMOS node together with emerging, high-density voltage-gated spin-orbit-torque (VGSOT) MRAM and memory-on-logic fine-pitch 3D wafer-to-wafer hybrid bonding. We observe that 3D system integration of an SRAM-based design yields 9% power savings and a 53% footprint reduction at iso-frequency with respect to the 2D implementation at the same memory capacity. 3D integration with VGSOT non-volatile memory (NVM) allows a 4× memory capacity increase with a 30% footprint reduction at iso-power compared to the 2D SRAM design at 1× capacity. Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that more resources enable better workload mappings, which can compensate for the degradation in peak system energy efficiency in high-memory-capacity cases. We show that a 25% peak efficiency reduction at 32× memory capacity can lead to 7.4× faster execution with 5.7× higher effective TOPS/W than the 1× memory capacity case in the same technology.
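
The relationship between the peak and effective figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only and is not the paper's framework. It assumes that effective TOPS/W is the workload's total operations divided by the energy spent, and that the workload performs the same number of operations in the 1× and 32× cases. All function and variable names are hypothetical.

```python
def implied_mapping_gain(peak_ratio: float, effective_ratio: float) -> float:
    """Improvement the mapping must deliver, relative to peak, so that a
    design with `peak_ratio` of the baseline peak efficiency reaches
    `effective_ratio` of the baseline effective efficiency."""
    if peak_ratio <= 0:
        raise ValueError("peak_ratio must be positive")
    return effective_ratio / peak_ratio


# Figures quoted in the abstract (32x memory capacity relative to 1x).
peak_ratio = 1.0 - 0.25   # 25% peak efficiency reduction
effective_ratio = 5.7     # 5.7x higher effective TOPS/W
speedup = 7.4             # 7.4x faster execution

# Assumption: effective efficiency = peak efficiency x a mapping-dependent factor.
gain = implied_mapping_gain(peak_ratio, effective_ratio)

# Assumption: same operation count in both cases, so energy scales as 1 / (TOPS/W).
relative_energy = 1.0 / effective_ratio

# Average power = energy / time, both relative to the 1x case.
relative_avg_power = relative_energy * speedup

print(f"implied mapping gain:   {gain:.2f}x")
print(f"relative energy:        {relative_energy:.3f}")
print(f"relative average power: {relative_avg_power:.2f}x")
```

Under these assumptions, the better mapping on the larger memory has to recover roughly 7.6× relative to peak to offset the 25% peak efficiency loss and still reach 5.7× higher effective efficiency.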
