System-Technology Co-Optimization for Dense Edge Architectures using 3D Integration and Non-Volatile Memory
By Leandro M. Giacomini Rocha 1; Mohamed Naeim 1,5,6; Guilherme Paim 3,4; Moritz Brunion 1; Priya Venugopal 1; Dragomir Milojevic 5; James Myers 2; Mustafa Badaroglu 7; Marian Verhelst 4; Julien Ryckaert 1; and Dwaipayan Biswas 1
1 imec, Leuven, Belgium
2 imec-UK, Cambridge, United Kingdom
3 INESC-ID, Lisbon, Portugal
4 KU Leuven, Leuven, Belgium
5 Université Libre de Bruxelles, Brussels, Belgium
6 Cadence Design Systems, San Jose, CA, USA
7 Qualcomm, San Diego, CA, USA
High-performance edge artificial intelligence (Edge-AI) inference applications aim for high energy efficiency, high memory density, and a small form factor. Meeting these goals requires design space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technology. In this paper, we present a system-technology co-optimization (STCO) framework that interfaces with workload-driven system scaling challenges and physical design-enabled technology offerings. The framework is built on three engines: a physical design characterization engine, a dataflow mapping optimizer, and a system efficiency predictor. It builds on a systolic array accelerator to provide the design-technology characterization points, using the advanced imec A10 nanosheet CMOS node along with emerging, high-density voltage-gated spin-orbit-torque (VGSOT) MRAM, combined with memory-on-logic fine-pitch 3D wafer-to-wafer hybrid bonding. We observe that 3D system integration of an SRAM-based design leads to 9% power savings and a 53% footprint reduction at iso-frequency relative to the 2D implementation with the same memory capacity. A 3D design with VGSOT non-volatile memory (NVM) allows a 4× memory capacity increase with a 30% footprint reduction at iso-power compared to the 2D SRAM baseline (1× capacity). Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that more resources allow better workload mapping possibilities, which can compensate for the degradation in peak system energy efficiency in high-memory-capacity cases. We show that a 25% peak efficiency reduction at 32× memory capacity can lead to 7.4× faster execution with 5.7× higher effective TOPS/W than the 1× memory capacity case on the same technology.
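The headline ratios above can be cross-checked with a little arithmetic. The sketch below is not the paper's efficiency predictor. It assumes a simplified model in which effective efficiency equals peak efficiency multiplied by a mapping-dependent utilization factor, and it treats the reported footprints as whole-system footprints. The function names are hypothetical, and only the input percentages and ratios come from the abstract.

```python
def implied_utilization_gain(effective_gain: float, peak_ratio: float) -> float:
    """Utilization ratio needed so that peak_ratio * utilization_ratio == effective_gain.

    Assumption (not from the paper): effective TOPS/W = peak TOPS/W * utilization.
    """
    if peak_ratio <= 0:
        raise ValueError("peak_ratio must be positive")
    return effective_gain / peak_ratio


def capacity_per_footprint_gain(capacity_ratio: float, footprint_reduction: float) -> float:
    """Memory capacity per unit of system footprint, relative to the 2D SRAM 1x baseline."""
    if not 0 <= footprint_reduction < 1:
        raise ValueError("footprint_reduction must be in [0, 1)")
    return capacity_ratio / (1.0 - footprint_reduction)


# Figures quoted in the abstract.
peak_ratio = 1.0 - 0.25   # 25% peak efficiency reduction at 32x memory capacity
effective_gain = 5.7      # 5.7x higher effective TOPS/W than the 1x case

utilization_gain = implied_utilization_gain(effective_gain, peak_ratio)
sram_3d_density = capacity_per_footprint_gain(1.0, 0.53)   # 3D SRAM, same capacity
vgsot_3d_density = capacity_per_footprint_gain(4.0, 0.30)  # 3D VGSOT, 4x capacity

print(f"implied utilization gain: {utilization_gain:.1f}x")
print(f"3D SRAM capacity per footprint: {sram_3d_density:.2f}x")
print(f"3D VGSOT capacity per footprint: {vgsot_3d_density:.2f}x")
```

Under this simplified model, the mapping at 32× capacity would need to use the hardware roughly 7.6× more effectively than at 1× to offset the lower peak efficiency, which is in line with the reported 7.4× faster execution.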