Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators
By Jingwei Cai, Zuotong Wu, Sen Peng, Yuchen Wei, Zhanhong Tan, Guiming Shi, Mingyu Gao, Kaisheng Ma
Chiplet technology enables the integration of an increasing number of transistors on a single accelerator with higher yield in the post-Moore era, addressing the immense computational demands arising from rapid AI advancements. However, it also introduces higher packaging costs and costly Die-to-Die (D2D) interfaces, which require more area, consume more power, and offer lower bandwidth than on-chip interconnects. Maximizing the benefits and minimizing the drawbacks of chiplet technology is crucial for developing large-scale DNN chiplet accelerators, and doing so poses challenges to both architecture and mapping. Despite the importance of these challenges in the post-Moore era, methods to address them remain scarce. To bridge the gap, we first propose a layer-centric encoding method to encode Layer-Pipeline (LP) spatial mapping for large-scale DNN inference accelerators and use it to depict the optimization space of LP spatial mapping. We then analyze the unexplored optimization opportunities within this space, which play a more crucial role in chiplet scenarios. Based on the encoding method and a highly configurable and universal hardware template, we propose an architecture and mapping co-exploration framework, Gemini, to explore the design and mapping space of large-scale DNN chiplet accelerators while taking monetary cost (MC), performance, and energy efficiency into account. Compared to the state-of-the-art (SOTA) Simba architecture with SOTA Tangram LP Mapping, Gemini’s co-optimized architecture and mapping achieve, on average, 1.98× performance improvement and 1.41× energy efficiency improvement simultaneously across various DNNs and batch sizes, with only a 14.3% increase in monetary cost. Moreover, we leverage Gemini to uncover intriguing insights into the methods for utilizing chiplet technology in architecture design and mapping DNN workloads under chiplet scenarios.
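To put the reported trade-off in perspective, the short sketch below divides each average gain by the relative monetary cost (1 + 14.3%). This is only back-of-the-envelope arithmetic on the three figures quoted in the abstract, not a metric taken from the paper: the helper name `gain_per_cost` is hypothetical, and dividing averaged figures only approximates a true per-workload average.

```python
# Average figures quoted in the abstract (Gemini vs. Simba with Tangram LP mapping).
PERF_GAIN = 1.98        # performance improvement factor
ENERGY_EFF_GAIN = 1.41  # energy-efficiency improvement factor
COST_INCREASE = 0.143   # 14.3% higher monetary cost


def gain_per_cost(gain: float, cost_increase: float) -> float:
    """Hypothetical helper: normalize a gain factor by the relative cost factor."""
    return gain / (1.0 + cost_increase)


# Assumption: ratios of averages are used as a rough proxy for per-workload ratios.
perf_per_cost = gain_per_cost(PERF_GAIN, COST_INCREASE)
energy_eff_per_cost = gain_per_cost(ENERGY_EFF_GAIN, COST_INCREASE)

print(f"performance per unit cost:       {perf_per_cost:.2f}x")
print(f"energy efficiency per unit cost: {energy_eff_per_cost:.2f}x")
```

Under these assumptions, both normalized factors stay above 1, which is consistent with the abstract's claim that the gains come with "only" a modest cost increase.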