AccelStack: A Cost-Driven Analysis of 3D-Stacked LLM Accelerators
By Chen Bai 1, Xin Fan 1, Zhenhua Zhu 1,2, Wei Zhang 1, Yuan Xie 1
1 The Hong Kong University of Science and Technology
2 Tsinghua University
Abstract
Large language models (LLMs) show promise on the path toward artificial general intelligence (AGI), but they impose heavy compute and memory-bandwidth demands. While existing LLM accelerators rely on high-bandwidth memory (HBM) and 2.5D packaging to meet these demands, emerging hybrid bonding techniques open new opportunities for 3D-stacked LLM accelerators. This paper proposes AccelStack, a cost-driven analysis of this new architecture built on two innovations. First, a performance model that captures memory-on-logic stacking is presented. Second, a cost model covering die-on-die (DoD), die-on-wafer (DoW), and wafer-on-wafer (WoW) stacking schemes is proposed. Evaluations show that 3D-stacked accelerators achieve up to 7.17× and 2.09× faster inference than simulated NVIDIA A100 (FP16) and H100 (FP8) baselines across various LLM workloads, with chiplet-based designs reducing recurring engineering costs by 38.09% versus monolithic implementations.
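To make the performance-model idea concrete, below is a minimal roofline-style sketch of a memory-bound LLM decode step, contrasting an assumed 2.5D HBM bandwidth with an assumed hybrid-bonded 3D memory-on-logic bandwidth. Every figure here (the 300 TFLOPS engine, the 2 TB/s vs. 10 TB/s bandwidths, the per-layer parameter count) is an illustrative assumption, not AccelStack's calibrated model.

```python
# Toy roofline-style estimate of decode-phase latency for one transformer
# layer, comparing assumed 2.5D HBM bandwidth against assumed 3D
# hybrid-bonded memory-on-logic bandwidth. Numbers are illustrative only.

def layer_latency_s(flops: float, bytes_moved: float,
                    peak_flops: float, bandwidth_bps: float) -> float:
    """Latency is bounded by the slower of compute and memory traffic."""
    return max(flops / peak_flops, bytes_moved / bandwidth_bps)

# One decode step of a hypothetical 4096-d transformer layer at FP16,
# batch size 1: weight reads dominate traffic (~2 bytes per parameter).
params = 12 * 4096**2            # rough per-layer parameter count (assumed)
flops = 2 * params               # one multiply-add per parameter
bytes_moved = 2.0 * params       # FP16 weight reads

peak = 300e12                    # assumed 300 TFLOPS FP16 compute engine
hbm = layer_latency_s(flops, bytes_moved, peak, 2.0e12)   # ~2 TB/s HBM
hb3d = layer_latency_s(flops, bytes_moved, peak, 10.0e12) # assumed 10 TB/s 3D

print(f"2.5D HBM layer latency:   {hbm * 1e6:.1f} us")
print(f"3D-stacked layer latency: {hb3d * 1e6:.1f} us")
print(f"speedup: {hbm / hb3d:.2f}x")
```

Under these assumptions the decode step stays memory-bound in both cases, so latency scales almost directly with bandwidth; this is the mechanism by which memory-on-logic stacking could yield the kind of inference speedups the abstract reports.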
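The direction of the cost argument can likewise be sketched with a textbook negative-binomial yield model: smaller chiplet dies yield better than one large monolithic die, so a partitioned design can come out cheaper per good unit even after paying an assembly overhead. The defect density, wafer cost, and bonding overhead below are hypothetical placeholders, not the paper's DoD/DoW/WoW cost model.

```python
# Toy yield-driven recurring-cost comparison between a monolithic die and a
# chiplet partition of the same logic area. NOT AccelStack's cost model:
# defect density, wafer cost, and bonding overhead are assumed values.

def die_yield(area_mm2: float, d0_per_mm2: float = 0.001,
              alpha: float = 3.0) -> float:
    """Negative-binomial yield model: Y = (1 + A*D0/alpha)^(-alpha)."""
    return (1.0 + area_mm2 * d0_per_mm2 / alpha) ** (-alpha)

def die_cost(area_mm2: float, wafer_cost: float = 10_000.0,
             wafer_area_mm2: float = 70_685.0) -> float:
    """Yield-adjusted cost per good die (ignores edge loss and scribe lines)."""
    dies_per_wafer = wafer_area_mm2 / area_mm2
    return wafer_cost / (dies_per_wafer * die_yield(area_mm2))

# Monolithic design: one large 800 mm^2 die.
monolithic = die_cost(800.0)

# Chiplet design: four 200 mm^2 dies plus an assumed per-stack assembly cost.
bonding_cost = 20.0  # hypothetical hybrid-bonding overhead per assembled stack
chiplet = 4 * die_cost(200.0) + bonding_cost

print(f"monolithic good-die cost: ${monolithic:,.2f}")
print(f"4-chiplet stack cost:     ${chiplet:,.2f}")
print(f"saving: {100 * (1 - chiplet / monolithic):.1f}%")
```

With these placeholder numbers the chiplet partition comes out roughly 30% cheaper per good unit, illustrating (without reproducing) the 38.09% recurring-engineering-cost reduction the abstract claims for chiplet-based designs.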