AccelStack: A Cost-Driven Analysis of 3D-Stacked LLM Accelerators

By Chen Bai 1, Xin Fan 1, Zhenhua Zhu 1,2, Wei Zhang 1, Yuan Xie 1
1 The Hong Kong University of Science and Technology 
2 Tsinghua University

Abstract

Large language models (LLMs) show a path toward artificial general intelligence (AGI), but their inference imposes steep computing power and memory bandwidth demands. While existing LLM accelerators leverage high-bandwidth memory (HBM) and 2.5D packaging to address this challenge, emerging hybrid bonding techniques unlock new opportunities for 3D-stacked LLM accelerators. This paper proposes AccelStack, a cost-driven analysis of this new architecture built on two innovations. First, a performance model capturing memory-on-logic stacking is presented. Second, a cost model covering die-on-die (DoD), die-on-wafer (DoW), and wafer-on-wafer (WoW) stacking is proposed. Evaluations across various LLM workloads show that 3D-stacked accelerators achieve up to 7.17× and 2.09× faster inference than simulated NVIDIA A100 (FP16) and H100 (FP8) baselines, respectively, with chiplet-based designs reducing recurring engineering costs by 38.09% versus monolithic implementations.
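
The chiplet-versus-monolithic cost trade-off summarized above (and the DoD/DoW/WoW distinction) ultimately rests on area-dependent die yield and per-bond stacking yield. The sketch below is only an illustrative approximation using a textbook negative-binomial yield model; the wafer cost, defect density, chiplet split, and bond yield are hypothetical placeholders, not AccelStack's actual cost model or parameters.

```python
# Illustrative sketch only: area/yield-based die-cost comparison of a monolithic
# die versus a chiplet partition, plus a simple per-bond yield penalty for a
# die-on-die (DoD) stack. All numbers below are hypothetical placeholders.
import math

WAFER_DIAMETER_MM = 300.0
WAFER_COST = 10_000.0      # hypothetical cost per processed wafer (USD)
DEFECT_DENSITY = 0.001     # defects per mm^2 (hypothetical)
CLUSTERING_ALPHA = 3.0     # negative-binomial clustering parameter

def dies_per_wafer(die_area_mm2: float) -> float:
    """Gross dies per wafer with the standard edge-loss correction."""
    d = WAFER_DIAMETER_MM
    return (math.pi * (d / 2) ** 2) / die_area_mm2 - (math.pi * d) / math.sqrt(2 * die_area_mm2)

def die_yield(die_area_mm2: float) -> float:
    """Negative-binomial (clustered-defect) yield model."""
    return (1 + DEFECT_DENSITY * die_area_mm2 / CLUSTERING_ALPHA) ** (-CLUSTERING_ALPHA)

def good_die_cost(die_area_mm2: float) -> float:
    """Cost per known-good die, ignoring test and packaging overheads."""
    return WAFER_COST / (dies_per_wafer(die_area_mm2) * die_yield(die_area_mm2))

# Hypothetical split: one 800 mm^2 monolithic logic die vs. four 200 mm^2 chiplets.
monolithic = good_die_cost(800.0)
chiplets = 4 * good_die_cost(200.0)

# A DoD stack pays an extra bonding-yield penalty per bonded interface;
# 0.99 per bond is a placeholder, not a measured hybrid-bonding yield.
BOND_YIELD = 0.99
stacked_chiplets = chiplets / (BOND_YIELD ** 4)

print(f"monolithic good-die cost : {monolithic:8.2f}")
print(f"chiplet-based die cost   : {chiplets:8.2f}")
print(f"with DoD bonding penalty : {stacked_chiplets:8.2f}")
```

Because yield falls super-linearly with die area in this model, partitioning a large die into smaller chiplets lowers silicon cost even after a modest bonding-yield penalty, which is the qualitative effect behind the cost savings reported in the abstract.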

To read the full article, click here