Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets
By Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque (University of California)
To address the increasing compute demand of recent multi-model workloads that include heavy models such as large language models, we propose deploying heterogeneous chiplet-based multi-chip module (MCM) accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively considers complex heterogeneity and inter-chiplet pipelining. Our experiments with the framework on GPT-2 and ResNet-50 models on a 4-chiplet system show up to 2.2x and 1.9x increases in throughput and energy efficiency, respectively, compared to a monolithic accelerator with an optimized output-stationary dataflow.
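To make the idea of inter-layer scheduling with inter-chiplet pipelining concrete, the following is a minimal sketch and not the authors' framework. It assumes a toy cost model: each layer has a fixed latency on each chiplet, contiguous groups of layers form pipeline stages, each stage runs on one chiplet, and steady-state throughput is limited by the slowest stage. The chiplet names (`os_chiplet`, `ws_chiplet`), the latency values, and `HOP_COST` are all hypothetical and are not taken from the paper.

```python
from itertools import combinations, permutations

# Hypothetical per-layer latencies (ms) on two chiplets with different
# dataflows. These numbers are illustrative only, not measured results.
LATENCY = {
    "os_chiplet": [4.0, 2.0, 6.0, 3.0, 5.0, 2.0],
    "ws_chiplet": [3.0, 5.0, 2.0, 4.0, 3.0, 6.0],
}
HOP_COST = 0.5  # assumed cost (ms) of sending activations to the next chiplet


def stage_latency(chiplet, start, end, has_successor):
    """Latency of one pipeline stage running layers [start, end) on a chiplet."""
    compute = sum(LATENCY[chiplet][start:end])
    return compute + (HOP_COST if has_successor else 0.0)


def best_schedule(chiplets, num_layers):
    """Exhaustively search contiguous layer partitions and chiplet orderings.

    Returns (bottleneck_latency, stages), where stages is a list of
    (chiplet, first_layer, end_layer) tuples. A lower bottleneck latency
    means higher steady-state pipeline throughput.
    """
    k = len(chiplets)
    best = None
    # Choose k-1 cut points so that every stage holds at least one layer.
    for cuts in combinations(range(1, num_layers), k - 1):
        bounds = (0, *cuts, num_layers)
        # Try every assignment of chiplets to pipeline positions.
        for order in permutations(chiplets):
            stages = [(order[i], bounds[i], bounds[i + 1]) for i in range(k)]
            bottleneck = max(
                stage_latency(c, s, e, i < k - 1)
                for i, (c, s, e) in enumerate(stages)
            )
            if best is None or bottleneck < best[0]:
                best = (bottleneck, stages)
    return best


if __name__ == "__main__":
    bottleneck, stages = best_schedule(["os_chiplet", "ws_chiplet"], 6)
    print("bottleneck stage latency (ms):", bottleneck)
    for chiplet, start, end in stages:
        print(f"  layers {start}..{end - 1} -> {chiplet}")
```

Exhaustive search is only practical for a handful of layers and chiplets. It is used here to keep the sketch short, and a realistic scheduler would need a richer cost model (energy, memory capacity, interconnect topology) and a more scalable search strategy.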