Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets

By Mohanad Odema, Hyoukjun Kwon, Mohammad Abdullah Al Faruque (University of California)

To address the increasing compute demand of recent multi-model workloads that include heavy models such as large language models, we propose deploying heterogeneous chiplet-based multi-chip module (MCM) accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively considers complex heterogeneity and inter-chiplet pipelining. Our experiments using this framework on GPT-2 and ResNet-50 models on a 4-chiplet system show up to 2.2x and 1.9x increases in throughput and energy efficiency, respectively, compared to a monolithic accelerator with an optimized output-stationary dataflow.
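To illustrate what inter-layer scheduling with inter-chiplet pipelining involves, here is a minimal sketch under heavily simplified assumptions. It is not the authors' framework. A single model's layers are split into contiguous stages, stage k runs on chiplet k, and in steady state the slowest stage bounds throughput. The chiplet types ("os" and "ws", standing for output-stationary-style and weight-stationary-style dataflows), the per-layer latencies, and the function names `stage_latency` and `best_pipeline` are all hypothetical. The exhaustive search over cut points is a stand-in for a real scheduler, and the model ignores inter-chiplet communication cost and concurrent multi-model execution.

```python
from itertools import combinations

# Hypothetical per-layer latencies (ms) on two chiplet types. These numbers are
# illustrative only and do not come from the paper.
LAYER_LATENCY = {
    "os": [4.0, 3.0, 6.0, 2.0, 5.0, 1.0],  # assumed output-stationary-style chiplet
    "ws": [2.0, 5.0, 3.0, 4.0, 2.0, 3.0],  # assumed weight-stationary-style chiplet
}


def stage_latency(chiplet, layers):
    """Total latency of running the given layer indices on one chiplet."""
    return sum(LAYER_LATENCY[chiplet][i] for i in layers)


def best_pipeline(chiplets, num_layers):
    """Exhaustively split layers 0..num_layers-1 into len(chiplets) contiguous,
    non-empty stages (stage k runs on chiplets[k]) and return the split that
    minimizes the bottleneck stage latency, which bounds pipelined throughput.
    Returns None if there are more chiplets than layers."""
    best = None
    for cuts in combinations(range(1, num_layers), len(chiplets) - 1):
        bounds = (0, *cuts, num_layers)
        stages = [range(bounds[k], bounds[k + 1]) for k in range(len(chiplets))]
        # In steady state, a new inference completes once per slowest-stage time.
        bottleneck = max(stage_latency(c, s) for c, s in zip(chiplets, stages))
        if best is None or bottleneck < best[0]:
            best = (bottleneck, [list(s) for s in stages])
    return best


if __name__ == "__main__":
    bottleneck, stages = best_pipeline(["os", "ws", "os", "ws"], 6)
    print(bottleneck, stages)  # prints: 7.0 [[0, 1], [2], [3], [4, 5]]
```

With these made-up numbers, the best split gives a 7 ms bottleneck, so the four-chiplet pipeline completes one inference every 7 ms in steady state. Note that layer 2 lands on a "ws" chiplet, where it runs in 3 ms instead of 6 ms. Exploiting per-layer affinity for different chiplet types is the kind of heterogeneity a real scheduler has to weigh against communication and memory costs, which this sketch omits.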
