Taming the Tail: NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators

By Arnav Shukla 1, Harsh Sharma 2, Srikant Bharadwaj 3, Vinayak Abrol 1, Sujay Deb 1
1 Indraprastha Institute of Information Technology Delhi, New Delhi, India
2 Washington State University, Pullman, Washington, USA
3 Microsoft Research, Redmond, Washington, USA

Abstract

Heterogeneous chiplet-based systems improve scaling by disaggregating compute (CPUs/GPUs) and emerging memory technologies (HBM/DRAM). However, this on-package disaggregation introduces latency in the Network-on-Interposer (NoI). We observe that in modern large-model inference, parameters and activations routinely move back and forth between compute chiplets and HBM/DRAM, injecting large, bursty flows into the interposer. These memory-driven transfers inflate tail latency and violate Service Level Agreements (SLAs) on k-ary n-cube baseline NoI topologies. To address this gap, we introduce an Interference Score (IS) that quantifies worst-case slowdown under contention. We then formulate NoI synthesis as a multi-objective optimization (MOO) problem and develop PARL (Partition-Aware Reinforcement Learner), a topology generator that balances throughput, latency, and power. PARL-generated topologies reduce contention at the memory cut, meet SLAs, and cut worst-case slowdown to 1.2× while maintaining competitive mean throughput relative to link-rich meshes. Overall, this work reframes NoI design for heterogeneous chiplet accelerators around workload-aware objectives.
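
The full paper defines these quantities precisely; as a minimal sketch in our own notation (the symbols below are assumptions, not the paper's), the Interference Score of a candidate topology T can be read as the worst-case contention slowdown across flows, and topology synthesis as a multi-objective problem constrained by an SLA bound:

\[
\mathrm{IS}(T) \;=\; \max_{i \in \mathcal{F}} \frac{L_i^{\mathrm{contended}}(T)}{L_i^{\mathrm{isolated}}(T)},
\qquad
\min_{T} \;\bigl(\, -\mathrm{Throughput}(T),\; \mathrm{Latency}(T),\; \mathrm{Power}(T) \,\bigr)
\;\; \text{s.t.}\;\; \mathrm{IS}(T) \le \beta_{\mathrm{SLA}}
\]

Here \(\mathcal{F}\) is the set of injected flows and \(L_i\) the completion latency of flow \(i\); under this reading, the reported 1.2× worst-case slowdown corresponds to \(\mathrm{IS} \le 1.2\).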

Keywords: network-on-package, chiplets, Mixture-of-Experts, activation sparsity, sparse multicast, energy-efficiency
