LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration
By Tiantian Lin 1, Cheng Qiu 2, Xiaohang Wang 1, Ling Wang 3, Zhulin Zheng 1, Yingtao Jiang 4, Amit Kumar Singh 5, Jieming Yin 6, Sihai Qiu 7, Xiaodong Li 8, Xin Tang 8, Jie Song 8, Mingzhe Zhang 8, Kui Ren 1
1 The State Key Laboratory of Blockchain and Data Security, Zhejiang University and Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China
2 South China University of Technology, Guangzhou, China
3 The University of Western Australia, Western Australia, Australia
4 University of Nevada, Las Vegas, Las Vegas, USA
5 University of Essex, Essex, United Kingdom
6 Nanjing University of Posts and Telecommunications, Nanjing, China
7 Beijing Smart-chip Microelectronics Technology Co., Ltd, Beijing, China
8 Ant Group, Beijing, China

Abstract
The rise of multi-chiplet integration challenges existing simulators such as gem5 and GPGPU-Sim, which cannot efficiently simulate heterogeneous multi-chiplet systems because they lack a modular way to integrate heterogeneous chiplets and incur high synchronization overheads in parallel simulation. To address these limitations, this paper introduces LEGOSim, a unified parallel simulation framework that flexibly integrates various open-source and in-house chiplet simulators as processes in a parallel simulation, referred to as "simlets", with minimal modifications. It introduces an on-demand synchronization protocol with adaptive time quanta and non-global fencing, so that synchronization occurs only when necessary, reducing overhead while maintaining correctness. The framework also integrates a Network-on-Interposer (NoI) simulator for modeling inter-chiplet communication, enabling accurate assessment of the performance of various interconnection architectures. Evaluated with diverse benchmarks, LEGOSim shows high accuracy in simulating multi-chiplet architectures such as SIMBA and a CiM-based accelerator, with average errors of 3.79% and 3.94%, respectively. It reduces synchronization overhead by up to 99.9% compared with per-cycle synchronization and by 66.1% compared with time-quantum synchronization, without introducing synchronization errors. Five case studies show that LEGOSim also provides precise system performance metrics and stall-cause reporting, which simplifies performance analysis and optimization, and that it can be used for design space exploration of various multi-chiplet systems.
Keywords: Architectural simulation, multi-chiplet system simulation.
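
To make the synchronization comparison in the abstract concrete, the following toy sketch counts how many synchronization points three schemes would need over the same simulated interval. It is not LEGOSim's protocol: the function names, the fixed quantum, and the communication trace are all hypothetical, and the sketch assumes that an on-demand scheme synchronizes only at cycles where simlets actually communicate. The resulting numbers illustrate the idea and are not results from the paper.

```python
import math


def per_cycle_syncs(total_cycles: int) -> int:
    """Per-cycle synchronization: one sync point at every simulated cycle."""
    return total_cycles


def quantum_syncs(total_cycles: int, quantum: int) -> int:
    """Fixed time-quantum synchronization: one sync point per quantum."""
    return math.ceil(total_cycles / quantum)


def on_demand_syncs(comm_cycles: list[int]) -> int:
    """Assumed on-demand scheme: sync only at distinct communication cycles."""
    return len(set(comm_cycles))


def reduction(baseline: int, improved: int) -> float:
    """Fraction of sync points removed relative to a baseline scheme."""
    return 1.0 - improved / baseline


# Hypothetical workload: 10,000 cycles, inter-chiplet traffic at four cycles.
total_cycles = 10_000
quantum = 100
comm_cycles = [120, 121, 4_000, 9_500]

pc = per_cycle_syncs(total_cycles)
tq = quantum_syncs(total_cycles, quantum)
od = on_demand_syncs(comm_cycles)

print(f"per-cycle: {pc}, time-quantum: {tq}, on-demand: {od}")
print(f"reduction vs per-cycle:    {reduction(pc, od):.2%}")
print(f"reduction vs time-quantum: {reduction(tq, od):.2%}")
```

In this toy trace the benefit grows as communication becomes sparser relative to the simulated interval. A fixed quantum also has to be small enough not to miss a dependency, which is the trade-off that adaptive quanta and non-global fencing are described as addressing.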