LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration
By Tiantian Lin 1, Cheng Qiu 2, Xiaohang Wang 1, Ling Wang 3, Zhulin Zheng 1, Yingtao Jiang 4, Amit Kumar Singh 5, Jieming Yin 6, Sihai Qiu 7, Xiaodong Li 8, Xin Tang 8, Jie Song 8, Mingzhe Zhang 8, Kui Ren 1
1 The State Key Laboratory of Blockchain and Data Security, Zhejiang University, and Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China
2 South China University of Technology, Guangzhou, China
3 The University of Western Australia, Western Australia, Australia
4 University of Nevada, Las Vegas, Las Vegas, USA
5 University of Essex, Essex, United Kingdom
6 Nanjing University of Posts and Telecommunications, Nanjing, China
7 Beijing Smart-chip Microelectronics Technology Co., Ltd, Beijing, China
8 Ant Group, Beijing, China

Abstract
The rise of multi-chiplet integration exposes the limits of existing simulators such as gem5 and GPGPU-Sim, which cannot efficiently simulate heterogeneous multi-chiplet systems because they lack a modular way to integrate heterogeneous chiplets and incur high synchronization overhead in parallel simulation. To address these limitations, this paper introduces LEGOSim, a unified parallel simulation framework that flexibly integrates open-source and in-house chiplet simulators, referred to as "simlets", as processes in a parallel simulation with minimal modifications. It introduces an on-demand synchronization protocol with adaptive time quanta and non-global fencing, so that synchronization occurs only when necessary, reducing overhead while maintaining correctness. The framework also integrates a Network-on-Interposer (NoI) simulator for modeling inter-chiplet communication, enabling accurate assessment of the performance of various interconnection architectures. Evaluated with diverse benchmarks, LEGOSim simulates multi-chiplet architectures such as SIMBA and a CiM-based accelerator with high accuracy, with average errors of 3.79% and 3.94%, respectively. It reduces synchronization overhead by up to 99.9% compared to per-cycle synchronization and by 66.1% compared to time-quantum synchronization, without introducing synchronization errors. Five case studies show that LEGOSim also provides precise system performance metrics and stall-cause reporting, which simplifies performance analysis and optimization, and that it can be used for design space exploration of various multi-chiplet systems.
Keywords: Architectural simulation, multi-chiplet system simulation.
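To make the three synchronization styles named in the abstract concrete, the sketch below counts how many synchronization points each would need on a toy trace. It is an illustrative model only, not LEGOSim's protocol or API: the `Message` type, the function names, and the trace are all hypothetical, and the adaptive time quanta are not modeled. The assumption is simply that on-demand synchronization with non-global fencing synchronizes only the two simlets involved in a message, at the cycle where they communicate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Message:
    """Hypothetical inter-chiplet message in a toy trace."""
    cycle: int  # simulated cycle at which the message is sent
    src: int    # id of the sending simlet
    dst: int    # id of the receiving simlet


def per_cycle_syncs(total_cycles: int) -> int:
    """One global barrier after every simulated cycle."""
    return total_cycles


def fixed_quantum_syncs(total_cycles: int, quantum: int) -> int:
    """One global barrier at the end of every fixed-size time quantum."""
    return -(-total_cycles // quantum)  # ceiling division


def on_demand_syncs(messages: Sequence[Message]) -> int:
    """One pairwise sync per distinct (cycle, src, dst) communication event.

    Assumption: simlets that do not communicate never wait on each other.
    """
    return len({(m.cycle, m.src, m.dst) for m in messages})


if __name__ == "__main__":
    # Toy trace: 1000 simulated cycles, three inter-chiplet messages.
    trace = [Message(120, 0, 1), Message(480, 1, 2), Message(900, 2, 0)]
    print("per-cycle:    ", per_cycle_syncs(1000))
    print("fixed quantum:", fixed_quantum_syncs(1000, quantum=10))
    print("on-demand:    ", on_demand_syncs(trace))
```

The counts depend entirely on how sparse the communication in the trace is, so this toy comparison should not be read as reproducing the overhead reductions reported in the abstract.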