LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration
By Tiantian Lin 1, Cheng Qiu 2, Xiaohang Wang 1, Ling Wang 3, Zhulin Zheng 1, Yingtao Jiang 4, Amit Kumar Singh 5, Jieming Yin 6, Sihai Qiu 7, Xiaodong Li 8, Xin Tang 8, Jie Song 8, Mingzhe Zhang 8, Kui Ren 1
1 The State Key Laboratory of Blockchain and Data Security, Zhejiang University and Hangzhou High-Tech Zone (Binjiang), Institute of Blockchain and Data Security Hangzhou, China
2 South China University of Technology Guangzhou, China
3 The University of Western Australia, Western Australia, Australia
4 University of Nevada, Las Vegas, Las Vegas, USA
5 University of Essex, Essex, United Kingdom
6 Nanjing University of Posts and Telecommunications, Nanjing, China
7 Beijing Smart-chip Microelectronics Technology Co., Ltd, Beijing, China
8 Ant Group Beijing, China

Abstract
The rise of multi-chiplet integration challenges existing simulators like gem5 and GPGPU-Sim for efficiently simulating heterogeneous multiple-chiplet systems due to incapability to modularly integrate heterogeneous chiplets and high synchronization overheads in parallel simulation. To address these limitations, this paper introduces LEGOSim, a unified parallel simulation framework capable of flexibly integrating various open-source and in-house designed chiplet simulators as processes in parallel simulation, referred to as "simlets" with minimal modifications needed. It introduces an on-demand synchronization protocol with adaptive time quanta and non-global fencing, ensuring synchronization only occurs when necessary, thus reducing overhead while maintaining correctness. The framework also integrates Network-on-Interposer (NoI) simulator for modeling inter-chiplet communication, enabling accurate assessment of various interconnection architectures’ performance. Evaluated with diverse benchmarks, LEGOSim shows high accuracy in simulating multi-chiplet architectures like SIMBA and a CiM-based accelerator, with average errors of 3.79% and 3.94%, respectively. It significantly reduces synchronization overhead by up to 99.9% compared to per-cycle synchronization and by 66.1% compared to time quantum synchronization, without synchronization errors. Five case studies show that LEGOSim also provides precise system performance metrics and stall cause reporting, simplifying tasks such as performance analysis and optimization, and can be used for design space exploration of various multi-chiplet systems.
Keywords: Architectural simulation, multi-chiplet system simulation.
To read the full article, click here
Related Chiplet
- DPIQ Tx PICs
- IMDD Tx PICs
- Near-Packaged Optics (NPO) Chiplet Solution
- High Performance Droplet
- Interconnect Chiplet
Related Technical Papers
- Heterogeneous Integration Technologies for Artificial Intelligence Applications
- Co-Optimization of Power Delivery Network Design for 3-D Heterogeneous Integration of RRAM-Based Compute In-Memory Accelerators
- ATSim: A Fast and Accurate Simulation Framework for 2.5D/3D Chiplet Thermal Design Optimization
- HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference
Latest Technical Papers
- CHICO-Agent: An LLM Agent for the Cross-layer Optimization of 2.5D and 3D Chiplet-based Systems
- A PPA-Driven 3D-IC Partitioning Selection Framework with Surrogate Models
- Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs
- ChipLight: Cross-Layer Optimization of Chiplet Design with Optical Interconnects for LLM Training
- ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving