LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration

By Tiantian Lin 1, Cheng Qiu 2, Xiaohang Wang 1, Ling Wang 3, Zhulin Zheng 1, Yingtao Jiang 4, Amit Kumar Singh 5, Jieming Yin 6, Sihai Qiu 7, Xiaodong Li 8, Xin Tang 8, Jie Song 8, Mingzhe Zhang 8, Kui Ren 1
1 The State Key Laboratory of Blockchain and Data Security, Zhejiang University and Hangzhou High-Tech Zone (Binjiang), Institute of Blockchain and Data Security, Hangzhou, China
2 South China University of Technology, Guangzhou, China
3 The University of Western Australia, Western Australia, Australia
4 University of Nevada, Las Vegas, Las Vegas, USA
5 University of Essex, Essex, United Kingdom
6 Nanjing University of Posts and Telecommunications, Nanjing, China
7 Beijing Smart-chip Microelectronics Technology Co., Ltd, Beijing, China
8 Ant Group, Beijing, China

Abstract

The rise of multi-chiplet integration challenges existing simulators such as gem5 and GPGPU-Sim, which struggle to simulate heterogeneous multi-chiplet systems efficiently because they lack support for modular integration of heterogeneous chiplets and incur high synchronization overheads in parallel simulation. To address these limitations, this paper introduces LEGOSim, a unified parallel simulation framework that can flexibly integrate various open-source and in-house chiplet simulators, referred to as "simlets," as processes in a parallel simulation with minimal modifications. LEGOSim introduces an on-demand synchronization protocol with adaptive time quanta and non-global fencing, so that synchronization occurs only when necessary, reducing overhead while maintaining correctness. The framework also integrates a Network-on-Interposer (NoI) simulator to model inter-chiplet communication, enabling accurate performance assessment of various interconnection architectures. Evaluated with diverse benchmarks, LEGOSim shows high accuracy in simulating multi-chiplet architectures such as SIMBA and a CiM-based accelerator, with average errors of 3.79% and 3.94%, respectively. It significantly reduces synchronization overhead, by up to 99.9% compared to per-cycle synchronization and by 66.1% compared to time-quantum synchronization, without introducing synchronization errors. Five case studies show that LEGOSim also provides precise system performance metrics and stall-cause reporting, which simplifies performance analysis and optimization, and that it can be used for design space exploration of various multi-chiplet systems.
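The abstract contrasts per-cycle synchronization, fixed time-quantum synchronization, and on-demand synchronization with non-global fencing. The toy Python sketch below only illustrates why the last approach can need far fewer synchronization events; it is not LEGOSim's protocol. The `Message` type, the three counting functions, the "one event per participating simlet" metric, and the assumption that message cycles are known in advance (as in a recorded trace) are all hypothetical simplifications, and the sketch models neither adaptive time quanta nor the correctness guarantees described in the paper.

```python
"""Toy comparison of synchronization counts in a parallel multi-chiplet simulation.

Hypothetical model, NOT the LEGOSim protocol: inter-chiplet traffic is a
pre-recorded list of message events, and the metric is the number of
simlet-level synchronization events (one per simlet taking part in a fence).
"""
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Message:
    """One inter-chiplet message (hypothetical trace record)."""
    cycle: int  # cycle at which src sends to dst
    src: str    # sending simlet
    dst: str    # receiving simlet


def per_cycle_syncs(total_cycles: int, num_simlets: int) -> int:
    # Global barrier every cycle: all simlets participate each time.
    return total_cycles * num_simlets


def quantum_syncs(total_cycles: int, num_simlets: int, quantum: int) -> int:
    # Global barrier at the end of every fixed-length time quantum.
    return ceil(total_cycles / quantum) * num_simlets


def on_demand_syncs(messages: list[Message]) -> int:
    # Fence only when messages are exchanged, and only among the simlets
    # involved; messages between the same pair in the same cycle share a fence.
    fences = {(m.cycle, frozenset((m.src, m.dst))) for m in messages}
    return sum(len(pair) for _, pair in fences)


if __name__ == "__main__":
    trace = [
        Message(100, "cpu", "npu"),
        Message(100, "npu", "cpu"),
        Message(2500, "cpu", "dram"),
    ]
    cycles, simlets = 10_000, 3
    print(per_cycle_syncs(cycles, simlets))      # 30000
    print(quantum_syncs(cycles, simlets, 1000))  # 30
    print(on_demand_syncs(trace))                # 4
```

In this made-up trace only two simlet pairs ever communicate, so pairwise fences touch far fewer simlets than either global scheme. Real traffic patterns, the need to bound how far each simlet may run ahead, and adaptive quanta would all change these numbers.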

Keywords: Architectural simulation, multi-chiplet system simulation.
