MCMComm: Hardware-Software Co-Optimization for End-to-End Communication in Multi-Chip-Modules
By Ritik Raj, Shengjie Lin, William Won and Tushar Krishna
Georgia Institute of Technology, GA, USA
Increasing AI computing demands and slowing transistor scaling have led to the advent of Multi-Chip-Module (MCMs) based accelerators. MCMs enable cost-effective scalability, higher yield, and modular reuse by partitioning large chips into smaller chiplets. However, MCMs come at an increased communication cost, which requires critical analysis and optimization. This paper makes three main contributions: (i) an end-to-end, off-chip congestion-aware and packaging-adaptive analytical framework for detailed analysis, (ii) hardware software co-optimization incorporating diagonal links, on-chip redistribution, and non-uniform workload partitioning to optimize the framework, and (iii) using metaheuristics (genetic algorithms, GA) and mixed integer quadratic programming (MIQP) to solve the optimized framework. Experimental results demonstrate significant performance improvements for CNNs and Vision Transformers, showcasing up to 1.58x and 2.7x EdP (Energy delay Product) improvement using GA and MIQP, respectively.
To read the full article, click here
Related Chiplet
- DPIQ Tx PICs
- IMDD Tx PICs
- Near-Packaged Optics (NPO) Chiplet Solution
- High Performance Droplet
- Interconnect Chiplet
Related Technical Papers
- LEXI: Lossless Exponent Coding for Efficient Inter-Chiplet Communication in Hybrid LLMs
- Co-Optimization of Power Delivery Network Design for 3-D Heterogeneous Integration of RRAM-Based Compute In-Memory Accelerators
- Optimized Low Parasitic Capacitance ESD Clamps for High-Bandwidth 2.5D/3D Chiplet Interfaces in Advanced FinFET Technology
- Defect Analysis and Built-In-Self-Test for Chiplet Interconnects in Fan-out Wafer-Level Packaging
Latest Technical Papers
- Advanced semiconductor packaging design via artificial intelligence and machine learning: A review
- DTCO of NOR-Type IGZO FeFETs for 3D Heterogeneous AI Memories: A Read-Centric Perspective
- Modeling, Optimizing and Exploring Multi-Die FPGA Routing Architectures
- A 32 Gb/s 0.41 pJ/bit Single-Ended Transmitter with TX-Based-Only Adaptive Crosstalk Cancellation for Ultra-Short-Reach Wireline Applications
- Transient Multiscale Workflow for Thermal Analysis of 3DHI Chip Stack