Corsair: An In-memory Computing Chiplet Architecture for Inference-time Compute Acceleration
By Satyam Srivastava; Akhil Arunkumar; Nithesh Kurella; Amrit Panda; Gaurav Jain; Purushotham Kamath
d-Matrix Corporation
Abstract:
Advances in Generative AI (GenAI) have reinvigorated research into novel computing architectures. The Transformer, which has low arithmetic intensity during most of inference, has become the cornerstone of GenAI and underlies Large Language Models (LLMs) and Reasoning Models (RMs). Numerous solutions to the resulting memory bandwidth problem have been proposed. Corsair is an architecture that targets this need using a chiplet design, a matrix engine based on digital in-memory computing, efficient die-to-die interconnects, block floating-point numerics, and large high-bandwidth on-chip memories. We describe the Corsair chiplet and the approaches for scaling it into larger systems, and we outline the software stack. We formulate the inference-time requirements of LLMs and RMs in terms of computation, memory bandwidth, memory capacity, and interconnect efficiency for scaling, and we show how the Corsair design fits these workloads. We present benchmark results from Corsair silicon that correlate strongly with the design, and we preview an estimate of the workload-level improvements expected with Corsair.
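To illustrate why low arithmetic intensity makes inference memory-bandwidth bound, the following minimal sketch estimates FLOPs per byte moved for one Transformer weight matrix. It assumes 2-byte (FP16) weights and activations, assumes every weight is fetched from memory once per forward step, and ignores attention and KV-cache traffic. The function name `gemm_arithmetic_intensity` and the hidden size of 4096 are hypothetical choices for illustration, not values from the paper.

```python
def gemm_arithmetic_intensity(tokens: int, d_in: int, d_out: int,
                              bytes_per_elem: int = 2) -> float:
    """Estimate FLOPs per byte for a (tokens x d_in) @ (d_in x d_out) product.

    Hypothetical helper. Assumes each weight is read once per forward step
    and activations are read/written once; caches and KV traffic are ignored.
    """
    flops = 2 * tokens * d_in * d_out                      # multiply + add per MAC
    weight_bytes = bytes_per_elem * d_in * d_out           # full weight matrix fetched
    act_bytes = bytes_per_elem * tokens * (d_in + d_out)   # inputs read, outputs written
    return flops / (weight_bytes + act_bytes)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size
    # tokens=1 approximates single-sequence, token-by-token decode;
    # larger token counts approximate batched decode or prompt prefill.
    for t in (1, 8, 512):
        ai = gemm_arithmetic_intensity(t, d, d)
        print(f"tokens={t:4d}  intensity={ai:8.2f} FLOP/byte")
```

Under these assumptions, single-token decode performs roughly one FLOP per byte fetched, so any accelerator whose peak compute-to-bandwidth ratio is much higher than that will sit idle waiting on memory. Only when many tokens share each weight fetch does the intensity rise enough to become compute bound.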
Related Chiplet
- High Performance Droplet
- Interconnect Chiplet
- 12nm EURYTION RFK1 - UCIe SP based Ka-Ku Band Chiplet Transceiver
- Bridglets
- Automotive AI Accelerator
Related Technical Papers
- PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration
- Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators
- AIG-CIM: A Scalable Chiplet Module with Tri-Gear Heterogeneous Compute-in-Memory for Diffusion Acceleration
- Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism
Latest Technical Papers
- Thermo-mechanical co-design of 2.5D flip-chip packages with silicon and glass interposers via finite element analysis and machine learning
- High-Efficient and Fast-Response Thermal Management by Heterogeneous Integration of Diamond on Interposer-Based 2.5D Chiplets
- HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
- A physics-constrained and data-driven approach for thermal field inversion in chiplet-based packaging
- Probing the Nanoscale Onset of Plasticity in Electroplated Copper for Hybrid Bonding Structures via Multimodal Atomic Force Microscopy