CUTh-Solver: GPU-Accelerated Sparse Matrix Solver for High-Resolution Thermal Simulation of 3D ICs
By Chenghan Wang 1, Zhen Zhuang 1, Shui Jiang 1, Siyuan Liang 1, Xiaoman Yang 1, Kai Zhu 2, Darong Huang 2, Luis Costero 3, Rongmei Chen 4, Tsung-Wei Huang 5, David Atienza 2, Tsung-Yi Ho 1
1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, NT, Hong Kong SAR
2 Embedded Systems Laboratory (ESL), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
3 Department of Computer Architecture and System Engineering, Universidad Complutense de Madrid (UCM), Madrid, Spain
4 School of Electronics, Peking University, Beijing, China
5 Department of Electrical and Computer Engineering (ECE), the University of Wisconsin at Madison, Wisconsin, USA

Abstract
Coarse-grained thermal simulation tends to underestimate localized thermal issues, potentially missing critical hotspots. Accurate analysis, therefore, demands fine-grained information, which dramatically increases grid resolution and thus computational workload. Fortunately, the coefficient matrices are often sparse with regular sparsity patterns, offering optimization opportunities. However, existing general-purpose matrix solvers on GPUs rarely exploit these domain-specific properties, thereby encountering bottlenecks in data storage, memory access, parallelism, computational efficiency, and hardware utilization. We therefore propose CUTh-Solver, a co-designed GPU-accelerated Preconditioned Conjugate Gradient (PCG)-based sparse solver framework for Symmetric Positive Definite (SPD) systems arising from high-resolution steady-state and transient 3D IC thermal simulation. For data storage, CUTh-Solver condenses the Diagonal (DIA) storage format to remove redundancy. To optimize memory access, CUTh-Solver employs diagonal-wise SpMV to achieve coalesced memory access. We further observe a critical conflict between parallelism and preconditioning quality and thus adopt a high-parallelism preconditioning strategy. To improve computational efficiency and hardware utilization, we employ an adaptive fine-grained mixed-precision strategy that leverages diverse floating-point units to avoid resource contention, enhancing throughput without compromising numerical stability. Experimental results show that CUTh-Solver achieves up to 25.8x speedup over GPU-accelerated COMSOL Multiphysics 6.4 and over 3x speedup over NVIDIA's native general-purpose libraries (AmgX, cuSPARSE, cuDSS). Ablation studies validate the individual contribution of each optimization. The code is available at: https://github.com/Chenghan-Wang/CUTh-Solver
Index Terms — Thermal Simulation, GPU Acceleration, Sparse Solvers, Storage Format, SpMV, Mixed Precisions
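
To make the building blocks named in the abstract concrete, the following is a minimal CPU sketch of a PCG solve over a matrix stored by diagonals. It is not the authors' implementation: it uses the textbook DIA layout rather than the condensed variant, it assumes a Jacobi (diagonal) preconditioner as a simple stand-in for a highly parallel preconditioner, it runs in double precision only, and the function names `dia_spmv` and `pcg_jacobi` are hypothetical. The test system is an assumed 1D conduction chain (tridiagonal, SPD), chosen only because its solution is easy to check.

```python
import math

def dia_spmv(offsets, diags, x):
    """Return y = A @ x for A stored by diagonals (textbook DIA layout).

    diags[k][i] holds A[i, i + offsets[k]]. Entries whose column index
    falls outside the matrix are padding and are never read.
    """
    n = len(x)
    y = [0.0] * n
    for off, d in zip(offsets, diags):
        lo = max(0, -off)        # first row where column i + off is valid
        hi = min(n, n - off)     # one past the last valid row
        # Walk one diagonal at a time: consecutive rows read consecutive
        # entries of d and x, the access pattern a GPU can coalesce.
        for i in range(lo, hi):
            y[i] += d[i] * x[i + off]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg_jacobi(offsets, diags, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with a Jacobi preconditioner (an assumption here)."""
    n = len(b)
    main = diags[offsets.index(0)]          # main diagonal of A
    inv_diag = [1.0 / v for v in main]      # M^{-1} for Jacobi
    x = [0.0] * n
    r = list(b)                             # residual for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    if rz == 0.0:                           # b is zero: x = 0 is the solution
        return x, 0
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = dia_spmv(offsets, diags, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Assumed test system: 1D conduction chain with fixed-temperature ends,
# i.e. the tridiagonal matrix with 2 on the diagonal and -1 beside it.
n = 8
offsets = [-1, 0, 1]
diags = [[-1.0] * n, [2.0] * n, [-1.0] * n]
b = [1.0] * n
x, iters = pcg_jacobi(offsets, diags, b)
```

Each diagonal is a dense array, so no column indices need to be stored or loaded, which is the property that makes diagonal-based formats attractive for the regular stencils produced by structured thermal grids. On a GPU the inner loop over `i` would typically map to one thread per row.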