Energy-/Carbon-Aware Evaluation and Optimization of 3-D IC Architecture With Digital Compute-in-Memory Designs

By Hyung Joon Byun; Udit Gupta; Jae-Sun Seo (Cornell Tech)

Abstract

Several 2-D architectures, such as systolic arrays and compute-in-memory (CIM) arrays, have been presented for energy-efficient artificial intelligence (AI) inference. To increase energy efficiency within a constrained area, 3-D technologies have been actively investigated; they have the potential to shorten data paths or enlarge activation buffers, enabling higher energy efficiency. Several works have reported 3-D architectures using non-CIM designs, but 3-D architectures with CIM macros have not been well studied in prior works. In this article, we investigate digital CIM (DCIM) macros and various 3-D architectures to identify opportunities for higher energy efficiency compared with 2-D structures. Moreover, we investigate the carbon footprint of 3-D architectures. We built in-house simulators that calculate energy and area from high-level hardware descriptions and DNN workloads, and integrated them with a carbon estimation tool to analyze the embodied carbon of various hardware designs. We investigated different types of 3-D DCIM architectures and dataflows, which show 42.5% energy savings on average compared with 2-D systolic arrays. We also analyze the tradeoff between performance and carbon footprint, along with the associated optimization opportunities.

Introduction

For energy-efficient artificial intelligence (AI) inference, many custom dataflows and architectures have been presented. Systolic array architectures, such as the Google TPU [1] and the Samsung neural processing unit (NPU) [2], were proposed to increase data reuse via temporal/spatial data movement between tightly coupled processing elements (PEs), which include registers for activations and weights along with multiply-accumulate (MAC) units. Furthermore, compute-in-memory (CIM) architectures have been proposed to reduce off-chip data accesses, which are very energy expensive. Several memory technologies have been used to build analog CIM arrays [3] with analog accumulation, but these suffer from large area and energy overhead due to analog-to-digital converters (ADCs) and are prone to noise and variations, leading to accuracy loss [4]. Digital CIM (DCIM) architectures, on the other hand, do not require ADCs, eliminate accuracy loss, and show high energy efficiency [5], [6], [7].

However, as CMOS technology scaling approaches its limits, increasing the resources on a 2-D chip faces its own constraints. Stacking chips can instead increase the effective amount of resources or reduce the footprint. This is enabled by 3-D technologies with high-density (<10-μm pitch) hybrid bonding, such as TSMC SoIC [8], which recently became commercially available. By stacking dies vertically, critical paths can be shortened, reducing wire energy, while the vertical connections provide high bandwidth, which is useful for implementing AI algorithms with low data reuse.

Note that both CIM designs and 3-D architectures have been investigated separately for the similar high-level purpose of reducing on-chip memory access cost. However, few studies have investigated CIM and 3-D integrated circuit (IC) designs together to achieve further energy benefits for AI applications.

On the other hand, as the replacement rate of mobile-scale chips has become very high, chip manufacturing contributes significantly to global carbon emissions [9]. For example, IC manufacturing accounted for about 33% of Apple's total carbon emissions in 2019 [10]. Depending on the die size and the process, a 3-D architecture can achieve better yield than an iso-resource 2-D architecture because of its smaller die area, so the carbon footprint of 3-D architectures presents a worthwhile optimization opportunity. However, the embodied carbon of various 3-D architectures has not been thoroughly explored, as most research has focused primarily on the performance or energy efficiency of 3-D ICs [11], [12], [13]. To that end, we investigate 2-D and 3-D IC designs based on both systolic arrays and CIM macros for various AI workloads, assessing their embodied carbon footprint alongside their energy efficiency and performance.
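The yield argument above can be made concrete with a standard negative-binomial die-yield model: smaller dies accumulate fewer killer defects, so splitting an iso-resource design across stacked tiers can reduce the silicon (and hence embodied carbon) spent per functional chip. The sketch below is illustrative only; the defect density, clustering parameter, bonding yield, and die area are hypothetical placeholders, not values from the article.

```python
def die_yield(area_cm2, defect_density=0.09, alpha=10.0):
    """Negative-binomial yield model; defect_density (defects/cm^2) and the
    clustering parameter alpha are illustrative, not from the article."""
    return (1.0 + area_cm2 * defect_density / alpha) ** (-alpha)

def silicon_per_good_chip(total_area_cm2, n_tiers=1, bond_yield=0.99):
    """Wafer area consumed per functional chip, a rough proxy for embodied
    carbon. Assumes known-good-die testing before bonding, so each tier's
    yield applies to its own (smaller) die; a failed bond loses the stack."""
    tier_area = total_area_cm2 / n_tiers
    y = die_yield(tier_area) * (bond_yield if n_tiers > 1 else 1.0)
    return total_area_cm2 / y

cost_2d = silicon_per_good_chip(4.0, n_tiers=1)  # one large die (hypothetical 4 cm^2)
cost_3d = silicon_per_good_chip(4.0, n_tiers=2)  # same resources on two stacked dies
print(f"2-D: {cost_2d:.2f} cm^2/chip, 3-D: {cost_3d:.2f} cm^2/chip")
```

With these placeholder numbers the stacked design wastes less silicon per good chip, because the per-tier yield gain outweighs the bonding-yield penalty; with a very low bonding yield or a tiny die, the comparison can flip, which is why the tradeoff merits design-space exploration.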

In this work, we comprehensively investigate four schemes for AI acceleration: 1) a 2-D systolic-array-based accelerator (baseline); 2) a 2-D DCIM-based accelerator; 3) a 3-D systolic-array-based accelerator; and 4) a 3-D DCIM-based accelerator. For an objective analysis, we developed an in-house performance evaluation tool for both 2-D and 3-D IC designs, which models the energy and area of the systolic array, on-chip SRAM buffer, CIM macros, 2-D and 3-D interconnects, and off-chip DRAM accesses, as well as hardware utilization. We also improved an existing embodied carbon estimation scheme for 3-D architectures to account for hybrid bonding density; it computes the carbon footprint using the area estimates produced by the performance tool.
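The kind of analytical model such a tool embodies can be sketched as a sum of event counts times per-event energies. All per-access energies and event counts below are hypothetical placeholders (real values would come from circuit-level characterization), but the structure shows where 3-D stacking helps: shorter average wire length, and fewer DRAM accesses when larger buffers fit on a stacked tier.

```python
# Hypothetical per-event energies in picojoules; illustrative only, not the
# article's calibrated values.
ENERGY_PJ = {
    "mac": 0.2,           # one multiply-accumulate
    "sram_access": 5.0,   # one on-chip buffer access
    "dram_access": 640.0, # one off-chip DRAM access
    "wire_per_mm": 0.1,   # moving one bit over 1 mm of on-chip wire
}

def layer_energy_pj(macs, sram_accesses, dram_accesses, bits_moved, avg_wire_mm):
    """Energy of one DNN layer as a weighted sum of event counts.
    3-D stacking enters through a shorter avg_wire_mm and, with larger
    activation buffers, fewer dram_accesses."""
    return (macs * ENERGY_PJ["mac"]
            + sram_accesses * ENERGY_PJ["sram_access"]
            + dram_accesses * ENERGY_PJ["dram_access"]
            + bits_moved * avg_wire_mm * ENERGY_PJ["wire_per_mm"])

# Same layer, 2-D vs. a 3-D design with halved wire length and a quarter of
# the DRAM traffic (illustrative event counts).
e_2d = layer_energy_pj(1e6, 2e5, 1e4, 1e6, 2.0)
e_3d = layer_energy_pj(1e6, 2e5, 2.5e3, 1e6, 1.0)
print(f"2-D: {e_2d / 1e6:.2f} uJ, 3-D: {e_3d / 1e6:.2f} uJ")
```

Because DRAM accesses dominate the per-event costs by two to three orders of magnitude, reducing off-chip traffic, whether via CIM or via larger 3-D-stacked buffers, drives most of the savings in this kind of model.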

The main contributions of this work are as follows.

  1. We comprehensively analyze 2-D and 3-D AI accelerators with the systolic array scheme and the DCIM scheme across various compute/memory sizes.

  2. We developed an in-house energy modeling framework and integrated it with an improved carbon estimation tool; the framework can easily evaluate the energy consumption of 2-D/3-D accelerators under various schemes and enables early design space exploration.

  3. Evaluation of eight different AI models shows that, compared with the 2-D baseline, the 3-D systolic array scheme, the 2-D DCIM scheme, and the 3-D DCIM scheme achieve on average 9.1%, 32.0%, and 42.5% energy improvement (including DRAM energy), respectively.

  4. Compared with the 2-D baseline, the embodied carbon of the systolic array and DCIM designs with 3-D Scheme 1 shows 4.3% and 14.2% savings, while with 3-D Scheme 2 they show 27.9% and 1.8% overhead, respectively.
