ControlPULPlet: A Flexible Real-time Multi-core RISC-V Controller for 2.5D Systems-in-package

By Alessandro Ottaviano (ETH Zurich, Switzerland), Robert Balas (ETH Zurich, Switzerland), Tim Fischer (ETH Zurich, Switzerland), Thomas Benz (ETH Zurich, Switzerland), Andrea Bartolini (University of Bologna, Italy), Luca Benini (ETH Zurich, Switzerland)

The increasing complexity of real-time control algorithms and the trend toward 2.5D technology necessitate the development of scalable controllers for managing the complex, integrated operation of chiplets within 2.5D systems-in-package. These controllers must provide real-time computing capabilities and have chiplet-compatible IO interfaces for communication with the controlled components. This work introduces ControlPULPlet, a chiplet-compatible, real-time multi-core RISC-V controller, which is available as an open-source release. It includes a 32-bit CV32RT core for efficient interrupt handling and a specialized direct memory access (DMA) engine to automate periodic sensor readouts. A tightly coupled programmable multi-core accelerator is integrated via a dedicated AXI4 port. A flexible AXI4-compatible die-to-die (D2D) link supports inter-chiplet communication in 2.5D systems and enables high-bandwidth transfers in traditional 2D monolithic setups. We designed and fabricated ControlPULPlet as a silicon prototype called Kairos using TSMC's 65 nm CMOS technology. Kairos executes predictive control algorithms at up to 290 MHz while consuming just 30 mW of power. The D2D link requires only 16.5 kGE in physical area per channel, adding just 2.9% to the total system area. It supports off-die access with an energy efficiency of 1.3 pJ/b and achieves a peak duplex transfer rate of 51 Gb/s at 200 MHz.

I Introduction

An increasing number of integrated systems rely on closed-loop control to meet their mission profile, often specified as a setpoint of an objective function. The system controller is expected to track the setpoint, minimizing tracking errors while maintaining stability. Such control pipelines are widely used across diverse application domains, from automotive systems and robotics [1] to power conversion [2] and CPU power management [3]. For example, in robotics, the reference objective function may be a trajectory, while in CPU power management, it could be a power budget based on runtime workload and temperature.
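As an illustration of the setpoint tracking described above, the following is a minimal Python sketch that is not taken from the paper: a discrete proportional-integral (PI) controller tracking a constant setpoint on a first-order plant. The plant model, the gains kp and ki, the time constant tau, and the function name simulate_pi_tracking are all assumptions chosen for illustration. Real controllers of the kind discussed here run MIMO and predictive policies that are considerably richer.

```python
def simulate_pi_tracking(setpoint, steps=1000, dt=0.01, kp=2.0, ki=5.0, tau=0.1):
    """Track `setpoint` with a discrete PI controller on a first-order plant.

    All parameters are illustrative assumptions, not values from the paper.
    Returns the plant output at every control period.
    """
    y = 0.0          # plant output (e.g., a measured power or position)
    integral = 0.0   # accumulated tracking error
    history = []
    for _ in range(steps):
        error = setpoint - y                # tracking error for this period
        integral += error * dt              # integral term removes steady-state error
        u = kp * error + ki * integral      # actuator command
        y += dt * (u - y) / tau             # plant: tau * dy/dt = u - y (forward Euler)
        history.append(y)
    return history


if __name__ == "__main__":
    trace = simulate_pi_tracking(setpoint=1.0)
    print(f"final output: {trace[-1]:.6f}, final error: {1.0 - trace[-1]:.2e}")
```

Each loop iteration corresponds to one control period: read the output, compute the error against the setpoint, and update the actuator command.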

The system actor that tracks a given reference and controls physical actuators is generally called a controller agent, or simply controller. To maximize flexibility, a controller is typically implemented as a Multiple-Input Multiple-Output (MIMO) digital programmable unit (Section II), with power consumption ranging from tens to hundreds of mW and an on-chip memory footprint of under a few MiB [4, 3].

A key trend for digital controllers, common across various applications, is tighter integration with their controlled systems, leading to a system in package (SiP), as shown in Fig. 1b-f. Today, this process is driven by the "disintegration" of integrated circuits into multiple chiplets or dielets, a key post-Moore technology that reduces design and production costs, improves yield through smaller chip sizes, and facilitates the integration and reuse of heterogeneous IPs [5].

This technology shift is challenging and requires advanced silicon/package co-design. For example, in power conversion circuits, discrete devices, such as GaN-based high-voltage switches, are evolving to dielets for 2.5D heterogeneous SiP integration with their drivers [6, 2]. Similarly, high-performance digital systems are shifting from single-die systems on chip (SoCs) to 2.5D multi-chip modules (MCMs) in a SiP [7, 8]. Inter-chiplet communication happens through interposers hosting the die-to-die (D2D) connectivity fabric [9].

Given these trends, modern digital controller architectures must evolve accordingly. First, they must support flexible package integration options, ranging from traditional single-die SoC integration to chiplets on silicon interposers (Fig. 1). Second, they must integrate hardware features that enable real-time execution. Finally, they need to offer modular integration of domain-specific accelerators (DSAs) for compute-intensive control workloads, such as model predictive control (MPC) policies. Section II outlines the limitations of existing academic and industrial controllers from these three perspectives.

To address these challenges, we present ControlPULPlet, a flexible, open-source multi-core RISC-V controller design that can be configured for integration into conventional on-chip dies or as a chiplet for SiPs (Fig. 1), thanks to a flexible D2D link exposed to the on-chip interconnect as a standard AXI4 port. ControlPULPlet offers real-time execution capabilities, featuring a 32-bit CV32RT RISC-V core for fast interrupt handling and a specialized direct memory access (DMA) engine for automated sensor data acquisition in periodic control loops. For compute-intensive control algorithms, it includes a tightly coupled, AXI4-compatible 8-core programmable multi-core accelerator (PMCA).

The synthesizable hardware description and FPGA implementation for hardware-in-the-loop emulation are freely available and open-source.

This paper makes the following contributions:

  • An open-source, embedded RISC-V controller with comprehensive real-time and computing capabilities compatible with the monolithic and chiplet design paradigms. The design enhances the open-source ControlPULP [3], tuned for on-chip control applications (Section III).
  • A scalable, AXI4-compatible source-synchronous digital D2D link. With eight channels and a flow control buffer depth of 128, the link achieves an average peak bus utilization of 83 %, compared to 95 % for its on-chip equivalent. This setup incurs a minimal PHY area overhead of 2.9 % and results in a negligible performance impact, with only 0.06 % increase in the free time window of a MIMO periodic control loop application (Section IV-B). Increasing the buffer depth enhances utilization and matches on-chip control performance, albeit at additional area cost.
  • Integration of hardware enhancements for real-time execution: fast interrupt handling and context switching [10], and DMA with periodic mid-end [11] (Section III-B).
  • A standalone, single-core demonstrator chip called Kairos, fabricated in TSMC’s 65 nm node. At 1.2 V, Kairos achieves a peak clock frequency of 290 MHz with a power envelope not exceeding 30 mW during data-intensive control workloads (Section IV). On the chip, the D2D link enables off-die accesses at only 1.3 pJ/bit while attaining a duplex peak transfer rate of 51.2 Gbit/s at the nominal 200 MHz (Section IV).
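
The D2D link figures quoted in the last contribution can be related with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a model of the Kairos implementation. It assumes the duplex rate splits evenly between the two directions, ignores protocol and flow-control overhead, and uses a hypothetical 4 KiB payload. The function and constant names are illustrative.

```python
# Figures quoted in the text for the Kairos D2D link.
PEAK_DUPLEX_RATE_BPS = 51.2e9   # 51.2 Gbit/s, both directions combined
LINK_CLOCK_HZ = 200e6           # nominal 200 MHz
ENERGY_PER_BIT_J = 1.3e-12      # 1.3 pJ/bit for off-die accesses


def bits_per_cycle(rate_bps=PEAK_DUPLEX_RATE_BPS, clock_hz=LINK_CLOCK_HZ):
    """Bits moved per link clock cycle at the peak duplex rate."""
    return rate_bps / clock_hz


def transfer_energy_j(payload_bytes, energy_per_bit_j=ENERGY_PER_BIT_J):
    """Energy to move a payload off-die, ignoring protocol overhead (assumption)."""
    return payload_bytes * 8 * energy_per_bit_j


def transfer_time_s(payload_bytes, rate_bps=PEAK_DUPLEX_RATE_BPS / 2):
    """Time to move a payload in one direction.

    Defaults to half the duplex rate, i.e. an assumed symmetric split.
    """
    return payload_bytes * 8 / rate_bps


if __name__ == "__main__":
    payload = 4096  # hypothetical 4 KiB transfer
    print(f"bits per cycle (duplex): {bits_per_cycle():.0f}")
    print(f"energy for {payload} B: {transfer_energy_j(payload) * 1e9:.1f} nJ")
    print(f"one-way time for {payload} B: {transfer_time_s(payload) * 1e6:.2f} us")
```

Under these assumptions, the quoted peak rate corresponds to 256 bits per link clock cycle in duplex (128 per direction), and a 4 KiB off-die transfer costs roughly 42.6 nJ.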
