Modular Compilation for Quantum Chiplet Architectures
By Mingyoung Jessica Jeng, Nikola Vuk Maruszewski, Connor Selna, Michael Gavrincea, Kaitlin N. Smith, Nikos Hardavellas -- Northwestern University, USA
Abstract
As quantum computing technology continues to mature, industry is adopting modular quantum architectures to keep quantum scaling on the projected path and meet performance targets. However, the complexity of chiplet-based quantum devices, coupled with their growing size, presents an imminent scalability challenge for quantum compilation. Contemporary compilation methods are not well-suited to chiplet architectures. In particular, existing qubit allocation methods are often unable to contend with inter-chiplet links, which do not necessarily support a universal basis gate set. Furthermore, existing methods of logical-to-physical qubit placement, swap insertion (routing), unitary synthesis, and/or optimization are typically not designed for qubit links whose duration and fidelity vary widely. In this work, we propose SEQC, a complete and parallelized compilation pipeline optimized for chiplet-based quantum computers, including several novel methods for qubit placement, qubit routing, and circuit optimization. SEQC attains up to a 36% increase in circuit fidelity, accompanied by execution time improvements of up to 1.92x. Additionally, owing to its ability to parallelize compilation, SEQC achieves consistent solve time improvements of 2-4x over a chiplet-aware Qiskit baseline.
1. Introduction
Classical computer systems, through the decades of their existence, have become increasingly distributed. Physical, technological, and economic constraints have prevented single “monolithic” systems from scaling to the point that they could meet the demand for high performance and yet remain practical. Today, distributed architectures are prevalent, and their latest renditions, from cloud computing to chiplet-based digital processors, are ubiquitous.
We postulate that quantum computing is on a similar path. While improvements in superconducting quantum hardware have led to the debut of processors with 1000+ qubits, practical implementations of quantum computation will require millions of physical qubits. The number of qubits required by a quantum algorithm, the depth of the algorithm (i.e., execution time, or number of gates on the critical path), the auxiliary and syndrome qubits required to reach sufficiently low error rates, and the size of matter qubits when all control hardware is included, all suggest that many more qubits are required than are likely to fit in a single die. The overhead for error correction, cooling, dilution, control systems, I/O lines, challenges associated with verification and adequate chip thermalization, and realistic resources such as lossy waveguides, limited qubit fabrication yields as qubit capacity grows, and finite chip sizes, make the prospect of a single “monolithic” quantum processor very expensive or entirely unrealistic by current standards. These constraints force large-scale quantum systems to adopt a physically distributed architecture.
We are already observing signs of this shift. Recent developments in quantum chip linking, including flip-chip architectures and low-loss coaxial cables, suggest that modular designs are the most viable for scaling quantum computers. Many contemporary or upcoming leading quantum systems adopt chiplet-based modular quantum processors, for example using high-bandwidth quantum links between nearest-neighbor quantum chiplets (e.g., carrier-chip couplers in Rigetti Aspen-M, or m-couplers in IBM Crossbill), or lower-fidelity, lower-bandwidth, but longer-distance flexible coupling of discrete chips (e.g., l-couplers in IBM Flamingo), or a combination of the above (e.g., c-, m-, and l-couplers in IBM Starling), or multi-chip connectivity through tunable couplers and routing chips. Most major companies have set their sights on modular designs to meet quantum scaling targets in practical ways, and modular quantum processors dominate their latest roadmaps.
Unfortunately, current compilation infrastructure is not capable of reasoning well about this quantum interconnect heterogeneity. The mapping of a quantum program’s logical quantum gates into the native gates supported by the underlying quantum processor, and the mapping of logical qubits to the physical qubits of the quantum hardware, largely determine the number and type of native quantum gates required for the computation, the circuit depth, and its execution time; long programs may not complete successfully as qubits decohere. Different mappings can result in vastly different compiled quantum circuits, with diverse characteristics along all these axes. These details of efficiency can make or break an algorithm in today’s noisy intermediate-scale quantum (NISQ) era, so quantum compilers aggressively optimize for all of them. To make matters more challenging, each qubit, coupler, and gate has its own error profile, and these profiles are highly variable both spatially and temporally. Unlike classical compilation, which is done only once for a given architecture, quantum programs must be recompiled before every execution, as the ideal physical qubits to execute on change between runs. So, quantum compilers today are faced with the daunting task of optimizing quantum programs across multiple dimensions with often conflicting demands, in a continuously changing environment. Coupled with the hardness of synthesizing unitaries (which is exponential in the number of qubits) and the need for quick and frequent compilation, modern quantum compilers have no option other than to rely on heuristics to perform the task. These heuristics result in compilation complexity of O(n²) for n total qubits in a quantum processor.
Hardware modularity adds significant complexity to this already hard task, as inter-chiplet links are typically inferior compared to intra-chiplet ones, connectivity across chiplets is often limited, and not all basis gates are necessarily supported across chiplets. In fact, popular quantum software stacks today (e.g., Qiskit) are not even cognizant of the existence of hardware modularity. Hardware modularity, though, also presents an opportunity. Inspired by classical compilation, in this paper we leverage hardware modularity to achieve compilation modularity.
Compilation in classical systems is typically performed independently and in parallel for each source file, producing one object file per source. The individual object files are then linked together to construct the executable. We propose a compilation framework for modular quantum processors that works in a similar fashion: it stratifies, i.e., splits, the source quantum circuit into subcircuits small enough to fit in each chiplet, maps the subcircuits to chiplets, and then, in parallel, elaborates each subcircuit and compiles it for its target chiplet. This Stratify-Elaborate Quantum Compiler (SEQC) stratifies a source program only once for a given chiplet architecture, and performs only the elaboration step recurrently before each execution. In essence, SEQC replaces the recurrent O(n²) compilation step for an n-qubit quantum processor with several parallel O(k²) elaboration steps for k-qubit chiplets. As the qubit capacity n of quantum processors grows exponentially, today’s O(n²) compilation latency rises even faster. We expect, however, that the number of qubits k per chiplet will remain relatively stable or grow much more slowly, as seems to be the case for the foreseeable future, and thus the SEQC recurrent compilation latency is expected to remain relatively stable. The stratification step is O(n²), so the end-to-end complexity remains the same, but stratification is performed only once; the recurrent compilation in SEQC is only O(k²), and barely grows with new processor designs.
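The two-stage flow described above can be illustrated with a minimal Python sketch. It is not SEQC's implementation: the function names (`stratify`, `split`, `elaborate`, `compile_modular`), the gate representation, and the contiguous-block assignment policy are all assumptions made for illustration, and the elaboration step is a stub. The sketch also ignores gate ordering between chiplets, which a real compiler would have to preserve.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical gate representation: (gate name, logical qubits it acts on).
Gate = Tuple[str, Tuple[int, ...]]


def stratify(num_qubits: int, chiplet_size: int) -> Dict[int, int]:
    """One-time step: assign each logical qubit to a chiplet.

    Placeholder policy (an assumption): contiguous blocks of qubits.
    """
    return {q: q // chiplet_size for q in range(num_qubits)}


def split(circuit: List[Gate], assignment: Dict[int, int]):
    """Group gates into per-chiplet subcircuits; set cross-chiplet gates aside."""
    local: Dict[int, List[Gate]] = {}
    cross: List[Gate] = []
    for gate in circuit:
        chiplets = {assignment[q] for q in gate[1]}
        if len(chiplets) == 1:
            local.setdefault(chiplets.pop(), []).append(gate)
        else:
            cross.append(gate)
    return local, cross


def elaborate(chiplet_id: int, subcircuit: List[Gate]):
    """Recurrent step: compile one subcircuit for its chiplet (stubbed here).

    A real elaboration would perform placement, routing, and optimization
    against current calibration data; this stub returns the gates unchanged.
    """
    return chiplet_id, list(subcircuit)


def compile_modular(circuit: List[Gate], assignment: Dict[int, int]):
    """Elaborate all per-chiplet subcircuits concurrently."""
    local, cross = split(circuit, assignment)
    # Threads keep the sketch simple; CPU-bound work would favor processes.
    with ThreadPoolExecutor() as pool:
        compiled = dict(pool.map(lambda item: elaborate(*item), local.items()))
    return compiled, cross


if __name__ == "__main__":
    circuit = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2)), ("cx", (2, 3))]
    assignment = stratify(num_qubits=4, chiplet_size=2)  # computed once
    compiled, cross = compile_modular(circuit, assignment)  # repeated per run
    print(compiled, cross)
```

In this sketch only `compile_modular` would be repeated before each execution, while the assignment returned by `stratify` is reused. As a purely illustrative calculation with assumed sizes (not figures from the paper), n = 1024 and k = 128 give n²/k² = 64, so each recurrent elaboration would work on a problem 64 times smaller in the quadratic term; asymptotic notation hides constants, so this is not a predicted speedup.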
Additionally, as SEQC is cognizant of hardware modularity, it can stratify the source quantum circuit into subcircuits and map them to chiplets to minimize inter-chiplet communication. As we show in this paper, SEQC produces circuits with shorter execution times and significantly fewer inter-chiplet gates compared to today’s stock compilers, leading to much higher fidelity execution. More importantly, as the number of qubits in a processor grows, SEQC achieves even higher performance in these figures of merit.
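To make the inter-chiplet communication metric concrete, the following minimal sketch counts how many two-qubit gates cross a chiplet boundary under a given qubit-to-chiplet assignment. The function name `count_inter_chiplet_gates`, the toy gate list, and both assignments are hypothetical; they are not taken from SEQC or from the paper's benchmarks.

```python
from typing import Dict, Iterable, Tuple


def count_inter_chiplet_gates(
    two_qubit_gates: Iterable[Tuple[int, int]],
    assignment: Dict[int, int],
) -> int:
    """Count two-qubit gates whose operands sit on different chiplets."""
    return sum(1 for a, b in two_qubit_gates if assignment[a] != assignment[b])


# Toy circuit: qubit pairs (0, 2) and (1, 3) interact often, (0, 1) once.
gates = [(0, 2), (0, 2), (1, 3), (1, 3), (0, 1)]

# Assignment that ignores the interaction pattern (contiguous blocks).
naive = {0: 0, 1: 0, 2: 1, 3: 1}
# Assignment that co-locates frequently interacting qubits.
aware = {0: 0, 2: 0, 1: 1, 3: 1}

print(count_inter_chiplet_gates(gates, naive))  # 4
print(count_inter_chiplet_gates(gates, aware))  # 1
```

In this toy case, grouping qubits that interact often onto the same chiplet reduces the cross-chiplet gate count from 4 to 1. How SEQC actually chooses the partition is described in the full paper and is not reproduced here.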
In summary, the contributions of this paper are as follows:
- We make stock compilers aware of hardware modularity, thereby allowing them to correctly compile circuits for modular architectures with limited cross-chiplet gate support.
- We design and implement SEQC, a Stratify-Elaborate Quantum Compiler for modular architectures. SEQC performs compilation in two stages, with the first stage (stratification, or chiplet splitting) performed only once for a given architecture, and the second stage (elaboration, or chiplet compilation) performed in parallel for each chiplet. Only this second stage needs to be performed before each execution, and thus SEQC’s compilation time is largely unaffected by the growth of qubit counts in future quantum processors.
- We design and implement in SEQC several novel methods for qubit placement, qubit routing, and circuit optimization.
- We evaluate SEQC and show it compiles circuits with up to 36% higher circuit fidelity and up to 1.92× lower execution time, while consistently achieving 2−4× faster compilation time compared to a chiplet-aware Qiskit baseline.