UCIe for 1.6T Interconnects in Next-Gen I/O Chiplets for AI Data Centers

The rise of generative AI is pushing the limits of computing power and high-speed communication, demanding unprecedented workloads and resources. No single design can be optimized for every class of model: whether the priority is compute, memory bandwidth, memory capacity, network bandwidth, latency sensitivity, or scale, all are affected by the choke point of interconnectivity in the data center.

Processing hardware is garnering attention because it enables faster processing of data, but arguably just as important is the networking infrastructure and interconnectivity that moves data between processors, memory and storage. Without it, even the most advanced models can be slowed by data bottlenecks. Data from Meta suggests that more than a third of the time data spends in a data center is spent traveling from point to point. When data cannot reach processors fast enough, connectivity chokes the network and slows training tasks.

AI data centers

Infrastructure architecture for AI data centers requires a new design paradigm compared with traditional data centers. Machine-learning-accelerated clusters residing in the network’s back end handle AI’s large training workloads and require high-bandwidth traffic to move across the back-end network. Unlike the front-end network, where packet-by-packet handling is needed, this traffic typically moves in regular patterns and operates with high levels of activity.

Latency is reduced by keeping the network hierarchy flat and limiting the number of hops, giving each node fast access to other resources. This keeps compute from sitting underutilized, since the performance of an AI network can be bottlenecked by even a single link with frequent packet loss. Non-blocking switch designs and network robustness are therefore critical design considerations. These back-end ML networks allow AI processors to access each other’s memory seamlessly, because the dedicated network isolates the traffic from the vagaries of front-end demands, which rise and fall with the priorities of incoming compute requests.
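To illustrate why a flat hierarchy matters, the short Python sketch below estimates one-way path latency as a function of switch hop count, comparing a two-tier leaf-spine fabric with a deeper three-tier topology. The per-hop and per-link latency figures are illustrative assumptions, not measurements from any specific hardware.

```python
# Hypothetical sketch: back-of-the-envelope latency comparison between a
# flatter two-tier (leaf-spine) back-end fabric and a deeper three-tier one.
# All latency constants below are assumptions for illustration only.

PER_HOP_SWITCH_LATENCY_NS = 600   # assumed per-switch cut-through latency
LINK_PROPAGATION_NS = 100         # assumed per-link cable/fiber propagation delay


def worst_case_latency_ns(switch_hops: int) -> float:
    """Worst-case one-way latency for a path crossing `switch_hops` switches."""
    links = switch_hops + 1  # a path through N switches traverses N+1 links
    return switch_hops * PER_HOP_SWITCH_LATENCY_NS + links * LINK_PROPAGATION_NS


if __name__ == "__main__":
    for name, hops in [("two-tier leaf-spine (flat)", 3), ("three-tier", 5)]:
        print(f"{name}: {hops} switch hops -> "
              f"{worst_case_latency_ns(hops):.0f} ns one-way")
```

Under these assumed numbers, removing two switch hops cuts roughly 1.4 microseconds from every worst-case traversal, which compounds quickly across the millions of collective-communication operations in a training job.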
