DPUs in ToR Switches
Recently, on a SmartNICs Summit panel about the future, I stated that "there is rarely anything new under the sun." Most "new" things are old ideas with a fresh coat of paint to make them appear shiny and different, and that goes for both products and problems. The panel discussion centered on the issues of tail latencies and network congestion. My point was that tail latencies (why some network transactions are slower than others) are by no means new; we've been working to improve tail latency since before stock exchanges were computerized at the end of the last century. As for network congestion, while we can assign fancy new names to this class of problems, we've been experiencing them since the dawn of time. Moving a high volume of particles from one zone to another, whether those particles are horses, automobiles, blood cells, or network packets, will always share some similarities. Congestion issues at 1 GbE and 100 GbE are similar, and when differences do crop up, they have likely manifested at other points in the past, just at slower speeds or with different protocols.
During my opening plenary session of the SmartNICs Summit, I called out that Top of Rack (ToR) network switches may soon be bursting with DPUs designed into the switch architecture itself, on the motherboard of the switch, as outlined in the sketch above. This was in response to learning the day before that both Pensando and NVIDIA are heading in this direction with their future product offerings. I mentioned that placing DPUs, specifically FPGAs, in the switch to manage specific switch ports with application-level functionality within the switch fabric was a pretty old idea. Before this current wave of innovation, I pointed out the following two prior waves:
Wave 2: 2015 Layer-1 Switches with FPGAs for Applications on the Edge
High-Frequency Trading has been placing FPGAs running specific applications in Layer-1 network switches for almost a decade, as evidenced by these products, all from 2015 (note this is only a tiny sample of the winners):
Metamako MetaApp32 - A 32-port Layer-1 ToR switch with an end-user programmable FPGA to service a wide variety of high-performance use cases very close to the exchange. These applications check market data coming into trading systems or trades exiting those systems destined for exchanges. Arista Networks bought Metamako in 2018 for this platform.
Exablaze ExaLINK Fusion - A 48-port Layer-1 ToR switch, also with a programmable FPGA at the edge. In 2020, Cisco purchased Exablaze for this technology.
LDA Technologies e4 - A 48-port Layer-1 switch with a programmable FPGA, from an aggressive team that continues innovating.
The high-frequency trading market still innovates in this space, and newer versions of all the above products exist today with impressive feature sets.
Wave 1: 2006 Blending HPC and Ethernet Fabrics on a Switch Blade
The Myricom 10G-SW16LC-6C2ER switch line card debuted at SuperComputing 2006. This switch blade carried a 16-port switch chip: eight ports front-facing and eight rear-facing into the switch fabric. Six of the eight front-facing ports supported Myrinet-10G over CX4 (copper cables that were later replaced by Direct Attach cables). The remaining two ports provided 10 GbE over SFP+. Behind these two ports sit two Myrinet Lanai-10G chips, visible in the back left corner of the picture. These were programmable NIC processors, very early DPUs, configured to translate between Myrinet-10G and 10 GbE in both directions.
I know Voltaire and Mellanox eventually had similar products in their InfiniBand switch fabrics to create an Ethernet bridge, but I won't expand on those here.
By 2025, we can expect to begin seeing early 1U ToR switches with four or even six DPUs behind the 24 server ports, all operating at 400 GbE. These switches will likely also have four 800 GbE uplink ports back to the core enterprise switches. Given how power-hungry 2U dual-socket servers are today and where they are headed, packing more than 24 in a single rack is growing less likely. Each of these 24 servers will have a lighter-weight commodity dual-port 400 GbE DPU (<$1K) that will work collaboratively with the ToR to deliver traffic to these server workloads efficiently. More on that in a future post…
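To put those projected numbers in perspective, here is a minimal back-of-the-envelope sketch in Python of the bandwidth such a switch would carry. The port counts and speeds are the ones projected above; the even split of traffic across the in-switch DPUs is purely an assumption for illustration, not a vendor specification.

```python
# Back-of-the-envelope bandwidth check for the projected 2025 ToR switch.
# Port counts and speeds come from the projection above; everything else
# is simple arithmetic, not a vendor specification.

SERVER_PORTS = 24          # downlink ports to servers
SERVER_PORT_GBPS = 400     # each server port runs at 400 GbE
UPLINK_PORTS = 4           # uplinks back to the core switches
UPLINK_PORT_GBPS = 800     # each uplink runs at 800 GbE
DPUS_IN_SWITCH = 6         # upper end of the four-to-six DPU estimate

downlink_gbps = SERVER_PORTS * SERVER_PORT_GBPS   # 9,600 Gbps toward servers
uplink_gbps = UPLINK_PORTS * UPLINK_PORT_GBPS     # 3,200 Gbps toward the core
oversubscription = downlink_gbps / uplink_gbps    # downlink-to-uplink ratio

# Traffic each in-switch DPU must keep pace with if server traffic is
# spread evenly across them (an assumption for illustration only).
per_dpu_gbps = downlink_gbps / DPUS_IN_SWITCH

print(f"Downlink capacity : {downlink_gbps:,} Gbps")
print(f"Uplink capacity   : {uplink_gbps:,} Gbps")
print(f"Oversubscription  : {oversubscription:.0f}:1")
print(f"Per-DPU share     : {per_dpu_gbps:,.0f} Gbps")
```

Running this shows a 3:1 downlink-to-uplink oversubscription and roughly 1,600 Gbps per in-switch DPU, which hints at why the ToR and the server-side DPUs would need to cooperate to deliver traffic efficiently.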