5 Google Aquila Interconnect Surprises
Picture from Google Aquila Research Publication

In early April, a 25-person team at Google published “Aquila: A Unified Low-latency Fabric for Datacenter Networks.” I've only started grinding through the original research paper and the very insightful article by The Next Platform, but here are a few realizations that surprised me and might interest others in the SmartNIC community.

  1. There is no Aquila NIC! Instead, Aquila created a ToR-in-NIC (TiN) chip. Google puts two of these chips on a line card and six line cards into each Aquila switch, for a total of 24 PCIe bus connections to servers per switch. Those 24 servers represent, in Aquila terminology, a pod, and there are two pods per rack, for 48 servers/rack.
  2. Aquila Server Connectivity is Limited to 128 Gbps. The Aquila TiN chip uses a PCIe Gen3x16 cable to connect to each server. In July, servers will begin shipping with PCIe Gen5x16 slots (512 Gbps). While you can plug a Gen3 device into a Gen5 slot, that would be like taking a 20 MPH moped onto a California freeway where 65 MPH is a suggestion. Google does call out that two PCIe cables could go to each server, but they explicitly state that their pods are 24 servers each, so there is only one cable per server.
  3. The Real Aquila Server Interconnect is a PCIe Gen3x16 Cable. It appears Google brings a PCIe Gen3x16 cable into the back of the server and plugs it into the PCIe bus. While this sounds trivial, PCIe signal integrity issues grow significantly with distance, which is why PCIe Gen3x16 extension cables are rarely more than 15 inches long and PCIe Gen4x16 cables only eight inches. Sure, custom shielded cables exist that can reach significant distances, nearly three meters for PCIe Gen3, and clearly Google has done some impressive cable engineering. If they ever move Aquila to PCIe Gen4 or Gen5, though, this will become their Achilles' heel.
  4. Aquila Servers are 3.5 to 1 Oversubscribed. Each Aquila TiN chip has a single 100 GbE connection shared between two servers, or 50 Gbps each. In addition, each TiN provides each server 300 Gbps within the pod and 100 Gbps of uplink beyond the pod. That works out to 450 Gbps of available interconnect bandwidth feeding a 128 Gbps PCIe Gen3x16 bus, roughly 3.5 to 1 (the arithmetic is reproduced in the short sketch after this list). By contrast, a full-bisection HPC Clos or fat-tree network is not oversubscribed at all, while tapered fat-trees are typically 2 to 1 and some other HPC implementations run 4:1.
  5. Leaving Every Aquila Rack are 216 Data Cables! Each rack has 192 x 25 Gbps dragonfly interconnect cables and 24 x 100 GbE cables. Within the rack, there are another 48 PCIe Gen3 cables connecting the switches to the servers; that’s 264 data cables per rack, OMG. Perhaps Google is acquiring a cable company.
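
For anyone who wants to sanity-check the arithmetic in points 2, 4, and 5, here is a minimal Python sketch that reproduces the numbers above. The bandwidth figures are the ones quoted in this post (raw PCIe line rates, ignoring 128b/130b encoding overhead), not anything pulled from Google's spec sheets.

```python
# Back-of-envelope numbers from the Aquila write-up above (points 2, 4, and 5).
# Figures come from the post itself, not from any Google or PCI-SIG spec sheet.

def pcie_raw_gbps(gt_per_s_per_lane: float, lanes: int) -> float:
    """Raw PCIe line rate in Gbps (ignores 128b/130b encoding overhead)."""
    return gt_per_s_per_lane * lanes

# Point 2: per-server PCIe bandwidth
gen3_x16 = pcie_raw_gbps(8, 16)    # ~128 Gbps, what Aquila's TiN cable delivers
gen5_x16 = pcie_raw_gbps(32, 16)   # ~512 Gbps, what new Gen5 x16 slots offer
print(f"Gen3 x16: {gen3_x16:.0f} Gbps vs Gen5 x16: {gen5_x16:.0f} Gbps")

# Point 4: oversubscription per server (bandwidth figures as stated in the post)
intra_pod   = 300        # Gbps within the pod
uplink      = 100        # Gbps beyond the pod
shared_100g = 100 / 2    # one 100 GbE port shared by two servers -> 50 Gbps each
offered = intra_pod + uplink + shared_100g
print(f"Offered {offered:.0f} Gbps into a {gen3_x16:.0f} Gbps bus "
      f"= {offered / gen3_x16:.1f}:1 oversubscription")

# Point 5: cable counts per rack (two 24-server pods = 48 servers)
dragonfly_25g = 192      # 25 Gbps dragonfly interconnect cables
ethernet_100g = 24       # 100 GbE cables
pcie_in_rack  = 48       # one PCIe Gen3x16 cable per server
leaving_rack = dragonfly_25g + ethernet_100g
total_cables = leaving_rack + pcie_in_rack
print(f"{leaving_rack} cables leave the rack; {total_cables} data cables in total")
```

Running it prints 128 vs 512 Gbps, a 3.5:1 oversubscription ratio, and 216 cables leaving the rack out of 264 total, matching the figures above.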

I’m still digging into both documents, so more to come later this week. 

Tim Dales

Product Marketing Manager

2y

WOW! What are they thinking? Is this a solution looking for a problem?

Tim Mazumdar

FPGA Engineer at Major defense contractor

2y

Scott Schweitzer, thank you for posting this diagram, and for providing the link to the paper; there is really no way around reading it. 😁 The key to the whole vision looks like this diagram, which shows two pods of 24 servers each, with two TiN ASICs in each pod. I have the same PCIe Gen3 versus Gen5 observation, but let's first look at the broader topology. I will read the paper at least 2-3 times and then we can compare notes.

Good observations, Scott. It is interesting that they used PCIe Gen3 instead of Gen4 or Gen5. Also interesting is the emphasis on latency; minimizing the round-trip time between nodes seems to be the key point of the interconnect. It will also be interesting to see how the relationship with Intel works out, given that they sell NICs.
