Cisco ACI Multi-Pod Part-1 || Overview

To understand how ACI Multi-Pod works and how it provides a fault-tolerant fabric, we need to understand its control plane and how it works under the hood.

Cisco ACI Multi-Pod control-plane protocols run independently in each pod, as follows:

  • Intermediate System-to-Intermediate System (IS-IS): for infra tunnel endpoint (TEP) reachability within a pod.

If IS-IS stops working in one pod, it does not affect IS-IS in the other pod, because IS-IS runs only between the spine and leaf switches within each pod.

For TEP reachability toward nodes in other pods, the spines learn the TEP range of each remote pod, rather than individual TEP IPs, via OSPF from the Inter-Pod Network (IPN), which we will discuss later.

IS-IS then advertises this range locally to all leaf switches in the pod.
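The benefit of advertising a summarized TEP range is that a spine only needs one prefix per remote pod to forward traffic. A minimal sketch of that lookup, using Python's standard `ipaddress` module (the pod names and TEP pool prefixes below are illustrative assumptions, not Cisco defaults):

```python
import ipaddress

# Hypothetical infra TEP pools per pod (illustrative values only).
pod_tep_pools = {
    "pod1": ipaddress.ip_network("10.1.0.0/16"),
    "pod2": ipaddress.ip_network("10.2.0.0/16"),
}

def next_hop_pod(dest_tep: str) -> str:
    """Return which pod's summarized TEP prefix covers a destination TEP.

    A spine only needs the per-pod summary (learned via OSPF from the IPN),
    not every individual TEP address in the remote pod.
    """
    addr = ipaddress.ip_address(dest_tep)
    for pod, pool in pod_tep_pools.items():
        if addr in pool:
            return pod
    raise LookupError(f"no pod TEP pool covers {dest_tep}")

print(next_hop_pod("10.2.37.5"))  # → pod2
```

Any individual TEP in a remote pod resolves through that pod's single summary prefix, which is why a failure of IS-IS inside one pod does not disturb the other pod's routing.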

  • Council of Oracle Protocol (COOP): for endpoint information learned within a pod. It is not affected if COOP stops working in the other pod, because COOP runs only between the leaf and spine switches in each pod.

Endpoint information learned in one pod is shared with the other pod and stored in that pod's COOP database via MP-BGP EVPN sessions between the spine switches of each pod across the IPN.

  • VPNv4/VPNv6 MP-BGP: for L3Out route distribution within a pod.

Likewise, if MP-BGP stops working in one pod, it does not affect MP-BGP in the other pod, because MP-BGP runs between the spine route reflectors and the leaf switches within each pod to distribute L3Out routes inside that pod.

On top of the MP-BGP sessions within a pod, Multi-Pod establishes additional MP-BGP VPNv4/VPNv6 sessions between the spine switches of each pod across the IPN, to share L3Out routes learned in one pod with the other pods.
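The two session layers described above can be sketched as a small model: intra-pod sessions from each spine route reflector to its local leaves, plus inter-pod sessions between the route-reflector spines of different pods over the IPN. The node names below are hypothetical, and the model deliberately ignores redundancy details:

```python
def mpbgp_sessions(pods):
    """Enumerate both MP-BGP session layers of a Multi-Pod fabric.

    `pods` maps pod name -> (route_reflector_spines, leaves).
    Illustrative model only; real fabrics add redundant RRs and sessions.
    """
    # Layer 1: intra-pod iBGP, spine RR to every local leaf.
    intra = [(rr, leaf, pod)
             for pod, (rrs, leaves) in pods.items()
             for rr in rrs for leaf in leaves]
    # Layer 2: inter-pod sessions between RR spines of each pod pair, over the IPN.
    pod_names = list(pods)
    inter = []
    for i, p1 in enumerate(pod_names):
        for p2 in pod_names[i + 1:]:
            for s1 in pods[p1][0]:
                for s2 in pods[p2][0]:
                    inter.append((s1, s2))
    return intra, inter

pods = {
    "pod1": (["spine1-1"], ["leaf1-1", "leaf1-2"]),
    "pod2": (["spine2-1"], ["leaf2-1"]),
}
intra, inter = mpbgp_sessions(pods)
print(len(intra), len(inter))  # → 3 1
```

Because the intra-pod sessions never cross the IPN, losing the inter-pod sessions only stops new L3Out routes from propagating between pods; each pod's internal route distribution keeps working.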


Failure Scenarios

Communication inside ACI depends on the COOP database, and, as we know, the database used by the Cisco APIC is split into several database units (shards). Each shard is replicated three times, with each copy assigned to a specific Cisco APIC. As a result, a Multi-Pod fabric may face different failure scenarios depending on how the APIC nodes are placed across the pods.

With a 3-node APIC cluster, each database shard is replicated on every APIC node in the cluster.

But with a 5-node cluster, each database shard is replicated on only three of the five nodes, so certain failures can impact the fabric.
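The difference between the two cluster sizes follows directly from placing 3 replicas on distinct nodes. A minimal sketch with the standard library (numbering the APICs 1..N for illustration):

```python
from itertools import combinations

REPLICAS = 3  # each APIC database shard has three copies

def replica_placements(cluster_size: int):
    """All possible ways to place the 3 replicas of one shard on distinct APICs."""
    return list(combinations(range(1, cluster_size + 1), REPLICAS))

# 3-node cluster: only one possible placement, so every node holds
# a copy of every shard.
print(replica_placements(3))       # → [(1, 2, 3)]

# 5-node cluster: ten possible placements, so any given shard lives
# on only 3 of the 5 nodes.
print(len(replica_placements(5)))  # → 10
```

This is why the 5-node cluster behaves differently in the failure scenarios below: which three nodes hold a given shard determines which pod can still write it.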

Split-Brain Failure Scenario:

Multi-Pod Split-brain Scenario

A split-brain failure scenario happens when the connectivity between the pods is interrupted.

In this scenario, all the APIC cluster nodes are up, but connectivity between the pods is down, so there is no communication between the APIC nodes in Pod 1 and the APIC node in Pod 2. With a 3-node cluster (two APICs in Pod 1, one in Pod 2), there is no issue with read-write configuration in Pod 1, because the majority of APIC nodes are in Pod 1. The APIC node in Pod 2, however, goes into read-only mode, which affects its operation: it cannot perform any configuration for its local pod.

If the APIC cluster has 5 nodes (e.g., 3 APICs in Pod 1, 2 APICs in Pod 2), the read/write or read-only mode is nondeterministic because of the shard replica distribution (3 replicas of each object). Some objects may have one replica on an APIC in Pod 1 and two replicas on APICs in Pod 2, so the Pod 2 APICs hold the majority and are in read/write mode for those objects, while other objects have their majority in Pod 1, where the Pod 1 APICs are in read/write mode.
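The per-shard nondeterminism can be made concrete with a small sketch: during a split, a pod can write a shard only if it holds at least 2 of that shard's 3 replicas. The APIC numbering and replica placements below are illustrative assumptions:

```python
def writable_pod(replica_nodes, pod_of):
    """Return the pod that can still write a shard during a split.

    A pod keeps write access to a shard only if it holds a majority
    (>= 2 of 3) of that shard's replicas.
    """
    counts = {}
    for node in replica_nodes:
        pod = pod_of[node]
        counts[pod] = counts.get(pod, 0) + 1
    for pod, count in counts.items():
        if count >= 2:
            return pod
    return None

# 5-node cluster: APICs 1-3 in Pod 1, APICs 4-5 in Pod 2 (illustrative).
pod_of = {1: "pod1", 2: "pod1", 3: "pod1", 4: "pod2", 5: "pod2"}

print(writable_pod((1, 2, 3), pod_of))  # shard fully in Pod 1 → pod1
print(writable_pod((1, 4, 5), pod_of))  # replica majority in Pod 2 → pod2
```

Because replica placement varies shard by shard, neither pod is uniformly read/write during the split: each side can modify only the objects whose shard majority it happens to hold.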

So it is very important to keep the connectivity between the two pods up, and to restore it as soon as possible if it fails.

Pod Failure Scenario


Consider a pod failure or disaster hitting one of the data centers. Assume a 3-node APIC cluster, and that the failure hits the pod holding the majority of APIC nodes (with each shard replicated across the three nodes). In this case, the pod with the single APIC node goes into read-only mode.

In such a case, we can add a standby APIC in Pod 2 and promote this controller to active, re-establishing the quorum of Cisco APICs.

In a 5-node APIC cluster scenario (3 APICs in Pod 1 and 2 APICs in Pod 2), if the failure hits Pod 1, which has the majority of the APICs, the same thing happens: the APICs in Pod 2 go into read-only mode, and we can add a standby controller to restore cluster majority and re-establish quorum.

The only difference here is that, even with a standby controller, this failure may lead to the loss of information for any shards whose three replicas all resided on the nodes in Pod 1 (the failed pod).
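Which shards are at risk can be sketched by checking whether every replica of a shard sat in the failed pod. The shard names and placements below are hypothetical examples:

```python
def lost_shards(shards, pod_of, failed_pod):
    """Return shards whose every replica resided on APICs in the failed pod.

    These shards have no surviving copy, so their data cannot be rebuilt
    by the remaining cluster and must come from a configuration backup.
    """
    return [name for name, replicas in shards.items()
            if all(pod_of[node] == failed_pod for node in replicas)]

# 5-node cluster: APICs 1-3 in Pod 1, APICs 4-5 in Pod 2 (illustrative).
pod_of = {1: "pod1", 2: "pod1", 3: "pod1", 4: "pod2", 5: "pod2"}
shards = {
    "shard-a": (1, 2, 3),  # all three replicas on Pod 1 APICs
    "shard-b": (1, 2, 4),  # one replica survives on APIC 4
    "shard-c": (3, 4, 5),  # two replicas survive in Pod 2
}

print(lost_shards(shards, pod_of, "pod1"))  # → ['shard-a']
```

In a 3-node cluster this set is always empty for a single-pod failure (every shard lives on all three nodes), which is exactly the difference the paragraph above describes.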

In such a scenario, we can recover the fabric from a configuration backup.


In the next article, we will discuss the Inter-Pod Network (IPN): how it works and its design considerations.

