Foundations Of Highly Available System Design Part 1 - Achieving 5 9's of Availability

Kartik S.

SDE @Amazon | GSoC @RedHat | Open Source and Coding Mentor | Ex @Nagarro | Ex @Coding Blocks | System Design Content Creator | 20k+ LinkedIn followers | 3 million views | Open for collaborations

Published Feb 22, 2023

Systems that are available, or operational, 99.999% of the time (five 9s) throughout the year are called highly available systems. This means your system can be down for at most:

5.26 minutes in a year.
1.31 minutes in a quarter.
26.30 seconds in a month.
6.05 seconds in a week.
864.00 milliseconds in a day.
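
The figures above follow from simple arithmetic: allowed downtime is the length of the period multiplied by the unavailable fraction (0.001% for five 9s). Here is a minimal Python sketch of that calculation. The function name and the period lengths (a 365.25-day year, with a quarter and a month taken as one fourth and one twelfth of it) are assumptions chosen for illustration.

```python
# Allowed downtime for a given availability target.
# Assumption: a year is 365.25 days; a quarter and a month are 1/4 and 1/12 of it.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

PERIODS_IN_SECONDS = {
    "year": SECONDS_PER_YEAR,
    "quarter": SECONDS_PER_YEAR / 4,
    "month": SECONDS_PER_YEAR / 12,
    "week": 7 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def allowed_downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the downtime budget, in seconds, for one period."""
    unavailable_fraction = 1 - availability_percent / 100
    return PERIODS_IN_SECONDS[period] * unavailable_fraction


if __name__ == "__main__":
    for period in PERIODS_IN_SECONDS:
        seconds = allowed_downtime_seconds(99.999, period)
        print(f"{period:>7}: {seconds:10.3f} s ({seconds / 60:.2f} min)")
```

Passing a different target, such as 99.9 or 99.99, shows how quickly the downtime budget shrinks with every extra 9.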

The basic way to achieve a highly available system is by eliminating single points of failure.

The attached image shows a setup with the following pieces:

Multiple servers - If one fails, requests can be redirected to another one. But who monitors whether a server is up and running, and who redirects the requests?
Load balancer - It monitors the servers and redirects traffic to another available server. But what if the load balancer itself fails?
Standby load balancer - It monitors the health of the primary load balancer and takes over if the primary fails. But how does this failover to the standby load balancer work?
Floating IP - Failover needs a rapid IP remapping mechanism so that the load is quickly transferred to the standby load balancer, and this is where a floating IP comes in.
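
To make the first two points concrete, here is a small in-memory Python sketch of a load balancer that skips unhealthy servers. It is an illustration only: the `Server` and `LoadBalancer` classes, the `healthy` flag standing in for a real health check, and the round-robin choice are all assumptions, not something the article prescribes.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """A hypothetical backend; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class LoadBalancer:
    """Routes each request to the next healthy server (round-robin)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # counter used to rotate through healthy servers

    def healthy_servers(self):
        # A real load balancer would probe each server periodically;
        # here we just read the flag.
        return [s for s in self.servers if s.healthy]

    def route(self, request: str) -> str:
        candidates = self.healthy_servers()
        if not candidates:
            raise RuntimeError("no healthy servers available")
        server = candidates[self._next % len(candidates)]
        self._next += 1
        return server.handle(request)


if __name__ == "__main__":
    a, b = Server("server-a"), Server("server-b")
    lb = LoadBalancer([a, b])
    print(lb.route("req-1"))
    a.healthy = False          # simulate a server failure
    print(lb.route("req-2"))   # traffic now goes only to server-b
```

Note that the load balancer in this sketch is itself a single point of failure, which is exactly the problem the standby load balancer addresses.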

A floating IP is a static virtual IP assigned to the active (primary) load balancer. If the primary fails, the virtual IP is reassigned to the standby load balancer. The virtual IP thus floats between the two load balancers, hence the name floating IP.
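
The failover idea can be simulated in a few lines of Python. This is only a sketch under stated assumptions: `LoadBalancerNode`, `FloatingIP`, and `check_and_failover` are hypothetical names, the `alive` flag replaces a real heartbeat, and the address comes from a documentation-only range. Real deployments usually rely on a protocol such as VRRP or on a cloud provider's API to move the address.

```python
class LoadBalancerNode:
    """A hypothetical load balancer instance; `alive` stands in for a heartbeat."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True


class FloatingIP:
    """A virtual IP that is always attached to exactly one load balancer."""

    def __init__(self, address: str, primary: LoadBalancerNode, standby: LoadBalancerNode):
        self.address = address   # clients only ever see this address
        self.primary = primary
        self.standby = standby
        self.owner = primary     # the primary holds the IP at start

    def check_and_failover(self) -> LoadBalancerNode:
        """Run periodically: if the current owner is down, move the IP to its peer."""
        if not self.owner.alive:
            peer = self.standby if self.owner is self.primary else self.primary
            if peer.alive:
                self.owner = peer
        return self.owner


if __name__ == "__main__":
    primary = LoadBalancerNode("lb-primary")
    standby = LoadBalancerNode("lb-standby")
    vip = FloatingIP("192.0.2.10", primary, standby)  # documentation-only address

    print(vip.address, "->", vip.check_and_failover().name)
    primary.alive = False  # simulate a primary failure
    print(vip.address, "->", vip.check_and_failover().name)
```

Because clients keep using the same address before and after the switch, they do not need to be reconfigured when the primary fails.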

This is a very basic foundation for achieving a highly available system; more parts will follow soon. If you find the article useful and want more such articles, subscribe to the newsletter and follow Kartik S.

Cheers

Kartik Sapra

System Design For Interviews

Karan S.

Insightful

Navjot Singh

Full-Stack Developer | SDE at Majid Al Futtaim(Carrefour)

Very useful , waiting for many more ❤️ Kartik S.

Prakhar Rai

SWE @ Abnormal Security • Previously worked at Physics Wallah, Cisco, Interview Kickstart

Thanks for posting

John Crickett

Helping you become a better software engineer by building real-world applications.

There are very very few times when 5 9s will make economic sense.

Kaivalya Apte

The GeekNarrator Podcast | Staff Engineer | Follow me for #distributedsystems #databases #interviewing #softwareengineering

👏Good luck Kartik. Keep it up.
