Shorticle 925 – Composite availability of cloud infrastructure services in Google Cloud
When we choose a cloud platform for our target architecture, we consider high availability, reliability and agility in the design. So, when we choose cloud services such as VMs or storage, we first need to understand the non-functional requirements for availability and reliability, and then choose accordingly between a multi-instance, multi-regional or multi-cloud solution architecture.
The availability a cloud provider commits to for a service is expressed as a percentage in its Service Level Agreement (SLA). Many Google Cloud services have SLAs above 99.5%. A Service Level Objective (SLO), on the other hand, is the target we set for a measured service level: a single numerical value, for example 99.9% of requests succeeding over a 30-day window. Knowing the SLA and SLO expectations helps us choose a resilient architecture and the cloud resources it requires.
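To make an availability percentage concrete, it helps to translate it into the downtime it permits over a period. The sketch below assumes a 30-day month, and the targets in the loop are illustrative values, not the published SLA of any particular Google Cloud service.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float,
                     period: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum downtime a given availability percentage permits over `period`."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return period * unavailable_fraction


# Illustrative targets only; look up the actual SLA of each service you use.
for target in (99.5, 99.9, 99.95, 99.99):
    print(f"{target}% -> {allowed_downtime(target)} of downtime per 30 days")
```

At 99.5%, for example, the permitted downtime is 0.5% of 30 days, i.e. 3.6 hours.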
To decide how your services should be designed to meet an availability requirement, you need to understand composite availability: the combined availability of all the services a single application depends on. For example, a web application may use compute, security, network security and storage services; its composite availability is the availability of all of these working together as one application architecture.
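When every one of these services must be up for the application to work, and their failures are independent, the composite availability is the product of the individual availabilities. The sketch below uses hypothetical availability figures for the four services in the web application example; they are assumptions for illustration, not Google Cloud SLA values.

```python
from math import prod

# Hypothetical availability figures (as fractions), for illustration only;
# check the published SLA of each service you actually use.
web_app_components = {
    "compute": 0.9995,
    "security": 0.9999,
    "network_security": 0.9999,
    "storage": 0.999,
}


def composite_availability(availabilities) -> float:
    """Availability of an application that needs every component up at once,
    assuming the components fail independently."""
    return prod(availabilities)


overall = composite_availability(web_app_components.values())
print(f"Composite availability: {overall:.4%}")
```

With these figures the composite availability is about 99.83%, lower than even the weakest single component (99.9%), which is why composite availability has to be checked against the requirement rather than any individual service's SLA.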
Composite availability is shaped by how services depend on each other, for example a Cloud Run service that depends on middleware or a backend database. These dependencies can be classified as:
· Serial services, where one service depends directly on another, so the application is available only when both of them are available.
· Parallel services, where two or more services can each handle the same request, so the application keeps serving users as long as at least one of them is available. This increases the reliability of the application.
· Independent redundant services, where the same service is deployed across multiple regions so that the failure of one region does not take the application down. This raises the availability the application can offer its end users.
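The serial and redundant patterns in the list above can be compared numerically. A minimal sketch, assuming independent failures: serial components multiply their availabilities, while redundant replicas are unavailable only when all of them are down at the same time. The per-region figures for the hypothetical app tier and database are illustrative assumptions.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """All components must be up: multiply their availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant replica must be up: 1 - P(all replicas down)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical per-region availabilities for an app tier and its database.
app, db = 0.999, 0.9995

single_region = serial(app, db)                       # app depends on db
two_regions = parallel(single_region, single_region)  # same stack in 2 regions

print(f"single region:           {single_region:.5%}")
print(f"two independent regions: {two_regions:.5%}")
```

The redundant deployment is far more available than a single region, but only under the independence assumption. In practice the component that routes traffic between regions (for example a load balancer) sits in series with the redundant pair and caps the overall result.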
For successful infrastructure optimization and availability, it is important to define a Service Level Objective (SLO) tailored to each application; an error budget on which monitoring and alerting are built; self-healing capabilities and a target Mean Time to Recovery (MTTR); and the level of operational complexity you accept, that is, how far availability is maintained through automated or semi-automated activities.
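The error budget and MTTR can both be expressed as simple calculations. In the sketch below, the error budget is the share of requests an SLO allows to fail (1 − SLO), and steady-state availability uses the standard approximation MTBF / (MTBF + MTTR), where MTBF (mean time between failures) is a term introduced here, not in the text above. The function names and all numbers are hypothetical.

```python
def remaining_error_budget(slo: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent in the current SLO window.

    1.0 = untouched, 0.0 = fully spent, negative = the SLO has been breached.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability implied by mean time between failures and mean time to recovery."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical: a 99.9% SLO, 1,000,000 requests in the window, 400 failed.
print(remaining_error_budget(0.999, 1_000_000, 400))
# Hypothetical: halving MTTR from 2 h to 1 h with a 720 h MTBF.
print(steady_state_availability(720, 2), steady_state_availability(720, 1))
```

Here 400 failures against a budget of about 1,000 leaves roughly 60% of the budget, and cutting MTTR from 2 hours to 1 hour raises steady-state availability from about 99.72% to about 99.86%, which is why self-healing and faster recovery matter as much as redundancy.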