Adaptive Network Security in the Cloud
Contents:
Introduction
Applications delivered in the cloud face sophisticated cyber threats across a uniquely diverse and dynamic landscape. With the cloud's agility and the expanding surface area of modern distributed applications, maintaining a tight security posture is increasingly complex and operationally expensive. Organizations that rely on security virtual appliances to protect cloud resources often find themselves hindered by rigid operating patterns that slow business execution and by rising costs to meet throughput demands.
This document will investigate the challenges of reliably deploying virtual appliances across AWS, Azure, and GCP in a sustainable and scalable manner, and present solutions to some of the roadblocks organizations typically face. It will also look at how the Prosimo platform streamlines this integration, lowers cost, and improves operational efficiency.
Security Requirements for the Agile Cloud Era
Securing cloud-native platforms, services, and resources requires a vastly different approach from the traditional castle-and-moat security model that followed many enterprises to the cloud. In the cloud, perimeters of communication are no longer rigid, shared services across distributed applications are commonplace, and practices like continuous delivery and production A/B testing make for fluid boundaries.
Consider the following diagram and its depiction of the three phases of cloud adoption.
The first phase of cloud adoption saw cloud as a resource extension to systems in private data centers. Application architecture scarcely differed from that of on-premises – the three-tier architecture of database, middleware, and front-end still reigned – and with it the traditional castle-and-moat security model sufficed. At this stage, operating at scale was not of widespread concern, and many organizations deployed highly available pairs of virtual security appliances in each VPC/VNET.
With its rapid expansion of platform and service offerings, the second phase of cloud adoption saw a sharp increase in production workloads running entirely in the cloud, with cloud now perceived as a peer to on-premises data centers – a peer in terms of focus and priority, but not necessarily in implementation patterns. Microservice architecture was gaining popularity, but security architecture still focused largely on the platform ingress.
The third phase of cloud adoption positions the cloud as the center of gravity, with many organizations adopting a cloud-first approach. Application diversity is widespread, with architects leaning into distributed platforms and services spanning both regions and providers. Communication paths are far from linear, and the surface area in need of securing is agile – chaotic compared to previous phases.
Scenarios Revealing Cloud Security Challenges
Considering the three phases of cloud adoption, and the security architecture implications of each, organizations are typically presented with the following scenarios and the individual challenges they represent:
Growing Cost of Security Appliance Sprawl
Deploying virtual appliances per VPC/VNET becomes increasingly expensive as adoption grows from tens of VPCs/VNETs to hundreds. The combination of compute cost and virtual appliance licensing climbs, and with it come new operational challenges at scale.
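The cost growth described above can be sketched with a simple back-of-the-envelope model. The hourly compute and license rates below are illustrative assumptions, not vendor pricing; the point is that the per-VPC/VNET pattern scales linearly with VPC count.

```python
# Hedged sketch: monthly cost of one highly-available appliance pair per
# VPC/VNET. Hourly rates are illustrative assumptions, not vendor pricing.
def ha_pair_monthly_cost(num_vpcs: int,
                         compute_per_hour: float = 0.50,
                         license_per_hour: float = 1.00,
                         hours: int = 730) -> float:
    """Return the monthly cost of deploying an HA pair in every VPC/VNET."""
    per_appliance = (compute_per_hour + license_per_hour) * hours
    return num_vpcs * 2 * per_appliance  # 2 appliances per HA pair

# Cost grows linearly: moving from tens to hundreds of VPCs is a 10x bill.
cost_10 = ha_pair_monthly_cost(10)    # tens of VPCs
cost_100 = ha_pair_monthly_cost(100)  # hundreds of VPCs
```

Under these assumed rates, ten VPCs already cost tens of thousands of dollars per month, and the shared-pool model discussed later in this document exists precisely to break this linear scaling.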
Traditional Operating Practices
At phase 2, with accelerated cloud adoption, the increase in scale presents operational challenges that begin to stall execution and erode competitive edge.
Appliance vendors responded to these operational challenges with fleet-management tools delivering orchestrated patching and upgrades, auditing capabilities, and central configuration management. This era also saw the transition to shared-service implementations: pooled fleets of virtual appliances replacing the per-VPC/VNET model, with application traffic routed through load-balanced pools presented as a security service. While this significantly reduced the operational cost of an organization's security footprint, it introduced two new challenges:
Intelligent traffic steering for virtual appliance consolidation
At phases 2 and/or 3 of cloud adoption, organizations often realize two new truths about running a shared pool of virtual security appliances:
The following section looks at the technical challenges organizations face when attempting to address these scenarios.
Technical Challenges & Cloud Platform Gaps
This section will review the capabilities and limitations of addressing these scenarios with native AWS and Azure services and tooling.
Deploying Highly Available Virtual Appliances in AWS
The AWS Gateway Load Balancer (GWLB) is a special type of load balancer that enables you to deploy, scale, and manage virtual appliances such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems within AWS. The GWLB offers several benefits, such as high availability, scalability, security, and simplicity.
GWLB employs a new kind of VPC endpoint, the Gateway Load Balancer Endpoint (GWLBe), which serves as the interface to the GWLB. Instead of sending traffic directly to the virtual appliances, the VPC or Transit Gateway directs the traffic to a GWLBe, which is used as the next hop in the VPC or TGW route table.
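To make the next-hop arrangement concrete, the sketch below builds the parameters one would pass to the EC2 CreateRoute API (for example via boto3's `create_route`) so that a subnet's traffic is steered to a GWLB endpoint rather than to the appliances directly. The resource IDs are placeholders, not real resources, and the function only constructs the request; it does not call AWS.

```python
# Hedged sketch: constructing an EC2 CreateRoute request whose next hop is
# a Gateway Load Balancer Endpoint. IDs below are illustrative placeholders.
def gwlbe_route(route_table_id: str, dest_cidr: str, gwlbe_id: str) -> dict:
    """Build CreateRoute parameters steering a CIDR through a GWLBe."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": dest_cidr,
        # The next hop is the GWLBe VPC endpoint, not the appliance itself.
        "VpcEndpointId": gwlbe_id,
    }

route = gwlbe_route("rtb-0123example", "10.1.0.0/16", "vpce-0456example")
```

In a real deployment this route would live in the VPC or Transit Gateway route table, with a mirror-image route returning inspected traffic to its destination.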
The Challenges of Using AWS Gateway Load Balancer
The AWS Gateway Load Balancer (GWLB) comes with some challenges that must be addressed before implementation. The four main challenges of using the GWLB are:
Complex Network Configuration
One of the challenges of using the GWLB is the complexity of configuring the network to enable traffic inspection between different Virtual Private Clouds (VPCs). Consider the scenario where traffic between two VPCs is inspected by virtual appliances behind the GWLB. This scenario involves several factors that contribute to the complexity, such as:
Limited Logging and Visibility
Another challenge of using the GWLB is the limited logging and visibility of the traffic and the virtual appliances. Monitoring the GWLB with tools like VPC Flow Logs, CloudWatch, and CloudTrail presents hurdles, such as:
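One common visibility task is correlating VPC Flow Log records with traffic traversing the appliances. The sketch below parses a record in the default Flow Log format into named fields; the sample record is synthetic, and a production pipeline would instead read logs from S3 or CloudWatch Logs.

```python
# Hedged sketch: parse a VPC Flow Log record (default format) into a dict
# so accepted/rejected flows can be inspected. The sample record is synthetic.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()

def parse_flow_log(record: str) -> dict:
    """Map the space-separated default-format fields onto their names."""
    return dict(zip(FIELDS, record.split()))

sample = ("2 123456789012 eni-0a1b2c3d4e 10.0.1.5 10.0.2.9 "
          "49152 443 6 10 8400 1620000000 1620000060 ACCEPT OK")
flow = parse_flow_log(sample)
```

Even with such tooling, Flow Logs describe traffic at the ENI level; stitching together the pre- and post-inspection legs of a flow through the GWLBe remains the operator's job.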
Complex Cost Analysis
The GWLB uses a usage-based pricing model that includes hourly operational charges plus costs based on the number of Gateway Load Balancer Capacity Units (GLCUs) consumed. GLCUs are a composite metric reflecting the various dimensions of traffic processed by the GWLB, calculated on an hourly average. The primary factors contributing to GLCU consumption are:
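The "maximum of several dimensions" shape of GLCU billing can be modeled as below. The per-dimension capacities and rates used here are illustrative assumptions, not AWS's published numbers; consult current GWLB pricing for real figures.

```python
# Hedged sketch of GLCU-style billing: the hour's charge is driven by
# whichever traffic dimension is largest. Capacities and rates below are
# illustrative assumptions, not AWS pricing.
def glcus_consumed(new_conns_per_sec: float,
                   active_conns: float,
                   gb_per_hour: float) -> float:
    """Return GLCUs as the max over the per-dimension utilization ratios."""
    dims = (
        new_conns_per_sec / 600.0,  # assumed: 600 new connections/s per GLCU
        active_conns / 60000.0,     # assumed: 60,000 active connections per GLCU
        gb_per_hour / 1.0,          # assumed: 1 GB/hour processed per GLCU
    )
    return max(dims)

def hourly_cost(glcus: float,
                rate_per_glcu: float = 0.004,   # assumed $/GLCU-hour
                base_hourly: float = 0.0125) -> float:  # assumed $/hour base
    return base_hourly + glcus * rate_per_glcu

# Here the active-connection dimension dominates: 120000 / 60000 = 2 GLCUs.
g = glcus_consumed(new_conns_per_sec=300, active_conns=120000, gb_per_hour=1.5)
```

The max-of-dimensions structure is what makes cost analysis hard: a workload with many long-lived idle connections can bill as heavily as one pushing large data volumes.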
Lack of Advanced Traffic Steering
The GWLB’s basic traffic forwarding capabilities limit administrators’ control over the network. Administrators face several drawbacks, such as:
Deploying Highly Available Virtual Appliances in Azure
In the dynamic environment of Microsoft Azure, several strategies can be employed to ensure high availability for virtual appliances, tailored to specific traffic patterns. Let’s explore some prominent options:
Azure Load Balancer:
Azure Route Server:
Gateway Load Balancer:
For the purposes of this document, we will delve deeper into the Azure Load Balancer option, as it is widely deployed and supports both North-South and East-West traffic flows.
The Challenges of Using Azure Load Balancer
The Azure Load Balancer does provide virtual appliance resiliency; however, like the GWLB in AWS, it also comes with challenges:
Complex Network Configuration
Ensuring that all necessary traffic is inspected by the virtual appliances can be a complex task, particularly when configuring the correct routes so that traffic is efficiently steered via the Load Balancer. For instance, an East–West traffic flow scenario requires four Azure route tables, plus the virtual appliance's own route table, to be updated and maintained. Maintaining these tables at scale can lead to User-Defined Route (UDR) sprawl, especially when subnet ranges could be summarized more efficiently.
The consequences of UDR sprawl include:
Due to VNETs being peered into the same Hub, a significant challenge arises in network segmentation. All attached VNETs have the capability to connect with one another, which complicates efforts to create distinct network segments. Moreover, a misconfiguration of the Virtual Appliance firewall policy could inadvertently permit unauthorized appliances to communicate with each other, potentially compromising network security.
Limited Logging and Visibility
When it comes to identifying connectivity issues, monitoring and analyzing various logs such as load balancer logs, Virtual Network (VNet) flow logs, and Network Security Group (NSG) rules poses several challenges:
Complex Cost Analysis
Estimating the total cost of an Azure Load Balancer is complex due to its usage-based pricing. Calculating the cost requires an understanding of the volume of data to be processed and the number of load-balancing rules in use. Gathering these metrics in a large, dynamic cloud environment can be extremely challenging, which complicates cost prediction and chargeback processes across departments.
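The rules-plus-data shape of this pricing can be sketched as follows. The rates below are illustrative assumptions, not Azure's published numbers; check current Azure Load Balancer pricing for your region before budgeting.

```python
# Hedged sketch: estimating a load balancer's monthly cost from rule count
# and processed data. All rates are illustrative assumptions, not Azure
# pricing.
def lb_monthly_cost(num_rules: int, gb_processed: float,
                    hours: int = 730,
                    first5_rate: float = 0.025,     # assumed $/hr, first 5 rules
                    extra_rule_rate: float = 0.01,  # assumed $/hr per extra rule
                    data_rate: float = 0.005) -> float:  # assumed $/GB processed
    """Hourly charge for rules plus a per-GB charge for processed data."""
    extra_rules = max(0, num_rules - 5)
    hourly = first5_rate + extra_rules * extra_rule_rate
    return hours * hourly + gb_processed * data_rate
```

Note that both inputs (rule count and GB processed) fluctuate in a dynamic environment, which is exactly why chargeback across departments is hard: each team's share of rules and traffic must be measured, not assumed.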
A Sophisticated Approach
Networking vendors that came to the cloud early, when cloud-native offerings were scarce and limited in capability, created network overlay solutions focused on layer 3 (IP-to-IP, subnet-to-subnet) communications. This approach worked for phase 1 of cloud adoption, with its traditional app architecture, but fell short for the platform and service diversity that came with phases 2 and 3. Leveraging modern cloud capabilities requires networking with application centricity, along with the ability to define adaptive cloud network security policies.
Prosimo’s Adaptive Service Insertion simplifies compliance in the cloud by allowing fine-grained policy definition and real-time visibility to selectively insert stateful services such as firewalls in the path of networks and apps. This reduces the risk of human error, simplifies ongoing maintenance, and helps right-size the services to save costs.
Define an adaptive security service insertion policy once and apply it to application network paths to: