Adaptive Network Security in the Cloud


Contents:

  • Introduction
  • Security Requirements for the Agile Cloud Era
  • Scenarios Revealing Cloud Security Challenges
  • Technical Challenges & Cloud Platform Gaps: AWS, Azure
  • A Sophisticated Approach


Introduction

Applications delivered in the cloud face sophisticated cyber threats across a uniquely diverse and dynamic landscape. With the cloud’s agility, and the dynamic surface area of modern, distributed applications, maintaining a tight security posture is increasingly complex and operationally expensive. Organizations leveraging security virtual appliances to secure cloud resources often find themselves hindered by rigid operating patterns that slow business execution, and by rising costs to meet throughput demands.


The networking chaos of modern cloud adoption


This document will investigate the challenges of reliably deploying virtual appliances across AWS, Azure, and GCP in a sustainable and scalable manner, and present solutions to some of the roadblocks organizations typically face. It will also look at how the Prosimo platform streamlines this integration, lowers cost, and improves operational efficiency.


Security Requirements for the Agile Cloud Era

Securing cloud-native platforms, services, and resources requires a vastly different approach from the traditional castle-and-moat security model that followed many enterprises to the cloud. With cloud, perimeters of communication are no longer rigid, shared services across distributed applications are commonplace, and practices like continuous delivery and production A/B testing make for fluid boundaries.

Consider the following diagram and its depiction of the three phases of cloud adoption.


Trends Driving Cloud Network Sophistication


The first phase of cloud adoption saw cloud as a resource extension to systems in private data centers. Application architecture scarcely differed from that of on-premises – the three-tier app architecture of database, middleware, and front-end still reigned – and with it the traditional castle-and-moat security model sufficed. At this stage, operating at scale was not of widespread concern, and many deployed highly available pairs of virtual security appliances in each VPC/VNET.

With its rapid expansion of platform and service offerings, the second phase of cloud adoption saw a sharp increase in production workloads running entirely in the cloud, with cloud now perceived as a peer to on-premises data centers. A peer in terms of focus and priority, but not necessarily in implementation patterns. Microservice architecture was gaining popularity but security architecture largely still focused on the platform ingress.

The third phase of cloud adoption positions the cloud as the center of gravity, with many organizations adopting a cloud-first approach. Application diversity is widespread, with architects leaning on distributed platforms and services spanning both regions and providers. Communication paths are far from linear, and the surface area in need of securing is agile – chaotic compared with previous phases.


Scenarios Revealing Cloud Security Challenges

Considering the three phases of cloud adoption, and the security architecture implications of each, organizations are typically presented with the following scenarios and the individual challenges they represent:

  • Growing cost of security appliance sprawl
  • Traditional operating practices stalling modern continuous deployment patterns
  • Need for intelligent application traffic steering to tighten security while reducing footprint


Growing Cost of Security Appliance Sprawl

Deploying virtual appliances per VPC/VNET becomes increasingly expensive as adoption grows from tens of VPCs/VNETs to hundreds. The combination of compute cost and virtual appliance licensing climbs and, with it, new operational challenges emerge at scale.


Traditional operating practices

At phase 2, with accelerated cloud adoption, the increase in scale presents operational challenges that begin to stall execution and erode competitive edge, for example:

  • keeping on top of virtual appliance lifecycle tasks such as upgrades and patching
  • ongoing policy management across a highly distributed fleet in the face of continuous deployment practices; or as the network and security appliance operators see it, Continuous Re-Configuration

Appliance vendors responded to the operational challenges with fleet management tools delivering orchestrated patching and upgrades, auditing capabilities, and central configuration management. This era also saw the transition to shared service implementations, with pooled fleets of virtual appliances and a migration away from the per-VPC/VNET model, whereby application traffic is routed through load-balanced pools presented as a security service. While this significantly reduces the operational cost of an organization’s security footprint, it presents two new challenges:

  • How to ensure that all traffic is routed to the shared service pool; and
  • Scaling the shared service pool cost effectively


Intelligent traffic steering for virtual appliance consolidation

At phases 2 and/or 3 of cloud adoption, organizations often realize two new truths about running a shared pool of virtual security appliances:

  1. routing every session through the shared security fleet is both extremely expensive and unnecessary; however,
  2. without advanced and precise application traffic control, and the observability to exercise it without risk, they are forced to accept those costs


The following section looks at the technical challenges organizations face when attempting to address these scenarios.


Technical Challenges & Cloud Platform Gaps

This section will review the capabilities and limitations of addressing these scenarios with native AWS and Azure services and tooling.

Deploying Highly Available Virtual Appliances in AWS

The AWS Gateway Load Balancer (GWLB) is a special type of load balancer that enables you to deploy, scale, and manage virtual appliances such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems within AWS. The GWLB offers several benefits, such as high availability, scalability, security, and simplicity.

GWLB employs a new kind of VPC endpoint called the Gateway Load Balancer Endpoint (GWLBe), which serves as the interface to the GWLB. Instead of sending traffic directly to the virtual appliances, the VPC or Transit Gateway directs traffic to a GWLBe, which is used as the next hop in the VPC or TGW route table.
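To make the next-hop pattern concrete, the sketch below models a route table lookup in plain Python. These are not AWS APIs; the endpoint ID and CIDRs are made-up placeholders. Traffic bound for an inspected prefix resolves to the GWLBe, while everything else stays local:

```python
import ipaddress

# Illustrative route table: destination CIDR -> next-hop target.
# "vpce-0123gwlbe" is a placeholder Gateway Load Balancer Endpoint ID.
ROUTES = {
    "10.1.0.0/16": "vpce-0123gwlbe",   # inspected: next hop is the GWLBe
    "10.0.0.0/16": "local",            # intra-VPC traffic stays local
}

def next_hop(dst_ip: str) -> str:
    """Longest-prefix match, as a VPC or TGW route table would perform."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [
        (net.prefixlen, target)
        for net, target in ((ipaddress.ip_network(c), t) for c, t in ROUTES.items())
        if dst in net
    ]
    if not candidates:
        raise LookupError(f"no route for {dst_ip}")
    # Most-specific (longest) prefix wins.
    return max(candidates)[0:2][1]
```

In the real service, swapping inspection on or off for a prefix is exactly this kind of route table change: point the CIDR at the GWLBe or back at the local/TGW target.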

The Challenges of Using AWS Gateway Load Balancer

The AWS Gateway Load Balancer (GWLB) comes with some challenges that must be addressed before implementation. The four main challenges of using the GWLB are:

  • complex network configuration
  • limited logging and visibility
  • intricate cost analysis
  • a lack of advanced traffic steering


Complex Network Configuration

One of the challenges of using the GWLB is the complexity of configuring the network to enable traffic inspection between different Virtual Private Clouds (VPCs). Consider the scenario where traffic between two VPCs is inspected by virtual appliances behind the GWLB. This scenario involves several factors that contribute to the complexity, such as:

  • The existence of six independent route tables across various VPCs and transit gateways, each demanding individual management. This decentralization amplifies the risk of errors that could lead to network outages.
  • The laborious and time-intensive nature of troubleshooting due to this multifaceted configuration.
  • An escalating complexity as the network expands to include more VPCs and network segments.
  • The need for an increased number of meticulously configured route tables to ensure optimal traffic flow and avoid asymmetric routing issues as the network grows.
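The asymmetric-routing risk in the last point can be illustrated with a toy model: if the forward path traverses the GWLBe but a missing route lets the return path bypass it, stateful appliances will drop the flow. The hop names and tables below are illustrative only, not AWS constructs:

```python
# Toy model of the two-VPC inspection scenario: each node's table maps a
# destination to its next hop; a missing entry means "deliver directly".
def path(tables, src, dst):
    """Follow next hops from src until traffic reaches dst."""
    hops, node = [], src
    while node != dst:
        node = tables.get(node, {}).get(dst, dst)  # default: direct delivery
        hops.append(node)
    return hops

tables = {
    "vpc-a": {"vpc-b": "tgw"},
    "vpc-b": {"vpc-a": "tgw"},
    "tgw":   {"vpc-a": "gwlbe", "vpc-b": "gwlbe"},
    # after inspection, the appliance hands traffic back for delivery
}

def inspected_both_ways(tables, a, b):
    """True only if BOTH the forward and return paths traverse the GWLBe."""
    return "gwlbe" in path(tables, a, b) and "gwlbe" in path(tables, b, a)
```

Deleting the single `"vpc-a"` entry in the TGW table makes `inspected_both_ways` false: the return path silently bypasses inspection, which is precisely the class of error that grows with each additional route table.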


Limited Logging and Visibility

Another challenge of using the GWLB is the limited logging and visibility of the traffic and the virtual appliances. Monitoring the GWLB with tools like VPC Flow Logs, CloudWatch, and CloudTrail presents hurdles, such as:

  • The integration process for data from these sources is complicated by their diverse formats and voluminous logs.
  • Comprehensive logging can be expensive; hence selective data capture is often employed but risks omitting vital information.
  • Analyzing disrupted traffic flows often necessitates examining access logs from virtual appliances behind the GWLB—a task complicated by multi-team coordination requirements.
  • Addressing connectivity issues associated with the GWLB is time-intensive, as it involves differentiating between configuration errors and intentional traffic obstructions instigated by firewalls, which prolongs the mean time to resolution (MTTR).
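As a small illustration of the normalization work involved, the sketch below parses default-format (version 2) VPC Flow Log records and surfaces rejected flows. Field order follows AWS's documented default format; the record used in testing is fabricated:

```python
# Default version-2 VPC Flow Log fields, in documented order.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line: str) -> dict:
    """Split a space-delimited record and coerce numeric fields."""
    rec = dict(zip(FIELDS, line.split()))
    for k in ("srcport", "dstport", "packets", "bytes"):
        rec[k] = int(rec[k])
    return rec

def rejected(records):
    """Surface denied flows -- a first step when hunting a blocked path."""
    return [r for r in records if r["action"] == "REJECT"]
```

Real deployments must additionally reconcile this format with CloudWatch and CloudTrail events and with appliance access logs, each in its own schema, which is where the integration cost lies.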


Complex Cost Analysis

The GWLB uses a usage-based pricing model that includes hourly operational charges plus costs based on the number of Gateway Load Balancer Capacity Units (GLCUs) consumed. GLCUs are a composite metric reflecting the various dimensions of traffic processed by the GWLB, calculated on an hourly average. The primary factors contributing to GLCU consumption are:

  • New Connections/Flows: The rate of new connections or flows established per second.
  • Active Connections/Flows: The peak number of concurrent connections or flows, sampled every minute.
  • Processed Bytes: The total volume of data processed by the GWLB, measured in gigabytes (GBs).
  • Predicting the metrics for GWLB usage is challenging due to fluctuating traffic patterns affecting GLCU consumption rates. This variability causes notable difficulties in budget forecasting and implementing precise chargeback processes among different departments.
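Since billing is driven by whichever GLCU dimension is highest, a rough estimator can at least bound the cost for a given traffic profile. The unit capacities below reflect AWS's published GLCU dimensions at the time of writing, and the hourly price is a placeholder; verify both against current AWS pricing for your region:

```python
# GLCU unit capacities (from AWS's published dimensions; re-check them).
GLCU_NEW_FLOWS_PER_SEC = 600      # new flows established per second
GLCU_ACTIVE_FLOWS = 60_000        # peak concurrent flows
GLCU_GB_PER_HOUR = 1.0            # processed bytes, GB per hour
PRICE_PER_GLCU_HOUR = 0.004       # placeholder rate; varies by region

def glcus_for_hour(new_per_sec, active_peak, gb_processed):
    """Billing charges only the highest-consuming of the three dimensions."""
    return max(new_per_sec / GLCU_NEW_FLOWS_PER_SEC,
               active_peak / GLCU_ACTIVE_FLOWS,
               gb_processed / GLCU_GB_PER_HOUR)

def hourly_cost(new_per_sec, active_peak, gb_processed):
    return glcus_for_hour(new_per_sec, active_peak, gb_processed) * PRICE_PER_GLCU_HOUR
```

Because the `max()` jumps between dimensions as traffic shifts (a bulk transfer drives processed bytes; a chatty microservice drives new flows), the same fleet can bill very differently hour to hour, which is the forecasting difficulty described above.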


Lack of Advanced Traffic Steering

The GWLB’s basic traffic forwarding capabilities limit administrators’ control over the network. They face several drawbacks, such as:

  • They have to balance the need for thorough traffic inspection with the implications of higher resource consumption, as they face a trade-off between robust security and increased costs.
  • They cannot fine-tune traffic routing based on granular details like source IP or content type—limiting flexibility in managing the network.
  • They may have to scale up their infrastructure to handle the broad traffic selection, which can lead to higher operational costs and inefficiencies, as more resources are consumed to process and inspect all traffic directed to the given destination IP, including that which may not require such scrutiny.
  • They have to deal with the trade-offs and limitations of using only the destination IP address for selecting the flow path, as they must carefully consider their security policies and financial constraints to optimize both protection and cost.
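The kind of flow-level steering the GWLB lacks can be sketched as a simple policy function: decide per flow whether to route through the inspection fleet or bypass it, based on attributes finer than the destination IP. The CIDRs and ports below are illustrative policy choices, not recommendations:

```python
import ipaddress

# Illustrative policy inputs (placeholders, not a recommendation).
INSPECT_SOURCES = [ipaddress.ip_network("10.50.0.0/16")]  # untrusted segment
BYPASS_PORTS = {5201}  # e.g. bulk replication already encrypted end-to-end

def steer(src_ip: str, dst_port: int) -> str:
    """Return 'inspect' or 'bypass' for a single flow."""
    src = ipaddress.ip_address(src_ip)
    if any(src in net for net in INSPECT_SOURCES):
        return "inspect"          # untrusted sources are always inspected
    if dst_port in BYPASS_PORTS:
        return "bypass"           # exempt known-safe bulk traffic
    return "inspect"              # default posture: inspect unless exempted
```

With destination-IP-only routing, every flow to a protected prefix follows the same path; a per-flow decision like this is what lets the inspection fleet shrink without loosening the default-inspect posture.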


Deploying Highly Available Virtual Appliances in Azure

In the dynamic environment of Microsoft Azure, several strategies can be employed to ensure high availability for virtual appliances, tailored to specific traffic patterns. Let’s explore some prominent options:

Azure Load Balancer:

  • Supports configurations including active/active, active/standby, and scale-out.
  • Provides flexibility in deployment.

Azure Route Server:

  • Requires virtual appliances to support BGP.
  • Supports active/active and active/standby configurations.

Gateway Load Balancer:

  • Guarantees traffic symmetry without SNAT.
  • Allows NVAs (Network Virtual Appliances) to be shared across tenants.
  • Boasts excellent convergence time.
  • Supports active/active, active/standby, and scale-out virtual appliances.


For the purpose of this document, we will delve deeper into the Azure Load Balancer option. It is widely deployed and supports both North-South and East-West traffic flows.


The Challenges of Using Azure Load Balancer

The Azure Load Balancer does provide virtual appliance resiliency; however, like the GWLB in AWS, it comes with its own challenges:

  • Complex Network Configuration
  • Limited Logging and Visibility
  • Complex Cost Analysis


Complex Network Configuration

Ensuring that all necessary traffic is inspected by virtual appliances can be a complex task, particularly when configuring the correct routes so that traffic is efficiently routed via the Load Balancer. For instance, an East–West traffic flow scenario requires four Azure route tables, plus the virtual appliance's own route table, to be updated and maintained. Maintaining these tables at scale can lead to User-Defined Route (UDR) sprawl, especially when subnet ranges could be summarized more efficiently.

The consequences of UDR sprawl include:

  1. Operational Complexity: Managing multiple route tables requires careful configuration and monitoring. As the number of route tables grows, operational complexity increases, making it harder to ensure consistent and reliable network flows.
  2. Configuration Errors: With more route tables, the chances of misconfigurations rise. Incorrect routes or missing entries can disrupt traffic flow, impacting application availability and performance.
  3. Scalability Issues: As your infrastructure scales, maintaining individual route tables for each subnet becomes cumbersome. It can slow down network changes and hinder agility.
  4. Resource Limits: Each UDR consumes resources within Azure. Having too many can mean hitting a limit that prevents you from adding new routes.
  5. Security Risks: Misconfigured route tables may inadvertently expose sensitive resources or prevent traffic from being inspected. Proper management is crucial to maintain security boundaries.
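One mitigation for UDR sprawl is summarizing contiguous spoke subnets before writing routes. Python's standard `ipaddress` module can show how much a route set collapses; the subnets below are illustrative:

```python
import ipaddress

# Illustrative spoke subnets: four contiguous /24s plus one outlier.
subnets = [
    "10.20.0.0/24", "10.20.1.0/24", "10.20.2.0/24", "10.20.3.0/24",
    "10.30.0.0/24",
]

def summarize(cidrs):
    """Collapse adjacent/overlapping prefixes into the fewest supernets."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]
```

Here five UDR entries collapse to two (`10.20.0.0/22` and `10.30.0.0/24`). Planning address space so that spokes land on summarizable boundaries keeps route tables small and below Azure's per-table limits.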


Due to VNETs being peered into the same Hub, a significant challenge arises in network segmentation. All attached VNETs have the capability to connect with one another, which complicates efforts to create distinct network segments. Moreover, a misconfiguration of the Virtual Appliance firewall policy could inadvertently permit unauthorized appliances to communicate with each other, potentially compromising network security.


Limited Logging and Visibility

When it comes to identifying connectivity issues, monitoring and analyzing various logs such as load balancer logs, Virtual Network (VNet) flow logs, and Network Security Group (NSG) rules pose several challenges:

  1. Diverse Data Formats and Voluminous Logs: Integrating data from diverse sources (load balancer logs, VNet flow logs, NSG rules) is complicated due to their different formats. Handling logs with varying structures requires careful parsing and normalization.
  2. Cost vs. Selective Data Capture: Comprehensive logging captures all network activity but can be expensive. Organizations often resort to selective data capture to manage costs.
  3. Access Logs from Virtual Appliances: Troubleshooting involves examining access logs from virtual appliances behind the Load Balancer. Coordinating across multiple teams to access and interpret these logs adds further delay.
  4. Configuration Errors vs. Firewall Obstructions: Distinguishing between unintentional configuration errors and deliberate traffic obstructions is difficult, and identifying the cause prolongs the Mean Time to Resolution (MTTR).


Complex Cost Analysis

Estimating the total cost of an Azure Load Balancer is complex due to its usage-based pricing. Calculating the cost requires an understanding of the volume of data to be processed and the number of load balancer rules in use. Gathering these metrics in a large, dynamic cloud environment can be extremely challenging, which complicates cost prediction and chargeback processes among different departments.

  1. Variable Workloads: In a dynamic environment, workloads can fluctuate based on factors such as user demand, application usage patterns, and seasonal variations. Estimating ILB costs becomes challenging when workload variability is high.
  2. Scaling Requirements: The number of instances and the instance size of the ILB may need to scale dynamically to accommodate changes in workload demand. Estimating costs accurately requires predicting the scaling requirements and associated costs.
  3. Data Transfer Patterns: The amount of data processed and transferred by the ILB can vary based on factors such as network traffic volume, communication patterns between VMs, and data processing requirements. Estimating data transfer costs accurately requires understanding and predicting these patterns.
  4. Resource Interdependencies: In a large environment with multiple interconnected resources, such as VMs, VNets, and other Azure services, estimating ILB costs becomes more complex due to resource interdependencies and their impact on ILB usage and performance.
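Even with those unknowns, a back-of-envelope estimate helps frame the discussion. The sketch below combines rule-hours and data processed; the rates are placeholders, so substitute the current prices for your region from the Azure pricing page:

```python
# Placeholder rates -- replace with current regional Azure pricing.
FIRST_5_RULES_PER_HOUR = 0.025   # flat rate covering the first 5 rules
EXTRA_RULE_PER_HOUR = 0.010      # each rule beyond the first 5
PER_GB_PROCESSED = 0.005         # data processed, per GB

def monthly_cost(rules: int, gb_processed: float, hours: int = 730) -> float:
    """Rough Standard Load Balancer estimate: rule-hours + data processed."""
    rule_rate = FIRST_5_RULES_PER_HOUR + max(0, rules - 5) * EXTRA_RULE_PER_HOUR
    return rule_rate * hours + gb_processed * PER_GB_PROCESSED
```

The hard part in practice is not the arithmetic but the inputs: `gb_processed` and the rule count both move with the workload variability described above, so any forecast needs to be revisited as traffic patterns shift.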


A Sophisticated Approach

Networking vendors that came to cloud early, at a time when cloud-native offerings were scarce and limited in capability, created network overlay solutions focused on layer 3 (IP-to-IP, subnet-to-subnet) communications. This worked for phase 1 of cloud adoption with its traditional app architecture, but fell short for the platform and service diversity that came with phases 2 and 3. Leveraging modern cloud capabilities requires networking with application centricity, and the ability to define adaptive cloud network security policies.


Prosimo Console - Adaptive Service Insertion


Prosimo’s Adaptive Service Insertion simplifies compliance in the cloud by allowing fine-grained policy definition and real-time visibility to selectively insert stateful services such as firewalls in the path of networks and apps. This reduces the risk of human error, simplifies ongoing maintenance, and helps right-size the services to save costs.

Define adaptive security service insertion policy once and apply to application network paths to:

  • Increase control across a diverse and distributed application surface area
  • Reduce costs through traffic steering precision and avoid bloated one-size-fits-all scaling

Learn more: prosimo.io/event/adaptive-cloud-network-security/

