Cloud Data Transfer & Storage Cost Reduction Strategies
Introduction
Most companies initially focus their cloud cost reduction strategies on compute: processing power, servers, and virtual machines. Companies with large cloud footprints often overlook data transfer and storage cost reduction strategies, which, when properly implemented, can significantly reduce annual cloud spend, especially in environments that have never been optimized. To understand these strategies, one must have a solid understanding of how cloud data transfer and storage work. (Please read my other article, Cloud Cost Management, Optimization & Savings Strategies, which contains further cloud cost saving strategies.)
Basic Data Transfer
Basic cloud data transfer involves moving data between local systems and cloud storage, or between different cloud environments (1).
Different providers might have specific tools and services to facilitate data transfer, such as AWS Snowball, Azure Data Box or GCP Transfer Appliance, particularly for large datasets.
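For routine transfers that do not require a physical appliance, a few lines of SDK code are often enough. Below is a minimal sketch, assuming the AWS SDK for Python (boto3) and hypothetical bucket and file names:

```python
import boto3

# Minimal sketch: upload a local file to S3. The bucket name and paths
# are hypothetical placeholders. upload_file transparently switches to
# multipart upload for large files.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="/data/exports/archive.tar.gz",   # local source (assumed path)
    Bucket="example-ingest-bucket",            # destination bucket (assumed)
    Key="archives/archive.tar.gz",             # object key in the bucket
)
```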
Data Ingress vs. Egress
One of the more complex and difficult-to-understand cost drivers in cloud operations is data transfer. Cloud hosting providers distinguish between data transferred in (ingress), which is often free, and data exported out (egress), which is typically billed.
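To see why egress deserves attention, a back-of-the-envelope estimate helps. The rate and free allowance below are assumptions for illustration only; actual rates vary by provider, region, destination, and volume tier:

```python
# Illustrative egress cost estimate (rates are assumptions, not quotes).
EGRESS_RATE_PER_GB = 0.09   # assumed internet egress rate, USD per GB
FREE_TIER_GB = 100          # assumed free monthly allowance, GB

def monthly_egress_cost(gb_out: float) -> float:
    billable = max(gb_out - FREE_TIER_GB, 0)
    return billable * EGRESS_RATE_PER_GB

print(monthly_egress_cost(5_000))  # 5 TB out per month -> 441.0 (USD)
```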
The following tables show which types of data transfer traffic the big three cloud providers charge for (2):
Recommendations to Minimize Data Transfer Costs
The following tables contain general recommendations for minimizing data transfer costs on AWS, Azure, and GCP (2):
How Cloud Storage Works
It is important to understand how cloud storage architecture works in order to evaluate cost reduction opportunities. There are three main storage technologies currently utilized by most cloud providers (2):
1. Block storage is similar to Storage Area Network (SAN) or Direct Attached Storage (DAS) storage. Data is stored in the form of blocks, offering low latency and superior performance. It is used for virtual machine disks and costs more than other storage types.
2. File storage allows storage to be accessed by different clients through a shared file system or file share. It works similarly to Network Attached Storage (NAS) devices, with common protocols such as Network File System (NFS) and Server Message Block (SMB), used in Linux and Windows, respectively.
3. Object storage allows data to be stored as objects. This storage method is especially well suited for unstructured datasets. It is a storage engine based on HTTP REST APIs for storage operations (a minimal sketch follows this list). This method of storage is very cost-effective, but not all cloud applications support its use.
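To make the REST-based model concrete, here is a minimal sketch of storing and retrieving an object, assuming boto3 and a hypothetical bucket; each call maps to an HTTP request (PUT/GET) under the hood:

```python
import boto3

# Minimal sketch of object storage operations. Bucket and key names
# are hypothetical placeholders.
s3 = boto3.client("s3")

# PUT: store a blob of unstructured data as an object
s3.put_object(Bucket="example-bucket", Key="logs/app.log", Body=b"log line\n")

# GET: retrieve the same object
obj = s3.get_object(Bucket="example-bucket", Key="logs/app.log")
print(obj["Body"].read())
```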
Methodologies for Optimizing Storage and Reducing Costs
There are a variety of ways of managing storage to improve operational efficiencies and reduce long-term storage costs including the following (1):
Storage Tiers
By default, most cloud providers store data in the standard class, or what is called hot storage. Standard storage is usually the most expensive storage class because data stored in it can be rapidly retrieved on demand. If a company’s data is not needed that quickly, moving it to a lower tier with slower retrieval can result in significant savings. Tiered storage solutions use cost-effective storage classes based on data access patterns (e.g., AWS S3 offers Standard, Infrequent Access, and Glacier tiers). One must be careful, though, not to move critical data to a tier whose retrieval time would have an operational impact. Cloud hosting providers also charge for moving data from one tier to the next and impose early-deletion fees on data that spends only a brief period in a tier. Companies should utilize specialized software that moves and stores data based on pre-defined rules.
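In practice, this is often configured as a lifecycle rule rather than custom software. Below is a minimal sketch, assuming boto3 and hypothetical bucket and prefix names, that moves objects to Infrequent Access after 30 days and to Glacier after a year:

```python
import boto3

# Minimal sketch of a tiering lifecycle rule. The bucket name, prefix,
# and day thresholds are assumptions chosen for illustration.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-reports",
                "Status": "Enabled",
                "Filter": {"Prefix": "reports/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```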
Reducing Disk Size in Cloud Providers
Disk size reduction is currently not supported by Azure managed disks, Amazon EBS, or GCP Compute Engine persistent disks. The only way to reduce disk size is to migrate the data manually, using tools such as AzCopy (Azure) or Robocopy, which is why it is important to right-size disks when creating virtual machines. If disks are not properly sized for their use case, a firm is paying for capacity it is not using.
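A simple utilization check can flag candidates for right-sizing before a migration is planned. Below is a minimal sketch using only the Python standard library; the mount point and the 40% threshold are assumptions:

```python
import shutil

# Minimal sketch: compare used vs. provisioned capacity on a mounted
# disk. The mount point and waste threshold are assumed examples.
usage = shutil.disk_usage("/mnt/data")
used_pct = usage.used / usage.total * 100
print(f"{used_pct:.1f}% of {usage.total / 1e9:.0f} GB in use")
if used_pct < 40:  # assumed threshold for flagging over-provisioning
    print("Candidate for manual migration to a smaller disk")
```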
Thick vs. Thin Provisioning in Disks
Thin- and thick-provisioned disks offer similar storage capabilities, but they differ in how capacity is allocated. With thick provisioning, the disk's required storage space is pre-allocated to a machine in its entirety at creation time. With thin provisioning, only the portion of the disk necessary for the machine to function is allocated initially, and more is added dynamically on demand until the disk reaches its maximum required amount. Utilizing one method over the other can result in storage cost savings depending on how they are implemented.
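The difference is easiest to see with a toy model. The disk size and usage profile below are assumptions purely for illustration; note that many managed cloud disks bill on provisioned size regardless, so thin provisioning mainly pays off where allocation-based billing or on-premises capacity applies:

```python
# Toy model (assumed numbers): a 1 TB thick disk is allocated in full
# on day one; a thin disk grows with actual usage up to the same cap.
MAX_GB = 1024
monthly_usage_gb = [100, 150, 220, 300]  # assumed usage profile

thick_allocated = [MAX_GB] * len(monthly_usage_gb)            # full size up front
thin_allocated = [min(u, MAX_GB) for u in monthly_usage_gb]   # grows on demand

print("thick:", thick_allocated)  # [1024, 1024, 1024, 1024]
print("thin: ", thin_allocated)   # [100, 150, 220, 300]
```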
Storage Redundancy
Storage redundancy means that data is replicated, copied, or backed up in multiple locations. Building these replication mechanisms without the cloud can be very costly, as a firm would need different data centers to host its virtual machines, as well as strong network connections between them to guarantee consistent replication.
Disk Snapshots
A disk snapshot captures an image of a storage drive at a specific point in time, whether or not the disk is attached to a running virtual machine (VM) instance. A snapshot is not a complete and full backup. Snapshots are easy to take, which is why company technical teams like them, but they are not substitutes for full backups: a snapshot is a point-in-time copy of a specific disk that allows rolling back if something goes wrong, while a backup protects a complete virtual machine, including all its disks.
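Creating one is typically a single API call. Below is a minimal sketch assuming boto3 and a hypothetical EBS volume ID:

```python
import boto3

# Minimal sketch: snapshot a single EBS volume. The volume ID is a
# hypothetical placeholder. This covers one disk only -- it is not a
# full backup of the VM.
ec2 = boto3.client("ec2")
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",           # assumed volume ID
    Description="Pre-deployment rollback point",
)
print(snapshot["SnapshotId"])
```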
Selective Disk Back-up
Selective backup refers to the process through which specific files or directories are chosen for backup, as opposed to backing up the entire system. This approach allows for a greater degree of control over the data that is preserved, ensuring that only necessary and relevant information is securely stored.
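A minimal sketch of the idea, using only the Python standard library; the include list and paths are assumed examples of what a firm's backup policy might mark as relevant:

```python
import shutil
from pathlib import Path

# Minimal sketch of a selective backup: copy only the directories that
# the policy marks as relevant, not the entire system. Paths are assumed.
INCLUDE = ["etc/app", "var/data/critical"]   # directories worth preserving
SOURCE = Path("/")
DEST = Path("/mnt/backup")

for rel in INCLUDE:
    src, dst = SOURCE / rel, DEST / rel
    shutil.copytree(src, dst, dirs_exist_ok=True)
    print(f"backed up {src} -> {dst}")
```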
Versioning & Soft Deletes
Object storage offers the capability to keep multiple versions of the same files, as well as soft delete features, to avoid loss of data or unwanted deletion/modification of objects. When versioning is active, every time an object is modified a new version is created and stored. When soft delete is turned on, deleted data is stored in the bucket or storage account for a specified retention period.
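Turning versioning on is usually a one-call configuration change. A minimal sketch, assuming boto3 and a hypothetical bucket name:

```python
import boto3

# Minimal sketch: enable versioning on an S3 bucket (name is a
# hypothetical placeholder). Once enabled, overwrites create new
# versions and deletes leave delete markers instead of destroying data.
s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```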
Backup Redundancy
The more redundant backups a firm keeps in its cloud environments, the higher its storage costs. Redundancy options should only be utilized when required. Redundancy options that span regions carry a higher price tag and only make sense for critical multi-regional solutions.
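A rough comparison shows why. The per-GB rates below are assumptions for illustration, not quoted prices; geo-redundant storage commonly costs roughly twice as much as locally redundant storage:

```python
# Illustrative redundancy cost comparison (rates are assumptions).
LRS_RATE = 0.02   # assumed USD per GB-month, locally redundant
GRS_RATE = 0.045  # assumed USD per GB-month, geo-redundant

backup_gb = 50_000  # 50 TB of backups (assumed)
print(f"LRS: ${backup_gb * LRS_RATE:,.0f}/month")  # LRS: $1,000/month
print(f"GRS: ${backup_gb * GRS_RATE:,.0f}/month")  # GRS: $2,250/month
```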
Back-up Policies
Companies with significant cloud footprints need to have well-thought-out backup policies to ensure continuity of operations for internal applications and customers. A typical backup policy describes parameters such as backup frequency, retention periods, redundancy levels, and recovery time and recovery point objectives (RTO/RPO).
Once all these parameters are well defined, a firm can begin cost optimization for backups and general storage activities.
Automated Backups and Storage Tier Movement
Many companies utilize highly sophisticated software that automatically backs up cloud environments on regular schedules using policy rules. In addition, these tools have algorithms that evaluate data across storage tiers and move it at regular intervals from hotter tiers to colder tiers as the data becomes less urgently needed.
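The core of such a rule engine can be sketched in a few lines. The example below assumes boto3, a hypothetical bucket, and a 90-day threshold; in production, managed lifecycle rules are usually preferable to hand-rolled loops:

```python
import boto3
from datetime import datetime, timedelta, timezone

# Minimal sketch of rule-based tier movement: demote objects untouched
# for 90 days to a colder class. Bucket and threshold are assumptions.
s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

for page in s3.get_paginator("list_objects_v2").paginate(Bucket="example-bucket"):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < cutoff and obj.get("StorageClass") == "STANDARD":
            # An in-place copy rewrites the object into the colder tier
            s3.copy_object(
                Bucket="example-bucket",
                Key=obj["Key"],
                CopySource={"Bucket": "example-bucket", "Key": obj["Key"]},
                StorageClass="STANDARD_IA",
            )
```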
Conclusion
Companies with large cloud footprints often overlook data transfer and storage cost reduction strategies, which can significantly reduce monthly cloud spend. Data transfer ingress and egress costs vary from one cloud provider to the next, so it is important to understand all options. There is a vast array of methodologies for optimizing cloud storage assets from an operational and cost perspective. These options will change over time as a company’s cloud footprint changes.