Cloud Data Transfer & Storage Cost Reduction Strategies

Introduction

Most companies initially focus their cloud cost reduction strategies on compute: processing power, servers, and virtual machines. Companies with large cloud footprints often overlook data transfer and storage cost reduction strategies, which, when properly implemented, can significantly reduce annual cloud spend. To evaluate these strategies, one must have a solid understanding of how cloud data transfer and storage work. (Please read my other article, Cloud Cost Management, Optimization & Savings Strategies, which contains further cloud cost saving strategies.)

Basic Data Transfer

Basic cloud data transfer involves moving data between local systems and cloud storage or between different cloud environments. It typically works as follows (1):

  1. Data Preparation: Identify the data you want to transfer, whether it's files, databases, or applications.
  2. Connection: Establish a secure connection to the cloud service provider. This may involve using APIs, SDKs, or command-line tools provided by the cloud provider.
  3. Data Transfer Protocols: Use transfer protocols like HTTP, FTP, or SFTP to send data. Cloud providers often offer specialized protocols optimized for speed and reliability.
  4. Encryption: To ensure security during transfer, data is often encrypted. Many cloud services provide built-in encryption options.
  5. Transfer Mechanism: Initiate the transfer. This can be done manually, through scripts, or automated processes (like scheduled backups).
  6. Monitoring and Management: Monitor the transfer process for errors or interruptions. Many cloud services provide dashboards to track transfer progress.
  7. Data Integrity Checks: After transfer, checksums or hashes may be used to verify that the data was transferred accurately and completely (see the sketch following this list).
  8. Finalization: Once data is successfully transferred, it can be accessed or utilized in the cloud environment.
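To make steps 5 and 7 concrete, here is a minimal sketch using boto3 (the AWS SDK for Python); the bucket and file names are hypothetical, and Azure and GCP offer equivalent SDK calls. It uploads a single object and verifies its checksum afterwards.

```python
import hashlib

import boto3  # AWS SDK for Python; Azure and GCP offer equivalent SDKs

# Hypothetical names for illustration only.
BUCKET = "example-transfer-bucket"
LOCAL_FILE = "dataset.csv"


def md5_hex(path: str) -> str:
    """Compute the MD5 digest of a local file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


s3 = boto3.client("s3")
local_checksum = md5_hex(LOCAL_FILE)

# Step 5: initiate the transfer as a single-part upload.
with open(LOCAL_FILE, "rb") as f:
    s3.put_object(Bucket=BUCKET, Key=LOCAL_FILE, Body=f)

# Step 7: integrity check. For single-part, non-KMS uploads, the S3 ETag
# is the object's MD5 hex digest (multipart uploads use a different form).
remote_etag = s3.head_object(Bucket=BUCKET, Key=LOCAL_FILE)["ETag"].strip('"')
assert remote_etag == local_checksum, "checksum mismatch after transfer"
```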

Different providers might have specific tools and services to facilitate data transfer, such as AWS Snowball, Azure Data Box or GCP Transfer Appliance, particularly for large datasets.

Data Ingress vs. Egress

One of the more complex and difficult-to-understand cost drivers in cloud operations is data transfer. Cloud hosting providers distinguish between data transferred into their platforms (ingress) and data exported out of them (egress), and price the two very differently.

  • Data Transfer Ingress/Inbound: This is any data that is uploaded or moved to a cloud hosting provider. This traffic is usually free because cloud providers want customers to upload as much information as possible into their cloud environments.
  • Data Transfer Egress/Outbound: This type of data transfer occurs when a company downloads, exports, or transfers information from the cloud to other services or other cloud regions. Cloud providers charge higher rates the greater the distance the data needs to be moved. Data egress charges can be quite large, so they need to be carefully monitored (see the estimate below).
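As a rough illustration of why egress deserves monitoring, the sketch below estimates a monthly transfer bill. The per-GB rates are assumptions for illustration only, not any provider's published prices; always check the current price sheet.

```python
# Back-of-the-envelope egress cost estimate. The per-GB rates below are
# illustrative placeholders, not published prices.
EGRESS_RATE_PER_GB = 0.09        # assumed internet egress rate
INTER_REGION_RATE_PER_GB = 0.02  # assumed cross-region transfer rate

monthly_egress_gb = 10_000       # 10 TB served to the internet
monthly_cross_region_gb = 2_000  # 2 TB replicated to another region

monthly_cost = (monthly_egress_gb * EGRESS_RATE_PER_GB
                + monthly_cross_region_gb * INTER_REGION_RATE_PER_GB)
print(f"Estimated monthly data transfer cost: ${monthly_cost:,.2f}")
# -> Estimated monthly data transfer cost: $940.00
```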

The following tables show the types of charges the big three cloud providers apply to data transfer traffic (2):

Recommendations to Minimize Data Transfer Costs

The following tables contain general recommendations for minimizing data transfer costs on AWS, Azure, and GCP (2):

How Cloud Storage Works

It is important to understand how cloud storage architectures work in order to evaluate cost reduction opportunities. There are three main storage technologies currently utilized by most cloud providers (2):

1. Block storage is similar to Storage Area Network (SAN) or Direct Attached Storage (DAS) storage. Data is stored in the form of blocks, which offers low latency and superior performance. It is used for virtual machine disks and costs more than other storage types.

2. File storage allows access to storage through a shared file system or a file share among different clients. It works similarly to Network Attached Storage (NAS) devices, with common protocols such as Network File System (NFS) and Server Message Block (SMB), used in Linux and Windows, respectively.

3. Object storage allows data to be stored as objects. This storage method is especially well suited for unstructured datasets. It is a storage engine based on HTTP REST APIs for storage operations (illustrated conceptually below). This method of storage is very cost-effective, but not all cloud applications support its use.
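The following sketch illustrates the REST semantics of object storage against a hypothetical, pre-authorized endpoint. Real providers require signed requests or SDK-managed credentials, so treat this as conceptual only.

```python
import requests

# Hypothetical object storage endpoint; real providers require request
# signing or SDK-managed credentials.
BASE = "https://objects.example.com/my-bucket"

# PUT stores an object, GET retrieves it, DELETE removes it.
requests.put(f"{BASE}/reports/2024.csv", data=b"col1,col2\n1,2\n")
resp = requests.get(f"{BASE}/reports/2024.csv")
print(resp.content)
requests.delete(f"{BASE}/reports/2024.csv")
```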

Methodologies for Optimizing Storage and Reducing Costs

There are a variety of ways of managing storage to improve operational efficiency and reduce long-term storage costs, including the following (1):

Storage Tiers

By default, most cloud providers store data in standard classes, also called hot storage. Standard storage is usually the most expensive storage class because data stored in it can be retrieved rapidly on demand. If a company’s data is not needed that quickly, moving it to a lower tier with slower retrieval can result in significant savings. Tiered storage solutions use cost-effective storage classes based on data access patterns (e.g., AWS S3 offers Standard, Infrequent Access, and Glacier tiers). One must be careful, though, not to move critical data to a storage tier whose retrieval time is too long for operational needs. Cloud hosting providers also charge for moving data from one tier to the next and for data that spends only a brief period in a tier. Companies should utilize specialized software that moves and stores data based on pre-defined rules.
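On AWS, for example, such pre-defined rules can be expressed as an S3 lifecycle configuration; a minimal sketch with boto3 follows, where the bucket name, prefix, tiers, and day counts are hypothetical policy choices. Azure and GCP offer comparable lifecycle-management features.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; day counts and target classes are
# policy choices, constrained by each tier's minimum storage duration.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                    {"Days": 90, "StorageClass": "GLACIER"},      # archival
                ],
                "Expiration": {"Days": 365},  # delete after a year
            }
        ]
    },
)
```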

Reducing Disk Size in Cloud Providers

Disk size reduction is currently not supported in Azure managed disks, Amazon EBS, or GCP Compute Engine persistent disks. The only way to reduce disk size is by migrating data manually, using tools such as AzCopy (Azure) or Robocopy, which is why it is important to right-size disks when creating virtual machines. If disks are not properly sized for their use case, a firm is paying for capacity it is not using.

Thick vs. Thin Provisioning in Disks

Thin and thick provisioned disks offer similar storage capabilities, but they differ in how capacity is allocated. With thick provisioning, the disk’s capacity is pre-allocated in its entirety at the time of creation. With thin provisioning, only the portion of the disk necessary for the machine to function is allocated initially, and more is added dynamically on demand until the disk reaches its maximum required size. Utilizing one method over the other can result in storage cost savings depending on how they are implemented.
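The difference is analogous to sparse versus pre-allocated files on an ordinary file system. The sketch below is a loose analogy, not actual cloud disk provisioning: the "thin" file's blocks are allocated lazily, while the "thick" file's bytes are all written up front (st_blocks is POSIX-specific).

```python
import os

SIZE = 1024 * 1024 * 100  # 100 MiB logical size

# "Thin": a sparse file. Logical size is 100 MiB, but blocks are only
# allocated as data is actually written (on Linux/macOS file systems).
with open("thin.img", "wb") as f:
    f.seek(SIZE - 1)
    f.write(b"\0")

# "Thick": pre-allocate every byte up front.
with open("thick.img", "wb") as f:
    f.write(b"\0" * SIZE)

for name in ("thin.img", "thick.img"):
    st = os.stat(name)
    print(name,
          "logical:", st.st_size,
          "allocated:", st.st_blocks * 512)  # st_blocks is POSIX-only
```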

Storage Redundancy

Storage redundancy means that data is replicated, copied, or backed up in various locations. Building these replication mechanisms would be very costly without the cloud, as a firm would need multiple data centers to host its virtual machines, as well as strong network connections between them to guarantee consistent replication.

Disk Snapshots

A disk snapshot captures the contents of a disk at a specific point in time, whether or not the disk is attached to a running virtual machine (VM) instance. A snapshot is not a complete backup: snapshots are point-in-time copies of specific disks, while a backup protects a complete virtual machine, including all of its disks. Snapshots are easy to create, which is why technical teams like them, but they are not substitutes for full backups. They do, however, allow one to roll back to a specific point in time if something goes wrong.
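For example, on AWS a snapshot of a single EBS volume can be started with one SDK call; the volume ID below is hypothetical, and the other major providers expose equivalent operations.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical volume ID. A snapshot is an incremental, point-in-time
# copy of a single disk, not a backup of the whole virtual machine.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Pre-deployment snapshot of the data disk",
)
print("Started snapshot:", snapshot["SnapshotId"], snapshot["State"])
```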

Selective Disk Back-up

Selective backup refers to the process through which specific files or directories are chosen for backup, as opposed to backing up the entire system. This approach allows for a greater degree of control over the data that is preserved, ensuring that only necessary and relevant information is securely stored.
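A minimal sketch of this idea in Python follows, with hypothetical paths and selection rules: only named directories are copied, and temporary or reproducible files are skipped.

```python
import shutil
from pathlib import Path

# Hypothetical selection rules: back up only business-critical
# directories and skip bulky, reproducible artifacts.
SOURCE = Path("/srv/app")
DEST = Path("/mnt/backup/app")
INCLUDE_DIRS = {"config", "data", "invoices"}
EXCLUDE_SUFFIXES = {".tmp", ".log", ".cache"}

for path in SOURCE.rglob("*"):
    if not path.is_file():
        continue
    rel = path.relative_to(SOURCE)
    if rel.parts[0] not in INCLUDE_DIRS or path.suffix in EXCLUDE_SUFFIXES:
        continue  # not selected for backup
    target = DEST / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, target)  # copy file contents plus metadata
```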

Versioning & Soft Deletes

Object storage offers the capability to keep multiple versions of the same files, as well as soft delete features, to avoid loss of data or unwanted deletion/modification of objects. When versioning is active, every time an object is modified a new version is created and stored. When soft delete is turned on, deleted data is stored in the bucket or storage account for a specified retention period.
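On AWS S3, for instance, versioning can be enabled with a single call, after which a delete inserts a recoverable "delete marker" rather than destroying data, a soft-delete-like behavior; the bucket name below is hypothetical. Azure Blob Storage exposes analogous blob versioning and soft delete settings.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket. With versioning on, every overwrite creates a new
# version, and deletes add a marker instead of destroying the object.
s3.put_bucket_versioning(
    Bucket="example-documents-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Prior versions remain listable and restorable.
versions = s3.list_object_versions(Bucket="example-documents-bucket",
                                   Prefix="contracts/")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])
```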

Backup Redundancy

The more redundant backups a firm keeps in its cloud environments, the higher its storage costs. Redundancy options should only be utilized when required. Redundancy options that span regions carry a higher price tag and only make sense for critical multi-regional solutions.

Back-up Policies

Companies with significant cloud footprints need to have well thought out backup policies to ensure continuity of operations for internal applications and customers. A typical backup policy describes things such as:

  • Which data needs to be retained?
  • How long or for what time period does the data need to be kept?
  • When do backups occur and how frequently?
  • What technical process is followed when data is deleted?

Once all these parameters are well defined, a firm can begin cost optimization for backups and general storage activities.
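One useful discipline is to capture the policy as machine-readable data so tooling can enforce it. The sketch below uses hypothetical datasets and values for each of the parameters listed above.

```python
# A backup policy expressed as data, so tooling can enforce it. All
# names and values are hypothetical examples of the parameters above.
BACKUP_POLICY = {
    "datasets": {
        "customer-db": {
            "retain": True,                # which data needs to be retained
            "retention_days": 2555,        # how long it must be kept (~7 years)
            "schedule": "daily at 02:00",  # when and how frequently backups run
            "deletion_process": "automated purge after retention, with audit log",
        },
        "build-artifacts": {
            "retain": False,               # reproducible, so not worth storing
        },
    }
}
```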

Automated Backups and Storage Tier Movement

Many companies utilize sophisticated software that automatically backs up cloud environments on regular schedules using policy rules. In addition, these tools have algorithms that evaluate data across storage tiers and move it at regular intervals from hotter tiers to colder tiers as the data becomes less urgently needed.
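Managed lifecycle rules (like the one sketched earlier) are usually the simplest way to get this behavior, but a custom rule engine can be built from the same SDK primitives. The sketch below, using a hypothetical bucket and an assumed 90-day threshold, re-writes stale objects into a colder storage class.

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "example-archive-bucket"  # hypothetical
CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)

# Walk the bucket and re-write stale STANDARD objects into a colder class.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass", "STANDARD") == "STANDARD" \
                and obj["LastModified"] < CUTOFF:
            s3.copy_object(
                Bucket=BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
                StorageClass="GLACIER",  # in-place class change via copy
            )
```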

Conclusion

Companies with large cloud footprints often overlook data transfer and storage cost reduction strategies, which can significantly reduce monthly cloud spend. Data transfer ingress and egress costs vary from one cloud provider to the next, so it is important to understand all options. There is a vast array of methodologies for optimizing cloud storage assets from an operational and cost perspective. These options will change over time as a company’s cloud footprint changes.
