What are the best practices for optimizing AWS S3 storage performance and costs?
S3 offers six different storage classes that differ in price, availability, durability, and retrieval speed. Depending on your data requirements, you can choose the storage class best suited to your data. For example, if you need frequent, fast access to your data, you can use S3 Standard or S3 Intelligent-Tiering, which offer high performance and low latency. If you need to store data that is rarely accessed or being archived, you can use S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), S3 Glacier, or S3 Glacier Deep Archive, which offer lower storage costs but higher retrieval fees and longer retrieval times.
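As a minimal sketch of this idea in Python with boto3 (the bucket name, key, and local file are placeholders), the storage class can be set per object at upload time:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object directly into an infrequent-access storage class.
# Bucket, key, and body are illustrative placeholders only.
s3.put_object(
    Bucket="my-bucket",
    Key="archive/2020/january-report.csv",
    Body=b"order_id,total\n1,19.99\n",
    StorageClass="STANDARD_IA",  # e.g. STANDARD, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
)
```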
A. Cost: 1/ Use the right storage class and consider transitioning infrequently accessed data to lower-cost storage classes. 2/ Enable versioning wisely: use versioning for critical data, but be mindful of storage costs and the potential increase in operations. B. Performance: 1/ Distribute requests across multiple prefixes to avoid contention, and use CloudFront for content delivery to improve latency and reduce costs. 2/ Review and optimize: use Amazon S3 metrics and CloudWatch alarms to monitor performance, and regularly review and optimize your S3 storage configurations as your needs change. Finally, keep an eye out for new features and improvements you can take advantage of.
Key recommendations for performance optimization: Choose the right storage class: use the appropriate storage class based on data access patterns. For frequently accessed data, use Standard or Intelligent-Tiering; for infrequently accessed data, use Standard-IA; for archival data, use Glacier or Glacier Deep Archive. Enable Transfer Acceleration: use S3 Transfer Acceleration for faster data uploads and downloads by leveraging the CloudFront global content delivery network. Multipart uploads: for large object uploads, use multipart uploads to improve efficiency and reliability. Use CloudFront for content delivery: integrate S3 with Amazon CloudFront for content delivery.
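A short sketch of the multipart and Transfer Acceleration points in Python with boto3 (bucket name, file path, and thresholds are assumptions): the transfer manager splits uploads into parts automatically above a size threshold, and acceleration is enabled on the bucket and then used via the accelerate endpoint.

```python
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

s3 = boto3.client("s3")

# Multipart upload: files above the threshold are uploaded in parallel parts.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # start multipart at 64 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
)
s3.upload_file("big-video.mp4", "my-bucket", "videos/big-video.mp4", Config=config)

# Transfer Acceleration: enable it on the bucket, then use the accelerate endpoint.
s3.put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-video.mp4", "my-bucket", "videos/big-video.mp4")
```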
Analyze storage usage and access patterns using Amazon S3 Storage Lens to optimize costs. Leverage different S3 storage classes like S3 Intelligent Tiering, S3 Standard - Infrequent Access, S3 Glacier according to data access patterns and performance needs. Use S3 Lifecycle policies to automatically transition data to the right storage class based on access patterns, saving costs over time. Parallelize read/write operations across multiple prefixes within or across buckets to scale performance up to thousands of requests per second.
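As an illustrative sketch of such a lifecycle policy (bucket name, prefix, and day counts are assumptions, not recommendations), a rule that tiers objects down and eventually expires them could be applied with boto3:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to cheaper classes as they age, then expire them.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```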
To optimize AWS S3 storage for peak performance and cost efficiency, embrace Amazon S3 Intelligent-Tiering. Imagine an e-commerce platform: hot-selling product images stay in a frequent-access tier, while older inventory gracefully transitions to a more economical storage class. This minimizes costs without compromising retrieval speed. Leverage Amazon S3 Transfer Acceleration for faster data uploads, ensuring swift content delivery. Additionally, implement Lifecycle policies to automate transitions to Glacier for archival, reducing long-term storage expenses. By tailoring storage classes to usage patterns and automating cost-effective transitions, businesses achieve optimal performance and significant AWS S3 storage cost savings.
In addition to the four points mentioned: enable S3 Object Lifecycle Management to delete expired objects/content automatically; take a regular inventory of S3 contents and run an automated script to analyze and purge it; use AWS Glue Crawlers to tag and search for objects missing an expiration date and address them.
Partitioning your data means organizing it into logical subsets based on specific criteria such as date, time, region, or category. By partitioning your data, you can improve your S3 performance and costs by reducing the amount of data that your queries or applications have to scan, transfer, or process. For example, if you use AWS Athena or AWS Glue to query data stored in S3, you can partition your data by date and use a WHERE clause to filter only the relevant partitions. This way, you save time and money by avoiding scanning or processing unnecessary data.
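A minimal sketch of such a partition-pruned query via boto3 (the database, table, columns, and output location are hypothetical names used only for illustration):

```python
import boto3

athena = boto3.client("athena")

# Query only the January 2020 partition; the WHERE clause prunes partitions
# so Athena scans far less data.
response = athena.start_query_execution(
    QueryString=(
        "SELECT order_id, total "
        "FROM sales_db.orders "
        "WHERE year = '2020' AND month = '01'"
    ),
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```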
In addition to partitioning, other patterns can be used as a replacement or alongside it, such as bucketing, as well as patterns tied to specific file formats, like Z-Ordering in Delta Lake.
In addition to partitioning data, the structure and type of technology you use also impact performance and cost. For example, storing files in Parquet format lets you fit large data volumes into a few megabytes, with the advantage that Athena also supports Parquet. Querying a terabyte of data stored as Parquet can cost just a few cents or dollars.
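A small sketch of writing compressed Parquet directly to S3 with pandas (assumes pyarrow and s3fs are installed; the bucket path and Hive-style partition layout are placeholders):

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"order_id": [1, 2, 3], "total": [19.99, 5.50, 42.00]})

# Write compressed Parquet under a year=/month= partition-style key.
df.to_parquet(
    "s3://my-bucket/sales/year=2020/month=01/orders.parquet",
    engine="pyarrow",
    compression="snappy",  # or "gzip"
)
```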
Data partitioning has been a game-changer in optimizing query performance and reducing costs. In a data analytics project, we partitioned historical data by date, significantly speeding up query times in AWS Athena and reducing the data scanned. This approach is not just about organizing data; it's about understanding and anticipating the query patterns to optimize data retrieval.
The best practice is to retrieve only the data you actually need. Design your retrieval conditions so that the right data is transferred in the most efficient way.
Partitioning data on S3 is a game-changer. Think of it as organizing a vast library into sections by date, topic, or region. When using tools like AWS Athena or Glue, partitioning by date, for instance, lets you query only the relevant segments, saving time and costs by skipping unnecessary data scans. It's like using a targeted spotlight, focusing resources precisely where needed for efficient analysis and substantial savings.
Prefixes are the first part of the object key that identifies your data in S3. For example, if your object key is sales/2020/01/january.csv, the prefix is sales/2020/01/. Prefixes can affect your S3 performance and costs by influencing how your data is distributed across multiple servers and retrieved. To optimize your S3 performance and costs, you should use prefixes that are evenly distributed and avoid sequential or overlapping prefixes that can cause hotspots or bottlenecks. You should also use caching techniques such as CloudFront or S3 Transfer Acceleration to reduce latency and bandwidth costs when accessing your data from different locations.
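One common way to avoid sequential prefixes is to put a short, deterministic hash in front of the natural key. The sketch below is an illustrative assumption about key layout, not an official scheme:

```python
import hashlib

def spread_key(natural_key: str, fanout: int = 16) -> str:
    """Prepend a short, deterministic shard prefix so keys spread evenly."""
    digest = hashlib.md5(natural_key.encode()).hexdigest()
    shard = int(digest, 16) % fanout
    return f"{shard:02d}/{natural_key}"

# sales/2020/01/january.csv -> e.g. "07/sales/2020/01/january.csv"
print(spread_key("sales/2020/01/january.csv"))
```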
In my experience, using multiple prefixes in object names helped maximize performance, as AWS S3 scales automatically to high request rates. If transferring large amounts of data over large distances, enable Amazon S3 Transfer Acceleration. For caching, use Amazon CloudFront, which caches objects closer to the users, thereby reducing latency and increasing load speeds. Manage storage costs effectively by using lifecycle policies to move or expire objects between storage classes as needed. For cost-efficiency, use S3 Intelligent-Tiering, which monitors access patterns and shifts lesser-used objects to a cheaper storage tier. Pre-upload file compression can further reduce storage costs and improve transfer speed and response times.
Optimal use of prefixes and caching mechanisms can significantly impact S3's performance. Organizing data with well-thought-out prefixes avoids hotspots and ensures an even distribution across servers. This strategic distribution prevents bottlenecks and enhances retrieval efficiency. Additionally, leveraging caching, through services like CloudFront or S3 Transfer Acceleration, can dramatically reduce latency and bandwidth costs, especially for geographically dispersed access.
Amazon S3 supports parallel requests, which lets you scale S3 throughput with your compute cluster without any customization of the application. Throughput scales per prefix, so you can use as many prefixes in parallel as you need to achieve the required throughput. There is no limit to the number of prefixes. Amazon S3 supports at least 3,500 requests per second to add data and 5,500 requests per second to retrieve it. Each S3 prefix supports these request rates, which makes it straightforward to increase throughput significantly.
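A rough sketch of issuing parallel GETs from Python with boto3 and a thread pool (bucket name and keys are placeholders); because each prefix supports the request rates above, spreading keys across prefixes lets the aggregate throughput scale:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")

# Placeholder keys spread across several prefixes.
keys = [f"logs/shard-{i:02d}/events.json" for i in range(8)]

def fetch(key: str) -> bytes:
    """Download one object and return its bytes."""
    return s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()

# Issue the GET requests in parallel rather than sequentially.
with ThreadPoolExecutor(max_workers=8) as pool:
    payloads = list(pool.map(fetch, keys))
```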
### Using Prefixes in S3 1. Organization: Structure your data logically using prefixes, such as by date (`2024/01/04/datafile.txt`), file type (`images/profile.jpg`), user, or any other relevant category. 2. Performance Optimization: Be cautious with prefixes to avoid creating "hot spots" with many accesses focused on a single prefix. ### Caching to Improve Performance 1. Client-Side Caching: Set cache headers when uploading files to S3. Use HTTP headers like `Cache-Control` to control how browsers and proxies cache your files. 2. Amazon CloudFront: Use CloudFront, AWS's CDN (Content Delivery Network), in conjunction with S3. CloudFront caches files in locations closer to end-users, reducing latency and improving load performance.
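For point 1 under caching, a minimal sketch of setting a cache header at upload time (bucket, key, and local file are placeholders); CloudFront will also respect the `Cache-Control` header when it caches the object at the edge:

```python
import boto3

s3 = boto3.client("s3")

# Serve this image with a one-day browser/proxy cache.
s3.put_object(
    Bucket="my-bucket",
    Key="images/profile.jpg",
    Body=open("profile.jpg", "rb"),  # placeholder local file
    ContentType="image/jpeg",
    CacheControl="max-age=86400, public",
)
```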
Efficient use of prefixes and caching in Amazon S3 enhances performance and cost-effectiveness. Prefixes organize data like folder paths, allowing hierarchical structuring and enabling parallel processing, thus boosting transaction rates. For high-throughput demands, creating multiple prefixes facilitates parallel read/write operations, scaling up request rates significantly. Caching with Amazon CloudFront, ElastiCache, or AWS Elemental MediaStore reduces latency and increases data transfer rates for frequently accessed data. CloudFront caches S3 data globally, ElastiCache uses in-memory storage for quick data retrieval, and MediaStore is optimized for media content, which is particularly useful for video workflows.
Monitoring your usage is essential for optimizing your S3 performance and costs. You should track and analyze metrics such as storage size, request rate, error rate, latency, and throughput. You can use tools such as AWS CloudWatch, AWS CloudTrail, or AWS S3 Storage Lens to capture and visualize your S3 usage data. You can also use AWS Cost Explorer or AWS Budgets to manage and optimize your S3 spending. By monitoring your usage, you can identify and fix performance or cost issues, such as underutilized or overpriced storage classes, inefficient or expensive queries, or unexpected or excessive requests.
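As a small sketch of pulling one of these metrics with boto3 (the bucket name is a placeholder), the daily bucket size reported by CloudWatch can be retrieved like this:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Daily bucket size (Standard storage) over the last two weeks.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=["Average"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"] / 1e9, 2), "GB")
```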
To effectively monitor Amazon S3 storage: 0) Define objectives: like every other IT monitoring task, first you have to establish your goals and assign clear monitoring responsibilities. Without this, nothing else matters 😉 About S3 monitoring tools: 1) CloudWatch Alarms: monitor metrics like storage size, request rate, error rate, and throughput with anomaly notifications. You can also customize CloudWatch dashboards for targeted monitoring. 2) AWS CloudTrail: track detailed actions taken in S3 for auditing and security purposes. 3) Server Access Logging: obtain records of bucket requests for security and access audits. 4) AWS Trusted Advisor: identifies security gaps and ensures best practices in S3 bucket configuration.
To optimize AWS S3, regularly check your storage size, how often it's accessed, errors, speed, and data flow using tools like AWS CloudWatch and S3 Storage Lens. Keep an eye on your spending with AWS Cost Explorer and Budgets. This helps you spot and fix issues like using the wrong storage type or spending too much on data access.
It is almost always helpful to monitor all cloud resources at least weekly and to optimize for cost by adjusting configurations based on trend data.
Continuous monitoring is vital in optimizing S3 for both performance and cost. Utilizing tools like AWS CloudWatch or AWS S3 Storage Lens provides valuable insights into storage size, request rates, and error rates. Regular analysis of these metrics can uncover potential performance bottlenecks or cost inefficiencies. It's also essential to keep an eye on spending patterns using AWS Cost Explorer or AWS Budgets to ensure that your S3 usage aligns with your budgetary goals.
To avoid unjustified S3 costs, it is important to specify a lifecycle rule configuration that deletes incomplete multipart uploads after a specified time period. In some cases, this can offer substantial savings.
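A minimal sketch of such a rule with boto3 (bucket name and the 7-day window are assumptions); note that this call replaces the bucket's existing lifecycle configuration, so real rules would be combined into one call:

```python
import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that have not completed within 7 days so the
# already-uploaded parts stop accruing storage charges.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Filter": {},  # empty filter applies the rule to the whole bucket
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```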
- Store files with compression; for example, when using Parquet files you can compress the data with algorithms such as gzip, snappy, etc.
- Enable object versioning only on the prefixes that need it, and remove object versions that are no longer necessary.
- Define data storage strategies, such as data depth and data retention, depending on the data lake layer.
- Back up your unused data with lifecycle strategies, or use transitions to move it to Glacier or Deep Archive.
- Store only the portion of data each data lake/lakehouse layer requires; for instance, in a medallion architecture the Gold layer can store only the data depth required by the use case.
- Use lifecycle strategies.
Use lifecycle policies to manage storage. These use a combination of rules to transition objects to another storage class, e.g. move objects from S3 Standard to S3 Standard-IA after 30 days and then to S3 Glacier Deep Archive after 90 days (lifecycle transitions to Standard-IA require objects to be at least 30 days old). It is also important to understand your usage patterns to implement this well.
Adopting columnar storage formats like Apache Parquet or ORC, which are optimized for Athena, ensures faster and more efficient queries. Utilizing compression algorithms such as Snappy or GZIP not only reduces S3 storage costs but also improves read times. Implementing data partitioning and using prefixes effectively enables each data segment to scale independently, enhancing data retrieval. Integrating a Glue Data Catalog for efficient metadata storage streamlines Athena query performance. Understanding and implementing these best practices is crucial for handling large volumes of data efficiently.
Watch out for data retrieval and transfer costs, in addition to storage expenses. 1. Data retrieval costs: retrieval costs vary based on the chosen storage class (Standard, Infrequent Access, Glacier, etc.) and the volume of requests (SELECT, GET, LIST, POST, COPY, PUT). AWS charges per operation, so opting for the right storage class and minimizing the number of API requests can cut down on charges. 2. Data transfer costs: data transfer into S3 is free, but outbound transfers exceeding 100 GB/month incur fees. Requesting accelerated data transfers raises costs even more, so limit unnecessary outbound transfers, consider caching content, and manage data efficiently.
Performance tuning is a combination of multiple strategies; a single approach does help, but its effect is limited, and in some situations it may even have a negative impact. Partitioning and caching should be implemented based on monitoring statistics. Data loading and access should be planned well, and additional metadata or change files can be maintained when data is modified.