AWS re:Invent 2022 - Part Three

Introduction

For a decade, as long as Nasstar has been a member of the AWS Partner Network (APN), the AWS global cloud community has come together at re:Invent to meet, get inspired, and rethink what's possible. Hosted in Las Vegas, it is AWS's biggest, most comprehensive, and most vibrant event in cloud computing. Executive speakers advise how to transform your business with new products and innovations.

This article is part three of a series of blog posts covering this landmark event, with insight and analysis by AWS Ambassador and AWS Technical Practice Lead, Jason Oliver.

Please see the other posts in this series.

Dr Swami Sivasubramanian Keynote

Presented on Wednesday 30th November 2022, in Las Vegas and online.

Dr Swami Sivasubramanian, Vice President (VP) of Data and Machine Learning (ML) at AWS, revealed the latest AWS innovations that can help organisations transform data into meaningful insights and actions.

Innovation

Swami declared that data is the genesis of modern invention, with new customer experiences as its final output.

Building a data strategy can feel like a daunting task. Amazon, as an early pioneer, recognised that data beats intuition, so it built its business on data.

By working with leaders across industries and organisations of all sizes, AWS has identified at least three core elements of a robust data strategy. First, you need a future-proof data foundation supported by core data services. Second, you need solutions that weave connective tissue across your entire organisation. And third, you need the right tools and education to help you democratise your data.

Build Future-Proof Foundations

Swami defined a future-proof foundation as one built on the right services, so that you do not need to rearchitect heavily or incur technical debt as your needs evolve and the volumes and types of data change.

With a data strategy built for tomorrow, organisations can make decisions that are key to gaining a competitive edge. A future-proof data foundation should have the following four key elements:

  • Tools for every workload. It should have access to the right tools for all workloads and data so you can adapt to changing needs and opportunities. 
  • Performance at scale. It should keep up with the growing data volume by performing at extraordinarily high scale.
  • Removing heavy lifting. It should remove the undifferentiated heavy lifting for your IT and data team, so you spend less time managing and preparing your data and more time getting value from it.
  • Reliability and security. It should have the highest reliability and security to protect your data stores.

Building a future-proof data foundation

The VP stated that a one-size-fits-all approach does not work in the long run: 94% of AWS's top 1,000 customers use more than ten of its data and analytics services. This is why AWS can support your data journey with the most comprehensive set of data services of any Cloud provider.

He reviewed the current database, analytics and ML capabilities, and services, as depicted below.

Databases
Analytics
Machine Learning

Tools for every workload

By providing a comprehensive set of data services, AWS and its partners can meet its customers where they are on their journey, from where they store their data to the tools and programming languages they use to get the job done.

Building an end-to-end data strategy

Swami went on to highlight the runaway success of Amazon Athena, a serverless, interactive analytics service with a simple SQL interface, and how AWS has listened to the Apache Spark community, leading to the following announcement.

  • Amazon Athena for Apache Spark. Generally available (GA), a capability that allows customers to start running interactive analytics on Apache Spark in just under one second and to spin up Spark workloads up to 75% faster than other serverless Spark offerings. It is now possible to build interactive applications using a notebook interface in the Athena console or via the Athena APIs. It has tight integration with other AWS services, such as SageMaker and EMR, enabling you to query your data from various sources, chain these steps together and visualise the results.

Amazon Athena for Apache Spark
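
For illustration, here is a minimal boto3 sketch of submitting a small PySpark calculation to Athena. The workgroup name and DPU sizing are placeholders, and the engine configuration shown is an assumed minimal setup rather than a recommended one.

    import time
    import boto3

    athena = boto3.client("athena", region_name="eu-west-2")

    # Start a session in an Apache Spark-enabled workgroup (the name is a placeholder).
    session_id = athena.start_session(
        WorkGroup="spark-analytics-wg",
        EngineConfiguration={"MaxConcurrentDpus": 4},  # assumed minimal sizing
    )["SessionId"]

    # Wait for the session to become idle before submitting work.
    while athena.get_session(SessionId=session_id)["Status"]["State"] != "IDLE":
        time.sleep(2)

    # Submit a small PySpark calculation; the `spark` object is provided by the session.
    calc_id = athena.start_calculation_execution(
        SessionId=session_id,
        CodeBlock="df = spark.range(10)\nprint(df.count())",
    )["CalculationExecutionId"]

    # Poll until the calculation reaches a terminal state.
    while True:
        state = athena.get_calculation_execution(
            CalculationExecutionId=calc_id
        )["Status"]["State"]
        if state in ("COMPLETED", "FAILED", "CANCELED"):
            print("Calculation finished:", state)
            break
        time.sleep(5)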

  • Amazon Redshift integration for Apache Spark. Since this integration was announced during a previous keynote, it will not be covered again.

With these new capabilities, Swami declared that AWS is the best place to run Apache Spark in the Cloud. Customers can run Apache Spark on EMR, Glue, SageMaker and Redshift with the AWS-optimised runtime, up to 3x faster than open-source Spark.

Performance at scale

Your data foundation should perform at scale across your data warehouses, databases and data lakes. Industry-leading performance is needed to handle the inevitable growth spurts in your business, and to analyse and visualise data quickly while managing costs without compromising capacity requirements. AWS innovations have helped customers operate at scale from day one. He reviewed how:

  • Amazon Aurora. A service which can scale to 128TB per database instance at 1/10th the cost of legacy enterprise databases.
  • Amazon DynamoDB. A service that processed over 100 million requests per second across trillions of API calls on Amazon Prime Day this year.
  • Amazon Redshift. Tens of thousands of customers collectively process exabytes of data daily at up to five times better price performance than other Cloud data warehouses. Redshift also delivers up to seven times better price performance for high-concurrency, low-latency workloads like dashboards.
  • Amazon DocumentDB. The company's fully managed document database service scales automatically to 64TB of data per cluster and serves millions of requests per second with low latency.

To improve its capabilities, Swami announced the following service.

  • Amazon DocumentDB Elastic Clusters. GA, a fully managed solution for document workloads of virtually any size and scale. The service scales in minutes to handle almost any number of reads and writes with petabytes of storage, all with little to no downtime or performance impact.

Amazon DocumentDB Elastic Clusters
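
A hedged sketch of what provisioning an elastic cluster might look like with boto3, assuming the docdb-elastic service client; the cluster name, shard sizing and credentials are placeholders, and the parameter names should be verified against the current SDK documentation.

    import boto3

    # Elastic Clusters are exposed through their own service name in the SDK.
    docdb_elastic = boto3.client("docdb-elastic", region_name="eu-west-2")

    # Create an elastic cluster; shardCount and shardCapacity control scale
    # (values here are illustrative placeholders, not recommendations).
    response = docdb_elastic.create_cluster(
        clusterName="orders-elastic-cluster",
        adminUserName="dbadmin",
        adminUserPassword="CHANGE_ME",   # use Secrets Manager in practice
        authType="PLAIN_TEXT",
        shardCount=2,                    # number of shards
        shardCapacity=8,                 # vCPUs per shard
    )

    print(response)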

Removing heavy lifting

Swami conveyed that when customers are backed by tools that enable them to perform at scale, they can analyse their data and innovate faster, with less manual effort.

He highlighted that we are all looking for ways to tackle our customer pain points by reducing manual tasks through automation and ML. For example:  

  • Amazon DevOps Guru. Leverages ML to automatically detect and remediate database issues even before they impact customers, while also saving database administrators the time and effort of debugging problems.
  • Amazon Simple Storage Service (Amazon S3) Intelligent-Tiering. Reduces ongoing maintenance by automatically placing infrequently accessed data into lower-cost storage classes, saving customers up to $750 million to date (see the sketch after this list).
  • Amazon SageMaker. Where AWS is removing the heavy lifting associated with ML so that it is accessible to many more developers.
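
As a simple example of the Intelligent-Tiering point above, objects can be written straight into the S3 Intelligent-Tiering storage class with boto3, after which S3 moves them between access tiers automatically; the bucket and key names below are placeholders.

    import boto3

    s3 = boto3.client("s3")

    # Upload an object directly into the Intelligent-Tiering storage class;
    # S3 then moves it between access tiers based on observed access patterns.
    s3.put_object(
        Bucket="example-analytics-bucket",   # placeholder bucket name
        Key="raw/events/2022-11-30.json",
        Body=b'{"event": "page_view"}',
        StorageClass="INTELLIGENT_TIERING",
    )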

The VP took a closer look at SageMaker, a service to enable customers to build, train and deploy ML models for virtually any use case.

Many AWS customers are solving complex problems using SageMaker. Most of these models are built using structured data that is well-organised and quantitative. However, according to Gartner, 80% of new enterprise data is unstructured or semi-structured, including images and hand-written notes.

Preparing and labelling unstructured data for ML is complex and labour-intensive. For this type of data, AWS provides features like SageMaker Ground Truth and SageMaker Ground Truth Plus, which help lower costs while simplifying data labelling. However, it was evident that some types of data were still too challenging for customers to work with, such as geospatial data.

Geospatial data can be used for various use cases, from maximising harvest yields on agricultural farms to sustainable urban development to identifying a location for a new retail store. However, accessing high-quality geospatial data to train your ML models requires working with multiple data sources and vendors. These data sets are massive and unstructured, which means time-consuming data preparation before you can even write a single line of code to build your ML models.

This is a complicated process with a steep learning curve for your data scientists. With this, he announced:

  • Amazon SageMaker Geospatial ML capabilities. In preview, Amazon SageMaker now supports new geospatial ML capabilities. SageMaker can now access geospatial data from different sources with just a few clicks. To help prepare your data, purpose-built operations enable you to process and enrich these large data sets efficiently. Built-in visualisation tools allow you to analyse your data and explore model predictions on an interactive map with 3D accelerated graphics. Finally, SageMaker also provides built-in pre-trained neural nets to accelerate model building for many use cases.

Amazon SageMaker Geospatial ML capabilities
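
As a hedged sketch of how these capabilities surface in the SDK, the sagemaker-geospatial client can be used to discover the raster data collections (for example, satellite imagery sources) available for geospatial jobs. The operation name is my best understanding of the API and should be checked against the current boto3 documentation; the region is a placeholder, as the feature launched in limited regions.

    import boto3

    # The geospatial capabilities are exposed through their own service client.
    geospatial = boto3.client("sagemaker-geospatial", region_name="us-west-2")

    # List the raster data collections that can feed Earth observation jobs
    # and model training; inspect the response for names and ARNs.
    response = geospatial.list_raster_data_collections()
    print(response)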

This service will help AWS customers unlock the potential of their geospatial data, and I am looking forward to exploring its potential within our customer base of blue-light services, connected car platforms, train operators, and rail networks.

Reliability and security

Swami said these types of innovation demonstrate the enormous impact that data can have on its customers and the world. Data is extremely powerful, and today it is critical to almost every aspect of your organisation. This means you must put suitable safeguards in place to protect it from costly disruptions and potential compromises.

AWS has a long history of building secure and reliable services to help you protect your data. Examples cited include:

  • Amazon S3. An object store built to store data with 11 nines of durability, meaning you can keep your data without worrying about backups or device failures.
  • AWS Lake Formation. A service to help you build a secure data lake with fine-grained access control in days.
  • Core database services like DynamoDB, Aurora and RDS were architected with multi-Availability Zone (AZ) capabilities to ensure seamless failover in the unlikely event that an AZ is disrupted, protecting customers' mission-critical applications.

The VP said that customers' analytics applications on Redshift are mission-critical. He announced the following:

  • Amazon Redshift Multi-AZ. In preview, a new Multi-AZ configuration that delivers the highest levels of reliability. It enhances the availability of your analytics applications with automated failover and enables your data warehouse to operate across multiple AZs simultaneously, processing reads and writes without an underutilised standby sitting idle in a separate AZ. That way, you can maximise your return on investment with no application changes or other manual intervention required to maintain business continuity.

Amazon Redshift Multi-AZ
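
A hedged sketch of provisioning a Multi-AZ data warehouse with boto3. The MultiAZ flag is an assumption based on the preview announcement, and the node type, cluster name, and credentials are placeholders; check the current Redshift API reference before relying on this shape.

    import boto3

    redshift = boto3.client("redshift", region_name="eu-west-2")

    # Create an RA3 cluster that runs across multiple Availability Zones.
    # The MultiAZ parameter is an assumption reflecting the preview feature.
    redshift.create_cluster(
        ClusterIdentifier="analytics-multi-az",
        NodeType="ra3.4xlarge",
        NumberOfNodes=2,
        MasterUsername="awsuser",
        MasterUserPassword="CHANGE_ME",   # use Secrets Manager in practice
        MultiAZ=True,
    )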

  • Trusted Language Extensions for PostgreSQL. GA, a new open-source project that allows developers to use PostgreSQL extensions on Amazon RDS and Amazon Aurora securely. These extensions help customers safely add the database functionality they require without waiting for AWS certification. They also support popular programming languages like JavaScript, Perl, and PL/pgSQL. With this project, AWS customers can start innovating quickly without worrying about unintended security impacts on their core databases.

Trusted Language Extensions for PostgreSQL
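
A minimal sketch of how a trusted language extension might be installed and enabled, assuming a PostgreSQL instance that already has pg_tle loaded via shared_preload_libraries and the psycopg2 driver available; the connection details, extension name and body are illustrative placeholders.

    import psycopg2

    # Connection details are placeholders for an RDS or Aurora PostgreSQL endpoint.
    conn = psycopg2.connect(
        host="mydb.cluster-xxxx.eu-west-2.rds.amazonaws.com",
        dbname="app",
        user="postgres",
        password="CHANGE_ME",
    )
    conn.autocommit = True

    with conn.cursor() as cur:
        # Enable the pg_tle extension itself (requires pg_tle in shared_preload_libraries).
        cur.execute("CREATE EXTENSION IF NOT EXISTS pg_tle;")

        # Register a tiny illustrative extension via pgtle.install_extension
        # (name, version, description, body).
        cur.execute("""
            SELECT pgtle.install_extension(
                'hello_tle', '1.0', 'A trivial demo extension',
                $_body_$
                  CREATE OR REPLACE FUNCTION hello() RETURNS text
                  AS $$ SELECT 'hello from a trusted language extension' $$
                  LANGUAGE sql;
                $_body_$
            );
        """)

        # The registered extension can now be created like any other Postgres extension.
        cur.execute("CREATE EXTENSION hello_tle;")
        cur.execute("SELECT hello();")
        print(cur.fetchone()[0])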

When database services are leveraged on AWS, you can rely on AWS to operate, manage and control the security of the Cloud, including the hardware, software and networking layers. Under the shared responsibility model, customers are responsible for managing the security of their data in the Cloud, including privacy controls, who has access to the data, and how it is encrypted.

While this model eases a significant proportion of the security burden for AWS customers, it can still be challenging to monitor and protect against evolving security threats. With this, he announced the following service:

  • Amazon GuardDuty RDS Protection. In preview and initially built for Amazon Aurora, this capability provides intelligent threat detection, leveraging ML to identify threats such as access attacks against data stored in Aurora. It delivers detailed security findings so you can swiftly locate where an event occurred and what type of activity was involved, and all of this information is consolidated at an enterprise level.

Amazon GuardDuty RDS Protection
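
As a hedged sketch, RDS Protection can be switched on for an existing GuardDuty detector with boto3. The detector lookup is standard GuardDuty usage, but the RDS_LOGIN_EVENTS feature name is an assumption to verify against the current GuardDuty documentation.

    import boto3

    guardduty = boto3.client("guardduty", region_name="eu-west-2")

    # Look up the existing detector in this region (GuardDuty allows one per region).
    detector_id = guardduty.list_detectors()["DetectorIds"][0]

    # Enable the RDS Protection feature, which analyses Aurora login activity.
    # The feature name below is an assumption; check the current API reference.
    guardduty.update_detector(
        DetectorId=detector_id,
        Features=[{"Name": "RDS_LOGIN_EVENTS", "Status": "ENABLED"}],
    )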

Weave Connective Tissue

Having explored the elements of a future-proof data foundation, Swami moved on to how to connect the dots across your data stores.

The ability to connect your data is as instrumental as the foundation that it supports. For the second element of a robust data strategy, you will need a set of solutions that help you weave connective tissue across your organisation, from automated data pathways to data governance tooling. This connective tissue should integrate your data and your organisation's departments, teams, and individuals.

Strong & Adaptive

When customers want to connect their structured and unstructured data for analytics or ML, they typically use a data lake built on Amazon S3 and AWS Lake Formation, together with AWS Glue, AWS's data integration service.

This can help you gather rich insights, but only if you have high-quality data. Without it, your data lake can quickly become a data swamp!

To closely monitor the quality of your data, you need to set up quality rules. Customers told AWS that building these data quality rules across data lakes and data pipelines was a time-consuming, error-prone process of trial and error. It can take days, if not weeks, for engineers to identify and implement them, and additional time must be invested in ongoing maintenance. They asked for a simple and automated way to manage their data quality. With this, he announced the following service.

  • AWS Glue Data Quality. In preview, a new feature of AWS Glue that helps you build confidence in your data so you can routinely make data-driven decisions. Engineers can generate automated rules for specific data sets in hours, not days, increasing the freshness and accuracy of your data. Rules can also be applied to data pipelines, so poor-quality data does not make it into your data lake. And if your data quality deteriorates, AWS Glue Data Quality alerts you so you can take action immediately.

AWS Glue Data Quality
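
A minimal sketch of defining a data quality ruleset in DQDL with boto3, assuming a Glue Data Catalog table already exists; the database name, table name, and rule contents are placeholders for illustration.

    import boto3

    glue = boto3.client("glue", region_name="eu-west-2")

    # DQDL (Data Quality Definition Language) rules; these are illustrative only.
    ruleset = 'Rules = [ IsComplete "order_id", ColumnValues "quantity" > 0 ]'

    # Attach the ruleset to an existing Data Catalog table (names are placeholders).
    glue.create_data_quality_ruleset(
        Name="orders-quality-rules",
        Ruleset=ruleset,
        TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
    )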

With high-quality data, you can connect all the dots with precision and accuracy. But it is essential to ensure the right individuals within the organisation can access the correct data to collaborate and make these connections happen.

Governed by a System of Cooperation

The right governance strategy helps you move and innovate faster with well-defined guard rails that give the right people access to the data when and where they need it.

As the amount of data rapidly expands, customers want an end-to-end strategy that enables them to govern their data. They also want to make collaborating and sharing their data easier while maintaining quality and security.

The VP reviewed AWS Lake Formation with its row- and cell-level permissions, which help protect data by giving users access only to the data they need to perform their job. However, end-to-end governance extends beyond data lakes; you need to address access and privileges across many more customer use cases. Figuring out which data consumers in your organisation have access to what data can be time-consuming. With this, he announced the following service:

  • Centralised Access Controls for Redshift Data Sharing. In preview, a new feature for Amazon Redshift: centralised access controls that allow you to govern your Redshift data shares using the Lake Formation console. Using this console, you can designate user access without complex queries or manually identifying who has access to what data. With this new feature, you can easily manage access for data consumers across the entire organisation from one central console. It also improves data security by enabling administrators to define granular row- and cell-level access within Lake Formation.

Centralised Access Controls for Redshift Data Sharing
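
To illustrate the Lake Formation permission model that these centralised controls build on, here is a minimal boto3 sketch of granting column-level SELECT access to a principal. The role ARN, database, table and column names are placeholders; governing an actual Redshift data share is done through the Lake Formation console as described above, and this grant simply shows the underlying mechanism.

    import boto3

    lakeformation = boto3.client("lakeformation", region_name="eu-west-2")

    # Grant SELECT on two columns of a catalog table to an IAM role
    # (all identifiers are placeholders).
    lakeformation.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"},
        Resource={
            "TableWithColumns": {
                "DatabaseName": "sales_db",
                "Name": "orders",
                "ColumnNames": ["order_id", "order_total"],
            }
        },
        Permissions=["SELECT"],
    )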

Centralised access controls are critical for helping users access siloed data sets in a governed way.

One of the vital elements of an end-to-end data strategy is ML, and governance is just as essential there. Today, more companies are adopting ML for their applications. However, governing the end-to-end ML process presents its own set of challenges, such as onboarding users and monitoring ML models. With this, he announced the following services.

  • Amazon SageMaker ML Governance. GA, a set of three new ML governance features for SageMaker: Role Manager, Model Cards, and Model Dashboard. These powerful governance capabilities will help customers implement ML governance responsibly.

Amazon SageMaker ML Governance
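
As a hedged sketch of the Model Cards feature, a model card can be created through the SageMaker API with boto3. The card content shown is a deliberately simplified JSON document; the real model card schema has many more sections, so treat the field names and the overall shape as assumptions for illustration.

    import json
    import boto3

    sagemaker = boto3.client("sagemaker", region_name="eu-west-2")

    # A deliberately minimal card body; the full model card schema is richer
    # and the field names here are assumptions for illustration only.
    card_content = {
        "model_overview": {
            "model_description": "Demand forecasting model for weekly sales."
        }
    }

    sagemaker.create_model_card(
        ModelCardName="demand-forecast-card",
        Content=json.dumps(card_content),
        ModelCardStatus="Draft",
    )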

  • Amazon DataZone. Since Amazon DataZone was announced during a previous keynote, it will not be covered again.

Governance strengthens the connective tissue by managing data sharing and collaboration across individuals and teams in your organisation. But how do you weave connective tissue within your data systems to mitigate data sprawl and derive meaningful insights?

Pathways to Vital Resources

Data connectivity drives innovation and, ultimately, survival. Typically, connecting data across silos requires complex ETL pipelines, and every time you want to ask a different question of your data or build a new ML model, you need to create yet more data pipelines. This level of manual integration cannot keep up with the dynamic nature of data and the speed at which you want your business to move.

Data integration needs to be more seamless. To make this easier, AWS is investing in a zero-ETL future where you never have to build a data pipeline manually again. With this, he announced the following service.

  • Amazon Redshift auto-copy from S3. In preview, a new feature for Amazon Redshift that supports automatic copying from Amazon S3, making it easier to ingest data continuously. With this update, customers can easily create and maintain simple pipelines for continuous ingestion. Ingestion rules are triggered automatically when new files are uploaded to an S3 bucket, without relying on custom solutions or third-party services. This integration also makes it easy for analysts to automate data loading without depending on busy data engineers.

Amazon Redshift auto-copy from S3
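
A hedged sketch of what setting up continuous ingestion might look like using the Redshift Data API from boto3. The COPY ... JOB CREATE clause reflects my understanding of the preview syntax and should be checked against current documentation; the cluster, table, bucket, and role names are placeholders.

    import boto3

    redshift_data = boto3.client("redshift-data", region_name="eu-west-2")

    # Define a copy job once; Redshift then ingests new files as they land in the prefix.
    # The JOB CREATE ... AUTO ON clause is an assumption based on the preview announcement.
    sql = """
        COPY sales.orders
        FROM 's3://example-ingest-bucket/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV
        JOB CREATE orders_auto_copy AUTO ON;
    """

    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",   # placeholder cluster name
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )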

Amazon Redshift auto-copy from S3 would be an excellent solution for an online retailer ingesting terabytes of customer data from S3 to Redshift daily to quickly analyse how shoppers interact with the website and application and how they are making purchasing choices.

With its zero-ETL mission, AWS is tackling the problem of data sprawl by making it easier for customers to connect to data sources. But for this to work, you cannot have connections to only some of your data sources; you need to connect seamlessly to all of them, whether they live in AWS or in an external third-party application. That's why AWS is investing heavily in bringing your data sources together. For example, you can stream data in real time from more than 20 AWS services and third-party sources using Amazon Kinesis Data Firehose, a fully managed, serverless solution that enables customers to automatically stream data into Amazon S3, Redshift, OpenSearch, Splunk, Sumo Logic, and many more destinations.
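
For example, records can be pushed to an existing Firehose delivery stream with a single boto3 call, and Firehose handles buffering and delivery to the configured destination; the stream name below is a placeholder.

    import json
    import boto3

    firehose = boto3.client("firehose", region_name="eu-west-2")

    # Send one clickstream record to an existing delivery stream (name is a placeholder);
    # Firehose buffers and delivers it to the configured destination, e.g. S3 or Redshift.
    firehose.put_record(
        DeliveryStreamName="clickstream-to-s3",
        Record={"Data": (json.dumps({"event": "page_view", "page": "/home"}) + "\n").encode()},
    )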

Democratise Data

With a workforce trained to analyse, visualise and derive insights from data, customers can cast a wider net for innovation. To accomplish this, you need access to educated talent to fill the growing number of data and ML roles, development programmes for your current employees, and no-code tools that enable non-technical employees to do more with your data.

Close

I am excited to hear more from the executive speakers in the coming days on how to transform your business with new products and innovations.

About Me

An accomplished AWS ambassador, technical practice lead, principal Cloud architect and builder with over 25 years of transformational IT experience working with organisations of all sizes and complexity.

An SME in AWS, Azure, and security with strong domain knowledge in central government. Extensive knowledge of the Cloud, the Internet, and security technologies, in addition to heterogeneous systems spanning Windows, Unix, virtualisation, application and systems management, networking, and automation.

I evangelise innovative technology, sustainability, best practices, concise operational processes, and quality documentation. 
