AWS re:Invent 2022 - Part Three
Introduction
For a decade, as long as Nasstar has been a member of the AWS Partner Network (APN), the AWS global cloud community has come together at re:Invent to meet, get inspired, and rethink what's possible. The event is hosted in Las Vegas and is AWS's biggest, most comprehensive, and most vibrant event in cloud computing. Executive speakers advise on how to transform your business with new products and innovations.
This article is part three of a series of blog posts covering the event, with insight and analysis by AWS Ambassador and AWS Technical Practice Lead, Jason Oliver.
Please see other posts in this series:
Dr Swami Sivasubramanian Keynote
Presented on Wednesday 30th November, in Las Vegas and online.
Dr Swami Sivasubramanian, Vice President (VP) of Data and Machine Learning (ML) at AWS, revealed the latest AWS innovations that can help organisations transform data into meaningful insights and actions.
Innovation
Swami declared that data is the genesis of modern invention, with new customer experiences as its final output.
Building a data strategy can feel like a daunting task. Amazon, as an early pioneer, recognised that data beats intuition, so it built its business on data.
By working with leaders across all industries, in organisations of all sizes, AWS has discovered at least three core elements of a robust data strategy. First, you need a future-proof data foundation supported by core data services. Second, you need solutions that weave connective tissue across your entire organisation. And third, you need the right tools and education to help you democratise your data.
Build Future-Proof Foundations
Swami shared that his definition of a future-proof foundation is one built on the right services, so that you don't need to rearchitect heavily or incur technical debt as your needs evolve and the volumes and types of data change.
With a data strategy built for tomorrow, organisations can make decisions that are key to gaining a competitive edge. A future-proof data foundation should have four key elements, explored in the sections that follow.
The VP stated that a one-size-fits-all approach does not work in the long run. 94% of AWS's top 1,000 customers use more than ten of its data and analytics services, which is why AWS can support your data journey with the most comprehensive set of data services of any Cloud provider.
He reviewed the current database, analytics, and ML capabilities and services, as depicted below.
Tools for every workload
By providing a comprehensive set of data services, AWS and its partners can meet its customers where they are on their journey, from where they store their data to the tools and programming languages they use to get the job done.
Swami went on to highlight the runaway success of Amazon Athena, a serverless, interactive analytics service with a simple SQL interface, and how AWS has listened to the Apache Spark community with the following announcement.
With these new capabilities, Swami declared that AWS is the best place to run Apache Spark in the Cloud. Customers can run Apache Spark on EMR, Glue, SageMaker, and Redshift with the AWS-optimised runtime, which is up to 3x faster than open-source Spark.
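To illustrate just how simple Athena's SQL interface is, here is a minimal sketch that submits a query and polls for its result using boto3; the database, table, and bucket names are hypothetical.

```python
import time
import boto3

# Athena is serverless: submit a SQL query, poll until it finishes,
# then fetch the results. No clusters to provision or manage.
athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page",
    QueryExecutionContext={"Database": "analytics_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},  # hypothetical bucket
)
query_id = response["QueryExecutionId"]

while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)
```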
Performance at scale
Your data foundation should perform at scale across your data warehouses, databases and data lakes. Industry-leading performance will be needed to handle inevitable growth spurts in your business. It will be required to quickly analyse and visualise data to manage costs without compromising capacity requirements. The AWS innovations have helped its customers at scale from day one. He reviewed how
To improve its capabilities, Swami announced the following service.
Removing heavy lifting
Swami conveyed that when customers are backed by tools that enable them to perform at scale, they can analyse their data and innovate faster, with less manual effort.
He highlighted that we are all looking for ways to tackle our customer pain points by reducing manual tasks through automation and ML. For example:
The VP took a closer look at SageMaker, a service that enables customers to build, train, and deploy ML models for virtually any use case.
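As a hedged sketch of that build-train-deploy flow, the snippet below uses the SageMaker Python SDK with a scikit-learn estimator; the training script, IAM role, data location, and instance types are hypothetical placeholders.

```python
from sagemaker.sklearn import SKLearn

# Train a scikit-learn model on managed infrastructure, then deploy it
# behind a real-time endpoint. All names below are hypothetical.
estimator = SKLearn(
    entry_point="train.py",                               # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical IAM role
    instance_type="ml.m5.xlarge",
    framework_version="1.0-1",
)

# "Build and train": SageMaker provisions the instance, runs train.py,
# and stores the resulting model artefact in S3.
estimator.fit({"train": "s3://example-bucket/training-data/"})

# "Deploy": stand up a managed HTTPS endpoint for real-time inference.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```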
Many AWS customers are solving complex problems using SageMaker. Most of these models are built using structured data that is well-organised and quantitative. However, according to Gartner, 80% of new enterprise data is unstructured or semi-structured, including images and handwritten notes.
Preparing and labelling unstructured data for ML is complex and labour-intensive. For this type of data, AWS provides features like SageMaker Ground Truth and SageMaker Ground Truth Plus, which help lower costs while simplifying data labelling. However, it was evident that some types of data were still too challenging for customers to work with, such as geospatial data.
Geospatial data can be used for various use cases, from maximising harvest yields on agricultural farms, to sustainable urban development, to identifying the location for a new retail store. However, accessing high-quality geospatial data to train your ML models requires working with multiple data sources and vendors. These data sets are massive and unstructured, which means time-consuming data preparation before you can write a single line of code for your ML models.
This is a complicated process with a steep learning curve for your data scientists. With this, he announced:
This service will help AWS customers unlock the potential of their geospatial data. I can already see its potential within our customer base of blue-light services, connected-car platforms, train operators, and rail networks, and I am looking forward to exploring it.
Reliability and scaling
Swami said these types of innovation demonstrate the enormous impact that data can have on customers and the world. Data is extremely powerful, and today it is critical to almost every aspect of your organisation. This means you must put suitable safeguards in place to protect it from costly disruptions and potential compromises.
AWS has a long history of building secure and reliable services to help you protect your data. Examples cited include:
The VP said that customers' analytics applications on Redshift are mission-critical. He announced the following:
When database services are leveraged on AWS, you can rely on AWS to operate, manage, and control the security of the Cloud, such as the hardware, software, and networking layers. Under its shared responsibility model, customers are responsible for managing the security of their data in the Cloud, including privacy controls, who has access to the data, and how it is encrypted.
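As a small example of the customer's side of that model, the sketch below enables default server-side encryption on an S3 bucket with boto3; the bucket name is hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Under the shared responsibility model, encrypting your data is your
# responsibility: enable default SSE-KMS encryption on a bucket so every
# new object is encrypted at rest.
s3.put_bucket_encryption(
    Bucket="example-data-lake-bucket",  # hypothetical bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    # Omitting KMSMasterKeyID uses the AWS-managed aws/s3 key.
                }
            }
        ]
    },
)
```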
While this model eases a significant proportion of the security burden for AWS customers, it can still be challenging to monitor and protect against these evolving security threats. With this, he announced the following service:
Weave Connective Tissue
Now that the elements of a future-proof data foundation have been explored, Swami moved on to how to connect the dots across your data stores.
The ability to connect your data is as instrumental as the foundation that it supports. For the second element of a robust data strategy, you will need a set of solutions that help you weave connective tissue across your organisation, from automated data pathways to data governance tooling. This connective tissue should integrate your data and your organisation's departments, teams, and individuals.
Strong & Adaptive
When customers want to connect their structured and unstructured data for analytics or ML, they typically use a data lake built on Amazon S3, governed with AWS Lake Formation, and integrated with AWS Glue, its data integration service.
This can help you gather rich insights, but only if you have high-quality data. Without it, your data lake can quickly become a data swamp!
To closely monitor the quality of your data, you need to set up quality rules. Customers told AWS that building these data quality rules across data lakes and data pipelines was time-consuming, error-prone, and reliant on trial and error. It can take days, if not weeks, for engineers to identify and implement them, and additional time must be invested in ongoing maintenance. Customers asked for a simple and automated way to manage their data quality. With this, he announced the following service.
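As a hedged sketch of what automated rule management can look like, the snippet below defines a ruleset with boto3's Glue data quality API; the table, database, thresholds, and ruleset contents are assumptions for illustration.

```python
import boto3

glue = boto3.client("glue")

# A minimal sketch: declare data quality rules once, in Glue's rule
# language (DQDL), instead of hand-coding checks in every pipeline.
# All names and thresholds below are hypothetical.
glue.create_data_quality_ruleset(
    Name="orders_quality_rules",
    Ruleset=(
        'Rules = ['
        ' IsComplete "order_id",'
        ' ColumnValues "status" in ["OPEN", "SHIPPED", "CLOSED"],'
        ' Completeness "customer_id" > 0.95'
        ' ]'
    ),
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
)
```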
With high-quality data, you can connect all the dots with precision and accuracy. But it is essential to ensure the right individuals within the organisation can access the correct data to collaborate and make these connections happen.
Governed by a System of Cooperation
The right governance strategy helps you move and innovate faster with well-defined guard rails that give the right people access to the data when and where they need it.
As the amount of data rapidly expands, customers want an end-to-end strategy that enables them to govern their data. They also want to make collaborating and sharing their data easier while maintaining quality and security.
The VP reviewed AWS Lake Formation, with its row- and cell-level permissions that help protect data by giving users access to only the data they need to perform their jobs. However, end-to-end governance extends beyond data lakes; you need to address access and privileges across many more use cases. Figuring out which data consumers in your organisation have access to what data can be time-consuming. With this, he announced the following service:
Centralised access controls are critical for helping users access siloed data sets in a governed way.
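For a sense of what such a centralised, governed grant looks like in practice, here is a minimal sketch using Lake Formation's permissions API in boto3; the role, database, and table names are hypothetical, and finer-grained row- and cell-level filters are configured along similar lines.

```python
import boto3

lf = boto3.client("lakeformation")

# Grant one analyst role SELECT on a single table, so access to the data
# lake is governed centrally rather than through broad bucket policies.
# All identifiers below are hypothetical.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"
    },
    Resource={"Table": {"DatabaseName": "sales_db", "Name": "orders"}},
    Permissions=["SELECT"],
)
```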
One of the vital elements of an end-to-end data strategy is ML, and governance is essential there too. Today, more companies are adopting ML for their applications. However, governing this end-to-end process presents a set of challenges particular to ML, like onboarding users and monitoring ML models. With this, he announced the following services.
Governance can strengthen the connective tissue by managing data sharing and collaboration across individuals throughout your organisation. But how do you weave connective tissue within your data systems to mitigate data sprawl and derive meaningful insights?
Pathways to Vital Resources
Data connectivity drives innovation and, ultimately, survival. Typically, connecting data across silos requires complex ETL pipelines, and every time you want to ask a different question of your data or build a new ML model, you need to create yet more data pipelines. This level of manual integration is too slow to keep up with the dynamic nature of data and the speed at which you want your business to move.
Data integration needs to be more seamless. To make this easier, AWS is investing in a zero-ETL future where you never have to build a data pipeline manually again. With this, he announced the following service.
Amazon Redshift auto-copy from S3 would be an excellent solution for an online retailer ingesting terabytes of customer data from S3 to Redshift daily to quickly analyse how shoppers interact with the website and application and how they are making purchasing choices.
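As a hedged sketch of that retailer scenario, the snippet below creates a continuous copy job through the Redshift Data API with boto3; the cluster, database, bucket, role, and the exact COPY JOB syntax (based on the preview announcement) are all assumptions.

```python
import boto3

rsd = boto3.client("redshift-data")

# A hedged sketch: create a copy job so that new files landing under the
# S3 prefix are ingested into Redshift automatically, with no pipeline
# code. All identifiers and the COPY JOB syntax are assumptions.
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="awsuser",                       # hypothetical database user
    Sql=(
        "COPY public.clickstream "
        "FROM 's3://example-retail-events/clickstream/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
        "FORMAT JSON 'auto' "
        "JOB CREATE clickstream_autocopy AUTO ON;"
    ),
)
```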
With its zero-ETL mission, AWS is tackling the problem of data sprawl by making it easier for customers to connect to data sources. But for this to work, you cannot have connections to only some of your data sources; you need to connect seamlessly to all of them, whether they live in AWS or an external third-party application. That's why AWS is investing heavily in bringing your data sources together. For example, you can stream data in real time from more than 20 AWS services and third-party sources using Kinesis Data Firehose, a fully managed, serverless solution that enables customers to automatically stream data into Amazon S3, Redshift, OpenSearch, Splunk, Sumo Logic, and many more.
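Producing into Firehose is deliberately simple, as the minimal boto3 sketch below shows; the delivery stream name and payload are hypothetical, and the stream's destination (S3, Redshift, OpenSearch, and so on) is configured on the stream itself.

```python
import json
import boto3

firehose = boto3.client("firehose")

# Firehose is fully managed: put records on a delivery stream and it
# batches, buffers, and delivers them to the configured destination.
# The stream name and payload below are hypothetical.
event = {"event": "page_view", "page": "/home"}
firehose.put_record(
    DeliveryStreamName="clickstream-to-s3",
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```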
Democratise Data
With a workforce trained to analyse, visualise, and derive insights from data, customers can cast a wider net for innovation. To accomplish this, you will need access to educated talent to fill the growing number of data and ML roles, development programmes for your current employees, and no-code tools that enable non-technical employees to do more with your data.
Close
I am excited to hear more from the executive speakers in the coming days on how to transform your business with new products and innovations.
About Me
An accomplished AWS ambassador, technical practice lead, principal Cloud architect and builder with over 25 years of transformational IT experience working with organisations of all sizes and complexity.
An SME in AWS, Azure, and security with strong domain knowledge in central government. Extensive knowledge of the Cloud, the Internet, and security technologies, in addition to heterogeneous systems spanning Windows, Unix, virtualisation, application and systems management, networking, and automation.
I evangelise innovative technology, sustainability, best practices, concise operational processes, and quality documentation.