Why Converged Databases Are Critical to Achieving Both Data and Developer Productivity
(This work was done in collaboration with my very thoughtful Oracle colleague Paul Sonderegger.)
Introduction
Enterprises must exploit their data capital more effectively to remain competitive. But as companies create more applications and analytics to digitize more processes and decision points, they seem to be faced with a difficult choice: optimize either for fast application development now or easier extraction of value from data later. In other words, developer or data productivity.
In the first option, developers spin up single-purpose databases for specific projects, but fragment the data across different data services, each with its own tooling, security methods, and operational characteristics. This risks data inconsistencies, security gaps, and increased difficulty in using that data in critical reporting and analytical work.
In the second option, developers build on a standardized, monolithic database supporting only a relational data model. The relational database enforces corporate policies and simplifies anticipated data reuse, but may slow or even prevent innovation. This risks putting the entire business at a disadvantage to the competition. Neither one of these choices is acceptable. How did businesses get stuck with this dilemma? More importantly, how do they escape it?
Internet consumer companies focused on flat architectures supporting millions or hundreds of millions of users. Enterprise SaaS companies focused on tens of thousands of companies, each with thousands of users. The former was more focused on the customer experience; the latter more on data integrity and accuracy. Is it possible to bridge the gap between the two?
Enter the Oracle Converged Database.
Multiple Data Models in One Database
Relational databases originally supported only numbers, dates and character strings. But many web applications leverage JSON and REST, so Oracle built support for REST access to JSON documents directly in the database via an API that doesn’t require knowledge of SQL or the database. However, unlike JSON-specific NoSQL databases and cloud services, Oracle can generate a schema and indices when they’re useful. It can also perform parallel SQL analytics, supports transactions, and can easily join JSON data with spatial, graph, and relational data when building data-driven applications, even at consumer Internet scale. In fact, one of the largest consumer electronics companies in the world is using Oracle’s JSON features to support world-wide point-of-sale transactions.
Oracle’s JSON implementation shows that giving up one form of productivity for another (data versus developer) is a false choice: you can make your developers and your data more productive at the same time. You can give developers the features they need in convenient APIs that rest on the Oracle converged database foundation. Moreover, a converged database empowers developers and data analysts to make their JSON data more productive in many ways:
· Easily use powerful SQL JOINs to integrate document data with relational, spatial, and graph data
· Generate useful reports using standard SQL queries and your favorite analytic tools like Tableau
· Empower more flexible queries and extract more useful information from the data
· At Internet scale, leverage ACID transactions on documents while performing real-time analytics
· Scale up performance and availability to global scale using the Oracle Cloud
· Develop powerful machine learning models using JSON directly in the database, without having to transfer data to a separate machine learning cloud service.
And it’s not just JSON data. Multi-model converged databases can support XML, text, relational, spatial, graph, and blockchain data with full joins, transactions, and other critical SQL features enterprises rely on.
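As a rough illustration of joining document data with relational data in a single SQL query, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, and it assumes the bundled SQLite was built with its JSON functions (json_extract). It is not Oracle's API: Oracle exposes its own SQL/JSON functions such as JSON_VALUE and JSON_TABLE, and the table names, fields, and helper functions below are hypothetical.

```python
import json
import sqlite3

# Stand-in database: SQLite in memory, NOT Oracle. Table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE sales_docs (doc TEXT NOT NULL);  -- one JSON document per row
""")


def record_sales(conn, documents):
    """Insert a batch of JSON documents inside one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO sales_docs (doc) VALUES (?)",
            [(json.dumps(d),) for d in documents],
        )


def revenue_by_region(conn):
    """Join fields inside the JSON documents against a relational table."""
    rows = conn.execute("""
        SELECT s.region, SUM(json_extract(d.doc, '$.amount')) AS revenue
        FROM sales_docs AS d
        JOIN stores AS s ON s.store_id = json_extract(d.doc, '$.store_id')
        GROUP BY s.region
        ORDER BY s.region
    """).fetchall()
    return dict(rows)


with conn:
    conn.executemany("INSERT INTO stores VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])

record_sales(conn, [
    {"store_id": 1, "amount": 120.0, "items": ["phone"]},
    {"store_id": 2, "amount": 80.0, "items": ["case", "cable"]},
    {"store_id": 1, "amount": 30.0, "items": ["charger"]},
])
print(revenue_by_region(conn))
```

The point of the sketch is that the documents keep their flexible shape (the items array is never declared anywhere), yet an analyst can still report on them with an ordinary JOIN and GROUP BY.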
But developers wanted to do more with this data, so the consumer Internet companies built specific services to implement machine learning, IoT, spatial processing, graph, and other workloads. In essence, they moved the data to the algorithms in separate cloud services. The APIs seemed simpler, but fragmenting the data across these services created a lot of integration work and data copying. It added complexity, made data more vulnerable to hackers (because copying data is inherent to the workflow), and made it difficult for analysts to independently extract value or insights from the data without understanding and writing a lot of complex code.
Multiple Workloads: Move Algorithms to the Data, not Data to the Algorithms
In contrast, converged databases are multi-workload: the algorithms are moved into the database, where the data is. Traditional SQL analytics plus advanced processing for specific data types (spatial, graph, JSON, text) keeps the data centralized and protected, without the security risks and the integration and copying overheads inherent in moving data between services. Historically, relational databases were limited, scarce resources that ran the business under the tight control of the DBA team. Is it reasonable to move these demanding, complex new workloads onto this critical, closely-managed resource?
The answer is yes because innovations in the Oracle database (and other SQL databases) over the last 15 years greatly increased performance and software agility while simplifying management. In the past, databases would typically be dedicated to a single task, either data warehousing or OLTP, because there wasn’t enough performance headroom to mix workloads. In addition, databases were a scarce resource because the dedicated hardware estates they ran on ran the business. Automation was still nascent, so managing and optimizing mission-critical databases was a people-intensive, manual task; because small errors in operations could heavily impact business results, enterprises used strict policies to manage changes to the database. This frustrated developers who wanted more flexibility to innovate. But just like Internet consumer cloud services, Oracle’s database evolved and innovated at a tremendous rate to meet customer requirements.
Over the last 15 years, three critical innovations in particular drove a transformation to extreme performance, software agility, and simplified management. Exadata, clustering, and multitenant were developed and, concurrent with the huge increases in processor, memory, storage and networking capacity, they completely changed the landscape for deploying applications on Oracle databases.
First, the Oracle Exadata platform (first introduced in 2008) directly integrated the database software with the hardware to create a platform that could uniquely leverage technology improvements in mass market hardware. Since its inception, Exadata and other Oracle optimizations like native sharding have accelerated SQL performance by a factor of 10 million. The latest generation Exadata alone, introduced late in 2019, increased performance by an order of magnitude over the previous generation when it integrated Intel’s Optane persistent memory. A 10x improvement in a 12-month period is almost unheard of in technology: it happened because the co-engineered Exadata design could uniquely leverage persistent memory via the marriage of software and hardware innovation.
Exadata (and the Oracle database in general) caches data at all levels: in front of the database (via TimesTen or other wireline caching protocols like memcached), classically using the SGA for shared main memory, and then in Exadata with another half dozen custom caching and compression mechanisms at the block level using DRAM, flash, or persistent memory. Combine this with custom network protocols optimized for database access and the ability to cache data and perform queries at the storage layer, avoiding server-to-network round trips, and this platform provides tremendous performance for all kinds of workloads. Internet consumer clouds are not designed to exploit storage-level optimizations, and tend to focus on caching common requests at the network layer.
Second, clustering was introduced at the processor level (with Real Application Clusters), across servers, and across database shards within and across data centers (via Oracle native sharding), allowing the Oracle Database to achieve extremely high scalability, to the level required by today’s Internet consumer applications and beyond. There are nation-states that manage all the people and materials crossing their borders, in real time, using this technology. Clustering and Exadata technologies contributed greatly to achieving very high availability while simultaneously reducing management complexity. Multiple Exadata systems can be clustered together with very high-speed networks to create a single database machine to handle almost any workload.
Virtual Databases: Create and Use New Databases Easily
We’ve discussed the multi-model and multi-workload features of the Oracle database. The third key feature of a converged database, multitenant, was introduced by Oracle in 2012. It allowed a single physical database (the container database) to support multiple pluggable databases.
It greatly simplified delivering enterprise SaaS applications because business logic could be embedded in a master database that could then be cloned per enterprise. Instead of writing a single complex application managing millions of users across thousands of companies, multitenant massively simplified development by allowing a single code base to be optimized to support each company’s private copy of the database, while leveraging the database to solve hard application problems like security. The database enforces tenant security and removes the need for risky coding of tenant separation in every application. It also enables tenants to use standard tools like analytics. Pluggable Databases improve agility because they can be physically combined to simplify deployment, or separated to improve isolation and scalability.
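The idea of cloning one master definition into a private database per tenant can be sketched loosely as follows. This is only an analogy built on Python's sqlite3 module, with one in-memory database per tenant; it does not use or imitate Oracle's container and pluggable database commands, and every name here (TEMPLATE_SCHEMA, TenantDatabases, the invoices table) is a hypothetical chosen for illustration.

```python
import sqlite3

# Hypothetical "master" definition that every tenant database is cloned from.
TEMPLATE_SCHEMA = """
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, amount REAL NOT NULL);
"""


class TenantDatabases:
    """Keeps one isolated database per tenant, all created from one template."""

    def __init__(self):
        self._connections = {}

    def provision(self, tenant):
        """Clone the template into a fresh, private database for the tenant."""
        if tenant in self._connections:
            raise ValueError(f"tenant {tenant!r} already provisioned")
        conn = sqlite3.connect(":memory:")  # each call yields a separate database
        conn.executescript(TEMPLATE_SCHEMA)
        self._connections[tenant] = conn

    def connection(self, tenant):
        return self._connections[tenant]

    def for_each(self, sql):
        """Manage many as one: run the same statement against every tenant."""
        return {t: c.execute(sql).fetchall() for t, c in self._connections.items()}


def add_invoice(conn, amount):
    """Single code base: no tenant_id column or tenant filter is needed."""
    with conn:
        conn.execute("INSERT INTO invoices (amount) VALUES (?)", (amount,))


def total_billed(conn):
    return conn.execute("SELECT COALESCE(SUM(amount), 0) FROM invoices").fetchone()[0]


dbs = TenantDatabases()
dbs.provision("acme")
dbs.provision("globex")
add_invoice(dbs.connection("acme"), 100.0)
add_invoice(dbs.connection("acme"), 50.0)
add_invoice(dbs.connection("globex"), 7.5)
print(total_billed(dbs.connection("acme")), total_billed(dbs.connection("globex")))
```

Because separation happens at the database boundary, the application code never has to remember a WHERE tenant_id = ... clause, which is the kind of risky per-query tenant filtering the paragraph above says the database removes.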
Multitenant enables better automation, so that a small number of cloud database infrastructure specialists working at these enterprise SaaS vendors and other large enterprise companies can manage thousands of Oracle databases. This approach has been so popular that 19 of the top 20 enterprise SaaS vendors leverage it in their own clouds.
Microservices have become a popular development framework for building scalable web applications. Multitenant helps enable powerful microservice deployments. Microservice applications leverage a loosely coupled service-oriented architecture with bounded context per service. They can be written in different languages and often leverage databases supporting different data types. This means they support both polyglot programming and polyglot persistence.
A converged database that supports multiple data models is inherently useful in a microservices architecture. The Oracle Database makes it easier to develop and deploy microservices, by “containerizing” databases with multitenant. Each microservice database is isolated, but you can still secure and manage many as one, and use the exact data type or workload a particular microservice requires.
Powerful Synergies
A good analogy for a converged database is a smartphone. Consider how smartphones integrated phone calls, messaging, a camera, calendar, music and other features into a single product when each originally required separate products. Now, these point products are features of smartphones.
With smartphones, synergy across features makes the whole better than the sum of parts, and each feature is better than the standalone original because of the tight integration and the new workflows this integration makes possible.
For example, the camera in your smartphone integrates with applications, automates the photo storage and backup process, allows pictures to be sent in emails and texts and easily posted on popular social media sites like Instagram, and can even provide automated editing and color correction in real time. The calendar is continuously updated since it uses the phone’s internet connectivity to sync with the cloud. The music app can stream music continuously from an extensive music library in the cloud. Each of these separate features is more capable and powerful compared to its standalone, single-purpose counterparts.
The same ease of use, convenience and synergy you get from a smartphone also holds for a converged database. A converged database makes it much simpler to develop applications because standard SQL can be used to run very sophisticated machine learning, spatial, and graph algorithms instead of implementing these in separate databases and APIs. Instead of writing complex messaging and event code to weave data together, you can use standard SQL functionality like JOINs.
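To make the contrast concrete, the sketch below computes the same per-sensor average in two ways: by copying every row out to application code, and by letting a single SQL statement do the work where the data lives. It is a small illustration using Python's sqlite3 module as a stand-in database, with a hypothetical readings table and synthetic values; it says nothing about Oracle's actual performance.

```python
import sqlite3

# Stand-in database with a hypothetical table of synthetic sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
with conn:
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(sensor, float(i)) for sensor in (1, 2, 3) for i in range(1000)],
    )


def average_in_application(conn):
    """Data moves to the algorithm: every row is copied out, then averaged."""
    rows = conn.execute("SELECT sensor_id, value FROM readings").fetchall()
    totals = {}
    for sensor_id, value in rows:
        count, total = totals.get(sensor_id, (0, 0.0))
        totals[sensor_id] = (count + 1, total + value)
    averages = {s: total / count for s, (count, total) in totals.items()}
    return averages, len(rows)


def average_in_database(conn):
    """Algorithm moves to the data: only one summary row per sensor leaves."""
    rows = conn.execute(
        "SELECT sensor_id, AVG(value) FROM readings GROUP BY sensor_id"
    ).fetchall()
    return dict(rows), len(rows)


app_avg, app_rows_moved = average_in_application(conn)
db_avg, db_rows_moved = average_in_database(conn)
print(app_rows_moved, db_rows_moved)
```

Both functions return the same averages, but the first transfers all 3,000 rows to the application while the second transfers three. With a separate service on the other side of a network, that difference is the copying and integration overhead described above.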
These synergies are already leading to groundbreaking new data-driven applications. For example, the SailGP racing league built an application on the Oracle Database that lets spectators and racers track the race, and then analyze the results for boat performance tuning and race analysis. It includes spatial mapping of the boats on the racecourse and graph algorithms to predict the optimal racecourse based on winds. Thanks to extreme IoT performance, data from dozens of sensors on each boat is processed in real time. Machine learning is applied to the sensor data to optimize performance and tactics before, during and after a race.
Conclusion
So a converged database lets you build powerful applications quickly and efficiently while still allowing data reuse by external analysts. But why should this matter to you?
Here’s what you can achieve with Oracle’s converged database:
- Achieve high developer and data productivity simultaneously
- Simplify application development while meeting enterprise requirements at cloud scale
- Build both microservices and monoliths, or hybrid systems like Citadels
- Focus on building great applications and insightful data analytics rather than managing systems
- Deploy anywhere: on-prem, in the public cloud, or in a public cloud deployed on-prem
- Use the same data in many different ways without writing any code
- Increase security by concentrating your data in a converged database, which reduces overall attack surfaces and the risks introduced by uncontrolled data copying
- Simplify meeting compliance and data governance requirements, no matter your locality, by deploying anywhere