Multi-Model Databases vs Polyglot Persistent Databases

The diagram below was helpful for me in differentiating among decentralized, distributed, and centralized systems.

[Diagram: centralized vs. decentralized vs. distributed systems]

Centralization and decentralization relate to the level of control.

In a centralized system, control is exerted by just one entity (a person or an enterprise, for example).

In a decentralized system, there is no single controlling entity. Instead, control is shared among several independent entities.

Distribution relates to differences of location.

In a non-distributed (or co-located) system, all the parts of the system are in the same physical location. In a distributed system, parts of the system exist in separate locations.

A centralized data system is one where all the data lives on a single computer in a single location; to access the information, you must access the main computer of the system, known as the "server".

A distributed data system is a single logical data network installed on a series of computers (nodes) in different geographic locations. The nodes are not connected to a single processing unit; instead, they are fully connected to each other to provide integrity and accessibility to information from any point.

Centralized systems are the easiest to maintain, but they are less stable, less secure, and less scalable. Distributed systems are, in theory, harder to maintain, but they are more stable, more secure, faster across the network, and more scalable.

An example of a distributed but centralized system is a cloud service provider offering a data storage service. Physically, the data could be sharded and replicated across different machines according to resource availability and resiliency (distributed). However, wherever the machines and data storage facilities happen to be, the cloud service provider still controls them all (centralized).

An example of a decentralized and distributed system is the Bitcoin blockchain, but that could be a discussion for some other time.

From the perspective of Security: in a centralized system, anyone with access to the server holding the information can add, modify, or delete any data. In a distributed system, all data is distributed among the nodes of the network; anything added, edited, or deleted on one computer is reflected on all computers in the network. The system is therefore self-sufficient and self-regulating, and the databases are protected against deliberate attacks or accidental changes of information.

From the perspective of Availability: in a centralized system, a surge of requests can overload the server so that it no longer responds. A distributed system can withstand significant pressure because every node holds the data and requests are spread across the nodes; the load falls on the entire network rather than on one computer. The total availability of the network is therefore much greater than in the centralized case.

From the perspective of Accessibility: in a centralized system, if the central storage has problems, you cannot obtain your information until they are resolved; and because processes are standardized, they can be inconvenient for users with differing needs. In a distributed system, because the number of computers in the network is large, a DDoS attack succeeds only if its capacity is much greater than that of the entire network, which would make it a very expensive attack. For this reason, distributed networks can be considered more resilient.

From the perspective of Data transfer rates: in a centralized system, if the clients are located in different countries or continents, the connection to the server can become a bottleneck. In a distributed network, the client can choose a node (for example, a nearby one) and work with all the required information from there.

From the perspective of Scalability: centralized networks are difficult to scale because the capacity of the server is limited and traffic cannot grow without bound. All clients connect to the one server, which stores all the data, so every request to read, change, add, or delete data goes through that main computer. Server resources are finite, so it can serve only a specific number of participants effectively; with more clients, the server load may exceed its limit at peak times. Distributed models do not have this problem, since the load is shared among several computers.

In the traditional monolithic approach, there is a single monolithic data store, such as a SQL Server instance containing one database with many tables. This central database acts as the engine for all data persistence.

However, a single SQL database is no longer a good fit for today's diverse data. NoSQL database systems help handle high rates of schemaless data capture and updates.

Because data is intentionally denormalized and replicated across the nodes of a cluster to achieve higher availability, NoSQL databases can only offer BASE (Basically Available, Soft state, Eventually consistent) characteristics instead of ACID. This is why NoSQL databases scale more readily than SQL databases.

A traditional DBMS is about ACID properties. In CAP terms, the "C" and "I" in ACID can be traded for availability; it is a fundamental tradeoff, and the price is paid in consistency or performance.

The CAP theorem is proved using two network models: asynchronous and partially synchronous. Distributed database systems need to be transactional (atomic, linearizable), available, and fault-tolerant.

Given distributed data or systems, the choice mostly comes up when there is a network partition, meaning two nodes of the system cannot communicate with one another immediately. At that point there is a partition, and you must choose whether your system remains Consistent or Available (but not both).
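
To make the choice concrete, here is a minimal, self-contained Python sketch of that decision. It is an illustration only, not any real database's API; the Replica class and the "CP"/"AP" modes are hypothetical stand-ins.

    # Minimal sketch of the CP-vs-AP choice during a network partition.
    # All names are hypothetical stand-ins, not a real database API.

    class Replica:
        def __init__(self, name):
            self.name = name
            self.data = {}      # local copy of the key-value store
            self.stale = False  # True if this node missed updates during a partition

    class UnavailableError(Exception):
        """Raised when a CP system refuses to serve a possibly stale read."""

    def read(replica, key, mode, partitioned):
        if partitioned and replica.stale:
            if mode == "CP":
                # Consistent: refuse to answer rather than return stale data.
                raise UnavailableError(f"{replica.name}: cannot guarantee consistency")
            # Available ("AP"): answer, but the value may be stale.
        return replica.data.get(key)

    # Two replicas; an update reaches r1, but a partition isolates r2.
    r1, r2 = Replica("r1"), Replica("r2")
    r1.data["balance"] = 100
    r2.data["balance"] = 90   # r2 missed the latest write
    r2.stale = True

    print(read(r2, "balance", mode="AP", partitioned=True))  # 90: stale but available
    try:
        read(r2, "balance", mode="CP", partitioned=True)
    except UnavailableError as e:
        print(e)  # the CP read refuses to answer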

Therefore, in a partially synchronous network, read/write operations become inconsistent when message communication is lost. However, the partially synchronous model can bound how long the inconsistency lasts (at the cost of increased latency): the read/write operations become consistent again after some time. PACELC reorders CAP into "PAC" and adds a statement about what happens when the network is healthy: Else, choose between Latency and Consistency.

This conceptualizes the "eventually consistent" property of BASE. Commutative operations make it easy to restore consistency after a partition heals; the more operations that can be treated as commutative, the simpler eventual consistency becomes during a partition. Put simply, data will eventually become consistent, even though it is currently inconsistent and some partitions display stale data.

The CAP theorem motivated the BASE alternative to ACID properties in distributed systems, i.e., the "Basically Available", "Soft State", and "Eventual Consistency" properties of NoSQL databases. BASE differs from ACID mainly by assuming that, at any moment, replicas of the system might be in different states.

Even so, these different states will eventually converge after a point in time. Eventual consistency is enough for most applications that prioritize high availability and low latency, i.e., applications that need fast reads and writes; applications that require strong consistency are the exception.

The CAP theorem and the ACID and BASE characteristics are deeply interrelated: ACID orients toward consistency, BASE toward availability.

Most NoSQL databases can fulfill either CP or AP using different consistency levels. SQL databases can be considered CA systems, since they do not inherently provide partition tolerance.

In NoSQL data stores, the ACID properties can be fully fulfilled as long as there is no partition. When partitions are present, availability can be improved by allowing soft state and eventual consistency.

Different databases have different levels of consistency, especially those in NoSQL. Eventual consistency generally means that all data items in all replicas become consistent if there is no new update.

A data store is considered to have strong eventual consistency if two replicas of a data item that applied the same set of updates are in the same state.
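
Commutative, mergeable operations are one way to achieve exactly this property. Below is a minimal Python sketch of a grow-only counter (a G-counter CRDT); it is an illustrative toy, not a specific library, showing that replicas which have applied the same set of updates end up in the same state regardless of order.

    # Sketch of a G-counter CRDT: each replica tracks per-replica counts, and
    # merge takes the element-wise maximum. Merging is commutative, associative,
    # and idempotent, so replicas that saw the same updates converge.

    class GCounter:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}  # replica_id -> count of local increments

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def merge(self, other):
            for rid, c in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), c)

        def value(self):
            return sum(self.counts.values())

    # Two replicas accept writes independently (e.g., during a partition)...
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)
    b.increment(2)

    # ...then exchange state in either order and converge to the same value.
    a.merge(b)
    b.merge(a)
    assert a.counts == b.counts
    print(a.value(), b.value())  # 5 5 on both replicas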

ACID ensures that transactions move the database from one consistent state to another, guaranteeing integrity.

An RDBMS usually works with some form of strong consistency, such as linearizability (the illusion of a single copy), serializability (transactions appear to execute in a serial order), or both.

NoSQL databases try to provide at least eventual consistency.

In an organization, each domain (sales, marketing, finance) might have to work with a different kind of data store. If the data architecture is designed along DDD (Domain-Driven Design) principles, data ownership is decentralized per domain, which creates distributed data management challenges. Two key distributed data management patterns solve these problems:

  • Saga pattern - implements transactions that span multiple services (see the sketch below)
  • CQRS pattern - implements queries that span multiple services
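
As a rough sketch of the saga idea (the service functions are hypothetical; a real saga would call remote services and persist its progress so it can resume after a crash), each local transaction is paired with a compensating action that undoes it if a later step fails:

    # Minimal orchestration-style saga sketch. The service functions are
    # hypothetical in-process stand-ins for calls to separate services.

    def reserve_inventory(order): print(f"reserved inventory for order {order['id']}")
    def release_inventory(order): print(f"released inventory for order {order['id']}")  # compensation
    def charge_payment(order):    raise RuntimeError("payment declined")                # simulated failure
    def refund_payment(order):    print(f"refunded payment for order {order['id']}")    # compensation

    def run_saga(order, steps):
        """Run (step, compensation) pairs; on failure, compensate in reverse order."""
        completed = []
        try:
            for step, compensate in steps:
                step(order)
                completed.append(compensate)
        except Exception as exc:
            for compensate in reversed(completed):
                compensate(order)
            print(f"saga aborted and compensated: {exc}")

    order = {"id": 42, "amount": 99.0}
    run_saga(order, [(reserve_inventory, release_inventory),
                     (charge_payment, refund_payment)])
    # reserves inventory, payment fails, then the reservation is released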

One of the side effects of decentralized data management is the need to handle eventual consistency. In a centralized data store, developers can use transactional capabilities to ensure that data is in a consistent state across multiple tables. However, this is not the case when data is separated into different logical or physical databases.
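
The CQRS pattern mentioned above pairs naturally with this: commands mutate a write model and emit events, while a separate, denormalized read model is updated asynchronously from those events, so reads may briefly lag behind writes. A minimal in-memory Python sketch (the stores and the event queue are hypothetical stand-ins for real infrastructure):

    # Minimal CQRS sketch: the command side writes and emits events; a
    # projector applies events to a denormalized read model asynchronously.

    from collections import deque

    write_store = {}                  # authoritative write model
    read_store = {"order_count": 0}   # denormalized read model for fast queries
    events = deque()                  # stand-in for a message broker

    def handle_create_order(order_id, total):  # command side
        write_store[order_id] = {"total": total}
        events.append(("OrderCreated", order_id))

    def project_events():                      # would run asynchronously in reality
        while events:
            kind, order_id = events.popleft()
            if kind == "OrderCreated":
                read_store["order_count"] += 1

    handle_create_order(1, 50.0)
    print(read_store["order_count"])  # 0: read model not yet updated (eventual consistency)
    project_events()
    print(read_store["order_count"])  # 1: read model has caught up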

One significant advantage of decentralized data management is the ability to take advantage of polyglot persistence. Different types of data have different storage requirements, and decentralized data management lets you store each type of data in a database that serves the needs of that particular data type.

In polyglot persistence, the idea is that you use specialized data stores for different purposes and then stitch them together when you build an application.

In the multi-model database camp, you have a universal store and use it directly when you build applications.

Polyglot persistence means the usage of multiple data storage technologies within the same application, where each technology can be used for addressing different requirements. For example, there are:

  • Key-value databases – usually adopted when fast reads and writes are required.
  • RDBMSs – used when transactions are strictly necessary and data structures are fixed.
  • Document-based databases – used when dealing with high loads and flexible data structures.
  • Graph databases – used when rapid navigation among links is necessary.
  • Column databases – used when large-scale analytics are needed. 
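
To make the polyglot idea concrete, here is a minimal Python sketch that stitches two of these stores together in one application. SQLite (from the standard library) stands in for the RDBMS, and a plain dict stands in for a key-value cache such as Redis; the schema and names are hypothetical.

    # Polyglot persistence sketch: one application, two specialized stores.

    import sqlite3

    db = sqlite3.connect(":memory:")  # RDBMS: fixed schema, strict transactions
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    cache = {}                        # key-value store: fast reads and writes

    def place_order(order_id, total):
        with db:  # the transaction commits or rolls back atomically
            db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        cache[f"order:{order_id}"] = total  # denormalized copy for fast lookups

    def get_order_total(order_id):
        key = f"order:{order_id}"
        if key in cache:              # fast path: the key-value store
            return cache[key]
        row = db.execute("SELECT total FROM orders WHERE id = ?",
                         (order_id,)).fetchone()
        return row[0] if row else None

    place_order(1, 250.0)
    print(get_order_total(1))  # 250.0, served from the cache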

When the system is quite complex, with many different pieces of data to be synchronized, the right solution would be to use an event-based backbone.

Among the different concepts of distributed systems, the CAP theorem (Consistency, Availability, and Partition tolerance) points out the prominent use of the eventual consistency property in distributed systems.

This has prompted the need for other types of databases beyond SQL (Structured Query Language) that offer scalability and availability: NoSQL (Not-Only SQL) databases, mostly with the BASE (Basically Available, Soft State, and Eventual consistency) properties described above.

The traditional approach is to use polyglot persistence as the database pattern. The polyglot approach involves choosing the best-of-breed data store for each type of data the organization needs to store.

There are significant risks in the polyglot undertaking, however. While there is a best option for each data type, integrating those data stores so that they work seamlessly falls on the company's technical team. That team must not only integrate these separate technologies but also maintain expertise in each of them to support future development. Additionally, the company bears the burden of paying multiple licensing and maintenance costs, one for each product.

Because of these key drawbacks, many vendors now offer the ability to store multiple types of data behind one storefront. Enter the multi-model database.

A multi-model database is a unified database for different types of data: it contains and manages many different data types and models, all integrated by a single back end. Such databases can store tabular, relational, columnar, document, graph, key-value, and non-relational data and models in one place. The semantics of the database model (across all data stores) are described under a single underlying query language, making retrieval across data stores much more straightforward than with polyglot persistence. This one-stop shop makes interacting with an organization's complex data much simpler.
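
For example, in a multi-model database such as ArangoDB, document filtering and graph traversal can be expressed in the same query language (AQL) through one driver. A rough sketch, assuming a running ArangoDB instance and the python-arango driver; the database name, credentials, and the users collection and knows edge collection are all hypothetical:

    # One query language across models (AQL in ArangoDB).

    from arango import ArangoClient

    client = ArangoClient(hosts="http://localhost:8529")
    db = client.db("shop", username="root", password="")

    # Document-model query: filter a document collection.
    adults = db.aql.execute(
        "FOR u IN users FILTER u.age >= 18 RETURN u.name"
    )

    # Graph-model query in the same language: traverse 'knows' edges.
    friends = db.aql.execute(
        "FOR v IN 1..2 OUTBOUND 'users/alice' knows RETURN v.name"
    )

    print(list(adults), list(friends))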

Unlike with polyglot persistence, the company no longer has to pay multiple licensing costs or maintain a knowledge base across numerous technologies to keep its modern system running. Additionally, some multi-model databases ensure ACID guarantees across all data stores, which is much harder to achieve across the individual databases of a polyglot setup.

There are a few drawbacks to multi-model databases. They are a relatively new technology, so there are a limited number of vendors with little track record. This lack of historical experience can be a big red flag for large organizations trying to modernize. Some multi-model offerings may not include a type of data store that an organization needs, limiting the organization's options. Additionally, because multi-model databases blend data, optimization occurs at a higher level, which can be less performant in individual business areas.

So, which is right for your organization? It depends on the need.

