Can we talk about ORM Crisis?

Six years ago Martin Fowler wrote an article titled ORMHate.

If you work with databases - you should read it. I mean now.

Apparently, he went to a software development conference in London, and it seems that every single talk he attended was filled with "some snarky remarks about Object/Relational mapping (ORM) tools". In other words, the conference was filled with ORM hate.

So he wrote that article in defense of ORM tools, basically admitting that they are a "leaky abstraction" (at best), that systems using an ORM perform badly due to naive interaction with the database, and that solutions with an ORM aren't pretty - and you'd better really, I mean really, understand how a relational database works before tackling this problem, which "isn't exactly cuddly", as Martin stated.

Last year I went to a very similar software development conference, again in London. ORM tools were hardly even mentioned; as you may guess, it was all about the buzzword of the day - in this case - microservices.

And when ORM tools were indeed mentioned - on two different occasions, by two prominent software development consultants and speakers, whom I won't even name here - they were labeled as wonderful tools of abstraction and we were strongly encouraged to use them.

Really?

Did the entire software development world just suffer from collective amnesia? What exactly happened in those five years?

First of all, before I state my opinions - I'd like to say that I completely agree with Martin Fowler and many others:

There is absolutely nothing wrong with wanting to use the same, consistent language in your entire system, but you'd better be sure that you damn well know how your relational database really works before you start using it intensively. Except that you probably have to ignore the mountain of JavaScript behind your back, so don't look that way ;)

Well, that is true in life of course - if you are going to use something intensively then you need to understand it deeply. And most of the systems, at least those that I encounter these days - are database intensive. Very intensive. Or maybe it is just me.

OK, so, what is the real issue here? Let's first define exactly what an ORM is.

In their essence ORM tools are mapping tools that can map in two ways. Sort of bi-directional mappings:

1. From your database schema to your favorite programming language data structures

2. From your favorite programming language expressions and instructions to database commands or queries

And yes, I know, under the first point - nowadays developers usually start by declaring structures in the beloved language of their choice - and then, usually through a migration mechanism - they create their desired schema (an approach known to EF people as "code first"), so it may look like it is the other way around.

But that wasn't always the case (just ask ActiveRecord) and that is not even the point. The point is that there are two types of mapping going on here and we can all agree on that.

So, problems related to ORM mappings can fall in those two categories:

1. Problems with mapping from your database schema to programming language data structures

Let's all admit that there are a number of fundamental differences between those two systems:

One system lives on persistent media, usually hard disks, with virtually unlimited space. The other system lives in your random-access memory, or RAM. If you need the fundamental difference between those two explained - then you'd better stop reading this article and go back to basics.

Then there is the difference in motivation behind the models. A relational database schema is motivated essentially just by storing your data, and by being either efficient and fast, or as simple to query as possible, or both.

Memory data structures are modeled to solve your specific problem with computers.

That is a fundamental difference in motivation behind those structures. And whenever you have a difference in motivation behind something, to paraphrase Uncle Bob - inevitably you're going to have two different axes of change. Meaning, they will change for different reasons at different times. Not always, of course, but inevitably they will.

One has columns, rows, identities, relations, indexes, etc, etc - the other has constructors, methods, properties, encapsulation, composition, inheritance, instances, etc, etc.

They are clearly different kinds of animal.

So, how on earth can someone think that those two can be the same thing? And treat them as if they are the same when they are not? Because that is essentially the basic idea behind ORM tools - we have two of the same, so let's map them one to another.

Of course they are not the same, and they never will be the same, and they should not be the same. One is your storage, the other is your memory. But that is not exactly breaking news; it is actually an old problem in computer science, known as the object-relational impedance mismatch.

The object-relational impedance mismatch is best described by the illustration in Martin Fowler's article ORMHate.

So, when you start mapping your relational database schema directly to your classes in your beloved programming language one of two things will happen: 1) you'll have to change and adapt your database schema to your memory model - and that is clearly out of the question, no doubt about it - or - 2) you'll have to adapt your memory model to your database schema.

And since 1) is out of the question, the second approach is what inevitably happens, and that means - ignore, avoid and break every possible rule of even fairly decent object-oriented design and modeling that ever existed. Ever.

Well, maybe just most of them, if not all.

From wonderful abstraction

Now, as if that isn't horrible enough, it gets even "better". When Fowler stated that ORM tools are a "leaky abstraction" I'm afraid he was being somewhat generous. Consider this - you map your table to your class. Fine. Now you change something, anything really, in your class - what happens? You must update your database schema, generate a migration script and so on, and so on. Now, change something in your table. Well, you'll need to update your class too.

So, why is that?

Maybe because ORM is not abstraction at all.

It is a mapping, obviously, as its name clearly implies. If it is an abstraction, why isn't it called Object-Relational Abstraction or ORA, instead of Object-Relational Mapping? Again, I repeat - because it is not an abstraction at all.

See, the clue is in the name itself.

The proof is in the pudding. And why, oh why, do so many competent and smart developers think of it as an abstraction, or even as a "wonderful abstraction", as it was labeled at a prestigious development conference? That is beyond me. However, that kind of thinking comes with a heavy price to pay.

To pathological coupling

The price is that usually (if not always) systems end up being tightly (or pathologically) coupled to the database schema - everywhere, from top to bottom. And that is an enormous tech debt that is usually never paid and never will be paid.

Even the very idea of slightly changing database schema to gain some performance benefits for example - could mean changes in hundreds and hundreds of views, reports, modules, hundreds and hundreds of hours of very expensive work. Work that could be avoided in the first place by using database storage abstraction properly. That can leave any manager or developer shaking in fear.

So, what do people usually do?

Well, it is just best to ignore that inconvenient fact. Let's just think of it as a "wonderful abstraction" and never mention it again. We'll hire some consultants if it gets too hot. Which ones? The ones who told you all about the "wonderful abstraction"? Well...

But wait, it can get even "better". The schema is not the only thing your system is tightly coupled to, thanks to this "wonderful abstraction" of yours.

Your system is now tightly coupled to your ORM tool too. Isn't that wonderful?

I remember a project from quite a long time ago where the reasoning behind strict usage of the ORM, without any exception, was that we could then switch database engines whenever we wanted. Oracle for MSSQL for example, or whatever. (ANSI SQL anyone? Ugh, never mind.)

Now that, of course, was never even under the slightest consideration, but what happened in the end was that the ORM tool we were using stopped being developed and lost all support. Of course, changing the ORM tool would have meant basically a complete rewrite of the entire system, since it was pathologically coupled from top to bottom. No way around it.

Well, to be perfectly honest, that happened not once in my career, but twice, and again, and again. Usually the ORM itself is inadequate and badly lacks important features, and we can't afford to remove it. It already looks like a pattern. Once you decide to use your ORM as a tool of abstraction - you are married to it, no way out. That's it for your system.

I don't want to sound too pessimistic and dark here, or to be "that" person who complains but never proposes any solution. There are proven methods and patterns that can help you mitigate those issues, and I'm now going to offer a way out for the people who are interested in solutions...

Solution

First of all, I just want to state that for a smaller or even a medium-sized system, the database schema can indeed match 100% of your memory model. To be honest, I haven't worked on a small or even medium project in decades, but the fact is that every project starts small. Maybe that is the problem.

So, anyway, if you wish to avoid the problems described above - you should go full DDD, no question about it. Domain-Driven Design should be applied to even the smallest projects, that is, if you want them to grow, and you do want that.

Just let your object-oriented domain model be exactly that - an object-oriented model of your domain, which is your solution to the problem you are trying to solve, whatever it is.

And at the same time, finally - let your database be your database. It doesn't matter that it has some things in it named exactly the same as in your domain. You can always map them easily, but don't even try to pretend that they are the same, because they are not.

It doesn't even matter whether your application is more database intensive or more domain intensive. One way or the other - one layer will be larger and the other smaller.

That way, you can achieve real database abstraction, with a clearly separated data access layer (or infrastructure layer, as it is commonly known in modern DDD) isolated from the rest of your application. And if you wish to have the ability to switch database providers at any time - surround your data access layer with database tests. Then, when you switch database providers, if all database tests are green - you're good to go. No way around it, an ORM won't help you with that. In fact, it may just mislead you. The SQL that your ORM produces will not necessarily behave exactly the same on a different database provider.

Not only that - now you can finally optimize inefficient queries anytime you want - without fear of breaking anything. And that also means, ironically - now that you have your proper database abstraction you can use ORM as much as you like, but I digress...
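To make that separation a bit more concrete, here is a minimal sketch in Python using the standard sqlite3 module. Everything in it is hypothetical and invented for illustration: the Customer domain object, the CustomerRepository interface, the customers table and the check_repository_contract test. It is one possible shape of a data access layer, not the only one. The domain sees a full name, the table stores two columns, and only the repository knows about the difference.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """Domain object: shaped by the problem, not by the table layout."""
    customer_id: int
    full_name: str


class CustomerRepository(Protocol):
    """The only thing the rest of the application knows about storage."""
    def add(self, full_name: str) -> Customer: ...
    def find(self, customer_id: int) -> Optional[Customer]: ...


class SqliteCustomerRepository:
    """Data access layer: all SQL and all schema knowledge live here."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers ("
            "id INTEGER PRIMARY KEY, "
            "first_name TEXT NOT NULL, "
            "last_name TEXT NOT NULL)"
        )

    def add(self, full_name: str) -> Customer:
        # Naive split on the first space, good enough for an illustration.
        first, _, last = full_name.partition(" ")
        cur = self._conn.execute(
            "INSERT INTO customers (first_name, last_name) VALUES (?, ?)",
            (first, last),
        )
        return Customer(cur.lastrowid, full_name)

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, first_name || ' ' || last_name "
            "FROM customers WHERE id = ?",
            (customer_id,),
        ).fetchone()
        return Customer(row[0], row[1]) if row else None


def check_repository_contract(repo: CustomerRepository) -> None:
    """Database test: run it against every provider you intend to support."""
    saved = repo.add("Ada Lovelace")
    assert repo.find(saved.customer_id) == saved
    assert repo.find(saved.customer_id + 1) is None


check_repository_contract(SqliteCustomerRepository(sqlite3.connect(":memory:")))
```

To support another provider you would write a second repository class against the same interface and run the same check against it. If it passes, the rest of the application should not notice the switch.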

2. Problems with mapping from your favorite programming language expressions and instructions to database commands or queries

All of the problems described so far derive from one simple fact - that developers are using database objects and memory objects interchangeably, as if they were one and the same. Problems with the other type of mapping (from your language expressions to database commands and queries) are no different. Except that they are worse. And when I say worse, I mean millions of dollars worse.

How on earth can somebody write a piece of code that talks to a database, that can be rewritten with even less code to be (real world example) 750 times faster, and still consider themselves a great expert and top-notch developer? But that is precisely the kind of situation you will end up in when you use a large database as if it were your memory object.

That is the type of situation that I've been dealing with on a daily basis for years and years now, and it seems it is just getting worse. The numbers I'm seeing are insane: ORM data access 50 times slower, 250 times slower, 750 times slower, 1500 times slower, and so on, and on and on - it really depends on the database size. The difference won't be obvious on three records.
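To illustrate where numbers like these can come from, here is a small sketch in Python with the standard sqlite3 module. The customers and orders tables and both functions are made up for this example, and it is not a benchmark: it only counts the statements sent to the database. The first function walks the data as if it were an object graph in memory, the second asks for the same result in one set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount INTEGER NOT NULL
    );
""")
# 200 customers with three orders each (amounts 10, 20 and 30).
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i, 10 * k) for i in range(1, 201) for k in range(1, 4)],
)

round_trips = 0


def query(sql, params=()):
    """Run one statement and count it as one round trip to the database."""
    global round_trips
    round_trips += 1
    return conn.execute(sql, params).fetchall()


def totals_row_by_row():
    """Database treated like an in-memory object graph: 1 + N statements."""
    totals = {}
    for customer_id, name in query("SELECT id, name FROM customers"):
        amounts = query(
            "SELECT amount FROM orders WHERE customer_id = ?", (customer_id,)
        )
        totals[name] = sum(amount for (amount,) in amounts)
    return totals


def totals_set_based():
    """The same result as a single set-based statement."""
    rows = query("""
        SELECT c.name, COALESCE(SUM(o.amount), 0)
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return dict(rows)


round_trips = 0
slow = totals_row_by_row()
slow_trips = round_trips

round_trips = 0
fast = totals_set_based()
fast_trips = round_trips

print(slow == fast, slow_trips, fast_trips)  # prints: True 201 1
```

With 200 customers the first version issues 201 statements and the second issues one. On an in-memory SQLite database the gap is small, but against a real server every statement is typically a network round trip, and the count grows with the number of rows.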

Don't believe me, listen to Uber's principal engineer talking about it:

And when you start talking to people who do these kinds of things - about their bad coding practices and how they are accumulating tech debt - you immediately come across these typical answers:

- "You want me to do early optimization. Don't you know that eager optimization is the enemy..." - No I don't. I want you to learn how a relational database works and I want you to stop writing horribly inefficient code. That is not even optimization at all - that is having basic competence in what you do.

- "ORM will take care of optimization for me." - No it won't, there is no such ORM. They'll do their best to translate your code to its SQL equivalent.

- "Database engine will take care of optimization for me." - No it won't. The database engine can optimize the execution plan of each query according to its statistics - but it can't possibly know that you are using the database improperly, it can't read your mind. In other words, the database cannot possibly know that you shouldn't be sending thousands and thousands of queries instead of only one - only because you treat your database as you would treat an object instance or any other memory object. Nor can it know that you don't actually need so many unnecessary scans.

- "DBA will take care of optimization for me." - Assuming that you have one, which is not always the case - no! - he or she won't do that. They can't read your mind either and they have no idea what you are trying to achieve. All they see is thousands and thousands of queries and they assume that you know what you are doing.

- "But SQL is harder to debug than ORM" - Well, actually, it is easier, since there is no need to inspect what the ORM is producing; you can see it directly.

- "Yes, yes, but I can't set a break-point in SQL" - You don't have to. SQL is a set-based language built on relational algebra, which you should totally learn and master. Break-points are meaningless there; it is a different philosophy. The only way to debug relational code written with an ORM is to see what SQL it produces and run that manually to see the results. That is something you could have written yourself anyhow. The very fact that you think you need a break-point shows that you don't understand how it works...

- "Ok, but if I use SQL, I'll jeopardize my security by exposing code vulnerable to SQL injection attacks" - No you won't - if you properly sanitize your inputs or, better, pass them as query parameters. Actually, most ORM tools have excellent helper methods and utils just for that, which you might want to use. The only thing that you are really jeopardizing now, by trying to avoid SQL - is your valuable learning experience.

- "But that means that I need to pay extra attention to ensure that all inputs are sanitized properly, and some of the developers in the team might forget that and we are left vulnerable to those evil hackers ..." - Well hello, of course you need to know and pay attention to what you are doing. Besides, if you are really, really that concerned and serious about security - you can write stored procedures (or functions, as they are called in PostgreSQL). As long as they don't build dynamic SQL from their arguments, they will not only guarantee that SQL injection can't ever happen, they will also provide you with an extra layer of security implemented in the database itself, which vastly improves security, and they will also vastly improve maintainability by adding an extra layer of abstraction that can be altered at run-time, and they will also give you another performance boost, and, etc, etc ...

... and that is pretty much when cognitive dissonance kicks in. Muh ORM! Muh ORM!

And when finally, finally - ORM hits the fan - and the infrastructure cannot take so much abuse (and I'm sorry, there is no other word but abuse for writing queries that are hundreds of times less efficient) - what is the solution from those experts?
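On the SQL injection objection, here is a minimal sketch of the difference between gluing input into the SQL text and passing it as a parameter, again with Python's sqlite3 module. The users table and the two finder functions are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile_input = "nobody' OR '1'='1"


def find_unsafe(name):
    """Vulnerable: the input becomes part of the SQL text itself."""
    sql = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()


def find_safe(name):
    """Parameterized: the input travels as data and is never parsed as SQL."""
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


leaked = find_unsafe(hostile_input)  # the injected OR clause matches every row
nothing = find_safe(hostile_input)   # no user has that literal name

print(len(leaked), len(nothing))
```

Hand-written SQL with parameters is no more exposed than what an ORM generates, because the value is bound by the driver and never interpreted as part of the statement.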

Microservices! NoSQL! Elastic! Rabbit! Lucene! Kafka! Distributed computing! Graph database! Big Data! Big Big Data! Big data that is big data to other big data! Hadoops! Machine Learning! Artificial Intelligence! Nano-serverless autonomous services. Choose your buzzword! Any buzzword!!!

More complexity, more hardware, more infrastructure, more working hours, more money, much much much more money ... more everything! Well, can't argue with the expert, can you?

What have we done?

I know this - there are only three things that are certain in life: death, taxes and the Dunning-Kruger effect - and I'm not even completely sure about the first two.

In a span of 5-6 years the industry has gone from what seemed like total rejection to unquestioning acceptance.

How did that happen? Did ORMs improve dramatically over the years - or did we, perhaps, collectively drop the ball and give up? Or what? Because improvement is not what I'm seeing. I see an endless number of systems built on relational databases, using ORM tools in an extremely inefficient way, that are now moving to "microservices" and "NoSQL databases" and "big data", you name it.

Ironically, I am not against using ORM tools. If there is proper separation of concerns, a proper data access layer, and competent developers who know what they are doing and really, really understand how relational databases work - I don't mind, really.

ORM tools are delivering extremely fast results for developers not trained in database development, in particular relational database development, so, in very short span of time you can achieve amazing results with team not trained in database development.

That is great, helps companies to reduce costs, they don't have to train developers, it allows them to hit market earlier, gives them edge over competition and so on. And ultimately, it provides me with generous source of steady income, fixing all that mess.

But the problem is that when fast results are there, there is also an inevitable dopamine rush and a cognitive bias towards illusory superiority, with people mistakenly assessing their competence as greater than it is. This is known in the literature as the Dunning-Kruger effect, and it later produces tremendous resistance to any kind of improvement. Well, except "improvement" to microservices, of course, they are trendy now. It is also guaranteed not to improve the database skills of the team.

Another irony is that the time to produce results would not even be any longer if it were done right, with a competent database developer. But it would be much, much cheaper later. Even without the resistance.

So, my question is - can we finally address the elephant in the room and start speaking of an ORM Crisis?

I mean, the sheer amount of intellectual energy and time that highly talented and smart people spend on finding ever more sophisticated and elaborate ways NOT TO USE the programming language of choice for relational databases is astounding. Even now, as I speak, new ways and technical solutions are being developed. If we spent even a fraction of that time on improving our SQL skills we would be better off. Much, much better off.

I'm afraid that we might just have raised an entire generation of developers with poor relational database skills. But with a lot of buzzwords. ORM tools might just be tools of ignorance.

The only, only real value that I find in ORM tools is strong, static typing. Meaning: you can finally strongly type your entire database schema - if you are using an ORM and a statically typed language. So, when your schema changes, your build will fail until you fix all of the related queries and related data access.

Another bittersweet irony is that most systems nowadays are not even statically typed - Python, PHP, Node.js, etc, etc...

Moving to front-end?

Anyway, sometimes it looks to me that things have got so bad lately on the back-end side, so subject to trends and the latest fashion - that I've almost decided to move my development completely to the front-end.

I know that people are giving up on front-end altogether and moving to back-end because they can't keep up with the frenzy of all those latest frameworks.

Well, I can't keep up with all of those new types of containers and exotic databases. A decade ago it was all about design patterns; now all of those lovely design patterns are anti-patterns, it seems. Statically typed languages are even losing to dynamically typed languages. Because why have another layer of tests in your build (which is in essence what your type checking system is) - when you can and must, at all costs, manually cover everything with unit tests.

And you thought that things on front-end are bad?

So, I'm moving in a different direction - to the front-end - and over the course of the last month I've developed a completely new front-end JavaScript framework.

It is already way better than Angular or React. I guarantee. Prove me wrong. I dare you!

You can even do your beloved microservices with it.

But just make sure that you are using relational database properly ;)

Cheers everyone :)

EDIT 1

It seems that this article has gained a lot of traction; it's almost as if it resonates with a lot of developers. Thanks everyone.

Meanwhile, my attention was drawn to the great Lukas Eder, founder and CEO of Data Geekery. Lukas gave a brilliant, absolute must-see talk: "How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of", where he raised some excellent points:

  1. SQL is the one and only successful, mainstream, general-purpose 4th Generation Programming Language in existence
  2. SQL is the only language that knows how to optimize itself by selecting the most appropriate algorithm for you - and lets you focus only on your business logic and real value. In other words - SQL is the dream of every Lean Thinker.
  3. As such, productivity is not merely the same, as I stated in the article - it is actually much higher. Developers are much more productive and create much more efficient systems that know how to optimize themselves.
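A small way to see the second point for yourself, sketched with Python's sqlite3 module (the orders table and the index name are invented for this example). The query text never changes; only the physical design does, and the planner picks a different access path on its own. The exact wording of the plan differs between SQLite versions, so no output is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(10_000)],
)

QUERY = "SELECT amount FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY, (42,))   # same SQL text: expect an index search

print(plan_before)
print(plan_after)
```

Nothing in the calling code had to change for the second plan to be used, which is the kind of decision a hand-written loop in a 3rd-generation language would have to make explicitly.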

So, what do we do?

We use our 3rd-generation programming languages of choice, such as Java, C#, Python, C++, etc, etc - to create libraries and tools in those same 3rd-generation languages - that enable us NOT to use a 4th-generation language that knows how to select the most optimal algorithm for us - so that we can write those algorithms by ourselves.

Pretty much. Am I getting this right?

Only people with artificial intelligence are able to invent something as sophisticated and advanced as a self-optimizing next-generation language - and then spend the next decade or more finding ways not to use it. It is unreal.

EDIT 2

After doing some more research, it seems that the tide is finally turning.

There is new hope.

The answer with the most up-votes on Stack Overflow to the question - What Java ORM do you prefer, and why? - None. Stop using ORMs.

The answer with the most up-votes on Stack Overflow to the question - When should I use stored procedures? - Almost always.

It looks like common sense is going to prevail after all.

Money talks.

Cheers everyone :)

SOUMEN S.

Author, Technical Leader & Manager @ Tech Companies | Software Development Methodologies

5y

I directly experienced the malaise of ORM. A batch application was being designed to replace DataFlux product and Spring / JPA based approach simply was not performant enough to favor allocation of development dollars towards Object-Relational-Mapped (ORM) architecture. The Java based batch application used stored procedure and JDBC to achieve the product replacement goal with two developers (one with DBA experience) within three months.

Farwa Batool Syeda

Director of Software Engineering at Cielo WiGle Inc.

5y

I am glad to find out that I am not the only one against ORMs. All your examples focused on ORMs with relational databases. What is your opinion on using ORMs with non-relational databases like MongoDB and AWS DynamoDB? ORMs might be beneficial in terms of beautiful code, but in my experience they increased our throughput and memory utilization of the RESTful APIs, which were written in Python (some developer from a Java background had this dreaded idea to implement pure Java-based OOP patterns with Python and an ORM with NoSQL, killing the basic purpose of both). From a development point of view they require high maintenance. For example, adding or removing columns in a table (which requires zero effort in NoSQL) becomes unnecessarily time consuming. I mean, without an ORM I'll just edit the query to insert a record and that's it, but with an ORM I'll have to update the schema, then the entity models, and then add a mapping. And I am sure it won't end there, as debugging will take more time in case of any broken links. The only time I loved the concept of ORM was when I worked with C# .NET and Entity Framework 4 years back.

Adam Tandowski

Account Manager @ Catalyst | EdTech | GovTech | Open Source

5y

Great article. I couldn't agree more.  Have felt this way for a very long time, but it's unfortunately not the popular opinion at a lot of companies. Many don't quite grasp the performance difference.
