The end of ORM

A link-bait title, I'm aware, but please indulge me while I share some thoughts about ORM and Event Sourcing that have recently crossed my mind. I want to go down a theory-crafting path exploring these ideas, and I welcome any comments on the matter.

As the sage said, stay a while and listen...

The impedance mismatch

Let's start with a bit of basic context. Most decent-sized applications need to persist information so it can be retrieved later. Traditionally, this has been achieved via an RDBMS. More recently, the NoSQL movement has promoted alternatives. But, in the end, we need a storage area for that data.

But this is not without problems. The prevalent development model is based on Object Oriented programming alongside relational databases. This triggers what is known as the Object-relational impedance mismatch, which is a fancy way of saying that the two don't work well together, for many reasons, some subtle and others not so much.

This is not a new problem, obviously, and people have tried quite hard to provide a solution, or at least something that eases the pain. From a developer's point of view, the use of ORM is intended to alleviate the problem, even if only slightly. Tools like Hibernate take care of the communication with the relational layer so the developer doesn't need to spend time on it. They are not perfect, though, and the law of leaky abstractions hits hard. Anyone working with Hibernate has probably found pretty inefficient queries being run by the framework, which means it is time for the developer to delve deep and tell the tool how it should be done. But this means going down a level, to the database, which resurfaces the impedance issues mentioned before.

So ORMs fail as a complete solution. But this must be a pretty serious problem, as we (developers) devote tons of effort to it. Look at this list of relational database access tools for Scala. I count 6 different tools providing a communication layer between your code (Scala) and the persistence area. And that's for a single language, without checking Python or Java for similar lists, nor counting any in-house systems developed before ORMs were commonplace. And they are good tools, production ready, not the pet projects of a developer who wants to experiment.
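To give a sense of what that communication layer looks like in practice, here is a minimal sketch using one of those tools, Slick (assuming Slick 3 and an in-memory H2 database); the users table and its columns are made up for illustration:

    // Sketch of the mapping layer a Scala relational tool requires.
    // Assumes Slick 3 with an in-memory H2 database; the "users" table
    // and its columns are hypothetical.
    import slick.jdbc.H2Profile.api._
    import scala.concurrent.Await
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    // The table definition describes, one more time, data the domain already knows about
    class Users(tag: Tag) extends Table[(Long, String)](tag, "users") {
      def id   = column[Long]("id", O.PrimaryKey, O.AutoInc)
      def name = column[String]("name")
      def *    = (id, name)
    }

    object SlickExample extends App {
      val users = TableQuery[Users]
      val db = Database.forURL("jdbc:h2:mem:demo;DB_CLOSE_DELAY=-1", driver = "org.h2.Driver")

      val program = for {
        _     <- users.schema.create
        _     <- users += (0L, "Alice")      // the AutoInc id is ignored on insert
        names <- users.map(_.name).result
      } yield names

      println(Await.result(db.run(program), 5.seconds))
      db.close()
    }

It is well-crafted code, but notice that the Users class exists only to restate the shape of data the domain model already has.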

How many man-hours are we devoting to this each year?

Even if the development world is moving towards a more functional paradigm, as the changes in Java 8 seem to confirm, and even if NoSQL has become a viable alternative, the issue persists. We may not have the impedance mismatch, but we still use transformation layers to communicate with the database: tools that take our data objects and try to convert them to a format suitable for storage. You think MongoDB may solve this by being a document store, and then you see this:

    // Find documents where 20 < i <= 30, using the MongoDB Java driver
    DBObject query = new BasicDBObject("i", new BasicDBObject("$gt", 20).append("$lte", 30));
    DBCursor cursor = coll.find(query);
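And even from Scala, somewhere in the codebase there usually sits a hand-written (or generated) transformation layer between the domain objects and the driver's document type. A minimal sketch, assuming the same classic MongoDB Java driver and a hypothetical User case class:

    // Sketch of the usual "transformation layer" between domain objects and
    // the storage format, assuming the classic MongoDB Java driver types.
    // The User case class and its field names are hypothetical.
    import com.mongodb.BasicDBObject

    case class User(name: String, age: Int)

    object UserMapping {
      // Domain object -> document the driver understands
      def toDoc(u: User): BasicDBObject =
        new BasicDBObject("name", u.name).append("age", u.age)

      // Document -> domain object, trusting the stored field names and types
      def fromDoc(doc: BasicDBObject): User =
        User(doc.getString("name"), doc.getInt("age"))
    }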

The wrong approach?

If we are spending so many man-hours on this problem and it still causes headaches, could it be that we are approaching it the wrong way?

Each new component in an application adds to the complexity of the system. There is good complexity, like microservices, which give a clear benefit even if they increase the number of "moving parts". And there is bad complexity. Having to learn a full framework like Hibernate for something like saving my user data seems overkill. A service that manages storage of application state is the right approach, but when the domain objects in your application already contain all the data, maintaining a different model just to store that same data is wasteful.

But we do that. To be honest, it's a natural mistake. As with many things, one is usually not aware of doing it wrong until shown the right way. And here I have to mention Akka Persistence as the tool that shows you the correct way.

Yes, I won't delude myself: it is still a framework to learn, and one marked as experimental. It may well be that it won't perform once put into demanding production scenarios; it may be that all the promises will fall short. But, as I said at the beginning, please indulge me and ignore all this for a second.

Akka Persistence provides an implementation of the Event Sourcing pattern. Forget additional mismatched layers for storage: your domain objects are the source of truth. You store only the changes to those objects, so you can reproduce the state of the system at any given time. Because you only ever append new events, there are no transactional issues in the data layer from concurrent updates to pre-existing state. You even capture something that relational databases struggle with, but which is essential in many systems: time (as a sequence of states).
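To make that concrete, here is a minimal sketch of what such a persistent domain object looks like, assuming Akka Persistence's PersistentActor API; the shopping-cart names (CartActor, AddItem, ItemAdded) are made up for illustration:

    // Minimal Event Sourcing sketch with Akka Persistence's PersistentActor.
    // The cart domain is hypothetical; a journal plugin still has to be
    // configured in application.conf.
    import akka.persistence.PersistentActor

    // Command (what we are asked to do) and event (what actually happened)
    final case class AddItem(name: String)
    final case class ItemAdded(name: String)
    case object GetItems

    class CartActor extends PersistentActor {
      override def persistenceId: String = "cart-1"

      // Current state lives in memory, inside the domain object itself
      private var items: List[String] = Nil

      // Normal operation: persist the event, then update the in-memory state
      override def receiveCommand: Receive = {
        case AddItem(name) =>
          persist(ItemAdded(name)) { event =>
            items = event.name :: items
          }
        case GetItems => sender() ! items.reverse
      }

      // Recovery: replay the stored events to rebuild the same state
      override def receiveRecover: Receive = {
        case ItemAdded(name) => items = name :: items
      }
    }

Note that there is no mapping layer in sight: the journal only ever sees a stream of serialized ItemAdded events, and replaying them rebuilds the domain object exactly as it was.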

Doesn't it just click? Doesn't it seem natural? Why waste time on ORMs or other alternatives? Persistence suddenly becomes a support system instead of one of the main components of your application. The data stays where it is generated: your domain. Your objects are the source of truth, and you only make sure their state won't be lost in case you have to take them out of memory. Simple. As it should be.

And the fact that it fits perfectly with DDD as described by Vaughn Vernon only reinforces the sense that this is the right approach.

Why will it succeed?

As I said, there may be technical issues I have not considered. I may be blind to glaring limitations, either in use cases or in performance. It may not be feasible to build a complete solution using this storage model alone. I may be wrong.

On the other hand, given the amount of man-hours we have devoted, and are still devoting, to ORM-like systems, which we know are far from perfect, it seems natural that we could fix whatever issues Akka Persistence may have. It is doubtful that the people at Typesafe, all brilliant engineers, would have integrated this project if it had obvious major problems. And, if I'm right, Persistence (whether implemented by Typesafe or by any other implementation of the concept that may appear) will inevitably succeed, even if it ends up being slightly worse, performance-wise, than current ORMs.

The reason I'm so sure is that simplicity trumps all. Look at many popular technologies from the last decade or so: MySQL, PHP, Ruby, MongoDB. They weren't the most performant. They weren't the most elegant. They had their shortcomings. But they became incredibly popular, with plenty of developers moving to them. Why? Simplicity. If these technologies have something in common, it is that they were easy to use. Any developer could spend a few hours learning the basics and start using them. They were enablers that required less effort than other, more established alternatives. And, yes, at some point developers hit walls on performance or security or other areas. But that came later.

Persistence can follow that path. In the worst scenario, it allows a developer to forget about learning SQL, transactional contexts, ORM mappings and other relatively complex machinery. Your domain data is your data. That is a massive enabler, which increases productivity just by removing all the bugs and edge cases that appear when you use an ORM at scale. In the best scenario, we get a framework which is both simple and performant.