In this post, we’ll take a close look at some of the differences between a traditional relational store and a NoSQL document store.
Rows vs. Documents
To begin with, a document database stores entities as documents – meaning JSON documents, and this is very different to the way relational databases store data as rows in a table.
Columns vs. Properties
While rows in the relational world are made of up columns, documents contain properties.
Schema vs. Schema-Free
In the relational world, every table has a schema that defines the columns and data types that every row in the table must conform to. In contrast, a document database has no defined schema, and every document can be structured differently.
In this example, there are four columns defined for a table, and it would be necessary to alter the table schema if we wanted a fifth column, or if we wanted to change the maximum length of the name column, or if we wanted to allow nulls in date-of-birth, you get the idea. But because document databases as schema-free, they aren’t subject to these constraints. This makes them ideal when you have a rapidly evolving schema, as is usually the case in software development today.
Here, the first document has several properties, one of which is called name. Yet this doesn’t prevent us from adding a second document that uses the property fullName, rather than name, because there are no rules on schema. Similarly, we can add a third document that stores the fullName property an object that has distinct first and last name properties embedded within it. The database is more than happy to accept all these variations. Of course, this is just an example, and it would normally make no sense to mix up property names and shapes like this, just because you can. In this case, we can speculate that – at some point, it was determined that fullName was a better choice than name, and then at some later point, the fullName property was enhanced to distinguish between first and last names. This is a big selling point of document databases, because they let you evolve your schema as your needs dictate, as compared to relational database, where keeping up with an evolving schema is far more disruptive. Of course, when you are supporting varying schemas like this, it falls on you to present a unified view of the data. In this scenario, you could easily devise a query that returns the full name, first, by testing for the presence of a property called name, or a property called fullName if there is no name property, and then by testing if the fullName is an object or not; if it is, then you’d concatenate the embedded first and last name properties, otherwise, you’d just return the fullName property.
Normalized vs. Denormalized
Another significant difference between relational databases and document databases has to do with data normalization. In the relational world, we strive to normalize data as much as possible. This means avoiding data duplication, and maintaining separate tables for each related entity that can be joined together to produce a complete view of the data.
In this example, we have a User table with a row for John, and we have a separate Holdings table with multiple rows for all of John’s stock holding. The two tables are related on User ID, which is 1 for John, so that all individual users are stored in the user table, and each user’s related holdings are stored in the holdings table. Not only must the tables be joined when retrieving data for an application, but the application must know how to take a single modified object with a user and their holdings, and persist changes to the database by updating the user and holding tables separately. This takes significant effort, even if you leverage assistance from an Object-Relational Mapping framework (ORM) like Entity Framework.
In a document database, we typically do the opposite.
Here we see a single document that contains both the user and their holdings. Suddenly, there is no need to join when running a query, nor is there any need to shred the object into different places when saving changes, in fact, no need for an ORM layer at all. This, together with a schema-free model, eliminates a tremendous amount of friction that you normally have with relational database when designing schemas, joining tables, and maintaining an ORM layer on top of your database.
Of course, this isn’t a silver bullet that works well in every scenario. Here we see the one-to-many relationship implemented in a JSON document using an array of embedded objects. This is fine when you expect a reasonable maximum number of child objects, but what about so-called “unbounded” data. For example, imagine a blog post document with an infinite number of related comments. It would not be possible to store a single blog post with all of its comments inside a single document, and this is an example of where you actually might implement a model similar to a relational database, with each blog post in their own document, and each comment in a separate related document that is tied logically on the blog post ID. You can certainly do this, but it will fall on you and your application to store and retrieve blog posts and comments separately, because the document database won’t join them for you. You’re also free to model things however you’d like, so you could create blog post documents that include the first 100 comments, and then related documents that contain the next hundred, and the next hundred, with 100 comments per related document. There’s no one approach to data modeling that works in every scenario, so it’s up to you to experiment with what works best.
Data duplication is another aspect to this. With document databases, it’s not uncommon to duplicate data across multiple documents so that each document has the data it needs without having to locate other documents. Of course, if this is data that frequently changes, then you face another question as to whether it’s better to update multiple documents when a single piece of duplicated data is changed, or to extract the duplicate data out of each document and maintain it in a single shared document. Once again, how you model your data is all up to you. But at the end of the day, it’s important to understand that document databases work best when dealing with rich hierarchical documents that are entirely, or almost entirely, self-contained. Yes, you can model related documents when you need to, but if you find yourself modeling a database that contains many related documents, and/or, your documents have mostly flat structures, then this is a clear sign that a document database may not be the right tool for the job.
Strong Consistency vs. Eventual Consistency
Relational databases also enforce strong consistency on write operations. After updating the balance of a bank account, for example, we must be guaranteed that queries immediately show the updated balance – it’s entirely unacceptable to continue showing the old balance any time after it’s been changed.
One reason that document databases perform as well as they do is because when you write to the database, the changes are propagated to multiple replicas in the background. Then, read requests can be satisfied by any replica, making it possible to satisfy a high volume of client queries just by maintaining enough replicas. However, because not all replicas may be up-to-date at the point in time that a client issues a query, it is possible to receive inconsistent query results. Eventually, of course, all the replicas will be updated, and queries will return consistent results, which is why this behavior is called “eventual consistency.”
Simple vs. Complex
As I had started explaining, NoSQL databases are simple by design, which is the primary reason that they are able to achieve scale and performance that surpasses relational databases. This is also the reason why they do not and cannot replace relational databases that are often better at handling more complex requirements that don’t necessarily need to achieve massive scale.
Scale-Up vs. Scale-Out
And finally, speaking of scale, relational databases simply don’t scale out easily. You can do it, but it’s hard and expensive, whereas scale-out is a fundamental design goal with document databases.