What is a NoSQL Document Database?

In this blog post, we’ll discuss the major concepts around NoSQL document databases. In future posts, I’ll introduced Azure DocumentDB, Microsoft’s newest NoSQL document database, and discuss the major differences between relational databases and document databases.

What is NoSQL?

The best way to start is to clear up some terminology, where the industry has unfortunately adopted a couple of terms that are arguably misleading. SQL means Structured Query Language – meaning it’s just a language; a way of expressing a request to “go find something from someplace, where some condition is true, and give me back the result in the shape that I want it.” And so, SQL per-se doesn’t really define a specific technology. Again, it’s just a language, a dialect, but because SQL is the traditional query language of relational databases, the terms are often equated. So it’s really more helpful to think of a NoSQL database as a “non-relational” database, where – however you go about querying this database – it’s a database that abandons may of the concepts of relational databases. And so we wound up with the term “NoSQL,” where by now many NoSQL databases have emerged, and Azure DocumentDB is the latest NoSQL contender from Microsoft. But unlike most other NoSQL databases, the primary way to query DocumentDB is – oddly enough – by using SQL, or at least, a version of SQL that’s been adapted to the non-relational world of NoSQL databases. I’ll be talking a lot about DocumentDB in upcoming posts.

OK, so NoSQL really means non-relational. Now that’s a really broad definition. Saying it isn’t relational is like saying it’s anything else. And that’s true, which is why in fact there are different types of NoSQL databases. These include key-value stores, such as Azure Table Storage, column based stores like Cassandra, graph databases like Neo4, and document databases like MongoDB and Azure DocumentDB. While there are key differences between these types, all NoSQL database platforms share several common characteristics.

Huge amounts of data

First, they are designed to scale out, not just up. Meaning that while relational databases scale up easily enough, simply by adding more hardware, it’s much more difficult to scale them out horizontally – that is, to spread relational data across multiple partititions – once you hit the ceiling on CPU, disk, and memory, and can no longer scale up. In contrast, NoSQL databases are designed to scale out – infinitely, in fact—making it much easier to achieve internet scale for modern applications.

Schema-free data

Another common characteristic among NoSQL databases is the concept of schema-free data. That is, unlike relational databases, a NoSQL database does not enforce any schema. Every item in the database is free to store information that may or may not be structured the same as other items – even other items of the same type. This means that you can simply introduce new elements in your data as they become pertinent, without requiring any design changes in the database, such as adding or dropping columns, or changing data types. Similarly, you can stop including elements in new data as they start becoming irrelevant, again, without maintaining a schema in the database.

Simplicity Rules

By design, NoSQL databases are simple. They are not nearly as robust as traditional relational database platforms, like SQL Server and Oracle, and there are two reasons for this. For one, NoSQL databases don’t try to provide the complete functionality that is currently available in a relational database. That is, they are specifically designed to be simpler than relational databases, which is how they are able to out-perform relational databases on a large scale. In other cases, NoSQL databases lack features simply because they are much younger than their relational counterparts, which achieved maturity a long time ago. So while you can expect to see improvements in areas of missing functionality as NoSQL databases evolve, these platforms won’t make an attempt to replace full feature set available in relational databases. Despite the negative connotation in the name “NoSQL,” eliminating relational databases in favor of NoSQL is certainly not a stated goal. But at the same time, a scalable, schema-free, and easy-to-use database platform is rather compelling, and gives us more choices for our applications than we had before. Relational databases are definitely here to stay, but they no longer enjoy the monopoly they once had as the back-end platform of choice for new applications, now that a variety of NoSQL alternatives are here.

Document Database

I mentioned that Azure DocumentDB is a NoSQL database, and that it’s a NoSQL document database specifically. This is yet another unfortunate term, because when most people think of a document, they think of a file – like a Word document, spreadsheet, PDF file, and this is definitely not the type of document we mean when we say “document database.” In NoSQL terms, a document is more like an object graph; a complete representation of an entity and its related entities. While object graphs can be serialized and deserialized in any format, NoSQL databases typically leverage JSON, or JavaScript object notation, as the format for storing, projecting, and transporting data. Given the pervasiveness of JSON in today’s world – particularly among web applications – it should really take noone by surprise that NoSQL document databases have generally embraced JSON as their native data format. It’s a simple, lightweight format, yet expressive enough to support many different data modeling scenarios.