In this blog post, we’ll discuss the major concepts around NoSQL document databases. In future posts, I’ll introduce Azure DocumentDB, Microsoft’s newest NoSQL document database, and discuss the major differences between relational databases and document databases.
What is NoSQL?
The best way to start is to clear up some terminology, because the industry has unfortunately adopted a couple of terms that are arguably misleading. SQL stands for Structured Query Language, which means it’s just a language: a way of expressing a request to “go find something from someplace, where some condition is true, and give me back the result in the shape that I want.” So SQL per se doesn’t really define a specific technology. Again, it’s just a language, but because SQL is the traditional query language of relational databases, the two terms are often equated. It’s really more helpful to think of a NoSQL database as a “non-relational” database: however you go about querying it, it’s a database that abandons many of the concepts of relational databases. Still, “NoSQL” is the term we wound up with. By now many NoSQL databases have emerged, and Azure DocumentDB is the latest NoSQL contender from Microsoft. But unlike most other NoSQL databases, the primary way to query DocumentDB is – oddly enough – by using SQL, or at least a version of SQL that’s been adapted to the non-relational world of NoSQL databases. I’ll be talking a lot about DocumentDB in upcoming posts.
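To make the “SQL is just a language” point concrete, here is a minimal sketch in Python. It asks the same question twice: once in SQL against a relational table (using Python’s built-in sqlite3 module), and once in plain Python against a list of documents. The table, field names, and sample data are all made up for illustration, and the second half is not how any particular NoSQL database is queried; it only shows that the request itself doesn’t depend on the storage technology.

```python
import sqlite3

# Hypothetical sample data: (name, city, age)
people = [("Ann", "Paris", 34), ("Raj", "Delhi", 41), ("Lee", "Paris", 29)]

# 1) Relational: load the rows into a table and ask the question in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", people)
sql_result = conn.execute(
    "SELECT name, age FROM people WHERE city = ? ORDER BY name", ("Paris",)
).fetchall()
conn.close()

# 2) Non-relational: the same data as a list of documents (dictionaries).
docs = [{"name": n, "city": c, "age": a} for n, c, a in people]

# "Find something, where some condition is true, in the shape I want."
doc_result = sorted((d["name"], d["age"]) for d in docs if d["city"] == "Paris")

# Both approaches answer the same request with the same result.
print(sql_result == doc_result)
```

Either way, the request has the same three parts: where to look, which condition must hold, and what shape the result should take.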
OK, so NoSQL really means non-relational. Now that’s a really broad definition. Saying a database isn’t relational is like saying it could be anything else. And that’s true, which is why there are in fact different types of NoSQL databases. These include key-value stores such as Azure Table Storage, column-based stores like Cassandra, graph databases like Neo4j, and document databases like MongoDB and Azure DocumentDB. While there are key differences between these types, all NoSQL database platforms share several common characteristics.
Huge amounts of data
First, they are designed to scale out, not just up. Relational databases scale up easily enough, simply by adding more hardware. But once you hit the ceiling on CPU, disk, and memory and can no longer scale up, it’s much more difficult to scale them out horizontally – that is, to spread relational data across multiple partitions. In contrast, NoSQL databases are designed to scale out – with no fixed upper limit, in principle – making it much easier to achieve internet scale for modern applications.
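Scaling out comes down to deciding which partition each item lives on. The sketch below shows one simple way to do that, by hashing a key and taking the remainder. It’s an illustration only: the `partition_for` function and `PartitionedStore` class are hypothetical names, the “partitions” are just in-memory dictionaries standing in for separate machines, and no particular NoSQL database is claimed to work exactly this way.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


class PartitionedStore:
    """Toy store that spreads items across several partitions.

    Each partition is a plain dict here; in a real system each one
    would live on a different server.
    """

    def __init__(self, partition_count: int):
        self.partitions = [dict() for _ in range(partition_count)]

    def put(self, key: str, item: dict) -> None:
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = item

    def get(self, key: str):
        # Only the one partition that owns the key needs to be consulted.
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key)


store = PartitionedStore(partition_count=4)
for n in range(100):
    store.put(f"customer-{n}", {"id": n})

# Show how many items landed on each partition.
print([len(p) for p in store.partitions])
```

Because every read and write touches only the partition that owns the key, adding partitions adds capacity. One caveat with this naive remainder scheme is that changing the partition count reassigns most keys, so real systems generally use more careful placement strategies.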
Another common characteristic among NoSQL databases is the concept of schema-free data. That is, unlike relational databases, a NoSQL database does not enforce any schema. Every item in the database is free to store information that may or may not be structured the same as other items – even other items of the same type. This means that you can simply introduce new elements in your data as they become pertinent, without requiring any design changes in the database, such as adding or dropping columns, or changing data types. Similarly, you can stop including elements in new data as they start becoming irrelevant, again, without maintaining a schema in the database.
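Here’s a small sketch of what schema-free data looks like in practice. It assumes items are stored as JSON-style documents, represented here as Python dictionaries; the customer fields are invented for the example. Two items of the same type carry different elements, and the code that reads them simply tolerates whatever is missing.

```python
import json

# Two "customer" items with different shapes. Nothing had to be altered
# in the database to let the second one carry new elements.
customers = [
    {"id": "1", "name": "Alice", "email": "alice@example.com"},
    {
        "id": "2",
        "name": "Bob",
        "phones": [{"type": "mobile", "number": "555-0100"}],
        "loyaltyTier": "gold",
    },
]


def describe(doc: dict) -> str:
    """Summarize a customer, falling back to defaults for absent elements."""
    email = doc.get("email", "(no email)")
    tier = doc.get("loyaltyTier", "standard")
    return f"{doc['name']}: {email}, tier={tier}"


# Round-trip through JSON text to mimic storing and reading documents.
stored = [json.dumps(doc) for doc in customers]
lines = [describe(json.loads(text)) for text in stored]

for line in lines:
    print(line)
```

The flip side is that the schema hasn’t disappeared; it has moved into the application, which now has to cope with older and newer document shapes side by side.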
By design, NoSQL databases are simple. They are not nearly as robust as traditional relational database platforms, like SQL Server and Oracle, and there are two reasons for this. For one, NoSQL databases don’t try to provide the complete functionality that is currently available in a relational database. That is, they are specifically designed to be simpler than relational databases, which is how they are able to outperform relational databases at large scale. For another, NoSQL databases lack some features simply because they are much younger than their relational counterparts, which achieved maturity a long time ago. So while you can expect to see improvements in areas of missing functionality as NoSQL databases evolve, these platforms won’t attempt to replace the full feature set available in relational databases. Despite the negative connotation in the name “NoSQL,” eliminating relational databases in favor of NoSQL is certainly not a stated goal. But at the same time, a scalable, schema-free, and easy-to-use database platform is rather compelling, and gives us more choices for our applications than we had before. Relational databases are definitely here to stay, but they no longer enjoy the monopoly they once had as the back-end platform of choice for new applications, now that a variety of NoSQL alternatives are here.