Any CIO who doesn't want to be squeezed out by shadow IT should bone up on NoSQL now. According to industry analysts at Gartner, by 2016, the majority of new enterprise applications will use at least one NoSQL database management system (DBMS). Some of those new applications will be brought in through business unit managers eager to benefit from the features the applications offer, and unaware that the lack of adherence to information governance policies will probably result in damage to corporate data.

Some organisations are taking a proactive approach by defining how they want to use NoSQL before it's brought in under the radar as part of an application. One such organisation is the NHS, who over the last few years have migrated their central patient data storage and messaging platform ("NHS Spine") from a traditional relational data model to a NoSQL model.

According to IDC, the new system went live in August last year and requires only 10-12 people to manage, as opposed to over a hundred for the old system. What's more, the new system costs around 95% less in supporting infrastructure, and provides lookup times on the order of milliseconds rather than the seconds endured by users of the old system.

If that caught your attention, you may be asking questions like, "What exactly is NoSQL?" and "Why are so many applications now based on NoSQL?".

The first thing you need to know is that there is no such thing as NoSQL, at least not as a single product or standard. NoSQL is an umbrella term, commonly read as "Not only SQL", that covers data management systems which may offer some SQL-style querying but also use other techniques and data models better suited to handling large quantities of unstructured, geographically distributed data.

The three biggest drivers pushing the trend towards NoSQL are:

  • Scalability: Traditional DBMSs just weren't designed to handle the amounts of data we see today. NoSQL can handle humongous amounts of data and it assumes the data is distributed.
  • Flexibility: Application developers don't want to have a rigid schema imposed on them. Instead developers need each application to interpret data in ways that are appropriate to that application.
  • Ease of use: As seen with the NHS Spine case, NoSQL can require far less database administration, as each application has its own view of the data, and each application handles data consistency in its own way.

Nowadays enterprises need applications to be developed and deployed quickly, without having to go through the trouble of designing a traditional database or changing an existing one to accommodate the new application. Besides, traditional database schemas don't support the kinds of data that need to be collected and retrieved.

By moving away from the traditional relational database model, developers have the flexibility to meet the demands of modern markets. The newer data models don't have the same kinds of integrity constraints as those imposed by the relational database model.

The peculiarities of unstructured data

To understand the need for NoSQL, you need to understand the difference between structured and unstructured data. Structured data is data that is naturally homogeneous and that can fit into fields of fixed length. For example, your contact list is a set of structured data. It's easy to search on surname, because it's stored in a known location in each record.
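As a rough illustration of why fixed-length fields make structured data easy to search, here is a minimal Python sketch. The record layout (a 20-byte surname, a 20-byte first name and a 15-byte phone number), the sample contacts and the helper names are all assumptions made up for this example, not a description of any particular product.

```python
import struct

# Hypothetical fixed-length contact record:
# 20-byte surname, 20-byte first name, 15-byte phone number.
RECORD = struct.Struct("20s20s15s")
SURNAME_WIDTH = 20


def pack_contact(surname, first_name, phone):
    # struct pads short byte strings with NUL bytes up to the field width.
    return RECORD.pack(surname.encode(), first_name.encode(), phone.encode())


def surname_at(buffer, index):
    # The surname always starts at the same offset within each record,
    # so it can be read without parsing the rest of the record.
    start = index * RECORD.size
    raw = buffer[start:start + SURNAME_WIDTH]
    return raw.rstrip(b"\x00").decode()


def find_by_surname(buffer, surname):
    # Return the positions of all records whose surname matches.
    count = len(buffer) // RECORD.size
    return [i for i in range(count) if surname_at(buffer, i) == surname]


# Illustrative data only.
contacts = b"".join([
    pack_contact("Lovelace", "Ada", "020 7946 0001"),
    pack_contact("Turing", "Alan", "020 7946 0002"),
])

matches = find_by_surname(contacts, "Turing")
```

Because every record has the same size, the position of any surname can be calculated rather than searched for, which is exactly what unstructured data does not allow.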

However, once you start storing pictures, audio, and long chat conversations, all bets are off. The data has no natural structure, which makes it very difficult to catalogue and later retrieve through meaningful queries.

The problem of storing unstructured data isn't exactly new. In the 1980s, Jim Starkey, who was working for DEC at the time, invented a new field to allow images and audio to be stored in relational databases. This field was of variable length and special procedures were used to interpret the string of bytes it contained.

Starkey called the new field type a BLOB, supposedly after the bad Steve McQueen film from 1958, "The Blob", which was about an amorphous monster that kept growing until it threatened to eat up an entire city. Two different acronyms were later retrofitted - binary large object and basic large object - and the field was from then on referred to in capital letters (BLOB).

BLOBs allow you to store an image or audio (or whatever the string of bytes represents) in a field in a database table. BLOBs are of variable length and require special handling: you can't just use the field in an SQL query as you would an ordinary column. But you can have other fields in the table indicate something about the contents, such as the names of the people in a picture, or the date the picture was taken. Those fields allow you to catalogue and retrieve the unstructured data stored in a BLOB.
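The idea of cataloguing a BLOB through its neighbouring fields can be sketched with Python's built-in sqlite3 module. The table name, column names and placeholder bytes below are assumptions chosen for illustration; a real system would store actual image data and probably keep the list of people in a separate table.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos ("
    " id INTEGER PRIMARY KEY,"
    " taken_on TEXT,"   # structured field: date the picture was taken
    " people TEXT,"     # structured field: who appears in it
    " image BLOB)"      # unstructured field: the raw bytes
)

# Placeholder bytes standing in for real pictures.
first_image = bytes([0x89, 0x50, 0x4E, 0x47]) + b"\x00" * 16
second_image = b"\xff\xd8\xff" + b"\x00" * 8

conn.execute(
    "INSERT INTO photos (taken_on, people, image) VALUES (?, ?, ?)",
    ("2014-06-01", "Alice, Bob", first_image),
)
conn.execute(
    "INSERT INTO photos (taken_on, people, image) VALUES (?, ?, ?)",
    ("2014-07-15", "Carol", second_image),
)


def photos_of(person):
    # The query filters on the descriptive column, never on the BLOB itself.
    return conn.execute(
        "SELECT taken_on, image FROM photos WHERE people LIKE ? ORDER BY taken_on",
        ("%" + person + "%",),
    ).fetchall()


result = photos_of("Alice")
```

The database treats the image column as an opaque run of bytes; all the searching happens on the fields that describe it.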

NoSQL to the rescue

Big Data requires storage of lots of unstructured data and searches on that data at extraordinary speeds. Developers of applications that handle Big Data frequently need to relax the rules normally applied to relational databases. So they turn to NoSQL.

Whereas the so-called ACID properties (atomicity, consistency, isolation, and durability) were emphasised for traditional RDBMSs, developers of NoSQL applications look for the BASE properties (basically available, soft state, and eventually consistent).

Here is further explanation of those properties:

  • Basically available means that even if not all the data is available, you will get a response.
  • Soft state indicates that the state of the system may change over time, even without new input, as earlier updates work their way through.
  • Eventually consistent means that the data may not be consistent at any given moment, due to the distributed nature of the data and the relaxation of ACID constraints, but that the data will become consistent once updates have reached every node.

Since NoSQL assumes there is no central database management system, in which schemas are defined, the schema resides with the data or with the application. NoSQL assumes data is geographically distributed - the data might be in the cloud, for example. And as mentioned above, updates don't get pushed to all nodes instantly - hence, the concept of eventually consistent.
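To make the idea of eventual consistency more concrete, here is a toy Python simulation. It is a sketch under simplifying assumptions: the class names, node names and sample key are invented, replication is modelled as a simple queue of pending updates, and conflicts are resolved by keeping the highest version number, which is only one of several strategies real systems use.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keep the update only if it is newer than what this node has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class Cluster:
    """A set of replicas where updates reach the other nodes later."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = []  # updates not yet delivered
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        first, *others = self.replicas
        first.write(key, value, self.clock)  # applied immediately on one node
        for replica in others:
            self.pending.append((replica, key, value, self.clock))

    def propagate(self):
        # Deliver every queued update; afterwards all nodes agree.
        for replica, key, value, version in self.pending:
            replica.write(key, value, version)
        self.pending.clear()

    def read_all(self, key):
        return [r.read(key) for r in self.replicas]


cluster = Cluster(["london", "leeds", "cloud"])
cluster.write("patient:42:gp", "Dr Patel")
before = cluster.read_all("patient:42:gp")  # nodes disagree at this point
cluster.propagate()
after = cluster.read_all("patient:42:gp")   # nodes agree once updates arrive
```

Every node answers a read straight away (basically available), the answers can differ until propagation finishes (soft state), and the copies converge once the pending updates have been delivered (eventually consistent).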

In some ways, what we're seeing with NoSQL is reminiscent of the evolution of relational database management systems. In the 1970s and 1980s, when enterprises started using IT, they bought applications where the databases resided in the application. Applications frequently used Indexed Sequential Access Method (ISAM), which was a method invented by IBM for accessing indexed files of records directly from within an application - that is, without a central database management system. Then relational database management systems (RDBMSs) started making their way into the enterprise, making it possible for the database to be independent from the application.

A similar trend is occurring with Big Data. You have different ways of organising the data, and what you choose for one application may not be appropriate for another application.

It was relatively easy to standardise on the relational model because data didn't come in so many forms back in the 70s and 80s. By contrast, it's hard to imagine a standard data model that will fit the needs of all the different applications and all the different kinds of data those applications store and retrieve today. Instead of having a central database system supporting several applications, according to Gartner: "NoSQL DBMSs are typically deployed to support a single application."

New data models

It's easy to forget that NoSQL is not a single standard, but a set of tools and techniques to handle large, unstructured data sets that are geographically distributed. Many NoSQL implementations offer some SQL-style querying, but they use a lot of other methods to loosen the constraints imposed by the relational model.

Most NoSQL database management systems fall into one of four categories:

1. Document Store - In a document store database management system, data is stored in a hierarchical, tree-like format. Because these data models use web-centric interchange formats, such as JavaScript Object Notation (JSON) or XML, to describe data, they are a natural fit for web applications.

2. Key-Value - These data models allow key lookups to retrieve values, thereby providing consistent access times. Both the keys and values are stored as binary objects. The best use cases for these models are those requiring a series of reads and writes of small amounts of data - for example, real-time bidding (RTB).

3. Table-Style - The best example of this model is Google's BigTable, in which the three-way combination of a row key, a column key, and a time stamp forms an index to retrieve data. Row keys may be as large as 64KB, but are typically 10-100 bytes in size. Transactional integrity is guaranteed on a single row, but not across multiple rows. The time stamp allows storage of different versions of the same data. BigTable is used by more than 60 Google apps, including those, such as Google Earth, which require extreme scale.

4. Graph - This model allows storage of information in structures that record relationships between elements. These models are useful in applications requiring the classification and discovery of relationships between data sets.
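The four categories can be contrasted with a short Python sketch that uses nothing but built-in data structures. It only illustrates the shape of the data in each model; the sample records, key names and helper functions are assumptions for this example, and real products add persistence, distribution and their own query interfaces on top.

```python
import json

# 1. Document store: a nested, tree-like record serialised as JSON.
document = {
    "id": "order-1001",
    "customer": {"name": "Alice", "city": "Leeds"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
as_json = json.dumps(document)
city = json.loads(as_json)["customer"]["city"]

# 2. Key-value: opaque binary keys and values, looked up by key only.
kv_store = {}
kv_store[b"bid:ad-77"] = b"0.42"
bid = kv_store[b"bid:ad-77"]

# 3. Table-style: values indexed by (row key, column key, time stamp),
#    so several versions of the same cell can coexist.
table = {}


def put(row, column, timestamp, value):
    table[(row, column, timestamp)] = value


def get_latest(row, column):
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda pair: pair[0])[1]


put("com.example/index", "contents:html", 1, "<html>v1</html>")
put("com.example/index", "contents:html", 2, "<html>v2</html>")
latest = get_latest("com.example/index", "contents:html")

# 4. Graph: elements plus the relationships between them.
edges = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave"},
    "dave": set(),
}


def reachable(start):
    # Follow relationships outward to discover indirect connections.
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in edges.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


connections = reachable("alice")
```

The point of the comparison is that each model makes one kind of question cheap: fetching a whole nested record, looking up a value by key, reading a particular version of a cell, or walking relationships.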

To catch up with industry demands and to minimise the loss of market share, many of the big traditional database vendors are adding extensions for NoSQL. For example, Oracle has made changes to add NoSQL features to both its traditional line of database products and to MySQL, which it also owns; and Microsoft has added NoSQL features and interfaces to SQL Server.

In all likelihood, if you don't currently have applications that use NoSQL, you will in the near future. But what about your mainstream applications? Is NoSQL a good choice? To help you decide, Gartner recommends, "Understand how your applications use data before deciding whether the benefits of NoSQL DBMSs - developer productivity and horizontal scalability - outweigh the possible costs of data quality and consistency."