The volume of data has grown explosively in recent years. According to a study by the University of Southern California, digital media accounted for just 25 per cent of all information in the world as recently as 2000; by 2007 the figure was 94 per cent.
Although processing power has also grown exponentially, in line with Moore’s Law, memory and disk storage access speeds have not kept pace.
In the last few years the problem has been exacerbated by the growth of social media and by the rising volume of data collected automatically by sensors and devices such as RFID tags and by clickstream tracking software.
As a consequence of this divergence, many enterprises find that traditional database approaches are struggling to keep up with their need to analyse ever-larger volumes of data.
To complicate matters, more and more of this data is unstructured (documents and web pages, for example, rather than just numbers), something traditional databases have never handled especially well.
Industries that have found this a problem include internet marketing companies, social media websites and financial institutions such as hedge funds that want to test trading strategies against historical trading data.
The term Big Data has been coined to describe this issue, and a number of interesting approaches have arisen to tackle it. For one thing, the previously staid data warehouse market has seen a flood of new entrants.
Approaching the numbers
Two approaches have come to the fore.
First, traditional relational databases have been optimised for transactional updates and are row-oriented: the fields of each record are stored together, in tables designed to have a few columns (name, address or product number) and large numbers of rows.
This is what you want for update processing, but for largely read-only processing it can be more efficient to flip this on its head and use column-oriented storage, in which all the values of each column are stored together.
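To make the row/column distinction concrete, here is a minimal Python sketch using a hypothetical three-column table (the names, cities and balances are invented for illustration). It models each layout as a plain Python sequence and assumes, as a simplification, that scanning the row layout means passing over every stored value, because each record's fields are interleaved. The function names are illustrative only, not any product's API.

```python
# Hypothetical three-column table, invented for illustration.
COLUMN_NAMES = ("name", "city", "balance")
ROWS = [
    ("Ann", "Leeds", 120),
    ("Bob", "York", 80),
    ("Cat", "Leeds", 200),
]


def row_store(rows):
    """Row-oriented layout: each record's fields sit next to each other."""
    return [value for row in rows for value in row]


def column_store(rows):
    """Column-oriented layout: all values of one column sit next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def sum_column_row_store(flat, column):
    """Return (total, values_scanned) for one column of the row layout."""
    # Simplifying assumption: the wanted values are interleaved with every
    # other field, so a sequential scan passes over every stored value.
    width = len(COLUMN_NAMES)
    offset = COLUMN_NAMES.index(column)
    total = sum(flat[i] for i in range(offset, len(flat), width))
    return total, len(flat)


def sum_column_column_store(cols, column):
    """Return (total, values_scanned) for one column of the column layout."""
    # Only the one contiguous column has to be scanned.
    values = cols[column]
    return sum(values), len(values)


flat = row_store(ROWS)
cols = column_store(ROWS)
print(sum_column_row_store(flat, "balance"))     # (400, 9)
print(sum_column_column_store(cols, "balance"))  # (400, 3)
```

Both layouts give the same total, but in this model the column layout scans 3 values rather than 9. The more columns a table has, the larger the share of data a single-column query can skip, which is the intuition behind columnar gains on read-heavy analytic queries.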
Pioneered by Sybase, this approach has been taken up by many of the recent entrants to the analytic database market.
Data stored this way is easier to compress, because the values in a single column are all of the same type and often repeat. There is a price to pay in load times, however, and the approach is not well suited to frequent transactional updates.
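As an illustration of why repeated column values compress well, the sketch below applies run-length encoding, one simple scheme chosen here purely as an example; the article does not say which compression methods Sybase or other vendors actually use, and the `city` column is hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as a list of (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical 'city' column, sorted so that equal values are adjacent.
city = ["Leeds"] * 4 + ["London"] * 3 + ["York"] * 2
encoded = rle_encode(city)
print(len(city), "values ->", len(encoded), "pairs")  # 9 values -> 3 pairs
print(encoded)  # [('Leeds', 4), ('London', 3), ('York', 2)]
assert rle_decode(encoded) == city  # the encoding is lossless
```

In a row layout the same city values would be interleaved with names and balances, so runs like these would not form. The flip side is that changing or inserting a single record can mean splitting or rewriting encoded runs in every affected column, one reason columnar stores are less suited to frequent small updates.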
However, for analytic processing this is not really an issue, and columnar databases can run certain analytic queries an order of magnitude faster than traditional approaches.