Big Data is one of those buzz-terms that are difficult to avoid. But where are we when it comes to actual adoption of the technologies behind it?
To many, data becomes Big when you're talking volumes ending in terabytes, petabytes, exabytes and beyond.
Others apply criteria such as the number of records, transactions, or files. If the number is mind-bogglingly large, then surely it must be Big Data.
But that misses a crucial point: it's not just about quantity, but also about the nature of the data.
Just scaling up your existing systems to accommodate ever larger data volumes doesn't necessarily mean you're dealing with Big Data.
The Big comes into play when you're looking at all the data that resides outside your existing, typically well-structured, systems and wondering how on earth you can harness it for business benefit.
Another definitional issue is structured versus unstructured data.
Categorizing structured data tends to be less contentious because this is data that exists in tabular form, typically in relational database management systems (RDBMS). Then there's the rest.
Calling it unstructured is accurate only in the sense that this data doesn't reside in traditional RDBMSs.
But there are many shades of grey. For example, the data in system or web logs is usually very well structured, and each data element has a known definition and set of characteristics.
Similarly, data from a social media stream, like Twitter, is well-structured in some ways (for instance, it defines length of message and use of operators such as @ or #), and yet is totally unstructured in others (the content of the message could be anything).
Email, documents, spreadsheets and presentations also fall into this category to a certain degree; it all depends on the context in which they are stored.
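To make the "shades of grey" point concrete, here is a minimal Python sketch. It assumes a web log line in the widely used Common Log Format and a short social media message; both samples and the function names (`parse_log_line`, `describe_message`) are invented for illustration and are not taken from the survey or any particular product. The log line decomposes fully into named fields, while the message yields only a few structured elements (length, @ mentions, # tags) and leaves the body as free text.

```python
import re

# Assumption: the log follows the Common Log Format. Real logs vary by server
# and configuration, so this pattern is illustrative only.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Simplified patterns for the @ and # operators; real platforms apply
# more detailed rules than "one or more word characters".
MENTION = re.compile(r'@(\w+)')
HASHTAG = re.compile(r'#(\w+)')


def parse_log_line(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def describe_message(text):
    """Extract the structured parts of a message; the body stays free text."""
    return {
        "length": len(text),
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "body": text,  # unstructured: could be anything
    }


# Hypothetical sample data, made up for this example.
sample_log = (
    '192.0.2.1 - - [10/Oct/2012:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326'
)
sample_message = "Thanks @alice, great talk on #bigdata today"

if __name__ == "__main__":
    print(parse_log_line(sample_log))
    print(describe_message(sample_message))
```

Every element of the log line ends up in a field with a known meaning, which is why such data could be loaded into a table with little effort. The message gives up its operators easily, but making sense of the body would need a different kind of analysis altogether.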
Then there are blogs, pictures, videos and all kinds of other data elements which your organisation may well wish to understand better but doesn't yet capture within existing systems.
There's certainly no getting away from data growth. A recent survey* conducted by Freeform Dynamics shows that most organisations are seeing data volumes increase, with unstructured data for many looking set to grow even faster than structured data, as illustrated in Figure 1.