When talking about Big Data, people tend to throw around a lot of big numbers. For example, it's been said that from the dawn of humanity up until the year 2003, we have created five exabytes of data, and now we create that much data every day. (In case you're wondering, an exabyte is a billion gigabytes.)
I've also read that in the year 2011, we created 1.8 zettabytes of data. I had no idea what a zettabyte was, so I went to the trouble of looking it up: a zettabyte is a thousand exabytes. Or if you prefer to think of it in terms of terabytes (a terabyte is a trillion bytes), a zettabyte is a billion terabytes.
It's easy to throw these figures around, so let's get a little perspective on just how big these numbers really are. One way to make them meaningful is to think about them in terms of units of time. A billion seconds is about 32 years. A billion times that is about 32 billion years. If you typed one letter per second, generating two bytes of Unicode each time (assuming a two-byte encoding such as UTF-16, which covers most everyday letters), it would take you a little longer than the age of the universe to produce an exabyte. To generate a zettabyte, you would have to produce over 2,000 bytes of data every second, starting at the moment of the Big Bang and continuing up until right about now, plus or minus 100 million years, mind you.
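For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes decimal (SI) units, a year of 365.25 days, and a round figure of 13.8 billion years for the age of the universe; those constants are my assumptions, not precise measurements.

```python
# Back-of-the-envelope check of the time comparisons above.
# Assumptions: decimal (SI) units, a 365.25-day year, and a
# universe roughly 13.8 billion years old.
EXABYTE = 10**18      # bytes
ZETTABYTE = 10**21    # bytes

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9


def seconds_to_years(seconds):
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


# A billion seconds, expressed in years.
billion_seconds_years = seconds_to_years(1e9)

# Time to type an exabyte at one two-byte letter per second.
exabyte_years = seconds_to_years(EXABYTE / 2)

# Bytes per second needed to reach a zettabyte over the age of the universe.
zettabyte_rate = ZETTABYTE / (AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)

print(f"A billion seconds: {billion_seconds_years:.1f} years")
print(f"An exabyte at 2 bytes/s: {exabyte_years / 1e9:.1f} billion years")
print(f"A zettabyte since the Big Bang: {zettabyte_rate:.0f} bytes/s")
```

Change the assumed age of the universe by 100 million years either way and the required rate stays above 2,000 bytes per second.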
Those are big numbers. But what does all that data mean in terms of money? Well, let's compare it to Bill Gates' buying power. Last time I looked, Gates was worth $80 billion. That means that if data were on sale for $80 a terabyte, Bill Gates could buy a zettabyte of data.
$80 a terabyte is a reasonable price when you consider that external 1TB hard drives cost around $50. Up the amount a little to account for the value of the data itself, then apply a volume discount (which seems appropriate, since you need a billion 1TB hard drives to make a zettabyte), and you'd have to agree that $80 a terabyte is in the right ballpark.
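The same sum as a small Python sketch, using only the figures quoted in this article ($80 billion net worth, $80 per terabyte, 1.8 zettabytes created in 2011). The variable names are mine, and the prices are illustrative rather than market data.

```python
# Rough check of the Bill Gates purchase, using the article's own figures.
TERABYTE = 10**12     # bytes
ZETTABYTE = 10**21    # bytes

net_worth_usd = 80 * 10**9    # $80 billion, as quoted above
price_per_tb_usd = 80         # assumed price per terabyte
data_2011_zb = 1.8            # zettabytes created in 2011, as quoted above

# How many terabytes the money buys, and what that is in zettabytes.
terabytes_bought = net_worth_usd // price_per_tb_usd
zettabytes_bought = terabytes_bought * TERABYTE / ZETTABYTE

# Share of 2011's data that the purchase would cover.
share_of_2011 = zettabytes_bought / data_2011_zb

print(f"Terabytes bought: {terabytes_bought:,}")
print(f"Zettabytes bought: {zettabytes_bought}")
print(f"Share of 2011's data: {share_of_2011:.0%}")
```

At these prices the purchase covers a little more than half of the 1.8 zettabytes.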
So Bill Gates really could buy a majority share of all the data produced in 2011. But then what would he do with all that data? That's the same question many organisations are asking today: "We know how to collect gobs of data, but then what do we do with it?"
Unfortunately, many companies are doing very little with all the information they're accumulating. Worse still, some organisations do make use of their data, but in all the wrong ways. They wind up making bad decisions based on faulty analysis.
So how can an organisation put in place structures to ensure they make good use of the data they collect? That's not easy.
While many organisations understand how to collect the data, few know how to store the information in ways that allow them to do meaningful searches. To do that, one must catalogue and index information as it comes in. Still fewer organisations put together the right set of tools to allow them to analyse the data once it's catalogued and indexed.
But most of all, to do the right things with all the data you collect, you need real human beings to ask the right questions and draw reasonable inferences from the answers to those questions. You have to find people in your organisation who understand three things: data science, your industry, and your company strategy. Since these kinds of insights rarely reside in a single person, you have to put together a team of people whose complementary skills allow them to do the job.
And don't forget: those people have to get along, which is not easy given their different backgrounds and personality types.