When I look back at my three decades working in data management it is intriguing to consider what has changed and what has remained the same. In the early 1980s the relational database began to gain commercial traction (Oracle appeared in 1979), making it far easier for programmers to access data without having to specify the navigation path to use. As a former IMS database administrator I can assure you that this was indeed an advance, whatever some of my sceptical colleagues thought at the time. I also recall working at Esso on an early data dictionary in which we stored business definitions of data, in order to try to improve consistency amongst the multiple database applications that we had developed.
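The shift from navigational to declarative access can be sketched in a few lines. This is purely illustrative, using Python's built-in sqlite3 module and a made-up employees table rather than anything from my IMS days: the point is that the query states what data is wanted, and the database engine, not the programmer, works out how to reach it.

```python
import sqlite3

# In-memory database; the table and its contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", "IT"), ("Grace", "IT"), ("Edgar", "Sales")])

# Declarative access: state WHAT is wanted; no navigation path is specified.
rows = conn.execute(
    "SELECT name FROM employees WHERE dept = ? ORDER BY name",
    ("IT",)).fetchall()
it_staff = [r[0] for r in rows]
```

In a navigational database the programmer would instead have to walk the physical record hierarchy segment by segment; here the access path is left entirely to the engine.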
Database technology has continued to innovate considerably since those days, from the development of OLAP cubes for decision support data, through columnar databases that process business intelligence queries more efficiently, to massively parallel processing (MPP) approaches that bring more computing power to bear on burgeoning database sizes.
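Why a columnar layout suits business intelligence queries can be shown with a toy sketch in plain Python; the table and figures below are invented for illustration, not drawn from any product mentioned here. An aggregate over one column in a row store must touch every field of every row, whereas a column store keeps each column contiguous and reads only the values it needs.

```python
# Row-oriented layout: each record holds all of its fields together.
row_store = [
    {"region": "EMEA", "sales": 120},
    {"region": "APAC", "sales": 95},
    {"region": "EMEA", "sales": 80},
]
# A SUM over sales still iterates whole records, field by field.
total_from_rows = sum(r["sales"] for r in row_store)

# Column-oriented layout: each column is stored contiguously, so an
# aggregate over sales scans only the sales column.
column_store = {"region": ["EMEA", "APAC", "EMEA"],
                "sales": [120, 95, 80]}
total_from_columns = sum(column_store["sales"])
```

Both layouts give the same answer, of course; the difference is in how much data must be scanned, which is why analytic workloads favour the columnar form.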
Data volumes have of course grown to fill all the storage that advances in technology have provided. It was tricky enough to keep up with expanding enterprise data volumes. Consumer-generated data via the Internet, and in particular machine-generated data from sensors in a vast range of devices, has greatly exacerbated the problem we now know as Big Data.
New approaches to data storage such as Hadoop have sprung up as traditional relational databases have struggled to keep up. Indeed the last few years have seen an explosion of innovation in databases: in-memory approaches, NoSQL, graph databases and more have appeared in response to our insatiable desire to consume data. Much as we may complain about slow response times now and then, the core database and storage technology has actually done a pretty impressive job of keeping up with demand. In 1997 (according to the Winter Corporation, who measure such things), the largest commercial database in the world was 7TB in size, and this had only grown to about 30TB by 2003. Yet this figure more than tripled by 2005 to 100TB, and by 2008 the first petabyte-sized database appeared – a tenfold increase in three years. This had risen to 23 petabytes by 2012. The fact that we have managed to deal with such an explosion of data volumes is a testament to the sheer inventiveness of data science.
The striving for consistency of data definitions has not changed since my early days working on a data dictionary project. Numerous attempts have been made to eliminate duplication of data and to stamp out multiple, inconsistent versions of it. Data dictionaries drifted out of line with the core systems they described, and the ERP wave promised to sweep all other data versions away but failed. More recently, master data management has tried another tack, trying in assorted ways to keep track of data inconsistency and partially resolve it. However, even after a decade of trying, few organisations can truly say that they have put the master data genie back in the bottle. This is not just due to limitations in technology but is caused by the ways companies are organised, with different areas of the business able to set up their own databases when they become frustrated by the inability of central IT to respond quickly enough to their needs. This is compounded by the very human desire for control: owning data is far more appealing than giving it up. When "knowledge is power" it is tough to change the business practices that effectively encourage the development of unconnected islands of data. The rise of the discipline of data governance is a recognition of the problem, but it faces an uphill battle in most organisations to rein back control of data. The struggle continues, but attaining consistency of data remains as much of a challenge today as it did 30 years ago. Tackling the vagaries of human nature is tougher than writing faster code and developing more efficient storage.
So, three decades on we see that data management has come a very long way and that much has been achieved, but many issues remain. The rapid innovation that we have seen in the last few years shows that human ingenuity will continue to be applied to the ongoing challenges that data presents, simply because it has become so important to our everyday lives. As we move to an increasingly digital world, data management will become ever more important, and remain an exciting field in which to work.
This column has been running for six and a half years, but this will be the final one. On a personal note I would just like to say thank you to all those who have read my articles over the years; I hope that they have been useful to you.