Over the years I have been involved with a lot of projects that involve the management of data. Large firms perpetually struggle to get reliable, consistent, timely and accurate management information, and the industry has made numerous attempts to fix this.

Data warehouses made some headway, yet in reality most large companies today do not just have one enterprise data warehouse, but several. ERP was going to tackle the problem root and branch by ripping out the labyrinth of operational applications and replacing them with just one. However, this was a pipe dream: the scope of even the largest ERP vendor does not extend to every system that you need, and practical limitations meant that large companies ended up putting in not just a single ERP instance, but in some cases hundreds.

Even tackling the problem as it related to just one type of data, namely customer data, has proved problematic. CRM had its heyday but after everyone had put Siebel in, there were still other sources of customer data. A survey my company did in 2008 found that, on average, a large organisation has six systems that produce and maintain master customer data, which is five more than the original CRM vision.

Just four per cent of companies in the survey had a single source of customer data, and one had no fewer than 300 sources.

This diversity of data in large enterprises is a major source of difficulty. In the same survey, two-thirds of large firms admitted that analysts spent more time reconciling data than analysing it, while barely half could quickly calculate supplier spending across divisions, or even calculate profit margins consistently.

At one manufacturing company I worked with, a sizeable number of contracts turned out to be loss-making because profit margins were calculated inconsistently. Needless to say, these were the contracts that had been growing in scale, causing serious erosion of profits.
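To make the margin problem concrete, here is a minimal sketch in Python. The figures, the cost categories and both margin definitions are invented for illustration and are not taken from the company in question. It simply shows how one contract can look healthy under one division's definition and loss-making under another's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A hypothetical contract with illustrative figures."""
    name: str
    revenue: float
    material_cost: float
    freight_cost: float
    rebate: float  # amount paid back to the customer


def margin_division_a(c: Contract) -> float:
    """Assumed definition A: gross revenue less materials only."""
    return (c.revenue - c.material_cost) / c.revenue


def margin_division_b(c: Contract) -> float:
    """Assumed definition B: revenue net of rebates, less materials and freight."""
    net_revenue = c.revenue - c.rebate
    return (net_revenue - c.material_cost - c.freight_cost) / net_revenue


contract = Contract(
    name="Example contract",
    revenue=1000.0,
    material_cost=900.0,
    freight_cost=80.0,
    rebate=50.0,
)

# Same contract, two definitions: the sign of the margin differs.
print(f"Division A view: {margin_division_a(contract):.1%}")
print(f"Division B view: {margin_division_b(contract):.1%}")
```

Neither definition is wrong in itself. The trouble starts when both are in use and nobody has agreed which one the business will report on.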

So if data warehousing, ERP and CRM did not fix the issue, what will? One reason why such initiatives did not fully solve the information maze was sheer complexity, but a key one was that in many cases the IT department was seen as responsible for sorting out the data mess. Since IT departments typically have limited political clout, the more powerful business departments went on creating their own systems and perpetuating the diversification of data.

One thing that I do find encouraging is the recent emphasis on data governance: the process by which lines of business collaborate to define how common or shared data will be owned, propagated and controlled.

It is only by putting the ownership of the data diversification problem firmly back where it belongs, with the business, that we will ever make progress. Businesses need to understand that, while IT systems may hold all this competing data, the business is the owner of these systems.

It is not practical to expect IT departments to mediate between business units over the ‘correct’ version of the international product hierarchy, the format of customer details, or the definition of gross profit margin.

A survey we have just completed shows that one third of the large organisations surveyed had a data governance organisation, with a further third in some form of pilot or investigative stage. Those that had one usually had a two-tier structure: a data governance steering team with senior business and IT representation, and a working group sorting out data standards. The median number of full-time equivalent staff engaged in such activities was eight, suggesting a non-trivial level of commitment in those companies.

The greater emphasis on data governance, often in concert with technology projects aimed at better controlling common or shared data (‘master data’ in the parlance), seems to me a positive first step. Projects that deal with data ownership cannot succeed if they are hived off to IT.

For too long business divisions and subsidiaries have had a blinkered view of data needs, looking only at their own requirements and commissioning new systems rather than working across functional boundaries to agree on common data definitions and ownership.

Admitting that they cannot control their addiction to building new systems that generate shared data is a first step. It remains to be seen whether companies will complete their rehabilitation and begin to tame master data. This will be a long and painful process that will include making an inventory of all the systems that maintain such data, and slowly taking back control.
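As a rough idea of what the first pass of such an inventory might look like, here is a small Python sketch. The system names, data domains and owning departments are all hypothetical. A real inventory would be gathered from the business units themselves and would record far more detail.

```python
from collections import defaultdict

# Hypothetical inventory: (system, master data domain it maintains, owning department)
inventory = [
    ("CRM", "customer", "Sales"),
    ("Billing", "customer", "Finance"),
    ("Web shop", "customer", "Marketing"),
    ("ERP", "product", "Operations"),
    ("Procurement", "supplier", "Purchasing"),
    ("ERP", "supplier", "Operations"),
]


def systems_by_domain(records):
    """Group the systems (and their owners) under each data domain."""
    grouped = defaultdict(list)
    for system, domain, owner in records:
        grouped[domain].append((system, owner))
    return dict(grouped)


def contested_domains(records):
    """Return only the domains maintained by more than one system."""
    return {
        domain: systems
        for domain, systems in systems_by_domain(records).items()
        if len(systems) > 1
    }


for domain, systems in contested_domains(inventory).items():
    names = ", ".join(f"{system} ({owner})" for system, owner in systems)
    print(f"{domain}: {len(systems)} systems -> {names}")
```

The list of contested domains is only a starting point. Deciding which department owns each one is the governance work, and no script will do that for you.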
