Data quality is not the most glamorous area of technology. If technology were a dance, data quality would sit on the sidelines getting little attention, looking on while mobile app development, big data and even dull old security got the admiring glances.
Let’s face it, no ambitious young executive dreams of going home after work and boasting “Hey, today I got promoted to data quality manager”.
Yet data quality is causing all kinds of problems for corporations.
A few of these make headlines, such as in 1999 when the $125m Mars Climate Orbiter spacecraft was lost due to a mix-up between imperial and metric units.
Others seem comical. One major telephone company, in a scene reminiscent of the movie Spinal Tap, ordered boxes for its new range of phones with dimensions in centimetres rather than millimetres. The problem only came to light when lorries started rolling into warehouses carrying boxes 1,000 times too large in volume, a €25m mistake.
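The scale of that mistake follows directly from the geometry: reading centimetres as millimetres inflates each dimension tenfold, and volume is the product of all three. A minimal sketch, with illustrative box dimensions of my own choosing rather than the company's actual figures:

```python
# Volume is the product of the three linear dimensions, so a 10x error
# in each dimension compounds to a 10**3 = 1000x error in volume.
def volume(length, width, height):
    return length * width * height

intended = volume(150, 70, 30)      # intended phone box, in mm
delivered = volume(1500, 700, 300)  # same numbers mistaken for cm, i.e. 10x in mm
print(delivered / intended)         # 1000x too much cardboard
```

The point is simply that a small-looking unit error does not scale linearly: it multiplies through every dimension it touches.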
Most data quality issues are more mundane: old addresses for customers, incorrect product codes and wrong inventory details.
Yet all these seemingly small issues add up to a very large data quality problem.
In 2010 I conducted a survey through my firm, The Information Difference, asking participants what proportion of their master data management budgets was allocated to data quality. The average was 10 per cent; the problem is that the same companies admitted that what they actually spent dealing with data quality was not 10 or even 20 per cent but a huge 30 per cent of their original project budget.
In another 2010 survey of 192 companies, just 12 per cent of respondents rated their data quality good or better.
Dr Thomas Redman’s 2001 book Data Quality: The Field Guide estimated that the costs of poor data quality added up to the equivalent of around 10 per cent of the sales of a typical company.
Why does a problem of this scale remain?
In that same 2010 survey, a third of respondents admitted that they had no data quality programme of any kind.
Yet data quality software solutions abound, with dozens of products on the market, plus an army of consultants who are more than willing to advise companies on how to improve data quality.
I believe that part of the problem is simple lack of awareness, not helped by a software industry that has focused most of its attention on just a few aspects of data quality.
With many companies having no data quality programme at all, and only 30 per cent (in the same survey) actually measuring the costs of poor data quality, the problem remains below the senior management radar.
People are aware that there are issues, but there is an assumption that the problem is minor and intractable.
Attention is devoted to it only when there is visible failure. As Henry Ford said: “Quality means doing it right when no one is looking”.
A key reason why the problem persists is down to human nature: a telesales person taking an order cares a lot about getting the financial details of the customer, because that is how their commission is set.
Yet that same salesperson may care much less about whether that customer already has an existing account, whether the delivery address is accurate, or whether the customer is creditworthy, as that is an SEP: someone else’s problem.
If a person working in a company is collecting data but does not feel the impact of it, it is human nature that they will not be as careful about it as with data that directly affects them.
For example, payroll data in a company is pretty accurate, because employees will soon complain if they are not paid enough or on time.
Yet for most data that is entered there is little immediate impact on the individual.
The software industry has focused data quality solutions almost entirely on name and address data, because this is a problem that every company has and because it is relatively easy to write software to fix it.
There are plenty of published algorithms for determining common typing errors in names, so if you knock up a user interface then you have a data quality product.
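Edit distance is one of the best known of those published algorithms: it counts the minimum number of single-character changes separating two strings, so a likely typo scores low while genuinely different names score high. A minimal sketch of the classic Levenshtein version, with example names that are my own rather than from any real dataset:

```python
def levenshtein(a, b):
    # Dynamic-programming edit distance: the minimum number of
    # single-character insertions, deletions and substitutions
    # needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("Jonathon", "Jonathan"))  # -> 1: one letter off, probably a typo
print(levenshtein("Smith", "Taylor"))       # -> 6: clearly different names
```

A data quality tool wraps logic like this (plus phonetic codes, nickname tables and address standardisation) behind a user interface, which is exactly why name-and-address cleansing was the easy corner of the market to build products for.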
There are far fewer software solutions aimed at data quality of other domains such as product data or financial data, as these are tougher to code for.
Yet given the scale of the problem, it seems to me that the software industry is missing a trick here.
It may be a tough sell, but the rewards for companies successfully tackling the data quality issue will be considerable.