A recurring theme among enterprises is the denial of real-world architectural complexity. CIOs love PowerPoint slides that show a spaghetti diagram of the messy applications and interfaces they have now, then a clean diagram of just a few boxes with neat interfaces showing the Nirvana that will occur once they finish the latest project, for which they are seeking an improbably large budget.

Vendors do much the same thing, producing point solutions which address some genuine but limited problem and paint a pretty picture of the world after you have implemented their solution, glossing over the pesky problem of removing legacy applications that are the cause of the current complexity.

An example of this is in an area in which I specialise, master data management. Large companies have numerous overlapping systems which generate key data about the context of their business transactions: information on customers, products, suppliers and so on. This information is known as master data, and the trouble is that a typical large company has a median of six systems generating customer data and nine generating product data.

Cue the software industry, which will happily sell you master data hubs or repositories that purport to offer a single source for all the previously competing master data. Of course, if all you do is add this new repository, you have just put in one further layer of complexity. What is also needed is a process by which the existing ‘legacy' master data sources are either retired, or at least have their competing definitions and data merged before being consolidated in the new, shiny master data repository. At that point you have one repository to rule them all, and your problems are over.
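To make the merging step concrete, here is a minimal sketch of the kind of ‘survivorship' rule a consolidation project might apply when competing legacy sources describe the same customer. The source names, field names and priority ordering are all illustrative assumptions, not features of any particular product.

```python
# Hypothetical sketch: consolidating customer records from competing
# legacy sources into a single "golden record". Source names, fields
# and the survivorship rule are illustrative assumptions.

LEGACY_SOURCES = [  # highest-priority source listed first
    ("crm",     {"id": "C-1001", "name": "Acme Corp",     "country": "GB", "phone": None}),
    ("billing", {"id": "B-778",  "name": "ACME CORP.",    "country": None, "phone": "+44 20 7946 0000"}),
    ("erp",     {"id": "E-42",   "name": "Acme Corp Ltd", "country": "GB", "phone": None}),
]

def consolidate(records):
    """Survivorship rule: for each field, keep the first non-empty value
    supplied by the highest-priority source."""
    golden = {}
    for _source, rec in records:
        for field, value in rec.items():
            if field != "id" and value and field not in golden:
                golden[field] = value
    return golden

golden = consolidate(LEGACY_SOURCES)
# golden == {"name": "Acme Corp", "country": "GB", "phone": "+44 20 7946 0000"}
```

Even this toy version shows why the step is contentious: someone has to decide whose definition of ‘name' or ‘country' wins, which is a political question as much as a technical one.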


This vision skirts over a few details. Firstly, because of their origins, the repositories are frequently good at handling only one kind of data: product, customer or, in some cases, supplier. Even if you ignore this, the sheer scale of the problem in a large company is daunting. If you really have a single source for all your master data, then it will very likely need to operate in real time: you cannot, for example, have a salesperson adding a new customer account without first checking with the repository whether the customer already has an account.
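The real-time check described above might look something like the following sketch, in which a sales system asks the central repository for a match before creating a new account. The repository interface and the crude name-matching rule are assumptions for illustration only.

```python
# Hypothetical sketch of a real-time duplicate check against a central
# master data repository. The interface and matching rule are
# illustrative assumptions.

def matching_key(name: str) -> str:
    """Crude matching key: lower-case, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

class MasterDataRepository:
    def __init__(self):
        self._customers = {}  # matching key -> account id

    def find_account(self, name: str):
        return self._customers.get(matching_key(name))

    def create_account(self, name: str, account_id: str) -> str:
        existing = self.find_account(name)
        if existing is not None:
            return existing  # customer already exists: reuse that account
        self._customers[matching_key(name)] = account_id
        return account_id

repo = MasterDataRepository()
first = repo.create_account("Acme Corp.", "A-001")  # genuinely new
second = repo.create_account("ACME CORP", "A-002")  # duplicate caught
# first == second == "A-001"
```

The point of the sketch is the operational consequence: every account-creation path in every application now depends on a synchronous call to one central service, which is what drives the scale and failover demands discussed next.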

One large company owns up to over 600 major applications, which gives a sense of the scale of such a repository and of the failover requirements for so critical a database. The technical issues are only part of the problem: the political reality of persuading the owners of the current applications to switch off their existing functionality, or to submit to a new centralised approach that truly standardises data definitions, is likely to defeat all but the most regimented and centralised firms.

A more realistic view of the world would imply an approach where a linked federation of master data hubs is deployed, each tackling a more manageable problem; say, the master data within a group of countries or region. Or perhaps a project could tackle the global data for a particular domain, say global accounts or global brands, leaving other systems to handle local accounts or products. This approach implies a series of smaller hubs with less dramatic scale and operational needs, but which will still need to be connected together in some fashion in order to avoid a new round of duplication.
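The federated approach above can be sketched as a thin routing layer over a set of smaller hubs, each owning one region's slice of the master data. The hub names, record layout and reconciliation behaviour here are assumptions for illustration; the point is that the federation layer is itself software that someone has to build.

```python
# Hypothetical sketch of a linked federation of master data hubs:
# regional hubs each own part of the data, and a routing layer queries
# them all so cross-hub duplicates stay visible. Names and record
# shapes are illustrative assumptions.

class RegionalHub:
    def __init__(self, region: str, customers: dict):
        self.region = region
        self._customers = customers  # customer name -> record

    def lookup(self, name: str):
        return self._customers.get(name)

class Federation:
    def __init__(self, hubs):
        self.hubs = hubs

    def lookup(self, name: str) -> dict:
        """Return every hub's view of the customer, tagged by region,
        rather than silently picking one and hiding the duplication."""
        results = {}
        for hub in self.hubs:
            record = hub.lookup(name)
            if record is not None:
                results[hub.region] = record
        return results

federation = Federation([
    RegionalHub("emea", {"Acme Corp": {"account": "EU-9"}}),
    RegionalHub("apac", {"Acme Corp": {"account": "AP-3"}}),
])
views = federation.lookup("Acme Corp")
# views == {"emea": {"account": "EU-9"}, "apac": {"account": "AP-3"}}
```

Each hub stays at a manageable scale, but the federation layer must still decide how to reconcile the competing regional views, which is exactly the connective software the next paragraph notes that vendors do not supply.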

This need for a managed federation approach to master data came out in some recent research my firm conducted, with just over 20 per cent of firms that had deployed master data repositories already taking this approach. This must be a tough road to go down, since software vendors provide virtually no support for such an architecture, instead selling the simple ‘stuff all the data in our single giant box' approach, which has the convenience of fitting on a single PowerPoint slide and avoids them having to develop software to manage a linked federation of repositories.

Since many CIOs also live in a single PowerPoint slide world of architecture, there seems to be little resistance to this, and so most master data projects today are gaily implementing architectures with a monolithic hub, usually just ‘starting' with a small subset of data, in the vague hope that one day this will somehow be extended to all data types in all the subsidiaries of the company.

Few people seem overly fussed, least of all the systems integrators, who will make out like bandits sorting out the resulting mess. The trouble is that in order to tackle the issues, CIOs would have to own up to the business about what a shambles their application landscapes really are. I can't see this happening, so many large firms seem doomed to roll out a new layer of systems that at best tackle a very small part of the underlying data problem, and at worst actually add to it.