In March 2013, I read with great interest the results of the University of Cambridge analysis of some 58,000 Facebook profiles. The analysis predicted undisclosed information about the profile owners, such as gender, sexual orientation, and religious and political leanings.
In one of the biggest studies of its kind, scientists from the university’s psychometrics team developed algorithms that were 88 per cent accurate in predicting male sexual orientation, 95 per cent for race and 80 per cent for religion and political leanings. Personality types and emotional stability were also predicted with accuracy ranging from 62 to 75 per cent. The experiment was conducted over the course of several years through their MyPersonality website and Facebook Application. You can sample a limited version of the method for yourself at http://www.YouAreWhatYouLike.com.
Not surprisingly, Facebook declined to comment on the analysis, but I doubt this information is news to anyone at Facebook. In fact, it’s just the tip of the iceberg. Facebook, as a global technology company (arguably a global data company), has without a doubt far more complex algorithms trawling, interrogating and manipulating its vast and disparate data warehouses, striving to give its demanding user base ever richer, more unique and distinctly customised experiences.
As an IT leader, I’d have had to be living under a rock to have missed the Big Data buzz. Vendors, analysts, well-intentioned executives and even my own staff – everyone seems to have a Big Data opinion lately, and most of those opinions imply that I should spend my budget on Big Data.
It’s been clear to me for some time that we are no longer in the age of ‘what’s possible’ when it comes to Big Data. Big Data is big business and the companies that can unlock, manipulate and utilise data and information to create compelling products and services for their consumers are going to win big in their respective industries.
Data flow around the world and through organisations is increasing exponentially and becoming highly complex; we’re dealing with greater and greater demands for storing, transmitting and processing it. But in my opinion, all that is secondary. What’s exciting are the developments that enable better customer service and bespoke consumer interactions, significantly increasing value along all our service lines in a way that was simply not possible just a few years ago. This is what’s truly compelling. Big Data is just a means to an end, and I question whether we’re losing sight of that in the midst of all the hype.
Why do we want bigger or better data? What is our goal? What does success look like? How will we know if we have attained it? These are the important questions and I sometimes get concerned that – like so often before in IT – we’re rushing (or being pushed by consultants and solution providers alike) to find solutions, tools and products before we really understand the broader value proposition.
Let’s not be a solution in search of a problem. We’ve been down that supply-centric road too many times before.
For me it’s simple – innovation starts with demand, and demand is the force that drives innovation. It all starts with a problem that needs solving, a value experience for our customers. Only through a deep understanding of what value means to the customer can we truly be effective in searching out solutions. This understanding requires an open mind and the innovative resolve to challenge the conventions of ‘How we’ve always done it’.
Candidly, I hate the term Big Data. It is marketing verbiage, popularised by analysts such as Gartner, which covers a broad ecosystem of problems, tools, techniques, products, and solutions. If someone suggests you have a Big Data problem, that doesn’t say much, as arguably any company operating at scale, in any industry, will have some sort of challenge with data. But beyond tagging all these challenges with the term ‘Big Data’, you’ll find little in common across diverse industries, products or services.
Given this diversity across industry and within organisations, how do we construct anything resembling a Big Data strategy?
We have to stop thinking about the supply of Big Data tools, techniques and products peddled by armies of over-eager consultants and solution providers. For me, technology simply enables a business proposition. We need to look upstream, to the demand. This demand presents itself in business terms. For example, in financial services you might ask:
• Who are our most profitable customers and, most importantly, why?
• How do we increase customer satisfaction and drive brand loyalty?
• How do we take excess and overbearing processes out of our supply chain and speed up time to market?
• How do we reduce losses to fraud without increasing compliance and control costs?
Importantly, asking these questions may or may not lead us down a Big Data road. But we have to start by asking them.
Similarly, the next set of questions is also not about specific solutions but about framing the demand and the potential solutions:
• How do we understand the problem today? How is it measured? What would improvement look like?
• What works in our current approach, in terms of the business results? What doesn’t? Why? What needs to improve?
• What are the technical limitations in our current platforms? Have new techniques and tools emerged that directly address our current shortcomings?
• Can we develop an experimental approach to test these new techniques, to confirm that they truly deliver an improvement?
• Having conducted the experiment, what did we learn? What should we abandon, and what should we move forward with?
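To make the last two questions concrete, here is a minimal sketch of how the outcome of such an experiment might be judged. It assumes the business measure is a simple success rate (for example, the share of customer queries resolved at first contact) and compares the current approach with the trial using a normal-approximation confidence interval. The figures, function names and the three-way verdict are illustrative assumptions, not results from any real programme.

```python
from math import sqrt
from statistics import NormalDist


def compare_rates(base_success, base_total, trial_success, trial_total,
                  confidence=0.95):
    """Return (difference, lower, upper) for trial rate minus baseline rate."""
    p_base = base_success / base_total
    p_trial = trial_success / trial_total
    diff = p_trial - p_base
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_base * (1 - p_base) / base_total
              + p_trial * (1 - p_trial) / trial_total)
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se


def verdict(lower, upper):
    """Translate the interval into a plain decision."""
    if lower > 0:
        return "move forward"
    if upper < 0:
        return "abandon"
    return "inconclusive: keep measuring"


# Hypothetical figures: 120 of 2,000 successes today, 165 of 2,000 in the trial.
diff, lower, upper = compare_rates(120, 2000, 165, 2000)
decision = verdict(lower, upper)
print(f"uplift {diff:.2%} (interval {lower:.2%} to {upper:.2%}): {decision}")
```

The point of the interval is the one made later about confidence levels: a positive difference on its own is not enough, and the decision to move forward rests on whether the whole plausible range sits above zero.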
Cycle of success
There’s a system to this. Once we go through the above process, we start the cycle over. In a nutshell, it’s the process of continuous improvement. Some of you will recognise the well-known cycle of Plan, Do, Check, Act (PDCA) in the above.
Continuous improvement and PDCA are interesting, in that they are essentially scientific methods applied to business. One of the notable components of the Big Data movement is the emerging role of the data scientist.
The data scientist can help you assess your options, walk you through the process of defining your business problems and help solve them through innovative analytics.
But what constitutes a data scientist? It’s not a well-defined position, but here is what an ideal candidate would bring:
• Hands-on experience with building and using large and complex databases (relational and non-relational), and in the fields of data architecture and information management more broadly;
• Solid applied statistical training, grounded in a broader context of mathematical modeling;
• Exposure to continuous improvement disciplines and industrial theory;
• Most importantly, real world operational experience and a functional understanding of whichever industry is paying their salary – theory is valuable, but scar tissue from experience is essential.
This person should be able to model data, translate that model into a physical schema, load that schema from sources, and write queries against it. But that’s just the start, as one semester of introductory stats isn’t enough. They need to know what tools to use and when, and the limits and trade-offs of those tools. They need to be rigorous in their understanding and communication of confidence levels in their models and findings, and cautious of the inferences they draw.
Some of the data scientist’s core skills are transferable, especially at the entry level. But at higher levels, they need to specialise. Vertical industry problems are rich, challenging, and deep. For example, an expert in call centre analytics would most certainly struggle to develop comparable skills in supply chain optimisation or workforce management.
Ultimately, data scientists need to be experimentalists – true scientists with an insatiable sense of curiosity engaged in a quest for knowledge on behalf of their company or organisation. They should be engaged in a continuous cycle of examining the current reality, developing and testing hypotheses and delivering positive results for broad implementation so the cycle can begin again.
Across the board
There are many sectors we can apply Big Data techniques to, with financial services, manufacturing, retail and energy among them. There are also common functional domains across the sectors: human resources, customer service, corporate finance, and even IT itself.
IT is particularly interesting as it’s the largest consumer of capital in most enterprises. IT represents a set of complex concerns that are not well understood in many enterprises: projects, vendors, assets, skilled staff and intricate computing environments. All these come together to (hopefully) deliver critical and continuous value in the form of agile, stable and available IT services for internal business stakeholders, and most importantly external customers.
Given the criticality of IT, it’s surprising how poorly managed IT is in terms of data and measurement. Does IT represent a Big Data domain? Absolutely. From the variety of IT deliverables and artefacts and inventories, to the velocity of IT events feeding management consoles, to the volume of archived IT logs, IT itself is challenged by Big Data.
IT is a microcosm of many business models. We in IT don’t do ourselves any favours starting from a supply perspective here, either. IT’s legitimate business questions include:
• Are we getting the IT we’re paying for?
• Do we have unintentional redundancy in what we’re buying?
• Are we paying for services not delivered?
• Why did that high-severity incident occur and can we begin to predict incidents?
• How agile are our systems?
• How stable, and how available?
• Is there a trade-off between agility, stability and availability?
• How can we increase all three of the above?
With the money spent on IT and its operational criticality, data scientists can deliver value here as well. The method is the same: understand the current situation, develop and test new ideas, implement the ones that work and watch results over time as they are input into the next round.
For example, the IT organisation might be challenged by a business problem of poor stakeholder trust due to real or perceived inaccuracies in IT cost recovery. In turn, it is then determined that these inaccuracies stem from poor data quality for the IT assets on which cost recovery is based.
Organisations need to know what confidence a model merits, and if data quality cannot be improved, a model remains more uncertain. But often, the quality can be improved. Asking why – perhaps repeatedly – may uncover key information that assists in turn with developing working and testable hypotheses for how to improve. Perhaps adopting master data management techniques pioneered for customer and product data will assist. Perhaps measuring the IT asset data quality trends over time is essential to improvement – people tend to focus on what is being measured and called out in a consistent way. Ultimately, this line of inquiry might result in the acquisition of a toolset like Blazent, which provides IT analytics & data quality solutions enabling a true end-to-end view of the IT ecosystem. Blazent is a toolset we’ve deployed at Barclays to great effect.
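As an illustration of measuring IT asset data quality trends over time, the sketch below scores monthly snapshots of an asset inventory by completeness, meaning the share of records with every required field populated. The field names, the sample records and the choice of completeness as the metric are all assumptions made for the example; it is not how Blazent or any other product works.

```python
# Hypothetical fields that cost recovery is assumed to depend on.
REQUIRED_FIELDS = ("owner", "cost_centre", "location")


def completeness(records):
    """Share of records in which every required field is populated."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(field) for field in REQUIRED_FIELDS)
    )
    return complete / len(records)


def quality_trend(snapshots):
    """Return (month, completeness) pairs in chronological order."""
    return [(month, round(completeness(records), 2))
            for month, records in sorted(snapshots.items())]


# Invented monthly snapshots of the same four assets.
snapshots = {
    "2013-02": [
        {"owner": "retail", "cost_centre": "CC10", "location": "LDN"},
        {"owner": "retail", "cost_centre": "CC10", "location": "LDN"},
        {"owner": "cards", "cost_centre": "CC22", "location": "MAN"},
        {"owner": "", "cost_centre": "CC31", "location": "MAN"},
    ],
    "2013-01": [
        {"owner": "retail", "cost_centre": "CC10", "location": "LDN"},
        {"owner": "retail", "cost_centre": "", "location": "LDN"},
        {"owner": "cards", "cost_centre": "CC22", "location": "MAN"},
        {"owner": "", "cost_centre": "CC31", "location": None},
    ],
}

trend = quality_trend(snapshots)
for month, score in trend:
    print(month, f"{score:.0%}")
```

Publishing a figure like this every month is what makes the trend visible, and it gives each hypothesis about how to improve the data something to be tested against.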
Similarly, a data scientist schooled in data management techniques and with an experimental, continuous improvement orientation might look at an organisation’s recurring problems in diagnosing and fixing major incidents and recommend that analytics be deployed against the terabytes of logs accumulating every day, both to improve root cause analysis and ultimately to proactively predict outage scenarios based on previous outage patterns. Vendors like Splunk and Prelert might be brought in to assist with this problem at the systems management level. SAS has worked with text analytics across incident reports in safety-critical industries to identify recurring patterns of issues.
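A deliberately simple sketch of the idea behind mining logs for outage patterns follows. It assumes log lines have already been reduced to a timestamp and an event signature, and it counts how many past outages were preceded by each signature within a fixed window. The signatures, times and 30-minute window are invented for illustration, and this is not a description of how Splunk, Prelert or SAS approach the problem.

```python
from collections import Counter
from datetime import datetime, timedelta


def precursor_counts(events, outages, window=timedelta(minutes=30)):
    """Count, per signature, how many outages it preceded within the window.

    events:  list of (timestamp, signature) pairs parsed from logs
    outages: list of outage start timestamps
    """
    counts = Counter()
    for outage in outages:
        # A set ensures each signature counts at most once per outage.
        seen = {signature for timestamp, signature in events
                if outage - window <= timestamp < outage}
        counts.update(seen)
    return counts


# Invented events and outages for a single day.
day = datetime(2013, 6, 3)
events = [
    (day.replace(hour=9, minute=40), "disk_latency_high"),
    (day.replace(hour=9, minute=55), "conn_pool_exhausted"),
    (day.replace(hour=13, minute=0), "cache_miss_spike"),
    (day.replace(hour=15, minute=45), "conn_pool_exhausted"),
]
outages = [day.replace(hour=10), day.replace(hour=16)]

counts = precursor_counts(events, outages)
for signature, hits in counts.most_common():
    print(f"{signature}: preceded {hits} of {len(outages)} outages")
```

A signature that precedes most outages is a hypothesis rather than a root cause. It still has to be checked against how often the same signature appears when nothing goes wrong.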
An end in sight
It all starts with business benefit and value. The Big Data journey must begin with the end in mind, and not rush to purchase vehicles before the terrain and destination are known.
A data scientist, or at least someone operating with a continuous improvement mindset who will champion this cause, is an essential component. So rather than just talking about Big Data, let’s talk about ‘demand-driven data science’. If we take that as our rallying cry and driving vision, we’ll go much further in delivering compelling, demonstrable and sustainable value in the end.
About the author:
Anthony Watson is Managing Director and CIO of Europe Middle East Retail & Business Banking at Barclays