Much of the current interest in 'Big Data' has focused on analysing corporate data, such as web advertising data or sensor data gathered by devices in cars, planes and industrial equipment.
The opportunities here are certainly intriguing. Ford’s Energi hybrid cars generate 25GB of sensor data an hour, which is returned to the factory for analysis. One benefit is that this data can reveal when an engine is running poorly, triggering an alert that the car needs servicing. Aviva offers cheaper car insurance to drivers who agree to use a mobile phone app that uses GPS to report on how they brake, accelerate and corner. The better a driver's score based on this data, the bigger the discount.
However, a less talked-about area of Big Data may be every bit as interesting. Governments gather immense amounts of data about their citizens and countries, and are just beginning to open up some of it to the public. What matters about such data is not just whether it is free (or available for a negligible fee) and how freely it may be used, but how machine-readable it is: a table of data in a PDF is far less useful than one that can be opened directly in Excel, for example. GPS data, weather data and census data have long been examples of public data that governments have chosen to share freely, and some countries are now extending this range.
The value of such data can be seen in Monsanto's recent acquisition, for almost a billion dollars, of The Climate Corporation, which collects weather, crop and soil data. In Singapore the government publishes real-time data on available parking spaces, allowing drivers to find somewhere to park more quickly and so reducing congestion. In Sweden the state transportation agency publishes real-time data on train arrival and departure times, allowing shippers and the public to choose optimal routes.
In its October 2013 report Open Data, the McKinsey Global Institute identified over $3 trillion in potential economic value from open data across seven sectors: education, transportation, consumer products, electricity, oil and gas, healthcare and consumer finance. The potential benefits come from new and improved products and services, greater efficiency and price transparency. Open data initiatives are now in place in 40 countries, though their scope varies widely. The UK government now publishes 10,000 public datasets, four times as many as it did four years ago.
Open data is already being used in many different ways. After the 2010 Haiti earthquake, volunteers combined satellite maps, mobile phone data and other sources to create online maps that let aid workers locate refugee camps and medical centres and better plan for the demand on resources.
One area that McKinsey highlights is the potential for improved productivity in education. Personalised learning plans use online lessons that monitor how quickly a student progresses and how often they need hints or have to retake questions. Teachers can use this data to spot students who are struggling and help them. An early use of this approach in a remedial mathematics class in Arizona found higher pass rates and lower dropout rates than conventional teaching.
One of the most interesting areas is healthcare. Although it is early days, there are consumer applications on mobile phones and other devices, such as wristbands, that can monitor exercise and sleep patterns; irregular sleep patterns can be an indicator of anxiety attacks in some patients, for example. Simply making electronic medical records available to doctors should help reduce adverse drug reactions, and there is plenty of room for improvement: even in the USA, only half of physicians use electronic patient records. Patient privacy clearly raises all sorts of issues, but even aggregate data can be extremely useful.
Google has demonstrated that a flu outbreak can be mapped in real time by monitoring the locations and frequency of searches that people make on the subject. One company has fitted GPS trackers to the inhalers of asthma sufferers, allowing patients to be warned of local conditions, such as high pollen counts, that could trigger an asthma attack.
As governments slowly open up more datasets to the public, there will be more and more opportunities for start-up companies to build innovative applications on this data and create value. It is not yet clear who the winners will be; the companies that make the best use of this data and capture the public imagination have probably not been set up yet. However, there is a clear trend towards opening up public data, combined with more scalable and affordable ways to process it using Hadoop and cloud-based storage. The possibilities are just starting to unfold.