GlaxoSmithKline Chief Data Officer Mark Ramsey's career in data has spanned three decades, but his current role may have delivered his biggest achievement yet: an analytics platform that could cut the time taken to discover a new medicine from six to eight years down to 24 months.
"We've created a centralised repository to support analytics across the R&D organisations," he told CIO UK at the DataIQ 100 awards at the Oxo Tower in London.
"It's bringing all of the R&D data together into a centralised platform, and then using that platform to rationalise and curate the data to make it available for scientists and researchers to progress a variety of areas."
The platform uses large-scale data analytics to drive better decisions about the drug discovery pipeline, by allowing the pharmaceuticals giant to test the potential for new drugs before it begins clinical trials.
It provides a single place for all the scientific and operational R&D data generated by GSK or external sources in an analytics-ready form for the company's wide range of users, from scientists to executives.
Breaking down barriers
The information from one trial can be highly useful for another in a different clinical area, but R&D data is typically stored in siloed databases and in a variety of standards and formats.
The platform breaks down these barriers so anyone from GSK can access knowledge from across the business.
"Clinical trials are done horizontally in order to support a new medicine," said Ramsey, who reports to GSK's R&D President.
"We've brought all of those trials together and we've standardised them into an industry ontology. Now the scientists can look across our entire repository of clinical trials and help make decisions on how to improve future trials and also look at that repository of data for other learnings that might be by-products of the original intent."
They now have access to all the historical medical information owned by a company that began life as a London pharmacy in 1715.
Big changes in analytics
Ramsey's own career in analytics spans more than 30 years at some of the world's biggest companies.
Prior to joining GSK, he was the first Chief Data Officer for Samsung Mobile, where he built a data science team that developed the first consumer repository and analytics to support the firm's marketing efforts and a consumer insights programme to better understand user behaviour and product feature usage.
Before that, he spent 18 years in a variety of roles at IBM and 10 years as a technical consultant at Nationwide Insurance, helping clients to build data warehouse systems to develop a deeper understanding of their businesses.
His strategy is to achieve innovation at scale with data analytics, an approach that led him to be named an IBM Master Inventor after more than 50 patent filings.
These patents include one on the integration of data mining within a parallel relational database and another on the use of advanced analytics for user behaviour monitoring.
He has drawn on his experience across this broad range of industries to lead a transformational project that uses AI to create a new clinical working environment built on data.
"Traditional pharmaceutical companies have used a significant amount of statistics to prove or disprove the hypotheses around a clinical trial or experiment," Ramsey said.
"They're now moving into machine learning, deep learning and artificial intelligence to really significantly accelerate some of the decision making that can be made. That's really what will be fuelling the transformation in the industry."
The GSK platform uses a variety of machine learning tools, including TensorFlow, Google's open source AI software library, and Tamr, which turns isolated information into unified insights.
These machine learning technologies help GSK to quickly standardise and annotate information from documents such as study protocols.
They are among two dozen tools in the platform, which is built on Cloudera Hadoop and also contains StreamSets to move data from source to the platform and TIBCO Spotfire to turn data into visual analytics.
The platform uses these technologies to create a new model for sharing and understanding clinical information.
"Pharmaceuticals has been very deep in analytics for quite some time but it's been vertically done, so each business function was really deep in the way they did analytics," Ramsey said.
"This is stepping back and allowing that information to be shared across the organisation."
Making a medical impact
The platform has only been in place for 18 months, but Ramsey can already see the benefits of making data available to scientists much faster.
"There might be an analysis that in the past took 18 months to do, because one of the first steps was to just basically bring the data together," he said.
"Now that we've brought the data together, that same analysis can be done in a few clicks that take minutes instead of months."
One area of medicine on which the company is focusing is genetics. The company is collaborating with the UK Biobank to improve the prevention, diagnosis and treatment of a wide range of diseases by studying the role of genetic predisposition in 500,000 volunteers.
GSK is currently conducting full gene sequencing on all of these volunteers, which Ramsey says will give the company an enormous repository of health and genetic information that it can analyse to better understand disease characteristics.
"One of the things that have been learned from that is that if you have genetic evidence about a potential new medicine, it doubles the chances of that medicine becoming a new product in the future," Ramsey said.
"Pharmaceuticals is a high-risk environment, where many of the things don't progress, so doubling the likelihood has a phenomenal impact on the outcomes."
The future of drug discovery
The platform has so far shown just a glimpse of its potential.
In October 2017, GSK announced that it had joined a consortium called Accelerating Therapeutics for Opportunities in Medicine (ATOM), which aims to combine data, biotechnologies and supercomputing to cut the time it takes to discover a new drug from six to eight years down to two.
"Our goal over the next two years is to collapse that to 24 months, using computer simulation and a lot of analytic capabilities," Ramsey said.
"That's why I'm saying it's a step change in the industry. It's really thinking about how to tackle some of these problems in a very different way."
Ramsey believes that communication is crucial if companies are to get the maximum value from their data.
He believes this has become easier in pharmaceuticals due to the growing prominence of data in academia since he earned his own degrees: a BA in Computer Science and Business Administration, an MBA in Computer and Information Security and a PhD in Applied Computer Science.
"The new folks that are coming out of universities have a data background and a data passion," he said.
"There's a whole new wave of folks coming into the business and that data passion will also help change the industry."