Natural History Museum CIO David Thomas is in the middle of a project to digitise 80 million specimens that document 4.5 billion years of life.

More than four million specimens gathered over the last 250 years are now available on the Natural History Museum Data Portal. The aim is to digitise 20 million specimens within five years.

"Most museums have really very small collections, the majority of which are on display," Thomas tells CIO UK.

"At The Natural History Museum, we have 80 million specimens, of which 3,000 are on display, so being able to digitise the collection is fantastic for the public but also for scientific purposes."

Researchers study the collection to reveal the past, present and future of subjects that range from the Earth's geology to the solar system.

Data from the samples has already been cited in more than 100 scientific papers covering fields including climate, biodiversity and human health.

Traditionally, researchers only had access to physical materials. Now, they can access "digital twins" that replicate the progenitor's characteristics and preserve them before they degrade.

"It's really important for the collections," says Thomas. "I was in discussions yesterday where we were looking at ancient DNA extracts, taking DNA from Cheddar Man and building a picture of Cheddar Man. That all comes from digitisation and data."

Digitisation process

Objects ranging from a parasitic louse collection of 70,667 slides to a 548-year-old book are photographed or scanned to turn them into 3D images. Object recognition software can then analyse the specimens and automatically categorise them by type.

The museum is also experimenting with using machine learning and text recognition to extract data from large-scale specimens and then label the images.

"These are very sensitive objects whatever they might be," says Thomas. "They might be microscopic insides, or it might be a blue whale. There are very different sizes and scales to handle and move, so digitisation in situ is really important to us."

Digitisation saves research time and costs and also offers new methods of analysis, as one research team found by comparing data on butterflies with more than a century of temperature records.

Their comparison showed that a warming climate causes British butterflies to emerge earlier, reducing their chances of survival as the plants that the caterpillars eat may not yet be available.

The digital museum

Museums across the country are suffering from declining visitor numbers. Last year, the Natural History Museum received 4.4 million visitors, 12% fewer than in 2016.

The concept of a "Digital Museum" could boost the numbers and unearth new revenue streams by creating innovative, personalised experiences.

Visitors could try an augmented reality experience hosted by Sir David Attenborough to virtually handle specimens, or receive mobile content that's optimised for their interests as they change.

The Natural History Museum partnered with Cisco to roll out wireless network infrastructure that supports location-based services and real-time analytics, helping the museum understand how visitors move through the building and adjust its layout accordingly.

"When our front entrance was closed, for example, that meant we had to move five million visitors to get them into the building some other way," says Thomas.

"That is not a small thing to do. It's like a football-size crowd every day. You don't know quite where they're going to turn up, and if it's raining it's going to be twice as big."

Open data to protect the natural world

The Natural History Museum is home to more than 300 scientists who publish over 700 research papers a year. They could have kept the precious scientific knowledge hidden in the museum's enormous collection for themselves, but instead they chose to share it with the world.

"If we took a very closed view of this, then we would never be able to explain probably what we've got and society wouldn't be able to benefit in quite the same way," says Thomas.

"That's where we essentially said we're going to be open by default. We are quite careful with some of our most commercialisable assets and our most sensitive assets and we license things like that under non-commercial agreements, but most things are available openly and people can see them. We think of the big picture for society."

Unlocking the data in these collections could help solve global challenges around food security, climate change and natural disasters.

"There was a volcanologist talking about an eruption in the 1920s and somebody had sampled the magma that was coming out of the original explosion every 20 minutes and just saved a bit of it," says Thomas.

"That's an explosive event holding data from the 1920s that's as modern today as it was back then because it can tell you how volcanoes explode. If you don't make that available nobody knows how that happened."