Tearing down barriers to global research data sharing

One of the science world’s fastest-growing movements is the Research Data Alliance, which was launched just two years ago to reduce barriers to data sharing and accelerate development of a coordinated global data infrastructure.

It now has 2,500 members in 92 countries and has already produced interesting outcomes. It is “an anarchic groundswell of people – but good anarchy”, says Professor John Wood, secretary general of the Association of Commonwealth Universities.

Wood co-chairs the council of the Research Data Alliance, or RDA, which in its own words is focused on building “the social and technical bridges that enable open sharing of data”. There is also a technical advisory board, a secretariat and numerous working groups and interest groups.

“But most of this is going on without any of our knowledge in the sense that these groups just meet. We have six-monthly plenaries and each group can ask for a three-hour slot and we have 15 break-out rooms and we can only just fit them all in in three days,” Wood told University World News.

He was leading an international conference held in Johannesburg from 10 to 14 May, co-hosted by the Association of Commonwealth Universities and SARIMA – the Southern African Research and Innovation Management Association – under the theme of “Research and Innovation for Global Challenges”.

An unusual organisation

There have been five plenaries so far, and their locations have been as international as the Research Data Alliance’s members – the first was in Gothenburg, Sweden, in March 2013, the second in Washington DC, the third in Dublin in March 2014, the fourth in Amsterdam, and in March this year the fifth was in San Diego.

Organisations and individuals from all around the world are involved, and groups are often built around networks that already exist.

For instance, there are links with Elixir, which is building an infrastructure for biological information to support life science research. Its data sets will be similar in size to those of the Square Kilometre Array radio telescope, or SKA, a huge global research project being built in South Africa and Australia.

Alliance members are making data sets more interoperable, both within their own disciplines and with data sets in other disciplines. “That’s the key, because that is where you get the real value,” says Wood.

The RDA’s working groups produce outputs that go to the technical advisory board, which assesses their potential to deliver. The alliance tries to connect the strongest innovations with funders and grant funding.

“We never want to see a report that sounds all very well but nobody does anything about. We want something that works. You’ve got to show that somebody, somewhere is using it. Otherwise we just say get lost,” Wood continues.

The action-focused movement has already produced numerous outputs that have users, such as data models and defined terminology, a federation of data type registries, persistent identifier information types and example policy sets.

In the pipeline are a set of machine-actionable rules to enhance trust, a metadata standards directory, a methodology for citing dynamic data and a unified repository certification scheme to reduce confusion and improve trust.

“Sometimes I’m tempted, because I’m a control freak at the top, to try to organise – but every time you do that there is a breakout somewhere.

“The nice thing is that the funders like that, and they can justify [it] because they’re always worried about whether stakeholders are really involved – and here you’ve got a self-generating movement that just gets going,” continues Wood.

What’s up

The Research Data Alliance is currently discussing the certification of data stewardship plans and is looking for organisations to carry it out. In the United Kingdom, for example, the Open Data Institute is starting to certify public data sets.

The alliance is also consulting the European Commission and several countries on data stewardship, and some of them are starting to think about making such plans mandatory. “You can’t get a grant unless you have a data stewardship plan. And that plan should be able to be interrogated,” Wood says.

“Another thing we’re looking at too, which is quite useful, is what is called the data fabric. This thing started and is pretty much led – without being rude – by ‘computer nerds’. There are acronyms going everywhere and I haven’t a clue what most of them mean,” says Wood.

“So we’ve now built four quadrants. In the main quadrant we have the computing people who are absolutely necessary for this, developing protocols and software.”

The four quadrants cover the following areas: education, engagement, bridging and community; interoperability, harmonisation, integration and metadata; repository, fabric, analytics, identity and management; and governance, certification, cost recovery and legal.

Products coming out of the working groups are finding more and more applications. In San Diego in March, there was a day-long discussion about the products and who is using them.

“The one I’m most familiar with, because it is in my field, is the total ability to go from quantum level in atoms right through to materials that are used in engineering structures, using data that’s available. It’s always been my dream to do that,” says Wood.

The sixth plenary is in Paris this September and will focus on industrial uptake and what it will mean for jobs. “That is where our funders want us to go as well,” Wood explains. “There are jobs in the data field, but what are the other jobs?”

One of the big groups is combining phenotyping data with weather, earth, geophysical and other data to develop new strains of wheat for arid regions. “You suddenly see that this is about feeding the world,” says Wood. There are similar things happening in the marine field and growing activity around the environment and ecosystems.

There’s a group in South Africa working in palaeontology, doing X-ray tomography on fossils, which is incredibly data-intensive. “What on earth is the point of doing that?” Wood asks.

“Well, it’s interesting obviously, but they can now start to see how the brain developed over the eons and that’s being made use of by people doing MRI studies on language learning in children and looking at the plasticity of the brain and how education, in certain areas but specifically languages, will develop. The data sets are starting to approach those of the SKA.

“I could go on, there are loads of these things, all at different stages. It depends on groups within those research areas getting together.”

A matter of trust

“But you have to build trust – one of the areas we look at very much is trust. What is trust in the cloud, what does it mean? How do I know you are you? Personal identifiers are becoming more and more important. You also have checks and balances on whether the data is not corrupted – corruption of data is a huge issue,” says Wood.

“But also it’s a young people’s movement. I feel very ancient when I go to these meetings, and it’s fascinating. We have several people who are connected with the origins of the internet, at the top end, who say hands off, let it happen. So that’s exciting.

“My only fear is that this whole thing could fall apart if regulators get their hands on the open science movement and free use. If the whole thing about research management is protecting information, then we’re lost.

“There’s hardly any of us in the world who actually make profit out of technology transfer.”