Rush to save climate change data before Trump era

Scientists, librarians and digital historians from a growing number of universities have begun a crowdsourced effort to copy and archive thousands of federal government websites and data sets related to climate change, the environment and other areas of scientific research that they fear could become compromised or inaccessible under the incoming Trump administration.

[This is an article from The Chronicle of Higher Education, America’s leading higher education publication. It is presented here under an agreement with University World News.]

The movement, now known by its hashtag, #DataRefuge, took off in response to an essay in The Washington Post in which a meteorologist and journalist named Eric Holthaus described how he had begun "to systematically catalogue and preserve as much of the federal government’s publicly available climate-science data as possible in the next five weeks".

Holthaus said he was acting because President-elect Donald J Trump, who has called climate change a hoax perpetrated by the Chinese, was "relentlessly converting" his anti-science rhetoric into action through nominations for key posts in an administration that is "likely to be wilfully hostile toward the scientific process".

Holthaus posted an online form that allows scientists and others to suggest specific data sets and websites to be archived. The form automatically puts the suggested material onto a spreadsheet that is openly accessible. Volunteers can also use the form to "claim a data set to rescue".

"The spreadsheet went crazy," says Bethany Wiggin, a professor at the University of Pennsylvania who directs its programme in environmental humanities.

On Tuesday, the Penn programme, whose professors and graduate students had already been discussing an effort of this sort, agreed to take over management of the spreadsheet. With support from Penn Libraries, the programme will help coordinate and advance the fast-evolving project. Librarians and professors from several departments at the University of Toronto in Canada are also partners in the #DataRefuge effort.

On Saturday, the Toronto institution was to play host to an event it calls "guerrilla archiving", during which professors will work with students and other volunteers to create digital tools that improve the process of uploading, tagging and archiving government data sets, and to get started on saving some "vulnerable" data.

The Penn programme will host a similar event next month. Academics at the two institutions say they are also making connections with scholarly and technology organisations like the Society of American Archivists, the Coalition for Networked Information, and the Internet Archive for additional expertise and volunteer help.

The Chronicle spoke last week with Wiggin and with Patrick Keilty, an assistant professor at Toronto’s faculty of information, about the scope, challenges and rationale for the project. Here are some answers to key questions about their efforts.

Why do this? Is this just some kind of anti-Trump political statement?

Both professors say they’re undertaking the project in a better-safe-than-sorry vein.

"It’s not meant to be alarmist," says Wiggin, but given the hostility toward climate science and environmental regulations shown by some of Trump’s nominees, she says it’s prudent to take steps to ensure that vital information isn’t compromised.

The worst-case scenario, says Keilty, would be outright deletion. But "even if the data isn’t deleted on Day 1," he says, "it can slowly be made inaccessible by lack of maintenance."

Routine decisions, like federal departments' remaking websites to reflect new priorities, can also mean "links go down" and key files go dead, he says. And at agencies targeted for defunding, like the Environmental Protection Agency (EPA), new leaders may not provide the labour or resources needed to maintain data, he says.

"Access to government-funded research shouldn’t be a controversial issue. Evidence-based policy-making shouldn’t be a controversial issue," says Keilty. "It’s hard to create evidence-based policies if the evidence goes missing or becomes inaccessible."

Wiggin says the #DataRefuge project fits within the purview of the "strong public engagement component" of her programme, while also acknowledging that the effort could be seen as "a political intervention". But, she says, that doesn’t make it inappropriate. "Research happens in a social context. It doesn’t happen in a vacuum. Part of it is the political context."

Any significance to the fact that a Canadian university is involved?

Canadian academics are especially attuned to this issue, says Keilty, because in 2014, under former prime minister Stephen Harper, the government abruptly closed a number of well-respected natural science libraries, to the alarm of many scientists and environmentalists. Harper was also viewed as being hostile to scientists.

In the United States, Keilty added, the George W Bush administration sought over a period of several years to close down the EPA library system.

Wiggin, who teaches courses on censorship as well as on the social impacts of climate change, says projects like the archiving effort make sense "if you're somebody who teaches that history provides lessons".

Since 2008, the Internet Archive has been working with partners to preserve federal websites and records as presidential administrations change, as part of the End of Term Web Archive. Doesn’t this duplicate that effort?

The Internet Archive uses tools known as web crawlers, which capture web pages by following their URLs. Keilty says the #DataRefuge project will prioritise documents held in PDF files, Excel sheets and other digital formats that web crawlers don't pick up.

The effort is a giant lift, not just in terms of technology and digital storage space, but also manpower, data security and resources. How is that being handled?

Many of those questions are still being resolved. Project organisers say they’ll need secure storage for data measured in petabytes, and they’ll want to be sure the storage sites are dispersed. "More copies in more places is a better idea," says Wiggin.

Both professors say they’ve received offers from companies and other organisations for free and low-cost storage, as well as offers of financial support from people Wiggin calls "angel investors", but for now the professors have declined to name them.

Keilty says he’s also heard from professors, librarians and others from at least a dozen other universities with offers of help, but doesn’t yet know how many of them will formally become involved or be identified. With so many librarians, archivists and technologists showing interest, Wiggin says they are prepared to "really tap into a hugely active and knowledgeable community".

What about data sets in other fields – say, the social sciences – where academics might fear political interference?

"We staked out climate and environmental data," says Wiggin. But she adds that she wouldn’t be surprised if academics in other fields followed suit.

"I hope we inspire other people," she says. "We can’t download the internet."

Goldie Blumenstyk writes about the intersection of business and higher education. Check out www.goldieblumenstyk.com for information on her new book about the higher education crisis; follow her on Twitter @GoldieStandard; or email her at goldie@chronicle.com.