As researchers turn to Google, libraries navigate discovery tools

Many professors and students gravitate to Google as a gateway to research. Libraries want to offer them a comparably simple and broad experience for searching academic content. As a result, a major change is under way in how libraries organise information. Instead of bewildering users with a bevy of specialised databases - books here, articles there - many libraries are bulldozing their digital silos. They now offer one-stop search boxes that comb entire collections, Google style.

[This is an article from The Chronicle of Higher Education, America's leading higher education publication. It is presented here under an agreement with University World News.]

That's the ideal, anyway. The reality is turning out to be messier.

The rise of these 'discovery' tools, which mine giant indexes of aggregated content, is generating new tensions. Because some companies that make the search tools are also in the content business, selling article databases and other material to libraries, one fear is that firms could favour their own content in results.

Another is that discovery software, by sluicing content together, could deluge users with less appropriate resources. Either way, users could miss relevant articles.

Discovery tools have fed a broad and sometimes bitter debate within the library world.

Last year, for example, one library consortium, the Orbis Cascade Alliance, grew so frustrated with the lack of cooperation between two major vendors in the discovery business, Ex Libris Group and Ebsco Information Services, that it issued open letters urging the companies to "bring this nonsense to an end".

Promising signs are emerging, however, including Ebsco's recent announcement of a new data-sharing policy that the company calls "a huge advancement in cooperation".

Innovative technology market

Controversy aside, library patrons are reaping the benefits of what has become a vibrant and innovative technology market.

Those patrons can now get library search results tailored to their interests. They can search across a sea of curated academic content, not just the limited pond of one library's holdings. They can also use the software to explore features that go beyond just search results, including topic-focused research guides and names of campus librarians who can help them further investigate a subject.

The big question is how these emerging tools are influencing research. Scholars have begun several studies to find out.

The work is important because "unlike almost anything that libraries have done before", the rollout of one-stop search tools is "really intentionally trying to change the way people do research", says Michael Levine-Clark, associate dean for scholarly communication and collections services at the University of Denver Libraries.

"That's bound to change what people find."

Levine-Clark and two collaborators - John McDonald of the University of Southern California and Jason Price of the Statewide California Electronic Library Consortium - have studied how adoption of a discovery tool changes the use of articles from publisher-hosted online journals.

Based on data from 33 libraries and 8,765 journals from six major publishers, their analysis showed "an overall increase in usage for the entire set of journals in the year after implementation, though the extent of change varied by discovery service and publisher".

But what are people finding?

That's at the heart of a separate study by Andrew Asher, assessment librarian at Indiana University at Bloomington. Asher, an anthropologist by training, gained notice for previous work on a five-university study of the student research process, which ran from 2008 to 2010 and used ethnographic methods to closely observe students' habits.

In 2011, he began a fresh experiment to figure out how undergraduates use the new library search tools and how they stack up against Google.

The results, published last year in a College & Research Libraries paper written with Lynda M Duke and Suzanne Wilson of Illinois Wesleyan University, shed some light on these sometimes-opaque products, along with the bias issues that have dogged them.

Built-in bias?

The study divided undergraduates from two universities, Bucknell, in Pennsylvania, and Illinois Wesleyan, into test groups.

The groups were assigned different search systems: Ebsco Discovery Service; Summon, from ProQuest; Google Scholar; and conventional library catalogue and periodical databases. Students were instructed to find resources they would use to complete various assignments. Librarians rated their choices.

To appreciate what Asher and his co-authors found, it helps to understand how discovery tools work. Libraries make large investments in different kinds of content, such as their subscriptions to databases of scholarly articles, or the books that fill their local catalogues.

The new breed of search software hinges on building "a very large, consolidated index that represents all of those things", says Marshall Breeding, a consultant who specialises in library technology.

Vendors of discovery tools will make deals with providers that sell content to libraries, he says, so that content can be represented in the discovery tools' indexes and made available for search. (Beyond products from Ebsco and ProQuest, other major tools in this genre, known as 'web scale' or 'index based' discovery, include Primo, from Ex Libris, and WorldCat Discovery Services from OCLC.)

Vendors describe their discovery tools as unbiased arbiters of information, even when they also sell content. Ebsco, for example, sells both search software and content, as does ProQuest.

Asked whether Ebsco favours its own content in the results generated by its search tool, Sam Brooks, executive vice president for sales and marketing, dismissed the idea as "competitor-driven propaganda". He added: "There's no truth to that whatsoever." Bias toward a content provider, he says, "would be commercial suicide for any discovery vendor".

Brooks points out, however, that Ebsco makes design choices about article relevance that may seem like bias, yet actually have nothing to do with content providers. For example, a university that uses Ebsco's search tool, he says, will find that "a two-sentence news blurb will lose to a four-page, peer-reviewed article".

Asher's experiment discovered that default settings of the tools had a major effect on what resources students chose.

Working with Google Scholar, which is integrated with Google Books, students used more books. With Summon, they used a lot of shorter newspaper and magazine articles. With Ebsco Discovery Service, they used more journals, which meant they scored highest under the study's rating rubric. (In a blog post responding to Asher's study, ProQuest said the methodology "inadvertently penalised" Summon, its product.)

Asher believes that "it's a logical impossibility to create a querying tool that doesn't have any form of bias". He speculates that discovery vendors may have better information about their own content, which could boost certain articles higher in results.

After Bucknell adopted Summon, Asher's study notes, the university saw significant increases in use of newspaper databases, including a jump of more than 700% for the ProQuest-owned Ethnic NewsWatch.

'Content neutrality'

In this competitive market - where rival players angle to sell discovery tools, content databases, and in some cases both - other complications can arise when competitors refuse to 'play nice' with each other.

Say, for example, a library buys a discovery tool from one vendor and a content database from another. If the database vendor declines to share information about its content with the discovery-tool provider, that content may fail to appear in the discovery tool's search results - even though the library pays for both products.

It can be difficult for librarians to evaluate which discovery tools cover which content, and how well.

"If I subscribe to something, I want my users to be able to find it regardless of what discovery system I choose," says Laura Morse, director of library systems for Library Technology Services at Harvard University, which picked the Ex Libris search tool.

Morse belongs to a group, the Open Discovery Initiative, which unites various players to promote transparency and best practices for discovery tools.

In the emerging conversations around this topic, one buzz phrase is 'content neutrality'. The idea resembles 'net neutrality', the notion that network operators shouldn't block or favour certain content.

Proponents of content neutrality argue that discovery providers should have equal access to the information needed to surface content in the search tools' results. That information includes things like the full text of journal articles and the 'metadata' that describe the articles, such as the author, subject, journal title and publication date.

The ice has cracked a bit. In January, two competitors, ProQuest and Ex Libris, announced a data-sharing deal. Ebsco, meanwhile, points to its new metadata-sharing policy, as well as partnerships with three other discovery providers, Innovative Interfaces, SirsiDynix, and OCLC.

How much all this will matter is debatable. Only about 20% of faculty members begin research at their libraries' online catalogues, according to a 2012 survey by Ithaka S+R. And while undergraduates, in particular, enjoy the new one-stop discovery tools, others emphasise that specialised databases remain important for serious scholarship.

Meanwhile, the competition for student and faculty attention has only intensified since 2004, when Google's "simple way to broadly search for scholarly literature" made its debut. That free service, called Google Scholar, has many fans in academe.

One of those is Asher, the Indiana University librarian who studies discovery software. He appreciates the 'cited by' feature on Google Scholar, which lets you trace how an article is used. "It's faster," he says of the product, "and I'm just used to it".

Asher is familiar with the criticisms of Google Scholar. After all, his own study listed them: "Limited advanced search functionality, incomplete or inaccurate metadata, inflated citation counts, lack of usage statistics, and inconsistent coverage across disciplines." Perhaps for this reason, he sounded a bit sheepish admitting his preference.

"I kind of hate to say it, since I am a librarian," he says. "We pay a lot of money for discovery tools. And then I go off and just use Google Scholar."