Researchers in Europe falling behind in the use of text and data mining

02 Jun 2016 | News
Copyright restrictions and an inflexible publisher licensing system are making European researchers hesitant to analyse large and disparate databases, according to a Lisbon Council report

Asia has replaced the EU as the world’s leading hub for academic research on text and data mining, according to findings presented this week by the Lisbon Council think tank.

The EU’s global share of publications in the field has fallen to 28.2 per cent, down from 38.9 per cent in 2000, while Asia’s share has risen to 32.4 per cent, up from 31.1 per cent in 2000.

Chinese and US researchers are also racing ahead to patent text and data mining systems, although there are no comparable EU data because software is not patentable in Europe.

The Lisbon Council uses the volume of academic research on text and data mining as a proxy for actual use of the technique, and suggests Europe is lagging because of continued, widespread uncertainty about its legality.

“The European ecosystem for engaging in text and data mining remains highly problematic, with researchers hesitant to perform valuable analysis that may or may not be legal,” according to the research.

Researchers “face a maze of restrictions which generate confusion and undermine the self-confidence of the research community”, and as a result researchers based in Europe are sometimes forced to outsource their text and data mining work to other parts of the world.

“We hear stories of university and research bureaux deliberately adding researchers in North America or Asia to consortia because those researchers will be able to do basic text and data mining so much more easily than in the EU,” the report says.

Text and data mining allows researchers to identify and summarise relationships across academic papers, and to uncover patterns in diverse databases where connections were previously hard to establish.

The right to do this kind of research is more constrained in Europe than in many other parts of the world, the Lisbon Council says. The UK is the only EU country that has exempted automated computer crawling from copyright law.

Restrictive licensing system

To use text and data mining techniques, a European researcher will often require the permission of the copyright owner, most often an academic publisher. Companies such as Elsevier and Springer offer licences for this, provided the research is non-commercial.

Researchers may text-mine the databases of 400 publishers through the Crossref platform, which provides a single log-in for access to their journals. However, the report says, the value of these licences should “not be overstated”, and the system falls well short of providing a norm binding all publishers to a common approach upon which researchers can rely.

“Even where mining rights are available, they involve complex and varied restrictions, often requiring use of a publisher-controlled portal, where researchers must register as a developer,” the report says.

Another barrier is that many smaller publishers do not yet offer access of this type.

Copyright law is clearer elsewhere in the world. For example, in the US, Israel, Korea, Singapore and Taiwan, copyright law allows a fair-use defence against a charge of copyright infringement.

EU push to alter the rules

There is support in the European Commission for overturning restrictive rules on text and data mining, with EU Research Commissioner Carlos Moedas in particular pushing for researchers to have the right to mine papers unhindered.

In December 2015, the Commission issued a legislative proposal which would clear the way for public interest research organisations to carry out text and data mining for scientific research purposes.

Researchers and publishers responded by asking for clarity on the proposal’s scope, which the Commission has yet to spell out fully.
