GESIS Leibniz Institute for the Social Sciences

Open Mining Infrastructure for Text and Data (OpenMinTeD)



Abstract

Recent years have witnessed an unparalleled upsurge in the quantity of
digital data, with its volume doubling every three years. In the world
of science, researchers worldwide generate over 1.5 million
publications per year. While these vast amounts of new data and
information can undoubtedly offer new insights and give rise to new
opportunities for analytics and improved understanding, reading and
analysing them is just as clearly beyond human capacity.

Text and data mining is emerging as a powerful tool for harnessing data
and discovering its value. It analyses structured and unstructured
datasets and content at multiple levels and along many dimensions in
order to discover concepts and entities in the world, the patterns they
follow and the relations they engage in, and on this basis to annotate,
index, classify and visualise such content.

Scientific publications span a wide range of scientific areas, each
with its own terminology, conventions and way of using language to
express and communicate knowledge. They are also written in different
natural languages and are subject to varying access rights and
restrictions. In the same vein, text mining tools and platforms have
been built either for mining linguistically generic text or for
specific domains and languages, each with more or less its own
technical and linguistic specifications. Over the last decade, text
mining tools have been integrated into text mining platforms, which
ensures a level of interoperability between tools and components within
the same platform, and initiatives for cross-platform interoperability
have been launched in recent years. However, neither text mining tools
nor integrated platforms are easily discoverable by end users
(researchers, curators, librarians, policy makers, etc.), and they are
documented in various ways, which makes searching for and discovering
them a challenging task.

OpenMinTeD aspires to enable the creation of an infrastructure that
fosters and facilitates the use of text and data mining technologies in
the world of scientific publications and beyond, by both application
domain users and text mining experts. It builds upon existing text
mining tools, workflows and platforms and renders them discoverable,
through appropriate registries, and interoperable, through an
interoperability layer based, to the extent possible, on existing
standards. It raises awareness of the benefits of text mining, supports
the training of text mining users and developers alike, and
demonstrates the merits of the approach through a number of use cases
identified by scholars and experts from different scientific areas,
ranging from generic scholarly communication to the life sciences
(bioinformatics, biochemistry, etc.), food and agriculture, and
literature related to the social sciences and humanities.

GESIS will provide the requirements for the social sciences use cases,
evaluate the corresponding implementation and participate in the
working groups on the interoperability framework specifications.
Furthermore, GESIS will be actively involved in community engagement
and training activities.



Runtime
01.06.2015 – 31.05.2018