Value-Added Services for Information Retrieval (IRM)

Value-added retrieval services for the further development of scholarly subject portals such as vascoda and sowiport: search expansion and re-ranking

Project staff: Dr. Philipp Mayr, Peter Mutschke, Philipp Schaer
Project lead: Prof. Dr. York Sure
Research unit: Knowledge Technologies for the Social Sciences (Wissenstechnologien für Sozialwissenschaften, WTS)

Project description

The follow-up project IRM II was launched in May 2011. Please visit the project page.

In academic information systems, bibliographic databases, disciplinary internet portals, institutional repositories, archives and other media collections are increasingly aggregated and embedded in all-encompassing information systems in order to meet users' demand for one-stop "information fulfilment". Examples are Elsevier’s Scirus portal, the OCLC WorldCat union catalog and Tufts University’s Perseus project.

In Germany, the general science portal “vascoda” merges structured, high-quality information collections from more than 40 providers on the basis of search engine technology (FAST Enterprise Search) and an approach for treating semantic heterogeneity between different controlled vocabularies (see project KoMoHe).

First experiences with the portal reveal weaknesses of this approach that are typical of most metadata-driven Digital Libraries (DLs): standard search and ranking models such as term frequency-inverse document frequency (tf-idf), best-match models and especially recent web-based ranking methods implemented in search engines (originally for web pages) are not always appropriate for searching heterogeneously collected scholarly metadata documents. Search, whether in full-text collections like the Internet or in more heavily structured and less diverse collections like institutional repositories, indexing databases or library catalogues, only works as well as the match between the language of the queries and the language of the searched documents. If the words in the query differ from the words in a relevant document, this document will not be found. Moreover, rankings based purely on term frequency often produce results that do not meet user needs, as first retrieval tests within the vascoda context have shown.
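The vocabulary mismatch described above can be illustrated with a minimal Python sketch. It assumes a plain tf-idf scoring function written only for this illustration; the function name tfidf_scores and the three sample titles are invented, and the code is not the ranking model used in vascoda.

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document by the summed tf-idf weight of the query terms it contains."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:  # only terms that literally occur can contribute
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

# Invented sample titles; the second is topically relevant but uses other words.
docs = [
    "unemployment among young people in germany",
    "joblessness of adolescents a german panel study",
    "voter turnout in local elections",
]
scores = tfidf_scores("youth unemployment", docs)
for title, score in zip(docs, scores):
    print(f"{score:.2f}  {title}")
```

In this toy collection the second title shares no word with the query, so it receives the same zero score as the unrelated third title, although it is about the same topic.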

Objectives

The task of the IRM project is to introduce and evaluate value-added services (treatment of term vagueness and document re-ranking) for information retrieval within a heterogeneous DL environment (such as sowiport). The methods to be implemented focus on query construction and on result set re-ranking, and are designed to positively influence each other. The goal of the project is to evaluate whether, and to what extent, the services under study improve search quality.

We focus on the following three value-added services:

  • Search Term Recommender (STR, idea based on work at the School of Information, University of California, Berkeley) for term vagueness treatment
  • Bradfordizing, derived from scientometrics, for document re-ranking
  • Author centrality in co-authorship networks, derived from social network analysis, for document re-ranking
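As an illustration of the re-ranking idea, the following minimal Python sketch shows one way Bradfordizing could be applied to a result set. It rests on assumptions that are not taken from the project: the function names bradford_zones and bradfordize, the dictionary layout of the documents, the split into three zones of roughly one third of the documents each, and the sample data are all invented for this example.

```python
from collections import Counter

def bradford_zones(results):
    """Assign each journal to zone 1 (core), 2 or 3 by its share of the result set."""
    counts = Counter(doc["journal"] for doc in results)
    total = len(results)
    zones, cumulative = {}, 0
    # most_common() lists journals from most to least productive for this query.
    for journal, n_docs in counts.most_common():
        # The zone depends on the share of documents already covered by more
        # productive journals: first third, second third or last third.
        zones[journal] = min(3, 1 + (3 * cumulative) // total)
        cumulative += n_docs
    return zones

def bradfordize(results):
    """Re-rank a result set so that documents from core journals come first."""
    counts = Counter(doc["journal"] for doc in results)
    zones = bradford_zones(results)
    # sorted() is stable, so documents of one journal keep their original order.
    return sorted(
        results,
        key=lambda doc: (zones[doc["journal"]], -counts[doc["journal"]], doc["journal"]),
    )

# Invented result set, in the order a text-based ranking might return it.
results = [
    {"id": 1, "journal": "C"},
    {"id": 2, "journal": "A"},
    {"id": 3, "journal": "B"},
    {"id": 4, "journal": "A"},
    {"id": 5, "journal": "B"},
    {"id": 6, "journal": "A"},
]
reranked = bradfordize(results)
print([doc["id"] for doc in reranked])
```

Here journal A contributes three of the six documents and forms the core zone, so its documents move to the top of the list, followed by those of B and C.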

Prototype

To start the interactive IRM prototype, follow the link below.

Start the IRM Prototype.

Outlook

Beyond their isolated use, combining the approaches promises a much higher innovation potential. The project's central contribution is the integration of these three value-added services, which aim to reduce the semantic complexity of distributed DLs at several stages of the information retrieval process: query construction, search and ranking, and re-ranking.
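How the stages might interact can be sketched in a few lines of Python. This is a toy pipeline under loudly stated assumptions, not the project's implementation: the term recommender is approximated by counting which controlled descriptors co-occur with a free-text term, author centrality is approximated by degree in the co-authorship graph of the result set (the measure actually used is not specified here), and all function names and sample records are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented toy collection: title words, controlled descriptors and authors.
DOCS = [
    {"id": 1, "title": "wage bargaining models",
     "descriptors": ["labor market"], "authors": ["Evans"]},
    {"id": 2, "title": "youth unemployment trends",
     "descriptors": ["labor market"], "authors": ["Adams", "Baker"]},
    {"id": 3, "title": "unemployment and migration",
     "descriptors": ["labor market", "migration"], "authors": ["Baker", "Clark", "Davis"]},
    {"id": 4, "title": "school choice",
     "descriptors": ["education"], "authors": ["Adams"]},
]

def recommend_terms(term, docs, k=2):
    """Stage 1 (assumed STR stand-in): descriptors co-occurring most often with a title term."""
    cooccurrence = Counter()
    for doc in docs:
        if term in doc["title"].split():
            cooccurrence.update(doc["descriptors"])
    return [descriptor for descriptor, _ in cooccurrence.most_common(k)]

def search(terms, docs):
    """Stage 2: Boolean OR match against title words and descriptors."""
    wanted = set(terms)
    return [d for d in docs if wanted & (set(d["title"].split()) | set(d["descriptors"]))]

def author_degree(docs):
    """Number of distinct co-authors of each author within the given documents."""
    neighbours = defaultdict(set)
    for doc in docs:
        for a, b in combinations(doc["authors"], 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(others) for author, others in neighbours.items()}

def rerank_by_author_centrality(hits):
    """Stage 3: documents with the best-connected author first (stable sort)."""
    degree = author_degree(hits)
    return sorted(hits, key=lambda d: -max((degree.get(a, 0) for a in d["authors"]), default=0))

def retrieve(term, docs):
    """Expand the query, search, then re-rank."""
    terms = [term] + recommend_terms(term, docs)
    return rerank_by_author_centrality(search(terms, docs))

ranked_ids = [doc["id"] for doc in retrieve("unemployment", DOCS)]
print(ranked_ids)
```

In this example, document 1 is found only because the query is expanded with the descriptor "labor market", and it is ranked last because its single author has no co-authors in the result set.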

We plan to evaluate the value-added services within a framework of different quantitative (e.g. IR tests) and qualitative (e.g. real user tests with the prototype) evaluation methods.

Future research work will address the use of information pertaining to institutions, themes or citations as a means of providing further value-adding functions in re-ranking methods, rather than just using authors and journals. An important further research issue is to apply and evaluate the proposed ranking methods at the user search stage in order to improve the precision of the initial result set.

Project duration

01.01.2009 - 30.11.2010

Funded by



Project number: INST 658/6-1

Partners

Project partners

Publications

Mayr, Philipp (2011): Bradfordizing als Re-Ranking-Ansatz in Literaturinformationssystemen, in: Information: Wissenschaft und Praxis, Vol. 62, No. 1, pp. 3-10.

Mayr, Philipp; Schaer, Philipp & Mutschke, Peter (2011): A Science Model Driven Retrieval Prototype, in: Proceedings for Concepts in Context: Cologne Conference on Interoperability and Semantics in Knowledge Organization.

Mayr, Philipp; Mutschke, Peter; Petras, Vivien; Schaer, Philipp & Sure, York (2011): Applying Science Models for Search, in: 12. Internationales Symposium für Informationswissenschaft (ISI), pp. 184-196.

Mutschke, Peter; Mayr, Philipp; Schaer, Philipp; Sure, York (2011): Science Models as Value-Added Services for Scholarly Information Systems, in: Scientometrics, special issue on Modelling science – mathematical models of knowledge dynamics.

Mayr, Philipp; Mutschke, Peter; Schaer, Philipp; Sure, York (2011): Mehrwertdienste für das Information Retrieval: das Projekt IRM, in: Wissen - Wissenschaft - Organisation: Ergon-Verlag. (International Society for Knowledge Organization (ISKO) German Chapter)

Mayr, Philipp (2010): Information Retrieval-Mehrwertdienste für Digitale Bibliotheken: Crosskonkordanzen und Bradfordizing. GESIS-Schriftenreihe, Vol. 5. Bonn. 270 pp.

Mutschke, Peter (2010): Zentralitäts- und Prestigemaße. In: Handbuch Netzwerkforschung. Wiesbaden: VS-Verlag für Sozialwissenschaften.

Schaer, Philipp; Mayr, Philipp & Mutschke, Peter (2010): Implications of Inter-Rater Agreement on a Student Information Retrieval Evaluation, in: Martin Atzmüller; Dominik Benz; Andreas Hotho & Gerd Stumme (eds.): Proceedings of LWA2010 - Workshop-Woche: Lernen, Wissen & Adaptivität. URL: http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/ir8.pdf

Schaer, Philipp; Mayr, Philipp & Mutschke, Peter (2010): Demonstrating a Service-Enhanced Retrieval System, in: Proceedings of ASIST 2010 conference, Pittsburgh, PA, USA. URL: http://arxiv.org/pdf/1009.5003v1

Mayr, Philipp (2010): Bradfordizing mit Katalogdaten, in: BuB - Forum Bibliothek und Information 62, No. 1, pp. 61-63

Mayr, Philipp (2009a): Bradfordizing effects. pp. 451-456, in: Kuhlen, Rainer (ed.): Information: Droge, Ware oder Commons? Proceedings des 11. Internationalen Symposiums für Informationswissenschaft (ISI 2009). Konstanz: vwh Verlag Werner Hülsbusch

Mayr, Philipp (2009): Re-Ranking auf Basis von Bradfordizing für die verteilte Suche in Digitalen Bibliotheken. Philosophische Fakultät I, Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin, Berlin. 237 pp., URL: http://edoc.hu-berlin.de/dissertationen/mayr-philipp-2009-02-18/PDF/mayr.pdf

Mayr, Philipp; Mutschke, Peter; Petras, Vivien (2008): Reducing semantic complexity in distributed digital libraries: Treatment of term vagueness and document re-ranking. In: Library Review 57, No. 3, pp. 213-224. URL: http://www.ib.hu-berlin.de/~mayr/arbeiten/mayr-etal_LR08.pdf

Mayr, Philipp; Petras, Vivien (2008a): Building a terminology network for search: the KoMoHe project. pp. 177-182, in: Greenberg, Jane; Klas, Wolfgang (eds.): Metadata for semantic and social applications: Proceedings of the 8th International Conference on Dublin Core and Metadata Applications. Berlin: Uni.-Verl. Göttingen. URL: http://edoc.hu-berlin.de/conferences/dc-2008/mayr-philipp-177/PDF/mayr.pdf

Mayr, Philipp; Petras, Vivien (2008b): Cross-concordances: terminology mapping and its effectiveness for information retrieval, in: 74th IFLA World Library and Information Congress. Québec, Canada. URL: http://www.ifla.org/IV/ifla74/papers/129-Mayr_Petras-en.pdf

Mutschke, Peter (2003): Mining Networks and Central Entities in Digital Libraries: A Graph Theoretic Approach applied to Co-Author Networks; poster presentation, in: IDA 2003 - The 5th International Symposium on Intelligent Data Analysis. Berlin. URL: http://fuzzy.cs.uni-magdeburg.de/confs/ida2003/

Mutschke, Peter (2004): Autorennetzwerke: Verfahren der Netzwerkanalyse als Mehrwertdienste für Informationssysteme. Bonn: Informationszentrum Sozialwissenschaften. 48 pp. (IZ-Arbeitsbericht Nr. 32) URL: http://www.gesis.org/fileadmin/upload/forschung/publikationen/gesis_reihen/iz_arbeitsberichte/ab_32.pdf

Petras, Vivien (2006): Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages. University of California, Berkeley. Berkeley, USA. URL: http://www.sims.berkeley.edu/~vivienp/diss/