Value-Added Services for Information Retrieval (IRM)
Value-added retrieval services for the further development of scholarly subject portals such as vascoda and sowiport. Search expansion and re-ranking
Project staff: Dr. Philipp Mayr, Peter Mutschke, Philipp Schaer
Project lead: Prof. Dr. York Sure
Research area: Informational Processes for the Social Sciences
Contact:
Project description:
In academic information systems, bibliographic databases, disciplinary internet portals, institutional repositories, and archival and other media collections are increasingly being aggregated and embedded in all-encompassing information systems to meet user demand for one-stop "information fulfilment". Examples are Elsevier’s Scirus portal, the OCLC WorldCat union catalog and Tufts University’s Perseus project.
In Germany, the general science portal “vascoda” merges structured, high-quality information collections from more than 40 providers on the basis of search engine technology (FAST Enterprise Search) and a concept for handling semantic heterogeneity between different controlled vocabularies (see project KoMoHe).
First experiences with the portal reveal weaknesses of this approach that are typical of most metadata-driven Digital Libraries (DLs): standard search and ranking models such as term frequency-inverse document frequency (tf-idf), best-match models and especially recent web-based ranking methods implemented in search engines (originally for web pages) are not always appropriate for searching heterogeneously collected scholarly metadata documents. Search, whether in full-text collections like the Internet or in more heavily structured and less diverse collections like institutional repositories, indexing databases or library catalogues, only works as well as the match between the language of the queries and the language of the searched documents. If the words in the query differ from the words in a relevant document, that document will not be found. Moreover, rankings based purely on term frequency often produce results that do not meet user needs, as initial retrieval tests within the vascoda context have shown.
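The vocabulary mismatch described above can be illustrated with a minimal Python sketch. It assumes a simple tf-idf variant (raw term frequency multiplied by log(N/df) + 1); the function name tfidf_scores and the three example titles are hypothetical and are not taken from vascoda or its search engine.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document against the query with a plain tf-idf sum."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # Assumed idf variant: log(N / df) + 1.
                idf = math.log(n_docs / df[term]) + 1.0
                score += tf[term] * idf
        scores.append(score)
    return scores


# Hypothetical metadata records (titles only).
docs = [
    "youth unemployment in urban regions",
    "joblessness among young people in cities",
    "regional labour market statistics",
]
scores = tfidf_scores("unemployment", docs)
```

Only the first title receives a non-zero score. The second title is on the same topic but uses the word "joblessness", so a purely term-based model scores it zero and it is not retrieved.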
Objectives
The task of the IRM project is to introduce and evaluate value-added services (treatment of term vagueness and document re-ranking) for information retrieval within a heterogeneous DL environment (like sowiport). The methods to be implemented focus on query construction and on result set re-ranking and are designed to positively influence each other. The goal of the project is to evaluate whether, and to what extent, search quality is improved by applying the services under study.
We focus on the following three value-added services:
- Search Term Recommender (STR, based on an idea from work at the School of Information, University of California, Berkeley) for the treatment of term vagueness
- Bradfordizing, derived from scientometrics, for document re-ranking
- Author centrality in co-authorship networks, derived from social network analysis, for document re-ranking
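The two re-ranking ideas listed above can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: the record fields (id, journal, authors), the function names and the example data are hypothetical; Bradfordizing is reduced to sorting a result set by how many of its documents each journal contributes; and author centrality is approximated by degree (the number of distinct co-authors), which may differ from the centrality measure used in the project.

```python
from collections import Counter
from itertools import combinations


def bradfordize(docs):
    """Move documents from the most productive journals to the top."""
    counts = Counter(doc["journal"] for doc in docs)
    # sorted() is stable, so the original order is kept within each journal.
    return sorted(docs, key=lambda doc: (-counts[doc["journal"]], doc["journal"]))


def author_centrality(docs):
    """Degree centrality: number of distinct co-authors per author."""
    coauthors = {}
    for doc in docs:
        for author in doc["authors"]:
            coauthors.setdefault(author, set())
        # Every pair of authors on the same document is linked.
        for a, b in combinations(sorted(set(doc["authors"])), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return {author: len(links) for author, links in coauthors.items()}


def rerank_by_author_centrality(docs):
    """Rank documents by the centrality of their most central author."""
    centrality = author_centrality(docs)
    return sorted(
        docs,
        key=lambda doc: -max((centrality[a] for a in doc["authors"]), default=0),
    )


# Hypothetical result set in its original ranking order.
docs = [
    {"id": "d1", "journal": "J. Minor", "authors": ["Ada"]},
    {"id": "d2", "journal": "Core Review", "authors": ["Ben", "Cleo"]},
    {"id": "d3", "journal": "Core Review", "authors": ["Cleo", "Dan"]},
    {"id": "d4", "journal": "Other Letters", "authors": ["Cleo", "Eve"]},
    {"id": "d5", "journal": "Core Review", "authors": ["Ben"]},
]
by_journal = [doc["id"] for doc in bradfordize(docs)]
by_author = [doc["id"] for doc in rerank_by_author_centrality(docs)]
```

In this example the three "Core Review" documents move to the top under Bradfordizing, while the author-based ranking favours every document co-written by the best-connected author. Both functions operate only on the retrieved result set, so they can be applied after any initial ranking.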
Outlook
Beyond their isolated use, a combination of the approaches promises a much higher innovation potential. The project centres on integrating these three value-added services, which aim to reduce the semantic complexity of distributed DLs at several stages of the information retrieval process: query construction, search and ranking, and re-ranking.
We plan to evaluate the value-added services within a framework of different quantitative (e.g. IR tests) and qualitative (e.g. real user tests with the prototype) evaluation methods.
Future research work will address the use of information pertaining to institutions, themes or citations as a means of providing further value-adding functions in re-ranking methods, rather than just using authors and journals. An important further research issue is to apply and evaluate the proposed ranking methods at the user search stage in order to improve the precision of the initial result set.
Project duration:
January 2009 to May 2010
Funded by:

Project number: INST 658/6-1
Project partners:
- Institut für Bibliotheks- und Informationswissenschaft (IBI), Humboldt-Universität zu Berlin (contact: Prof. Dr. Vivien Petras)
- Statistical Cybermetrics Research Group, School of Computing and IT, University of Wolverhampton, UK
- vascoda partners
- Bundesinstitut für Sportwissenschaft (BISp)
- Deutsches Institut für internationale Pädagogische Forschung (DIPF)
Publications:
Mayr, Philipp (2009): Re-Ranking auf Basis von Bradfordizing für die verteilte Suche in Digitalen Bibliotheken. Berlin: Philosophische Fakultät I, Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin. 237 p. URL: http://edoc.hu-berlin.de/dissertationen/mayr-philipp-2009-02-18/PDF/mayr.pdf
Mayr, Philipp; Mutschke, Peter; Petras, Vivien (2008): Reducing semantic complexity in distributed digital libraries: Treatment of term vagueness and document re-ranking. In: Library Review 57, No. 3, pp. 213-224. URL: http://www.ib.hu-berlin.de/~mayr/arbeiten/mayr-etal_LR08.pdf
Mayr, Philipp; Petras, Vivien (2008a): Building a terminology network for search: the KoMoHe project. pp. 177-182. In: Greenberg, Jane; Klas, Wolfgang (eds.): Metadata for semantic and social applications: Proceedings of the 8th International Conference on Dublin Core and Metadata Applications. Berlin: Uni.-Verl. Göttingen. URL: http://edoc.hu-berlin.de/conferences/dc-2008/mayr-philipp-177/PDF/mayr.pdf
Mayr, Philipp; Petras, Vivien (2008b): Cross-concordances: terminology mapping and its effectiveness for information retrieval. In: 74th IFLA World Library and Information Congress. Québec, Canada. URL: http://www.ifla.org/IV/ifla74/papers/129-Mayr_Petras-en.pdf
Mutschke, Peter (2003): Mining Networks and Central Entities in Digital Libraries: A Graph Theoretic Approach applied to Co-Author Networks; poster presentation. In: IDA 2003 - The 5th International Symposium on Intelligent Data Analysis. Berlin. URL: http://fuzzy.cs.uni-magdeburg.de/confs/ida2003/
Mutschke, Peter (2004): Autorennetzwerke: Verfahren der Netzwerkanalyse als Mehrwertdienste für Informationssysteme. Bonn: Informationszentrum Sozialwissenschaften. 48 p. (IZ-Arbeitsbericht Nr. 32) URL: http://www.gesis.org/fileadmin/upload/forschung/publikationen/gesis_reihen/iz_arbeitsberichte/ab_32.pdf
Petras, Vivien (2006): Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages. Berkeley, USA: University of California, Berkeley. URL: http://www.sims.berkeley.edu/~vivienp/diss/

