Support for Vagueness in Interactive Text and Fact Search (VACOS)
Abstract
When users search for objects such as products or research data, they often specify a number of desired properties, several of which are vague or fuzzy (e.g. price below 100 Euro, long battery life, high performance, release date from this year). Some of these conditions may be factual conditions that relate directly to specific attribute values. However, current systems support only Boolean conditions on facts. This poses problems when conditions conflict (e.g. low price vs. high performance), and it also completely eliminates items that only narrowly miss the desired search criteria. Moreover, textual conditions suffer from the vocabulary problem: the descriptions of relevant items are usually short and often do not contain any of the search terms.
The proposed project will develop new methods for handling vague fact and text conditions. For textual vagueness, we will investigate a variety of methods for generating related terms from text corpora of the application domain; these terms are suggested to the user as query completions. Vague fact conditions will be handled as fuzzy predicates, which means that search results that barely miss a condition are also included in the result set, but with a lower weight. Furthermore, fuzzy predicates make it possible to model general preferences (e.g. price low). We will also investigate methods for mapping typical text phrases onto fact conditions (e.g. recent release to release-date high). Query conditions can be either mandatory or optional. We will investigate different weighting functions for the various types of conditions, as well as the overall ranking functions. Users will be able to control these functions in various ways (e.g. by classical relevance feedback or by adjusting the weighting factors), for which appropriate user interfaces have to be developed. The methods will be applied to two use cases, one dealing with product search in an online shop and the other with a social sciences database containing data from several thousand surveys.
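To illustrate how a vague fact condition could behave as a fuzzy predicate, the following minimal Python sketch may be helpful. It is not the project's method: the linear membership functions, the weighted-average ranking, the reading of "mandatory" as "degree must be greater than zero", and all names (`at_most`, `low`, `Condition`, `score`, `rank`) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def at_most(limit: float, tolerance: float) -> Callable[[float], float]:
    """Fuzzy 'value <= limit' (assumed linear shape).

    Degree is 1.0 up to the limit and falls linearly to 0.0
    at limit + tolerance, so near misses still get a partial degree.
    """
    def membership(value: float) -> float:
        if value <= limit:
            return 1.0
        if value >= limit + tolerance:
            return 0.0
        return 1.0 - (value - limit) / tolerance
    return membership


def low(minimum: float, maximum: float) -> Callable[[float], float]:
    """General preference 'attribute low': 1.0 at minimum, 0.0 at maximum."""
    def membership(value: float) -> float:
        clipped = min(max(value, minimum), maximum)
        return (maximum - clipped) / (maximum - minimum)
    return membership


@dataclass
class Condition:
    attribute: str
    membership: Callable[[float], float]
    weight: float = 1.0        # hypothetical user-adjustable weighting factor
    mandatory: bool = False    # assumption: mandatory means degree must be > 0


def score(item: Dict[str, float], conditions: List[Condition]) -> float:
    """Weighted average of membership degrees (one possible ranking function)."""
    total = 0.0
    weight_sum = 0.0
    for cond in conditions:
        degree = cond.membership(item[cond.attribute])
        if cond.mandatory and degree == 0.0:
            return 0.0  # a fully violated mandatory condition rejects the item
        total += cond.weight * degree
        weight_sum += cond.weight
    return total / weight_sum if weight_sum else 0.0


def rank(items: List[Dict], conditions: List[Condition]) -> List[Tuple[float, Dict]]:
    """Return (score, item) pairs with score > 0, best first."""
    scored = [(score(item, conditions), item) for item in items]
    kept = [pair for pair in scored if pair[0] > 0.0]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


products = [
    {"name": "A", "price": 90.0},
    {"name": "B", "price": 110.0},
    {"name": "C", "price": 130.0},
]
conditions = [Condition("price", at_most(100.0, 20.0), mandatory=True)]
ranking = rank(products, conditions)
```

With a Boolean condition "price below 100 Euro", product B (110 Euro) would be dropped. Under the assumed tolerance of 20 Euro it stays in the result set with degree 0.5 and is ranked below product A, while product C is still excluded. Changing `weight` or `tolerance` corresponds to the kind of user control over weighting functions mentioned above.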