German Social Science Infrastructure Service

Content Analysis, Retrieval and MetaData: Effective Networking (CARMEN)

Workpackage 11: Handling heterogeneity of textual information in different types of data and documentation languages

Staff: Robert Strötgen, Peter Mutschke, Dr. Jutta Marx
Project leader: Dr. Jutta Marx
Contact:  Dr. Maximilian Stempfhuber

Project description:

CARMEN, a special funding initiative, aims to build information systems suited to today's decentralized information world, in which data are scattered across libraries, specialized information centers and the internet. Integrating these data is less a technical problem than a contextual and conceptual one. Heterogeneity arises when different data collections rely on different thesauri and classifications, when metadata have been recorded differently or not at all, or when intellectually indexed sources meet internet documents that are usually entirely unprepared. The project seeks semantic improvements where searches in specialized databases are extended to internet searches and vice versa.
One way this workpackage approaches improved retrieval is by generating missing metadata from the documents themselves. Metadata (title, author, institution, keywords and abstract) are generated automatically from the documents via deductive-heuristic extraction rules. The heuristics for locating missing metadata are derived from a precise analysis of the heterogeneity found in the example documents. In a second step, statistical-quantitative methods are used to compare how terms are used in the different databases. For mathematical documents, partially parallel corpora already exist; for social science sources, they are simulated with a commercial probabilistic full-text database. On this basis, transfer relations between free-text terms and the descriptors of a documentation language, such as the GESIS-IZ thesaurus, are derived from word co-occurrence.
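
The derivation of transfer relations from word co-occurrence can be illustrated with a minimal sketch. It assumes a small parallel corpus in which each document carries both free-text terms and intellectually assigned descriptors, and it weights each term-descriptor pair by the share of the term's documents that also carry the descriptor. The project description does not name the association measure actually used, so this weighting, the function name transfer_relations, the threshold min_weight and the sample terms and descriptors are all illustrative assumptions, not entries from the GESIS-IZ thesaurus or the project's method.

```python
from collections import Counter, defaultdict


def transfer_relations(corpus, min_weight=0.6):
    """Derive term -> descriptor relations from co-occurrence counts.

    corpus: iterable of (free_text_terms, descriptors) pairs, one per document.
    The weight of a pair is the fraction of documents containing the term
    that were also indexed with the descriptor (an assumed measure).
    """
    term_count = Counter()   # number of documents containing each term
    pair_count = Counter()   # number of documents containing term and descriptor

    for terms, descriptors in corpus:
        terms, descriptors = set(terms), set(descriptors)
        term_count.update(terms)
        for term in terms:
            for descriptor in descriptors:
                pair_count[(term, descriptor)] += 1

    relations = defaultdict(list)
    for (term, descriptor), n in pair_count.items():
        weight = n / term_count[term]
        if weight >= min_weight:
            relations[term].append((descriptor, weight))

    # Strongest relation first; ties are broken alphabetically by descriptor.
    for term in relations:
        relations[term].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(relations)


# Hypothetical sample data: (free-text terms, assigned descriptors) per document.
corpus = [
    ({"jobless", "youth"}, {"unemployment", "adolescent"}),
    ({"jobless", "benefits"}, {"unemployment", "social policy"}),
    ({"youth", "school"}, {"adolescent", "education"}),
]

relations = transfer_relations(corpus)
for term, targets in sorted(relations.items()):
    print(term, "->", targets)
```

In this toy corpus the free-text term "jobless" is mapped to the descriptor "unemployment" because the two occur together in every document that contains the term, while weaker pairings fall below the threshold. A query formulated in free text could then be extended with the mapped descriptors before it is sent to a thesaurus-indexed database.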

Project partner:

Duration: December 1999 - December 2001

Funding source:  BMBF

Publications:

  • Strötgen, Robert (2002): Behandlung semantischer Heterogenität durch Metadatenextraktion und Anfragetransfer. pp. 259-271. In: Womser-Hacker, Christa; Wolff, Christian; Hammwöhner, Rainer (Ed.): Information und Mobilität. Optimierung und Vermeidung von Mobilität durch Information; Proceedings des 8. Internationalen Symposiums für Informationswissenschaft (ISI 2002). Konstanz: UVK. (Schriften zur Informationswissenschaft, Band 40)
  • Strötgen, Robert (2002): Meta-Data Extraction and Query Translation. Treatment of Semantic Heterogeneity. pp. 362-373. In: Agosti, Maristella; Thanos, Costantino (Ed.): Research and Advanced Technology for Digital Libraries: 6th European Conference, ECDL 2002, Rome, Italy, September 16-18, 2002; Proceedings. Berlin: Springer. (Lecture Notes in Computer Science; 2458)
  • Binder, Gisbert; Marx, Jutta; Mutschke, Peter; Riege, Udo; Strötgen, Robert; Kokkelink, Stefan; Plümer, Judith (2002): Heterogenitätsbehandlung bei textueller Information verschiedener Datentypen und Inhaltserschließungsverfahren (Handling heterogeneity of textual information in different types of data and documentation languages); IZ Working Paper No. 24. Bonn: IZ Sozialwissenschaften
  • Hellweg, Heiko; Krause, Jürgen; Mandl, Thomas; Marx, Jutta; Müller, Matthias N.O.; Mutschke, Peter; Strötgen, Robert (2001): Treatment of Semantic Heterogeneity in Information Retrieval. Bonn: IZ Sozialwissenschaften. 47 pp. (IZ Working Paper No. 23)
  • Strötgen, Robert; Kokkelink, Stefan: Metadatenextraktion aus Internetquellen: Heterogenitätsbehandlung im Projekt CARMEN. In: Schmidt, Ralph (Ed.): Information Research & Content Management: Orientierung, Ordnung und Organisation im Wissensmarkt; 23. Online-Tagung der DGI und 53. Jahrestagung der Deutschen Gesellschaft für Informationswissenschaft und Informationspraxis e.V., DGI, Frankfurt am Main, 8. bis 10. Mai 2001; Proceedings. Frankfurt am Main: DGI 2001. (Tagungen der Deutschen Gesellschaft für Informationswissenschaft und Informationspraxis; 4), pp. 56-66.
  • Krause, Jürgen; Schwänzl, Roland; Plümer, Judith (2000): Content Analysis, Retrieval and Metadata: effective Networking for Mathematics, Physics and Social Sciences. In: Blasius, Jörg; Hox, Joop; Leeuw, Edith de; Schmidt, Peter (Ed.): Social Science Methodology in the New Millennium: Proceedings of the Fifth International Conference on Logic and Methodology, Cologne, October 3-6, 2000. CD-ROM. Amsterdam: TT-Publikaties.
  • Krause, Jürgen (2000): Integration von Ansätzen neuronaler Netzwerke in die Systemarchitektur von ViBSoz und CARMEN. Bonn: IZ Sozialwissenschaften. 26 pp. (IZ Working Paper No. 21)
  • Krause, Jürgen: Virtual Libraries, Library Content Analysis, Metatdata and the Remaining Heterogenity. In: ICADL 2000 - Challenging to Knowledge Exploring for New Millenium-: The Proceedings of the 3rd International Conference of Asian Digital Library & the 3rd Conference on Digital Libraries, Korea; December 6-8, 2000; Seoul Education & Culture Center, Seoul, Korea. Seoul: ICADL 2000. pp. 209-214.
  • Krause, Jürgen; Marx, Jutta: Vocabulary Switching and Automatic Metadata Extraction or How to Get Useful Information from a Digital Library. In: Information Seeking, Searching and Querying in Digital Libraries. Proceedings of the First DELOS Network of Excellence Workshop. Zurich, Switzerland, December 11-12, 2000. Zurich 2000. pp. 133-134.

Further information: http://www.bonn.iz-soz.de/research/information/carmen/ap11/

© GESIS Stefan Bärisch 2008-01-02