Content Analysis, Retrieval and MetaData: Effective Networking (CARMEN)
Workpackage 11: Handling heterogeneity of textual information in different types of data and
documentation languages
Staff: Robert Strötgen, Peter Mutschke, Dr. Jutta Marx
Project leader: Dr. Jutta Marx
Contact: Dr. Maximilian Stempfhuber
Project description:
The special funding measure CARMEN aims to build information systems suited to today's decentralized information world, in which data is scattered across libraries, specialized information centers and the internet. Integrating this data is less a technical problem than a contextual and conceptual one. Heterogeneity arises when different data collections are based on different thesauri and classifications, when metadata have been recorded differently or not at all, or when intellectually indexed sources are combined with internet documents that have usually not been prepared at all. Semantic improvement is sought in cases where searches in specialized databases are extended to internet searches and vice versa.
Generating missing metadata from the documents themselves is one way in which this workpackage approaches improved retrieval. Metadata (title, author, institution, keywords and abstract) is generated automatically from the documents via deductive-heuristic extraction rules. The heuristics for locating missing metadata are derived from a precise analysis of the heterogeneity found in the example documents. In a second step, statistical-quantitative methods are used to compare how terms are used in the different databases. For mathematical documents, partially parallel corpora already exist; for social science sources, they are simulated via a commercial probabilistic full-text database. On this basis, transfer relations between free-text terms and the descriptors of a documentation language such as the GESIS-IZ thesaurus are derived via word co-occurrence.
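The co-occurrence step can be illustrated with a minimal sketch. It assumes a small parallel corpus in which each document carries both free-text terms and intellectually assigned descriptors. The weighting used here (the share of documents containing a term that also carry a given descriptor) is an assumption chosen for illustration; the project description does not state which association measure was used. The function name `transfer_relations`, the `min_docs` threshold and the sample terms and descriptors are hypothetical.

```python
from collections import Counter, defaultdict


def transfer_relations(corpus, min_docs=2):
    """Derive term -> descriptor relations from document-level co-occurrence.

    corpus: iterable of (free_text_terms, descriptors) pairs, one per document.
    min_docs: ignore terms seen in fewer documents (hypothetical threshold).
    Returns {term: [(descriptor, weight), ...]} sorted by descending weight.
    """
    term_docs = Counter()   # number of documents containing each term
    pair_docs = Counter()   # number of documents containing term AND descriptor
    for terms, descriptors in corpus:
        terms, descriptors = set(terms), set(descriptors)
        term_docs.update(terms)
        pair_docs.update((t, d) for t in terms for d in descriptors)

    relations = defaultdict(list)
    for (term, descriptor), n in pair_docs.items():
        if term_docs[term] >= min_docs:
            # Assumed weight: relative frequency of the descriptor
            # among the documents that contain the term.
            relations[term].append((descriptor, n / term_docs[term]))
    for term in relations:
        relations[term].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(relations)


# Invented sample data, not taken from any real thesaurus or database.
corpus = [
    ({"jobless", "youth"}, {"unemployment", "adolescent"}),
    ({"jobless", "benefits"}, {"unemployment", "social policy"}),
    ({"youth", "school"}, {"adolescent", "education"}),
]
relations = transfer_relations(corpus)
for term, ranked in sorted(relations.items()):
    print(term, ranked)
```

A query containing a free-text term could then be extended with the highest-weighted descriptors for that term. In practice a threshold on the weight, or a more robust association measure, would probably be needed to suppress chance co-occurrences.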
Project partner:
Duration: December 1999 - December 2001
Funding source: BMBF
Publications:
- Strötgen, Robert (2002): Behandlung semantischer Heterogenität
durch Metadatenextraktion und Anfragetransfer. pp. 259-271. In:
Womser-Hacker, Christa; Wolff, Christian; Hammwöhner, Rainer (Ed.):
Information und Mobilität. Optimierung und Vermeidung von Mobilität
durch Information; Proceedings des 8. Internationalen Symposiums für
Informationswissenschaft (ISI 2002). Konstanz: UVK. (Schriften zur
Informationswissenschaft, Vol. 40)
- Strötgen, Robert (2002): Meta-Data Extraction and Query
Translation. Treatment of Semantic Heterogeneity. pp. 362-373. In:
Agosti, Maristella; Thanos, Costantino (Ed.): Research and Advanced
Technology for Digital Libraries: 6th European Conference, ECDL 2002,
Rome, Italy, September 16-18, 2002; Proceedings. Berlin: Springer.
(Lecture Notes in Computer Science; 2458)
- Binder, Gisbert; Marx, Jutta; Mutschke, Peter; Riege, Udo; Strötgen,
Robert; Kokkelink, Stefan; Plümer, Judith (2002):
Heterogenitätsbehandlung bei textueller Information verschiedener
Datentypen und Inhaltserschließungsverfahren (Handling heterogeneity
of textual information in different types of data and documentation
languages). Bonn: IZ Sozialwissenschaften. (IZ Working Paper No. 24)
- Hellweg, Heiko; Krause, Jürgen; Mandl, Thomas; Marx, Jutta; Müller, Matthias N.O.; Mutschke, Peter; Strötgen, Robert (2001): Treatment of Semantic Heterogeneity in Information Retrieval. Bonn: IZ Sozialwissenschaften. 47 pp. (IZ Working Paper No. 23)
- Strötgen, Robert; Kokkelink, Stefan: Metadatenextraktion aus
Internetquellen: Heterogenitätsbehandlung im Projekt CARMEN. In:
Schmidt, Ralph (Ed.): Information Research & Content Management:
Orientierung, Ordnung und Organisation im Wissensmarkt; 23.
Online-Tagung der DGI und 53. Jahrestagung der Deutschen Gesellschaft
für Informationswissenschaft und Informationspraxis e.V., DGI,
Frankfurt am Main, 8. bis 10. Mai 2001; Proceedings. Frankfurt am
Main: DGI 2001. (Tagungen der Deutschen Gesellschaft für
Informationswissenschaft und Informationspraxis; 4), pp. 56-66.
- Krause, Jürgen; Schwänzl, Roland; Plümer, Judith (2000): Content Analysis, Retrieval and Metadata: effective Networking for Mathematics, Physics and Social Sciences. In: Blasius, Jörg; Hox, Joop; Leeuw, Edith de; Schmidt, Peter (Ed.): Social Science Methodology in the New Millennium: Proceedings of the Fifth International Conference on Logic and Methodology, Cologne, October 3-6, 2000. CD-ROM. Amsterdam: TT-Publikaties.
- Krause, Jürgen (2000): Integration von Ansätzen neuronaler Netzwerke in die Systemarchitektur von ViBSoz und CARMEN. Bonn: IZ Sozialwissenschaften. 26 pp. (IZ Working Paper No. 21)
- Krause, Jürgen: Virtual Libraries, Library Content Analysis,
Metadata and the Remaining Heterogeneity. In: ICADL 2000 - Challenging
to Knowledge Exploring for New Millenium-: The Proceedings of the 3rd
International Conference of Asian Digital Library & the 3rd
Conference on Digital Libraries, Korea; December 6-8, 2000; Seoul
Education & Culture Center, Seoul, Korea. Seoul: ICADL 2000. pp.
209-214.
- Krause, Jürgen; Marx, Jutta: Vocabulary Switching and Automatic
Metadata Extraction or How to Get Useful Information from a Digital
Library. In: Information Seeking, Searching and Querying in Digital
Libraries. Proceedings of the First DELOS Network of Excellence
Workshop. Zurich, Switzerland, December 11-12, 2000. Zurich 2000. pp.
133-134.
Further information: http://www.bonn.iz-soz.de/research/information/carmen/ap11/
© GESIS Stefan Bärisch
2008-01-02