Workshop at ISSI 2015

29 June 2015

You are invited to participate in the upcoming workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics, to be held as part of the 15th International Society of Scientometrics and Informetrics Conference (ISSI).

Important Dates

  • Submission deadline (extended): 03 May 2015
  • Notification of acceptance/rejection: 20 May 2015
  • Camera-ready papers: 15 June 2015
  • Workshop: 29 June 2015, Istanbul, Turkey

Workshop proceedings

Pictures from the Workshop

Workshop Program

Half-day workshop from 9:30 am to 1:00 pm (3.5 h).
Location: Boğaziçi University, South Campus, Kriton Curi Hall

  • Introduction (15 min)
    Marc Bertin Editorial | Introduction  (226 KB)
  • Session 1 (Chair: Philipp Mayr): 9:45 am - 10:45 am
    • Gil Francopoulo, Joseph Mariani and Patrick Paroubek: NLP4NLP: Applying NLP to scientific corpora about written and spoken language processing Paper | Presentation (2.6 MB)
    • Mounia Haddoud, Aïcha Mokhtari, Thierry Lecroq and Saïd Abdeddaïm: Accurate Keyphrase Extraction from Scientific Papers by Mining Linguistic Information Paper | Presentation (504 KB)
    • Bilal Hayat, Muhammad Rafi, Arsal Jamal, Raja Sami Ur Rehman, Muhammad Bilal Alam and Syed Muhammad Zubair Alam: Classification of Research Citations (CRC) Paper | Presentation (1.9 MB)
  • Discussion (Chair: Marc Bertin): 10:45 am - 11:00 am
  • Break: 11:00 am - 11:30 am 
  • Session 2 (Chair: Marc Bertin): 11:30 am - 12:30 pm
    • Bart Thijs, Wolfgang Glänzel and Martin Meyer: Using noun phrases extraction for the improvement of hybrid clustering with text- and citation-based components. The example of “Information System Research” Paper | Presentation (738 KB)
    • Adam Meyers, Yifan He, Zachary Glass and Olga Babko-Malaya: The Termolator: Terminology Recognition based on Chunking, Statistical and Search-based Scores Paper | Presentation (246 KB)
    • Andi Rexha, Stefan Klampfl, Mark Kröll and Roman Kern: Towards Authorship Attribution for Bibliometrics using Stylometric Features Paper | Presentation (452 KB)
  • Discussion & Conclusion (Chair: Philipp Mayr): 12:30 pm - 1:00 pm
  • End of the workshop at 1:00 pm



The open access movement in scientific publishing and search engines like Google Scholar have made scientific articles more broadly accessible. During the last decade, full-text scientific papers have become increasingly available thanks to the growing number of publications on online platforms such as arXiv and CiteSeer.

The efforts to provide articles in machine-readable formats and the rise of Open Access publishing have resulted in a number of standardized formats for scientific papers (such as NLM-JATS, TEI, DocBook), full-text datasets for research experiments (PubMed, JSTOR, etc.) and corpora (iSearch, etc.). At the same time, research in the field of Natural Language Processing has provided a number of open source tools for versatile text processing (e.g. NLTK, Mallet, OpenNLP, CoreNLP, GATE, CiteSpace).


Scientific papers are highly structured texts and display specific properties related to their references as well as their argumentative and rhetorical structure. Recent research in this field has concentrated on the construction of ontologies for citations and scientific articles (e.g. CiTO, LinkedScience) and on studies of the distribution of references. However, up to now full-text mining efforts have rarely been used to provide data for bibliometric analyses. While bibliometrics traditionally relies on the analysis of metadata of scientific papers (see e.g. a recent special issue on Combining Bibliometrics and Information Retrieval, Mayr & Scharnhorst, 2015), we will explore the role that full-text processing of scientific papers and linguistic analyses can play. With this workshop we would like to discuss novel approaches and provide insights into scientific writing that can bring new perspectives for understanding both the nature of citations and the nature of scientific articles. The possibility of enriching metadata through the full-text processing of papers opens new fields of application for bibliometric studies.

Working with full text allows us to go beyond metadata used in bibliometrics. Full text offers a new field of investigation, where the major problems arise around the organization and structure of text, the extraction of information and its representation on the level of metadata. Furthermore, the study of contexts around in-text citations offers new perspectives related to the semantic dimension of citations. The analyses of citation contexts and the semantic categorization of publications will allow us to rethink co-citation networks, bibliographic coupling and other bibliometric techniques.


The workshop aims to bring together researchers in bibliometrics and computational linguistics in order to study the ways bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing. How can we enhance author network analysis and bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis?

Workshop topics

  • Linguistic modeling and discourse analysis for scientific texts
  • User interfaces, text representations and visualizations
  • Structure of scientific articles (discourse / argumentative / rhetorical / social)
  • Scientific corpora and paper standards
  • Act of citations, in-text citations and Content Citation Analysis
  • Co-citation and bibliographic coupling
  • Text-enhanced bibliographic coupling
  • Terminology extraction
  • Text mining and information extraction
  • Scientific information retrieval
  • Ontological descriptions of scientific content
  • Knowledge extraction

The workshop will involve research project reports, system demonstrations and a panel discussion on the perspectives for the development of new text analytics approaches for bibliometrics.   

Submission Details

All submissions must be written in English following the ISSI 2015 Template for Research in Progress Paper Manuscripts (up to 6 pages) and should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. Please note that at least one author per paper must register for and attend the workshop to present the work.

Programme Committee

  • Lee Giles (College of Information Sciences and Technology, Pennsylvania State University, USA)
  • Yves Gingras (CIRST, Université du Québec à Montréal, Canada)
  • Vincent Larivière (EBSI, Université de Montréal, Canada)
  • Stefanie Haustein (EBSI, Université de Montréal, Canada)
  • Timothy Bowman (EBSI, Université de Montréal, Canada)
  • Izabella Thomas (Centre Tesnière, Université de Franche-Comté, France)
  • Sylviane Cardey (Centre Tesnière, Université de Franche-Comté, France)
  • Béatrice Milard (Université de Toulouse 2, France)
  • Ruslan Mitkov (University of Wolverhampton, England)
  • Hitoshi Isahara (Toyohashi University of Technology, Japan)
  • Tomi Kauppinen (Aalto University, Finland)
  • Roman Kern (Know-Center, Austria)
  • Angelo Di Iorio (Department of Computer Science and Engineering, University of Bologna, Italy)


Organizers

  • Iana Atanassova, Centre Tesnière, Université de Franche-Comté, France
  • Marc Bertin, CIRST, Université du Québec à Montréal, Canada
  • Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany