Knowledge-based representation for transductive multilingual document classification

Salvatore Romeo, Dino Ienco, Andrea Tagarelli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, which adds further issues to the known difficulty of obtaining high-quality labeled datasets. To overcome such issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to support the modeling of documents written in different languages into a common conceptual space, without requiring any language translation process. We resort to a state-of-the-art transductive learner to produce the document classification. Results on two real-world multilingual corpora have highlighted the effectiveness of the proposed document model w.r.t. document representations usually involved in multilingual and cross-lingual analysis, and the robustness of the transductive setting for multilingual document classification.
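
The idea of a common conceptual space can be illustrated with a minimal sketch. It is not the authors' model and does not call BabelNet: the lexicon `TOY_LEXICON`, its concept IDs, and the helper names `bag_of_concepts` and `cosine` are all hypothetical, and word-sense disambiguation is ignored. It only shows how documents in different languages become directly comparable once their words are mapped to shared, language-independent concept identifiers rather than translated.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon: (language, word) -> language-independent concept ID.
# A real system would look concepts up in a multilingual knowledge base;
# these entries and IDs are invented for illustration only.
TOY_LEXICON = {
    ("en", "bank"): "c:bank",
    ("it", "banca"): "c:bank",
    ("en", "loan"): "c:loan",
    ("it", "prestito"): "c:loan",
    ("en", "river"): "c:river",
    ("de", "fluss"): "c:river",
    ("en", "water"): "c:water",
    ("de", "wasser"): "c:water",
}


def bag_of_concepts(text, lang):
    """Count concept IDs for a document, skipping words with no known concept."""
    tokens = text.lower().split()
    return Counter(
        TOY_LEXICON[(lang, tok)] for tok in tokens if (lang, tok) in TOY_LEXICON
    )


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Documents in three languages, represented in the same concept space.
doc_en = bag_of_concepts("bank loan bank", "en")
doc_it = bag_of_concepts("banca prestito", "it")
doc_de = bag_of_concepts("fluss wasser", "de")

sim_en_it = cosine(doc_en, doc_it)  # shared concepts, no translation step
sim_en_de = cosine(doc_en, doc_de)  # no shared concepts
```

In this toy setting the English and Italian documents share concepts and score a high similarity, while the English and German documents share none. Concept vectors of this kind could then be passed to any classifier, including a transductive one that uses the unlabeled documents during learning.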

Original language: English
Title of host publication: Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings
Publisher: Springer Verlag
Pages: 92-103
Number of pages: 12
ISBN (Electronic): 9783319163536
Publication status: Published - 1 Jan 2015
Externally published: Yes
Event: 37th European Conference on Information Retrieval Research, ECIR 2015 - Vienna, Austria
Duration: 29 Mar 2015 to 2 Apr 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9022
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 37th European Conference on Information Retrieval Research, ECIR 2015
Country: Austria
City: Vienna
Period: 29/3/15 to 2/4/15

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Romeo, S., Ienco, D., & Tagarelli, A. (2015). Knowledge-based representation for transductive multilingual document classification. In Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings (pp. 92-103). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9022). Springer Verlag.