Knowledge-based representation for transductive multilingual document classification

Salvatore Romeo, Dino Ienco, Andrea Tagarelli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, which adds to the well-known difficulty of obtaining high-quality labeled datasets. To overcome these issues, we propose a new framework for multilingual document classification under a transductive learning setting. We exploit a large-scale multilingual knowledge base, BabelNet, to model documents written in different languages in a common conceptual space, without requiring any translation process. We use a state-of-the-art transductive learner to produce the document classification. Results on two real-world multilingual corpora highlight the effectiveness of the proposed document model with respect to the document representations commonly used in multilingual and cross-lingual analysis, as well as the robustness of the transductive setting for multilingual document classification.
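The abstract outlines two steps: map documents written in different languages onto shared concepts, then classify the whole collection transductively. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `LEXICON` is a hypothetical toy dictionary standing in for BabelNet and uses no BabelNet API. Because the abstract does not name the transductive learner, a basic label propagation over a cosine-similarity graph is swapped in as a generic example of transductive classification.

```python
import math
from collections import Counter

# HYPOTHETICAL toy lexicon standing in for a multilingual knowledge base such as
# BabelNet: (language, lemma) -> language-independent concept ID.
# These IDs are invented for illustration; this is not the BabelNet API.
LEXICON = {
    ("en", "bank"): "c:bank", ("it", "banca"): "c:bank", ("fr", "banque"): "c:bank",
    ("en", "loan"): "c:loan", ("it", "prestito"): "c:loan", ("fr", "prêt"): "c:loan",
    ("en", "match"): "c:match", ("it", "partita"): "c:match", ("fr", "match"): "c:match",
    ("en", "goal"): "c:goal", ("it", "gol"): "c:goal", ("fr", "but"): "c:goal",
}


def to_concepts(lang, tokens):
    """Map a tokenized document to a bag of concept IDs; no translation is involved.
    Tokens not covered by the lexicon are simply dropped."""
    return Counter(LEXICON[(lang, t)] for t in tokens if (lang, t) in LEXICON)


def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def label_propagation(vectors, seeds, classes, iters=50):
    """Basic iterative label propagation on a cosine-similarity graph.
    seeds maps document index -> class; seed rows are clamped every iteration,
    and unlabeled rows become the similarity-weighted average of their neighbors."""
    n, m = len(vectors), len(classes)
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Unlabeled documents start from a uniform class distribution.
    f = [[1.0 / m] * m for _ in range(n)]
    for i, c in seeds.items():
        f[i] = [1.0 if k == c else 0.0 for k in classes]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if i in seeds or total == 0.0:
                new_f.append(f[i])  # clamp seeds; isolated nodes keep their scores
            else:
                new_f.append([sum(w[i][j] * f[j][k] for j in range(n)) / total
                              for k in range(m)])
        f = new_f
    # Transductive output: a label for every document in the collection.
    return [classes[max(range(m), key=lambda k: row[k])] for row in f]


# Usage: a tiny mixed-language collection with one labeled document per class.
docs = [
    ("en", ["bank", "loan", "bank"]),
    ("it", ["partita", "gol"]),
    ("fr", ["banque", "prêt"]),
    ("it", ["banca", "prestito"]),
    ("en", ["match", "goal", "goal"]),
    ("fr", ["match", "but"]),
]
vectors = [to_concepts(lang, toks) for lang, toks in docs]
predictions = label_propagation(vectors, {0: "finance", 1: "sports"}, ["finance", "sports"])
print(predictions)  # ['finance', 'sports', 'finance', 'finance', 'sports', 'sports']
```

The French and Italian finance documents share concept IDs with the labeled English one, so their labels follow from graph similarity alone, with no translation step. In practice the toy lexicon would presumably be replaced by word-sense disambiguation against a full knowledge base.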

Original language: English
Title of host publication: Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings
Publisher: Springer Verlag
Pages: 92-103
Number of pages: 12
ISBN (Electronic): 9783319163536
Publication status: Published - 1 Jan 2015
Externally published: Yes
Event: 37th European Conference on Information Retrieval Research, ECIR 2015 - Vienna, Austria
Duration: 29 Mar 2015 to 2 Apr 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9022
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 37th European Conference on Information Retrieval Research, ECIR 2015
Country: Austria
City: Vienna
Period: 29/3/15 to 2/4/15

Fingerprint

Document Classification
Knowledge-based
Machine Translation
Glossaries
Knowledge Base
Semantics
Robustness
Resources
Evaluate
Modeling
Language

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Romeo, S., Ienco, D., & Tagarelli, A. (2015). Knowledge-based representation for transductive multilingual document classification. In Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings (pp. 92-103). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9022). Springer Verlag.

Knowledge-based representation for transductive multilingual document classification. / Romeo, Salvatore; Ienco, Dino; Tagarelli, Andrea.

Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings. Springer Verlag, 2015. p. 92-103 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9022).


Romeo, S, Ienco, D & Tagarelli, A 2015, Knowledge-based representation for transductive multilingual document classification. in Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9022, Springer Verlag, pp. 92-103, 37th European Conference on Information Retrieval Research, ECIR 2015, Vienna, Austria, 29/3/15.
Romeo S, Ienco D, Tagarelli A. Knowledge-based representation for transductive multilingual document classification. In Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings. Springer Verlag. 2015. p. 92-103. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9022).
Romeo, Salvatore ; Ienco, Dino ; Tagarelli, Andrea. / Knowledge-based representation for transductive multilingual document classification. Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings. Springer Verlag, 2015. pp. 92-103 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9022).
@inproceedings{bc7179651e2747e5a1280172e2dc6b0d,
title = "Knowledge-based representation for transductive multilingual document classification",
author = "Salvatore Romeo and Dino Ienco and Andrea Tagarelli",
year = "2015",
month = "1",
day = "1",
language = "English",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "9022",
publisher = "Springer Verlag",
pages = "92--103",
booktitle = "Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings",

}

TY  - GEN
T1  - Knowledge-based representation for transductive multilingual document classification
AU  - Romeo, Salvatore
AU  - Ienco, Dino
AU  - Tagarelli, Andrea
PY  - 2015/1/1
Y1  - 2015/1/1
UR  - http://www.scopus.com/inward/record.url?scp=84925423274&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84925423274&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84925423274
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
VL  - 9022
SP  - 92
EP  - 103
BT  - Advances in Information Retrieval - 37th European Conference on IR Research, ECIR 2015, Proceedings
PB  - Springer Verlag
ER  -