Cross-language question re-ranking

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.
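The idea of representing the Arabic input and the English related questions in the same space can be illustrated with a minimal sketch. It is not the paper's system: the paper feeds such representations to a feed-forward neural network, whereas this sketch swaps in plain cosine similarity over averaged word vectors. The vectors, the transliterated Arabic tokens, and the function names `embed`, `cosine`, and `rerank` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy vectors in a shared 3-dimensional cross-language space.
# A real system would load embeddings trained on a parallel corpus.
EMBEDDINGS = {
    # Arabic tokens (transliterated placeholders, assumed for illustration)
    "tashira": [0.90, 0.10, 0.00],  # roughly "visa"
    "tajdid": [0.10, 0.90, 0.00],   # roughly "renewal"
    # English tokens
    "visa": [0.88, 0.12, 0.02],
    "renew": [0.12, 0.85, 0.05],
    "car": [0.00, 0.10, 0.95],
    "rental": [0.05, 0.20, 0.90],
}


def embed(tokens):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rerank(query_tokens, candidates):
    """Order candidate questions by similarity to the query, most similar first."""
    query_vector = embed(query_tokens)
    scored = [(cosine(query_vector, embed(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


if __name__ == "__main__":
    arabic_query = ["tashira", "tajdid"]
    english_questions = [["car", "rental"], ["visa", "renew"]]
    print(rerank(arabic_query, english_questions))
```

Because both languages share one vector space, the Arabic query can be compared with the English questions directly, without translating it first.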

Original language: English
Title of host publication: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1145-1148
Number of pages: 4
ISBN (Electronic): 9781450350228
DOIs: https://doi.org/10.1145/3077136.3080743
Publication status: Published - 7 Aug 2017
Event: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017 - Tokyo, Shinjuku, Japan
Duration: 7 Aug 2017 to 11 Aug 2017

Other

Other: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
Country: Japan
City: Tokyo, Shinjuku
Period: 7/8/17 to 11/8/17

Fingerprint

Neural networks
Feedforward neural networks
Syntactics
Glossaries

Keywords

  • Community Question Answering
  • Cross-language Approaches
  • Distributed Representations
  • Kernel-based Methods
  • Neural Networks
  • Question Retrieval

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Computer Graphics and Computer-Aided Design

Cite this

Martino, G., Romeo, S., Barron, A., Rayhan Joty, S., Marques, L., Moschitti, A., & Nakov, P. (2017). Cross-language question re-ranking. In SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1145-1148). Association for Computing Machinery, Inc. https://doi.org/10.1145/3077136.3080743

Cross-language question re-ranking. / Martino, Giovanni; Romeo, Salvatore; Barron, Alberto; Rayhan Joty, Shafiq; Marques, Lluis; Moschitti, Alessandro; Nakov, Preslav.

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc, 2017. p. 1145-1148.

Martino, G, Romeo, S, Barron, A, Rayhan Joty, S, Marques, L, Moschitti, A & Nakov, P 2017, Cross-language question re-ranking. in SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc, pp. 1145-1148, 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, Tokyo, Shinjuku, Japan, 7/8/17. https://doi.org/10.1145/3077136.3080743
Martino G, Romeo S, Barron A, Rayhan Joty S, Marques L, Moschitti A et al. Cross-language question re-ranking. In SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc. 2017. p. 1145-1148 https://doi.org/10.1145/3077136.3080743
Martino, Giovanni ; Romeo, Salvatore ; Barron, Alberto ; Rayhan Joty, Shafiq ; Marques, Lluis ; Moschitti, Alessandro ; Nakov, Preslav. / Cross-language question re-ranking. SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Inc, 2017. pp. 1145-1148
@inproceedings{a0bd56274af74e5f89ec0a1709123e8c,
title = "Cross-language question re-ranking",
abstract = "We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.",
keywords = "Community Question Answering, Cross-language Approaches, Distributed Representations, Kernel-based Methods, Neural Networks, Question Retrieval",
author = "Giovanni Martino and Salvatore Romeo and Alberto Barron and {Rayhan Joty}, Shafiq and Lluis Marques and Alessandro Moschitti and Preslav Nakov",
year = "2017",
month = "8",
day = "7",
doi = "10.1145/3077136.3080743",
language = "English",
pages = "1145--1148",
booktitle = "SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Cross-language question re-ranking

AU - Martino, Giovanni

AU - Romeo, Salvatore

AU - Barron, Alberto

AU - Rayhan Joty, Shafiq

AU - Marques, Lluis

AU - Moschitti, Alessandro

AU - Nakov, Preslav

PY - 2017/8/7

Y1 - 2017/8/7

N2 - We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.

AB - We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.

KW - Community Question Answering

KW - Cross-language Approaches

KW - Distributed Representations

KW - Kernel-based Methods

KW - Neural Networks

KW - Question Retrieval

UR - http://www.scopus.com/inward/record.url?scp=85029385783&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029385783&partnerID=8YFLogxK

U2 - 10.1145/3077136.3080743

DO - 10.1145/3077136.3080743

M3 - Conference contribution

SP - 1145

EP - 1148

BT - SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

PB - Association for Computing Machinery, Inc

ER -