Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval due to language evolution. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Character-level alignment of morphological variants found automatically in Wikipedia redirects is used to generate character-level transformations. Then, given a query word, these transformations are used to produce morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Although the performance of this approach may, in the aggregate, lag behind state-of-the-art stemming (or morphological analysis) for languages with good existing processors, the approach is generally safer than stemming in the sense that if it degrades queries, the degradation is generally marginal. Stemming, on the other hand, can significantly degrade queries. We show the success of the approach for Arabic, English, Hungarian, and Portuguese.
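
The pipeline described in the abstract (learn transformations from variant pairs, then apply them to a query word) can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces full character-level alignment with a simpler stand-in that strips the longest common prefix of each pair and records the differing endings as a suffix rewrite rule. The names learn_suffix_rules, expand, min_stem, min_count, and vocabulary are hypothetical, and the toy pairs are invented for illustration rather than taken from Wikipedia redirects.

```python
from collections import Counter
from os.path import commonprefix

# Invented variant pairs standing in for (redirect, target) titles.
TOY_PAIRS = [
    ("city", "cities"),
    ("party", "parties"),
    ("cat", "cats"),
    ("dog", "dogs"),
    ("walk", "walked"),
]


def learn_suffix_rules(pairs, min_stem=3):
    """Count (old_suffix, new_suffix) rewrites observed in variant pairs.

    Simplification: only the longest common prefix is aligned, so this
    learns suffix changes only (no prefixes or infixes).
    """
    rules = Counter()
    for a, b in pairs:
        stem_len = len(commonprefix([a, b]))
        if a == b or stem_len < min_stem:
            continue  # skip identical words and pairs with too short a stem
        # Record the rule in both directions (e.g. y -> ies and ies -> y).
        rules[(a[stem_len:], b[stem_len:])] += 1
        rules[(b[stem_len:], a[stem_len:])] += 1
    return rules


def expand(word, rules, min_count=2, min_stem=3, vocabulary=None):
    """Return candidate equivalents of `word` produced by frequent rules.

    If `vocabulary` is given, only candidates found in it are kept.
    """
    variants = set()
    for (old, new), count in rules.items():
        if count < min_count:
            continue  # ignore rules seen too rarely to trust
        stem_len = len(word) - len(old)
        if word.endswith(old) and stem_len >= min_stem:
            variants.add(word[:stem_len] + new)
    if vocabulary is not None:
        variants &= set(vocabulary)
    variants.discard(word)
    return variants


rules = learn_suffix_rules(TOY_PAIRS)
print(expand("country", rules, vocabulary={"country", "countries"}))
# prints {'countries'}
```

Without the vocabulary filter, the same call also produces the non-word "countrys", because the frequent rule that appends "s" applies to any word. Restricting candidates to terms that occur in the collection is one assumed way to limit such over-generation.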

Original language: English
Title of host publication: CEUR Workshop Proceedings
Publisher: CEUR-WS
Pages: 24-29
Number of pages: 6
Volume: 1204
Publication status: Published - 2014
Event: Workshop on Semantic Matching in Information Retrieval, SMIR 2014 - Gold Coast, Australia
Duration: 11 Jul 2014 → …

Other

Other: Workshop on Semantic Matching in Information Retrieval, SMIR 2014
Country: Australia
City: Gold Coast
Period: 11/7/14 → …

Fingerprint

Processing
Linguistics
Degradation

Keywords

  • Inflection
  • Information Retrieval
  • Morphological Analysis
  • Query Expansion

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia. / Darwish, Kareem; Ali, Ahmed; Abdelali, Ahmed.

CEUR Workshop Proceedings. Vol. 1204 CEUR-WS, 2014. p. 24-29.

Darwish, K, Ali, A & Abdelali, A 2014, Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia. in CEUR Workshop Proceedings. vol. 1204, CEUR-WS, pp. 24-29, Workshop on Semantic Matching in Information Retrieval, SMIR 2014, Gold Coast, Australia, 11/7/14.
@inproceedings{e79f05842d8b4266a7b799d78a6d388b,
title = "Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia",
abstract = "Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval due to language evolution. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Character-level alignment of morphological variants found automatically in Wikipedia redirects is used to generate character-level transformations. Then, given a query word, these transformations are used to produce morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Although the performance of this approach may, in the aggregate, lag behind state-of-the-art stemming (or morphological analysis) for languages with good existing processors, the approach is generally safer than stemming in the sense that if it degrades queries, the degradation is generally marginal. Stemming, on the other hand, can significantly degrade queries. We show the success of the approach for Arabic, English, Hungarian, and Portuguese.",
keywords = "Inflection, Information Retrieval, Morphological Analysis, Query Expansion",
author = "Kareem Darwish and Ahmed Ali and Ahmed Abdelali",
year = "2014",
language = "English",
volume = "1204",
pages = "24--29",
booktitle = "CEUR Workshop Proceedings",
publisher = "CEUR-WS",

}

TY - GEN

T1 - Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia

AU - Darwish, Kareem

AU - Ali, Ahmed

AU - Abdelali, Ahmed

PY - 2014

Y1 - 2014

AB - Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval due to language evolution. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Character-level alignment of morphological variants found automatically in Wikipedia redirects is used to generate character-level transformations. Then, given a query word, these transformations are used to produce morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Although the performance of this approach may, in the aggregate, lag behind state-of-the-art stemming (or morphological analysis) for languages with good existing processors, the approach is generally safer than stemming in the sense that if it degrades queries, the degradation is generally marginal. Stemming, on the other hand, can significantly degrade queries. We show the success of the approach for Arabic, English, Hungarian, and Portuguese.

KW - Inflection

KW - Information Retrieval

KW - Morphological Analysis

KW - Query Expansion

UR - http://www.scopus.com/inward/record.url?scp=84921057611&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84921057611&partnerID=8YFLogxK

M3 - Conference contribution

VL - 1204

SP - 24

EP - 29

BT - CEUR Workshop Proceedings

PB - CEUR-WS

ER -