Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval due to language evolution. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Character-level alignment of automatically found morphological variants from Wikipedia redirects is used to generate character-level transformations. Then, given a query word, these transformations are used to produce morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Though the performance of this approach may, in the aggregate, lag behind state-of-the-art stemming (or morphological analysis) for languages with good existing processors, the approach is generally safer than stemming in the sense that if it degrades queries, the degradation is generally marginal. Stemming, on the other hand, can significantly degrade queries. We show the success of the proposed method for Arabic, English, Hungarian, and Portuguese.
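The learn-then-apply idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the character-level alignment is replaced by a much simpler longest-common-prefix comparison that yields suffix rewrite rules only, and `REDIRECT_PAIRS`, `MIN_STEM`, `learn_rules`, and `expand` are hypothetical names and toy data introduced here for illustration.

```python
from collections import Counter

# Assumption: toy (redirect title, target title) pairs standing in for
# morphological variants mined from Wikipedia redirects.
REDIRECT_PAIRS = [
    ("walk", "walks"), ("talk", "talks"),
    ("jump", "jumped"), ("play", "played"),
    ("city", "cities"), ("party", "parties"),
]

MIN_STEM = 3  # assumed minimum shared prefix, to avoid spurious rules


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_rules(pairs):
    """Count suffix rewrites (old_suffix, new_suffix) in both directions."""
    rules = Counter()
    for a, b in pairs:
        k = common_prefix_len(a, b)
        if a == b or k < MIN_STEM:
            continue
        rules[(a[k:], b[k:])] += 1
        rules[(b[k:], a[k:])] += 1
    return rules


def expand(word, rules, min_count=2):
    """Apply every sufficiently frequent rule whose source suffix matches."""
    variants = set()
    for (src, dst), count in rules.items():
        stem_len = len(word) - len(src)
        if count >= min_count and stem_len >= MIN_STEM and word.endswith(src):
            variants.add(word[:stem_len] + dst)
    variants.discard(word)
    return variants


rules = learn_rules(REDIRECT_PAIRS)
print(sorted(expand("berry", rules)))
```

Rules learned this way overgenerate: for "berry" the sketch proposes "berries" but also non-words such as "berryed". A practical system would presumably filter candidates, for example against the collection vocabulary, which is an assumption here rather than a detail stated in the abstract.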

Original language: English
Title of host publication: CEUR Workshop Proceedings
Number of pages: 6
Publication status: Published - 2014
Event: Workshop on Semantic Matching in Information Retrieval, SMIR 2014 - Gold Coast, Australia
Duration: 11 Jul 2014 → …





  • Inflection
  • Information Retrieval
  • Morphological Analysis
  • Query Expansion

ASJC Scopus subject areas

  • Computer Science (all)
