An investigation of decompounding for cross-language patent search

Johannes Leveling, Walid Magdy, Gareth J.F. Jones

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

9 Citations (Scopus)

Abstract

Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.
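
For readers unfamiliar with the term, the following is a minimal sketch of dictionary-based decompounding for German. It is not the method used in the paper, which the abstract does not describe. The vocabulary `VOCAB`, the linking elements in `LINKERS`, the `min_len` threshold, and the function name `decompound` are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy dictionary-based decompounder.
# VOCAB and LINKERS are assumed values, not taken from the paper.
VOCAB = {"patent", "anmeldung", "verfahren", "daten", "bank", "datenbank"}
LINKERS = ("", "s", "es")  # assumed German linking elements (Fugenelemente)


def decompound(word, vocab=VOCAB, min_len=3):
    """Split a compound into known constituents; return [word] if no split is found."""
    word = word.lower()

    def split(rest):
        # A word already in the vocabulary is kept whole.
        if rest in vocab:
            return [rest]
        # Try the longest candidate head first, leaving at least min_len chars for the tail.
        for i in range(len(rest) - min_len, min_len - 1, -1):
            head, tail = rest[:i], rest[i:]
            for link in LINKERS:
                if link and not head.endswith(link):
                    continue
                # Strip the linking element before the vocabulary lookup.
                stem = head[: len(head) - len(link)] if link else head
                if len(stem) >= min_len and stem in vocab:
                    parts = split(tail)
                    if parts:
                        return [stem] + parts
        return None

    return split(word) or [word]


print(decompound("Patentanmeldung"))
print(decompound("Anmeldungsverfahren"))
```

In a cross-language setting, a split of this kind would be applied before translation, so that constituents unseen as a whole compound in the training data can still be matched individually.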

Original language: English
Title of host publication: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 1169-1170
Number of pages: 2
ISBN (Print): 9781450309349
DOIs: https://doi.org/10.1145/2009916.2010103
Publication status: Published - 1 Jan 2011
Event: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China
Duration: 24 Jul 2011 to 28 Jul 2011

Publication series

Name: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011
Country: China
City: Beijing
Period: 24/7/11 to 28/7/11

Keywords

  • Decompounding
  • Patent retrieval

ASJC Scopus subject areas

  • Information Systems

Cite this

Leveling, J., Magdy, W., & Jones, G. J. F. (2011). An investigation of decompounding for cross-language patent search. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1169-1170). (SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery. https://doi.org/10.1145/2009916.2010103