An investigation of decompounding for cross-language patent search

Johannes Leveling, Walid Magdy, Gareth J F Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.

Original language: English
Title of host publication: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 1169-1170
Number of pages: 2
DOIs: https://doi.org/10.1145/2009916.2010103
Publication status: Published - 1 Sep 2011
Externally published: Yes
Event: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'11 - Beijing, China
Duration: 24 Jul 2011 to 28 Jul 2011

Other

Other: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'11
Country: China
City: Beijing
Period: 24/7/11 to 28/7/11

Keywords

  • Decompounding
  • Patent retrieval

ASJC Scopus subject areas

  • Information Systems

Cite this

Leveling, J., Magdy, W., & Jones, G. J. F. (2011). An investigation of decompounding for cross-language patent search. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1169-1170) https://doi.org/10.1145/2009916.2010103

An investigation of decompounding for cross-language patent search. / Leveling, Johannes; Magdy, Walid; Jones, Gareth J F.

SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011. p. 1169-1170.

Leveling, J, Magdy, W & Jones, GJF 2011, An investigation of decompounding for cross-language patent search. in SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1169-1170, 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'11, Beijing, China, 24/7/11. https://doi.org/10.1145/2009916.2010103
Leveling J, Magdy W, Jones GJF. An investigation of decompounding for cross-language patent search. In SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011. p. 1169-1170 https://doi.org/10.1145/2009916.2010103
Leveling, Johannes ; Magdy, Walid ; Jones, Gareth J F. / An investigation of decompounding for cross-language patent search. SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011. pp. 1169-1170
@inproceedings{9fda7712bea14268be70bcdfbf8d7008,
title = "An investigation of decompounding for cross-language patent search",
abstract = "Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ({"}patentese{"}), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.",
keywords = "Decompounding, Patent retrieval",
author = "Johannes Leveling and Walid Magdy and Jones, {Gareth J F}",
year = "2011",
month = "9",
day = "1",
doi = "10.1145/2009916.2010103",
language = "English",
isbn = "9781450309349",
pages = "1169--1170",
booktitle = "SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval",

}

TY - GEN

T1 - An investigation of decompounding for cross-language patent search

AU - Leveling, Johannes

AU - Magdy, Walid

AU - Jones, Gareth J F

PY - 2011/9/1

Y1 - 2011/9/1

N2 - Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.

AB - Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.

KW - Decompounding

KW - Patent retrieval

UR - http://www.scopus.com/inward/record.url?scp=80052129490&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052129490&partnerID=8YFLogxK

U2 - 10.1145/2009916.2010103

DO - 10.1145/2009916.2010103

M3 - Conference contribution

SN - 9781450309349

SP - 1169

EP - 1170

BT - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

ER -