Abstract
Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate whether cross-language patent retrieval can profit from decompounding. This poses two challenges: i) there may be few resources, such as parallel corpora, available for training a machine translation system for a compounding language; ii) patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary (OOV) problems and that decompounding improves IR performance significantly for small training corpora.
Original language | English |
---|---|
Title of host publication | SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Pages | 1169-1170 |
Number of pages | 2 |
DOIs | 10.1145/2009916.2010103 |
Publication status | Published - 1 Sep 2011 |
Externally published | Yes |
Event | 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'11, Beijing, China. Duration: 24 Jul 2011 → 28 Jul 2011 |
Keywords
- Decompounding
- Patent retrieval
ASJC Scopus subject areas
- Information Systems
Cite this
An investigation of decompounding for cross-language patent search. / Leveling, Johannes; Magdy, Walid; Jones, Gareth J F.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011. p. 1169-1170.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - An investigation of decompounding for cross-language patent search
AU - Leveling, Johannes
AU - Magdy, Walid
AU - Jones, Gareth J F
PY - 2011/9/1
Y1 - 2011/9/1
KW - Decompounding
KW - Patent retrieval
UR - http://www.scopus.com/inward/record.url?scp=80052129490&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052129490&partnerID=8YFLogxK
U2 - 10.1145/2009916.2010103
DO - 10.1145/2009916.2010103
M3 - Conference contribution
AN - SCOPUS:80052129490
SN - 9781450309349
SP - 1169
EP - 1170
BT - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
ER -