Patent query reduction using pseudo relevance feedback

Debasis Ganguly, Johannes Leveling, Walid Magdy, Gareth J F Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

Queries in patent prior art search are full patent applications and are much longer than standard ad hoc search and web search topics. Standard information retrieval (IR) techniques are not entirely effective for patent prior art search because of ambiguous terms in these massive queries. Reducing patent queries by extracting key terms has been shown to be ineffective, mainly because it is not clear what the focus of the query is. An optimal query reduction algorithm must therefore retain the terms useful for retrieval, favouring recall of relevant patents, while removing terms that impair IR effectiveness. We propose a new query reduction technique that decomposes a patent application into constituent text segments and computes Language Modeling (LM) similarities by calculating the probability of generating each segment from the top-ranked documents. We reduce a patent query by removing the least similar segments, hypothesising that removing these segments can increase retrieval precision while retaining the useful context needed for high recall. Experiments on the CLEF-IP 2010 patent prior art search collection show that the proposed method outperforms standard pseudo-relevance feedback (PRF) and a naive query reduction method based on removing unit frequency terms (UFTs).
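The segment-scoring idea described in the abstract can be sketched in Python as follows. This is a minimal illustration under assumptions of our own, not the authors' implementation: segments arrive as a list of strings, the top-ranked documents come from an earlier retrieval run that is not shown, the feedback model is a maximum-likelihood unigram model smoothed with a collection model by Jelinek-Mercer interpolation, scores are normalised by segment length, and the names `reduce_query`, `segment_score`, `drop_fraction`, `lam` and `floor` are hypothetical. The UFT baseline mentioned in the abstract is included for comparison.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_lm(texts):
    """Maximum-likelihood unigram model over the concatenation of `texts`."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def segment_score(segment, fb_lm, coll_lm, lam=0.5, floor=1e-9):
    """Average per-token log-probability of generating `segment` from the
    feedback model, Jelinek-Mercer smoothed with a collection model.
    `floor` (hypothetical) keeps terms unseen in both models from giving log(0)."""
    toks = tokenize(segment)
    if not toks:
        return float("-inf")
    logp = 0.0
    for w in toks:
        p = lam * fb_lm.get(w, 0.0) + (1 - lam) * coll_lm.get(w, 0.0)
        logp += math.log(max(p, floor))
    return logp / len(toks)


def reduce_query(segments, top_docs, collection_docs, drop_fraction=0.25, lam=0.5):
    """Remove the `drop_fraction` lowest-scoring (least similar) segments,
    keeping the remaining segments in their original order."""
    fb_lm = unigram_lm(top_docs)           # model of the pseudo-relevant documents
    coll_lm = unigram_lm(collection_docs)  # background model for smoothing
    scores = [segment_score(s, fb_lm, coll_lm, lam) for s in segments]
    n_drop = int(len(segments) * drop_fraction)
    least_similar = sorted(range(len(segments)), key=lambda i: scores[i])[:n_drop]
    dropped = set(least_similar)
    return [s for i, s in enumerate(segments) if i not in dropped]


def remove_unit_frequency_terms(query_text):
    """UFT baseline: keep only terms that occur more than once in the query."""
    tf = Counter(tokenize(query_text))
    return [t for t in tokenize(query_text) if tf[t] > 1]


# Toy usage with hypothetical data.
segments = [
    "a rotor blade for a wind turbine",
    "the blade has a serrated trailing edge to reduce noise",
    "the applicant's office is located in a large city",
]
top_docs = [
    "wind turbine rotor blade with serrated trailing edge",
    "noise reduction for turbine blades",
]
collection = top_docs + ["a city office building", "the office is in a city"]
print(reduce_query(segments, top_docs, collection, drop_fraction=0.34))
# keeps the first two segments; the off-topic third segment is dropped
```

Length normalisation keeps a long segment from scoring low merely because it contains more terms; whether the original method normalises in this way, and how it chooses how many segments to remove, is not stated here.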

Original language: English
Title of host publication: International Conference on Information and Knowledge Management, Proceedings
Pages: 1953-1956
Number of pages: 4
DOIs: https://doi.org/10.1145/2063576.2063863
Publication status: Published - 13 Dec 2011
Externally published: Yes
Event: 20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom
Duration: 24 Oct 2011 to 28 Oct 2011

Other

Other: 20th ACM Conference on Information and Knowledge Management, CIKM'11
Country: United Kingdom
City: Glasgow
Period: 24/10/11 to 28/10/11

Keywords

  • patent search
  • pseudo-relevance feedback
  • query reduction

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Ganguly, D., Leveling, J., Magdy, W., & Jones, G. J. F. (2011). Patent query reduction using pseudo relevance feedback. In International Conference on Information and Knowledge Management, Proceedings (pp. 1953-1956). https://doi.org/10.1145/2063576.2063863

Patent query reduction using pseudo relevance feedback. / Ganguly, Debasis; Leveling, Johannes; Magdy, Walid; Jones, Gareth J F.

International Conference on Information and Knowledge Management, Proceedings. 2011. p. 1953-1956.

Ganguly, D, Leveling, J, Magdy, W & Jones, GJF 2011, Patent query reduction using pseudo relevance feedback. in International Conference on Information and Knowledge Management, Proceedings. pp. 1953-1956, 20th ACM Conference on Information and Knowledge Management, CIKM'11, Glasgow, United Kingdom, 24/10/11. https://doi.org/10.1145/2063576.2063863
Ganguly D, Leveling J, Magdy W, Jones GJF. Patent query reduction using pseudo relevance feedback. In International Conference on Information and Knowledge Management, Proceedings. 2011. p. 1953-1956 https://doi.org/10.1145/2063576.2063863
Ganguly, Debasis ; Leveling, Johannes ; Magdy, Walid ; Jones, Gareth J F. / Patent query reduction using pseudo relevance feedback. International Conference on Information and Knowledge Management, Proceedings. 2011. pp. 1953-1956
@inproceedings{5fa96c5b31464d5fbe3a302d07865a92,
title = "Patent query reduction using pseudo relevance feedback",
keywords = "patent search, pseudo-relevance feedback, query reduction",
author = "Debasis Ganguly and Johannes Leveling and Walid Magdy and Jones, {Gareth J F}",
year = "2011",
month = "12",
day = "13",
doi = "10.1145/2063576.2063863",
language = "English",
isbn = "9781450307178",
pages = "1953--1956",
booktitle = "International Conference on Information and Knowledge Management, Proceedings",

}

TY  - GEN
T1  - Patent query reduction using pseudo relevance feedback
AU  - Ganguly, Debasis
AU  - Leveling, Johannes
AU  - Magdy, Walid
AU  - Jones, Gareth J F
PY  - 2011/12/13
Y1  - 2011/12/13
KW  - patent search
KW  - pseudo-relevance feedback
KW  - query reduction
UR  - http://www.scopus.com/inward/record.url?scp=83055186798&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=83055186798&partnerID=8YFLogxK
U2  - 10.1145/2063576.2063863
DO  - 10.1145/2063576.2063863
M3  - Conference contribution
AN  - SCOPUS:83055186798
SN  - 9781450307178
SP  - 1953
EP  - 1956
BT  - International Conference on Information and Knowledge Management, Proceedings
ER  - 