CMIC@INEX 2008: Link-the-wiki track

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task, I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well for this task, achieving the highest precision at low recall levels. Further post-hoc experiments showed that constructing queries using titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using many phrases that were not central to articles as anchors.
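
The two steps named in the abstract (building a query from an article's title, keywords, and keyphrases, and keeping only keyphrases that match existing Wikipedia titles) might look roughly like the sketch below. The abstract does not specify the IR engine, the query syntax, or the matching rule, so everything here is an assumption made for illustration: the function names `build_query` and `filter_keyphrases` are hypothetical, the query is a plain space-joined string, and title matching is a case-insensitive exact comparison.

```python
def build_query(title, keywords=(), keyphrases=(), titles_only=False):
    """Join article fields into one bag-of-words query string (assumed format)."""
    parts = [title]
    if not titles_only:
        parts.extend(keywords)
        parts.extend(keyphrases)
    # Drop empty fields and repeated phrases while keeping first-seen order.
    seen = set()
    unique = []
    for part in parts:
        key = part.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(part.strip())
    return " ".join(unique)


def filter_keyphrases(candidates, wikipedia_titles):
    """Keep only candidates that match an existing article title.

    Matching is case-insensitive and exact; this rule is an assumption.
    """
    known = {t.strip().lower() for t in wikipedia_titles}
    return [c for c in candidates if c.strip().lower() in known]


# Toy data, invented for the example.
article = {
    "title": "Information retrieval",
    "keywords": ["search engine", "indexing"],
    "keyphrases": ["relevance feedback", "search engine"],
}
query = build_query(article["title"], article["keywords"], article["keyphrases"])
title_query = build_query(article["title"], titles_only=True)
anchors = filter_keyphrases(
    ["relevance feedback", "many different ways", "Indexing"],
    ["Relevance feedback", "Indexing", "Search engine"],
)
print(query)
print(title_query)
print(anchors)
```

The `titles_only` flag mirrors the post-hoc, title-only variant mentioned in the abstract. The title filter only checks that a phrase names an existing article; it does not check whether the phrase is central to the source article, which is the weakness the abstract suspects in the Anchor-to-BEP runs.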

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 337-342
Number of pages: 6
Volume: 5631 LNCS
DOI: https://doi.org/10.1007/978-3-642-03761-0_34
Publication status: Published - 4 Nov 2009
Externally published: Yes
Event: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008 - Dagstuhl Castle, Germany
Duration: 15 Dec 2008 → 18 Dec 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5631 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008
Country: Germany
City: Dagstuhl Castle
Period: 15/12/08 → 18/12/08

Keywords

  • Document Linking
  • Keyphrase extraction

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Darwish, K. (2009). CMIC@INEX 2008: Link-the-wiki track. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5631 LNCS, pp. 337-342). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5631 LNCS). https://doi.org/10.1007/978-3-642-03761-0_34

CMIC@INEX 2008: Link-the-wiki track. / Darwish, Kareem.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5631 LNCS 2009. p. 337-342 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5631 LNCS).

Darwish, K 2009, CMIC@INEX 2008: Link-the-wiki track. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 5631 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5631 LNCS, pp. 337-342, 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, Dagstuhl Castle, Germany, 15/12/08. https://doi.org/10.1007/978-3-642-03761-0_34
Darwish K. CMIC@INEX 2008: Link-the-wiki track. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5631 LNCS. 2009. p. 337-342. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-03761-0_34
Darwish, Kareem. / CMIC@INEX 2008: Link-the-wiki track. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5631 LNCS 2009. pp. 337-342 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{71dbd52cd615401abbae9457a0f3fce2,
title = "CMIC@INEX 2008: Link-the-wiki track",
abstract = "This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well for this task achieving the highest precision for low recall levels. Further post-hoc experiments showed that constructing queries using titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and I filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using many phrases that were not central to articles as anchors.",
keywords = "Document Linking, Keyphrase extraction",
author = "Kareem Darwish",
year = "2009",
month = "11",
day = "4",
doi = "10.1007/978-3-642-03761-0_34",
language = "English",
isbn = "3642037607",
volume = "5631 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "337--342",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - CMIC@INEX 2008

T2 - Link-the-wiki track

AU - Darwish, Kareem

PY - 2009/11/4

Y1 - 2009/11/4

N2 - This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well for this task achieving the highest precision for low recall levels. Further post-hoc experiments showed that constructing queries using titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and I filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using many phrases that were not central to articles as anchors.

KW - Document Linking

KW - Keyphrase extraction

UR - http://www.scopus.com/inward/record.url?scp=70350485022&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350485022&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-03761-0_34

DO - 10.1007/978-3-642-03761-0_34

M3 - Conference contribution

AN - SCOPUS:70350485022

SN - 3642037607

SN - 9783642037603

VL - 5631 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 337

EP - 342

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -