Building a heterogeneous information retrieval test collection of Arabic document images

Kareem Darwish, Walid Magdy, Ossama Emam, Abdelrahim Abdelsapor, Noha Adly, Magdi Nagi

Research output: Contribution to conference (Paper)

Abstract

This paper describes the development of an Arabic document image collection containing 34,651 documents from 1,378 different books, along with 25 topics and their relevance judgments. The books from which the collection is drawn are part of a larger collection of 75,000 books being scanned for archival and retrieval at the Bibliotheca Alexandrina (BA). The documents in the collection vary widely in topics, fonts, and degradation levels. Initial baseline experiments were performed to examine the effectiveness of different index terms, with and without blind relevance feedback, on OCR-degraded Arabic text.

Original language: English
Pages: 657-662
Number of pages: 6
Publication status: Published - 1 Jan 2006
Externally published: Yes
Event: 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy
Duration: 22 May 2006 to 28 May 2006

Fingerprint

information retrieval
experiment

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Darwish, K., Magdy, W., Emam, O., Abdelsapor, A., Adly, N., & Nagi, M. (2006). Building a heterogeneous information retrieval test collection of Arabic document images (pp. 657-662). Paper presented at the 5th International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy.

@conference{62a14e1e296b46a98020442783ecb8e6,
title = "Building a heterogeneous information retrieval test collection of Arabic document images",
abstract = "This paper describes the development of an Arabic document image collection containing 34,651 documents from 1,378 different books, along with 25 topics and their relevance judgments. The books from which the collection is drawn are part of a larger collection of 75,000 books being scanned for archival and retrieval at the Bibliotheca Alexandrina (BA). The documents in the collection vary widely in topics, fonts, and degradation levels. Initial baseline experiments were performed to examine the effectiveness of different index terms, with and without blind relevance feedback, on OCR-degraded Arabic text.",
author = "Kareem Darwish and Walid Magdy and Ossama Emam and Abdelrahim Abdelsapor and Noha Adly and Magdi Nagi",
year = "2006",
month = "1",
day = "1",
language = "English",
pages = "657--662",
note = "5th International Conference on Language Resources and Evaluation, LREC 2006 ; Conference date: 22-05-2006 Through 28-05-2006",
}

TY  - CONF
T1  - Building a heterogeneous information retrieval test collection of Arabic document images
AU  - Darwish, Kareem
AU  - Magdy, Walid
AU  - Emam, Ossama
AU  - Abdelsapor, Abdelrahim
AU  - Adly, Noha
AU  - Nagi, Magdi
PY  - 2006/1/1
Y1  - 2006/1/1
AB  - This paper describes the development of an Arabic document image collection containing 34,651 documents from 1,378 different books, along with 25 topics and their relevance judgments. The books from which the collection is drawn are part of a larger collection of 75,000 books being scanned for archival and retrieval at the Bibliotheca Alexandrina (BA). The documents in the collection vary widely in topics, fonts, and degradation levels. Initial baseline experiments were performed to examine the effectiveness of different index terms, with and without blind relevance feedback, on OCR-degraded Arabic text.
UR  - http://www.scopus.com/inward/record.url?scp=85039920818&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85039920818&partnerID=8YFLogxK
M3  - Paper
SP  - 657
EP  - 662
ER  -