Abstract
This paper describes the development of an Arabic document image collection containing 34,651 documents from 1,378 different books and 25 topics with their relevance judgments. The books from which the collection is obtained are part of a larger collection of 75,000 books being scanned for archival and retrieval at the Bibliotheca Alexandrina (BA). The documents in the collection vary widely in topics, fonts, and degradation levels. Initial baseline experiments were performed to examine the effectiveness of different index terms, with and without blind relevance feedback, on Arabic OCR-degraded text.
Original language | English |
---|---|
Pages | 657-662 |
Number of pages | 6 |
Publication status | Published - 1 Jan 2006 |
Externally published | Yes |
Event | 5th International Conference on Language Resources and Evaluation, LREC 2006 - Genoa, Italy; Duration: 22 May 2006 → 28 May 2006 |
Other
Other | 5th International Conference on Language Resources and Evaluation, LREC 2006 |
---|---|
Country | Italy |
City | Genoa |
Period | 22/5/06 → 28/5/06 |
ASJC Scopus subject areas
- Education
- Library and Information Sciences
- Linguistics and Language
- Language and Linguistics
Cite this
Building a heterogeneous information retrieval test collection of Arabic document images. / Darwish, Kareem; Magdy, Walid; Emam, Ossama; Abdelsapor, Abdelrahim; Adly, Noha; Nagi, Magdi.
2006. 657-662. Paper presented at 5th International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy.
Research output: Contribution to conference › Paper
TY - CONF
T1 - Building a heterogeneous information retrieval test collection of Arabic document images
AU - Darwish, Kareem
AU - Magdy, Walid
AU - Emam, Ossama
AU - Abdelsapor, Abdelrahim
AU - Adly, Noha
AU - Nagi, Magdi
PY - 2006/1/1
Y1 - 2006/1/1
UR - http://www.scopus.com/inward/record.url?scp=85039920818&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039920818&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85039920818
SP - 657
EP - 662
ER -