Leveraging the web for automating tag expansion for low-content items

Ayush Singhal, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Tags, as high-quality semantic descriptors, are used in categorization, clustering and efficient retrieval of various items in the web corpus. Images, videos, songs and similar multimedia items are the most commonly tagged items, either manually or in a semiautomatic manner. However, the tagging process becomes complicated when the content structure of an item is not interpretable. Such a problem occurs in items like scientific research datasets or documents with very little text content. In this work, we propose a generalized approach to automate tag expansion for such low-content items. We leverage the intelligence of the web to generate secondary content for such items for the tag expansion process. While automating tag expansion, we also address the problem of topic drift by automating the removal of noisy tags from the set of candidate new tags. The effectiveness of the proposed approach is tested on a real-world dataset. The performance of the proposed approach is compared with Wikipedia-based nearest neighbor tagging (WikiSem) and non-negative matrix factorization (NMF) based tag expansion approaches. Based on the Mean Reciprocal Rank (MRR) metric, the proposed approach was about twice as accurate as the WikiSem baseline (0.27 vs 0.13) and at least 2.25 times as accurate as the NMF baselines (0.27 vs 0.12).
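The abstract reports results in terms of Mean Reciprocal Rank. As a rough illustration of how that metric is commonly computed, the sketch below scores each item by the reciprocal of the rank at which the first correct tag appears and averages over items. The function names and the toy data are hypothetical and are not taken from the paper; the authors' exact evaluation protocol is not described on this page.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked_tags: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant tag, or 0.0 if none appears."""
    for position, tag in enumerate(ranked_tags, start=1):
        if tag in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[str]],
    relevant_sets: Sequence[Set[str]],
) -> float:
    """Average the reciprocal rank over all items (0.0 for an empty input)."""
    scores = [
        reciprocal_rank(ranked, relevant)
        for ranked, relevant in zip(ranked_lists, relevant_sets)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: predicted tag rankings for three low-content items.
predictions = [
    ["climate", "ocean", "temperature"],  # first correct tag at rank 2
    ["genome", "protein"],                # first correct tag at rank 1
    ["traffic", "city"],                  # no correct tag in the list
]
ground_truth = [{"ocean"}, {"genome"}, {"census"}]

mrr = mean_reciprocal_rank(predictions, ground_truth)

# Ratios implied by the scores quoted in the abstract.
ratio_vs_wikisem = 0.27 / 0.13
ratio_vs_nmf = 0.27 / 0.12
```

Under this definition an MRR of 0.27 means that, on average, the first correct tag appears somewhere around the third or fourth position of the ranked list, and the ratios at the end correspond to the "about twice" and "2.25 times" comparisons quoted above.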

Original language: English
Title of host publication: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 545-552
Number of pages: 8
ISBN (Print): 9781479958801
DOIs: https://doi.org/10.1109/IRI.2014.7051937
Publication status: Published - 27 Feb 2014
Externally published: Yes
Event: 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014 - San Francisco, United States
Duration: 13 Aug 2014 to 15 Aug 2014

Other

Other: 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014
Country: United States
City: San Francisco
Period: 13/8/14 to 15/8/14

Fingerprint

Factorization
Semantics

ASJC Scopus subject areas

  • Information Systems

Cite this

Singhal, A., & Srivastava, J. (2014). Leveraging the web for automating tag expansion for low-content items. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014 (pp. 545-552). [7051937] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2014.7051937

Leveraging the web for automating tag expansion for low-content items. / Singhal, Ayush; Srivastava, Jaideep.

Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 545-552 7051937.

Singhal, A & Srivastava, J 2014, Leveraging the web for automating tag expansion for low-content items. in Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014., 7051937, Institute of Electrical and Electronics Engineers Inc., pp. 545-552, 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014, San Francisco, United States, 13/8/14. https://doi.org/10.1109/IRI.2014.7051937
Singhal A, Srivastava J. Leveraging the web for automating tag expansion for low-content items. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 545-552. 7051937 https://doi.org/10.1109/IRI.2014.7051937
Singhal, Ayush ; Srivastava, Jaideep. / Leveraging the web for automating tag expansion for low-content items. Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 545-552
@inproceedings{1d334883928d4a42a85c94bf534852ce,
title = "Leveraging the web for automating tag expansion for low-content items",
abstract = "Tags, as high-quality semantic descriptors, are used in categorization, clustering and efficient retrieval of various items in the web corpus. Images, videos, songs and similar multimedia items are the most commonly tagged items, either manually or in a semiautomatic manner. However, the tagging process becomes complicated when the content structure of an item is not interpretable. Such a problem occurs in items like scientific research datasets or documents with very little text content. In this work, we propose a generalized approach to automate tag expansion for such low-content items. We leverage the intelligence of the web to generate secondary content for such items for the tag expansion process. While automating tag expansion, we also address the problem of topic drift by automating the removal of noisy tags from the set of candidate new tags. The effectiveness of the proposed approach is tested on a real-world dataset. The performance of the proposed approach is compared with Wikipedia-based nearest neighbor tagging (WikiSem) and non-negative matrix factorization (NMF) based tag expansion approaches. Based on the Mean Reciprocal Rank (MRR) metric, the proposed approach was about twice as accurate as the WikiSem baseline (0.27 vs 0.13) and at least 2.25 times as accurate as the NMF baselines (0.27 vs 0.12).",
author = "Ayush Singhal and Jaideep Srivastava",
year = "2014",
month = "2",
day = "27",
doi = "10.1109/IRI.2014.7051937",
language = "English",
isbn = "9781479958801",
pages = "545--552",
booktitle = "Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Leveraging the web for automating tag expansion for low-content items

AU - Singhal, Ayush

AU - Srivastava, Jaideep

PY - 2014/2/27

Y1 - 2014/2/27

AB - Tags, as high-quality semantic descriptors, are used in categorization, clustering and efficient retrieval of various items in the web corpus. Images, videos, songs and similar multimedia items are the most commonly tagged items, either manually or in a semiautomatic manner. However, the tagging process becomes complicated when the content structure of an item is not interpretable. Such a problem occurs in items like scientific research datasets or documents with very little text content. In this work, we propose a generalized approach to automate tag expansion for such low-content items. We leverage the intelligence of the web to generate secondary content for such items for the tag expansion process. While automating tag expansion, we also address the problem of topic drift by automating the removal of noisy tags from the set of candidate new tags. The effectiveness of the proposed approach is tested on a real-world dataset. The performance of the proposed approach is compared with Wikipedia-based nearest neighbor tagging (WikiSem) and non-negative matrix factorization (NMF) based tag expansion approaches. Based on the Mean Reciprocal Rank (MRR) metric, the proposed approach was about twice as accurate as the WikiSem baseline (0.27 vs 0.13) and at least 2.25 times as accurate as the NMF baselines (0.27 vs 0.12).

UR - http://www.scopus.com/inward/record.url?scp=84946688278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946688278&partnerID=8YFLogxK

U2 - 10.1109/IRI.2014.7051937

DO - 10.1109/IRI.2014.7051937

M3 - Conference contribution

AN - SCOPUS:84946688278

SN - 9781479958801

SP - 545

EP - 552

BT - Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -