Supervised models for multimodal image retrieval based on visual, semantic and geographic information

Duc Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G B De Natale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large-scale networked social media need better search technologies to achieve suitable performance. Multimodal approaches are a promising way to improve image ranking, particularly when metadata are not completely reliable, which is rather common for user annotations, time, and location. In this paper, we propose to combine visual information with additional multi-faceted information to define a novel multimodal similarity measure. More specifically, we combine visual features, which relate strongly to the image content, with semantic information represented by manually annotated concepts, and with geo-tags, which are very often available in the form of object/subject location. Furthermore, we propose a supervised machine learning approach, based on Support Vector Machines (SVMs), to automatically learn optimized weights for combining these features. The resulting model is used as a ranking function to sort the results of a multimodal query.
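The abstract describes the weight learning only at a high level; as a rough illustration of the idea, the Python sketch below trains a linear SVM over per-modality similarity scores and uses the learned coefficients as fusion weights for ranking. The data, the three-feature layout, and the binary relevance labels are assumptions made for this sketch, not the authors' actual training setup.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Each row holds the similarities between a query and one candidate image,
    # one score per modality: [visual, semantic, geographic]. Illustrative data.
    X_train = np.array([
        [0.9, 0.8, 0.7],   # relevant: close in every modality
        [0.2, 0.1, 0.9],   # non-relevant: only geographically close
        [0.8, 0.3, 0.2],   # relevant: visually similar
        [0.1, 0.2, 0.1],   # non-relevant: dissimilar everywhere
    ])
    y_train = np.array([1, 0, 1, 0])  # 1 = relevant, 0 = non-relevant

    # Fit a linear SVM; its coefficients act as the learned modality weights.
    svm = LinearSVC(C=1.0).fit(X_train, y_train)
    w = svm.coef_.ravel()

    def rank(candidates):
        """Return candidate indices sorted by fused multimodal score, best first."""
        scores = candidates @ w  # weighted combination of the three similarities
        return np.argsort(-scores)

    print(rank(np.array([[0.7, 0.9, 0.4], [0.3, 0.2, 0.8]])))

A pairwise ranking formulation (e.g., a ranking SVM trained on preference pairs) would match the ranking objective more directly; the binary-relevance classifier above is simply the most compact way to obtain a weight vector.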

Original language: English
Title of host publication: Proceedings - International Workshop on Content-Based Multimedia Indexing
Pages: 206-210
Number of pages: 5
DOIs: https://doi.org/10.1109/CBMI.2012.6269806
Publication status: Published - 1 Oct 2012
Externally published: Yes
Event: 2012 10th International Workshop on Content-Based Multimedia Indexing, CBMI 2012 - Annecy, Haute-Savoie, France
Duration: 27 Jun 2012 – 29 Jun 2012

Other

Other: 2012 10th International Workshop on Content-Based Multimedia Indexing, CBMI 2012
Country: France
City: Annecy, Haute-Savoie
Period: 27/6/12 – 29/6/12

Fingerprint

Image retrieval
Semantics
Metadata
Support vector machines
Learning systems

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Dang-Nguyen, D. T., Boato, G., Moschitti, A., & De Natale, F. G. B. (2012). Supervised models for multimodal image retrieval based on visual, semantic and geographic information. In Proceedings - International Workshop on Content-Based Multimedia Indexing (pp. 206-210). [6269806] https://doi.org/10.1109/CBMI.2012.6269806

Supervised models for multimodal image retrieval based on visual, semantic and geographic information. / Dang-Nguyen, Duc Tien; Boato, Giulia; Moschitti, Alessandro; De Natale, Francesco G B.

Proceedings - International Workshop on Content-Based Multimedia Indexing. 2012. p. 206-210 6269806.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Dang-Nguyen, DT, Boato, G, Moschitti, A & De Natale, FGB 2012, Supervised models for multimodal image retrieval based on visual, semantic and geographic information. in Proceedings - International Workshop on Content-Based Multimedia Indexing., 6269806, pp. 206-210, 2012 10th International Workshop on Content-Based Multimedia Indexing, CBMI 2012, Annecy, Haute-Savoie, France, 27/6/12. https://doi.org/10.1109/CBMI.2012.6269806
Dang-Nguyen DT, Boato G, Moschitti A, De Natale FGB. Supervised models for multimodal image retrieval based on visual, semantic and geographic information. In Proceedings - International Workshop on Content-Based Multimedia Indexing. 2012. p. 206-210. 6269806 https://doi.org/10.1109/CBMI.2012.6269806
Dang-Nguyen, Duc Tien ; Boato, Giulia ; Moschitti, Alessandro ; De Natale, Francesco G B. / Supervised models for multimodal image retrieval based on visual, semantic and geographic information. Proceedings - International Workshop on Content-Based Multimedia Indexing. 2012. pp. 206-210
@inproceedings{d302f8607a9b4092b31619118b70daa3,
title = "Supervised models for multimodal image retrieval based on visual, semantic and geographic information",
abstract = "Nowadays, large-scale networked social media need better search technologies to achieve suitable performance. Multimodal approaches are promising technologies to improve image ranking. This is particularly true when metadata are not completely reliable, which is a rather common case as far as user annotation, time and location are concerned. In this paper, we propose to properly combine visual information with additional multi-faceted information, to define a novel multimodal similarity measure. More specifically, we combine visual features, which strongly relate to the image content, with semantic information represented by manually annotated concepts, and geo tagging, very often available in the form of object/subject location. Furthermore, we propose a supervised machine learning approach, based on Support Vector Machines (SVMs), to automatically learn optimized weights to combine the above features. The resulting models is used as a ranking function to sort the results of a multimodal query.",
author = "Dang-Nguyen, {Duc Tien} and Giulia Boato and Alessandro Moschitti and {De Natale}, {Francesco G B}",
year = "2012",
month = "10",
day = "1",
doi = "10.1109/CBMI.2012.6269806",
language = "English",
isbn = "9781467323697",
pages = "206--210",
booktitle = "Proceedings - International Workshop on Content-Based Multimedia Indexing",

}

TY - GEN

T1 - Supervised models for multimodal image retrieval based on visual, semantic and geographic information

AU - Dang-Nguyen, Duc Tien

AU - Boato, Giulia

AU - Moschitti, Alessandro

AU - De Natale, Francesco G B

PY - 2012/10/1

Y1 - 2012/10/1

N2 - Large-scale networked social media need better search technologies to achieve suitable performance. Multimodal approaches are a promising way to improve image ranking, particularly when metadata are not completely reliable, which is rather common for user annotations, time, and location. In this paper, we propose to combine visual information with additional multi-faceted information to define a novel multimodal similarity measure. More specifically, we combine visual features, which relate strongly to the image content, with semantic information represented by manually annotated concepts, and with geo-tags, which are very often available in the form of object/subject location. Furthermore, we propose a supervised machine learning approach, based on Support Vector Machines (SVMs), to automatically learn optimized weights for combining these features. The resulting model is used as a ranking function to sort the results of a multimodal query.

AB - Large-scale networked social media need better search technologies to achieve suitable performance. Multimodal approaches are a promising way to improve image ranking, particularly when metadata are not completely reliable, which is rather common for user annotations, time, and location. In this paper, we propose to combine visual information with additional multi-faceted information to define a novel multimodal similarity measure. More specifically, we combine visual features, which relate strongly to the image content, with semantic information represented by manually annotated concepts, and with geo-tags, which are very often available in the form of object/subject location. Furthermore, we propose a supervised machine learning approach, based on Support Vector Machines (SVMs), to automatically learn optimized weights for combining these features. The resulting model is used as a ranking function to sort the results of a multimodal query.

UR - http://www.scopus.com/inward/record.url?scp=84866665660&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866665660&partnerID=8YFLogxK

U2 - 10.1109/CBMI.2012.6269806

DO - 10.1109/CBMI.2012.6269806

M3 - Conference contribution

AN - SCOPUS:84866665660

SN - 9781467323697

SP - 206

EP - 210

BT - Proceedings - International Workshop on Content-Based Multimedia Indexing

ER -