Unpaired Image Captioning by Language Pivoting

Jiuxiang Gu, Shafiq Rayhan Joty, Jianfei Cai, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale paired image-caption corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using a separate pivot-target (Chinese-English) parallel sentence corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.
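To make the pivoting idea concrete, the sketch below shows only its simplest form: chaining an image-to-pivot (Chinese) captioner with a pivot-to-target (Chinese-English) translator, so that no image-English pairs are required. This is not the method of the paper, which the abstract describes as aligning the captioner to the target language rather than merely chaining two models. The names `PivotPipeline`, `toy_translate`, `TOY_CAPTIONS`, and `TOY_LEXICON` are hypothetical, and the "models" are toy lookup tables standing in for trained networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical aliases: an "image" is just an identifier string here,
# and a sentence is a list of tokens.
Image = str
Sentence = List[str]


@dataclass
class PivotPipeline:
    """Chain an image->pivot captioner with a pivot->target translator.

    Assumption for illustration: the captioner would be trained on
    image-pivot pairs and the translator on a pivot-target parallel
    corpus, so no image-target pairs are ever used.
    """
    caption_pivot: Callable[[Image], Sentence]  # image -> Chinese tokens
    translate: Callable[[Sentence], Sentence]   # Chinese -> English tokens

    def caption_target(self, image: Image) -> Sentence:
        # Stage 1: describe the image in the pivot language.
        pivot_sentence = self.caption_pivot(image)
        # Stage 2: translate the pivot caption into the target language.
        return self.translate(pivot_sentence)


# Toy stand-ins for trained models (made-up data, not from the paper).
TOY_CAPTIONS: Dict[Image, Sentence] = {
    "img_001": ["一只", "狗", "奔跑"],
}
TOY_LEXICON: Dict[str, str] = {"一只": "a", "狗": "dog", "奔跑": "runs"}


def toy_translate(tokens: Sentence) -> Sentence:
    # Word-by-word dictionary lookup; unknown words map to "<unk>".
    return [TOY_LEXICON.get(tok, "<unk>") for tok in tokens]


pipeline = PivotPipeline(
    caption_pivot=lambda img: TOY_CAPTIONS[img],
    translate=toy_translate,
)
print(" ".join(pipeline.caption_target("img_001")))  # prints "a dog runs"
```

A known weakness of such a chain is that any error made by the pivot captioner is passed straight into the translator, which is one reason to prefer an approach that relates the two models more tightly than simple composition.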

Original language: English
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Martial Hebert, Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss
Publisher: Springer Verlag
Pages: 519-535
Number of pages: 17
ISBN (Print): 9783030012458
DOI: https://doi.org/10.1007/978-3-030-01246-5_31
Publication status: Published - 1 Jan 2018
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 8 Sep 2018 to 14 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11205 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th European Conference on Computer Vision, ECCV 2018
Country: Germany
City: Munich
Period: 8/9/18 to 14/9/18


Keywords

  • Image captioning
  • Unpaired learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Gu, J., Rayhan Joty, S., Cai, J., & Wang, G. (2018). Unpaired Image Captioning by Language Pivoting. In M. Hebert, V. Ferrari, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 519-535). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11205 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01246-5_31