Unpaired Image Captioning by Language Pivoting

Jiuxiang Gu, Shafiq Rayhan Joty, Jianfei Cai, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale paired image-caption corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) parallel sentence corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.
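The pivoting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it only shows the two-stage composition (image to pivot-language caption, then pivot to target language), and every name in it (`caption_in_pivot`, `translate_pivot_to_target`, `caption_by_pivoting`, the toy lookup tables and the image ID) is a hypothetical stand-in for a trained component.

```python
from typing import Callable, Dict

# Hypothetical toy data: lookup tables stand in for trained models.
# In practice the first stage would be learned from image-Chinese pairs
# and the second from a Chinese-English parallel sentence corpus.
PIVOT_CAPTIONS: Dict[str, str] = {
    "img_001": "一只狗在草地上奔跑",
}
PIVOT_TO_TARGET: Dict[str, str] = {
    "一只狗在草地上奔跑": "a dog is running on the grass",
}


def caption_in_pivot(image_id: str) -> str:
    """Stage 1 (assumed): produce a caption in the pivot language (Chinese)."""
    return PIVOT_CAPTIONS[image_id]


def translate_pivot_to_target(sentence: str) -> str:
    """Stage 2 (assumed): map a pivot sentence to the target language (English)."""
    return PIVOT_TO_TARGET[sentence]


def caption_by_pivoting(
    image_id: str,
    captioner: Callable[[str], str] = caption_in_pivot,
    translator: Callable[[str], str] = translate_pivot_to_target,
) -> str:
    """Compose the two stages; no image-English pairs are needed."""
    return translator(captioner(image_id))


if __name__ == "__main__":
    print(caption_by_pivoting("img_001"))
```

The sketch chains the two stages naively. The abstract describes more than this, namely aligning the pivot-language captioner to the target language, so the plain composition above should be read as a baseline-style simplification.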

Original language: English
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Martial Hebert, Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss
Publisher: Springer Verlag
Pages: 519-535
Number of pages: 17
ISBN (Print): 9783030012458
DOIs: https://doi.org/10.1007/978-3-030-01246-5_31
Publication status: Published - 1 Jan 2018
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 8 Sep 2018 to 14 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11205 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th European Conference on Computer Vision, ECCV 2018
Country: Germany
City: Munich
Period: 8/9/18 to 14/9/18

Keywords

  • Image captioning
  • Unpaired learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Gu, J., Rayhan Joty, S., Cai, J., & Wang, G. (2018). Unpaired Image Captioning by Language Pivoting. In M. Hebert, V. Ferrari, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 519-535). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11205 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01246-5_31

@inproceedings{62ecab4f8f9942af945ad58c125bb7c0,
title = "Unpaired Image Captioning by Language Pivoting",
abstract = "Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale paired image-caption corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) parallel sentence corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.",
keywords = "Image captioning, Unpaired learning",
author = "Jiuxiang Gu and {Rayhan Joty}, Shafiq and Jianfei Cai and Gang Wang",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-01246-5_31",
language = "English",
isbn = "9783030012458",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "519--535",
editor = "Martial Hebert and Vittorio Ferrari and Cristian Sminchisescu and Yair Weiss",
booktitle = "Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings",

}

TY - GEN

T1 - Unpaired Image Captioning by Language Pivoting

AU - Gu, Jiuxiang

AU - Rayhan Joty, Shafiq

AU - Cai, Jianfei

AU - Wang, Gang

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale paired image-caption corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) parallel sentence corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.

KW - Image captioning

KW - Unpaired learning

UR - http://www.scopus.com/inward/record.url?scp=85055110136&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055110136&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-01246-5_31

DO - 10.1007/978-3-030-01246-5_31

M3 - Conference contribution

SN - 9783030012458

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 519

EP - 535

BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings

A2 - Hebert, Martial

A2 - Ferrari, Vittorio

A2 - Sminchisescu, Cristian

A2 - Weiss, Yair

PB - Springer Verlag

ER -