Learning cross-modal embeddings for cooking recipes and food images

Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, Antonio Torralba

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Citations (Scopus)

Abstract

In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available.
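
In outline, the approach the abstract describes is: encode each modality into a shared embedding space, align matching image-recipe pairs with a cosine-similarity objective, and regularize both encoders with a shared high-level classification head. The following PyTorch sketch illustrates that structure only; the feature and embedding dimensions, category count, margin, loss weight, and in-batch negative sampling are illustrative assumptions rather than the paper's actual configuration.

# Hedged sketch of a joint image-recipe embedding with semantic
# regularization, per the abstract. All dimensions, hyperparameters,
# and the negative-sampling scheme below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 1024      # assumed shared embedding size
NUM_CLASSES = 1048  # assumed number of high-level dish categories

class JointEmbedding(nn.Module):
    """Projects pre-extracted image and recipe features into one space."""
    def __init__(self, image_feat_dim=2048, recipe_feat_dim=2048):
        super().__init__()
        self.image_proj = nn.Linear(image_feat_dim, EMB_DIM)
        self.recipe_proj = nn.Linear(recipe_feat_dim, EMB_DIM)
        # High-level classification head: the semantic regularizer.
        self.classifier = nn.Linear(EMB_DIM, NUM_CLASSES)

    def forward(self, image_feats, recipe_feats):
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        rec = F.normalize(self.recipe_proj(recipe_feats), dim=-1)
        return img, rec

def joint_loss(model, img, rec, labels, margin=0.3, sem_weight=0.02):
    # Retrieval term: a matching image-recipe pair should be more
    # cosine-similar than a mismatched one (in-batch shift as negatives).
    pos = F.cosine_similarity(img, rec)
    neg = F.cosine_similarity(img, rec.roll(shifts=1, dims=0))
    retrieval = F.relu(margin - pos + neg).mean()
    # Semantic term: both modalities must predict the same dish category.
    semantic = (F.cross_entropy(model.classifier(img), labels)
                + F.cross_entropy(model.classifier(rec), labels))
    return retrieval + sem_weight * semantic

model = JointEmbedding()
img, rec = model(torch.randn(8, 2048), torch.randn(8, 2048))  # stand-in features
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = joint_loss(model, img, rec, labels)

Under such a setup, image-to-recipe retrieval reduces to ranking recipe embeddings by cosine similarity to a query image's embedding, and the semantic vector arithmetic mentioned in the abstract to adding and subtracting embeddings before a nearest-neighbor lookup.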

Original language: English
Title of host publication: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3068-3076
Number of pages: 9
Volume: 2017-January
ISBN (Electronic): 9781538604571
DOIs: 10.1109/CVPR.2017.327
Publication status: Published - 6 Nov 2017
Event: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States
Duration: 21 Jul 2017 → 26 Jul 2017

Other

Other: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country: United States
City: Honolulu
Period: 21/7/17 → 26/7/17

Fingerprint

  • Cooking
  • Image retrieval
  • Semantics
  • Neural networks

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition

Cite this

APA: Salvador, A., Hynes, N., Aytar, Y., Marin, J., Ofli, F., Weber, I., & Torralba, A. (2017). Learning cross-modal embeddings for cooking recipes and food images. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (Vol. 2017-January, pp. 3068-3076). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR.2017.327

Standard: Learning cross-modal embeddings for cooking recipes and food images. / Salvador, Amaia; Hynes, Nicholas; Aytar, Yusuf; Marin, Javier; Ofli, Ferda; Weber, Ingmar; Torralba, Antonio. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Vol. 2017-January. Institute of Electrical and Electronics Engineers Inc., 2017. p. 3068-3076. (Research output: Chapter in Book/Report/Conference proceeding › Conference contribution)

Harvard: Salvador, A, Hynes, N, Aytar, Y, Marin, J, Ofli, F, Weber, I & Torralba, A 2017, Learning cross-modal embeddings for cooking recipes and food images. in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. vol. 2017-January, Institute of Electrical and Electronics Engineers Inc., pp. 3068-3076, 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, United States, 21/7/17. https://doi.org/10.1109/CVPR.2017.327
Vancouver: Salvador A, Hynes N, Aytar Y, Marin J, Ofli F, Weber I et al. Learning cross-modal embeddings for cooking recipes and food images. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Vol. 2017-January. Institute of Electrical and Electronics Engineers Inc. 2017. p. 3068-3076. https://doi.org/10.1109/CVPR.2017.327
Author: Salvador, Amaia ; Hynes, Nicholas ; Aytar, Yusuf ; Marin, Javier ; Ofli, Ferda ; Weber, Ingmar ; Torralba, Antonio. / Learning cross-modal embeddings for cooking recipes and food images. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Vol. 2017-January. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 3068-3076.
BibTeX:

@inproceedings{c74fca00429e40d88128ac26607532d7,
  title = "Learning cross-modal embeddings for cooking recipes and food images",
  abstract = "In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available.",
  author = "Amaia Salvador and Nicholas Hynes and Yusuf Aytar and Javier Marin and Ferda Ofli and Ingmar Weber and Antonio Torralba",
  year = "2017",
  month = "11",
  day = "6",
  doi = "10.1109/CVPR.2017.327",
  language = "English",
  volume = "2017-January",
  pages = "3068--3076",
  booktitle = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

RIS:

TY  - GEN
T1  - Learning cross-modal embeddings for cooking recipes and food images
AU  - Salvador, Amaia
AU  - Hynes, Nicholas
AU  - Aytar, Yusuf
AU  - Marin, Javier
AU  - Ofli, Ferda
AU  - Weber, Ingmar
AU  - Torralba, Antonio
PY  - 2017/11/6
Y1  - 2017/11/6
N2  - In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available.
AB  - In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general. Code, data and models are publicly available.
UR  - http://www.scopus.com/inward/record.url?scp=85038000096&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85038000096&partnerID=8YFLogxK
U2  - 10.1109/CVPR.2017.327
DO  - 10.1109/CVPR.2017.327
M3  - Conference contribution
VL  - 2017-January
SP  - 3068
EP  - 3076
BT  - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -