Composite retrieval of diverse and complementary bundles

Sihem Amer-Yahia, Francesco Bonchi, Carlos Castillo, Esteban Feuerstein, Isabel Mendez-Diaz, Paula Zabala

Research output: Contribution to journal › Article

26 Citations (Scopus)

Abstract

Users are often faced with the problem of finding complementary items that together achieve a single common goal (e.g., a starter kit for a novice astronomer, a collection of questions and answers related to low-carb nutrition, a set of places to visit on holidays). In this paper, we argue that for some application scenarios returning item bundles is more appropriate than returning ranked lists. We therefore define composite retrieval as the problem of finding k bundles of complementary items. Beyond complementarity of items, the bundles must be valid w.r.t. a given budget, and the answer set of k bundles must exhibit diversity. We formally define the problem and show that it is NP-hard in its general form, and that the special cases in which each bundle is formed by only one item, or in which only one bundle is sought, are also hard. Our characterization, however, suggests a two-phase approach (Produce-and-Choose, or PAC) in which we first produce many valid bundles and then choose k among them. For the first phase we devise two ad-hoc clustering algorithms, while for the second phase we adapt heuristics with approximation guarantees for a related problem. We also devise another approach, which first finds a k-clustering and then selects a valid bundle from each of the produced clusters (Cluster-and-Pick, or CAP). We experimentally compare the proposed methods on two real-world data sets: the first is a sample of tourist attractions in 10 large European cities, while the second is a large database of user-generated restaurant reviews from Yahoo! Local. Our experiments show that when diversity is highly important, CAP is the best option, while when diversity is less important, a PAC approach that constructs bundles around randomly chosen pivots is better.
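
The two-phase Produce-and-Choose idea can be illustrated with a minimal sketch. The abstract does not give the objective function, the validity rule, or the diversity measure, so everything below is an assumption made for illustration: the toy items, the rule that a valid bundle stays within budget and has no repeated category, the cheapest-first growth around random pivots, and the Jaccard-distance greedy selection. It is not the authors' algorithm.

```python
import random

# Hypothetical toy items: (name, category, cost). None of these come from the paper.
ITEMS = [
    ("telescope", "optics", 60),
    ("binoculars", "optics", 30),
    ("star atlas", "book", 15),
    ("field guide", "book", 10),
    ("red flashlight", "accessory", 5),
    ("tripod", "accessory", 25),
]


def is_valid(bundle, budget):
    """Assumed validity rule: total cost within budget, no repeated category."""
    categories = [category for _, category, _ in bundle]
    total_cost = sum(cost for _, _, cost in bundle)
    return total_cost <= budget and len(categories) == len(set(categories))


def produce(items, budget, n_pivots, rng):
    """Phase 1 (produce): grow one valid bundle around each random pivot."""
    affordable = [item for item in items if is_valid([item], budget)]
    pivots = rng.sample(affordable, min(n_pivots, len(affordable)))
    bundles = []
    for pivot in pivots:
        bundle = [pivot]
        # Assumed growth rule: try the cheapest items first.
        for item in sorted(items, key=lambda it: it[2]):
            if item not in bundle and is_valid(bundle + [item], budget):
                bundle.append(item)
        bundles.append(frozenset(bundle))
    # Drop duplicate bundles while keeping a deterministic order.
    return list(dict.fromkeys(bundles))


def distance(a, b):
    """Assumed diversity measure: Jaccard distance between two bundles."""
    return 1 - len(a & b) / len(a | b)


def choose(bundles, k):
    """Phase 2 (choose): greedily pick up to k bundles that are far apart."""
    if not bundles:
        return []
    chosen = [max(bundles, key=len)]
    while len(chosen) < min(k, len(bundles)):
        rest = [b for b in bundles if b not in chosen]
        # Take the bundle whose nearest already-chosen bundle is farthest away.
        chosen.append(max(rest, key=lambda b: min(distance(b, c) for c in chosen)))
    return chosen


def produce_and_choose(items, budget, k, n_pivots, seed=0):
    rng = random.Random(seed)
    return choose(produce(items, budget, n_pivots, rng), k)


if __name__ == "__main__":
    for bundle in produce_and_choose(ITEMS, budget=50, k=2, n_pivots=4):
        print(sorted(name for name, _, _ in bundle))
```

Producing more candidate bundles than the k that are finally returned is what gives the second phase room to trade bundle quality against diversity; the greedy farthest-first rule used here is only one simple way to make that choice.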

Original language: English
Article number: 6742606
Pages (from-to): 2662-2675
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 26
Issue number: 11
DOIs: https://doi.org/10.1109/TKDE.2014.2306678
Publication status: Published - 2014

Keywords

  • complementarity
  • Composite retrieval
  • diversity
  • maximum edge subgraph

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications

Cite this

Amer-Yahia, S., Bonchi, F., Castillo, C., Feuerstein, E., Mendez-Diaz, I., & Zabala, P. (2014). Composite retrieval of diverse and complementary bundles. IEEE Transactions on Knowledge and Data Engineering, 26(11), 2662-2675. [6742606]. https://doi.org/10.1109/TKDE.2014.2306678

@article{4cd46c26638e4abbabbd46b2e4d0f721,
title = "Composite retrieval of diverse and complementary bundles",
abstract = "Users are often faced with the problem of finding complementary items that together achieve a single common goal (e.g., a starter kit for a novice astronomer, a collection of questions and answers related to low-carb nutrition, a set of places to visit on holidays). In this paper, we argue that for some application scenarios returning item bundles is more appropriate than returning ranked lists. We therefore define composite retrieval as the problem of finding k bundles of complementary items. Beyond complementarity of items, the bundles must be valid w.r.t. a given budget, and the answer set of k bundles must exhibit diversity. We formally define the problem and show that it is NP-hard in its general form, and that the special cases in which each bundle is formed by only one item, or in which only one bundle is sought, are also hard. Our characterization, however, suggests a two-phase approach (Produce-and-Choose, or PAC) in which we first produce many valid bundles and then choose k among them. For the first phase we devise two ad-hoc clustering algorithms, while for the second phase we adapt heuristics with approximation guarantees for a related problem. We also devise another approach, which first finds a k-clustering and then selects a valid bundle from each of the produced clusters (Cluster-and-Pick, or CAP). We experimentally compare the proposed methods on two real-world data sets: the first is a sample of tourist attractions in 10 large European cities, while the second is a large database of user-generated restaurant reviews from Yahoo! Local. Our experiments show that when diversity is highly important, CAP is the best option, while when diversity is less important, a PAC approach that constructs bundles around randomly chosen pivots is better.",
keywords = "complementarity, Composite retrieval, diversity, maximum edge subgraph",
author = "Sihem Amer-Yahia and Francesco Bonchi and Carlos Castillo and Esteban Feuerstein and Isabel Mendez-Diaz and Paula Zabala",
year = "2014",
doi = "10.1109/TKDE.2014.2306678",
language = "English",
volume = "26",
pages = "2662--2675",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "11",

}

TY - JOUR

T1 - Composite retrieval of diverse and complementary bundles

AU - Amer-Yahia, Sihem

AU - Bonchi, Francesco

AU - Castillo, Carlos

AU - Feuerstein, Esteban

AU - Mendez-Diaz, Isabel

AU - Zabala, Paula

PY - 2014

Y1 - 2014

N2 - Users are often faced with the problem of finding complementary items that together achieve a single common goal (e.g., a starter kit for a novice astronomer, a collection of questions and answers related to low-carb nutrition, a set of places to visit on holidays). In this paper, we argue that for some application scenarios returning item bundles is more appropriate than returning ranked lists. We therefore define composite retrieval as the problem of finding k bundles of complementary items. Beyond complementarity of items, the bundles must be valid w.r.t. a given budget, and the answer set of k bundles must exhibit diversity. We formally define the problem and show that it is NP-hard in its general form, and that the special cases in which each bundle is formed by only one item, or in which only one bundle is sought, are also hard. Our characterization, however, suggests a two-phase approach (Produce-and-Choose, or PAC) in which we first produce many valid bundles and then choose k among them. For the first phase we devise two ad-hoc clustering algorithms, while for the second phase we adapt heuristics with approximation guarantees for a related problem. We also devise another approach, which first finds a k-clustering and then selects a valid bundle from each of the produced clusters (Cluster-and-Pick, or CAP). We experimentally compare the proposed methods on two real-world data sets: the first is a sample of tourist attractions in 10 large European cities, while the second is a large database of user-generated restaurant reviews from Yahoo! Local. Our experiments show that when diversity is highly important, CAP is the best option, while when diversity is less important, a PAC approach that constructs bundles around randomly chosen pivots is better.

KW - complementarity

KW - Composite retrieval

KW - diversity

KW - maximum edge subgraph

UR - http://www.scopus.com/inward/record.url?scp=84923169561&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84923169561&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2014.2306678

DO - 10.1109/TKDE.2014.2306678

M3 - Article

AN - SCOPUS:84923169561

VL - 26

SP - 2662

EP - 2675

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 11

M1 - 6742606

ER -