Evaluating variable-length multiple-option lists in chatbots and mobile search

Pepa Atanasova, Preslav Nakov, Georgi Karadzhov, Yasen Kiprov, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the “ten blue links” metaphor, as mobile users are less likely to click on such links, expecting instead to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then along came chatbots, conversational systems, and messaging platforms, where the user's needs could be better served by the system asking follow-up questions in order to better understand the user's intent. While a user would typically expect a single response to any utterance, a system could also return multiple options for the user to select from, each based on a different system understanding of the user's intent. However, this possibility should not be overused, as the practice could confuse or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood that a correct answer is included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do this. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.
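
This record does not reproduce the paper's proposed measures, but the trade-off the abstract describes can be made concrete with a minimal sketch (Python; the length_discounted_rr function and its alpha penalty are illustrative assumptions, not the paper's method). Classical reciprocal rank never penalizes returning extra options, so a system could pad every list without cost; a length-aware variant has to weigh brevity against the chance of covering the correct answer:

    # Illustrative sketch only: NOT the measures proposed in the paper.
    # It contrasts classical reciprocal rank, which ignores list length,
    # with a hypothetical length-discounted variant.

    def reciprocal_rank(options, correct):
        # 1/rank of the first correct option; 0.0 if the list misses it.
        for rank, option in enumerate(options, start=1):
            if option == correct:
                return 1.0 / rank
        return 0.0

    def length_discounted_rr(options, correct, alpha=0.1):
        # Hypothetical variant: charge a cost of alpha per extra option,
        # so a longer list must earn its length by containing the answer.
        rr = reciprocal_rank(options, correct)
        return max(0.0, rr - alpha * (len(options) - 1))

    short = ["book a flight"]
    padded = ["book a flight", "book a hotel", "rent a car", "check the weather"]
    correct = "book a flight"

    # Plain RR cannot tell the two lists apart (both score 1.0), while the
    # discounted variant rewards the shorter list: 1.0 vs. 0.7.
    print(reciprocal_rank(short, correct), reciprocal_rank(padded, correct))
    print(length_discounted_rr(short, correct), length_discounted_rr(padded, correct))

Any real measure for this setup would also need the necessary properties the paper enumerates; the linear penalty above is only the simplest way to encode the length-versus-coverage tension.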

Original language: English
Title of host publication: SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 997-1000
Number of pages: 4
ISBN (Electronic): 9781450361729
DOIs: 10.1145/3331184.3331308
Publication status: Published - 18 Jul 2019
Event: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019 - Paris, France
Duration: 21 Jul 2019 - 25 Jul 2019

Publication series

Name: SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019
Country: France
City: Paris
Period: 21/7/19 - 25/7/19

Keywords

  • Chatbots
  • Evaluation Measures
  • Mobile Search

ASJC Scopus subject areas

  • Information Systems
  • Applied Mathematics
  • Software

Cite this

Atanasova, P., Nakov, P., Karadzhov, G., Kiprov, Y., & Sebastiani, F. (2019). Evaluating variable-length multiple-option lists in chatbots and mobile search. In SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 997-1000). (SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery, Inc. https://doi.org/10.1145/3331184.3331308

@inproceedings{2c15140039ac4fe9a50deabc784c31a1,
  title = "Evaluating variable-length multiple-option lists in chatbots and mobile search",
  abstract = "In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the “ten blue links” metaphor, as mobile users are less likely to click on such links, expecting instead to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then along came chatbots, conversational systems, and messaging platforms, where the user's needs could be better served by the system asking follow-up questions in order to better understand the user's intent. While a user would typically expect a single response to any utterance, a system could also return multiple options for the user to select from, each based on a different system understanding of the user's intent. However, this possibility should not be overused, as the practice could confuse or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood that a correct answer is included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do this. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.",
  keywords = "Chatbots, Evaluation Measures, Mobile Search",
  author = "Pepa Atanasova and Preslav Nakov and Georgi Karadzhov and Yasen Kiprov and Fabrizio Sebastiani",
  year = "2019",
  month = jul,
  day = "18",
  doi = "10.1145/3331184.3331308",
  language = "English",
  series = "SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval",
  publisher = "Association for Computing Machinery, Inc",
  pages = "997--1000",
  booktitle = "SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval",
}
