Finding high-quality content in social media

Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

696 Citations (Scopus)

Abstract

The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions - social media sites - becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.

Original language: English
Title of host publication: WSDM'08 - Proceedings of the 2008 International Conference on Web Search and Data Mining
Pages: 183-193
Number of pages: 11
DOIs: https://doi.org/10.1145/1341531.1341557
Publication status: Published - 6 May 2008
Externally published: Yes
Event: 2008 International Conference on Web Search and Data Mining, WSDM 2008 - Palo Alto, CA, United States
Duration: 11 Feb 2008 to 12 Feb 2008

Other

Other: 2008 International Conference on Web Search and Data Mining, WSDM 2008
Country: United States
City: Palo Alto, CA
Period: 11/2/08 to 12/2/08

Fingerprint

Social Media
Availability
Feedback
Question Answering
Spam
Social Interaction
Vary
Community

Keywords

  • Community question answering
  • Media
  • User interactions

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software
  • Theoretical Computer Science

Cite this

Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008). Finding high-quality content in social media. In WSDM'08 - Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 183-193) https://doi.org/10.1145/1341531.1341557

@inproceedings{0823b9d18a3d4ecd846b104f0f02748a,
title = "Finding high-quality content in social media",
abstract = "The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions - social media sites - becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.",
keywords = "Community question answering, Media, User interactions",
author = "Eugene Agichtein and Carlos Castillo and Debora Donato and Aristides Gionis and Gilad Mishne",
year = "2008",
month = "5",
day = "6",
doi = "10.1145/1341531.1341557",
language = "English",
pages = "183--193",
booktitle = "WSDM'08 - Proceedings of the 2008 International Conference on Web Search and Data Mining",

}

TY - GEN

T1 - Finding high-quality content in social media

AU - Agichtein, Eugene

AU - Castillo, Carlos

AU - Donato, Debora

AU - Gionis, Aristides

AU - Mishne, Gilad

PY - 2008/5/6

Y1 - 2008/5/6

AB - The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions - social media sites - becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.

KW - Community question answering

KW - Media

KW - User interactions

UR - http://www.scopus.com/inward/record.url?scp=42949138243&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=42949138243&partnerID=8YFLogxK

U2 - 10.1145/1341531.1341557

DO - 10.1145/1341531.1341557

M3 - Conference contribution

SP - 183

EP - 193

BT - WSDM'08 - Proceedings of the 2008 International Conference on Web Search and Data Mining

ER -