Classifying websites by industry sector

A study in feature design

Giacomo Berardi, Andrea Esuli, Tiziano Fagni, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Classifying companies by industry sector is an important task in finance, since it allows investors and research analysts to analyse specific subsectors of local and global markets for investment monitoring and planning purposes. Traditionally, this classification has been performed manually by dedicated specialists carrying out in-depth analyses of a company's public profile. However, manual classification is becoming less and less suitable for today's globalised markets, in which new companies spring up, old companies cease to exist, and existing companies refocus their efforts on different sectors at an astounding pace. As a result, tools for performing this classification automatically are increasingly needed. We address the problem of classifying companies by industry sector via the automatic classification of their websites, since websites provide rich information about the nature of a company and the market segment it targets. We have built a website classification system and tested its accuracy on a dataset of more than 20,000 company websites classified according to a 2-level taxonomy of 216 leaf classes explicitly designed for market research purposes. Our experimental study provides insights into which types of features are most useful for this classification task.

Original language: English
Title of host publication: Proceedings of the ACM Symposium on Applied Computing
Publisher: Association for Computing Machinery
Pages: 1053-1059
Number of pages: 7
Volume: 13-17-April-2015
ISBN (Print): 9781450331968
DOIs: https://doi.org/10.1145/2695664.2695722
Publication status: Published - 13 Apr 2015
Event: 30th Annual ACM Symposium on Applied Computing, SAC 2015 - Salamanca, Spain
Duration: 13 Apr 2015 → 17 Apr 2015

Other

Other: 30th Annual ACM Symposium on Applied Computing, SAC 2015
Country: Spain
City: Salamanca
Period: 13/4/15 → 17/4/15

Fingerprint

Websites
Industry
Taxonomies
Finance
Planning
Monitoring

Keywords

  • Industry sectors
  • Website classification

ASJC Scopus subject areas

  • Software

Cite this

Berardi, G., Esuli, A., Fagni, T., & Sebastiani, F. (2015). Classifying websites by industry sector: A study in feature design. In Proceedings of the ACM Symposium on Applied Computing (Vol. 13-17-April-2015, pp. 1053-1059). Association for Computing Machinery. https://doi.org/10.1145/2695664.2695722

Classifying websites by industry sector : A study in feature design. / Berardi, Giacomo; Esuli, Andrea; Fagni, Tiziano; Sebastiani, Fabrizio.

Proceedings of the ACM Symposium on Applied Computing. Vol. 13-17-April-2015 Association for Computing Machinery, 2015. p. 1053-1059.

Berardi, G, Esuli, A, Fagni, T & Sebastiani, F 2015, Classifying websites by industry sector: A study in feature design. in Proceedings of the ACM Symposium on Applied Computing. vol. 13-17-April-2015, Association for Computing Machinery, pp. 1053-1059, 30th Annual ACM Symposium on Applied Computing, SAC 2015, Salamanca, Spain, 13/4/15. https://doi.org/10.1145/2695664.2695722

Berardi G, Esuli A, Fagni T, Sebastiani F. Classifying websites by industry sector: A study in feature design. In Proceedings of the ACM Symposium on Applied Computing. Vol. 13-17-April-2015. Association for Computing Machinery. 2015. p. 1053-1059 https://doi.org/10.1145/2695664.2695722

Berardi, Giacomo ; Esuli, Andrea ; Fagni, Tiziano ; Sebastiani, Fabrizio. / Classifying websites by industry sector : A study in feature design. Proceedings of the ACM Symposium on Applied Computing. Vol. 13-17-April-2015 Association for Computing Machinery, 2015. pp. 1053-1059

@inproceedings{c5e9afc95932486f9ac6b3f95dd7b03e,
title = "Classifying websites by industry sector: A study in feature design",
abstract = "Classifying companies by industry sector is an important task in finance, since it allows investors and research analysts to analyse specific subsectors of local and global markets for investment monitoring and planning purposes. Traditionally, this classification has been performed manually by dedicated specialists carrying out in-depth analyses of a company's public profile. However, manual classification is becoming less and less suitable for today's globalised markets, in which new companies spring up, old companies cease to exist, and existing companies refocus their efforts on different sectors at an astounding pace. As a result, tools for performing this classification automatically are increasingly needed. We address the problem of classifying companies by industry sector via the automatic classification of their websites, since websites provide rich information about the nature of a company and the market segment it targets. We have built a website classification system and tested its accuracy on a dataset of more than 20,000 company websites classified according to a 2-level taxonomy of 216 leaf classes explicitly designed for market research purposes. Our experimental study provides insights into which types of features are most useful for this classification task.",
keywords = "Industry sectors, Website classification",
author = "Giacomo Berardi and Andrea Esuli and Tiziano Fagni and Fabrizio Sebastiani",
year = "2015",
month = "4",
day = "13",
doi = "10.1145/2695664.2695722",
language = "English",
isbn = "9781450331968",
volume = "13-17-April-2015",
pages = "1053--1059",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",
publisher = "Association for Computing Machinery",

}

TY  - GEN
T1  - Classifying websites by industry sector
T2  - A study in feature design
AU  - Berardi, Giacomo
AU  - Esuli, Andrea
AU  - Fagni, Tiziano
AU  - Sebastiani, Fabrizio
PY  - 2015/4/13
Y1  - 2015/4/13
N2  - Classifying companies by industry sector is an important task in finance, since it allows investors and research analysts to analyse specific subsectors of local and global markets for investment monitoring and planning purposes. Traditionally, this classification has been performed manually by dedicated specialists carrying out in-depth analyses of a company's public profile. However, manual classification is becoming less and less suitable for today's globalised markets, in which new companies spring up, old companies cease to exist, and existing companies refocus their efforts on different sectors at an astounding pace. As a result, tools for performing this classification automatically are increasingly needed. We address the problem of classifying companies by industry sector via the automatic classification of their websites, since websites provide rich information about the nature of a company and the market segment it targets. We have built a website classification system and tested its accuracy on a dataset of more than 20,000 company websites classified according to a 2-level taxonomy of 216 leaf classes explicitly designed for market research purposes. Our experimental study provides insights into which types of features are most useful for this classification task.
AB  - Classifying companies by industry sector is an important task in finance, since it allows investors and research analysts to analyse specific subsectors of local and global markets for investment monitoring and planning purposes. Traditionally, this classification has been performed manually by dedicated specialists carrying out in-depth analyses of a company's public profile. However, manual classification is becoming less and less suitable for today's globalised markets, in which new companies spring up, old companies cease to exist, and existing companies refocus their efforts on different sectors at an astounding pace. As a result, tools for performing this classification automatically are increasingly needed. We address the problem of classifying companies by industry sector via the automatic classification of their websites, since websites provide rich information about the nature of a company and the market segment it targets. We have built a website classification system and tested its accuracy on a dataset of more than 20,000 company websites classified according to a 2-level taxonomy of 216 leaf classes explicitly designed for market research purposes. Our experimental study provides insights into which types of features are most useful for this classification task.
KW  - Industry sectors
KW  - Website classification
UR  - http://www.scopus.com/inward/record.url?scp=84955451907&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84955451907&partnerID=8YFLogxK
U2  - 10.1145/2695664.2695722
DO  - 10.1145/2695664.2695722
M3  - Conference contribution
SN  - 9781450331968
VL  - 13-17-April-2015
SP  - 1053
EP  - 1059
BT  - Proceedings of the ACM Symposium on Applied Computing
PB  - Association for Computing Machinery
ER  -