Using machine learning to predict ranking of webpages in the gift industry: Factors for search-engine optimization

Joni Salminen, Roope Marttila, Bernard J. Jansen, Juan Corporan, Tommi Salenius

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We use machine learning to predict the search engine rank of webpages. We use a list of keywords for 30 content blogs of an e-commerce company in the gift industry to retrieve 733 content pages occupying the first-page Google rankings and predict their rank using 30 ranking factors. We test two models, Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosted Decision Trees (XGBoost), finding that XGBoost performs better for predicting actual search rankings, with an average accuracy of 0.86. The feature analysis shows the most impactful features are (a) internal and external links, (b) security of the web domain, and (c) length of H3 headings, and the least impactful features are (a) keyword mentioned in domain address, (b) keyword mentioned in the H1 headings, and (c) overall number of keyword mentions in the text. The results highlight the persistent importance of links in search-engine optimization. We provide actionable insights for online marketers and content creators.
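The modelling idea in the abstract (boosted trees trained on ranking factors, followed by a feature analysis) can be illustrated with a minimal sketch. This is not the authors' code and it does not use XGBoost or LightGBM. It is a pure-Python gradient-boosting regressor built from one-split trees (stumps), trained on synthetic data. The feature names `link_count`, `uses_https`, and `keyword_in_h1`, the data generator, and all coefficients are assumptions made purely for illustration.

```python
import random

# Hypothetical feature names, loosely inspired by the factors named in the abstract.
FEATURES = ["link_count", "uses_https", "keyword_in_h1"]

def make_synthetic_pages(n=200, seed=0):
    """Generate synthetic pages; the target is an invented rank-like score."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        links = rng.randint(0, 50)
        https = rng.randint(0, 1)
        kw_h1 = rng.randint(0, 1)  # deliberately has no effect on the score
        score = 10.0 - 0.15 * links - 2.0 * https + rng.gauss(0.0, 0.5)
        X.append([links, https, kw_h1])
        y.append(score)
    return X, y

def fit_stump(X, residuals):
    """Return the one-feature threshold split with the largest squared-error gain."""
    n = len(X)
    total = sum(residuals)
    base = total * total / n  # score of the unsplit node
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - base
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])  # total split gain per feature
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        importance[j] += gain
        preds = [p + learning_rate * (left if row[j] <= thr else right)
                 for p, row in zip(preds, X)]
    return base, stumps, importance

def predict(model, row):
    """Sum the base value and the scaled output of every stump."""
    base, stumps, _ = model
    return base + sum(l if row[j] <= thr else r for j, thr, l, r in stumps)

X, y = make_synthetic_pages()
model = fit_boosting(X, y)
importance = dict(zip(FEATURES, model[2]))
for name, gain in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: total gain {gain:.1f}")
```

Gain-based importance, as accumulated here, is only one way to rank features; the abstract does not state which importance measure the authors used, so the ordering printed by this sketch says nothing about the paper's findings.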

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Information Systems and Technologies, ICIST 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450362924
DOIs: https://doi.org/10.1145/3361570.3361578
Publication status: Published - 24 Mar 2019
Event: 9th International Conference on Information Systems and Technologies, ICIST 2019 - Cairo, Egypt
Duration: 24 Mar 2019 to 26 Mar 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 9th International Conference on Information Systems and Technologies, ICIST 2019
Country: Egypt
City: Cairo
Period: 24/3/19 to 26/3/19

Fingerprint

Search engines
Learning systems
Blogs
Decision trees
Industry

Keywords

  • Content Marketing
  • E-Commerce
  • Machine Learning
  • Online Marketing
  • Rank Prediction
  • Search-Engine Optimization

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Salminen, J., Marttila, R., Jansen, B. J., Corporan, J., & Salenius, T. (2019). Using machine learning to predict ranking of webpages in the gift industry: Factors for search-engine optimization. In Proceedings of the 9th International Conference on Information Systems and Technologies, ICIST 2019 [a6] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3361570.3361578

@inproceedings{173463e3907142b3856f5993e21e58c4,
title = "Using machine learning to predict ranking of webpages in the gift industry: Factors for search-engine optimization",
abstract = "We use machine learning to predict the search engine rank of webpages. We use a list of keywords for 30 content blogs of an e-commerce company in the gift industry to retrieve 733 content pages occupying the first-page Google rankings and predict their rank using 30 ranking factors. We test two models, Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosted Decision Trees (XGBoost), finding that XGBoost performs better for predicting actual search rankings, with an average accuracy of 0.86. The feature analysis shows the most impactful features are (a) internal and external links, (b) security of the web domain, and (c) length of H3 headings, and the least impactful features are (a) keyword mentioned in domain address, (b) keyword mentioned in the H1 headings, and (c) overall number of keyword mentions in the text. The results highlight the persistent importance of links in search-engine optimization. We provide actionable insights for online marketers and content creators.",
keywords = "Content Marketing, E-Commerce, Machine Learning, Online Marketing, Rank Prediction, Search-Engine Optimization",
author = "Joni Salminen and Roope Marttila and Jansen, {Bernard J.} and Juan Corporan and Tommi Salenius",
year = "2019",
month = "3",
day = "24",
doi = "10.1145/3361570.3361578",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Proceedings of the 9th International Conference on Information Systems and Technologies, ICIST 2019",

}

TY  - GEN
T1  - Using machine learning to predict ranking of webpages in the gift industry
T2  - Factors for search-engine optimization
AU  - Salminen, Joni
AU  - Marttila, Roope
AU  - Jansen, Bernard J.
AU  - Corporan, Juan
AU  - Salenius, Tommi
PY  - 2019/3/24
Y1  - 2019/3/24
N2  - We use machine learning to predict the search engine rank of webpages. We use a list of keywords for 30 content blogs of an e-commerce company in the gift industry to retrieve 733 content pages occupying the first-page Google rankings and predict their rank using 30 ranking factors. We test two models, Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosted Decision Trees (XGBoost), finding that XGBoost performs better for predicting actual search rankings, with an average accuracy of 0.86. The feature analysis shows the most impactful features are (a) internal and external links, (b) security of the web domain, and (c) length of H3 headings, and the least impactful features are (a) keyword mentioned in domain address, (b) keyword mentioned in the H1 headings, and (c) overall number of keyword mentions in the text. The results highlight the persistent importance of links in search-engine optimization. We provide actionable insights for online marketers and content creators.
KW  - Content Marketing
KW  - E-Commerce
KW  - Machine Learning
KW  - Online Marketing
KW  - Rank Prediction
KW  - Search-Engine Optimization
UR  - http://www.scopus.com/inward/record.url?scp=85076128368&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85076128368&partnerID=8YFLogxK
U2  - 10.1145/3361570.3361578
DO  - 10.1145/3361570.3361578
M3  - Conference contribution
AN  - SCOPUS:85076128368
T3  - ACM International Conference Proceeding Series
BT  - Proceedings of the 9th International Conference on Information Systems and Technologies, ICIST 2019
PB  - Association for Computing Machinery
ER  -