An analysis of factors used in search engine ranking

Albert Bifet, Carlos Castillo, Paul Alexandru Chirita, Ingmar Weber

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

17 Citations (Scopus)

Abstract

This paper investigates the influence of different page features on the ranking of search engine results. We use Google (via its API) as our testbed and analyze the result rankings for several queries of different categories using statistical methods. We reformulate the problem of learning the underlying, hidden scores as a binary classification problem. To this problem we then apply both linear and non-linear methods. In all cases, we split the data into a training set and a test set to obtain a meaningful, unbiased estimator for the quality of our predictor. Although our results clearly show that the scoring function cannot be approximated well using only the observed features, we do obtain many interesting insights along the way and discuss ways of obtaining a better estimate and main limitations in trying to do so.
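The abstract does not say how the ranking problem is reduced to binary classification. One common way to do it, shown here purely as an illustrative assumption, is the pairwise reduction: for two results of the same query, classify the difference of their feature vectors as "first ranks above second" or the reverse. The sketch below uses made-up synthetic features, a hypothetical hidden linear score (`hidden_w`), and a plain perceptron as the linear method; none of these names or choices come from the paper. The train/test split is done by query so that no query contributes pairs to both sets.

```python
import random


def make_pairs(results):
    """Turn one ranked result list into pairwise classification examples.

    `results` holds feature vectors ordered from best rank to worst.
    Each pair (i, j) with i ranked above j yields x_i - x_j with label +1
    and the mirrored difference x_j - x_i with label -1.
    """
    pairs = []
    for i in range(len(results)):
        for j in range(i + 1, len(results)):
            diff = [a - b for a, b in zip(results[i], results[j])]
            pairs.append((diff, 1))
            pairs.append(([-d for d in diff], -1))
    return pairs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(examples, epochs=20):
    """Fit a linear classifier sign(w . x) with the perceptron update rule."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def accuracy(w, examples):
    """Fraction of pairs whose order the classifier predicts correctly."""
    correct = sum(1 for x, y in examples if y * dot(w, x) > 0)
    return correct / len(examples)


def synthetic_query(rng, hidden_w, n_results=10):
    """Hypothetical data: random features, ranked by a hidden linear score."""
    docs = [[rng.random() for _ in hidden_w] for _ in range(n_results)]
    docs.sort(key=lambda x: -dot(hidden_w, x))
    return docs


rng = random.Random(0)
hidden_w = [2.0, -1.0, 0.5]  # made-up weights standing in for the unknown score
queries = [synthetic_query(rng, hidden_w) for _ in range(20)]

# Split by query: 15 queries for training, 5 held out for testing.
train = [p for q in queries[:15] for p in make_pairs(q)]
test = [p for q in queries[15:] for p in make_pairs(q)]
rng.shuffle(train)

w = train_perceptron(train)
test_accuracy = accuracy(w, test)
print(f"learned weights: {w}")
print(f"held-out pairwise accuracy: {test_accuracy:.3f}")
```

In this toy setting the hidden score is exactly linear in the observed features, so a linear classifier can recover the ordering. The abstract reports the opposite situation for real search results, where the observed features alone do not approximate the scoring function well.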

Original language: English
Title of host publication: Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference
Pages: 48-57
Number of pages: 10
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference - Chiba, Japan
Duration: 10 May 2005

Other

Other: 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference
Country: Japan
City: Chiba
Period: 10 May 2005

Fingerprint

Search engines
Testbeds
Application programming interfaces (API)
Statistical methods

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Bifet, A., Castillo, C., Chirita, P. A., & Weber, I. (2005). An analysis of factors used in search engine ranking. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference (pp. 48-57)

An analysis of factors used in search engine ranking. / Bifet, Albert; Castillo, Carlos; Chirita, Paul Alexandru; Weber, Ingmar.

Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference. 2005. p. 48-57.

Bifet, A, Castillo, C, Chirita, PA & Weber, I 2005, An analysis of factors used in search engine ranking. in Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference. pp. 48-57, 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference, Chiba, Japan, 10/5/05.
Bifet A, Castillo C, Chirita PA, Weber I. An analysis of factors used in search engine ranking. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference. 2005. p. 48-57
Bifet, Albert ; Castillo, Carlos ; Chirita, Paul Alexandru ; Weber, Ingmar. / An analysis of factors used in search engine ranking. Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference. 2005. pp. 48-57
@inproceedings{2be93b4d41fa46fb9b8d0e3ecc3e2c2b,
title = "An analysis of factors used in search engine ranking",
abstract = "This paper investigates the influence of different page features on the ranking of search engine results. We use Google (via its API) as our testbed and analyze the result rankings for several queries of different categories using statistical methods. We reformulate the problem of learning the underlying, hidden scores as a binary classification problem. To this problem we then apply both linear and non-linear methods. In all cases, we split the data into a training set and a test set to obtain a meaningful, unbiased estimator for the quality of our predictor. Although our results clearly show that the scoring function cannot be approximated well using only the observed features, we do obtain many interesting insights along the way and discuss ways of obtaining a better estimate and main limitations in trying to do so.",
author = "Albert Bifet and Carlos Castillo and Chirita, {Paul Alexandru} and Ingmar Weber",
year = "2005",
month = "12",
day = "1",
language = "English",
pages = "48--57",
booktitle = "Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference",
}

TY - GEN
T1 - An analysis of factors used in search engine ranking
AU - Bifet, Albert
AU - Castillo, Carlos
AU - Chirita, Paul Alexandru
AU - Weber, Ingmar
PY - 2005/12/1
Y1 - 2005/12/1
N2 - This paper investigates the influence of different page features on the ranking of search engine results. We use Google (via its API) as our testbed and analyze the result rankings for several queries of different categories using statistical methods. We reformulate the problem of learning the underlying, hidden scores as a binary classification problem. To this problem we then apply both linear and non-linear methods. In all cases, we split the data into a training set and a test set to obtain a meaningful, unbiased estimator for the quality of our predictor. Although our results clearly show that the scoring function cannot be approximated well using only the observed features, we do obtain many interesting insights along the way and discuss ways of obtaining a better estimate and main limitations in trying to do so.

UR - http://www.scopus.com/inward/record.url?scp=84876931971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876931971&partnerID=8YFLogxK
M3 - Conference contribution
SP - 48
EP - 57
BT - Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, AIRWeb 2005 - Held in Conjunction with the 14th International World Wide Web Conference
ER -