RSARF

Prediction of residue solvent accessibility from protein sequence using random forest method

Ganesan Pugalenthi, Krishna Kumar Kandaswamy, Kuo Chen Chou, Saravanan Vivekanandan, Prasanna Kolatkar

Research output: Contribution to journal, Article

44 Citations (Scopus)

Abstract

Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57%, and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets, and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
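The two-state labeling at the five thresholds can be illustrated with a minimal Python sketch. It is not the RSARF program. It assumes that each residue already has a relative solvent accessibility value in percent, and that a residue counts as exposed when that value is strictly greater than the threshold and buried otherwise; the abstract does not state the exact convention. The function names `label_residues`, `label_all_thresholds`, and `accuracy` are hypothetical.

```python
from typing import Dict, List, Sequence

# Thresholds (percent relative accessibility) listed in the abstract.
THRESHOLDS = (0.0, 5.0, 10.0, 25.0, 50.0)


def label_residues(rsa_percent: Sequence[float], threshold: float) -> List[str]:
    """Label each residue 'E' (exposed) or 'B' (buried).

    Assumption: exposed means relative accessibility strictly above the
    threshold; the source does not spell out the boundary rule.
    """
    return ["E" if rsa > threshold else "B" for rsa in rsa_percent]


def label_all_thresholds(rsa_percent: Sequence[float]) -> Dict[float, List[str]]:
    """Produce one two-state labeling per threshold."""
    return {t: label_residues(rsa_percent, t) for t in THRESHOLDS}


def accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

Under this convention the 0% threshold marks only fully inaccessible residues as buried, so the class balance shifts with the threshold, which is one reason accuracies at different thresholds are not directly comparable.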

Original language: English
Pages (from-to): 50-56
Number of pages: 7
Journal: Protein and Peptide Letters
Volume: 19
Issue number: 1
Publication status: Published - 1 Jan 2012
Externally published: Yes

Fingerprint

Proteins
Benchmarking
Protein Folding
Amino Acid Sequence
Membrane Proteins
Amino Acids
Testing
Datasets

Keywords

  • Accessible surface area
  • Conserved residue
  • Functional residue
  • Hydrophobic core
  • Protein interface
  • Protein structure prediction

ASJC Scopus subject areas

  • Biochemistry
  • Structural Biology

Cite this

RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method. / Pugalenthi, Ganesan; Kandaswamy, Krishna Kumar; Chou, Kuo Chen; Vivekanandan, Saravanan; Kolatkar, Prasanna.

In: Protein and Peptide Letters, Vol. 19, No. 1, 01.01.2012, p. 50-56.

Pugalenthi, G, Kandaswamy, KK, Chou, KC, Vivekanandan, S & Kolatkar, P 2012, 'RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method', Protein and Peptide Letters, vol. 19, no. 1, pp. 50-56.
Pugalenthi, Ganesan; Kandaswamy, Krishna Kumar; Chou, Kuo Chen; Vivekanandan, Saravanan; Kolatkar, Prasanna. / RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method. In: Protein and Peptide Letters. 2012; Vol. 19, No. 1. pp. 50-56.
@article{a82f8052b96648a7bcf98f90288dfd21,
title = "RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method",
abstract = "Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0{\%}, 5{\%}, 10{\%}, 25{\%}, and 50{\%}). The prediction accuracies for the 0{\%}, 5{\%}, 10{\%}, 25{\%}, and 50{\%} thresholds are 72.9{\%}, 78.25{\%}, 78.12{\%}, 77.57{\%}, and 72.07{\%}, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets, and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.",
keywords = "Accessible surface area, Conserved residue, Functional residue, Hydrophobic core, Protein interface, Protein structure prediction",
author = "Ganesan Pugalenthi and Kandaswamy, {Krishna Kumar} and Chou, {Kuo Chen} and Saravanan Vivekanandan and Prasanna Kolatkar",
year = "2012",
month = "1",
day = "1",
language = "English",
volume = "19",
pages = "50--56",
journal = "Protein and Peptide Letters",
issn = "0929-8665",
publisher = "Bentham Science Publishers B.V.",
number = "1",

}

TY - JOUR

T1 - RSARF

T2 - Prediction of residue solvent accessibility from protein sequence using random forest method

AU - Pugalenthi, Ganesan

AU - Kandaswamy, Krishna Kumar

AU - Chou, Kuo Chen

AU - Vivekanandan, Saravanan

AU - Kolatkar, Prasanna

PY - 2012/1/1

Y1 - 2012/1/1

N2 - Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57%, and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets, and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.

AB - Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57%, and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets, and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.

KW - Accessible surface area

KW - Conserved residue

KW - Functional residue

KW - Hydrophobic core

KW - Protein interface

KW - Protein structure prediction

UR - http://www.scopus.com/inward/record.url?scp=84858167590&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858167590&partnerID=8YFLogxK

M3 - Article

VL - 19

SP - 50

EP - 56

JO - Protein and Peptide Letters

JF - Protein and Peptide Letters

SN - 0929-8665

IS - 1

ER -