DeepSol

A deep learning framework for sequence-based protein solubility prediction

Sameer Khurana, Reda Rawi, Khalid Kunji, Gwo Yu Chuang, Halima Bensmail, Raghvendra Mall

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).

Original language: English
Pages (from-to): 2605-2613
Number of pages: 9
Journal: Bioinformatics
Volume: 34
Issue number: 15
DOIs: https://doi.org/10.1093/bioinformatics/bty166
Publication status: Published - 1 Jan 2018

Fingerprint

Solubility
Learning
Proteins
Protein
Prediction
Predictors
Shopping centers
Pharmaceuticals
Protein Sequence
Backbone
Correlation coefficient
Computer Simulation
Drug products
Framework
Deep learning
Availability
Neural Networks
Neural networks
Predict
Pharmaceutical Preparations

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

DeepSol: A deep learning framework for sequence-based protein solubility prediction. / Khurana, Sameer; Rawi, Reda; Kunji, Khalid; Chuang, Gwo Yu; Bensmail, Halima; Mall, Raghvendra.

In: Bioinformatics, Vol. 34, No. 15, 01.01.2018, p. 2605-2613.

@article{b19f706f9edd4bbfbf5d4ec79a6f2dd0,
title = "DeepSol: A deep learning framework for sequence-based protein solubility prediction",
abstract = "Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).",
author = "Sameer Khurana and Reda Rawi and Khalid Kunji and Chuang, {Gwo Yu} and Halima Bensmail and Raghvendra Mall",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/bty166",
language = "English",
volume = "34",
pages = "2605--2613",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "15",

}

TY - JOUR

T1 - DeepSol

T2 - A deep learning framework for sequence-based protein solubility prediction

AU - Khurana, Sameer

AU - Rawi, Reda

AU - Kunji, Khalid

AU - Chuang, Gwo Yu

AU - Bensmail, Halima

AU - Mall, Raghvendra

PY - 2018/1/1

Y1 - 2018/1/1

AB - Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).

UR - http://www.scopus.com/inward/record.url?scp=85054955688&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054955688&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty166

DO - 10.1093/bioinformatics/bty166

M3 - Article

VL - 34

SP - 2605

EP - 2613

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 15

ER -