DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate, and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build predictors by extracting features from protein sequences, which is computationally expensive and can potentially explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on Convolutional Neural Networks (CNNs), which can exploit frequently occurring k-mers and sets of k-mers in the protein sequences to discriminate proteins that yield diffraction-quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy, and Matthews correlation coefficient (MCC) on three independent test sets. Compared with its closest competitors, Crysalis II and Crysf, DeepCrystal achieves average improvements of 1.4% and 12.1% in recall, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC with respect to Crysalis II and Crysf, respectively, on the independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal, and a web server is available at https://deeplearning-protein.qcri.org.
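
The idea that a convolutional filter can respond to a particular k-mer can be illustrated with a minimal sketch. This is not the DeepCrystal architecture or its code: the filter below is set by hand rather than learned, the example sequence is made up, and the names `one_hot`, `kmer_filter`, and `conv1d` are hypothetical helpers introduced only for illustration. It assumes a one-hot encoding over the 20 standard amino acids.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def kmer_filter(kmer):
    """Hypothetical hand-set filter whose weights match exactly one k-mer.

    In a trained CNN these weights would be learned from data instead.
    """
    return one_hot(kmer)


def conv1d(encoded, filt):
    """Slide the filter along the sequence (no padding, stride 1)."""
    k = len(filt)
    return [
        sum(encoded[start + offset][ch] * filt[offset][ch]
            for offset in range(k)
            for ch in range(len(AMINO_ACIDS)))
        for start in range(len(encoded) - k + 1)
    ]


sequence = "MKTAYIAKQR"  # made-up sequence, for illustration only
activations = conv1d(one_hot(sequence), kmer_filter("AKQ"))

# The window with the highest activation is where the k-mer occurs.
best_position = max(range(len(activations)), key=activations.__getitem__)
print(activations)
print(best_position, max(activations))
```

Each activation counts how many positions of a window agree with the 3-mer, so a full match scores 3 and partial matches score lower. A learned filter generalizes this by weighting residues continuously, and taking the maximum over positions (max pooling) reports whether the motif occurs anywhere in the sequence.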

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2747-2749
Number of pages: 3
ISBN (Electronic): 9781538654880
DOI: https://doi.org/10.1109/BIBM.2018.8621202
Publication status: Published - 21 Jan 2019
Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: 3 Dec 2018 to 6 Dec 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

Conference

Conference: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Country: Spain
City: Madrid
Period: 3/12/18 to 6/12/18

Fingerprint

Crystallization
Learning
Proteins
Diffraction
Crystals
Deep learning
X ray crystallography
Computer Simulation
Servers
Neural networks
Engineers
Costs and Cost Analysis
Costs

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Elbasir, A., Moovarkumudalvan, B., Kunji, K., Kolatkar, P., Bensmail, H., & Mall, R. (2019). DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction. In H. Schmidt, D. Griol, H. Wang, J. Baumbach, H. Zheng, Z. Callejas, X. Hu, J. Dickerson, ... L. Zhang (Eds.), Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 (pp. 2747-2749). [8621202] (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2018.8621202

DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction. / Elbasir, Abdurrahman; Moovarkumudalvan, Balasubramanian; Kunji, Khalid; Kolatkar, Prasanna; Bensmail, Halima; Mall, Raghvendra.

Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018. ed. / Harald Schmidt; David Griol; Haiying Wang; Jan Baumbach; Huiru Zheng; Zoraida Callejas; Xiaohua Hu; Julie Dickerson; Le Zhang. Institute of Electrical and Electronics Engineers Inc., 2019. p. 2747-2749 8621202 (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018).

Elbasir, A, Moovarkumudalvan, B, Kunji, K, Kolatkar, P, Bensmail, H & Mall, R 2019, DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction. in H Schmidt, D Griol, H Wang, J Baumbach, H Zheng, Z Callejas, X Hu, J Dickerson & L Zhang (eds), Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018., 8621202, Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018, Institute of Electrical and Electronics Engineers Inc., pp. 2747-2749, 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018, Madrid, Spain, 3/12/18. https://doi.org/10.1109/BIBM.2018.8621202
Elbasir A, Moovarkumudalvan B, Kunji K, Kolatkar P, Bensmail H, Mall R. DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction. In Schmidt H, Griol D, Wang H, Baumbach J, Zheng H, Callejas Z, Hu X, Dickerson J, Zhang L, editors, Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018. Institute of Electrical and Electronics Engineers Inc. 2019. p. 2747-2749. 8621202. (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018). https://doi.org/10.1109/BIBM.2018.8621202
Elbasir, Abdurrahman; Moovarkumudalvan, Balasubramanian; Kunji, Khalid; Kolatkar, Prasanna; Bensmail, Halima; Mall, Raghvendra. / DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction. Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018. editor / Harald Schmidt; David Griol; Haiying Wang; Jan Baumbach; Huiru Zheng; Zoraida Callejas; Xiaohua Hu; Julie Dickerson; Le Zhang. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 2747-2749 (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018).
@inproceedings{c94394a82f2b4541af56f42fb100774a,
title = "DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction",
abstract = "Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate, and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build predictors by extracting features from protein sequences, which is computationally expensive and can potentially explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on Convolutional Neural Networks (CNNs), which can exploit frequently occurring k-mers and sets of k-mers in the protein sequences to discriminate proteins that yield diffraction-quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy, and Matthews correlation coefficient (MCC) on three independent test sets. Compared with its closest competitors, Crysalis II and Crysf, DeepCrystal achieves average improvements of 1.4{\%} and 12.1{\%} in recall, respectively. In addition, DeepCrystal attains average improvements of 2.1{\%} and 6.0{\%} in F-score, 1.9{\%} and 3.9{\%} in accuracy, and 3.8{\%} and 7.0{\%} in MCC with respect to Crysalis II and Crysf, respectively, on the independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal, and a web server is available at https://deeplearning-protein.qcri.org.",
author = "Abdurrahman Elbasir and Balasubramanian Moovarkumudalvan and Khalid Kunji and Prasanna Kolatkar and Halima Bensmail and Raghvendra Mall",
year = "2019",
month = "1",
day = "21",
doi = "10.1109/BIBM.2018.8621202",
language = "English",
series = "Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "2747--2749",
editor = "Harald Schmidt and David Griol and Haiying Wang and Jan Baumbach and Huiru Zheng and Zoraida Callejas and Xiaohua Hu and Julie Dickerson and Le Zhang",
booktitle = "Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018",

}

TY - GEN

T1 - DeepCrystal

T2 - A Deep Learning Framework for Sequence-based Protein Crystallization Prediction

AU - Elbasir, Abdurrahman

AU - Moovarkumudalvan, Balasubramanian

AU - Kunji, Khalid

AU - Kolatkar, Prasanna

AU - Bensmail, Halima

AU - Mall, Raghvendra

PY - 2019/1/21

Y1 - 2019/1/21

N2 - Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate, and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build predictors by extracting features from protein sequences, which is computationally expensive and can potentially explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on Convolutional Neural Networks (CNNs), which can exploit frequently occurring k-mers and sets of k-mers in the protein sequences to discriminate proteins that yield diffraction-quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy, and Matthews correlation coefficient (MCC) on three independent test sets. Compared with its closest competitors, Crysalis II and Crysf, DeepCrystal achieves average improvements of 1.4% and 12.1% in recall, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC with respect to Crysalis II and Crysf, respectively, on the independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal, and a web server is available at https://deeplearning-protein.qcri.org.

UR - http://www.scopus.com/inward/record.url?scp=85062506633&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062506633&partnerID=8YFLogxK

U2 - 10.1109/BIBM.2018.8621202

DO - 10.1109/BIBM.2018.8621202

M3 - Conference contribution

T3 - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

SP - 2747

EP - 2749

BT - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

A2 - Schmidt, Harald

A2 - Griol, David

A2 - Wang, Haiying

A2 - Baumbach, Jan

A2 - Zheng, Huiru

A2 - Callejas, Zoraida

A2 - Hu, Xiaohua

A2 - Dickerson, Julie

A2 - Zhang, Le

PB - Institute of Electrical and Electronics Engineers Inc.

ER -