DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate, and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build predictors by extracting features from protein sequences, which is computationally expensive and can cause the feature space to explode. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on Convolutional Neural Networks (CNNs), which can exploit frequently occurring k-mers and sets of k-mers in protein sequences to discriminate proteins that yield diffraction-quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy, and MCC on three independent test sets. DeepCrystal achieves an average improvement in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC over Crysalis II and Crysf, respectively, on the independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal, and a web server is available at https://deeplearning-protein.qcri.org.
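
The four evaluation measures named in the abstract (recall, F-score, accuracy and MCC) all follow from the counts of a binary confusion matrix. The sketch below uses their standard definitions. It is illustrative only: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper or from the DeepCrystal repository.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute recall, F-score, accuracy and MCC from confusion-matrix counts.

    tp/fn: crystallizable proteins predicted correctly/incorrectly.
    tn/fp: non-crystallizable proteins predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score (F1) written directly in terms of counts.
    f_score = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "f_score": f_score, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration; these are NOT results from the paper.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy when the crystallizable and non-crystallizable classes are imbalanced.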

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2747-2749
Number of pages: 3
ISBN (Electronic): 9781538654880
DOIs: https://doi.org/10.1109/BIBM.2018.8621202
Publication status: Published - 21 Jan 2019
Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: 3 Dec 2018 → 6 Dec 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

Conference

Conference: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Country: Spain
City: Madrid
Period: 3/12/18 → 6/12/18

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Elbasir, A., Moovarkumudalvan, B., Kunji, K., Kolatkar, P., Bensmail, H., & Mall, R. (2019). DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction. In H. Schmidt, D. Griol, H. Wang, J. Baumbach, H. Zheng, Z. Callejas, X. Hu, J. Dickerson, & L. Zhang (Eds.), Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 (pp. 2747-2749). [8621202] (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2018.8621202