3DCoffee: Combining protein sequences and structures within multiple sequence alignments

Orla O'Sullivan, Karsten Suhre, Chantal Abergel, Desmond G. Higgins, Cédric Notredame

Research output: Contribution to journal › Article

232 Citations (Scopus)

Abstract

Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve alignment accuracy on our HOMSTRAD dataset by four percentage points when using only one structure per dataset. Using two structures yields an improvement of ten percentage points. Measurements carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of the number of provided structures to the total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment.
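
The abstract reports accuracy against HOMSTRAD reference alignments, and the keywords list a column score (CS). As a rough illustration only, and assuming a simplified definition of CS (the fraction of reference columns reproduced exactly in the test alignment), the measure could be sketched as below. The function names `columns` and `column_score` and the toy sequences are hypothetical and are not taken from the paper or from the TCoffee package.

```python
from typing import List, Tuple


def columns(alignment: List[str]) -> List[Tuple[int, ...]]:
    """Turn aligned rows into columns of residue indices (-1 marks a gap)."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    cols = []
    for pos in range(len(alignment[0])):
        col = []
        for row, seq in enumerate(alignment):
            if seq[pos] == "-":
                col.append(-1)
            else:
                col.append(counters[row])
                counters[row] += 1
        cols.append(tuple(col))
    return cols


def column_score(test: List[str], reference: List[str]) -> float:
    """Simplified CS: share of reference columns found identically in test.

    Assumption: every reference column with at least one residue is scored;
    published benchmarks may restrict scoring to selected columns.
    """
    ref_cols = [c for c in columns(reference) if any(i >= 0 for i in c)]
    if not ref_cols:
        return 0.0
    test_cols = set(columns(test))
    matched = sum(1 for c in ref_cols if c in test_cols)
    return matched / len(ref_cols)


# Toy data, invented for illustration.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(column_score(test, reference))
```

In this toy case three of the five reference columns are recovered, so a one-residue shift in a single sequence already removes two columns from the score.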

Original language: English
Pages (from-to): 385-395
Number of pages: 11
Journal: Journal of Molecular Biology
Volume: 340
Issue number: 2
DOI: 10.1016/j.jmb.2004.04.058
Publication status: Published - 2 Jul 2004
Externally published: Yes

Fingerprint

Sequence Alignment
Proteins
Computational Biology

Keywords

  • CS, column score
  • DP, dynamic programming
  • MSA, multiple protein sequence alignment(s)
  • multiple alignment
  • NW, Needleman & Wunsch
  • S-MSA, structure-based MSA
  • sap
  • structural superposition
  • TCoffee
  • threading

ASJC Scopus subject areas

  • Molecular Biology

Cite this

3DCoffee: Combining protein sequences and structures within multiple sequence alignments. / O'Sullivan, Orla; Suhre, Karsten; Abergel, Chantal; Higgins, Desmond G.; Notredame, Cédric.

In: Journal of Molecular Biology, Vol. 340, No. 2, 02.07.2004, p. 385-395.

@article{f55d789aa88f4c8ab7b7a2f7070a5dd9,
title = "3DCoffee: Combining protein sequences and structures within multiple sequence alignments",
abstract = "Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve the accuracy of our HOMSTRAD dataset by four percentage points when using one structure only per dataset. Using two structures yields an improvement of ten percentage points. The measures carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of number of provided structure to total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment.",
keywords = "CS, column score, DP, dynamic programming, MSA, multiple protein sequence alignment(s), multiple alignment, NW, Needleman & Wunsch, S-MSA, structure-based MSA, sap, structural superposition, TCoffee, threading",
author = "Orla O'Sullivan and Karsten Suhre and Chantal Abergel and Higgins, {Desmond G.} and C{\'e}dric Notredame",
year = "2004",
month = "7",
day = "2",
doi = "10.1016/j.jmb.2004.04.058",
language = "English",
volume = "340",
pages = "385--395",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "2",

}

TY - JOUR

T1 - 3DCoffee

T2 - Combining protein sequences and structures within multiple sequence alignments

AU - O'Sullivan, Orla

AU - Suhre, Karsten

AU - Abergel, Chantal

AU - Higgins, Desmond G.

AU - Notredame, Cédric

PY - 2004/7/2

Y1 - 2004/7/2

N2 - Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve the accuracy of our HOMSTRAD dataset by four percentage points when using one structure only per dataset. Using two structures yields an improvement of ten percentage points. The measures carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of number of provided structure to total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment.

KW - CS, column score

KW - DP, dynamic programming

KW - MSA, multiple protein sequence alignment(s)

KW - multiple alignment

KW - NW, Needleman & Wunsch

KW - S-MSA, structure-based MSA

KW - sap

KW - structural superposition

KW - TCoffee

KW - threading

UR - http://www.scopus.com/inward/record.url?scp=2942619012&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2942619012&partnerID=8YFLogxK

U2 - 10.1016/j.jmb.2004.04.058

DO - 10.1016/j.jmb.2004.04.058

M3 - Article

C2 - 15201059

AN - SCOPUS:2942619012

VL - 340

SP - 385

EP - 395

JO - Journal of Molecular Biology

JF - Journal of Molecular Biology

SN - 0022-2836

IS - 2

ER -