Inference of global HIV-1 sequence patterns and preliminary feature analysis

Yan Wang, Reda Rawi, Daniel Hoffmann, Binlian Sun, Rongge Yang

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

The epidemiology of HIV-1 varies across regions of the world, and this complexity may leave unique footprints in the viral genome. We therefore searched for significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. These patterns distinguished southeastern Asian from non-southeastern Asian sequences with 97.5% accuracy, Chinese from non-Chinese sequences with 98.3% accuracy, African from non-African sequences with 88.4% accuracy, and southern African from non-southern African sequences with 91.2% accuracy. The patterns showed different associations with subtypes and with amino acid positions. In addition, some signature patterns were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtypes B, C, and all subtypes chimeric with CRF01-AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this HIV-1 subtype and at the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Moreover, position 21 alone shows distinct patterns: glycine, leucine, and isoleucine correspond to subtype C, subtype B, and all possible recombinant forms chimeric with CRF01-AE, respectively, which also indicates distinct geographic features. Our method widens the scope of signature inference from geographic, genetic, and genomic viewpoints. These findings may provide a valuable reference for epidemiological research or vaccine design.
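
As a rough illustration of how an ordered rule list of the kind RIPPER produces can be applied to aligned sequences, the following minimal sketch classifies toy sequences and computes accuracy. It is not the authors' pipeline: the rule list is hand-written from the position-21 observation in the abstract rather than learned, the column numbering (1-based alignment columns), the class labels, the default class, and the toy sequences are all assumptions made up for this example.

```python
from typing import List, Sequence, Tuple

# One condition: (1-based alignment column, required residue).
Condition = Tuple[int, str]
# One rule: if every condition holds, assign the label.
Rule = Tuple[List[Condition], str]


def matches(conditions: Sequence[Condition], aligned_seq: str) -> bool:
    """Return True if the aligned sequence satisfies every condition."""
    return all(
        col <= len(aligned_seq) and aligned_seq[col - 1] == residue
        for col, residue in conditions
    )


def classify(rules: Sequence[Rule], aligned_seq: str, default: str) -> str:
    """Apply an ordered rule list: the first matching rule wins."""
    for conditions, label in rules:
        if matches(conditions, aligned_seq):
            return label
    return default


def accuracy(rules: Sequence[Rule],
             labelled: Sequence[Tuple[str, str]],
             default: str) -> float:
    """Fraction of (sequence, true label) pairs classified correctly."""
    correct = sum(classify(rules, seq, default) == label
                  for seq, label in labelled)
    return correct / len(labelled)


# Hypothetical rule list, hand-written from the abstract's position-21
# observation (not learned by RIPPER, labels chosen for illustration).
RULES: List[Rule] = [
    ([(21, "G")], "C"),
    ([(21, "L")], "B"),
    ([(21, "I")], "CRF01_AE-chimeric"),
]


def toy(residue: str) -> str:
    """Invented aligned sequence: gaps everywhere except column 21."""
    return "-" * 20 + residue + "-" * 4


# Invented labelled data; the last entry is deliberately misclassified.
TOY_DATA = [
    (toy("G"), "C"),
    (toy("L"), "B"),
    (toy("I"), "CRF01_AE-chimeric"),
    (toy("V"), "other"),
    (toy("G"), "B"),
]

if __name__ == "__main__":
    print(accuracy(RULES, TOY_DATA, default="other"))
```

In a real analysis the rules would be learned from labelled alignments by a RIPPER implementation and evaluated on held-out sequences; the sketch only shows how such rules are read and scored once they exist.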

Original language: English
Pages (from-to): 228-238
Number of pages: 11
Journal: Virologica Sinica
Volume: 28
Issue number: 4
DOIs: https://doi.org/10.1007/s12250-013-3348-z
Publication status: Published - 1 Aug 2013
Externally published: Yes

Fingerprint

human immunodeficiency virus
HIV-1
genome
amino acid
Zambia
Southern Africa
Amino Acids
Isoleucine
Sequence Alignment
Viral Genome
pruning
vaccine
epidemiology
Leucine
footprint
Glycine
Genetic Recombination
recombination
Cluster Analysis
genomics

Keywords

  • global HIV-1 sequence
  • Pattern inference
  • Repeated Incremental Pruning to Produce Error Reduction (RIPPER)

ASJC Scopus subject areas

  • Virology
  • Molecular Medicine
  • Immunology

Cite this

Inference of global HIV-1 sequence patterns and preliminary feature analysis. / Wang, Yan; Rawi, Reda; Hoffmann, Daniel; Sun, Binlian; Yang, Rongge.

In: Virologica Sinica, Vol. 28, No. 4, 01.08.2013, p. 228-238.

Wang, Yan ; Rawi, Reda ; Hoffmann, Daniel ; Sun, Binlian ; Yang, Rongge. / Inference of global HIV-1 sequence patterns and preliminary feature analysis. In: Virologica Sinica. 2013 ; Vol. 28, No. 4. pp. 228-238.
@article{04a3d9848c8a4f23a29cc9b0fee3d224,
title = "Inference of global HIV-1 sequence patterns and preliminary feature analysis",
abstract = "The epidemiology of HIV-1 varies in different areas of the world, and it is possible that this complexity may leave unique footprints in the viral genome. Thus, we attempted to find significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. We found that these patterns were able to distinguish southeastern Asian from nonsoutheastern Asian sequences with 97.5{\%} accuracy, Chinese from non-Chinese sequences with 98.3{\%} accuracy, African from non-African sequences with 88.4{\%} accuracy, and southern African from non-southern African sequences with 91.2{\%} accuracy. These patterns showed different associations with subtypes and with amino acid positions. In addition, some signature patterns were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtypes B, C, and all subtypes chimeric with CRF01-AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this HIV-1 subtype and the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Even more, there are distinct patterns at single position 21 with glycine, leucine and isoleucine corresponding to subtype C, B and all possible recombination forms chimeric with CRF01-AE, which also indicate distinct geographic features. Our method widens the scope of inference of signature from geographic, genetic, and genomic viewpoints. 
These findings may provide a valuable reference for epidemiological research or vaccine design.",
keywords = "global HIV-1 sequence, Pattern inference, Repeated Incremental Pruning to Produce Error Reduction (RIPPER)",
author = "Yan Wang and Reda Rawi and Daniel Hoffmann and Binlian Sun and Rongge Yang",
year = "2013",
month = "8",
day = "1",
doi = "10.1007/s12250-013-3348-z",
language = "English",
volume = "28",
pages = "228--238",
journal = "Virologica Sinica",
issn = "1674-0769",
number = "4",

}

TY - JOUR

T1 - Inference of global HIV-1 sequence patterns and preliminary feature analysis

AU - Wang, Yan

AU - Rawi, Reda

AU - Hoffmann, Daniel

AU - Sun, Binlian

AU - Yang, Rongge

PY - 2013/8/1

Y1 - 2013/8/1

AB - The epidemiology of HIV-1 varies in different areas of the world, and it is possible that this complexity may leave unique footprints in the viral genome. Thus, we attempted to find significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. We found that these patterns were able to distinguish southeastern Asian from nonsoutheastern Asian sequences with 97.5% accuracy, Chinese from non-Chinese sequences with 98.3% accuracy, African from non-African sequences with 88.4% accuracy, and southern African from non-southern African sequences with 91.2% accuracy. These patterns showed different associations with subtypes and with amino acid positions. In addition, some signature patterns were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtypes B, C, and all subtypes chimeric with CRF01-AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this HIV-1 subtype and the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Even more, there are distinct patterns at single position 21 with glycine, leucine and isoleucine corresponding to subtype C, B and all possible recombination forms chimeric with CRF01-AE, which also indicate distinct geographic features. Our method widens the scope of inference of signature from geographic, genetic, and genomic viewpoints. 
These findings may provide a valuable reference for epidemiological research or vaccine design.

KW - global HIV-1 sequence

KW - Pattern inference

KW - Repeated Incremental Pruning to Produce Error Reduction (RIPPER)

UR - http://www.scopus.com/inward/record.url?scp=84881437158&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881437158&partnerID=8YFLogxK

U2 - 10.1007/s12250-013-3348-z

DO - 10.1007/s12250-013-3348-z

M3 - Article

C2 - 23913180

AN - SCOPUS:84881437158

VL - 28

SP - 228

EP - 238

JO - Virologica Sinica

JF - Virologica Sinica

SN - 1674-0769

IS - 4

ER -