Review

Use of profile hidden Markov models in viral discovery: current insights

Pages 29-45 | Published online: 14 Jul 2017

Figures & data

Figure 1 Multiple sequence alignment of VP1 (major capsid protein) sequences from Alpavirinae phages.

Notes: Multiple sequence alignment was performed with ClustalX using default parameters [57]. Colors indicate conservation of residues according to the ClustalX color scheme [58].

Table 1 Probability values for positions 46–49 from the multiple sequence alignment depicted in Figure 1

Table 2 Probability values with pseudocounts for positions 46–49 from the multiple sequence alignment depicted in Figure 1

Figure 2 Diagram representing a profile hidden Markov model (profile HMM).

Notes: Match states are represented as red rectangles, deletion (silent) states as green circles, and insertion states as blue diamonds. The red numerical values next to the arrows indicate transition probabilities. The equalities inside the states indicate amino acid probabilities, generally called emission probabilities. These emission probabilities do not include the use of pseudocounts. Match states use emission probabilities computed from the original alignment; insertion states use background amino acid probability values of 1/20. The transition probabilities highlighted with red circles indicate the probabilities described in the text. The other transition probabilities were arbitrarily set to make the figure more homogeneous and to increase clarity.
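The emission probabilities described in the notes to Figure 2 can be illustrated with a minimal sketch. It assumes that a match-state emission probability is the relative frequency of each amino acid in one alignment column, with gaps ignored, and that insertion states use the uniform background of 1/20 stated above. The function name `emission_probabilities`, the example column, and the add-one pseudocount are hypothetical choices made for illustration; the pseudocount scheme behind Table 2 is not specified here and may differ.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def emission_probabilities(column, pseudocount=0.0):
    """Return amino acid probabilities for one alignment column.

    Gap characters and any non-standard symbols are ignored.
    `pseudocount` is added to every amino acid count before
    normalizing (0.0 reproduces plain relative frequencies).
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        raise ValueError("column contains no amino acid residues")
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


# Insertion states: uniform background probability of 1/20 per residue.
INSERT_EMISSIONS = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

# Hypothetical column: four G, one A, one gap (not taken from Figure 1).
column = "GGGA-G"
raw = emission_probabilities(column)                       # no pseudocounts
smoothed = emission_probabilities(column, pseudocount=1.0)  # assumed add-one scheme
```

Without pseudocounts, any residue absent from the column gets probability zero; with them, every residue keeps a small nonzero probability, which is the practical difference between the values in Table 1 and Table 2.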

Table 3 Web resources of viral profile HMM databases and tools

Figure 3 Distribution of orthologous groups from vFam [35] (A) and pVOGs [36] (B) according to the viral families.

Notes: To obtain quantitative data, the number of corresponding profile HMMs/orthologous groups was determined for each viral family based on the annotation provided in the database files. Profile HMMs from the original databases are derived from viruses of either a single family or multiple families.
Abbreviations: pVOGs, Prokaryotic Virus Orthologous Groups; vFam, viral profile HMM database; profile HMMs, profile hidden Markov models.

Table 4 Publicly available targeted assembly tools that use profile HMM seeds

Figure S1 Distribution of the number of proteins per orthologous group for vFam [1] (A) and pVOGs [2] (B).

Notes: Data were obtained from the annotation files provided by the database authors, and bins of size 10 were used to build the histograms. For readability, pVOGs data are shown only up to 1,000 proteins per orthologous group; only six groups exceed that size, the largest containing 8,131 proteins.
Abbreviations: pVOGs, Prokaryotic Virus Orthologous Groups; vFam, viral profile HMM database.
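The binning described in the notes to Figure S1 could be reproduced along the following lines. This is a sketch under stated assumptions: the helper `histogram_bins`, the half-open bin boundaries (0 to 9, 10 to 19, and so on), and the example sizes are hypothetical, since the notes give only the bin size of 10 and the display cutoff of 1,000.

```python
from collections import Counter


def histogram_bins(group_sizes, bin_width=10, display_max=None):
    """Count orthologous groups per bin of `bin_width` proteins.

    Each key is the lower edge of a bin covering
    [edge, edge + bin_width). Groups larger than `display_max`
    are left out of the result (they are not deleted from the data,
    only omitted from the displayed histogram).
    """
    shown = [s for s in group_sizes if display_max is None or s <= display_max]
    counts = Counter((s // bin_width) * bin_width for s in shown)
    return dict(sorted(counts.items()))


# Hypothetical proteins-per-group values, not taken from vFam or pVOGs.
sizes = [3, 7, 12, 15, 19, 240, 8131]
bins = histogram_bins(sizes, bin_width=10, display_max=1000)
```

Here the group with 8,131 proteins falls above the assumed cutoff and is omitted from `bins`, mirroring how the largest pVOGs groups are left off the plotted range.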