Seguritan, V., Alves, N., Jr., Arnoult, M., Raymond, A., Lorimer, D., Burgin, A. B., Jr., Salamon, P., Segall, A. M. (2012) Artificial Neural Networks Trained To Detect Viral and Phage Structural Proteins. PLoS Computational Biology. doi: 10.1371/journal.pcbi.1002657.

Bacteriophages are extremely abundant and diverse biological entities. All phage particles are composed of nucleic acids and structural proteins, with few other packaged proteins. Despite their simplicity and abundance, more than 70% of phage sequences in the viral Reference Sequence database encode proteins of unknown function, based on FASTA annotations. As a result, sequence similarity is often insufficient for detecting virus structural proteins among unknown viral sequences. Viral structural protein function is challenging to detect from sequence data because structural proteins possess few known conserved catalytic motifs and sequence domains. To address these issues, we investigated the use of Artificial Neural Networks as an alternative means of predicting function. Here, we trained thousands of networks using the amino acid frequency of structural protein sequences and identified the optimal architectures with the highest accuracies. Some hypothetical protein sequences detected by our networks were expressed and visualized by transmission electron microscopy (TEM), and produced images that strongly resemble virion structures. Our results support the utility of our neural networks in predicting the functions of unknown viral sequences.
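
The amino acid frequency input mentioned in the abstract can be illustrated with a short Python sketch. This is not the code used in the paper: the function name, the fixed ordering of the 20 standard amino acids, and the choice to skip ambiguous or non-standard characters are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in a fixed (alphabetical) order.
# This ordering is an assumption for illustration only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(sequence):
    """Return a 20-element list of relative amino acid frequencies.

    Characters outside the 20 standard residues (for example X or *)
    are ignored; this is an assumed convention, not the paper's.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No recognizable residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical translated ORF fragment, used only as example input.
    features = amino_acid_frequencies("MKTAYIAKQR")
    for aa, freq in zip(AMINO_ACIDS, features):
        if freq > 0:
            print(aa, round(freq, 3))
```

A vector of this kind has the same length for every protein regardless of sequence length, which is what makes it usable as a fixed-size input to a neural network.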

iVIREONS is a web-based interface to our ensembles of artificial neural networks (ANNs), which were trained to identify virion structural proteins by voting on translated open reading frames (ORFs). Our networks identify, with a high degree of accuracy, ORFs in GenBank that have annotations such as capsid, tape measure, portal, tail, fiber, baseplate, connector, neck, and collar. We have trained additional neural network ensembles to identify more specific classes of structural proteins, namely major capsid and tail proteins. Please refer to the paper or the background page of this site for more information about our training, testing, and validation methods.
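
As a rough illustration of how an ensemble can vote on a single translated ORF, the Python sketch below applies a simple majority rule. It is a sketch under stated assumptions, not the iVIREONS implementation: the function name, the 0.5 score threshold, the majority rule, and the stand-in "networks" (plain functions returning a score between 0 and 1) are all hypothetical.

```python
def ensemble_vote(networks, features, threshold=0.5):
    """Combine the outputs of several classifiers by majority vote.

    networks  -- callables that take a feature vector and return a
                 score in [0, 1] (hypothetical stand-ins for trained ANNs)
    features  -- feature vector for one translated ORF
    threshold -- score at or above which a network votes "structural"
                 (assumed value)

    Returns (is_structural, yes_votes).
    """
    yes_votes = sum(1 for net in networks if net(features) >= threshold)
    # Strict majority: more than half of the networks must vote yes.
    return yes_votes > len(networks) / 2, yes_votes


if __name__ == "__main__":
    # Three toy scoring functions standing in for trained networks.
    toy_networks = [
        lambda x: 0.9,
        lambda x: 0.4,
        lambda x: 0.7,
    ]
    example_features = [0.05] * 20  # placeholder feature vector
    is_structural, yes_votes = ensemble_vote(toy_networks, example_features)
    print(is_structural, yes_votes)
```

Reporting the number of yes votes alongside the final call gives a rough sense of how strongly the ensemble agrees on a given sequence.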

We appreciate your comments and suggestions. Please let us know if you use our networks and experimentally validate their predictions. Sequences that have been predicted and validated will be used to improve the accuracy of our networks.

The development of this site is in progress. Please revisit this site for updates.