International Conference on Cheminformatics and Computational Chemical Biology
Brisbane, Australia
Maldonado-Rodríguez R
National Polytechnic Institute, Mexico
Title: VAMPhyRE, a novel way to produce Virtual Genomic Fingerprints useful for constructing phylogenomic trees with homologous k-mers, which can be used to reveal key sequences
Biography
Biography: Maldonado-Rodríguez R
Abstract
Our research group has created a new strategy for phylogenomic reconstruction. The procedure is free of the errors associated with sequence alignments and, for bacteria, uses shared and distinctive 21-mers as homologous characters. These 21-mers may lie on either DNA strand and, when shared, have identical sequences or single-base substitutions. Two intellectually protected components are used for this purpose: the Virtual Analysis Method for Phylogenomic fingeRprint Estimation (VAMPhyRE) and the VAMPhyRE Probe Set (VPS). For bacteria, VPS-13 is used. It consists of 15,264 13-mer sequences selected from all 67,108,864 possible 13-mers by shuffling and extracting those sequences with 35-65% GC content, high sequence entropy, and at least two internal, spaced sequence differences.

VAMPhyRE selects the homologous 21-mer sequences in a three-step procedure. First, it searches each genome for 13-mer sequences that are identical to a probe or differ from it at a single position. Then it takes each 13-mer target sequence found by a probe and extends it to a 21-mer by adding the 4 bases that flank it on each side in the respective target. These are the virtual genomic fingerprints (VGF). Finally, it compares each 21-mer found in one strain with the 21-mers found by the same probe in the other strains, marking them as shared homologous hits when they have at least 19 identities and as distinctive otherwise. This information is used to calculate distance scores and to construct the phylogenomic trees. The strategy has been successfully applied to Bacillus anthracis, Mycobacterium and other Actinomycetes.

Because the genomic 21-mers found by our strategy correspond to conserved sequences, they have key molecular roles, such as transport, catalysis, energy production, regulation and biosynthesis, and therefore contribute to essential biological capabilities such as growing in different environments or hosts, causing disease, or producing substances of interest. These VGF are thus a source of sequences that, in combination with the respective phylogenomic tree, the genomic databases and the biological properties, can be used for diverse biotechnological applications such as diagnostics, metagenomic analysis, and markers of resistance, virulence, or strains of industrial interest. We are currently working on data mining of this information. The following scheme illustrates the information included in the abstract above.
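As a complement to the scheme, the three-step selection described in the abstract can be sketched in Python. This is only an illustrative reading of the text, not the VAMPhyRE implementation. The function names (`fingerprint`, `classify`), the brute-force scan, and the handling of both strands by scanning the reverse complement are all assumptions. Targets too close to a sequence end to be extended by 4 bases on each side are simply skipped. The abstract does not give the distance formula, so the sketch stops at the shared and distinctive counts.

```python
from collections import defaultdict

# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def fingerprint(genome, probes, k=13, flank=4, max_probe_mismatch=1):
    """Steps 1 and 2: find 13-mer targets within one mismatch of a probe
    on either strand, then extend each target by `flank` bases per side.

    Returns a mapping probe -> set of 21-mers (assumed data layout).
    """
    hits = defaultdict(set)
    for strand in (genome, reverse_complement(genome)):
        # Only positions with a full flank on both sides are considered.
        for start in range(flank, len(strand) - k - flank + 1):
            target = strand[start:start + k]
            for probe in probes:
                if mismatches(target, probe) <= max_probe_mismatch:
                    hits[probe].add(strand[start - flank:start + k + flank])
    return hits


def classify(hits_a, hits_b, min_identities=19):
    """Step 3: for each 21-mer of strain A, look at the 21-mers found by
    the same probe in strain B. It is 'shared' if any of them has at
    least `min_identities` identical positions, otherwise 'distinctive'.

    Returns (shared_count, distinctive_count) for strain A versus B.
    """
    shared = distinctive = 0
    for probe, mers_a in hits_a.items():
        mers_b = hits_b.get(probe, ())
        for mer in mers_a:
            if any(len(mer) - mismatches(mer, other) >= min_identities
                   for other in mers_b):
                shared += 1
            else:
                distinctive += 1
    return shared, distinctive
```

Scanning every probe at every position is quadratic and would be far too slow for 15,264 probes on real genomes. An actual tool would presumably index the probes or the genome, but the brute-force form keeps the three steps easy to follow.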