The G+C content is 52.8% (Figure 2, Table 3). Out of the 1,122 predicted genes, 1,068 were protein-coding. A set of 54 genes coded for RNA and 9 were identified as pseudogenes. The majority of the protein-coding genes (61.6% of all genes) were assigned a putative function, while 33.6% of all genes code for proteins with unknown function. The distribution of genes into COG functional categories is presented in Figure 2 and Table 4.

Figure 2. Graphical circular map of the T. pallidum strain DAL-1 genome. From the outside to the center: genes on the forward strand (colored by COG categories), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), …
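The gene counts and percentages quoted above can be cross-checked with a few lines of arithmetic. The following is only a sketch: the variable names are illustrative, and it assumes the percentages are taken relative to all 1,122 predicted genes, as the wording "of all genes" suggests.

```python
# Counts reported in the text.
total_genes = 1122
protein_coding = 1068
rna_genes = 54

# Protein-coding plus RNA genes should account for every predicted gene.
assert protein_coding + rna_genes == total_genes

# Percentages reported in the text (assumed to be relative to all genes).
with_function_pct = 61.6
unknown_function_pct = 33.6

# Share of protein-coding genes among all predicted genes, to one decimal.
protein_coding_pct = round(100 * protein_coding / total_genes, 1)

# The two functional classes together should match the protein-coding share.
functional_classes_pct = round(with_function_pct + unknown_function_pct, 1)

print(protein_coding_pct, functional_classes_pct)
```

Under that assumption, both values agree: the two functional classes together cover the protein-coding fraction of the genome.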

Table 3. Genome statistics

Table 4. Number of genes associated with general COG functional categories

Insights into the genome

Sequence changes differentiating the DAL-1 and Nichols genomes were identified mainly in the TPADAL_0136 gene (encoding a fibronectin-binding protein [42]) and comprised 94 nt changes. In addition, a repeat-containing gene, TPADAL_0470, was found to contain a 288-nt insertion composed of twelve 24-bp repeats. The tpr genes tprF (TP0316), tprG (TP0317) and tprK (TP0897) contained 2, 1 and 4 nt changes, respectively. However, the tprK gene was found to be variable within the DAL-1 strain, and the reported 4 nt changes therefore do not refer to the variable tprK region [43]. Tpr proteins are known virulence factors in treponemes [43-48], and changes in the primary sequence of these proteins may be important for the increased virulence of DAL-1 in rabbits.

In addition to the changes in the above-mentioned genes, a further 31 nt changes were found throughout the genome (6 single-nucleotide deletions, 3 single-nucleotide insertions, 16 single-nucleotide substitutions, one 2-nt deletion and one 4-nt deletion). All the indels (with the exception of the 4-nt deletion) were located in G or C homopolymers. Indels resulted in truncation or elongation of several proteins, including TPADAL_0012 (hypothetical protein, finally not annotated), TPADAL_0040 (probable methyl-accepting chemotaxis protein), TPADAL_0067 (conserved hypothetical protein), TPADAL_0127a (hypothetical protein), TPADAL_0134a (hypothetical protein), TPADAL_0470 (conserved hypothetical protein), TPADAL_0479 (hypothetical protein), and TPADAL_0609 (AsnS, asparagine-tRNA ligase).
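The figure of 31 nt counts changed nucleotides rather than mutation events, which is easy to overlook. A short tally, sketched here with illustrative names and using only the numbers given in the paragraph above, makes the distinction explicit.

```python
# (description, number of events, nucleotides affected per event)
changes = [
    ("single-nucleotide deletion", 6, 1),
    ("single-nucleotide insertion", 3, 1),
    ("single-nucleotide substitution", 16, 1),
    ("2-nt deletion", 1, 2),
    ("4-nt deletion", 1, 4),
]

# Number of separate mutation events.
total_events = sum(count for _, count, _ in changes)

# Number of nucleotides changed, which is the quantity reported in the text.
total_nt = sum(count * size for _, count, size in changes)

# Indels are every event that is not a substitution.
indel_events = sum(count for name, count, _ in changes if "substitution" not in name)

print(total_events, total_nt, indel_events)
```

The tally gives 27 events affecting 31 nt, of which 11 events are indels.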

In addition, TPADAL_0859-860 was identified as a fused protein (TPADAL_0859). Two of the indels in G or C homopolymers were found in intergenic regions (IGR TPADAL_0225-226, IGR TPADAL_0316-317). Since G homopolymers of variable length have been shown to affect the expression rates of tpr genes [49], these differences may change the gene expression pattern in the DAL-1 genome. Out of the 16 single-nucleotide substitutions, three were located in intergenic regions (IGR TPADAL_0126c-0126d, IGR TPADAL_0582-584, IGR TPADAL_0698-700) and three resulted in synonymous mutations (TPADAL_0228, 0742, 0939).
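Because most of the indels fall in G or C homopolymers, it can help to see how such tracts might be located in a sequence. The sketch below is not the authors' method. The function name `gc_homopolymers`, the minimum run length of 5 and the toy sequence are all assumptions made for illustration, since the text gives no length threshold.

```python
import re


def gc_homopolymers(seq, min_len=5):
    """Return (start, end, base, length) for each run of G or C of at least min_len.

    Coordinates are 0-based and end-exclusive. The default min_len is an
    arbitrary illustrative threshold, not a value taken from the text.
    """
    pattern = re.compile(r"G{%d,}|C{%d,}" % (min_len, min_len))
    return [
        (m.start(), m.end(), m.group()[0], m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence (hypothetical, not from the DAL-1 genome): a run of 7 G,
# a run of 6 C, and a short GG that falls below the threshold.
toy = "AT" + "G" * 7 + "TA" + "C" * 6 + "ATGGCA"

runs = gc_homopolymers(toy)
print(runs)
```

A position reported as an indel could then be checked against these intervals to see whether it lies inside a homopolymeric tract.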
