Mühlhausen S, Kollmar M
BMC Genomics 15, 411 (2014)
BACKGROUND: Many eukaryotes have been shown to use alternatives to the universal genetic code. While most Saccharomycetes, including Saccharomyces cerevisiae, use the standard genetic code and translate the CUG codon as leucine, some yeasts, including many but not all of the "Candida" species, translate the same codon as serine. It has been proposed that the change in codon identity was accomplished by an almost complete loss of the original CUG codons, which makes the CUG positions within extant species highly discriminative for one or the other translation scheme.

RESULTS: To improve gene prediction in yeast species by providing the correct CUG decoding scheme, we implemented a web server, called Bagheera, that determines the most probable CUG codon translation for a given transcriptome or genome assembly based on extensive reference data. As reference data we use 2071 manually assembled and annotated sequences of 38 cytoskeletal and motor proteins from 79 yeast species. The web service includes a pipeline that starts by predicting homologous genes and aligning them to the reference data. CUG codon positions within the predicted genes are analysed with respect to amino acid similarity and CUG codon conservation in related species. In addition, the tRNACAG gene is predicted in genomic data and compared to known leu-tRNACAG and ser-tRNACAG genes. Bagheera can also be used to evaluate any mRNA and protein sequence data with the codon usage of the respective species. The use of the system is demonstrated by analysing six genomes not included in the reference data.

CONCLUSIONS: Gene prediction and subsequent comparison with reference data from other Saccharomycetes are sufficient to predict the most probable decoding scheme for CUG codons. This approach has been implemented in Bagheera (http://www.motorprotein.de/bagheera).
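The abstract does not spell out how CUG positions are scored against the reference alignments. The following is a minimal Python sketch of the general idea only, assuming a simple majority vote: each in-frame CUG codon in a query gene is mapped to its column in a protein alignment, and the residues that reference species carry in that column are counted as "Leu-like" or "Ser-like". The function names, the similarity groups (LIVM vs. STA) and the voting rule are hypothetical illustrations, not Bagheera's actual implementation or scoring.

```python
from collections import Counter

# Hypothetical similarity groups (an assumption for this sketch, not
# Bagheera's actual scoring): residues counted as "Leu-like" or "Ser-like".
LEU_LIKE = set("LIVM")
SER_LIKE = set("STA")


def cug_codon_indices(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame CUG codons (CTG in DNA)."""
    seq = cds.upper().replace("U", "T")
    return [i // 3 for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] == "CTG"]


def residue_to_column(query_aln: str) -> list[int]:
    """Map each ungapped query residue index to its alignment column."""
    return [col for col, ch in enumerate(query_aln) if ch != "-"]


def vote_cug_decoding(cds: str, query_aln: str, ref_alns: list[str]) -> str:
    """Guess 'Leu', 'Ser' or 'undecided' from residues aligned to CUG codons.

    cds       -- in-frame coding sequence of the query gene
    query_aln -- aligned query protein ('-' = gap), one residue per codon
    ref_alns  -- aligned reference proteins of the same alignment length
    """
    cols = residue_to_column(query_aln)
    votes = Counter()
    for k in cug_codon_indices(cds):
        if k >= len(cols):
            continue  # codon has no aligned residue (e.g. trailing codons)
        col = cols[k]
        for ref in ref_alns:
            aa = ref[col].upper()
            if aa in LEU_LIKE:
                votes["Leu"] += 1
            elif aa in SER_LIKE:
                votes["Ser"] += 1
    if votes["Leu"] > votes["Ser"]:
        return "Leu"
    if votes["Ser"] > votes["Leu"]:
        return "Ser"
    return "undecided"


if __name__ == "__main__":
    # Toy data: CUG codons at codon indices 1 and 3; query CUG shown as 'X'.
    cds = "ATGCTGAAACTGTAA"
    print(vote_cug_decoding(cds, "MXKX", ["MSKS", "MSKL"]))
```

On the toy input, three of the four aligned reference residues are serine, so the sketch votes "Ser". A real analysis would also weigh CUG conservation across related species and the predicted tRNACAG gene, as the abstract describes; the plain vote here is only the simplest stand-in.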