Genetic Code Analysis Toolkit


Download latest version as:

Note: This requires Java 8 or higher. You may want to verify your Java installation here.



The genetic code is the major tool that nature uses for the transmission of information. By assigning amino acids to triplets of nucleic bases, so-called codons, it serves as a kind of dictionary between the two worlds of nucleic acids on the one hand and proteins on the other. As such it is involved in the communication process called translation, one of the steps of the central dogma of molecular biology, which consists of replication, (reverse) transcription and translation. One of the most conserved properties of the genetic code is its degeneracy, or redundancy: several codons may encode the same amino acid. Redundancy is essential to any system that allows error detection or even error correction, and the probability that genetic information is preserved without such an error-detecting or error-correcting system is zero.
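To make the notion of degeneracy concrete, the following minimal Python sketch builds the standard genetic code table (DNA alphabet, bases in the order T, C, A, G) and counts how many codons encode each amino acid. It is an illustration only and not part of the toolkit; the names CODON_TABLE, DEGENERACY and synonymous_codons are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in the usual T, C, A, G base order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (hypothetical helper name).
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_codons(amino_acid):
    """Return all codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    # Leucine is encoded by six codons, methionine by a single one.
    print(synonymous_codons("L"), DEGENERACY["L"])
    print(synonymous_codons("M"), DEGENERACY["M"])
```

In this table leucine, serine and arginine are each encoded by six codons, while methionine and tryptophan have a single codon each.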

Why the genetic code is as it is, and how it emerged over a long evolutionary process, are still two major open questions. Several approaches from communication theory, mathematics and physics have been proposed to explain the structure and functionality of the genetic code table. One of these approaches is based on the discovery of so-called circular codes in large genes of prokaryotes and eukaryotes by Arquès and Michel. Such codes are subsets of the set of codons that nature appears to use to eventually detect frameshift errors during translation. Circular codes are weaker versions of comma-free codes, which can detect frameshift errors immediately. Investigating the structure of circular and comma-free codes and their use in nature was the motivation for developing the Genetic Code Analysis Toolkit as a means to explore the genetic code and its features.
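The comma-free property can be stated compactly: a set of codons X is comma-free if, for any two codons x and y in X, neither of the two shifted triplets read across the concatenation xy belongs to X. The sketch below checks this definition directly in Python. It is an illustration under that textbook definition, not the toolkit's own implementation, and the function name is_comma_free is hypothetical.

```python
from itertools import product


def is_comma_free(code):
    """Check the comma-free property for a set of trinucleotides.

    For every ordered pair of codons (x, y), including x == y, the two
    triplets obtained by reading the word xy in the shifted frames must
    not be members of the code.
    """
    codons = set(code)
    for x, y in product(codons, repeat=2):
        shifted_by_one = x[1:] + y[0]   # x2 x3 y1
        shifted_by_two = x[2] + y[:2]   # x3 y1 y2
        if shifted_by_one in codons or shifted_by_two in codons:
            return False
    return True


if __name__ == "__main__":
    print(is_comma_free({"ACG", "ACT"}))  # no shifted triplet is in the set
    print(is_comma_free({"ACG", "CGA"}))  # ACG followed by ACG contains CGA
    print(is_comma_free({"AAA"}))         # AAA followed by AAA contains AAA
```

Because a shifted reading of a comma-free code never produces a codon of the code, a single out-of-frame triplet is enough to reveal the shift. Checking circularity requires a more involved test and is not covered by this sketch.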

The Genetic Code Analysis Toolkit provides workflows and algorithms for the analysis of the genetic code and is capable of handling sets of codons as well as sequences of codons that come from GenBank or from FASTA files. The tool also comes with a comprehensive editor that provides visualisations of the processes implemented. This is one of its advantages over other tools such as Bioconductor (Huber et al. (2015)), which require knowledge of scripting.
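As a rough illustration of what a "sequence of codons" taken from a FASTA file looks like, the following Python sketch parses FASTA-formatted text and splits a sequence into codons in any of the three reading frames. It does not use or reproduce the toolkit's own file handling; read_fasta and split_into_codons are hypothetical helper names, and the demo record is made up.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def split_into_codons(sequence, frame=0):
    """Split a sequence into complete triplets, starting at offset `frame`."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


if __name__ == "__main__":
    demo = ">demo sequence\nATGGCC\nTAA\n"  # made-up example record
    for name, sequence in read_fasta(demo).items():
        for frame in range(3):
            print(name, frame, split_into_codons(sequence, frame))
```

Reading the same sequence in frames 1 and 2 yields the shifted triplets that a frameshift error would produce.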

The Genetic Code Analysis Toolkit is intended to be used primarily by biologists who perform elementary tasks and do some simple scripting.

The toolkit is developed as open-source software at the University of Applied Sciences in Mannheim. The source code can be found on GitHub (see the references / links section at the top). If you would like to contribute to the toolkit, please feel free to contact us via GitHub or e-mail.