Role of gene ontology in bioinformatics and bioremediation studies

By Chandrika Kapagunta on December 13, 2017

In the previous articles, the gene and protein of interest were studied with respect to their closely related variants found in the NCBI database. In this article, genes and their products are studied using a standardised approach to annotation, based on gene annotation and gene ontology. Gene ontology refers to a consistent method of describing genes and gene products across all species and databases. It is a major bioinformatics initiative that aims to standardise the representation of gene and gene product attributes (Berardini, 2009). Under gene ontology, concepts and classes are used to define gene function, along with the relationships that exist between these concepts (Purwantini et al., 2014).

Classification of gene ontology

The classification is based on three aspects of a particular gene:

  1. the molecular function,
  2. the cellular component,
  3. the biological process.

The molecular function covers the elemental activities of a gene product at the molecular level, such as binding or catalysis. The cellular component refers to the parts of a cell or its extracellular environment in which the gene product is active. The biological process refers to the pathways and larger processes made up of the activities of multiple gene products (Kang, 2012). The field of bioremediation requires consistent definitions of genes and proteins, as well as of their interactions with other biological processes. Furthermore, genes involved in the remediation of pollutants can be recognised through their expression patterns, which are often studied in gene ontology analysis. This article explores the application of gene ontology in the field of bioremediation, specifically phytoremediation.
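The three aspects, and the relationships between terms mentioned earlier, can be pictured as a small graph. The sketch below is only an illustration: the `GOTerm` class and `ancestors` function are hypothetical names, and the parent link for "metal ion binding" is simplified (the real ontology places intermediate terms between it and "binding").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """A simplified gene ontology term (hypothetical structure for illustration)."""
    go_id: str
    name: str
    namespace: str       # one of the three aspects
    parents: tuple = ()  # "is_a" links to more general terms


TERMS = {
    term.go_id: term
    for term in [
        # The three root terms, one per aspect.
        GOTerm("GO:0003674", "molecular_function", "molecular_function"),
        GOTerm("GO:0008150", "biological_process", "biological_process"),
        GOTerm("GO:0005575", "cellular_component", "cellular_component"),
        # Two molecular functions named in the text: binding and catalysis.
        GOTerm("GO:0005488", "binding", "molecular_function", ("GO:0003674",)),
        GOTerm("GO:0003824", "catalytic activity", "molecular_function", ("GO:0003674",)),
        # Simplified: linked directly to "binding" for brevity.
        GOTerm("GO:0046872", "metal ion binding", "molecular_function", ("GO:0005488",)),
    ]
}


def ancestors(go_id, terms=TERMS):
    """Return the IDs of all more general terms reachable through is_a links."""
    seen = set()
    stack = list(terms[go_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(terms[current].parents)
    return seen


if __name__ == "__main__":
    for parent_id in sorted(ancestors("GO:0046872")):
        print(parent_id, TERMS[parent_id].name)
```

Because annotations to a specific term also imply its more general ancestors, a gene annotated with "metal ion binding" can be retrieved by a search for "binding" as well.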

Gene annotation and gene ontology in bioinformatics

Before defining the concepts involved in developing the gene ontology of a particular gene or protein, the concept of ‘annotation’ needs to be understood. Gene annotation refers to describing a nucleotide sequence with respect to its function. It can also be defined as the process of identifying the genes, coding regions, non-coding RNAs, alternatively spliced transcripts and pseudogenes present on a genome. Furthermore, the functions of these genes and sequence regions can be defined using annotation techniques (Wilming and Harrow, 2009). Once a genome is sequenced, it needs to be annotated to make sense of it (Colantuoni, 2011). Genome annotation is therefore the process of attaching biological information to sequences. It consists of two main steps. The first is identifying elements on the genome, a process called ‘gene prediction’. The second is attaching biological information to these elements. Annotation can be manual or performed through computer analysis (Standage, 2007).
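The two steps can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not how production gene predictors work: `find_orfs` only scans the forward strand for ATG-to-stop open reading frames, and `annotate` uses a hypothetical lookup table (`known_products`) as a stand-in for a real similarity search.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Step 1 (toy gene prediction): forward-strand ORFs as (start, end) pairs.

    Coordinates are 0-based, `end` is exclusive, and the stop codon is included.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def annotate(sequence, orfs, known_products):
    """Step 2: attach biological information to each predicted element."""
    seq = sequence.upper()
    features = []
    for start, end in orfs:
        orf_seq = seq[start:end]
        features.append({
            "start": start,
            "end": end,
            # Unmatched ORFs get the conventional placeholder label.
            "product": known_products.get(orf_seq, "hypothetical protein"),
        })
    return features


if __name__ == "__main__":
    genome = "CCATGAAATTTTAGGG"  # made-up sequence
    lookup = {"ATGAAATTTTAG": "example metal-binding protein"}  # made-up label
    print(annotate(genome, find_orfs(genome), lookup))
```

Keeping prediction and annotation as separate functions mirrors the two-step description: the first produces coordinates only, and the second adds meaning to them.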

Annotation and gene ontology have found significant application in bioremediation studies. A standardised set of terms associated with a gene and its product is useful during exploratory studies of new organisms or communities. The functions of genes or proteins can be inferred for a diverse community that has not been studied previously, and an existing database can be mined for relevant products or processes. However, both are possible only if correct, standardised gene ontology terms are available (Pappas et al., 2017).

Gene annotation using BLAST

The most basic level of annotation is BLAST analysis, which finds sequence similarities that are then used to annotate genomes. The main workhorse of gene annotation is the identification, in newly sequenced genomes, of homologs or orthologs of genes whose functions are already known in a relevant genome. Gene ontology thus functions as a specialised database for annotation tools (Koonin and Galperin, 2003).
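As a rough illustration of homology-based annotation, the sketch below reads BLAST results in the standard 12-column tabular layout (`-outfmt 6`), keeps the best-scoring hit per query, and copies that hit's GO terms to the query. The function names, the e-value cut-off and all identifiers in the sample data are assumptions made for the example.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: subject} keeping the highest bit score per query."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to support transferring a function
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[1]:
            best[row["qseqid"]] = (row["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_annotation(hits, subject_go_terms):
    """Copy the GO terms of each best hit to the matching query gene."""
    return {query: subject_go_terms.get(subject, []) for query, subject in hits.items()}


if __name__ == "__main__":
    # Made-up hits: geneB only has a weak match and is left unannotated.
    rows = [
        ["geneA", "refX", "92.5", "200", "15", "0", "1", "200", "1", "200", "1e-80", "350"],
        ["geneA", "refY", "60.0", "180", "72", "2", "5", "184", "3", "182", "1e-20", "120"],
        ["geneB", "refZ", "30.0", "50", "35", "1", "1", "50", "1", "50", "0.5", "25"],
    ]
    text = "\n".join("\t".join(row) for row in rows)
    reference_terms = {"refX": ["GO:0046872"]}  # hypothetical reference annotation
    print(transfer_annotation(best_hits(text), reference_terms))
```

Transferring a function from the single best hit is the simplest possible rule. Similarity alone does not prove orthology, so annotations obtained this way are usually treated as computational predictions.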

The exponential growth in the volume of accessible biological information has created confusion over the annotation of molecular information about genes and their products. The gene ontology project is therefore a collaborative effort to develop ontologies that support biologically meaningful annotation of genes and their products in a wide variety of organisms. Gene ontology includes a description of three components, namely:

  1. the molecular function,
  2. biological process,
  3. the cellular component.

It also includes a community database resource that supports the use of these ontologies (Berardini, 2009).

The need for gene ontology

At present, a significant amount of knowledge about the functional characterisation of genes is already available in public databases. However, some of this knowledge is described through free-text statements, which are often ambiguous, domain specific and context dependent. To cope with this, the research community is developing and using bio-ontologies for the functional annotation of genes. Gene ontology has various applications. These include integrating proteomic information from different organisms and assigning functions to protein domains. Gene ontology is also used to find functional similarities among genes that are over-expressed or under-expressed in diseases. Furthermore, gene ontology is useful in predicting the likelihood that a particular gene is involved in diseases that have not yet been mapped to specific genes. Other applications include:

  • Analysis of groups of genes that are co-expressed during development.
  • Development of automated ways of deriving information about gene function from literature.
  • Verifying models of genetic, metabolic and product interaction networks (Oti, Ballouz and Wouters, 2011).
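Finding functional similarities among over-expressed or co-expressed genes is commonly framed as an enrichment question: does a GO term appear in the gene set more often than chance would predict? A minimal sketch using a one-sided hypergeometric test is shown here. The function name and the numbers in the example are illustrative assumptions, and real analyses would also correct for testing many terms at once.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, sample_annotated):
    """One-sided hypergeometric p-value for GO term enrichment.

    population       -- number of genes in the background set
    annotated        -- background genes carrying the GO term
    sample           -- number of genes in the set of interest
    sample_annotated -- genes of interest carrying the GO term

    Returns P(X >= sample_annotated) for a random draw of `sample` genes
    without replacement.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(sample_annotated, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 40 of 5000 genes carry the term, and 6 of the
    # 100 over-expressed genes do.
    print(hypergeom_enrichment_p(5000, 40, 100, 6))
```

A small p-value suggests that the term is over-represented in the gene set, which is one way a shared function among co-expressed genes can be detected.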

Role of gene ontology in bioremediation

Next Generation Sequencing technologies are emerging in bioremediation studies. As a result, promising organisms capable of remediating pollutants are being sequenced heavily. Gene ontology terms and databases allow these sequences to be annotated correctly and enable the recognition of important genes responsible for remediation. Moreover, gene annotation of the large-scale transcriptomes of promising organisms can reveal up-regulated or down-regulated genes, which can then be studied for their potential role in metabolic pathways related to bioremediation processes. Lastly, gene ontology annotation of protein sequences can help reveal their functional features (domains and motifs), which in turn helps in studying their interactions within the cell.
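How up-regulated and down-regulated genes are flagged can be illustrated with a log2 fold change between a control and a pollutant-treated sample. The sketch below is deliberately simplified: the function name, the threshold of one log2 unit, the pseudocount and the expression values are all assumptions, and real transcriptome studies rely on replicates and statistical testing.

```python
from math import log2


def classify_expression(control, treated, threshold=1.0, pseudocount=1.0):
    """Label each gene as 'up', 'down' or 'unchanged' by log2 fold change.

    `control` and `treated` map gene names to expression values. The
    pseudocount avoids division by zero for unexpressed genes.
    """
    result = {}
    for gene, control_value in control.items():
        ratio = (treated[gene] + pseudocount) / (control_value + pseudocount)
        lfc = log2(ratio)
        if lfc >= threshold:
            label = "up"
        elif lfc <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        result[gene] = (round(lfc, 2), label)
    return result


if __name__ == "__main__":
    # Made-up expression values for three hypothetical genes.
    control = {"g1": 9, "g2": 39, "g3": 10}
    treated = {"g1": 39, "g2": 9, "g3": 11}
    for gene, (lfc, label) in classify_expression(control, treated).items():
        print(gene, lfc, label)
```

The genes labelled "up" or "down" are the ones that would then be examined for their GO terms and pathway membership.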

Gene ontologies of several organisms have been studied with respect to their functions and role in remediating particular pollutants. The table below summarises these studies.

| Organism | Application | Description | Tools used | Author |
|---|---|---|---|---|
| Lysinibacillus sphaericus strain OT4b.31 | Bioremediation of heavy-metal polluted environments | Chromosomal scaffold: 4,096,672 bp (circular); extrachromosomal elements: 759,630 bp (linear); total genes: 4,938 | Rapid Annotation using Subsystem Technology (RAST) and Blast2GO pipelines for functional annotation; Gene3D, ProDom, SMART, PANTHER, SignalP and TMHMM databases | Peña-Montenegro, Lozano and Dussán (2013) |
| Halomonas zincidurans strain B6 | Heavy metal remediation, including zinc | Genome size: 3,554,760 bp; DNA coding: 3,153,982 bp | Tools: Glimmer v.3.0, tRNAscan-SE, RNAmmer v.1.2; database: KEGG | Huo et al. (2014) |
| Arabidopsis thaliana (thale cress) | Heavy metal transport/detoxification | Gene AtHMAD1 | Tool: AutoFACT; databases: KEGG, COG, PFAM, SMART | Soares-Cavalcanti et al. (2012) |
| Brevibacterium linens BS258 | Environmental remediation via microbially induced calcite precipitation | Genome size: 3,862,244 bp; G+C content: 64.16% | Tools: Prokaryotic Genome Annotation Pipeline (PGAP) version 2.10 and Rapid Annotation using Subsystem Technology (RAST) | Zhu, Wu and Suna (2016) |
| Stenotrophomonas sp. DDT-1 | Biodegradation of DDT | Genome size: 4,514,569 bp; protein coding genes: 4,033 | Tool: BIOLOG; databases: KEGG, COG, GO pathway databases | Pan et al. (2016) |

Advantages for studies in bioremediation

A wide range of annotation and gene ontology tools is available today, along with databases to compare against, so studies in bioremediation can gain significant advantages. New organisms with promising potential can be studied in depth, and the relevant genes and proteins involved in remediation can be isolated. The information gained from gene ontology studies and pathway analysis can contribute directly to designing efficient systems that take into account the interaction pathways of relevant genes. Their relationships with biotic and abiotic factors can also be studied. Gene ontology therefore offers an efficient platform for evaluating the bioremediation potential of organisms and helps develop practical and feasible systems for real-time applications. Among the major applications of gene ontology are gene interaction studies, which can help in understanding the role of a set of genes involved in hydrocarbon phytoremediation in Arabidopsis thaliana.

References


  • Berardini, T. Z. (2009) ‘The Gene Ontology in 2010: Extensions and refinements’, Nucleic Acids Research, 38(SUPPL.1), pp. 331–335. doi: 10.1093/nar/gkp1018.
  • Colantuoni, C. (2011) ‘Gene Annotation Gene : Protein coding unit of genomic DNA’, Gene, pp. 1–22.
  • Huo, Y. et al. (2014) ‘High quality draft genome sequence of the heavy metal resistant bacterium Halomonas zincidurans type strain B6 T’, Standards in Genomic Sciences, 9(1), pp. 1–9. doi: 10.1186/1944-3277-9-30.
  • Kang, J. W. (2012) ‘Bioremediation of Trichloroethylene: Analysis of the Plant Gene Response to TCE and Characterization of a Novel TCE-Degrading Endophyte’, p. 123.
  • Koonin, E. V. and Galperin, M. Y. (2003) ‘Genome Annotation and Analysis’, in Sequence – Evolution – Function: Computational Approaches in Comparative Genomics. Boston: Kluwer Academic Publishers.
  • Oti, M., Ballouz, S. and Wouters, M. A. (2011) ‘In Silico Tools for Gene Discovery’, Methods in Molecular Biology, 760, pp. 175–187. doi: 10.1007/978-1-61779-176-5.
  • Pan, X. et al. (2016) ‘Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis’, Scientific Reports, 6, p. 21332. doi: 10.1038/srep21332.
  • Pappas, K. M. et al. (2017) Genetic and Genome-Wide Insights into Microbes Studied for Bioenergy. Frontiers in Microbiology.
  • Peña-Montenegro, T. D., Lozano, L. and Dussán, J. (2013) ‘Genome sequence and description of the heavy metal tolerant bacterium Lysinibacillus sphaericus strain OT4b.31’, Standards in Genomic Sciences, 9(1), pp. 42–56. doi: 10.4056/sigs.4227894.
  • Purwantini, E. et al. (2014) ‘Genetic resources for methane production from biomass described with the Gene Ontology’, Frontiers in Microbiology, 5(DEC), pp. 1–18. doi: 10.3389/fmicb.2014.00634.
  • Soares-Cavalcanti, N. M. et al. (2012) ‘In silico identification of known osmotic stress responsive genes from Arabidopsis in Soybean and Medicago’, Genetics and Molecular Biology, 35(SUPPL.1), pp. 315–321. doi: 10.1590/S1415-47572012000200012.
  • Standage, D. (2007) ‘Basics of Genome Annotation’, pp. 12–48.
  • Wilming, L. and Harrow, J. (2009) ‘Gene Annotation Methods’, in Edwards, D., Stajich, J., and Hansen, D. (eds) Bioinformatics: Tools and Applications. New York, Dordrecht, Heidelberg, London: Springer Science & Business Media, pp. 121–136.
  • Zhu, Y., Wu, S. and Suna, C. (2016) ‘Complete Genome Sequence of Brevibacterium linens BS258, a Potential Marine Actinobacterium for Environmental Remediation via Microbially Induced Calcite Precipitation’, Journal of Oceanography and Marine Research, 4(2). doi: 10.4172/2572-3103.1000148.