How to use DAVID for functional annotation in Biomarker studies?

By Avishek Majumder on December 20, 2018

The Database for Annotation, Visualization and Integrated Discovery (DAVID) is a bioinformatics resource that combines an integrated biological knowledge base with analytical tools. It provides a high-throughput data-mining environment for extracting biological meaning from the large gene lists produced by high-throughput genomic experiments (Huang, Sherman, & Lempicki, 2009). Users can apply one or more of its tools, such as gene functional classification, the functional annotation chart, functional annotation clustering and the functional annotation table, to identify the biological themes that are enriched in gene lists from genome-scale studies.

Challenges prior to DAVID

In an analysis of potential biomarkers for non-small cell lung cancer, microarray data were collected from the GEO database at NCBI. After the data are normalised and the differentially expressed genes (DEGs) are identified, the DEGs need functional enrichment analysis to show which of them are plausible biomarkers. This can be done in R with Bioconductor packages, but that route requires programming skills that not every researcher has. DAVID instead offers integrated Gene Ontology (GO) and KEGG annotation through a web interface, which helps explain the biological significance of the DEGs.

Protocol

The list of genes is derived from the GEO database at NCBI.

The gene list is then analysed in DAVID using the following steps:

Click the Functional Annotation tab on the DAVID homepage.

The Functional Annotation tool page opens. In the panel on the left-hand side, paste the gene list derived from the GEO database, then select the identifier type for the list. Select the gene list option as shown below and click ‘Submit List’.

The website then generates the annotation summary results. On this page, analyse the gene list with the available tools: functional annotation clustering, the functional annotation chart and the functional annotation table. The annotations also show which genes in the list are associated with a specific disease, and through which functions or pathways.

Functional Annotation page after entering the gene list
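Before the list is pasted into the upload panel described in the steps above, it usually helps to strip whitespace, blank entries and duplicates. The following is a minimal sketch of that clean-up, not part of DAVID itself; the function name `prepare_gene_list` and the example gene symbols are assumptions made for illustration.

```python
def prepare_gene_list(raw_ids):
    """Return unique, non-empty identifiers in first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        gene_id = raw.strip()          # drop surrounding spaces and newlines
        if not gene_id or gene_id in seen:
            continue                   # skip blanks and repeated identifiers
        seen.add(gene_id)
        cleaned.append(gene_id)
    return cleaned


# Hypothetical identifiers, as they might be copied from a DEG table.
raw = ["TP53", " EGFR", "KRAS", "", "EGFR", "ALK\n"]
gene_list = prepare_gene_list(raw)

# One identifier per line, ready to paste into the upload box.
print("\n".join(gene_list))
```

Identifiers are compared exactly as written, so the same gene entered under two different identifier types would not be merged; the identifier type still has to be chosen consistently in DAVID.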

Functional annotation clustering in DAVID uses a fuzzy clustering approach that groups annotation terms on the basis of the degree of co-association among their genes. This reduces the burden of reconciling different terms that describe a similar biological process, so that biological interpretation can focus on the ‘biological module’ level. A 2D view tool is also provided for examining the internal relationships among the clustered terms and genes.
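To make "degree of co-association" more concrete: DAVID's documentation describes kappa statistics as the similarity measure between annotation terms. The sketch below computes Cohen's kappa for two terms from the genes they annotate. It only illustrates the statistic and is not DAVID's implementation; the function name, the toy gene identifiers and the choice to return 0.0 when kappa is undefined are all assumptions.

```python
def term_kappa(genes_a, genes_b, universe):
    """Cohen's kappa for the agreement of two terms over a gene universe.

    Each gene in `universe` is scored as annotated or not annotated by
    each term; kappa corrects the observed agreement for chance.
    """
    n = len(universe)
    if n == 0:
        raise ValueError("universe must contain at least one gene")
    both = sum(1 for g in universe if g in genes_a and g in genes_b)
    neither = sum(1 for g in universe if g not in genes_a and g not in genes_b)
    in_a = sum(1 for g in universe if g in genes_a)
    in_b = sum(1 for g in universe if g in genes_b)

    observed = (both + neither) / n
    expected = (in_a * in_b + (n - in_a) * (n - in_b)) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; treated as uninformative
    return (observed - expected) / (1.0 - expected)


# Toy example with hypothetical gene identifiers g1..g8.
universe = [f"g{i}" for i in range(1, 9)]
term_a = {"g1", "g2", "g3", "g4"}
term_b = {"g1", "g2", "g3", "g5"}
print(term_kappa(term_a, term_b, universe))
```

A kappa near 1 means the two terms annotate almost the same genes, a value near 0 means they agree no more than expected by chance, and terms with high pairwise kappa are the natural candidates for the same cluster.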

The functional annotation chart provides a gene-term enrichment analysis that identifies the biological terms most relevant to the gene list. It offers wider annotation coverage than other enrichment analysis tools, with over 40 annotation categories, including GO terms, protein functional domains, protein-protein interactions, disease associations, sequence features, bio-pathways, homology, gene functional summaries, gene tissue expression and literature.
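The idea behind gene-term enrichment can be sketched with a one-sided hypergeometric test: how likely is it to see at least this many term-annotated genes in the list by chance, given how common the term is in the background? The code below is an illustration under stated assumptions, not DAVID's own calculation. DAVID reports an EASE score, described as a more conservative modified Fisher exact test; the `conservative_p` variant here, which simply removes one hit before testing, only mimics that idea, and the exact contingency table DAVID uses may differ. All counts are hypothetical.

```python
from math import comb


def hypergeom_tail(pop_total, pop_hits, list_total, list_hits):
    """P(X >= list_hits) when list_total genes are drawn without
    replacement from pop_total genes, pop_hits of which carry the term."""
    upper = min(pop_hits, list_total)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return favourable / comb(pop_total, list_total)


def conservative_p(pop_total, pop_hits, list_total, list_hits):
    """Same test after removing one hit, which penalises terms that are
    supported by very few genes (an assumption-level stand-in for EASE)."""
    return hypergeom_tail(pop_total, pop_hits, list_total, max(list_hits - 1, 0))


# Hypothetical counts: 3 of 300 list genes carry a term that annotates
# 40 of 20000 background genes.
plain = hypergeom_tail(20000, 40, 300, 3)
strict = conservative_p(20000, 40, 300, 3)
print(f"hypergeometric p = {plain:.4g}, conservative p = {strict:.4g}")
```

The conservative value is never smaller than the plain one, so a term supported by only a handful of genes needs stronger evidence before it is called enriched.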

DAVID functional annotation chart

The functional annotation table is a query engine for the DAVID knowledge base that performs no statistical calculations. It is particularly useful when users want to look closely at the annotations of a few genes of special interest.

Functional Annotation table for the query engine

Challenges and benefits of DAVID

Tool	Example question to ask	Main function	Advantages	Drawbacks
Gene name batch viewer	What are the genes in my list?	Displays all gene names in a linear tabular text format and searches for other functionally related genes.	Explores genes one by one. Important genes are quickly identified from the list. Annotation of the gene of interest is identified. All genes are included in the analysis.	Important genes are not easily differentiated from non-specific genes without enrichment calculations. Related genes are scattered in the results because their interrelationships are lost.
Gene functional classification	What are the major gene families in my list?	Classifies functionally related genes into groups. 2D view for related gene-term relationships.	Explores genes group by group rather than one gene at a time. Highlights important gene groups by enrichment scores.	Genes that do not have a strong association with other groups are left out of the analysis.
Functional annotation chart	Which annotation terms are enriched for my gene list?	Identifies enriched annotation terms in a linear tabular text format. Genes can be viewed on pathway maps.	Explores single enriched terms in a simple format.	Results include redundant terms, and this redundancy can obscure the functional analysis of the genes.
Functional annotation clustering	Which annotation groups are enriched for my gene list?	Clusters functionally related annotations into groups. 2D view for related gene-term relationships.	Explores annotations group by group rather than singly. Highlights important annotation groups by enrichment scores.	Enriched terms without strong neighbours are left out of the analysis.
Functional annotation table	What are the associated annotations for each of my genes?	Queries selected annotations for given genes.	Quickly explores all annotations for given genes. Good for analysing a small number of focused genes.	Large gene lists are difficult to explore. No enrichment analysis.

The functional enrichment analysis of the differentially expressed genes was performed with DAVID, which reports significantly enriched GO terms and KEGG pathways. The analysis therefore reveals the functions of the up-regulated and down-regulated genes. Recursive feature extraction on the selected genes can then yield potential biomarkers for the disease of interest (Qu, Li, Li, & Chen, 2016). DAVID made the study easier and more efficient because it is readily available online and requires no programming to perform the analysis.
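Once a chart has been saved from DAVID as a tab-delimited text file, the significantly enriched GO terms and KEGG pathways can be pulled out with a few lines of code. The sketch below is a hedged example: the column names (`Category`, `Term`, `Count`, `PValue`, `Benjamini`) are assumed to match the export and should be checked against the actual file header, the 0.05 cut-off is an arbitrary choice, and the rows and their values are invented for illustration.

```python
import csv
import io

# Invented rows standing in for a downloaded chart file.
SAMPLE_ROWS = [
    ["Category", "Term", "Count", "PValue", "Benjamini"],
    ["GOTERM_BP_DIRECT", "GO:0006915~apoptotic process", "12", "1.0E-5", "0.002"],
    ["KEGG_PATHWAY", "hsa05223:Non-small cell lung cancer", "8", "3.0E-4", "0.03"],
    ["GOTERM_MF_DIRECT", "GO:0005515~protein binding", "40", "0.04", "0.6"],
]
SAMPLE_CHART = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n"


def significant_terms(handle, max_benjamini=0.05):
    """Return chart rows whose adjusted p-value passes the cut-off,
    sorted from most to least significant."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = [row for row in reader if float(row["Benjamini"]) <= max_benjamini]
    return sorted(kept, key=lambda row: float(row["Benjamini"]))


hits = significant_terms(io.StringIO(SAMPLE_CHART))
for row in hits:
    print(row["Category"], row["Term"], row["Benjamini"], sep="\t")
```

For a real export, replace the `io.StringIO` object with a file opened in text mode, for example `open(path, newline="")`, where `path` points to the downloaded chart.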

References

Huang, D. W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1), 44–57. https://doi.org/10.1038/nprot.2008.211
Qu, T., Li, Y., Li, X., & Chen, Y. (2016). Identification of potential biomarkers and drugs for papillary thyroid cancer based on gene expression profile analysis. Molecular Medicine Reports, 5041–5048. https://doi.org/10.3892/mmr.2016.5855
