Extracting information of TGF-β1 gene using National Center for Biotechnology Information (NCBI)

By Avishek Majumder on July 11, 2017

To illustrate how the National Center for Biotechnology Information (NCBI) can be used to extract gene information for bioinformatics studies, this article uses the example of the Transforming Growth Factor beta 1 (TGF-β1) gene, which encodes the TGFB1 protein found in humans. Transforming growth factor beta 1, or TGF-β1, is a polypeptide that belongs to the transforming growth factor beta superfamily of cytokines (Ciftci et al., 2014). It has many cellular functions, such as the control of cell growth, cell proliferation, cell differentiation and apoptosis (Jackowska et al., 2013).

The TGF-β1 protein is found predominantly in leukocytes (immune cells) and in platelets. De-regulation of the gene may lead to unintended apoptosis of cells, which in turn can lead to various chronic and genetic diseases. However, the most important functions of this gene are wound healing and control of the immune system (Healy et al., 2009).

Researchers have linked mutations in the TGF-β1 gene and its protein to many diseases: Duchenne Muscular Dystrophy (Aartsma-Rus, Ginjaar, & Bushby, 2016; Govoni et al., 2013; Verhaart & Aartsma-Rus, 2012), Kidney Disease and Diabetes (Kurpas et al., 2015; Lan & Chung, 2012; Nabrdalik et al., 2013; Schnaper et al., 2009), and Cancer (Fang, Yu, Zhong, & Yao, 2010; Jonson et al., 2003; Pal et al., 2016; Schirmer et al., 2012; Xu et al., 2011; Zhi-Yong, Guang-Ling, Mei-Mei, Ya-Nan, & He-Qin, 2011).

Other studies cover Scleroderma (Haas & Writer, 2013; Lee et al., 2012), Camurati-Engelmann Disease (Damiá & Gómez, 2015; Janssens, 2006; Ra et al., 2015), Lung Disease (Dorfman et al., 2008; Fernandez & Eickelberg, 2012; Mak et al., 2009), Obesity (Collet, Laplanche, & de Vernejoul, 2013; Fuentes & Martínez, 2017; Yadav et al., 2011; Yan et al., 2014), Marfan Syndrome (Benke et al., 2013; Bolar et al., 2012; Verstraeten et al., 2016; Yang, Huang, & Duan, 2012) and finally Cardiac Fibrosis (Dobaczewski, Chen, & Frangogiannis, 2011; Pohlers et al., 2009; Ramprasath et al., 2012; Xiao et al., 2010).

Extraction of general gene information from GenBank

Since the information to be extracted is a nucleotide sequence, go to the dropdown menu and choose “Nucleotide” to retrieve results. As shown in the image below, “Nucleotide” was chosen, and then the third search result, Homo sapiens partial TGF-β1 gene for transforming growth factor, beta 1, allele TGF-β1*p036, exon 1, was selected.

NCBI GenBank Homepage
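The same search can, in principle, be scripted through NCBI's E-utilities instead of the web page. The snippet below is only a minimal sketch: it builds the ESearch URL for the Nucleotide database and does not send the request. The function name and the query string are assumptions made for illustration, since the exact search term used on the web page is not stated here.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities search endpoint (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_nucleotide_search_url(term, max_results=20):
    """Return an ESearch URL that queries the Nucleotide database."""
    params = {
        "db": "nucleotide",     # same choice as "Nucleotide" in the dropdown menu
        "term": term,           # free-text query, as typed into the search box
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical query; the term actually typed in the article is not given.
url = build_nucleotide_search_url("TGFB1 AND Homo sapiens[Organism]")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the identifiers of matching records, which can then be inspected as described in the following sections.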

Identifying the accession numbers

Accession numbers are unique identifiers assigned to annotated data entries and are useful for further research. They identify records in the primary repositories of sequence and other molecular data. As shown in the image below, the accession number of the gene is represented as LM651059.1. However, do not confuse GenBank accession numbers with Reference Sequence (RefSeq) numbers. Although both serve the same purpose, RefSeq numbers are assigned by NCBI's own Reference Sequence project, a separate collection of curated, annotated and non-redundant sequences, and are not part of the archival submissions held in GenBank. RefSeq numbers can be distinguished from GenBank accession numbers by the underscore in the third position.

TGF-β1 gene entry basic information
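The underscore rule and the accession.version layout can be checked with a few lines of Python. This is a minimal sketch under the assumption that identifiers are written as `ACCESSION.VERSION`; the helper names are made up for illustration, and `XM_123456.1` is a hypothetical RefSeq-style identifier, not a real record.

```python
def split_version(identifier):
    """Split 'LM651059.1' into ('LM651059', 1); version is None if absent."""
    accession, dot, version = identifier.partition(".")
    return accession, (int(version) if dot else None)


def is_refseq(identifier):
    """RefSeq identifiers carry an underscore in the third position."""
    return len(identifier) > 2 and identifier[2] == "_"


# LM651059.1 is the entry used in this article; XM_123456.1 is hypothetical.
for identifier in ("LM651059.1", "XM_123456.1"):
    accession, version = split_version(identifier)
    kind = "RefSeq" if is_refseq(identifier) else "GenBank"
    print(identifier, "->", accession, "version", version, "(" + kind + ")")
```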

The difference between a RefSeq number and a GenBank accession number is shown below using a predicted NCBI entry. As can be seen, the record is a predicted model generated by NCBI rather than a sequence deposited by a submitter; hence it carries an NCBI RefSeq number but no GenBank accession number.

RefSeq gene entry sample for the TGFB1 gene
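Whether a RefSeq entry is curated or predicted can usually be read from its prefix. The lookup below is a small sketch covering only a handful of common prefixes, not the full RefSeq list; the function name is an assumption, and `XM_123456.1` is again a hypothetical identifier.

```python
# A few common RefSeq prefixes.
# "known" = curated record; "model" = predicted by NCBI's annotation pipeline.
REFSEQ_PREFIXES = {
    "NM_": ("mRNA", "known"),
    "NR_": ("non-coding RNA", "known"),
    "NP_": ("protein", "known"),
    "NG_": ("genomic region", "known"),
    "XM_": ("mRNA", "model"),
    "XR_": ("non-coding RNA", "model"),
    "XP_": ("protein", "model"),
}


def describe_refseq(identifier):
    """Return (molecule, status) for a listed RefSeq prefix, else None."""
    return REFSEQ_PREFIXES.get(identifier[:3])


print(describe_refseq("XM_123456.1"))  # hypothetical predicted mRNA
print(describe_refseq("LM651059.1"))   # GenBank accession: not in the table
```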

Characterizing the TGF-β1 gene

Any further information related to the nucleotide entry, including the detailed annotation of the gene, appears on the same page once the entry link is opened from the search results (see image below).

TGF-β1 gene complete information in GenBank page

The gene can then be characterized on the basis of the information available on the NCBI GenBank page. The record is organized into the following sections:

  • Locus consists of: LM651059 (locus name), 2888 bp (sequence length), DNA (molecule type), PRI (GenBank division, primate sequences) and 02-SEP-2014 (last modification date). The locus name helps group entries with similar sequences, the number of nucleotide base pairs gives the sequence length, and the GenBank division shows to which group or division a record belongs.
  • Definition includes information such as the source organism, the gene or protein name, or some description of the sequence’s function. The definition here is: Homo sapiens partial TGF-β1 gene for transforming growth factor, beta 1, allele TGF-β1*p036, exon 1.
  • Accession number gives the unique identifier for a sequence record; here, the accession number is LM651059. Accession numbers never change, even if the information in the record changes.
  • Version gives the sequence identifier in accession.version form. Whenever a change is made to the sequence, it receives a new GI number and its version number increases by one. The version number of the case entry is 1.
  • Keywords are words or phrases that describe the sequence.
  • Source names the organism from which the DNA sequence originates. Here it is Homo sapiens, and the complete taxonomic classification is also given.
  • Reference lists the publications by the authors of the sequence and other related information, including the journal title, the authors and the kind of submission.
  • Features give information about genes and gene products and their biological significance. This section consists of various parts such as source, gene, regulatory, coding DNA sequence (CDS) and exon.
  • Within Features, source gives the length of the sequence, the scientific name of the source organism, and the Taxon ID number. Gene shows the region of the sequence covered by the gene.
  • Finally, CDS is the coding sequence: the region of nucleotides that encodes the sequence of amino acids in a protein, bounded by the start and stop codons. Exon shows the location of each coding part of the genomic sequence that is expressed in the mRNA.
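The header fields listed above can also be read programmatically. The sketch below parses a short header that was rebuilt from the values quoted in this article; it is not a verbatim copy of the GenBank record, and the "linear" topology in the LOCUS line is an assumption. Splitting the LOCUS line on whitespace is a simplification, so a dedicated parser such as Biopython's `SeqIO` would be the safer choice for real records.

```python
# Header rebuilt from the values quoted in the article (not the real file).
# The "linear" topology is an assumption added to complete the LOCUS line.
SAMPLE_HEADER = """\
LOCUS       LM651059                2888 bp    DNA     linear   PRI 02-SEP-2014
DEFINITION  Homo sapiens partial TGF-β1 gene for transforming growth factor,
            beta 1, allele TGF-β1*p036, exon 1.
ACCESSION   LM651059
VERSION     LM651059.1
"""


def parse_header(text):
    """Collect header fields, joining indented continuation lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue                     # skip blank lines
        if line[0] != " ":               # a keyword starts in the first column
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        elif key is not None:            # continuation of the previous field
            fields[key] += " " + line.strip()
    return fields


def parse_locus(value):
    """Split the LOCUS value into the items named in the Locus bullet."""
    name, length, _unit, molecule, topology, division, date = value.split()
    return {
        "name": name,
        "length_bp": int(length),
        "molecule": molecule,
        "topology": topology,
        "division": division,
        "modified": date,
    }


header = parse_header(SAMPLE_HEADER)
locus = parse_locus(header["LOCUS"])
print(locus)
print(header["DEFINITION"])
print(header["ACCESSION"], header["VERSION"])
```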


  • Benke, K. et al., 2013. The role of transforming growth factor-beta in Marfan syndrome. Cardiology Journal, 20(3), pp.227–234.
  • Bolar, N., Van Laer, L. & Loeys, B.L., 2012. Marfan syndrome. Current Opinion in Pediatrics, 24(4), pp.498–504.
  • de Bonilla Damiá, Á. & García Gómez, F.J., 2015. Camurati-Engelmann disease. Reumatologia Clinica, pp.395–396.
  • Ciftci, R. et al., 2014. High serum transforming growth factor beta 1 (TGFB1) level predicts better survival in breast cancer. Tumor Biology, 35(7), pp.6941–6948.
  • Collet, C., Laplanche, J.L. & de Vernejoul, M.C., 2013. Camurati-engelmann disease with obesity in a newly identified family carrying a missense p.Arg156Cys mutation in the TGFB1 gene. American Journal of Medical Genetics, Part A, 161(8), pp.2074–2077.
  • Dobaczewski, M., Chen, W. & Frangogiannis, N.G., 2011. Transforming growth factor (TGF)-β signaling in cardiac remodeling. Journal of Molecular and Cellular Cardiology, 51(4), pp.600–606.
  • Dorfman, R. et al., 2008. Complex two-gene modulation of lung disease severity in children with cystic fibrosis. Journal of Clinical Investigation, 118(3), pp.1040–1049.
  • Fang, F. et al., 2010. TGFB1 509 C/T polymorphism and colorectal cancer risk: A metaanalysis. Medical Oncology, 27(4), pp.1324–1328.
  • Fernandez, I.E. & Eickelberg, O., 2012. The Impact of TGF-β on Lung Fibrosis. Proceedings of the American Thoracic Society, 9(3), pp.111–116.
  • Haas, B.M.J. & Writer, S., 2013. Scleroderma models: skin in the game. Science Business eXchange, pp.1–3.
  • Healy, J. et al., 2009. Functional impact of sequence variation in the promoter region of TGFB1. International Journal of Cancer, 125(6), pp.1483–1489.
  • Isabel Fuentes, C. & Carlos Martínez, S., 2017. TGFB1 (transforming growth factor, beta 1). Atlas of Genetics and Cytogenetics in Oncology and Haematology, 1.
  • Jackowska, M. et al., 2013. Differential expression of GDF9, TGFB1, TGFB2 and TGFB3 in porcine oocytes isolated from follicles of different size before and after culture in vitro. Acta Veterinaria Hungarica, 61(1), pp.99–115.
  • Janssens, K., 2006. Camurati-Engelmann disease: review of the clinical, radiological, and molecular data of 24 families and implications for diagnosis and treatment. Journal of Medical Genetics, 43(1), pp.1–11.
  • Jonson, T. et al., 2003. Pancreatic carcinoma cell lines with SMAD4 inactivation show distinct expression responses to TGFB1. Genes Chromosomes and Cancer, 36(4), pp.340–352.
  • Kurpas, D. et al., 2015. Does Health Status Influence Acceptance of Illness in Patients with Chronic Respiratory Diseases? Advs Exp. Medicine, Biology-Neuroscience and respiration., 6(October 2014), pp.57–66.
  • Lan, H.Y. & Chung, A.C.K., 2012. TGF-β/Smad Signaling in Kidney Disease. Seminars in Nephrology, 32(3), pp.236–243.
  • Lee, C.G. et al., 2012. Chitinase 1 Is a Biomarker for and Therapeutic Target in Scleroderma-Associated Interstitial Lung Disease That Augments TGF-β1 Signaling. The Journal of Immunology, 189(5), pp.2635–2644.
  • Mak, J.C.W. et al., 2009. Elevated plasma TGF-β1 levels in patients with chronic obstructive pulmonary disease. Respiratory Medicine, 103(7), pp.1083–1089.
  • Nabrdalik, K. et al., 2013. Association of rs1800471 polymorphism of TGF-β1 gene with chronic kidney disease occurrence and progression and hypertension appearance. Archives of Medical Science, 9(2), pp.230–237.
  • Pal, S.K. et al., 2016. THBS1 is induced by TGF-β1 in the cancer stroma and promotes invasion of oral squamous cell carcinoma. Journal of Oral Pathology and Medicine, 45(10), pp.730–739.
  • Pohlers, D. et al., 2009. TGF-β and fibrosis in different organs – molecular pathway imprints. Biochimica et Biophysica Acta – Molecular Basis of Disease, 1792(8), pp.746–756.
  • Ra, P. et al., 2015. Camurati-Engelmann Disease. NCBI, pp.1–17.
  • Ramprasath, T. et al., 2012. Genetic association of Glutathione peroxidase-1 (GPx-1) and NAD(P)H:Quinone Oxidoreductase 1 (NQO1) variants and their association of CAD in patients with type-2 diabetes. Molecular Cell Biochemistry, 361(1-2), pp.143–150.
  • Schirmer, M.A. et al., 2012. Acute toxicity of radiochemotherapy in rectal cancer patients: A risk particularly for carriers of the TGF-β1 Pro25 variant. International Journal of Radiation Oncology Biology Physics, 83(1), pp.149–157.
  • Schnaper, H.W. et al., 2009. TGF-beta signal transduction in chronic kidney disease. Front Biosci (Landmark Ed), 14(4), pp.2448–2465.
  • Verstraeten, A. et al., 2016. Marfan Syndrome and Related Disorders: 25 Years of Gene Discovery. Human Mutation, 37(6), pp.524–531.
  • Xiao, H. et al., 2010. Metformin attenuates cardiac fibrosis by inhibiting the TGFβ1-Smad3 signalling pathway. Cardiovascular Research, 87(3), pp.504–513.
  • Xu, L. et al., 2011. Association between the TGF-β1 -509C/T and TGFBR2 -875A/G polymorphisms and gastric cancer: A case-control study. Oncology Letters, 2(2), pp.371–377.
  • Yadav, H. et al., 2011. Protection from obesity and diabetes by blockade of TGF-β/Smad3 signaling. Cell Metabolism, 14(1), pp.67–79.
  • Yan, J. et al., 2014. Obesity- and aging-induced excess of central transforming growth factor-β potentiates diabetic development via an RNA stress response. Nature Medicine, 20(9), pp.1001–1008.
  • Yang, L., Huang, X. & Duan, S., 2012. Clinical application and technique of 64-slice spiral CT subtraction angiography in head and neck. VASA. Zeitschrift für Gefässkrankheiten. Journal for vascular diseases, 41(1), pp.27–33.
  • Zhi-Yong, L. et al., 2011. MicroRNA-663 targets TGF-β1 and regulates lung cancer proliferation. Asian Pacific Journal of Cancer Prevention, 12(11), pp.2819–2823.