Gene Prediction: Importance and Methods

  • Computational gene prediction, which locates protein-coding regions in a genome, is one of the core problems in bioinformatics.
  • Gene prediction means locating genes along a genome. Also called gene finding, it is the process of identifying the regions of genomic DNA that encode genes.
  • The targets include protein-coding genes, RNA genes, and other functional elements such as regulatory regions.

Importance of Gene Prediction

  • Helps to annotate large, contiguous sequences
  • Aids in identifying fundamental elements of the genome, such as functional genes, introns, exons, splice sites, regulatory sites, genes encoding known proteins, motifs, ESTs, ACRs, etc.
  • Distinguishes between coding and non-coding regions of a genome
  • Predicts complete exon-intron structures of protein-coding regions
  • Describes individual genes in terms of their function
  • Has wide application in structural genomics, functional genomics, metabolomics, transcriptomics, proteomics, and other genome and genetics studies, including the detection, treatment, and prevention of genetic disorders

Bioinformatics and the Prediction of Genes

  • With databases of human and model organism DNA sequences growing rapidly, it has become impractical to identify every gene through conventional, painstaking experimentation on living cells and organisms.
  • Formerly, statistical analysis of the rates of homologous recombination between several different genes was used to determine their order on a chromosome, and information from many such experiments was combined to create a genetic map specifying the rough location of known genes relative to each other.
  • Today, bioinformatics research makes it increasingly possible to predict the location and function of this deluge of genes from their sequence alone.

Methods of Gene Prediction

Two classes of methods are generally adopted:

A. Similarity based searches

It is a method based on sequence similarity searches.

  • It is a conceptually simple approach based on finding similarity between the input genome and known sequences such as ESTs (expressed sequence tags), proteins, or other genomes.
  • This approach assumes that functional regions (exons) are more conserved evolutionarily than nonfunctional regions (intergenic or intronic regions).
  • Once similarity is found between a genomic region and an EST, DNA, or protein sequence, that information can be used to infer the gene structure or function of the region.
    • Local alignment and global alignment are the two alignment strategies used in similarity searches. The most common local alignment tool is the BLAST family of programs, which detects sequence similarity to known genes, proteins, or ESTs.
    • Two other programs, PROCRUSTES and GeneWise, use global alignment of a homologous protein to translated ORFs in a genomic sequence for gene prediction.
    • A heuristic method based on pairwise genome comparison has been implemented in the software CSTfinder.
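The idea of local alignment mentioned above can be illustrated with a minimal Smith-Waterman sketch. This is not how BLAST itself works (BLAST uses a faster seed-and-extend heuristic); it is only a small, exact local alignment under assumed scoring values. The function name `smith_waterman` and the `match`, `mismatch`, and `gap` values are illustrative choices, not taken from any of the tools named above.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, with a simple linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming matrix; scores never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    genomic = "GGGGATGCCCTAAGGGG"   # toy "genome" fragment
    known = "ATGCCCTAA"             # toy "known gene" sequence
    print(smith_waterman(genomic, known))
```

In a similarity-based search, a high-scoring local alignment between a genomic region and a known gene, EST, or protein would be taken as evidence that the region is part of a gene.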

B. Ab initio prediction

It is a method based on gene structure and signal-based searches.

  • It uses gene structure as a template to detect genes.
  • Ab initio gene predictions rely on two types of sequence information: signal sensors and content sensors.
  • Signal sensors refer to short sequence motifs, such as splice sites, branch points, polypyrimidine tracts, start codons and stop codons.
  • Content sensors, on the other hand, refer to patterns of codon usage that are characteristic of a species and allow coding sequences to be distinguished from the surrounding non-coding sequences by statistical detection algorithms. Exon detection must rely on content sensors.
  • The search by this method thus relies on the major features present in genes.
    • Many algorithms are applied to model gene structure, such as dynamic programming, linear discriminant analysis, linguistic methods, hidden Markov models, and neural networks.
    • Based on these models, a great number of ab initio gene prediction programs have been developed. Some of the frequently used ones are GeneID, FGENESH, GeneParser, GlimmerM, and GENSCAN.
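The two kinds of evidence described above can be sketched in a few lines. The example below is a toy illustration, not the method used by any of the programs listed: it assumes an intron-free sequence, scans only the forward strand, uses start and stop codons as the only signal sensors, and scores codon usage against a uniform background as the content sensor. The function names (`find_orfs`, `codon_log_odds`, `coding_score`) and the training sequences are hypothetical.

```python
import math
from collections import Counter

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
ALL_CODONS = [x + y + z for x in "ACGT" for y in "ACGT" for z in "ACGT"]


def find_orfs(seq, min_codons=3):
    """Signal sensor: forward-strand ORFs as (start, end), ATG through stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF in this frame
    return sorted(orfs)


def codon_log_odds(training_cds, pseudocount=1.0):
    """Content sensor model: log2(P(codon | coding) / uniform background)."""
    counts = Counter()
    for cds in training_cds:
        cds = cds.upper()
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {
        c: math.log2(((counts[c] + pseudocount) / total) * len(ALL_CODONS))
        for c in ALL_CODONS
    }


def coding_score(seq, log_odds):
    """Mean log-odds per codon; positive values resemble the training codon usage."""
    seq = seq.upper()
    scores = [
        log_odds[seq[i:i + 3]]
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 3] in log_odds
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical "known" coding sequences used to estimate codon usage.
    training = ["ATGGCTGCTGAAGCTTAA", "ATGGAAGCTGCTGCTTGA"]
    model = codon_log_odds(training)
    genome = "CCCATGGCTGAAGCTTAACCC"
    for start, end in find_orfs(genome):
        print(start, end, round(coding_score(genome[start:end], model), 2))
```

Real ab initio predictors combine many more signals (splice sites, branch points, both strands) inside a single probabilistic model, but the division of labour is the same: signals propose candidate boundaries, and content statistics decide whether the region between them looks like coding sequence.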

About Author

Sagar Aryal

Sagar Aryal is a microbiologist and a scientific blogger. He is doing his Ph.D. at the Central Department of Microbiology, Tribhuvan University, Kathmandu, Nepal. He was awarded the DAAD Research Grant to conduct part of his Ph.D. research work for two years (2019-2021) at Helmholtz-Institute for Pharmaceutical Research Saarland (HIPS), Saarbrucken, Germany. Sagar is interested in research on actinobacteria, myxobacteria, and natural products. He is the Research Head of the Department of Natural Products, Kathmandu Research Institute for Biological Sciences (KRIBS), Lalitpur, Nepal. Sagar has more than ten years of experience in blogging, content writing, and SEO. Sagar was awarded the SfAM Communications Award 2015: Professional Communicator Category from the Society for Applied Microbiology (now Applied Microbiology International), Cambridge, United Kingdom (UK). Sagar has also been the ASM Young Ambassador to Nepal for the American Society for Microbiology since 2023.
