Shotgun sequencing is a method used to determine the DNA sequence of an organism by randomly breaking the DNA into small fragments, sequencing them, and reassembling the sequences using their overlapping regions. The name “shotgun” comes from the random fragmentation, which resembles the scatter of a shotgun blast.
The idea of shotgun sequencing was first proposed in 1979 by Staden as a way to speed up sequencing, since it allows larger genomes to be sequenced in less time. The first shotgun sequencing protocol was developed by Messing in 1981 using the M13 phage vector. A year later, in 1982, Sanger used the shotgun method to sequence the phage λ genome. In 1995, Venter and Smith developed the whole-genome shotgun sequencing method to sequence the Haemophilus influenzae genome. Venter later applied this method to the human genome at Celera Genomics, beginning in the late 1990s.
Shotgun sequencing is now often done using next-generation sequencing (NGS) platforms. NGS technologies have become widely used because they are affordable and fast, and because they can sequence the very large number of fragments that shotgun sequencing produces in parallel.
Principle of Shotgun Sequencing
Shotgun sequencing works on the principle of randomly breaking DNA into small pieces and sequencing them individually. The main principle is to generate a large number of short DNA sequences by fragmentation which are then analyzed by specialized bioinformatics tools to identify overlapping regions. These overlapping regions are used to piece together the reads and reconstruct the entire genome.
Shotgun sequencing begins with the extraction and purification of DNA from the organism of interest. This purified DNA is then fragmented into small random pieces. Each fragment is sequenced individually, which generates a vast number of short DNA reads. Bioinformatics tools then detect the overlaps between these reads and use them to assemble the reads and reconstruct the complete genome.
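The overlap idea can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how production assemblers work: the reads are error-free, both come from the same strand, and the function names `overlap_length` and `merge_reads`, the `min_overlap` parameter, and the example reads are all hypothetical.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the part of `right` past the overlap.
    return left + right[size:]


if __name__ == "__main__":
    read_a = "ATGCGTACGTT"  # hypothetical read
    read_b = "ACGTTGGCAAT"  # hypothetical read sharing "ACGTT" with read_a
    print(overlap_length(read_a, read_b))  # 5
    print(merge_reads(read_a, read_b))     # ATGCGTACGTTGGCAAT
```

Real reads contain sequencing errors and may come from either strand, so practical tools use approximate matching and graph-based methods rather than exact string comparison.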
Types of Shotgun Sequencing
There are two main methods of shotgun sequencing:
1. Hierarchical Shotgun Sequencing
- Hierarchical shotgun sequencing, also known as clone-by-clone sequencing, involves sequencing large genomes by first cloning DNA fragments into vectors and mapping the genome before sequencing.
- The extracted DNA is divided into fragments using restriction enzymes or mechanical shearing, and these fragments are inserted into vectors such as bacterial artificial chromosomes (BACs) to create a clone library. The next step involves creating a physical map of the genome using techniques like restriction mapping.
- Then, individual clones are selected and prepared for sequencing. The sequence data is assembled and annotated to reconstruct the complete genome sequence. Assembled sequences are checked and if necessary, gaps are filled using additional sequencing methods.
- The main advantage of this method is the ability to handle large genomes. The mapping step also provides useful information about the structure of the genome. However, this process can be time-consuming and costly as it includes physical map construction and individual region sequencing.
- The Human Genome Project used this method to successfully sequence the human genome.
2. Whole Genome Shotgun Sequencing
- Whole-genome shotgun sequencing directly sequences the entire genome without the initial mapping step.
- In this method, the DNA is randomly broken into small fragments and sequenced. The sequenced data is assembled using bioinformatics tools. These assembled sequences are annotated and analyzed to generate the complete genome sequence.
- This method is faster and more cost-effective than hierarchical shotgun sequencing as it does not require the construction of a physical map and individual region sequencing.
- However, assembling the sequenced fragments is more difficult with this method, because there is no physical map to indicate where each read belongs in the genome.
- Craig Venter and colleagues at Celera Genomics, a company founded to sequence the human genome faster than the Human Genome Project, sequenced and assembled the human genome using this method.
Hierarchical vs. Whole Genome Shotgun Sequencing
| Characteristics | Hierarchical Shotgun Sequencing | Whole Genome Shotgun Sequencing |
| --- | --- | --- |
| Method | This involves sequencing individual clones in an ordered manner. | This involves sequencing random fragments of the genome. |
| Physical map | It involves creating a physical map before sequencing. | It does not require a physical map. |
| Time | It is more time-consuming due to multiple steps. | It is faster as it eliminates the physical mapping step. |
| Suitability for genome size | It is better suited for large and complex genomes. | It is more efficient for small genomes. |
| Computational requirement | It is less computationally complex and requires lower computational resources. | It is more computationally complex and requires higher computational resources. |
Process of Shotgun Sequencing
The process of shotgun sequencing is divided into the following six steps.
1. Sample Preparation
In this initial step, environmental or biological samples of interest are collected and processed for DNA extraction. The extraction of DNA is done using different physical and chemical methods. At first, the cells are lysed to release DNA. Then the DNA is separated from other cellular components.
2. DNA Fragmentation
The extracted DNA of interest is then randomly fragmented into small pieces using methods such as sonication. Fragments are generated randomly to ensure an unbiased representation of the genome. These fragments undergo end repair to create blunt ends suitable for adapter ligation.
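Random fragmentation can be simulated with a short Python sketch. It assumes, purely for illustration, that fragment lengths follow a normal distribution and that several identical copies of a linear genome are sheared independently; the function names, the `mean_length` and `sd` parameters, and the random test genome are hypothetical, and end repair and adapter ligation are not modeled.

```python
import random


def fragment_once(sequence, mean_length, sd, rng):
    """Cut one copy of `sequence` into consecutive pieces of random length."""
    fragments = []
    position = 0
    while position < len(sequence):
        # Draw a fragment length; never allow a length below one base.
        length = max(1, round(rng.gauss(mean_length, sd)))
        fragments.append(sequence[position:position + length])
        position += length
    return fragments


def shear_copies(sequence, copies=5, mean_length=200, sd=30, seed=None):
    """Shear several copies of the same genome, each at different random points."""
    rng = random.Random(seed)
    return [fragment_once(sequence, mean_length, sd, rng) for _ in range(copies)]


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))  # toy genome
    library = shear_copies(genome, copies=3, seed=7)
    for copy_number, fragments in enumerate(library, start=1):
        print(copy_number, len(fragments), [len(f) for f in fragments[:5]])
```

Because each copy is cut at different positions, fragments from different copies overlap one another, and those overlaps are what later make assembly possible.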
3. Library Construction
This step involves preparing DNA fragments for sequencing. The DNA fragments with ligated adapters are amplified to create a library of fragments ready for sequencing. The resulting library contains a collection of all prepared DNA fragments which is loaded onto the sequencing platform.
4. Sequencing
Each of the fragments is sequenced independently. Several rounds of sequencing are performed on the same DNA sample to generate multiple short reads. Shotgun sequencing uses different high-throughput sequencing technologies that can generate short reads from randomly fragmented DNA. This generates a vast amount of sequence data quickly. The raw sequence data is processed to determine the nucleotide sequence using base calling.
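How much sequencing is enough is usually expressed as coverage, the average number of reads that span each base. The sketch below is a back-of-the-envelope calculation, not part of any specific protocol: it assumes reads land uniformly at random (a Poisson approximation commonly attributed to Lander and Waterman), and the function names and example numbers are hypothetical.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage):
    """Expected fraction of bases hit by no read, assuming uniform random reads."""
    return math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical run: one million 150-base reads from a 5 Mb genome.
    coverage = mean_coverage(1_000_000, 150, 5_000_000)
    print(coverage)  # 30.0
    print(expected_uncovered_fraction(coverage))
    print(reads_needed(30, 150, 5_000_000))  # 1000000
```

In practice coverage is uneven, so real projects tend to aim higher than this idealized model suggests.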
5. Assembly
In this step, the overlaps between reads are used to assemble the short DNA reads into longer contiguous sequences called contigs. The contigs are further aligned and assembled to reconstruct the complete genome sequence. Any gaps between contigs are filled using additional sequencing techniques or bioinformatics tools. Quality control is performed before assembly to remove low-quality reads and adapter sequences, and again after assembly to check the quality of the contigs and correct errors.
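The merging of reads into contigs can be sketched as a greedy procedure: repeatedly join the two sequences with the longest overlap until no overlap remains. This is one simple approach chosen for illustration, not the method any particular assembler uses. It assumes error-free reads from a single strand, and the names `greedy_assemble`, `overlap_length`, and `min_overlap`, along with the toy genome, are hypothetical.

```python
from itertools import permutations


def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` equal to a prefix of `right` (0 if too short)."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=4):
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard any sequence fully contained in another one.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_size, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            size = overlap_length(contigs[i], contigs[j], min_overlap)
            if size > best_size:
                best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge; gaps leave several contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    genome = "AGCTTGACCGTAATGCCATGGACT"  # toy genome with no repeated 4-mers
    reads = [genome[10:20], genome[0:10], genome[14:24], genome[5:15]]
    print(greedy_assemble(reads))                   # one contig: the genome
    print(greedy_assemble([reads[1], reads[2]]))    # two contigs: a gap remains
```

The second call shows how an uncovered region leaves two separate contigs. Greedy merging can also join the wrong reads when the genome contains repeats longer than the overlap threshold.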
6. Annotation and Analysis
The assembled genome is then annotated to predict the structure and function of its genes. This includes both structural and functional annotation. Annotation is also used to identify non-coding regions, including regulatory elements. This step transforms raw sequence data into meaningful information.
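One of the simplest forms of structural annotation is scanning for open reading frames (ORFs). The sketch below is a deliberately reduced illustration, not a gene predictor: it assumes the standard start codon ATG and stop codons TAA, TAG, and TGA, searches only the three forward-strand frames, and uses a hypothetical function name, `min_codons` threshold, and example sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, orf) for each ATG...stop stretch on the forward strand.

    `start` and `end` are 0-based, end-exclusive positions; the stop codon is
    included in the ORF and counted toward `min_codons`.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, sequence[start:end]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    contig = "CCATGAAATTTGGGTAACC"  # hypothetical contig
    print(find_orfs(contig))  # [(2, 17, 'ATGAAATTTGGGTAA')]
```

Real annotation pipelines also examine the reverse strand and combine ORF detection with sequence similarity searches and statistical gene models.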
Advantages of Shotgun Sequencing
- Shotgun sequencing is more cost-effective than traditional methods as it reduces the time and resources associated with genome sequencing.
- Shotgun sequencing can be applied to large quantities of DNA and can sequence entire genomes.
- Shotgun sequencing is fast, as it can sequence many DNA fragments simultaneously and, in its whole-genome form, does not require the time-consuming step of mapping before sequencing.
- It can process millions of fragments simultaneously generating vast amounts of data in a short period.
Limitations of Shotgun Sequencing
- Shotgun sequencing generates massive amounts of data that require significant computational resources and bioinformatics tools to assemble the short sequence reads into a complete genome.
- Complex genomes, particularly those with repetitive sequences, can be challenging to assemble. Incorrect assembly of fragments due to repetitive sequences or sequencing errors can lead to inaccurate genome reconstruction.
- In cases where errors occur from shotgun sequencing, additional sequencing using more labor-intensive methods may be required.
- There can be regions of the genome that are not covered by any sequenced fragments leading to gaps in the assembled genome.
- Regions with low complexity can be underrepresented or missed in shotgun sequencing.
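The difficulty with repeats can be made concrete with a small Python sketch. It builds two different toy genomes that share the same repeated element and shows that, when every read is no longer than the repeat plus one base, both genomes yield exactly the same collection of reads. The repeat, the unique segments, and the function names are hypothetical, and the reads are idealized: error-free, fixed length, and one starting at every position.

```python
from collections import Counter

REPEAT = "GATTACAT"  # hypothetical 8-base repeat
UNIQUE = {"A": "AAAAC", "B": "CCCCA", "C": "GGGGT", "D": "TTTTG"}


def build_genome(order):
    """Join unique segments in the given order with one repeat copy between them."""
    return REPEAT.join(UNIQUE[name] for name in order)


def read_counts(genome, read_length):
    """Count every error-free read of `read_length` bases, one per start position."""
    return Counter(genome[i:i + read_length]
                   for i in range(len(genome) - read_length + 1))


genome_1 = build_genome("ABCD")  # A-R-B-R-C-R-D
genome_2 = build_genome("ACBD")  # A-R-C-R-B-R-D (middle segments swapped)

if __name__ == "__main__":
    print(genome_1 != genome_2)                                     # True
    # Reads of 9 bases never span a whole repeat plus both neighbours.
    print(read_counts(genome_1, 9) == read_counts(genome_2, 9))     # True
    # Reads of 10 bases can bridge the repeat and tell the genomes apart.
    print(read_counts(genome_1, 10) == read_counts(genome_2, 10))   # False
```

With the shorter reads, no assembler could decide between the two arrangements from the reads alone, which is why longer reads or additional sequencing are used to resolve repetitive regions.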
Applications of Shotgun Sequencing
- Shotgun sequencing is used in whole-genome studies, which play an important role in understanding genetic variations and mutations associated with rare diseases or different types of cancer.
- Shotgun sequencing is widely used in metagenomics to study the genomes of microbial communities present in environmental samples.
- Shotgun sequencing is useful in clinical diagnostics to detect genetic disorders and pathogens directly from patient samples.
- It also helps in identifying non-coding regions of the genome, which is essential for understanding gene function and expression patterns.
- Shotgun sequencing can be used in forensic science for analysis of forensic DNA samples.
- Shotgun sequencing can also be used to improve the accuracy of existing reference genome sequences by correcting errors and filling gaps.