DNA sequencing
Adapted from Wikipedia
DNA sequencing is the way we find out the order of tiny building blocks in DNA. These blocks are called nucleotides, and there are four types: adenine, thymine, cytosine, and guanine.
Knowing the DNA sequence is important for many things. It helps scientists study living things, and it is useful in medical diagnosis, biotechnology, forensic biology, virology, and systematics. It can even help doctors find diseases and decide the best way to treat a patient.
DNA sequencing has gotten much better over time. It now lets scientists read complete genomes, meaning all the DNA of humans, animals, plants, and tiny organisms. In the 1970s, scientists used slow methods that took a lot of work by hand. Later, they used glowing (fluorescent) labels and machines to make the process faster and easier.
Applications
DNA sequencing is a useful tool that helps scientists learn about the genetic code. It can show the order of parts in genes, bigger areas of DNA, whole chromosomes, or even the entire genomes of living things. It also helps in studying RNA and proteins by looking at the parts of DNA that make them.
This technology is important in many areas, such as molecular biology, where it helps researchers learn about genes and diseases. In evolutionary biology, sequencing shows how different species are related. It is also used in metagenomics to find tiny organisms in places like water or soil. In virology, sequencing helps scientists study viruses and see how they change. In medicine, it can help find genetic diseases and guide treatment. In forensic science, DNA sequencing helps identify people by their unique genetic patterns.
The four canonical bases
Main article: Nucleotide
DNA is made up of four main building blocks called bases: thymine (T), adenine (A), cytosine (C), and guanine (G). DNA sequencing is a way to find out the exact order of these bases in a piece of DNA. While most DNA uses just these four bases, some viruses and special cases can have slightly different building blocks. Scientists are always discovering new ways DNA can be arranged.
History
Discovery of DNA structure and function
DNA was discovered in 1869 by Friedrich Miescher. For years, scientists thought proteins held genetic information. This changed in 1944 when Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that DNA could change the traits of bacteria. In 1953, James Watson and Francis Crick proposed the double-helix model of DNA, showing how genetic information is stored and passed on.
RNA sequencing
RNA sequencing was one of the first ways scientists studied genes. In the 1970s, Walter Fiers and his team at the University of Ghent in Belgium were the first to sequence a complete gene, and later the complete genome of a small virus called bacteriophage MS2.
Early DNA sequencing methods
The first methods for reading DNA sequences were developed in the 1970s. Scientists like Ray Wu at Cornell University used special techniques to read DNA sequences. Later, Walter Gilbert and Allan Maxam at Harvard created a method to sequence DNA. Around the same time, Frederick Sanger and Alan Coulson developed a method that made DNA sequencing faster and easier.
Sequencing of full genomes
The first complete DNA genome sequenced was that of a small virus called bacteriophage φX174, in 1977. In 1984, scientists sequenced the Epstein-Barr virus, which was a big step because they knew little about it before. By 1995, scientists had sequenced the entire genome of a bacterium called Haemophilus influenzae. A big international effort called the Human Genome Project published a draft of the human genome in 2001 and announced a nearly complete sequence in 2003. By 2022, scientists had filled in the last missing pieces.
High-throughput sequencing (HTS) methods
In the late 1990s and early 2000s, scientists developed faster ways to sequence DNA, called "next-generation" or "second-generation" sequencing. These methods break the genome into tiny pieces and sequence many pieces at once, making it possible to sequence entire genomes quickly. These advances have helped scientists learn more about health, human history, and personalized medicine.
Main article: Whole genome sequencing
Basic methods
Main article: Maxam-Gilbert sequencing
Main article: Sanger sequencing
Two basic methods were first used to read the order of DNA building blocks.
The first way was made by Allan Maxam and Walter Gilbert in 1977. It used special chemicals to cut DNA at certain spots. This method was complex and needed special materials, so it was not used much after better ways were made.
The second way was made by Frederick Sanger in 1977. It became very popular because it was simpler and more reliable. Scientists later made Sanger’s method better by using glowing (fluorescent) labels and machines to do the work automatically. This made DNA sequencing faster and cheaper. The Sanger method helped sequence the first human genome in 2001, starting the field of genomics. After that, even newer methods made sequencing quicker and more affordable.
Today, most DNA sequencing uses a method called “sequencing by synthesis.” This method watches a special enzyme as it adds DNA building blocks one by one to make a new copy of DNA. By seeing which block is added each time, scientists can read the DNA sequence. This method is used in many modern machines that can handle large amounts of DNA very quickly.
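The idea above can be shown with a very small toy program. This is only a sketch, not how a real machine works: it assumes a perfect enzyme that never makes mistakes, it ignores the direction of the DNA strands, and the function name `sequence_by_synthesis` is made up for this example.

```python
# Toy model of sequencing by synthesis (illustrative only).
# Each base pairs with one partner: A with T, and C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template):
    """Return (new_strand, inferred_template) by 'watching' each base added."""
    observed = []
    for base in template:
        added = COMPLEMENT[base]  # the enzyme adds the matching partner base
        observed.append(added)    # the machine records which base was added
    new_strand = "".join(observed)
    # Knowing each added base tells us which template base it paired with.
    inferred = "".join(COMPLEMENT[b] for b in new_strand)
    return new_strand, inferred

new_strand, inferred = sequence_by_synthesis("GATTACA")
print(new_strand)  # CTAATGT
print(inferred)    # GATTACA
```

The program never looks at the template directly when it builds `inferred`. It works only from the list of bases that were added, which is the same trick the real method relies on.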
Large-scale sequencing and de novo sequencing
Large-scale sequencing helps scientists study very long pieces of DNA, like entire chromosomes. Scientists break the DNA into smaller pieces. These pieces are copied many times and sequenced. The small pieces are then put back together like a puzzle to make the full DNA sequence.
"De novo sequencing" means finding out a DNA sequence from scratch, without any prior knowledge. One common method is called "shotgun sequencing." In this method, DNA is broken into random pieces. Each piece is sequenced, and then all the pieces are put together based on where they overlap. This method works well for sequencing large amounts of DNA.
High-throughput methods
High-throughput sequencing includes methods like exome sequencing, genome sequencing, and transcriptome profiling. These tools help scientists study many pieces of DNA quickly. This makes research faster and more affordable.
These changes have improved how we learn about living things and help doctors treat illnesses better. Companies like Illumina, Qiagen, and ThermoFisher Scientific have worked hard to make these tools better for everyone.
| Method | Read length | Accuracy (single read, not consensus) | Reads per run | Time per run | Cost per 1 billion bases (US$) | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|
| Single-molecule real-time sequencing (Pacific Biosciences) | 30,000 bp (N50); maximum read length >100,000 bases | 87% raw-read accuracy | 4,000,000 per Sequel 2 SMRT cell, 100–200 gigabases | 30 minutes to 20 hours | $7.2-$43.3 | Fast. Detects 4mC, 5mC, 6mA. | Moderate throughput. Equipment can be very expensive. |
| Ion semiconductor (Ion Torrent sequencing) | up to 600 bp | 99.6% | up to 80 million | 2 hours | $66.8-$950 | Less expensive equipment. Fast. | Homopolymer errors. |
| Pyrosequencing (454) | 700 bp | 99.9% | 1 million | 24 hours | $10,000 | Long read size. Fast. | Runs are expensive. Homopolymer errors. |
| Sequencing by synthesis (Illumina) | MiniSeq, NextSeq: 75–300 bp; MiSeq: 50–600 bp; HiSeq 2500: 50–500 bp; HiSeq 3/4000: 50–300 bp; HiSeq X: 300 bp | 99.9% (Phred30) | MiniSeq/MiSeq: 1–25 Million; NextSeq: 130-00 Million; HiSeq 2500: 300 million – 2 billion; HiSeq 3/4000 2.5 billion; HiSeq X: 3 billion | 1 to 11 days, depending upon sequencer and specified read length | $5 to $150 | Potential for high sequence yield, depending upon sequencer model and desired application. | Equipment can be very expensive. Requires high concentrations of DNA. |
| Combinatorial probe anchor synthesis (cPAS- BGI/MGI) | BGISEQ-50: 35-50bp; MGISEQ 200: 50-200bp; BGISEQ-500, MGISEQ-2000: 50-300bp | 99.9% (Phred30) | BGISEQ-50: 160M; MGISEQ 200: 300M; BGISEQ-500: 1300M per flow cell; MGISEQ-2000: 375M FCS flow cell, 1500M FCL flow cell per flow cell. | 1 to 9 days depending on instrument, read length and number of flow cells run at a time. | $5– $120 | ||
| Sequencing by ligation (SOLiD sequencing) | 50+35 or 50+50 bp | 99.9% | 1.2 to 1.4 billion | 1 to 2 weeks | $60–130 | Low cost per base. | Slower than other methods. Has issues sequencing palindromic sequences. |
| Nanopore Sequencing | Dependent on library preparation, not the device, so user chooses read length (up to 2,272,580 bp reported). | ~92–97% single read | Dependent on read length selected by user | Data streamed in real time. Choose 1 min to 48 hrs | $7–100 | Longest individual reads. Accessible user community. Portable (palm-sized). | Lower throughput than other machines. Single-read accuracy in the 90s (percent). |
| GenapSys Sequencing | Around 150 bp single-end | 99.9% (Phred30) | 1 to 16 million | Around 24 hours | $667 | Low-cost of instrument ($10,000) | |
| Chain termination (Sanger sequencing) | 400 to 900 bp | 99.9% | N/A | 20 minutes to 3 hours | $2,400,000 | Useful for many applications. | More expensive and impractical for larger sequencing projects. This method also requires the time-consuming step of plasmid cloning or PCR. |
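Several rows in the table write accuracy as "Phred30". The table does not explain this, so here is a short sketch using the usual definition of a Phred score, Q = -10 × log10(chance of error). The function names are made up for this example.

```python
import math

def phred_to_accuracy(q):
    """Convert a Phred quality score to the chance a base call is correct."""
    error_probability = 10 ** (-q / 10)
    return 1 - error_probability

def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (for example 0.999) to a Phred score."""
    return -10 * math.log10(1 - accuracy)

print(round(phred_to_accuracy(30), 4))    # 0.999
print(round(accuracy_to_phred(0.87), 1))  # 8.9
```

A score of 30 means about one wrong base in every 1,000, which matches the 99.9% shown in the table. An accuracy of 87% works out to a score of roughly 9.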
Methods in development
DNA sequencing methods that are still being developed include using tiny openings called nanopores to read the DNA sequence. Scientists also use special types of microscopy to see where nucleotides sit in long DNA pieces. These new ways try to make sequencing faster, cheaper, and simpler.
One new idea uses electrical measurements to tell DNA bases apart as they go through a channel. Another way uses tiny bits of known DNA to find unknown sequences. Scientists can also use tools like mass spectrometry to weigh DNA bits and spot small changes. These methods help when studying human DNA, especially if the DNA is damaged. Other ways use tiny chips to test many things at once or watch how DNA copies itself.
Market share
In 2022, a company named Illumina had about 80% of the market for DNA sequencing. Other companies, like PacBio, Oxford Nanopore, 454, and MGI, made up the rest. This means most DNA sequencing was done on Illumina's machines.
Sample preparation
Before we can read the code inside DNA, scientists prepare small samples taken from living things such as plants or animals. They take out DNA or RNA and make sure the strands stay long and undamaged. If they take out RNA, they change it into a special kind of DNA called complementary DNA (cDNA) to study it more easily.
Depending on the method used to read the DNA, extra steps might be needed. Some methods need special treatments before reading can start. Scientists always check the quality of their samples to make sure everything is just right for clear results.
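The RNA-to-cDNA step mentioned above follows simple pairing rules, which the short sketch below illustrates. It is a simplified picture that assumes a perfect copy with no errors, and the function name `reverse_transcribe` is made up for this example. In a lab, an enzyme does this job.

```python
# Each RNA base pairs with one DNA base. RNA uses U where DNA uses T.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna):
    """Return the cDNA strand that pairs with an RNA sequence.

    The strand is reversed so the result reads in the usual direction.
    """
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(rna))

print(reverse_transcribe("AUGGCU"))  # AGCCAT
```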
Development initiatives
In October 2006, the X Prize Foundation began a project to improve how we read all the DNA in a person. This project, called the Archon X Prize, offered $10 million to the first group that could make a machine to read 100 complete human genomes quickly and accurately.
Each year, the National Human Genome Research Institute gives money for new research and inventions in the study of DNA. This includes creating new and better ways to read DNA.
Computational challenges
DNA sequencing makes many small pieces of data. Scientists need to fit these pieces together like a puzzle. Some parts of DNA repeat many times, which can make it tricky to know where each piece belongs.
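The puzzle-fitting step described above can be sketched with a simple "greedy" approach: keep joining the two pieces that overlap the most. This is a toy example under strong assumptions (no reading errors, no repeats, all pieces from the same strand). Real assembly programs are far more careful, and the function names here are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest ending of a that matches the start of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
# ['ATGGCGTACCTGA']
```

If the same stretch of letters appears in several places, more than one pair of pieces can show the same overlap, and a simple approach like this one may join the wrong pieces. That is why repeats make assembly tricky.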
Scientists use special computer programs to help organize and check the DNA pieces. These programs can remove parts of the data that might cause mistakes. After getting the data, there is still much work to understand it using biology and computer science tools.
| Name of algorithm | Type of algorithm |
|---|---|
| Cutadapt | Running sum |
| ConDeTri | Window based |
| ERNE-FILTER | Running sum |
| FASTX quality trimmer | Window based |
| PRINSEQ | Window based |
| Trimmomatic | Window based |
| SolexaQA | Window based |
| SolexaQA-BWA | Running sum |
| Sickle | Window based |
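The two types of algorithm in the table can be sketched as follows. Each letter in a read comes with a quality score, and trimming cuts off the low-quality end. These are simplified illustrations of the two general ideas, not the actual code of any program in the table. The function names, window size, and cutoffs are assumptions chosen for the example.

```python
def trim_window(quals, window=4, min_mean=20):
    """Window based: slide a window along the read and cut where the
    average quality inside the window first drops below min_mean."""
    for start in range(0, len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / window < min_mean:
            return start  # keep only the bases before this window
    return len(quals)     # nothing to trim

def trim_running_sum(quals, cutoff=20):
    """Running sum: subtract cutoff from each quality, add the results
    up from the end of the read, and cut where the sum is lowest."""
    total = 0
    best_total = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        total += quals[i] - cutoff
        if total < best_total:
            best_total = total
            cut = i
    return cut

quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
read = "ACGTACGTAC"
print(read[:trim_window(quals)])                  # AC
print(read[:trim_running_sum(quals, cutoff=10)])  # ACGT
```

Both functions return how many bases to keep from the start of the read. They can give different answers for the same read, because they use different settings here and judge quality in different ways.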
Ethical issues
Further information: Bioethics
DNA sequencing has raised important questions about fairness and safety. One big issue is who can see the information from your DNA test. You should always be asked before your DNA is used. Some people worry that this information could be used in unfair ways. For example, there are concerns that insurers might use it to change prices.
Laws like the Genetic Information Nondiscrimination Act in the United States help protect people from being treated unfairly because of their DNA. However, some experts think we need more rules to keep everyone safe, especially with new and more detailed DNA tests. These tests can sometimes show information about you and your family.
This article is a child-friendly adaptation of the Wikipedia article on DNA sequencing, available under CC BY-SA 4.0.