Molecular cloning allows the isolation of individual fragments of DNA in quantities suitable for detailed characterization, including the determination of nucleotide sequence. Indeed, determination of the nucleotide sequences of many genes has elucidated not only the structure of their protein products, but also the properties of DNA sequences that regulate gene expression. Furthermore, the coding sequences of novel genes are frequently related to those of previously studied genes, and the functions of newly isolated genes can often be correctly deduced on the basis of such sequence similarities.
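As a rough illustration of how a newly isolated sequence might be compared with a previously studied one, the sketch below computes percent identity between two sequences. It assumes the sequences are already aligned, are the same length, and contain no gaps. Real similarity searches use alignment tools such as BLAST, which also account for insertions and deletions. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match.

    Simplifying assumption: no gaps and no alignment step; positions are compared one-to-one.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count identical residues at each aligned position (case-insensitive).
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

For example, `percent_identity("MKTAYIAK", "MKTAHIAK")` returns `87.5`, since seven of the eight aligned positions are identical.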
Current methods of DNA sequencing are both rapid and accurate, and determining the sequence of several kilobases of DNA is a straightforward task for most molecular biology laboratories. Thus, it is now far easier to clone and sequence DNA than to determine the amino acid sequence of a protein directly. Because the nucleotide sequence of a gene can be readily translated into the amino acid sequence of its encoded protein using the genetic code, the easiest way to determine the sequence of a protein is to sequence its cloned gene.
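The translation step can be sketched with the standard genetic code. This minimal version assumes the input is an uninterrupted coding sequence (for example, a bacterial gene or a cDNA, with no introns) that begins in frame at its first nucleotide and contains only A, C, G, and T; it stops at the first stop codon. The names `CODON_TABLE` and `translate` are illustrative, not taken from any particular library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second, and third codon positions.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes."""
    seq = coding_dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGA")` returns `"MAIVMGR"`: seven codons are translated and the final TGA is read as a stop.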
The most common method of DNA sequencing is based on premature termination of DNA synthesis resulting from the inclusion of chain-terminating dideoxynucleotides (which lack the 3′ hydroxyl group of deoxyribose) in DNA polymerase reactions (Figure 3.24). DNA synthesis is initiated from a primer that has been labeled at one end with a radioisotope. Four separate reactions are run, each containing all four normal deoxynucleotides plus one dideoxynucleotide (A, C, G, or T). Incorporation of a dideoxynucleotide stops further DNA synthesis because no 3′ hydroxyl group is available for addition of the next nucleotide. Thus, each reaction generates a series of labeled DNA molecules, each terminating at one of the positions occupied by the base represented by that reaction's dideoxynucleotide. These fragments of DNA are then separated according to size by gel electrophoresis and detected by exposure of the gel to X-ray film (autoradiography). The size of each fragment corresponds to the position of its terminal dideoxynucleotide, and the reaction (gel lane) in which it appears identifies that base, so the DNA sequence can be read directly as the order of fragments of increasing size across the four lanes.
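The logic of reading a dideoxy sequencing gel can be simulated in a few lines. This is an idealized sketch under several assumptions: every possible termination occurs in at least one molecule, the gel resolves fragments differing by a single nucleotide, the template contains only A, C, G, and T, and the primer (a hypothetical 20 nucleotides long) anneals immediately 3′ of the region being read. Because the polymerase synthesizes a strand complementary to the template, the gel reads out the reverse complement of the template. All function names are hypothetical.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (both written 5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dideoxy_reactions(template: str, primer_length: int = 20) -> dict[str, list[int]]:
    """Idealized fragment lengths (primer included) produced in each of the four reactions."""
    # The new strand is synthesized 5'->3' and is complementary to the template.
    new_strand = reverse_complement(template)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        # In the reaction containing the matching dideoxynucleotide, some molecules
        # incorporate it here and stop, giving a fragment of this length.
        lanes[base].append(primer_length + position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands from shortest to longest and report the lane each band is in."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)
```

For example, `read_gel(dideoxy_reactions("GATTACA"))` returns `"TGTAATC"`, the sequence of the newly synthesized strand, and applying `reverse_complement` to that string recovers the template `"GATTACA"`.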
Large-scale DNA sequencing is frequently performed using automated systems, which use fluorescence-labeled primers in dideoxynucleotide sequencing reactions (Figure 3.25). As the newly synthesized DNA strands are electrophoresed through a gel, they pass through a laser beam that excites the fluorescent label. The resulting emitted light is then detected by a photomultiplier, and a computer collects and analyzes the data. This type of automated DNA sequencing has enabled the large-scale analysis required for determination of the complete genome sequences of bacteria, yeast, C. elegans, and Drosophila, and is soon expected to yield the complete sequence of the human genome.
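The computer's task of turning detected light into a sequence can be sketched as simple base calling. This sketch assumes that each of the four reactions carries a different fluorescent dye, so the products can be run together and distinguished by color, and that the detector reports, for each peak, an arrival time and an intensity in each of four color channels. Shorter fragments migrate faster and reach the laser first, so ordering peaks by arrival time orders them by size. The dye-to-base assignment, the data format, and the function name `call_bases` are hypothetical; real base-calling software also performs peak detection, signal correction, and quality scoring, none of which is modeled here.

```python
from __future__ import annotations

# Hypothetical assignment of detector channels to bases; real dye sets differ by instrument.
CHANNEL_BASES = ("A", "C", "G", "T")

Peak = tuple[float, tuple[float, float, float, float]]  # (arrival time, channel intensities)


def call_bases(peaks: list[Peak]) -> str:
    """Call one base per peak: order peaks by arrival time, then pick the brightest channel."""
    calls = []
    for _, intensities in sorted(peaks, key=lambda peak: peak[0]):
        brightest = max(range(len(CHANNEL_BASES)), key=lambda ch: intensities[ch])
        calls.append(CHANNEL_BASES[brightest])
    return "".join(calls)
```

For example, with the made-up peaks `[(12.4, (0.1, 0.0, 0.9, 0.2)), (10.1, (0.8, 0.1, 0.0, 0.1)), (11.2, (0.0, 0.1, 0.1, 0.7))]`, `call_bases` orders them by time as 10.1, 11.2, 12.4 and returns `"ATG"`.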