Looks like harmless gibberish, until you learn it's the DNA signature of Huntington's disease.
"Many persons executed at the Salem witch trials may have had Huntington's," Robert Simpson said. A molecular biologist and medical doctor, Simpson gave the first in this year's Frontiers of Science lectures at Penn State on "How DNA Research Works in Medicine, Law, and Science."
Folksinger Woody Guthrie's mother had the disease: "Her face would twitch and her lips would snarl and her teeth would show," Guthrie wrote. "She would start out in a low grumbling voice and gradually get to talking as loud as her throat could stand it; and her arms would draw up at her sides, then behind her back and swing in all kinds of curves." "The description of Woody's mother would certainly fit with an early New Englander's idea of someone possessed by spirits," Simpson said.
DNA research has erased the stigma of such inherited diseases. Understanding the mutations involved could lead to better treatments, even to therapy to correct genetic defects. Tests devised to study genes are changing history, and the law. DNA testing has convicted criminals on the evidence of a hair. It has found black descendants for Thomas Jefferson. And linked a certain president with a stained blue dress.
"A major revolution is in the offing," said Simpson, "comparable to when the microscope was introduced."
How DNA Works
"A surprising statistic about the human genome," he said, "is the length of a unique sequence. It turns out to be about 16 to 20 base pairs, or about half an inch of a string stretched from New York to the West Coast."
If you turn that string into a cross-country zipper, the base pairs are the teeth. A zipper the size of the human genome would need some 3 billion teeth. Coil it into a ball 50 feet in diameter and you'll have an idea of how your genes—all 100,000 of them—are crammed into the 23 pairs of chromosomes in the nucleus of each of your cells.
These analogies, the long string and the coiled zipper, give an idea of how huge and delicate and difficult to get at the human genome is. But to understand how to decipher the information it contains, you need to think of it as a book.
"Measured as Manhattan telephone books, each containing about 1,000 pages of 10-point type," said Simpson, "the genome of the bacterium E. coli is about a third of a book. Baker's yeast, which is my specialty, is a full book. The human genome will occupy two hundred books."
These 200,000 pages of genetic information, encoding everything from the color of your eyes to your likelihood of colon cancer, are written in the language of DNA. In structure it's a double helix: two strands of sugars and phosphates linked by pairs of the four bases, A, T, C, or G. The four bases create the alphabet. Every word in this language is three letters long, and stands for one amino acid. Each sentence, which can be many hundreds of words long, is a gene. But to read the human genome like a book, scientists still have to figure out the grammar: where the genetic sentences stop, what's a noun and what's a verb, and what exactly do they mean?
Two things make this task possible:
The first is base pairing. DNA's bases pair up in a very specific fashion. A joins only with T, while C joins only with G. If you know the sequence of bases on one strand of an unzipped snippet of DNA, you can easily guess the sequence on the other strand. Furthermore, single strands of DNA are sticky. Like magnets, they'll find their complement and click back together if they possibly can.
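The pairing rule is simple enough to write as a lookup table. The short Python sketch below is only an illustration: the function names and the nine-letter snippet are made up for the example and do not come from Simpson's lecture.

```python
# Pairing rules from the text: A joins only with T, C joins only with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a single DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction (end to start)."""
    return complement(strand)[::-1]


snippet = "CAGCAGCAG"  # hypothetical example strand
print(complement(snippet))          # GTCGTCGTC
print(reverse_complement(snippet))  # CTGCTGCTG
```

The two strands of a double helix run in opposite directions, which is why the second function reverses the result. Applying `complement` twice returns the original strand, the code's version of the two halves clicking back together.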
Second, as Simpson pointed out, each set of 16 to 20 bases (more than a word, but less than a sentence) is unique. It's as if, in all the works of Shakespeare, the Bard said "to be, or not to be" only once.
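A back-of-the-envelope count suggests where a figure like 16 might come from. The sketch below assumes the genome behaves like a random string of 3 billion letters. That is a simplifying assumption made for this illustration, not Simpson's stated reasoning, and the variable names are invented for the example.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # the article's round figure for the human genome


def possible_sequences(length: int) -> int:
    """Count the distinct strings of a given length over the alphabet A, T, C, G."""
    return 4 ** length


# Shortest length at which possible sequences outnumber positions in the genome.
shortest_unique = next(
    n for n in range(1, 41) if possible_sequences(n) > GENOME_BASE_PAIRS
)

print(shortest_unique)         # 16
print(possible_sequences(16))  # 4294967296
print(possible_sequences(20))  # 1099511627776
```

With 15 letters there are only about 1 billion possible sequences, too few to give every spot in the genome its own label. With 16 there are more than 4 billion. Real DNA is not random and contains repeated stretches, which may be one reason the practical range runs up to 20.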
But just as Shakespeare's works differ from Stephen King's, the exact A-T-C-G sequence of one person's genome won't be the same as another person's. These polymorphisms can lead to physical differences (blue eyes vs. brown). They can cause disease, or increase the risks of it. But many are as subtle as a difference in blood type. Ten variations, or alleles, can be found at the same spot on ten people's DNA, and all of them are normal.
This fact makes possible the DNA fingerprinting used by forensics experts to solve criminal cases. It also lets DNA researchers find a gene without knowing exactly what it does. To find the Huntington's gene, researchers took DNA samples from a large Venezuelan family with many Huntington's patients. Using enzymes that cut DNA whenever they find a certain short sequence, the researchers chopped the DNA up until it looked like a plate of spaghetti. They sorted the noodles by length and saw that the pattern of DNA fragments was different for people with the disease than for those without it. Linking the patterns to the patients' family tree, the researchers saw that people with the disease all had one very long fragment. This fragment, on chromosome 4, must hold a mutation—all those CAG repeats.
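The chop-and-sort step can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the article does not name the enzyme, so the sketch borrows the six-letter site GAATTC (the one recognized by the common lab enzyme EcoRI, which cuts after the first G), and the two toy "samples" are invented strings that differ only in their number of CAG repeats.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # whatever is left after the last cut
    return fragments


def fragment_lengths(dna: str) -> list[int]:
    """Sort the 'noodles' by length, longest first."""
    return sorted((len(piece) for piece in digest(dna)), reverse=True)


# Hypothetical samples: the same flanking DNA with 3 versus 10 CAG repeats.
few_repeats = "TT" + "GAATTC" + "CAG" * 3 + "GAATTC" + "AA"
many_repeats = "TT" + "GAATTC" + "CAG" * 10 + "GAATTC" + "AA"

print(fragment_lengths(few_repeats))   # [15, 7, 3]
print(fragment_lengths(many_repeats))  # [36, 7, 3]
```

The two short fragments are the same in both samples. Only the piece that spans the repeats changes length, and that difference is the kind of pattern the researchers could trace through a family tree.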
The Gutenberg of Genetics
Sequencing the gene for Huntington's took 10 years. Now deciphering DNA is quicker, due to a technique called PCR, or Polymerase Chain Reaction.
PCR is to genes what Gutenberg's press was to the written word. All it takes is heat, a patented enzyme (originally found in a hot spring at Yellowstone National Park), and two primers: 16- to 20-base-long bits of DNA that flank the gene you want to copy. Mix the primers with your DNA sample. Add free nucleotides (each one a base with its sugar-and-phosphate scaffold) and the hot-spring enzyme, known as Taq polymerase. Heat almost to boiling: the DNA will unzip into its two separate strands. As it cools, the primers, being small and quick, stick on before the two strands can zip back up. Then the Taq polymerase goes to work.
A polymerase is an ordinary enzyme, part of a cell's repair kit. Taq polymerase, from the hot-spring-dwelling bacterium Thermus aquaticus, is best for PCR because it works in hot water, where most proteins seize up. The job of a polymerase is to make a polymer, that is, to link molecules into a chain. Beginning at one primer, it runs down the strand of DNA, identifies each base, finds its partner among the free nucleotides, and links them together. When it gets to the second primer, it stops. Now instead of one double-stranded piece of DNA we have two. "Each time we repeat this process," said Simpson, "the new product joins with the original molecule to be copied." It's a chain reaction. "The numbers are striking: 25 cycles can generate over 30 million copies."
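Simpson's figure follows from simple doubling. The sketch below assumes an idealized reaction in which every molecule is copied in every cycle, which real reactions only approximate. The function name is made up for the example.

```python
def copies_after(cycles: int, starting_molecules: int = 1) -> int:
    """Idealized PCR yield: the number of molecules doubles with every cycle."""
    return starting_molecules * 2 ** cycles


for cycles in (1, 10, 25):
    print(cycles, copies_after(cycles))
# 1 2
# 10 1024
# 25 33554432
```

Starting from a single molecule, 25 perfect doublings give 33,554,432 copies, which matches the "over 30 million" in the quote.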
Before PCR, to get enough copies of a gene to sequence it, you had to clone it. First you spliced the gene into a plasmid vector (a small ring of DNA from a bacterium). You slipped the plasmid back into a bacterial cell and let it grow into a colony, then extracted the human DNA.
Cloning was always the slow step in DNA research. It also took up a lot of lab space: To cover the whole human genome would have required a million colonies of bacteria, each growing in its own Petri dish. "With PCR," said Simpson, "you can store a gene as a code in a computer. Anybody can access the information and synthesize the DNA."
Using PCR, researchers have organized everything that's known about the sequence of the normal human genome. This "physical map" currently has 25,000 landmarks, or "sequence-tagged sites" for which an exact A-T-C-G sequence is known. To make this map took 15 million PCR reactions—a task made somewhat less Herculean by a robotic machine that can do 150,000 reactions at a time.
PCR also takes the human error away from forensic DNA identification. Commercial sets need only 28 PCR reactions to identify an individual—with an error rate of 1 in 50 billion. "Basically, it's a Universal Product Code—a barcode—for a person," said Simpson.
The DNA Chip
After PCR, the next big step is the DNA chip. Using tricks from the semiconductor trade, scientists grow 65,000 different oligonucleotides (short bits of single-stranded DNA) on a chip a half-inch square. (The next-generation chips will fit 400,000.) Among others, you can now get the human tumor suppressor gene p53, the breast cancer genes BRCA1 and BRCA2, and two genes often mutated in drug-resistant strains of the AIDS virus, each on its own chip. You can make a chip to match any gene, as long as you know the gene's sequence.
Since the bits on the chip are single strands of DNA, they're sticky. They're looking for their match. You can take a blood sample from an AIDS patient, extract the DNA, squirt it onto a chip and see, by the pattern of which pieces stick, if the patient has a drug-resistant strain. "The whole analysis takes five minutes."
"With these chips," Simpson added, "we can start looking at polygenic diseases, at diseases that are influenced by multiple genes. Hypertension, diabetes, some forms of cancer. Things that 'run in the family,' but no one knows why."
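The chip's matching step, in which each single-stranded probe sticks only to its complement, can be mimicked in a few lines of Python. This is a toy model under loud assumptions: the probe names and sequences are invented, the "chip" holds two probes rather than 65,000, and a probe counts as sticking only on an exact match.

```python
PAIR = str.maketrans("ATCG", "TAGC")


def reverse_complement(strand: str) -> str:
    """Return the strand that would pair with the given one."""
    return strand.translate(PAIR)[::-1]


def hybridization_pattern(probes: dict[str, str], sample: str) -> dict[str, bool]:
    """Report, for each probe, whether its complement occurs in the sample strand."""
    return {
        name: reverse_complement(probe) in sample
        for name, probe in probes.items()
    }


# Hypothetical chip: each probe is built to match one version of the same spot.
probes = {
    "variant_A": reverse_complement("ATGCAGTTA"),
    "variant_B": reverse_complement("ATGCACTTA"),
}
sample = "GGATCCATGCAGTTACGA"  # hypothetical strand from a patient's sample

print(hybridization_pattern(probes, sample))
# {'variant_A': True, 'variant_B': False}
```

The pattern of which probes stick is the readout. Here it says the sample carries variant A rather than variant B, a one-letter difference.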
Robert Simpson, M.D., Ph.D., holds the Verne M. Willaman Chair of Molecular Biology at Penn State, 308 Althouse Lab, University Park, PA 16802; 814-863-0276; rts4@psu.edu. He is an expert on the structure of chromatin, a protein-DNA complex found in the nucleus of cells.