Computer Model Predicts Outcome of DNA Shuffling

March 12, 2001

University Park, Pa. --- Industries using DNA shuffling to improve enzymes, therapeutic proteins, vaccines and viral vectors may soon have a computational method for predicting the number and likely locations of crossovers, according to a Penn State research team.

"To date, the application of these methods has been based on experience and empirical methods and there was no model to understand the process which can be time consuming, expensive and of uncertain outcome," says Costas D. Maranas, assistant professor of chemical engineering. "We used thermodynamics and reaction engineering to evaluate and model this complex reaction network so we can now predict where the DNA from different parent genes will recombine."

DNA shuffling takes related genes from different species, or genes with related function, fragments them and reassembles them through recombination. Researchers then place the recombined genes into Escherichia coli to identify which new genes produce usable or potentially interesting products. Those genes that express a potentially interesting protein or enzyme are again fragmented and reassembled to form new recombinant genes. The process continues until a protein with the desired qualities is found.

"Beginning with genes that produce enzymes that are moderately good detergents, for example, the process iteratively searches for enzymes that are better detergents," says Maranas.

The important factors in creating recombined genes with DNA shuffling include the temperature at which annealing — joining of single-stranded DNA induced by cooling — occurs, the similarity of the genes and the size of the DNA fragments. The computer program developed by Maranas; Gregory L. Moore, graduate student in chemical engineering; Stefan Lutz, postdoctoral fellow in chemistry; and Stephen J. Benkovic, the University professor, the Evan Pugh Professor of Chemistry and holder of the Eberly Chair in Chemistry, was described in the March 13 issue of the Proceedings of the National Academy of Sciences.

The mathematical model, which provides a predictive framework for DNA shuffling, examines how fragment length, annealing temperature, sequence identity and the number of shuffled parent sequences affect the number, type and distribution of crossovers along the length of reassembled sequences. The more similar the genes are, the greater the potential for crossovers. Thermodynamic analysis showed that the majority of annealing events take place within a relatively narrow temperature range. At very low temperatures, the DNA may anneal out of frame, forming nonsense sequences, so the objective is to keep the temperature as low as possible while still favoring the legitimate crossovers and joinings that produce something functional.

"If the sequences are very different and the experiment is done at high temperature, there will be no crossovers at all," says Maranas. Fragment size also can affect the number of crossovers observed.

The researchers wanted to determine the probability that a full-length reassembled sequence of DNA would have a specific number of crossovers. The original application of the reassembly algorithm appeared to overestimate the total number of crossovers seen experimentally, especially for genes that had high sequence identity. In reality, the algorithm was not overestimating: some crossovers, those that occur in regions where the parent genes are identical, are invisible experimentally because they look like non-crossover segments. These silent crossovers do not provide any diversity, and the reassembly algorithm now excludes them.

"The program reveals these trends," says Maranas. "You can explore many what-if scenarios before expending time and resources in the lab. These scenarios may include different fragment lengths, annealing temperatures or even the selection of different parent genes."

Comparisons with experimental data have revealed good agreement despite the fact that there are no adjustable parameters in the model.
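
When predicted and observed crossover counts are compared, only crossovers that can be detected in the sequence should be counted. The following minimal Python sketch illustrates the idea of silent crossovers described above. It is an illustration only and is not the researchers' reassembly algorithm: the function names, the toy sequences and the two-parent, pre-aligned, equal-length setup are all assumptions made for the example.

```python
def build_chimera(parent_a, parent_b, crossover_positions):
    """Copy from parent_a, switching template parent at each crossover position.

    Hypothetical helper: assumes two aligned parents of equal length.
    """
    parents = (parent_a, parent_b)
    switch_at = set(crossover_positions)
    current = 0
    chimera = []
    for i in range(len(parent_a)):
        if i in switch_at:
            current = 1 - current  # template switch (a crossover)
        chimera.append(parents[current][i])
    return "".join(chimera)


def count_observable_crossovers(parent_a, parent_b, chimera):
    """Count crossovers that can be detected from the chimera sequence alone.

    The parent of origin can only be inferred at positions where the two
    parents differ, so a crossover is visible only if the inferred origin
    changes between two consecutive distinguishing positions.
    """
    origins = [
        0 if c == a else 1
        for a, b, c in zip(parent_a, parent_b, chimera)
        if a != b
    ]
    return sum(1 for prev, nxt in zip(origins, origins[1:]) if prev != nxt)


# Toy parents that differ only at positions 4 and 11 (0-based).
parent_a = "AAAACCCCGGGG"
parent_b = "AAAATCCCGGGT"
crossovers = [2, 6]

chimera = build_chimera(parent_a, parent_b, crossovers)
observable = count_observable_crossovers(parent_a, parent_b, chimera)
silent = len(crossovers) - observable
print(chimera, observable, silent)  # AAAATCCCGGGG 1 1
```

In this toy case the crossover at position 2 falls in a stretch where both parents are identical and no distinguishing site precedes it, so it cannot be seen in the product. Only the crossover at position 6, which lies between the two distinguishing sites, is observable.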

"While perfect quantitative agreement may be hard to judge, the program does seem to capture the observed trends," says Maranas. "At minimum, we can predict whether no crossovers, a few crossovers or many crossovers will be generated."

DNA shuffling can only be applied to genes where at least 50 to 60 percent of the sequence is identical, but there are other protocols that can deal with less similar genes and the researchers are currently investigating these.
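
As a rough illustration of the identity requirement above, the short Python sketch below computes percent identity for two sequences and compares it with a cutoff. It is a simplified check under stated assumptions, not part of the Penn State program: the sequences are assumed to be already aligned and of equal length, gap characters are written as "-", and the function names and the 50 percent default are hypothetical choices for the example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


def meets_identity_cutoff(seq_a, seq_b, cutoff=50.0):
    """True if the pair reaches the (assumed) identity cutoff, in percent."""
    return percent_identity(seq_a, seq_b) >= cutoff


identity = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
print(identity)  # 80.0
print(meets_identity_cutoff("ACGTACGTAC", "ACGTTCGTAA"))  # True
```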

**aem**

Contacts:
A'ndrea Elyse Messer (814) 865-9481 (o)/ (814) 867-1774 (h) aem1@psu.edu
Vicki Fong (814) 865-9481 (o)/ (814) 238-1221 (h) vfong@psu.edu
EDITORS: Dr. Maranas is available at (814) 863-9958 or at costas@psu.edu by email.