As scientists work toward further personalizing medical treatment through genomics, heritability -- the proportion of observed variation in a particular trait that can be attributed to inherited genetic factors -- is key to understanding more precisely how a person's DNA contributes to the risk of hereditary diseases such as Alzheimer's disease, Parkinson's disease and various cancers.
The process of determining heritability, however, is tedious and often fruitless, as genetic variation can be extremely difficult to assess, according to Marylyn Ritchie, associate professor of biochemistry and molecular biology at Penn State and director of the Center for Systems Genomics, part of the Huck Institutes of the Life Sciences. Studies often require thousands of participants in both "case" and "control" groups. For rare genetic disorders, tens or even hundreds of thousands of participants might be needed to generate enough data to link a given mutation or set of mutations to a particular condition.
"Working with DNA sequence data, you'll get the variants in the genome that are common and shared among people, and then you'll also get rare variation -- base changes that are unique to individuals or at least less common in a population," Ritchie explains. "We typically do studies with thousands of people, but to study rare variation, you either need to get tens or hundreds of thousands of people -- which is not cost-effective -- or you need to do some other type of analysis to try to work with those rare variants. So we're trying to develop new algorithms and tools to analyze those data."
Rather than analyzing each DNA base independently, a common approach to studying rare genetic variation is to use a software program to "bin" together all the variants within a gene and count how many of the subjects with a disease have any variation in that gene. Those data are then compared with data from a control group in order to find out which variants may be significant in the context of the disease.
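The binning idea can be illustrated with a minimal Python sketch. It is not the software Ritchie's group uses. The subject IDs, variant IDs, gene names and the variant-to-gene map are all made up for illustration, and the choice of a one-sided Fisher's exact test to compare cases with controls is an assumption, since the article does not say which statistic is applied.

```python
from collections import defaultdict
from math import comb

def bin_carriers(observations, variant_to_gene):
    """Group subjects by gene: a subject counts once per gene, however many variants they carry."""
    carriers = defaultdict(set)
    for subject, variant in observations:
        gene = variant_to_gene.get(variant)
        if gene is not None:  # variants with no annotation cannot be binned
            carriers[gene].add(subject)
    return carriers

def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` case carriers by chance (hypergeometric tail).

    a, b: cases with / without a variant in the gene
    c, d: controls with / without a variant in the gene
    """
    n_cases, n_carriers, total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n_cases)

def compare_bins(carriers, cases, controls):
    """Return {gene: (case carriers, control carriers, p-value)}."""
    results = {}
    for gene, subjects in carriers.items():
        a = len(subjects & cases)
        c = len(subjects & controls)
        p = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
        results[gene] = (a, c, p)
    return results

# Hypothetical toy data: (subject, rare variant observed in that subject)
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_A", "v4": "GENE_B"}
cases = {"c1", "c2", "c3", "c4"}
controls = {"k1", "k2", "k3", "k4"}
observations = [("c1", "v1"), ("c2", "v2"), ("c3", "v3"), ("c3", "v1"),
                ("c4", "v4"), ("k1", "v4")]

results = compare_bins(bin_carriers(observations, variant_to_gene), cases, controls)
for gene, (a, c, p) in sorted(results.items()):
    print(f"{gene}: {a} case carriers, {c} control carriers, p = {p:.3f}")
```

In this toy data set, three of the four cases and none of the controls carry a variant in the hypothetical GENE_A (p = 0.071), while GENE_B has one carrier in each group (p = 0.786). No single GENE_A variant appears in more than two subjects, so the signal only shows up once the variants are pooled by gene.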
"That looks like a promising approach," Ritchie says, "but the limitation is that the researcher has to annotate and subsequently bin the data in a very manual way, and it's a very arduous process -- it takes a lot of effort, and you can only annotate and bin the variants based on what knowledge you already have or what you can gather from other data sources to figure out how they go together."