Penn State and Geisinger team up to establish new program for graduate students

$2.4M program is designed to train the next generation of biomedical scientists

UNIVERSITY PARK, Pa. — A new $2.4 million program for graduate students seeking to contribute to breakthrough discoveries in medicine and biology has been established at Penn State, with nearly $1.4 million in funding from the National Library of Medicine of the U.S. National Institutes of Health and more than $1 million from Penn State.

The new Biomedical Big Data to Knowledge Training Program (B2D2K) brings together Pennsylvania data scientists, biomedical researchers, and life-science researchers at Penn State and the Geisinger Genomic Medicine Institute to accelerate advances in the biomedical and life sciences. These sciences rely increasingly on the ability of researchers to analyze, interpret and visualize very large and very complex datasets, known as "big data."

"Students admitted to this training program will become a new generation of scientists who can mine mountains of complex scientific data to reveal the information buried there that can lead to advances in genetic and other types of biological and health-related research," said Marylyn Ritchie, the program's director. Ritchie is a professor in the Eberly College of Science Department of Biochemistry and Molecular Biology with a specialty in bioinformatics and genomics. At the Geisinger Health System, Ritchie is chair of the Department of Biomedical and Translational Informatics and chief research informatics officer.

"This new training program emphasizes the integration of data science into clinical and biomedical research," Ritchie said. "It will enable young scientists to enter the workforce with expertise in the emerging field of biomedical data science. They will gain data-analysis and communications know-how that they can use throughout their careers in research that is critical for discoveries that will benefit human health."

The B2D2K program will support up to nine Penn State graduate students per year who are working toward a doctoral degree in an area of science related to the goals of the program. Each B2D2K trainee will be co-mentored by faculty members with complementary expertise in data sciences and biomedical sciences.

Graduate students in the B2D2K program's inaugural group, which began training during the 2017 spring semester, are Anna Basile (biochemistry, microbiology and molecular biology), Awtum Brashear (immunology and infectious disease), Miriam Brinberg (human development and family studies), Thanh Le (information sciences and technology), Robert Nichols (molecular toxicology), and Jaiwei Wen (statistics).

The second round of trainees will be announced in late spring 2017 and admitted to the program to begin training in the fall 2017 semester. Eligible students may contact Associate Professor Cooduvalli S. Shashikant to obtain more information about participating in the B2D2K program.

This training program is a component of the Big Data to Knowledge program developed by the National Institutes of Health and administered by the National Library of Medicine. It is an investment in the next generation of data scientists, intended to ensure that the vast wealth of biomedical data resulting from significant scientific discoveries can be mined quickly and efficiently to achieve useful results for human health and healing.

The Penn State B2D2K program was developed by Ritchie and two Penn State faculty members: Vasant Honavar and Runze Li. Honavar, whose primary expertise is in computer science, machine learning, and data analytics, is Professor and Edward Frymoyer Chair of Information Sciences and Technology, professor of computer science, director of the Center for Big Data Analytics and Discovery Informatics, and associate director of the Institute for CyberScience. Li, whose primary expertise is in statistics and data analytics, is Verne M. Willaman Professor of Statistics, professor of public health sciences, a principal investigator at the Penn State Methodology Center, and co-director of the Penn State Center for Statistical Genetics.

"The program complements the informatics research initiatives of the Penn State Clinical and Translational Science Institute (CTSI), which is funded by the National Institutes of Health," said Neil Sharkey, vice president for research at Penn State. "It also leverages Penn State's strategic investments in advanced computing infrastructure through faculty hires in the data sciences."

Penn State organizations participating in the B2D2K training program include the Eberly College of Science Department of Biochemistry and Molecular Biology and Department of Statistics; the College of Information Sciences and Technology; the College of Engineering School of Electrical Engineering and Computer Science; the College of Health and Human Development Department of Human Development and Family Studies; the College of Medicine; the Huck Institutes of the Life Sciences; and the Institute for CyberScience. Geisinger organizations participating in the B2D2K training program include the Genomic Medicine Institute, the Department of Biomedical and Translational Informatics, the Department of Functional and Molecular Genomics, and the Institute for Advanced Applications.

More information about the B2D2K training program, its application process, and its first group of students and faculty is available online.

Last Updated April 27, 2017