Penn State, Rutgers win grant to create Virtual Data Collaboratory

Vasant Honavar, professor at Penn State’s College of Information Sciences and Technology (IST) and College of Engineering, is leading a team of researchers to develop and evaluate a data management and sharing system called the Virtual Data Collaboratory. Credit: Penn State. Creative Commons

UNIVERSITY PARK, Pa. — Penn State has received $1.5 million of a $4 million grant from the National Science Foundation (NSF) to fund an interdisciplinary team of researchers from Penn State and Rutgers University, in partnership with several other institutions, to develop, deploy, and evaluate the Virtual Data Collaboratory (VDC), a distributed data infrastructure to support data-intensive science.

The Penn State component will be led by Vasant Honavar, professor of information sciences and technology and computer science; associate director of the Institute for Cyberscience; and director of the Center for Big Data Analytics and Discovery Informatics.

“The goal of this project is to conceptualize, design, and implement a Virtual Data Collaboratory to support collaborative, data-intensive science research by multi-disciplinary teams drawn from multiple institutions,” said Honavar.

As Penn State’s principal investigator for the project, Honavar will work with a team of Penn State research colleagues, including Jenni Evans, professor of meteorology and interim director of the Institute for Cyberscience; Karen Estlund, associate dean for technology and digital strategies; Lee Giles, professor of information sciences and technology and computer science; Wayne Figurelle, assistant director of the Institute for Cyberscience; Chuck Gilbert, technical director of the Institute for Cyberscience; and Mary Beth Rosson, professor of information sciences and technology.

The research team will design the VDC, a federated infrastructure that integrates state-of-the-art data-intensive computing platforms, storage, and networking with an innovative data services layer across Penn State, Rutgers, and other institutions in the region. The participating sites will be interconnected through a high-speed network, with the potential to expand to academic and research institutions across the United States. The VDC will leverage existing data repositories, including the Ocean Observatories Initiative and the Protein Data Bank, and existing investments in advanced cyberinfrastructure such as the NSF-funded Big Data Regional Hubs, XSEDE, and OSG, among others.

“Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens and our ability to acquire, share, integrate, and analyze disparate types of data,” said Honavar. “However, realizing the full potential of data to accelerate science calls for significant advances in data and computational infrastructure to support collaborative data-intensive science by teams of researchers that transcend institutional and disciplinary boundaries.”

“VDC will provide the collaborative infrastructure and platform for developing and integrating algorithmic abstractions of scientific domains, coupled with methods and tools for data analytics, modeling, and simulation, cognitive tools to advance science. VDC will support reproducible, sharable, and reconfigurable data-intensive scientific workflows,” said Honavar.

The research team will use several collaborative science use cases to develop and evaluate the VDC infrastructure. One such case includes using the VDC to assemble sets of protein-DNA and protein-RNA complexes and interfaces, and developing computational methods such as machine learning for the reliable prediction of these interfaces. The group will share data and computational infrastructure to advance the understanding of how proteins recognize and bind to DNA and RNA structures and their role in biological processes like aging and disease.

“VDC will benefit the efforts of the Institute for Cyberscience to develop and deploy advanced computational infrastructure to support data- and computation-enabled scientific discovery at Penn State,” said Evans.

“[The project] will benefit not only fundamental and applied research in the Data Sciences, but also the education and training of a diverse cadre of data scientists. I am excited to be working on this project with colleagues at Penn State, Rutgers, and a number of other institutions,” said Honavar.

Last Updated April 21, 2017