Institute for Computational and Data Sciences

Project aims to use artificial intelligence to turn health data into predictions

National Science Foundation grant could help scientists use machine learning algorithms to guide decisions on a range of health, economic and social issues

A $599,883 grant from the National Science Foundation could help scientists apply machine learning algorithms to large amounts of data to guide decisions. Credit: Photo by Irwan iwe on Unsplash. All Rights Reserved.

UNIVERSITY PARK, Pa. — A $599,883 grant from the National Science Foundation could help scientists apply machine learning (ML) algorithms to large amounts of data to guide decisions on a range of issues, from disease spread to economic meltdowns to social unrest.

The grant is focused on developing algorithms that can analyze longitudinal data, which is data collected repeatedly over time. While this type of data can be incredibly valuable in creating models to predict events, such as future medical issues, the volume of data, the large number of variables and the arbitrary gaps between collection times make it difficult to discover actionable information, said Vasant Honavar, professor of information sciences and technology; Huck Chair in Biomedical Data Sciences and Artificial Intelligence; director of the Center for Artificial Intelligence Foundations and Scientific Applications; and associate director of the Institute for Computational and Data Sciences.

“This grant was initially motivated by work that we were doing with healthcare data — and particularly electronic health records data,” said Honavar. “Think about every time you go to the hospital — let's say, for a routine checkup — the doctors will probably record a range of physiological parameters, for example, they may take your blood pressure, or blood sugar, or heart rate, along with the results of other tests. Basically, this is your health inventory — a bunch of parameters — that define your health status at that point in time.”

According to Honavar, who is principal investigator of the grant, the data could be used in different ways, such as predicting health risks for certain conditions or tracking how the risk for health conditions changes over time.

However, effective use of longitudinal data to predict health risks presents several challenges, according to Honavar. First, this information typically is not collected every day, or even at regular intervals. This creates arbitrary gaps in the data, which makes predictions more difficult. Second, he added, there may be hundreds or thousands of variables that could be measured during a medical visit or operation.

The machine learning techniques that the researchers are developing through the grant might produce more accurate predictions by modeling longitudinal data flexibly and by accurately learning complex correlations in massive data sets with large numbers of variables.

While the grant’s primary focus is developing advanced machine learning tools for predictive modeling of longitudinal data, and applying the resulting tools to healthcare data, Honavar expects the research could produce applications in other fields, such as education, social sciences, life sciences and economics.

“For example, in online education platforms, you can record all sorts of data about the student interactions with the educational materials, and, because it’s collected over time, it would be longitudinal data,” said Honavar. “Again, the students don’t go to the platform at regular times, so the data is gathered at different times. But, if you could overcome these arbitrary gaps in data collection, you might be able to determine, for example, where the student is in the course and then design some personalized educational interventions.”

The project will explore the use of a deep learning model, called scalable deep kernel Gaussian process regression, that can discover patterns among unobserved (hidden) states, potentially helping to create predictions from large amounts of data sampled at irregular times.
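To illustrate why Gaussian process models suit irregularly timed measurements, here is a minimal sketch in Python. It is not the researchers' scalable deep kernel model. It is ordinary Gaussian process regression with a fixed squared-exponential kernel, and every name, parameter value and data point in it (including `visit_days`, `readings`, the 30-day length scale and the noise level) is a hypothetical assumption made for the example. The point it shows is that the model only needs the time stamps of the observations, not an evenly spaced grid, and that its uncertainty grows in the gaps between visits.

```python
import math

def rbf_kernel(t1, t2, length_scale=30.0, variance=1.0):
    """Squared-exponential covariance between two time points (in days)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(times, values, query_time, noise=0.05):
    """Posterior mean and variance of the latent value at query_time."""
    # Center the readings so a zero-mean prior is a reasonable assumption.
    mean_value = sum(values) / len(values)
    centered = [v - mean_value for v in values]
    n = len(times)
    # Covariance among observed times, with measurement noise on the diagonal.
    k = [[rbf_kernel(times[i], times[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    # Covariance between each observed time and the query time.
    k_star = [rbf_kernel(t, query_time) for t in times]
    alpha = solve_cholesky(low, centered)
    mean = mean_value + sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_cholesky(low, k_star)
    var = rbf_kernel(query_time, query_time) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

# Hypothetical, irregularly spaced clinic visits (days since first visit)
# and made-up standardized readings; none of this is real patient data.
visit_days = [0, 14, 95, 110, 240, 365]
readings = [0.2, 0.3, 0.9, 1.0, 0.4, -0.1]

for day in (100, 180):
    mean, var = gp_predict(visit_days, readings, day)
    print(f"day {day}: predicted {mean:.2f}, variance {var:.2f}")
```

Day 100 falls between two nearby visits, while day 180 sits in a long gap, so the second prediction comes with a larger variance. A deep kernel approach would, roughly speaking, replace the fixed kernel above with one whose inputs are transformed by a learned neural network; that step is beyond this sketch.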

Researchers also intend to address a critical issue, referred to as the "AI explainability problem." According to the researchers, some AI models produce predictions, but it is often difficult to determine how the model arrived at them. By learning the correlations behind these patterns, the researchers suggest, the model may be able to produce results that are more explainable.

The researchers added that the model will be designed to be scalable — able to handle growing amounts of data — for use in real-world situations.

In collaboration with clinical experts, the research team expects to use longitudinal data sets, including electronic health records and data from real-world healthcare applications.

Honavar added that the research could offer other benefits, including producing materials for new educational courses and helping introduce students, including women and underrepresented minorities, to educational, research and career opportunities. The team also will distribute the research results, along with software and data sets, to support continued research.

The grant was funded by NSF’s Division of Information and Intelligent Systems.

Last Updated October 11, 2022
