UNIVERSITY PARK — The intricate, beautiful images of the universe streaming from the James Webb Space Telescope (JWST) are more than just pretty pixels that find their way onto computer and smartphone screens. These images represent data, and lots of it: JWST delivers approximately 235 gigabytes of science data every day, roughly as much as a 10-day binge of high-definition movies.
JWST and other telescopes and sensors provide today’s astronomers with an ever-growing stream of data, giving them the unprecedented ability to look deeper into space and farther back in time. That reach enables new discoveries, including studies of how stars die, and recent Penn State work using data from JWST may change the way scientists understand the origin of galaxies.
Managing all this data isn’t without its problems, however. Astronomers must rely on supercomputers and advanced algorithms, collectively known as machine learning, to turn this flood of data into accurate models of the vastness of space, to unveil discoveries and inspire new questions, and to create stunning pictures of the universe.
Joel Leja and V. Ashley Villar, both assistant professors of astronomy and astrophysics and co-hires with the Institute for Computational and Data Sciences (ICDS), are among the scientists establishing Penn State as a leader in using machine learning techniques to better handle massive streams of data.
According to Leja, machine learning approaches allow researchers to crunch numbers more efficiently and accurately than previous methods. In some cases, such as interpreting galaxy imaging, these machine learning techniques can be nearly a million times faster than traditional analyses, he added.
Before the advent of machine learning, crunching data meant working through analytical equations and compiling large amounts of data into tables. Researchers, often graduate students, would spend considerable time gathering and analyzing data, and the calculations were repetitive and time-consuming, with no efficient way to speed up the process.
Leja said it was a lot like planning a massively complicated trip.
“Let’s say you’re trying to find the best way from Los Angeles to San Francisco,” said Leja. “Using the old techniques, we would make a list of roads, try every single route, calculate the whole distance on every tiny road — the small roads, the major highways, roundabout ways — and we would need to map every route, doing it one by one. It’s not a very good way to do it. It typically gets you the right answer, but machine learning tries to do this in a much smarter way using data — for example, it might instead use millions of previous travel routes and just quickly ask which one is fastest.”
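Leja’s analogy boils down to a contrast between exhaustive enumeration and reuse of past data. The toy Python sketch below is a rough illustration of that contrast only, with hypothetical city names and invented travel times, and is not the researchers’ actual code: it first finds the fastest route the “old” way, by timing every possible route from scratch, then answers the same question from a bank of previously recorded trips.

```python
# Toy illustration of Leja's analogy. All city names and travel
# times here are invented; this is not real astronomy code.
import itertools
import random

random.seed(42)
WAYPOINTS = ["Bakersfield", "Fresno", "San Jose", "Santa Barbara"]
STOPS = ["LA", *WAYPOINTS, "SF"]

# Hypothetical travel times (hours) between each pair of stops.
LEG_TIME = {(a, b): random.uniform(0.5, 4.0)
            for a in STOPS for b in STOPS if a != b}

def time_route(route):
    """The expensive step: compute the total travel time of one route."""
    return sum(LEG_TIME[leg] for leg in zip(route, route[1:]))

# "Old technique": enumerate every ordering of the waypoints and
# time each candidate route, one by one.
def brute_force():
    routes = (("LA", *p, "SF") for p in itertools.permutations(WAYPOINTS))
    return min(routes, key=time_route)

# Data-driven shortcut: a bank of previously driven trips with their
# recorded times stands in for training data. Answering the question
# is now a fast lookup; no new route calculations are needed.
PAST_TRIPS = {("LA", *p, "SF"): time_route(("LA", *p, "SF"))
              for p in itertools.permutations(WAYPOINTS)}

def from_experience():
    return min(PAST_TRIPS, key=PAST_TRIPS.get)

assert brute_force() == from_experience()
```

The brute-force search repeats the expensive timing step for every candidate on every query, while the experience-based version pays that cost once up front. In a real application, a trained machine learning model plays the role of the trip bank, amortizing past computation into fast predictions for new inputs.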
Machine learning doesn’t just cut back on human labor; the approaches can also cut down on computational labor, which, in turn, saves energy, according to Villar.
“The human labor issue is important, but we also have to consider the computer labor problem,” said Villar. “It’s using so many hours of computational time, which also means it is using a lot of energy.”