UNIVERSITY PARK, Pa. — Mapping soil water content over time is essential to determining whether crops will flourish, wildfires will ignite or floods will destroy the land. Applying deep learning technology to existing soil moisture data can help accurately predict these moisture changes over time, according to a team of Penn State researchers.
The NASA satellite Soil Moisture Active Passive (SMAP) was launched in 2015 to measure the soil moisture near the ground surface around the world. Though not perfect, SMAP’s data has proven highly accurate and extremely valuable, especially to high-risk communities. However, the data only goes back two years, which severely limits its use.
Chaopeng Shen, assistant professor of civil engineering, along with Daniel Kifer, associate professor in computer science and engineering, and graduate students Kuai Fang and Xiao Yang, have employed a deep learning technology called the Long Short-Term Memory (LSTM) network to extend SMAP’s observations over time. LSTM is a building block for deep networks that can be used to learn patterns of soil moisture dynamics and is able to identify where traditional physics-based models make mistakes in charting soil moisture changes.
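For readers curious about what an LSTM "building block" computes, the following is a minimal sketch of one time step of a single-unit LSTM cell, written in plain Python. It is not the researchers' code: the weights are hypothetical and untrained, the two input features (normalized precipitation and temperature) are made-up values, and a real network would use many units and learn its weights from data.

```python
import math


def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """Advance a single-unit LSTM cell by one time step.

    x       -- list of input features for this time step
    h_prev  -- previous hidden state (float)
    c_prev  -- previous cell state, the cell's "memory" (float)
    params  -- maps gate name -> (input weights, recurrent weight, bias)
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return sum(w * xi for w, xi in zip(w_x, x)) + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # candidate new memory content

    c = f * c_prev + i * g   # updated memory
    h = o * math.tanh(c)     # updated hidden state (the cell's output)
    return h, c


# Hypothetical, untrained weights chosen only for illustration.
params = {
    "forget": ([0.1, -0.2], 0.3, 1.0),
    "input": ([0.5, 0.1], -0.1, 0.0),
    "output": ([0.2, 0.2], 0.1, 0.0),
    "candidate": ([0.8, -0.5], 0.2, 0.0),
}

# Made-up daily inputs: [normalized precipitation, normalized temperature].
forcing = [[0.0, 0.6], [0.9, 0.4], [0.1, 0.5], [0.0, 0.7]]

h, c = 0.0, 0.0
states = []
for x in forcing:
    h, c = lstm_step(x, h, c, params)
    states.append(h)
```

The cell state `c` is carried from one day to the next, which is what lets this kind of network retain information, such as a rain event, over many time steps.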
“Currently, if we want to look at the long-term history of how things have changed, we cannot,” Shen said. “What this deep learning method does is it learns the patterns of the past moisture dynamics. We provide the data, such as rainfall precipitation and temperature, in order to train it. Then we are able to take that trained model and run it in a time window where there is no data.”
Shen compared the technology to a newborn baby who comes into the world as a blank slate and must be taught how things work. Once the baby has consumed enough information from outside sources, it is able to start making assumptions on its own. The more it learns, the better it gets at telling different situations apart.
The researchers trained the LSTM network by inputting historical climate data and SMAP’s satellite observations over the past two years as well as model results from physics-based models. They then tested its accuracy against a popular land surface model, which tends to underestimate moisture in wet seasons and overestimate it in dry seasons.
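The general workflow of training on the period where satellite observations exist and then running the trained model on a period where they do not can be sketched in a few lines of Python. This sketch swaps in a simple straight-line fit in place of the LSTM network, purely to keep the example short. All numbers are made-up toy values, and the names `fit_line`, `model_overlap`, `smap_overlap` and `model_past` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy values for the period where both a physics-based model output and
# satellite observations exist. The toy model is too wet in dry conditions
# and too dry in wet conditions.
model_overlap = [0.20, 0.25, 0.30, 0.35]
smap_overlap = [0.15, 0.25, 0.35, 0.45]

# "Training": learn how the model output relates to the observations.
slope, intercept = fit_line(model_overlap, smap_overlap)

# "Hindcasting": apply the learned relationship to an earlier period that
# has model output but no satellite observations.
model_past = [0.22, 0.33]
hindcast = [slope * m + intercept for m in model_past]
```

A straight line can only correct a very simple bias. The appeal of an LSTM network in this setting is that it can learn far richer, time-dependent relationships from many inputs at once.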
The team found that the LSTM network was capable of correcting model bias and closely approximating SMAP-observed top-surface soil moisture. Although LSTM networks are highly flexible, when properly controlled they can avoid overfitting and outperform simpler methods.
The LSTM network was shown to be helpful in long-range soil moisture hindcasting or forecasting. This result can help with weather modeling, flood predictions and other applications.
“This is the first time we have shown that big data time series deep learning has a large advantage over conventional methods in hydrology because it is able to absorb and digest a lot of data,” Shen said. “Other methods don't have this kind of capacity.”
In the past, it was thought that a model this large would capture too many “noises,” Shen said.
To remedy this issue, the researchers used an algorithm called Dropout, which randomly turns off different parts of the network during training. The effect is similar to training many smaller networks and then combining their results, which makes the predictions more stable.
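The idea behind Dropout can be illustrated with a short Python sketch. It uses the common "inverted dropout" convention, in which the units that stay on are scaled up so the average output is unchanged. That convention is an assumption here, not a detail confirmed by the researchers, and the activation values and the function name `dropout` are made up for illustration.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Randomly zero each unit with probability `rate` during training.

    Surviving units are divided by the keep probability (inverted dropout)
    so that the expected value of each unit stays the same.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off at prediction time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]  # made-up unit activations

train_out = dropout(hidden, rate=0.5, rng=rng)                  # some units are zeroed
eval_out = dropout(hidden, rate=0.5, rng=rng, training=False)   # unchanged

# Averaging over many random masks approximates the full network's output,
# which is the sense in which dropout resembles combining many smaller networks.
n_masks = 20000
totals = [0.0] * len(hidden)
for _ in range(n_masks):
    for k, value in enumerate(dropout(hidden, rate=0.5, rng=rng)):
        totals[k] += value
mask_average = [t / n_masks for t in totals]
```

Because a different random subset of units is silenced on each training pass, no single unit can be relied on to memorize noise in the data.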
Now that the team has figured out how to better predict soil moisture through the LSTM network, they plan to make the predictions more learnable and more interpretable.
“The network has grown to be strong, but we are not sure why it’s so good,” Shen said. “We have to go back and learn what it has learned and try to visualize the knowledge it has gained. ... As scientists, we care about the mechanisms behind these phenomena, but we shouldn't expect to understand all the mechanisms overnight. The interpretative part of deep learning research is quite nascent, but it has already made significant achievements recently. I hope the community can watch it as loving parents and see it mature.”
This one-year project was supported by a multi-disciplinary seed grant from the Penn State College of Engineering and the Penn State Institute for CyberScience.