Using machine learning algorithms to analyze remote sensing and ground-truth Lake Chad’s level data
Abstract. Lake Chad has faced critical environmental conditions since the 1960s due to the effects of climate change and anthropogenic activities on its ecosystems. Statistical analyses of remote sensing climate variables (i.e., evapotranspiration, specific humidity, soil temperature, air temperature, precipitation, and soil moisture) and of remote sensing and ground-truth lake level data for the period 1993–2012 reveal that the remote sensing lake level data have a skewed distribution and a significant positive association with soil moisture only, whereas the ground-truth lake level data have a symmetrical distribution and significant negative associations with all the climate variables. The regression of remote sensing and ground-truth lake level onto the climate variables using Linear Regression (LR), Support Vector Regression (SVR), Regression Tree (RT), Random Forest Regression (RF), and Deep Learning (DL) methods shows that (i) RF outperforms the other models, with the highest coefficient of determination (R2) and explained variance score (EVS) values, and (ii) SVR has the lowest Mean Absolute Error (MAE), Mean Squared Error (MSE), and k-fold cross-validation (k-fold CV) values. The RF feature ranking function shows that soil temperature is the major driver of remote sensing lake level fluctuations, whereas precipitation is the leading factor for ground-truth lake level. This study provides more in-depth knowledge of the factors influencing Lake Chad’s level and perspectives for an integrated and forward-looking water management system that connects climate change, vulnerability, human activities, and water balance research in the Lake Chad human-environment system. We cannot obtain the necessary ground-truth data at this time because of the challenging security situation in the region. However, the development of the data analysis methodology reported here is of fundamental importance for understanding the water cycle dynamics in this important basin, even under challenging field conditions. Verification studies can be performed when more ground-truth data eventually become available.
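For readers unfamiliar with the four scores named above, the following is a minimal sketch of how R2, EVS, MAE, and MSE are conventionally defined, written in plain Python. The function names and the sample numbers are illustrative assumptions only; they are not the authors' code and not Lake Chad data.

```python
# Conventional definitions of the regression scores named in the abstract.
# All names and sample values below are hypothetical, for illustration only.

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the residuals."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def mse(y_true, y_pred):
    """Mean Squared Error: average squared residual."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def evs(y_true, y_pred):
    """Explained variance score: 1 - Var(residuals) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    res_bar = mean(residuals)
    y_bar = mean(y_true)
    var_res = mean([(r - res_bar) ** 2 for r in residuals])
    var_y = mean([(t - y_bar) ** 2 for t in y_true])
    return 1.0 - var_res / var_y

if __name__ == "__main__":
    # Made-up observed and predicted values (not lake level measurements).
    observed = [3.0, -0.5, 2.0, 7.0]
    predicted = [2.5, 0.0, 2.0, 8.0]
    print("MAE:", mae(observed, predicted))
    print("MSE:", mse(observed, predicted))
    print("R2 :", round(r2(observed, predicted), 3))
    print("EVS:", round(evs(observed, predicted), 3))
```

Higher R2 and EVS indicate a better fit, while lower MAE and MSE indicate smaller errors, which is why different models can rank first depending on which score is used. EVS differs from R2 only in that it ignores a constant bias in the residuals.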