Dynamic weighted ensemble of geoscientific models via automated machine learning-based classification
Abstract. Despite recent developments in geoscientific (e.g., physics/data-driven) models, effectively assembling multiple models for approaching a benchmark solution remains challenging in many sub-disciplines of geoscientific fields. Here, we proposed an automated machine learning-assisted ensemble framework (AutoML-Ens) that attempts to resolve this challenge. Details of the methodology and workflow of AutoML-Ens were provided, and a prototype model was realized with the key strategy of mapping between the probabilities derived from the machine learning classifier and the dynamic weights assigned to the candidate ensemble members. Based on the newly proposed framework, its applications for two real-world examples (i.e., mapping global soil water retention parameters and estimating remotely sensed cropland evapotranspiration) were investigated and discussed. Results showed that compared to conventional ensemble approaches, AutoML-Ens was superior across the datasets (the training, testing, and overall datasets) and environmental gradients with improved performance metrics (e.g., coefficient of determination, Kling-Gupta efficiency, and root mean squared error). The better performance suggested the great potential of AutoML-Ens for improving quantification and reducing uncertainty in estimates due to its two unique features, i.e., assigning dynamic weights for candidate models and taking full advantage of AutoML-assisted workflow. In addition to the representative results, we also discussed the interpretational aspects of the used framework and its possible extensions. More importantly, we emphasized the benefits of combining data-driven approaches with physics constraints for geoscientific model ensemble problems with high dimensionality in space and non-linear behaviors in nature.
Hao Chen et al.
Status: open (until 04 Jun 2023)
RC1: 'Comment on egusphere-2022-1326', Anonymous Referee #1, 06 Feb 2023
- CC1: 'Reply on RC1', Hao Chen, 03 May 2023
- RC2: 'Comment on egusphere-2022-1326', Anonymous Referee #2, 25 May 2023
Review of "Dynamic weighted ensemble of geoscientific models via automated machine learning-based classification" by Chen et al.
This manuscript demonstrates the merits of automated ML (AutoML) for two geoscience use cases. In general, the paper is well written. The authors developed an ML workflow to find the best combination of models or the optimal model. They used the term ML classifier. It took me a while to understand that this is different from the conventional classification problem, in which the goal is to identify a class label for each sample. Instead, the goal in this work is to find the weights for combining the physics-based model ensemble.
My main question is whether it is necessary to use the ensemble-based AutoML in your use cases. Can you simply use a single ML model, e.g., XGBoost, to find the model weights/probabilities? Your workflow sounds like an ensemble of ML models for an ensemble of physics models. Is this right? If so, the computational burden may be overwhelming.
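To make the distinction concrete, the following is a minimal sketch of one reading of the approach, not the authors' implementation. It assumes that each training sample is labelled with the index of the candidate model closest to the observation, and that the per-sample class probabilities from a classifier are then reused as ensemble weights. The function names, the toy numbers, and the hard-coded `class_probs` (a stand-in for a trained classifier's output) are all hypothetical.

```python
def best_model_labels(candidate_preds, observations):
    """Label each sample with the index of the candidate model nearest the observation."""
    labels = []
    for preds, obs in zip(candidate_preds, observations):
        errors = [abs(p - obs) for p in preds]
        labels.append(errors.index(min(errors)))
    return labels


def dynamic_weighted_ensemble(class_probs, candidate_preds):
    """Combine candidate predictions using per-sample class probabilities as weights."""
    ensemble = []
    for probs, preds in zip(class_probs, candidate_preds):
        total = sum(probs)
        weights = [p / total for p in probs]  # renormalise so the weights sum to 1
        ensemble.append(sum(w * x for w, x in zip(weights, preds)))
    return ensemble


# Hypothetical toy data: 3 samples, 2 candidate models.
candidate_preds = [[1.0, 3.0], [2.0, 4.0], [5.0, 1.0]]
observations = [1.2, 3.9, 2.0]

# Training targets for the classifier (assumed labelling rule).
labels = best_model_labels(candidate_preds, observations)

# Stand-in for the probabilities a trained classifier would predict per sample.
class_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
estimates = dynamic_weighted_ensemble(class_probs, candidate_preds)
```

In this reading, the weights differ from sample to sample, which is what separates the scheme from a conventional ensemble with one fixed weight per model.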
Other minor comments:
Figure 3 (d)-(j). It seems all models fall outside the gray uncertainty envelope related to the 17 models. AutoML also represents an ensemble of ML models. In addition to plotting the ensemble mean from AutoML, can you develop an uncertainty envelope based on the AutoML ensemble?
Figure 7. Both AutoML-Ens and STIC use very similar reddish colors. Can you make a stronger contrast?
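As an illustration of the envelope requested for Figure 3, the sketch below computes a per-sample lower bound, mean, and upper bound from the predictions of individual ensemble members. It assumes that the per-member predictions can be extracted from the AutoML ensemble as a list of equal-length lists; whether and how a given AutoML framework exposes them is not stated in the manuscript. The function name `ensemble_envelope` and the quantile defaults are hypothetical.

```python
def ensemble_envelope(member_preds, lower_q=0.05, upper_q=0.95):
    """Return (lower, mean, upper) lists, one value per sample.

    member_preds: list of per-member prediction lists, all of equal length.
    """
    def quantile(sorted_vals, q):
        # Linear interpolation between the two closest ranks.
        pos = q * (len(sorted_vals) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(sorted_vals) - 1)
        return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

    lower, mean, upper = [], [], []
    for sample in zip(*member_preds):  # transpose: all member values for one sample
        vals = sorted(sample)
        lower.append(quantile(vals, lower_q))
        mean.append(sum(vals) / len(vals))
        upper.append(quantile(vals, upper_q))
    return lower, mean, upper


# Hypothetical predictions from 3 ensemble members at 2 samples.
member_preds = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
lower, mean, upper = ensemble_envelope(member_preds, lower_q=0.0, upper_q=1.0)
```

With `lower_q=0.0` and `upper_q=1.0` the envelope reduces to the member minimum and maximum; narrower quantiles give a band that is less sensitive to a single outlying member.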