the Creative Commons Attribution 4.0 License.
Machine learning for improvement of upper tropospheric relative humidity in ERA5 weather model data
Abstract. Knowledge of humidity in the upper troposphere and lower stratosphere (UTLS) is of special interest due to its importance for cirrus cloud formation and its climate impact. However, the UTLS water vapor distribution in current weather models is subject to large uncertainties. Here, we develop a dynamics-based humidity correction method using an artificial neural network (ANN) to improve the relative humidity over ice (RHi) in ECMWF numerical weather predictions. The model is trained with time-dependent thermodynamic and dynamical variables from ECMWF ERA5 and humidity measurements from the In-service Aircraft for a Global Observing System (IAGOS). Previous and current atmospheric variables within ±2 ERA5 pressure layers around the IAGOS flight altitude are used for ANN training. RHi, temperature, and geopotential have the highest impact on the ANN results, while the other dynamical variables are of minor importance. The ANN shows excellent performance: the predicted RHi in the UT has a mean absolute error (MAE) of 6.6 % and a coefficient of determination (R2) of 0.93, a significant improvement over ERA5 RHi (MAE of 15.7 %; R2 of 0.66). The ANN model also improves the prediction skill for all-sky UT/LS and cloudy UTLS conditions and removes the artificial peak at RHi = 100 %. For a contrail cirrus scene over the Atlantic, the contrail predictions are in better agreement with MSG observations of ice optical thickness than the results without humidity correction. The ANN method can be applied to other weather models to improve humidity predictions and to support aviation and climate research applications.
Status: final response (author comments only)
CC1: 'Comment on egusphere-2024-2012', Kevin McCloskey, 12 Jul 2024
Hello, this could be a very impactful finding if the ANN model generalizes well to weather conditions that it hasn't seen. I notice though in your Supplemental S2 section you describe randomly splitting the IAGOS waypoints into train/validation/test sets. Doing your cross validation in this way has a risk that your ANN model is overfitting. This is likely not a problem if you restrict your usage of the trained ANN to retrospective studies where the model inference is only applied to ERA5 data in the same times/places the model was trained on. However, if you attempted to apply an ANN trained in this way to a forecast of weather which has not happened in the real world yet, you would likely see a drop in metrics. To report metrics that are predictive of how the model will perform when applied to a weather forecast, it is best practice to train the ANN on an archived weather forecast (eg, ECMWF HRES) and use a chronological cross validation split: ie, the train set is comprised of data from time periods that are disjoint from the time periods used for the validation and test sets. For example, don't include in your validation/test sets any data from days that were included in your training set. This type of cross validation setup avoids the risk of the ANN 'memorizing' specific datapoints from the training set which are effectively also present in the validation/test sets, in a way that would not be the case when you apply the ANN to real weather forecast data. This is especially a concern here given the IAGOS waypoints occur once every 4 seconds and so adjacent datapoints (having extremely similar model inputs and target outputs) will frequently be randomly split across the train/test boundary. With the current cross validation setup, the impact of this model still seems strong, but limited to use in retrospective analyses.
Citation: https://doi.org/10.5194/egusphere-2024-2012-CC1
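The chronological split suggested in CC1 (no day shared between train, validation, and test sets) could be implemented roughly as below. This is a minimal sketch, not the authors' code; the column name `time` and the split fractions are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

def chronological_split(df, time_col="time", frac=(0.8, 0.1, 0.1), seed=0):
    """Assign whole calendar days to train/val/test so that no day
    contributes waypoints to more than one subset."""
    days = np.sort(df[time_col].dt.floor("D").unique())
    rng = np.random.default_rng(seed)
    rng.shuffle(days)                      # shuffle days, not waypoints
    n_train = int(frac[0] * len(days))
    n_val = int(frac[1] * len(days))
    groups = {
        "train": set(days[:n_train]),
        "val": set(days[n_train:n_train + n_val]),
        "test": set(days[n_train + n_val:]),
    }
    day = df[time_col].dt.floor("D")
    return tuple(df[day.isin(groups[k])] for k in ("train", "val", "test"))
```

Because the 4-second IAGOS sampling makes adjacent waypoints nearly identical, splitting at the day level (rather than the waypoint level) is what keeps near-duplicates from leaking across the train/test boundary.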
CC2: 'Comment on egusphere-2024-2012', Scott Geraedts, 15 Jul 2024
In addition to the ETS, it would be nice to have the full contingency table used to evaluate the model (e.g. for the cases in Table 2), so that other metrics could be computed if readers are interested.
Citation: https://doi.org/10.5194/egusphere-2024-2012-CC2
RC1: 'Comment on egusphere-2024-2012', Anonymous Referee #1, 23 Aug 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2012/egusphere-2024-2012-RC1-supplement.pdf
CC3: 'Comment on egusphere-2024-2012', Olivier Boucher, 02 Sep 2024
The authors write that "This collocation of model meteorological variables and measured humidity values from the year 2020 comprises 3.99 million individual data points, from which 80%, 10%, and 10% are randomly selected for training, testing during the model development, and validating the ANNs, respectively." Presumably they refer to the full-resolution (i.e., 4s sampling) IAGOS data. Given the high sampling frequency, and if our understanding is correct, there is a strong autocorrelation in the data. Thus randomly selecting the training, testing and validation datasets implies that very similar conditions to those of the testing and validation datasets have been met in the training dataset. There is a well-known risk that this inflates artificially the model performance (see e.g. https://doi.org/10.1016/j.ophoto.2022.100018). At the very least the authors should select separate IAGOS flights in their training, testing and validation datasets. Even better they should consider dates that are at least one day apart for a given region.
It is unclear whether the testing or the validation dataset is used in Section 4, as the text on line 185 and in Section 4 appears contradictory. In any case we would recommend the testing and validation datasets to be temporally disjoint from the training dataset at every location.
Citation: https://doi.org/10.5194/egusphere-2024-2012-CC3
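CC3's weaker fallback (at least keep whole IAGOS flights in one subset) maps directly onto a group-aware splitter such as scikit-learn's `GroupShuffleSplit`. A sketch under the assumption that a `flight_ids` array labels each waypoint with its flight:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def split_by_flight(n_samples, flight_ids, test_frac=0.2, seed=0):
    """Return train/test index arrays such that no flight's waypoints
    cross the train/test boundary."""
    gss = GroupShuffleSplit(n_splits=1, test_size=test_frac, random_state=seed)
    idx = np.arange(n_samples)
    train_idx, test_idx = next(gss.split(idx, groups=flight_ids))
    return train_idx, test_idx
```

This removes the most direct leakage path (consecutive 4-second waypoints of the same flight landing on both sides of the split), although, as the comment notes, temporally disjoint days per region would be stricter still.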
CC4: 'Comment on egusphere-2024-2012', Olivier Boucher, 02 Sep 2024
I apologize for posting my piecemeal comments. I have two more questions.
Line 268: it is not clear what the 56 inputs consist of. 8 times 2 times 5 makes 80 so presumably some variables have fewer times or pressure levels. Which ones? Table 1 does not really clarify this. Could the authors provide more details?
ANN: were the input and output data normalized, and how? I could not find this information either in the main text or in the Appendix (sorry if I missed it). Section 3.3 presents an ablation study where one ERA5 variable is set to zero at a time, so I assume the data have been centred (as the usual practice is to set the variable to its mean value).
Citation: https://doi.org/10.5194/egusphere-2024-2012-CC4
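The normalization the commenter is guessing at would typically be z-score standardization, under which setting a standardized input to zero is equivalent to setting the raw variable to its training-set mean. The paper does not state its procedure; the sketch below is only the common convention being assumed, with statistics computed on the training set alone to avoid leakage.

```python
import numpy as np

def standardize(train, other):
    """Z-score columns using training-set statistics only; returns the
    standardized train array and the other array on the same scale."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0            # guard against constant columns
    return (train - mu) / sigma, (other - mu) / sigma
```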
RC2: 'Review of Wang et al.', Anonymous Referee #2, 10 Oct 2024
In this study, the authors train an artificial neural network to predict the distribution of relative humidity over ice in the UTLS over Western Europe. The network is trained on a mixture of thermodynamical and dynamical variables, although the former explain most of the prediction skill. The network is better than ERA5 at predicting RHi, and its inputs lead to better contrail prediction from the Cocip model in one case study.
The paper deals with an important topic. It is very well written. The introduction is excellent. The figures illustrate the discussion well, although I would have preferred to see more maps because scatterplots only give an incomplete indication of the ability of the network to reproduce patterns of humidity.
Others have commented in the online discussion on the need to better separate the training dataset from the validation dataset. I will not elaborate further on that aspect but revisions to the method are clearly needed there.
I have a couple of additional comments that would need to be addressed before the study is published.
- First, I am surprised by the selection of the study region, as shown in Figure 1. Why doesn’t it extend further west? Given that the network relies on the temporal evolution of humidity it seems it would make sense to include the regions where most of the humid regions are either formed or advected from. There is plenty of IAGOS data over the North Atlantic and Eastern US too.
- Second, the lack of importance of the dynamical variables in explaining the prediction is surprising. The explanation proposed by the authors, that of a strong correlation between thermodynamical and dynamical variables, is plausible. But time scales are crucial in that correlation, so I wonder whether the study design somehow maximises the correlation. By the choice of the study region for example, which excludes the North Atlantic where dynamics might affect the evolution of humidity more clearly? Or by the choice of lead times? On that point, the question on temporal dependence asked in Section 3.1 on lines 227-228 is never really answered. How much does including distributions 6hr before current time improve the network prediction, for example?
Other comments:
- Lines 188-189: I am not sure that the RHi peak is always artificial. Sanogo et al. (2024) https://doi.org/10.5194/acp-24-5495-2024 suggest that the peak is seen in IAGOS in cloudy conditions. See their Figures 4 and 5.
- Lines 308-311: Can you clarify how that statement relates to the statement on correlation made earlier in the paragraph?
Citation: https://doi.org/10.5194/egusphere-2024-2012-RC2