the Creative Commons Attribution 4.0 License.
Machine learning for improvement of upper tropospheric relative humidity in ERA5 weather model data
Abstract. Knowledge of humidity in the upper troposphere and lower stratosphere (UTLS) is of special interest due to its importance for cirrus cloud formation and its climate impact. However, the UTLS water vapor distribution in current weather models is subject to large uncertainties. Here, we develop a dynamics-based humidity correction method using an artificial neural network (ANN) to improve the relative humidity over ice (RHi) in ECMWF numerical weather predictions. The model is trained with time-dependent thermodynamic and dynamical variables from ECMWF ERA5 and humidity measurements from the In-service Aircraft for a Global Observing System (IAGOS). Previous and current atmospheric variables within ±2 ERA5 pressure layers around the IAGOS flight altitude are used for ANN training. RHi, temperature and geopotential have the highest impact on the ANN results, while other dynamical variables are of minor importance. The ANN shows excellent performance: the predicted RHi in the upper troposphere (UT) has a mean absolute error (MAE) of 6.6 % and a coefficient of determination (R²) of 0.93, a significant improvement over ERA5 RHi (MAE of 15.7 %; R² of 0.66). The ANN model also improves the prediction skill for the all-sky UTLS and the cloudy UTLS and removes the artificial peak at RHi = 100 %. For a contrail cirrus scene over the Atlantic, the contrail predictions with humidity correction agree better with MSG observations of ice optical thickness than the results without it. The ANN method can be applied to other weather models to improve humidity predictions and to support aviation and climate research applications.
Status: open (until 23 Aug 2024)
CC1: 'Comment on egusphere-2024-2012', Kevin McCloskey, 12 Jul 2024
Hello, this could be a very impactful finding if the ANN model generalizes well to weather conditions that it hasn't seen. I notice, though, that in Section S2 of your Supplement you describe randomly splitting the IAGOS waypoints into train/validation/test sets. Splitting the data this way risks masking overfitting in your ANN model. This is likely not a problem if you restrict your usage of the trained ANN to retrospective studies where the model inference is only applied to ERA5 data at the same times and places the model was trained on. However, if you applied an ANN trained in this way to a forecast of weather that has not yet happened in the real world, you would likely see a drop in metrics. To report metrics that are predictive of how the model will perform when applied to a weather forecast, it is best practice to train the ANN on an archived weather forecast (e.g., ECMWF HRES) and use a chronological cross-validation split, i.e., the train set consists of data from time periods that are disjoint from the time periods used for the validation and test sets. For example, don't include in your validation/test sets any data from days that were included in your training set. This type of cross-validation setup avoids the risk of the ANN 'memorizing' specific datapoints from the training set that are effectively also present in the validation/test sets, which would not be the case when you apply the ANN to real weather forecast data. This is especially a concern here because the IAGOS waypoints occur once every 4 seconds, so adjacent datapoints (having extremely similar model inputs and target outputs) will frequently be randomly split across the train/test boundary. With the current cross-validation setup, the impact of this model still seems strong, but it is limited to use in retrospective analyses.
Citation: https://doi.org/10.5194/egusphere-2024-2012-CC1
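The chronological split suggested in CC1 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `chronological_split`, the split fractions, and the synthetic timestamps are all hypothetical, and splitting by calendar day is just one way to keep the time periods disjoint. The idea is to assign whole days, in time order, to the train, validation and test sets, so that neighbouring waypoints from the same flight never end up on both sides of a boundary.

```python
from datetime import datetime, timedelta


def chronological_split(timestamps, train_frac=0.7, val_frac=0.15):
    """Return (train, val, test) index lists; no calendar day is shared between sets.

    The fractions apply to the number of distinct days, not to the number of samples.
    """
    # Group sample indices by calendar day.
    by_day = {}
    for i, ts in enumerate(timestamps):
        by_day.setdefault(ts.date(), []).append(i)

    # Assign whole days to each set in chronological order.
    days = sorted(by_day)
    n_train = int(len(days) * train_frac)
    n_val = int(len(days) * val_frac)

    def pick(selected_days):
        return [i for d in selected_days for i in by_day[d]]

    return (
        pick(days[:n_train]),
        pick(days[n_train:n_train + n_val]),
        pick(days[n_train + n_val:]),
    )


# Synthetic example (hypothetical data): 20 days with 4 samples per day.
start = datetime(2020, 1, 1)
timestamps = [start + timedelta(hours=6 * k) for k in range(80)]
train_idx, val_idx, test_idx = chronological_split(timestamps)
```

With a split like this, every test sample comes from a later day than any training sample, which is closer to how a model would be used on a real forecast than a random split of 4-second waypoints.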
CC2: 'Comment on egusphere-2024-2012', Scott Geraedts, 15 Jul 2024
In addition to the ETS, it would be nice to have the full contingency table used to evaluate the model (e.g., for the cases in Table 2), so that interested readers can compute other metrics.
Citation: https://doi.org/10.5194/egusphere-2024-2012-CC2
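To show why the full contingency table is useful, here is a minimal sketch of how the equitable threat score (ETS) and other categorical metrics follow from the four table entries. It uses the standard ETS definition; the event threshold of RHi ≥ 100 % and the sample values are assumptions made for illustration and are not taken from the preprint.

```python
def contingency_table(predicted, observed, threshold=100.0):
    """Count hits, misses, false alarms and correct negatives for the event value >= threshold."""
    table = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_negatives": 0}
    for p, o in zip(predicted, observed):
        pred_event = p >= threshold
        obs_event = o >= threshold
        if pred_event and obs_event:
            table["hits"] += 1
        elif obs_event:
            table["misses"] += 1
        elif pred_event:
            table["false_alarms"] += 1
        else:
            table["correct_negatives"] += 1
    return table


def equitable_threat_score(table):
    """ETS = (hits - hits_random) / (hits + misses + false_alarms - hits_random)."""
    hits = table["hits"]
    misses = table["misses"]
    false_alarms = table["false_alarms"]
    total = sum(table.values())
    # Number of hits expected by chance given the marginal totals.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom else 0.0


# Hypothetical RHi values in percent, for illustration only.
predicted_rhi = [105, 95, 110, 80, 102, 90, 70, 101]
observed_rhi = [108, 101, 99, 75, 103, 85, 60, 104]

table = contingency_table(predicted_rhi, observed_rhi)
ets = equitable_threat_score(table)

# Other metrics a reader could derive from the same table.
pod = table["hits"] / (table["hits"] + table["misses"])        # probability of detection
far = table["false_alarms"] / (table["hits"] + table["false_alarms"])  # false alarm ratio
```

For these sample values the table holds 3 hits, 1 miss, 1 false alarm and 3 correct negatives, giving an ETS of 1/3. Because the ETS collapses four counts into one number, it cannot be inverted to recover the probability of detection or the false alarm ratio, which is why publishing the table itself is helpful.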
Viewed
- HTML: 56
- PDF: 13
- XML: 9
- Total: 78
- Supplement: 10
- BibTeX: 3
- EndNote: 3