Technical Note: Hybrid Machine Learning Model for Bias Correction of UTLS Relative Humidity against IAGOS Observations in ERA5 Reanalysis
Abstract. Persistent contrail cirrus form in Ice-Supersaturated Regions (ISSRs) and are responsible for a large portion of aviation’s non-CO2 climate impact. Avoiding ISSRs through strategic flight rerouting has been proposed as a short-term mitigation strategy. However, accurate forecast of ISSRs is hindered by the difficulty of predicting Relative Humidity with respect to Ice, RHi, at cruising altitude. Observations are problematic: Satellite-based global measurements carry large uncertainties while aircraft in-situ measurements offer a limited spatial coverage. On the contrary, ERA5 reanalysis data offer a global estimate of RHi, but it is known to exhibit a dry bias near the tropopause where ISSRs are located as well as significant random errors.
In this study, we develop a hybrid ensemble machine learning (ML) model to improve RHi estimates in the Upper Troposphere (UT) and Lower Stratosphere (LS) using ERA5 and aircraft measurements from the In-service Aircraft for a Global Observing System (IAGOS). The model combines a XGBoost regressor for drier conditions (RHi < 85 %) and an Artificial Neural Network (ANN) for more humid cases (RHi > 85 %). This hybrid approach significantly outperforms raw ERA5 data, reducing the mean absolute error from 13.7 % to 11.4 % and improving the Equitable Threat Score (ETS) for ISSR detection from 0.36 to 0.44. The greatest improvement is observed in the lower stratosphere, where the ETS increases by 0.18 and the Mean Absolute Error (MAE) drops from 13.19 % to 10.71 %. These improvements mark a key step toward more reliable identification of ISSRs, helping reduce the uncertainties that currently limit effective flight-rerouting strategies.
## Overall
This is a nice technical note expanding on the work of Wang et al 2025. I think the manuscript deserves to be published after improving the clarity of the presentation and considering the major comments below. Ideally the manuscript would be accompanied by example training code in the authors' language of choice.
## Major Comments
- L3 & L46: Many publications on this topic make some form of the statement "[There are] considerable errors in RH_i estimates" which makes "accurate forecast of ISSRs [difficult]." I'd like to see more analysis of the type and distribution of errors, to better understand how ISSR forecast errors result in ineffective (or inefficient) avoidance measures. In our experience, RH_i (and ISSRs) have high pointwise error, but observed ISSRs are (generally) spatially and temporally correlated with ISSR forecasts.
- L253: What are the requirements to support effective contrail avoidance strategies?
- L51: Wang et al 2025 published an ANN humidity correction methodology. This publication adds an XGBoost regression for RH_i < 85%, and a different training/validation data split. Given the similarities, this line deserves a whole paragraph describing the differences with Wang 2025, and how this methodology aims to improve on the previous work.
- L74: What kind of biases in the weather might this domain selection introduce? Have you tested how well your models apply outside this domain?
- L83: Did you consider model levels? It may be worth exploring if the higher vertical resolution would improve your results.
- L117-121: How did you interpolate the values for T and q? Linear interpolation in q introduces bias when working with coarse pressure levels.
- Table 1: Teoh et al 2024 introduced a latitude correction for the humidity correction. Should latitude be a feature?
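
To illustrate the L117-121 comment on interpolating q, here is a minimal sketch of the concern, not the authors' method. It assumes an idealized power-law humidity profile (the function `idealized_q` and its parameters are hypothetical, chosen only for illustration) and compares interpolating q directly against interpolating ln(q) between two coarse pressure levels. The sign and size of the error in real data depend on the actual profile curvature.

```python
import math


def _weight(p, p_lo, p_hi):
    """Interpolation weight of the p_hi level, linear in ln(p)."""
    return (math.log(p) - math.log(p_lo)) / (math.log(p_hi) - math.log(p_lo))


def interp_q_linear(p, p_lo, p_hi, q_lo, q_hi):
    """Interpolate q itself linearly in ln(p)."""
    w = _weight(p, p_lo, p_hi)
    return (1.0 - w) * q_lo + w * q_hi


def interp_q_log(p, p_lo, p_hi, q_lo, q_hi):
    """Interpolate ln(q) linearly in ln(p), then exponentiate."""
    w = _weight(p, p_lo, p_hi)
    return math.exp((1.0 - w) * math.log(q_lo) + w * math.log(q_hi))


def idealized_q(p_hpa, q_ref=1.0e-4, p_ref=250.0, exponent=3.5):
    """Hypothetical power-law profile q(p); an assumption, not ERA5 data."""
    return q_ref * (p_hpa / p_ref) ** exponent


# Two coarse pressure levels bracketing a target cruise pressure (hPa).
p_lo, p_hi, p_target = 200.0, 250.0, 225.0
q_lo, q_hi = idealized_q(p_lo), idealized_q(p_hi)
q_true = idealized_q(p_target)

q_lin = interp_q_linear(p_target, p_lo, p_hi, q_lo, q_hi)
q_log = interp_q_log(p_target, p_lo, p_hi, q_lo, q_hi)

rel_bias_lin = (q_lin - q_true) / q_true
rel_bias_log = (q_log - q_true) / q_true

print(f"relative bias, linear in q:     {rel_bias_lin:+.3%}")
print(f"relative bias, linear in ln(q): {rel_bias_log:+.3%}")
```

For this idealized profile, interpolating q directly overestimates humidity between levels, while interpolating ln(q) recovers the profile. Stating which scheme was used for T and q would let readers judge how much of the RH_i error is introduced before the ML correction is applied.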
## Minor Comments
- L31: "are spending" -> "spend"
- L33: Suggest using stats from more recent Teoh, R. et al. (2024) “Global aviation contrail climate effects from 2019 to 2021,” Atmospheric Chemistry and Physics, 24(10), pp. 6071–6093. Available at: https://doi.org/10.5194/acp-24-6071-2024.
- L37: It's worth motivating why we need to detect ISSRs. It's presumed that the reader knows that "to meteorologically forecast ISSRs with enough accuracy" we need ISSR detections. You may want to add context, e.g. "Global ISSR forecasts are generally derived from numerical weather forecasting systems, nowcast from in situ measurements, or inferred from remote sensing. Both approaches rely on accurate detections of ISSRs: in the first case to validate models, and in the second through the measurements themselves."
- L44: Not just ERA5 - any numerical weather prediction system. I'd flip this around - numerical weather prediction models provide a comprehensive prediction across the global atmosphere. ERA5 is a highly trusted source of numerical weather prediction.
- L45: Define what a dry bias means.
- L48: Other publications with humidity correction: (constant) Schumann, 2012; Schumann et al., 2015; Teoh et al., 2020; Schumann et al., 2021; (piecewise function) Teoh et al 2022; Teoh et al 2024; (quantile mapping) Platt et al 2024
- L55: This sentence sounds like an LLM. I'd move L59-L61 up front, remove this sentence, and then have L57-58. Can you be more specific as to why you chose the hybrid model? From this description it sounds like you used XGBoost for compute performance reasons rather than accuracy.
- L94: How long is the "longer period"?
- L104: Can you confirm whether IAGOS accuracy is a function of RH_i or of absolute humidity? My recollection is that humidity sensor accuracy is a function of absolute humidity.
- L126: How does this compare to Wang 2025?
- L156: Is it possible the ANN is overfitting these engineered features? You acknowledge the proper data split, but could you use additional data outside the domain to gain confidence?
- L160: This criterion sounds more like "no existing cirrus" than "clear sky." You could also look at the IAGOS ice crystal measurements to judge pre-existing cirrus (Petzoldt 2025).
- L170: (Re)Introduce acronym MAE
- L182: Add citation? Where does this baseline come from?
- L186-188: It's not clear to me why "structured input data" ~ drier regimes. It's clearer to me that "high humidity conditions" ~ complex non-linear dependencies.
- L221-222: This is the first clear explanation of why XGBoost is preferable to ANN for the drier regimes. L230 - L233 is also great. Bring this language up front!
- L223: Repeats the previous line
- Table 3 is super helpful - It would be helpful to use this language up front when describing the benefits of the hybrid architecture.
- Table 3, Table 4: How do these results compare with Wolf et al 2025 or Platt et al 2024 (quantile mapping)?
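
To make the Table 3 and Table 4 comparison with other studies easier, it would help to state the ETS definition and ISSR threshold explicitly. Below is a minimal sketch, assuming the standard Equitable Threat Score (Gilbert skill score) computed from a 2x2 contingency table and an ISSR threshold of RH_i >= 100 %. The threshold, the function names, and the sample values are our assumptions for illustration and may differ from what the manuscript uses.

```python
import math


def contingency_counts(obs_rhi, pred_rhi, threshold=100.0):
    """Count hits, misses, false alarms, and correct negatives for ISSR detection.

    A sample is treated as ISSR when RHi >= threshold (assumed threshold, in %).
    """
    hits = misses = false_alarms = correct_negatives = 0
    for obs, pred in zip(obs_rhi, pred_rhi):
        observed = obs >= threshold
        predicted = pred >= threshold
        if observed and predicted:
            hits += 1
        elif observed:
            misses += 1
        elif predicted:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """ETS = (H - H_rand) / (H + M + FA - H_rand)."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        raise ValueError("empty contingency table")
    # Hits expected by chance given the observed and predicted event rates.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return math.nan  # score is undefined for this table
    return (hits - hits_random) / denominator


# Made-up RHi values (%) purely to exercise the functions.
observed = [105.0, 98.0, 110.0, 60.0, 101.0, 40.0]
predicted = [102.0, 101.0, 95.0, 55.0, 104.0, 45.0]
counts = contingency_counts(observed, predicted)
print("hits, misses, false alarms, correct negatives:", counts)
print("ETS:", equitable_threat_score(*counts))
```

Reporting the underlying contingency counts alongside the ETS would let readers recompute the score under a different threshold when comparing against quantile-mapping results.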