A merged hurricane boundary layer height dataset in the western Atlantic based on dropsonde measurements and ERA5 reanalysis
Abstract. The hurricane boundary layer (HBL) critically regulates exchanges of heat, moisture, and momentum between the ocean and atmosphere. Accurate estimation of HBL height (HBLH) is essential for understanding hurricane dynamics. Dropsonde observations provide high accuracy but have limited spatiotemporal coverage, whereas ERA5 reanalysis offers continuous coverage with systematic biases. This study presents a merged HBLH dataset for 75 hurricanes over the western Atlantic during 2002–2024, generated by integrating 4438 dropsonde profiles, ERA5 reanalysis, and IBTrACS hurricane records through a Random Forest machine learning framework. Nineteen input variables representing thermodynamic, dynamic, and hurricane-specific parameters were used to predict the HBLH bias between dropsondes and ERA5. The corrected dataset retains the original ERA5 resolution (0.25° × 0.25°, 1-hourly) and significantly reduces systematic errors relative to dropsonde observations. Validation shows a correlation coefficient of 0.93 with dropsonde-derived HBLH, and reductions in MAE from 544 m to 159 m and RMSE from 661 m to 246 m. The merged HBLH reproduces the radial and asymmetric structure within hurricane domains more accurately than ERA5, while providing continuous temporal and spatial coverage suitable for further analysis of HBL dynamics under hurricane conditions. The dataset is publicly available at https://doi.org/10.5281/zenodo.17196964.
Summary: The goal of the project is to develop a correction for the
ERA5 prediction of PBL height in TC conditions. The method chosen is to
fit a random forest model for the ERA5 height bias relative to a
database of dropsonde estimates of the PBL depth. The basic result is
that, for the calibration data, the bulk error scores RMSE and MAE are
both reduced and R^2 is 0.94. However, when the model is applied to
the test data, R^2 drops significantly to 0.74 and the two error
scores roughly double.
Recommendation: Reject and resubmit.
The basic take-away from the paper is that an ERA5 TCBL height bias
correction can likely be developed using the dropsonde data. However,
the discrepancy between training and test results implies a high
degree of over-fitting. R^2 = 0.94 is very high: effectively, a black
box has been constructed that maps ERA5 variables onto the dropsonde
heights, explaining all but 6% of the variance. However, when the
model is presented with new data, the unexplained variance rises to
26%. In addition, both summary error measures are much larger: RMSE
rises from 148 to 305 m and MAE from 94 to 209 m.
It can be hypothesized that the model has learned all of the quirks in
the training data and figured out how to compensate for them. However,
the withheld data has its own quirks and the model does not
generalize. Likely suspects are that the trees are too deep or that
nodes are allowed to split on too few samples.
It seems likely that a useful bias model can be derived, but that it
is unlikely to have the somewhat gaudy performance statistics quoted
here for the training data. I'd encourage the authors to reevaluate
their model development strategy and start over. Often the greatest
amount of labor is all of the data preparation, which they have
completed. So the new effort is just a more careful model fitting
procedure and a more critical eye applied to the results. There should
only be a small drop-off between fitting and testing statistics.
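The size of the drop-off can be made explicit with a few lines of
arithmetic. The following is only a sketch, using the training and
test scores quoted above; the helper names (rmse, mae, r_squared,
generalization_gap) are illustrative and are not taken from the
manuscript.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def generalization_gap(train, test):
    """Compare train and test scores (dicts with keys 'r2', 'rmse', 'mae')."""
    return {
        "unexplained_train": 1.0 - train["r2"],  # fraction of variance not explained
        "unexplained_test": 1.0 - test["r2"],
        "rmse_ratio": test["rmse"] / train["rmse"],
        "mae_ratio": test["mae"] / train["mae"],
    }


# Scores quoted in this review (heights in metres).
train_scores = {"r2": 0.94, "rmse": 148.0, "mae": 94.0}
test_scores = {"r2": 0.74, "rmse": 305.0, "mae": 209.0}

gap = generalization_gap(train_scores, test_scores)
for name, value in gap.items():
    print(f"{name}: {value:.2f}")
```

With these numbers the RMSE ratio is about 2.06 and the MAE ratio
about 2.22, while the unexplained variance grows from 0.06 to 0.26. A
refit with a smaller gap would bring both ratios much closer to 1.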
Smaller suggestions:
(1) You mention "boundary layer" values of wind, temperature, etc.,
but never define what you mean. Are these some sort of mean values, or
are they the level values between the ERA5 model height and the
surface?
(2) In the model fitting, you break up the wind into zonal and
meridional components. In the TC, a more dynamically relevant
coordinate system might be radial and azimuthal. The azimuthal
(tangential) wind represents the primary circulation and the radial
wind represents the secondary circulation. These components are also
more naturally aligned with the dynamical and thermodynamical
processes that affect the PBL depth.
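As a rough illustration of suggestion (2), the rotation from zonal and
meridional wind to storm-relative components is a few lines. This is a
sketch under stated assumptions, not the authors' processing: the
function name and arguments are hypothetical, the storm center would
come from a track record such as IBTrACS, a local flat-earth
approximation is used for the bearing from the center, storm motion is
not subtracted, and the sign convention (tangential wind positive
counterclockwise) is chosen for the Northern Hemisphere.

```python
import math


def radial_tangential(u, v, lat, lon, center_lat, center_lon):
    """Rotate zonal (u) and meridional (v) wind into storm-relative components.

    Returns (v_rad, v_tan): v_rad is positive outward from the center and
    v_tan is positive counterclockwise (cyclonic in the Northern Hemisphere).
    Latitudes and longitudes are in degrees; no date-line wrapping is done.
    """
    # Eastward and northward offsets from the center. Only their direction
    # matters, so the Earth's radius is omitted (flat-earth assumption).
    dx = math.radians(lon - center_lon) * math.cos(math.radians(center_lat))
    dy = math.radians(lat - center_lat)
    if dx == 0.0 and dy == 0.0:
        raise ValueError("point coincides with the storm center")

    # Angle of the position vector, measured counterclockwise from east.
    theta = math.atan2(dy, dx)

    # Project the wind onto the radial and tangential unit vectors.
    v_rad = u * math.cos(theta) + v * math.sin(theta)
    v_tan = -u * math.sin(theta) + v * math.cos(theta)
    return v_rad, v_tan


# A point due east of the center with a southerly wind of 10 m/s:
# the flow is purely tangential and cyclonic.
v_rad, v_tan = radial_tangential(0.0, 10.0, 25.0, -69.0, 25.0, -70.0)
print(v_rad, v_tan)
```

Replacing the u and v predictors with v_rad and v_tan computed this
way would let the model see inflow and swirl directly rather than
having to infer them from position and Cartesian wind.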