This work is distributed under the Creative Commons Attribution 4.0 License.
How Does Assimilating a Large Commercial GNSS RO Dataset Impact HAFS Hurricane Forecasts? An Evaluation in Support of the ROMEX Experiment
Abstract. While Global Navigation Satellite System (GNSS) radio occultation (RO) data assimilation improves tropical cyclone (TC) intensity forecasts, the scaling of these impacts with RO observation volume remains unclear. This observing system experiment (OSE) study evaluates the impact of assimilating the large commercial GNSS RO profile dataset from the Radio Occultation Modeling Experiment (ROMEX) on 84 Hurricane Analysis and Forecast System (HAFS) model forecasts of four 2022 Atlantic hurricanes. The ROMEX dataset contains about 20,000 daily global Spire and PlanetiQ profiles, which is roughly triple the volume of government-provided RO data that the National Centers for Environmental Prediction (NCEP) assimilated operationally in 2022. Compared to a Control experiment that uses only operational RO data, assimilating ROMEX data together with operational RO profiles in HAFS yields ~ 5–15 % relative skill improvement in minimum central sea-level pressure (PMIN) absolute intensity forecast errors in short-range forecasts, and it nearly eliminates a ~ 2–3 hPa PMIN over-intensification bias in medium-to-long range forecasts. Additionally, ROMEX commercial RO data assimilation reduces HAFS temperature and water vapor errors in the middle-to-upper troposphere. A sensitivity experiment shows that lower-tropospheric RO data assimilated below the 5-km impact height provide a substantial contribution to ROMEX forecast improvements relative to Control. These results demonstrate that quadrupling the volume of assimilated GNSS RO data yields a meaningful positive impact on regional model TC forecasts.
Status: open (until 16 May 2026)
- RC1: 'Comment on egusphere-2026-1000', Anonymous Referee #1, 10 Apr 2026
- RC2: 'Comment on egusphere-2026-1000', Anonymous Referee #2, 17 Apr 2026
This manuscript presents a summary of Hurricane Analysis and Forecast System (HAFS) experiments to test the impact of assimilating different radio occultation (RO) missions – and thereby more/less RO observations – into the 12x12-degree inner mesh of the model. These experiments, forecasting four Atlantic-basin tropical cyclones (TCs) from 2022, consist of a control, a data-omission experiment, an enhanced data experiment, and a data-degraded experiment. In the control, all previously-available government-owned RO data were assimilated; COSMIC-2 was omitted from this set for the omission experiment; ROMEX commercial RO data were included with the control data for the enhanced experiment; and all RO data below 5 km were omitted for the degraded experiment. The authors find that, for this sample of TCs, the extra commercial RO observations, especially those below 5 km, improve the short- to mid-range forecast performance.
One of the key questions raised in the justification of ROMEX was what the forecast impact would be of a sizeable increase in the number of available RO profiles, particularly for the lower troposphere. This study adds valuable information toward answering that question. The authors nicely show that, in one TC (Fiona), the increased sample density nudges the model to dry the near eyewall environment and thereby inhibit vortex enhancement that the control simulation resolves. Further, by eliminating RO data below 5 km, the authors show evidence that RO observations in the lowermost troposphere provide positive impact to the forecasts.
I overall find the work to be of high quality and significance, but have concerns about some of the ways the authors pose their experiments, the discussion or omission of assimilation counts, the completeness of the analysis, and the clarity of the discussion.
Major comments:
1. The two denial experiments need more justification for why the particulars of each setup were chosen.
Why would one want to withhold only COSMIC-2 and not, say, COSMIC-2 and MetOp, or perhaps all RO? Are there particular questions in the community about the value of COSMIC-2 that need to be answered? If so, the authors should introduce and discuss these questions.
Given the repeated points the authors make comparing low signal-to-noise (SNR) observations from Spire with those from COSMIC-2, is it because they are trying to make statements about the value of high-SNR observations? This needs clear discussion if so. As well, as I point out in a later comment, PlanetiQ produces high-SNR observations and is selectively omitted when the authors discuss the SNR.
If this is a means of producing a “no RO” experiment, this is obviously an imprecise setup for doing that. Though COSMIC-2 dominates the Control sampling, there is a non-trivial quantity of other government-provided data. Consider Figure 2: there are O(100) unique COSMIC-2 occultations assimilated for Hurricane Ian (2022). Given the global sampling values given in Section 2c, we would expect there to be O(10) unique occultations from the remaining missions in the Control_noC2 experiment for this storm. Healy et al. (2005) demonstrated that even 160 CHAMP profiles/day could improve a global forecast; O(10) in a 12x12-degree mesh will not have a trivial impact.
For the ROMEX_noLowlevRO experiment, what motivated the choice of 5 km impact height as the threshold below which RO data are withheld?
2. The assimilation counts for each experiment and each storm should be presented.
One can get a sense of the RO data counts for Hurricane Ian (2022) from Figure 2, but this is only one of the storms, and a graphical presentation is not the easiest to interpret for understanding, e.g., the actual increase of observations between the Control and ROMEX experiments. A table of values would be quite valuable for the reader, especially for appreciating how the data quantity has changed in the ROMEX experiment for Hurricane Fiona, the sole subject of section 5.
Related, in the Summary, it is appropriate to give the total global counts of the government-owned and commercial datasets, but it does not give the correct context for this work that considers a moving 12x12-degree grid in the Atlantic basin. It would be appropriate to also give the total occultation counts that were actually assimilated in this work.
3. Are the results, especially comparisons with Control, dependent on latitude?
If “the statistical forecast error analysis…was heavily influenced by…cases which spent most of their time…outside the tropics” (lines 697-699), then the impact of removing COSMIC-2 from the assimilation – one of the authors’ primary experiments – is not so clear. The authors acknowledge that COSMIC-2 sampling decreases in the extratropics, and it is apparent from Figure 2 that the COSMIC-2 sampling within the mesh decreases as Hurricane Ian (2022) moves poleward. How then does the statistical analysis change when only considering those forecasts that are for TC locations equatorward of some latitude? As a reviewer, it’s not obvious what latitude this should be as the assimilated data count for each assimilation window in each experiment and storm is not given (though I encourage them to consider its inclusion in the manuscript). The authors would, however, have access to this information and could select an appropriate latitude.
4. Omission of PlanetiQ in discussion of low vs. high SNR datasets should be corrected.
Consider Figure 3 and some of the discussion around it (e.g., line 313; this also applies to lines 647-650). The separation in the figure and the omission in the discussion of PlanetiQ helps the authors raise what they find to be an “interesting result” about COSMIC-2. But isn’t it more accurate to say that the result is that the low-SNR dataset (Spire) has a notably different rejection rate in the lowermost atmosphere than both of the other high-SNR datasets (COSMIC-2 and PlanetiQ)?
The authors presented results for each of the three selected missions side-by-side in Figure 2. I don’t think there’s justification for not doing so in Figure 3. And the discussion about low vs. high SNR datasets should not omit PlanetiQ and should clarify what the authors find interesting about the results.
5. The analysis and storytelling of the results in Section 5 is quite nice.
Other comments:
1. Section 3a: why was any discussion about MetOp excluded from this subsection? As it is an assimilated RO dataset in all four experiments, it should be introduced and included in figures here alongside COSMIC-2, Spire, and PlanetiQ. With only ~1100 global profiles/day it likely has a smaller impact than COSMIC-2 and Spire, but from a reader’s perspective, its numbers are not so few as to be justifiably ignored.
2. In various parts (lines 438, 512, 514, 532, others?), when discussing the ROMEX_noLowlevRO experiment, the authors state that it shows how the assimilation of sub-5-km RO observations changes the forecast quality relative to the ROMEX experiment. It reads a bit awkwardly, however, as it is possible to interpret the wording as contrasting not with the ROMEX experiment that the authors perform but rather with the ROMEX data itself. Since all RO below 5 km is being rejected, this is more than just the impact of ROMEX (the dataset); the authors do make this clear in line 534. This is a subtlety, but clarity will ensure the reader does not misinterpret. Perhaps something like “our ROMEX experiment” in these spots would help?
3. Section 2d, conclusions: I would be fine with retaining the naming, but is “data denial” the best classification for the experiments? One of the big questions this paper seeks to answer is what the impact of the ROMEX commercial data has on TC forecasting. One of the three (plus a control) experiments is a data enhancement experiment; the other two are data denial. I suppose one could consider Control to be a data denial from the ROMEX experiment, but that doesn’t quite represent the experimental setup.
4. Please include all data (e.g., RO, GFS) in the data availability statement. It is appreciated that the datasets are available upon request, but the sources of the data are surely public-facing repositories, even if, like the ROMEX dataset, they have restrictions.
5. Please update references to include DOIs.
Line-by-line comments:
Line 28, 124: "EXperiment"
Line 40, 690: what is the relative increase within the study domain? COSMIC-2 observations are not global and thus, the global sampling does not scale to a 12x12-degree mesh that may fall outside the tropics.
Line 133: there are more than this number of occultations per day, and some NWP centers are leveraging all of them. Perhaps better to say “…approximately 27,000 global daily commercial RO…”
Line 135: here too, it may help clarity to reword along the lines of “…impact of assimilating a large subset of the total commercial RO bending angle dataset…”
Line 171: was that the minimum pressure at landfall? Was the record based on the minimum pressure anywhere along the hurricane track or at the time of landfall?
Lines 185-188: you may consider including this statement earlier in this subsection, for instance, at line 159. It is fine here, but I was wondering while reading the earlier lines where the authors got the various bits of information about the storms.
Lines 201-203: what is the model lid for HAFS? Vertical resolution? How is bending angle initialized at upper levels for HAFS?
Line 308-309: what is the total count of profiles in this 1000-950 layer for the three missions?
Lines 327-336: I think this work warrants some additional explanation for the reader. Is this analysis from HAFS using those occultations that fall in the inner nest?
Line 401: I have no doubts about the quality of ERA5, but I’m also unfamiliar with its skill in reanalyzing TC locations and pressure minima. Have you measured ERA5 errors in TC location/intensity relative to NHC or is there an appropriate reference?
Citation: https://doi.org/10.5194/egusphere-2026-1000-RC2
- RC3: 'Comment on egusphere-2026-1000', Anonymous Referee #3, 22 Apr 2026
General comments
This manuscript uses the HAFS model with the GSI 4DEnVar system to examine the impacts of assimilating ROMEX, COSMIC-2, and lower-level radio occultation observations on hurricane forecasts. The ROMEX dataset includes multiple RO sources, with the commercial observations coming primarily from Spire and PlanetiQ. The study performs cycling data assimilation and forecasting for four hurricane cases and further presents a more detailed case analysis of Hurricane Fiona (2022). Overall, the topic is valuable for assessing the potential benefits of commercial RO data for hurricane forecasting. However, several aspects of the interpretation and mechanistic discussion would benefit from further clarification to strengthen the rigor and overall persuasiveness of the study.
Specific comments
- Figure 2 shows the GNSS RO observations available for assimilation for Hurricane Ian, whereas the more detailed case analysis in Section 5 focuses on Hurricane Fiona. The rationale for choosing Ian rather than Fiona in Fig. 2 is not entirely clear. If Fiona is the primary case for the subsequent detailed discussion, it may also be useful to provide the corresponding observation-availability information for Fiona, which would improve the continuity of the manuscript and better support the later analysis.
- In Fig. 3c, PlanetiQ, like Spire, is also characterized by lower SNR than COSMIC-2. It is therefore somewhat unclear why only Spire exhibits a markedly lower rejection rate, whereas PlanetiQ does not show a similar behavior. The authors are encouraged to elaborate on the possible reasons for this difference.
- Observation-error estimation generally requires sample statistics accumulated over a sufficiently long period. However, Fig. 4 does not clearly indicate whether these results are derived from a single hurricane case or from statistics aggregated over all four cases. Even in the latter case, the sample still appears rather limited for the results to be interpreted as robust and broadly applicable observation-error estimates. I therefore encourage the authors to clarify that these are local sample statistics or local diagnostic estimates to avoid possible misunderstanding by readers.
- Figures 5 and 6 contain several statistically significant results that do not appear fully consistent with the broader implication of the manuscript that assimilating more observations is generally beneficial. For example, while it is understandable that Control_noC2 exhibits larger errors than Control in Fig. 5, Control_noC2 also shows statistically significant relative skill improvement at 60 and 72 h in Fig. 6. In addition, Fig. 7 suggests that the Pmin bias in Control_noC2 is overall closer to zero than that in Control. Taken together, these results appear to suggest that excluding COSMIC-2 may, in some respects, lead to better forecast performance, which seems somewhat at odds with the broader interpretation presented in the manuscript. Further clarification from the authors would therefore be helpful.
- The mechanistic interpretation presented in lines 599–608 is currently somewhat stronger than what is directly supported by the diagnostics shown. Figures 11–13 support the inference that ROMEX analyses are drier in Fiona’s inner-core / near-storm environment and that the subsequent forecasts exhibit a weaker inner-core structure and reduced over-intensification bias. By contrast, the proposed links involving enhanced AAM convergence, vertical AAM transport, and vorticity tilting are physically plausible, but they are not explicitly diagnosed by the present figures. I therefore recommend that the authors either (i) substantially soften this discussion and frame it as a qualitative interpretation consistent with prior theory, or (ii) provide additional momentum or vorticity budget diagnostics to more directly support these claims.
- Based on Figs. 11–14, the proposed mechanism after line 627 appears physically plausible, but it is not directly demonstrated by the diagnostics shown. In particular, the inferences in lines 631–637 could be further supported by additional diagnostics, such as vorticity or angular-momentum budget analyses, and warm-core tilt diagnostics, if the authors wish to maintain a mechanistic interpretation at this level of specificity.
- More broadly, Section 5 contains an interesting and potentially important case study, but the mechanistic interpretation is currently stronger than what the diagnostics directly support. I therefore recommend that the authors either soften the mechanistic discussion throughout this section or provide additional diagnostics to more directly support the proposed dynamical interpretations.
Technical corrections
- In the first paragraph, the discussion abruptly shifts from bending-angle retrieval to the roles of water vapor, temperature, and pressure in refractivity. A smoother transition would improve the logical flow.
- Line 192: “VMAX stratifies hurricane intensity according to the Saffir-Simpson Scale”. Please remove the extra space in the sentence.
- Line 219: Please spell out the full name of “TCVitals” when it is first introduced.
- The manuscript uses both “HAFS” and “HAFS-A” in different places. Please make the terminology consistent throughout the manuscript.
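For context on the first technical correction above, the dependence on water vapor, temperature, and pressure that the manuscript's opening paragraph jumps to is commonly summarized by the Smith-Weintraub relation for microwave refractivity. This is a reference sketch only (coefficient values vary slightly between sources), not the manuscript's own formulation:

```latex
N = 77.6\,\frac{P}{T} + 3.73\times 10^{5}\,\frac{e}{T^{2}}
```

where $N$ is refractivity, $P$ is total pressure (hPa), $T$ is temperature (K), and $e$ is water-vapor partial pressure (hPa). The first ("dry") term dominates in the upper troposphere, while the second ("wet") term makes lower-tropospheric refractivity strongly sensitive to water vapor; a sentence bridging bending-angle retrieval to this relation would smooth the transition.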
Citation: https://doi.org/10.5194/egusphere-2026-1000-RC3
The manuscript "How Does Assimilating a Large Commercial GNSS RO Dataset Impact HAFS Hurricane Forecasts? An Evaluation in Support of the ROMEX Experiment" by William Miller et al. presents a thorough study on the impact of large amounts of real GNSS-RO data on the prediction of four 2022 Atlantic hurricanes. The manuscript is very readable and accessible to readers unfamiliar with hurricane modeling. It is mostly very clear and almost ready for publication.

The manuscript could still be slightly improved by addressing the minor issues discussed below:
- Page 5-6, line 143ff: "...assimilated in the lower troposphere can positively impact HAFS forecasts, given the tendency for these observations to have larger forward operator errors and/or likelihood of rejection there, compared to RO data from the middle or upper troposphere." While the reader may understand the gist of this statement, it is slightly inaccurate and ambiguous. In the lower troposphere, observations may have larger errors due to the complex path that GNSS signals propagate and the resulting processing to bending angles. On the other hand, forward modeling may use an overly simple operator (e.g. a 1D Abel integral instead of ray-tracing), have a large representativity error, and the model background error is larger. I recommend writing "...to have larger forward modeling errors..." or something similar to summarize this.
- Page 7, line 181: "a 15-foot peak storm surge". Can the authors improve this so that a reader used to SI units does not have to calculate?
- Page 8, lines 199ff: The description of the model configuration lacks a specification of the model top and number of model levels. What is the type of nesting? 1-way, or 2-way with feedback to the coarser model? (Presumably the former.) It appears that the outer domain is not part of the ROMEX experiment in the sense that the additional RO data are not assimilated there. The authors might wish to clarify this already here; it is only stated on page 15, lines 357-359, that the outer domain stays the same for all experiments, and again in the conclusions on page 31, lines 706ff.
- Page 9, lines 236-237: "... background super-refractivity (SR) layer where the vertical refractivity gradient is large." Note that "large" could be specified more clearly as, e.g., above the critical value, if this threshold is chosen.
- Page 4, line 97, and page 9, lines 245, 255: The official spelling of "MetOp" is now "Metop", e.g., https://www.eumetsat.int/our-satellites/metop-series
- Page 12, lines 312ff: When discussing possibly extreme O-B outliers in the lower troposphere, it is important to note that Spire's processing is known to involve a screening of profiles before sending them out, unlike UCAR's processing of COSMIC-2. Therefore, a fair comparison seems difficult. However, the comparison of rejection rates above 25 hPa could be adjusted to a top at about 1 hPa (~ 45 km), if possible.
- Page 15, Fig. 5a: It is very difficult to understand the statistical significance of the results from this plot. Would it be possible to present the results in a different way, perhaps by splitting?
- Page 17, lines 400ff and Fig. 8/9(b,d,f): In the mean differences shown here, ERA5 should cancel out, right? Or is it mean absolute differences to ERA5?