This work is distributed under the Creative Commons Attribution 4.0 License.
Retrieval of thermodynamic profiles in the lower troposphere from GNSS radio occultation using deep learning
Abstract. Global Navigation Satellite Systems (GNSS) radio occultation (RO) is one of the most vital remote sensing techniques globally and of major importance for numerical weather prediction (NWP) and climate science. However, retrieving profiles of atmospheric quantities such as temperature or humidity from GNSS observations is not straightforward and dedicated algorithms still have their limitations. One of these limitations is the need for external meteorological data in the retrieval process. Various new RO missions have led to an enormous increase in data amounts and with over 10000 globally-distributed, daily profiles, RO can be considered big data nowadays. In this study, we make use of this fact by developing a new retrieval method based on a deep learning model, which only needs RO-specific quantities as an input to produce atmospheric profiles. The model is trained on almost a full year of data from COSMIC-2 and Spire RO missions, using vertical profiles of bending angle (BA) and other RO parameters as input features and operational results from a standard retrieval algorithm as target values for supervised learning. Initial results from both internal and external validation using reanalysis and radiosonde data suggest that this method produces results with an accuracy comparable to standard algorithms, while mitigating the need for external information in the retrieval process itself. These initial results serve as a starting point for further development of data-driven models for RO, which could significantly enhance the quality of RO products utilized in, e.g., climate sciences by mitigating external biases and increasing independence from other techniques.
Status: final response (author comments only)
- CEC1: 'Comment on egusphere-2025-2767', Juan Antonio Añel, 24 Jul 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, you have archived your code on a site hosted by ETH that does not comply with the requirements of our policy. We cannot accept this. You must store your code in one of the acceptable repositories according to our policy. Also, I have not seen a license listed on the web page of your code. If you do not include a license, the code remains your property and cannot be used by others. Therefore, when uploading the model's code to the repository, you may want to choose a free software/open-source (FLOSS) license: GPLv3, GPLv2, Apache License, MIT License, etc.
Second, you do not provide repositories for the input and output data used to produce your work. You have included links to generic webpages and main portals for the data; however, you must provide links and DOIs for repositories containing the data that you have specifically used and produced in your work.
Therefore, the current situation with your manuscript is irregular. Please publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, you must include a modified 'Code and Data Availability' section in a potentially revised manuscript, containing the information for the new repositories.
I must note that if you do not fix this problem, we will not be able to continue the peer review process or publish your manuscript in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
- AC1: 'Reply on CEC1', Matthias Aichinger-Rosenberger, 03 Sep 2025
Hello,
Thank you for making us aware of this problem. We have now published the AROMA model as well as code and data, used in the study for evaluation, in the following repository:
Aichinger-Rosenberger, M., & Sjoberg, J. (2025). AROMA: Python code and data to reproduce results from Aichinger-Rosenberger & Sjoberg (2025). Zenodo. https://doi.org/10.5281/zenodo.16417123
The repository also contains the radiosonde data set used in the study, which does not have a DOI of its own (unlike the radio occultation and ERA5 data, which are publicly available). It therefore took longer than expected to provide these data here; apologies for the delay.
Best regards
Matthias Aichinger-Rosenberger
Citation: https://doi.org/10.5194/egusphere-2025-2767-AC1
- RC1: 'Comment on egusphere-2025-2767', Anonymous Referee #1, 29 Jul 2025
Review of the manuscript entitled "Retrieval of thermodynamic profiles in the lower troposphere from GNSS radio occultation using deep learning" by Aichinger-Rosenberger and Sjoberg
General Comments
In this manuscript, the authors propose a machine learning retrieval method (AROMA) using a simple Multi-Layer Perceptron model to retrieve thermodynamic atmospheric profiles (pressure, temperature, and specific humidity) from GNSS-RO observations, primarily from COSMIC-2 and Spire. The retrievals are trained against CDAAC 1D-Var data and validated using three sources: the CDAAC products themselves (test set evaluation), reanalysis data (ERA5), and independent radiosonde observations (RAOB). While the general idea of using machine learning for RO retrievals is not new, the authors claim novelty through the exclusive use of CDAAC 1D-Var products as training targets, aiming to replicate retrievals in a more computationally efficient or independent way. However, there are several concerns regarding the methodology, terminology, completeness of the analysis, and scientific contribution.
While the manuscript is generally well-structured and written in acceptable English, the scientific novelty is limited. The use of MLPs for GNSS-RO profile retrieval has already been explored in past studies, e.g. Lasota (2021), Hooda et al. (2023). The main difference here is the use of CDAAC 1D-Var as target values. However, the authors do not convincingly show that this leads to improved performance or independence from NWP models. On the contrary, the CDAAC retrievals themselves rely on ECMWF background fields, and ERA5, used for external validation, also assimilates many of the same observations. This compromises the claimed “independence” of the proposed method.
The study focuses heavily on bias and standard deviation, while omitting or underusing the more informative RMSE in figures and discussion. RMSE is widely accepted as a more representative metric in geophysical retrieval evaluation. Moreover, the use of inconsistent units (%, K, hPa) across tables and figures makes comparisons difficult. A more uniform and clearly explained presentation of metrics is required.
The authors overuse vague qualitative language. Terms such as "small", "good agreement", "slightly degraded", "negligible" appear throughout the text without being substantiated by numerical values or objective criteria. For a technically oriented audience, these expressions are insufficient. All such statements should be backed up with concrete values or quantifiable thresholds.
Claims such as "almost identical accuracy", "very satisfactory results", or "similar performance to CDAAC" are often too strong given the evidence provided. In particular, the AROMA method shows noticeably worse results for specific humidity, which is glossed over in the text. Additionally, the comparison to other studies (e.g., Lasota, 2021) lacks numerical transparency and fair contextualization (e.g., different latitude bands).
Furthermore, the authors had access to longer observational periods (e.g., COSMIC-2 has been operating since 2019), but only used ~300 days of data. They also did not explore more modern and potentially better-performing DL architectures such as CNNs, RNNs, or Transformers, which could offer improvements in the reconstruction of vertical structures or better handling of sequential dependencies in profile data. This limits the potential impact of the study.
The "Outlook" section is vague and does not offer concrete suggestions or rationale for the chosen next steps. Given the availability of longer time series and more diverse RO data, it is unclear why these were not used in the current study. Moreover, no consideration is given to improving the model architecture, hyperparameter tuning strategies, or integration with other ML advances.
Figure captions and table descriptions are often imprecise or incomplete. X-axis scaling is often poorly chosen - in many plots, the data curves are compressed or overlapping, making interpretation difficult. Several figures lack panel labels (e.g., a, b, c), which makes it hard to reference specific subplots from the text.
These issues, while not invalidating the study, significantly reduce its clarity, reproducibility, and impact. Addressing them would considerably strengthen the manuscript.
Specific comments:
Line 118: The authors use a maximum horizontal distance of 500 km for RO–RAOB pairing. This threshold is very large and may introduce considerable representativeness errors, especially in dynamically active regions or the lower troposphere. Please justify this choice and consider sensitivity testing with smaller radii (e.g., 200 km), which are more common in literature.
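For concreteness, the kind of sensitivity test meant here only requires sweeping the radius in a simple pairing routine. The following is a minimal sketch with hypothetical dict-of-array inputs and the 500 km / 3 h thresholds stated in the manuscript; it is not the authors' actual pairing code:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km) between points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    a = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin(np.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def collocate(ro, raob, max_km=500.0, max_hours=3.0):
    """Index pairs (i, j) of RO/RAOB profiles matching within the thresholds.
    `ro` and `raob` are dicts of 1-D arrays: 'lat', 'lon' (deg), 'time' (h)."""
    pairs = []
    for i in range(ro["lat"].size):
        d = great_circle_km(ro["lat"][i], ro["lon"][i], raob["lat"], raob["lon"])
        dt = np.abs(ro["time"][i] - raob["time"])
        pairs += [(i, j) for j in np.where((d <= max_km) & (dt <= max_hours))[0]]
    return pairs
```

Rerunning with, e.g., max_km=200.0 and recomputing the validation statistics would show directly how much of the reported RAOB agreement depends on the generous threshold.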
Line 132: The term "refractional radius" is used in the manuscript, but "impact parameter" is more widely recognized and standard in the GNSS-RO and atmospheric remote sensing communities. For consistency and clarity, consider switching to the standard term.
Line 136: The coefficients $k_1$ and $k_2$ are introduced in line 136 without a reference. Please cite a standard source (e.g., Smith and Weintraub, 1953 or ITU-R recommendations) to support these values and aid reproducibility.
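For reference, assuming the manuscript follows the standard two-term Smith and Weintraub (1953) expression, these coefficients enter the refractivity as

$$N = k_1\,\frac{P}{T} + k_2\,\frac{e}{T^2}, \qquad k_1 \approx 77.6~\mathrm{K\,hPa^{-1}}, \quad k_2 \approx 3.73\times 10^{5}~\mathrm{K^2\,hPa^{-1}},$$

where $N$ is refractivity, $P$ total pressure (hPa), $T$ temperature (K), and $e$ water vapor partial pressure (hPa); citing the source of the exact values used would make the retrieval reproducible.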
Line 165: The authors describe ANNs as supervised models "by definition". This is misleading, as neural networks can also be used in unsupervised (e.g., autoencoders, SOMs), semi-supervised, or reinforcement learning settings. Please revise to: "In this study, a supervised ANN is used..."
Line 174: The authors mention hyperparameter tuning but do not explain what it entails. Since this is a key concept in building neural networks, a short clarification would benefit readers unfamiliar with machine learning terminology. Additionally, the transition from a description of network architecture to applications in atmospheric science feels abrupt. Introducing a linking sentence would help maintain logical flow and improve readability.
Line 182: The manuscript mentions that hyperparameters were tuned, but does not describe how. Did the authors use grid search, random search, or another method? Please include at least a brief description and justify the chosen approach.
The authors report testing only a few configurations (e.g., 1000, 2000, 2500 neurons). Why were smaller and potentially more efficient architectures (e.g., 32–512 neurons) not tested? Consider expanding the hyperparameter search space and reporting performance for smaller models. This could improve generalizability and reduce overfitting.
Moreover, early stopping or validation monitoring is not mentioned. Using a fixed number of epochs without regularization can result in overfitting - especially with limited training data.
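To illustrate what such a search with validation monitoring could look like, below is a self-contained sketch using scikit-learn with synthetic stand-in data; the estimator choice, search ranges, and array shapes are my suggestion, not the authors' setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 60))   # stand-in for BA profiles + RO parameters
y = rng.normal(size=(2000, 3))    # stand-in for (p, T, q) targets

param_space = {
    # include small architectures (32-512 neurons), as suggested above
    "hidden_layer_sizes": [(32,), (128,), (512,), (512, 512), (2000,)],
    "alpha": [1e-5, 1e-4, 1e-3],          # L2 regularization strength
    "learning_rate_init": [1e-4, 1e-3],
}

base = MLPRegressor(
    early_stopping=True,                  # monitor a held-out validation split
    validation_fraction=0.1,
    n_iter_no_change=10,                  # stop after 10 epochs without improvement
    max_iter=500,
    random_state=0,
)

search = RandomizedSearchCV(base, param_space, n_iter=10, cv=3,
                            scoring="neg_root_mean_squared_error", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

Reporting the cross-validated score for each sampled configuration would document the tuning process the manuscript currently leaves opaque.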
Line 190: The manuscript claims similar performance across input feature combinations, yet no quantitative comparison is given. Please report error metrics (e.g., RMSE) for each tested combination to support this claim.
Furthermore, consider testing the inclusion of metadata features (e.g., satellite mission ID, time of day/month), which might capture climatological variability. Was this tested?
Lastly, feature importance analysis (e.g., permutation importance, Random Forest, or SHAP) could support the chosen input configuration and identify redundant inputs.
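A permutation-importance check of the kind meant here is cheap to add. A self-contained sketch with hypothetical feature names and synthetic data (not the study's real inputs):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
feature_names = ["BA_1km", "BA_2km", "SNR", "LSW", "lat"]   # hypothetical
X = rng.normal(size=(1000, len(feature_names)))
y = X[:, 0] * 2.0 + X[:, 2] * 0.5 + rng.normal(scale=0.1, size=1000)

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                     random_state=0).fit(X, y)

# shuffle each feature in turn and measure the degradation in skill
result = permutation_importance(model, X, y, n_repeats=10, random_state=0,
                                scoring="neg_root_mean_squared_error")
for i in np.argsort(-result.importances_mean):
    print(f"{feature_names[i]:>6s}: {result.importances_mean[i]:.3f}"
          f" +/- {result.importances_std[i]:.3f}")
```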
Line 211: It is unclear whether a validation set was used separately from the test set. Using the test set for both evaluation and tuning can result in biased performance estimates. Please clarify this split.
Also, what does “random split” mean? If RO profiles are temporally correlated (e.g., from the same day), a random split may lead to data leakage. A time-based or mission-based split would improve robustness.
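A leakage-aware alternative is to split by a grouping key such as day, so that no day contributes profiles to both sides. A minimal sketch (synthetic data, hypothetical grouping key):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_profiles = 5000
X = rng.normal(size=(n_profiles, 60))
y = rng.normal(size=n_profiles)
day_of_year = rng.integers(1, 301, size=n_profiles)   # ~300 days, as in the study

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=day_of_year))

# verify that no day appears in both sets
assert not set(day_of_year[train_idx]) & set(day_of_year[test_idx])
```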
Line 225: The method used for interpolating ERA5 to RO locations is not explained. Please specify the horizontal (e.g., bilinear) and vertical interpolation method (e.g., linear, log-pressure).
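For example, one common choice (bilinear in the horizontal, linear in log-pressure in the vertical) could be documented with something as simple as the following sketch; the grids and values here are synthetic placeholders, and the manuscript does not state which scheme was actually used:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

levels = np.array([1000., 925., 850., 700., 500., 300., 200.])  # hPa, descending
lats = np.arange(-90.0, 90.25, 0.25)
lons = np.arange(0.0, 360.0, 0.25)
t_era5 = 250.0 + np.random.default_rng(0).normal(size=(levels.size, lats.size, lons.size))

def interp_to_ro(lat0, lon0, p_ro):
    """Bilinear horizontal interpolation per level, then linear
    interpolation in log-pressure to the RO pressure levels p_ro (hPa)."""
    column = np.empty(levels.size)
    for k in range(levels.size):
        f = RegularGridInterpolator((lats, lons), t_era5[k])  # bilinear by default
        column[k] = f([[lat0, lon0]])[0]
    logp = np.log(levels[::-1])        # np.interp needs ascending abscissae
    return np.interp(np.log(p_ro), logp, column[::-1])

print(interp_to_ro(47.4, 8.5, np.array([900.0, 600.0, 250.0])))
```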
Line 231: Also, Min–Max scaling is sensitive to outliers. Did the authors test Z-score normalization, especially for variables like LSW or SNR?
Line 235: Moreover, why were targets (pressure, temperature, humidity) scaled? Many regression models perform well without target scaling. Was the effect tested?
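The outlier sensitivity of Min-Max scaling is easy to demonstrate; a small synthetic sketch (the SNR values are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
snr = rng.normal(700.0, 100.0, size=(1000, 1))
snr[:5] = 3000.0                      # a few outliers

mm = MinMaxScaler().fit_transform(snr)
zs = StandardScaler().fit_transform(snr)

# with Min-Max, outliers compress the bulk of the data into a narrow band
print("Min-Max: bulk spans", float(np.ptp(mm[5:])))
print("Z-score: bulk spans", float(np.ptp(zs[5:])))
```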
Line 241: The manuscript uses RMSE, bias, and STD in the results, but these metrics are not introduced in the methodology section. Please define all evaluation metrics in Section 3 and explain why they were chosen.
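For reference, with $x_i$ the retrieved and $r_i$ the reference value over $n$ collocations, the conventional definitions are

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}(x_i - r_i), \qquad \mathrm{STD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(x_i - r_i - \mathrm{bias}\bigr)^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - r_i)^2},$$

which satisfy $\mathrm{RMSE}^2 = \mathrm{bias}^2 + \mathrm{STD}^2$; this identity is exactly why RMSE captures systematic and random errors together. Stating explicitly that $R$ is the Pearson correlation coefficient would also resolve the ambiguity noted for Table 3 below.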
Line 247: The so-called “internal validation” refers to performance on the test set (i.e., hold-out data). In ML, “internal validation” is uncommon terminology — consider renaming to “test set evaluation” for clarity.
Figure 1: The figures show bias ± STD, but not RMSE, even though RMSE is arguably a more comprehensive and common error metric. For consistency with Table 3, RMSE profiles should also be plotted.
Table 3: Table 3 presents RMSE, bias, and R, but it is unclear how these were aggregated - per level, per profile, or vertically averaged? Please clarify. Also, define R explicitly in the caption - likely Pearson correlation, but currently ambiguous.
Section 4: Since data from both COSMIC-2 and Spire were used, it would be valuable to show performance separately for each mission. Differences in instrument characteristics may affect retrieval quality and generalizability.
Line 275: The figures (2–4) only show bias ± STD, despite RMSE values being reported in Table 4. Since RMSE is a more informative metric (combining both systematic and random errors), vertical RMSE profiles should be plotted alongside, or instead of, bias ± STD for clarity.
Consider plotting the vertical error profiles for all three missions (C2, Spire, PlanetiQ) in a single figure per variable (e.g., three lines per plot). This would simplify comparison and reduce redundancy across Figures 2–4.
Table 4: Please explicitly reference Table 4 in the main text. Also, revise the caption: "Results of external validation of C2 profiles using ERA5" could be misread as validation of ERA5. Suggest: "External validation of GNSS-RO retrievals using ERA5 as reference".
Figure 6: The figure uses average differences rather than RMSE, which can mask true error magnitudes. RMSE better captures both bias and spread; including it would make the performance evaluation more meaningful.
The x-axis scale in most figures in the manuscript is poorly chosen. It compresses the spread between profiles, making the plots less informative. Dynamically adjusting the scale to fit actual data ranges would help.
Please clarify in the caption or legend what each of the four lines represents. It seems they indicate mean ± STD for CDAAC and AROMA, but this must be clearly stated.
Line 299: The text states: "negligible differences in relative errors for temperature and pressure", yet differences reach 2 hPa - non-negligible in many RO applications. Consider rephrasing or quantifying what is meant by "negligible" in this context.
Figure 7: How were the three RAOB-RO profile pairs in Figure 7 selected? Random sampling, specific regions, or performance extremes? Clarifying selection criteria would aid interpretation.
The figure (and also other figures in the manuscript) lacks subpanel labels (e.g., a, b, c), which makes textual references ambiguous. Please label each panel clearly.
Line 315: The text says "large differences... in the lowest 2 km", but Figure 7 shows significant deviations up to 8 km, exceeding 6 K. Recommend clarifying the vertical range and associating differences with the specific method (AROMA).
Section 4: Include actual numerical error values (e.g., max pressure error = 2.3 hPa, temperature = 6.3 K) to substantiate claims like "slightly degraded performance". Avoid vague descriptors like "some larger differences".
Line 321: The method is presented as "novel", but MLPs have been previously used for RO profile retrieval (e.g., Lasota, 2021). The novelty lies more in the use of CDAAC 1D-Var as ground truth. For genuine innovation, deeper architectures (CNNs, RNNs, Transformers) could be tested.
Line 323: Calling the test set evaluation "internal validation" is misleading. This is standard ML procedure. Please refer to it as "test set evaluation" for clarity.
Line 326: Although bias is low, RMSE remains high (see Table 3). Avoid claiming "good agreement" if total errors are still significant.
Section 5: Replace generic descriptors like "small", "slight", "significant", and "good agreement" with quantified metrics or thresholds. This improves transparency and enables comparison with literature.
Line 331: "Instabilities in the retrieval" needs clarification. Does this refer to signal loss, high noise, or unphysical values? Specify the cause and manifestation.
Line 326: Be consistent with error metrics – mix of K, %, hPa hinders interpretation. Prefer using physical units consistently (RMSE in hPa/K), and clarify if % is used (relative to what?).
Line 347: Only 300 days of data were used, though C2 has been active since 2019. Please justify the short training period - limited computational capacity, data availability?
Line 358: Drawing conclusions from only 3 RAOB–RO pairs is insufficient. Clearly state the illustrative nature of these examples and consider showing a “failure case” to discuss model limitations.
Line 362: Direct comparison with Lasota (2021) is needed. Present numerical values side-by-side and discuss why AROMA performs worse for humidity (Q).
Since Lasota (2021) focused on tropics/subtropics, your RMSE (2.1/1.9/1.0) vs hers (1.9/1.9/0.5) should be broken down by latitude to make a fair comparison.
Line 380: As in earlier sections, the conclusions emphasize low bias but omit consistent RMSE reporting. Replace vague claims ("good agreement", "very satisfactory results") with quantitative metrics and exact values.
Line 402: The "future work" section mentions obvious extensions (longer period, more missions) without explaining why they were not done here. No new architectures, tuning strategies, or validation steps are proposed.
Line 410: Critical limitations are not addressed:
- Use of CDAAC 1DVar (based on ECMWF) may contradict the goal of independence from NWP.
- Performance in lower troposphere is not isolated.
- No insight into model robustness in sparse-data regions.
Recommendation
While the manuscript touches on an important topic - data-driven retrieval of RO profiles - it falls short in several key areas: originality, clarity of methodology, proper evaluation practices, and rigorous analysis. The method presented here does not outperform existing ones and the contribution is incremental at best. If the authors address the methodological gaps, improve evaluation rigor, and provide a more compelling future outlook, the paper could become a valuable contribution. In its current form, however, major revisions are needed.
Citation: https://doi.org/10.5194/egusphere-2025-2767-RC1
- RC2: 'Comment on egusphere-2025-2767', Anonymous Referee #2, 14 Aug 2025
Review of "Retrieval of thermodynamic profiles in the lower troposphere from GNSS radio occultation using deep learning" by Aichinger-Rosenberger and Sjoberg
General comments
The manuscript presents an artificial neural network (ANN) approach for the retrieval of temperature, pressure, and specific humidity profiles in the neutral atmosphere, using a proposed framework called Advancing the GNSS-RO retrieval of atmospheric profiles using MAchine-learning (AROMA). Model training is based on a large dataset of profiles from COSMIC-2, commercial data from Spire, and CDAAC 1D-Var products, which serve as target values for pressure, temperature, and specific humidity. Validation is carried out against 1D-Var profiles from CDAAC (a set not used during training), as well as ERA5 reanalyses, radiosondes, and commercial data from PlanetiQ receivers. While the authors report generally small errors and high correlation values when comparing model outputs to validation datasets, the analysis of the results lacks depth and the comparison to previous studies remains unclear.
I believe a comparison against previous studies is crucial. Including RMSE results would facilitate comparison with works such as Lasota et al. (2021) (while acknowledging differences between the studies). Although some quantitative values are provided, I find the analysis of results somewhat vague. The manuscript frequently uses terms like “agree very well,” “very high agreement,” and “similar overall performance” without specifying the range of agreement or indicating whether the differences are statistically significant. An assessment of which specific regions may be performing better or worse could also be beneficial.
In addition, the authors state that a key advantage of their approach is its independence from external meteorological data. The model is trained using 1D-Var products as reference data, which are themselves based on ECMWF background fields, meaning that the training data is not completely independent. The authors comment on this in the Conclusions section but still claim the AROMA’s main advantage is the independence of external meteorological data. I think this limitation should be more clearly acknowledged and articulated in the manuscript.
Overall, the paper is well written and well structured. I think it is encouraging to see incremental progress in the application of machine learning techniques to GNSS-RO data. The study is relevant and contributes to the field by expanding the training dataset in terms of size and time coverage, incorporating commercial GNSS-RO data for training and validation. However, I think the presentation and discussion of the results needs to be improved and the novelty better articulated. Therefore, I suggest major improvements are needed before publication.
Specific comments
L10: “from both internal and external validation”. Is this a common terminology for ML studies?
L24-26: The study explores an alternative approach to retrieving thermodynamic profiles from RO; however, the importance and practical utility of these profiles are barely discussed.
L54: The term “internal and external validation” is introduced for the first time in the main body of the manuscript. Apologies if I am unfamiliar, but this doesn’t seem to be a commonly used terminology. If authors choose to keep it, it would be helpful to briefly define it here for clarity.
L62: How similar or different is your ANN compared to previous studies?
L67: I recommend including a brief outline of the paper at the end of the Introduction section. This would help guide the reader and clarify the flow of the manuscript, especially given the multi-step nature of the proposed methodology.
L115: RAOB is defined twice in the manuscript but is not used consistently throughout the document. Please revise.
L118: Could the authors clarify the rationale for choosing a 500 km collocation distance for RO observations in this study? This value appears notably larger than the 200 km distance that is typically used in the literature.
L136: Please provide a citation for the constant terms.
L164: With regard to the ANN, can you explain why you chose a feed-forward multilayer perceptron (MLP) over more modern alternatives? Could more complex or structured architectures (e.g., CNNs, RNNs) be better suited?
L165: The statement “ANNs are supervised neural networks,” is not correct. Please revise to reflect that ANNs can be used in both supervised and unsupervised learning contexts.
L175: With regard to hyperparameter tuning, this seems to be slightly more explored in Section 3.3 on model setup. However, I think more context should be provided on how this is done and what is the practice in other studies.
L182-188: In general, there is a lack of justification for the chosen ANN architecture and the hyperparameter tuning process. What is the reasoning behind the parameter values presented in Table 1? Would it have been feasible to test larger batch sizes or a greater number of epochs? What limitations have you encountered? It would be helpful if the authors could comment on why this particular combination was the most successful, as this insight could be valuable for future work in this area. Additionally, providing figures or metrics to support these results would strengthen the manuscript.
L194: signal-to-noise ratio (SNR) is defined in L74. Please use it accordingly.
L227: Is there a reason why the top height level is 20 km? Do climate and other studies that use retrieved thermodynamic RO profiles only use data up to this height?
Section 4: As noted in my general comments, it would be helpful to use a more specific metrics range instead of vague terms. Providing actual ranges for the reported agreement would make the analysis clearer and more informative.
L259: There is a negative bias above 12 km in both temperature and pressure in AROMA. Is this observed in other studies as well? Can the authors comment on what could be causing these biases? Are there specific regions contributing to them?
Figure 1, 2, 3, 4, 6, and 7: All these figures could benefit from adding a letter identification. Also, I think these plots would benefit from adding the RMSE profiles in addition to or instead of the STD. Confidence intervals would also be helpful to see in these plots. What binning size is used?
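For the confidence intervals, a per-bin bootstrap would suffice. A minimal sketch with synthetic errors; the 0.5 km bin width is my assumption, since none is stated in the manuscript:

```python
import numpy as np

rng = np.random.default_rng(0)
alt = rng.uniform(0.5, 20.0, size=20000)                      # km
err = 0.1 * alt / 20 + rng.normal(scale=0.5, size=alt.size)   # retrieval - reference

edges = np.arange(0.5, 20.5, 0.5)                             # 0.5 km bins
for lo, hi in zip(edges[:-1], edges[1:]):
    e = err[(alt >= lo) & (alt < hi)]
    # resample with replacement to get a 95% CI on the binned bias
    boot = np.array([rng.choice(e, size=e.size).mean() for _ in range(1000)])
    lo_ci, hi_ci = np.percentile(boot, [2.5, 97.5])
    print(f"{lo:4.1f}-{hi:4.1f} km: bias={e.mean():+.3f}  95% CI=[{lo_ci:+.3f}, {hi_ci:+.3f}]")
```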
Table 4: not referenced in the manuscript. Please revise.
L331: “instabilities in the retrieval.” Can you clarify what this means?
Citation: https://doi.org/10.5194/egusphere-2025-2767-RC2
- RC3: 'Comment on egusphere-2025-2767', Anonymous Referee #3, 19 Aug 2025
The manuscript presents a DL-based retrieval method (AROMA) for RO atmospheric profiles. It offers an alternative to the current 1D-Var methods used in many RO data processing centers. This data-driven retrieval could be very useful given its computational efficiency and independence from NWP background data. However, there are several major flaws in the experimental design, validation methods, result interpretation, and presentation. In particular, the metric chosen for many evaluations is not scientifically meaningful, and the resulting statements are therefore not valid.
Major comments:
- Given that AROMA is a data-driven method, the amount of data used is very important. I do not understand why the dataset is limited to 300 days in 2022 and only two missions. What is the reason for not using data from a longer period or additional missions? This does not make much sense considering the data-driven topic and low computational cost.
- Further, given the variations among missions in terms of data size, penetration, global distribution, etc., extending the training over a longer period would allow for a more interesting exploration of performance dependence on mission, latitude, SNR, and other factors.
- L76, L225–230:
- how many profiles are left after the QC (e.g., quality flag QC, 0.5 km penetration QC, etc.)?
- “over 50% of profiles reaching below 200 m above the Earth’s surface.” This is for C2; what about the penetration of Spire? Since the study aims to retrieve GNSS-RO thermodynamic variables in the lower troposphere, why do the authors cut the retrieval at 500 m, given that the 1D-Var retrieval produces as much information as possible? The authors are expected to address this issue with a larger training data set. Also, a figure could help answer these questions.
- “… provided approximately 8000–10000 profiles per day in January 2021”. The number for the study period should be mentioned.
- The presentation could be improved, particularly in the introduction and description of the data, given that many different types are involved in this study. I found it difficult to follow. For example, the authors use CDAAC, wetPf2, and 1D-Var retrieval interchangeably. In Section 2.1, the authors should clearly introduce what CDAAC, atmPrf, wetPf2, 1D-Var, bending angle profiles, and thermodynamic profiles are, and then use these terms consistently in the following sections.
- Please have panel labels for all multi-panel figures.
- Figures 1-4:
- I understand that errors in bending angles are normalized by the observations and expressed as percentages, but reporting percent error in temperature or pressure is unusual and misleading. Think about the observation error specification of RO and radiosonde temperature in data assimilation. For example, the C2 STDV (Figure 2, top middle panel) at 2 km is about 0.7 K and 1.5 K for CDAAC and AROMA, respectively. Reporting AROMA’s error as ~0.5% at 2 km (Figure 2, bottom middle panel) is therefore misleading. A more appropriate approach is to calculate the change in STDV relative to a reference STDV, i.e., the STDV difference between the two methods normalized by CDAAC’s STDV (a worked version of this metric is given after these major comments). In addition, the significance of such differences should be included in the revision.
- Therefore, many statements are not valid, and I only list some of them:
- L276, “In general, the results indicate very similar overall performance of both retrievals, with slightly lower STD values for CDAAC throughout the entire domain.” The differences between CDAAC and AROMA are large. For example, the temperature STD difference for Spire is about 0.8 K and 1.2 K at 4 km and 2 km respectively.
- L 293, “They depict a very high agreement between RAOB and RO profiles in general, with bias and STD values for pressure and temperature seldom exceeding 1%.” Due to the inappropriate metric used, it is not a scientifically meaningful statement.
- L382, “In terms of relative errors, both pressure and temperature retrievals show small deviations to CDAAC, with bias values below 0.25% and STD values between 0.25 - 1%.”
- It is inconsistent that some figures show STDV values while the tables provide RMSE (e.g., Table 5 vs. Figs. 2-4).
- The authors present figures for C2, Spire, and PlanetiQ, but provide no discussion of PlanetiQ. What is the purpose of Figure 4? I note that PlanetiQ’s STDV is, on average, the largest among the three missions for AROMA. The authors should clearly describe these figures and provide a scientific explanation for these differences.
- Figure 5, the authors should explicitly describe what the colors represent in the caption.
- L289, I do not understand why only limited RAOB data are used here. Why don’t the authors use a larger dataset for verification? Again, a significance level is needed.
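To make the metric proposed in the Figures 1–4 comment explicit, a scale-aware comparison would report

$$\Delta\sigma\ [\%] = 100 \times \frac{\sigma_{\mathrm{AROMA}} - \sigma_{\mathrm{CDAAC}}}{\sigma_{\mathrm{CDAAC}}},$$

so, using the numbers quoted above, the C2 temperature STDV at 2 km (1.5 K for AROMA vs. 0.7 K for CDAAC) corresponds to $\Delta\sigma \approx +114\%$, i.e., roughly a doubling of the random error, rather than the seemingly benign ~0.5% read off the relative-error panel.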
Technical issues/Specific comments
L13: RO -> RO retrieval process.
L21: Over the last decades?
L22 & many more: remove “()” in the reference citations. (e.g. Ruston et al. (2022)) -> (e.g. Ruston et al. 2022)
L25: remove the extra “)”
L40: what exactly do the authors mean by “RO product”?
L56: water vapor pressure?
L75: “Large” is not proper for penetration.
L109: spatial -> horizontal; please also mention the vertical resolution.
L116, 193, 195, etc.: please use the acronyms (e.g., RAOB) if they are already defined.
L118–119, is there any justification to the 500 km/3h collocation threshold? Maybe some explanation or citation?
L132, Eq.1: all the parameters in the equation should be described.
L135, Eq.2: This classic equation should be given citation(s).
L149, Eq.3: x_b is usually used to represent the background vector, and capital B (similar to O) usually represents the background error covariance matrix.
L150: it is “H” not H[x] that denotes a forward operator.
L155: ROM SAF
L165: it is not fully correct. Maybe something like “ANNs are used as supervised neural networks in this study, which map input…”
L176: it looks a bit weird to have the ANN and ML studies in the introduction section and then again here. These could be reorganized to improve the readability.
L221–224, It does not seem like a pre-processing. The authors only used profiles with “good” quality flags. There was not any processing involved.
L226, the coordinate of the features is IMPH; how is it related to MSLH?
L251–252: This sentence feels weird. Maybe change to “Figure 1 shows ….”
L 271: I do not think it is an “experiment” here.
L 324, I do not see the use of atmPrf in any evaluation.
L 324, 300000->300,000
Citation: https://doi.org/10.5194/egusphere-2025-2767-RC3