Measurement Report: Quantifying the Trade-off Between Station Number and Spatial Layout in Sparse GNSS Networks for Calibrating All-Weather FY-4A Precipitable Water Vapor

Ma, Yongchao; Chen, Zhengsheng; Liu, Tong; Yu, Zhibin; Wang, Zhihao

doi:10.5194/egusphere-2026-1149

08 May 2026

Abstract. Integrating satellite-derived precipitable water vapor (PWV) provides data with high spatiotemporal resolution, which is crucial for monitoring and forecasting extreme weather. However, current fusion and calibration methods typically rely on dense GNSS networks, hindering their application in data-sparse regions. It remains unclear whether improving calibration under sparse conditions depends more on increasing the number of stations or on optimizing their spatial placement. To address this, we developed a machine learning-based calibration framework for FY-4A all-weather PWV and conducted controlled experiments across China. Our key finding is that, for a fixed station budget, a spatially random layout consistently outperforms clustered or geographically biased distributions, reducing RMSE by up to 27 %. Increasing station density improves spatial generalization, with RMSE at independent stations dropping from 3.24 mm to 2.28 mm and bias converging toward zero, but performance gains saturate beyond approximately 120–160 stations. Spatially, errors under sparse, non-uniform networks concentrate in regions with strong humidity gradients or complex terrain, whereas a uniform layout distributes errors more evenly. Temporally, all calibrated models capture seasonal cycles, with residual errors peaking in summer due to convective activity. This study demonstrates that, in sparse network design, maximizing the uniformity of spatial coverage is more critical than simply adding stations. We thus provide a transferable framework and a quantitative principle for generating reliable satellite PWV products where GNSS observations are limited.

Received: 28 Feb 2026 – Discussion started: 08 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-1149', Anonymous Referee #3, 19 May 2026
Overall Assessment
This paper investigates the GNSS calibration of the FY-4A all-weather (or all-sky) precipitable water vapor (PWV) product. Using a random forest framework, it systematically conducts controlled experiments within mainland China that focus on two variables: the “number of training sites” and the “spatial layout”. The paper’s core contribution, which explicitly proposes the quantitative principle of “layout over density” under sparse GNSS network conditions and indicates that performance tends to saturate at approximately 120–160 stations, is of strong practical significance for engineering applications and also has methodological value. It provides useful insights for calibrating satellite-based PWV products in data-sparse regions (such as the Tibetan Plateau, Africa, and oceanic islands). The paper has a complete structure and a logically clear experimental design, and its conclusions are generally supported by the figures and tables. The cited literature covers the main advances in MODIS/FY series PWV calibration over the past five years. However, the manuscript requires important revisions in the following areas before it reaches a publishable standard.
Specific Comments
The definitions of the spatial layout experiments (clustered/surrounding/random) and of the corresponding validation sets are unclear.

The meaning of the subplot labels (a/a1, b/b1) in Figures 3 and 4 needs to be explained more clearly in the main text. Furthermore, the color bar ranges and unit labels in Figures 5–7 suffer from overlap and occlusion; layout adjustments are recommended.

Lines 13–16: "a spatially random layout consistently outperforms clustered or geographically biased distributions, reducing RMSE by up to 27%." It is suggested to clarify against which baseline (clustered or surrounding) this 27% reduction is measured, and to state that this was obtained under the same validation set.

Lines 7–60: The literature review mentions that "the number of GNSS stations is a key prerequisite," but lacks a review on whether the variable "spatial layout" has been discussed in existing studies. It is suggested to add 1–2 references on geostatistical sampling design (e.g., works by Wadoux et al., Brus & de Gruijter) or studies on the spatial representativeness of training samples in machine learning, to better highlight the novelty of this work.

Lines 74–78: The description of the research objective is lengthy. It is suggested to split it into two layers: the "scientific question" (how do site number and layout independently affect calibration performance) and the "engineering objective" (providing configuration strategies under sparse networks), to enhance logical clarity.

Lines 89–93: 244 GNSS stations are used as the training set, and 16 GNSS stations as the independent validation set—how were these 16 stations randomly selected? Was spatial stratification considered (e.g., ensuring representation from eastern, central, western regions, and the Tibetan Plateau)? Please explain the sampling strategy.
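To make the sampling question concrete, the following is a minimal sketch (not the authors' procedure) of a spatially stratified draw of validation stations. The region labels, the per-region count, and the function name are hypothetical; with four regions and four stations per region, it yields a 16-station validation set and leaves the rest for training.

```python
import random
from collections import defaultdict


def stratified_validation_split(stations, n_per_region, seed=0):
    """Draw up to n_per_region validation stations from each region.

    stations: list of (station_id, region) tuples. Region labels such as
    "east", "central", "west", "plateau" are hypothetical examples.
    Returns (sorted validation ids, training ids).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_region = defaultdict(list)
    for sid, region in stations:
        by_region[region].append(sid)

    validation = []
    for region in sorted(by_region):
        ids = sorted(by_region[region])
        k = min(n_per_region, len(ids))  # small regions contribute what they have
        validation.extend(rng.sample(ids, k))

    val_set = set(validation)
    training = [sid for sid, _ in stations if sid not in val_set]
    return sorted(validation), training
```

Reporting the seed and the region boundaries alongside such a split would let readers reproduce the exact validation set.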

Lines 118–120: The reference for the Saastamoinen model should be Saastamoinen (1972), not Vedel et al. (2001). Vedel et al.'s work is an evaluation of the model, not its original proposal. Please verify and correct.
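For reference, the zenith hydrostatic delay (ZHD) term of the Saastamoinen model, in the form commonly quoted in GNSS meteorology, can be sketched as follows. Pressure is in hPa, latitude is in degrees, and height is in metres (converted to km inside the function); the result is in metres. The function name is illustrative.

```python
import math


def saastamoinen_zhd(pressure_hpa, lat_deg, height_m):
    """Zenith hydrostatic delay (m) from surface pressure, latitude and height."""
    lat = math.radians(lat_deg)
    # Gravity-variation factor: depends on latitude and on height in km.
    f = 1.0 - 0.00266 * math.cos(2.0 * lat) - 0.00028 * (height_m / 1000.0)
    return 0.0022768 * pressure_hpa / f
```

At standard sea-level pressure (1013.25 hPa) and 45° latitude, this gives a ZHD of about 2.307 m.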

Lines 161–166: The input features for the RF reconstruction model (Stage I) include Lat, Lon, DEM, Time, T2m, SP, TCW, but not NDVI and land cover type (LCT); whereas the Stage II calibration model incorporates NDVI and LCT. Please explain the rationale behind this feature selection—why are surface parameters not needed in the reconstruction stage, but are required in the calibration stage?

Line 175: The calibration model formula uses the "reconstructed FY-4A PWV" as an input, but there is a lack of discussion on how the inherent uncertainty of this reconstructed product (from Stage I) propagates to Stage II. Has the error accumulation of the two-stage RF been assessed? Please add relevant analysis or discussion.
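One simple way to frame the requested error-accumulation analysis is a first-order estimate. This sketch assumes, hypothetically, that the Stage I error enters Stage II with a linear sensitivity and is uncorrelated with the Stage II residual. Under those assumptions the two contributions add in quadrature; a real analysis would estimate the sensitivity empirically, for example by perturbing the reconstructed input.

```python
import math


def combined_rmse(stage1_rmse, stage2_rmse, sensitivity=1.0):
    """First-order end-to-end RMSE for a two-stage model.

    Assumes (hypothetically) a linear sensitivity of the Stage II output
    to the Stage I input error and no correlation between the two errors.
    """
    return math.sqrt((sensitivity * stage1_rmse) ** 2 + stage2_rmse ** 2)
```

If the sensitivity is close to zero, Stage I uncertainty barely matters. If it is close to one, a Stage I RMSE comparable to the Stage II residual would noticeably inflate the final error.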

Lines 195–197 (Table 1): The table layout is confusing; entries like "200/Surrounding" are difficult to interpret. It is suggested to split it into two separate tables or redesign the column structure for better readability.

Lines 231–235 (Figure 4): The FY-4A original product has a Bias of −2.17 mm and an RMSE of 3.93 mm. Please clarify the baseline for these metrics—specifically, which time period and what matching conditions they are based on—and state this in the main text.
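The "matching conditions" the reviewer asks for usually comprise a spatial rule and a temporal rule. The sketch below shows one possible pair of such rules. It assumes, hypothetically, that the FY-4A PWV has been resampled to a regular latitude-longitude grid whose cell (0, 0) is centred at (lat0, lon0), and that a ±15 min time window is used; neither assumption is stated in the manuscript.

```python
from datetime import datetime, timedelta


def nearest_cell(lat, lon, lat0, lon0, step):
    """Row/column of the regular-grid cell whose centre is nearest (lat, lon)."""
    return round((lat - lat0) / step), round((lon - lon0) / step)


def match_times(gnss_times, sat_times, tolerance=timedelta(minutes=15)):
    """Pair each GNSS epoch with the closest satellite epoch within tolerance.

    sat_times must be non-empty; unmatched GNSS epochs are dropped.
    """
    pairs = []
    for t in gnss_times:
        best = min(sat_times, key=lambda s: abs(s - t))
        if abs(best - t) <= tolerance:
            pairs.append((t, best))
    return pairs
```

Stating both rules, together with the time period, would make the −2.17 mm and 3.93 mm baselines reproducible.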

Lines 297–305: What is the basis for the north-south division into "black-box regions"? Is it based on geographical boundaries (e.g., the Qinling-Huaihe Line) or statistical differences in water vapor variability? Please clarify the division criteria and annotate the boundaries in the figure.

Lines 329–344 + Table 4: This is the most critical concern in this review. In the three layout experiments, the validation sets are Area 1, Area 2, Area 3, and "remaining sites" respectively, resulting in three RMSE values that are not on the same comparison baseline. The conclusion that "Random is 27.08% lower than Clustered" needs to be recalculated using the same independent validation set (e.g., the 16 independent GNSS stations or the 80 IGRA stations). Please rerun this experiment and report the corrected results.
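The reviewer's point can be expressed as a rule: a percentage RMSE reduction is only meaningful if both models are scored against the same reference values. The following is a minimal sketch of that check. Function names are illustrative, and the numbers in any usage are toy values rather than results from the manuscript.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def relative_reduction(rmse_baseline, rmse_candidate):
    """Percent RMSE reduction of the candidate relative to the baseline."""
    return 100.0 * (rmse_baseline - rmse_candidate) / rmse_baseline


def compare_layouts(predictions, reference):
    """Score every layout on the SAME validation stations.

    predictions: layout name -> predictions at the stations in `reference`.
    """
    for name, pred in predictions.items():
        if len(pred) != len(reference):
            raise ValueError(f"{name} was not evaluated on the shared validation set")
    return {name: rmse(pred, reference) for name, pred in predictions.items()}
```

With a shared reference, for example the 16 independent GNSS stations, "Random versus Clustered" becomes `relative_reduction(scores["Clustered"], scores["Random"])`.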

Lines 378–391 (Figure 8): Three GNSS stations are selected for time-series comparison. What are the latitude, elevation, and climate zone of these three stations? Are they representative? It is suggested to at least add a table listing the basic information of the selected sites.

Figure 6: The color bar label "ERRO(mm)" should be changed to "Error (mm)"; the "PWV(mm)" and "ERRO" color bars overlap and occlude each other; separate layout is suggested.

Figure 7: The three-row, three-column structure is good, but the naming "RF_Area_1/2/3" is suggested to be changed to "RF_Clustered / RF_Surrounding / RF_Random" for better readability.

Lines 55–56: "Ma et al. reported that when only 14 GNSS stations in the Tibetan Plateau were used…"—the citation is missing the year; please add it.

Line 230: Incorrect figure number citation ("Figs. 2 and 3 and Table 2" should be "Figs. 3 and 4"); please correct.

Throughout the text, the terms "all-weather" and "all-sky" are used interchangeably; it is recommended to unify them to a single term.
Citation: https://doi.org/10.5194/egusphere-2026-1149-RC1
- AC2: 'Reply on RC1', Yongchao Ma, 14 Jul 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1149/egusphere-2026-1149-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-1149-AC2
- AC4: 'Reply on RC1', Yongchao Ma, 14 Jul 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1149/egusphere-2026-1149-AC4-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-1149-AC4
RC2:
'Comment on egusphere-2026-1149', Anonymous Referee #4, 19 May 2026
This paper addresses the key issue that current fusion and calibration methods for satellite-derived precipitable water vapor (PWV) products rely excessively on dense GNSS networks. Taking FY-4A all-weather PWV as the research object, it constructs a two-stage (reconstruction + calibration) machine learning framework based on random forest (RF). The study quantifies the impact of the number and spatial layout of training stations on model accuracy, spatial generalization ability, and temporal stability across mainland China. An actionable conclusion is drawn: “given a fixed station budget, the importance of a spatially uniform random layout outweighs simply increasing the number of stations,” and a performance saturation threshold of approximately 120–160 stations is identified. Overall, the scientific problem is clearly defined and of practical value, the data foundation is solid, and the experimental design is rigorous. The methodology is generally reasonable, and the conclusions have some general significance. However, further refinement is needed in conceptual rigor and writing quality.
The geometric definitions of the three layouts in the spatial arrangement experiment (does “surrounding” refer to surrounding the validation station or the study area?) are unclear. Whether the number of stations is strictly identical across the three layouts is not specified. The reconstruction stage directly cites Wang et al. (2026) without providing necessary accuracy information, affecting the interpretability of the calibration stage results.

There are errors in figure citation (e.g., line 253: “Figs. 2 and 3” should be “Figs. 3 and 4”), and some English expressions are not sufficiently accurate.

In line 166, the authors state: “Detailed validation of the reconstruction has been reported in previous studies (Wang et al. 2026) and is not repeated here.” However, reconstruction accuracy directly affects the input quality of the calibration stage and consequently all downstream conclusions. It is recommended to at least report the overall RMSE/Bias/R² of the reconstructed FY-4A PWV relative to GNSS PWV in a table.

The current table mixes hyperparameters for different station counts (30–244) with three spatial layouts in two columns, resulting in poor readability. It is recommended to split this into two separate tables or adopt a clearer multi-column structure.

Line 192 mentions using Bayesian optimization with 5 iterations for hyperparameter search, but the search space (e.g., range of tree numbers, maximum depth, minimum samples per leaf node) is not specified. Five iterations are generally insufficient for Bayesian optimization. Please justify this choice or extend the iteration count and report hyperparameter convergence.
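To illustrate why the search space matters when judging five iterations, here is a toy count. The ranges below are hypothetical placeholders, not the authors' settings. Bayesian optimization does not enumerate a grid, but the count shows how small a fraction of even a coarse discretized space five evaluations can probe.

```python
import math

# Hypothetical RF search space; the ranges are illustrative only.
SEARCH_SPACE = {
    "n_estimators": list(range(100, 1001, 100)),  # 10 values
    "max_depth": list(range(5, 31, 5)),           # 6 values
    "min_samples_leaf": list(range(1, 11)),       # 10 values
}


def coverage_fraction(space, n_iterations):
    """Return (number of grid points, fraction probed by n_iterations)."""
    size = math.prod(len(values) for values in space.values())
    return size, n_iterations / size
```

For this toy space, five evaluations cover 5 of 600 combinations (under 1 %). Reporting the actual ranges and a convergence trace of the objective would address the reviewer's concern directly.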

Figure 4 uses bar charts to display Bias and RMSE for 16 independent validation stations and 6 models, resulting in extremely high information density. Moreover, the negative bias bars of the raw FY-4A data obscure the details of the calibrated models. Suggestions: (i) Plot the raw FY-4A results separately or use a logarithmic scale; (ii) Add a cross-reference between station numbers and geographic locations (can be annotated in Fig. 1b); (iii) Supplement the standard deviation across the 16 stations for each model to characterize spatial consistency.
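Suggestion (iii) amounts to summarizing per-station scores across stations. A minimal sketch follows; the input structure (station id mapped to (predicted, reference) PWV pairs in mm) is an assumption, and the sign convention is predicted minus reference.

```python
import math
import statistics


def station_metrics(pairs):
    """Bias and RMSE (mm) for one station from (predicted, reference) pairs."""
    diffs = [p - r for p, r in pairs]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


def spatial_consistency(per_station):
    """Mean and sample standard deviation of station RMSE across stations."""
    rmses = [station_metrics(pairs)[1] for pairs in per_station.values()]
    return statistics.mean(rmses), statistics.stdev(rmses)
```

A smaller standard deviation across the 16 stations would indicate more spatially uniform performance, which is the property the layout experiments aim to improve.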

Table 2 shows that RF_120 achieves better Bias (0.02 mm) and RMSE (2.36 mm) than RF_244 (−0.14 mm, 2.37 mm). This phenomenon contradicts the intuition that “more training samples lead to better performance.” A discussion of this phenomenon is recommended at the end of Section 4.1.

The future outlook in lines 436–438 is too general. It is recommended to add specific directions, such as: (i) transfer learning validation in truly sparse regions (e.g., Tibetan Plateau, Sahara, ocean islands); (ii) comparison with satellite–radiosonde data assimilation methods.

In the abstract, “hindering application” should be “hindering its application”; “relies” should be “rely.”

The manuscript mixes terms such as “training station number,” “training sample size,” and “station density.” It is recommended to unify the terminology.

The titles of Figures 5, 6, and 7 are too brief and lack key information. It is recommended to clearly state in each title what each subfigure (a, b, c, etc.) corresponds to, so that the figures can be understood independently.

In Table 1, expressions such as “200/Surrounding” do not conform to table formatting standards. It is recommended to rename the “Number of Station” column to “Configuration” and change “200/Surrounding” to a more standard format like “200 (Surrounding).”
Citation: https://doi.org/10.5194/egusphere-2026-1149-RC2
- AC3: 'Reply on RC2', Yongchao Ma, 14 Jul 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1149/egusphere-2026-1149-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-1149-AC3
- AC5: 'Reply on RC2', Yongchao Ma, 14 Jul 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1149/egusphere-2026-1149-AC5-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-1149-AC5
RC3:
'Comment on egusphere-2026-1149', Anonymous Referee #2, 24 May 2026

This manuscript focuses on the calibration of FY-4A all-weather precipitable water vapor (PWV) products under sparse GNSS network conditions, and systematically analyzes the effects of training station number and spatial layout on model accuracy, spatial generalization capability, and temporal stability. The research topic is of practical significance, and the experimental design is relatively clear. The results indicate that, under limited GNSS station availability, spatial representativeness and coverage uniformity play an important role in improving model generalization. Overall, the manuscript is well structured, the data sources and validation strategy are reasonable, and the conclusions have potential application value. It is recommended that the manuscript be accepted after minor revisions concerning methodological details, interpretation of results, figure presentation, and language/formatting issues.
Comment 1: Further highlight the core innovation of the manuscript
The authors are encouraged to further highlight the core innovation of the manuscript. The current manuscript provides a relatively complete description of the research background and experimental results, but the main contributions of the study are somewhat scattered. It is suggested that the authors add a more focused summary of the contributions at the end of the Introduction. Specifically, the authors should clearly state that this study is not only intended to construct a GNSS-constrained FY-4A PWV calibration model, but more importantly, to quantitatively evaluate, through controlled experiments, the effects of GNSS training station number and spatial layout on model accuracy, spatial generalization capability, and temporal stability. The proposed station configuration principle, namely prioritizing spatial representativeness and coverage uniformity under limited station availability, should also be emphasized.
Comment 2: Clarify the resolution harmonization and scale consistency of auxiliary variables
In Section 2.2, some variables were subjected to resolution harmonization and scale transformation. The authors are advised to provide further details in the data processing or methodology section regarding how variables such as DEM and NDVI were spatially matched to the FY-4A PWV grid. Adding these explanations would help improve the transparency of the data processing workflow and enhance the rationality and reproducibility of the model input design.
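As one example of the kind of detail requested, the sketch below aggregates a fine-resolution continuous field (such as DEM or NDVI) to a coarser target grid by averaging non-overlapping blocks. This is an assumed, illustrative approach; it presumes the fine grid is aligned with the target grid at an integer factor. Categorical layers such as land cover type would normally use a majority (mode) rule rather than a mean.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            block = [grid[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

For a 4 x 4 grid and a factor of 2, each output cell is the mean of one 2 x 2 block.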
Comment 3: Provide more details on the construction of the three spatial layouts
The authors are encouraged to provide more details on how the three station spatial layouts were constructed. Section 3.3 mentions three station distribution patterns: random, surrounding, and clustered. However, it is still unclear how these layouts were generated. For example, it should be clarified whether the random layout was generated through repeated random sampling and whether the averaged results were reported, how the spatial clustering range of the clustered layout was determined, and whether the same number of training stations was strictly maintained among the three layouts. The authors are advised to supplement these methodological details and explain how the corresponding validation stations were determined, in order to enhance the reproducibility of the experiments and the reliability of the conclusions.
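The following sketch shows how two of the layouts could be generated with an identical station count, so that the points the reviewer raises become explicit parameters. It covers a clustered layout (the n stations nearest a chosen centre) and a random layout (repeated independent draws whose scores are averaged). The cluster centre, the number of repeats, and the function names are hypothetical; the "surrounding" layout is omitted because its geometry is exactly what the reviewers ask the authors to define.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def clustered_layout(stations, n, center):
    """The n stations (id, lat, lon) nearest a hypothetical centre (lat, lon)."""
    return sorted(stations, key=lambda s: haversine_km(s[1], s[2], *center))[:n]


def random_layouts(stations, n, repeats, seed=0):
    """`repeats` independent random draws of n stations, to be scored and averaged."""
    rng = random.Random(seed)
    return [rng.sample(stations, n) for _ in range(repeats)]
```

Because both functions take the same `n`, the training budget is equal across layouts by construction.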
Comment 4: Improve the presentation of figures, equations, and language
The overall structure of the manuscript is clear, and the figures, tables, and equations generally support the methodological description and experimental analysis. However, there are still some minor issues related to figure presentation, equation formatting, and formatting consistency. The authors are advised to further check and revise these aspects to improve the readability, standardization, and overall consistency of the manuscript.
(1) Some figures contain a large amount of information, especially multi-panel result figures. It is suggested that the authors further standardize the size and layout of the figures. For example, the aspect ratios of the subplots in Figures 6 and 7 could be appropriately adjusted to improve the standardization and consistency of the manuscript.
(2) In some figures, the text overlaps with graphical elements. The authors are advised to further check and standardize the font size, legends, color bars, axis labels, and subplot numbering, so that comparisons among different station numbers, model results, and spatial layouts can be presented more clearly.
(3) The authors should carefully check whether the equation formatting and variable descriptions are complete. For example, the explanation of the variables in Equation (4) should be further supplemented, and the symbol formatting in Equations (7)–(10) should be kept consistent. This would help avoid ambiguity when readers interpret the evaluation metrics.
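As an illustration of consistent symbol formatting, the commonly used metrics mentioned in the reviews could be written with one shared notation. In the sketch below, the superscripts "cal" and "GNSS", the index i over N matched samples, and the calibrated-minus-GNSS sign convention are assumptions, not the manuscript's Equations (7)–(10). R² is written here as the coefficient of determination; if the squared Pearson correlation is meant instead, that should be stated.

```latex
\begin{aligned}
\mathrm{Bias} &= \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{PWV}^{\mathrm{cal}}_{i}-\mathrm{PWV}^{\mathrm{GNSS}}_{i}\right)\\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{PWV}^{\mathrm{cal}}_{i}-\mathrm{PWV}^{\mathrm{GNSS}}_{i}\right)^{2}}\\
R^{2} &= 1-\frac{\sum_{i=1}^{N}\left(\mathrm{PWV}^{\mathrm{cal}}_{i}-\mathrm{PWV}^{\mathrm{GNSS}}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(\mathrm{PWV}^{\mathrm{GNSS}}_{i}-\overline{\mathrm{PWV}}^{\mathrm{GNSS}}\right)^{2}}
\end{aligned}
```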

Citation: https://doi.org/10.5194/egusphere-2026-1149-RC3
- AC1: 'Reply on RC3', Yongchao Ma, 14 Jul 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1149/egusphere-2026-1149-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-1149-AC1

Yongchao Ma, Zhengsheng Chen, Tong Liu, Zhibin Yu, and Zhihao Wang

Data sets

Calibration model for FY-4A PWV based on different GNSS station network Y. Ma https://doi.org/10.5281/zenodo.18751647

Viewed

Total article views: 331 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
291	24	16	331	19	23

HTML: 291
PDF: 24
XML: 16
Total: 331
BibTeX: 19
EndNote: 23

Views and downloads (calculated since 08 May 2026)

Month	HTML	PDF	XML	Total
May 2026	272	22	15	309
Jun 2026	16	2	0	18
Jul 2026	3	0	1	4

Viewed (geographical distribution)

Total article views: 325 (including HTML, PDF, and XML), of which 325 have a defined geographic origin and 0 are of unknown origin.

Latest update: 14 Jul 2026

Short summary

Current satellite water vapor fusion and calibration methods rely predominantly on high-density ground-based station networks, which makes them unsuitable for sparsely monitored regions. This study is the first to investigate how station number and distribution patterns affect satellite water vapor calibration, offering practical guidance for selecting reference stations for water vapor fusion in sparsely monitored areas.