the Creative Commons Attribution 4.0 License.
Evaluation of high-resolution meteorological data products using flux tower observations across Brazil
Abstract. In the past decade, the scientific community has seen an increase in the number of global hydrometeorological products. This has been possible with efforts to push continental and global land surface modelling to hyper-resolution applications. As the resolution of these datasets increases, so does the need to compare their estimates against local in-situ measurements. This is particularly important for Brazil, whose large continental-scale domain results in a wide range of climates and biomes. In this study, high-resolution (0.1 to 0.25 degrees) global and regional meteorological datasets are compared against flux tower observations at 11 sites across Brazil (for periods between 1999 and 2010), covering Brazil’s main land cover types (tropical rainforest, woodland savanna, various croplands, and tropical dry forests). The purpose of the study is to assess the quality of four global reanalysis products [ERA5-Land, GLDAS2.0, GLDAS2.1, and MSWEPv2.2] and one regional gridded dataset developed from local interpolation of meteorological variables across the country [Brazilian National Meteorological Database (referred to here as BNMD)]. The surface meteorological variables we considered were precipitation, air temperature, wind speed, atmospheric pressure, downward shortwave and longwave radiation, and specific humidity. Data products were evaluated for their ability to reproduce the daily and monthly meteorological observations at flux towers. A ranking system for data products was developed based on the mean squared error (MSE). To identify the possible causes of these errors, further analysis was undertaken to determine the contributions of correlation, bias, and variation to the MSE. Results show that, for precipitation, MSWEP outperforms the other datasets at the daily scale, but at the monthly scale BNMD performs best. For all other variables, ERA5-Land achieved the best ranking (smallest errors) at the daily scale and averaged the best rank for all variables at the monthly scale. GLDAS2.0 performed least well at both temporal scales; however, the newer version (GLDAS2.1) improved on it for almost every variable. BNMD wind speed and GLDAS2.0 shortwave radiation outperformed the other datasets at the monthly scale. The largest contribution to the MSE at the daily scale for all datasets and variables was the correlation contribution, whilst at the monthly scale it was the bias contribution. ERA5-Land is recommended when using multiple hydro-meteorological variables to force land-surface models within Brazil.
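The abstract does not give the exact form of the MSE decomposition. As a minimal sketch, the block below assumes the commonly used identity MSE = (mean bias)² + (difference in standard deviations)² + 2·σ_sim·σ_obs·(1 − r), computed with population standard deviations. The function name `mse_decomposition` and the dictionary keys are illustrative choices and are not taken from the manuscript.

```python
import math


def mse_decomposition(sim, obs):
    """Split the MSE between a product (sim) and observations (obs).

    Assumed identity (not confirmed as the manuscript's formulation):
        MSE = (ms - mo)^2 + (ss - so)^2 + 2 * ss * so * (1 - r)
    where ms/mo are means, ss/so are population standard deviations
    and r is the Pearson correlation.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    ms = sum(sim) / n
    mo = sum(obs) / n
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / n

    # If either series is constant, r is undefined; the correlation term
    # is then zero anyway because it is multiplied by ss * so.
    r = cov / (ss * so) if ss > 0 and so > 0 else 0.0

    return {
        "mse": sum((s - o) ** 2 for s, o in zip(sim, obs)) / n,
        "bias": (ms - mo) ** 2,
        "variability": (ss - so) ** 2,
        "correlation": 2 * ss * so * (1 - r),
    }
```

Dividing each of the three terms by `mse` gives fractional contributions, which is one way to express which term dominates at a given time scale.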
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-883', Anonymous Referee #1, 14 Jun 2025
General Comments
The manuscript presents an evaluation of five gridded meteorological products. The analysis is based on the comparison between those products and the meteorological data from 11 flux towers in Brazil. Overall, the manuscript meets general readability standards, but would benefit from professional proofreading to enhance clarity and polish. Tables and figures are well-prepared. The scientific gap and objectives are clearly stated. Nonetheless, I have some concerns regarding the Materials (Datasets) section. First, using only 11 sites seems insufficient for a study area as large and ecologically diverse as Brazil. Looking at Figure 1, these 11 sites can be grouped into 5 or 6 clusters. For example, as shown in Table 1, K67, K77, and K83 share the same elevation and are located within 50 km of each other. The primary difference among nearby flux tower sites appears to be land cover, which spans only four types: Tropical Rainforest, Croplands, Tropical Dry Forest, and Woodland Savanna (note that, according to Cabral et al., 2015, PDG is classified as Woodland Savanna: https://www.sciencedirect.com/science/article/pii/S2214581815000440).
My point is: why limit the analysis to these specific 11 flux tower sites? In my view, excluding conventional meteorological data would make sense only if the primary aim were to assess the ability of gridded products to represent actual ET. However, based on Section 2.1 of the manuscript, "Variables were included so that reference evapotranspiration could be calculated according to the standard FAO methodology." If that is the case, then incorporating data from the Brazilian National Meteorological Institute (INMET) could significantly increase the number of observation points and improve geographic coverage.
While it's acknowledged that South America has far fewer flux towers compared to North America or Europe, other flux towers in Brazil—available in the AmeriFlux or FLUXNET networks—are not listed in Tables 1 and 2. I strongly encourage the authors to consider including towers located in underrepresented biomes, such as the Caatinga, Pantanal, and Atlantic Forest. In summary, the authors should elaborate more clearly on why only those 11 flux towers were selected.
Methodology
The 80% data availability threshold is a sound strategy. Since flux tower data are being used to evaluate gridded products, only high-quality observed data should be considered; therefore, gap filling should be avoided. It is also crucial to ensure that the datasets share common days for comparison, which appears to be the case here.
Another important point is that performance metrics derived from larger datasets are generally more reliable, with increased statistical significance and reduced uncertainty. Thus, it would be helpful to demonstrate that, despite site differences, the results remain comparable.
Regarding temporal averaging, it is not clear whether the hourly samples retrieved in each iteration were selected randomly. While the use of two-sample K-S tests is appropriate, its efficiency may vary depending on the time of day during which the data gaps occur. For instance, under a 30-minute resolution, a dataset with evenly distributed missing values (e.g., one every hour) is likely to be much smaller than a sample with missing records only at night, for example.
As for the conversion from daily to monthly averages, is using only 50% of the days sufficient? Is there evidence that this threshold does not compromise monthly estimates? For example, if most of the missing days were cloudy, the resulting monthly average could be biased toward sunnier conditions.
Finally, it is unclear why no similar approach was applied to harmonize spatial resolution among the products. Could the authors provide justification or evidence that temporal averaging is more critical than spatial aggregation in this context? Since both observations and products were resampled to a common temporal scale, aligning spatial resolution seems equally important. A comparison between Tables 3 and 4 (particularly at the daily scale) suggests that performance improves with higher spatial resolution. Pixel-to-point comparisons may introduce bias, even for high-resolution products. Therefore, I am concerned about the fairness of comparisons involving coarser-resolution datasets.
In light of the major concerns outlined above, along with the minor issues noted below, I recommend that the manuscript undergo major revisions before it can be considered for publication.
Specific Comments
Abstract
Please clarify what you mean by “downward shortwave and longwave radiation”. Does shortwave refer only to incoming radiation, and longwave to both incoming and outgoing?
Introduction
- L42–43: While I understand the intention, model performance does not inherently depend on the validation of gridded products. Perhaps you meant:
"The validation of gridded weather products is essential to ensure a fair and reliable assessment of model performance."
- L48: “Ability” may be more suitable than “competency” in this context.
- L69–71: Please revise for clarity and punctuation. Suggested revision:
"For example, a 2018 study using data from 11,427 rain gauges across Brazil revealed that the Amazon Basin—which holds 70% of the country’s freshwater—has the lowest gauge density, with only 199 located in the entire state of Amazonas."
Also note that the Amazon Basin holds 70% of Brazil’s freshwater, not the state of Amazonas, which may be misleading. Clarify the distinction.
- L71–72: Revise punctuation.
- L73: Consider replacing "To combat" with "To address".
- L74: This sentence could be improved. Are you saying that both flux towers and eddy covariance stations are types of measurement stations? Note that in L76, eddy covariance is described as a technique used in flux towers.
- L78–80: Suggested revision:
"...however, comparatively less work has been done in data-poor regions like South America."
- L80: Specify the type of gridded products (e.g., remote sensing-based, reanalysis, etc.). There are many evaluations of remote sensing rainfall products. For ET, examples include: 10.1002/2017WR021682; 10.3390/rs14112526; 10.1016/j.isprsjprs.2023.12.001; 10.1016/j.jag.2019.04.009; 10.1007/s00704-025-05406-1;
- L82: The phrase "ecologically different" is unclear. For instance, aren’t K67, K77, and K83 ecologically similar? Same biome ≠ same land use. Consider identifying land use types in Figure 1.
- L85: Please revise:
"Secondly, [...] with observational data for each variable considered."
L87: Suggested alternatives:
(i) "Finally, how do these errors vary spatially and seasonally?"
(ii) "Finally, how do these errors vary by location and season?"
Section 2 – Datasets
In subsections 2.2.1–2.2.4, key information (e.g., time span, spatial resolution) is provided for some datasets but missing for others. Although Table 3 contains these details, consistency across subsections would improve readability.
L190: Ensure numbers are separated from units (e.g., “50 km” not “50km”). Also, define SYNOP, as it is not introduced anywhere in the manuscript.
Section 3.6
L208: Clarify what “the differences” refers to—differences between which datasets, variables, or scales?
L261–264: The observed shift in bias from daily to monthly scale is interesting. Can the authors explain why this occurs?
Section 5.3
L401: Please refer to the figure or table that supports this statement.
Section 5.4
L426: Likewise, cite the relevant figure or table.
L430–433: The claim that larger samples yield more robust correlations is valid. Has such a correlation been demonstrated in the manuscript? Please refer to the relevant figure or table.
Citation: https://doi.org/10.5194/egusphere-2025-883-RC1
AC1: 'Reply on RC1', Jamie Brown, 11 Jul 2025
We would like to thank the reviewer for their careful and constructive assessment of our manuscript. We appreciate the positive comments regarding the scientific objectives, presentation of figures and tables, and the overall clarity of our work. We also appreciate the suggestions made to strengthen the robustness of the analysis, particularly in relation to site selection, data representativeness, temporal averaging, and spatial resolution. We have addressed each point below and made appropriate revisions to the manuscript accordingly. Please find our detailed responses below.
- Selection of Flux Tower Sites
"Using only 11 sites seems insufficient for a study area as large and ecologically diverse as Brazil... authors should elaborate more clearly on why only those 11 flux towers were selected."
We appreciate the importance of this comment. We agree that Brazil’s ecological diversity warrants a wide spatial sampling. At the time we began this study, access to high-quality micrometeorological data was limited. The 11 flux tower sites selected represent those for which full meteorological forcing data were curated and available, including variables required for ET₀ estimation via FAO methodology (radiation, wind speed, humidity, etc.). We had access to two additional sites, CAX (Caxiuanã, a riverine tropical rainforest site in the state of Para) and USE (Usina Santa Eliza, a sugarcane site in the state of Sao Paulo), which were rejected because they had data quality issues or lacked sufficient data coverage to meet the quality control threshold.
Although some biomes are omitted (i.e. Caatinga and Pantanal), we feel that the largest biomes (particularly the Amazon and the Cerrado), as well as croplands and grasslands/pastures, are represented. This omission will be stated explicitly in the revised manuscript as a limitation of our study, and we encourage other studies to account for these biomes.
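For readers unfamiliar with why radiation, wind speed and humidity are all needed, the sketch below shows a daily reference evapotranspiration calculation. It assumes that "FAO methodology" means the FAO-56 Penman-Monteith equation, which the reply does not state explicitly. The function names and the input values used in the usage note are illustrative and are not flux tower data.

```python
import math


def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_daily_et0(t_mean, u2, rn, es, ea, pressure, g=0.0):
    """Daily FAO-56 Penman-Monteith reference evapotranspiration (mm/day).

    t_mean   : mean air temperature (deg C)
    u2       : wind speed at 2 m height (m/s)
    rn, g    : net radiation and soil heat flux (MJ m-2 day-1)
    es, ea   : saturation and actual vapour pressure (kPa)
    pressure : atmospheric pressure (kPa)
    """
    # Slope of the saturation vapour pressure curve (kPa per deg C).
    delta = 4098.0 * saturation_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Psychrometric constant (kPa per deg C).
    gamma = 0.000665 * pressure

    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))
```

A call such as `fao56_daily_et0(t_mean=16.9, u2=2.078, rn=13.28, es=1.997, ea=1.409, pressure=100.1, g=0.14)` exercises every input. A gridded product missing any one of these variables cannot supply the calculation on its own.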
We will also provide a detailed explanation in the revised manuscript of why conventional meteorological stations (INMET) were not included. Importantly, some of the reanalysis and blended products assessed (e.g., BNMD) incorporate INMET station data in their development. Using flux tower data, which are entirely independent of INMET, allows us to evaluate these products more objectively. We note that FLUXNET towers were used to evaluate MSWEP, where the authors state “The FLUXNET data were used for this purpose (evaluate datasets) because they are completely independent; they have not been used in the development of any of the P datasets”, which underlines the importance of independent observations. Note that the flux towers used in this study were not used in the original MSWEP evaluation. This will also be clarified in the revised version.
- Dataset Size and Statistical Robustness
“Performance metrics derived from larger datasets are generally more reliable… helpful to demonstrate that, despite site differences, the results remain comparable.”
We agree and will carry out a sensitivity analysis using subsets of the longest flux tower time series (e.g., PDG, CRA, K34). This will allow us to assess how results vary with sample size. We will summarise these results and make them available as supplementary material and acknowledge this in the methodology. If performance metrics show large discrepancies this will be addressed in the discussion.
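One possible shape for such a sensitivity analysis is sketched below. It assumes random subsets of paired days are drawn repeatedly at a given sample size; the reply does not say whether random or contiguous subsets are intended, so this is only an illustration. The names `mse` and `subsample_mse_spread` are hypothetical.

```python
import random


def mse(sim, obs):
    """Mean squared error between two equal-length sequences."""
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)


def subsample_mse_spread(sim, obs, size, n_draws=200, seed=0):
    """Return (min, median, max) MSE over random subsets of paired days.

    sim, obs : product and observed series aligned on the same days
    size     : number of days in each random subset
    n_draws  : number of subsets drawn (assumed value, tune as needed)
    """
    if len(sim) != len(obs):
        raise ValueError("sim and obs must be aligned and equal in length")
    if not 0 < size <= len(obs):
        raise ValueError("size must be between 1 and the series length")

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    indices = range(len(obs))
    values = []
    for _ in range(n_draws):
        pick = rng.sample(indices, size)  # sampling without replacement
        values.append(mse([sim[i] for i in pick], [obs[i] for i in pick]))

    values.sort()
    return values[0], values[len(values) // 2], values[-1]
```

Running this for a long record at sizes matching the shorter sites gives a range of MSE values that a short record could plausibly produce, which is the comparison the referee asks for.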
- Temporal Averaging and Missing data bias
“It is not clear whether the hourly samples retrieved… were selected randomly… Is using only 50% of the days sufficient for monthly averaging?”
We thank the reviewer for raising this point. To clarify, our analysis was performed at daily and monthly time steps only.
With regard to infilling, there were obvious instrument failures, but in some cases more than one instrument recorded similar measurements (e.g. incoming global radiation, incoming PAR and net radiation). These had extremely strong correlations and, in some cases, vastly increased the temporal coverage of a variable.
For conversion to daily means, we used a 50% hourly coverage threshold as a compromise between data availability and temporal representativeness. We tested stricter thresholds (e.g., 100, 90, 80, 70, 60%) and observed a reduction in the number of valid days across sites, with only marginal gains in accuracy (assessed using K-S tests against a reference mean for each variable, whilst also looking at the standard deviation). While we acknowledge the potential for bias (e.g., from systematically missing cloudy days), our sensitivity analysis indicated that the 50% threshold did not significantly affect our daily or monthly estimates. This will now be discussed explicitly in the manuscript.
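The coverage rule described above can be sketched as follows. The block assumes hourly records, 24 expected values per day, and missing values represented as `None`; these are assumptions for illustration, and the function name `daily_means` is hypothetical rather than the authors' code.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(records, expected_per_day=24, min_coverage=0.5):
    """Average sub-daily records to daily means, subject to a coverage rule.

    records          : iterable of (datetime, value) pairs; value is None if missing
    expected_per_day : number of records in a complete day (24 for hourly data)
    min_coverage     : fraction of the day that must be present (0.5 = 50%)

    Returns a dict mapping each accepted date to its mean value.
    """
    by_day = defaultdict(list)
    for stamp, value in records:
        if value is not None:
            by_day[stamp.date()].append(value)

    needed = min_coverage * expected_per_day
    return {
        day: sum(values) / len(values)
        for day, values in sorted(by_day.items())
        if len(values) >= needed  # days below the threshold are dropped
    }
```

Re-running with `min_coverage` set to 0.6 through 1.0 and counting the surviving days reproduces the kind of threshold comparison described in the reply. Note that this rule counts records only; it does not check whether the gaps fall at a particular time of day, which is the referee's concern.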
- Spatial Resolution Harmonisation
“It is unclear why no similar approach was applied to harmonise spatial resolution among the products… I am concerned about the fairness of comparisons involving coarser-resolution datasets.”
We recognise that spatial resolution mismatch can affect the results of point-to-pixel comparisons, particularly for coarse-resolution products. However, we have decided to focus on comparing the temporal performance of different gridded products against independent, high quality point observations, rather than to perform spatial interpolation or downscaling.
Importantly, even sophisticated interpolation efforts do not guarantee spatial consistency in performance. For example, Xavier et al. (2016) found that the accuracy of different interpolation methods varied substantially across Brazil, with no clear spatial or geographic patterns explaining where a given method performed best. Their results highlight the complexity of spatial error structures and suggest that resolution harmonisation may not systematically improve agreement with observations. Therefore, while we acknowledge this is a limitation, spatial harmonisation was not applied, as it could introduce new biases or mask product-specific spatial characteristics.
We will add a discussion of this limitation in the revised manuscript, along with appropriate citations (e.g., Xavier et al., 2016; Hofstra et al., 2008), and clarify that addressing spatial representativeness through interpolation or downscaling is outside the scope of this study.
We will review the full manuscript and address smaller clarity issues noted implicitly in the reviewer’s general remarks and specific comments. Furthermore, we will also ensure improved clarity and grammar.
Citation: https://doi.org/10.5194/egusphere-2025-883-AC1
-
RC2: 'Comment on egusphere-2025-883', Anonymous Referee #2, 14 Jun 2025
The manuscript "Evaluation of high-resolution meteorological data products using flux tower observations across Brazil" evaluates five gridded meteorological products against 11 flux towers in Brazil, distributed over six different land cover/biomes. The manuscript is well-written, and the findings align with the manuscript's goal. The decomposition of MSE to understand the contribution of the error is interesting. The general findings indicate that ERA5-Land is the best overall product.
However, since the authors are evaluating gridded meteorological products, a more comprehensive dataset/network, such as the Brazilian National Meteorological Institute (INMET) data, could have been used. Despite using flux tower data, no specific flux tower variables were tested against the gridded products. Additionally, other flux towers in Brazil could have been used to cover other regions, such as the Northeast, where the climate is predominantly semi-arid.
In terms of applications, since evapotranspiration is an important variable from flux towers and models, it would be interesting to test the precision of gridded products against a variable that takes into consideration all the base meteorological data.
Specific comments:
- Table 1: Please add the average temperature and precipitation.
- Lines 165-167: Clarify how the linear gap-filling was applied. Did the authors apply the same methodology for precipitation?
- Lines 255-256: What is the reason why MSWEPv2.2 performed better for daily rainfall but BNMD at a monthly timescale?
Citation: https://doi.org/10.5194/egusphere-2025-883-RC2
AC2: 'Reply on RC2', Jamie Brown, 11 Jul 2025
We thank the reviewer for their thoughtful and constructive comments. We are pleased that the reviewer found the manuscript to be well-written and that the findings align with the stated objectives. We also appreciate the recognition of our use of MSE decomposition and the identification of ERA5-Land as the best performing product overall.
- Use of INMET data network
“Since the authors are evaluating gridded meteorological products, a more comprehensive dataset/network, such as the Brazilian National Meteorological Institute (INMET) data, could have been used.”
We appreciate this suggestion and agree that INMET data represent an important observational resource for Brazil. However, we intentionally used flux tower data as an independent evaluation dataset (as explained in response to similar comments by Reviewer #1). Some of the reanalysis and blended products evaluated in this study (e.g. BNMD) already assimilate INMET data as part of their development. Using flux tower observations, which are entirely independent of these conventional meteorological networks, provides a more objective evaluation of gridded product performance. This rationale will be clearly articulated in the revised manuscript.
- Lack of direct comparison between gridded products and flux tower variables
“Despite using flux tower data, no specific flux tower variables were tested against the gridded products.”
In this study, we focused specifically on core meteorological variables (e.g., air temperature, precipitation, relative humidity, etc.) recorded by the flux towers as the basis for comparison with the gridded products. While we did not assess flux-derived variables such as latent heat flux or evapotranspiration, we agree that doing so would offer valuable insights into how differences in meteorological forcing translate to land-atmosphere exchange estimates. We will clarify this scope in the revised manuscript and outline the evaluation of derived variables such as evapotranspiration as a key direction for future work.
- Limited geographic coverage of flux tower sites
“Other flux towers in Brazil could have been used to cover other regions, such as the Northeast, where the climate is predominantly semi-arid.”
We agree that Brazil’s ecological diversity warrants a wide spatial sampling (as also noted in our response to similar comments by Reviewer #1). At the time we began this study, access to high-quality micrometeorological data was limited. The 11 flux tower sites selected represent those for which full meteorological forcing data were curated and available, including variables required for ET₀ estimation via FAO methodology (radiation, wind speed, humidity, etc.). We had access to two additional sites, CAX (Caxiuanã, a riverine tropical rainforest site in the state of Para) and USE (Usina Santa Eliza, a sugarcane site in the state of Sao Paulo), which were rejected because they had data quality issues or lacked sufficient data coverage to meet the quality control threshold.
Although some biomes are omitted (i.e. Caatinga and Pantanal), we feel that the largest biomes (particularly the Amazon and the Cerrado), as well as croplands and grasslands/pastures, are represented. This omission will be stated explicitly in the revised manuscript as a limitation of our study, and we encourage other studies to account for these biomes.
- Assessment of evapotranspiration as an integrated variable
“Since evapotranspiration is an important variable from flux towers and models, it would be interesting to test the precision of gridded products against a variable that takes into consideration all the base meteorological data.”
We agree that evapotranspiration is a highly integrative and policy-relevant variable. However, as noted above, the current study focused on the direct evaluation of meteorological forcing variables. A full assessment of ET would require a different methodological framework and validation against flux-derived ET estimates (e.g., via energy balance closure), which we believe to be beyond the scope of our study. We appreciate this suggestion and will explicitly mention this point in the discussion as a direction for future work.
Specific comments
Table 1 - “Please add the average temperature and precipitation.”
We will revise Table 1 to include the long-term average temperature and precipitation for each site.
Lines 165-167 – “Clarify how the linear gap-filling was applied. Did the authors apply the same methodology for precipitation?”
The gap-filling method for precipitation is explained in a separate paragraph in Section 3.2; however, we will elaborate on how the linear regression was performed, i.e. which variables were used for the interpolation.
Lines 255-256: “What is the reason why MSWEPv2.2 performed better for daily rainfall but BNMD at a monthly timescale?”
We chose to lay out the manuscript by first stating the results and then discussing them in the following section. An attempt to explain this has been made in line 386, but we will add a pointer in the manuscript: “see further explanation in Section 5.2.”
Citation: https://doi.org/10.5194/egusphere-2025-883-AC2