Monitoring and modeling seasonally varying anthropogenic and biogenic CO₂ over a large tropical metropolitan area

Alberti, Rafaela Cruz Alves; Lauvaux, Thomas; Vara-Vela, Angel Liduvino; Barrero, Ricard Segura; Karoff, Christoffer; Andrade, Maria de Fátima; Marques, Márcia Talita Amorim; Benavente, Noelia Rojas; Cabral, Osvaldo Machado Rodrigues; da Rocha, Humberto Ribeiro; Ynoue, Rita Yuri

DOI: https://doi.org/10.5194/egusphere-2024-3060

24 Oct 2024

Abstract. Atmospheric CO₂ concentrations over urban areas indirectly reflect local fossil fuel emissions and biogenic fluxes, offering a potential approach to assess city climate policies. However, atmospheric models used to simulate urban CO₂ plumes face significant uncertainties, particularly in complex urban environments with dense populations and vegetation. This study aims to address these challenges and fill the research gap regarding such vegetated and urbanized areas by conducting a comprehensive analysis of atmospheric CO₂ dynamics in the Metropolitan Area of São Paulo, Brazil, and its surroundings, using the WRF-GHG atmospheric model. The simulations are evaluated using observations collected at ground stations across the METROCLIMA GHG network, the first greenhouse gas monitoring network in South America, and column concentrations (XCO₂) from the OCO-2 satellite spanning February to August 2019. We also assess and improve the performance of the biospheric Vegetation Photosynthesis and Respiration Model (VPRM) by optimizing the model parameters of the dominant vegetation types (Atlantic forest, cerrado, sugarcane) using flux measurements from multiple eddy-covariance flux towers. We evaluate the atmospheric model's ability to replicate seasonal variations in CO₂ concentrations by comparing the simulations with measurements from two sites of the GHG network in São Paulo. We conclude that atmospheric concentrations over tropical metropolitan areas largely depend on our ability to represent the biogenic contribution from the surrounding vegetation, the large-scale contribution in global models, and the model’s ability to represent the local atmospheric dynamics.

Received: 30 Sep 2024 – Discussion started: 24 Oct 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

04 Sep 2025

Monitoring and modeling seasonally varying anthropogenic and biogenic CO₂ over a large tropical metropolitan area

Atmos. Chem. Phys., 25, 9803–9829, https://doi.org/10.5194/acp-25-9803-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3060', Anonymous Referee #1, 21 Nov 2024
Alberti et al. used the WRF-GHG model to simulate CO2 concentration variations in the São Paulo region from February to August 2019. The simulated values were compared with observations of 2-meter temperature, 10-meter wind, surface CO2, and XCO2. Overall, the study's concept, analytical content, and main conclusions are not novel; the only unique aspect may be the lack of similar studies in the São Paulo region of South America. However, the manuscript currently lacks many critical pieces of information, and parts of the model setup appear unreasonable. In the results section, some statements seem to rest more on prior knowledge than on the figures presented in this study. Most figures are of poor quality and contain several errors, and the text includes several typos. Therefore, I do not recommend publication unless the authors can convincingly address the concerns listed below through substantial major revisions.
Major comments
My first concern is the model configuration.
Model set-up

The authors ran WRF-GHG with a single domain at 3 km spatial resolution. First, consider the coarse resolution of the input data, such as the meteorological fields and the CarbonTracker CO2 background and initial conditions (e.g., 3° × 2°). Interpolating or downscaling these data directly to 3 km with WRF, a regional-scale model, may lead to instabilities or poor performance. For example, the official recommendations suggest using nesting when the input data resolution is coarser than the model resolution by more than a factor of 5–10. In addition, boundary conditions (BCs) from external sources are typically provided at 3–6-hour intervals and lack tendencies for all predicted fields.
Most importantly, the 3 km resolution is insufficient given the distance between the two CO2 observation stations in São Paulo used in this study and the very similar model performance at both sites shown in Fig. 7 (discussed in detail below). Given the locations of the CO2 and meteorological observation stations shown in Fig. 1, why didn’t the authors set up nested domains to reach a finer resolution, such as 1 km?
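The sketch below puts the resolution gap in numbers. It converts a 3° × 2° grid cell to kilometers near São Paulo and compares it with the 3 km model grid. It rests on several assumptions, none of which come from the manuscript:

- a spherical Earth, with about 111.32 km per degree of latitude;
- a representative latitude of 23.5° S;
- a hypothetical four-domain nest with a parent-to-child ratio of 3, a common WRF choice.

```python
import math

KM_PER_DEG = 111.32  # assumed km per degree of latitude (spherical Earth)


def cell_size_km(dlon_deg, dlat_deg, lat_deg):
    """Approximate east-west and north-south extent (km) of a lon/lat grid cell."""
    dx = dlon_deg * KM_PER_DEG * math.cos(math.radians(lat_deg))
    dy = dlat_deg * KM_PER_DEG
    return dx, dy


def nest_chain(inner_km, ratio, n_domains):
    """Grid spacings from the outermost to the innermost domain for a fixed nesting ratio."""
    return [inner_km * ratio ** k for k in reversed(range(n_domains))]


# 3 deg (lon) x 2 deg (lat) background cell at an assumed latitude of 23.5 S
dx, dy = cell_size_km(3.0, 2.0, -23.5)
print(f"input cell ~ {dx:.0f} km x {dy:.0f} km")
print(f"ratio to a 3 km grid ~ {dx / 3:.0f} (E-W), {dy / 3:.0f} (N-S)")

# Hypothetical 4-domain chain with ratio 3 ending at 1 km
print(nest_chain(1.0, 3, 4))
```

Under these assumptions, the input-to-model ratio is roughly 70 to 100. That is an order of magnitude beyond the factor of 5–10 quoted in the comment, and it is the gap that intermediate nests are meant to bridge.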
Model input: Anthropogenic Emissions

In Section 2.1.1, regarding the anthropogenic emission input data used in the model, many critical details are missing:
Lines 73–78: The authors state that vehicular emissions are the primary emission source in the region. What is the spatial resolution of the VEIN inventory? In the analysis section, poor CO2 simulation performance at the observation sites is repeatedly attributed to underestimation or overestimation of vehicular emissions. In addition, what is the spatial distribution of vehicular emissions? The authors should include a figure showing the distribution of anthropogenic emissions in the region.

Line 79: The authors briefly state that other emission sources are from EDGAR. What is the total anthropogenic emission for the region? What are the proportions of emissions from different anthropogenic sources? EDGAR provides emission data at a coarse resolution—how was this processed to fit the 3 km resolution of the model? Which year and version of the EDGAR data were applied? Was a temporal profile used?

In addition, in Lines 289–290, the authors state that “the EDGAR anthropogenic emission inventory generally overestimates the emissions around local anthropogenic sources (e.g., urban areas)”. Did the authors check the total emissions and spatial distribution of the EDGAR data in this region? As noted above, this information is not provided in the manuscript.

Based on the authors’ description, the WRF-GHG simulation includes only vehicular, energy, and industrial emissions as model inputs. Are these the only three sources accounted for, or are other sources, such as residential emissions, also included? None of this essential information is clearly provided in the manuscript.

Line 390-391: In Section 4 (Conclusion), the authors state that "Anthropogenic emissions were curated from diverse models and products to accurately reflect real urban conditions." However, where is the evidence to support the claim of "accurately"? This is particularly questionable given the significant bias and RMSE observed in the simulated CO2 concentrations (Figure 4A).

Line 399-400: The authors provide conclusions regarding the temporal profile of simulated CO2 emissions, but they do not introduce or present the "prescribed temporal profiles of anthropogenic emissions" in the manuscript. While this conclusion is correct based on prior knowledge, it is not well-supported by the analysis presented in this study.

Model input: Biogenic Emissions

Lines 44, 81, and 86: In Line 44, the authors state that VPRM is coupled to WRF-GHG, which could mean either online or offline coupling. However, Line 81 describes it as offline but implemented as a module, while Line 86 states that VPRM's temperature and shortwave radiation inputs come from the WRF model. This raises several questions. Did the authors first run WRF to obtain these meteorological inputs, then run VPRM to calculate the biogenic fluxes, and finally use these fluxes as a tracer input in a subsequent WRF-GHG simulation? If so, this is inconsistent with Line 82. The biogenic flux would then be a prescribed input, like the anthropogenic flux, rather than a module coupled within WRF-GHG. Additionally, how was the "default" VPRM in Fig. 3 calculated? Was it based on the online-coupled VPRM in WRF-Chem, or was it handled differently? These details are unclear.

In Section 2.1.2, the authors devote a large portion of the text to introducing the VPRM model. This material could simply be cited from the VPRM paper (Mahadevan et al., 2008) or moved to the supplementary materials. Similarly, Section 2.3 does not need basic explanations of metrics such as bias, RMSE, and correlation in the main text; these are well-known concepts and can be mentioned briefly.

In Section 3.2, I question the validity of the comparison between the optimized VPRM, the default VPRM, and the observed biogenic fluxes, which concludes that the optimized flux is closer to the observations. The approach is problematic. The authors used the observed data to optimize the VPRM parameters (Lines 137–138) and then compared the optimized flux against the same observations, which makes the evaluation circular. The credibility of the main conclusion here is therefore questionable. The authors need to validate the model against independent observational data rather than the dataset used for optimization. For example, the authors could use half of the observational time series to optimize the model parameters and the other half for validation.
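The sketch below is a minimal version of the split-sample check suggested in this comment. It fits parameters on the first half of the record and scores them on the held-out second half. It makes three assumptions:

- Only the VPRM respiration term R = αT + β is fitted.
- The flux-tower data are synthetic and hypothetical.
- The helper names are illustrative and are not taken from the manuscript.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = alpha * x + beta."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, my - alpha * mx


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def split_sample_validation(temps, resp):
    """Fit on the first half of the record (chronological) and evaluate on the second half."""
    half = len(temps) // 2
    alpha, beta = fit_linear(temps[:half], resp[:half])
    pred = [alpha * t + beta for t in temps[half:]]
    return (alpha, beta), rmse(pred, resp[half:])


# Hypothetical hourly data: R = 0.2*T + 1.5 plus noise (umol m-2 s-1)
random.seed(0)
temps = [15 + 10 * math.sin(i / 24 * 2 * math.pi) for i in range(240)]
resp = [0.2 * t + 1.5 + random.gauss(0, 0.3) for t in temps]
(alpha, beta), holdout_rmse = split_sample_validation(temps, resp)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, held-out RMSE={holdout_rmse:.3f}")
```

The sketch uses a chronological split rather than a random one. Neighboring hours are autocorrelated, so a random split would leak near-duplicates into the validation set. For a record spanning several seasons, holding out alternating weeks might keep both halves seasonally comparable.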

My second concern is the poor quality of, and errors in, many figures.
Figure 1: There are inconsistencies between the data types shown in Figure 1 and the site information in Table 1. For instance, Figure 1 indicates two sites observing CO, while Table 1 lists only one CO observation site. Additional issues with Figure 1 include:
Legend clarity: The legend is not intuitive. The caption should include a statement explaining that different symbols represent site types and different colors indicate the types of observational data.

Scale: A scale bar should be added to the figure; for example, it is difficult to discern the distance between the two CO2 observation sites. The authors state that the CO and CO2 sites are less than 3 km apart. Does this mean the two CO2 sites are only about 1 km apart?

Why were the observation sites placed so close together? If the two CO2 sites are only 1 km apart, this raises the earlier question of why the model resolution was set at 3 km. With such close proximity, it is highly likely that these two sites fall within the same model grid cell, which could explain why their simulation results appear so similar, as observed in Figure 7.

In panel (b), is the land use map from WRF, or is it another map? The grid cells do not appear to follow a “regular” grid—did the authors use interpolation or smoothing? Note that in WRF NetCDF files, the land use map is provided as the dominant type for each grid cell. Additionally, it is recommended to change the colormap used in the land use map. The current color scheme and shapefile make it very difficult to distinguish between different land use types.
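Once the sites' projected coordinates are known, it is straightforward to check whether two nearby sites share a model grid cell. The sketch below rests on two assumptions:

- The site positions are hypothetical, about 1 km apart, and are not the real IAG or PDJ coordinates.
- The regular grid has its origin at (0, 0).

The last function gives the chance that two points share a cell when the grid offset is random. For separations smaller than the grid spacing d, that chance is (1 − |Δx|/d)(1 − |Δy|/d). For a 1 km east-west separation on a 3 km grid, it is 2/3.

```python
import math


def cell_index(x_km, y_km, dx_km):
    """Index of the cell containing a point on a regular grid (projected km from the grid origin)."""
    return (math.floor(x_km / dx_km), math.floor(y_km / dx_km))


def same_cell(p, q, dx_km):
    return cell_index(*p, dx_km) == cell_index(*q, dx_km)


def share_probability(sep_x_km, sep_y_km, dx_km):
    """Probability that two points share a cell under a uniformly random grid offset."""
    px = max(0.0, 1.0 - abs(sep_x_km) / dx_km)
    py = max(0.0, 1.0 - abs(sep_y_km) / dx_km)
    return px * py


# Hypothetical site positions about 1 km apart (placeholders, not real station coordinates)
site_a = (10.2, 20.5)
site_b = (11.1, 20.9)
print(math.dist(site_a, site_b))        # separation in km
print(same_cell(site_a, site_b, 3.0))   # True on a 3 km grid
print(same_cell(site_a, site_b, 1.0))   # False on a 1 km grid
print(share_probability(1.0, 0.0, 3.0))
```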

Figure 2b:
In the comparison of hourly model and observation wind speed, why didn’t the authors plot a 1:1 ratio line instead of a regression line? This choice seems unusual.

It is also strange that there are two types of symbols in the scatter plot—some are circles, while others are crosses.

I assume that the vertical patterns in the scatter arise from the limited decimal precision of the observational data. However, why do they appear only for wind speeds below 3 m/s, and not above?

Why does the WRF-simulated 10 m wind speed show many values of 0 m/s?

I suggest making this plot square rather than rectangular for better visual clarity.
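Plotting the 1:1 line next to the fitted regression line makes systematic errors visible. A fit y = a·x + b with a < 1 and b > 0 meets the 1:1 line at x* = b/(1 − a). Below x* the model overestimates, and above it the model underestimates. This is the pattern the Section 3.1 comment describes around 3 m/s. The sketch uses synthetic, hypothetical wind-speed pairs constructed to cross at 3 m/s.

```python
def ols(xs, ys):
    """Ordinary least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def crossover_with_identity(slope, intercept):
    """Observed value at which the fitted line meets the 1:1 line (None if parallel)."""
    if slope == 1:
        return None
    return intercept / (1 - slope)


# Hypothetical hourly pairs (m/s): model = 0.5 * obs + 1.5,
# i.e. weak winds overestimated and strong winds underestimated
obs = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model = [0.5 * o + 1.5 for o in obs]
a, b = ols(obs, model)
print(a, b, crossover_with_identity(a, b))
```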

Table 4: “Summer (February to March), Autumn (March to June), Winter (June to August)”. Why do March and June each appear in two seasons? How did the authors calculate the seasonal means with this definition? Could this explain why the maximum and minimum seasonal values mentioned in the opening paragraph of Section 3.3 appear unusual (discussed below)?
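One way to avoid counting a month in two seasons is to map each month to exactly one season before averaging. The sketch assumes the standard austral meteorological seasons: DJF summer, MAM autumn, JJA winter, and SON spring. The manuscript may intend a different split, but any choice should assign each month to exactly one season. The monthly values are hypothetical.

```python
from collections import defaultdict

# Assumed austral meteorological seasons; each month belongs to exactly one season.
SEASON_OF_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}


def seasonal_means(records):
    """records: iterable of (month, value); returns the mean value per season."""
    sums, counts = defaultdict(float), defaultdict(int)
    for month, value in records:
        season = SEASON_OF_MONTH[month]
        sums[season] += value
        counts[season] += 1
    return {s: sums[s] / counts[s] for s in sums}


# Hypothetical monthly means (ppm) for Feb-Aug, for illustration only
monthly = [(2, 410.0), (3, 412.0), (4, 414.0), (5, 416.0), (6, 418.0), (7, 417.0), (8, 415.0)]
result = seasonal_means(monthly)
print(result)
```

This version averages the monthly means with equal weight. Averaging hourly values instead would weight each month by its data availability.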

Figure 5: The standard deviation in this figure is very difficult to discern. It is recommended to revise the figure, for example, by splitting it into two separate plots or offsetting the data slightly for better clarity.

Figure 6: The colorbar uses discrete colors. Do the colors represent the WRF grid values directly, or are they smoothed or interpolated? The colorbar is currently confusing; for example, the sequence light green, dark green, light green makes it hard to distinguish values. It is recommended to change the colormap.

Figure 7:
The titles of (g) and (h) are wrong; they should refer to the PDJ site.

As mentioned earlier, the IAG and PDJ sites are very close to each other, and the model resolution is coarse. The resolution of the emissions has also not been provided. As a result, the simulated values at the two sites are nearly identical, which makes comparing them with the two observational records problematic or uninformative.

Why are the biogenic concentrations positive in both summer and winter? This implies that the vegetation acts as an emission source rather than a sink. Given that the authors are using only daytime data (09-17h local as written in Line 301), this seems strange. Could the authors clarify the reasoning behind this? Note that in Line 222, the authors find “the domain acts as a net CO2 sink during summer”!

In panels (b), (d), (f), and (h), are the biogenic and anthropogenic contributions each added separately to the background line? Why not plot them cumulatively?

Figure 9: in the caption, it is written “hourly”, but the plot is daily.

My third concern is the analysis and results of the model–observation comparison. In summary, contrary to the authors' conclusions, I do not think the model's current performance for wind and CO2 is satisfactory. Moreover, many aspects lack reasonable explanations, and some interpretations seem to rest more on empirical knowledge than on the figures provided. Specifically:
Section 2.2.1: Please include the distances between the observation sites, as well as the inlet height above ground of the observations at each site. Without this information, the comparison between the model and the observations cannot be properly analyzed.

Section 3.1: In Lines 179–180, the authors state that "the GHG model effectively captured significant changes in the observed variables". In Line 190, they state: “In summary, the WRF model showed proficiency in reproducing atmospheric conditions in the study area”. However, Figures 2b and 2c clearly show that the wind speed simulation is quite poor. The scatter plots show that the model substantially overestimates wind speeds below 3 m/s and clearly underestimates those above 3 m/s, so it performs poorly at both low and high wind speeds. In the daily data in Fig. 2c, the model consistently overestimates wind speed in all seasons. As the authors note, wind plays a critical role in CO2 transport. The model therefore clearly needs improvement or re-tuning of its parameters, rather than being described as having "effectively captured" the observed trends. In Line 187, the authors attribute the discrepancy to "the model’s misrepresentation of land use." However, land use data can be optimized or modified before the model runs, particularly since the authors likely ran WRF twice (as mentioned in the VPRM section). If the model performed poorly, the proper course would have been to improve it before proceeding with the analysis. Additionally, how did the authors handle model parameters for urban areas? Did they use an urban canopy model, and, for example, were building heights adjusted based on local data?

Section 3.3:
First, before analyzing the monthly and seasonal CO2 variations, did the authors compare the hourly simulation results? I would suggest adding a supplementary figure to show this comparison. Without it, important details and the true performance of the model could be hidden by the averaging process.

Line 255, Figure 7: it is clear that there are significant data gaps in the Picarro observational data. Given that the authors' analysis only covers 6 months, could the authors please specify the data availability for the observational data? Additionally, it seems that throughout the manuscript, the model and observation comparisons do not account for the absence of observational data in certain hours. It would be more reasonable to exclude simulated values during hours without observational data. Failing to do so may lead to significant discrepancies in the model-observation comparisons.
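The sketch below pairs model and observations hour by hour and drops simulated values wherever the observation is missing. Missing observations are represented here as None. The data are hypothetical. They only show how unmatched hours can shift a period mean when an observation gap coincides with a simulated peak.

```python
def masked_pairs(sim, obs):
    """Keep only the hours where an observation exists."""
    return [(s, o) for s, o in zip(sim, obs) if o is not None]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical hourly CO2 (ppm); the observation gap coincides with a simulated peak
sim = [412.0, 415.0, 430.0, 428.0, 414.0, 413.0]
obs = [411.0, 416.0, None, None, 415.0, 412.0]

pairs = masked_pairs(sim, obs)
unmasked_sim_mean = mean(sim)
matched_sim_mean = mean([s for s, _ in pairs])
matched_obs_mean = mean([o for _, o in pairs])
print(unmasked_sim_mean, matched_sim_mean, matched_obs_mean)
```

In this toy case, the matched means agree exactly. Averaging all simulated hours, however, inflates the model mean by about 5 ppm.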

Line 236: the authors say the seasonal variation in CO2 levels is influenced by seasonal patterns of photosynthesis and vehicular traffic. What is (or where can we see) the temporal variation of the vehicular traffic in this manuscript?

In Line 233, the authors find that at the IAG station the seasonal mean peaks in autumn, followed by winter and then summer. However, in Line 237, for the same station, they state that the monthly peak occurs in June, during winter. Similarly, the PDJ station peaks in summer according to Line 241, but in May, an autumn month, according to Line 243. How can the same station lead to two different conclusions? The averaging method used by the authors may therefore not be appropriate. If the authors plot the time series of the six months of hourly CO2 data, the reason will likely become apparent.

Lines 238–240: The authors state that “During the summer months, …wind speed… typically lead to lower atmospheric stability”. However, Figure 2 shows wind speeds that are quite similar and stable across the six months. Is this conclusion based on prior knowledge or on the observed wind data shown in the figure?

Lines 260–263: If the model's 3 km resolution is insufficient and the two sites are only about 1 km apart, they likely fall within the same grid cell. In that case, no conclusions can be drawn about emission overestimation or underestimation near either site, which would make the subsequent emission estimates meaningless.

Line 313-315: For Fig 7a at IAG site, the authors write “on February 22nd and 23rd, there was a peak in the CO2 concentration of the observed data”. However, in the Fig 7a, there are no observation data on these two days!

Line 317: The authors state that “The model effectively captured peaks and profiles for this period”. However, several peaks and daily variations in Fig 7c are not captured by the model.

Line 318: the authors say “the biogenic contributions at PDJ site emerging as more substantial (Figure 7d) compared to the IAG site”. However, the biogenic concentrations in Figures 7b and 7d seem quite similar, don’t they?

Line 328: Could the higher monthly average primarily be due to the single-day peak on August 14? Simply stating that the monthly observations are larger than the simulations might not be fully accurate, as Figure 7g shows that the observed values are smaller than the simulated ones for several days after August 24.
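The sensitivity of a monthly mean to one extreme day can be checked by comparing it with the median, or with the mean after removing that day. The daily values below are hypothetical and include a single-day spike.

```python
import statistics

# Hypothetical daily-mean CO2 enhancements (ppm) for one month, with a spike on day 14
daily = [5.0] * 30
daily[13] = 65.0

print(statistics.mean(daily))      # pulled up by the single spike
print(statistics.median(daily))    # robust to it
without_spike = daily[:13] + daily[14:]
print(statistics.mean(without_spike))
```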

Line 329 and Figure A4: Is this analysis based on daytime data (09:00–17:00)? Are these hourly data? If so, according to the bias and RMSE values, the model's performance in simulating CO2 does not appear to be satisfactory.

Line 345 and Figure 8: For the CO data—or the data depicted in this figure—what time period does it cover? From Figure 1, it is difficult to immediately locate the Pinheiros site.

Lines 349–350: Is the “hourly correlation” here R or R²? I do not think an R of 0.25 indicates a good correlation.
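For reference, the sketch below computes bias, RMSE, Pearson R, and R² with the Python standard library. Pearson R is written out explicitly rather than relying on a version-specific library call. RMSE is the square root of a mean of squares, so it can never be negative. An R of 0.25 corresponds to an R² of only 0.0625, so a linear fit explains about 6 % of the variance. The input values are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient R (R**2 is its square)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluate(sim, obs):
    diffs = [s - o for s, o in zip(sim, obs)]
    bias = statistics.mean(diffs)
    rmse = math.sqrt(statistics.mean([d * d for d in diffs]))  # never negative
    r = pearson_r(sim, obs)
    return {"bias": bias, "rmse": rmse, "r": r, "r2": r * r}


# Hypothetical hourly CO2 values (ppm)
sim = [410.0, 412.0, 415.0, 418.0, 420.0]
obs = [411.0, 414.0, 413.0, 421.0, 419.0]
metrics = evaluate(sim, obs)
print(metrics)
print(0.25 ** 2)  # an R of 0.25 corresponds to R^2 = 0.0625
```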

Lines 351–353: The authors state that the correlation is good before 10 AM and after 7 PM, but poor around noon because of the effects of vegetation. However, vegetation also plays a role at night, doesn't it?

Lines 359–362: According to the authors' analysis, both CO2 and CO show peaks before August 13, and a large part of the CO2 concentration at IAG comes from vehicular sources. However, the model struggles to capture these peaks because of less accurate emission data, as emissions follow the same diurnal variation every day of the month. Given this, how do the authors explain the period between August 18 and 28, when only a CO peak is observed? What could account for the absence of a CO2 peak during this period, especially if vehicular emissions are still expected to contribute significantly to CO2 concentrations?

Line 380: The authors are comparing XCO2 in this section. If the surface wind is overestimated, it does not necessarily mean that the winds at higher altitudes are also overestimated, right?

Line 381-382: The authors suddenly conclude, without any supporting figures or analysis, that there are "errors in the initial and boundary conditions of concentration provided by the Carbon Tracker." This conclusion is also highlighted in the abstract with the phrase "the large-scale contribution in global models." How did the authors arrive at this conclusion? Could they provide evidence to support this conclusion?

Section 4 (Conclusion) mainly describes what was done in this study and restates well-known prior knowledge (e.g., wind is a pivotal factor; planetary boundary layer dynamics), rather than highlighting the main conclusions of the paper. It lacks a clear synthesis of the key findings and insights that emerge from the analysis.

Minor comments
Line 55-58: The manuscript mentions WRF-GHG, WRF-Chem, and WRF when referring to the model. WRF-GHG has existed for a long time and is an older version. However, it was incorporated into WRF-Chem and became a module in WRF-Chem shortly after WRF-GHG was made publicly available. Since the authors are using WRF-Chem V4.0, why are they still referring to WRF-GHG? Did Beck et al. (2011) make any specific modifications to the model? Lines 57–58 do not clarify this point, as the GHG module in WRF-Chem also does not include chemical processes. If the authors merely added a tracer to the model without other significant modifications, it would theoretically still be appropriate to refer to it as WRF-Chem. I suggest the authors standardize the terminology throughout the manuscript to avoid confusion for readers unfamiliar with the model.

Table 3: The PAR0 value for Atlantic Forest changes from 570 to 178615. Can the authors explain this? Does this value have a physical meaning, and is it reasonable, or is it merely a result of the mathematical optimization?
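In VPRM (Mahadevan et al., 2008), PAR0 is the half-saturation value of the light-response term in GEE = λ · Tscale · Wscale · Pscale · EVI · PAR / (1 + PAR/PAR0). The sketch below evaluates only that factor for the two Atlantic Forest values quoted in this comment. It assumes a midday PAR of 1500 µmol m⁻² s⁻¹, an illustrative value that is not from the manuscript.

```python
def light_saturation_factor(par, par0):
    """VPRM light-response factor 1 / (1 + PAR/PAR0); equals 0.5 when PAR == PAR0."""
    return 1.0 / (1.0 + par / par0)


PAR_MIDDAY = 1500.0  # assumed midday PAR (umol m-2 s-1), illustrative only
for par0 in (570.0, 178615.0):  # the two Atlantic Forest PAR0 values quoted above
    print(par0, round(light_saturation_factor(PAR_MIDDAY, par0), 3))
```

With PAR0 = 570, the factor is about 0.28 at midday. With PAR0 = 178615, the factor stays near 1 across the daytime PAR range, so GEE becomes almost linear in PAR and light saturation effectively disappears. Whether that is plausible for a tropical forest canopy is the question raised above.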

Technical corrections
Line 125, Fig 4 caption: typo, it should be PFT as in Line 123, not PTF.

Line 132: Add S and W to the latitude and longitude.

Line 191: Typo in “wind Direction”; the D should be lowercase.

Line 196: typo, Figure 3?

Fig 5 caption, Table 4 caption, Line 275, Line 303, Figure 9 caption, etc: CO2 should be CO₂

Line 355: why “both profiles (modeled and simulated CO2)”? modeled and simulated are the same?

Line 221: Describing the result by PFT would be more appropriate than simply saying “negative across most of the domain”.

Line 301: 09-17h local is not only mid-afternoon but daytime.

Lines 340 and 341: Typo; the references to Figures A2b and A2d point to the wrong figures.

Line 357: reference typo

Line 376: “positive RMSE”? RMSE is never negative by definition.
Citation: https://doi.org/10.5194/egusphere-2024-3060-RC1
- AC1: 'Reply on RC1', Rafaela Cruz Alves Alberti, 07 Feb 2025
  
  Please find the final author comments in the supplement on RC1
  
  Citation: https://doi.org/10.5194/egusphere-2024-3060-AC1
RC2:
'Comment on egusphere-2024-3060', Anonymous Referee #2, 26 Nov 2024

Please see the attached file.

Citation: https://doi.org/10.5194/egusphere-2024-3060-RC2
- AC2: 'Reply on RC2', Rafaela Cruz Alves Alberti, 07 Feb 2025
  
  Please find the final author comments in the supplement on RC2
  
  Citation: https://doi.org/10.5194/egusphere-2024-3060-AC2

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3060', Anonymous Referee #1, 21 Nov 2024
Alberti et al. used the WRF-GHG model to simulate CO2 concentration variations in the São Paulo region from February to August 2019. The simulated values were compared with observational data for 2-meter temperature, 10-meter wind, surface CO2, and XCO2. Overall, the study's concept, analytical content, and main conclusions are not novel. The only unique aspect might be the lack of similar studies in the São Paulo region in South America. However, the manuscript currently has many critical missing pieces of information, and the model setup appears to have unreasonable aspects. In the results analysis section, some textual descriptions seem to be based more on prior knowledge rather than being clearly supported by the figures in this study. Most figures in the manuscript are of poor quality, contain several errors, and the text includes several typos. Therefore, I do not suggest publication, unless the authors could convincingly address various concerns I have mentioned below through substantial major revisions.
Major comments
My first concern is the model configuration.
Model set-up

The authors ran WRF-GHG with only a single domain at a 3 km spatial resolution. First, considering the coarse resolution of input data such as meteorological data and the CarbonTracker CO2 background and initial conditions (e.g., 3° × 2°), interpolating/downscaling such data directly down to 3 km using WRF, a regional-scale model, might lead to various instabilities or poor performance. For example, the official recommendations suggest using nesting when the input data resolution is coarser than the model resolution by more than a factor of 5–10. Boundary conditions (BCs) for external sources are typically provided at 3–6-hour intervals and lack tendencies for all predicted fields.
Most importantly, based on the distance between the two CO2 observation stations in São Paulo used in this study and the rather similar model performance at both sites shown in Fig. 7 (I will mention it in detail below), the 3 km resolution is insufficient. Given the locations of the CO2 and meteorological observation stations shown in Fig. 1, why didn’t the authors set up nesting to achieve a finer resolution, such as 1 km?
Model input: Anthropogenic Emissions

In Section 2.1.1, regarding the anthropogenic emission input data used in the model, many critical details are missing:
Lines 73–78: The authors mention that vehicular emissions are the primary emission source in the region. What is the spatial resolution of the VEIN inventory? In the analysis section in the manuscript, poor CO2 simulation performance at the observation sites is repeatedly attributed to the underestimation or overestimation of vehicular emissions. In addition, what is the spatial distribution of vehicular emissions? The authors should include a figure showing the anthropogenic emission distribution for the region.

Line 79: The authors briefly state that other emission sources are from EDGAR. What is the total anthropogenic emission for the region? What are the proportions of emissions from different anthropogenic sources? EDGAR provides emission data at a coarse resolution—how was this processed to fit the 3 km resolution of the model? Which year and version of the EDGAR data were applied? Was a temporal profile used?

In addition, Line 289-290, the authors cite “the EDGAR anthropogenic emission inventory generally overestimates the emissions around local anthropogenic sources (e.g., urban areas)”. Did the authors check the total emissions and spatial distribution of EDGAR data in this region? As mentioned above, these infos are not provided in the manuscript.

Based on the authors’ description, the WRF-GHG simulation only includes vehicular, energy, and industrial emissions as model inputs. Are these the only 3 sources accounted for, or there are also other sources like residential emissions? All of this essential information is not clearly provided in the manuscript.

Line 390-391: In Section 4 (Conclusion), the authors state that "Anthropogenic emissions were curated from diverse models and products to accurately reflect real urban conditions." However, where is the evidence to support the claim of "accurately"? This is particularly questionable given the significant bias and RMSE observed in the simulated CO2 concentrations (Figure 4A).

Line 399-400: The authors provide conclusions regarding the temporal profile of simulated CO2 emissions, but they do not introduce or present the "prescribed temporal profiles of anthropogenic emissions" in the manuscript. While this conclusion is correct based on prior knowledge, it is not well-supported by the analysis presented in this study.

Model input: Biogenic Emissions

Line 44, Line 81, Line 86: The authors mention in Line 44 that VPRM is coupled to WRF-GHG, which indicates that it could be either online or offline coupling. However, in Line 81, they say it is offline but implemented as a module, while in Line 86, they mention that VPRM's temperature and shortwave radiation inputs come from the WRF model. This raises several questions: Did the authors first run WRF to obtain these meteorological inputs, then run VPRM to calculate the biogenic fluxes, and finally use these fluxes as a tracer in a subsequent WRF-GHG simulation? If so, this is inconsistent with Line 82, because it is just a model flux input, same as anthropogenic flux input, rather than a coupled module within WRF-GHG. Additionally, how was the "default" VPRM in Fig. 3 calculated? Was it based on the online-coupled VPRM in WRF-Chem, or was it handled differently? These infos are unclear.

In Section 2.1.2, the authors dedicate a large portion of the text to introduce the VPRM model. This information could be simply cited from the VPRM paper (Mahadevan et al., 2008) or moved to the supplementary materials. Similarly, in Section 2.3, there is no need to include basic explanations of metrics such as bias, RMSE, and correlation in the main text. These are well-known concepts and can either be briefly mentioned.

In Section 3.2, I question the validity of the comparison between the optimized VPRM, default VPRM, and observed biogenic fluxes, which concludes that the optimized flux is closer to the observed data. This approach is problematic because the authors used the observed data to optimize the VPRM parameters (Line 137-138) and then compared the optimized flux against the same set of observed data. This is inappropriate, thus the main conclusion here is also questionable whether it is credible. The authors need to validate the model using independent observational data rather than the same dataset used for optimization. For example, the author could use half of the observational time series for optimizing the model parameters and the other half for validation.

My second concern is the poor quality and errors in many figures.
Figure 1: There are inconsistencies between the data types shown in Figure 1 and the site information in Table 1. For instance, Figure 1 indicates two sites observing CO, while Table 1 lists only one CO observation site. Additional issues with Figure 1 include:
Legend clarity: The legend is not intuitive. The caption should include a statement explaining that different symbols represent site types and different colors indicate the types of observational data.

Scale: A scale bar should be added to the figure. For example, it is difficult to discern the distance between the two CO2 observation sites. The authors state that the CO and CO2 sites are less than 3 km apart; does this mean that the two CO2 sites are only about 1 km apart?

Why were the observation sites placed so close together? If the two CO2 sites are only 1 km apart, this brings back the earlier question of why the model resolution was set to 3 km. With such close proximity, the two sites very likely fall within the same model grid cell, which could explain why their simulation results appear so similar in Figure 7.
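
As a rough check of this argument, the sketch below computes the great-circle distance between two sites and their cell indices on a regular 3 km grid. It assumes a simple local equirectangular projection rather than WRF's actual map projection, and the coordinates in the usage block are hypothetical placeholders, not those of IAG or PDJ.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def grid_cell(lat, lon, lat0, lon0, dx_km=3.0):
    """(i, j) index on a regular dx_km grid whose south-west corner is (lat0, lon0).

    Uses a local equirectangular approximation, adequate over a few tens of km
    but not a substitute for the WRF map projection.
    """
    x = EARTH_RADIUS_KM * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_KM * math.radians(lat - lat0)
    return math.floor(x / dx_km), math.floor(y / dx_km)


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    lat0, lon0 = -24.0, -47.0  # assumed south-west corner of the domain
    site_a = (-23.560, -46.735)
    site_b = (-23.552, -46.740)
    print(round(haversine_km(*site_a, *site_b), 2), "km apart")
    print(grid_cell(*site_a, lat0, lon0), grid_cell(*site_b, lat0, lon0))
```

Two sites about 1 km apart usually share a 3 km cell, but they can land in adjacent cells if they straddle a cell boundary, so reporting the actual grid indices of each site would settle the question.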

In panel (b), is the land use map from WRF or from another source? The grid cells do not appear to follow a regular grid; did the authors use interpolation or smoothing? Note that in WRF NetCDF files, the land use map is provided as the dominant type for each grid cell. Additionally, I recommend changing the colormap of the land use map. The current color scheme and shapefile make it very difficult to distinguish between land use types.

Figure 2b:
In the comparison of hourly modeled and observed wind speed, why didn't the authors plot a 1:1 line instead of a regression line? This choice seems unusual.

It is also strange that the scatter plot contains two types of symbols: some are circles, while others are crosses.

I assume that the vertical stripes in the scatter plot result from the limited decimal precision of the observational data. However, why do they occur only for wind speeds below 3 m/s and not above?

Why does the WRF-simulated 10 m wind speed show many values of 0 m/s?

I suggest making this plot square rather than rectangular for better visual clarity.

Table 4: “Summer (February to March), Autumn (March to June), Winter (June to August)”. Why do March and June each appear in two seasons? How did the authors calculate the seasonal means with these overlapping definitions? Could this be why the maximum and minimum seasonal values mentioned in the opening paragraph of Section 3.3 appear unusual (see my comment below)?
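
One way to avoid double counting, assuming the standard Southern Hemisphere meteorological seasons (DJF summer, MAM autumn, JJA winter) are acceptable to the authors, is to assign each month to exactly one season before averaging. The mapping and function name in this sketch are illustrative, not taken from the manuscript.

```python
# Assumed convention: Southern Hemisphere meteorological seasons.
# Each calendar month belongs to exactly one season.
SEASON_OF_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}


def seasonal_means(records):
    """Average (month, value) pairs by season without any overlap between seasons."""
    sums, counts = {}, {}
    for month, value in records:
        season = SEASON_OF_MONTH[month]
        sums[season] = sums.get(season, 0.0) + value
        counts[season] = counts.get(season, 0) + 1
    return {season: sums[season] / counts[season] for season in sums}
```

Under this convention, a February to August study period contains only one summer month (February), which should be stated explicitly when seasonal means are compared.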

Figure 5: The standard deviation in this figure is very difficult to discern. It is recommended to revise the figure, for example, by splitting it into two separate plots or offsetting the data slightly for better clarity.

Figure 6: The colorbar uses discrete colors. Do the colors represent the WRF grid values directly, or are they smoothed and interpolated? The colorbar is also visually confusing; for example, the sequence light green, dark green, light green again makes it hard to distinguish values. I recommend changing the colormap.

Figure 7:
The titles of (g) and (h) are wrong; they should refer to the PDJ site.

As mentioned earlier, the IAG and PDJ sites are very close to each other and the model resolution is coarse, so the simulated values for the two sites are nearly identical; the resolution of the emission inventory has not been provided either. This makes the comparison of the simulations with the two sets of observations problematic, or at least less informative.

Why are the biogenic concentrations positive in both summer and winter? This implies that the vegetation acts as an emission source rather than a sink. Given that the authors are using only daytime data (09-17h local as written in Line 301), this seems strange. Could the authors clarify the reasoning behind this? Note that in Line 222, the authors find “the domain acts as a net CO2 sink during summer”!

In panels (b), (d), (f), and (h), are the biogenic and anthropogenic contributions each added separately to the background line? Why not plot them cumulatively (stacked)?

Figure 9: the caption says “hourly”, but the plot shows daily values.

My third concern relates to the analysis and interpretation of the model–observation comparison. In summary, contrary to the authors' conclusions, I do not think that the model's current performance for wind and CO2 is satisfactory. Moreover, many aspects lack reasonable explanations, and some interpretations seem to rest more on prior knowledge than on evidence clearly supported by the figures provided. Specifically:
Section 2.2.1: Please include the distances between observation sites, as well as the inlet height above ground at each site. Without this information, it is difficult to assess the comparison between the model and the observations.

Section 3.1: In Lines 179–180, the authors state that "the GHG model effectively captured significant changes in the observed variables", and in Line 190, “In summary, the WRF model showed proficiency in reproducing atmospheric conditions in the study area”. However, Figures 2b and 2c clearly show that the model's wind speed simulation is quite poor. The scatter plots show that the model significantly overestimates wind speeds below 3 m/s and clearly underestimates those above 3 m/s, so its performance is poor at both low and high wind speeds. The daily data in Fig. 2c show that the model consistently overestimates wind speed across all seasons. As the authors note, wind plays a critical role in CO2 transport, so there is a clear need to improve or re-adjust the model parameters rather than claim that the model "effectively captured" the observed trends. In Line 187, the authors invoke "the model’s misrepresentation of land use." However, land use data can be corrected or modified before the model runs, particularly since the authors likely ran WRF twice (as mentioned in the VPRM section). If the model performs poorly, the proper course of action is to improve it before continuing with the analysis. Additionally, how did the authors handle model parameters for urban areas? Did they use an urban canopy model, and were, for example, building heights adjusted based on local data?

Section 3.3:
First, before analyzing the monthly and seasonal CO2 variations, did the authors compare the hourly simulation results? I would suggest adding a supplementary figure to show this comparison. Without it, important details and the true performance of the model could be hidden by the averaging process.

Line 255, Figure 7: There are clearly significant gaps in the Picarro observational data. Given that the analysis covers only 6 months, could the authors specify the data availability of the observations? Additionally, throughout the manuscript the model–observation comparisons appear not to account for hours with missing observations. It would be more reasonable to exclude simulated values for hours without observational data; otherwise, the comparisons may show significant spurious discrepancies.
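
A minimal sketch of the suggested pairing, assuming hourly series already aligned in time, missing observations stored as None or NaN, and bias defined as simulated minus observed (the function name is hypothetical):

```python
import math
from typing import Optional, Sequence


def paired_stats(obs: Sequence[Optional[float]], sim: Sequence[float]) -> dict:
    """Bias and RMSE computed only over hours with a valid observation.

    The simulated value for an hour with a missing observation is dropped
    as well, so both series cover exactly the same hours.
    """
    pairs = [(o, s) for o, s in zip(obs, sim)
             if o is not None and not math.isnan(o)]
    if not pairs:
        raise ValueError("no overlapping valid hours")
    n = len(pairs)
    bias = sum(s - o for o, s in pairs) / n
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in pairs) / n)
    return {"n": n, "bias": bias, "rmse": rmse}
```

Reporting the number of paired hours alongside bias and RMSE would also make the data availability per site and season explicit.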

Line 236: the authors state that the seasonal variation in CO2 levels is influenced by seasonal patterns of photosynthesis and vehicular traffic. Where in this manuscript is the temporal variation of vehicular traffic shown?

In Line 233, for the IAG station, the authors find that the seasonal means peak in autumn, followed by winter and then summer. But in Line 237, for the same station, they state that the monthly peak occurs in June, a winter month. Similarly, the PDJ station peaks in summer according to Line 241, but in May, an autumn month, according to Line 243. How can the same station lead to two different conclusions? This suggests that the averaging method the authors used may not be appropriate. If the authors plot the six-month time series of hourly CO2 data, the reason should become clear.

Lines 238–240: the authors state that “During the summer months, …wind speed… typically lead to lower atmospheric stability”. However, in Figure 2 the wind speed looks quite similar and stable over the six months. Is this conclusion based on prior knowledge or on the observed wind data from the stations shown in the figure?

Lines 260–263: If the model's 3 km resolution is insufficient and the two sites, only about 1 km apart, likely fall within the same grid cell, it is impossible to draw further conclusions about emission overestimation or underestimation near either site. This would render the subsequent emission estimates meaningless.

Line 313-315: For Fig 7a at IAG site, the authors write “on February 22nd and 23rd, there was a peak in the CO2 concentration of the observed data”. However, in the Fig 7a, there are no observation data on these two days!

Line 317: the authors state that “The model effectively captured peaks and profiles for this period”. However, several peaks and daily variations in Fig. 7c are not captured by the model.

Line 318: the authors say “the biogenic contributions at PDJ site emerging as more substantial (Figure 7d) compared to the IAG site”. However, the biogenic concentrations in Figures 7b and 7d seem quite similar, don’t they?

Line 328: Could the higher monthly average primarily be due to the single-day peak on August 14? Simply stating that the monthly observations are larger than the simulations might not be fully accurate, as Figure 7g shows that the observed values are smaller than the simulated ones for several days after August 24.

Line 329 and Figure A4: Is this analysis based on daytime data (09:00–17:00)? Are these hourly data? If so, according to the bias and RMSE values, the model's performance in simulating CO2 does not appear to be satisfactory.

Line 345 and Figure 8: What time period do the CO data shown in this figure cover? In addition, it is difficult to locate the Pinheiros site in Figure 1.

Lines 349–350: Is the “hourly correlation” here R or R²? I do not think an R of 0.25 indicates a good correlation.
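
For reference, if the reported value is R, squaring it gives the fraction of variance in one variable that is linearly explained by the other:

```math
R = 0.25 \quad\Longrightarrow\quad R^2 = 0.25^2 = 0.0625
```

that is, only about 6 % of the variance.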

Lines 351–353: The authors state that the correlation is good before 10:00 and after 19:00, while it is poor around noon due to the effects of vegetation. However, vegetation also plays a role at night through respiration, doesn't it?

Lines 359–362: According to the authors' analysis, before August 13 both CO2 and CO show peaks, with a large part of the CO2 concentrations at IAG coming from vehicular sources. However, the model struggles to capture this because of less accurate emission data, as the emissions follow the same diurnal cycle every day of the month. Given this, how do the authors explain the period between August 18 and 28, when only a CO peak is observed? What could account for the absence of a CO2 peak during this period, especially if vehicular emissions are still expected to contribute significantly to CO2 concentrations?

Line 380: The authors are comparing XCO2 in this section. If the surface wind is overestimated, it does not necessarily mean that the winds at higher altitudes are also overestimated, right?

Line 381-382: The authors suddenly conclude, without any supporting figures or analysis, that there are "errors in the initial and boundary conditions of concentration provided by the Carbon Tracker." This conclusion is also highlighted in the abstract with the phrase "the large-scale contribution in global models." How did the authors arrive at this conclusion? Could they provide evidence to support this conclusion?

Section 4 (Conclusion) mainly describes what was done in this study and restates well-known prior knowledge (e.g., that wind is a pivotal factor, or the role of planetary boundary layer dynamics), rather than highlighting the main conclusions of the paper. It lacks a clear synthesis of the key findings and insights that emerge from the analysis.

Minor comments
Lines 55–58: The manuscript uses WRF-GHG, WRF-Chem, and WRF to refer to the model. WRF-GHG has existed for a long time and is an older version; it was incorporated into WRF-Chem as a module shortly after it was made publicly available. Since the authors are using WRF-Chem V4.0, why do they still refer to WRF-GHG? Did Beck et al. (2011) make any specific modifications to the model? Lines 57–58 do not clarify this point, as the GHG module in WRF-Chem also does not include chemical processes. If the authors merely added a tracer to the model without other significant modifications, it would still be appropriate to refer to it as WRF-Chem. I suggest standardizing the terminology throughout the manuscript to avoid confusing readers who are unfamiliar with the model.

Table 3: the PAR0 value for Atlantic Forest changes from 570 to 178615. Can the authors explain this? Does this value have a physical meaning, and is it reasonable, or is it merely the outcome of a mathematical optimization?
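
For context, in the standard VPRM formulation (Mahadevan et al., 2008), GEE is proportional to PAR × 1/(1 + PAR/PAR0). The sketch below assumes that formulation, treats PAR and PAR0 in the same (unspecified) units, and evaluates the light-response factor for the two PAR0 values quoted above; the function name is hypothetical.

```python
def light_response_scalar(par, par0):
    """VPRM light-response factor 1 / (1 + PAR / PAR0).

    GEE is proportional to this factor times PAR, so a very large PAR0
    drives the factor towards 1 and removes light saturation.
    """
    return 1.0 / (1.0 + par / par0)


if __name__ == "__main__":
    for par0 in (570.0, 178615.0):  # values before and after optimization (Table 3)
        factors = [round(light_response_scalar(p, par0), 3) for p in (250.0, 1000.0, 2000.0)]
        print(par0, factors)
```

With PAR0 = 570 the factor falls to about 0.22 at PAR = 2000, whereas with PAR0 = 178615 it stays above 0.98, so light saturation effectively disappears and GEE becomes nearly linear in PAR. In that regime PAR0 is only weakly constrained by the data, which may indicate that the optimized value reflects the fitting procedure rather than plant physiology.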

Technical corrections
Line 125, Fig 4 caption: typo, it should be PFT as in Line 123, not PTF.

Line 132: Add S and W to the latitude and longitude.

Line 191: typo, “wind Direction”: the “D” should be lowercase.

Line 196: typo, Figure 3?

Fig 5 caption, Table 4 caption, Line 275, Line 303, Figure 9 caption, etc: CO2 should be CO₂

Line 355: why “both profiles (modeled and simulated CO2)”? “Modeled” and “simulated” mean the same thing.

Line 221: I suggest describing the results by PFT, which would be more appropriate than simply stating “negative across most of the domain”.

Line 301: 09–17 h local time covers the whole daytime, not only mid-afternoon.

Lines 340 and 341: typo in the references to Figures A2b and A2d; the authors refer to the wrong figures.

Line 357: reference typo

Line 376: “positive RMSE”? RMSE is always non-negative.
Citation: https://doi.org/10.5194/egusphere-2024-3060-RC1
- AC1: 'Reply on RC1', Rafaela Cruz Alves Alberti, 07 Feb 2025
  
  Please find the final author comments in the supplement on RC1
  
  Citation: https://doi.org/10.5194/egusphere-2024-3060-AC1
RC2:
'Comment on egusphere-2024-3060', Anonymous Referee #2, 26 Nov 2024

Please see the attached file.

Citation: https://doi.org/10.5194/egusphere-2024-3060-RC2
- AC2: 'Reply on RC2', Rafaela Cruz Alves Alberti, 07 Feb 2025
  
  Please find the final author comments in the supplement on RC2
  
  Citation: https://doi.org/10.5194/egusphere-2024-3060-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Rafaela Cruz Alves Alberti on behalf of the Authors (17 Feb 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (25 Feb 2025) by Chris Wilson

RR by Anonymous Referee #1 (19 Mar 2025)

ED: Reconsider after major revisions (19 Mar 2025) by Chris Wilson

AR by Rafaela Cruz Alves Alberti on behalf of the Authors (10 Jun 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (19 Jun 2025) by Chris Wilson

AR by Rafaela Cruz Alves Alberti on behalf of the Authors (23 Jun 2025) Manuscript

Journal article(s) based on this preprint

04 Sep 2025

Monitoring and modeling seasonally varying anthropogenic and biogenic CO₂ over a large tropical metropolitan area

Atmos. Chem. Phys., 25, 9803–9829, https://doi.org/10.5194/acp-25-9803-2025, 2025

Viewed

Total article views: 941 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
562	168	211	941	44	46

Views and downloads (calculated since 24 Oct 2024)

Month	HTML	PDF	XML	Total
Oct 2024	103	20	2	125
Nov 2024	115	26	6	147
Dec 2024	39	14	2	55
Jan 2025	33	17	0	50
Feb 2025	38	11	38	87
Mar 2025	16	9	48	73
Apr 2025	19	22	45	86
May 2025	18	7	42	67
Jun 2025	26	11	24	61
Jul 2025	31	11	2	44
Aug 2025	101	20	2	123
Sep 2025	23	0	0	23

Viewed (geographical distribution)

Total article views: 932 (including HTML, PDF, and XML) Thereof 932 with geography defined and 0 with unknown origin.

Latest update: 04 Sep 2025

Download

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

This study addresses uncertainties in atmospheric models by analyzing CO₂ dynamics in a complex urban environment characterized by a dense population and tropical vegetation. High-accuracy sensors were deployed, and the WRF-GHG model was utilized to simulate CO₂ transport, capturing variations and assessing contributions from both anthropogenic and biogenic sources.
