the Creative Commons Attribution 4.0 License.
Evaluating the meteorological transport model ensemble for accounting uncertainties in carbon flux estimation over India
Abstract. The existing climate change scenario calls for immediate intervention to curb rising greenhouse gas emissions. An improved understanding of the regional distributions of carbon sources and sinks under the perturbed climate system is vital for assisting the above mitigation efforts. The current uncertainties in estimation can potentially be reduced by employing a multi-data modelling system capable of representing atmospheric tracer transport adequately. This study focuses on the mesoscale transport patterns that can affect atmospheric tracer distribution and examines how well they are represented in the meteorological models employed. We investigate the capability of the Weather Research and Forecasting (WRF) model to predict meteorological fields such as temperature, humidity, wind, and planetary boundary layer height (PBLH) by comparing different model simulations with surface and vertical profile observations available at urban and rural stations, Cochin and Gadanki, and with global reanalysis data over India. Combining different model schemes and data products allows us to present a model ensemble of 11 members. Using these ensemble simulations, the impacts of changes in physics schemes, initial and boundary conditions, and spatial resolutions on meteorology and, consequently, on CO2 mixing ratio simulations are quantified. Most simulations capture variations in temperature and moisture very well (R2> 0.75). The wind (R2> 0.75 for height above 2 km) and PBLH simulations (R2> 0.75 for daytime) are also reasonably correlated with the observations. The sensitivity to changing planetary boundary layer (PBL) schemes and land surface model (LSM) schemes on meteorological and CO2 mixing ratio simulations is significant, thereby producing higher inter-model differences between experiments. Our analysis provides an assessment of expected CO2 transport errors when using WRF-like models in the inverse modelling framework. 
We emphasise the importance of treating these errors in the carbon data assimilation system to utilize the full potential of the measurements and conclude that WRF can be utilised as a potential transport model for the regional carbon flux estimations in India.
RC1: 'Comment on egusphere-2023-2334', Anonymous Referee #1, 14 Jul 2024
The manuscript by Thara Anna Mathew and coworkers presents an analysis of an ensemble of regional atmospheric model simulations against meteorological observations and an analysis of the ensemble spread of simulated CO2 concentrations. The meteorological simulations consist of a 'physics-perturbed' ensemble using, among others, various boundary layer and land surface options within the WRF model. The CO2 ensemble is then built using the Lagrangian dispersion model STILT, driven by the different meteorological ensemble members. The paper presents a tedious, not always convincing comparison of WRF simulations with observed meteorology. Unfortunately, this is mostly done for only two sites (including profile data, though) and a single month, which seems too isolated to be representative of all of India and the whole year. For CO2 concentrations, only model simulations at the same two locations are discussed, no observations. A comparison with observational data would have strengthened the manuscript. The presentation of objectives, methods and results is not always clear and requires improvements before publication.
Major comments
1) Aim of the study
From the title of the manuscript ('flux estimation') one may think that the prime aim of the paper is the quantification of model uncertainty, as this is an important input in inverse modelling. However, the main focus of the paper seems to be on identifying which WRF configuration produces the most reliable meteorological fields. Although these two questions are related, I would like to caution that a badly chosen ensemble (i.e., one including configurations that cause obvious biases) is not a good predictor of model uncertainty for the 'best possible' model configuration, which should be used for inverse modelling. The authors should clarify what the main purpose of their study is and align the title of their manuscript accordingly.
Consequently, the whole discussion of the findings remains rather vague and general. What are the real consequences of the study? A best-choice model configuration? Or at least a narrower selection of possible configurations? Do the findings go beyond what was already known about the different configurations? Would the presented ensemble be a good estimator of model-data-mismatch uncertainty for an inverse estimate?
L73ff: The introduction states 4 questions that the manuscript tries to answer. Although these questions are generally addressed by the study setup and there is a lot of elaboration on details, I find that no concrete answers to these questions are given. Yes, the model results are sensitive to different parameterisations, which should not surprise, and yes, the land surface scheme plays a major role for variables in the PBL, again no surprise, but what is the conclusion? Which scheme, or schemes if there is not a single scheme that fits all situations, should be used?
2) Comparison of PBL heights
Two methods are mentioned that are used for calculating PBL heights from observations. Both seem to be rather simple methods solely based on temperature profiles. There are other methods that incorporate additional parameters (e.g., bulk Richardson methods) and may be more reliable (e.g., Seibert et al., 2000; Collaud Coen et al., 2014). The extremely large differences in PBLH as estimated from the two different methods (Fig. 8) seem completely unreasonable since, according to the text, the only difference would be the use of virtual potential temperature vs. potential temperature. Furthermore, there is no information on how WRF diagnoses boundary layer heights. Is a comparable method used for the model data or rather something more sophisticated? I suggest revisiting this analysis of PBL heights and adding the missing information. PBL height, next to wind speed, is usually the main driver of uncertainty in PBL concentration simulations and as such there needs to be more confidence in this comparison.
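For illustration, a bulk-Richardson-type diagnosis along the lines reviewed by Seibert et al. (2000) could look like the following minimal sketch; the function name, critical value and interpolation are illustrative choices, not the method used in the manuscript:

```python
import numpy as np

def pblh_bulk_richardson(z, theta_v, u, v, ri_crit=0.25):
    """Estimate PBL height (m) as the lowest level where the bulk
    Richardson number, computed relative to the surface level,
    exceeds a critical value (illustrative sketch)."""
    g = 9.81
    dtheta = theta_v - theta_v[0]
    wind2 = (u - u[0])**2 + (v - v[0])**2
    with np.errstate(divide="ignore", invalid="ignore"):
        ri = g * (z - z[0]) * dtheta / (theta_v[0] * wind2)
    above = np.where(ri > ri_crit)[0]
    if above.size == 0:
        return np.nan          # no crossing found within the profile
    k = above[0]
    if k == 0:
        return z[0]
    # Linear interpolation between the two bracketing levels
    f = (ri_crit - ri[k - 1]) / (ri[k] - ri[k - 1])
    return z[k - 1] + f * (z[k] - z[k - 1])
```

Because such methods combine thermal and shear information, they tend to be more robust than purely temperature-based criteria, which is the point of the comment above.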
3) CO2 simulations.
The description and motivation of the CO2 simulations is not sufficient. For one, why is another model used for this part instead of doing the tracer transport in WRF itself? I suspect this is because of the potential use of STILT for inverse modelling. However, this is never explained in any detail and STILT comes along a little bit as a surprise. Second, I have some general doubts about how much of the ensemble spread in WRF is then actually reflected in the STILT simulations. As STILT is driven offline and has its own parameterization of turbulent transport, some of the WRF ensemble differences may be ignored. It would be good to understand better which variables actually drive STILT and how much these vary in the WRF ensemble. Third, it seems like temporally constant fluxes were used. This is an oversimplification for CO2 and may well result in completely different ensemble variability compared to a tracer with variable sources/sinks, especially during the course of the day. For example, anthropogenic CO2 fluxes can be expected to be lower than average during the night, but peak in the morning hours, just when PBL heights are changing. Fourth, there is no description of biospheric fluxes in the methods section (2.6). Nevertheless, biospheric CO2 contributions are presented later on. What biospheric fluxes were used? Or are these not biospheric concentrations at all but simply CO2 from biogenic sources (combustion of biofuels)? In the latter case this would be an oversimplification, especially for the rural site where we can expect the biosphere to be an important influence on atmospheric CO2. Finally, it remains unclear if the simulations represent surface CO2. Since it is recommended to place atmospheric CO2 observations, used for flux inversions, on tall towers (> 100 m above ground), it would be better to discuss the simulated variability at similar heights above ground (or even at several typical heights).
4) Comparison metrics
The model simulations of relative humidity are compared to observations. However, relative humidity is strongly driven by temperature, which makes relative humidity a bad metric to quickly establish moisture biases in the model. I would suggest replacing relative with specific humidity in all these comparisons, also because the focus is mostly on dry atmospheres and not saturated conditions.
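The conversion is straightforward; a sketch via saturation vapour pressure, using one common set of Magnus-type coefficients (the coefficients and function name are illustrative, not taken from the manuscript):

```python
import numpy as np

def specific_humidity(rh_percent, t_celsius, p_hpa=1013.25):
    """Specific humidity (kg/kg) from relative humidity, temperature
    and pressure. Saturation vapour pressure from a Magnus-type
    approximation over water (one common coefficient choice)."""
    e_sat = 6.112 * np.exp(17.62 * t_celsius / (243.12 + t_celsius))  # hPa
    e = rh_percent / 100.0 * e_sat                                    # hPa
    return 0.622 * e / (p_hpa - 0.378 * e)
```

Comparing specific humidity removes the temperature dependence that confounds an RH comparison, which is the motivation of the comment above.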
5) Presentation of results
The presentation of the different comparisons to observations is quite confusing and not prepared very carefully. Often statements in the text cannot be seen in any of the figures or it remains unclear where one should look. Some of this confusion is added because different kinds of plots are presented for different comparison data sets. I would suggest aligning the figures in the main manuscript to be similar for all comparisons. Show either bias or only absolute values, and put the corresponding additional plot into the supplement. I also don't see the benefit of presenting the different ensemble members as sub-sets with mean and standard deviation. Plots like S2, where simply all members are shown with a color code that also reflects the sub-sets, seem to be more intuitive and allow identification of individual runs that deviate from the others. The section on the meteorology comparison is also very long and should be written up much more concisely. For example, Figure 7 largely repeats what is shown in Figure 6 already. One of them could be omitted.
6) Representativeness of study
There is no discussion why the two given sites and the month of May were chosen for the analysis. Besides the fact that one site is urban and one is rural, they hardly seem to cover the kind of meteorological variability that can be expected for the whole of India. Please motivate the choice of the study sites. Also given the overarching question of using observations for flux estimation, I would question why an urban site was selected at all, given the coarse resolution of the employed transport model. At the given resolution or even when going down to 3 km as done in one run, one cannot expect the model to represent urban CO2 observations reasonably well. The authors should clarify what kind of flux inversion system they have in mind, national or city scale. For the latter, their model tool would not be adequate.
Specific comments
L96: STR observation availability: I am not sure I understand the availability correctly. Do you mean that observations are mostly available during daylight hours and for some days also for the night? Can you briefly say why that is the case and how this affects the following analysis?
L98: Could you provide an estimate of the STR accuracy and precision if relevant for the model comparison?
Table 2: The information in the table could be arranged in a more concise way. In the middle column it is difficult to establish which part really changed for the different sets of model runs. I would suggest omitting all elements of the model setup that are the same for all runs (e.g., microphysics, radiation, etc.). These can be described in the text. Then add separate columns for elements that change from run to run (e.g., PBL scheme, land surface model, resolution, etc.).
Table 2, Expt 10: According to the table it looks like the higher resolution nest was also run with a deep convection parameterization. Shouldn't this type of convection be mostly resolved at 3 km resolution and, hence, the 3 km run not employ an additional deep convection parameterization?
L105: Please clarify if the balloon burst happens above the height range that you later compare with the model.
Figure 1: Please improve. No vertical displacement needed for the sub-panels. Avoid white space? Subpanel labels (a,b,c). Labels and legends for the detailed maps are way too small.
L195ff: In section 2.2.1 it is explained that an ERA5 ensemble is used for initializing the WRF simulations. Here, ERA-Interim data is suggested as a comparison dataset to see how the 'regional model modifies the initial data'. Why not use ERA5 here as well? You need to explain why two different sets were preferred.
L251-252: I would like a clarification about what these squared correlation coefficients were calculated for. Is this the correlation for the whole time series of hourly data for the whole month? Or is this the correlation of the monthly mean diurnal cycle? If it is the latter, as I suspect, then I don't think this is a good metric for model evaluation, as I would expect any reasonable model to get the monthly mean diurnal cycle right. More interesting would be to see if that is also the case for a longer time period that would also include day-to-day differences.
L257/258: This daytime bias does not exist after 13:30, as the observations reach peak temperatures later in the day. Please correct text.
L260: 'slightly closer to the observations'. What is this conclusion based on? The smaller RMSE? RMSE is similar during night and day. But the absolute bias is much larger during the day, at least according to the numbers given in this sentence. If the numbers are correct, they contradict what was said about cold and warm biases in the sentence before. Please check again.
L268: The results from Expt 7 in terms of moisture look really strange. Is there something going completely wrong? Too moist, wind speeds too large.
L275: It should also be mentioned that the model is especially bad at night-time and morning hours, whereas in the afternoon the overestimation is much less obvious and Set 3 actually gets very close to the obs.
Figure 2: What drives the large observed standard deviation at Gadanki at 9:30 and 12:30 (panel c)? These go along with low average values as well. Were the observational data quality controlled and outliers (flagged data) removed before calculation of diurnal averages? Similar, but less obvious, for RH at 7:30 (panel d).
Figure 2: Is this the standard deviation of the simulations calculated just from the different runs in each set, applied to the mean (hourly) quantity, or is it calculated over all data at this hour and from all experiments? I am not convinced that the latter is a good indicator of ensemble spread, nor am I convinced that the former would be robust given the small number of members per set. Maybe showing individual members like in the supplement plots would be sufficient; these could still be grouped by color.
L279: How were wind speed/direction averaged to monthly values? Vector average? Same treatment for observations and model? The simulated change of wind direction may indicate that the model picks up some thermally-driven flow pattern (valley winds or sea breeze) that does not seem to exist in reality. Please try to comment on this in the manuscript. In Expt 7, general wind speeds seem too large compared to observations and no such thermally-driven cell seems to develop.
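To make the distinction concrete, a vector average of wind proceeds via the u/v components rather than averaging speeds and directions separately; a minimal sketch (function name and meteorological sign convention are illustrative):

```python
import numpy as np

def vector_mean_wind(speed, direction_deg):
    """Vector-average wind: mean of the u, v components, converted
    back to speed and meteorological direction (degrees, direction
    the wind blows FROM)."""
    rad = np.deg2rad(direction_deg)
    u = -speed * np.sin(rad)   # meteorological convention
    v = -speed * np.cos(rad)
    um, vm = u.mean(), v.mean()
    mean_speed = np.hypot(um, vm)
    mean_dir = np.rad2deg(np.arctan2(-um, -vm)) % 360.0
    return mean_speed, mean_dir
```

Note that a scalar average of directions near north (e.g., 350° and 10°) would give a meaningless 180°, whereas the vector average correctly yields a northerly direction; this is why the averaging method must be stated and applied identically to model and observations.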
L282: I cannot follow this discussion from the provided figures. Is this a reference to (random) variability at individual hours (not shown) or still referring to Fig 2?
L290: Consider starting new paragraph before "Figure 4".
L291ff (and elsewhere): Somehow the defined periods (t1 to t4) don't align well with general PBL development phases. I would have used something more in line with the surface heat flux as driver in mind. For example a morning transition phase from sunrise to sun maximum, followed by an unstable phase until sunset, transition phase until midnight, stable phase until sunrise (compare classical PBL development by Stull). Consider revising or at least give more reasoning on why periods of day were selected as such.
Figure 4: According to the figure the height bins reach from the surface to different maximal heights. Why not use distinct height bins (0-2, 2-4, 4-6, 6-8)? Also, the text following L296 seems to suggest that the 0-6 and 0-8 km bins are still interpreted as 'upper levels'. Does this mean that the labels in Figure 4 are not correct?
Figure 4: There is no detailed interpretation of Figure 4 besides that correlation is bad for t3 and t4. No discussion on different experiments providing better or worse results under different conditions. The same plot for bias would be helpful as well in understanding if certain model configurations work better.
L300f: The sentence seems to be missing a reference to Figure 5.
L302: Where can one see the 'high correlation' mentioned in the text?
L308f: A considerable influence of PBL, LSM, UCM is mentioned, but it is not mentioned which set performs better than others. Some speculation on why certain differences are observed would be helpful as well. Alternatively, this could be done in the discussion, but then this sentence should also be moved there.
L313: 'models overestimate RH'. As suggested above, I would move to specific humidity instead of relative humidity. Probably, it would then be easier to understand the impact of the different LSM/UCM runs.
Figure 3: It is very difficult to distinguish the uncertainty bands. Use transparent colors or omit the ribbons and simply plot all ensemble members. It would be good to discuss the temperature profiles themselves or even better potential temperature profiles as these would allow interpretation in terms of atmospheric stability.
Figure 3: Why were the 10:30 and 22:30 profiles chosen and not the times of minimum and maximum temperature (5:30 and 14:30) and, most likely, most mature state of stable and unstable PBL?
Figure 5: Wind direction profiles are not discussed in the text at all.
All figures showing a bias: It would be helpful to have additional vertical (or horizontal) lines indicating 0 bias. Especially Figure 5.
L320, Figure S4: The caption of Fig. S4 seems wrong. There is no bias shown in figure. Why is the observed profile not much smoother if it represents the monthly mean?
L322f: What is this conclusion based on? The spread at 10:30 below 2 km seems as large as the spread at 16:30 below 3 km (about 2-3 m/s). So I would say that the spread in the PBL is similar at both night and day. Relative difference would even be smaller at day, contrary to what is stated in the sentence.
L324: 10 m wind speed was only compared for Cochin, where models seem to overestimate wind speed especially at night. However, the 10 m comparison is usually more difficult and depends strongly on used roughness parameters. The findings from the STR comparison are in line with the radiosonde comparison (Fig. 5c). Hence, I would not emphasize the 10 m comparison here.
L325: 'Expt. 7 captures vertical wind speed profile'. However, from Fig. S2 we know that Expt. 7 performed worst in terms of surface variables at both sites. In Fig. 5 one cannot see how Expt. 7 compares to the radiosondes at Gadanki, but it does not look as if this model run sticks out. Why does it behave so differently for the surface wind then? Low roughness?
L327f: The sentence largely repeats what was just said (Expt 7 being different from the others). Again it is not clear which of the figures the statement corresponds to (Fig. 6 or S4)? In general, I am also not able to see corresponding results between S4 and Fig. 6. I would have thought that one (S4) shows the absolute monthly averages and the other (Fig. 6) the bias, but if one starts comparing this cannot be (e.g., very small differences in wind speed at 5 km in S4 (both 10:30 and 16:30), but in Fig. 6 there is a negative bias throughout the whole column). Does S4 only represent a single day? What would be the reason to show it then?
L332f: Where can I see that Expt.7 and 5 perform better? Not obvious from Fig S4 nor Fig 6.
L338 and elsewhere: Don't refer to the different runs by mentioning a specific scheme; rather refer to the defined experiment number. This is confusing.
Figure 6: Reconsider how to display wind direction bias. Values of absolute wind direction bias > 180° don't exist. So a negative bias smaller than -180° should be displayed as a positive bias < 180°. Best just to display the absolute value …
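For clarity, wrapping a direction difference into the (-180°, 180°] interval is a one-liner; a minimal sketch (function name illustrative):

```python
def direction_bias(model_deg, obs_deg):
    """Wind-direction difference (model minus obs) wrapped into the
    interval (-180, 180] degrees."""
    d = (model_deg - obs_deg) % 360.0
    return d - 360.0 if d > 180.0 else d
```

With this convention, a model direction of 350° against an observed 10° correctly gives a small negative bias rather than one of 340°.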
L345f: PBLH is essentially zero for Expt 1-3 for the whole night. How is this possible? This could indicate a very strong surface temperature inversion in these experiments. However, the comparison with 2 meter temperatures did not reveal large differences between these experiments and others.
L352: Which observation-based method are you referring to? The average from both methods?
L353: With generally larger PBL heights during the day, it is not surprising that the RMSE is also larger during the day. A normalized RMSE would be more interesting also with respect to the uncertainty of CO2 accumulation in the PBL.
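A normalized RMSE as suggested here could simply scale the RMSE by the mean observed PBLH; a minimal sketch (function name and normalization by the observed mean are illustrative choices):

```python
import numpy as np

def nrmse(model, obs):
    """RMSE normalized by the mean of the observations
    (dimensionless; other normalizations, e.g. by range, exist)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rmse = np.sqrt(np.mean((model - obs)**2))
    return rmse / np.mean(obs)
```

This puts daytime and nighttime errors on a comparable relative scale, which is the point of the comment above.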
L361: Most likely this has to do with different model altitudes. Were temperatures in IMD adiabatically corrected to sea level pressure?
L365ff and reference to Fig S3: Figure S3 shows observed monthly mean temperature profiles and simulations from the ensemble, but nothing about ERA-Interim or MERRA-2. There is no such figure in the supplement. Hence, the following description cannot be reviewed. In the following, two figures are described (labeled as 'not shown'). I think these need to be part of the supplement.
Figure 8a and b: There are two kinds of black dashed lines. But only one explained in the legend. Should there be an additional legend entry for Stull method? Also consider clipping the negative values. They are unphysical.
Figure 8b: Colors of the different sets should be repeated in the legend as well. Why the fat black dots? They dominate the figure and hide the important details. More appropriate transparent colors are needed for the ribbons, as right now one cannot really see anything.
Table 3: Insufficient caption: Repeat the meaning of t1-t4. First column should just be called Experiment (written out so the meaning is clear). Then numbers alone would be sufficient. Also consider less digits for RMSE and MBE. Difficult to read otherwise.
Figure 10: Please add an explanation of what the boxplot represents. From the text one can guess that the box itself is the inter-quartile range, but what do the lines indicate? Min/max, other percentiles?
L389: Please explain how the 'average uncertainty in total CO2' was calculated. Do you calculate the ensemble standard deviation for each hour of the simulation period and then the average? Which part of the model simulation was considered for that? Total CO2, regional CO2 increments (anthropogenic + biospheric)?
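One plausible reading of 'average uncertainty', as asked above, is the hourly ensemble standard deviation averaged over the period; a minimal sketch of that interpretation (not necessarily what the authors did):

```python
import numpy as np

def average_ensemble_spread(co2):
    """co2: array of shape (n_members, n_hours).
    Sample standard deviation across ensemble members at each hour,
    then averaged over all hours."""
    hourly_spread = co2.std(axis=0, ddof=1)  # spread across members
    return hourly_spread.mean()
```

Whether the spread is taken over members only, or pooled over members and hours, changes the number substantially, hence the request for clarification.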
L392: Similar question as before, does Figure 10 display ensemble variability and/or temporal variability.
L396 and elsewhere: The abbreviation ffCO2 was not explained before. I suspect that this should indicate fossil fuel CO2. However, part of the anthropogenic emissions (as contained in EDGAR) will be from biofuels and therefore the term should not be ffCO2 but anthropogenic CO2. Unless these parts were separated and wrongly described as 'biospheric' CO2 (see comment above). Which would be very misleading. A better term would then be biofuel CO2 vs fossil fuel CO2. Please clarify.
L399: Remarkable that biospheric (if it really is biospheric) CO2 remains positive throughout the day (i.e., respiration dominating over photosynthetic uptake). Is this due to the chosen month being in the dry season? But even then I would have expected a negative signal during the day. Or was there no separate treatment of respiration and uptake? Or (as above) does biospheric CO2 refer to biofuel CO2, which would explain the positive values.
L439: Fig 2 does not show Expt. 10. Also in Fig S2 there seems to be no serious improvement of Expt. 10 over the other simulations. Temperature bias seems to even increase.
L439: Table 2 does not present results, but lists the experiments.
L447: On the contrary, adding temporal variability in the fluxes will, at certain times, also lead to larger ensemble spread (e.g., when large flux variability coincides with large meteorological spread).
L461: I don't agree with this interpretation of wind variability being the main or even the only driver of CO2 uncertainty. In addition, it is the lower PBLH that leads to increased concentrations at night, and the spread in PBLH seems to be much larger than that in wind speed (especially Expt 1-4 vs. the other cases).
L483f: However, there was underestimation of wind speed throughout the whole column up to 8 km for which PBL dynamics are not responsible.
L490: What does 'statistically significant' refer to? Which statistical test for what?
Technical comments
L85: Not clear what stratosphere-troposphere stands for at this point. I suppose adding the word 'radar' and changing the abbreviation to the later used 'STR' would solve the issue.
L322: What decreases with altitude? The boundary layer? Please rephrase sentence.
Figure 7: In the caption it should say 'between' rather than 'among'.
Fig S1: Different colors need to be described in legend and/or figure caption.
References
Collaud Coen, M., Praz, C., Haefele, A., Ruffieux, D., Kaufmann, P., and Calpini, B.: Determination and climatology of the planetary boundary layer height above the Swiss plateau by in situ and remote sensing measurements as well as by the COSMO-2 model, Atmos. Chem. Phys., 14, 13205-13221, doi: 10.5194/acp-14-13205-2014, 2014.
Seibert, P., Beyrich, F., Gryning, S. E., Joffre, S., Rasmussen, A., and Tercier, P.: Review and intercomparison of operational methods for the determination of the mixing height, Atmos. Environ., 34, 1001-1027, 2000.
Citation: https://doi.org/10.5194/egusphere-2023-2334-RC1