Evaluating the meteorological transport model ensemble for accounting uncertainties in carbon flux estimation over India
Abstract. The existing climate change scenario calls for immediate intervention to curb rising greenhouse gas emissions. An improved understanding of the regional distributions of carbon sources and sinks under the perturbed climate system is vital for assisting the above mitigation efforts. The current uncertainties in estimation can potentially be reduced by employing a multi-data modelling system capable of representing atmospheric tracer transport adequately. This study focuses on the mesoscale transport patterns that can affect atmospheric tracer distribution and examines how well they are represented in the meteorological models employed. We investigate the capability of the Weather Research and Forecasting (WRF) model to predict meteorological fields such as temperature, humidity, wind, and planetary boundary layer height (PBLH) by comparing different model simulations with surface and vertical profile observations available at an urban and a rural station, Cochin and Gadanki, and with global reanalysis data over India. Combining different model schemes and data products allows us to present a model ensemble of 11 members. Using these ensemble simulations, the impacts of changes in physics schemes, initial and boundary conditions, and spatial resolutions on meteorology and, consequently, on CO2 mixing ratio simulations are quantified. Most simulations capture variations in temperature and moisture very well (R2 > 0.75). The wind (R2 > 0.75 for heights above 2 km) and PBLH simulations (R2 > 0.75 for daytime) are also reasonably correlated with the observations. The sensitivity of meteorological and CO2 mixing ratio simulations to changes in planetary boundary layer (PBL) and land surface model (LSM) schemes is significant, thereby producing larger inter-model differences between experiments. Our analysis provides an assessment of expected CO2 transport errors when using WRF-like models in the inverse modelling framework. We emphasise the importance of treating these errors in the carbon data assimilation system to utilize the full potential of the measurements and conclude that WRF can be utilised as a potential transport model for regional carbon flux estimations in India.
Status: closed (peer review stopped)
RC1: 'Comment on egusphere-2023-2334', Anonymous Referee #1, 14 Jul 2024
The manuscript by Thara Anna Mathew and coworkers presents an analysis of an ensemble of regional atmospheric model simulations against meteorological observations and an analysis of the ensemble spread of simulated CO2 concentrations. The meteorological simulations consist of a 'physics-perturbed' ensemble using, among others, various boundary layer and land surface options within the WRF model. The CO2 ensemble is then built using the Lagrangian dispersion model STILT, driven by the different meteorological ensemble members. The paper presents a tedious, not always convincing comparison of WRF simulations with observed meteorology. Unfortunately, this is mostly done for only two sites (including profile data, though) and a single month, which seems too isolated to be representative of all of India and the whole year. For CO2 concentrations, only model simulations at the same two locations are discussed, with no observations. A comparison with observational data would have strengthened the manuscript. The presentation of objectives, methods and results is not always clear and requires improvements before publication.
Major comments
1) Aim of the study
From the phrase 'flux estimation' in the title of the manuscript, one may think that the prime aim of the paper is the quantification of model uncertainty, as this is an important input to inverse modelling. However, the main focus of the paper seems to be on identifying which WRF configuration produces the most reliable meteorological fields. Although these two questions are related, I would like to caution that a badly chosen ensemble (i.e., one including configurations that cause obvious biases) is not a good predictor of model uncertainty for the 'best possible' model configuration, which should be used for inverse modelling. The authors should clarify what the main purpose of their study is and align the title of their manuscript accordingly.
Consequently, the whole discussion of the findings remains rather vague and general. What are the real consequences of the study? A best-choice model configuration? Or at least a narrower selection of possible configurations? Do the findings go beyond what was already expected of the different configurations? Would the presented ensemble be a good estimator of model-data-mismatch uncertainty for an inverse estimate?
L73ff: The introduction states 4 questions that the manuscript tries to answer. Although these questions are generally addressed by the study setup and there is a lot of elaboration on details, I find that no concrete answers to these questions are given. Yes, the model results are sensitive to different parameterisations, which should not surprise, and yes, the land surface scheme plays a major role for variables in the PBL, again no surprise. But what is the conclusion: which scheme, or which schemes if there is not a single scheme that fits all situations, should be used?
2) Comparison of PBL heights
Two methods are mentioned that are used for calculating PBL heights from observations. Both seem to be rather simple methods based solely on temperature profiles. There are other methods that incorporate additional parameters (e.g., bulk Richardson methods) and may be more reliable (e.g., Seibert et al., 2000; Collaud Coen et al., 2014). The extremely large differences in PBLH as estimated from the two different methods (Fig. 8) seem completely unreasonable since, according to the text, the only difference would be the use of virtual potential temperature vs. potential temperature. Furthermore, there is no information on how WRF diagnoses boundary layer heights. Is a comparable method used for the model data or rather something more sophisticated? I suggest revisiting this analysis of PBL heights and adding the missing information. PBL height, next to wind speed, is usually the main driver of uncertainty in PBL concentration simulations, and as such there needs to be more confidence in this comparison.
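To make the suggested alternative concrete, here is a minimal, illustrative sketch of a bulk-Richardson PBLH diagnosis that could be applied identically to radiosonde and model profiles (function and variable names are hypothetical, not the authors' implementation; the simplest form without a surface-friction term is assumed):

```python
import numpy as np

def pblh_bulk_richardson(z, theta_v, u, v, ri_crit=0.25):
    """Estimate PBL height as the lowest level where the bulk Richardson
    number exceeds a critical value (commonly ~0.2-0.25).

    z       : heights above ground [m]
    theta_v : virtual potential temperature [K]
    u, v    : horizontal wind components [m/s]
    """
    g = 9.81
    # Bulk Richardson number relative to the lowest level
    dtheta = theta_v - theta_v[0]
    shear2 = (u - u[0])**2 + (v - v[0])**2
    shear2 = np.maximum(shear2, 1e-6)  # avoid division by zero near the surface
    ri_b = g * dtheta * (z - z[0]) / (theta_v[0] * shear2)
    above = np.where(ri_b > ri_crit)[0]
    if len(above) == 0:
        return np.nan
    k = above[0]
    # Linear interpolation to the height where ri_b crosses ri_crit
    frac = (ri_crit - ri_b[k - 1]) / (ri_b[k] - ri_b[k - 1])
    return z[k - 1] + frac * (z[k] - z[k - 1])
```

Applying the same diagnostic to observed and simulated profiles would at least ensure that PBLH values derived with inconsistent definitions are not being compared.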
3) CO2 simulations.
The description and motivation of the CO2 simulations are not sufficient. For one, why is another model used for this part instead of doing the tracer transport in WRF itself? I suspect this is because of the potential use of STILT for inverse modelling. However, this is never explained in any detail, and STILT comes along a little bit as a surprise. Second, I have some general doubts about how much of the ensemble spread in WRF is then actually reflected in the STILT simulations. As STILT is driven offline and has its own parameterization of turbulent transport, some of the WRF ensemble differences may be ignored. It would be good to understand better which variables actually drive STILT and how much these vary in the WRF ensemble. Third, it seems like temporally constant fluxes were used. This is an oversimplification for CO2 and may well result in completely different ensemble variability compared to a tracer with variable sources/sinks, especially during the course of the day. For example, anthropogenic CO2 fluxes can be expected to be lower than average during the night, but peak in the morning hours, just when PBL heights are changing. Fourth, there is no description of biospheric fluxes in the methods section (2.6). Nevertheless, biospheric CO2 contributions are presented later on. What biospheric fluxes were used? Or are these not biospheric concentrations at all but simply CO2 from biogenic sources (combustion of biofuels)? In the latter case this would be an oversimplification, especially for the rural site, where we can expect the biosphere to be an important influence on atmospheric CO2. Finally, it remains unclear whether the simulations represent surface CO2. Since it is recommended to place atmospheric CO2 observations used for flux inversions at tall towers (> 100 m above ground), it would be better to discuss the simulated variability at similar heights above ground (or even at several typical heights).
4) Comparison metrics
The model simulations of relative humidity are compared to observations. However, relative humidity is strongly driven by temperature, which makes relative humidity a poor metric for quickly establishing moisture biases in the model. I would suggest replacing relative humidity with specific humidity in all these comparisons, also because the focus is mostly on dry atmospheres and not saturated conditions.
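For reference, specific humidity can be obtained from the quantities presumably already recorded (relative humidity RH in %, temperature T in °C, pressure p in hPa), for example via a Magnus-type approximation for the saturation vapour pressure:

```latex
e_s(T) \approx 6.112\,\exp\!\left(\frac{17.62\,T}{243.12 + T}\right)\ \mathrm{hPa},\qquad
e = \frac{\mathrm{RH}}{100}\, e_s(T),\qquad
q \approx \frac{0.622\, e}{p - 0.378\, e},
```

so no additional measurements would be needed for the suggested change of variable.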
6) Presentation of results
The presentation of the different comparisons to observations is quite confusing and not prepared very carefully. Often, statements in the text cannot be seen in any of the figures, or it remains unclear where one should look. Some of this confusion is added because different kinds of plots are presented for different comparison data sets. I would suggest aligning the figures in the main manuscript so that they are similar for all comparisons. Show either the bias or only absolute values, and put the corresponding additional plot into the supplement. I also don't see the benefit of presenting the different ensemble members as sub-sets with mean and standard deviation. Plots like S2, where simply all members are shown with a color code that also reflects the sub-sets, seem more intuitive and allow identification of individual runs that deviate from the others. The section on the meteorology comparison is also very long and should be written up much more concisely. For example, Figure 7 largely repeats what is shown in Figure 6 already. One of them could be omitted.
7) Representativeness of study
There is no discussion why the two given sites and the month of May were chosen for the analysis. Besides the fact that one site is urban and one is rural, they hardly seem to cover the kind of meteorological variability that can be expected for the whole of India. Please motivate the choice of the study sites. Also given the overarching question of using observations for flux estimation, I would question why an urban site was selected at all, given the coarse resolution of the employed transport model. At the given resolution or even when going down to 3 km as done in one run, one cannot expect the model to represent urban CO2 observations reasonably well. The authors should clarify what kind of flux inversion system they have in mind, national or city scale. For the latter, their model tool would not be adequate.
Specific comments
L96: STR observation availability: I am not sure I understand the availability correctly. Do you mean that observations are mostly available during daylight hours and for some days also for the night? Can you briefly say why that is the case and how this affects the following analysis?
L98: Could you provide an estimate of the STR accuracy and precision if relevant for the model comparison?
Table 2: The information in the table could be arranged in a more concise way. In the middle column it is difficult to establish which part really changed for the different sets of model runs. I would suggest omitting all elements of the model setup that are the same for all runs (e.g., microphysics, radiation, etc.). These can be described in the text. Then add separate columns for the elements that change from run to run (e.g., PBL scheme, land surface model, resolution, etc.).
Table 2, Expt 10: According to the table it looks like the higher resolution nest was also run with a deep convection parameterization. Shouldn't this type of convection be mostly resolved at 3 km resolution and, hence, the 3 km run not employ an additional deep convection parameterization?
L105: Please clarify if the balloon burst happens above the height range that you later compare with the model.
Figure 1: Please improve. No vertical displacement needed for the sub-panels. Avoid white space? Subpanel labels (a,b,c). Labels and legends for the detailed maps are way too small.
L195ff: In section 2.2.1 it is explained that an ERA5 ensemble is used for initializing the WRF simulations. Here, ERA-Interim data is suggested as a comparison dataset to see how the 'regional model modifies the initial data'. Why not use ERA5 here as well? You need to explain why two different sets were preferred.
L251-252: I would like a clarification about what these squared correlation coefficients were calculated for. Is this the correlation for the whole time series of hourly data for the whole month? Or is this the correlation of the monthly mean diurnal cycle. If it is the latter, as I suspect, then I don't think this is a good metric for model evaluation as I would expect any reasonable model to get the monthly mean diurnal cycle right. More interesting would be to see if that is also the case for a longer time period that would also include day-to-day differences.
L257/258: This daytime bias does not exist after 13:30, as the observations reach peak temperatures later in the day. Please correct text.
L260: 'slightly closer to the observations'. What is this conclusion based on? The smaller RMSE? The RMSE is similar during night and day. But the absolute bias is much larger during the day, at least according to the numbers given in this sentence. If the numbers are correct, they contradict what was said about cold and warm biases in the sentence before. Please check again.
L268: The results from Expt 7 in terms of moisture look really strange. Is there something going completely wrong? Too moist, wind speeds too large.
L275: It should also be mentioned that the model is especially bad at night-time and morning hours, whereas in the afternoon the overestimation is much less obvious and Set 3 actually gets very close to the obs.
Figure 2: What drives the large observed standard deviation at Gadanki at 9:30 and 12:30 (panel c)? These go along with low average values as well. Were the observational data quality controlled and outliers (flagged data) removed before calculation of diurnal averages? Similar, but less obvious, for RH at 7:30 (panel d).
Figure 2: Is this the standard deviation of the simulations calculated just from the different runs in each set, applied to the mean (hourly) quantity or is it calculated over all data at this hour and from all experiments? I am not convinced that the latter is a good indicator of ensemble spread nor am I convinced that the first would be robust given the small number of members per set. Maybe showing individual members like in the supplement plots would be sufficient, these could still be grouped by color.
L279: How were wind speed/direction averaged to monthly values? A vector average? Was the same treatment applied to observations and model? The simulated change of wind direction may indicate that the model picks up some thermally driven flow pattern (valley winds or sea breeze) that does not seem to exist in reality. Please comment on this in the manuscript. In Expt 7, the general wind speeds seem too large compared to observations and no such thermally driven cell seems to develop.
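To illustrate what a consistent treatment could look like, a minimal sketch of a vector (u, v) average of wind speed and direction, assuming the meteorological "wind from" convention (names are illustrative only):

```python
import numpy as np

def vector_mean_wind(speed, direction_deg):
    """Vector (u, v) average of wind; returns mean speed and
    meteorological mean direction in degrees."""
    d = np.deg2rad(direction_deg)
    # Meteorological convention: direction is where the wind comes FROM
    u = -speed * np.sin(d)
    v = -speed * np.cos(d)
    u_m, v_m = np.nanmean(u), np.nanmean(v)
    mean_speed = np.hypot(u_m, v_m)
    mean_dir = np.rad2deg(np.arctan2(-u_m, -v_m)) % 360.0
    return mean_speed, mean_dir
```

Note that the vector-mean speed is generally lower than the scalar-mean speed, so whichever convention is chosen should be stated and applied to both model and observations.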
L282: I cannot follow this discussion from the provided figures. Is this a reference to (random) variability at individual hours (not shown) or still referring to Fig 2?
L290: Consider starting new paragraph before "Figure 4".
L291ff (and elsewhere): Somehow the defined periods (t1 to t4) don't align well with the general phases of PBL development. I would have used something more in line with the surface heat flux as the driver: for example, a morning transition phase from sunrise to the solar maximum, followed by an unstable phase until sunset, a transition phase until midnight, and a stable phase until sunrise (compare the classical PBL development described by Stull). Consider revising, or at least give more reasoning on why the periods of the day were selected as they are.
Figure 4: According to the figure, the height bins reach from the surface to different maximum heights. Why not use distinct height bins (0-2, 2-4, 4-6, 6-8)? Also, the text following L296 seems to suggest that 0-6 and 0-8 are still interpreted as 'upper levels'. Does this mean that the labels in Figure 4 are not correct?
Figure 4: There is no detailed interpretation of Figure 4 besides that correlation is bad for t3 and t4. No discussion on different experiments providing better or worse results under different conditions. The same plot for bias would be helpful as well in understanding if certain model configurations work better.
L300f: The sentence seems to be missing a reference to Figure 5.
L302: Where can one see the 'high correlation' mentioned in the text?
L308f: A considerable influence of PBL, LSM, UCM is mentioned, but it is not mentioned which set performs better than others. Some speculation on why certain differences are observed would be helpful as well. Alternatively, this could be done in the discussion, but then this sentence should also be moved there.
L313: 'models overestimate RH' As suggested above, I would move to specific humidity instead of relative humidity. Probably, it would be easier to understand impact of different LSM/UCM runs.
Figure 3: It is very difficult to distinguish the uncertainty bands. Use transparent colors or omit the ribbons and simply plot all ensemble members. It would be good to discuss the temperature profiles themselves or even better potential temperature profiles as these would allow interpretation in terms of atmospheric stability.
Figure 3: Why were the 10:30 and 22:30 profiles chosen and not the times of minimum and maximum temperature (5:30 and 14:30) and, most likely, most mature state of stable and unstable PBL?
Figure 5: Wind direction profiles are not discussed in the text at all.
All figures showing a bias: It would be helpful to have additional vertical (or horizontal) lines indicating 0 bias. Especially Figure 5.
L320, Figure S4: The caption of Fig. S4 seems wrong. There is no bias shown in figure. Why is the observed profile not much smoother if it represents the monthly mean?
L322f: What is this conclusion based on? The spread at 10:30 below 2 km seems as large as the spread at 16:30 below 3 km (about 2-3 m/s). So I would say that the spread in the PBL is similar at both night and day. Relative difference would even be smaller at day, contrary to what is stated in the sentence.
L324: 10 m wind speed was only compared for Cochin, where models seem to overestimate wind speed especially at night. However, the 10 m comparison is usually more difficult and depends strongly on used roughness parameters. The findings from the STR comparison are in line with the radiosonde comparison (Fig. 5c). Hence, I would not emphasize the 10 m comparison here.
L325: "Expt. 7 captures vertical wind speed profile". However, from Fig. S2 we know that Expt. 7 performed worst in terms of surface variables at both sites. In Fig. 5 one cannot see how Expt. 7 compares to the radiosondes at Gadanki, but it does not look as if this model run sticks out. Why does it behave so differently for the surface wind then? Low roughness?
L327f: The sentence largely repeats what was just said (Expt 7 being different from the others). Again, it is not clear which of the figures the statement corresponds to (Fig. 6 or S4). In general, I am also not able to see corresponding results between S4 and Fig. 6. I would have thought that one (S4) shows the absolute monthly averages and the other (Fig. 6) the bias, but if one starts comparing, this cannot be (e.g., there are very small differences in wind speed at 5 km in S4 (both 10:30 and 16:30), but in Fig. 6 there is a negative bias throughout the whole column). Does S4 only represent a single day? What would be the reason to show it then?
L332f: Where can I see that Expt.7 and 5 perform better? Not obvious from Fig S4 nor Fig 6.
L338 and elsewhere: Don't refer to the different runs by mentioning a specific scheme, rather refer to the defined experiment number. Confusing.
Figure 6: Reconsider how to display the wind direction bias. Values of absolute wind direction bias > 180° don't exist, so a negative bias smaller than -180° should become a positive bias < 180°. It may be best to just display the absolute value …
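A minimal example of the wrapping described above (illustrative only, not the authors' code):

```python
import numpy as np

def direction_bias(sim_deg, obs_deg):
    """Wind-direction bias wrapped into [-180, 180) degrees,
    so that e.g. a raw difference of -200 deg becomes +160 deg."""
    return (np.asarray(sim_deg) - np.asarray(obs_deg) + 180.0) % 360.0 - 180.0
```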
L345f: PBLH is essentially zero for Expt 1-3 for the whole night. How is this possible? This could indicate a very strong surface temperature inversion in these experiments. However, the comparison with 2 meter temperatures did not reveal large differences between these experiments and others.
L352: Which observation-based method are you referring to? The average from both methods?
L353: With generally larger PBL heights during the day, it is not surprising that the RMSE is also larger during the day. A normalized RMSE would be more interesting also with respect to the uncertainty of CO2 accumulation in the PBL.
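For example, one possible normalisation (here by the mean observed PBLH; the observed range or the daytime mean would be equally defensible choices):

```latex
\mathrm{NRMSE} \;=\; \frac{1}{\overline{h}_{\mathrm{obs}}}\,
\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(h_i^{\mathrm{mod}} - h_i^{\mathrm{obs}}\right)^{2}}
```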
L361: Most likely this has to do with different model altitudes. Were the temperatures in IMD adiabatically corrected to sea level pressure?
L365ff and reference to Fig S3: Figure S3 shows observed monthly mean temperature profiles and simulations from the ensemble, but nothing about ERA-Interim or MERRA-2. There is no such figure in the supplement. Hence, the following description cannot be reviewed. In the following, two figures are described (labeled as not shown). I think these need to be part of the supplement.
Figure 8a and b: There are two kinds of black dashed lines. But only one explained in the legend. Should there be an additional legend entry for Stull method? Also consider clipping the negative values. They are unphysical.
Figure 8b: Colors of different sets should be repeated in the legend as well. Why the fat black dots, they dominate the figure and hide the important details. A more appropriate transparency color is needed for the ribbons as right now one cannot really see anything.
Table 3: Insufficient caption: Repeat the meaning of t1-t4. First column should just be called Experiment (written out so the meaning is clear). Then numbers alone would be sufficient. Also consider less digits for RMSE and MBE. Difficult to read otherwise.
Figure 10: Please add an explanation of what the boxplot represents. From the text one can guess that the box itself is the interquartile range, but what do the lines indicate? Min/max, other percentiles?
L389: Please explain how 'average uncertainty in total CO2' was calculated? Do you calculate the ensemble standard deviation for each hour of the simulation period and then the average? Which part of the model simulation was considered for that? Total CO2, regional CO2 increments (anthropogenic + biospheric)?
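To make the question concrete, the two possible readings could be written as follows in a minimal xarray-style sketch (dimension and variable names are hypothetical, not the authors'):

```python
import xarray as xr

# co2: DataArray with dims ("member", "time"), hourly simulated CO2 at a site
def average_ensemble_spread(co2: xr.DataArray) -> float:
    """Ensemble standard deviation at each hour, then averaged over time."""
    return float(co2.std(dim="member").mean(dim="time"))

def pooled_spread(co2: xr.DataArray) -> float:
    """Standard deviation pooled over all members and hours
    (mixes ensemble spread with the diurnal cycle)."""
    return float(co2.std())
```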
L392: Similar question as before, does Figure 10 display ensemble variability and/or temporal variability.
L396 and elsewhere: The abbreviation ffCO2 was not explained before. I suspect that this should indicate fossil fuel CO2. However, part of the anthropogenic emissions (as contained in EDGAR) will be from biofuels and therefore the term should not be ffCO2 but anthropogenic CO2. Unless these parts were separated and wrongly described as 'biospheric' CO2 (see comment above). Which would be very misleading. A better term would then be biofuel CO2 vs fossil fuel CO2. Please clarify.
L399: Remarkable that biospheric (if it really is biospheric) CO2 remains positive throughout the day (i.e., respiration dominating over photosynthetic uptake). Is this due to the chosen month being in the dry season? But even then I would have expected a negative signal during the day. Or was there no separate treatment of respiration and uptake? Or (as above) does biospheric CO2 refer to biofuel CO2, which would explain the positive values.
L439: Fig 2 does not show Expt. 10. Also in Fig S2 there seems to be no serious improvement of Expt. 10 over the other simulations. Temperature bias seems to even increase.
L439: Table 2 does not present results, but lists the experiments.
L447: On the contrary, adding temporal variability in the fluxes will, at certain times, also lead to a larger ensemble spread (e.g., when large fluxes coincide with large meteorological spread).
L461: I don't agree with this interpretation of wind variability being the main or even the only driver of CO2 uncertainty. In addition, it is the lower PBLH that leads to increased concentrations at night, and the spread in PBLH seems to be much larger than that in wind speed (especially Expt 1-4 vs. the other cases).
L483f: However, there was underestimation of wind speed throughout the whole column up to 8 km for which PBL dynamics are not responsible.
L490: What does 'statistically significant' refer to? Which statistical test for what?
Technical comments
L85: Not clear what stratosphere-troposphere stands for at this point. I suppose adding the word 'radar' and changing the abbreviation to the later used 'STR' would solve the issue.
L322: What decreases with altitude? The boundary layer? Please rephrase sentence.
Figure 7: In the caption it should say 'between' rather than 'among'.
Fig S1: Different colors need to be described in legend and/or figure caption.
References
Collaud Coen, M., Praz, C., Haefele, A., Ruffieux, D., Kaufmann, P., and Calpini, B.: Determination and climatology of the planetary boundary layer height above the Swiss plateau by in situ and remote sensing measurements as well as by the COSMO-2 model, Atmos. Chem. Phys., 14, 13205-13221, https://doi.org/10.5194/acp-14-13205-2014, 2014.
Seibert, P., Beyrich, F., Gryning, S. E., Joffre, S., Rasmussen, A., and Tercier, P.: Review and intercomparison of operational methods for the determination of the mixing height, Atmos. Environ., 34, 1001-1027, 2000.
Citation: https://doi.org/10.5194/egusphere-2023-2334-RC1
RC2: 'Comment on egusphere-2023-2334', Anonymous Referee #2, 08 Aug 2024
Mathew et al. present a study comparing the meteorological fields simulated by the mesoscale model WRF with measurements from two sites in southern India: an urban site at the coast, and a rural site inland. An ensemble of 10 simulations using different parameterizations, resolutions, driving meteorologies, and surface schemes was compared for one month (May 2017) to surface variables and profile measurements from different sensors. Simulations of tracer transport using the ensemble were included to estimate the impact on tracer variability at these two sites, but there were no reference data to evaluate the results. In general, despite a lot of analysis and figures, the take-away message from the paper is unclear. There was no clear winner amongst the ensemble members, and no clear explanation for disagreements between the observations and the simulations. The choice to compare monthly mean diurnal cycles for most of the measurements is mystifying, and it is unclear if the model was subsampled in a way consistent with the measurements before this averaging. Furthermore, there appear to be substantial incongruities in the results as they are presented here, as described in more detail below.
As it is presented now, I cannot recommend this paper for publication. I outline some major and minor concerns related to the validity of the analysis below, and make some recommendations for future work.
This was a difficult paper to review. A lot of work was done, presented, and discussed, but much of the analysis was lacking cohesion. This begins with the title: as it was written, I found it hard to parse. I guess what was meant is something like “Evaluating variability in a meteorological transport model ensemble to account for uncertainties in carbon flux estimation over India”. However, no carbon flux estimation is carried out in the paper. Instead, the variability in a meteorological transport model ensemble is assessed, and there is some analysis of how this variability might affect tracer transport. The title should definitely better represent the content of the paper.
Specifically, I have some major concerns about the choice of metrics that were used for the analysis, focussing on the comparison of monthly mean diurnal cycles. How relevant is this for tracer transport? Surely day-to-day variability in meteorology will be more important. In the most extreme case, it’s possible for a variable to be severely overestimated for half the month and severely underestimated for the other half of the month, but be completely unbiased in the monthly average. Even if it might be convenient to present monthly mean data graphically, it is really critical to know when this averaging took place: was it before or after the statistical analysis? If the averaging took place beforehand, this tells us a lot less about the behaviour we are interested in.
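To make the distinction concrete, a minimal pandas sketch of the two possible metrics (column names are illustrative, not taken from the manuscript); averaging into a mean diurnal cycle before computing R2 removes exactly the day-to-day errors one would want to assess:

```python
import pandas as pd

def r2_hourly_vs_diurnal(df: pd.DataFrame) -> tuple:
    """df: hourly data with a DatetimeIndex and columns 'obs' and 'mod'.
    Returns R^2 of the full hourly series and R^2 of the monthly mean
    diurnal cycle; the latter is typically much higher."""
    r2_hourly = df["obs"].corr(df["mod"]) ** 2
    diurnal = df.groupby(df.index.hour).mean()
    r2_diurnal = diurnal["obs"].corr(diurnal["mod"]) ** 2
    return r2_hourly, r2_diurnal
```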
Another major concern is related to the sampling of the data: Many of the measurements that were used for analysis are not available throughout the whole month. As an example, for the STR measurements, the text around L95 states: “The STR measures horizontal and vertical wind components continuously at 9-minute intervals, and data are available for approximately 26 days in May 2017. The data are mostly continuous from the forenoon to evening hours and throughout the day for some days (Samson et al., 2016).”
How were the WRF output files sampled for comparison with the STR data? Were the variables from the same days, times, and heights extracted before the averaging took place? This is how it ought to have been done, so that the modelled and measured variables represent the same places and times, but this is not clear from the description in Section 2.5. It is clear how the model was sampled in space, but not in time. Furthermore, there seem to be significant discrepancies between the results presented in Figure 6 (which seem to be roughly consistent with Figure S4) and those in Figure 7. Consider the wind speed bias at 10:30 LT compared to the STR profile in Figure 6c: above about 6 km, the biases go right off the plot, seemingly more negative than -3 m/s. This seems roughly consistent with the information in Figure S4e, where all ensemble members show such a bias above 6 km. However, when looking at the plots in Figure 7, which shows the bias in wind speed with respect to the STR measurements as a function of time of day for all the ensemble members, none of them show biases below about -1 m/s at 10:30 LT above 6 km. The presented information is simply not consistent, and calls into question much of the analysis.
Beyond this, it seems that uncertainties on the measurements themselves have been largely (if not completely) neglected.
Other major concerns: I’m not convinced that relative humidity (RH) is the appropriate variable to compare, as it is so strongly linked to temperature. Why not specific humidity? This is easily calculated from RH if this is the only thing recorded from the in situ measurements.
The metrics for planetary boundary layer height (PBLH) are also a bit problematic. I was surprised to see the methods that were used to derive PBLH from the radiosonde data, being based only on (potential) temperature, rather than the more widely accepted bulk Richardson method (Vogelezang and Holtslag, 1996, Eq. 2) as further described by Seidel et al. (2010, 2012). It is also critical that the same method is used to diagnose the PBLH from the profiles in the WRF model, as this can vary significantly from the model-derived PBLH that is stored in the output files. Only this way can one be sure of comparing like with like. (For an example of this, see Fig. 2 in Yver et al., 2013, and the surrounding discussions.)
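For reference, the bulk Richardson number alluded to here is commonly written in a form similar to the following (an approximate rendering, with z_s a near-surface reference level, b an empirical coefficient, and u_* the friction velocity); the PBLH is then the lowest height at which Ri_b exceeds a critical value of roughly 0.2-0.25:

```latex
\mathrm{Ri}_b(z) \;=\; \frac{g\,\bigl(\theta_v(z) - \theta_v(z_s)\bigr)\,(z - z_s)}
{\theta_v(z_s)\,\bigl[\bigl(u(z)-u(z_s)\bigr)^{2} + \bigl(v(z)-v(z_s)\bigr)^{2} + b\,u_*^{2}\bigr]}
```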
The comparison of the simulated CO2 concentrations raises a lot of questions. Why introduce STILT at this point, rather than simply running the tracers online? This muddies the comparison, as STILT has its own way of dealing with some of the transport issues, and is not a 1:1 reproduction of the transport in WRF. Furthermore, only static EDGAR fluxes are mentioned in the text, suggesting that the biospheric fluxes (which are much larger!) were neglected, but then the resulting concentrations of biospheric CO2 are presented in Figures 10a and 11b. How can that be? (Incidentally, it is surprising that all the biospheric CO2 mixing ratio are positive. Is there really no uptake?) What flux product was used? Were the biospheric fluxes also run without any temporal variability? While this is already questionable for the anthropogenic fluxes, for biospheric fluxes it is simply not acceptable, as the strong diurnal cycle can really dominate the variability. Furthermore, what was used for the background signal of the CO2, in order to simulate the total CO2 mixing ratio in Figure 10c and 11c? This can also be a major source of error, if the goal is to develop a regional inversion system. In fact, Feng et al. (2019) found that prior flux uncertainties and large-scale boundary inflow can dominate uncertainties in flux estimation, as transport errors tend to average out somewhat over time.
Looking at the big picture, the aim of this study remains a bit fuzzy, and it is unclear what real conclusions can be drawn. How this analysis fits in to a larger effort to constrain fluxes is hinted at in the conclusions, but it seems that no recommendations follow from the present study. Indeed, the authors “advocate for a future study involving CO2 observations and modelling (with different meteorology and flux realizations) for assessing the full strength and weakness of the models”. I worry that such a study may end up being a larger and more complex version of the current study, and similarly lead to no clear conclusions. Perhaps it is too difficult to generalize the results based on one month of the year, comparing data from two sites, to say what the appropriate modelling settings for all of India should be?
Some minor (but still substantial) comments:
The description of how well the model agrees with observations is often very vague and contradictory. As an example, between L12 and L14, an R2 value > 0.75 is described as capturing the variations “very well” while also being “reasonably correlated” with the observations. In L252 both an R2 value > 0.95 and > 0.75 were described as agreeing with a “high degree of correlation”. In the description of Figure 9, R2 values less than 0.64 are described as “largely agreeing”, whereas in section 3.3 R2 > 0.64 is repeatedly referred to as “good agreement”. In the description of the comparison with ERA-Interim, the authors describe a “noticeably weak” correlation over southwestern India (R2 ~ -0.01). This is not weak, it is entirely absent! In the Conclusions, an R2 > 0.5 is described as capturing something “fairly well”, which seems overstated.
While there is no absolute standard about what correlation is “good” or “high”, sometimes looking at differences in statistical metrics can be more informative. This is especially true in a study such as this one, which seemed like it might want to determine the optimal settings for representing meteorological variables (at these two sites). In this case, having a better score in one experiment compared to another can be instructive, but here the simulations are generally lumped together into an average score when they are discussed. What does this tell us?
Of course, the bias is completely neglected in this metric, and a bias is likely more critical for something like PBLH, especially as typically only afternoon measurements of CO2 mixing ratios under well-mixed conditions are used for atmospheric inversion. While RMSE and MBE appear from time to time, the statistical analysis is pretty limited.
Why were only two sites used, and for only one month of one year? Was this related to the availability of more meteorological data that could be used from these sites? It seems quite limited. Indeed, when one reads through the questions that are set out to be answered in this study (L73-L77), it seems impossible to characterize the spatial and temporal error in the modelled “diurnal and monthly variability of temperature, relative humidity, PBL, and winds (both near the surface and upper air) across India” based on these data alone.
I’m confused as to why ERA-Interim would be used for comparison with the model output. ERA5 was used to drive the model, and ERA-Interim is a very similar reanalysis product with poorer spatial and temporal resolution that has since been superseded by the next generation of ECMWF reanalyses, ERA5. Why compare to this? It does not make any sense to me. Comparing to ERA5 could make sense, but I’m not sure it would bring substantial insight. In L198-199 the authors write that this comparison “allows us to examine how the physics schemes of the regional model modify the initial data with time and what differences arise with the change in version”. Comparing it to just ERA5 could tell you something about how the physics schemes of the regional model are different, but are you trying to indirectly compare the “versions” of ECMWF reanalysis products in this way? This seems ill-advised.
L255: Here the authors describe a “time lead” (I guess a temporal offset) of about 2 hours in surface temperature variability. Why might this be?
L282: The authors state that the very large, systematic differences in diurnal pattern of the wind speeds averaged over a whole month show that the model has “difficulties in capturing random fluctuations in the wind direction”. This does not seem very random! I think it would be advisable to try to understand why this is so consistently wrong. What do the ERA5 winds show? If these are consistently wrong, one can hardly expect WRF to do much better. Do they agree on some days, but not on others? A persistent >100 degree difference in afternoon wind directions across the ensemble is a big concern. Are there other meteorological station data nearby that could be used for comparison?
I found the separation of the diurnal cycle into “representative periods” t1, t2, t3, and t4 to be generally confusing. Are these stability regimes so consistent from day to day? Figure 4 was also hard for me to parse. What is the take-away message from this figure?
L337-338: The authors write in reference to Figure 7 that the wind speed near the ground shows a large negative bias except for the CLM4 scheme (which I think is Expt. 7), which simulates near-zero biases, but when I look at the figure, it seems the Expt. 6 has lower biases near the surface, at least in the afternoon. Overall Expt. 10 and 11 (with different driving meteorology) seem to have the smallest bias compared to the STR in Figure 7, but this is not so in Figure 6, or in the comparison to radiosonde data in Figure 5. How can this be so inconsistent?
L344: Please refer to it as an offset or similar, “lead time” means something else in meteorology!
Table 3: When these statistics of monthly diurnal cycles are further subdivided into four parts of the day, is this the correlation coefficient of only six points? When the conditions are rather stable (i.e. t3, and most of t4) and the variability quite low, is it not expected that the R2 values should be really small? I am not sure what we really learn from this. Which observational data were used for this analysis? (In general, I found it difficult to keep track of which observations were used where.)
Figure 8: Is the dotted black line from the MWR? I guess it’s the other method to derive the PBLH from the MWR? Also in panel a? This is unclear, and should be in the legend. The large black dots in panel b make it unnecessarily hard to read, please remove them.
Figure 9: Confusing colour bar! White should correspond to zero, even if it's heavily skewed towards negative biases in the far north. (Incidentally, is this the only comparison which is actually showing correlation over the course of the month, rather than diurnal patterns averaged over the whole month?)
In general, I found that the comparison with the gridded and reanalysis products brought nothing to the paper as it was presented here. I was confused by the comparison of the mean spatial patterns for daytime and nighttime monthly means: As this is a spatial correlation, it would be useful to know exactly which areas were compared when referring to some sub-regions (like southwestern India) which resulted in an R2 value of ~-0.01. Was there just not a lot of spatial variability over this subregion? The information is not provided. The bias between the MERRA-2 PBLH was apparently mostly ~-500 m, which I think is larger than the MBE found between WRF and the observations (in Table 3). Does this mean that the errors in MERRA-2 were larger, or is this very site-specific? There is no way to know. Why daytime mean and nighttime mean temperatures were used to assess agreement is not clear. Is this an established metric?
Figure 10 and 11: The panels should be presented in the same order (i.e., biogenic, fossil, total corresponding to the same panels). But more importantly: where did this biospheric CO2 mixing ratio come from?
L391-392: Is the larger variability at Cochin necessarily due to the influence of the coastal boundary layer? Isn’t it possible that the more spatially heterogeneous anthropogenic fluxes near this station play a larger role? Furthermore, if the diurnal cycle in the fluxes (both anthropogenic and biogenic) is being neglected, this variability is systematically underestimated in any case.
L394-396: There is some difference, but is this really significant? And how was this calculated? This is not just the spread within the set, as then there would only be 2-4 values per set rather than a full distribution. Might the spread in set 1 be larger simply because it contains four experiments rather than two (as for set 3-5)? Or is this mashing together the standard deviation over time (hourly?) with that over the ensemble members? In any case, this should be clarified.
L403-404: Same problem as the previous comment: Do you mean the variability within the ensemble, or the variability over time? The variability within the ensemble for PBLH seems highest during the day/afternoon at Gadanki, whereas the total CO2 variability is higher at night. At Cochin the ensemble spread for both is higher at night, but without taking the diurnal cycle of emissions into account, I am not sure this is a robust result. I don't see the clear correlation with windspeed either...
And do we really expect the distribution of total CO2 variability to be well correlated with air temperature? I wonder if it would be more correct to say that both CO2 variability (as simulated here) and temperature are temporally correlated with PBLH? But this does not say anything about the variability within the ensemble, which I had thought was the purpose of the analysis.
L487-488: A general comment: it is well established that urban meteorology is challenging to represent, especially with rather coarse simulations (10 km is definitely too coarse, and even 3 km is not really sufficient). In fact, when looking at the zoom over Cochin in Figure 1, I guess that this whole subplot would only be about 24 10-km pixels. Would we even expect an urban canopy model to be effective at such coarse resolution (and for a fairly small city)?
Because of these challenges, it may be best not to plan new measurement sites within cities. Instead, it may be better to measure in a peri-urban setting, where the plume downwind of the city can be measured depending on the wind direction, but which is much easier to represent and interpret in a model.
L490-491: I’m confused about where the factor of 3 to 5 is coming from here. Does this follow from the previous sentence? If so, how? If not, where do these numbers come from? Perhaps it is worth mentioning that this is why traditionally only afternoon measurements (under well-mixed conditions) are used in inversions. Also, the word “larger” or “higher” needs to be inserted after “3 to 5”.
L497-498: This larger study for which you are advocating seems challenging, and would need a firm basis in observational data for interpretation. In situ measurement sites in India are often a limitation here, and it would certainly need the analysis of meteorological data from more than two sites and a single month to draw any conclusions.
L500: In reference to the initial question stated as an objective of this study: have you characterized the error structure? If so, how?
Typographical/minor comments:
L3: It would be good to introduce the concept of inverse modelling before jumping in with a description of atmospheric tracer transport in a “multi-data” modelling system. (I do not know what “multi-data” means here.)
L8 (and elsewhere): it needs to be clear that there is only one urban and one rural station considered in this study, the text often does not make this clear.
L16: the -> an
L21-22: either “an unprecedented amount of greenhouse gases”, or “emitting greenhouse gases at an unprecedented rate”. Don't mix these.
L23: no N2O?
L29: is it a prediction or a simulation? Most of what we are doing is simulation.
L30: on -> for
L31: insert “the distribution of” after “modulates”
L38: remove “the” before “NWP”
L39: The confidence level, or the performance? And this reference is to the ERA5 reanalysis, which only indirectly affects the transport model WRF through the driving meteorology…
L43: insert “model” after (WRF).
L47: broad -> broader
L61: perhaps mention that FDDA is often referred to as “nudging”.
L62: it doesn’t “approximate” the model values toward the observations, but rather “nudges” them. And in which “equations”? Perhaps remove, or specify “at each timestep” or something similar?
L69: Rather than being “unique”, perhaps they are “distinct” from one another?
L73: “model in capturing the” -> “model-simulated”
L85: I guess Stratosphere-Troposphere (ST) should be Stratosphere-Troposphere Radar (STR)? The full word should also be in the section 2.2.1 title I think.
Table 1: What about February? A minor point, but I suppose that having the horizontal wind components means that, by definition, you also have the wind speed and wind direction, right?
L105: Does this profile start at the surface? At approximately what altitude does the balloon burst?
L118: Does the data logger have its own data logger? This is confusing. Also, the sentence starting in L120 seems to completely repeat the information that came before. The same thing for the sentence starting in L125. Please rewrite this paragraph.
L142: I guess the static land-use data have a spatial resolution of 30 arcseconds, not a temporal resolution of 30 s?
L143-144: second- to sixth-order schemes
L159: “the” before Yonsei and Mellor
L160: “the before Asymmetrical
L163: insert “is used” before “with the MYJ PBL scheme”
L166: “The” before “LSM”
Figure 1: The legends are impossible to read on the small maps, and the layout is strange. Please show the full d01 domain, also to demonstrate the appropriate placement of d02 in from the edge of the parent domain. Insert “the” before “ESRI” in the caption.
L179: respectively, except that they are coupled with an urban canopy model
L193-194: The sentence starting with “Daily” seems to add no new information.
L211: virtual temperature -> virtual potential temperature
L272, L276: These should all be inter-model, I guess.
L353: remove “that”
L360: Should this be R2>0.64 or R2<0.64? If it’s the latter, I’m not sure you can say that these “largely agree”.
L414-415: Remove the beginning of the sentence (before the comma), the relation to LSMs is in there twice.
L429: remove “significantly” and insert “more” after (0 to 2 km)
L432-433: What does “comparatively considerable” mean? Compared to what?
L435-436: This sentence (starting with “The”) is unclear, rewrite.
References
Feng, S., Lauvaux, T., Keller, K., Davis, K. J., Rayner, P., Oda, T., & Gurney, K. R. (2019). A road map for improving the treatment of uncertainties in high-resolution regional carbon flux inverse estimates. Geophysical Research Letters, 46, 13461–13469. https://doi.org/10.1029/2019GL082987
Seidel, D. J., Ao, C. O., and Li, K.: Estimating climatological planetary boundary layer heights from radiosonde observations: Comparison of methods and uncertainty analysis, Journal of Geophysical Research, 115, D16113, https://doi.org/10.1029/2009JD013680, 2010.
Seidel, D. J., Zhang, Y., Beljaars, A., Golaz, J.-C., Jacobson, A. R., and Medeiros, B.: Climatology of the planetary boundary layer over the continental United States and Europe, Journal of Geophysical Research: Atmospheres, 117, https://doi.org/10.1029/2012JD018143, 2012.
Vogelezang, D. H. P. and Holtslag, A. A. M.: Evaluation and model impacts of alternative boundary-layer height formulations, Boundary Layer Meteorology, 81, 245–269, https://doi.org/10.1007/BF02430331, 1996.
Yver, C. E., Graven, H. D., Lucas, D. D., Cameron-Smith, P. J., Keeling, R. F., and Weiss, R. F.: Evaluating transport in the WRF model along the California coast, Atmos. Chem. Phys., 13, 1837–1852, https://doi.org/10.5194/acp-13-1837-2013, 2013.
Citation: https://doi.org/10.5194/egusphere-2023-2334-RC2
EC1: 'Comment on egusphere-2023-2334', Patrick Jöckel, 19 Aug 2024
Since the manuscript has been rated "fair" and "poor" in 6 of 8 categories, and one referee recommended rejecting it, I need to reject the manuscript from further revision, since it does not fit the journal's quality requirements. I do not think that the major concerns raised by both referees can be addressed appropriately with only a simple revision of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2023-2334-EC1
Status: closed (peer review stopped)
-
RC1: 'Comment on egusphere-2023-2334', Anonymous Referee #1, 14 Jul 2024
The manuscript by Thara Anna Mathew and coworkers presents an analysis of an ensemble of regional atmospheric model simulations against meteorological observations and an analysis of ensemble spread of simulated CO2 concentrations. The meteorological simulations consist of a 'physics-perturbed' ensemble using, among others, various boundary layer and land surface options within the WRF model. The CO2 ensemble is then built using the Lagrangian dispersion model STILT, driven by the different meteorological ensemble members. The paper presents a tedious, not always convincing comparison of WRF simulations with observed meteorology. Unfortunately, this is mostly done for only two sites (including profile data though) and a single month, which seems too isolated to be representative for all of India and the whole year. For CO2 concentrations, only model simulations at the same two locations are discussed, no observations. A comparison with obersvational data would have strengthened the manuscript. The presentation of objectives, methods and results is not always clear and requires improvements before publication.
Major comments
1) Aim of the study
From the title of the manuscript 'flux estimation' one may think that the prime aim of the paper is the quantification of the model uncertainty as this is an important input in inverse modelling. However, the main focus on the paper seems to be on identifying which WRF configuration produces the most reliable meteorological fields. Although these two questions are related, I would like to caution that a badly chosen ensemble (i.e., including configuration that cause obvious biases) is not a good predictor of model uncertainty for the 'best possible' model configuration, which should be used for inverse modelling. The authors should clarify what is the main purpose of their study and align the title of their manuscript accordingly.
Consequently, the whole discussion of the findings remains rather vague and general. What are the real consequences of the study? A best choice model configuration? Or at least a more narrow selection of possible configurations. Do the findings go beyond the expectation of what was already known about different configurations? Would the presented ensemble be a good estimator of model-data-mismatch uncertainty for an inverse estimate?
L73ff: The introduction states 4 questions that the manuscript tries to answer. Although, these questions are generally addressed by the study setup and there is a lot of elaboration on details, I find that there is no concrete answers to these questions given. Yes, the model results are sensitive to different parameterisations, which should not surprise, and yes, the land surface scheme plays a major role for variables in the PBL, again no surprise, but what is the conclusion, which scheme or schemes if there is not a single scheme that fits all situations, should be used.
2) Comparison of PBL heights
Two methods are mentioned that are used for calculating PBL heights from observations. Both seem to be rather simple methods solely based on temperature profiles. There are other methods that incorporate additional parameters (e.g., Richardson Bulk methods) and may be more reliable (e.g., Seibert et al., 2020; Collaud Coen et al., 2016). The extremely large differences in PBLH as estimated from the two different methods (Fig. 8) seems completely unreasonable since, according to the text, the only difference would be the use of virtual potential temperature vs. potential temperature. Furthermore, there is no information on how WRF diagnoses boundary layer heights. Is a comparable method used for the model data or rather something more sophisticated? I suggest to revisit this analysis of PBL heights and add missing information. PBL height, next to wind speed, is usually the main driver of uncertainty of PBL concentration simulations and as such there needs to be more confidence in this comparison.
3) CO2 simulations.
The description and motivation of CO2 simulations is not sufficient. For one, why is another model used for this part instead of doing the tracer transport in WRF itself. I suspect this is because of the potential use of STILT for inverse modelling. However, this is never explained in any detail and STILT comes along a little bit as a surprise. Second, I have some general doubts about how much of the ensemble spread in WRF is then actually reflected in STILT simulations. As STILT is driven offline and has its own parameterization of turbulent transport some of the WRF ensemble differences may be ignored. I would be good to understand better which variables actually drive STILT and how much these vary in the WRF ensemble. Third, it seems like temporally constant fluxes were used. This is an oversimplification for CO2 and may well result in completely different ensemble variability compared to a tracer with variable sources/sinks, especially during the course of the day. For example, anthropogenic CO2 fluxes can be expected to be lower than average during the night, but peak in the morning hours, just when PBL heights are changing. Fourth, there is no description of biospheric fluxes in the methods section (2.6). Nevertheless, biospheric CO2 contributions are presented later on. What biospheric fluxes were used? Or are these no biospheric concentrations at all but simple CO2 form biogenic sources (combustion of bio-fuels)? In the latter case this would be an oversimplification especially for the rural site where we can expect the biosphere to be an important influence for atmospheric CO2. Finally, it remains unclear if the simulations present surface CO2 simulations. Since it is recommended to place atmospheric CO2 observations, used for flux inversions, at tall tower (> 100 m above ground), it would be better to discuss the simulated variability at similar heights above ground (or even at several typical heights).
4) Comparison metrics
The model simulations of relative humidity are compared to observations. However, relative humidity is strongly driven by temperature, which makes relative humidity a bad metric to quickly establish moisture biases in the model. I would suggest to replace relative with specific humidity in all these comparisons, also because the focus is mostly on dry atmospheres and not saturated conditions.
6) Presentation of results
The presentation of the different comparisons to observations is quite confusing and not prepared very carefully. Often statements in the text cannot be seen in any of the figures or it remains unclear where one should look. Some of this confusion is added because different kinds of plots are presented for different comparison data sets. I would suggest to align the figures in the main manuscript to be similar for all comparisons. Show either bias or only absolute values, and put the corresponding additional plot into the supplement. I also don't see the benefit of presenting different ensemble members as sub-sets with mean and standard deviation. Plots like S2, where simply all members are shown with a color code that also reflects the sub-sets, seem to be more intuitive and allow identification of individual runs that vary from the others. The section on meteorology comparison is also very long and should be written up much more concisely. For example Figure 7 largely repeats what is shown in 6 already. One of them could be omitted.
7) Representativeness of study
There is no discussion why the two given sites and the month of May were chosen for the analysis. Besides the fact that one site is urban and one is rural, they hardly seem to cover the kind of meteorological variability that can be expected for the whole of India. Please motivate the choice of the study sites. Also given the overarching question of using observations for flux estimation, I would question why an urban site was selected at all, given the coarse resolution of the employed transport model. At the given resolution or even when going down to 3 km as done in one run, one cannot expect the model to represent urban CO2 observations reasonably well. The authors should clarify what kind of flux inversion system they have in mind, national or city scale. For the latter, their model tool would not be adequate.
Specific comments
L96: STR observation availability: I am not sure I understand the availability correctly. Do you mean that observations are mostly available during daylight hours and for some days also for the night? Can you briefly say why that is the case and how this affects the following analysis?
L98: Could you provide an estimate of the STR accuracy and precision if relevant for the model comparison?
Table 2: The information in the table could be arranged more concisely. In the middle column it is difficult to establish which part really changed between the different sets of model runs. I would suggest omitting all elements of the model setup that are the same for all runs (e.g., microphysics, radiation, etc.); these can be described in the text. Then add separate columns for the elements that change from run to run (e.g., PBL scheme, land surface model, resolution, etc.).
Table 2, Expt 10: According to the table it looks like the higher resolution nest was also run with a deep convection parameterization. Shouldn't this type of convection be mostly resolved at 3 km resolution and, hence, the 3 km run not employ an additional deep convection parameterization?
L105: Please clarify if the balloon burst happens above the height range that you later compare with the model.
Figure 1: Please improve. No vertical displacement needed for the sub-panels. Avoid white space? Subpanel labels (a,b,c). Labels and legends for the detailed maps are way too small.
L195ff: In section 2.2.1 it is explained that an ERA5 ensemble is used for initializing the WRF simulations. Here, ERA-Interim data is suggested as a comparison dataset to see how the 'regional model modifies the initial data'. Why not use ERA5 here as well? You need to explain why two different sets were preferred.
L251-252: I would like a clarification of what these squared correlation coefficients were calculated for. Is this the correlation for the whole time series of hourly data over the month, or the correlation of the monthly mean diurnal cycle? If it is the latter, as I suspect, then I don't think this is a good metric for model evaluation, as I would expect any reasonable model to get the monthly mean diurnal cycle right. More interesting would be to see whether that is also the case for a longer time period that would also include day-to-day differences.
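To make the distinction concrete, a minimal sketch of the two possible calculations (with synthetic placeholder series, not the actual data); the two values can differ substantially because averaging into a mean diurnal cycle removes all day-to-day variability before the correlation is computed:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for May 2017 (placeholder random data)
idx = pd.date_range("2017-05-01", "2017-05-31 23:00", freq="h")
obs = pd.Series(np.random.randn(len(idx)), index=idx)
mod = pd.Series(np.random.randn(len(idx)), index=idx)

# (a) R^2 over the full hourly time series (retains day-to-day variability)
r2_hourly = obs.corr(mod) ** 2

# (b) R^2 of the monthly mean diurnal cycle (24 points; day-to-day differences averaged out)
obs_diurnal = obs.groupby(obs.index.hour).mean()
mod_diurnal = mod.groupby(mod.index.hour).mean()
r2_diurnal = obs_diurnal.corr(mod_diurnal) ** 2

print(r2_hourly, r2_diurnal)
```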
L257/258: This daytime bias does not exist after 13:30, as the observations reach peak temperatures later in the day. Please correct text.
L260: 'slightly closer to the observations'. What is this conclusion based on? The smaller RMSE? RMSE is similar during night and day, but the absolute bias is much larger during the day, at least according to the numbers given in this sentence. If the numbers are correct, they contradict what was said about cold and warm biases in the preceding sentence. Please check again.
L268: The results from Expt 7 in terms of moisture look really strange. Is there something going completely wrong? Too moist, wind speeds too large.
L275: It should also be mentioned that the model is especially poor at night-time and in the morning hours, whereas in the afternoon the overestimation is much less obvious and Set 3 actually gets very close to the observations.
Figure 2: What drives the large observed standard deviation at Gadanki at 9:30 and 12:30 (panel c)? These go along with low average values as well. Were the observational data quality-controlled and outliers (flagged data) removed before the diurnal averages were calculated? A similar, though less obvious, feature appears for RH at 7:30 (panel d).
Figure 2: Is this the standard deviation of the simulations calculated just from the different runs in each set, applied to the mean (hourly) quantity, or is it calculated over all data at this hour and from all experiments? I am not convinced that the latter is a good indicator of ensemble spread, nor am I convinced that the former would be robust given the small number of members per set. Maybe showing individual members as in the supplement plots would be sufficient; these could still be grouped by color.
L279: How were wind speed/direction averaged to monthly values? Vector average? Same treatment for observations and model? The simulated change of wind direction may indicate that the model picks up some thermally driven flow pattern (valley winds or sea breeze) that does not seem to exist in reality. Please try to comment on this in the manuscript. In Expt 7, the general wind speeds seem too large compared to the observations, and no such thermally driven cell seems to develop.
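As an illustration of the vector-averaging point, a minimal sketch (placeholder arrays, meteorological direction convention assumed) of averaging the u/v components and reconstructing mean speed and direction, compared with naive scalar averaging of the direction, which can be badly misleading near the 0°/360° wrap:

```python
import numpy as np

# Hypothetical hourly wind speed (m/s) and direction (deg, meteorological convention)
ws = np.array([2.0, 3.0, 2.5, 4.0])
wd = np.array([350.0, 10.0, 355.0, 5.0])

# Decompose into u/v components, average them, and reconstruct speed/direction
u = -ws * np.sin(np.deg2rad(wd))
v = -ws * np.cos(np.deg2rad(wd))
u_mean, v_mean = u.mean(), v.mean()

ws_vec = np.hypot(u_mean, v_mean)                          # vector-mean speed
wd_vec = np.rad2deg(np.arctan2(-u_mean, -v_mean)) % 360.0  # vector-mean direction (~0 deg here)

ws_scalar = ws.mean()   # scalar-mean speed
wd_scalar = wd.mean()   # naive mean direction (misleading: ~180 deg for this example)

print(ws_vec, wd_vec, ws_scalar, wd_scalar)
```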
L282: I cannot follow this discussion from the provided figures. Is this a reference to (random) variability at individual hours (not shown) or still referring to Fig 2?
L290: Consider starting new paragraph before "Figure 4".
L291ff (and elsewhere): Somehow the defined periods (t1 to t4) don't align well with the general phases of PBL development. I would have chosen periods more in line with the surface heat flux as the driver: for example, a morning transition phase from sunrise to the solar maximum, followed by an unstable phase until sunset, a transition phase until midnight, and a stable phase until sunrise (compare the classical PBL development described by Stull). Consider revising, or at least give more reasoning on why the periods of the day were selected as they are.
Figure 4: According to the figure, the height bins reach from the surface to different maximum heights. Why not use distinct height bins (0-2, 2-4, 4-6, 6-8 km)? Also, the text following L296 seems to suggest that the 0-6 and 0-8 bins are still interpreted as 'upper levels'. Does this mean that the labels in Figure 4 are not correct?
Figure 4: There is no detailed interpretation of Figure 4 besides that correlation is bad for t3 and t4. No discussion on different experiments providing better or worse results under different conditions. The same plot for bias would be helpful as well in understanding if certain model configurations work better.
L300f: The sentence seems to be missing a reference to Figure 5.
L302: Where can one see the 'high correlation' mentioned in the text?
L308f: A considerable influence of PBL, LSM, UCM is mentioned, but it is not mentioned which set performs better than others. Some speculation on why certain differences are observed would be helpful as well. Alternatively, this could be done in the discussion, but then this sentence should also be moved there.
L313: 'models overestimate RH'. As suggested above, I would move to specific humidity instead of relative humidity. It would probably then be easier to understand the impact of the different LSM/UCM runs.
Figure 3: It is very difficult to distinguish the uncertainty bands. Use transparent colors, or omit the ribbons and simply plot all ensemble members. It would be good to discuss the temperature profiles themselves or, even better, potential temperature profiles, as these would allow an interpretation in terms of atmospheric stability.
Figure 3: Why were the 10:30 and 22:30 profiles chosen and not the times of minimum and maximum temperature (5:30 and 14:30) and, most likely, most mature state of stable and unstable PBL?
Figure 5: Wind direction profiles are not discussed in the text at all.
All figures showing a bias: It would be helpful to have additional vertical (or horizontal) lines indicating 0 bias. Especially Figure 5.
L320, Figure S4: The caption of Fig. S4 seems wrong. There is no bias shown in figure. Why is the observed profile not much smoother if it represents the monthly mean?
L322f: What is this conclusion based on? The spread at 10:30 below 2 km seems as large as the spread at 16:30 below 3 km (about 2-3 m/s), so I would say that the spread in the PBL is similar at night and during the day. The relative difference would even be smaller during the day, contrary to what is stated in the sentence.
L324: The 10 m wind speed was only compared for Cochin, where the models seem to overestimate the wind speed especially at night. However, the 10 m comparison is usually more difficult and depends strongly on the roughness parameters used. The findings from the STR comparison are in line with the radiosonde comparison (Fig. 5c). Hence, I would not emphasize the 10 m comparison here.
L325: "Expt. 7 captures the vertical wind speed profile". However, from Fig. S2 we know that Expt. 7 performed worst in terms of surface variables at both sites. In Fig. 5 one cannot see how Expt. 7 compares to the radiosondes at Gadanki, but it does not look as if this model run sticks out. Why does it behave so differently for the surface wind then? Low roughness?
L327f: The sentence largely repeats what was just said (Expt 7 being different from the others). Again it is not clear to which figure the statement corresponds (Fig. 6 or S4). In general, I am also not able to see corresponding results between S4 and Fig. 6. I would have thought that one (S4) shows the absolute monthly averages and the other (Fig. 6) the bias, but if one starts comparing, this cannot be (e.g., very small differences in wind speed at 5 km in S4 at both 10:30 and 16:30, but in Fig. 6 a negative bias throughout the whole column). Does S4 only represent a single day? What would be the reason to show it then?
L332f: Where can one see that Expts. 7 and 5 perform better? This is obvious neither from Fig. S4 nor from Fig. 6.
L338 and elsewhere: Don't refer to the different runs by mentioning a specific scheme; rather, refer to the defined experiment number. Otherwise this is confusing.
Figure 6: Reconsider how to display the wind direction bias. Absolute wind direction biases > 180° do not exist, so a negative bias smaller than -180° should become a positive bias < 180°. It may be best just to display the absolute value.
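For illustration, a minimal sketch (hypothetical helper, not from the manuscript) of folding a signed direction difference into the range (-180°, 180°]:

```python
import numpy as np

def direction_bias(model_deg, obs_deg):
    """Smallest signed angular difference model - obs, wrapped into (-180, 180] degrees."""
    diff = (np.asarray(model_deg) - np.asarray(obs_deg) + 180.0) % 360.0 - 180.0
    # map the -180 edge case to +180 so the result lies in (-180, 180]
    return np.where(diff == -180.0, 180.0, diff)

# Example: a raw difference of -200 deg is really a +160 deg bias
print(direction_bias(10.0, 210.0))  # -> 160.0
```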
L345f: PBLH is essentially zero for Expt 1-3 for the whole night. How is this possible? This could indicate a very strong surface temperature inversion in these experiments. However, the comparison with 2 meter temperatures did not reveal large differences between these experiments and others.
L352: Which observation-based method are you referring to? The average from both methods?
L353: With generally larger PBL heights during the day, it is not surprising that the RMSE is also larger during the day. A normalized RMSE would be more interesting also with respect to the uncertainty of CO2 accumulation in the PBL.
L361: Most likely this has to do with different model altitudes. Were the IMD temperatures adiabatically corrected to sea level pressure?
L365ff and reference to Fig S3: Figure S3 shows observed monthly mean temperature profiles and simulations from the ensemble, but nothing about ERA-Interim or MERRA-2. There is no such figure in the supplement; hence, the following description cannot be reviewed. In the following, two figures are described (labelled as 'not shown'). I think these need to be part of the supplement.
Figure 8a and b: There are two kinds of black dashed lines. But only one explained in the legend. Should there be an additional legend entry for Stull method? Also consider clipping the negative values. They are unphysical.
Figure 8b: The colors of the different sets should be repeated in the legend as well. Why the fat black dots? They dominate the figure and hide the important details. More appropriate transparency is needed for the ribbons, as right now one cannot really see anything.
Table 3: Insufficient caption: repeat the meaning of t1-t4. The first column should simply be called 'Experiment' (written out so the meaning is clear); then numbers alone would be sufficient. Also consider fewer digits for RMSE and MBE; it is difficult to read otherwise.
Figure 10: Please add an explanation of what the boxplot represents. From the text one can guess that the box itself is the interquartile range, but what do the lines (whiskers) indicate? Min/max, other percentiles?
L389: Please explain how the 'average uncertainty in total CO2' was calculated. Do you calculate the ensemble standard deviation for each hour of the simulation period and then the average? Which part of the model simulation was considered for that: total CO2, or the regional CO2 increments (anthropogenic + biospheric)?
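For clarity, one way such an average uncertainty could be defined is to compute the across-member standard deviation at each hour and then average over the simulation period; a minimal sketch with a placeholder array of shape (n_members, n_hours), which also contrasts this ensemble spread with the temporal variability of the ensemble mean (a different quantity):

```python
import numpy as np

# Hypothetical CO2 time series, shape (n_members, n_hours); values are placeholders
co2 = np.random.normal(loc=410.0, scale=2.0, size=(11, 744))  # 11 members, 31 days x 24 h

spread_per_hour = co2.std(axis=0, ddof=1)  # ensemble spread at each hour (ppm)
mean_spread = spread_per_hour.mean()       # time-averaged ensemble spread (ppm)

# For contrast: temporal variability of the ensemble mean, which measures something different
temporal_std = co2.mean(axis=0).std(ddof=1)

print(mean_spread, temporal_std)
```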
L392: Similar question as before: does Figure 10 display ensemble variability and/or temporal variability?
L396 and elsewhere: The abbreviation ffCO2 was not explained before. I suspect that this should indicate fossil fuel CO2. However, part of the anthropogenic emissions (as contained in EDGAR) will be from biofuels, and therefore the term should not be ffCO2 but anthropogenic CO2, unless these parts were separated and wrongly described as 'biospheric' CO2 (see comment above), which would be very misleading. A better term would then be biofuel CO2 vs. fossil fuel CO2. Please clarify.
L399: It is remarkable that biospheric (if it really is biospheric) CO2 remains positive throughout the day (i.e., respiration dominating over photosynthetic uptake). Is this due to the chosen month being in the dry season? Even then I would have expected a negative signal during the day. Or was there no separate treatment of respiration and uptake? Or (as above) does biospheric CO2 refer to biofuel CO2, which would explain the positive values?
L439: Fig 2 does not show Expt. 10. Also in Fig S2 there seems to be no serious improvement of Expt. 10 over the other simulations. Temperature bias seems to even increase.
L439: Table 2 does not present results, but lists the experiments.
L447: On the contrary, adding temporal variability in the fluxes will, at certain times, also lead to larger ensemble spread (e.g., when large fluxes coincide with large meteorological spread).
L461: I don't agree with this interpretation of wind variability being the main or even only driver of the CO2 uncertainty. In addition, it is the lower PBLH that leads to increased concentrations at night, and the spread in PBLH seems to be much larger than that in wind speed (especially for Expt 1-4 vs. the other cases).
L483f: However, there was an underestimation of wind speed throughout the whole column up to 8 km, for which PBL dynamics are not responsible.
L490: What does 'statistically significant' refer to? Which statistical test for what?
Technical comments
L85: Not clear what stratosphere-troposphere stands for at this point. I suppose adding the word 'radar' and changing the abbreviation to the later used 'STR' would solve the issue.
L322: What decreases with altitude? The boundary layer? Please rephrase sentence.
Figure 7: In the caption it should say 'between' rather than 'among'.
Fig S1: Different colors need to be described in legend and/or figure caption.
References
Collaud Coen, M., Praz, C., Haefele, A., Ruffieux, D., Kaufmann, P., and Calpini, B.: Determination and climatology of the planetary boundary layer height above the Swiss plateau by in situ and remote sensing measurements as well as by the COSMO-2 model, Atmos. Chem. Phys., 14, 13205-13221, https://doi.org/10.5194/acp-14-13205-2014, 2014.
Seibert, P., Beyrich, F., Gryning, S. E., Joffre, S., Rasmussen, A., and Tercier, P.: Review and intercomparison of operational methods for the determination of the mixing height, Atmos. Environ., 34, 1001-1027, https://doi.org/10.1016/S1352-2310(99)00349-0, 2000.
Citation: https://doi.org/10.5194/egusphere-2023-2334-RC1
RC2: 'Comment on egusphere-2023-2334', Anonymous Referee #2, 08 Aug 2024
Mathew et al. present a study comparing the meteorological fields simulated by the mesoscale model WRF with measurements from two sites in southern India: an urban site at the coast and a rural site inland. An ensemble of 10 simulations using different parameterizations, resolutions, driving meteorologies, and surface schemes was compared for one month (May 2017) to surface variables and profile measurements from different sensors. Simulations of tracer transport using the ensemble were included to estimate the impact on tracer variability at these two sites, but there were no reference data to evaluate the results. In general, despite a lot of analysis and figures, the take-away message from the paper is unclear. There was no clear winner amongst the ensemble members, and no clear explanation for the disagreements between the observations and the simulations. The choice to compare monthly mean diurnal cycles for most of the measurements is mystifying, and it is unclear if the model was subsampled in a way consistent with the measurements before this averaging. Furthermore, there appear to be substantial incongruities in the results as they are presented here, as described in more detail below.
As it is presented now, I cannot recommend this paper for publication. I outline some major and minor concerns related to the validity of the analysis below, and make some recommendations for future work.
This was a difficult paper to review. A lot of work was done, presented, and discussed, but much of the analysis was lacking cohesion. This begins with the title: as it was written, I found it hard to parse. I guess what was meant is something like “Evaluating variability in a meteorological transport model ensemble to account for uncertainties in carbon flux estimation over India”. However, no carbon flux estimation is carried out in the paper. Instead, the variability in a meteorological transport model ensemble is assessed, and there is some analysis of how this variability might affect tracer transport. The title should definitely better represent the content of the paper.
Specifically, I have some major concerns about the choice of metrics that were used for the analysis, focussing on the comparison of monthly mean diurnal cycles. How relevant is this for tracer transport? Surely day-to-day variability in meteorology will be more important. In the most extreme case, it is possible for a variable to be severely overestimated for half the month and severely underestimated for the other half, yet be completely unbiased in the monthly average. Even if it might be convenient to present monthly mean data graphically, it is really critical to know when this averaging took place: was it before or after the statistical analysis? If the averaging took place beforehand, this tells us a lot less about the behaviour we are interested in.
Another major concern is related to the sampling of the data: Many of the measurements that were used for analysis are not available throughout the whole month. As an example, for the STR measurements, the text around L95 states: “The STR measures horizontal and vertical wind components continuously at 9-minute intervals, and data are available for approximately 26 days in May 2017. The data are mostly continuous from the forenoon to evening hours and throughout the day for some days (Samson et al., 2016).”
How were the WRF output files sampled for comparison with the STR data? Were the variables from the same days, times, and heights extracted before the averaging took place? This is how it ought to have been done, so that the modelled and measured variables represent the same places and times, but this is not clear from the description in Section 2.5. It is clear how the model was sampled in space, but not in time. Furthermore, there seem to be significant discrepancies between the results presented in Figure 6 (which seem to be roughly consistent with Figure S4) and those in Figure 7. Consider the wind speed bias at 10:30 LT compared to the STR profile in Figure 6c: above about 6 km, the biases go right off the plot, seemingly more negative than -3 m/s. This seems roughly consistent with the information in Figure S4e, where all ensemble members show such a bias above 6 km. However, when looking at the plots in Figure 7, which shows the bias in wind speed with respect to the STR measurements as a function of time of day for all the ensemble members, none of them show biases below about -1 m/s at 10:30 LT above 6 km. The presented information is simply not consistent, and calls into question much of the analysis.
Beyond this, it seems that uncertainties on the measurements themselves have been largely (if not completely) neglected.
Other major concerns: I’m not convinced that relative humidity (RH) is the appropriate variable to compare, as it is so strongly linked to temperature. Why not specific humidity? This is easily calculated from RH if this is the only thing recorded from the in situ measurements.
The metrics for planetary boundary layer height (PBLH) are also a bit problematic. I was surprised that the methods used to derive the PBLH from the radiosonde data are based only on (potential) temperature, rather than the more widely accepted bulk Richardson method (Vogelezang and Holtslag, 1996, Eq. 2), as further described by Seidel et al. (2010, 2012). It is also critical that the same method is used to diagnose the PBLH from the profiles in the WRF model, as this can vary significantly from the model-derived PBLH that is stored in the output files. Only this way can one be sure of comparing like with like. (For an example of this, see Fig. 2 in Yver et al., 2013, and the surrounding discussion.)
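For reference, a minimal sketch of a bulk Richardson number approach applied to a single profile; this is a simplified version of the Vogelezang and Holtslag (1996) formulation (no excess temperature term, a common critical value of 0.25, synthetic placeholder profile), and the same routine would then be applied consistently to both the radiosonde and the WRF profiles:

```python
import numpy as np

def pblh_bulk_richardson(z, theta_v, u, v, ri_crit=0.25, g=9.81):
    """Estimate the PBL height (m) as the lowest level where the bulk Richardson
    number, computed relative to the lowest level, first exceeds ri_crit.
    z: height above ground (m), theta_v: virtual potential temperature (K),
    u, v: wind components (m/s); all arrays ordered from the surface upward."""
    dz = z - z[0]
    dtheta = theta_v - theta_v[0]
    shear_sq = np.maximum((u - u[0]) ** 2 + (v - v[0]) ** 2, 1e-6)  # avoid division by zero
    ri = g * dz * dtheta / (theta_v[0] * shear_sq)
    above = np.where(ri > ri_crit)[0]
    return z[above[0]] if above.size else np.nan

# Synthetic convective profile: well mixed up to 1200 m, capping inversion above
z = np.arange(10.0, 3000.0, 50.0)
theta_v = 300.0 + np.where(z > 1200.0, 0.005 * (z - 1200.0), 0.0)
u = np.full_like(z, 3.0)
v = np.full_like(z, 1.0)
print(pblh_bulk_richardson(z, theta_v, u, v))  # ~1210 m for these placeholder values
```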
The comparison of the simulated CO2 concentrations raises a lot of questions. Why introduce STILT at this point, rather than simply running the tracers online? This muddies the comparison, as STILT has its own way of dealing with some of the transport issues and is not a 1:1 reproduction of the transport in WRF. Furthermore, only static EDGAR fluxes are mentioned in the text, suggesting that the biospheric fluxes (which are much larger!) were neglected, but then the resulting concentrations of biospheric CO2 are presented in Figures 10a and 11b. How can that be? (Incidentally, it is surprising that all the biospheric CO2 mixing ratios are positive. Is there really no uptake?) What flux product was used? Were the biospheric fluxes also run without any temporal variability? While this is already questionable for the anthropogenic fluxes, for biospheric fluxes it is simply not acceptable, as the strong diurnal cycle can really dominate the variability. Furthermore, what was used for the background signal of the CO2 in order to simulate the total CO2 mixing ratios in Figures 10c and 11c? This can also be a major source of error if the goal is to develop a regional inversion system. In fact, Feng et al. (2019) found that prior flux uncertainties and large-scale boundary inflow can dominate uncertainties in flux estimation, as transport errors tend to average out somewhat over time.
Looking at the big picture, the aim of this study remains a bit fuzzy, and it is unclear what real conclusions can be drawn. How this analysis fits in to a larger effort to constrain fluxes is hinted at in the conclusions, but it seems that no recommendations follow from the present study. Indeed, the authors “advocate for a future study involving CO2 observations and modelling (with different meteorology and flux realizations) for assessing the full strength and weakness of the models”. I worry that such a study may end up being a larger and more complex version of the current study, and similarly lead to no clear conclusions. Perhaps it is too difficult to generalize the results based on one month of the year, comparing data from two sites, to say what the appropriate modelling settings for all of India should be?
Some minor (but still substantial) comments:
The description of how well the model agrees with observations is often very vague and contradictory. As an example, between L12 and L14, an R2 value > 0.75 is described as capturing the variations "very well" while also being "reasonably correlated" with the observations. In L252, both an R2 value > 0.95 and > 0.75 were described as agreeing with a "high degree of correlation". In the description of Figure 9, R2 values less than 0.64 are described as "largely agreeing", whereas in section 3.3 R2 > 0.64 is repeatedly referred to as "good agreement". In the description of the comparison with ERA-Interim, the authors describe a "noticeably weak" correlation over southwestern India (R2 ~ -0.01). This is not weak, it is entirely absent! In the Conclusions, an R2 > 0.5 is described as capturing something "fairly well", which seems overstated.
While there is no absolute standard about what correlation is “good” or “high”, sometimes looking at differences in statistical metrics can be more informative. This is especially true in a study such as this one, which seemed like it might want to determine the optimal settings for representing meteorological variables (at these two sites). In this case, having a better score in one experiment compared to another can be instructive, but here the simulations are generally lumped together into an average score when they are discussed. What does this tell us?
Of course, the bias is completely neglected in this metric, and a bias is likely more critical for something like PBLH, especially as typically only afternoon measurements of CO2 mixing ratios under well-mixed conditions are used for atmospheric inversions. While RMSE and MBE appear from time to time, the statistical analysis is pretty limited.
Why were only two sites used, and for only one month of one year? Was this related to the availability of more meteorological data that could be used from these sites? It seems quite limited. Indeed, when one reads through the questions that are set out to be answered in this study (L73-L77), it seems impossible to characterize the spatial and temporal error in the modelled “diurnal and monthly variability of temperature, relative humidity, PBL, and winds (both near the surface and upper air) across India” based on these data alone.
I’m confused as to why ERA-Interim would be used for comparison with the model output. ERA5 was used to drive the model, and ERA-Interim is a very similar reanalysis product with poorer spatial and temporal resolution that has since been superseded by the next generation of ECMWF reanalyses, ERA5. Why compare to this? It does not make any sense to me. Comparing to ERA5 could make sense, but I’m not sure it would bring substantial insight. In L198-199 the authors write that this comparison “allows us to examine how the physics schemes of the regional model modify the initial data with time and what differences arise with the change in version”. Comparing it to just ERA5 could tell you something about how the physics schemes of the regional model are different, but are you trying to indirectly compare the “versions” of ECMWF reanalysis products in this way? This seems ill-advised.
L255: Here the authors describe a “time lead” (I guess a temporal offset) of about 2 hours in surface temperature variability. Why might this be?
L282: The authors state that the very large, systematic differences in diurnal pattern of the wind speeds averaged over a whole month show that the model has “difficulties in capturing random fluctuations in the wind direction”. This does not seem very random! I think it would be advisable to try to understand why this is so consistently wrong. What do the ERA5 winds show? If these are consistently wrong, one can hardly expect WRF to do much better. Do they agree on some days, but not on others? A persistent >100 degree difference in afternoon wind directions across the ensemble is a big concern. Are there other meteorological station data nearby that could be used for comparison?
I found the separation of the diurnal cycle into “representative periods” t1, t2, t3, and t4 to be generally confusing. Are these stability regimes so consistent from day to day? Figure 4 was also hard for me to parse. What is the take-away message from this figure?
L337-338: The authors write, in reference to Figure 7, that the wind speed near the ground shows a large negative bias except for the CLM4 scheme (which I think is Expt. 7), which simulates near-zero biases, but when I look at the figure it seems that Expt. 6 has lower biases near the surface, at least in the afternoon. Overall, Expt. 10 and 11 (with different driving meteorology) seem to have the smallest bias compared to the STR in Figure 7, but this is not so in Figure 6, or in the comparison to the radiosonde data in Figure 5. How can this be so inconsistent?
L344: Please refer to it as an offset or similar, “lead time” means something else in meteorology!
Table 3: When these statistics of monthly diurnal cycles are further subdivided into four parts of the day, is this the correlation coefficient of only six points? When the conditions are rather stable (i.e. t3, and most of t4) and the variability quite low, is it not expected that the R2 values should be really small? I am not sure what we really learn from this. Which observational data were used for this analysis? (In general, I found it difficult to keep track of which observations were used where.)
Figure 8: Is the dotted black line from the MWR? I guess it’s the other method to derive the PBLH from the MWR? Also in panel a? This is unclear, and should be in the legend. The large black dots in panel b make it unnecessarily hard to read, please remove them.
Figure 9: Confusing colour bar! White should correspond to zero, even if the range is heavily skewed towards negative biases in the far north. (Incidentally, is this the only comparison which actually shows the correlation over the course of the month, rather than diurnal patterns averaged over the whole month?)
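For illustration, a diverging colour scale can be pinned to zero even when the value range is asymmetric; a minimal matplotlib sketch with placeholder data:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Hypothetical bias field skewed towards negative values (placeholder data)
bias = np.random.uniform(-6.0, 2.0, size=(50, 50))

# White pinned at zero even though the range is asymmetric
norm = TwoSlopeNorm(vmin=bias.min(), vcenter=0.0, vmax=bias.max())
plt.pcolormesh(bias, cmap="RdBu_r", norm=norm)
plt.colorbar(label="bias")
plt.show()
```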
In general, I found that the comparison with the gridded and reanalysis products brought nothing to the paper as it is presented here. I was confused by the comparison of the mean spatial patterns for daytime and nighttime monthly means: as this is a spatial correlation, it would be useful to know exactly which areas were compared when referring to sub-regions (like southwestern India) for which an R2 value of ~-0.01 results. Was there just not a lot of spatial variability over this sub-region? The information is not provided. The bias between the MERRA-2 PBLH and the observations was apparently mostly ~-500 m, which I think is larger than the MBE found between WRF and the observations (in Table 3). Does this mean that the errors in MERRA-2 were larger, or is this very site-specific? There is no way to know. Why daytime mean and nighttime mean temperatures were used to assess agreement is not clear. Is this an established metric?
Figures 10 and 11: The panels should be presented in the same order (i.e., biogenic, fossil, total corresponding to the same panels). But more importantly: where did this biospheric CO2 mixing ratio come from?
L391-392: Is the larger variability at Cochin necessarily due to the influence of the coastal boundary layer? Isn't it possible that the more spatially heterogeneous anthropogenic fluxes near this station play a larger role? Furthermore, if the diurnal cycle in the fluxes (both anthropogenic and biogenic) is being neglected, this variability is systematically underestimated in any case.
L394-396: There is some difference, but is this really significant? And how was this calculated? This is not just the spread within the set, as then there would only be 2-4 values per set rather than a full distribution. Might the spread in set 1 be larger simply because it contains four experiments rather than two (as for set 3-5)? Or is this mashing together the standard deviation over time (hourly?) with that over the ensemble members? In any case, this should be clarified.
L403-404: Same problem as the previous comment: do you mean the variability within the ensemble, or the variability over time? The variability within the ensemble for PBLH seems highest during the day/afternoon at Gadanki, whereas the total CO2 variability is higher at night. At Cochin the ensemble spread for both is higher at night, but without taking the diurnal cycle of emissions into account, I am not sure this is a robust result. I don't see the clear correlation with wind speed either...
And do we really expect the distribution of total CO2 variability to be well correlated with air temperature? I wonder if it would be more correct to say that both CO2 variability (as simulated here) and temperature are temporally correlated with PBLH? But this does not say anything about the variability within the ensemble, which I had thought was the purpose of the analysis.
L487-488: A general comment: it is well established that urban meteorology is challenging to represent, especially with rather coarse simulations (10 km is definitely too coarse, and even 3 km is not really sufficient). In fact, when looking at the zoom over Cochin in Figure 1, I guess that this whole subplot would only be about 24 10-km pixels. Would we even expect an urban canopy model to be effective at such coarse resolution (and for a fairly small city)?
Because of these challenges, it may be best not to plan new measurement sites within cities. Instead, it may be better to measure in a peri-urban setting, where the plume downwind of the city can be measured depending on the wind direction, but which is much easier to represent and interpret in a model.
L490-491: I'm confused about where the factor of 3 to 5 comes from here. Does it follow from the previous sentence? If so, how? If not, where do these numbers come from? Perhaps it is worth mentioning that this is why traditionally only afternoon measurements (under well-mixed conditions) are used in inversions. Also, the word "larger" or "higher" needs to be inserted after "3 to 5".
L497-498: This larger study for which you are advocating seems challenging, and would need a firm basis in observational data for interpretation. In situ measurement sites in India are often a limitation here, and it would certainly need the analysis of meteorological data from more than two sites and a single month to draw any conclusions.
L500: In reference to the initial question stated as an objective of this study: have you characterized the error structure? If so, how?
Typographical/minor comments:
L3: It would be good to introduce the concept of inverse modelling before jumping in with a description of atmospheric tracer transport in a “multi-data” modelling system. (I do not know what “multi-data” means here.)
L8 (and elsewhere): it needs to be clear that there is only one urban and one rural station considered in this study, the text often does not make this clear.
L16: the -> an
L21-22: either “an unprecedented amount of greenhouse gases”, or “emitting greenhouse gases at an unprecedented rate”. Don't mix these.
L23: no N2O?
L29: is it a prediction or a simulation? Most of what we are doing is simulation.
L30: on -> for
L31: insert “the distribution of” after “modulates”
L38: remove “the” before “NWP”
L39: The confidence level, or the performance? And this reference is to the ERA5 reanalysis, which only indirectly affects the transport model WRF through the driving meteorology…
L43: insert “model” after (WRF).
L47: broad -> broader
L61: perhaps mention that FDDA is often referred to as “nudging”.
L62: it doesn’t “approximate” the model values toward the observations, but rather “nudges” them. And in which “equations”? Perhaps remove, or specify “at each timestep” or something similar?
L69: Rather than being “unique”, perhaps they are “distinct” from one another?
L73: “model in capturing the” -> “model-simulated”
L85: I guess Stratosphere-Troposphere (ST) should be Stratosphere-Troposphere Radar (STR)? The full word should also be in the section 2.2.1 title I think.
Table 1: What about February? A minor point, but I suppose that having the horizontal wind components means that, by definition, you also have the wind speed and wind direction, right?
L105: Does this profile start at the surface? At approximately what altitude does the balloon burst?
L118: Does the data logger have its own data logger? This is confusing. Also, the sentence starting in L120 seems to completely repeat the information that came before. The same thing for the sentence starting in L125. Please rewrite this paragraph.
L142: I guess the static land-use data have a spatial resolution of 30 arcseconds, not a temporal resolution of 30 s?
L143-144: second- to sixth-order schemes
L159: “the” before Yonsei and Mellor
L160: "the" before Asymmetrical
L163: insert “is used” before “with the MYJ PBL scheme”
L166: “The” before “LSM”
Figure 1: The legends are impossible to read on the small maps, and the layout is strange. Please show the full d01 domain, also to demonstrate the appropriate placement of d02 in from the edge of the parent domain. Insert “the” before “ESRI” in the caption.
L179: respectively, except that they are coupled with an urban canopy model
L193-194: The sentence starting with “Daily” seems to add no new information.
L211: virtual temperature -> virtual potential temperature
L272, L276: These should all be inter-model, I guess.
L353: remove “that”
L360: Should this be R2>0.64 or R2<0.64? If it’s the latter, I’m not sure you can say that these “largely agree”.
L414-415: Remove the beginning of the sentence (before the comma), the relation to LSMs is in there twice.
L429: remove “significantly” and insert “more” after (0 to 2 km)
L432-433: What does “comparatively considerable” mean? Compared to what?
L435-436: This sentence (starting with “The”) is unclear, rewrite.
References
Feng, S., Lauvaux, T., Keller, K., Davis, K. J., Rayner, P., Oda, T., & Gurney, K. R. (2019). A road map for improving the treatment of uncertainties in high-resolution regional carbon flux inverse estimates. Geophysical Research Letters, 46, 13461–13469. https://doi.org/10.1029/2019GL082987
Seidel, D. J., Ao, C. O., and Li, K.: Estimating climatological planetary boundary layer heights from radiosonde observations: Comparison of methods and uncertainty analysis, Journal of Geophysical Research, 115, D16113, https://doi.org/10.1029/2009JD013680, 2010.
Seidel, D. J., Zhang, Y., Beljaars, A., Golaz, J.-C., Jacobson, A. R., and Medeiros, B.: Climatology of the planetary boundary layer over the continental United States and Europe, Journal of Geophysical Research: Atmospheres, 117, https://doi.org/10.1029/2012JD018143, 2012.
Vogelezang, D. H. P. and Holtslag, A. A. M.: Evaluation and model impacts of alternative boundary-layer height formulations, Boundary Layer Meteorology, 81, 245–269, https://doi.org/10.1007/BF02430331, 1996.
Yver, C. E., Graven, H. D., Lucas, D. D., Cameron-Smith, P. J., Keeling, R. F., and Weiss, R. F.: Evaluating transport in the WRF model along the California coast, Atmos. Chem. Phys., 13, 1837–1852, https://doi.org/10.5194/acp-13-1837-2013, 2013.
Citation: https://doi.org/10.5194/egusphere-2023-2334-RC2
EC1: 'Comment on egusphere-2023-2334', Patrick Jöckel, 19 Aug 2024
Since the manuscript has been rated "fair" and "poor" in 6 of 8 categories, and one referee recommended rejecting it, I need to reject the manuscript from further revision, since it does not meet the journal's quality requirements. I do not think that the major concerns raised by both referees can be addressed appropriately with only a simple revision of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2023-2334-EC1