the Creative Commons Attribution 4.0 License.
Optimizing Airborne Emission Rate Retrievals with Sub-Hectometre Resolution Numerical Modelling
Abstract. A comprehensive model-based study is designed to provide optimal flight paths for airborne top-down emission rate retrieval methodologies. The meteorology and plume dispersion were modelled using the Weather Research and Forecasting (WRF) modelling platform with the Advanced Research WRF (ARW) dynamical core at 50-m resolution. Multiple flight path designs and parameters were investigated to determine emission rate retrieval accuracy as a function of downwind distance and transect spacing, which are ultimately related to flight time and cost. Three unique source types (multiple smokestack plumes, small area sources, and a large area source) were investigated for 4 summer afternoon flight cases over 2 days. The results demonstrate that emissions estimate uncertainty is primarily due to storage and release. The average advective flux estimates are within 12 % of the known emissions for downwind distance of D ≥ 4 km. Variability between flights decreases with D. For stack sources the variability near D = 10 km is approximately half that at D = 4 km. For small area sources, there is less reduction with D, and for the large area source, variability reaches a minimum at D = 8 km. For stack sources, transect spacing is optimized at 100 m, while for area sources, a spacing of 50 m reduces uncertainty. Error due to extrapolation below the lowest flight path is less than 20 % for stack sources and less than 30 % for area sources for non-dimensionalized downwind distance of D' ≥ 3. Results demonstrate the need for surface sampling coincident with the flights to reduce extrapolation error, and the use of modeling with reanalysis data to account for storage and release effects.
Status: open (until 02 Jan 2026)
- RC1: 'Comment on egusphere-2025-4542', Joseph Pitt, 01 Dec 2025
- RC2: 'Comment on egusphere-2025-4542', Anonymous Referee #2, 18 Dec 2025
Review of: “Optimizing Airborne Emission Rate Retrievals with Sub-Hectometre Resolution Numerical Modelling”, by S. Fathi, M. Gordon & J. Hao.
By: Anonymous Reviewer
General comment:
The manuscript presents a detailed model-based study aimed at providing insights into recommended strategies for flight planning when employing mass-balance methods. It builds upon earlier works (both modelling and measurements) and is focused on various source types (dispersed area sources and tall stacks) around the Canadian Athabasca oil sands, where heavy oil industry is responsible for releases of large amounts of various atmospheric pollutants. The primary tool used in the investigation is the WRF model run at very high spatial resolution (50 m horizontal). Based on the setup evaluated in previous studies, the model delivers highly resolved spatial concentration fields based on the assumed source distributions, which form the basis of the analysis. The authors evaluate the ability to retrieve true source emissions from measurements performed using hypothetical airborne platforms (either aircraft or UAVs), represented by extracting model-predicted fields at locations and times mimicking the way airborne platforms would normally be flown. The authors evaluate the accuracy of the estimated emissions while varying factors such as flight strategy, measurement distance from the source, and vertical data density. They interpret these data to identify recommended ways to sample similar emission sources in real-world conditions.
I find the paper extremely interesting, well written, and generally well structured, with minor editing notes listed at the end of this review. I find the quality of the modelling work very high, and the interpretation of the data shows a very good understanding of the topic. However, I also identify several major flaws that need to be addressed before the paper can be considered for publication.
PS – I ask the authors not to be discouraged by the multiple remarks. These are meant to be constructive, and I want to underline that I think the work is of high quality and will make a valuable scientific contribution when the following concerns are addressed.
Specific major comments:
- I believe the use of statistics needs to be reworked. The authors incorrectly describe uncertainties, using S.E. = 1.96 σ / √n, when S.E. should be defined without the 1.96 factor. The intent is right, but according to metrology standards this quantity is correctly termed the “expanded standard uncertainty of the mean”, or “expanded standard error” (although the latter is discouraged). See JCGM 2008 for details. The coverage factor k is also chosen inappropriately: 1.96 is the correct value only when the effective degrees of freedom are extremely large. In their study the authors have only 10 repetitions, in which case k will be higher than 2 (if the results were uncorrelated – see below).
- Following on from the above, but much more important, is that the authors incorrectly assume that the observations within their subsets are independent, and ignore the existence of correlation. In fact it was demonstrated in the past (Gerbig et al., 2003) and in more recent works (Fuentes-Andrade et al. 2024, Galkowski et al. 2025) that the signals of atmospheric pollutants are correlated at spatial and temporal scales large enough to be of significance to measurements like those investigated in this work. In Galkowski et al., CO2 emissions from elevated stacks were found to be auto-correlated down to a distance of 4 kilometers, with persistent spatial (mostly horizontal) structures affecting plumes down to distances of even 20 kilometers. Although here the correlated structures are likely to be shorter (smaller PBLH, lower emission altitudes), the authors cannot ignore correlation in the signals in their analysis. Formally evaluating and including the impact of correlation is likely to affect the results in two major ways:
a) the uncertainty ranges calculated for the emission rates are expected to increase, as the effective degrees of freedom (number of independent measurements) will be reduced for each distance, for which emissions were evaluated. For methods on evaluating degrees of freedom, see e.g. works cited above.
b) close to the emission source, due to impact of turbulence, persistent turbulent structures form that cause the cross-section mass in the plume to generate peak-to-trough structures that are advected downwind from the source (Galkowski et al.). As a thought experiment - if the speed of these structures in the studied cases was (unluckily) the same as advection speed of those structures, it might be that also the sampling of the plume at different distances was not independent, leading to potential biases in the estimations (worst-case: if the extreme peak (trough) was always sampled – one would observe consistently positive (negative) bias in evaluated emission, respectively). The authors need to evaluate whether this synchronization of sampling and plume structure is responsible for the observed biases.
- If my understanding is correct, what the authors call “storage” is actually a momentary turbulent flux (positive or negative) – but it is resolved in the model, so not covered by the authors’ definition of turbulent flux as described in Appendix B of Fathi et al. 2023. If my assumption is correct, then a more appropriate term here would be “large eddy turbulent flux”. I would like to suggest adding a discussion on the relationship between “storage”, turbulence and advective fluxes, as well as the effect of their interplay, somewhere in the study.
- The description and analysis of the role of wind speed should be expanded. Only very rudimentary information about how the effective wind speed and direction was calculated is given (L182). How the wind was calculated for each screen (or group flight) is crucial, as the results are very sensitive to biases of U. Especially in Fig 7c and 7d, I have a strong suspicion that the wind speed and direction cause the sign shift in the bias, as the overall plume structure visible in Fig. 1 turns progressively to more southerly directions. It might be that a more accurate evaluation of wind direction could help reduce that bias – it stands to reason that in those areas far downwind the wind direction (and speed) is highly variable within the screen and the assumption of a single average wind is simply wrong. The authors might test whether local wind speed information can be interpolated (UAVs or aircraft usually carry wind sensors); another approach could be to detect the central plume path (see Kuhlmann et al 2020).
- Finally, I would like to point out that the results from the modelling of four simulations covering two afternoons, even after so detailed an analysis, are not sufficient to generalize the results. Statements that could be interpreted as general recommendations should therefore be avoided, e.g.: "alone, a screen at a downwind distance of 4 km or more provides the same level of accuracy for the three types of sources investigated here (i.e. elevated stacks, small surface area sources, or a large surface area source)". There is simply not enough proof to extrapolate these results to all cases, with local conditions (meteorological and otherwise) playing such a major role in the atmospheric transport in turbulent conditions. I therefore suggest softening all such statements. The paper will not lose its (high) value, but transparency will be increased. I have marked some of such statements below.
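To illustrate major comments 1 and 2, the following is a minimal Python sketch of how an expanded uncertainty of the mean could be computed for a small set of repeated emission estimates, with a coverage factor taken from the Student-t distribution and with the sample size reduced for serial correlation. The AR(1) approximation n_eff = n (1 - ρ) / (1 + ρ), the use of n_eff - 1 degrees of freedom, and all function names are assumptions made for this example; they are not taken from the manuscript or from the works cited above.

```python
import math
import statistics

# Two-sided 95 % Student-t quantiles t_{0.975, nu} (standard tabulated values).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def coverage_factor(nu):
    """Coverage factor k for 95 % coverage with nu degrees of freedom."""
    if nu not in T_975:
        raise ValueError("this small table covers 1 <= nu <= 10 only")
    return T_975[nu]


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence of estimates."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def effective_sample_size(n, rho):
    """AR(1) approximation n_eff = n (1 - rho) / (1 + rho) (an assumption).

    Negative rho is clipped to zero, and n_eff is kept at 2 or more so that
    at least one degree of freedom remains.
    """
    rho = min(max(rho, 0.0), 0.99)
    return max(2.0, n * (1.0 - rho) / (1.0 + rho))


def expanded_uncertainty_of_mean(x):
    """Return (U, n_eff, k) with U = k * s / sqrt(n_eff)."""
    n_eff = effective_sample_size(len(x), lag1_autocorrelation(x))
    k = coverage_factor(math.floor(n_eff) - 1)
    u = k * statistics.stdev(x) / math.sqrt(n_eff)
    return u, n_eff, k
```

Under these assumptions, ten uncorrelated estimates give k = 2.262 rather than 1.96, and a lag-1 autocorrelation of 0.5 would leave roughly 3.3 effective samples out of 10, which widens the interval further.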
Other comments:
L59: “… and requires induvial plumes to be well defined and separate (e.g. Baray et al., 2018).” – This makes sense if information on individual sources is required. If information on the cluster / group of sources is sufficient, there is no such need.
L80: “This study aims to optimize…” – here the authors indirectly imply that the results could also be extrapolated to dust particles – or at least this is how I understand it. While it might be true, it needs to clearly be stated in the study (also in the abstract, and in conclusions) that the tracers emitted in WRF are considered gaseous sources, and that typical dust processes like deposition etc. are not considered.
L95: Case Studies and Locations – perhaps “location”? The study is concentrated around Athabasca Oil Sands and facilities there.
L96: “The model is run” -> “The model is run in an LES mode…”
L96: “dz ≈ 12 m” -- I assume this is the height of the lowest layer - please state it clearly. Also, this information is given again in sec. 2.2 (with 11.2 m stated), please see my comment there.
L108-L117: I find this paragraph hard to read, consider revising. Possibly also moving to another section, since here the focus is on the extraction of data from the model, which doesn’t fit the section title. Some suggestions follow:
L108: “In this study, we … “ – erase this sentence and fragment of the next until “To achieve this” – This is said again below, with higher information content.
L110: “along flight paths similar to those conducted during” – I think it’s ok to use “same” or “matching” here.
L110: “The super-resolution of our model-generated atmospheric fields allow us to sample data at temporal and spatial scales of airborne measurements without the need for interpolation of model generated fields.” – some details important for study reproducibility are missing. How was the model sampled in the horizontal and vertical? Was it simply using nearest-neighbour sampling? Or was interpolation used? If yes – were absolute heights used, or pressure, for the vertical coordinate?
L121: a) none of the symbols T, p or c are used later in the paper, consider dropping; b) please be specific, which moisture variable is archived? Relative humidity? Specific humidity?
L124: “~ 31 km” – Please give 31.25 km exactly, this makes sense with 1:5 nesting ratio for WRF, approximation raises an eyebrow.
L125: Was the vertical resolution forced to 11.2 m for all 40 grid levels? This is not a typical WRF configuration with hybrid model levels, so please state it clearly here. For comparisons against other modelling setups, it would also help to state how many vertical layers are present in the lowest 3km, please add this information here.
L127: Please state the spatial resolution for NARR data as well.
L131: Please limit the description to sources and tracers relevant to this analysis.
L136: 1. "in height" repeated 2. Please give exact heights of all four stacks 3. Please state their respective emissions – do they differ? Consider a table if they do. This is relevant for the analysis later.
L137: “Each source emits a known amount” -- 1. Is it meant that the emissions are known in real world, or prescribed in the model? Please make clear. Consider "Each source in the model emits a known amount" or "The emissions prescribed in the model E_s can be compared..."
L139: “Here we evaluate three emissions scenarios: stacks” -- "Scenarios" does not make sense in this context. "Emissions from group of emitters” are evaluated, consider this or similar.
L146: “more than enough” –> sufficient
Figure 1, caption: “All stacks are combined…” -> It’s the emissions that are combined. Consider: “Emissions from all stacks are followed using a single tracer in the model. Small dispersed area sources are grouped similarly”. Also: degree symbol missing in coordinates.
L165-184: I have my doubts about whether the full algorithm is needed in this context, as most of the components except for the horizontal advective flux are immediately discarded. See my major comment 3.
L182: More details need to be given on how the wind was calculated. See major comment 4.
L186: “The terms… must be ignored” -- Wording. Actually they must not be ignored -- because that would mean we accept a presence of potentially large bias, as the mass escapes the volume. More precisely, it is reasonable to >assume they are negligible< - provided that there is no indication of mass on the higher levels of the flight, and no deep convection was observed –half a sentence that none of this was observed is worth adding.
L191: “a 3-dimensional prism” – please add “or a cylinder”
L201-202: When read first, it feels like contradicting L151. I suggest removing part of sentence “representing a well-mixed concentration in the boundary-layer” entirely.
L215: This discussion of the storage is very relevant to biases demonstrated later, but not highlighted in the discussion. If it’s possible to evaluate the storage component in the previous study numerically, why not use the same method to “correct” the emission estimates here for individual cases? See also my major comment 3 and comment to L581.
L243: Please use another symbol. T is used for temperature or period of oscillation, both could theoretically be used in this study (e.g. period of circling around the source, where circular paths are discussed). In fact, temperature is also denoted as T in sec. 2.5. To avoid confusion (especially in discussion), I strongly suggest simply Dh here (or similar).
L243: If T is set to 100m, then what's the point of optimizing it? Is that the base value? Please make clear
L245: Is 1 minute for turn a realistic time based on actual data from measurement campaigns?
L249: See major comment 1.
L249: “Based on our estimation…” – sigma is simply a single measurement uncertainty estimate. Please erase or simplify.
L250: “When comparing…” – Is this relevant? Please clarify or erase.
L255: In real world that would mean we have 10 instruments available. Perhaps add clarification whether this is meant to represent real-world situation where someone is flying 10 drones (unlikely for various reasons), or is just a method to estimate uncertainty. Related to major comment 2.
L256: “horizontal aircraft speed is randomly offset…” Is this number according to real data? Based on my knowledge the variability of speed in UAVs in automatic mode is usually within 0.1 m/s at altitudes up to 200 meters. When flown “manually” this value increases somewhat (0.5 m/s – data from actual measurements) but having 3 m/s variability is unlikely, as these sorts of conditions are not flight-permitting. For small aircraft a change of wind speed by 3 m/s at higher altitudes is perhaps more likely, but then conservation of momentum will prevent that from being >completely< random. And for larger aircraft this is simply impossible. Finally, the accumulation of the error is an entirely wrong assumption, as either the automatic guidance systems or the pilots will prevent "drifts" of the desired speeds and altitudes. This needs to be addressed, either by recalculating the procedure entirely, or by demonstrating that this does not lead to major biases in estimation.
L268: “These screens are flown…” - I think "flown" is confusing in this context - if the full screens are output at a single time, then perhaps it's better to use "sampled" here.
L270: We refer to these flights and the calculated emission rate values as “instantaneous”. Linked to above; I suggest: "… to thus calculated emission rates as instantaneous".
L274: Here and throughout the text, I feel it would be beneficial to differentiate between a "single flight" and “10 subsequent flights". Consider “formation flight", “group flight” or even “echelon flight”.
L281: “turbulence and the and stability” –repeated “and”
L283: “model runs (using a criteria of 0.25 > Ri > –0.25 for neutral conditions)” -- This is not a typical interpretation: please add a reference for the range given if available. I've never encountered values below zero being interpreted as neutral. Usually flows with Ri_b < 0.25 are treated as turbulent. See Stull, "Introduction to Boundary Layer Meteorology", Sec. 5.6.3, Fig. 5.19, for example.
L284: “Temperature rises consistently during both afternoons, rising approximately” – “rises, rising” - replace second with “by”
L288: -- way >to< sample
L291: “eliminated” – should be “eliminating”
L305: “calculation of .... means ...” -> "calculating the screen length... results in a screen length that is..."
L340: I think the critical point here is the temporal scale of the changes - these occur on high time frequencies, high enough to cause variability in estimated emissions between flights separated by 1 minute. The "storage" term here is a manifestation of the turbulent eddies transferring mass through the screen at highly variable rates. See my major comment 3.
Figure 3. a. This figure is only for stacks - and it should be noted in the caption. b. Red symbols are not mentioned - please add where appropriate. c. Please add Panel A/B/C/D references next to appropriate dates.
L376: “The extrapolated concentrations…” - The way this is written (“average of…”) suggests that more than one sampling was compared, but the text above says that only the "first single instantaneous flight" was sampled and compared against the original screen. Please clarify whether only one instance was compared, or whether the comparison was done for 10 flights.
L378-380: The authors correctly spotted this effect for vertical motion but didn’t consider it for horizontal – see major comment 2.
L393: “Generally, flying…” – I’m quite certain this is due to the source being below 150 m and the extrapolation being done without “seeing” most of the mass. It's quite clear from Fig 4, where at 2 km the extrapolation below 150 m underestimates tracer concentrations in 14/16 cases (most of those quite clearly). Confirming this beyond doubt would require a detailed look at the model output over a longer period (if the output is available for several hours, this would be a good addition), but it’s quite logical: 2 km is not enough distance and time for the modelled updrafts to move the mass above 150 m.
L408: “This transition from overestimation at small spacing to underestimation at larger spacing could be due to vertical movement of the plume opposite to the sampling direction, resulting in transects missing the plume centre at larger spacing.” – Again, authors think only of vertical, but not of horizontal. See major comment 2.
L413: sometime -> sometimes
L435: “For the small area sources (Figs. 6a-d), the instantaneous flight horizontal…” – see comment for L393, same effect.
L445: “The relatively good agreement between instantaneous and non-instantaneous estimates implies that vertical motion of the plume does not result in over- or under-sampling.” It is also partially because for large source area the effective distance from the source is much larger – what is given is calculated to the >edge< of a large source, so that the emission-centre point is much further upwind (Fig 1), and the effective signal is from areas well-mixed (far away) and not well-mixed (close to measurement). This deserves some expanded discussion as well, with “effective distance” or “distance to centerpoint” rather than distance to edge used for x if the comparison is to be fair.
L458: “large area source would show substantially less uncertainty relative to a single flight sampling small area sources.” – delete “would”, no need to hypothesise.
L459: “we expect” – as above, “we estimate”
L467: “instantaneous area source flights” -- instantaneous sampling maybe? See comment to L270.
L477: “however, this is…” - Clearly something else negates this effect. Bias in wind speed or direction could explain it, especially since the model clearly predicts a large-scale change of wind direction, shifting to more southerly winds as the plume goes northwards. See major comment 4.
L484: “For this source…” – More precisely it should start with “for this source and these atmospheric conditions”. See my major comment 5.
L508: “Scaling” – this section doesn’t have a corresponding entry in Methods. Reorganize, with expanded description of the method (and motivation for its use) moved to Section 2.
L511: “wind speed” – horizontal, or also using W component?
L511: “boundary layer heights are taken as…” –What was the method for PBLH evaluation here? State it clearly. Also, authors assume that PBLH did not change significantly – see comment below.
L516: “The results are not collapsed…” - Would it be better if the actual PBLH was taken into account, I wonder? High variability is possible - 16 UTC and 17 UTC correspond to approximately 9 and 10 local time in Alberta - PBLH development can be quite dynamic at this time, changes of 200 m per hour are typical for mid-latitudes in summer, so if the longest analysed time periods are 30 mins, then a change of 100 m is over 20% of the zi. I assume the PBLH field from the model is available, please give numbers here and discuss. Consider also the plume extent – single point values might not be representative.
L527: “for the Aug 20 17:20 stack flights at 𝐷 = 10 km (see Fig. 4b), and it is unclear what would happen at further downwind distances for that flight.” -- Large-scale change of wind direction is probably at play here and this breaks the method assumptions – see major comment 4.
L545: “Hence, based on the average estimate of 𝐸_H/𝐸_𝑆 alone…” – This reads as a general comment. I disagree that the evidence presented supports this. See my major comment 5.
L549: “variability is seen instantaneous results” – “seen in”
L555: “Hence, 3 flights can be flown at 𝐷 = 4 km in the same time it takes to fly one flight at 𝐷 = 12 km. Taking the average of these 3 flights, reduces the uncertainty by a factor of 0.58 (1/√3). Hence, …“ – numbers flawed as based on wrong assumptions of statistics. See major comment 2. Also: “hence” is used twice.
L561-567: Again, results are based on four eddy realizations. I find the sample size too small to derive such conclusions. See major comment 5.
L579: “However, the results do demonstrate the potential to improve emission rate retrieval by accompanying any flight campaign with a strong modelling effort.” - While I agree this statement is true in general, I do need to point out that this is not demonstrated in this study, as the results were not compared to actual measurement data here -- emission estimates were not "improved". Consider erasing. See also comment below (L581).
L581: “Reanalysis data combined with tracer release can be used to mimic flight actual patterns and estimate storage and release during actual flight time, thus reducing the most substantial uncertainty in the emission rate estimation.” – If I understand the authors’ thought here, this would require the model to simulate exactly the same plumes and the same eddies as in reality. Do the authors believe this is possible? The eddies are stochastic, and while we can simulate realistic conditions, it's unlikely that we will reproduce exactly the same eddy pattern. And if we can't, then can the model help us correct estimations if we only have a single, or maybe two flights (as we often do)? Or does it only allow us to estimate uncertainty more realistically? Please comment.
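Regarding the comment on L283 above, the following minimal Python sketch shows how a bulk Richardson number could be computed from two model levels and compared against both the authors' neutral range and the more common Ri_b < 0.25 criterion. The finite-difference form used here follows the usual textbook definition; the function names, argument names and the example values are assumptions for illustration and do not come from the manuscript.

```python
G = 9.81  # gravitational acceleration, m s^-2


def bulk_richardson(theta_v_low, theta_v_high, u_low, u_high,
                    v_low, v_high, dz):
    """Bulk Richardson number across a layer of thickness dz (m).

    theta_v_* are virtual potential temperatures (K); u_* and v_* are the
    horizontal wind components (m/s) at the lower and upper level.
    """
    d_theta = theta_v_high - theta_v_low
    theta_mean = 0.5 * (theta_v_high + theta_v_low)
    shear_sq = (u_high - u_low) ** 2 + (v_high - v_low) ** 2
    if shear_sq == 0.0:
        raise ValueError("zero wind shear: Ri_b is undefined")
    return (G / theta_mean) * d_theta * dz / shear_sq


def is_turbulent(ri_b, critical=0.25):
    """Common criterion quoted in the comment: Ri_b below 0.25."""
    return ri_b < critical


def in_authors_neutral_range(ri_b, limit=0.25):
    """Range quoted from the manuscript: 0.25 > Ri > -0.25."""
    return -limit < ri_b < limit


# Hypothetical layer: 1 K increase over 100 m with 5 m/s of shear.
ri = bulk_richardson(299.5, 300.5, 0.0, 5.0, 0.0, 0.0, 100.0)
```

The two criteria differ only for strongly negative values: a layer with Ri_b = -0.5 counts as turbulent under the common criterion but falls outside the range the manuscript labels neutral.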
References:
Fuentes Andrade, B., Buchwitz, M., Reuter, M., Bovensmann, H., Richter, A., Boesch, H., and Burrows, J. P.: A method for estimating localized CO2 emissions from co-located satellite XCO2 and NO2 images, Atmos. Meas. Tech., 17, 1145–1173, https://doi.org/10.5194/amt-17-1145-2024, 2024.
Gałkowski, M., Marshall, J., Fuentes Andrade, B., and Gerbig, C.: Impact of atmospheric turbulence on the accuracy of point source emission estimates using satellite imagery, Atmos. Chem. Phys., 25, 13831–13848, https://doi.org/10.5194/acp-25-13831-2025, 2025.
Gerbig, C., J. C. Lin, S. C. Wofsy, B. C. Daube, A. E. Andrews, B. B. Stephens, P. S. Bakwin, and C. A. Grainger, Toward constraining regional-scale fluxes of CO2 with atmospheric observations over a continent: 1. Observed spatial variability from airborne platforms, J. Geophys. Res., 108(D24), 4756, doi:10.1029/2002JD003018, 2003.
JCGM (EC, IFCC, ILAC, ISO, IUPAC, IUPAP, OIML and BIPM) - Evaluation of measurement data—Guide to the expression of uncertainty in measurement, , JCGM 100:2008, 72–73, available at: https://www.bipm.org/documents/20126/2071204/JCGM_100_2008_E.pdf
Kuhlmann, G., Brunner, D., Broquet, G., and Meijer, Y.: Quantifying CO2 emissions of a city with the Copernicus Anthropogenic CO2 Monitoring satellite mission, Atmos. Meas. Tech., 13, 6733–6754, https://doi.org/10.5194/amt-13-6733-2020, 2020
Citation: https://doi.org/10.5194/egusphere-2025-4542-RC2
This study investigates sources of error in typical aircraft mass balance experiments and provides guidance for experiment design. It is well written and easy to follow, with clear conclusions: storage leads to random error in E_H/E_S, whereas extrapolation below the lowest transect can be an important source of systematic error. This is consistent with previous studies, but the thorough investigation presented here provides valuable insight for planning and evaluating future real-world data. I suggest that this paper is suitable for publication with only minor revisions. I think it would benefit from a slightly expanded discussion on the points below.
There is currently very little mention of the background concentrations. I understand that this is not a factor in the simulated data, as all the tracers released in the model come from sources within the domain. However, in real world examples variability in the background can be an important source of error, so at least some discussion of this is required. In particular, it impacts statements such as that in L368-370. Going further downwind may reduce the sources of random error addressed here, but there is a trade-off in terms of signal-to-noise above background.
It is interesting that the kriging interpolation resulted in an overestimation of the instantaneous screen (L375). It would be great to see some more investigation of this. Was anisotropy in the variogram considered? I wonder if the variogram becomes more isotropic as you move further from the source? That would make intuitive sense to me. Were other functions (i.e. other than the spherical function mentioned) tested when fitting the variogram? It could also be interesting to see if this choice impacts the overestimation, although I appreciate that it is hard to draw general conclusions because the best function will always be specific to an individual flight. The same goes for the area source flights – L487 points to the kriging as a potentially significant error source so it would be good to see this case investigated too.
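To make the questions about the variogram concrete, here is a minimal Python sketch of the spherical variogram model, together with one simple way of handling geometric anisotropy between horizontal and vertical lags. The parameter names, the anisotropy treatment (separate horizontal and vertical ranges) and the example numbers are assumptions for illustration; the kriging configuration actually used in the manuscript is not reproduced here.

```python
import math


def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical variogram model evaluated at lag distance h.

    Rises from the nugget to nugget + partial_sill at h = range_ and
    stays flat beyond it. By convention gamma(0) = 0.
    """
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def anisotropic_spherical(dx, dz, nugget, partial_sill, range_x, range_z):
    """Geometric anisotropy: scale each lag component by its own range.

    dx is the horizontal lag and dz the vertical lag (same units as the
    ranges). The scaled lag is dimensionless, so the model range is 1.
    """
    h_scaled = math.hypot(dx / range_x, dz / range_z)
    return spherical_variogram(h_scaled, nugget, partial_sill, 1.0)


# Hypothetical ranges: 200 m along the screen, 50 m in the vertical.
gamma_horizontal = anisotropic_spherical(100.0, 0.0, 0.0, 1.0, 200.0, 50.0)
gamma_vertical = anisotropic_spherical(0.0, 25.0, 0.0, 1.0, 200.0, 50.0)
```

In this sketch a 100 m horizontal lag and a 25 m vertical lag give the same semivariance, because each is half of its own range. Fitting range_x and range_z separately at each downwind distance would be one way to check whether the variogram becomes more isotropic further from the source.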
The investigation of the vertical transect spacing is interesting but the results are hard to interpret. The hypothesis that plume movement could be responsible for the changes seen in the Sep 2 case seems plausible, but it would be nice to see this tested. Seeing as we are dealing with simulated flights, could a test be done where the order of the transects is changed?
L213/216 – refers to the known emissions as Es but I don’t think this has been defined yet
L225 – in some cases even faster than 2 Hz. I know the UK FAAM aircraft has a CO2/CH4 LGR with a data acquisition rate of 10 Hz, although the cell turnover time means that the effective frequency of measurement is less than this (more like 7 Hz I believe).
L360 – it might be worth rephrasing this to clarify that it is E_H/E_S which is lower in the non-instantaneous cases (i.e. the underestimation is worse). A “lower underestimation” could perhaps be misinterpreted.
L413 – typo “sometimes”
Figure 9 – formatting error on some axes labels
L576-577 – it makes qualitative sense that more information below the lowest transect would help. Could this be tested? At least for the case of a mobile vehicle you could presumably add an extra transect at z=0 with a typical vehicle speed and see what difference this makes