Evaluating Weather and Chemical Transport Models at High Latitudes using MAGIC2021 Airborne Measurements

Langot, Félix; Crevoisier, Cyril; Lauvaux, Thomas; Abdallah, Charbel; Pernin, Jérôme; Lin, Xin; Saunois, Marielle; Guedj, Axel; Ponthieu, Thomas; Roiger, Anke; Gottschaldt, Klaus-Dirk; Fiehn, Alina

doi:https://doi.org/10.5194/egusphere-2024-3559

Preprints

27 Nov 2024

Abstract. Methane (CH₄) fluxes emitted by wetlands at high latitudes remain one of the largest sources of uncertainties in global methane budgets. At these latitudes, flux estimation approaches, such as atmospheric inversions, are impacted by improper characterisation of atmospheric transport due to challenging meteorological conditions and a lack of measurements. Here, we assess the performances of ERA5 reanalysis, mesoscale simulations from WRF-Chem, and various atmospheric transport models from several global and regional inversion systems using meteorological and CH₄ in-situ measurements collected during the MAGIC2021 campaign near Kiruna, Sweden. Over six measurement days in August 2021, ERA5 exhibited better agreement with observations than WRF-Chem thanks to data assimilation. Nevertheless, WRF-Chem demonstrated proficiency in simulating local atmospheric dynamics. Among global simulations of atmospheric concentrations of CH₄, inversion-optimised simulations of CH₄ concentrations yielded the best performances, particularly near the surface, with CAMS v21r1 marginally outperforming PYVAR-LMDz-SACS ensemble inversions. WRF-Chem regional simulations revealed performance disparities among CH₄ products, with positive biases in the boundary layer indicative of an overestimation of wetland emissions by selected wetland flux models. All transport models exhibited a vertically delayed gradient of CH₄ mixing ratios near the tropopause, resulting in a positive bias in the stratosphere. The high vertical resolution of CAMS hlkx facilitated a better representation of the vertical structure of CH₄ profiles in the stratosphere. Despite the limited spatiotemporal scope of MAGIC2021, we were able to identify the best performing transport models and to evaluate fluxes from different biogeochemical model parametrisations using the MAGIC2021 high-resolution dataset.

Received: 22 Nov 2024 – Discussion started: 27 Nov 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on egusphere-2024-3559', Danilo Custódio, 16 Dec 2024
Reviewer Comments for the Article "Evaluating Weather and Chemical Transport Models at High Latitudes using MAGIC2021 Airborne Measurements"
The manuscript focuses on the ability of atmospheric composition models to reproduce observed CH4 mixing ratios. It also assesses how well the meteorological variables used to drive atmospheric transport agree with meteorological variables measured at high latitudes. The article addresses critical issues in the field of atmospheric modeling, particularly at high latitudes, which are underrepresented in global atmospheric monitoring and modeling efforts. The relevance of such an evaluation cannot be overstated, as high-latitude regions are crucial for understanding key atmospheric processes, including CH4 emission and transport.
The study makes a valuable attempt to bridge the gap between observations and model simulations by combining state-of-the-art airborne measurements with model inter-comparisons. Confronting model results with high-resolution observations is a cornerstone of atmospheric science, as it is necessary to validate, refine, and benchmark modeling frameworks. The findings have the potential to contribute significantly to the atmospheric modeling community, offering insights into model performance under challenging conditions and emphasizing the importance of improving the representation of CH4 in polar regions.
However, while the study holds promise, the manuscript in its current form is not an easy read and has significant shortcomings in its presentation, structure, and overall clarity. I recommend major revisions before the article is considered for publication.
If the authors address the weaknesses in presentation and analysis, this work could substantially contribute to the scientific discourse on atmospheric modeling in high-latitude environments.

1. Major Concerns

Clarity and Presentation

The manuscript is difficult to follow due to unclear and excessive wording and overly dense descriptions. Some key points are buried in the text, making it challenging for readers to extract the central findings and their implications. Additionally, the plots are mazy and visually overwhelming, detracting from their effectiveness in conveying the results.
Metrics and Model Performance Assessment

While the study evaluates model performance, the choice of metrics is not optimal. The authors should consider employing a more comprehensive and widely accepted set of statistical metrics for model evaluation. Correlation coefficients and root mean square error are good; however, I would also recommend reporting the bias.
Additionally, the manuscript could discuss the implications of the metrics used. For example, while some metrics may show agreement, others may reveal discrepancies, which are worth exploring.
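The point about complementary metrics can be illustrated with a minimal Python sketch. The function name `evaluation_metrics` and the CH4 values are hypothetical and are not taken from the manuscript; the sketch only assumes that model output has already been colocated with the observations.

```python
import numpy as np

def evaluation_metrics(model, obs):
    """Return mean bias, RMSE and Pearson correlation of model versus obs."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only pairs where both model and observation are available.
    valid = ~(np.isnan(model) | np.isnan(obs))
    model, obs = model[valid], obs[valid]
    diff = model - obs
    return {
        "bias": diff.mean(),                   # mean of (model - obs)
        "rmse": np.sqrt((diff ** 2).mean()),   # root mean square error
        "r": np.corrcoef(model, obs)[0, 1],    # Pearson correlation
    }

# Hypothetical CH4 mixing ratios in ppb: the model is offset by +10 ppb.
obs = [1950.0, 1960.0, 1970.0, 1980.0]
model = [1960.0, 1970.0, 1980.0, 1990.0]
metrics = evaluation_metrics(model, obs)
print(metrics)
```

In this constructed case the correlation is essentially 1 while the bias and RMSE are both 10 ppb, which is the kind of disagreement between metrics that is worth discussing.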
Figures

The figures and tables are a central issue. While they contain a wealth of information, they are too crowded and difficult to interpret. Each figure should serve a clear purpose and convey specific insights. To improve:
Use a clean background, simplify the layout, and make sure that the data are visible.

Use color schemes that are easy to distinguish, particularly for readers with color vision deficiencies.

Add concise and informative captions that explain the key takeaways from each figure.

2. Other Comments

The manuscript's Figure 1 is confusing and requires clarification:
Scope of Flights: Does Figure 1 intend to show the entire MAGIC2021 campaign or only flights over Kiruna? The caption does not make this clear.

Flight over Norway: Why is the flight over Norway not included in the figure? Is it not part of the MAGIC2021 campaign?

Unexplained Elements: The color blocks in the middle of Figure 1 are not explained in the caption or the text. This information seems to appear "out of the blue," and the figure lacks sufficient annotation to guide the reader. Please ensure that every feature in the figure is fully explained in the caption and supported by the text.

The introduction of AirCore measurements is presented in a very shallow manner. While AirCore is a critical part of the study, its role and methodology are not sufficiently explained. Readers who are unfamiliar with AirCore technology will struggle to grasp it.
The division of the atmosphere into three layers based on pressure ranges—P > 800 hPa, 300 < P < 800 hPa, and P < 300 hPa—is arbitrary and does not align with commonly accepted atmospheric definitions. The chosen pressure thresholds do not accurately correspond to the planetary boundary layer (PBL), free troposphere (FT), or lower stratosphere (LS). A more scientifically sound approach would involve:
Using PBL height (PBLH) to define the boundary layer.

Defining the tropopause to separate the troposphere from the stratosphere.

This approach would ensure that the results are more meaningful and interpretable, especially for discussions of CH₄ transport dynamics across these atmospheric layers.
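The difference between fixed pressure thresholds and profile-specific boundaries can be sketched as follows. This is a minimal Python illustration, not the authors' method: the function `classify_layers`, the assignment of boundary values to the free troposphere, and the example PBL-top and tropopause pressures are all assumptions made here.

```python
import numpy as np

def classify_layers(pressure_hpa, pbl_top_hpa=800.0, tropopause_hpa=300.0):
    """Label each sample as PBL, FT or LS from its pressure.

    The defaults reproduce the fixed 800 hPa and 300 hPa thresholds quoted
    above. Passing a diagnosed PBL-top pressure and tropopause pressure for
    a given profile gives the dynamic alternative. Samples exactly on a
    boundary are assigned to FT (an assumption of this sketch).
    """
    p = np.asarray(pressure_hpa, dtype=float)
    labels = np.full(p.shape, "FT", dtype="<U3")
    labels[p > pbl_top_hpa] = "PBL"    # below the boundary-layer top
    labels[p < tropopause_hpa] = "LS"  # above the tropopause
    return labels

pressures = [950.0, 850.0, 700.0, 250.0, 150.0]

# Fixed thresholds, as in the manuscript.
fixed = classify_layers(pressures)

# Hypothetical profile with a shallow PBL (top at 900 hPa) and a
# tropopause at 220 hPa; both values are illustrative only.
dynamic = classify_layers(pressures, pbl_top_hpa=900.0, tropopause_hpa=220.0)

print(list(fixed))    # PBL, PBL, FT, LS, LS
print(list(dynamic))  # PBL, FT, FT, FT, LS
```

With these illustrative numbers, the samples at 850 hPa and 250 hPa change layer, so per-layer statistics would be computed over different sets of points.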

The table captions should be placed at the top of the tables, following standard formatting conventions. Additionally, the table labels should succinctly describe the contents of the table. For instance, it does not make sense to include information about what is not in the table. Ensure that the captions are clear and concise, helping the reader to quickly understand the data presented.
The manuscript refers to "four statistic," which is an unclear and incorrect phrasing. Likely, the authors mean "four metrics used to evaluate model performance." The use of appropriate and precise terminology is critical for clarity. This error is indicative of broader language issues in the subsection "Statistics," which should be rewritten to ensure proper English usage and a professional tone.
The caption for Figure 2 is insufficient to help readers understand the plot. Captions should summarize the key information conveyed in the figure and provide any necessary context for interpretation. In its current state, the caption leaves too much ambiguity and fails to assist the reader in navigating the content.
The manuscript's discussion of wind fields is constrained solely to advection (horizontal transport), which provides an incomplete picture. The vertical component of wind, which is critical for transport processes and atmospheric mixing, is entirely missing. Vertical transport processes are among the most significant challenges in atmospheric modeling. Without addressing these, the discussion remains superficial. The authors could evaluate turbulence representation and vertical wind components in the models, as these are critical to understanding transport processes.
Figure 5 is visually confusing and "weird" in its current presentation. The layout, formatting, and choice of visualization make it difficult to follow and interpret. Clearer design and simpler representations would greatly enhance the reader's understanding of this figure. Ensure that key messages are apparent and not lost in the visual clutter.
The content of subsection 3.3 is difficult to follow due to poor organization and unclear visualizations. The comparisons presented in this section lack coherence in terms of visual representation, metrics used, and overall wording. It is essential to streamline the presentation of comparisons to make them more reader-friendly and effective.
The comparison of meteorological data between models and observations is superficial, merely reporting which model or dataset is closer to observations. This approach fails to provide meaningful insights or a deeper understanding of model inter-comparisons. Readers expect a more insightful analysis of model performance, including:
Identifying potential reasons for discrepancies.

Explaining how differences in parametrizations or data assimilation processes contribute to observed biases or differences.

Suggesting ways to improve model representation of meteorological processes.

The manuscript must go beyond simply reporting agreement or disagreement to provide a more nuanced and insightful evaluation.
The vertical profiles presented in the manuscript are overly complicated and lack clarity. The plots are "mazy," and the text does not provide sufficient guidance to help the reader interpret them. The analysis of vertical profiles should do more than report which model performs better in specific atmospheric regions (which, as noted above, were not properly defined). A thorough discussion of the physical processes contributing to vertical variations in CH₄ and meteorological variables would enrich the article.
The conclusion that all models overestimate CH₄ at the upper troposphere-lower stratosphere (UTLS) boundary is interesting but could be influenced by the interpolation method used for data colocation. In addition:
TM3 does not have the resolution to accurately resolve the tropopause.

While IFS has more vertical levels, it still struggles with tropopause representation.

The lack of proper selection for the lower-most stratosphere in this study further compounds this issue. A more refined methodology is required to draw robust conclusions about model biases in the UTLS region.
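To make the colocation concern concrete, the following Python sketch interpolates a coarse model profile onto observation pressure levels, linearly in log-pressure. It is only one possible scheme and not necessarily the one used in the manuscript; the function name `interp_to_obs_levels` and all values are hypothetical.

```python
import numpy as np

def interp_to_obs_levels(model_p_hpa, model_ch4_ppb, obs_p_hpa):
    """Interpolate a model CH4 profile to observation pressures.

    Interpolation is linear in log(pressure). Observation levels outside
    the model pressure range are returned as NaN rather than extrapolated.
    """
    model_p = np.asarray(model_p_hpa, dtype=float)
    model_x = np.asarray(model_ch4_ppb, dtype=float)
    obs_p = np.asarray(obs_p_hpa, dtype=float)
    order = np.argsort(model_p)  # np.interp needs increasing coordinates
    return np.interp(np.log(obs_p), np.log(model_p[order]), model_x[order],
                     left=np.nan, right=np.nan)

# Hypothetical coarse model with only two levels around the tropopause.
model_p = [400.0, 200.0]      # hPa
model_ch4 = [1900.0, 1700.0]  # ppb

colocated = interp_to_obs_levels(model_p, model_ch4, [400.0, 300.0, 200.0])
print(colocated)
```

Here the value at 300 hPa comes out at roughly 1817 ppb. If the observed profile stays near 1900 ppb up to a sharp tropopause and only then drops, a smooth interpolated gradient of this kind can by itself produce apparent model biases on either side of the transition, independently of the transport model's skill.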
The attribution of the overall positive CH₄ bias in the boundary layer to wetland emissions is an important finding. However, this conclusion seems premature without further testing. A sensitivity test could strengthen this claim and help confirm that the conclusion is robust.
The spatial and temporal limitations of this model evaluation could be addressed by incorporating data from the CoMet 2.0 campaign over Canada in the summer of 2022. While the MAGIC2021 campaign provides valuable observations, supplementing this with additional datasets could offer a more comprehensive evaluation of model performance.
I hope the authors do not feel disheartened by this review. The effort and dedication evident in this work are truly impressive. I believe that addressing these points will unlock the full potential of the manuscript, making it clearer, more robust, and significantly more impactful for the atmospheric modelling community.
Citation: https://doi.org/10.5194/egusphere-2024-3559-RC1
- AC1: 'Reply on RC1', Félix Langot, 16 Jul 2025
  
  Dear D. Custodio,
  Thank you very much for reviewing our paper. You can find attached here our answers to your comments, which have been very helpful in improving the quality of the paper.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3559-AC1
RC2:
'Comment on egusphere-2024-3559', Anonymous Referee #2, 21 Feb 2025
This study compares a dataset of airborne meteorological and methane trace gas measurements from a short measurement campaign in Arctic Sweden to outputs from multiple different atmospheric transport and flux inversion models. The observational dataset is rather limited in space and time, but with multiple airborne platforms, appears to have high enough density and altitudinal coverage, and the group of models chosen has a decent diversity of inputs and setups, to allow for some generalized findings.
My main concerns are:
There was no exploration of model representation of PBL height or vertical mixing. Accurate representation of these parameters and processes is a central issue in flux estimation and therefore is a bit of a gaping hole in this paper. For example, the authors conclude that some wetland models over-estimate emissions since simulated CH4 values are too high on average in the BL for many cases. How do you know that representation of BL height or vertical mixing are not playing a role in this high bias?

Section 3.2, Figure 4 bottom left panel – I don't understand why models would track temperatures from aircraft well and not AirCore. I’m concerned that the model comparison has identified a significant measurement bias in the balloon-borne temperature measurements. Later in section 3.4, you say “Temperatures from weather balloons appear to be slightly biased… due to a lack of corrections…” Bad data should not be retained for model evaluation, even if it’s discovered after the fact. For validation of all observational variables analyzed, were measurements from separate platforms/instruments inter-compared or calibrated to a common standard?

Section 4.1: “PLS Surf b notably showed a Δ ∼0 when compared with AirCore data but a significant underestimation of both Cessna and ATR42 measurements.” Is this finding because the CH4 data from the different platforms are biased relative to each other or because they have different processes/ecosystems in their influence regions? It would be helpful to characterize the dominant flux processes in the footprints of the observations.

Section 4.2: You say some models over-estimate wetland emissions due to lack of complexity. But lack of complexity/variability is only one possible reason and no evidence is given. How do you know emissions aren't over-estimated on average?

Section 4.3: Can you provide more information on the wetland models that overestimated methane emissions? How much higher, within the area of influence of the observations, were the emissions from the models that over-estimated observed CH4 concentrations compared to the models that got it about right?

Regarding the composition: the paper is full of minor grammatical errors and awkward word choices. These are not significant individually, but they made the paper laborious to read and detracted from my understanding in some cases. I have listed many minor suggested edits below.

Other comments:
In the abstract, at this early point in the paper, the model names/acronyms (mainly: “CAMS v21r1”, “PYVAR-LMDz-SACS”, “CAMS hlkx”) don’t mean anything to the reader and no description is provided. It would be better to describe the models in mostly general terms in the abstract related to what aspects distinguish them and/or led to differences in performance.

A summary table or list of the different models tested and their important aspects/differences would be really helpful.

Line 463: The AirCore-Fr network is not described so I don’t know the space/time coverage of those observations. Are you saying that a systematic bias extends beyond the Arctic?

Line 496: “It is worth noting that a difference in tropopause height between models and observations does not influence the results, as confirmed by temperature profiles (Figure 4).” I don’t understand what is meant by this statement. Please elaborate or rephrase.

Line 521: I am unclear on what a “delayed vertical gradient” means. Too slow vertical mixing?

Several sections have an inconsistent mix of past and present-tense from sentence to sentence. Please check to make consistent.

Data availability: Data links/citations/DOIs are not provided, making me doubt the data's actual availability. Also, what about the WRF data?

Minor suggested edits by line number:
11: I don’t understand what “among CH4 products” means. Among different prior flux models?

14: Suggest: Despite the [its] limited spatiotemporal scope of MAGIC2021 [coverage], we were able to identify the best performing transport models and to evaluate fluxes from different biogeochemical model parametrisations using the MAGIC2021 high-resolution dataset [, demonstrating the utility of in situ vertical profile datasets for transport and flux model evaluation].

22-23: A few more citations would be good to support such broad statements about feedbacks.

28: “tall tower” -> “surface measurement”

34: For ABoVE, suggest citing: https://doi.org/10.5194/acp-22-6347-2022

36: “(2022)” – citation is incomplete and not listed

38: suggest: “[airborne] measurements of meteorological variables and atmospheric methane mixing ratios.”

47: “Kiruna and it’s surrounding [area] are…and tundra [ecosystems],…”

58: “Stohl (2004) have showed…”

60: “the main” -> “an important”

60: Check the accuracy of this statement: “OH is found mainly at the top of the troposphere and at the bottom of the stratosphere, where other chemical species also react with CH4…”

63: “gradient” -> “transport”

80: “ease and speed of data treatment” – What does this mean? All the data are not finalized or it was too much data to consider or ground data were not relevant to the goal of the analysis?

Figure1: With the colors used for land cover types, I can’t tell the difference between agriculture, boreal forest, and non-forest natural land in the map figure.

101: “DLR” acronym has not been defined. Also, is “(DLR)” supposed to be a citation to something?

106: “allows to measure” -> “allowed for measurements of”

118: “therefore contain” -> “include”; “samplings” -> “flights” or “soundings”

120: “was” -> “were”

124: “especially” -> “specifically”

128: “the higher density of vertical levels” – compared to what?

133: “To compare [model] humidity from to observations [of], that measured relative humidity (RH), to ERA5 humidity, given as specific humidity q, ERA5 data was converted…”

153: “The spatial resolution is of 3°×2°×34 levels and a [the] temporal resolution of [is] 6 hours.

156: PLS - Is there a reference for this model with more detail? If not, some detail and references on inputs and setup seem to be missing. For example, there are no references given for both priors and obs.

160: “surface” -> “in situ”

184: “Our” -> “The”

207: “Input emissions [(Table 1)] were chosen…”

222: “18 different flux versions are publicly available [from WetCHARTS], …”

225: “3 versions of total wetland flux [from JSB-HIM, each] differing in their driving meteorology, were included in this study.”

228-230: Suggest: “11 emission tracers and one boundary condition tracer were tracked in the simulation of total regional CH4 emissions. Boundary conditions were provided by…”

232: “This was done by hourly adding a constant offset of 300ppb through the emission tracers domain boundaries [on an hourly basis].”

Table 2: Suggest rounding the numbers in this table.

286: “ERA5 had NNE contributions more important [indicated a larger fraction of NNE] than N winds, contrary to observations and WRF. However, it [ERA5] showed a distribution of wind speeds closer to observations than WRF, which had a more important [larger] share of low speed winds than observations. WRF winds were again very similar between the two domains. They were overestimating [WRF overestimated] the contribution from NNW, especially of low speed winds.”

300: “there” -> “for the upper layer”; “better correlation” – Compared to what?

355: “instruments” -> “models”?

366: “However, performance did not improve significantly between d01 and d02, [and] d01…”

371: “consists in” -> “involves”

372: “to help regional simulations fit observations better (Bullock et al., 2014)[, but nudging was not utilized for the WRF runs analyzed here].”

386: “[For the PLS model in this figure,] we chose to only show comparison results of the PLS Surf b [configuration from the available 6-member ensemble].”

444: “so” -> “and”

447: “be happen” -> “occur”

465: “…an analysis products [for CH4 in the FT].”

485: “works” -> “is the case”

500: “content” -> “levels”

503: “chemistry” -> “structure”?

508: “our” -> “those”

514: “lower” -> “higher”?

519: “..observed only positive [to near neutral?] biases..”

520: “…models[, at least for the limited region and timeframe captured by the observations.]”
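For the humidity comparison raised in the comment on line 133, a conversion from specific humidity q to relative humidity could look like the Python sketch below. It is an illustration under stated assumptions, not the manuscript's procedure: saturation vapour pressure is taken over liquid water with a Magnus-type formula (Bolton, 1980 coefficients), and the function names are hypothetical. At cold high-altitude temperatures, RH with respect to ice would differ.

```python
import numpy as np

def saturation_vapour_pressure_hpa(temp_k):
    """Saturation vapour pressure over liquid water in hPa (Magnus-type)."""
    t_c = np.asarray(temp_k, dtype=float) - 273.15
    return 6.112 * np.exp(17.67 * t_c / (t_c + 243.5))

def specific_to_relative_humidity(q_kg_per_kg, temp_k, pressure_hpa):
    """Convert specific humidity (kg/kg) to relative humidity (%)."""
    q = np.asarray(q_kg_per_kg, dtype=float)
    p = np.asarray(pressure_hpa, dtype=float)
    # Vapour pressure from specific humidity, using epsilon = 0.622.
    e = q * p / (0.622 + 0.378 * q)
    return 100.0 * e / saturation_vapour_pressure_hpa(temp_k)

# Hypothetical model values: q = 5 g/kg at 283.15 K and 900 hPa.
rh = specific_to_relative_humidity(0.005, 283.15, 900.0)
print(float(rh))
```

Stating which saturation formula and which phase (liquid or ice) were used would make the comparison with measured RH easier to reproduce.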
Citation: https://doi.org/10.5194/egusphere-2024-3559-RC2
- AC2: 'Reply on RC2', Félix Langot, 16 Jul 2025
  
  Dear reviewer,
  Thank you very much for reviewing our paper. We believe your comments have greatly helped to improve the quality of the article. You can find attached our answers to the review.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3559-AC2

Status: closed

RC1:
'Comment on egusphere-2024-3559', Danilo Custódio, 16 Dec 2024
Reviewer Comments for the Article "Evaluating Weather and Chemical Transport Models at High Latitudes using MAGIC2021 Airborne Measurements"
The manuscript focus on the ability of atmospheric composition models in reproducing observed CH4 mixing ratios. As well, to asses the compliance of the meteorological variables used to drive atmospheric transport comparing it to meteorological variables measured at high latitude. The article addresses critical issues in the field of atmospheric modeling, particularly at high latitudes, which are underrepresented in global atmospheric monitoring and modeling efforts. The relevance of such an evaluation cannot be overstated, as high-latitude regions are crucial for understanding key atmospheric processes, including CH4 emission and transport.
The study makes a valuable attempt to bridge the gap between observations and model simulations by combining state-of-the-art airborne measurements with model inter-comparisons. Confronting model results with high-resolution observations is a cornerstone of atmospheric science, as it is necessary to validate, refine, and benchmark modeling frameworks. The findings have the potential to contribute significantly to the atmospheric modeling community, offering insights into model performance under challenging conditions and emphasizing the importance of improving CH4 in polar regions.
However, while the study holds promise, the manuscript in its current form is not an easy read and has significant shortcomings in its presentation, structure, and overall clarity. I recommend major revisions before the article is considered for publication.
If the authors address the weaknesses in presentation and analysis, this work could substantially contribute to the scientific discourse on atmospheric modeling in high-latitude environments.

1. Major Concerns

Clarity and Presentation

The manuscript is difficult to follow due to unclear wording, undued wording, and overly dense descriptions. Some key points are buried in the text, making it challenging for readers to extract the central findings and their implications. Additionally, the plots are mazy and visually overwhelming, detracting from their effectiveness in conveying the results.
Metrics and Model Performance Assessment

While the study evaluates model performance, the choice of metrics is not optimal. The authors should consider employing more comprehensive and widely accepted set of statistical metrics for model evaluation. Correlation Coefficients and Root Mean Square Error are good; however, I would recommend bias.
Additionally, the manuscript could discus the implications of the metrics used. For example, while some metrics may show agreement, others may reveal discrepancies, which are worth exploring.
Figures

The figures and tables are a central issue. While they contain a wealth of information, they are too crowded and difficult to interpret. Each figure should serve a clear purpose and convey specific insights. To improve:
Use clean background, simplify the layout and make sure that the data are visible.

Use color schemes that are easy to distinguish, particularly for readers with color vision deficiencies.

Add concise and informative captions that explain the key takeaways from each figure.

2. Others Comments

The manuscript's Figure 1 is confusing and requires clarification:
Scope of Flights: Does Figure 1 intend to show the entire MAGIC2021 campaign or only flights over Kiruna? The caption does not make this clear.

Flight over Norway: Why the flight over Norway is not included in the figure? Is it not part of the MAGIC2021 campaign?

Unexplained Elements: The color blocks in the middle of Figure 1 are not explained in the caption or the text. This information seems to appear "out of the blue," and the figure lacks sufficient annotation to guide the reader. Please ensure that every feature in the figure is fully explained in the caption and supported by the text.

The introduction of AirCore measurements is presented in a very shallow manner. While AirCore is a critical part of the study, its role and methodology are not sufficiently explained. Readers who are unfamiliar with AirCore technology willgrasp it.
The division of the atmosphere into three layers based on pressure ranges—P > 800 hPa, 300 < P < 800 hPa, and P < 300 hPa—is arbitrary and does not align with commonly accepted atmospheric definitions. The chosen pressure thresholds do not accurately correspond to the planetary boundary layer (PBL), free troposphere (FT), or lower stratosphere (LS). A more scientifically sound approach would involve:
Using PBL height (PBLH) to define the boundary layer.

Defining the tropopause to separate the troposphere from the stratosphere.

This approach would ensure that the results are more meaningful and interpretable, especially for discussions of CH₄ transport dynamics across these atmospheric layer

The table captions should be placed at the top of the tables, following standard formatting conventions. Additionally, the table labels should succinctly describe the contents of the table. For instance, it does not make sense to include information about what is not in the table. Ensure that the captions are clear and concise, helping the reader to quickly understand the data presented.
The manuscript refers to "four statistic," which is an unclear and incorrect phrasing. Likely, the authors mean "four metrics used to evaluate model performance." The use of appropriate and precise terminology is critical for clarity. This error is indicative of broader language issues in the subsection "Statistics," which should be rewritten to ensure proper English usage and a professional tone.
The caption for Figure 2 is insufficient to help readers understand the plot. Captions should summarize the key information conveyed in the figure and provide any necessary context for interpretation. In its current state, the caption leaves too much ambiguity and fails to assist the reader in navigating the content.
The manuscript's discussion of wind fields is constrained solely to advection (horizontal transport), which provides an incomplete picture. The vertical component of wind, which is critical for transport processes and atmospheric mixing, is entirely missing. Vertical transport are among the most significant challenges in atmospheric modeling. Without addressing these, the discussion remains superficial. The authors could evaluate turbulence representation and vertical wind components in the models, as these are critical to understanding transport processes.
Figure 5 is visually confusing and "weird" in its current presentation. The layout, formatting, and choice of visualization make it difficult to follow and interpret. Clearer design and simpler representations would greatly enhance the reader's understanding of this figure. Ensure that key messages are apparent and not lost in the visual clutter.
The content of subsection 3.3 is difficult to follow due to poor organization and unclear visualizations. The comparisons presented in this section lack coherence in terms of visual representation, metrics used, and overall wording. It is essential to streamline the presentation of comparisons to make them more reader-friendly and effective.
The comparison of meteorological data between models and observations is superficial, merely reporting which model or dataset is closer to observations. This approach fails to provide meaningful insights or a deeper understanding of model inter-comparisons. Readers expect a more insightful analysis of model performance, including:
Identifying potential reasons for discrepancies.

Explaining how differences in parametrizations or data assimilation processes contribute to observed biases or differences.

Suggesting ways to improve model representation of meteorological processes.

The manuscript must go beyond simply reporting agreement or disagreement to provide a more nuanced and insightful evaluation.
The vertical profiles presented in the manuscript are overly complicated and lack clarity. The plots are "mazy," and the text does not provide sufficient guidance to help the reader interpret them. The analysis of vertical profiles should do more than report which model performs better in specific atmospheric regions (which, as noted above, were not properly defined). A thorough discussion of the physical processes contributing to vertical variations in CH₄ and meteorological variables would enrich the article.
The conclusion that all models overestimate CH₄ at the upper troposphere-lower stratosphere (UTLS) boundary is interesting but could be influenced by the interpolation method used for data colocation. In addition:
TM3 does not have the resolution to accurately resolve the tropopause.

While IFS has more vertical level, it still struggles with tropopause representation.

The lack of proper selection for the lower-most stratosphere in this study further compounds this issue. A more refined methodology is required to draw robust conclusions about model biases in the UTLS region.
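To make the colocation concern concrete, the following is a minimal sketch of how a model profile might be interpolated to observation altitudes before layer-mean biases are computed. It assumes simple linear interpolation in altitude; the function name, the layer edges, and the sample values are hypothetical and are not taken from the manuscript, whose actual colocation method may differ.

```python
import numpy as np

def colocate_and_bias(model_alt_km, model_ch4_ppb, obs_alt_km, obs_ch4_ppb, layer_edges_km):
    """Interpolate a model CH4 profile to observation altitudes and
    return the mean model-minus-observation bias in each altitude layer."""
    model_alt_km = np.asarray(model_alt_km, dtype=float)
    model_ch4_ppb = np.asarray(model_ch4_ppb, dtype=float)
    obs_alt_km = np.asarray(obs_alt_km, dtype=float)
    obs_ch4_ppb = np.asarray(obs_ch4_ppb, dtype=float)

    # np.interp expects the model altitudes in increasing order.
    model_at_obs = np.interp(obs_alt_km, model_alt_km, model_ch4_ppb)
    diff = model_at_obs - obs_ch4_ppb

    biases = []
    for low, high in zip(layer_edges_km[:-1], layer_edges_km[1:]):
        in_layer = (obs_alt_km >= low) & (obs_alt_km < high)
        # Layers without observations are reported as NaN.
        biases.append(diff[in_layer].mean() if in_layer.any() else np.nan)
    return np.array(biases)

# Hypothetical three-level model profile and three observations.
example = colocate_and_bias(
    model_alt_km=[0.0, 10.0, 20.0],
    model_ch4_ppb=[2000.0, 1900.0, 1500.0],
    obs_alt_km=[1.0, 5.0, 15.0],
    obs_ch4_ppb=[1980.0, 1950.0, 1650.0],
    layer_edges_km=[0.0, 2.0, 10.0, 20.0],
)
```

With only a few model levels spanning the tropopause, the interpolated value between them depends strongly on the interpolation choice, which is why the method matters for any conclusion about UTLS bias.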
The association of the overall positive CH₄ bias in the boundary layer with wetland emissions is an important finding. However, this conclusion seems premature without further testing. A sensitivity test could strengthen this claim and ensure that the conclusion is robust.
The spatial and temporal limitations of this model evaluation could be addressed by incorporating data from the CoMet 2.0 campaign over Canada in the summer of 2022. While the MAGIC2021 campaign provides valuable observations, supplementing this with additional datasets could offer a more comprehensive evaluation of model performance.
I hope the authors do not feel disheartened by this review. The effort and dedication evident in this work are truly impressive. I believe that addressing these points will unlock the full potential of the manuscript, making it clearer, more robust, and significantly more impactful for the atmospheric modelling community.
Citation: https://doi.org/10.5194/egusphere-2024-3559-RC1
- AC1: 'Reply on RC1', Félix Langot, 16 Jul 2025
  
  Dear D. Custodio,
  Thank you very much for reviewing our paper. Attached are our answers to your comments, which have been very helpful in improving the quality of the paper.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3559-AC1
RC2: 'Comment on egusphere-2024-3559', Anonymous Referee #2, 21 Feb 2025
This study compares a dataset of airborne meteorological and methane trace gas measurements from a short measurement campaign in Arctic Sweden with outputs from multiple atmospheric transport and flux inversion models. The observational dataset is rather limited in space and time, but with multiple airborne platforms it appears to have sufficient density and altitudinal coverage, and the chosen group of models has a decent diversity of inputs and setups, to allow for some generalized findings.
My main concerns are:
There was no exploration of model representation of PBL height or vertical mixing. Accurate representation of these parameters and processes is a central issue in flux estimation and therefore is a bit of a gaping hole in this paper. For example, the authors conclude that some wetland models over-estimate emissions since simulated CH4 values are too high on average in the BL for many cases. How do you know that the representation of BL height or vertical mixing is not playing a role in this high bias?

Section 3.2, Figure 4 bottom left panel – I don't understand why models would track temperatures from aircraft well and not AirCore. I’m concerned that the model comparison has identified a significant measurement bias in the balloon-borne temperature measurements. Later in section 3.4, you say “Temperatures from weather balloons appear to be slightly biased… due to a lack of corrections…” Bad data should not be retained for model evaluation, even if it’s discovered after the fact. For validation of all observational variables analyzed, were measurements from separate platforms/instruments inter-compared or calibrated to a common standard?

Section 4.1: “PLS Surf b notably showed a Δ ∼0 when compared with AirCore data but a significant underestimation of both Cessna and ATR42 measurements.” Is this finding because the CH4 data from the different platforms are biased relative to each other or because they have different processes/ecosystems in their influence regions? It would be helpful to characterize the dominant flux processes in the footprints of the observations.

Section 4.2: You say some models over-estimate wetland emissions due to lack of complexity. But lack of complexity/variability is only one possible reason and no evidence is given. How do you know emissions aren't over-estimated on average?

Section 4.3: Can you provide more information on the wetland models that overestimated methane emissions? How much higher, within the area of influence of the observations, were the emissions from the models that over-estimated observed CH4 concentrations compared to the models that got it about right?

Regarding the writing: the paper is full of minor grammatical errors and awkward word choices. Individually these are not significant, but together they made the paper laborious to read and detracted from my understanding in some cases. I have listed many minor suggested edits below.

Other comments:
In the abstract, at this early point in the paper, the model names/acronyms (mainly: “CAMS v21r1”, “PYVAR-LMDz-SACS”, “CAMS hlkx”) don’t mean anything to the reader and no description is provided. It would be better to describe the models in mostly general terms in the abstract related to what aspects distinguish them and/or led to differences in performance.

A summary table or list of the different models tested and their important aspects/differences would be really helpful.

Line 463: The AirCore-Fr network is not described so I don’t know the space/time coverage of those observations. Are you saying that a systematic bias extends beyond the Arctic?

Line 496: “It is worth noting that a difference in tropopause height between models and observations does not influence the results, as confirmed by temperature profiles (Figure 4).” I don’t understand what is meant by this statement. Please elaborate or rephrase.

Line 521: I am unclear on what a “delayed vertical gradient” means. Too slow vertical mixing?

Several sections have an inconsistent mix of past and present tense from sentence to sentence. Please check for consistency.

Data availability: Data links/citations/DOIs are not provided, making me doubt its actual availability. Also, what about the WRF data?

Minor suggested edits by line number:
11: I don’t understand what “among CH4 products” means. Among different prior flux models?

14: Suggest: Despite the [its] limited spatiotemporal scope of MAGIC2021 [coverage], we were able to identify the best performing transport models and to evaluate fluxes from different biogeochemical model parametrisations using the MAGIC2021 high-resolution dataset [, demonstrating the utility of in situ vertical profile datasets for transport and flux model evaluation].

22-23: A few more citations would be good to support such broad statements about feedbacks.

28: “tall tower” -> “surface measurement”

34: For ABoVE, suggest citing: https://doi.org/10.5194/acp-22-6347-2022

36: “(2022)” – citation is incomplete and not listed

38: suggest: “[airborne] measurements of meteorological variables and atmospheric methane mixing ratios.”

47: “Kiruna and its surrounding [area] are…and tundra [ecosystems],…”

58: “Stohl (2004) have showed…”

60: “the main” -> “an important”

60: Check the accuracy of this statement: “OH is found mainly at the top of the troposphere and at the bottom of the stratosphere, where other chemical species also react with CH4…”

63: “gradient” -> “transport”

80: “ease and speed of data treatment” – What does this mean? All the data are not finalized or it was too much data to consider or ground data were not relevant to the goal of the analysis?

Figure1: With the colors used for land cover types, I can’t tell the difference between agriculture, boreal forest, and non-forest natural land in the map figure.

101: “DLR” acronym has not been defined. Also, is “(DLR)” supposed to be a citation to something?

106: “allows to measure” -> “allowed for measurements of”

118: “therefore contain” -> “include”; “samplings” -> “flights” or “soundings”

120: “was” -> “were”

124: “especially” -> “specifically”

128: “the higher density of vertical levels” – compared to what?

133: “To compare [model] humidity from to observations [of], that measured relative humidity (RH), to ERA5 humidity, given as specific humidity q, ERA5 data was converted…”

153: “The spatial resolution is of 3°×2°×34 levels and a [the] temporal resolution of [is] 6 hours.”

156: PLS - Is there a reference for this model with more detail? If not, some detail and references on inputs and setup seem to be missing. For example, there are no references given for both priors and obs.

160: “surface” -> “in situ”

184: “Our” -> “The”

207: “Input emissions [(Table 1)] were chosen…”

222: “18 different flux versions are publicly available [from WetCHARTS], …”

225: “3 versions of total wetland flux [from JSB-HIM, each] differing in their driving meteorology, were included in this study.”

228-230: Suggest: “11 emission tracers and one boundary condition tracer were tracked in the simulation of total regional CH4 emissions. Boundary conditions were provided by…”

232: “This was done by hourly adding a constant offset of 300ppb through the emission tracers domain boundaries [on an hourly basis].”

Table 2: Suggest rounding the numbers in this table.

286: “ERA5 had NNE contributions more important [indicated a larger fraction of NNE] than N winds, contrary to observations and WRF. However, it [ERA5] showed a distribution of wind speeds closer to observations than WRF, which had a more important [larger] share of low speed winds than observations. WRF winds were again very similar between the two domains. They were overestimating [WRF overestimated] the contribution from NNW, especially of low speed winds.”

300: “there” -> “for the upper layer”; “better correlation” – Compared to what?

355: “instruments” -> “models”?

366: “However, performance did not improve significantly between d01 and d02, [and] d01…”

371: “consists in” -> “involves”

372: “to help regional simulations fit observations better (Bullock et al., 2014)[, but nudging was not utilized for the WRF runs analyzed here].”

386: “[For the PLS model in this figure,] we chose to only show comparison results of the PLS Surf b [configuration from the available 6-member ensemble].”

444: “so” -> “and”

447: “be happen” -> “occur”

465: “…an analysis products [for CH4 in the FT].”

485: “works” -> “is the case”

500: “content” -> “levels”

503: “chemistry” -> “structure”?

508: “our” -> “those”

514: “lower” -> “higher”?

519: “..observed only positive [to near neutral?] biases..”

520: “…models[, at least for the limited region and timeframe captured by the observations.]”
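For readers following the comment on line 133 about comparing measured relative humidity with ERA5 specific humidity, the conversion can be sketched as below. This is only an illustration under stated assumptions: it uses a Magnus-type saturation formula over liquid water, the function names are hypothetical, and it does not handle saturation over ice. The manuscript's actual conversion may differ.

```python
import math

EPSILON = 0.622  # ratio of the molar masses of water vapour and dry air

def saturation_vapour_pressure(temp_k):
    """Saturation vapour pressure over liquid water in Pa (Magnus-type formula)."""
    temp_c = temp_k - 273.15
    return 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))

def specific_to_relative_humidity(q, temp_k, pressure_pa):
    """Relative humidity in % from specific humidity q (kg/kg),
    temperature (K) and pressure (Pa)."""
    # Partial pressure of water vapour implied by q at the given pressure.
    vapour_pressure = q * pressure_pa / (EPSILON + (1.0 - EPSILON) * q)
    return 100.0 * vapour_pressure / saturation_vapour_pressure(temp_k)
```

Because the result depends on the saturation formula chosen, particularly at the cold temperatures of the upper troposphere, stating which formula was used would help readers reproduce the comparison.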
Citation: https://doi.org/10.5194/egusphere-2024-3559-RC2
- AC2: 'Reply on RC2', Félix Langot, 16 Jul 2025
  
  Dear reviewer,
  Thank you very much for reviewing our paper. We believe your comments have greatly helped to improve the quality of the article. Attached are our answers to the review.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3559-AC2

Viewed

Total article views: 951 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
807	118	26	951	32	44

Views and downloads (calculated since 27 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	41	10	1	52
Dec 2024	57	29	3	89
Jan 2025	24	10	1	35
Feb 2025	31	4	2	37
Mar 2025	21	7	1	29
Apr 2025	11	9	3	23
May 2025	19	6	3	28
Jun 2025	29	8	3	40
Jul 2025	30	15	5	50
Aug 2025	83	13	4	100
Sep 2025	443	3	0	446
Oct 2025	18	4	0	22

Latest update: 19 Oct 2025

Short summary

Our study compares outputs from meteorological and atmospheric composition models to data from the MAGIC2021 campaign that took place in Sweden. Our results highlight performance differences among models, revealing strengths and weaknesses of different modelling techniques. We also found that wetland emission inventories overestimated emissions in regional simulations. This work helps refine methane emission predictions, which are essential for understanding climate change.

