Upscaling of soil methane fluxes from topographic attributes derived from a digital elevation model in a cold temperate mountain forest

Paul, Sumonta Kumar; Yuasa, Keisuke; Dannoura, Masako; Epron, Daniel

doi:10.5194/egusphere-2025-3449

Preprints

https://doi.org/10.5194/egusphere-2025-3449

08 Aug 2025

Abstract. Forest soils are generally considered a sink for atmospheric methane (CH₄), but their uptake rate can vary considerably in space and time. This study aimed to investigate the temporal patterns of spatially distributed soil CH₄ fluxes in a topographically complex cold-temperate mountain forest in central Japan. Soil CH₄ fluxes were measured nine times during the snow-free season at multiple locations within a 40-ha area in a forested watershed. A machine-learning approach was developed to upscale measured upland fluxes to the landscape scale, using topographic attributes derived from a digital elevation model and vegetation types. Upland soils were a sink of CH₄, while small wetland patches emitted CH₄ consistently throughout the study period. The accuracy of predicted upland fluxes varied seasonally, with the highest model performance observed in early autumn (R² = 0.67) and the lowest in mid-summer (R² = 0.28). Within the study landscape, predicted upland CH₄ fluxes varied significantly across topographic positions, with greater uptake on ridges and slopes than on the plain and foot slopes. Predicted upland CH₄ fluxes ranged from −0.35 to −0.60 g CH₄ ha⁻¹ h⁻¹ in spring, −0.41 to −1.25 g CH₄ ha⁻¹ h⁻¹ in summer, and −0.50 to −0.89 g CH₄ ha⁻¹ h⁻¹ in autumn. Seasonal upland fluxes were highly correlated with the 20-day antecedent precipitation index (R² = 0.71), revealing the importance of seasonal moisture conditions in regulating CH₄ flux dynamics. This study highlighted the importance of topography in controlling the soil CH₄ fluxes and the efficiency of remote sensing and machine learning approaches in scaling field measurements to the landscape level, enabling visualization of spatial patterns of fluxes across the landscape over time.

Received: 17 Jul 2025 – Discussion started: 08 Aug 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on egusphere-2025-3449', Anonymous Referee #1, 01 Sep 2025

Overall Assessment
This manuscript presents a machine learning approach to upscale soil CH4 flux measurements across a topographically complex forest landscape using quantile regression forest models with topographic predictors. While the study addresses important questions about spatial controls on soil CH4 fluxes, there is room to improve methodologies to better differentiate between mechanistic and predictive statistics, and to contextualize the landscape-scale conclusions.
Major Comments:
1. The study assumes that topographic indices (TWI, TPI, VDCN) accurately represent soil moisture patterns that drive CH4 fluxes, but never measures soil moisture or temperature at sampling locations to validate this assumption. While the topographic predictors successfully predict CH4 fluxes, the mechanistic pathway (topography → soil moisture → CH4 flux) remains unverified. Without ground-truthing, it's unclear whether the correlations reflect the proposed moisture mechanisms or other covarying factors.
Please either acknowledge this limitation more explicitly or provide basic validation by measuring volumetric water content at a subset of locations to demonstrate that topographic predictors correlate with actual soil moisture conditions.
2. The study excludes wetlands from their predictive framework, leaving wetland pixels unmapped, but provides insufficient guidance on how their upland-only predictions should be applied to real forest landscapes. Most forests contain wet patches, seeps, or seasonally saturated areas that may not be classified as "wetlands" in standard remote sensing products but could function as significant CH4 sources. The authors' approach of simply excluding these areas creates uncertainty about how their upland flux predictions should be applied when: (a) wet patches exist but aren't formally classified as wetlands, (b) the boundary between "upland" and "wetland" conditions varies seasonally or with precipitation, and (c) their results are used to parameterize larger-scale models that need to handle mixed hydrologic conditions.
Provide clearer guidance on how to classify and handle hydrologically diverse areas when applying these results. Discuss what topographic or hydrologic thresholds define the boundaries of their "upland" predictions, and suggest approaches for handling wet patches that fall between clear upland and wetland classifications. This would help users appropriately apply their upland flux relationships while avoiding systematic underestimation of emissions from hydrologically complex forest landscapes.
3. Table A2 shows a significant three-way interaction (Position × Vegetation × Date, p = 0.04), yet the authors conclude that position and vegetation have no effects based on their lack of selection in the later random forest models. This understates the importance of the interaction effects in their mechanistic descriptive modeling, even if it does not provide additional predictive power in the landscape scaling.
The authors should acknowledge that the significant interaction indicates vegetation and position effects are present but depend on specific combinations and timing. In the discussion of Jevon et al. (lines 372-382), note that while vegetation interactions were significant in the LMM, vegetation wasn't selected in RF models because continuous topographic variables captured relevant gradients more effectively for prediction.
4. The LMM results (Table A2) report only p-values without effect sizes, making it impossible to assess practical significance. Similarly, while RF variable importance scores are reported in a table, the magnitude and direction of predictor effects aren't clear in text/discussion.
Please report standardized coefficients for LMM factors to show effect magnitudes alongside statistical significance. For RF models, clarify the interpretation of relationships for key predictors (e.g., whether higher TPI increases or decreases CH4 uptake).

5. Scale mismatch in validation approach. The authors validate their predictions by comparing point measurements (20 cm diameter chambers) with pixel-level predictions (5 m resolution), despite using predictors calculated at even coarser scales (e.g., 30 m radius for TPI). This scale mismatch may actually understate the model's true predictive accuracy by forcing landscape-scale predictors to explain fine-scale chamber measurements that inevitably include local variability beyond what topographic indices can capture. The current validation approach tests whether coarse-resolution environmental variables can predict point-level flux heterogeneity, rather than testing the model's ability to capture the landscape-scale flux patterns it's designed to represent.

Consider validating at aggregated scales that better match the conceptual basis of the predictors. Compare predicted vs. observed mean fluxes within topographic position classes (ridge/slope/foot slope/plain) or other meaningful landscape units to test whether the model captures the spatial patterns it's intended to represent. This approach would provide a more appropriate assessment of model performance for landscape-scale applications. Additionally, measuring soil moisture at chamber locations would help validate the mechanistic assumption that topographic predictors accurately represent the moisture conditions driving CH4 fluxes, allowing separation of prediction errors due to scale mismatch from errors due to invalid mechanistic assumptions.
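The class-level comparison suggested in this comment can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `class_means`, the record layout, and all flux values are hypothetical, and only the four topographic position classes come from the text.

```python
from collections import defaultdict


def class_means(records):
    """Average observed and predicted fluxes within each landscape class.

    records: iterable of (class_label, observed_flux, predicted_flux).
    Returns {class_label: (mean_observed, mean_predicted)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for label, observed, predicted in records:
        acc = totals[label]
        acc[0] += observed
        acc[1] += predicted
        acc[2] += 1
    return {label: (obs / n, pred / n) for label, (obs, pred, n) in totals.items()}


# Hypothetical chamber records for one date (illustrative values only).
records = [
    ("ridge", -1.0, -0.9),
    ("ridge", -0.8, -0.9),
    ("slope", -0.7, -0.6),
    ("foot slope", -0.4, -0.5),
    ("plain", -0.3, -0.4),
    ("plain", -0.5, -0.4),
]
means = class_means(records)
```

Comparing the two means per class, rather than each chamber against its pixel, averages out the sub-pixel variability that the comment attributes to the scale mismatch.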

Citation: https://doi.org/10.5194/egusphere-2025-3449-RC1
- AC1:
  'Reply on RC1', Sumonta Kumar Paul, 11 Oct 2025
  Overall Assessment
  This manuscript presents a machine learning approach to upscale soil CH4 flux measurements across a topographically complex forest landscape using quantile regression forest models with topographic predictors. While the study addresses important questions about spatial controls on soil CH4 fluxes, there is room to improve methodologies to better differentiate between mechanistic and predictive statistics, and to contextualize the landscape-scale conclusions.
  Major Comments:
  The study assumes that topographic indices (TWI, TPI, VDCN) accurately represent soil moisture patterns that drive CH4 fluxes, but never measures soil moisture or temperature at sampling locations to validate this assumption. While the topographic predictors successfully predict CH4 fluxes, the mechanistic pathway (topography → soil moisture → CH4 flux) remains unverified. Without ground-truthing, it's unclear whether the correlations reflect the proposed moisture mechanisms or other covarying factors.
  
  Please either acknowledge this limitation more explicitly or provide basic validation by measuring volumetric water content at a subset of locations to demonstrate that topographic predictors correlate with actual soil moisture conditions.
  We measured soil water content (and temperature) at each collar at each CH4 flux measurement but did not judge it useful to include these data, which was not sensible. In fact, prior to developing the model, we examined the Spearman rank correlations between, on the one hand, measured soil water content, temperature and chemistry (C, N and pH), and, on the other hand, several topographic and vegetation variables. We found significant correlations between soil moisture and TPI, TWI, VDCN, and vegetation density. This supports the use of topographic variables as effective predictors of CH4 fluxes in our landscape. These correlations will be added to the manuscript and are presented in the attached PDF file. Please note that vegetation variables were not included in the final model because they did not improve model performance.
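For readers unfamiliar with the statistic, a Spearman rank correlation of the kind described in this reply can be sketched as below. It is a minimal standard-library illustration, not the authors' code: the function names and the TWI and soil water content values are hypothetical, and the significance test mentioned in the reply is not reproduced.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical collar values for illustration only (not data from the study).
twi = [4.2, 5.1, 6.3, 7.8, 9.0]
soil_water_content = [0.21, 0.25, 0.24, 0.33, 0.41]
rho = spearman_rho(twi, soil_water_content)
```

Because only ranks enter the calculation, the coefficient captures monotonic but nonlinear relationships, which suits a screening step between topographic indices and soil moisture.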
  
  However, incorporating soil moisture as an intermediate variable that would need to be scaled up to the landscape level would introduce an additional layer of uncertainty. Our strategy was to directly predict CH4 fluxes using topographic variables as proxies for soil moisture and related environmental gradients. We clarify this strategy in the revised manuscript.
  
  We nevertheless fully agree that no statistical method is mechanistic, and all can be biased by the existence of confounding factors. We will highlight this limitation more explicitly in the revised manuscript.
  
  The study excludes wetlands from their predictive framework, leaving wetland pixels unmapped, but provides insufficient guidance on how their upland-only predictions should be applied to real forest landscapes. Most forests contain wet patches, seeps, or seasonally saturated areas that may not be classified as "wetlands" in standard remote sensing products but could function as significant CH4 sources. The authors' approach of simply excluding these areas creates uncertainty about how their upland flux predictions should be applied when: (a) wet patches exist but aren't formally classified as wetlands, (b) the boundary between "upland" and "wetland" conditions varies seasonally or with precipitation, and (c) their results are used to parameterize larger-scale models that need to handle mixed hydrologic conditions.
  
  Provide clearer guidance on how to classify and handle hydrologically diverse areas when applying these results. Discuss what topographic or hydrologic thresholds define the boundaries of their "upland" predictions, and suggest approaches for handling wet patches that fall between clear upland and wetland classifications. This would help users appropriately apply their upland flux relationships while avoiding systematic underestimation of emissions from hydrologically complex forest landscapes.
  In our study, only permanent wetlands in the plain area were excluded (3 of the 55 collars, less than 1% of the landscape pixels). Wet patches, which had temporarily water-saturated soils, were not excluded. Two collars were located in such wet patches, and positive fluxes were measured once on both collars. Therefore, to answer comment (a), areas with non-permanently saturated soil are included in the upland prediction, and we will clarify this point in the revised manuscript. We acknowledge that our random forest models did not predict positive median fluxes, but the possibility of positive fluxes is reflected in the large uncertainties associated with near-zero fluxes.
  
  Regarding (b), we acknowledge that using a fixed boundary between "upland" and "wetland" conditions may increase uncertainties in CH₄ flux prediction, because these boundaries may vary seasonally depending on the balance between precipitation and evaporation. Predicting the temporal variations of these boundaries was beyond the scope of this work, and, at our site, wetlands represent only 1% of the pixels, and their boundaries even less. However, despite the difficulty of this task, we agree that future research could usefully attempt to better account for spatio-temporal variations of wetness conditions at these boundaries to improve landscape-scale CH₄ flux predictions. We will discuss the limitations of using static boundaries in more detail in the revised manuscript.
  
  Regarding (c), wetland exclusion, although acceptable in our 40-ha study area, where wetlands represent only 1% of the area, would overestimate CH4 uptake if incorrectly applied at larger scales, for example to the entire upper Yura River catchment in our case, or to other hydrologically complex forest landscapes. We have already mentioned in our manuscript that sinks and sources should be modelled separately in the case of larger areas with mixed hydrological conditions. We will emphasize this point further in the revised discussion.
  
  For permanent wetland mapping, we collected additional GPS positions at the edges and within wetlands. We then used TWI, profile curvature, slope, and VDCN to predict wetland locations. Their boundaries were refined by visual inspection. A posteriori, pixels classified as wetland had TWI values above 8.1, profile curvature between -0.003 and 0.001, slope values below 6.8, and VDCN values below 2.2. This will be illustrated by a plot comparing the distribution of these topographic variables between upland and wetland pixels (see attached PDF).
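The a posteriori ranges quoted above can be written as a simple screening rule. The sketch below is only an illustration of those four ranges: the reply describes them as properties of pixels already classified (with visual refinement), not as the classifier itself. The function name is hypothetical, the inclusive curvature bounds are an assumption, and units for slope and VDCN are not stated in the reply.

```python
def matches_reported_wetland_ranges(twi, profile_curvature, slope, vdcn):
    """Check one pixel against the a posteriori ranges quoted in the reply.

    Thresholds come from the reply text; units for slope and VDCN are not
    given there, so inputs are assumed to be in the authors' own units.
    """
    return (
        twi > 8.1
        and -0.003 <= profile_curvature <= 0.001  # inclusive bounds assumed
        and slope < 6.8
        and vdcn < 2.2
    )


# Hypothetical pixels for illustration only.
flat_wet_pixel = matches_reported_wetland_ranges(9.0, 0.0, 3.0, 1.0)
steep_dry_pixel = matches_reported_wetland_ranges(5.5, 0.002, 20.0, 15.0)
```

A pixel that satisfies all four ranges is a candidate for the wetland class, but the reply makes clear that final boundaries also relied on GPS positions and visual inspection.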
  
  Table A2 shows a significant three-way interaction (Position × Vegetation × Date, p = 0.04), yet the authors conclude that position and vegetation have no effects based on their lack of selection in the later random forest models. This understates the importance of the interaction effects in their mechanistic descriptive modeling, even if it does not provide additional predictive power in the landscape scaling.
  
  The authors should acknowledge that the significant interaction indicates vegetation and position effects are present but depend on specific combinations and timing. In the discussion of Jevon et al. (lines 372-382), note that while vegetation interactions were significant in the LMM, vegetation wasn't selected in RF models because continuous topographic variables captured relevant gradients more effectively for prediction.
  We apologize; we should not have included interactions in the model, as some of the combinations are missing. For example, there are no “pure” broadleaved areas on the ridge. Models with interactions would be rank-deficient. However, following the suggestion of the second reviewer, we significantly expanded the analysis of the influence of vegetation on CH4 fluxes, using not only the vegetation type but also its density. As landscape attributes, we now have (i) topographic position (plain, foot slope, slope, and ridge), (ii) vegetation type (broadleaf, coniferous, and mixed), and (iii) vegetation density (high, medium, and low). The effect sizes of topographic position, vegetation type, and vegetation density were 0.43, 0.006, and 0.11, respectively, highlighting the dominant role of topography over vegetation in the spatial variability of CH4 fluxes. The effect sizes will be added to the ANOVA table in the revised manuscript (see attached PDF).
  
  The LMM results (Table A2) report only p-values without effect sizes, making it impossible to assess practical significance. Similarly, while RF variable importance scores are reported in a table, the magnitude and direction of predictor effects aren't clear in text/discussion.
  
  Please report standardized coefficients for LMM factors to show effect magnitudes alongside statistical significance. For RF models, clarify the interpretation of relationships for key predictors (e.g., whether higher TPI increases or decreases CH4 uptake).
  We recognise that failing to consider the effect sizes of variables used in linear models can lead to an underestimation of their potential mechanistic importance, even if these factors do not improve the performance of random forest models. Effect sizes are now reported (see our response to your previous comment and the attached PDF).
  
  For RF models, the direction of predictive effects is now quantified using accumulated local effects (ALE) analysis, which remains effective when predictors are correlated with each other. The interpretation of the ALE analysis will be added to the revised manuscript, as well as its graphical representation (now available in the attached PDF). In summary, for the two most influential predictors, low CH4 uptake rates were associated with high TWI values and with low TPI values. We will discuss the magnitude and direction of predictive effects in more detail in the revised manuscript.
  
  Scale mismatch in validation approach. The authors validate their predictions by comparing point measurements (20 cm diameter chambers) with pixel-level predictions (5 m resolution), despite using predictors calculated at even coarser scales (e.g., 30 m radius for TPI). This scale mismatch may actually understate the model's true predictive accuracy by forcing landscape-scale predictors to explain fine-scale chamber measurements that inevitably include local variability beyond what topographic indices can capture. The current validation approach tests whether coarse-resolution environmental variables can predict point-level flux heterogeneity, rather than testing the model's ability to capture the landscape-scale flux patterns it's designed to represent.

  Consider validating at aggregated scales that better match the conceptual basis of the predictors. Compare predicted vs. observed mean fluxes within topographic position classes (ridge/slope/foot slope/plain) or other meaningful landscape units to test whether the model captures the spatial patterns it's intended to represent. This approach would provide a more appropriate assessment of model performance for landscape-scale applications. Additionally, measuring soil moisture at chamber locations would help validate the mechanistic assumption that topographic predictors accurately represent the moisture conditions driving CH4 fluxes, allowing separation of prediction errors due to scale mismatch from errors due to invalid mechanistic assumptions.
  
  To clarify, all predictors are calculated at pixel size (5 by 5 m), including TPI. For each 5 by 5 m pixel, TPI is calculated based on the elevation of that pixel relative to the surrounding pixels within a radius of 20, 30, or 50 m. That being said, we fully agree that there is a scale mismatch between collar and pixel size. We appreciate your suggestion to validate at aggregate scales by comparing predicted and observed mean fluxes within topographic position classes. The R2m of the linear mixed model between predicted and measured fluxes at the four topographic positions for the 9 measurement dates (n = 36) was 0.93. We did this not only using topographic position classes, but also considering vegetation types and density. This validation will be included in the revised manuscript and is now available in the attached PDF.
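The per-pixel TPI calculation described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: it assumes the common definition of TPI as the pixel elevation minus the mean elevation of neighbouring pixels inside a circular window, it uses only the neighbours that exist at the grid edge, and the function name and the tiny DEM are hypothetical.

```python
from math import hypot


def topographic_position_index(dem, cell_size, radius):
    """TPI per pixel: elevation minus mean elevation of neighbours within radius.

    dem: list of rows of elevations (m); cell_size and radius in metres.
    Assumes the mean-difference definition of TPI (not confirmed by the reply).
    """
    n_rows, n_cols = len(dem), len(dem[0])
    reach = int(radius // cell_size)  # window half-width in pixels
    tpi = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = []
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    if dr == 0 and dc == 0:
                        continue  # the focal pixel is not its own neighbour
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if inside and hypot(dr, dc) * cell_size <= radius:
                        neighbours.append(dem[rr][cc])
            if neighbours:
                tpi[r][c] = dem[r][c] - sum(neighbours) / len(neighbours)
    return tpi


# Hypothetical 3 x 3 DEM with a small knoll in the centre (5 m pixels).
dem = [
    [10.0, 10.0, 10.0],
    [10.0, 12.0, 10.0],
    [10.0, 10.0, 10.0],
]
tpi = topographic_position_index(dem, cell_size=5, radius=10)
```

Positive values mark pixels standing above their surroundings (ridge-like) and negative values mark pixels below them. Changing `radius` to 20, 30, or 50 m changes the scale of the landform being described while the output stays on the 5 m grid, which is the distinction the reply draws.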
  
  As mentioned in the response to your first comment, we measured soil water content (and temperature) at each collar at each CH4 flux measurement but did not judge it useful to include these data. They will be included in the revised manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3449-AC1
RC2:
'Comment on egusphere-2025-3449', Anonymous Referee #2, 11 Sep 2025
The manuscript upscales soil methane fluxes with a digital terrain model in a forested landscape. It is nicely written, and the topic is relatively novel and worth investigating. However, there are certain issues that should be covered better.
The current analysis and methodology seems to have a double structure: there is a quantile regression forest analysis for upscaling methane fluxes for different dates and then there is a mix of different traditional statistical techniques (e.g., ANOVA, linear mixed models) for looking at relationships between different environmental characteristics (including topography and methane fluxes). I feel that this structure is a bit complex and some of the issues are done twice but with different methods. Therefore, I suggest simplifying the methodological approach and justifying better why certain analyses are conducted. For instance, why is linear regression conducted between measured and predicted fluxes? Isn’t it sufficient to provide observed-predicted plots? Why is there a need to conduct separate linear regression between topography and methane fluxes in addition to quantile regression forests?

In relation to the first point, there could also be additional analyses that have not been conducted. It is a bit unclear to me what is the logic in predicting temporally dynamic methane fluxes with temporally static topographic variables. Why not also test a model with both temporally static but spatially distributed topographic variables and temporally dynamic but spatially uniform climate/weather variables such as API (there could be possibilities for including also other weather-related variables)? Could the soil variables be included also in the quantile regression forest to test their strength in addition to the topographic variables? Why is vegetation (or actually tree) information condensed into one categorical variable (forest type based on tree types)? You could also have continuous variables about the tree species presence and abundance.

The selection of the topographic variables for upscaling is relatively arbitrary. Why were these specific variables chosen and not others (listed e.g., in Ågren et al 2021, https://doi.org/10.1016/j.geoderma.2021.115280). Furthermore, it is unclear why SAGA wetness index was not used instead of the traditional topographic wetness index, as the SAGA version spreads high values in the flat areas. Similarly, topographic position index should have been calculated with multiple neighborhood radiuses and vertical distance to streams with multiple stream networks. Now it is even unclear how the stream network was calculated and streams initiated when calculating the layer.

Research hypotheses and the results section are not well aligned with each other. In particular, section 3.1 does not seem to address any of the hypotheses. I would suggest phrasing the hypotheses/research questions so that they are answered one by one in the results section. Furthermore, the methods section could also be organized in the same way. At present, the results section starts with results for methods that were described at the end of the methods section.

Novelty value of the research is not entirely clear yet. Is the main novelty about analyzing the role of topography on methane fluxes at different times of snow-free season? If yes, this could be highlighted more in the introduction and also in the conclusions section.

More detailed/minor comments:
l14: “aimed to investigate” -> “investigated”; i.e., you can use stronger language

l79: should it be “have been” instead of “were” to be more consistent with tenses. Also otherwise, it is best to write the introduction in present tense.

l82: can the km2 be written in ha so that same unit is used for all referenced studies

l85: Can you start just by writing “We assess”. Overall, it would be best if you would use active voice throughout. Now, you use partly passive and partly active voice in the methods section.

Figure 1: Can you also show the location of the area within Japan/Honshu?

l117: how were the coverages for the different land cover types estimated?

How was the measurement point sampling designed? Was it purposeful sampling or some kind of randomized design? How were the wetland measurement points sampled? Did you use boardwalks when measuring methane fluxes from wetlands?

l145/147: maybe better to use “spatial resolution”, “pixel size” or “grid” instead of “mesh”

l148: you write in parentheses both “less than” and “≤”. Either or is sufficient. Actually, it should be “less than or equal to”.

l161: What method was used to fill the DEM?

Did you upscale the vegetation/tree classification for the whole study area?

l195: you can delete “In this study”. It is self-evident that you are describing “this study”

l203: How were the vegetation types used as predictors? One categorical predictor with three different values? Why not to use continuous predictors related to vegetation and soil?

VSURF: did you employ all three steps of the method?

l217: Why did you use a separate package for variable importance? They can be obtained from random forest directly. What metric was used to assess the importance?

l235: How about temporal autocorrelation in the models?

l242: What does “scaled” mean here?

l288: How were the importance scores quantified?

Figure 3: Is the line 1:1-line?

Figure 5: Can you have the measured fluxes in the same figure?

l458: Can you provide some results about the models including wetland points in the supplementary material? Now this feels like speculation.

l516: You did not really quantify the dominant role of topography as your quantile regression models had mostly just topography predictors.
Citation: https://doi.org/10.5194/egusphere-2025-3449-RC2
- AC2:
  'Reply on RC2', Sumonta Kumar Paul, 11 Oct 2025
  The manuscript upscales soil methane fluxes with a digital terrain model in a forested landscape. It is nicely written, and the topic is relatively novel and worth investigating. However, there are certain issues that should be covered better.
  The current analysis and methodology seems to have a double structure: there is a quantile regression forest analysis for upscaling methane fluxes for different dates and then there is a mix of different traditional statistical techniques (e.g., ANOVA, linear mixed models) for looking at relationships between different environmental characteristics (including topography and methane fluxes). I feel that this structure is a bit complex and some of the issues are done twice but with different methods. Therefore, I suggest simplifying the methodological approach and justifying better why certain analyses are conducted.
  
  We appreciate the reviewer's observation regarding the complexity of our initial analytical framework. In the revised manuscript, we simplify the analytical approach. First, we upscaled measured CH4 fluxes to the landscape level using a quantile regression forest (QRF) model. Then, we tested whether predicted CH4 fluxes differed among landscape positions, vegetation types, and vegetation densities using a linear mixed model that accounted for spatial autocorrelation. This streamlined approach reduces methodological redundancy and maintains consistency between scaling analyses and subsequent interpretations of spatial patterns.
  
  For instance, why is linear regression conducted between measured and predicted fluxes? Isn’t it sufficient to provide observed-predicted plots?
  We understand that providing observed–predicted plots may seem sufficient; however, we included linear regressions between measured and predicted fluxes to quantitatively assess model performance. This provided an objective description of these plots, allowing us to test for the absence of bias, i.e., an intercept not significantly different from 0 and a slope not significantly different from 1. Furthermore, as the first reviewer pointed out a scale mismatch between collars and pixel size, we followed his suggestion and conducted additional analyses at aggregate scales by comparing predicted and observed mean fluxes within topographic position and vegetation classes. This new figure is available in the attached PDF.
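The bias check described in this reply (intercept not different from 0, slope not different from 1) can be sketched with ordinary least squares. This is a minimal standard-library illustration, not the authors' analysis: the function name and flux values are hypothetical, and the t statistics would still need to be compared with a critical value for n - 2 degrees of freedom.

```python
from math import sqrt


def bias_check(predicted, measured):
    """Regress measured on predicted fluxes and test slope = 1, intercept = 0.

    Returns the OLS slope and intercept with t statistics for the two null
    hypotheses (n - 2 degrees of freedom). Requires non-zero residual scatter.
    """
    n = len(predicted)
    mx, my = sum(predicted) / n, sum(measured) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, measured))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(predicted, measured))
    residual_variance = rss / (n - 2)
    se_slope = sqrt(residual_variance / sxx)
    se_intercept = sqrt(residual_variance * (1 / n + mx ** 2 / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "t_slope_vs_1": (slope - 1) / se_slope,
        "t_intercept_vs_0": intercept / se_intercept,
    }


# Hypothetical fluxes in g CH4 ha-1 h-1 (illustrative values only).
predicted = [-0.9, -0.7, -0.6, -0.5, -0.4]
measured = [-0.95, -0.65, -0.62, -0.48, -0.42]
result = bias_check(predicted, measured)
```

Small absolute t statistics for both tests indicate no detectable proportional or constant bias, which is information an observed-predicted plot alone does not quantify.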
  
  Why is there a need to conduct separate linear regression between topography and methane fluxes in addition to quantile regression forests?
  We agree with the reviewer that performing a separate linear regression between topography and CH4 fluxes is not justified, as the quantile regression forest model already captured nonlinear relationships. Therefore, we removed this analysis and instead present a correlation table between topographic variables and vegetation attributes on the one hand, and soil moisture, temperature and chemistry on the other. This new table is available in the attached PDF. We are now reporting differences in predicted fluxes at the landscape level across topographic and vegetation classes. The updated ANOVA table is available in the attached PDF.
  
  In relation to the first point, there could also be additional analyses that have not been conducted. It is a bit unclear to me what is the logic in predicting temporally dynamic methane fluxes with temporally static topographic variables.
  
  Why not also test a model with both temporally static but spatially distributed topographic variables and temporally dynamic but spatially uniform climate/weather variables such as API (there could be possibilities for including also other weather-related variables)?
  Because all pixels have the same values for weather variables on a given date, it would be pointless to include them in RF models to predict the spatial heterogeneity of landscape-scale fluxes. Previous works using similar RF modelling also ran their models separately for each season without including weather data (e.g. Warner et al, 2019, Agric For Meteorol 264:80–91; Vainio et al, 2021, Biogeosciences 18:2003–2025).
  
  Could the soil variables be included also in the quantile regression forest to test their strength in addition to the topographic variables?
  Soil variables could potentially be included in the RF model but then the RF could not be used to predict flux at landscape level where only topographical and vegetation predictors are available at pixel level. Of course, it would have been possible to use other RFs to upscale soil variables at the landscape level and use the predicted soil variables to predict methane fluxes, but it would add additional layers of uncertainties. We will make this point clear in the revised manuscript.
  
  Why is vegetation (or actually tree) information condensed into one categorical variable (forest type based on tree types? You could also have continuous variables about the tree species presence and abundance.
  Thank you for this suggestion. It is true that we could have used a continuous variable for vegetation type in the model, and only used the categories for post hoc statistical tests. We also agree that it is useful to include vegetation density in addition to vegetation type. In the revised manuscript, we used two continuous variables: basal area for vegetation density and proportional contribution of conifers to basal area for vegetation type. Post hoc statistical analyses show that the effect sizes of topographic position, vegetation type, and vegetation density were 0.43, 0.006, and 0.11, respectively, highlighting the dominant role of topography over vegetation in the spatial variability of CH4 fluxes. The effect sizes will be added to the ANOVA table in the revised manuscript (see attached PDF).
  
  The selection of the topographic variables for upscaling is relatively arbitrary. Why were these specific variables chosen and not others (listed e.g., in Ågren et al 2021, https://doi.org/10.1016/j.geoderma.2021.115280).
  
  Many variables can be derived from a DEM, and selection is necessary to avoid overparameterization due to redundancies. Our preselection was motivated by the fact that methane fluxes are related to microbial activities (methanotrophic and methanogenic in our case), which are controlled by soil moisture and chemistry (C, N, pH), and, to a lesser extent, temperature. We examined the Spearman rank correlation between measured soil water content, temperature and chemistry (C, N and pH) on the one hand, and several topographic and vegetation attributes on the other. We will add a table with the correlation coefficients as an appendix in the revised manuscript (now provided in the attached PDF).
  
  Furthermore, it is unclear why SAGA wetness index was not used instead of the traditional topographic wetness index, as the SAGA version spreads high values in the flat areas.
  We thank the reviewer for this insightful comment. In the revised analysis, we adopted the SAGA Wetness Index instead of the traditional Topographic Wetness Index (TWI). The reviewer’s suggestion helped us recognize that the SAGA version provides a more accurate representation of wetness distribution, especially in flat areas, where the traditional TWI may direct the flow in the wrong directions, thereby distorting the flow accumulation.
  
  Similarly, topographic position index should have been calculated with multiple neighborhood radiuses and vertical distance to streams with multiple stream networks. Now it is even unclear how the stream network was calculated and streams initiated when calculating the layer.
  We acknowledge the reviewer's comment that multiple radii should be considered for TPI and multiple stream networks for VDCN, as these variables are highly scale-dependent. TPI was calculated using neighborhood radii of 20, 30, and 50 m. To calculate VDCN, the DEM was first filled, and then flow accumulation layers were generated using the multiple flow direction method. The resulting flow accumulation raster was then used to create topographically defined flow channel networks, applying flow initiation thresholds of 0.5, 2.5, and 5 ha. VDCN was subsequently calculated for each threshold.
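  As a rough illustration of the scale dependence of TPI, the sketch below computes it for one pixel as the pixel elevation minus the mean elevation of neighbouring pixels within a given radius. It assumes a circular neighbourhood, a 5 m pixel size, and a DEM stored as a list of rows; the function name `tpi` and its parameters are hypothetical, and GIS implementations may differ in how they treat edges and neighbourhood shape.

```python
def tpi(dem, row, col, radius_m, pixel_m=5.0):
    """Elevation of a pixel minus the mean elevation of its neighbours within radius_m."""
    reach = int(radius_m // pixel_m)  # number of pixels spanned by the radius
    total, count = 0.0, 0
    for r in range(max(0, row - reach), min(len(dem), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(dem[0]), col + reach + 1)):
            if (r, c) == (row, col):
                continue  # the focal pixel is not part of its own neighbourhood
            dist = pixel_m * ((r - row) ** 2 + (c - col) ** 2) ** 0.5
            if dist <= radius_m:
                total += dem[r][c]
                count += 1
    if count == 0:
        raise ValueError("radius smaller than one pixel")
    return dem[row][col] - total / count
```

  Calling `tpi(dem, row, col, radius)` with radii of 20, 30 and 50 gives the three variants; positive values indicate a pixel higher than its surroundings and negative values a pixel lower than its surroundings.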
  
  Finally, we chose the TPI with a 30 m radius and the VDCN with an initiation threshold of 5 ha, as they had the highest Spearman correlations with the soil variables, as explained above (the correlation table is available in the attached PDF). However, the model performance, in terms of its ability to predict measured fluxes, was very similar for any combination of these three TPI and three VDCN variants. A detailed explanation will be added in the revised manuscript.
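  The selection step can be sketched as follows, assuming a single soil variable and a dictionary of candidate variants; the authors compared against several soil variables, so this is a simplification. The names `ranks`, `spearman` and `pick_variant` are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = average
        i = j + 1
    return out

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def pick_variant(candidates, soil_variable):
    """Name of the candidate with the largest absolute Spearman correlation."""
    return max(candidates, key=lambda name: abs(spearman(candidates[name], soil_variable)))
```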
  
  4.Research hypotheses and result section are not well aligned with each other. Particularly, section 3.1 does not seem to address any of the hypotheses. I would suggest phrasing the hypotheses/research questions so that they are answered one by one in the results section.
  We have a slightly different opinion here. For us, the research hypothesis should be well-aligned with the discussion, not with the result section, because hypotheses should be discussed based on the results but also additional information available from the literature. The result section follows a step-by-step organisation: the data, the model, the post hoc analysis.
  
  Furthermore, also the methods section could be organized in the same way. Now result section starts with research for such methods that were described in the end of the methods section.
  Thank you, here we agree with you that methods and results should be better aligned. Following the advice in your first comment, the structure of the result section will be changed, and we will better align the method section with the updated result section.
  
  Novelty value of the research is not entirely clear yet. Is the main novelty about analyzing the role of topography on methane fluxes at different times of snow-free season? If yes, this could be highlighted more in the introduction and also in the conclusions section.
  
  Yes, our objective is to analyse the role of topography on methane fluxes throughout the snow-free season in a topographically complex mountainous landscape, and how the aggregated flux at the landscape level and its spatial heterogeneity vary over time. We will make it more visible in the introduction and also in the conclusions section.
  
  More detailed/minor comments:
  14: “aimed to investigate” -> “investigated”; i.e., you can use stronger language
  Thank you, the change will be made.
  
  79: should it be “have been” instead of “were” to be more consistent with tenses. Also otherwise, it is best to write the introduction in present tense.
  We will replace “were” by “have been”
  
  82: can the km2 be written in ha so that same unit is used for all referenced studies
  The change will be made
  
  85: Can you start just by writing “We assess”. Overall, it would be best if you would use active voice throughout. Now, you use partly passive and partly active voice in the methods section.
  The change will be made
  
  Figure 1: Can you also show the location of the area within Japan/Honshu?
  A map of Japan will be included
  
  17: how were the coverages for the different land cover types estimated?
  For permanent wetland mapping, we collected additional GPS positions at the edges and within wetlands. We then used TWI, profile curvature, slope, and VDCN to predict wetland locations. Their boundaries were refined by visual inspection. A posteriori, pixels classified as wetland had TWI values above 8.1, profile curvature between -0.003 and 0.001, slope values below 6.8, and VDCN values below 2.2. This will be illustrated by a plot comparing the distribution of these topographic variables between upland and wetland pixels (see attached PDF).
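  The a posteriori ranges quoted above can be written as a simple check. This is only a sketch: the authors describe these ranges as a property of the pixels already classified as wetland, not as the rule used to classify them, and treating the curvature bounds as inclusive is an assumption. The function name `matches_wetland_ranges` is hypothetical, and inputs are taken in the same units as the values in the reply.

```python
def matches_wetland_ranges(twi, profile_curvature, slope, vdcn):
    """True if a pixel falls inside the ranges reported for wetland pixels."""
    return (
        twi > 8.1
        and -0.003 <= profile_curvature <= 0.001  # inclusive bounds are an assumption
        and slope < 6.8
        and vdcn < 2.2
    )
```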
  
  For river mapping, pixels corresponding to rivers were detected from the channel network raster calculated with a 5 ha initiation threshold.
  
  How was the measurement point sampling designed? Purposeful sampling or somehow randomized designed? How were the wetland measurement points sampled? Did you use boardwalks when measuring methane fluxes from wetlands?
  The sampling design was purposeful. We established transects perpendicular to the main river channel, from the plain to the ridges, covering both slopes (south-facing and north-facing) as well as a lateral canyon, and transects parallel to the main river channel, on the plain, above the foot slope, and on a ridge. The location of these transects was constrained by the site geography and safety considerations. Ropes had to be installed on some parts of the slopes. The wetland patches were small enough that boardwalks were not required. When measuring fluxes, we took care to avoid trampling the soil near the collars, taking advantage of the abundant presence of stones and coarse woody debris. This additional information will be added in the revised manuscript.
  
  145/147: maybe better to use “spatial resolution”, “pixel size” or “grid” instead of “mesh”
  We agree: the change will be made
  
  148: you write in parentheses both “less than” and “≤”. Either or is sufficient. Actually, it should be “less than or equal to”.
  The change will be made
  
  161: What method was used to fill the DEM?
  We used the Wang and Liu (2006) method, implemented by built-in DEM filling option in QGIS
  
  Did you upscale the vegetation/tree classification for the whole study area?
  Yes, we upscaled vegetation (basal area of conifers and broadleaved trees) from 55 census plots of 10-m radius to the entire study area using NDVI, TWI, TPI and VDCN.
  
  195: you can delete “In this study”. It is self-evident that you are describing “this study”
  The change will be made
  
  203: How were the vegetation types used as predictors? One categorical predictor with three different values? Why not to use continuous predictors related to vegetation and soil?
  We thank you for this suggestion. We re-evaluated the RF models using two continuous predictor variables: vegetation type (proportional contribution of conifers to basal area) and vegetation density (basal area). Vegetation type was never selected while vegetation density (basal area) was selected twice, in April and October. However, it turns out that excluding basal area from the initial list of variables increased the model performance on both dates. A comparative table will be added in an appendix to the revised manuscript and is now available in the attached PDF file. We therefore did not include basal area in the final models used to upscale fluxes to the landscape level, thus avoiding adding an additional layer of uncertainty.
  
  We continue to use categorical variables for post hoc statistical tests. For vegetation type and vegetation density, the distribution of the variable was divided into three categories: above the upper quartile, within the interquartile range, and below the lower quartile.
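  A minimal sketch of this three-way split, assuming quartiles are estimated with Python's `statistics.quantiles` (the quantile method used by the authors is not stated). The function name `quartile_classes` is hypothetical; the labels follow the high, medium and low classes used for vegetation density.

```python
import statistics

def quartile_classes(values):
    """Label each value as low, medium or high relative to the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
    labels = []
    for value in values:
        if value > q3:
            labels.append("high")      # above the upper quartile
        elif value < q1:
            labels.append("low")       # below the lower quartile
        else:
            labels.append("medium")    # within the interquartile range
    return labels
```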
  
  As previously mentioned, soil variables were not included in the RF model because only topographical and vegetation predictors are available at pixel level. We could have used other RF models to upscale the soil variables to the landscape level, but this would have added an additional layer of uncertainty. Soil variables are well correlated with topographical and vegetation variables, as previously mentioned. Therefore, we consider them to be indirectly included in the final RF models.
  
  VSURF: did you employ all three steps of the method?
  Yes, we followed the method of Genuer et al. (2010) for variable selection. We will provide more details about the different steps of variable selection in the revised manuscript.
  
  217: Why did you use a separate package for variable importance? They can be obtained from random forest directly. What metric was used to assess the importance?
  Although variable importance can be obtained directly from the random forest algorithm, we calculated variable importance using the vip package (Greenwell and Boehmke, 2020). Variable importance scores were estimated using a permutation-based approach, in which the values of each predictor in the training data were randomly permuted to assess the resulting change in model performance, as quantified by the adjusted R-squared value. A greater reduction in adjusted R2 indicated a higher importance of the predictor variable.
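  The permutation approach described above can be sketched generically as follows. This is not the vip package's interface; `adjusted_r2`, `permutation_importance`, and the `predict` callable are hypothetical names, rows are assumed to be lists of predictor values, and the number of repeats and the seed are arbitrary choices.

```python
import random

def adjusted_r2(y, y_hat, n_predictors):
    """R-squared penalised for the number of predictors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Mean drop in adjusted R-squared after shuffling each predictor in turn."""
    rng = random.Random(seed)
    p = len(rows[0])
    baseline = adjusted_r2(y, [predict(r) for r in rows], p)
    importance = []
    for j in range(p):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and the response
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            permuted = adjusted_r2(y, [predict(r) for r in shuffled], p)
            drops.append(baseline - permuted)
        importance.append(sum(drops) / n_repeats)
    return importance
```

  A predictor the model does not use leaves the predictions unchanged when shuffled, so its importance is zero; a larger drop indicates a more important predictor.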
  
  235: How about temporal autocorrelation in the models?
  We included pixel IDs as a random effect in the linear mixed model to account for repeated prediction at the same location.
  
  242: What does “scaled” mean here?
  For clarity, we will replace “scaled soil CH4 fluxes” with “landscape-scale predicted soil CH4 fluxes”
  
  288: How were the importance scores quantified?
  As previously mentioned, importance scores were quantified using a permutation-based approach developed by Greenwell and Boehmke (2020) and implemented in the vip package.
  
  Figure 3: Is the line 1:1-line?
  Yes, we will revise the caption of this figure
  
  Figure 5: Can you have the measured fluxes in the same figure?
  We modified Figure 5 by aggregating the temporal variation (the average of all flux measurements during the snow-free season is now shown) and we added the measured fluxes to the figure, as suggested. Please see the attached PDF file.
  
  458: Can you provide some results about the models including wetland points in the supplementary material? Now this feels like speculation.
  A comparison of model performance with and without wetland measurements will be included in an appendix and is now available in the attached PDF file.
  
  516: You did not really quantify the dominant role of topography as your quantile regression models had mostly just topography predictors.
  We agree that only topographical predictors were used in the previous data analysis. Because we included vegetation in the post hoc statistical test, we can now safely conclude on the dominant role of topography in the spatial variation of soil CH4 fluxes (effect sizes are available in the attached PDF).
  
  Citation: https://doi.org/10.5194/egusphere-2025-3449-AC2

Status: closed

RC1:
'Comment on egusphere-2025-3449', Anonymous Referee #1, 01 Sep 2025

Overall Assessment
This manuscript presents a machine learning approach to upscale soil CH4 flux measurements across a topographically complex forest landscape using quantile regression forest models with topographic predictors. While the study addresses important questions about spatial controls on soil CH4 fluxes, there is room to improve methodologies to better differentiate between mechanistic and predictive statistics, and to contextualize the landscape-scale conclusions.
Major Comments:
1. The study assumes that topographic indices (TWI, TPI, VDCN) accurately represent soil moisture patterns that drive CH4 fluxes, but never measures soil moisture or temperature at sampling locations to validate this assumption. While the topographic predictors successfully predict CH4 fluxes, the mechanistic pathway (topography → soil moisture → CH4 flux) remains unverified. Without ground-truthing, it's unclear whether the correlations reflect the proposed moisture mechanisms or other covarying factors.
Please either acknowledge this limitation more explicitly or provide basic validation by measuring volumetric water content at a subset of locations to demonstrate that topographic predictors correlate with actual soil moisture conditions.
2. The study excludes wetlands from their predictive framework, leaving wetland pixels unmapped, but provides insufficient guidance on how their upland-only predictions should be applied to real forest landscapes. Most forests contain wet patches, seeps, or seasonally saturated areas that may not be classified as "wetlands" in standard remote sensing products but could function as significant CH4 sources. The authors' approach of simply excluding these areas creates uncertainty about how their upland flux predictions should be applied when: (a) wet patches exist but aren't formally classified as wetlands, (b) the boundary between "upland" and "wetland" conditions varies seasonally or with precipitation, and (c) their results are used to parameterize larger-scale models that need to handle mixed hydrologic conditions.
Provide clearer guidance on how to classify and handle hydrologically diverse areas when applying these results. Discuss what topographic or hydrologic thresholds define the boundaries of their "upland" predictions, and suggest approaches for handling wet patches that fall between clear upland and wetland classifications. This would help users appropriately apply their upland flux relationships while avoiding systematic underestimation of emissions from hydrologically complex forest landscapes.
3. Table A2 shows a significant three-way interaction (Position × Vegetation × Date, p = 0.04), yet the authors conclude that position and vegetation have no effects based on their lack of selection in the later random forest models. This understates the importance of the interaction effects in their mechanistic descriptive modeling, even if it does not provide additional predictive power in the landscape scaling.
The authors should acknowledge that the significant interaction indicates vegetation and position effects are present but depend on specific combinations and timing. In the discussion of Jevon et al. (lines 372-382), note that while vegetation interactions were significant in the LMM, vegetation wasn't selected in RF models because continuous topographic variables captured relevant gradients more effectively for prediction.
4. The LMM results (Table A2) report only p-values without effect sizes, making it impossible to assess practical significance. Similarly, while RF variable importance scores are reported in a table, the magnitude and direction of predictor effects aren't clear in text/discussion.
Please report standardized coefficients for LMM factors to show effect magnitudes alongside statistical significance. For RF models, clarify the interpretation of relationships for key predictors (e.g., whether higher TPI increases or decreases CH4 uptake).

5. Scale mismatch in validation approach. The authors validate their predictions by comparing point measurements (20 cm diameter chambers) with pixel-level predictions (5m resolution), despite using predictors calculated at even coarser scales (e.g., 30m radius for TPI). This scale mismatch may actually understate the model's true predictive accuracy by forcing landscape-scale predictors to explain fine-scale chamber measurements that inevitably include local variability beyond what topographic indices can capture. The current validation approach tests whether coarse-resolution environmental variables can predict point-level flux heterogeneity, rather than testing the model's ability to capture the landscape-scale flux patterns it's designed to represent.

Consider validating at aggregated scales that better match the conceptual basis of the predictors. Compare predicted vs. observed mean fluxes within topographic position classes (ridge/slope/foot slope/plain) or other meaningful landscape units to test whether the model captures the spatial patterns it's intended to represent. This approach would provide a more appropriate assessment of model performance for landscape-scale applications. Additionally, measuring soil moisture at chamber locations would help validate the mechanistic assumption that topographic predictors accurately represent the moisture conditions driving CH4 fluxes, allowing separation of prediction errors due to scale mismatch from errors due to invalid mechanistic assumptions.

Citation: https://doi.org/10.5194/egusphere-2025-3449-RC1
- AC1:
  'Reply on RC1', Sumonta Kumar Paul, 11 Oct 2025
  Overall Assessment
  This manuscript presents a machine learning approach to upscale soil CH4 flux measurements across a topographically complex forest landscape using quantile regression forest models with topographic predictors. While the study addresses important questions about spatial controls on soil CH4 fluxes, there is room to improve methodologies to better differentiate between mechanistic and predictive statistics, and to contextualize the landscape-scale conclusions.
  Major Comments:
  The study assumes that topographic indices (TWI, TPI, VDCN) accurately represent soil moisture patterns that drive CH4 fluxes, but never measures soil moisture or temperature at sampling locations to validate this assumption. While the topographic predictors successfully predict CH4 fluxes, the mechanistic pathway (topography → soil moisture → CH4 flux) remains unverified. Without ground-truthing, it's unclear whether the correlations reflect the proposed moisture mechanisms or other covarying factors.
  
  Please either acknowledge this limitation more explicitly or provide basic validation by measuring volumetric water content at a subset of locations to demonstrate that topographic predictors correlate with actual soil moisture conditions.
  We measured soil water content (and temperature) at each collar during each CH4 flux measurement but did not consider it useful to include these data, which was not a sensible choice. In fact, prior to developing the model, we examined the Spearman rank correlation between measured soil water content, temperature and chemistry (C, N and pH) on the one hand, and several topographic and vegetation variables on the other. We found significant correlations between soil moisture and TPI, TWI, VDCN, and vegetation density. This supports the use of topographic variables as effective predictors of CH4 fluxes in our landscape. These correlations will be added to the manuscript and are presented in the attached PDF file. Please note that vegetation variables were not included in the final model because they did not improve model performance.
  
  However, incorporating soil moisture as an intermediate variable that would need to be scaled up to the landscape level would introduce an additional layer of uncertainty. Our strategy was to directly predict CH4 fluxes using topographic variables as proxies for soil moisture and related environmental gradients. We clarify this strategy in the revised manuscript.
  
  We nevertheless fully agree that no statistical method is mechanistic, and all can be biased by the existence of confounding factors. We will highlight this limitation more explicitly in the revised manuscript.
  
  The study excludes wetlands from their predictive framework, leaving wetland pixels unmapped, but provides insufficient guidance on how their upland-only predictions should be applied to real forest landscapes. Most forests contain wet patches, seeps, or seasonally saturated areas that may not be classified as "wetlands" in standard remote sensing products but could function as significant CH4 sources. The authors' approach of simply excluding these areas creates uncertainty about how their upland flux predictions should be applied when: (a) wet patches exist but aren't formally classified as wetlands, (b) the boundary between "upland" and "wetland" conditions varies seasonally or with precipitation, and (c) their results are used to parameterize larger-scale models that need to handle mixed hydrologic conditions.
  
  Provide clearer guidance on how to classify and handle hydrologically diverse areas when applying these results. Discuss what topographic or hydrologic thresholds define the boundaries of their "upland" predictions, and suggest approaches for handling wet patches that fall between clear upland and wetland classifications. This would help users appropriately apply their upland flux relationships while avoiding systematic underestimation of emissions from hydrologically complex forest landscapes.
  In our study, only permanent wetlands in the plain area were excluded (3 of the 55 collars, and less than 1% of the landscape pixels). Wet patches, which had temporarily water-saturated soils, were not excluded. Two collars were located in such wet patches, and positive fluxes were measured once on both collars. Therefore, to answer comment (a), areas with non-permanently saturated soil are included in the upland prediction, and we will clarify this point in the revised manuscript. We acknowledge that our random forest models did not predict positive median fluxes, but the possibility of positive fluxes is reflected in the large uncertainties associated with near-zero fluxes.
  
  Regarding (b), we acknowledge that using a fixed boundary between "upland" and "wetland" conditions may increase uncertainties in CH₄ flux prediction, because these boundaries may vary seasonally depending on the balance between precipitation and evaporation. Predicting the temporal variations of these boundaries was beyond the scope of this work and, at our site, wetlands represent only 1% of the pixels, and their boundaries even less. However, despite the difficulty of this task, we agree that future research could usefully attempt to better account for spatio-temporal variations of wetness conditions at these boundaries to improve landscape-scale CH₄ flux predictions. We will discuss the limitations of using static boundaries in more detail in the revised manuscript.
  
  Regarding (c), wetland exclusion, although acceptable in our 40-ha study area, where wetlands represent only 1% of the area, would overestimate CH4 uptake if incorrectly applied at larger scales, i.e., to the entire upper Yura River catchment in our case, for example, or to other hydrologically complex forest landscapes. We have already mentioned in our manuscript that sinks and sources should be modelled separately in the case of larger areas with mixed hydrological conditions. We will emphasize this point further in the revised discussion.
  
  For permanent wetland mapping, we collected additional GPS positions at the edges and within wetlands. We then used TWI, profile curvature, slope, and VDCN to predict wetland locations. Their boundaries were refined by visual inspection. A posteriori, pixels classified as wetland had TWI values above 8.1, profile curvature between -0.003 and 0.001, slope values below 6.8, and VDCN values below 2.2. This will be illustrated by a plot comparing the distribution of these topographic variables between upland and wetland pixels (see attached PDF).
  
  Table A2 shows a significant three-way interaction (Position × Vegetation × Date, p = 0.04), yet the authors conclude that position and vegetation have no effects based on their lack of selection in the later random forest models. This understates the importance of the interaction effects in their mechanistic descriptive modeling, even if it does not provide additional predictive power in the landscape scaling.
  
  The authors should acknowledge that the significant interaction indicates vegetation and position effects are present but depend on specific combinations and timing. In the discussion of Jevon et al. (lines 372-382), note that while vegetation interactions were significant in the LMM, vegetation wasn't selected in RF models because continuous topographic variables captured relevant gradients more effectively for prediction.
  We apologize: we should not have included interactions in the model, as some combinations are missing. For example, there are no “pure” broadleaved areas on the ridge. Models with interactions would be rank-deficient. However, following the suggestion of the second reviewer, we significantly expanded the analysis of the influence of vegetation on CH4 fluxes, using not only the vegetation type but also its density. As landscape attributes, we now have (i) topographic position (plain, foot slope, slope, and ridge), (ii) vegetation type (broadleaf, coniferous, and mixed), and (iii) vegetation density (high, medium, and low). The effect sizes of topographic position, vegetation type, and vegetation density were 0.43, 0.006, and 0.11, respectively, highlighting the dominant role of topography over vegetation in the spatial variability of CH4 fluxes. The effect sizes will be added to the ANOVA table in the revised manuscript (see attached PDF).
  
  The LMM results (Table A2) report only p-values without effect sizes, making it impossible to assess practical significance. Similarly, while RF variable importance scores are reported in a table, the magnitude and direction of predictor effects aren't clear in text/discussion.
  
  Please report standardized coefficients for LMM factors to show effect magnitudes alongside statistical significance. For RF models, clarify the interpretation of relationships for key predictors (e.g., whether higher TPI increases or decreases CH4 uptake).
  We recognise that failing to consider the effect sizes of variables used in linear models can lead to an underestimation of their potential mechanistic importance, even if these factors do not improve the performance of random forest models. Effect sizes are now reported (see our response to your previous comment and the attached PDF).
  
  For RF models, the direction of predictive effects is now quantified using accumulated local effects (ALE) analysis, which is effective if predictors are correlated with each other. The interpretation of the ALE analysis will be added to the revised manuscript, as well as their graphical representation (now available in the attached PDF). In summary, for the two most influential predictors, low CH4 uptake rates were associated with high TWI values, while they were associated with low TPI values. We will discuss the magnitude and direction of predictive effects in more detail in the revised manuscript.
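  For readers unfamiliar with ALE, the sketch below computes first-order accumulated local effects for one predictor. It is an illustration under stated assumptions, not the authors' analysis: `ale_1d`, `predict` and `edges` are hypothetical names, rows are lists of predictor values, bin edges are supplied by the caller, and centring uses the count-weighted mean of bin midpoints.

```python
def ale_1d(predict, rows, feature, edges):
    """Centred first-order ALE values of one feature, one value per bin edge."""
    n_bins = len(edges) - 1
    effects = [0.0] * n_bins
    counts = [0] * n_bins
    for row in rows:
        x = row[feature]
        # locate the bin; values outside the edges fall into the first or last bin
        k = 0
        while k < n_bins - 1 and x > edges[k + 1]:
            k += 1
        lower = row[:feature] + [edges[k]] + row[feature + 1:]
        upper = row[:feature] + [edges[k + 1]] + row[feature + 1:]
        # local effect: change in prediction across the bin, other features held fixed
        effects[k] += predict(upper) - predict(lower)
        counts[k] += 1
    ale = [0.0]
    for k in range(n_bins):
        local = effects[k] / counts[k] if counts[k] else 0.0
        ale.append(ale[-1] + local)  # accumulate the mean local effects
    n = sum(counts)
    centre = sum(counts[k] * (ale[k] + ale[k + 1]) / 2.0 for k in range(n_bins)) / n
    return [a - centre for a in ale]
```

  Because each local effect is computed only from observations that fall in the bin, the method avoids evaluating the model at unrealistic combinations of correlated predictors. An ALE curve that rises with the predictor indicates that higher values push the predicted flux upward.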
  
  Scale mismatch in validation approach. The authors validate their predictions by comparing point measurements (20 cm diameter chambers) with pixel-level predictions (5m resolution), despite using predictors calculated at even coarser scales (e.g., 30m radius for TPI). This scale mismatch may actually understate the model's true predictive accuracy by forcing landscape-scale predictors to explain fine-scale chamber measurements that inevitably include local variability beyond what topographic indices can capture. The current validation approach tests whether coarse-resolution environmental variables can predict point-level flux heterogeneity, rather than testing the model's ability to capture the landscape-scale flux patterns it's designed to represent.
  
  Consider validating at aggregated scales that better match the conceptual basis of the predictors. Compare predicted vs. observed mean fluxes within topographic position classes (ridge/slope/foot slope/plain) or other meaningful landscape units to test whether the model captures the spatial patterns it's intended to represent. This approach would provide a more appropriate assessment of model performance for landscape-scale applications. Additionally, measuring soil moisture at chamber locations would help validate the mechanistic assumption that topographic predictors accurately represent the moisture conditions driving CH4 fluxes, allowing separation of prediction errors due to scale mismatch from errors due to invalid mechanistic assumptions.
  
  To clarify, all predictors are calculated at pixel size (5 by 5 m), including TPI. For each 5 by 5 m pixel, TPI is calculated based on the elevation of that pixel relative to the surrounding pixels within a radius of 20, 30, or 50 m. That being said, we fully agree that there is a scale mismatch between collar and pixel size. We appreciate your suggestion to validate at aggregate scales by comparing predicted and observed mean fluxes within topographic position classes. The R2m of the linear mixed model between predicted and measured fluxes at the four topographic positions for the 9 measurement dates (n = 36) was 0.93. We did this not only using topographic position classes, but also considering vegetation types and density. This validation will be included in the revised manuscript and is now available in the attached PDF.
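  The aggregation step behind this validation can be sketched as below, assuming one record per collar and date holding the class label and the measured and predicted fluxes. The function name `class_means` and the record layout are hypothetical; the sketch covers only the averaging, not the linear mixed model fitted to the resulting pairs.

```python
from collections import defaultdict

def class_means(records):
    """Mean observed and predicted flux per (date, class) pair.

    records: iterable of (date, landscape_class, observed, predicted).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for date, landscape_class, observed, predicted in records:
        entry = sums[(date, landscape_class)]
        entry[0] += observed
        entry[1] += predicted
        entry[2] += 1
    return {key: (s[0] / s[2], s[1] / s[2]) for key, s in sums.items()}
```

  With four topographic positions and nine dates, this yields the 36 observed and predicted pairs referred to above.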
  
  As mentioned in the response to your first comment, we measured soil water content (and temperature) at each collar at the time of each CH4 flux measurement but did not consider it useful to include these data. They will be included in the revised manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3449-AC1
RC2:
'Comment on egusphere-2025-3449', Anonymous Referee #2, 11 Sep 2025
The manuscript upscales soil methane fluxes with a digital terrain model in a forested landscape. It is nicely written, and the topic is relatively novel and worth investigating. However, there are certain issues that should be covered better.
The current analysis and methodology seems to have a double structure: there is a quantile regression forest analysis for upscaling methane fluxes for different dates and then there is a mix of different traditional statistical techniques (e.g., ANOVA, linear mixed models) for looking at relationships between different environmental characteristics (including topography and methane fluxes). I feel that this structure is a bit complex and some of the issues are done twice but with different methods. Therefore, I suggest simplifying the methodological approach and justifying better why certain analyses are conducted. For instance, why is linear regression conducted between measured and predicted fluxes? Isn’t it sufficient to provide observed-predicted plots? Why is there a need to conduct separate linear regression between topography and methane fluxes in addition to quantile regression forests?

In relation to the first point, there could also be additional analyses that have not been conducted. It is a bit unclear to me what the logic is in predicting temporally dynamic methane fluxes with temporally static topographic variables. Why not also test a model with both temporally static but spatially distributed topographic variables and temporally dynamic but spatially uniform climate/weather variables such as API (there could also be possibilities for including other weather-related variables)? Could the soil variables also be included in the quantile regression forest to test their strength in addition to the topographic variables? Why is vegetation (or actually tree) information condensed into one categorical variable (forest type based on tree types)? You could also have continuous variables about the tree species presence and abundance.

The selection of the topographic variables for upscaling is relatively arbitrary. Why were these specific variables chosen and not others (listed e.g., in Ågren et al 2021, https://doi.org/10.1016/j.geoderma.2021.115280)? Furthermore, it is unclear why the SAGA wetness index was not used instead of the traditional topographic wetness index, as the SAGA version spreads high values in the flat areas. Similarly, topographic position index should have been calculated with multiple neighborhood radii and vertical distance to streams with multiple stream networks. Currently, it is even unclear how the stream network was calculated and how streams were initiated when calculating the layer.

Research hypotheses and the results section are not well aligned with each other. Particularly, section 3.1 does not seem to address any of the hypotheses. I would suggest phrasing the hypotheses/research questions so that they are answered one by one in the results section. Furthermore, the methods section could also be organized in the same way. Currently, the results section starts with results from methods that were described at the end of the methods section.

Novelty value of the research is not entirely clear yet. Is the main novelty about analyzing the role of topography on methane fluxes at different times of snow-free season? If yes, this could be highlighted more in the introduction and also in the conclusions section.

More detailed/minor comments:
l14: “aimed to investigate” -> “investigated”; i.e., you can use stronger language

l79: should it be “have been” instead of “were” to be more consistent with tenses. Also otherwise, it is best to write the introduction in present tense.

l82: can the km2 be written in ha so that same unit is used for all referenced studies

l85: Can you start just by writing “We assess”. Overall, it would be best if you would use active voice throughout. Now, you use partly passive and partly active voice in the methods section.

Figure 1: Can you also show the location of the area within Japan/Honshu?

l117: how were the coverages for the different land cover types estimated?

How was the measurement point sampling designed? Was it purposeful sampling or a somehow randomized design? How were the wetland measurement points sampled? Did you use boardwalks when measuring methane fluxes from wetlands?

l145/147: maybe better to use “spatial resolution”, “pixel size” or “grid” instead of “mesh”

l148: you write in parentheses both “less than” and “≤”. Either or is sufficient. Actually, it should be “less than or equal to”.

l161: What method was used to fill the DEM?

Did you upscale the vegetation/tree classification for the whole study area?

l195: you can delete “In this study”. It is self-evident that you are describing “this study”

l203: How were the vegetation types used as predictors? One categorical predictor with three different values? Why not use continuous predictors related to vegetation and soil?

VSURF: did you employ all three steps of the method?

l217: Why did you use a separate package for variable importance? They can be obtained from random forest directly. What metric was used to assess the importance?

l235: How about temporal autocorrelation in the models?

l242: What does “scaled” mean here?

l288: How were the importance scores quantified?

Figure 3: Is the line 1:1-line?

Figure 5: Can you have the measured fluxes in the same figure?

l458: Can you provide some results about the models including wetland points in the supplementary material? Now this feels like speculation.

l516: You did not really quantify the dominant role of topography as your quantile regression models had mostly just topography predictors.
Citation: https://doi.org/10.5194/egusphere-2025-3449-RC2
- AC2:
  'Reply on RC2', Sumonta Kumar Paul, 11 Oct 2025
  The manuscript upscales soil methane fluxes with a digital terrain model in a forested landscape. It is nicely written, and the topic is relatively novel and worth investigating. However, there are certain issues that should be covered better.
  The current analysis and methodology seems to have a double structure: there is a quantile regression forest analysis for upscaling methane fluxes for different dates and then there is a mix of different traditional statistical techniques (e.g., ANOVA, linear mixed models) for looking at relationships between different environmental characteristics (including topography and methane fluxes). I feel that this structure is a bit complex and some of the issues are done twice but with different methods. Therefore, I suggest simplifying the methodological approach and justifying better why certain analyses are conducted.
  
  We appreciate the reviewer's observation regarding the complexity of our initial analytical framework. In the revised manuscript, we simplify the analytical approach. First, we upscaled measured CH4 fluxes to the landscape level using a quantile regression forest (QRF) model. Then, we tested whether predicted CH4 fluxes differed among landscape positions, vegetation types, and vegetation densities using a linear mixed model that accounted for spatial autocorrelation. This streamlined approach reduces methodological redundancy and maintains consistency between scaling analyses and subsequent interpretations of spatial patterns.
  
  For instance, why is linear regression conducted between measured and predicted fluxes? Isn’t it sufficient to provide observed-predicted plots?
  We understand that providing observed–predicted plots may seem sufficient; however, we included linear regressions between measured and predicted fluxes to quantitatively assess model performance. This provided an objective description of these plots, allowing us to test for the absence of bias, i.e., an intercept not significantly different from 0 and a slope not significantly different from 1. Furthermore, as the first reviewer pointed out a scale mismatch between collars and pixel size, we followed his suggestion and conducted additional analyses at aggregate scales by comparing predicted and observed mean fluxes within topographic position and vegetation classes. This new figure is available in the attached PDF.
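
A minimal sketch of the bias check described above: observed fluxes are regressed on predicted fluxes by ordinary least squares, and the intercept and slope are compared with 0 and 1 using their standard errors. The function name and example data are hypothetical, and the critical t value must be supplied by the user for n - 2 degrees of freedom (2.306 is the two-sided 5 % value for 8 degrees of freedom).

```python
import math


def ols_bias_check(predicted, observed, t_crit):
    """Regress observed on predicted and flag departures from the 1:1 line."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(predicted, observed))
    s2 = rss / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "slope_differs_from_1": abs(slope - 1.0) > t_crit * se_slope,
        "intercept_differs_from_0": abs(intercept) > t_crit * se_intercept,
    }


# Hypothetical example: ten predictions with small alternating errors.
predicted = [-1.0 + 0.1 * i for i in range(10)]
observed = [p + (0.02 if i % 2 == 0 else -0.02) for i, p in enumerate(predicted)]
print(ols_bias_check(predicted, observed, t_crit=2.306))
```

An unbiased model should leave both flags False; a slope flag of True indicates that predictions systematically compress or stretch the observed range.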
  
  Why is there a need to conduct separate linear regression between topography and methane fluxes in addition to quantile regression forests?
  We agree with the reviewer that performing a separate linear regression between topography and CH4 fluxes is not justified, as the quantile regression forest model already captured nonlinear relationships. Therefore, we removed this analysis and instead present a correlation table among topographic variables and vegetation attributes on the one hand, and soil moisture, temperature and chemistry on the other. This new table is available in the attached PDF. We are now reporting differences in predicted fluxes at the landscape level across topographic and vegetation classes. The updated ANOVA table is available in the attached PDF.
  
  In relation to the first point, there could also be additional analyses that have not been conducted. It is a bit unclear to me what is the logic in predicting temporally dynamic methane fluxes with temporally static topographic variables.
  
  Why not also test a model with both temporally static but spatially distributed topographic variables and temporally dynamic but spatially uniform climate/weather variables such as API (there could also be possibilities for including other weather-related variables)?
  Because all pixels have the same values for weather variables on a given date, it would be pointless to include them in RF models to predict the spatial heterogeneity of landscape-scale fluxes. Previous works using similar RF modelling also ran their models separately for each season without including weather data (e.g. Warner et al, 2019, Agric For Meteorol 264:80–91; Vainio et al, 2021, Biogeosciences 18:2003–2025).
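
The reasoning above can be illustrated with a simplified regression-tree split criterion: a predictor that takes the same value at every location on a given date offers no threshold that separates the samples, so it cannot reduce the squared error. This is a hypothetical sketch of a single-split gain, not the quantile regression forest used in the manuscript, and all values are invented.

```python
def best_split_gain(feature, target):
    """Largest reduction in the sum of squared errors from one threshold split."""

    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    total = sse(target)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest.
    for threshold in sorted(set(feature))[:-1]:
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        best = max(best, total - sse(left) - sse(right))
    return best


fluxes = [-1.0, -0.8, -0.3, -0.2]      # hypothetical fluxes on one date
tpi = [5.0, 4.0, -2.0, -3.0]           # varies between locations
api = [12.0, 12.0, 12.0, 12.0]         # identical everywhere on that date

print(best_split_gain(tpi, fluxes))    # positive gain
print(best_split_gain(api, fluxes))    # no candidate threshold, so gain is 0.0
```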
  
  Could the soil variables be included also in the quantile regression forest to test their strength in addition to the topographic variables?
  Soil variables could potentially be included in the RF model, but then the RF could not be used to predict fluxes at the landscape level, where only topographical and vegetation predictors are available at the pixel level. Of course, it would have been possible to use other RFs to upscale soil variables to the landscape level and use the predicted soil variables to predict methane fluxes, but this would add additional layers of uncertainty. We will make this point clear in the revised manuscript.
  
  Why is vegetation (or actually tree) information condensed into one categorical variable (forest type based on tree types)? You could also have continuous variables about the tree species presence and abundance.
  Thank you for this suggestion. It is true that we could have used a continuous variable for vegetation type in the model, and only used the categories for post hoc statistical tests. We also agree that it is useful to include vegetation density in addition to vegetation type. In the revised manuscript, we used two continuous variables: basal area for vegetation density and proportional contribution of conifers to basal area for vegetation type. Post hoc statistical analyses show that the effect sizes of topographic position, vegetation type, and vegetation density were 0.43, 0.006, and 0.11, respectively, highlighting the dominant role of topography over vegetation in the spatial variability of CH4 fluxes. The effect sizes will be added to the ANOVA table in the revised manuscript (see attached PDF).
  
  The selection of the topographic variables for upscaling is relatively arbitrary. Why were these specific variables chosen and not others (listed e.g., in Ågren et al 2021, https://doi.org/10.1016/j.geoderma.2021.115280)?
  
  Many variables can be derived from a DEM, and selection is necessary to avoid overparameterization due to redundancies. Our preselection was motivated by the fact that methane fluxes are related to microbial activities (methanotrophic and methanogenic in our case), which are controlled by soil moisture and chemistry (C, N, pH), and, to a lesser extent, temperature. We examined the Spearman rank correlation between measured soil water content, temperature and chemistry (C, N and pH) on the one hand, and several topographic and vegetation attributes on the other. We will add a table with the correlation coefficients as an appendix in the revised manuscript (now provided in the attached PDF).
  
  Furthermore, it is unclear why SAGA wetness index was not used instead of the traditional topographic wetness index, as the SAGA version spreads high values in the flat areas.
  We thank the reviewer for this insightful comment. In the revised analysis, we adopted the SAGA Wetness Index instead of the traditional Topographic Wetness Index (TWI). The reviewer’s suggestion helped us recognize that the SAGA version provides a more accurate representation of wetness distribution, especially in flat areas, where the traditional TWI may direct the flow in the wrong directions, thereby distorting the flow accumulation.
  
  Similarly, topographic position index should have been calculated with multiple neighborhood radii and vertical distance to streams with multiple stream networks. Currently, it is even unclear how the stream network was calculated and how streams were initiated when calculating the layer.
  We acknowledge the reviewer's comment that multiple radii should be considered for TPI and multiple stream networks for VDCN, as these variables are highly scale-dependent. TPI was calculated using neighborhood radii of 20, 30, and 50 m. To calculate VDCN, the DEM was first filled, and then flow accumulation layers were generated using the multiple flow direction method. The resulting flow accumulation raster was then used to create topographically defined flow channel networks, applying flow initiation thresholds of 0.5, 2.5, and 5 ha. VDCN was subsequently calculated for each threshold.
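
A minimal sketch of a TPI calculation at several radii, assuming the common definition in which TPI is the elevation of a pixel minus the mean elevation of the other pixels within a circular neighbourhood. The reply does not state the exact formula or software settings, so the definition, the edge handling (neighbourhoods clipped at the grid boundary), and the function name are assumptions, and the small DEM is invented.

```python
def tpi(dem, cell_size, radius):
    """Topographic position index on a regular grid (list of rows of elevations)."""
    rows, cols = len(dem), len(dem[0])
    reach = int(radius // cell_size)  # neighbourhood half-width in cells
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    if dr == 0 and dc == 0:
                        continue  # the focal pixel is excluded from the mean
                    rr, cc = r + dr, c + dc
                    inside_grid = 0 <= rr < rows and 0 <= cc < cols
                    inside_circle = (dr * dr + dc * dc) * cell_size ** 2 <= radius ** 2
                    if inside_grid and inside_circle:
                        neighbours.append(dem[rr][cc])
            if neighbours:
                out[r][c] = dem[r][c] - sum(neighbours) / len(neighbours)
    return out


# Hypothetical 5 x 5 DEM (5 m pixels) with a single high pixel in the centre.
dem = [[10.0] * 5 for _ in range(5)]
dem[2][2] = 20.0
for radius in (20, 30, 50):  # radii named in the reply, in metres
    print(radius, round(tpi(dem, 5, radius)[2][2], 2))
```

Positive values indicate pixels higher than their surroundings (ridges), and negative values indicate pixels lower than their surroundings (valleys or foot slopes).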
  
  Finally, we chose the TPI with a 30 m radius and the VDCN with an initiation threshold of 5 ha, as they had the highest Spearman correlations with the soil variables, as explained above (the correlation table is available in the attached PDF). However, the model performance in terms of its ability to predict measured fluxes was very similar when using any combination of these three TPI and VDCN variants. A detailed explanation will be added in the revised manuscript.
  
  Research hypotheses and the results section are not well aligned with each other. Particularly, section 3.1 does not seem to address any of the hypotheses. I would suggest phrasing the hypotheses/research questions so that they are answered one by one in the results section.
  We have a slightly different opinion here. In our view, the research hypotheses should be well aligned with the discussion, not with the results section, because hypotheses should be discussed based on the results and on additional information available from the literature. The results section follows a step-by-step organisation: the data, the model, the post hoc analysis.
  
  Furthermore, the methods section could also be organized in the same way. Currently, the results section starts with results from methods that were described at the end of the methods section.
  Thank you, here we agree with you that methods and results should be better aligned. Following the advice in your first comment, the structure of the results section will be changed, and we will better align the methods section with the updated results section.
  
  Novelty value of the research is not entirely clear yet. Is the main novelty about analyzing the role of topography on methane fluxes at different times of snow-free season? If yes, this could be highlighted more in the introduction and also in the conclusions section.
  
  Yes, our objective is to analyse the role of topography on methane fluxes throughout the snow-free season in a topographically complex mountainous landscape, and how the aggregated flux at the landscape level and its spatial heterogeneity vary over time. We will make it more visible in the introduction and also in the conclusions section.
  
  More detailed/minor comments:
  14: “aimed to investigate” -> “investigated”; i.e., you can use stronger language
  Thank you, the change will be made.
  
  79: should it be “have been” instead of “were” to be more consistent with tenses. Also otherwise, it is best to write the introduction in present tense.
  We will replace “were” by “have been”
  
  82: can the km2 be written in ha so that same unit is used for all referenced studies
  The change will be made
  
  85: Can you start just by writing “We assess”. Overall, it would be best if you would use active voice throughout. Now, you use partly passive and partly active voice in the methods section.
  The change will be made
  
  Figure 1: Can you also show the location of the area within Japan/Honshu?
  A map of Japan will be included
  
  117: how were the coverages for the different land cover types estimated?
  For permanent wetland mapping, we collected additional GPS positions at the edges of and within wetlands. We then used TWI, profile curvature, slope, and VDCN to predict wetland locations. Their boundaries were refined by visual inspection. A posteriori, pixels classified as wetland had TWI values above 8.1, profile curvature between -0.003 and 0.001, slope values below 6.8, and VDCN values below 2.2. This will be illustrated by a plot comparing the distribution of these topographic variables between upland and wetland pixels (see attached PDF).
  
  For river mapping, pixels corresponding to rivers were detected from the channel network raster calculated with the 5 ha initiation threshold.
  
  How was the measurement point sampling designed? Was it purposeful sampling or a somehow randomized design? How were the wetland measurement points sampled? Did you use boardwalks when measuring methane fluxes from wetlands?
  The sampling design was purposeful. We established transects perpendicular to the main river channel, from the plain to the ridges, covering both slopes (south-facing and north-facing), as well as in a lateral canyon, and parallel to the main river channel, on the plain, above the foot slope, and on a ridge. The location of these transects was constrained by the site geography and safety considerations. Ropes had to be installed on some parts of the slopes. The wetland patches were small enough that boardwalks were not required. When measuring fluxes, we took care to avoid trampling the soil near the collars, taking advantage of the abundant presence of stones and coarse woody debris. This additional information will be added in the revised manuscript.
  
  145/147: maybe better to use “spatial resolution”, “pixel size” or “grid” instead of “mesh”
  We agree: the change will be made
  
  148: you write in parentheses both “less than” and “≤”. Either or is sufficient. Actually, it should be “less than or equal to”.
  The change will be made
  
  161: What method was used to fill the DEM?
  We used the Wang and Liu (2006) method, as implemented in the built-in DEM filling option in QGIS.
  
  Did you upscale the vegetation/tree classification for the whole study area?
  Yes, we upscaled vegetation (basal area of conifers and broadleaved trees) from 55 census plots of 10 m radius to the entire study area using NDVI, TWI, TPI and VDCN.
  
  195: you can delete “In this study”. It is self-evident that you are describing “this study”
  The change will be made
  
  203: How were the vegetation types used as predictors? One categorical predictor with three different values? Why not use continuous predictors related to vegetation and soil?
  We thank you for this suggestion. We re-evaluated the RF models using two continuous predictor variables: vegetation type (proportional contribution of conifers to basal area) and vegetation density (basal area). Vegetation type was never selected while vegetation density (basal area) was selected twice, in April and October. However, it turns out that excluding basal area from the initial list of variables increased the model performance on both dates. A comparative table will be added in an appendix to the revised manuscript and is now available in the attached PDF file. We therefore did not include basal area in the final models used to upscale fluxes to the landscape level, thus avoiding adding an additional layer of uncertainty.
  
  We continue to use categorical variables for post hoc statistical tests. For vegetation type and vegetation density, the distribution of the variable was divided into three categories: above the upper quartile, within the interquartile range, and below the lower quartile.
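
The quartile-based categories can be sketched as below. The class labels, the example basal-area values, and the quartile interpolation method (the default of Python's `statistics.quantiles`) are assumptions for illustration and may differ from the software used for the manuscript.

```python
from statistics import quantiles


def quartile_classes(values):
    """Label each value as below Q1 ('low'), above Q3 ('high'), or in between ('mid')."""
    q1, _, q3 = quantiles(values, n=4)
    return ["low" if v < q1 else "high" if v > q3 else "mid" for v in values]


# Hypothetical basal-area values for eleven pixels.
basal_area = [12, 18, 21, 25, 27, 30, 33, 36, 40, 47, 55]
print(list(zip(basal_area, quartile_classes(basal_area))))
```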
  
  As previously mentioned, soil variables were not included in the RF model because only topographical and vegetation predictors are available at the pixel level. We could have used other RF models to upscale the soil variables to the landscape level, but this would have added an additional layer of uncertainty. Soil variables are well correlated with topographical and vegetation variables, as previously mentioned. Therefore, we consider them to be indirectly included in the final RF models.
  
  VSURF: did you employ all three steps of the method?
  Yes, we followed the method of Genuer et al. (2010) for variable selection. We will provide more details about the different steps of variable selection in the revised manuscript.
  
  217: Why did you use a separate package for variable importance? They can be obtained from random forest directly. What metric was used to assess the importance?
  Although variable importance can be obtained directly from the random forest algorithm, we calculated variable importance using the vip package (Greenwell and Boehmke, 2020). Variable importance scores were estimated using a permutation-based approach, in which the values of each predictor in the training data were randomly permuted to assess the resulting change in model performance, as quantified by the adjusted R-squared value. A greater reduction in adjusted R2 indicated a higher importance of the predictor variable.
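
The permutation-based approach described above can be sketched generically as follows. This is not the vip package's implementation: the function names, the number of repeats, and the toy model are hypothetical, and any fitted model that maps predictor rows to predictions could be passed as `predict`.

```python
import random


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R-squared of predictions against observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def permutation_importance(predict, rows, observed, n_repeats=20, seed=0):
    """Mean drop in adjusted R-squared when each predictor column is shuffled.

    rows is a list of lists, one inner list of predictor values per sample.
    """
    rng = random.Random(seed)
    n_predictors = len(rows[0])
    baseline = adjusted_r2(observed, predict(rows), n_predictors)
    importances = []
    for j in range(n_predictors):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between predictor j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            permuted_score = adjusted_r2(observed, predict(shuffled), n_predictors)
            drops.append(baseline - permuted_score)
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that uses only the first predictor; the second is irrelevant.
def toy_predict(rows):
    return [-0.1 * row[0] for row in rows]


rows = [[float(i), float((i * 7) % 5)] for i in range(10)]
observed = toy_predict(rows)
print(permutation_importance(toy_predict, rows, observed))
```

A larger mean drop indicates a more important predictor; a predictor the model ignores yields a drop of zero.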
  
  235: How about temporal autocorrelation in the models?
  We included pixel IDs as a random effect in the linear mixed model to account for repeated predictions at the same location.
  
  242: What does “scaled” mean here?
  For clarity, we will replace “scaled soil CH4 fluxes” with “landscape-scale predicted soil CH4 fluxes”
  
  288: How were the importance scores quantified?
  As previously mentioned, importance scores were quantified using a permutation-based approach developed by Greenwell and Boehmke (2020) and implemented in the vip package.
  
  Figure 3: Is the line 1:1-line?
  Yes, we will revise the caption of this figure
  
  Figure 5: Can you have the measured fluxes in the same figure?
  We modified Figure 5 by aggregating the temporal variation (the average of all flux measurements during the snow-free season is now shown) and we added the measured fluxes to the figure, as suggested. Please see the attached PDF file.
  
  458: Can you provide some results about the models including wetland points in the supplementary material? Now this feels like speculation.
  A comparison of model performance with and without wetland measurements will be included in an appendix and is now available in the attached PDF file.
  
  516: You did not really quantify the dominant role of topography as your quantile regression models had mostly just topography predictors.
  We agree that only topographical predictors were used in the previous data analysis. Because we included vegetation in the post hoc statistical test, we can now safely conclude on the dominant role of topography in the spatial variation of soil CH4 fluxes (effect sizes are available in the attached PDF).
  
  Citation: https://doi.org/10.5194/egusphere-2025-3449-AC2

Short summary

This study used a machine learning approach to scale soil CH₄ fluxes over time in a topographically complex mountain forest. Within the landscape, predicted upland CH₄ fluxes varied significantly across topographic positions, with greater uptake on ridges and slopes than on the plain and foot slopes. Recent precipitation significantly influenced seasonal CH₄ uptake. Our findings highlight the role of topography and the potential of remote sensing and machine learning to map CH₄ fluxes.

