Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast

Wagner, Julia; Wolter, Juliane; Ramage, Justine; Martin, Victoria; Richter, Andreas; Speetjens, Niek Jesse; Vonk, Jorien E.; Lodi, Rachele; Bartsch, Annett; Fritz, Michael; Lantuit, Hugues; Hugelius, Gustaf

doi:10.5194/egusphere-2025-1052

Preprints


12 Mar 2025

Abstract. Permafrost soils are particularly vulnerable to climate change. To assess and improve estimates of carbon (C) and nitrogen (N) budgets, it is necessary to accurately map soil carbon and nitrogen in the permafrost region. In particular, soil organic carbon (SOC) stocks have been predicted and mapped by many studies from local to pan-Arctic scales. Several studies have been carried out at the Canadian Beaufort Sea coast, but no regional synthesis of terrestrial carbon stocks based on spatial modelling has been conducted yet. This study synthesises available field data from the Canadian coastal plain and uses them to map regional SOC and N stocks with the machine learning algorithm random forest and environmental variables based on remote sensing data. We explore local differences in soil properties and how the distribution of soil data across the region affects the accuracy of the predictions of SOC and N stocks. We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island. We assessed the performance of the different random forest models using the Area of Applicability (AOA) method. We further applied the quantile regression forest method to the mainland and Qikiqtaruk Herschel Island models for SOC stocks and compared the results with the AOA method. Our results indicate that both the selection of data and the covariates picked by the models as most important are crucial for the resulting maps. The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m⁻² and the N stock 2.19 ± 0.51 kg m⁻². The average SOC stocks vary significantly when data are included in or excluded from the predictive models. Qikiqtaruk Herschel Island is geologically different from the coastal mainland and has lower SOC stocks. Including Qikiqtaruk Herschel Island soil data to predict SOC stocks on the mainland has a large impact on the results.
Differences in N stocks depended less on location than those in SOC stocks; instead, differences occurred between individual studies. The separate models yield 36.2 ± 5.7 kg C m⁻² and 2.66 ± 0.39 kg N m⁻² for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m⁻² and 2.17 ± 0.50 kg N m⁻² for the mainland. Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data.
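The quantile regression forest (QRF) step mentioned in the abstract can be illustrated with a minimal sketch of the weighting scheme described by Meinshausen (2006). It assumes that terminal-node (leaf) indices have already been extracted from a fitted forest, for example with scikit-learn's `RandomForestRegressor.apply` or ranger's `predict(..., type = "terminalNodes")` in R. The function names are hypothetical, and this is not necessarily how the software used in the study computes its intervals.

```python
"""Sketch of quantile regression forest (QRF) prediction intervals.

Assumption: leaf indices per tree come from an already fitted random forest;
they are plain Python lists here so the example runs without extra packages.
"""


def qrf_weights(train_leaves, new_leaves):
    """Return one weight per training observation for a single new location.

    train_leaves[i][t]: leaf index of training observation i in tree t.
    new_leaves[t]: leaf index of the new location in tree t.
    """
    n_obs = len(train_leaves)
    weights = [0.0] * n_obs
    used_trees = 0
    for t, leaf in enumerate(new_leaves):
        # Training observations that share the new location's leaf in tree t.
        members = [i for i in range(n_obs) if train_leaves[i][t] == leaf]
        if not members:
            continue  # leaf without training data: this tree contributes nothing
        used_trees += 1
        for i in members:
            weights[i] += 1.0 / len(members)
    # Average over the contributing trees so that the weights sum to 1.
    return [w / used_trees for w in weights] if used_trees else weights


def weighted_quantile(values, weights, alpha):
    """Smallest value whose weighted cumulative share reaches alpha."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("no training observation shares a leaf with the new location")
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= alpha * total - 1e-12:
            return value
    return pairs[-1][0]


def qrf_interval(y_train, train_leaves, new_leaves, lower=0.05, upper=0.95):
    """(lower quantile, median, upper quantile) of the conditional distribution."""
    w = qrf_weights(train_leaves, new_leaves)
    return (weighted_quantile(y_train, w, lower),
            weighted_quantile(y_train, w, 0.5),
            weighted_quantile(y_train, w, upper))
```

Applied pixel by pixel, the width between the lower and upper quantile gives an uncertainty map that can be set beside the AOA, which, unlike QRF, does not use the response at all.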

Received: 05 Mar 2025 – Discussion started: 12 Mar 2025


Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

10 Feb 2026

Challenges in the use of local data for regional scale mapping of C and N stocks in the continuous permafrost zone at the Yukon Coastal Plain

Julia Wagner, Juliane Wolter, Justine Ramage, Victoria Martin, Andreas Richter, Niek Jesse Speetjens, Jorien E. Vonk, Rachele Lodi, Annett Bartsch, Michael Fritz, Hugues Lantuit, and Gustaf Hugelius

SOIL, 12, 113–132, https://doi.org/10.5194/soil-12-113-2026, 2026


Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1052', Anonymous Referee #1, 09 May 2025
The manuscript addresses a few key areas:
Synthesis of SOC and N stocks data from previous research

Upscaling of assimilated local data for detailed mapping of SOC and N stocks at fine resolution (10 m), i.e. local- to regional-scale prediction.

Overall quality:
In terms of the research questions it tries to address, this is a good study, and the authors have addressed them to some extent. Despite that, several key weaknesses reduce the scientific rigor of this study. It needs a thorough revision focusing on the limitations and potential of using local data sets for the spatial characterization of SOC and N stocks. In this context, the weaknesses I noted are listed below.
a) Priority is given to estimating SOC and N stocks, even though the data set used in this study is quite weak in a spatial context.

b) The prediction accuracy of the models is low; this could be due to inadequate sample distribution and/or inadequate explanatory variables in the RF models. These aspects need to be clearly understood.

c) The comparison of random forest models developed using the “entire data set” and by “dividing the dataset into two data sets (mainland and island)” looks biased. Validation results of the RF model developed using the “entire data set” need to be reported by partitioning the data set into mainland and island. Then we can properly understand the prediction accuracies, as pooling the data would have resulted in poor performance of the RF model. I have indicated this weakness in my comments on the discussion.

d) I wish to quote the following: “This study shows that regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”

- This should be the strongest point of this study, and the manuscript should have been written to highlight it. Unfortunately, the authors have instead focused more on mapping and assessing SOC and N stocks (with weak models) and comparing them with previous studies.
e) More attention needs to be given to the accuracy and precision of the predictions when reporting research outcomes and drawing conclusions.

f) The use of AoA analysis is commendable, but it needs to be integrated with the accuracy and precision of the predictions.

Title “Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast”
I noticed that the findings are not well aligned with the title. By regional synthesis, the authors mean the compilation of research data from previous work in the area. I believe that rather than simply compiling data, the authors need to look at data harmonization (laboratory methods) and spatial harmonization of the already collected data when synthesizing them, to enable detailed mapping. The mapping part can hardly be a focus here due to a key limitation of the study, i.e. the poor fit of the random forest models for the entire area as well as for the individual areas. Thus, it is misleading to suggest that this work has accomplished a more successful mapping task than previous studies. I would rather expect a title such as “Challenges in the use of local data for regional scale mapping of C and N stocks in a continuous permafrost zone of the Yukon Coastal Plain”

Abstract
Needs to be rewritten with a clear flow of research questions, objectives and results. The sequence of presentation is often weak.

“We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks”

- What is meant by local differences? Is it the differences between the “coastal lowland area” and “Herschel Island”? The local differences need to be defined in relation to the spatial scale!!!

“We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island”

– Normally we do modelling and then mapping, not mapping and then modelling!!!

“Our results indicate that not only the selection of data is crucial for the resulting maps, but also the chosen covariates, which were picked by the models as most important”

– This statement needs to be justified with data from the research.

“The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2”

I suppose these are average values. The authors need to explain why these values are more accurate than previously reported values, given that the accuracy and precision of the random forest models developed for this study are not strong enough due to weaknesses in the soil database!!!!

“The average SOC stocks vary significantly when including or excluding data in the predictive models”

– What is meant by including and excluding data in the models? Theoretically, this has to be the case!!! The authors must be very specific about which data!!! Dependent or independent variables? What is the key message intended for the reader?

“Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data”

- This statement needs a more robust analysis with a strong data set comparing predicted and actual values of the models used in this study and in other studies.

“The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2. The results of the separate models show 36.2 ± 5.7 kg C m−2 and 2.66 ± 0.39 kg N m−2 for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m−2 and 2.17 ± 0.50 kg N m−2 for the mainland”
- I am not sure how these values were calculated. Are they computed after considering the AoA analysis?

Introduction
The authors need to clearly identify what is meant by “regional scale” in the context of different mapping scales. This is not clear, even though the title uses the term “Regional Synthesis”.

The Introduction lacks information on the “importance of synthesizing SOC and N stock data from previous work”.

“Studies in between those scales are lacking, but necessary when quantifying regional carbon budgets”

– What scale are the authors really trying to resolve here? This needs to be very clear. Did the authors have enough soil samples to adequately resolve this variability with robust model validation? These points need to be addressed.

Materials and Methods
“Soil property data was retrieved from existing publications (Table 1, Fig. 1), harmonised and converted into the depth intervals 0-30 cm and 30-100 cm”

- The authors need to explain how the harmonization was done, especially in relation to sampling (depth, sampling configuration) and laboratory methods. The information provided is not adequate.
- What about correcting the data for the coarse fraction of the soil?
2) “In addition to published data, new data from DTLBs that were sampled during a field campaign in April 2019 was added to this synthesis”
- DTLB data could have been used to validate the model/published data.
- Do the DTLB data harmonize with data from other sources?
3) “The landcover data (Bartsch et al., 2019b) was converted into binary variables and the dominating classes in the area were selected. Those were “dry to moist prostrate to erect dwarf shrub tundra” (LC_class4) and “moist to wet graminoid prostrate to erect dwarf shrub tundra” (LC_class5)”
- What about the coverage of the study area by the non-dominant classes (other than LC4 and LC5)? Did the samples not capture other land cover classes? If they did, it would be good to include all land cover classes in the mapping. In this case, I would recommend using either the randomForest or the ranger package to run the RF algorithm.
4) “Our study uses the 20m product, as the 10m product was not available yet when the analysis was completed.”
- How reasonable is it to map at 10 m resolution when the explanatory variable(s) are at a coarser resolution (20 m)?
5) The authors mention the SCORPAN model. What is the reason for not incorporating spatial autocorrelation into the random forest models, at least simply by including X and Y coordinates? I believe that the models would have improved if this had been done!!!
6) I believe other factors of the SCORPAN model need to be incorporated into these models, especially climate factors (if variability and data exist) and soil factors (e.g. surface geology); if not, the authors need to explain why these factors were not considered. If these factors had been adequately selected, the differences between the two areas would have been captured in a single model. I would rather argue that separate models were needed because of the lack of predictor variables to model the variability of SOC and N across the entire area.
7) The authors could also have used more distance-related variables (e.g. distance to the sea) to improve the model.
8) The sample numbers available for mapping at both sites need to be elaborated.
9) Please see my observation on the bias in the model comparison (entire area vs individual sites) given under discussion note 3).
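Several of the covariate suggestions in points 3), 4), 5) and 7) can be made concrete with a small sketch, assuming rasters held as nested Python lists on a common north-up grid. It shows nearest-neighbour upsampling of a 20 m layer onto a 10 m grid (which only repeats values and adds no spatial detail), cell-centre X and Y coordinates as extra predictors, a brute-force distance-to-sea layer, and one binary layer per land-cover class rather than only the two dominant classes. All function names are hypothetical; a real workflow would use dedicated GIS tools.

```python
import math


def upsample_nearest(grid, factor):
    """Repeat each coarse cell factor x factor times (e.g. 20 m -> 10 m with factor=2)."""
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


def coordinate_grids(nrows, ncols, x_min, y_max, res):
    """Cell-centre X (easting) and Y (northing) grids for a north-up raster."""
    xs = [[x_min + (c + 0.5) * res for c in range(ncols)] for _ in range(nrows)]
    ys = [[y_max - (r + 0.5) * res for _ in range(ncols)] for r in range(nrows)]
    return xs, ys


def distance_to_sea(is_sea, res):
    """Brute-force Euclidean distance (map units) from each cell to the nearest sea cell."""
    sea = [(r, c) for r, row in enumerate(is_sea) for c, flag in enumerate(row) if flag]
    out = []
    for r, row in enumerate(is_sea):
        out.append([
            min(math.hypot(r - sr, c - sc) for sr, sc in sea) * res if sea else math.inf
            for c in range(len(row))
        ])
    return out


def one_hot(landcover):
    """One binary layer per land-cover class present in the raster."""
    classes = sorted({v for row in landcover for v in row})
    return {k: [[1 if v == k else 0 for v in row] for row in landcover] for k in classes}
```

The brute-force distance is quadratic in the number of cells and only suits toy grids; the point is which layers would be stacked as predictors, not how to compute them efficiently.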
Results
3.1 Data synthesis of regional carbon and nitrogen stocks
1) The authors need to provide a comparison of the Couture et al. (2018) data between both sites, as it is the only sample set distributed across both sites. Otherwise, the comparison results would be biased by differences in analytical and sampling methods.
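One possible way to compare a single sample set (such as the Couture et al. (2018) profiles) between the two sites is Welch's two-sample t statistic, which does not assume equal variances. The sketch below computes only the statistic and the Welch-Satterthwaite degrees of freedom with the standard library; a p-value could be obtained, for example, with `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name is hypothetical, no study data are used, and a rank-based test may be preferable for skewed stock values.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```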

3.2 Random Forest mapping and model validation assessment
1) The authors need to choose adequate validation indices to explain the precision and accuracy of the predictions. Lin’s concordance correlation coefficient could be an additional choice.
2) The model strength (low R2) does not justify saying that the predictions are more accurate than those of other studies done at larger scales.
3) Independent validation data would make the results more trustworthy.
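Lin's concordance correlation coefficient mentioned in point 1) is ρc = 2·s_op / (s_o² + s_p² + (ō − p̄)²), where o are observed and p predicted values, s² the variances and s_op their covariance. A minimal sketch with 1/n moments (some implementations use n − 1, which changes the value slightly for small samples); the function name is hypothetical.

```python
def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # Penalises both scatter (low covariance) and systematic offset (mean difference).
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
```

A constant offset leaves Pearson's r at 1 but lowers the CCC: for observations [1, 2, 3] and predictions [2, 3, 4] it is 4/7 ≈ 0.57, which is why it complements R2.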

3.3. Area of Applicability (AOA) and uncertainty with quantile regression forest
1) It is good that the authors used AoA in their analysis. However, the issue is that it does not assess the overall accuracy of the prediction. Non-AoA areas only identify combinations of the predictor variable space that are not adequately captured by the RF models, which reflects an inadequacy of the samples to capture the attribute space of the predictor variables. The authors need to discuss this point clearly.
2) Based on Table 5, what is a reasonable estimate of the average SOC and N stocks in both areas?
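The point that the AOA flags unfamiliar predictor combinations rather than measuring accuracy becomes clearer from how its dissimilarity index (DI) is built. The sketch below follows the general recipe of Meyer and Pebesma (2021), as implemented in the CAST R package, but simplifies it: the training DI uses leave-one-out nearest neighbours instead of cross-validation folds, and the quartiles come from Python's `statistics.quantiles`, so thresholds will not match CAST exactly. Importance weights are assumed inputs and the function names are hypothetical.

```python
import math
import statistics


def _scale(rows, means, sds, weights):
    """Standardise each predictor and multiply it by its importance weight."""
    return [[(v - m) / s * w for v, m, s, w in zip(row, means, sds, weights)]
            for row in rows]


def area_of_applicability(train, new, weights):
    """Return (DI of new points, threshold, inside-AOA flags). Simplified sketch."""
    columns = list(zip(*train))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant predictors
    tr = _scale(train, means, sds, weights)
    nw = _scale(new, means, sds, weights)

    # Mean pairwise distance within the training data normalises the DI.
    pairwise = [math.dist(a, b) for i, a in enumerate(tr) for b in tr[i + 1:]]
    mean_dist = statistics.fmean(pairwise)

    # Training DI: distance to the nearest *other* training point (LOO simplification).
    di_train = [min(math.dist(a, b) for j, b in enumerate(tr) if j != i) / mean_dist
                for i, a in enumerate(tr)]
    # DI of new locations: distance to the nearest training point.
    di_new = [min(math.dist(p, b) for b in tr) / mean_dist for p in nw]

    # Threshold: upper boxplot whisker of the training DI.
    q1, _, q3 = statistics.quantiles(di_train, n=4)
    limit = q3 + 1.5 * (q3 - q1)
    threshold = max((d for d in di_train if d <= limit), default=limit)
    return di_new, threshold, [d <= threshold for d in di_new]
```

The response variable (SOC or N stock) never enters this calculation, so a location inside the AOA is only "similar to the training data", not necessarily well predicted.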

Discussion

1) “Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
- This should be the strongest point of this study, and the manuscript should have been written to highlight this fact. Unfortunately, the authors have instead focused more on mapping and assessing SOC and N stocks and comparing them with previous studies.
2) “It is therefore advisable to analyse the values of the target variable at the sampling locations and the diversity of the landscape to ensure that the spatial variability in the landscape is reflected by the sampled sites”.
- This statement needs more detail.
3) “The AOA method can be used to assess whether the heterogeneity of the landscape, ideally mirrored in the covariates, is captured by the sampling locations. Areas where this is not the case can be excluded from regional estimates and could be further used to determine new sampling sites for future field sampling campaigns”
- Good point, but improving the accuracy of the prediction model is also an important aspect to consider.
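The advice to check whether the sampled sites reflect the landscape can be illustrated with a simple univariate check per covariate, assuming covariate values at the sampling sites and at all landscape pixels are available as lists. It computes the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical distribution functions) and the share of landscape pixels inside the sampled value range. This complements rather than replaces the multivariate AOA; the function names are hypothetical.

```python
import bisect


def ks_statistic(sample, landscape):
    """Two-sample Kolmogorov-Smirnov statistic D (0 = identical distributions)."""
    a, b = sorted(sample), sorted(landscape)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of the samples at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of the landscape at x
        d = max(d, abs(fa - fb))
    return d


def range_coverage(sample, landscape):
    """Share of landscape pixels whose value lies within the sampled min-max range."""
    lo, hi = min(sample), max(sample)
    return sum(lo <= v <= hi for v in landscape) / len(landscape)
```

For large rasters, a random subset of pixels is usually enough to estimate both numbers.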

4.2 Spatial mapping of carbon and nitrogen stocks with random forest, area of applicability and uncertainty
1) “Our analysis shows a substantial challenge in bridging from local- to regional-scale study areas”
- A very important point that this study should capitalize on, rather than trying to estimate SOC and N stocks using an inadequate database.

Line 247 needs to be corrected to R2.

The discussion given in L245-253 is quite confusing to me. I would rather consider this a biased comparison of two models.

The model developed for the entire area has been cross-validated using the whole data set, whereas the two models developed for the two sites have been validated using the individual data sets of these sites. The question is therefore whether the results would be the same if the validation indices of the RF model for the entire area were calculated separately for the two sites. The authors just need to separate the validation results of the model for the entire area into mainland and island and provide R2, RMSE and MEE values.
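The site-wise validation requested here needs no new model: the cross-validated predictions of the whole-area model only have to be grouped by site before computing the error statistics. A minimal sketch, assuming observed values, cross-validated predictions and a site label are available per sample. R2 is computed here as 1 − SSE/SST (other definitions, such as the squared Pearson correlation, exist) and ME is the mean error (bias); the function names are hypothetical.

```python
import math


def metrics(observed, predicted):
    """RMSE, mean error (bias) and R2 = 1 - SSE/SST for one group of samples."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "n": n,
        "RMSE": math.sqrt(sse / n),
        "ME": sum(errors) / n,
        "R2": 1.0 - sse / sst if sst > 0 else float("nan"),
    }


def metrics_by_site(observed, predicted, site):
    """Validation metrics for all samples and for each site label separately.

    `predicted` must be the cross-validated (held-out) predictions of the
    pooled whole-area model, not predictions from models refitted per site.
    """
    groups = {}
    for o, p, s in zip(observed, predicted, site):
        obs_list, pred_list = groups.setdefault(s, ([], []))
        obs_list.append(o)
        pred_list.append(p)
    results = {"all": metrics(observed, predicted)}
    for s, (obs_list, pred_list) in groups.items():
        results[s] = metrics(obs_list, pred_list)
    return results
```

Comparing results["mainland"] from the pooled model with the metrics of the mainland-only model (and likewise for the island) puts both on the same footing.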

L263: (Fig. 7a and Fig. 1b?)

L260-268: I agree that there is a difference between the spatial distributions of SOC stocks on the mainland when the RF models are trained using either all data or the mainland data. But how should we justify that these spatial differences also reflect differences in prediction accuracy? To do so, as I mentioned under point 3), validation results for the mainland data should be shown for both RF models, the one developed using all data and the one developed using mainland data. The same argument applies to Herschel Island.

4.3 Comparing local scale results to regional scale synthesis
It would be good to use independent validation for such a comparison. But I doubt that the available data are adequate!!!
Conclusion

Based on the analysis, the study should conclude with the most acceptable estimates of SOC and N stocks. It is difficult to understand whether the authors have incorporated the AoA and QRF analyses when concluding the work.
Citation: https://doi.org/10.5194/egusphere-2025-1052-RC1
- AC1:
  'Reply on RC1', Julia Wagner, 20 Aug 2025
  Dear anonymous reviewer,
  We sincerely appreciate the constructive feedback on our manuscript. Should we be granted the opportunity to submit a revised version, we are confident in our ability to address all the concerns raised and to substantially improve the quality of the manuscript. Below, we provide a detailed plan outlining our proposed responses to each comment.
  
  Kind regards on behalf of all listed authors,
  Julia Wagner
  
  The manuscript addresses a few key areas:
  Synthesis of SOC and N stocks data from previous research
  
  Upscaling of assimilated local data for detailed mapping of SOC and N stocks at fine resolution (10 m), i.e. local- to regional-scale prediction.
  
  Overall quality:
  In terms of the research questions it tries to address, this is a good study, and the authors have addressed them to some extent. Despite that, several key weaknesses reduce the scientific rigor of this study. It needs a thorough revision focusing on the limitations and potential of using local data sets for the spatial characterization of SOC and N stocks. In this context, the weaknesses I noted are listed below.
  a) Priority is given to estimating SOC and N stocks, even though the data set used in this study is quite weak in a spatial context.
  
  While we acknowledge that the dataset used in this study has limitations in spatial coverage, it represents the first comprehensive compilation of all available SOC and N data for the region. Given the generally sparse sampling across Arctic areas, this dataset reflects the best available synthesis to date and provides a valuable foundation for regional estimates. We wish to stress the enormous logistical challenges associated with soil sampling in remote permafrost regions; the achievable sampling density is simply not comparable to what one might be used to in more inhabited regions where national or regional soil monitoring programs exist. For comparison, only ca. 3000 soil profiles exist across the massive circumpolar northern permafrost region (which is 4 times the size of the European Union), sampled over a period of many decades. In light of this, the data density in our study region is comparatively very high for a remote permafrost area. Furthermore, our work shows that the data for the study area are clustered and not evenly distributed, which can provide a foundation for future sampling efforts and model development.
  
  b) The prediction accuracy of the models is low; this could be due to inadequate sample distribution and/or inadequate explanatory variables in the RF models. These aspects need to be clearly understood.
  
  Similar to the previous comment, we acknowledge that the accuracy is lower than what can be achieved in more data-dense regions. However, compared with other studies from permafrost landscapes, it is not low. We understand that the sampling distribution and sample availability have very likely affected the models, resulting in low accuracies. We also acknowledge that the number and type of explanatory variables have affected this. We will clarify in the discussion that these are major limitations of the study that affect the use and interpretation of the data.
  
  c) The comparison of random forest models developed using the “entire data set” and by “dividing the dataset into two data sets (mainland and island)” looks biased. Validation results of the RF model developed using the “entire data set” need to be reported by partitioning the data set into mainland and island. Then we can properly understand the prediction accuracies, as pooling the data would have resulted in poor performance of the RF model. I have indicated this weakness in my comments on the discussion.
  
  Thank you for your feedback. In the revised version we will provide the validation results partitioned per site. We agree that the validation results for the whole area are not comparable with the results of the individual models for each study site. I realized that the predicted values had not been saved, so I quickly retrained the whole-area SOC model for 0-30 cm (with the same seed and settings) and found that the RMSE was 5.77 for the Herschel Island data and 7.03 for the mainland data. This influences the interpretation of the results when comparing the individual models with the “whole area” model. For the revised manuscript, we plan to rerun the whole-area models, report the new values and consider them in the discussion.
  d) I wish to quote the following: “This study shows that regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”

  - This should be the strongest point of this study, and the manuscript should have been written to highlight it. Unfortunately, the authors have instead focused more on mapping and assessing SOC and N stocks (with weak models) and comparing them with previous studies.
  e) More attention needs to be given to the accuracy and precision of the predictions when reporting research outcomes and drawing conclusions.

  f) The use of AoA analysis is commendable, but it needs to be integrated with the accuracy and precision of the predictions.
  
  Thank you for pointing this out. Together with the change of title (see below), we propose shifting the focus of the manuscript towards discussing the challenges rather than presenting the results as “absolute”. We propose adding a section to the discussion in which we address challenges and limitations, including the use of AOA and quantile regression forest.
  
  Title “Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast”
  I noticed that the findings are not well aligned with the title. By regional synthesis, the authors mean the compilation of research data from previous work in the area. I believe that rather than simply compiling data, the authors need to look at data harmonization (laboratory methods) and spatial harmonization of the already collected data when synthesizing them, to enable detailed mapping. The mapping part can hardly be a focus here due to a key limitation of the study, i.e. the poor fit of the random forest models for the entire area as well as for the individual areas. Thus, it is misleading to suggest that this work has accomplished a more successful mapping task than previous studies. I would rather expect a title such as “Challenges in the use of local data for regional scale mapping of C and N stocks in a continuous permafrost zone of the Yukon Coastal Plain”
  
  We agree that the study has not fully accomplished the task of providing highly accurate maps. We appreciate your suggestion of a new title and propose changing it to: “Challenges in the use of local data for regional scale mapping of C and N stocks in the continuous permafrost zone at the Yukon Coastal Plain”
  
  Abstract
  Needs to be rewritten with a clear flow of research questions, objectives and results. The sequence of presentation is often weak.
  
  “We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks”
  
  - What is meant by local differences? Is it the differences between the “coastal lowland area” and “Herschel Island”? The local differences need to be defined in relation to the spatial scale!!!
  
  We use the term “local differences” mainly to refer to the differences between Herschel Island and the coastal lowland area. We will clarify this in the rewritten version of the abstract.
  
  “We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island”
  
  – Normally we do modelling and then mapping, not mapping and then modelling!!!
  We agree that the written flow of the current version of the abstract does not follow the technical workflow and will correct this in the updated version.
  
  “Our results indicate that not only the selection of data is crucial for the resulting maps, but also the chosen covariates, which were picked by the models as most important”
  
  – This statement needs to be justified with data from the research.
  This statement refers to the differences in results (average SOC stocks) between creating separate models for the “coastal lowland area” and “Herschel Island” and combining all data in one model for “the entire region”. Furthermore, the explanatory variables picked by each model differ. The model including all data selects the DEM (elevation) as the most important variable, whereas this variable is less important in the other models (Figure S1). This indicates that elevation likely separates the data into the two areas. As written, the statement sounds generalized, and we will clarify what we aim to express.
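Which covariate a random forest "picks" depends on how importance is measured. As an illustration only (Figure S1 may be based on a different measure, such as impurity-based importance), permutation importance can be sketched as the mean increase in squared error after shuffling one predictor column, assuming a fitted model is available as a `predict(row)` callable. In ranger, a comparable measure is requested with `importance = "permutation"`. All names below are hypothetical.

```python
import random


def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when one predictor column is randomly shuffled.

    X is a list of rows (lists); predict maps one row to one prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with only column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A variable that mainly separates two clusters of samples (as elevation might separate island and mainland data) can rank high in a pooled model and low in the per-site models, which is consistent with the pattern described above.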
  
  “The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2”
  
  I suppose these are average values. The authors need to explain why these values are more accurate than previously reported values, given that the accuracy and precision of the random forest models developed for this study are not strong enough due to weaknesses in the soil database!!!!
  
  The reported SOC and N stock estimates represent averages for the study region. While we acknowledge that the accuracy and precision of the random forest models are limited by gaps and uncertainties in the underlying soil database, this dataset includes substantially more sampling points and spatial coverage than most previous studies in the region, which often rely on averages by soil or land-cover class and smaller sample sizes. Our intention is not to suggest that these estimates are definitively more accurate than all prior work, but rather that they represent the best possible regional-scale assessment given current data availability.
  “The average SOC stocks vary significantly when including or excluding data in the predictive models”
  
  – What is meant by including and excluding data in the models? Theoretically, this has to be the case!!! The authors must be very specific about which data!!! Dependent or independent variables? What is the key message intended for the reader?
  
  Thank you for highlighting the need for clarification. The key message is that both the number of data points and their spatial distribution significantly influence the SOC stock estimates. Specifically, we compared models built using the full dataset (combining mainland and Herschel Island data) with models developed separately for each area. These different approaches lead to notably different average SOC stock values, underlining how data inclusion and model structure affect the results. We will revise the text to clearly specify that the comparison refers to the inclusion or exclusion of spatial subsets of the data (i.e., mainland vs. island) in the predictive models, to improve clarity for the reader.
  
  “Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data”
  
  - This statement needs a more robust analysis with a strong data set comparing predicted and actual values of the models used in this study and in other studies.
  We appreciate the suggestion to strengthen this statement. Our comparison refers to prior studies that either rely on class-based averaging with fewer region-specific data points or pan-Arctic scale analyses that, while including more data globally, have sparser sampling density in the study region itself. These methodological differences likely contribute to discrepancies in spatial resolution and local accuracy. While a direct quantitative comparison with those studies is limited by differences in data sources and scales, we will revise the manuscript to better explain these distinctions and clarify that our results reflect improved regional detail based on a more comprehensive, regionally focused dataset that includes new, previously unpublished data for the region.
  “The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2. The results of the separate models show 36.2 ± 5.7 kg C m−2 and 2.66 ± 0.39 kg N m−2 for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m−2 and 2.17 ± 0.50 kg N m−2 for the mainland”
  - I am not sure how these values were calculated. Are they computed after considering the AoA analysis?
  The values presented here are averages that do not take the AoA into account. We will clarify this in the revised abstract.
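The difference between averaging all pixels and averaging only pixels inside the AoA can be made explicit with a minimal sketch, assuming a flattened list of predicted stocks in kg m−2 on equal-area pixels and a matching list of AoA flags. The function name and the default pixel area of 100 m2 (a 10 m pixel) are illustrative assumptions, not values taken from the manuscript.

```python
def stock_summary(stock, in_aoa, pixel_area_m2=100.0):
    """Mean stock (kg m-2) over all pixels and over AoA pixels only, plus AoA total."""
    inside = [s for s, ok in zip(stock, in_aoa) if ok]
    return {
        "mean_all": sum(stock) / len(stock),
        "mean_aoa": sum(inside) / len(inside) if inside else float("nan"),
        "aoa_fraction": len(inside) / len(stock),
        # kg m-2 times m2 per pixel gives kg per pixel, summed over AoA pixels
        "total_aoa_kg": sum(inside) * pixel_area_m2,
    }
```

Reporting both means together with the AoA fraction makes it explicit how much of the regional estimate rests on extrapolation beyond the sampled predictor space.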
  
  Thank you for pointing out these aspects above about the Abstract. In the updated version of the Abstract we will clarify research questions, objectives and results while considering a clear flow of statements. We will also clarify the messages behind statements and be explicit on what is meant.
  
  Introduction
  The authors need to clearly identify what is meant by “regional scale” in the context of different mapping scales. This is not clear, even though the title uses the term “Regional Synthesis”.
  
  Thank you for pointing this out. We are trying to put our study in the context of existing studies that map SOC and N stocks for permafrost areas.
  “Quantification and mapping of SOC using different methods have been carried out on pan-Arctic (Hugelius et al., 2014; Mishra et al., 2021) and local level (Obu et al., 2017; Palmtag et al., 2018; Siewert et al., 2015, 2016; Siewert, 2018; Wagner et al., 2023).” (line 37-39).
  In this context, regional refers to studies covering larger areas than the “local studies” mentioned here, which usually work at the catchment scale or within a smaller defined study area; regional means spanning several catchments. The spatial extent of the study area introduces spatial variability in soil-forming factors beyond that driven by landscape-level factors such as catenary position or location within a catchment. We will make this point clearer in the introduction.
  
  The introduction lacks information on the “importance of synthesising SOC and N stock data from previous work”.
  
  Thank you for this comment. We acknowledge that the introduction could state more explicitly why synthesizing SOC and N stock data from previous work is important. While the introduction highlights the relevance of permafrost climate feedbacks and the heterogeneity of the permafrost landscape, the rationale for the synthesis would benefit from further clarification. Many of the existing SOC and N measurements in the region were collected for very local studies, some without the intention of regional or digital soil mapping. As a result, these valuable datasets remain fragmented and difficult to compare across scales. By compiling and harmonizing these data, our study creates the first consistent regional-scale dataset of SOC and N stocks for the Yukon coastal plain. This dataset provides a much-needed baseline for future studies, including digital soil mapping and model development, as well as studies that estimate regional carbon and nitrogen pools.
  
  Importantly, our study not only combines previously published measurements but also incorporates new, previously unpublished data.
  We will incorporate this in the introduction.
  
  “Studies in between those scales are lacking, but necessary when quantifying regional carbon budgets”
  
  – What scale are the authors really trying to resolve here? This needs to be very clear. Did the authors have enough soil samples to adequately resolve this variability with a robust model validation? These points need to be addressed.
  Thank you for pointing this out! We will clarify in the revised version what we mean by regional scale. As mentioned above, from a hydrological point of view we define regional as spanning multiple catchments. From a soil science perspective, the scale of the study corresponds to a soil region (Pachepsky and Hill 2017).
  
  Materials and Methods
  “Soil property data was retrieved from existing publications (Table 1, Fig. 1), harmonised and converted into the depth intervals 0-30 cm and 30-100 cm”
  
  - The authors need to explain how the harmonization was done, especially in relation to sampling (depth, sampling configuration) and laboratory methods. The information provided is not adequate.
  Soil property data were compiled from multiple published sources and harmonized by converting reported values into the target depth intervals of 0–30 cm and 30–100 cm using weighted averaging. The laboratory methods differ: some studies used an elemental analyzer (e.g. Elementar vario EL III and Elementar vario MAX C in Obu et al. 2017), others an elemental analyzer (CE Instrument EA 1110) coupled to an isotope ratio mass spectrometer (Thermo Fisher Scientific, Delta V Advantage) (Siewert et al. 2021) or coupled to a continuous-flow isotope ratio mass spectrometer (IRMS, DeltaPlus, Finnigan MAT) (Wagner et al. 2023). Further, the timing of the fieldwork differs: whereas most studies sampled during summer (mostly July), the DTLB samples were taken as fully frozen cores in spring 2019.
  These differences in sampling (active-layer sampling plus coring vs. fully frozen coring), together with the use of different laboratory methods to derive OC content, have likely affected the results. We did not account for these differences when compiling the dataset. However, we can add a section to the methods that summarizes these differences and note, in the newly proposed discussion section on limitations, that they add uncertainty to the data which we have not quantified.
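The reply describes the harmonisation only as converting reported values into 0–30 cm and 30–100 cm intervals using weighted averaging. A minimal sketch of thickness-weighted averaging follows, assuming each sampled layer is stored as (top, bottom, value) in cm and that the value is uniform within the layer. The function names and the example profile are hypothetical and are not the authors' code.

```python
def overlap(a_top, a_bottom, b_top, b_bottom):
    """Length (cm) of the overlap between two depth intervals."""
    return max(0.0, min(a_bottom, b_bottom) - max(a_top, b_top))


def weighted_average(layers, top, bottom):
    """Thickness-weighted mean of the layer values within [top, bottom).

    layers: list of (top_cm, bottom_cm, value); values assumed uniform per layer.
    Returns None if no sampled layer overlaps the target interval.
    """
    weights = [overlap(t, b, top, bottom) for t, b, _ in layers]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * v for w, (_, _, v) in zip(weights, layers)) / total


# Hypothetical profile: organic carbon content (%) per sampled layer.
profile = [(0, 10, 20.0), (10, 25, 10.0), (25, 50, 4.0), (50, 100, 2.0)]

for top, bottom in [(0, 30), (30, 100)]:
    print(f"{top}-{bottom} cm:", round(weighted_average(profile, top, bottom), 3))
```

For additive quantities such as layer stocks (kg m⁻²), the analogous step would instead sum each layer's stock scaled by the fraction of its thickness that falls inside the target interval; the thread does not state which variant was applied to which source study.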
  - How about the correction of data for coarse fraction of the soil?
  Most studies (except Ramage et al. 2019) state that the coarse fraction (> 2 mm) was excluded. The analysis of the DTLB data was coordinated with colleagues to ensure a protocol similar to that used by Wagner et al. 2023. We will add this information to the method description.
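Excluding the coarse fraction matters because it enters the layer stock directly. A common formulation, assumed here rather than taken from the manuscript, is SOC stock = bulk density × OC content × layer thickness × (1 − coarse fraction), with a factor of 10 converting g cm⁻² to kg m⁻². Whether bulk density refers to the fine earth or the whole soil, and whether the coarse fraction is volumetric or by mass, differs between protocols, so the inputs below are assumptions.

```python
def layer_soc_stock(bulk_density_g_cm3, oc_percent, thickness_cm, coarse_fraction=0.0):
    """Return the SOC stock of one soil layer in kg C m^-2.

    Assumed formulation: BD * OC * thickness * (1 - coarse fraction).
    bulk_density_g_cm3: dry bulk density (g cm^-3)
    oc_percent: organic carbon content (% of dry mass)
    thickness_cm: layer thickness (cm)
    coarse_fraction: fraction of material > 2 mm (0-1), assumed volumetric
    """
    if not 0.0 <= coarse_fraction < 1.0:
        raise ValueError("coarse_fraction must be in [0, 1)")
    grams_per_cm2 = (bulk_density_g_cm3 * (oc_percent / 100.0)
                     * thickness_cm * (1.0 - coarse_fraction))
    # 1 g cm^-2 = 10 000 g m^-2 = 10 kg m^-2
    return grams_per_cm2 * 10.0


# Hypothetical 0-30 cm layer: BD 0.8 g cm^-3, 5 % OC, 10 % coarse fragments.
print(round(layer_soc_stock(0.8, 5.0, 30.0, coarse_fraction=0.1), 2))
```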
  
  2) “In addition to published data, new data from DTLBs that were sampled during a field campaign in April 2019 was added to this synthesis”
  - DTLB data could have been used for validation of the model/published data
  We would like to point to our reply below under Results, point 3 (“Independent validation of data would make more trustworthy results.”). We opted for a repeated k-fold cross-validation approach to ensure robust validation. The DTLB dataset does not adequately cover the range of the entire dataset (see Figure 2B). Moreover, it covers only one specific landform type and does not represent the variability of the entire study region. If an independent dataset were to be created, data-splitting across all datasets would be preferable.
  - Do the DTLB data harmonize with data from other sources?
  The DTLB data were sampled using methods and protocols consistent with established soil science methodology.
  3) “The landcover data (Bartsch et al., 2019b) was converted into binary variables and the dominating classes in the area were selected. Those were “dry to moist prostrate to erect dwarf shrub tundra” (LC_class4) and “moist to wet graminoid prostrate to erect dwarf shrub tundra” (LC_class5)”
                - What about the coverage of the study area by the non-dominant classes (other than LC4 and LC5)? Did the samples not capture other land cover classes? If they did, it would be good to include all land cover classes in the mapping. In this case, I would recommend using either the randomForest or the ranger package to run the RF algorithm.
  Thank you for pointing this out. We used the randomForest package through the caret package in our study. Of the 211 sites in total, 76 fall within landcover class 4 and 48 within class 5; 36 sites are within landcover class 3 and 14 within class 6. Other classes are represented by 10 sites or fewer. We suggest rerunning the analysis including classes 3 and 6 and the variables suggested below (see next comment).
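Converting the landcover product into binary variables amounts to one 0/1 indicator column per retained class. The sketch below keeps classes 3 to 6, following the suggestion in this reply; column names mirror the manuscript's `LC_class4`/`LC_class5` convention, and the class codes at the example sites are hypothetical.

```python
SELECTED_CLASSES = (3, 4, 5, 6)  # classes proposed for the rerun


def landcover_to_binary(class_codes, selected=SELECTED_CLASSES):
    """Return one 0/1 indicator column per selected landcover class.

    Pixels (or sites) belonging to any other class get 0 in every column.
    """
    return {
        f"LC_class{c}": [1 if code == c else 0 for code in class_codes]
        for c in selected
    }


# Hypothetical landcover codes at six sites (class 9 is not selected).
codes = [4, 5, 3, 4, 9, 6]
for name, column in landcover_to_binary(codes).items():
    print(name, column)
```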
  
  4) Our study uses the 20m product, as the 10m product was not available yet when the analysis was completed.
  - How reasonable is it to map at 10 m resolution when explanatory variables are at a coarser resolution (20 m)?
  
  As suggested above, we would like to redo the analysis, incorporating more explanatory variables and the updated landcover product at 10 m spatial resolution. Furthermore, covariates at only moderately coarser resolutions can still provide useful information. Many digital soil mapping studies include covariates at different, including coarser, resolutions and resample them to the target resolution (e.g. Hengl et al. 2021: https://www.nature.com/articles/s41598-021-85639-y, Baltensweiler et al. 2021: https://www.sciencedirect.com/science/article/pii/S2352009421000821 or Deragon et al. 2023: https://cdnsciencepub.com/doi/full/10.1139/cjss-2022-0031?af=R).
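Resampling a coarser covariate to the target grid can be done in several ways. The sketch below shows the simplest case, nearest-neighbour resampling of a 20 m grid onto an aligned 10 m grid (factor 2), with nested Python lists standing in for real rasters; grid alignment and the example values are assumptions.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour resampling of a coarse grid onto a grid `factor` times finer.

    grid: list of rows of coarse cell values; the fine grid is assumed to be
    aligned with the coarse one so that each coarse cell covers factor x factor cells.
    """
    fine = []
    for row in grid:
        expanded = [value for value in row for _ in range(factor)]  # repeat along x
        for _ in range(factor):  # repeat each expanded row along y
            fine.append(list(expanded))
    return fine


# Hypothetical 2 x 2 covariate at 20 m resampled to 4 x 4 at 10 m.
for row in upsample_nearest([[1.0, 2.0], [3.0, 4.0]], 2):
    print(row)
```

Each 20 m value is simply copied into four 10 m cells, so the resampled layer carries no additional spatial detail. This is why a coarser covariate can limit the effective resolution of the final map even when the output grid is 10 m.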
  
                5) The authors mention the SCORPAN model. What is the reason for not incorporating spatial autocorrelation into the random forest models, at least by simply including the X and Y coordinates? I believe the models would have been improved by doing so!!!
  We will incorporate this if we are given the opportunity to revise the manuscript and run the models again.
  
  6) I believe other factors of the SCORPAN model need to be incorporated into these models, especially climate factors (if variability and data exist) and soil factors (e.g. surface geology); if not, the authors need to explain why these factors were not considered. Had these factors been adequately selected, the differences between the two areas would have been captured in a single model. I would rather argue that separate models were needed because of the lack of predictor variables to model the variability of SOC and N across the entire area.
  7) The authors could also have used more distance-related variables (e.g. distance to the sea) to improve the model.
  
  We agree that using a variable at 20 m resolution introduces some uncertainty into the final map when targeting a resolution of 10 m (point 4). Considering this and points 5–7, we suggest rerunning the models with the X and Y coordinates, distance to the sea and the updated 10 m landcover product (including classes 3 and 6) as variables. Additionally, we suggest using the surficial geology (Rampton 1982) and a binary variable for the extent of the ice sheet at the Last Glacial Maximum as additional variables.
  We initially considered climate factors (ground surface temperature and ground temperature at multiple depths, Bartsch et al. 2021). These were only available at a resolution of approx. 1 km and were ultimately not used because of this coarse resolution.
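The proposed site-level covariates (X and Y coordinates and distance to the sea) could be derived roughly as follows. This sketch assumes projected coordinates in metres and a coastline given as a polyline of vertices; in practice a GIS-derived distance raster would normally be used, and all coordinates below are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (projected metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_coast(p, coastline):
    """Minimum distance from p to a coastline polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(coastline, coastline[1:]))


def site_covariates(sites, coastline):
    """X, Y and distance-to-sea covariates for each (x, y) site."""
    return [
        {"x": x, "y": y, "dist_sea_m": distance_to_coast((x, y), coastline)}
        for x, y in sites
    ]


# Hypothetical coastline vertices and sites (metres, projected CRS).
coast = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 500.0)]
for row in site_covariates([(500.0, 300.0), (1500.0, 0.0)], coast):
    print(row)
```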
  
                8) Please elaborate on the number of samples available at both sites for mapping.
  
  While we already provided information on the number of sites, we did not include the number of individual samples. We can add a table to the supplement that gives the actual number of samples. In compiling the data, we included all individual sample data from the included studies except for Obu et al. 2017 and Siewert et al. 2021, for which the published data had already been aggregated into the depth increments 0–30 and 30–100 cm. However, these publications report 128 and 409 samples in total, respectively. In this table we will also add information on the sampling and laboratory analyses of the different studies (including date of the campaign, laboratory method used to measure OC and N content, coring method, etc.).
  
                9) Please see my observation on the bias in the model comparison (entire area vs individual sites) given under discussion note 3).
  See comment under note 3 in discussion.
  
  Results
  3.1 Data synthesis of regional carbon and nitrogen stocks
  1) The authors need to provide a comparison of the Couture et al. 2018 data at both sites, as it is the only sample set distributed across both sites. Otherwise, the comparison results would be biased by analytical and sampling methods.
  Yes, we will split these data by study area and provide an updated Figure 2B.
  3.2 Random Forest mapping and model validation assessment
  1) The authors need to choose adequate validation indices to explain the precision and accuracy of predictions. Lin’s concordance correlation coefficient could be an additional choice.
  Thank you for the suggestion. We can add this measure when we redo the analysis.
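Lin's concordance correlation coefficient measures agreement with the 1:1 line rather than correlation alone: ρ_c = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²). A minimal pure-Python version using 1/n moments, as in Lin's original estimator, is sketched below; the example values are hypothetical.

```python
def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (1/n moments)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(observed) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in observed) / n
    syy = sum((y - my) ** 2 for y in predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    # Penalises both scatter and systematic offset from the 1:1 line.
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical observed vs predicted stocks: perfectly correlated but offset by 1.
print(lins_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

In the example the Pearson correlation is 1, yet the constant offset lowers the CCC to about 0.57, which is why it complements R² when judging prediction accuracy.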
  
  2) The model strength (low R²) does not warrant the claim that the predictions are more accurate than those of other studies done at larger scales.
  3) Independent validation of data would make more trustworthy results.
  We agree that independent validation by data-splitting (which creates an independent test dataset and a separate training dataset) could produce more trustworthy results. We chose k-fold cross-validation to maximize the use of all available data for both model training and evaluation, while still obtaining unbiased performance estimates. Cross-validation is the standard evaluation strategy in digital soil mapping when data are scarce, as reflected in prior work. According to the review by Piikki et al. (2021, https://bsssjournals.onlinelibrary.wiley.com/doi/10.1111/sum.12694), cross-validation was the most commonly used method, applied in 43 % of the studies considered, whereas data-splitting (which we could have used) was applied in 31 %.
  We would like to cite the following passage of this review study: “Data-splitting is also a problem in studies with relatively few samples, because models created by a smaller number of observations can be less accurate, and validation in that case can underestimate the accuracy of the mapping (when all data are used). Cross-validation produces much more stable results because it uses all data for validation and should be preferred over data-splitting.”
  After carefully considering this aspect, we decided to apply a repeated k-fold cross-validation approach.
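The repeated k-fold procedure can be sketched as follows: the data are reshuffled for each repeat and split into k folds, and every observation is predicted once per repeat by a model trained without it. To keep the sketch self-contained, a univariate least-squares line stands in for the random forest actually used (via caret); the function names, k = 5 and 10 repeats are assumptions.

```python
import random
import statistics


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # new random partition for every repeat
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield r, train, test


def fit_line(xs, ys):
    """Stand-in model (NOT a random forest): least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validate(xs, ys, fit, k=5, repeats=10, seed=0):
    """Return mean out-of-fold RMSE and R^2 (1 - SSE/SST) across repeats."""
    n = len(ys)
    oof = [[None] * n for _ in range(repeats)]  # out-of-fold predictions per repeat
    for r, train, test in repeated_kfold(n, k, repeats, seed):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            oof[r][i] = predict(xs[i])
    mean_y = statistics.fmean(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    rmses, r2s = [], []
    for preds in oof:
        sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
        rmses.append((sse / n) ** 0.5)
        r2s.append(1.0 - sse / sst)
    return statistics.fmean(rmses), statistics.fmean(r2s)


# Hypothetical covariate/target pairs with some noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 3) for x in xs]
rmse, r2 = cross_validate(xs, ys, fit_line, k=5, repeats=10)
print(f"CV RMSE = {rmse:.2f}, CV R2 = {r2:.2f}")
```

Averaging over repeats reduces the dependence of the performance estimate on one particular random partition, which is the main argument for repeating k-fold rather than running it once.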
  3.3. Area of Applicability (AOA) and uncertainty with quantile regression forest
  1) It is good that the authors used AoA in their analysis. The issue, however, is that it does not assess the overall accuracy of prediction. Non-AoA areas only identify those combinations of the predictor variable space not adequately captured by the RF models, reflecting the inadequacy of the samples to capture the attribute space of the predictor variables. The authors need to discuss this point clearly.
  We are aware that the AOA only evaluates whether the feature space is adequately covered by the sample data and does not assess the uncertainty of the model. Therefore, we used quantile regression forest to estimate uncertainty. The combination of both methods is nevertheless valuable for assessing the results. We will clarify this in the discussion.
  2) Based on Table 5, what is the reasonable estimate of average SOC and N stocks in both areas?
  The SOC and N stock values reported in the Abstract and Conclusions were derived from individual random forest models for each site, excluding the AOA-based results due to the method’s limitations. Nevertheless, we calculated the AOA-based estimates for comparative purposes and to discuss associated uncertainties, as presented in Table 5.
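One way to read the distinction between the reported averages and the AOA-based estimates is in terms of which pixels enter the spatial mean. The sketch below contrasts the mean over all valid pixels with the mean over pixels inside the AOA; the flattened raster values, the boolean AOA mask, equal pixel areas and None as the no-data marker are all assumptions.

```python
def mean_stock(pixels, inside_aoa=None):
    """Mean of valid pixel values (kg m^-2), optionally restricted to the AOA.

    pixels: flattened raster values, None marks no-data (assumed encoding)
    inside_aoa: optional list of booleans of the same length (True = inside AOA)
    """
    values = [
        v for i, v in enumerate(pixels)
        if v is not None and (inside_aoa is None or inside_aoa[i])
    ]
    return sum(values) / len(values) if values else None


# Hypothetical predicted SOC stocks (0-100 cm) and AOA mask.
soc = [50.0, 62.0, 40.0, None, 71.0]
aoa = [True, True, False, True, False]
print(mean_stock(soc))       # all valid pixels
print(mean_stock(soc, aoa))  # pixels inside the AOA only
```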
  
  Discussion
  1) “Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
                - This should be the strongest point of this study, and the paper should have been written to highlight it. Unfortunately, the authors have instead focused more on mapping and assessing SOC and N stocks and comparing them with previous studies.
  We will focus on this point in the proposed new discussion subsection on limitations.
  
  2) “It is therefore advisable to analyse the values of the target variable at the sampling locations and the diversity of the landscape to ensure that the spatial variability in the landscape is reflected by the sampled sites”.
                - This statement needs more detail.
  
  We agree that this statement is not clear. Here we refer to the use of the AoA (the statement that follows this one), which puts the covariate values at the sampling locations in the context of the whole area and shows where the feature space of the covariates is not covered by the soil sampling locations. This information can be used to plan future sampling campaigns that add soil sampling locations in areas where the feature space is not covered. We will clarify this in the revised version and connect the statements.
  3) “The AOA method can be used to assess whether the heterogeneity of the landscape, ideally mirrored in the covariates, is captured by the sampling locations. Areas where this is not the case can be excluded from regional estimates and could be further used to determine new sampling sites for future field sampling campaigns”
                - Good point, but improving the accuracy of the prediction model is also an important aspect to consider.
  Yes, we agree and will emphasize this in the added chapter about the limitations of the study.
  4.2 Spatial mapping of carbon and nitrogen stocks with random forest, area of applicability and uncertainty
  1) “Our analysis shows a substantial challenge in bridging from local- to regional-scale study areas”
                - A very important point to capitalize on in this study, rather than trying to estimate SOC and N stocks using an inadequate database.
  Yes, we will emphasize this in the discussion.
  Line 247 needs to be corrected to R².
  
  We will correct this typo.
  The discussion in L245–253 is quite confusing to me. I would rather think this is a biased comparison of two models.
  
  We agree that this section is somewhat unclear and mixes scale, soil heterogeneity and predictor quality. Here we compared the R² values of separate models for “all data”, “mainland” and “Herschel Island” for 0–30 and 30–100 cm depth, six models in total, which are not directly comparable because of the different numbers of training points and the different covariates selected by the models.
  The main aim of this paragraph was to point out that heterogeneity in digital soil mapping persists at different scales. We propose to change the paragraph to the following:
  “Our analysis highlights the challenge of scaling from local (pedon) to regional study areas. At the pedon scale, permafrost soils are highly heterogeneous (Siewert et al., 2021), and this variability is still evident at coarser spatial scales. The models for the entire area explain only a small fraction of spatial heterogeneity in SOC and N stocks (R² = 0.17 - 0.24), while models for the mainland perform slightly worse (R² = 0.11 - 0.18). In contrast, models for Herschel Island show higher R² values for SOC (0.28 - 0.35), which could be due to a combination of factors: (1) lower natural soil variability on Herschel Island, (2) better representation of the soil data, or (3) stronger relationships between the predictors and SOC at this site.”
  
  The model developed for the entire area has been cross-validated using the whole dataset, whereas the two models developed for the two sites have been validated using the individual datasets of these sites. The question is therefore whether the validation indices of the RF model for the entire area would be the same if they were calculated separately for the two sites. The authors just need to separate the validation results of the entire-area model into mainland and island and provide R², RMSE and MEE values.
  This aspect has been considered above under Overall quality point 3. In the revised version we will provide the validation results partitioned per site.
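Partitioning the validation results per site only requires grouping the pooled out-of-fold predictions of the all-data model by site before computing the metrics. A sketch with hypothetical values follows. R² is computed here as 1 − SSE/SST, whereas some packages (caret's default, for example) report the squared Pearson correlation instead; ME (mean error, a bias measure) is assumed here to correspond to the reviewer's "MEE".

```python
from collections import defaultdict


def metrics(obs, pred):
    """R^2 (1 - SSE/SST), RMSE and mean error (pred - obs) for one group."""
    n = len(obs)
    mean_obs = sum(obs) / n
    errors = [p - o for o, p in zip(obs, pred)]
    sse = sum(e * e for e in errors)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "n": n,
        "R2": 1.0 - sse / sst if sst else float("nan"),
        "RMSE": (sse / n) ** 0.5,
        "ME": sum(errors) / n,
    }


def metrics_by_site(sites, obs, pred):
    """Group out-of-fold predictions by site label and compute metrics per group."""
    groups = defaultdict(lambda: ([], []))
    for site, o, p in zip(sites, obs, pred):
        groups[site][0].append(o)
        groups[site][1].append(p)
    return {site: metrics(o, p) for site, (o, p) in groups.items()}


# Hypothetical out-of-fold SOC predictions (kg m^-2) from a model trained on all data.
sites = ["mainland", "mainland", "mainland", "island", "island"]
observed = [60.0, 50.0, 70.0, 30.0, 40.0]
predicted = [55.0, 52.0, 65.0, 33.0, 38.0]
for site, m in metrics_by_site(sites, observed, predicted).items():
    print(site, {key: round(value, 3) for key, value in m.items()})
```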
  
  L263 (Fig. 7a and Fig. 1b?)
  
  This is an error and should read (Fig. 7 1a and 1b). There is a further mistake at the end of the same line, which will be corrected to (Fig. 7 2a and 2b).
  L260–268: I agree that there is a difference in the spatial distribution of SOC stocks on the mainland when RF models are trained using either all data or mainland data. But how should we justify that these spatial differences also reflect differences in prediction accuracy? To do so, as I mentioned under point 3), validation results for the mainland data should be shown for both RF models, the one developed using all data and the one developed using mainland data. The same argument applies to Herschel Island.
  
  As already mentioned under point 3), in the revised version we will provide the validation results partitioned per site.
  4.3 Comparing local scale results to regional scale synthesis
                It would be good to use independent validation for such a comparison. However, I doubt that the available data are adequate!!!
  As mentioned above, to make use of the full dataset we decided to apply a repeated k-fold cross-validation approach to ensure robust validation.
  
  Conclusion
  
  Based on the analysis, the study should conclude with the most acceptable estimates of SOC and N stocks. It is difficult to understand whether the authors incorporated the AoA and QRF analyses when concluding the work.
  We will clarify in the revised version on which method the concluding numbers are based.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1052-AC1
RC2:
'Comment on egusphere-2025-1052', Anonymous Referee #1, 09 May 2025

Good attempt!!! Just got a couple of questions from this work.
1) It is not very clear what techniques are used to synthesise the already collected data. Can you describe them?
2) Does the AoA analysis also assess the prediction accuracy of models?
3) Since prediction models are not quite strong, how would you confidently say that estimated SOC and N stocks are superior to already reported values?
4) I could not link the SOC and N Stock values listed in the abstract with the results.
5) How did you tackle very poor sample distribution that came out from isolated studies when developing prediction models? Did you try data declustering approaches?
6) Application of geostatistical analysis would improve interpretations of short scale variability at two sites.

Citation: https://doi.org/10.5194/egusphere-2025-1052-RC2
- AC2: 'Reply on RC2', Julia Wagner, 20 Aug 2025
  
  Dear anonymous reviewer,
  We sincerely appreciate the constructive feedback above on our manuscript. Below we answer the additional questions you raised about the study.
  
  Kind regards on behalf of all listed authors,
  Julia Wagner
  
  Good attempt!!! Just got a couple of questions from this work.
  1) It is not very clear what techniques are used to synthesise the already collected data. Can you describe them?
  The study synthesizes existing datasets by assembling previously collected measurements from various sources into a single dataset and converting them to standard depth intervals (0–30 cm and 30–100 cm) for consistency in subsequent analyses. Researchers working on permafrost soil carbon, especially in this region (including co-authors of this study), are well connected and follow standardized protocols for sampling permafrost soils (e.g. Ping et al. 2013: https://acsess.onlinelibrary.wiley.com/doi/epdf/10.2136/sh12-09-0027 and Palmtag et al. 2022: https://essd.copernicus.org/articles/14/4095/2022/).
  2) Does the AoA analysis also assess the prediction accuracy of models?
  The AoA analysis itself does not directly measure prediction accuracy. It measures the similarity of new data to the training data, with the DI threshold defined in a way that is consistent with the model’s cross-validation process.
  The AoA can be used alongside the accuracy measures. Accuracy metrics (e.g., RMSE, R²) quantify how well the model performs on known data. The AoA then quantifies where those performance expectations are likely to hold, based on predictor similarity.
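Following this description and Meyer and Pebesma (2021), the dissimilarity index (DI) can be sketched as the distance from a prediction location to the nearest training point in the standardised, importance-weighted predictor space, divided by the mean pairwise distance among training points (DI = d_min / d̄). The threshold is derived from training DIs, each computed only against training points from other cross-validation folds. This is a simplified re-implementation for illustration only: the cut-off used here (the largest training DI not exceeding Q3 + 1.5·IQR), the weights and all data are assumptions, and the CAST R package is the reference implementation.

```python
import math
import statistics


def _scale(rows, means, sds, weights):
    """Standardise each predictor and multiply it by its (importance) weight."""
    return [[w * (v - m) / s for v, m, s, w in zip(r, means, sds, weights)] for r in rows]


def area_of_applicability(train, folds, new, weights):
    """Return (DI of new points, threshold, AOA flags); simplified sketch."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard constant predictors
    t = _scale(train, means, sds, weights)
    q = _scale(new, means, sds, weights)

    # Mean pairwise distance among training points normalises the DI.
    n = len(t)
    d_bar = statistics.fmean([math.dist(t[i], t[j]) for i in range(n) for j in range(i + 1, n)])

    # Training DI: nearest training point outside the point's own CV fold.
    train_di = [
        min(math.dist(t[i], t[j]) for j in range(n) if folds[j] != folds[i]) / d_bar
        for i in range(n)
    ]
    new_di = [min(math.dist(p, x) for x in t) / d_bar for p in q]

    q1, _, q3 = statistics.quantiles(train_di, n=4, method="inclusive")
    upper = q3 + 1.5 * (q3 - q1)
    threshold = max(d for d in train_di if d <= upper)  # outlier-removed maximum
    return new_di, threshold, [d <= threshold for d in new_di]


# Hypothetical two-predictor training data, CV fold labels and prediction points.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
folds = [0, 1, 0, 1, 0, 1]
new = [(0.5, 0.4), (10.0, 10.0)]
di, thr, inside = area_of_applicability(train, folds, new, weights=[1.0, 1.0])
print([round(d, 3) for d in di], round(thr, 3), inside)
```

As the reply notes, a location flagged as inside the AOA is one where the cross-validated accuracy is expected to hold; the DI itself says nothing about how large that accuracy is.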
  
  3) Since prediction models are not quite strong, how would you confidently say that estimated SOC and N stocks are superior to already reported values?
  The estimates presented in this study are based on the most up-to-date dataset currently available for the region. While we do not claim that our results represent a definitive (or even correct) assessment, they provide the first estimates of SOC and N stocks derived from a digital soil mapping approach using random forest modelling for this area. As such, they offer a robust starting point for further analyses, targeted field campaigns, and methodological refinements. It is also important to note that many previously reported values rely on generalized landform or landscape class-based approaches, whereas our analysis uses continuous predictor variables and a spatially explicit modelling framework, providing finer resolution and potentially greater relevance for site-specific applications.
  4) I could not link the SOC and N Stock values listed in the abstract with the results.
  These results are calculated from the sum of the raster values of the 0–30 cm and 30–100 cm predictions, using a mosaic that combines the individual predictions for Herschel Island and the mainland, because we conclude that separate models are needed for each area given the geological differences. We will add an explanation to the results. If we carry out the suggested reanalysis, we will update these numbers.
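The calculation described here (summing the 0–30 cm and 30–100 cm prediction rasters, mosaicking the island and mainland predictions, then taking a spatial mean) can be sketched on flattened rasters as below. Equal pixel areas, identical grids for all layers, a boolean island mask and all values are assumptions.

```python
def add_layers(top, sub):
    """Per-pixel 0-100 cm stock = 0-30 cm + 30-100 cm prediction (None = no-data)."""
    return [None if a is None or b is None else a + b for a, b in zip(top, sub)]


def mosaic(mainland, island, is_island):
    """Take the island-model value where is_island is True, else the mainland value."""
    return [i if flag else m for m, i, flag in zip(mainland, island, is_island)]


def spatial_mean(pixels):
    """Mean over valid pixels, assuming all pixels have equal area."""
    valid = [v for v in pixels if v is not None]
    return sum(valid) / len(valid)


# Hypothetical SOC predictions (kg m^-2) on a 4-pixel grid.
mainland_total = add_layers([20.0, 25.0, 18.0, 30.0], [35.0, 30.0, 40.0, 28.0])
island_total = add_layers([10.0, 12.0, 15.0, 11.0], [22.0, 25.0, 20.0, 24.0])
is_island = [False, False, True, True]
combined = mosaic(mainland_total, island_total, is_island)
print(combined, spatial_mean(combined))
```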
  
  5) How did you tackle very poor sample distribution that came out from isolated studies when developing prediction models? Did you try data declustering approaches?
  Models were trained using the available datasets in their collected form, which we acknowledge contained uneven sample distributions due to the isolated nature of the source studies. The AOA analysis identifies areas where the feature space is adequately covered by the sample data. Given the focus of the manuscript on exploring the use of local data to bridge scales, we considered our approach acceptable. If we are given the chance to revise the manuscript, which would include our proposed rerun of the models, we propose to use spatial cross-validation instead of a purely random cross-validation approach to account for the spatially uneven distribution of the training data. We recommend that future studies explore declustering approaches to improve representativeness.
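One common way to implement the proposed spatial cross-validation is to assign sites to square spatial blocks and then assign whole blocks to folds, so that nearby sites never fall in both the training and the test set of the same split. The sketch assumes projected coordinates in metres; block size, number of folds and coordinates are hypothetical, and other grouping schemes (e.g. leaving out one source study at a time) would be equally possible.

```python
import math
import random


def block_id(x, y, block_size):
    """Index of the square block (block_size metres) containing point (x, y)."""
    return (math.floor(x / block_size), math.floor(y / block_size))


def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign each site a fold label so that all sites in one block share a fold."""
    blocks = sorted({block_id(x, y, block_size) for x, y in coords})
    rng = random.Random(seed)
    rng.shuffle(blocks)  # random assignment of whole blocks to folds
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y, block_size)] for x, y in coords]


# Hypothetical site coordinates (metres) clustered around three field areas.
sites = [(100.0, 100.0), (200.0, 300.0), (5100.0, 100.0), (5200.0, 400.0), (10100.0, 9000.0)]
print(spatial_block_folds(sites, block_size=5000.0, k=2))
```

The resulting fold labels would replace a random fold assignment, so that performance is estimated on sites spatially separated from the training data.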
  
  6) Application of geostatistical analysis would improve interpretations of short scale variability at two sites.
  While we acknowledge the suggestion to include the spatial location in the random forest modelling framework to account for spatial autocorrelation, we wish to maintain our focus on a machine-learning–based approach, as this aligns with the objectives of our study and follows previous research from this region that emphasizes the exploration of machine learning methods for predicting soil parameters in permafrost regions (e.g. Siewert et al. 2021).
  
  Citation: https://doi.org/10.5194/egusphere-2025-1052-AC2
RC3:
'Comment on egusphere-2025-1052', Anonymous Referee #2, 30 Jun 2025

The paper is well-written, but the novelty is not clearly demonstrated. The authors discuss the upscaling from local to regional scale as one of the issues, but this is common to all mapping exercises of soil properties. The use of digital soil mapping is well-known and it is not clear what the specific research gap is. Surely, these permafrost areas play an important role in the global carbon cycle, but as it stands, I can mainly see the local interest. The methodology is not clearly explained and particularly details on area of interest and dissimilarity index are lacking. More importantly, I am not sure that the stocks and their standard deviations are correctly calculated. If they are calculated based on the calibration points, they do not necessarily represent the spatial patterns correctly. If they are calculated on pixel values, there is a methodological flaw, as the spatial autocorrelation is not accounted for. The lack of clarity is the main reason that I was not able to review the discussion section.

Lines 39-42 I am not sure that I understand the problem of scalability. In particular because you mention that you use the same data sets. Would not then the local scale simply be a cut-out of the regional scale?
Lines 53–57 Digital soil mapping also depends on the scale at which the co-variates are available. I am not sure why this widely applied methodology is presented as the solution to the scale problem.
Lines 100–107 The sampling protocols of the previous campaigns are explained, but there is no information on the coring devices, bulk density measurements or C analysis. It is tricky to use the SOC stocks from different campaigns in a joint data analysis.
Figure 5 The way that the area of applicability and dissimilarity index are calculated is missing in the materials and methods section. Please add a short description including equations.
Section 3.2 Were the average stocks calculated on the calibration data set?
Section 3.3 Here you also calculate the standard deviation. Is this based on the calibration data set? You mention model results according to the AOA. Are you sure that there are enough sample points for the results to be meaningful? I hope that you did not calculate the std based on the pixel estimates, as these are spatially autocorrelated. I start doubting when I see Table 5. Please give the number of calibration points in this table.
Line 247 What are the ‘m^-2’ values?

Citation: https://doi.org/10.5194/egusphere-2025-1052-RC3
- AC3: 'Reply on RC3', Julia Wagner, 20 Aug 2025
  
  Dear anonymous reviewer,
  We sincerely appreciate the constructive feedback on our manuscript. Should we be granted the opportunity to submit a revised version, we are confident in our ability to address all the concerns raised and to substantially improve the quality of the manuscript. Below, we provide a detailed plan outlining our proposed responses to each comment.
  
  Kind regards on behalf of all listed authors,
  Julia Wagner
  
  The paper is well-written, but the novelty is not clearly demonstrated. The authors discuss the upscaling from local to regional scale as one of the issues, but this is common to all mapping exercises of soil properties. The use of digital soil mapping is well-known and it is not clear what the specific research gap is. Surely, these permafrost areas play an important role in the global carbon cycle, but as it stands, I can mainly see the local interest. The methodology is not clearly explained and particularly details on area of interest and dissimilarity index are lacking. More importantly, I am not sure that the stocks and their standard deviations are correctly calculated. If they are calculated based on the calibration points, they do not necessarily represent the spatial patterns correctly. If they are calculated on pixel values, there is a methodological flaw, as the spatial autocorrelation is not accounted for. The lack of clarity is the main reason that I was not able to review the discussion section.
  
  The main points of concern are the following: novelty and research gap, global relevance beyond local interest, and methodological flaws.
  Because the permafrost region is (a) remote, (b) data-poor, (c) logistically extremely challenging and (d) spans many countries, it is unlikely that systematic data collection designed for broad-scale analyses will occur (as is done elsewhere via national soil surveys). We are therefore left to assess data from various aggregated sources, each designed for specific local-scale analyses. In these circumstances, we wish to assess the possibilities of bridging from local to regional settings using DSM. While the maps are regionally specific, the lessons learned (e.g. the challenges of assessing unsampled regions gleaned from AOA analyses) are likely to apply to other permafrost tundra regions.
  Lines 39-42 I am not sure that I understand the problem of scalability. In particular because you mention that you use the same data sets. Would not then the local scale simply be a cut-out of the regional scale?
  We are not treating the local scale as a simple cut-out of the regional map. With the term local, we refer to the individual studies that created the datasets included in our study (e.g. Wagner et al. 2023 or Obu et al. 2017). For local models, only the dataset from that specific area is used for training and prediction. For the regional model, however, we combine multiple datasets from across the study area. This means the two approaches are not nested, but instead represent different ways of using the available data. In addition, we explore whether information from sampled regions can be transferred to unsampled areas, and where model applicability breaks down.
  Lines 53–57 Digital soil mapping also depends on the scale at which the co-variates are available. I am not sure why this widely applied methodology is presented as the solution to the scale problem.
  We do not present DSM as the definitive solution to the scale problem, but rather use it to map SOC and N stocks over a larger area than previous studies. With emerging technologies, more high-resolution datasets suitable for DSM are being developed for Arctic regions, which may help to partially address, but not eliminate, scale-related challenges. DSM is not as widely applied in Arctic regions as in other regions of the world. Traditionally, studies used upscaling through thematic maps, for example soil maps, landcover classes or geological classes. To date, studies have applied DSM at the pan-Arctic scale using the still sparse pan-Arctic soil data. In contrast, very local DSM studies exist for Arctic Canada (e.g. Wagner et al. 2023).
  For Alaska, in contrast, more regional studies exist owing to higher data availability (Mishra and Riley 2012 and, more recently, Minai et al. 2025 and Ainuddin et al. 2024).
  Lines 100–107 The sampling protocols of the previous campaigns are explained, but there is no information on the coring devices, bulk density measurements or C analysis. It is tricky to use the SOC stocks from different campaigns in a joint data analysis.
  
  Reviewer 1 raised a similar concern. We will provide a table in the supplement with information on the sampling and laboratory analyses of the different studies (including date of the campaign, laboratory method used to measure OC and N content, coring method, number of samples, etc.). Furthermore, we note that the permafrost soil research community applies well-established, common protocols to ensure data comparability.
  
  Figure 5: The way the area of applicability and the dissimilarity index are calculated is missing from the Materials and Methods section. Please add a short description, including equations.
  To keep the manuscript concise, we refer to Meyer and Pebesma (2021). However, we can add a summary to the supplement.
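For readers of this discussion, the DI/AOA computation can be summarized in a short sketch. It is a simplified Python reading of Meyer and Pebesma (2021), not necessarily the implementation used by the authors (e.g. the R package CAST): predictors are standardized with the training mean and standard deviation and weighted by variable importance; the DI of a location is its distance to the nearest training point divided by the mean pairwise distance among training points; and the threshold is the outlier-removed maximum of the cross-validated training DI (values above the 75th percentile plus 1.5 times the IQR are treated as outliers). The function name `dissimilarity_index`, the importance weights and the fold assignment are hypothetical.

```python
import numpy as np

def dissimilarity_index(X_train, X_new, weights, folds):
    """Simplified dissimilarity index (DI) and AOA after Meyer and Pebesma (2021).

    Hypothetical helper for illustration; brute-force distances, small data only.
    """
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # guard against constant predictors
    # Standardize with training statistics, then weight by variable importance
    A = (X_train - mu) / sd * weights
    B = (X_new - mu) / sd * weights
    # All pairwise distances among training points
    D_train = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)
    n = len(A)
    mean_dist = D_train[np.triu_indices(n, k=1)].mean()
    # DI of new locations: distance to nearest training point / mean training distance
    d_new = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=2).min(axis=1)
    di_new = d_new / mean_dist
    # Training DI under cross-validation: nearest training point in another fold
    D_cv = np.where(folds[:, None] != folds[None, :], D_train, np.inf)
    di_train = D_cv.min(axis=1) / mean_dist
    # Threshold: largest training DI that is not an outlier (Q3 + 1.5 * IQR rule)
    q1, q3 = np.percentile(di_train, [25, 75])
    threshold = di_train[di_train <= q3 + 1.5 * (q3 - q1)].max()
    return di_new, threshold, di_new <= threshold

# Hypothetical example: 40 training points, 3 predictors, 5 folds
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))
X_new = np.vstack([rng.normal(size=(5, 3)), np.full((1, 3), 10.0)])  # last point far outside
weights = np.array([1.0, 0.5, 0.2])  # assumed importance weights
folds = np.arange(40) % 5
di, threshold, inside_aoa = dissimilarity_index(X_train, X_new, weights, folds)
print(np.round(di, 2), round(threshold, 2), inside_aoa)
```

The last location lies far outside the sampled predictor space and falls outside the AOA. The brute-force distance matrices grow quadratically with the number of points, so a full 10 m map would need a nearest-neighbour search structure instead.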
  Section 3.2: Were the average stocks calculated on the calibration data set?
  The average SOC/TN stocks presented in this section refer to spatially averaged stocks calculated from the gridded output maps. This method of calculating mean landscape SOC/N stocks is consistent with the approach taken by earlier studies in this region (Table 3). We do not present stocks calculated as the arithmetic mean of the available soil pedon observations. We suggest expanding Table 5 to display the mean SOC and TN stocks from the soil pedon data next to the means from the predicted output maps.
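The difference between the two kinds of averages can be illustrated with a toy grid. In this sketch the map values, the AOA mask and the pedon values are all hypothetical: the spatial mean is taken over valid pixels, optionally restricted to the AOA, and is compared with the arithmetic mean of the pedons.

```python
import numpy as np

# Hypothetical SOC stock map (kg C m-2); NaN marks no-data pixels such as water
soc_map = np.array([[55.0, 60.0, np.nan],
                    [40.0, 62.0, 58.0],
                    [35.0, np.nan, 50.0]])
# Hypothetical AOA mask on the same grid (True = inside the area of applicability)
aoa = np.array([[True, True, False],
                [False, True, True],
                [True, False, True]])
# Hypothetical SOC stocks observed at the pedon locations (kg C m-2)
pedons = np.array([48.0, 71.0, 33.0, 65.0])

valid = ~np.isnan(soc_map)
map_mean = soc_map[valid].mean()              # spatial mean over all mapped pixels
map_mean_aoa = soc_map[valid & aoa].mean()    # spatial mean restricted to the AOA
map_sd = soc_map[valid].std()                 # spread of pixel values, not uncertainty of the mean
pedon_mean = pedons.mean()                    # arithmetic mean of the observations
print(round(map_mean, 2), round(map_mean_aoa, 2), round(pedon_mean, 2))
```

The pixel standard deviation describes how predicted values vary across the landscape; because neighbouring pixels are spatially autocorrelated, it should not be read as the uncertainty of the regional mean.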
  To validate the models, we used repeated k-fold cross-validation. Because the number of data points is limited, we did not set aside a subset for independent validation, so that the full dataset could be used for model training.
  Section 3.3: Here you also calculate the standard deviation. Is this based on the calibration data set? You mention model results according to the AOA. Are you sure that there are enough sample points for the results to be meaningful? I hope that you did not calculate the standard deviation based on the pixel estimates, as these are spatially autocorrelated. I start doubting when I see Table 5. Please give the number of calibration points in this table.
  Standard deviations are based on the pixel values from the gridded output maps. We applied the AoA method to evaluate where the predicted results are meaningful. This method assesses the predictor feature space and identifies areas whose feature space is not covered by the sampling locations, i.e. areas where, following Meyer and Pebesma (2021), the cross-validated model accuracy cannot be assumed to hold. Table 4 lists the number of samples each model is based on. We did not set aside a subset of the data for independent validation. The models are based on all available data and evaluated using repeated k-fold cross-validation. K-fold cross-validation splits the data into k subsets, trains on k–1 of them and tests on the remaining one, repeating this process k times so that each subset is used once for testing; in repeated k-fold cross-validation, the whole procedure is repeated with different random partitions to evaluate model performance more reliably.
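The repeated k-fold procedure described above can be sketched as follows. This is a generic illustration, not the authors' code: `repeated_kfold_rmse` and `linear_fit_predict` are hypothetical helpers, and an ordinary least-squares model stands in for the random forest so that the example needs only NumPy.

```python
import numpy as np

def repeated_kfold_rmse(X, y, fit_predict, k=5, repeats=3, seed=42):
    """Repeated k-fold cross-validation; returns one RMSE per (repeat, fold).

    Hypothetical helper: `fit_predict(X_train, y_train, X_test)` must return predictions.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(n)                 # new random partition in every repeat
        for test in np.array_split(idx, k):      # k folds of (nearly) equal size
            train = np.setdiff1d(idx, test)      # remaining k-1 folds
            pred = fit_predict(X[train], y[train], X[test])
            scores.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return np.array(scores)

def linear_fit_predict(X_tr, y_tr, X_te):
    """Stand-in model (ordinary least squares); a random forest would take its place."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

# Synthetic example data (not from the study)
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = 50 + 8 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2.0, size=60)
scores = repeated_kfold_rmse(X, y, linear_fit_predict, k=5, repeats=3)
print(scores.shape, round(scores.mean(), 2))
```

Each repeat draws a new random partition, so the spread of the 15 fold-wise RMSE values indicates how sensitive the error estimate is to the particular split. Random folds do not account for spatial clustering of the samples.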
  
  Line 247: What are the ‘m^-2’ values?
  Thank you for pointing this out. This is a typo and will be corrected to R².
  
  Citation: https://doi.org/10.5194/egusphere-2025-1052-AC3

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1052', Anonymous Referee #1, 09 May 2025
The manuscript addresses a few key areas:
Synthesis of SOC and N stocks data from previous research

Upscaling of assimilated local data for detailed mapping of SOC and N stocks at fine resolution (10 m) (local- to regional-scale prediction).

Overall quality:
In terms of the research questions it tries to address, this is a good study, and the authors have addressed them to some extent. Despite that, several key weaknesses reduce the scientific rigor of this study. It needs a thorough revision focusing on the limitations and potential of using local datasets for the spatial characterization of SOC and N stocks. In this context, the following weaknesses were noted.
Priority is given to estimating SOC and N stocks, even though the dataset used in this study is quite weak in a spatial context.

Low prediction accuracy of the models: this could be due to inadequate sample distribution and/or inadequate explanatory variables in the RF models. These aspects need to be clearly understood.

The comparison of random forest models developed using the “entire data set” and by “dividing dataset into two data sets (mainland and island)” looks biased. Validation results of the RF model developed using the “entire data set” need to be reported separately for the mainland and the island. Only then can we properly understand the prediction accuracies, as pooling the data would have resulted in poor performance of the RF model. I have indicated this weakness in my comments on the discussion.

I wish to quote the following: “This study shows that regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”

- This should be the strongest point of this study, and the text should have been written to highlight it. Unfortunately, the authors have instead focused on the mapping and assessment of SOC and N stocks (with weak models) and on comparing them with previous studies.
e) More attention needs to be given to the accuracy and precision of predictions when reporting research outcomes and drawing conclusions.

f) The use of AoA analysis is commendable, but it needs to be integrated with the accuracy and precision of predictions.

Title “Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast”
I noticed that the findings are not well aligned with the title. By regional synthesis, the authors mean the compilation of research data from previous work in the area. I believe that rather than a simple compilation, the authors need to consider data harmonization (laboratory methods) and spatial harmonization of the already collected data when synthesizing them, to enable detailed mapping. The mapping can hardly be a focus here due to a key limitation of the study, i.e. the poor fit of the random forest models for the entire area as well as for the individual areas. Thus, it is misleading to suggest that this work has accomplished a more successful mapping task than previous studies. I would rather expect a title such as “Challenges in the use of local data for regional scale mapping of C and N stocks in a continuous permafrost zone of the Yukon Coastal Plain”

Abstract
The abstract needs to be rewritten with a clear flow of research questions, objectives and results. The sequence of presentation is often weak.

“We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks”

- What is meant by local differences? Is it the difference between the “coastal lowland area” and “Herschel Island”? Local differences need to be defined in relation to the spatial scale!!!

“We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island”

– Normally we do modelling then mapping!!!! Not mapping then modelling!!!

“Our results indicate that not only the selection of data is crucial for the resulting maps, but also the chosen covariates, which were picked by the models as most important”

– This statement needs to be justified with data from the research.

“The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2”

I suppose these are average values. The authors need to explain why these values are more accurate than previously reported values, given that the accuracy and precision of the random forest models developed for this study are not strong enough due to weaknesses in the soil database!!!!

“The average SOC stocks vary significantly when including or excluding data in the predictive models”

– What is meant by including and excluding data in the models? Theoretically, the results have to vary!!! The authors must be very specific about which data are meant!!! Dependent or independent variables? What is the key message intended for the reader?

“Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data”

- This statement needs a more robust analysis with a strong dataset comparing predicted and actual values of the models used in this study and in other studies.

“The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2. The results of the separate models show 36.2 ± 5.7 kg C m−2 and 2.66 ± 0.39 kg N m−2 for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m−2 and 2.17 ± 0.50 kg N m−2 for the mainland”
- It is not clear how these values were calculated. Were they calculated after considering the AoA analysis?

Introduction
The authors need to clearly define “what is meant by regional scale” in the context of different mapping scales. This is not clear, even though the title uses the term “Regional Synthesis”.

The introduction lacks information on the “importance of synthesizing SOC and N stock data from previous work”.

“Studies in between those scales are lacking, but necessary when quantifying regional carbon budgets”

– What scale are the authors really trying to resolve here? This needs to be very clear. Did the authors have enough soil samples to adequately resolve this variability with a robust model validation? These points need to be addressed.

Materials and Methods
1) “Soil property data was retrieved from existing publications (Table 1, Fig. 1), harmonised and converted into the depth intervals 0-30 cm and 30-100 cm”

- The authors need to explain how the harmonization was done, especially in relation to sampling (depth, sampling configuration) and laboratory methods. The information provided is not adequate.
- What about correcting the data for the coarse fraction of the soil?
2) “In addition to published data, new data from DTLBs that were sampled during a field campaign in April 2019 was added to this synthesis”
- The DTLB data could have been used to validate the model/published data.
- Do the DTLB data harmonize with data from other sources?
3) “ The landcover data (Bartsch et al., 2019b) was converted into binary variables and the dominating classes in the area were selected. Those were “dry to moist prostrate to erect dwarf shrub tundra” (LC_class4) and “moist to wet graminoid prostrate to erect dwarf shrub tundra” (LC_class5)”
- What about the coverage of the study area by the non-dominant classes (other than LC4 and LC5)? Did the samples not capture other land cover classes? If they did, it would be good to include all land cover classes for mapping. In this case, I would recommend using either the randomForest or the ranger package to run the RF algorithm.
4) “Our study uses the 20m product, as the 10m product was not available yet when the analysis was completed.”
- How reasonable is it to map at 10 m resolution when explanatory variables are at a coarser resolution (20 m)?
5) The authors mention the SCORPAN model. What is the reason for not incorporating spatial autocorrelation into the random forest models, at least simply by including X and Y coordinates? I believe that the models would have improved if this had been done!!!
6) I believe that other factors of the SCORPAN model need to be incorporated in these models, especially climate factors (if variability and data exist) and soil factors (e.g. surface geology); if not, the authors need to explain why these factors were not considered. If these factors had been adequately selected, the differences between the two areas would have been captured in a single model. I would rather argue that different models were needed because of the lack of predictor variables to model the variability of SOC and N across the entire area.
7) The authors could also have used more distance-related variables (e.g. distance to the sea) to improve the model.
8) The number of samples available at both sites for mapping needs to be specified.
9) Please see my observation on the bias in the model comparison (entire area vs. individual sites) given under discussion note 3).
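Items 3) and 5) suggest using all land cover classes as binary covariates and adding X and Y coordinates. A minimal sketch of such a covariate matrix follows; the class codes other than 4 and 5, the coordinates and the elevation values are hypothetical, and whether the extra columns improve the models would have to be tested by cross-validation.

```python
import numpy as np

# Hypothetical pixels: land cover class codes, elevation (m) and projected coordinates (m)
landcover = np.array([4, 5, 5, 2, 4, 7])
elevation = np.array([12.0, 35.0, 8.0, 60.0, 20.0, 3.0])
x = np.array([560000.0, 560010.0, 560020.0, 560030.0, 560040.0, 560050.0])
y = np.array([7680000.0, 7680000.0, 7680010.0, 7680010.0, 7680020.0, 7680020.0])

def binary_landcover(lc, classes):
    """One 0/1 indicator column per listed class; unlisted classes give all-zero rows."""
    return np.column_stack([(lc == c).astype(float) for c in classes])

lc_dominant = binary_landcover(landcover, [4, 5])            # LC_class4 and LC_class5 only
lc_all = binary_landcover(landcover, np.unique(landcover))   # every class present in the data
covariates = np.column_stack([elevation, lc_all, x, y])      # X/Y appended as spatial covariates
print(covariates.shape)
```

With only the two dominant classes, pixels of any other class are indistinguishable from each other (all-zero rows), which is the gap the reviewer points to.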
Results
3.1 Data synthesis of regional carbon and nitrogen stocks
1) The authors need to provide a comparison of the Couture et al. (2018) data at both sites, as this is the only sample set distributed across both sites. Otherwise, the comparison results would be biased by differences in analytical and sampling methods.

3.2 Random Forest mapping and model validation assessment
1) The authors need to choose adequate validation indices to describe the precision and accuracy of the predictions. Lin’s concordance correlation coefficient could be an additional choice.
2) The model performance (low R²) does not justify stating that the predictions are more accurate than those of other studies done at larger scales.
3) Independent validation would make the results more trustworthy.
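Lin's concordance correlation coefficient, suggested in item 1), measures agreement with the 1:1 line rather than correlation alone: rho_c = 2 s_op / (s_o² + s_p² + (mean_o − mean_p)²), where o are observed and p predicted values, s_op is their covariance and s_o², s_p² are their variances (population moments, as in Lin's definition). A short sketch with hypothetical values:

```python
import numpy as np

def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    s_op = np.mean((o - o.mean()) * (p - p.mean()))  # covariance
    return 2 * s_op / (o.var() + p.var() + (o.mean() - p.mean()) ** 2)

# Hypothetical observed SOC stocks (kg C m-2) and two sets of predictions
obs = np.array([30.0, 45.0, 60.0, 75.0])
perfect = obs.copy()
offset = obs + 5.0  # perfectly correlated but biased by 5 kg C m-2
print(lins_ccc(obs, perfect), lins_ccc(obs, offset), np.corrcoef(obs, offset)[0, 1])
```

A constant offset leaves the Pearson correlation at 1 but lowers the CCC, which is why it complements R² when judging accuracy.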

3.3. Area of Applicability (AOA) and uncertainty with quantile regression forest
1) It is good that the authors used the AoA for their analysis. However, it does not assess the overall prediction accuracy. Non-AoA areas only identify those combinations in the predictor variable space that are not adequately captured by the RF models, reflecting an inadequate sample coverage of the predictor attribute space. The authors need to discuss this point clearly.
2) Based on Table 5, what is a reasonable estimate of the average SOC and N stocks in both areas?

Discussion

1) “Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
              - This should be the strongest point of this study, and the text should have been written to highlight it. Unfortunately, the authors have instead focused on the mapping and assessment of SOC and N stocks and on comparing them with previous studies.
2) “It is therefore advisable to analyse the values of the target variable at the sampling locations and the diversity of the landscape to ensure that the spatial variability in the landscape is reflected by the sampled sites”.
              - This statement needs more detail.
3) “The AOA method can be used to assess whether the heterogeneity of the landscape, ideally mirrored in the covariates, is captured by the sampling locations. Areas where this is not the case can be excluded from regional estimates and could be further used to determine new sampling sites for future field sampling campaigns”
              - Good point, but improving the accuracy of the prediction model is also an important aspect to consider.

4.2 Spatial mapping of carbon and nitrogen stocks with random forest, area of applicability and uncertainty
1) “Our analysis shows a substantial challenge in bridging from local- to regional-scale study areas”
              - A very important point that this study should capitalize on, rather than trying to estimate SOC and N stocks from an inadequate database.

Line 247 needs to be corrected to R².

The discussion given in L245-253 is quite confusing to me. I rather think this is a biased comparison of two models.

The model developed for the entire area has been cross-validated using the whole dataset, whereas the two models developed for the two sites have been validated using the individual datasets of these sites. The question is whether the results would be the same if the validation indices of the RF model for the entire area were calculated separately for the two sites. The authors just need to separate the validation results of the models for the entire area into mainland and island and provide R², RMSE and MEE values.

L263: (Fig. 7a and Fig. 1b?)

L260-268: I agree that there is a difference in the spatial distribution of SOC stocks on the mainland depending on whether the RF models are trained using all data or mainland data only. But how should we show that these spatial differences also reflect differences in prediction accuracy? To do so, as mentioned under point 3), validation results for the mainland data should be shown for both RF models, i.e. the one trained on all data and the one trained on mainland data. The same argument applies to Herschel Island.

4.3 Comparing local scale results to regional scale synthesis
              It would be good to use independent validation for such a comparison. However, I doubt that the available data are adequate!!!
Conclusion

Based on the analysis, the study should conclude with the most acceptable estimates of SOC and N stocks. It is difficult to understand whether the authors incorporated the AoA and QRF analyses when drawing their conclusions.
Citation: https://doi.org/10.5194/egusphere-2025-1052-RC1
- AC1:
  'Reply on RC1', Julia Wagner, 20 Aug 2025
  Dear anonymous reviewer,
  We sincerely appreciate the constructive feedback on our manuscript. Should we be granted the opportunity to submit a revised version, we are confident in our ability to address all the concerns raised and to substantially improve the quality of the manuscript. Below, we provide a detailed plan outlining our proposed responses to each comment.
  
  Kind regards on behalf of all listed authors,
  Julia Wagner
  
  The manuscript addresses a few key areas:
  Synthesis of SOC and N stocks data from previous research
  
  Upscaling of assimilated local data for detailed mapping of SOC and N stocks at fine resolution (10 m) (local- to regional-scale prediction).
  
  Overall quality:
  In terms of the research questions it tries to address, this is a good study, and the authors have addressed them to some extent. Despite that, several key weaknesses reduce the scientific rigor of this study. It needs a thorough revision focusing on the limitations and potential of using local datasets for the spatial characterization of SOC and N stocks. In this context, the following weaknesses were noted.
  Priority is given to estimating SOC and N stocks, even though the dataset used in this study is quite weak in a spatial context.
  
  While we acknowledge that the dataset used in this study has limitations in spatial coverage, it represents the first comprehensive compilation of all available SOC and N data for the region. Given the generally sparse sampling across Arctic areas, this dataset reflects the best available synthesis to date and provides a valuable foundation for regional estimates. We wish to stress the enormous logistical challenges associated with soil sampling in remote permafrost regions; the achievable sampling density is simply not comparable to that in more inhabited regions where national or regional soil monitoring programs exist. For comparison, across the vast circumpolar northern permafrost region (which is 4 times the size of the European Union) only ca. 3000 soil profiles exist, sampled over a period of many decades. In light of this, the data density in our study region is comparatively very high for a remote permafrost area. Furthermore, our work shows that the data for the study area are clustered rather than evenly distributed, which can provide a foundation for future sampling efforts and model development.
  
  Low prediction accuracy of the models: this could be due to inadequate sample distribution and/or inadequate explanatory variables in the RF models. These aspects need to be clearly understood.
  
  Similar to the previous comment, we acknowledge that the accuracy is lower than what can be achieved in more data-dense regions. Compared to other studies from permafrost landscapes, however, it is not low. We understand that the sampling distribution and sample availability have very likely contributed to the low model accuracies. We also acknowledge that the number and type of explanatory variables have affected this as well. We will clarify in the discussion that these are major limitations of the study that affect the use and interpretation of the data.
  
  The comparison of random forest models developed using the “entire data set” and by “dividing dataset into two data sets (mainland and island)” looks biased. Validation results of the RF model developed using the “entire data set” need to be reported separately for the mainland and the island. Only then can we properly understand the prediction accuracies, as pooling the data would have resulted in poor performance of the RF model. I have indicated this weakness in my comments on the discussion.
  
  Thank you for your feedback. In the revised version we will provide the validation results partitioned per site. We agree that the validation results for the whole area are not comparable with the results of the individual models for each study site. As the predicted values had not been saved, I quickly retrained the SOC model for the whole area at 0–30 cm (with the same seed and settings) and found an RMSE of 5.77 for the Herschel Island data and 7.03 for the mainland data, which changes the interpretation when comparing the individual models with the “whole area” model. For the revised manuscript, we plan to rerun the whole-area models, report the new values and consider these in the discussion.
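Splitting pooled cross-validation predictions by site, as requested by the reviewer and planned in this reply, only requires keeping the site label with each held-out prediction. The sketch below uses hypothetical observed and predicted values, not the study's data; R² is computed here as 1 − SSE/SST, whereas some packages report the squared correlation between observed and predicted values instead.

```python
import numpy as np

def metrics(obs, pred):
    """RMSE and R² (computed as 1 - SSE/SST) for one set of held-out predictions."""
    resid = obs - pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2)
    return rmse, r2

def metrics_by_site(obs, pred, site):
    """Same metrics, computed separately for each site label."""
    return {s: metrics(obs[site == s], pred[site == s]) for s in np.unique(site)}

# Hypothetical pooled cross-validation output of a whole-area model
obs = np.array([40.0, 55.0, 62.0, 30.0, 35.0, 28.0])
pred = np.array([45.0, 50.0, 60.0, 33.0, 31.0, 30.0])
site = np.array(["mainland"] * 3 + ["Herschel"] * 3)

print("pooled", metrics(obs, pred))
for name, (rmse, r2) in metrics_by_site(obs, pred, site).items():
    print(name, round(rmse, 2), round(r2, 2))
```

In this toy example the pooled R² (about 0.91) is higher than either per-site value, because part of the explained variance is simply the difference between the two sites.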
  I wish to quote the following: “This study shows that regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
  
  - This should be the strongest point of this study, and the text should have been written to highlight it. Unfortunately, the authors have instead focused on the mapping and assessment of SOC and N stocks (with weak models) and on comparing them with previous studies.
  e) More attention needs to be given to the accuracy and precision of predictions when reporting research outcomes and drawing conclusions.
  
  f) The use of AoA analysis is commendable, but it needs to be integrated with the accuracy and precision of predictions.
  
  Thank you for pointing this out. Together with the change in title (below), we propose shifting the focus of the manuscript towards discussing the challenges rather than presenting the results as “absolute”. We propose adding a section to the discussion on challenges and limitations, including the use of the AOA and quantile regression forest.
  
  Title “Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast”
  I noticed that the findings are not well aligned with the title. By regional synthesis, the authors mean the compilation of research data from previous work in the area. I believe that rather than a simple compilation, the authors need to consider data harmonization (laboratory methods) and spatial harmonization of the already collected data when synthesizing them, to enable detailed mapping. The mapping can hardly be a focus here due to a key limitation of the study, i.e. the poor fit of the random forest models for the entire area as well as for the individual areas. Thus, it is misleading to suggest that this work has accomplished a more successful mapping task than previous studies. I would rather expect a title such as “Challenges in the use of local data for regional scale mapping of C and N stocks in a continuous permafrost zone of the Yukon Coastal Plain”
  
  We agree that the study has not fully accomplished the task of providing highly accurate maps. We appreciate your suggestion of a new title and propose changing it to: “Challenges in the use of local data for regional scale mapping of C and N stocks in the continuous permafrost zone at the Yukon Coastal Plain”
  
  Abstract
  The abstract needs to be rewritten with a clear flow of research questions, objectives and results. The sequence of presentation is often weak.
  
  “We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks”
  
  - What is meant by local differences? Is it the difference between the “coastal lowland area” and “Herschel Island”? Local differences need to be defined in relation to the spatial scale!!!
  
  We use the term local differences mainly to refer to the differences between Herschel Island and the coastal lowland area. We will clarify this in the rewritten abstract.
  
  “We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island”
  
  – Normally we do modelling then mapping!!!! Not mapping then modelling!!!
  We agree that the written flow of the current version of the abstract does not follow the technical workflow and will correct this in the updated version.
  
  “Our results indicate that not only the selection of data is crucial for the resulting maps, but also the chosen covariates, which were picked by the models as most important”
  
  – This statement needs to be justified with data from the research.
  This statement refers to the differences in results (average SOC stocks) between creating separate models for the “coastal lowland area” and “Herschel Island” and combining all data in one model for “the entire region”. Further, the explanatory variables picked by each model differ. The model including all data selects the DEM (elevation) as the most important variable, whereas elevation is less important in the other models (Figure S1). This indicates that elevation likely separates the data into the two areas. As written, the statement is too general, and we will clarify what we aim to express.
  
  “The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2”
  
  I suppose these are average values. The authors need to explain why these values are more accurate than previously reported values, given that the accuracy and precision of the random forest models developed for this study are not strong enough due to weaknesses in the soil database!!!!
  
  The reported SOC and N stock estimates represent averages for the study region. While we acknowledge that the accuracy and precision of the random forest models are limited by gaps and uncertainties in the underlying soil database, this dataset includes substantially more sampling points and spatial coverage than most previous studies in the region, which often rely on averages by soil or land-cover class and smaller sample sizes. Our intention is not to suggest that these estimates are definitively more accurate than all prior work, but rather that they represent the best possible regional-scale assessment given current data availability.
  “The average SOC stocks vary significantly when including or excluding data in the predictive models”
  
  – What is meant by including and excluding data in the models? Theoretically, the results have to vary!!! The authors must be very specific about which data are meant!!! Dependent or independent variables? What is the key message intended for the reader?
  
  Thank you for highlighting the need for clarification. The key message is that both the number of data points and their spatial distribution significantly influence the SOC stock estimates. Specifically, we compared models built using the full dataset (combining mainland and Herschel Island data) with models developed separately for each area. These approaches lead to notably different average SOC stock values, underlining how data inclusion and model structure affect the results. We will revise the text to specify clearly that the comparison refers to the inclusion or exclusion of spatial subsets of the data (i.e., mainland vs. island) in the predictive models.
  
  “Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data”
  
  - This statement needs a more robust analysis with a strong dataset comparing predicted and actual values of the models used in this study and in other studies.
  We appreciate the suggestion to strengthen this statement. Our comparison refers to prior studies that rely either on class-based averaging with fewer region-specific data points or on pan-Arctic analyses that, while including more data overall, have a sparser sampling density in the study region itself. These methodological differences likely contribute to discrepancies in spatial resolution and local accuracy. While a direct quantitative comparison with those studies is limited by differences in data sources and scales, we will revise the manuscript to better explain these distinctions and clarify that our results reflect improved regional detail based on a more comprehensive, regionally focused dataset that includes new, previously unpublished data for the region.
  “The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2. The results of the separate models show 36.2 ± 5.7 kg C m−2 and 2.66 ± 0.39 kg N m−2 for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m−2 and 2.17 ± 0.50 kg N m−2 for the mainland”
  - It is not clear how these values were calculated. Were they calculated after considering the AoA analysis?
  The values presented here are averages calculated without considering the AoA. We will clarify this in the revised abstract.
  
  Thank you for pointing out these aspects above about the Abstract. In the updated version of the Abstract we will clarify research questions, objectives and results while considering a clear flow of statements. We will also clarify the messages behind statements and be explicit on what is meant.
  
  Introduction
  The authors need to clearly define “what is meant by regional scale” in the context of different mapping scales. This is not clear, even though the title uses the term “Regional Synthesis”.
  
  Thank you for pointing this out. We are trying to put our study in the context of existing studies that map SOC and N stocks for permafrost areas.
  “Quantification and mapping of SOC using different methods have been carried out on pan-Arctic (Hugelius et al., 2014; Mishra et al., 2021) and local level (Obu et al., 2017; Palmtag et al., 2018; Siewert et al., 2015, 2016; Siewert, 2018; Wagner et al., 2023).” (line 37-39).
  Regional in this context refers to studies covering larger areas than the “local studies” mentioned here, which usually cover a single catchment or a smaller defined study area; regional thus means spanning several catchments. The spatial extent of such a study area introduces variability in soil-forming factors beyond that driven by landscape-level factors such as catenary position or location within a catchment. We will make this point clearer in the introduction.
  
  The introduction lacks information on the “importance of synthesizing SOC and N stock data from previous work”.
  
  Thank you for this comment. We acknowledge that the introduction could state more explicitly why synthesizing SOC and N stock data from previous work is important. While the introduction highlights the relevance of permafrost-climate feedbacks and the heterogeneity of permafrost landscapes, the rationale for a synthesis would benefit from further clarification. Many of the existing SOC and N measurements in the region were collected for very local studies, some without the intention of regional or digital soil mapping. As a result, these valuable datasets remain fragmented and difficult to compare across scales. By compiling and harmonizing these data, our study creates the first consistent regional-scale dataset of SOC and N stocks for the Yukon Coastal Plain. This dataset provides a much-needed baseline for future studies, including digital soil mapping and model development, as well as studies estimating regional carbon and nitrogen pools.
  
  Importantly, our study not only combines previously published measurements but also incorporates new, previously unpublished data.
  We will incorporate this in the introduction.
  
  “Studies in between those scales are lacking, but necessary when quantifying regional carbon budgets”
  
  – What scale are the authors really trying to resolve here? This needs to be very clear. Did the authors have enough soil samples to adequately resolve this variability with a robust model validation? These points need to be addressed.
  Thank you for pointing this out! We will clarify in the revised version what we mean by regional scale. As mentioned above, we define regional as spanning multiple catchments from a hydrological point of view. From a soil science perspective, the scale of the study corresponds to the soil region (Pachepsky and Hill 2017).
  
  Materials and Methods
  “Soil property data was retrieved from existing publications (Table 1, Fig. 1), harmonised and converted into the depth intervals 0-30 cm and 30-100 cm”
  
  - The authors need to explain how the harmonization was done, especially in relation to sampling (depth, sampling configuration) and laboratory methods. The information provided is not adequate.
  Soil property data were compiled from multiple published sources and harmonized by converting reported values into the target depth intervals of 0–30 cm and 30–100 cm using weighted averaging. The laboratory methods differ: some studies used an elemental analyzer (e.g. Elementar vario EL III and Elementar vario MAX C in Obu et al. 2017), others an elemental analyzer (CE Instrument EA 1110) coupled to an isotope ratio mass spectrometer (Thermo Fisher Scientific Instruments, Delta V Advantage) (Siewert et al. 2021) or to a continuous-flow isotope ratio mass spectrometer (IRMS, DeltaPlus, Finnigan MAT) (Wagner et al. 2023). The timing of the fieldwork also differs: whereas most studies sampled during the summer (mostly July), the DTLB samples were taken as fully frozen cores in spring 2019.
  These differences in sampling (active-layer sampling plus coring vs. full frozen coring), together with the use of different laboratory methods to derive OC content, have likely affected the results. We did not account for these differences when compiling the dataset. However, we can add a methods section summarizing them and note, in the newly proposed discussion chapter on limitations, that they add uncertainty to the data that we have not quantified.
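  The depth harmonization by weighted averaging can be illustrated with the sketch below. It is a minimal example, not the code used in the study: it assumes each profile is a list of (top_cm, bottom_cm, SOC density in kg C m⁻³) layers, weights every layer by its overlap with the target interval, and converts the weighted mean density into a stock. The layer format, function names and the example profile are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layer format: (top_cm, bottom_cm, SOC density in kg C m^-3)
Layer = Tuple[float, float, float]


def overlap_cm(top: float, bottom: float, t_top: float, t_bottom: float) -> float:
    """Thickness (cm) shared by a sampled layer and a target interval."""
    return max(0.0, min(bottom, t_bottom) - max(top, t_top))


def weighted_mean(layers: List[Layer], t_top: float, t_bottom: float) -> float:
    """Overlap-thickness-weighted mean of the layer values within the target interval."""
    weights = [overlap_cm(top, bot, t_top, t_bottom) for top, bot, _ in layers]
    total = sum(weights)
    if total == 0:
        raise ValueError("no sampled layer overlaps the target interval")
    # If the layers cover only part of the interval, the mean of the covered
    # part is implicitly extrapolated to the whole interval.
    return sum(w * value for w, (_, _, value) in zip(weights, layers)) / total


def interval_stock(layers: List[Layer], t_top: float, t_bottom: float) -> float:
    """SOC stock (kg C m^-2) = weighted mean density (kg m^-3) x thickness (m)."""
    return weighted_mean(layers, t_top, t_bottom) * (t_bottom - t_top) / 100.0


profile = [(0, 10, 50.0), (10, 40, 30.0), (40, 100, 20.0)]
print(interval_stock(profile, 0, 30), interval_stock(profile, 30, 100))
```

  For this made-up profile the two stocks are about 11.0 and 15.0 kg C m⁻², which add up to the 26 kg C m⁻² of the full 0-100 cm profile: when the sampled layers cover the whole depth range, thickness weighting conserves the total stock.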
  - How about the correction of data for coarse fraction of the soil?
  Most studies (except Ramage et al. 2019) state that the coarse fraction (> 2 mm) was excluded. The analysis of the DTLB data was coordinated with colleagues to ensure a protocol similar to that used by Wagner et al. (2023). We will add this information to the method description.
  
  2) “In addition to published data, new data from DTLBs that were sampled during a field campaign in April 2019 was added to this synthesis”
  - The DTLB data could have been used to validate the model/published data
  We would like to point to the explanation below (see “Results 3) Independent validation of data would make more trustworthy results.”). We opted for a repeated k-fold cross-validation approach to ensure robust validation. The DTLB dataset does not adequately cover the range of the entire dataset (see Figure 2B). It also covers only one specific landform type and does not represent the full variability of the study region. If an independent dataset were to be created, data-splitting across all datasets would be preferable.
  - Do the DTLB data harmonize with data from other sources?
  The DTLB data were sampled using methods and protocols consistent with established soil science methodology.
  3) “The landcover data (Bartsch et al., 2019b) was converted into binary variables and the dominating classes in the area were selected. Those were “dry to moist prostrate to erect dwarf shrub tundra” (LC_class4) and “moist to wet graminoid prostrate to erect dwarf shrub tundra” (LC_class5)”
                - What about the coverage of the study area by the non-dominant classes (other than LC4 and LC5)? Did the samples not capture other land cover classes? If they did, it would be good to include all land cover classes in the mapping. In this case, I would recommend using either the RandomForest or Ranger package to run the RF algorithm.
  Thank you for pointing this out. In our study, we used the RandomForest package through the caret package. Of the 211 sites in total, 76 fall within landcover class 4 and 48 within class 5; 36 sites are within class 3 and 14 within class 6. Other classes are represented by 10 sites or fewer. We suggest running the analysis again including classes 3 and 6 and the variables suggested below (see next comment).
  
  4) Our study uses the 20 m product, as the 10 m product was not yet available when the analysis was completed.
  - How reasonable is it to map at 10 m resolution when the explanatory variable(s) are at a coarser resolution (20 m)?
  
  As suggested above, we would like to redo the analysis, incorporating more explanatory variables and the updated landcover product at 10 m spatial resolution. Furthermore, variables at only slightly coarser resolutions can still provide useful information. Many digital soil mapping studies include covariates at different resolutions, including coarser ones, and resample them to the target resolution (e.g. Hengl et al. 2021: https://www.nature.com/articles/s41598-021-85639-y, Baltensweiler et al. 2021: https://www.sciencedirect.com/science/article/pii/S2352009421000821 or Deragon et al. 2023: https://cdnsciencepub.com/doi/full/10.1139/cjss-2022-0031?af=R).
  
                5) The authors mention the SCORPAN model. What is the reason for not incorporating spatial autocorrelation in the random forest models, at least simply by including X and Y coordinates? I believe that the models would have improved had this been done!!!
  We will incorporate this if we are given the opportunity to revise the manuscript and run the models again.
  
  6) I believe other factors of the SCORPAN model need to be incorporated in these models, especially climate factors (if variability and data exist) and soil factors (e.g. surface geology); if not, the authors need to explain why these factors were not considered. If these factors had been adequately selected, the differences between the two areas would have been captured in a single model. I would rather argue that different models were needed because of the lack of predictor variables to model the variability of SOC and N across the entire area.
  7) Also, the authors could have used more distance-related variables (e.g. distance to sea) to improve the model.
  
  We agree that using a variable at 20 m introduces some uncertainty into the final map when targeting a resolution of 10 m (point 4). Considering this aspect and points 5-7, we suggest running the models again with location (X and Y) and distance to the sea as additional variables and with the updated landcover product at 10 m resolution (including classes 3 and 6). Additionally, we suggest using the surficial geology (Rampton 1982) and a binary variable for the extent of the ice sheet at the Last Glacial Maximum as additional variables.
  We initially considered climate factors (ground surface temperature and ground temperature at multiple depths, Bartsch et al. 2021). These were only available at a resolution of approx. 1 km and were ultimately not used because of this coarse resolution.
  
                8) Need to elaborate sample numbers available at both sites for mapping.
  
  While we already provided the number of sites, we did not include the actual number of samples. We can add a supplementary table listing the number of samples. In compiling the data, we included all individual sample data from the included studies except for Obu et al. 2017 and Siewert et al. 2021, whose published data had already been aggregated into the depth increments 0-30 and 30-100 cm; these publications report 128 and 409 samples in total, respectively. This table will also include information on the sampling and laboratory analyses of the different studies (including date of the campaign, laboratory method used to measure OC and N content, coring method, etc.).
  
                9) Please see my observation on the biasedness in model comparison (entire area vs individual sites) given under discussion note 3).
  See comment under note 3 in discussion.
  
  Results
  3.1 Data synthesis of regional carbon and nitrogen stocks
  1) Authors need to provide a comparison of Couture et al. 2018 data at both sites, as it is the only sample set distributed across both sites. Otherwise, the comparison results would be biased to analytical and sampling methods.
  Yes, we will separate these data by study area and provide an updated Figure 2B.
  3.2 Random Forest mapping and model validation assessment
  1) The authors need to choose adequate validation indices to explain the precision and accuracy of predictions. Lin’s concordance correlation coefficient could be an additional choice.
  Thank you for the suggestion. We can add this measure when we redo the analysis.
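  For reference, Lin’s concordance correlation coefficient for observed values x and predictions y is ρc = 2·s_xy / (s_x² + s_y² + (x̄ - ȳ)²), computed with population (1/n) variances and covariance. Unlike R², it penalizes systematic bias as well as scatter. A minimal plain-Python sketch; the function name is hypothetical and not taken from any digital soil mapping package:

```python
from statistics import fmean
from typing import Sequence


def lins_ccc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (population moments)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(observed)
    mx, my = fmean(observed), fmean(predicted)
    sxx = sum((x - mx) ** 2 for x in observed) / n
    syy = sum((y - my) ** 2 for y in predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC undefined: both series constant and identical")
    return 2 * sxy / denom


# Perfectly correlated but biased by +1: Pearson r = 1, whereas CCC = 4/7
print(lins_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

  The example shows why the index complements R²: predictions that are perfectly correlated with the observations but offset by a constant still score well below 1.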
  
  2) The model strengths (low R2) do not warrant saying that the predictions are more accurate than those of other studies done at larger scales.
  3) Independent validation of data would make more trustworthy results.
  We agree that independent validation by data-splitting (which creates an independent test dataset and a separate training dataset) could produce more trustworthy results. We chose k-fold cross-validation to maximize the use of all available data for both model training and evaluation, while still obtaining unbiased performance estimates. Cross-validation is the standard evaluation strategy in digital soil mapping when data are scarce, as reflected in prior work. According to the review by Piikki et al. (2021, https://bsssjournals.onlinelibrary.wiley.com/doi/10.1111/sum.12694), cross-validation was the most commonly used method, applied in 43 % of the studies considered, whereas data-splitting (which we could have used) was applied in 31 %.
  We would like to cite the following passage from this review: “Data-splitting is also a problem in studies with relatively few samples, because models created by a smaller number of observations can be less accurate, and validation in that case can underestimate the accuracy of the mapping (when all data are used). Cross-validation produces much more stable results because it uses all data for validation and should be preferred over data-splitting.”
  After carefully considering this aspect, we decided to apply a repeated k-fold cross-validation approach.
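  The repeated k-fold scheme can be illustrated with the following sketch. It generates the index splits (each site is held out exactly once per repetition, with a new random partition in every repetition) and scores a deliberately trivial mean-only baseline in place of the random forest. The values of k, the number of repetitions, the seed and all function names are hypothetical choices for illustration.

```python
import math
import random
from statistics import fmean
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold(n: int, k: int, repeats: int, seed: int = 0
                   ) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    rng = random.Random(seed)
    for rep in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                       # new random partition per repeat
        folds = [idx[i::k] for i in range(k)]  # k folds of (nearly) equal size
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
            yield rep, train, test


def cv_rmse_mean_baseline(y: Sequence[float], k: int = 10, repeats: int = 5) -> float:
    """Average test RMSE of a stand-in model that always predicts the training mean."""
    scores = []
    for _, train, test in repeated_kfold(len(y), k, repeats):
        prediction = fmean(y[j] for j in train)
        scores.append(math.sqrt(fmean((y[j] - prediction) ** 2 for j in test)))
    return fmean(scores)
```

  With the 211 sites mentioned above, `repeated_kfold(211, k=10, repeats=5)` yields 50 train/test splits, and the performance metric is averaged over all of them; repeating the partitioning reduces the dependence of the estimate on a single random split.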
  3.3. Area of Applicability (AOA) and uncertainty with quantile regression forest
  1) It is good that the authors used AoA for their analysis. But the issue here is that it does not assess the overall accuracy of prediction. Non-AoA areas only identify those combinations of the predictor variable space not adequately captured by the RF models, a reflection of the inadequacy of the samples to capture the attribute space of the predictor variables. The authors need to discuss this point clearly.
  We are aware that the AOA only evaluates whether the feature space is adequately covered by the sample data and does not assess the uncertainty of the model. We therefore used quantile regression forest to estimate uncertainty. Nevertheless, combining both methods is valuable for assessing the results. We will clarify this better in the discussion.
  2) Based on the Table 5, what is the reasonable estimate of average SOC and N stocks in both areas?
  The SOC and N stock values reported in the Abstract and Conclusions were derived from individual random forest models for each site, excluding the AOA-based results due to the method’s limitations. Nevertheless, we calculated the AOA-based estimates for comparative purposes and to discuss associated uncertainties, as presented in Table 5.
  
  Discussion
  1) “Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
                - This should be the strongest point of this study, and the paper should have been written to highlight this fact. It is unfortunate that the authors have instead focused more on mapping and assessing SOC and N stocks and comparing them with previous studies.
  We will give this point more focus in the proposed new discussion subchapter on limitations.
  
  2) “It is therefore advisable to analyse the values of the target variable at the sampling locations and the diversity of the landscape to ensure that the spatial variability in the landscape is reflected by the sampled sites”.
                - This statement needs more detail.
  
  We agree that this statement is not clear. Here we refer to the use of the AoA (the statement that follows this one), which places the covariate values at the sampling locations in the context of the whole area and shows where the feature space of the covariates is not covered by the soil sampling locations. This information can be used to plan future sampling campaigns that add soil sampling locations in areas where the feature space is not covered. We will clarify this in the revised version and connect the statements.
  3) “The AOA method can be used to assess whether the heterogeneity of the landscape, ideally mirrored in the covariates, is captured by the sampling locations. Areas where this is not the case can be excluded from regional estimates and could be further used to determine new sampling sites for future field sampling campaigns”
                - Good point, but improving the accuracy of the prediction model is also an important aspect to consider.
  Yes, we agree and will emphasize this in the added chapter about the limitations of the study.
  4.2 Spatial mapping of carbon and nitrogen stocks with random forest, area of applicability and uncertainty
  1) “Our analysis shows a substantial challenge in bridging from local- to regional-scale study areas”
                - A very important point to capitalize on in this study, rather than trying to estimate SOC and N stocks using an inadequate database.
  Yes, we will emphasize this in the discussion.
  Line 247 needs to be corrected to R2
  
  We will correct this typo.
  The discussion given in L245-253 is quite confusing to me. I would rather think this is a biased comparison of two models.
  
  We agree that this section is somewhat unclear and mixes scale, soil heterogeneity, and predictor quality. Here we compared the R2 values of separate models for “all data”, “mainland” and “Herschel Island” at 0-30 and 30-100 cm depth, six models in total, which are not directly comparable because they differ in the number of training points and in the covariates selected by each model.
  The main aim of this paragraph was to point out that heterogeneity persists across scales in digital soil mapping, and we propose changing the paragraph to the following:
  “Our analysis highlights the challenge of scaling from local (pedon) to regional study areas. At the pedon scale, permafrost soils are highly heterogeneous (Siewert et al., 2021), and this variability is still evident at coarser spatial scales. The models for the entire area explain only a small fraction of spatial heterogeneity in SOC and N stocks (R² = 0.17 - 0.24), while models for the mainland perform slightly worse (R² = 0.11 - 0.18). In contrast, models for Herschel Island show higher R² values for SOC (0.28 - 0.35), which could be due to a combination of factors: (1) lower natural soil variability on Herschel Island, (2) better representation of the soil data, or (3) stronger relationships between the predictors and SOC at this site.”
  
  The model developed for the entire area has been cross-validated using the whole dataset, but the two models developed for the two sites have been validated using the individual datasets of these sites. The question therefore arises whether the validation indices of the RF model for the entire area would be the same if they were calculated separately for the two sites. The authors just need to separate the validation results of the model for the entire area into the mainland and the Island and provide R2, RMSE, MEE values.
  This aspect has been considered above under Overall quality point 3. In the revised version we will provide the validation results partitioned per site.
  
  L263 (Fig. 7a and Fig. 1b?)
  
  This is an error and should read (Fig. 7 1a and 1b). There is a further mistake at the end of the same line, which will be corrected to (Fig. 7 2a and 2b).
  L260-268: I agree that there is a difference in the spatial distribution of SOC stocks on the mainland when RF models are trained using either all data or mainland data. But how should we justify that these spatial differences also reflect differences in prediction accuracy? To do so, as I mentioned under point 3), validation results for the mainland data should be shown for both RF models, the one developed using all data and the one developed using mainland data. The same argument applies to Herschel Island.
  
  As already mentioned under point 3), in the revised version we will provide the validation results partitioned per site.
  4.3 Comparing local scale results to regional scale synthesis
                It would be good to use independent validation for such a comparison. But, rather I doubt that the available data are adequate!!!
  As mentioned above, to make use of the full dataset, we decided to apply a repeated k-fold cross-validation approach to ensure robust validation.
  
  Conclusion
  
  Based on the analysis the study should conclude the most acceptable estimates of SOC and N Stocks. Difficult to understand whether authors have incorporated AoA analysis, and QRF analysis when concluding the work.
  We will clarify in the revised version on which method the concluding numbers are based.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1052-AC1
RC2:
'Comment on egusphere-2025-1052', Anonymous Referee #1, 09 May 2025

Good attempt!!! Just got a couple of questions from this work.
1) It's not much clear what techniques are used to synthesise already collected data. Can you describe them
2) Does the AoA analysis also assess the prediction accuracy of models?
3) Since prediction models are not quite strong, how would you confidently say that estimated SOC and N stocks are superior to already reported values?
4) I could not link the SOC and N Stock values listed in the abstract with the results.
5) How did you tackle very poor sample distribution that came out from isolated studies when developing prediction models? Did you try data declustering approaches?
6) Application of geostatistical analysis would improve interpretations of short scale variability at two sites.

Citation: https://doi.org/10.5194/egusphere-2025-1052-RC2
- AC2: 'Reply on RC2', Julia Wagner, 20 Aug 2025
  
  Dear anonymous reviewer,
  We sincerely appreciate the constructive feedback above on our manuscript. Below we answer the additional questions you raised about the study.
  
  Kind regards on behalf of all listed authors,
  Julia Wagner
  
  Good attempt!!! Just got a couple of questions from this work.
  1) It's not much clear what techniques are used to synthesise already collected data. Can you describe them
  The study synthesizes existing datasets by assembling previously collected measurements from various sources into a single dataset and converting them to standard depth intervals (0–30 cm and 30–100 cm) for consistency in subsequent analyses. Researchers working on permafrost soil carbon, especially in this region (including co-authors of this study), are well connected and follow standardized protocols for sampling permafrost soils (e.g. Ping et al. 2013: https://acsess.onlinelibrary.wiley.com/doi/epdf/10.2136/sh12-09-0027 and Palmtag et al. 2022: https://essd.copernicus.org/articles/14/4095/2022/).
  2) Does the AoA analysis also assess the prediction accuracy of models?
  The AoA analysis itself does not directly measure prediction accuracy. It measures the similarity of new data to the training data, with the DI threshold defined in a way that is consistent with the model’s cross-validation process.
  The AoA can be used alongside the accuracy measures. Accuracy metrics (e.g., RMSE, R²) quantify how well the model performs on known data. The AoA then quantifies where those performance expectations are likely to hold, based on predictor similarity.
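  A simplified sketch of the dissimilarity index (DI) and the AoA threshold, following our reading of Meyer and Pebesma (2021), may make this distinction clearer. Predictors are standardized with the training statistics (optionally weighted, e.g. by variable importance); the DI of a point is its distance to the nearest training point divided by the mean pairwise distance between training points; and the threshold is taken from the cross-validated DI of the training points (nearest neighbour outside the point’s own fold) using the boxplot upper-whisker rule. This is an illustration in plain Python, not the CAST implementation: quartile conventions and other details may differ, and all names are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def _standardize(train: List[Point], new: List[Point],
                 weights: Optional[Sequence[float]] = None
                 ) -> Tuple[List[List[float]], List[List[float]]]:
    """Scale each predictor with the training mean/sd, then apply optional weights."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    w = list(weights) if weights is not None else [1.0] * len(cols)

    def scale(row: Point) -> List[float]:
        return [wi * (v - m) / s for v, m, s, wi in zip(row, means, sds, w)]

    return [scale(r) for r in train], [scale(r) for r in new]


def dissimilarity_index(train: List[Point], folds: Sequence[int], new: List[Point],
                        weights: Optional[Sequence[float]] = None):
    """Return (DI of new points, AoA threshold, inside-AoA flags); needs >= 2 folds."""
    tr, nw = _standardize(train, new, weights)
    # Mean Euclidean distance between all pairs of training points
    d_bar = statistics.fmean(math.dist(a, b) for i, a in enumerate(tr) for b in tr[i + 1:])
    # DI of new points: distance to the nearest training point, relative to d_bar
    new_di = [min(math.dist(x, t) for t in tr) / d_bar for x in nw]
    # Cross-validated training DI: nearest training point from a different fold
    train_di = [min(math.dist(a, b) for b, fb in zip(tr, folds) if fb != fa) / d_bar
                for a, fa in zip(tr, folds)]
    q1, _, q3 = statistics.quantiles(train_di, n=4, method="inclusive")
    upper = q3 + 1.5 * (q3 - q1)
    threshold = max(d for d in train_di if d <= upper)  # upper whisker of the boxplot
    return new_di, threshold, [d <= threshold for d in new_di]
```

  Note that nothing in this calculation involves the target variable: the DI only describes how far a location lies from the sampled predictor space, which is why it has to be combined with cross-validated accuracy metrics rather than replace them.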
  
  3) Since prediction models are not quite strong, how would you confidently say that estimated SOC and N stocks are superior to already reported values?
  The estimates presented in this study are based on the most up-to-date dataset currently available for the region. While we do not claim that our results represent a definitive (or even correct) assessment, they provide the first estimates of SOC and N stocks derived from a digital soil mapping approach using random forest modelling for this area. As such, they offer a robust starting point for further analyses, targeted field campaigns, and methodological refinements. It is also important to note that many previously reported values rely on generalized landform or landscape class-based approaches, whereas our analysis uses continuous predictor variables and a spatially explicit modelling framework, providing finer resolution and potentially greater relevance for site-specific applications.
  4) I could not link the SOC and N Stock values listed in the abstract with the results.
  These values were calculated by summing the predicted raster values for 0-30 cm and 30-100 cm in a mosaic that combines the individual predictions for Herschel Island and the mainland, since we concluded that separate models are needed for each area because of their geological differences. We will add an explanation to the results. If we carry out the suggested reanalysis, we will update these numbers.
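  The mosaicking and summing step can be sketched as follows, under stated assumptions: both models predict on the same grid with equal-area pixels, a boolean mask marks Herschel Island pixels, missing predictions are `None`, and the regional value is taken as the mean (with pixel standard deviation) of the per-pixel 0-100 cm sums. These assumptions, the function names and the toy grids are hypothetical and not taken from the manuscript.

```python
from statistics import fmean, pstdev
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]  # None marks pixels without a prediction


def mosaic_0_100(main_0_30: Grid, main_30_100: Grid,
                 isl_0_30: Grid, isl_30_100: Grid,
                 island_mask: List[List[bool]]) -> Grid:
    """Per-pixel 0-100 cm stock: island model where the mask is True, else mainland model."""
    out: Grid = []
    for r, mask_row in enumerate(island_mask):
        row: List[Optional[float]] = []
        for c, is_island in enumerate(mask_row):
            top, sub = ((isl_0_30[r][c], isl_30_100[r][c]) if is_island
                        else (main_0_30[r][c], main_30_100[r][c]))
            row.append(None if top is None or sub is None else top + sub)
        out.append(row)
    return out


def summarize(stock: Grid) -> Tuple[float, float]:
    """Spatial mean and pixel SD over valid pixels (assumes equal pixel areas).

    The SD describes the spread of predicted pixel values; it is not a standard
    error of the regional mean, because neighbouring pixels are autocorrelated.
    """
    values = [v for row in stock for v in row if v is not None]
    return fmean(values), pstdev(values)


# Toy 2 x 2 example (kg C m^-2)
stock = mosaic_0_100([[20.0, 25.0], [None, 30.0]], [[30.0, 35.0], [None, 40.0]],
                     [[10.0, 10.0], [10.0, 10.0]], [[15.0, 15.0], [15.0, 15.0]],
                     [[False, False], [True, False]])
print(stock, summarize(stock))
```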
  
  5) How did you tackle very poor sample distribution that came out from isolated studies when developing prediction models? Did you try data declustering approaches?
  Models were trained using the available datasets in their collected form, which we acknowledge contained uneven sample distributions due to the isolated nature of the source studies. Indeed, the AOA analysis identifies the areas where the feature space is adequately covered by the sample data. But given the manuscript’s focus on exploring the use of local data to bridge scales, we considered our approach acceptable. If we are given the chance to revise the manuscript, which includes our proposed rerun of the models, we propose using spatial cross-validation instead of a purely random cross-validation approach to account for the spatially uneven distribution of the training data. We recommend that future studies explore declustering approaches to improve representativeness.
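  One common way to implement such spatial cross-validation is to assign folds by spatial blocks, so that nearby (and therefore similar) sites never end up on both sides of a split. A minimal sketch, assuming projected coordinates in metres; the block size, k, seed and function name are hypothetical, and this is not necessarily the scheme that will be used in the revision. In practice the block size would be chosen relative to the spatial autocorrelation range of the data.

```python
import math
import random
from typing import List, Sequence, Tuple


def spatial_block_folds(coords: Sequence[Tuple[float, float]], block_size: float,
                        k: int, seed: int = 0) -> List[int]:
    """Assign each site to one of k folds by the square spatial block it falls in."""
    def block(x: float, y: float) -> Tuple[int, int]:
        return math.floor(x / block_size), math.floor(y / block_size)

    blocks = sorted({block(x, y) for x, y in coords})
    rng = random.Random(seed)
    rng.shuffle(blocks)  # random, but reproducible, order of blocks
    block_fold = {b: i % k for i, b in enumerate(blocks)}  # round-robin over blocks
    # All sites in the same block share a fold, so they are held out together
    return [block_fold[block(x, y)] for x, y in coords]
```

  The resulting fold labels replace the purely random partition: whole blocks are held out at a time, so the cross-validated error reflects prediction into less densely sampled parts of the landscape rather than interpolation between neighbouring cores.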
  
  6) Application of geostatistical analysis would improve interpretations of short scale variability at two sites.
  While we acknowledge the suggestion to include the spatial location in the random forest modelling framework to account for spatial autocorrelation, we wish to maintain our focus on a machine-learning-based approach, as this aligns with the objectives of our study and follows previous research from this region that emphasizes the exploration of machine learning methods for predicting soil parameters in permafrost regions (e.g. Siewert et al. 2021).
  
  Citation: https://doi.org/10.5194/egusphere-2025-1052-AC2
RC3:
'Comment on egusphere-2025-1052', Anonymous Referee #2, 30 Jun 2025

The paper is well-written, but the novelty is not clearly demonstrated. The authors discuss the upscaling from local to regional scale as one of the issues, but this is common to all mapping exercises of soil properties. The use of digital soil mapping is well-known and it is not clear what the specific research gap is. Surely, these permafrost areas play an important role in the global carbon cycle, but as it stands, I can mainly see the local interest. The methodology is not clearly explained and particularly details on area of interest and dissimilarity index are lacking. More importantly, I am not sure that the stocks and their standard deviations are correctly calculated. If they are calculated based on the calibration points, they do not necessarily represent the spatial patterns correctly. If they are calculated on pixel values, there is a methodological flaw, as the spatial auto correlation is not accounted for. The lack of clarity is the main reason that I was not able to review the discussion section.

Lines 39-42 I am not sure that I understand the problem of scalability. In particular because you mention that you use the same data sets. Would not then the local scale simply be a cut-out of the regional scale?
Lines 53 -57 Digital soil mapping also depends on the scale at which the co-variates are available. I am not sure why this widely applied methodology is presented as the solution to the scale problem.
Line 100_107 The sampling protocols of the previous campaigns are explained, but there is no information on the coring devices, bulk density measurements or C analysis. It is tricky to use the SOC stocks from different campaigns in a joint data analysis.
Figure 5 The way that the area of applicability and dissimilarity index are calculated is missing in the materials and methods section. Please add a short description including equations.
Section 3.2 Were the average stocks calculated on the calibration data set?
Section 3.3 Here you also calculate the standard deviation. Is this based on the calibration data set? You mention model results according to the AOA. Are you sure that there are enough sample points for the results to be meaningful? I hope that you did not calculate the std based on the pixel estimates, as these are spatially auto correlated. I start doubting when I see table 5. Please give the number of calibration points in this table.
Line 247 What are the ‘m^-2’ values?

Citation: https://doi.org/10.5194/egusphere-2025-1052-RC3
- AC3: 'Reply on RC3', Julia Wagner, 20 Aug 2025
  
  Dear anonymous reviewer,
  We sincerely appreciate the constructive feedback on our manuscript. Should we be granted the opportunity to submit a revised version, we are confident in our ability to address all the concerns raised and to substantially improve the quality of the manuscript. Below, we provide a detailed plan outlining our proposed responses to each comment.
  
  Kind regards on behalf of all listed authors,
  Julia Wagner
  
  The paper is well-written, but the novelty is not clearly demonstrated. The authors discuss the upscaling from local to regional scale as one of the issues, but this is common to all mapping exercises of soil properties. The use of digital soil mapping is well-known and it is not clear what the specific research gap is. Surely, these permafrost areas play an important role in the global carbon cycle, but as it stands, I can mainly see the local interest. The methodology is not clearly explained and particularly details on area of interest and dissimilarity index are lacking. More importantly, I am not sure that the stocks and their standard deviations are correctly calculated. If they are calculated based on the calibration points, they do not necessarily represent the spatial patterns correctly. If they are calculated on pixel values, there is a methodological flaw, as the spatial auto correlation is not accounted for. The lack of clarity is the main reason that I was not able to review the discussion section.
  
  The main points of concern are the following: (1) novelty and research gap, (2) global relevance beyond local interest, and (3) methodological flaws.
  Because the permafrost region is (a) remote, (b) data-poor, (c) logistically extremely challenging, and (d) spans many countries, it is unlikely that systematic data collection designed for broad-scale analyses will occur (as is done elsewhere via national soil surveys). We therefore have to rely on data aggregated from various sources, each designed for specific local-scale analyses. We wish to assess the possibilities of bridging from local to regional settings under these circumstances using DSM. While the maps are regionally specific, the lessons learned, e.g. the challenges of assessing unsampled regions gleaned from the AOA analyses, are likely to apply to other permafrost tundra regions.
  Lines 39-42 I am not sure that I understand the problem of scalability. In particular because you mention that you use the same data sets. Would not then the local scale simply be a cut-out of the regional scale?
  We are not treating the local scale as a simple cut-out of the regional map. By the term local, we refer to the individual studies that created the datasets included in our study (e.g. Wagner et al. 2023 or Obu et al. 2017). For local models, only the dataset from that specific area is used for training and prediction. For the regional model, however, we combine multiple datasets from across the study area. This means the two approaches are not nested, but instead represent different ways of using the available data. In addition, we explore whether information from sampled regions can be transferred to unsampled areas, and where model applicability breaks down.
  Lines 53 -57 Digital soil mapping also depends on the scale at which the co-variates are available. I am not sure why this widely applied methodology is presented as the solution to the scale problem.
  Here we do not present DSM as the definitive solution to the scale problem; rather, we use DSM to map SOC and N stocks over a larger area than previous studies. With emerging technologies, more high-resolution datasets suitable for DSM are being developed for Arctic regions, which may help to partially address, but not eliminate, scale-related challenges. DSM is not as widely applied in Arctic regions as in other regions of the world. Traditionally, studies upscaled using thematic maps, for example soil maps, landcover classes or geological classes. To date, studies have applied DSM at the pan-Arctic scale using the still sparse pan-Arctic soil data, while for Arctic Canada only very local DSM studies exist (e.g. Wagner et al. 2023).
  In contrast, more regional studies exist for Alaska owing to higher data availability (Mishra and Riley 2012 and, more recently, Minai et al. 2025 and Ainuddin et al. 2024).
  Line 100_107 The sampling protocols of the previous campaigns are explained, but there is no information on the coring devices, bulk density measurements or C analysis. It is tricky to use the SOC stocks from different campaigns in a joint data analysis.
  
  Reviewer 1 raised a similar concern. We will provide a table in the supplement with information on the sampling and laboratory analyses of the different studies (including date of the campaign, laboratory method used to measure OC and N content, coring method, number of samples, etc.). Furthermore, we would like to mention that the permafrost soil research community applies well-established common protocols to ensure data comparability.
  
  Figure 5 The way that the area of applicability and dissimilarity index are calculated is missing in the materials and methods section. Please add a short description including equations.
  To keep the manuscript concise, we refer to the publication by Meyer and Pebesma (2021). However, we can add a summary to the supplement.
  Section 3.2 Were the average stocks calculated on the calibration data set?
  The average SOC/TN stocks presented in this section refer to the spatially averaged stocks calculated from the gridded output maps. This method of calculating mean landscape SOC/N stocks is consistent with the approach taken by earlier studies in this region (shown in Table 3). We do not present stocks calculated as the arithmetic mean of the available soil pedon observations. We suggest expanding Table 5 to display the mean SOC and TN stocks from the soil pedon data next to the mean from the predicted output maps.
  To validate the models, we used repeated k-fold cross-validation. Because of the limited number of data points, we did not set aside a subset for independent validation, so that the full dataset could be used for model training.
  Section 3.3 Here you also calculate the standard deviation. Is this based on the calibration data set? You mention model results according to the AOA. Are you sure that there are enough sample points for the results to be meaningful? I hope that you did not calculate the standard deviation based on the pixel estimates, as these are spatially autocorrelated. Table 5 makes me doubt this. Please give the number of calibration points in this table.
  Standard deviations are based on the pixel values from the gridded output maps. We applied the AOA method to evaluate where the predicted results are meaningful. Following Meyer and Pebesma (2021), this method assesses the feature space of the predictors and identifies areas where it is not covered by the sampling locations, i.e. areas where the cross-validated model accuracy is not valid. Table 4 lists the number of samples on which each model is based. We did not set aside a subset of the data for independent validation; the models are based on all available data and evaluated using repeated k-fold cross-validation. K-fold cross-validation splits the data into k subsets, trains the model on k-1 of them and tests it on the remaining one, repeating this process k times so that every subset is used once for testing. In repeated k-fold cross-validation, the whole procedure is run several times with different random splits to evaluate model performance more reliably.
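
  The repeated k-fold procedure described above can be sketched as follows. This is a generic illustration, not the authors' code: `repeated_kfold_rmse`, `fit_ols` and `predict_ols` are hypothetical names, and an ordinary least-squares model stands in for the random forest purely to keep the example self-contained; any learner with a fit/predict pair could be plugged in.

```python
import numpy as np


def repeated_kfold_rmse(X, y, fit, predict, k=5, repeats=10, seed=0):
    """Repeated k-fold cross-validation returning one RMSE per held-out fold.

    fit(X_train, y_train) -> model and predict(model, X_test) -> y_hat are
    placeholders for any regression learner (e.g. a random forest).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):
        # new random partition of the data into k folds for every repeat
        folds = np.array_split(rng.permutation(n), k)
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[train_idx], y[train_idx])
            resid = y[test_idx] - predict(model, X[test_idx])
            scores.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(scores)


def fit_ols(X, y):
    # stand-in learner: least squares with an intercept column
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef
```

  Each repeat reshuffles the data before splitting, so the spread of the k × repeats RMSE values reflects both model error and sensitivity to how the sampling points are partitioned.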
  
  Line 247 What are the ‘m^-2’ values?
  Thank you for pointing this out. This is a typo and will be corrected to R².
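
  Assuming the corrected label refers to the coefficient of determination, the sketch below shows the usual definition, R² = 1 - SS_res/SS_tot, computed from observed and predicted values (e.g. held-out predictions from cross-validation). The function name `r_squared` is hypothetical; some software reports the squared Pearson correlation between observed and predicted values instead, which can give a different number.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```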
  
  Citation: https://doi.org/10.5194/egusphere-2025-1052-AC3

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (08 Sep 2025) by Bas van Wesemael

AR by Julia Wagner on behalf of the Authors (14 Dec 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (18 Dec 2025) by Bas van Wesemael

RR by Anonymous Referee #2 (09 Jan 2026)

ED: Publish as is (09 Jan 2026) by Bas van Wesemael

ED: Publish as is (12 Jan 2026) by Rémi Cardinael (Executive editor)

AR by Julia Wagner on behalf of the Authors (19 Jan 2026) Manuscript

Journal article(s) based on this preprint

10 Feb 2026

Challenges in the use of local data for regional scale mapping of C and N stocks in the continuous permafrost zone at the Yukon Coastal Plain

Julia Wagner, Juliane Wolter, Justine Ramage, Victoria Martin, Andreas Richter, Niek Jesse Speetjens, Jorien E. Vonk, Rachele Lodi, Annett Bartsch, Michael Fritz, Hugues Lantuit, and Gustaf Hugelius

SOIL, 12, 113–132, https://doi.org/10.5194/soil-12-113-2026, 2026

Supplement

https://doi.org/10.5194/egusphere-2025-1052-supplement

Viewed

Total article views: 3,285 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
2,629	560	96	3,285	245	113	144

Views and downloads (calculated since 12 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	124	40	10	174
Apr 2025	86	20	6	112
May 2025	90	24	6	120
Jun 2025	122	14	6	142
Jul 2025	82	42	6	130
Aug 2025	282	34	14	330
Sep 2025	1,156	8	12	1,176
Oct 2025	74	20	4	98
Nov 2025	100	42	4	146
Dec 2025	66	48	6	120
Jan 2026	138	76	10	224
Feb 2026	140	60	4	204
Mar 2026	88	42	2	132
Apr 2026	42	48	2	92
May 2026	29	21	1	51
Jun 2026	8	7	0	15
Jul 2026	2	14	3	19

Viewed (geographical distribution)

Total article views: 3,281 (including HTML, PDF, and XML). Of these, 3,281 have a defined geography and 0 are of unknown origin.

Latest update: 23 Jul 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

Permafrost soils store vast amounts of organic carbon, key to understanding climate change. This study uses machine learning and combines existing data with new field data to create detailed regional maps of soil carbon and nitrogen stocks for the Yukon Coastal Plain. The results show how soil properties vary across the landscape, highlighting the importance of data selection for accurate predictions. These findings improve carbon storage estimates and may aid regional carbon budget assessments.

