the Creative Commons Attribution 4.0 License.
Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast
Abstract. Permafrost soils are particularly vulnerable to climate change. To assess and improve estimations of carbon (C) and nitrogen (N) budgets it is necessary to accurately map soil carbon and nitrogen in the permafrost region. In particular, soil organic carbon (SOC) stocks have been predicted and mapped by many studies from local to pan-Arctic scales. Several studies have been carried out at the Canadian Beaufort Sea coast, though no regional synthesis of terrestrial carbon stocks based on spatial modelling has been conducted yet. This study synthesises available field data from the Canadian coastal plain and uses it to map regional SOC and N stocks using the machine learning algorithm random forest and environmental variables based on remote sensing data. We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks. We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island. We assessed performance of different random forest models by using the Area of Applicability (AOA) method. We further applied the quantile regression forest method to the mainland and Qikiqtaruk Herschel Island models for SOC stocks and compared the results with the AOA method. Our results indicate that not only the selection of data is crucial for the resulting maps, but also the chosen covariates, which were picked by the models as most important. The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2. The average SOC stocks vary significantly when including or excluding data in the predictive models. Qikiqtaruk Herschel Island is geologically different from the coastal mainland and has lower SOC stocks. Including Qikiqtaruk Herschel Island soil data to predict SOC stocks at the mainland has large impact on the results. 
Differences in N stocks were not as dependent on the location as SOC stocks and rather differences between individual studies occurred. The results of the separate models show 36.2 ± 5.7 kg C m−2 and 2.66 ± 0.39 kg N m−2 for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m−2 and 2.17 ± 0.50 kg N m−2 for the mainland. Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data.
Status: open (extended)
RC1: 'Comment on egusphere-2025-1052', Anonymous Referee #1, 09 May 2025
The manuscript addresses a few key areas:
- Synthesis of SOC and N stocks data from previous research
- Upscaling of assimilated local data for detailed mapping of SOC and N stocks at fine resolution (10 m). (local to regional scale prediction)
Overall quality:
In terms of the research questions it tries to address, this is a good study, and the authors have answered them to some extent. Despite that, several key weaknesses have reduced the scientific rigor of this study. It needs a thorough revision with a focus on the limitations and potential of using local data sets in the spatial characterization of SOC and N stocks. In this context, some of the weaknesses noted are listed below.
- Priority is given to estimating SOC and N stocks, even though the data set used in this study is quite weak in a spatial context.
- The prediction models have low prediction accuracy. This could be due to inadequate sample distribution and/or inadequate explanatory variables in the RF models. These aspects need to be clearly understood.
- The comparison of the random forest models developed using the “entire data set” and by “dividing dataset into two data sets (mainland and island)” looks biased. The validation results of the RF model developed using the “entire data set” need to be reported separately for the mainland and the island. Only then can we properly understand the prediction accuracies, as pooling the data would have resulted in poor performance of the RF model. I have indicated this weakness in my comments on the discussion part.
- I wish to quote the following: “Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
- This should be the strongest point of this study, and the manuscript should have been written to highlight it. Instead, it is unfortunate that the authors have focused more on the mapping and assessment of SOC and N stocks (with weak models) while comparing them with previous studies.
- More attention needs to be given to the accuracy and precision of predictions when reporting research outcomes and drawing conclusions.
- The use of AoA analysis is commendable, but it needs to be integrated with the accuracy and precision of predictions.
Title “Regional synthesis and mapping of soil organic carbon and nitrogen stocks at the Canadian Beaufort coast”
I noticed that the findings are not well aligned with the title. By regional synthesis, the authors mean the compilation of research data from previous work in the area. I believe that, rather than a simple compilation, the authors need to look at data harmonization (laboratory methods) and spatial harmonization of the already collected data when synthesizing them, to enable detailed mapping. The mapping part can hardly be a focus here because of a key limitation of the study, i.e. the poor fit of the random forest models for the entire area as well as for the individual areas. Thus, it is misleading to suggest that this work has accomplished a successful mapping task compared to previous studies. I would rather expect a title such as “Challenges in the use of local data for regional scale mapping of C and N stocks in a continuous permafrost zone of the Yukon Coastal Plain”
Abstract
- Needs to be rewritten with a clear flow of research questions, objectives and results. The sequence of presentation is often weak.
- “We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks”
- What is meant by local differences? Is it the differences between the “coastal lowland area” and “Herschel Island”? The local differences need to be defined in relation to the spatial scale!
- “We mapped SOC and N stocks for the entire region and provide separate models for the coastal mainland area and Qikiqtaruk Herschel Island”
- Normally we do modelling and then mapping, not mapping and then modelling!
- “Our results indicate that not only the selection of data is crucial for the resulting maps, but also the chosen covariates, which were picked by the models as most important”
- This statement needs to be justified with data from the research.
- “The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2”
- I suppose these are average values. The authors need to explain why these values are more accurate than previously reported values, given that the accuracy and precision of the random forest models developed for this study are not strong enough because of weaknesses in the soil database.
- “The average SOC stocks vary significantly when including or excluding data in the predictive models”
- What is meant by including and excluding data in the models? Theoretically, it has to be so! The authors must be very specific about which data are meant: dependent or independent variables? What is the key message intended for the reader?
- “Our results diverge from previous studies of lower resolution, showing the added regional-scale accuracy and precision that can be achieved at intermediate resolution and with sufficient field data”
- This statement needs a more robust analysis with a strong data set, comparing predicted and actual values for the models used in this study and in other studies.
- “The estimated SOC stock for the upper metre is 56.7 ± 5.6 kg m−2 and the N stock 2.19 ± 0.51 kg m−2.” “The results of the separate models show 36.2 ± 5.7 kg C m−2 and 2.66 ± 0.39 kg N m−2 for Qikiqtaruk Herschel Island and 57.2 ± 4.5 kg C m−2 and 2.17 ± 0.50 kg N m−2 for the mainland”
- It is not clear how these values were calculated. Was it after considering the AoA analysis?
Introduction
- The authors need to clearly identify what is meant by “regional scale” in the context of different mapping scales. This is not clear, even though the title uses the term “Regional Synthesis”.
- The introduction lacks information on the importance of synthesising SOC and N stock data from previous work.
- “Studies in between those scales are lacking, but necessary when quantifying regional carbon budgets”
- What scale are the authors really trying to resolve here? This needs to be very clear. Did the authors have enough soil samples to adequately resolve this variability with a robust model validation? These points need to be addressed.
Materials and Methods
- “Soil property data was retrieved from existing publications (Table 1, Fig. 1), harmonised and converted into the depth intervals 0-30 cm and 30-100 cm”
- The authors need to explain how the harmonization was done, especially in relation to sampling (depth, sampling configuration) and laboratory methods. The information provided is not adequate.
- What about correcting the data for the coarse fraction of the soil?
2) “In addition to published data, new data from DTLBs that were sampled during a field campaign in April 2019 was added to this synthesis”
- The DTLB data could have been used to validate the model or the published data.
- Are the DTLB data harmonized with the data from other sources?
3) “The landcover data (Bartsch et al., 2019b) was converted into binary variables and the dominating classes in the area were selected. Those were “dry to moist prostrate to erect dwarf shrub tundra” (LC_class4) and “moist to wet graminoid prostrate to erect dwarf shrub tundra” (LC_class5)”
- What about the parts of the study area covered by non-dominant classes (other than LC_class4 and LC_class5)? Did the sample coverage not capture other land cover classes? If it did, it would be good to include all land cover classes in the mapping. In this case, I would recommend using either the randomForest or the ranger package to run the RF algorithm.
4) “Our study uses the 20m product, as the 10m product was not available yet when the analysis was completed.”
- How reasonable is it to map at 10 m resolution when the explanatory variables are at a coarser resolution (20 m)?
5) The authors mention the SCORPAN model. What is the reason for not incorporating spatial autocorrelation into the random forest models, at least by simply including the X and Y coordinates? I believe the models would have improved if this had been done.
6) I believe other factors of the SCORPAN model need to be incorporated in these models, especially climate factors (if variability and data exist) and soil factors (e.g. surface geology). If not, the authors need to explain why other factors were not considered. If these factors had been adequately selected, the differences between the two areas could have been captured in a single model. I would rather argue that different models were needed because of the lack of predictor variables to model the variability of SOC and N across the entire area.
7) The authors could also have used more distance-related variables (e.g. distance to sea) to improve the model.
8) The sample numbers available at both sites for mapping need to be elaborated.
9) Please see my observation on the bias in the model comparison (entire area vs individual sites) given under discussion note 3).
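The harmonization questions raised under the first point of this section (conversion to the 0-30 cm and 30-100 cm intervals, coarse-fraction correction) can be made concrete with a short example. The following Python sketch is only an illustration under stated assumptions, not the authors' procedure: it assumes each sampled layer has a homogeneous SOC content, that bulk density refers to the whole soil, and that `coarse_fraction` is the share of that soil made up of coarse fragments. All function names and profile values are hypothetical.

```python
def layer_stock_per_cm(soc_fraction, bulk_density_g_cm3, coarse_fraction=0.0):
    """SOC stock contributed by 1 cm of a layer, in kg m-2 per cm.

    Assumption: coarse fragments carry no organic carbon, so the
    fine-earth mass is bulk density scaled by (1 - coarse_fraction).
    """
    fine_earth_kg_m3 = bulk_density_g_cm3 * 1000.0 * (1.0 - coarse_fraction)
    return soc_fraction * fine_earth_kg_m3 * 0.01  # 0.01 m per cm


def stock_in_interval(layers, top_cm, bottom_cm):
    """Sum layer stocks over their overlap with the interval [top_cm, bottom_cm]."""
    total = 0.0
    for layer_top, layer_bottom, soc, bd, coarse in layers:
        overlap_cm = min(layer_bottom, bottom_cm) - max(layer_top, top_cm)
        if overlap_cm > 0:
            total += overlap_cm * layer_stock_per_cm(soc, bd, coarse)
    return total


# Hypothetical profile, one tuple per sampled layer:
# (top cm, bottom cm, SOC mass fraction, bulk density g cm-3, coarse fraction)
profile = [
    (0, 20, 0.10, 0.8, 0.0),
    (20, 60, 0.05, 1.2, 0.1),
    (60, 100, 0.02, 1.4, 0.2),
]

stock_0_30 = stock_in_interval(profile, 0, 30)      # kg m-2
stock_30_100 = stock_in_interval(profile, 30, 100)  # kg m-2
print(round(stock_0_30, 2), round(stock_30_100, 2))
```

For this made-up profile the stocks come out at about 21.4 kg m−2 for 0-30 cm and 25.2 kg m−2 for 30-100 cm. Setting every coarse fraction to zero shows how much the correction changes the estimate.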
Results
3.1 Data synthesis of regional carbon and nitrogen stocks
1) The authors need to provide a comparison of the Couture et al. (2018) data at both sites, as it is the only sample set distributed across both sites. Otherwise, the comparison results would be biased by analytical and sampling methods.
3.2 Random Forest mapping and model validation assessment
1) The authors need to choose adequate validation indices to explain the precision and accuracy of the predictions. Lin’s concordance correlation coefficient could be an additional choice.
2) The model strength (low R2) does not warrant the claim that the predictions are more accurate than those of other studies done at larger scale.
3) Independent validation would make the results more trustworthy.
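For reference, Lin's concordance correlation coefficient mentioned in point 1) combines correlation with a penalty for systematic offset between observed and predicted values. The sketch below is a minimal plain-Python version using population (1/n) moments; the function name and the example values are illustrative assumptions and are not taken from the manuscript.

```python
def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient with population (1/n) moments.

    ccc = 2 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)
    Raises ZeroDivisionError if both series are constant and equal.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical values: predictions are perfectly correlated with the
# observations but shifted by a constant bias of +1.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]
print(lins_ccc(obs, pred))
```

In this example the Pearson correlation is 1, yet the coefficient drops to 4/7 (about 0.57) because of the constant bias, which is why it complements R2 when judging both accuracy and precision.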
3.3. Area of Applicability (AOA) and uncertainty with quantile regression forest
1) It is good that the authors used AoA in their analysis. The issue, however, is that it does not assess the overall accuracy of prediction. Non-AoA areas only identify those combinations in predictor variable space that are not adequately captured by the RF models, a reflection of the inadequacy of the samples to capture the attribute space of the predictor variables. The authors need to discuss this point clearly.
2) Based on Table 5, what is a reasonable estimate of the average SOC and N stocks in both areas?
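The distinction drawn in point 1) can be illustrated with a simplified dissimilarity index of the kind that underlies the AOA approach. This is a sketch under assumptions, not the implementation used in the manuscript: covariates are standardised with the training mean and standard deviation, all covariates are weighted equally (the published method additionally weights them by variable importance), and the threshold that separates inside from outside the AOA is not derived here. Function names and covariate values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def standardise(rows, means, sds):
    """Scale each covariate with the training mean and standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_index(train_covariates, new_covariates):
    """Distance to the nearest training point, divided by the mean pairwise
    distance among training points, all in standardised covariate space.

    Note that no observed SOC or N value enters this calculation.
    """
    columns = list(zip(*train_covariates))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]  # a constant covariate would give 0 here
    train = standardise(train_covariates, means, sds)
    new = standardise(new_covariates, means, sds)
    mean_train_dist = mean(euclidean(a, b) for a, b in combinations(train, 2))
    return [min(euclidean(p, t) for t in train) / mean_train_dist for p in new]


# Hypothetical covariates per location, e.g. (slope, vegetation index).
train = [(0.0, 10.0), (1.0, 12.0), (2.0, 11.0), (3.0, 13.0)]
new = [(1.0, 12.0), (1.5, 11.5), (10.0, 30.0)]
di = dissimilarity_index(train, new)
print(di)
```

Because the index is computed from covariates alone, a low value only says that a location resembles the sampled conditions. It does not say how well the model predicts there, which is the reviewer's point about combining it with accuracy and precision measures.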
Discussion
1) “Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models”
- This should be the strongest point of this study, and the manuscript should have been written to highlight it. Instead, it is unfortunate that the authors have focused more on the mapping and assessment of SOC and N stocks while comparing them with previous studies.
2) “It is therefore advisable to analyse the values of the target variable at the sampling locations and the diversity of the landscape to ensure that the spatial variability in the landscape is reflected by the sampled sites”.
- This statement needs more detail.
3) “The AOA method can be used to assess whether the heterogeneity of the landscape, ideally mirrored in the covariates, is captured by the sampling locations. Areas where this is not the case can be excluded from regional estimates and could be further used to determine new sampling sites for future field sampling campaigns”
- Good point, but improving the accuracy of the prediction model is also an important aspect to consider.
4.2 Spatial mapping of carbon and nitrogen stocks with random forest, area of applicability and uncertainty
1) “Our analysis shows a substantial challenge in bridging from local- to regional-scale study areas”
- This is a very important point to capitalize on in this study, rather than trying to estimate SOC and N stocks using an inadequate database.
- Line 247 needs to be corrected to R2.
- The discussion in L245-253 is quite confusing to me. I would rather think this is a biased comparison of two models.
The model developed for the entire area has been cross-validated using the whole data set, whereas the two models developed for the two sites have been validated using the individual data sets of these sites. The question therefore arises whether the results would be the same if the validation indices of the RF model for the entire area were calculated separately for the two sites. The authors just need to separate the validation results of the model for the entire area into mainland and island and provide R2, RMSE and MEE values.
- L263 (Fig. 7a and Fig. 1b?)
- L260-268: I agree that the spatial distribution of SOC stocks on the mainland differs depending on whether the RF models are trained using all data or only mainland data. But how can we establish that these spatial differences also reflect differences in prediction accuracy? To do so, as I mentioned under point 3), validation results for the mainland data should be shown for both RF models, the one developed using all data and the one using mainland data. The same argument applies to Herschel Island.
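The per-site breakdown requested in this section amounts to grouping the cross-validated predictions of the pooled model by site before computing the indices. The sketch below shows one way this could be done in plain Python. It assumes R2 is defined as 1 minus the ratio of residual to total sum of squares and reports a mean error (ME) as the bias measure; the record layout, function names and numbers are hypothetical and do not come from the manuscript.

```python
from collections import defaultdict
from math import sqrt


def validation_metrics(observed, predicted):
    """R2 (1 - SS_res / SS_tot), RMSE and mean error for one group."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "n": n,
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "ME": sum(residuals) / n,
    }


def metrics_by_site(records):
    """records: (site, observed, predicted) tuples from the pooled-model CV."""
    grouped = defaultdict(lambda: ([], []))
    for site, obs, pred in records:
        grouped[site][0].append(obs)
        grouped[site][1].append(pred)
    return {site: validation_metrics(o, p) for site, (o, p) in grouped.items()}


# Hypothetical cross-validated SOC predictions (kg m-2) of a pooled model.
cv_records = [
    ("mainland", 50.0, 52.0), ("mainland", 60.0, 62.0), ("mainland", 70.0, 72.0),
    ("island", 30.0, 30.0), ("island", 35.0, 35.0), ("island", 40.0, 40.0),
]
site_metrics = metrics_by_site(cv_records)
for site, m in site_metrics.items():
    print(site, m)
```

Reporting the pooled model this way puts it on the same footing as the separate mainland and island models, so that differences in the indices reflect the models rather than the validation subsets.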
4.3 Comparing local scale results to regional scale synthesis
It would be good to use independent validation for such a comparison. However, I doubt that the available data are adequate.
Conclusion
Based on the analysis, the study should conclude with the most acceptable estimates of SOC and N stocks. It is difficult to understand whether the authors have incorporated the AoA and QRF analyses in their conclusions.
Citation: https://doi.org/10.5194/egusphere-2025-1052-RC1
RC2: 'Comment on egusphere-2025-1052', Anonymous Referee #1, 09 May 2025
Good attempt! I just have a couple of questions about this work.
1) It is not very clear which techniques were used to synthesise the already collected data. Can you describe them?
2) Does the AoA analysis also assess the prediction accuracy of models?
3) Since the prediction models are not very strong, how can you confidently say that the estimated SOC and N stocks are superior to already reported values?
4) I could not link the SOC and N stock values listed in the abstract with the results.
5) How did you tackle very poor sample distribution that came out from isolated studies when developing prediction models? Did you try data declustering approaches?
6) Application of geostatistical analysis would improve interpretations of short scale variability at two sites.
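As an illustration of the declustering idea raised in question 5), the sketch below implements simple cell declustering: the area is overlaid with a square grid, and each sample is weighted inversely to the number of samples sharing its cell, so that densely sampled spots do not dominate a regional mean. It is one possible approach among several, and nothing in the manuscript indicates that it was used. The cell size, coordinates and SOC values are hypothetical.

```python
from collections import Counter
from math import floor


def cell_declustering_weights(coords, cell_size):
    """Weight each sample by 1 / (occupied cells * samples in its cell).

    The weights sum to 1; clustered samples share the weight of their cell.
    """
    cells = [(floor(x / cell_size), floor(y / cell_size)) for x, y in coords]
    counts = Counter(cells)
    n_occupied = len(counts)
    return [1.0 / (n_occupied * counts[c]) for c in cells]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical layout: three samples clustered in one cell, one isolated sample.
coords = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (5.5, 5.5)]
soc = [10.0, 10.0, 10.0, 40.0]  # kg m-2, made-up values

weights = cell_declustering_weights(coords, cell_size=1.0)
declustered_mean = weighted_mean(soc, weights)
naive_mean = sum(soc) / len(soc)
print(naive_mean, declustered_mean)
```

With these made-up values the naive mean is 17.5 kg m−2 and the declustered mean is about 25 kg m−2. The result depends on the chosen cell size, which would have to be justified for a real data set.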
Citation: https://doi.org/10.5194/egusphere-2025-1052-RC2