the Creative Commons Attribution 4.0 License.
Machine learning-based Alpine treeline detection in Xue Mountain of Taiwan
Abstract. Taiwan has the highest density of high mountains globally, with over 200 peaks exceeding 3,000 meters in elevation. The Alpine Treeline Ecotone (ATE) is a transitional zone between different vegetation types. The species distribution, range variations, and movement patterns of vegetation within the ATE are crucial indicators for assessing the impact of climate change and warming on alpine ecosystems. Therefore, this study focuses on the Xue Mountain glacial cirques in Taiwan (approximately 4 km²) and utilizes WorldView-2 satellite images from 2012 and 2021 to compute various vegetation indices and texture features (GLCM). By integrating these features with the Random Forest (RF) and U-Net models, we developed a classification map of the alpine treeline ecotone (ATE) in Xue Mountain. We analyzed changes in bare land, forest, krummholz, and shadows within the ATE from 2012 to 2021. The results indicate that the classification accuracy reached an overall accuracy (OA) of 0.838 when incorporating raw spectral bands along with vegetation indices and texture features (GLCM) (77 features in total). Feature importance ranking and selection reduced training time by 14.3 % while ensuring alignment between field survey treeline positions and classification results. From 2012 to 2021, tree cover density increased, with the total forest area expanding by approximately 0.101 km². The elevation of tree distribution rose by 14 m, with the most significant area changes occurring between 3,500 and 3,600 m, while the 3,700 to 3,800 m range remained relatively stable. This study integrates remote sensing imagery with deep learning classification methods to establish a large-scale alpine treeline ecotone (ATE) classification map. The findings provide a valuable reference for the sustainable management of alpine ecosystems in the Xue Mountain glacial cirques in Taiwan.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-969', Mathieu Gravey, 03 Jun 2025
Main Comment
The authors state that they used "two cloud-free WorldView-2 orthorectified images with a spatial resolution of 0.4 meters, acquired on November 3, 2012, and September 26, 2021." However, they later clarify that only the panchromatic (PAN) band is available at this resolution, while they appear to use the color bands instead. This is unclear—did they use pansharpening? Please clarify which bands were actually used, at what resolution, and whether pansharpening was applied.
The origin of the training data is not clearly explained. The authors write: "Ground truth data in the study area were labeled using a pixel-based approach and categorized into four classes: bare land, forest, krummholz, and shadow (Fig. 3)." Does this mean an operator manually classified these images? If both images were already classified, what is the purpose of the complex processing workflow? Were both images used for training? If only one image was used for training, why would we expect the same classification accuracy to transfer to the second image, especially given possible environmental and seasonal differences?
Regarding training, the authors mention using 512x512 patches and then splitting the dataset. Is the train/test split done at the patch level or at the pixel level (within patches)? This distinction is important, as pixel-level splits can introduce data leakage, especially in spatially autocorrelated datasets.
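The leakage risk raised here is easy to avoid with a patch-level split. The sketch below is illustrative only (a hypothetical helper, not the authors' code): every pixel of a 512x512 patch stays on the same side of the split, so spatially autocorrelated neighbours can never straddle train and test.

```python
import random

def split_patch_ids(patch_ids, train_frac=0.8, seed=0):
    """Assign whole patches (not individual pixels) to train or test,
    so spatially autocorrelated neighbours from the same 512x512
    patch can never appear on both sides of the split."""
    ids = list(patch_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return set(ids[:cut]), set(ids[cut:])

train_ids, test_ids = split_patch_ids(range(100))
```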
The use of Random Forest (RF) for variable importance analysis is questionable. This approach is valid only if the variables are independent, which is clearly not the case here. Additionally, is it worth performing this complex selection to save 20% of the variables? Reducing from 77 to 61 features may not justify the effort, especially if the interpretability or performance gain is marginal. As such, the entire discussion of variable importance remains inconclusive.
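A common model-agnostic remedy for importance analysis with correlated predictors is permutation importance: shuffle one feature at a time and measure how much the score drops. The sketch below is illustrative only (the toy model, data, and metric are invented, not taken from the manuscript):

```python
import random

def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    """Shuffle one feature column at a time and record how much the
    score drops; a model-agnostic alternative to impurity-based RF
    importances (correlated features can still share importance)."""
    rng = random.Random(seed)
    base = metric(model(X), y)
    drops = []
    for j in range(len(X[0])):
        scores = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            scores.append(metric(model(Xp), y))
        drops.append(base - sum(scores) / n_repeats)
    return drops

accuracy = lambda pred, obs: sum(p == o for p, o in zip(pred, obs)) / len(obs)
toy_model = lambda X: [int(row[0] > 0.5) for row in X]  # ignores feature 1
X = [[i / 10, (9 - i) / 10] for i in range(10)]
drops = permutation_importance(toy_model, X, toy_model(X), accuracy)
```

The ignored feature gets exactly zero importance here, whereas impurity-based importances can assign spurious weight to correlated but unused features.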
Finally, the reported 14-meter height increase lacks context. The sentence "Forest area and highest point height difference from 2012 to 2021" is vague. Does this mean the authors extracted the maximum elevation value among all forest pixels? What was done to ensure robustness against outliers or noise? Also, scientific results are typically reported with associated uncertainties, which are missing here—or, if included, were not clear to me.
Lastly, if the only interest was in changes to forest cover, why not classify the change directly instead of classifying each image independently?
Minor Comments
"Taiwan has the highest density of high mountains globally, with over 200 peaks exceeding 3,000 meters in elevation."
→ This sounds too subjective. The result depends on the threshold chosen. I recommend rewriting as:
"Taiwan is one of the regions with the highest density of high mountains, with over 200 peaks exceeding 3,000 meters in elevation."
The introduction goes beyond the immediate scope of the study. However, I appreciate that the authors took the time to place their work in a broader context.
"At the same time, the productivity of alpine treeline vegetation increased, enhancing the ability to sequester atmospheric CO₂ and mitigating the effects of climate change (Rumpf et al., 2022)"
→ Even if this is true, it should also be stated that the global effect is likely minor. The sentence could be more balanced.
Citation: https://doi.org/10.5194/egusphere-2025-969-RC1
- AC2: 'Reply on RC1', G. G. Wang, 29 Jun 2025
- AC3: 'Reply on RC1', G. G. Wang, 29 Jun 2025
RC2: 'Comment on egusphere-2025-969', Maaike Bader, 09 Jun 2025
Dear authors,
It was a pleasure to read your well-written manuscript. You used two machine-learning (ML) methods to classify four land-cover/ image section classes (forest, krummholz, bare ground and shadow) in a montane-alpine transition in Taiwan, repeating the classification for images from two years to detect changes through time.
The machine-learning methods are very nicely explained. However, I am missing information about how you obtained your “ground truth”. This is actually not collected on the ground or by manual/visual labelling, if I interpreted the flowchart correctly, but by automated (?) pixel-based classification (not further specified…). So it sounds like you use one classification method to validate another, which seems strange and a bit circular. Why did you not just use the pixel-based classification for both images, if that worked so well that you could use it as “ground truth”? Why the step of developing the ML methods?
It appears to me that very typical patterns of “shade” and “forest” are produced, which should allow a ML model to recognize forest at a somewhat larger scale than at the pixel level. Of course, if you train your model with a pixel-level classification of spectral signatures, the ML model is going to reproduce this, but if you would use real ground-truth data or manually labelled forest areas to train the model, it may be able to really recognize forest and to make use of the shadow rather than to have it only as a nuisance (it will still be a nuisance where whole hillslopes are in shadow, but within the forest, it could become part of the signal, and should be capturable in the texture variables, perhaps if you use a different scale (number of neighbours) to calculate the texture).
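The scale dependence mentioned above can be made concrete with a minimal GLCM contrast computation (a plain-Python sketch; the manuscript presumably uses a standard GLCM implementation). The pixel offset plays the role of the neighbour distance: larger offsets probe texture at a coarser scale.

```python
def glcm_contrast(img, dy, dx, levels):
    """Contrast of the grey-level co-occurrence matrix for one
    (dy, dx) offset; img is a 2-D list of integer grey levels.
    Increasing the offset measures texture at a coarser scale."""
    h, w = len(img), len(img[0])
    P = [[0] * levels for _ in range(levels)]
    n = 0
    for i in range(h - dy):
        for j in range(w - dx):
            P[img[i][j]][img[i + dy][j + dx]] += 1
            n += 1
    return sum(P[a][b] / n * (a - b) ** 2
               for a in range(levels) for b in range(levels))
```

A flat patch gives contrast 0 at any offset, while a one-pixel checkerboard gives high contrast at offset 1 but zero at offset 2: the same surface looks textured or smooth depending on the neighbour distance chosen.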
Another important piece of information that you need to elaborate upon is how you defined the treeline/ treeline ecotone, in the field and on the classified images, what the survey data consist of, and how you compared the survey data to the treeline location suggested by the classification.
Usually, the accuracy of ML models is much higher within the image it was trained on than on new images, since these may differ in e.g. lighting, season, angle, etc. Figure 7 a and c show that indeed the lighting seems to have been quite different in the two images. Therefore, it is not unlikely that the classification accuracy was a lot lower for the year not used for model training, so that part of the differences through time could be due to this different accuracy. Did you validate the accuracy for the other year in any way? It would be important to do and show a detailed manual (not based on another automated classification method) validation of the results, seeing whether and where the forest, in particular, is classified correctly, especially at the boundary between forest and non-forest, i.e. in the treeline ecotone. You could e.g. use imagery from Google Maps or Bing, which have a higher spatial resolution for many parts of the world than Worldview images and are often available for past dates, or perhaps there are aerial images available from more local sources. These would allow you to check whether the areas where you detected change really appear to have changed in reality. I guess your field survey data might also show this, but it is currently unclear from the manuscript how these were used. Related to this, especially if you define forest elevational shift by the single highest point, there is a reasonable chance that a change in that point does not represent a real shift. Perhaps you can think of a different, more robust, measure of treeline-ecotone elevation?
About the presentation: the figures are well-prepared. I like figure 2, in particular, as it nicely explains the different steps taken in the research and the connection between them. However, the captions for all of the figures and tables are much too short. They do not explain what is in the respective figure or table. Please expand and try to make the figures and tables self-explanatory, i.e. if a reader goes to look at them without having read the main text, they can sort of understand what they are about. See my example below for Fig 1. I also would advise not to use title font (capitalized words) in the figure captions or section titles.
Some more detailed comments:
You sometimes report more decimal numbers than is reasonable or useful (e.g. L 21, 24, 48, 49). Please check for this and reduce the unnecessary precision.
Avoid sentences like “The F1-score results are shown in Fig. 4.” – instead, give the results and then just cite the figure: The result was X (Fig. 4).
Title: Change Alpine to alpine (the capital letter would suggest that you worked in the Alps, whereas the small a is used for alpine as a life zone in general)
Title: on Xue Mountain in Taiwan / in the Xue Mountains of Taiwan
Abstract:
L14 do not use capitals for alpine treeline ecotone
L18 remove “the” before Random Forest (and not sure that random forest needs to be capitalized)
L19 & 26-27 either use the introduced abbreviation (ATE) or the full term (alpine treeline ecotone), but not both every time
Introduction
L33-34 “Alpine zone ecosystems are susceptible to environmental changes compared to other regions”: are they really? Maybe they are not, because of the high environmental heterogeneity and small migration distances, compared to latitudinal climate gradients…
L35-36 I would suggest referring to this transition as the alpine treeline ecotone (not alpine treeline and not capitalized). This is partly a matter of habit and taste, but it was recently suggested to reserve “treeline” for the climatic potential, and treeline ecotone to the actual observed transition from forest to alpine (or upper forest limit if it is unclear whether the transition is even related to climatic limitations) See e.g. Körner & Hoch 2023 and Malanson 2024. If you decide to follow this terminology, check its use throughout the manuscript.
L38: Bader et al., 2020 should be 2021
L38-39 “Furthermore, ATE changes illustrate the impact of climate change…” Why “furthermore”? And how do you know that climate change is the driver of change? Are there no other potential drivers?
L42 “to study alpine treelines.” You could cite Garbarino et al 2023 here
L41-54 This paragraph gives some examples, but they seem a bit of a random pick. Can you highlight how they are somehow connected (e.g. three examples of studies at different spatial resolutions (please provide the sensor resolution for each data source used), three examples of change detection, or something else)?
L52-53 These percentages have no meaning, since we do not know what the reference area was.
L54 Careful, these studies do not tell us anything about the reliability of the methods applied. Maybe they confirm the usefulness or the great potential or something like that, but not the reliability.
L56 “favorable results” of what? Classification accuracy?
L58 municipality
L62 the tree Cecropia hololeuca, which has an optically striking shape and colour (I think, check the original paper to confirm)
L63 “The classification accuracy for Cecropia hololeuca species reached 97%, with an IoU of 0.86.” This is a bit too much technical detail at this point.
L68-69 “Based on these studies, applying…” → Based on these studies, we conclude that applying…
L80 with “studies on the volume estimation”, do you mean “studies estimating wood volumes”?
L81 It is not clear here whether these studies were done in the alpine treeline ecotone, or in the forest below.
L82 remove the sentence referring to the figure (just add (Fig 1)). Instead please explain here what “krummholz” means to you. How do you define it? And the same for “forest”. This is quite relevant in a treeline context.
Figure 1 Please provide a self-explanatory caption. E.g. “Location of the treeline ecotone study area in the Xue Mountain glacial cirques in Shei-Pa National Park (top-right map) in north-central Taiwan (top-left map). The red marker in the aerial image (bottom-left map) indicates….. The digital elevation model shown in the bottom-right image shows the same area as the aerial image and covers the entire study area” Instead of having to refer to the “top-left map” you could also add letters to each panel.
Methods & results: please write the methods and results sections in the past tense
L89 How was the optimal model selected, did you validate the classification at all? How did you obtain the “ground truth”? From the flow chart (Fig. 2), it looks like you obtained it from the image itself, rather than from survey data? Survey data are not in the flow chart at all.
L93 I suggest writing “is reported to be within 3 meters”
Figure 2: This is a nice figure, but the caption needs to explain what this is a research flow for (e.g. “Research flow for classifying Worldview images of a treeline ecotone on Mt Xue in Taiwan for detecting changes in forest cover.”). A detailed question: in section 7, the two images that are subtracted indeed look a bit different, but in Figure 6, they look much more similar. What explains this discrepancy? Maybe in 6, b and c are accidentally the same image…?
L104 “and GPS was used to record survey points” This addition seems out of place…
L107 2.4 Vegetation indices
L115-116 This reads like it came out of the research proposal. Here, please use the past tense.
L149 “calculated using the following formula:” and then no formula follows…
L173, Fig 3 With “ground truth data” you mean manually classified images for training and validation? That is not really ground truth, is it? Maybe call this “labelled data” instead?
L175 Can you define these classes? For example what is “bare land”? Rocks? Soil? Does it include alpine vegetation other than krummholz?
L176 “The dataset was randomly split, with 80% used for training and validation and 75% and 25% allocated for training and validation,”… something appears to be wrong here.
L194 Explain what the F1 score is, E.g. the accuracy of the models can be depicted by the F1 score, which exceeded… etc.
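For readers unfamiliar with the metric, the per-class F1 score is the harmonic mean of precision and recall. A minimal sketch (the class labels and predictions below are invented for illustration, not taken from the manuscript):

```python
def f1_score(pred, truth, cls):
    """Per-class F1: harmonic mean of precision and recall, computed
    from true positives, false positives, and false negatives."""
    tp = sum(p == cls and t == cls for p, t in zip(pred, truth))
    fp = sum(p == cls and t != cls for p, t in zip(pred, truth))
    fn = sum(p != cls and t == cls for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

pred  = ["forest", "forest", "bare", "shadow"]
truth = ["forest", "bare",   "bare", "shadow"]
score = f1_score(pred, truth, "forest")  # precision 0.5, recall 1.0
```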
L195 what do you mean by “they tend to influence each other more”? That they are confused more often?
L199-203 It seems that you may be over-interpreting the differences in F1 scores between the combinations. Most differences seem quite minimal. I would suggest writing that the combinations performed similarly well, pointing only at the larger differences that may actually mean something in terms of model performance (for the U-net model). It would also help to remind the reader what features are included in what combination, as this teaches us something about the importance of e.g. texture for recognizing different cover classes.
L204 You just presented some accuracy metrics, and here you suddenly present other accuracy metrics; that is a bit confusing. Maybe add “overall accuracy” (as opposed to the class-wise F1 scores), but I recommend just presenting the results here (“Similar to the accuracy patterns for the individual classes, in the U-Net models, the overall classification accuracy improved as the number of features increased, whereas for the RF model this was not the case (Table 5)”), not the table as such.
Figure 4 Please repeat here what the different feature combinations were.
Table 5 Explain the abbreviations, the combinations, the numbers in brackets, and the bold font in the caption.
L214 “The treeline is determined based on the boundary between bare land, forests, and krummholz.” – I suggest using “The upper limit of the treeline ecotone”, and explaining better how you define this line, since now there are three classes, and those will all three be heterogeneously spread across the landscape…. Since treeline is defined by trees and the ecotone is generally also described by the patterns of tree cover, the accuracy of forest detection seems to be the most important, so in line L221 you could add behind “texture features were relatively less important” “, as also suggested by the low F1 scores for combination 3 (spectral + texture; Fig 4), although for forest, in particular, texture strongly increased the classification accuracy relative to providing just spectral information, but vegetation indices increased it more”.
L218 remove the sentence that introduces Fig 5, just add (Fig 5)
L221 remove the line break, it does not look like a new paragraph should start here. If anything, start it with Using these 61…
L222 Why is it important to reduce training time? Is it more important to reduce training time than to get a better model fit? I guess this could become important if one would want to apply the model more broadly, but for your particular application it does not appear to matter. On the other hand, there is the concept of parsimony in model selection, i.e. select the model with the best fit, but penalizing for model complexity, i.e. make the model as complex as necessary, but no more. Is this concept related, does it apply to machine-learning models? If the last features add 5% of accuracy, that is the same order of magnitude as your effect size (the change in forest cover), so those 5% may be relevant…? You see, I am a bit confused. Perhaps it needs a bit more explanation why you decided to use feature selection.
L223 Remove the “additionally”. That difference is not a difference…
Table 6 Could you provide the training time in hours, so that it is easier to understand the order of magnitude?
Figure 6 These maps are not so informative. Maybe one bigger map with the classes and including misclassifications would show better how good the classification is. The “Ground truth” looks like another automatic classification, please explain well in the methods how you got to this map, and maybe call it something other than “ground truth”…
Figure 7 Please print as big as possible, it is very hard to see anything on these images. Could you draw the boxes shown in b and d in the images in a and c, and remove the big white boxes with 1 and 2 (they block the view)? Also, the symbols for the field investigations are hard to see. It would also be more informative if not only the location, but also the vegetation types of the field survey points would be shown. As it is, it is unclear what the field survey data are. Please explain these data in the methods section and again briefly in the figure caption and in line 234-235. In any case, to be able to evaluate the fit, the images would need to be much bigger. You could also plot the fits of the field (real ground truth!) and the image classification in a separate graph.
L232 Decadal changes in the treeline ecotone
L235 avoid using tree line, stick to treeline or, even better treeline ecotone
L235 Since you have not explained how you define treeline, this statement is impossible to follow.
L236 “Over a decade, the proportion of forest and shadow areas increased by 3.4% and 8.5%,….” Really? The proportion of shadow area increased in the last decade?? Obviously this is just a matter of lighting when the image was taken, so I recommend rephrasing this result.
L239 how did you define the elevation distribution of the forest? The uppermost forest pixels?
Figure 8 caption: between 2012 and 2021. It would also be helpful to see the persistent forest cover here.
L238, Table 9: it may be better to express the area changes in e.g. ha, instead of km2, to get nicer numbers.
L240-241 “with the most significant changes occurring in the 3,500 to 3,600 m range.” If this is where most of the treeline ecotone was, it would be worth mentioning this here
L241 and/or L308 “In comparison, the most stable area was observed in the 3,700 to 3,800 m range.” – explain here that hardly any forest was found here at any time.
Table 9: please also provide the forest area in each belt in 2012 and the % change
Discussion
L256-267 & 275-288 As general advice, better start with your results and then contrast or align them with other studies, rather than the other way around. Now these paragraphs read a bit like a second introduction, again listing studies without any obvious logical connection between them. If you start with your results, you can use connections like “In contrast…” or “Likewise” to make a clearer connection between other people’s findings and your results.
L271 Was there any relationship between the treeline pattern and the local change? E.g. did abrupt treeline stay more stable than ones with tree islands? A bit more ecological interpretation of the patterns found would be interesting (e.g. relationship with topography).
L275-288 Can you discuss here what the difference is between feature selection based on the overall accuracy and feature selection based on the accuracy of your target land-cover class (which was obviously not shadow, but forest)?
L292 Maybe do not mention shadow here, since that is more a no-data area than a land-cover class…
L294-295 These numbers are very technical for a conclusion section…
L296 “SEVI, Y, B, G, and NDVI2.” Here, since readers may read the conclusions without having read the whole paper, you might want to explain the abbreviations.
L297 Again, this cannot be understood without an explanation about what the survey data are and how treeline was defined.
L299 denser or expanded?
L299 at higher elevations
References: please indent them (the hanging parts of each reference, rather than the main author name) to make them easier to navigate.
References cited
Garbarino, M., D. Morresi, N. Anselmetto, and P. J. Weisberg. 2023. Treeline remote sensing: from tracking treeline shifts to multi‐dimensional monitoring of ecotonal change. Remote Sensing in Ecology and Conservation 9:729-742. https://doi.org/10.1002/rse2.351
Körner, C., and G. Hoch. 2023. Not every high-latitude or high-elevation forest edge is a treeline. Journal of Biogeography 50:838-845. https://doi.org/10.1111/jbi.14593
Malanson, G. P. 2024. Inclusions and exclusions in treeline definitions. Journal of Biogeography 51:54-56. https://doi.org/10.1111/jbi.14729
Citation: https://doi.org/10.5194/egusphere-2025-969-RC2
- AC1: 'Reply on RC2', G. G. Wang, 29 Jun 2025