the Creative Commons Attribution 4.0 License.
Mapping Wetland Probability Across Massachusetts with Machine Learning and Multiscale Predictors
Abstract. Wetlands perform a vital array of ecosystem functions, but up to 50 % of global wetlands have been lost and those that remain are under ongoing threat from development pressures. Accurate and comprehensive maps are critical for the management and protection of wetland resources. Conventional methods for wetland mapping are time consuming and resource intensive, and the common mapping methods that rely on the inspection of aerial imagery often miss forested and other wetland types that do not have a distinctive visual signature, i.e. cryptic wetlands. The use of machine learning and spatial data to map wetlands is a growing field that promises a fast and efficient complement to conventional methods and improved detection of forested and other cryptic wetlands. In this paper we demonstrate the use of a random forest model to generate a large-scale, state-wide map of wetland probabilities in the Commonwealth of Massachusetts, using widely available open source software and publicly accessible data. Through this model we also test the efficacy of multi-scale predictors, including not only terrain derivatives used in previous research but also multi-scale implementations of soil, vegetation, and spectral data. The random forest was trained on the official Massachusetts wetland inventory, and achieved an overall accuracy rate of 92 % relative to that dataset. The model showed particular promise in detecting cryptic wetlands by identifying an additional 40 % of probable wetland area statewide, and an additional 46 % of forested wetland specifically. The use of diverse multi-scale predictors was supported by model performance, variable importance measures, and the feature selection process. This strategy for improving detection of cryptic wetlands and creating better estimates of wetland extent, using non-proprietary software and data, will be a vital adjunct to conventional methods for wetland mapping and monitoring.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3326', Anonymous Referee #1, 02 Feb 2025
General Comments:
This study presents a wetland probability mapping approach for the Commonwealth of Massachusetts using random forest algorithms, demonstrating promising performance. The objectives are clearly stated and well-explained, and the figures are of high quality. However, the study's innovation and the rationale for selecting RF over other machine learning algorithms are not clearly articulated. Additionally, the section labelling is somewhat confusing. Below are some specific comments for further improvement.
Specific Comments:
- Page 2, Paragraph 2 (“The use of machine learning …”): This paragraph mentions that multiple machine learning algorithms have been used for wetland mapping, but it does not clearly highlight how this study builds upon or differs from previous research. It would be helpful to explicitly state the novelty of this work.
- Page 2, Paragraph 4 (“The growing field of digital …”): Toward the end of this paragraph, the authors mention that RF has been used for wetland mapping in previous studies. However, a more detailed explanation of RF’s performance in these applications would strengthen the argument. Why was RF chosen over other machine learning methods? Adding one or two sentences addressing this would improve clarity.
- Figure 1 (“Prediction layer (0-1)”): The meaning of "0-1" should be clarified. Does it represent probability values, a binary classification, or something else?
- Pages 6-7: The authors describe the predictors and data sources used in the study, but the relationship between these predictors and wetland probability mapping is not well explained. Why were these particular variables selected? Providing a rationale would enhance the understanding of their relevance.
- Page 9: The section title may be more appropriately labelled as “Data Split,” as model fitting, validation, and testing are discussed in more detail in subsequent sections.
- Page 11, Section 2.4 (“Model Validation and Testing”): Adding a reference to support the explanation of Out-of-Bag (OOB) validation would be helpful.
- Page 14, Section 3 (“Results”): The study mentions the number of trees used in the RF model, but it is unclear whether other hyperparameters were tuned. Providing details on hyperparameter optimization would be beneficial.
- Page 22, Last Paragraph (“Results show that the model …”): The model failed to identify certain types of wetlands. Can the authors summarize potential reasons for this?
- Page 23 (Extrapolation of Results): This research is highly valuable. However, can the method be applied to other regions with limited data availability?
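For readers unfamiliar with the out-of-bag (OOB) estimate raised in the Section 2.4 comment, and with what a "0-1" prediction layer can mean, a minimal sketch may help. It is not the authors' implementation: it uses bagged one-variable decision stumps as a simplified stand-in for a random forest, synthetic data, and hypothetical helper names (`make_data`, `fit_stump`, `oob_predictions`). Each sample is scored only by trees whose bootstrap sample did not contain it, and the fraction of those trees voting "wetland" is the 0-1 score.

```python
import random

def make_data(n=150, seed=1):
    """Synthetic (x, y) pairs; x is a stand-in predictor, y = 1 means 'wetland'."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x + rng.gauss(0, 0.1) < 0.5 else 0  # noisy label around x = 0.5
        data.append((x, y))
    return data

def fit_stump(rows):
    """Return the threshold t minimizing training error for the rule 'x <= t -> 1'."""
    best_err, best_t = None, None
    for t in sorted({x for x, _ in rows}):
        err = sum((1 if x <= t else 0) != y for x, y in rows)
        if best_err is None or err < best_err:
            best_err, best_t = err, t
    return best_t

def oob_predictions(data, n_trees=100, seed=0):
    """Return (oob_probability, true_label) for every sample left out of at least one bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    votes = [[0, 0] for _ in range(n)]  # per sample: [votes for class 0, votes for class 1]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        in_bag = set(idx)
        t = fit_stump([data[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # only out-of-bag samples are scored by this stump
                votes[i][1 if data[i][0] <= t else 0] += 1
    return [(v[1] / sum(v), data[i][1]) for i, v in enumerate(votes) if sum(v) > 0]

def oob_accuracy(preds, threshold=0.5):
    """Share of samples whose thresholded OOB probability matches the label."""
    return sum((p > threshold) == (y == 1) for p, y in preds) / len(preds)

preds = oob_predictions(make_data())
print(f"OOB accuracy: {oob_accuracy(preds):.2f}")
```

Because no sample is ever scored by a tree that saw it during fitting, the OOB accuracy behaves like a built-in validation estimate; a real random forest additionally samples a random subset of predictors at each split.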
Technical Corrections:
- Section numbering is inconsistent—Sections 1.2 and 2.1 are missing, and there are two Sections 2.3. This should be corrected for clarity.
- Page 7, First Line: "Planoform Curvature" should be corrected to "Planform Curvature."
- Table 2 (“In Final Model” Column): Only the last two rows include units. Units should be consistently provided or not provided for all values.
Citation: https://doi.org/10.5194/egusphere-2024-3326-RC1
RC2: 'Comment on egusphere-2024-3326', Anonymous Referee #2, 20 Feb 2025
General comments
The manuscript presents a study on mapping the wetland probability score across Massachusetts using machine learning (i.e. random forest) and multiscale predictors. The topic is of interest, and the writing style is clear and concise. However, the novelty of the study is not clearly presented, and further explanation and discussion of the multiscale predictors is required.
Specific comments (Major)
- The authors utilize the existing wetland map (2005 MassDEP Wetlands layer, MDW) as the target, employing various predictors taken approximately 15 years later (e.g. NDVI, NDWI from Landsat, NDVI from MassGIS Leaf-Off Aerial Imagery, vegetation height from MassGIS LiDAR). Might there be any discrepancy between the datasets, given that the authors have noted that the wetland area is not stable over time?
- The RF model was trained using the MDW wetland map. However, it is not clear how the authors distinguish between the different wetland types. For example, there is no explanation of how the forested wetland areas were identified in the output (see page 13, line 55), given that the model produces a 0-1 probability score rather than a classification.
- The authors intend to generate a wetland probability score map with a spatial resolution of 4 m. However, despite utilizing MassGIS data (LiDAR and Leaf-Off Aerial Imagery with a spatial resolution of 4 m), the model performance appears to be dominated by predictors with much coarser spatial resolution (see Figure 5). It is interesting that the 4 m predictors have the lowest variable importance, with the exception of Hydric_Soils_4m, whose importance is less than half that of Hydric_Soils_60m. It would be beneficial if the authors could provide an in-depth explanation of why scale matters from a physical perspective. Additionally, if the final output were generated at a different spatial resolution (e.g. 12 m or 60 m) and the same threshold (i.e. > 0.5) were applied, would the result differ and affect the conclusion regarding the additional 40 % of probable wetland area statewide?
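The resolution question in the last comment can be made concrete with a small sketch. It assumes, purely for illustration, that a coarser product is obtained by block-averaging a fine probability grid (the manuscript's model would instead be rerun at the coarser resolution), and it uses a toy 6 x 6 grid and hypothetical helper names. A factor of 3 corresponds to going from 4 m to 12 m cells.

```python
def block_mean(grid, factor):
    """Aggregate a 2-D grid of probabilities by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            cells = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out

def wetland_fraction(grid, threshold=0.5):
    """Fraction of cells whose probability exceeds the threshold."""
    cells = [p for row in grid for p in row]
    return sum(p > threshold for p in cells) / len(cells)

# Toy fine-resolution grid: a one-cell-wide wetland strip (p = 0.9) in upland (p = 0.1).
fine = [[0.9 if c == 1 else 0.1 for c in range(6)] for r in range(6)]
coarse = block_mean(fine, 3)

print(wetland_fraction(fine))    # strip occupies 6 of 36 cells
print(wetland_fraction(coarse))  # strip is averaged below the 0.5 threshold
```

In this toy case the narrow feature disappears entirely once averaged, which is why the estimated wetland area under a fixed 0.5 threshold can depend on output resolution.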
Specific comments (Minor)
Sorted by page and line number order, with comments on figures and tables at the end.
- page 4 line 15: It should be section 2.1, unless the authors somehow missed section 2.1.
- page 7 line 55-60: I understand that indices such as slope, DTW, and TWI are very commonly used in the field, but some equations here would help non-specialist readers.
- page 7 line 70: What is 'surficial geology' (Surf_Geo), and why does the author only use it at a resolution of 4 m? What is its role here?
- page 7 line 73: First-time abbreviation usage: NDVI. Again, an equation will help to explain NDVI and NDWI.
- page 7 line 73-75: What date range of LANDSAT 8 imagery was used in this study (2021)? Why were only the mean and median used rather than the LANDSAT 8 time series, as in, e.g., Zhang, X. et al. (2024)?
- page 7 line 75-82: It is suggested that the author should provide details on the MassGIS data, including the temporal resolution of the Leaf-Off Aerial Imagery and its processing.
- page 14 line 76: Is the abbreviation DEP the same as MDW? If so, the authors should avoid using DEP and MDW interchangeably.
- page 14 line 77 and line 84: Is 'MassGIS 2016 Land Cover / Land Use' the same as 'MA 2016 High Resolution Land Cover dataset'?
- page 15 line 02: Model performance met or exceeded all expectations for predictions... what is the expectation here?
- Page 22 line 06: The model was improved by inclusion of Hydric Soils ... It is important to evaluate the model's performance without these features to confirm that their inclusion led to improvement. Additionally, the higher importance score of hydric soil at a coarser spatial resolution should be examined. The author should expand the discussion to explore these aspects in greater depth.
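For reference, the equations requested in the comments on page 7 commonly take the following standard forms. This is a sketch of the usual textbook definitions, not the manuscript's own: the symbols are assumptions, and NDWI in particular has more than one definition in the literature (the green/NIR form is shown here; a NIR/SWIR form also exists), so the authors would need to state which one they used.

```latex
% rho_NIR, rho_Red, rho_Green: surface reflectance in the near-infrared, red and green bands
\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}},
\qquad
\mathrm{NDWI} = \frac{\rho_{\mathrm{Green}} - \rho_{\mathrm{NIR}}}{\rho_{\mathrm{Green}} + \rho_{\mathrm{NIR}}}

% a: upslope contributing area per unit contour width; beta: local slope angle
\mathrm{TWI} = \ln\!\left(\frac{a}{\tan\beta}\right)
```

Both normalized indices range from -1 to 1 and are dimensionless, which is relevant to the later comment on the units listed for NDVI and NDWI.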
Technical corrections
- Figure 2: What is the label unit in the second row? The unit for Cartographic Depth to Water (DTW) appears to be 4 km², 16 km², or 64 km², but this is inconsistent with Table 2, where DTW is measured in pixels.
- Table 1. The unit "ac" can be removed from the second column, or added everywhere else. The Holdout area in the last row (21%) is not correct: 1,226,251 / 5,189,682 = 0.236 (23.6 %).
- Table 2 lists only 25 predictors under the 'In Final Model' column, missing B3_4m and DTW_16k compared to Figure 5. Potential mismatches (not exhaustive):
- DEV_12m in Table 2, but not in Figure 5
- DIF_12 m in Table 2, but not in Figure 5
- Veg_height_4m in Table 2, but not in Figure 5.
- Ksat_60 m appears in Table 2, but only Ksat_12 m in Figure 5.
- B2_12m in Table 2, but only B2_4m in Figure 5.
- Slope_12m and Slope_60m in Table 2, but only one slope in Figure 5.
Additionally, the scale units for Hydric_soil (%), B2 and B4 (-), Slope (degrees), NDVI (-), and NDWI (-) are incorrect.
- Figure 7. First-time abbreviation usage: WPS.
- Citations: page 25, lines 9 - 14: Both Halabisky et al. (2022) for the open discussion and Halabisky et al. (2023) for the final version are cited here. While this is fine, I suggest only using the final version. Corresponding to page 3, line 91, the citation (Halabisky et al., 2022) should be updated to (Halabisky et al., 2023).
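The arithmetic behind the Table 1 correction above can be checked directly. The two figures are those quoted in the comment; the variable names are illustrative and the unit is taken to be the "ac" mentioned for the table.

```python
holdout_ac = 1_226_251   # holdout area quoted in the comment on Table 1
total_ac = 5_189_682     # total area quoted in the same comment

share = holdout_ac / total_ac
print(f"{share:.3f}")    # 0.236, i.e. about 23.6 %, not the 21 % given in the table
```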
Reference
Zhang, X., Liu, L., Zhao, T. et al. Global annual wetland dataset at 30 m with a fine classification system from 2000 to 2022. Sci Data 11, 310 (2024). https://doi.org/10.1038/s41597-024-03143-0
Citation: https://doi.org/10.5194/egusphere-2024-3326-RC2