the Creative Commons Attribution 4.0 License.
Identifying Important Features for Downscaling Soil Moisture to 1-km in the Contiguous United States
Abstract. Soil moisture is a fundamental state variable in climatology, meteorology, and hydrology. Many of the available soil moisture products have a coarse spatial resolution that is not useful for agricultural applications. This study used Random Forest to identify which features are most helpful for accurately downscaling soil moisture to 1-km resolution. Fourteen features were considered: precipitation, antecedent precipitation index, maximum daily air temperature, minimum daily air temperature, mean daily air temperature, diurnal temperature range, dew point temperature, elevation, slope, aspect, normalized difference vegetation index (NDVI), leaf area index (LAI), soil texture, and land use/land cover. The analysis of variable importance was repeated using two different sources of soil moisture data (satellite-derived soil moisture from NASA’s Soil Moisture Active Passive (SMAP) mission and model-derived soil moisture from the North American Land Data Assimilation System (NLDAS)) and two different ways of representing soil saturation (volumetric water content (VWC) and percentiles). We found that dew point temperature is the most important variable for downscaling SMAP percentiles (0.18), NLDAS VWC (0.27), and NLDAS percentiles (0.17) over the contiguous United States (CONUS), while elevation is the most important variable for downscaling SMAP VWC (0.28). Dew point temperature is crucial for downscaling in most regions of the United States, except in the South and West North Central regions, where elevation is the most important feature. The accuracy of the downscaling varies by region. In the South, SMAP VWC and NLDAS VWC downscaling are relatively accurate; both have mean absolute errors (MAE) of ~0.07. The MAE values in the South region are 0.196 for SMAP percentiles and 0.175 for NLDAS percentiles.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-896', Anonymous Referee #1, 06 Jun 2025
The manuscript applies a conventional Random-Forest (RF) regression to downscale SMAP 9 km and NLDAS-2 12.5 km surface-soil-moisture data to 1 km across CONUS, using fourteen resampled ancillary predictors (precipitation, API, T-max/min/mean, DTR, dew-point, elevation, slope, aspect, NDVI, LAI, soil texture, LULC) and permutation IncMSE to rank feature importance. All predictors are aggregated with nearest-neighbor interpolation to the coarse grids, the RF is tuned with a narrow-randomized search and trained on 80 % of the daily data (3-fold random CV) from 2015-2021 (SMAP) or 2001-2021 (NLDAS), and validated on the remaining 20 % plus ~1 542 in-situ stations. The authors report that dew-point temperature (or elevation for SMAP-VWC) dominates importance at national and regional scales, whereas vegetation and land-cover variables contribute little. Predictive skill is modest: regional R values 0.23–0.57, R² < 0.32, ubRMSE 0.08–0.11 m³ m⁻³ for VWC and higher for percentiles, yet the paper concludes the downscaling is “skillful” and claims to be the first CONUS-wide feature-importance study using both satellite and model sources.
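For readers unfamiliar with the skill metrics quoted in this summary, the following is a minimal sketch of how bias, RMSE, ubRMSE, and Pearson R relate for paired predictions and observations. It is an illustration only and not the authors' code; the function name `skill_metrics` and the sample values are hypothetical. It relies on the identity ubRMSE² = RMSE² − bias².

```python
import math


def skill_metrics(pred, obs):
    """Return bias, RMSE, ubRMSE and Pearson R for paired samples.

    Hypothetical helper for illustration; inputs are equal-length sequences
    of soil moisture values (e.g. in m^3 m^-3).
    """
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 values")

    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    bias = mean_p - mean_o
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)

    # Unbiased RMSE removes the mean offset: ubRMSE^2 = RMSE^2 - bias^2.
    # max(..., 0.0) guards against tiny negative values from rounding.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))

    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_p == 0.0 or var_o == 0.0:
        raise ValueError("Pearson R is undefined for a constant sample")
    r = cov / math.sqrt(var_p * var_o)

    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


# Made-up values: the prediction tracks the observation with a constant offset,
# so RMSE equals the bias and ubRMSE is (numerically) zero.
example = skill_metrics([0.20, 0.25, 0.30, 0.35], [0.15, 0.20, 0.25, 0.30])
```

Because ubRMSE discards the mean offset, a product can show a low ubRMSE while still being biased, which is one reason several metrics are usually reported together.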
• Essentially the same RF-based downscaling has already been carried out over CONUS or large sub-regions in previous studies; repeating the exercise with a slightly different predictor list offers no demonstrable novelty, yet the manuscript asserts uniqueness.
• Nearest-neighbour resampling for continuous variables is chosen solely for speed, producing blocky artefacts and aliasing at 1 km; bilinear or cubic methods are standard.
• Random 80 / 20 splits with 3-fold CV ignore spatial and temporal autocorrelation; withholding entire tiles or years is needed to obtain unbiased accuracy and feature rankings.
• IncMSE is known to inflate importance of correlated predictors; no conditional permutation, SHAP, or SAFE checks are provided, so the prominence of dew-point temperature may be an artefact.
• Hyper-parameter search is extremely shallow (n ≤ 200 trees, depth ≤ 110); state-of-the-art RF downscaling typically uses thousands of trees and tests split-rule options.
• Reported skill is weak (median R ≈ 0.4, R² ≈ 0.15, ubRMSE ≈ 0.10 m³ m⁻³) yet called “skillful”; no baseline (coarse product, climatology, persistence) is shown, so added value is unclear.
• The 1 km product is validated against point sensors without representativeness analysis; many stations occupy heterogeneous pixels, likely inflating error.
• No comparison is made with deep-learning or ensemble downscalers (e.g. DPR, CNN-U-Net, XGB) that now achieve ubRMSE < 0.04 m³ m⁻³ over CONUS, or with the operational SMAP 1-km soil moisture product.
• Seasonal PDPs (Fig. 10) are interpreted physically, but dew-point and soil-moisture co-variation can arise from mutual dependence on precipitation, not causal influence; the discussion over-reaches.
• Tables 2-3 mix absolute errors for VWC and percentile metrics without normalisation; presenting errors relative to SMAP accuracy thresholds or soil porosity would aid interpretation.
• The manuscript omits uncertainty propagation from SMAP-L4’s data-assimilation bias corrections—critical when comparing with L4-derived in-situ “truth”.
• Several sections duplicate background already covered by Peng 2017 and Zhao 2018, and could be condensed for clarity.
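The point above about withholding entire years rather than random samples can be sketched as a leave-one-group-out split. This is a minimal illustration in plain Python, not a description of the manuscript's pipeline; the function name `leave_one_group_out` and the sample years are hypothetical, and the same idea applies if the group label is a spatial tile instead of a year.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group.

    `groups` holds one label per sample (e.g. the year or the spatial tile).
    All samples sharing a label are held out together, so autocorrelated
    neighbours never appear on both sides of the split.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for g, idx in by_group.items() if g != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Hypothetical sample years; each fold withholds one complete year.
years = [2015, 2015, 2016, 2016, 2017]
for year, train_idx, test_idx in leave_one_group_out(years):
    print(year, train_idx, test_idx)
```

A random 80 / 20 split would instead scatter samples from the same year (or tile) across both sets, which is the leakage the comment describes.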
Therefore, due to the lack of methodological innovation, limited skill gains, and multiple methodological weaknesses, the study does not meet the novelty or rigor expected for publication.
Citation: https://doi.org/10.5194/egusphere-2025-896-RC1
AC1: 'Reply on RC1', Eshita Eva, 04 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-896/egusphere-2025-896-AC1-supplement.pdf
RC2: 'Comment on egusphere-2025-896', Anonymous Referee #2, 24 Jun 2025
This is an interesting manuscript focused on downscaling SMAP L4 and NLDAS-2 soil moisture data to 1-km resolution across CONUS. It applies random forest models trained on 9-km (SMAP) or 12.5 km (NLDAS) data. The resulting predictions are compared with observations from >1,000 in situ stations. This is an ambitious and worthy undertaking, but I think better execution is needed. There are inconsistencies and/or lack of clarity regarding how mismatches in spatial scales were handled between the ancillary data sources, the training datasets, and the in situ validation data. The implications and uncertainties associated with these scaling issues were not adequately addressed. The predictive power of the resulting models was limited (R2 < 0.26), and the conclusions regarding the relative importance of the 14 ancillary variables tested may have been biased by the data aggregation methods used. I have included 54 specific comments, questions, and edits in the attached pdf version of the manuscript.
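To illustrate why the choice of aggregation method can matter when fine-resolution predictors are brought to a coarser grid, the sketch below contrasts nearest-neighbour sampling with a block mean on a small made-up elevation grid. It is an assumption-laden toy example, not the authors' processing chain; `block_mean`, `nearest_sample`, and the grid values are hypothetical.

```python
def block_mean(grid, factor):
    """Aggregate a 2-D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        row = []
        for c0 in range(0, cols, factor):
            cells = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def nearest_sample(grid, factor):
    """Aggregate by keeping a single fine cell near the centre of each block.

    For an even factor the centre falls between cells, so this picks the
    cell at offset factor // 2 in each direction (one of the central cells).
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    offset = factor // 2
    return [
        [grid[r0 + offset][c0 + offset] for c0 in range(0, cols, factor)]
        for r0 in range(0, rows, factor)
    ]


# Made-up 4 x 4 elevation grid (metres) with two isolated peaks.
elevation = [
    [100, 100, 300, 300],
    [100, 500, 300, 300],
    [200, 200, 0, 0],
    [200, 200, 0, 400],
]
coarse_mean = block_mean(elevation, 2)         # [[200.0, 300.0], [200.0, 100.0]]
coarse_nearest = nearest_sample(elevation, 2)  # [[500, 300], [200, 400]]
```

In this toy case the two methods agree wherever a block is homogeneous and diverge where a single cell differs from its neighbours, so heterogeneous predictors are the ones most sensitive to the aggregation choice.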
AC2: 'Reply on RC2', Eshita Eva, 04 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-896/egusphere-2025-896-AC2-supplement.pdf
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
432 | 85 | 15 | 532 | 12 | 28 |