Benchmarking Catchment-Scale Snow Water Equivalent Datasets and Models in the Western United States
Abstract. This study benchmarks a wide range of snow water equivalent (SWE) models and data products at the catchment scale in the Western US, and discusses an experimental protocol to facilitate community-wide intercomparisons. Using lidar-based ASO (Airborne Snow Observatory) SWE estimates as 'ground truth', this study evaluates the performance of multiple SWE products, including SNODAS, SWANN (4 km and 800 m), the US National Water Model (NWM), UCLA-SWE, SWEMLv2, NLDAS-2 (VIC, Noah, and Mosaic), ERA5-Land, Daymet, and the CONUS404 dataset. We use SWE aggregated to hydrologic catchments as the standard spatial basis for assessment, focusing on multiple spatially variable performance metrics. UCLA-SWE, SWANN (both 800 m and 4 km), and SWEMLv2 show the strongest agreement with ASO SWE, each achieving Kling–Gupta Efficiency (KGE) values above 0.6. SNODAS also performs competitively with these higher-performing models. The coarser-resolution products generally perform poorly at the catchment scale. Notably, ERA5-Land and the NLDAS-2 Mosaic and VIC models demonstrate strong skill for basin-average SWE (R² > 0.9), while the NLDAS-2 Noah model exhibits weak performance across both spatial scales. Noting the lack of a common community standard for SWE product and model evaluation, we use the results of the multi-dataset analysis to explore potential experimental protocols for a standardized SWE evaluation that could support community-wide intercomparison and benchmarking of existing and new SWE products. SWE datasets are a critical component of hydrologic prediction practices such as water supply forecasting; thus, the experimental standards proposed herein could facilitate quantitative guidance for agency and stakeholder adoption of specific SWE products in decision support applications.
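To illustrate the KGE threshold quoted above, the following is a minimal sketch of how a Kling–Gupta Efficiency could be computed for catchment-aggregated SWE. It assumes the original three-component formulation (correlation, variability ratio, bias ratio); the abstract does not state which KGE variant the study uses. The function name `kge` and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency, assuming the original three-component form.

    sim: product SWE per catchment; obs: reference (e.g. ASO) SWE per catchment.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only catchments where both values are finite.
    mask = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[mask], obs[mask]
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical catchment-mean SWE values in mm (illustrative only).
aso_swe = [100.0, 250.0, 400.0, 550.0]
product_swe = [120.0, 230.0, 410.0, 500.0]
score = kge(product_swe, aso_swe)
```

A perfect match gives a KGE of 1, and the score decreases as correlation, variability, or bias departs from the reference, so under this formulation a value above 0.6 means the combined departure across the three components is below 0.4.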