Beyond the simple mean: an alternative way to improve multi-model bottom-up wetland CH4 estimates
Abstract. Wetlands are the largest natural source of atmospheric methane, yet bottom-up estimates of their emissions remain highly uncertain owing to structural differences and parametric uncertainties among process-based models and to strong environmental heterogeneity. In addition, most wetland emission models are not systematically calibrated against methane flux measurements, which limits their ability to capture realistic spatial and temporal dynamics. The increasing availability of eddy-covariance towers measuring CH4 fluxes across diverse wetland types now offers new opportunities to evaluate and constrain model ensembles with observational data. Multi-model ensembles are commonly used to quantify uncertainty, but the widespread use of simple model averaging implicitly assumes that all models contribute equally and optimally across sites, an assumption that is rarely justified. Here, we present a data-driven framework that moves beyond the simple mean to derive site-adaptive ensemble estimates of wetland CH4 emissions and to characterize spatial patterns of model disagreement and how ensemble weighting reshapes them.
Using flux observations from a global network of 44 wetland sites, spanning boreal and Arctic to tropical wetland ecosystems, and simulations from 16 global wetland biogeochemistry models, we estimate site-specific optimal ensemble weights with a Bayesian model averaging framework fitted by Expectation–Maximization (EM-BMA). To improve robustness, weights are stabilized through resampling, and sites are clustered according to their stabilized weight signatures in compositional space, yielding groups of locations with similar model skill structures. Within each cluster, we perform cross-validated predictions and compare EM-BMA against simple model averaging (SMA) using standard performance metrics (R2, normalized RMSE, and mean bias).
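As an illustration of the weight-estimation step, the following is a minimal sketch of EM-fitted Bayesian model averaging for a single site. It assumes Gaussian kernels centred on each model's prediction with one shared variance and no bias correction; the abstract does not specify these details, and the function name em_bma and its parameters are hypothetical.

```python
import numpy as np

def em_bma(obs, preds, n_iter=500, tol=1e-8):
    """Estimate BMA weights by EM (illustrative sketch, not the authors' code).

    obs   : (T,) observed fluxes at one site
    preds : (T, K) predictions from K models at the same times
    Assumes y_t ~ sum_k w_k * Normal(preds[t, k], sigma2) with a shared sigma2.
    """
    obs = np.asarray(obs, dtype=float)
    preds = np.asarray(preds, dtype=float)
    T, K = preds.shape
    w = np.full(K, 1.0 / K)                 # start from equal weights (SMA)
    resid2 = (obs[:, None] - preds) ** 2    # squared errors, shape (T, K)
    sigma2 = resid2.mean()
    for _ in range(n_iter):
        # E-step: responsibility of model k for observation t.
        # The Gaussian normalizing constant cancels because sigma2 is shared.
        log_z = np.log(w + 1e-300) - 0.5 * resid2 / sigma2
        log_z -= log_z.max(axis=1, keepdims=True)   # numerical stability
        z = np.exp(log_z)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: update the weights and the shared variance.
        w_new = z.mean(axis=0)
        sigma2 = max((z * resid2).sum() / T, 1e-12)
        converged = np.abs(w_new - w).max() < tol
        w = w_new
        if converged:
            break
    return w, sigma2
```

Under these assumptions, the weighted ensemble prediction is preds @ w, while SMA corresponds to preds.mean(axis=1). Stabilization by resampling could then be approximated by calling em_bma on bootstrap resamples of the time axis and averaging the resulting weight vectors.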
To interpret these clusters and to delineate the conditions under which different model combinations best fit the site measurements, we relate cluster membership to environmental predictors. We characterize cluster-specific validity domains in predictor space using low-dimensional projections together with geometric and probabilistic envelopes, and we identify the most influential predictors using a machine-learning classifier with back-projected feature importance.
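One simple way to picture a probabilistic envelope in predictor space is a Mahalanobis-distance ellipsoid fitted to the predictors of a cluster's member sites. The sketch below is an assumption for illustration only: the abstract does not state which envelope is used, and the function names, the 95 % quantile threshold, and the ridge term are all hypothetical choices.

```python
import numpy as np

def mahalanobis_sq(X, mu, prec):
    """Squared Mahalanobis distance of each row of X from mu."""
    diff = np.atleast_2d(np.asarray(X, dtype=float)) - mu
    return np.einsum("ij,jk,ik->i", diff, prec, diff)

def fit_envelope(X, quantile=0.95, ridge=1e-6):
    """Fit an elliptical envelope to one cluster's predictors (sketch).

    X : (n_sites, n_predictors) environmental predictors of the cluster members.
    The threshold is the empirical quantile of the members' own distances,
    which is an assumed, distribution-free choice.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    # A small ridge keeps the covariance invertible for small clusters.
    cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    prec = np.linalg.inv(cov)
    threshold = np.quantile(mahalanobis_sq(X, mu, prec), quantile)
    return mu, prec, threshold

def inside(X, envelope):
    """Boolean mask: True where a point falls within the cluster envelope."""
    mu, prec, threshold = envelope
    return mahalanobis_sq(X, mu, prec) <= threshold
```

In such a setup, a grid cell whose predictors fall inside a cluster's envelope could be assigned that cluster's ensemble weights, whereas cells outside every envelope would be flagged as extrapolation.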
We show that optimal model combinations vary across sites and that Bayesian model averaging outperforms simple model averaging in cross-validation. When the weights are propagated beyond the measurement sites using environmental predictors and applied to model diagnostic outputs, the resulting global wetland CH4 emission estimates differ by less than 5 % from those obtained with simple averaging. However, substantial differences emerge at local and regional scales, highlighting the importance of accounting for spatial heterogeneity in model skill. This framework therefore provides a transparent and reproducible alternative to equal-weight ensemble means for improving bottom-up wetland CH4 estimates across heterogeneous environments.