Technical note: A framework for causal inference applied to solar radiation and temperature effects on dissolved gaseous mercury
Abstract. Environmental science usually requires researchers to rely on observational data alone. However, researchers want to identify causal relationships and not only correlations between pollutant behaviour and other environmental factors such as weather. Previously it has been shown that solar radiation associates with the volatilisation and evasion of the hazardous pollutant mercury from sea surfaces into the atmosphere. Statistical and machine learning methods can help find and quantify such associations. However, association does not imply causation, and inferring causal relationships from observational data alone remains a significant challenge. Here, we aim to create an 'easy-to-follow' framework, to be used by environmental researchers, for using prior scientific knowledge encoded as graphical causal models to enable causal inference and to estimate effect sizes of different related factors using collected field data. We demonstrate the framework through a case study estimating the effect sizes of solar radiation and sea surface temperature on dissolved gaseous mercury (DGM) in seawater measured at the west coast of Sweden. Our causal analysis reveals that 32 % of the total effect of solar radiation on DGM is mediated indirectly via changes in sea surface temperature. Wind and instrumentation acted as confounders, biasing effect estimates by 4.5 %. Results from the case study show that our proposed framework allows for a rigorous design, validation, and reporting of causal inference in environmental science. It shows potential in modelling causes of pollutant dynamics and quantifying the effect of regulating policies such as the Minamata Convention For Mercury.
The paper introduces a Bayesian graphical causal inference framework to investigate solar radiation and temperature effects on dissolved gaseous mercury (DGM) concentrations. This is an exciting contribution with clear potential to advance environmental data analysis. However, major revisions are required to ensure that the method is applied following best practices and clearly communicated to a broader audience in environmental sciences who may not have a statistical background.
Major Comments
The study does not explicitly demonstrate that frequentist methods fail or that Bayesian inference provides a clear empirical advantage. No comparison is made (e.g., between regression or structural equation models and their Bayesian alternatives) to show instability or bias under a frequentist framework. Since Bayesian methods are technically more complex, the manuscript should clarify when and why they are preferable and under what conditions their use provides meaningful benefits.
The authors claim that previous studies suffered from temporal limitations. While this study uses high-frequency data, the model itself does not incorporate time as a structural or dynamic dimension; it treats each time step as an independent observation. The manuscript should clearly explain how this approach differs from earlier studies and whether the higher temporal resolution truly enhances inference or simply provides finer data granularity.
The assumption of a Normal likelihood for C_{MW} is weakly justified. While the Normal distribution is commonly used, its prevalence does not imply appropriateness. The appeal to the Central Limit Theorem oversimplifies environmental concentration data, which typically arise from multiplicative processes and are right-skewed; Figure 11(e) shows a long-tailed distribution. The authors could either demonstrate that the residuals are approximately normal (supported by residual versus fitted value plots) or acknowledge this limitation and discuss whether a log-normal likelihood would be more appropriate.
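To illustrate the kind of residual check suggested here, the following minimal sketch uses simulated data, not the manuscript's measurements. The generating process, the variable names, and all parameter values are assumptions chosen only for illustration. It fits a straight line to a multiplicative (log-normal) response on the raw and on the log scale, then compares the skewness of the residuals.

```python
import numpy as np

# Simulated stand-in for a right-skewed concentration variable.
# All values below are hypothetical and unrelated to the manuscript's data.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, n)                         # hypothetical driver
y = np.exp(0.5 + 1.0 * x + rng.normal(0.0, 0.5, n))  # multiplicative noise


def residual_skewness(x, y):
    """Fit y = intercept + slope * x by least squares and return the
    sample skewness of the residuals (about 0 for Normal residuals)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    z = (resid - resid.mean()) / resid.std()
    return float(np.mean(z ** 3))


skew_raw = residual_skewness(x, y)          # Normal likelihood on raw scale
skew_log = residual_skewness(x, np.log(y))  # Normal likelihood on log scale

print(f"residual skewness, raw scale: {skew_raw:.2f}")
print(f"residual skewness, log scale: {skew_log:.2f}")
```

In this simulated setting the raw-scale residuals stay clearly right-skewed, while the log-scale residuals are close to symmetric. The same comparison on the actual C_{MW} residuals would show whether the Normal likelihood is adequate.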
For model m4, the paper discusses indirect effects through Sol → T_S → C_{MW} and Sol → W → C_{MW} but omits the valid multi-step path Sol → T_S → r_W → C_{MW}. The authors should clarify whether such compound mediation effects are included in the total indirect effect and provide clearer guidance on interpreting direct, indirect, and total effects from the DAG.
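The decomposition requested in this comment can be sketched as follows, assuming a purely linear model in which the effect along a path is the product of its edge coefficients. The coefficients and the exact edge set are hypothetical placeholders, not values from the manuscript. The sketch enumerates every directed path from Sol to C_{MW}, including the compound path through T_S and r_W, and sums them into direct, indirect, and total effects.

```python
import numpy as np

# Hypothetical path coefficients for an assumed linear DAG.
# Neither the numbers nor the exact edge set come from the manuscript.
edges = {
    ("Sol", "T_S"): 0.6,
    ("Sol", "W"): 0.1,
    ("Sol", "C_MW"): 0.4,
    ("T_S", "r_W"): 0.5,
    ("T_S", "C_MW"): 0.3,
    ("r_W", "C_MW"): -0.2,
    ("W", "C_MW"): -0.3,
}


def directed_paths(edges, source, target, prefix=None):
    """Enumerate all directed paths from source to target in the DAG."""
    prefix = (prefix or []) + [source]
    if source == target:
        return [prefix]
    paths = []
    for (parent, child) in edges:
        if parent == source:
            paths.extend(directed_paths(edges, child, target, prefix))
    return paths


def path_effect(edges, path):
    """Product of edge coefficients along one path (valid for linear models)."""
    return float(np.prod([edges[(a, b)] for a, b in zip(path, path[1:])]))


paths = directed_paths(edges, "Sol", "C_MW")
effects = {" -> ".join(p): path_effect(edges, p) for p in paths}

direct = effects["Sol -> C_MW"]
total = sum(effects.values())
indirect = total - direct  # includes the compound path via T_S and r_W

for name, value in effects.items():
    print(f"{name}: {value:+.3f}")
print(f"direct={direct:+.3f} indirect={indirect:+.3f} total={total:+.3f}")
```

Reporting a per-path table of this kind would make it explicit which mediation routes enter the indirect effect and which, if any, were left out.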
The causal conclusions rely in many respects on the correctness of the assumed DAG structure. Beyond the independence assumptions, mis-specified relationships or omitted variables, such as unmodeled nonlinear effects or unobserved confounders, could lead to misleading causal inferences. The authors should discuss the potential impact of such DAG misspecification.
Minor Comments
The priors (e.g., Normal(0.5, 1), Normal(0.5, 0.5)) appear somewhat arbitrary and not elicited from domain experts. The study would be strengthened by (a) justifying these priors through expert input or empirical reasoning, or (b) using uninformative priors.
Please clarify how model convergence was assessed under the Bayesian MCMC framework. Including trace plots or diagnostics is important for verifying convergence. A useful reference is: Reich, Brian J., and Sujit K. Ghosh. Bayesian Statistical Methods. Chapman and Hall/CRC, 2019.
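As an example of one such diagnostic, the sketch below computes the classic Gelman-Rubin potential scale reduction factor (R-hat) for a single scalar parameter from several chains. It is a simplified illustration on simulated chains. It is not the rank-normalized split variant that current software typically reports, and the function name is hypothetical.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for one scalar parameter.

    chains: array of shape (n_chains, n_draws), post-warmup draws.
    Values near 1 indicate that the chains agree with one another.
    """
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()             # W
    between = n_draws * chains.mean(axis=1).var(ddof=1)    # B
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_hat / within))


# Simulated chains (illustrative only): four that sample the same
# distribution, and four that are stuck around different locations.
rng = np.random.default_rng(42)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))
stuck = mixed + np.arange(4)[:, None]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(f"R-hat, well-mixed chains: {rhat_mixed:.3f}")
print(f"R-hat, stuck chains:      {rhat_stuck:.3f}")
```

Reporting R-hat and effective sample size for each parameter, alongside the trace plots, would allow readers to verify convergence.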
Both R^2 and WAIC are reported and appear consistent. However, if they diverged, how should this be interpreted? A short explanation of their conceptual difference would improve clarity.
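The conceptual difference can be illustrated with a minimal sketch on simulated data. The setup is an assumption made for simplicity and is not the authors' model: a Gaussian linear model with known noise standard deviation and a flat prior, so that the posterior of the coefficients is Normal(beta_hat, sigma^2 (X'X)^{-1}). R^2 measures in-sample fit and cannot decrease when predictors are added, whereas WAIC estimates out-of-sample predictive accuracy and adds a penalty (p_waic) for effective model complexity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 1.0  # hypothetical sample size and known noise sd
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=sigma, size=n)


def fit_and_score(X, y, sigma, draws=2000):
    """Return (R2, WAIC, p_waic) for a Gaussian linear model with known
    sigma and a flat prior on the coefficients (assumed for illustration)."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    # In-sample R2 from the posterior-mean (least-squares) fit.
    resid = y - X @ beta_hat
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    # Pointwise log-likelihood for each posterior draw: shape (draws, n).
    betas = rng.multivariate_normal(beta_hat, sigma**2 * xtx_inv, size=draws)
    mu = betas @ X.T
    loglik = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
    # lppd: log of the posterior-averaged likelihood, summed over points.
    lppd = np.sum(np.log(np.mean(np.exp(loglik), axis=0)))
    # p_waic: posterior variance of the log-likelihood, summed over points.
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return float(r2), float(-2.0 * (lppd - p_waic)), float(p_waic)


X_small = np.column_stack([np.ones(n), x])
# Add 20 predictors that are pure noise by construction.
X_big = np.column_stack([X_small, rng.normal(size=(n, 20))])

r2_small, waic_small, p_small = fit_and_score(X_small, y, sigma)
r2_big, waic_big, p_big = fit_and_score(X_big, y, sigma)
print(f"small model: R2={r2_small:.3f} WAIC={waic_small:.1f} p_waic={p_small:.1f}")
print(f"big model:   R2={r2_big:.3f} WAIC={waic_big:.1f} p_waic={p_big:.1f}")
```

If the two measures diverge, with R^2 improving while WAIC worsens, the usual reading is that the extra flexibility fits noise rather than signal.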
Figure 13(b) seems to show narrower confidence intervals than (a), but this is hard to discern. The figure could be redesigned for better contrast. Also, revise the phrasing “noisier but also more reliable,” as “noisier” typically suggests lower precision.
The rationale for preferring graphical causal models over alternatives (e.g., Granger causality, potential outcomes) is generally sound. Graphical models do enhance transparency and facilitate the integration of mechanistic knowledge. However, they do not eliminate assumptions or guarantee correctness. Traditional causal frameworks are not inherently “non-transparent” but rely on different theoretical foundations. Acknowledging this nuance would make the argument more balanced.
Appendix E Figure E1, used to validate statistical independence, could be clearer. Adding fitted lines with distinct colors for different temperature levels would improve readability and interpretation.