Empirical evidence of spurious correlations among space weather variables
Abstract. This paper investigates the prevalence and identification of spurious correlations within space weather datasets, a critical concern given the complex, interdependent nature of geophysical phenomena. The analysis uses daily-averaged galactic cosmic ray (GCR) data from the MOSC and OULU neutron monitor (NM) stations, treating large Forbush Decreases (FDs; FD > 3 %) and small FDs (FD ≤ 3 %) separately at each station, to account for the effects of 11-year solar cycle oscillations. For the first time, a statistical method was employed to test the link between FD amplitudes and solar-geomagnetic variables in each dataset after the effects of 11-year solar cycle oscillations were filtered out. We demonstrate that, while significant correlations between various space-weather indices and FD events are empirically observable, a closer analysis reveals that a subset of these relationships may not reflect true physical causality but rather arise from statistical artifacts or confounding factors inherent in the data. Specifically, analyses of FDs often reveal varying correlation coefficients with geomagnetic and solar wind parameters, which can fluctuate significantly across time periods and cosmic-ray stations. For instance, correlations between FD amplitudes and interplanetary magnetic field strength, solar wind speed, and geomagnetic indices such as Kp and Dst have been observed to exhibit both negative and positive trends, depending on the specific dataset and analytical approach employed. The results show inconsistencies in the datasets for both the MOSC and OULU stations, for large and small FDs alike. Specifically, the regression analyses yielded strong correlations after the effects of 11-year solar cycle oscillations were removed, for both large and small FDs.
These inconsistencies strongly suggest that 11-year solar cycle oscillations influence the FDs recorded at both stations and thereby affect the relationships between the FDs and the tested geomagnetic variables, echoing concerns about "spurious regression" in the stationary time series. Most of the results are statistically significant at the 95 % confidence level. The results obtained here imply that 11-year solar cycle oscillations affect the GCR flux intensity.
Manuscript Title: Empirical Evidence of Spurious Correlations Among Space Weather Variables
Recommendation: Major Revision
Major Comments
•The methodology for removing solar-cycle oscillations is insufficiently described. A clear mathematical description of the filtering procedure is required for reproducibility.
•Autocorrelation and non-stationarity in time-series data are not adequately addressed. This may itself produce spurious regression results.
•The dramatic change in FD counts before and after filtering requires deeper justification and sensitivity analysis.
•The manuscript relies heavily on p-values without sufficient discussion of effect sizes and physical interpretation.
•The claim of identifying spurious correlations should be moderated unless robustness tests (e.g., bootstrapping, cross-validation) are provided.
•Only Solar Cycle 23 is analyzed. Extending to additional cycles would strengthen generalizability.
•There is an inconsistency between the discussion of advanced methods (e.g., mutual information, AI) and the actual application of linear regression only.
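The concerns above about solar-cycle filtering, spurious correlation, and robustness testing can be illustrated with a minimal synthetic sketch. It does not reproduce the manuscript's method or data. The two series, the single-sinusoid least-squares filter, the 4015-day (11-year) period, the 27-day block length, and all function names are assumptions chosen purely for illustration. The two series share nothing except a common 11-year sinusoid, so any correlation between them is spurious by construction.

```python
import numpy as np

PERIOD_DAYS = 4015.0  # assumed 11-year cycle length (11 * 365 days)


def make_series(n_days=4015, noise=1.0, seed=0):
    """Two synthetic daily series linked only through a shared sinusoid."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)
    cycle = np.sin(2.0 * np.pi * t / PERIOD_DAYS)
    x = 3.0 * cycle + noise * rng.standard_normal(n_days)
    y = -2.0 * cycle + noise * rng.standard_normal(n_days)
    return t, x, y


def remove_cycle(t, series):
    """Subtract a least-squares fit of a constant plus one sinusoid.

    This is one possible filter, not the manuscript's procedure.
    """
    w = 2.0 * np.pi * t / PERIOD_DAYS
    design = np.column_stack([np.ones_like(w), np.sin(w), np.cos(w)])
    coef, *_ = np.linalg.lstsq(design, series, rcond=None)
    return series - design @ coef


def block_bootstrap_ci(x, y, block=27, n_boot=500, alpha=0.05, seed=1):
    """Percentile interval for Pearson r from a moving-block bootstrap.

    Resampling contiguous blocks keeps short-range autocorrelation
    inside each block; the 27-day block length is an assumption.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block))
    offsets = np.arange(block)
    r = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        r[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(r, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lo, hi


t, x, y = make_series()
r_raw = np.corrcoef(x, y)[0, 1]

x_f, y_f = remove_cycle(t, x), remove_cycle(t, y)
r_filtered = np.corrcoef(x_f, y_f)[0, 1]
ci_low, ci_high = block_bootstrap_ci(x_f, y_f)

print(f"raw r = {r_raw:.3f}")
print(f"filtered r = {r_filtered:.3f}, 95% block-bootstrap interval = "
      f"[{ci_low:.3f}, {ci_high:.3f}]")
```

Reporting raw and filtered coefficients side by side, together with an interval of this kind, would let readers judge effect size and robustness rather than p-values alone. The noise in this sketch is white; real neutron monitor data are autocorrelated, so the block length would need its own justification.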
Minor Comments
•Several grammatical and typographical errors require correction.
•Table numbering should be carefully checked for consistency.
•Some references appear duplicated, and formatting should be standardized.
•Definitions of acronyms (e.g., SI, SSN) should be standardized at first use.
•Figures would benefit from additional quantitative statistical descriptors.
Overall Assessment
The manuscript addresses an important problem in space weather analysis, namely, the potential inflation of correlations due to long-term solar-cycle modulation. The study presents interesting findings and may have a substantial impact after methodological strengthening. However, significant revisions are required to improve statistical rigor, reproducibility, and clarity of interpretation before the work can be considered for publication.