the Creative Commons Attribution 4.0 License.
The Spatio-Temporal Visualization Tool HMMLVis in Renewable Energy Applications
Abstract. In this work, we present HMMLVis, an original visualization tool for multivariate Granger causal inference, more precisely for heterogeneous Granger causality, which infers causal relationships in time series following an exponential distribution. HMMLVis is easy to use and can be applied in any scientific discipline that explores time series and their relationships. In this paper, we focus on climatological and meteorological applications. The visualization tool is demonstrated on different types of applications related to meteorological events in the upper and lower tails of the respective distributions, using renewable energy (wind, PV) data, air pollution data, and the EUMETNET postprocessing benchmark data set (EUPPBench), across different temporal horizons. We demonstrate that the HMMLVis method and visualization depict known causal relations and detect causal relations in the temporal dependencies that provide important additional information for the respective cases. We believe that HMMLVis, as an interpretable visualization tool, will serve climatologists and meteorologists and thereby contribute to knowledge discovery in these scientific fields.
Status: open (extended)
- RC1: 'Comment on egusphere-2024-3126', Anonymous Referee #1, 02 May 2025
  - CC1: 'Reply on RC1', Irene Schicker, 17 Nov 2025
We thank the reviewer for the careful reading of our manuscript and the constructive suggestions. We address three points in particular: we (i) clarified all acronyms and notation at first occurrence, (ii) expanded the description of the semi-synthetic datasets used in the case studies, including their construction, validation, and limitations, and (iii) added a new figure that directly compares semi-synthetic and operational data for wind power, PV power, and global horizontal irradiance (GHI).
Reviewer comment 4: “The manuscript contains several instances where acronyms (e.g., EUMETNET) are introduced without first spelling out the full name. This is not compliant with standard academic writing practices. Please ensure that all acronyms are introduced in full upon first use, followed by the abbreviation in parentheses.”
Response:
We agree and have revised the manuscript to consistently introduce all acronyms at first occurrence by spelling out the full name followed by the abbreviation in parentheses. This includes, but is not limited to, EUMETNET (European Meteorological Network), GHI (global horizontal irradiance), PV (photovoltaic), ERA5 (ECMWF Reanalysis v5), CAMS (Copernicus Atmosphere Monitoring Service), HMML (Heterogeneous Minimum-Message-Length) and HMMLVis. We also checked the notation for all symbols and indices and ensured that they are defined where they first appear in the text.
Reviewer comment 7: “The paper uses semi-synthetic datasets (e.g., for PV and wind), but the construction process needs to be described in more detail. How realistic are these datasets? What modeling assumptions underlie their creation, and what uncertainties are introduced?”
Response:
We thank the reviewer for pointing out that our description of the semi-synthetic datasets was too brief. We have substantially expanded the corresponding subsection in the Data and Methods section and added a validation figure (Fig. X for now, see attached PDF) to document the realism and limitations of these datasets.
In the revised manuscript, we now distinguish clearly between:
- Wind power case (onshore wind farm):
  - We start from ERA5 reanalysis wind fields and downscale them to hub height at the turbine locations using a standard vertical interpolation and site-specific adjustment.
  - These wind speeds are converted to power using the manufacturer’s power curve and the installed nominal capacity for each turbine.
  - The resulting “ERA5-synthetic” daily power time series is then compared against anonymized operational turbine data from the same wind farm for the period 2016–2020. We aggregate to daily values, remove days with obvious curtailment plateaus, and normalize both series to [0, 1] for anonymization.
  - The new Fig. X (left column) shows normalized monthly means, a density-coloured scatter plot, and the probability density functions. The correlation between daily measured and ERA5-synthetic power is r ≈ 0.91, and the distributions agree well over most of the range. This demonstrates that, for this particular site and period, ERA5-based semi-synthetic power captures the observed daily-to-seasonal variability sufficiently well for our methodological demonstration.
- PV power case (utility-scale PV plant):
  - We construct semi-synthetic PV production from ERA5 plus CAMS radiation and atmospheric composition fields, combined with a simple PV performance model and the known installed DC capacity and orientation of the plant.
  - Again, we compare daily values against measured plant output, normalize both series to [0, 1], and show monthly averages, daily scatter and distributions in Fig. X (right column).
  - The daily correlation between measured and ERA5+CAMS-synthetic PV power is r ≈ 0.98, and the distribution of normalized daily power is very similar, indicating that the semi-synthetic PV dataset realistically reproduces both the seasonal cycle and day-to-day variability.
- GHI case (radiation station):
  - For GHI we use ERA5+CAMS semi-synthetic irradiance at a reference radiation station and compare it to long-term measurements.
  - Here the correlation of daily values is r ≈ 0.99 and the distributions are almost indistinguishable (Fig. X, middle column), reinforcing that the semi-synthetic series are representative of realistic surface radiation conditions at this site.
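The wind power construction described above (hub-height adjustment, power-curve conversion, daily aggregation, normalization) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the manuscript: the power-law extrapolation with exponent `alpha`, the reference and hub heights, the tabulated `POWER_CURVE` values, and normalization by rated power. The reply only states that a "standard vertical interpolation" and the manufacturer's power curve were used.

```python
# Hypothetical tabulated power curve: (wind speed in m/s, power in kW).
# Illustrative values only; a real curve comes from the turbine manufacturer.
POWER_CURVE = [(3.0, 0.0), (6.0, 400.0), (9.0, 1300.0), (12.0, 2000.0), (25.0, 2000.0)]
RATED_KW = 2000.0  # assumed nominal capacity of one turbine


def extrapolate_to_hub_height(speed_ref, z_ref=100.0, z_hub=120.0, alpha=0.14):
    """Power-law vertical extrapolation (an assumed stand-in for the
    'standard vertical interpolation' mentioned in the reply)."""
    return speed_ref * (z_hub / z_ref) ** alpha


def power_from_curve(speed, curve=POWER_CURVE):
    """Linearly interpolate the tabulated curve; return zero output below
    cut-in (first entry) and above cut-out (last entry)."""
    if speed < curve[0][0] or speed > curve[-1][0]:
        return 0.0
    for (s0, p0), (s1, p1) in zip(curve, curve[1:]):
        if s0 <= speed <= s1:
            return p0 + (p1 - p0) * (speed - s0) / (s1 - s0)
    return 0.0


def synthetic_daily_power(hourly_speeds_ref, rated_kw=RATED_KW, **height_kwargs):
    """Daily mean power from hourly reference-height wind speeds,
    normalized to [0, 1] by rated power (assumed normalization)."""
    hub_speeds = [extrapolate_to_hub_height(s, **height_kwargs) for s in hourly_speeds_ref]
    mean_kw = sum(power_from_curve(s) for s in hub_speeds) / len(hub_speeds)
    return mean_kw / rated_kw


calm_day = synthetic_daily_power([2.0] * 24)    # stays below cut-in all day
windy_day = synthetic_daily_power([12.0] * 24)  # at or above rated speed all day
```

Dividing by rated power is only one way to obtain a [0, 1] series; the reply does not say which normalization was applied, and it additionally removes days with curtailment plateaus, which this sketch does not attempt.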
We include a more explicit statement in the introduction and discussion that:
- The semi-synthetic datasets are constructed to be realistic but not perfect representations of operational data;
- Their synthetic nature implies that our conclusions are primarily about the behaviour and usefulness of HMMLVis for exploring causal relations in realistic multi-variable time series, rather than about quantifying exact performance of specific power plants;
- Uncertainties arise from reanalysis biases, the simplicity of the power-conversion models, and remaining curtailment and data-quality effects, and these are briefly discussed in the revised text.
Overall, the new description and validation figure clarify how the semi-synthetic datasets are built, document their realism with respect to the available operational data, and delimit the scope of the conclusions drawn from these case studies.
Caption for the additional figure:
Figure X. Comparison of measured (black) and semi-synthetic (red) energy-relevant time series used in the HMMLVis case studies. Left column: normalized daily wind power for the reference wind farm, derived from ERA5 downscaled wind speed at hub height and converted using the turbine power curve (“ERA5-synthetic”). Middle column: daily global horizontal irradiance (GHI) at a radiation station, based on ERA5+CAMS (“ERA5+CAMS-synthetic”). Right column: normalized daily PV power for a utility-scale PV plant, constructed from ERA5+CAMS inputs and a simple PV performance model. For each case, the top row shows normalized monthly means, the middle row shows daily scatter plots with density shading and Pearson correlation coefficients, and the bottom row compares the probability density functions of normalized daily values. Overall, the semi-synthetic datasets reproduce the observed daily-to-seasonal variability well and are therefore suitable for demonstrating HMMLVis on realistic yet anonymized time series.
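The three diagnostics named in the caption (monthly means, Pearson correlation of daily values, and probability density of normalized daily values) could be computed along the following lines. This is a sketch under stated assumptions, not the authors' plotting code: the function names, the equal-width binning on [0, 1], and the toy `measured` and `synthetic` series are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def monthly_means(months, values):
    """Average daily values per calendar month (months given as 1..12)."""
    buckets = defaultdict(list)
    for month, value in zip(months, values):
        buckets[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}


def histogram_density(values, bins=10):
    """Empirical density of values in [0, 1] using equal-width bins;
    the bin heights times the bin width sum to one."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    width = 1.0 / bins
    return [c / (len(values) * width) for c in counts]


# Toy normalized daily series (made-up numbers, for illustration only).
months = [1, 1, 1, 7, 7, 7]
measured = [0.15, 0.25, 0.05, 0.75, 0.85, 0.95]
synthetic = [0.12, 0.28, 0.07, 0.72, 0.88, 0.91]

r = pearson_r(measured, synthetic)
means_measured = monthly_means(months, measured)
density_measured = histogram_density(measured)
```

A histogram is the simplest density estimate; a kernel density estimate would give smoother curves, and the caption does not say which one the figure uses.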
General Comments
This manuscript presents HMMLVis, a novel and thoughtfully designed visualization tool for heterogeneous Granger causal inference in multivariate time-series data. Built upon the heterogeneous graphical Granger model (HGGM) within a generalized linear model (GLM) framework and employing Minimum Message Length (MML) principles, the tool aims to support the discovery and interpretation of causal relationships in complex time-dependent datasets. This is a timely and well-motivated contribution, particularly as interest in data-driven causal inference grows within Earth system sciences.
The tool is demonstrated across several domains—ranging from renewable energy and air pollution to meteorological benchmark datasets such as EUPPBench—and is accompanied by a PyQt-based graphical interface that lowers the barrier for non-expert users. The interdisciplinary nature of this work is commendable, and the software addresses a real need for interpretable and accessible causal analysis in environmental applications.
That said, while this study reflects a strong technical effort, the focus of the manuscript—primarily on software development and visualization—appears misaligned with the scientific scope and editorial aims of Geoscientific Model Development (GMD). GMD is dedicated to the development, evaluation, and application of models in the geosciences. A large portion of this manuscript, especially Sections 5 and 6, centers on user interface features, visual options, and layout design rather than the advancement of geophysical modeling. In its current form, the paper may be better suited for a journal dedicated to scientific software development or computational tools.
Moreover, the clarity, organization, and presentation of the manuscript require significant improvement to meet publication standards. Several formatting inconsistencies, unclear figures, and missing definitions reduce overall readability.
Specific Comments
1. Scientific Scope and Fit for GMD:
While causal inference in environmental sciences is a relevant topic, the primary focus of this work is software visualization, and its main contributions lie in user-interface design and graphical rendering of beta coefficients. The paper does not deeply engage with geophysical model development or novel methodological contributions to causal modeling itself. For this reason, the manuscript may be better suited to venues such as EGUsphere preprints or other open-source software journals.
2. Visualization and Readability:
Many of the figures (e.g., GUI screenshots, wind rose plots) have awkward scaling or inconsistent proportions, which affects readability. It is recommended to resize and standardize image layouts, especially in Figure 6 and others, to improve visual clarity.
3. Sections 5 and 6 – Placement:
These two sections are heavily focused on user-interface walkthroughs and technical instructions. While informative, they resemble a user guide rather than scientific content and would be more appropriate as supplementary material. The main paper should focus on the scientific rationale, methodological innovations, and evaluation results.
4. Terminology and Acronyms:
The manuscript contains several instances where acronyms (e.g., EUMETNET) are introduced without first spelling out the full name. This is not compliant with standard academic writing practices. Please ensure that all acronyms are introduced in full upon first use, followed by the abbreviation in parentheses.
5. Motivation for HMML and MML:
The rationale for using heterogeneous Granger models and MML-based feature selection should be articulated more clearly for readers unfamiliar with these frameworks. What concrete limitations of classical Granger models does HMML overcome, especially in environmental data contexts?
6. Model Evaluation:
While the tool is applied across several datasets, the evaluation lacks clear performance metrics or validation benchmarks. For example:
- How does the tool’s output compare with known or simulated ground-truth causal structures?
- Are the inferred relationships stable across time windows and locations?
- Could precision/recall, consistency, or information gain be reported?
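As an illustration of the kind of metrics asked for here, the sketch below scores an inferred set of directed causal edges against a known ground-truth set (precision and recall) and measures stability across consecutive time windows with the Jaccard index. It is a generic example, not part of HMMLVis: the function names, the edge representation as `(cause, effect)` tuples, the variable names, and the choice of the Jaccard index as a stability measure are all assumptions.

```python
def edge_precision_recall(true_edges, inferred_edges):
    """Precision and recall of inferred directed edges (cause, effect)
    against a ground-truth edge set."""
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    hits = len(true_set & inferred_set)
    precision = hits / len(inferred_set) if inferred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    return precision, recall


def window_stability(edge_sets):
    """Mean Jaccard similarity between edge sets of consecutive windows;
    1.0 means the inferred graph never changes."""
    scores = []
    for previous, current in zip(edge_sets, edge_sets[1:]):
        previous, current = set(previous), set(current)
        union = previous | current
        scores.append(len(previous & current) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and inference result for a simulated data set.
truth = [("wind_speed", "power"), ("temperature", "power")]
inferred = [("wind_speed", "power"), ("pressure", "power")]
precision, recall = edge_precision_recall(truth, inferred)

# Hypothetical edge sets inferred in three consecutive sliding windows.
windows = [
    [("wind_speed", "power"), ("temperature", "power")],
    [("wind_speed", "power"), ("temperature", "power")],
    [("wind_speed", "power")],
]
stability = window_stability(windows)
```

Such scores require a data set whose causal structure is known, for example a simulated one, which is exactly what the first question above asks about.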
7. Synthetic Data Use:
The paper uses semi-synthetic datasets (e.g., for PV and wind), but the construction process needs to be described in more detail. How realistic are these datasets? What modeling assumptions underlie their creation, and what uncertainties are introduced?
8. Link Functions and GLMs:
More explanation is needed regarding the choice of link functions and distributions within the exponential family. Were they selected empirically, or based on expert input? Was model fit compared across different options?
9. Scalability and Runtime:
Please provide information on the computational cost of using HMMLVis, particularly with sliding windows and larger variable sets. How long does one window take to process?
Minor Comments
1. Throughout the manuscript, some notations (e.g., β, η, indices i, j, t) are inconsistently formatted, sometimes in plain text and sometimes in math mode. Ensuring typographic and notational consistency across all equations and text would improve professionalism.
2. While most figures are labeled, some captions could be more descriptive to help readers interpret them without referring back to the main text. For example, indicate clearly what variables or locations the time series refer to, what color scales represent, or whether the visualizations correspond to real or synthetic datasets.
3. The abstract currently blends methodology and application without clearly delineating the main contribution. Consider restructuring it into: (1) motivation, (2) method, (3) key results, (4) broader implications—to improve clarity and impact for readers scanning the abstract alone.
4. Some sentences are overly long or have ambiguous phrasing, particularly in Sections 2 and 4. For instance, compound sentences mixing mathematical definitions and explanatory text can be split for better readability. A careful language edit would improve clarity.
5. Check reference formatting for consistency (e.g., Behzadi et al. (2019) vs. Behzadi et al., 2019). Ensure all references are cited in a consistent style and match the GMD citation standards.