the Creative Commons Attribution 4.0 License.
Weather Type Reconstruction using Machine Learning Approaches
Abstract. Weather types are used to characterise large-scale synoptic weather patterns over a region. Long-standing records of weather types hold important information about day-to-day variability and changes in atmospheric circulation and the associated effects on the surface. However, most weather type reconstructions are restricted in their temporal extent as well as in the accuracy of the methods used. In our study, we assess various machine learning approaches for station-based weather type reconstruction over Europe based on the CAP9 weather type classification. With a common feedforward neural network performing best in this model comparison, we reconstruct a daily CAP9 weather type series back to 1728. The new reconstructions constitute the longest daily weather type series available. A detailed validation shows considerably better performance compared to previous statistical approaches and good agreement with the reference series for various climatological analyses. Our approach may serve as a guide for other weather type classifications.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-1346', Anonymous Referee #1, 04 Aug 2024
Review of "Weather Type Reconstruction using Machine Learning Approaches" by Pfister et al.
- General comments
The authors reconstruct daily weather type (WT) series back to the 1700s using an established WT classification and measurements from several stations, and compare different machine learning approaches to assess which is best suited to the task.
The paper is well presented and written, however there are points that should be addressed:
- The paper is based on an assumption that is not stated explicitly and cannot be taken for granted: Weather Types stemming from atmospheric variables can be assumed to remain the same across centuries. (See specific comment L.30-31). If we believe this hypothesis to hold, the authors made little effort to characterize temporal trends in the occurrence of the WTs and assess whether, from their reconstructed series, there have been shifts in occurrences from one season to another. From a climatic standpoint I think these are relevant features of your very long (200+ years) classification.
- The authors used CAP9 as classification but comment that two of the WTs can be considered similar/redundant and that is why CAP7 was preferred by a previous study which is often cited for comparison. It is unclear to me why CAP7 was not preferred over CAP9 provided that throughout the manuscript there are indications that having 9 WTs makes identifiability of WTs more complicated and prone to error.
- Evaluation metrics in the summer season are systematically lower, making one doubt if the Weather Type classification is suffering from an under-representation of the atmospheric variable amplitude which is typically low in the summer season and high in the winter season (i.e., PCA input data has not been normalized by the seasonal cycle standard deviation). This aspect is important and should be clarified (see specific comment L.306-307).
- Specific comments (section addressing individual scientific questions/issues)
L.12-13 – “In Europe, where daily weather is mainly governed by transient high and low pressure systems driven by the westerly jet stream”. It seems a bit simplistic, especially because there are differences between the north-northwest part of the domain, influenced by the Atlantic, and the south-southeast part of the domain, where that influence is smaller and the Mediterranean acts as a frontier between the warm south and cold north.
L.30-31 – “in order to study long-term changes in atmospheric circulation patterns and associated surface effects, long-term series of WT are needed”. I think this statement is debatable and not justified in the manuscript. There are two assumptions behind it: Weather Type classification is an adequate way to analyze long-term changes in atmospheric circulation, and, more importantly, Weather Types are stationary, meaning that the same Weather Types are there in 1800 as well as in 2000. In other words, suppose a reanalysis existed for the 1700s: applying a principal component analysis to e.g. geopotential height at 500 hPa for the period 1750-1800 and repeating the same for the period 1950-2000 would yield the same or similar EOFs and in turn describe the same patterns. Your paper rests on these two assumptions; they deserve attention, cannot be taken for granted, and should be clearly stated before carrying out your study.
L.40 – When you discuss the limitations of station-based reconstructions you could also mention that weather types generally describe atmospheric circulation over relatively large areas, and thus go beyond measurements at a single point. Also: have these series been detrended?
L.63 – You wrote that you use CAP7 for the study but then at the end of the section you write that you reconstruct WTs extending the CAP9. Please clarify.
L.65 – “It does not suffer from subjective WT classes”. WT classifications suffer from subjectivity because the choice of the number of classes is subjective unless there is a metric that helps to choose that number (e.g. BIC, Bayesian Information Criterion, in Falkena et al. 2020).
L.76-78 – I don’t understand the purpose of these lines. Perhaps they would be justified if the authors (or other studies) had assessed the added value of wind direction for periods where this type of record is available.
L.81-82 – Why base the study on two classifications CAP7 and CAP9? And not one of the two?
L.103 – How are the WT classified into advective and convective? Please explain.
L.104 – I understand that the WT are computed all year round implying that the larger amplitude of the variations of the atmospheric variables in the winter will potentially bias the WT towards winter patterns. Is there some sort of normalization of this amplitude throughout the year? Please clarify.
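To make the normalization question concrete, the following is a minimal sketch of one possible seasonal standardization, not the procedure used for CAP9 or in the manuscript. The function name `standardize_by_season`, the ±15-day window, and the synthetic pressure series are assumptions made purely for illustration; leap days are ignored.

```python
import numpy as np

def standardize_by_season(values, day_of_year, window=15):
    """Remove the seasonal mean and divide by the seasonal standard deviation.

    For every calendar day, mean and standard deviation are estimated from all
    samples whose day of year lies within +/- `window` days (circular distance).
    """
    values = np.asarray(values, dtype=float)
    day_of_year = np.asarray(day_of_year)
    out = np.empty_like(values)
    for d in np.unique(day_of_year):
        # Circular distance, so late December and early January are neighbours.
        dist = np.abs(day_of_year - d)
        dist = np.minimum(dist, 365 - dist)
        ref = values[dist <= window]
        sel = day_of_year == d
        out[sel] = (values[sel] - ref.mean()) / ref.std()
    return out

# Synthetic, hypothetical data: variability is large in winter and small in summer.
rng = np.random.default_rng(0)
doy = np.tile(np.arange(1, 366), 20)                 # 20 years, no leap days
amplitude = 10.0 + 6.0 * np.cos(2 * np.pi * (doy - 15) / 365)
pressure = 1013.0 + amplitude * rng.standard_normal(doy.size)

z = standardize_by_season(pressure, doy)
winter = (doy <= 59) | (doy >= 335)
summer = (doy >= 152) & (doy <= 243)
print(pressure[winter].std() / pressure[summer].std())  # well above 1 before scaling
print(z[winter].std() / z[summer].std())                # close to 1 after scaling
```

If the classification were computed on fields scaled in this way, summer and winter patterns would contribute with comparable weight; whether that is desirable for a year-round classification is a separate question for the authors.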
L.107 – If some of the WT are hard to distinguish from one another as you write, why didn’t you use CAP7 also for training your machine learning models?
L.191-194 – “Increasing number of covariates can lead to overfitting of the model”, I guess this characteristic is valid not only for this method. Also, could you clarify the choice of 4 as threshold for the VIF?
L.221-222 – This characteristic is crucial: “As circulation patterns can persist several days”, as WTs must persist a few days on average. The average persistence in days of each CAP7 (or CAP9) type should be added to the manuscript along with a contingency table with weather patterns in rows and columns and shares (or counts) of transitions (e.g., see table 1 / 2 in Robertson et al. 2020). I wonder if transitions and preferential paths of transitions among WTs should be fed to the different machine learning approaches. Please comment.
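As an illustration of the requested diagnostics, the sketch below computes a transition contingency table and the mean persistence per type from a daily label series. The helper names and the three-type toy series are hypothetical and do not come from the manuscript or from CAP9 data.

```python
import numpy as np

def transition_counts(wt, n_types):
    """Count day-to-day transitions; rows = WT on day t, columns = WT on day t+1."""
    wt = np.asarray(wt)
    counts = np.zeros((n_types, n_types), dtype=int)
    np.add.at(counts, (wt[:-1] - 1, wt[1:] - 1), 1)   # WT labels are 1-based
    return counts

def mean_persistence(wt, n_types):
    """Mean length in days of uninterrupted spells of each WT (NaN if a WT never occurs)."""
    wt = np.asarray(wt)
    starts = np.flatnonzero(np.r_[True, wt[1:] != wt[:-1]])   # first day of each spell
    lengths = np.diff(np.r_[starts, wt.size])
    labels = wt[starts]
    return np.array([lengths[labels == k].mean() if np.any(labels == k) else np.nan
                     for k in range(1, n_types + 1)])

# Toy series with three types (hypothetical labels, not CAP9 data).
series = np.array([1, 1, 1, 2, 2, 3, 1, 1, 3, 3, 3, 3])
counts = transition_counts(series, n_types=3)
# Row-wise shares; this assumes every type occurs at least once before the last day.
shares = counts / counts.sum(axis=1, keepdims=True)
persistence = mean_persistence(series, n_types=3)
print(counts)
print(persistence)
```

The diagonal of `shares` gives the probability of a type persisting to the next day, while off-diagonal entries reveal preferred transition paths.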
L.243-245 – As noted above, choice of CAP7 over CAP9. To make the results comparable with Schwander et al. 2017, adjustments are made between the two classifications (CAP7, CAP9) in a way that suggests a single one would have been more convenient.
L.267-273 – In light of the error in the model set-up found in Schwander et al. 2017, I wonder if it is worth following in their footsteps so closely. I acknowledge the importance of having a reference study to compare to, but perhaps the authors could have been bolder in moving beyond that study.
L.296 – Table 2 shows that, of the four methods proposed, it seems like NN and RNN outperform the other methods, with RF always behind regardless of the number of stations.
L.303-305 – I think this statement is not sufficiently supported. Are there works that have carried out similar analysis with statistical approaches that do not involve machine learning?
L.306-307 – As noted above, this does not come as a surprise if no normalization of the atmospheric field was carried out prior to the computation of the regimes (the standard deviation varies considerably during the year, with very low values in summer compared to the winter months, e.g. figure 1d in Lee et al. 2023). This aspect is important and should be clarified.
L.309 – In light of the drops in accuracy in the summer this statement is perhaps optimistic and limited to the comparison to Schwander et al. 2017. “Our models are better capable of coping with seasonal differences”.
L.350 – increased accuracy in fall and winter, otherwise for summer months – another clue in the direction of lacking summer information (normalization of atmospheric field)? Or the fact that wet days are more frequent in fall and winter as opposed to summer months (regardless of the type of precipitation – large scale in winter vs. convective in summer as noted at L. 355)?
L.367 – I struggle with the term “accuracy” which, in e.g. operational forecasts, relies on the evaluation of simulated vs. actual variable values. In this case the actual values cannot be used as they can only be reconstructed. Therefore, is it appropriate to use this term? I suggest the authors clarify this point at the beginning of the paper, either in the Introduction or in the Data and Methods sections.
L.370 (lower accuracies in summer months) and L.399 – Could the false WT predictions in summer originate from other sources related to the year-round WT classification?
L.402 – “Weather types might change over the course of one day”. Are you sure this characteristic is relevant in errors assigning WTs? Aren’t WTs on average lasting 2+ days?
L.408 – Please clarify what you mean by “transient WTs”.
L.415 – It is of great value that the NN attributes a probability of occurrence to all WTs, and I think this feature should be discussed further in the assessment of the good/bad WT daily classifications. I would expect that the highest WT probability is not always 0.8 or above and that days exist in which probabilities are more evenly distributed among the 9 classes. E.g. WT1 0.1, WT2 0.1, WT3 0.1, WT4 0.1, WT5 0.1, WT6 0.1, WT7 0.1, WT8 0.14, WT9 0.16. In this case, what is the chosen WT: WT9? One could argue that a “no regime” class would be a more suitable choice. Have you counted how many times the probability of the winning WT is not crystal clear (i.e., not much larger than that of the remaining WTs)?
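One simple way to count such ambiguous days is to look at the margin between the two largest class probabilities. The sketch below is only an illustration under stated assumptions: the function name, the margin threshold of 0.1, and the label 0 for “no regime” are arbitrary choices, not something taken from the manuscript.

```python
import numpy as np

def classify_with_margin(probs, min_margin=0.1, no_regime=0):
    """Assign the most probable WT unless it leads the runner-up by less than `min_margin`."""
    probs = np.asarray(probs, dtype=float)
    ordered = np.sort(probs, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]    # winner minus runner-up
    labels = probs.argmax(axis=1) + 1           # 1-based WT labels
    labels[margin < min_margin] = no_regime     # hypothetical "no regime" label
    return labels, margin

probs = np.array([
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.14, 0.16],  # example from the comment above
    [0.02, 0.02, 0.85, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01],  # a clear winner (made up)
])
labels, margin = classify_with_margin(probs)
print(labels)                 # [0 3]
print(np.mean(labels == 0))   # share of ambiguous days: 0.5
```

Reporting the share of days below a given margin, per season and per WT, would show how often the assigned type is close to a coin toss.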
L.431 /Figure 7 – Biases are visibly low for WT 8 and WT 9, do the authors have an explanation for this? From Figure 9 it seems that these two occur very little in the summer.
L.450 – On the absence of artificial discontinuities: it makes no sense to comment on discontinuities using the eye over a plot with smoothed lines (10yrs running mean). Why don’t the authors apply a statistical test for discontinuities/change-points on the non-smoothed series?
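As one example of such a test, the sketch below implements the non-parametric Pettitt test for a single change point with its usual approximate p-value. The choice of this particular test is an assumption made here for illustration (the comment above does not prescribe one), and the annual-frequency series is synthetic.

```python
import numpy as np

def pettitt_test(x):
    """Pettitt test for a single change point.

    Returns the 0-based index of the last sample before the most likely change,
    the statistic K, and the approximate p-value 2 * exp(-6 K^2 / (n^3 + n^2)).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.sign(x[None, :] - x[:, None])        # s[i, j] = sign(x[j] - x[i])
    # U_t sums the signs over all pairs with i <= t < j.
    u = np.array([s[: t + 1, t + 1:].sum() for t in range(n - 1)])
    k = np.abs(u).max()
    p = min(1.0, 2.0 * np.exp(-6.0 * k**2 / (n**3 + n**2)))
    return int(np.abs(u).argmax()), float(k), float(p)

# Synthetic annual WT frequency (days per year) with an artificial shift after 60 years.
rng = np.random.default_rng(1)
freq = np.r_[rng.normal(40, 3, 60), rng.normal(50, 3, 60)]
idx, k, p = pettitt_test(freq)
print(idx, p)
```

The test assumes serially independent values, so strong autocorrelation in the annual series would call for a more careful treatment.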
L.457 – “artificial trends can be detected” – have you found significant trends through the application of a statistical test? It would be interesting to know if/which WTs have become more or less frequent throughout the period of analysis and if WT occurrences have shifted in season.
L.470 – use either “thus” or “indeed”.
L.474-475 – I found no description of the detection method for trends and discontinuities in the manuscript.
L.484 – “WTs with low occurrence and strong seasonality can pose a challenge for reconstructing WTs”, this is why I wonder why CAP9 was preferred over CAP7 (fewer WTs).
L.488-490 – “Transient WTs make the distinction on a daily resolution difficult,… issue might be solved with the use of subdaily data”. I consider this option inadequate given the very nature of reconstructing WTs back to the 1700s: it is already a miracle if you get a daily value, so subdaily values are utter wishful thinking! Also, while transient WTs exist and may hinder daily classification, the degree to which knowledge of sub-daily WTs would help such classification is far from demonstrated. WTs are, by design, approximations of reality at a daily time scale; it is to be expected that on some days a good match with the archetypal WT is lacking. It’s part of the game.
References:
Falkena, S. K., J. de Wiljes, A. Weisheimer, and T. G. Shepherd, 2020: Revisiting the identification of wintertime atmospheric circulation regimes in the Euro-Atlantic sector. Q. J. R. Meteorol. Soc., 146, 2801–2814, https://doi.org/10.1002/qj.3818.
Robertson, A. W., N. Vigaud, J. Yuan, and M. K. Tippett, 2020: Toward Identifying Subseasonal Forecasts of Opportunity Using North American Weather Regimes. Mon. Wea. Rev., 148, 1861–1875, https://doi.org/10.1175/MWR-D-19-0285.1.
Lee, S. H., M. K. Tippett, and L. M. Polvani, 2023: A New Year-Round Weather Regime Classification for North America. J. Climate, 36, 7091–7108, https://doi.org/10.1175/JCLI-D-23-0214.1.
Citation: https://doi.org/10.5194/egusphere-2024-1346-RC1
RC2: 'Comment on egusphere-2024-1346', Anonymous Referee #2, 07 Aug 2024
Review of “Weather Type Reconstruction using Machine Learning Approaches”
General comments:
This study uses machine learning methods to reconstruct the CAP9 weather type classification for Europe back to the year 1728 based on station observations. Four different machine learning methods are tested (multinomial logistic regression, random forest, feedforward neural network, RNN/CNN) and compared to a reconstruction method based on Mahalanobis distance and to the original CAP9 time series published by MeteoSwiss for the reference period 1957-2020.
I find this study to be interesting, well structured and well written, and with high scientific quality of the methods and results presented.
My main objection is that the scientific relevance should be better emphasized. The authors should better explain why a weather type classification based on station observations is beneficial, especially in light of the gridded EKF400v2 reanalysis product, which goes back to the year 1602.
Specific questions
Line 47: “Whereas common statistical approaches seem to have reached their limit for this purpose, (…)”. Why have they reached their limit? Please explain this better.
Figure 1 (Right): What is the unit of the average monthly occurrence? Days or counts?
Table 1: Please explain what exactly the temporal pressure gradient is and how it is derived from the historical station observations.
Line 32ff./Line 154: I was a bit surprised to learn about the EKF400v2 reanalysis product in Line 154, which covers the period 1603-2003 and was not mentioned during the Introduction. What is the point of deriving weather type classification from station observations if a gridded reanalysis product is available for the earliest period of your observations and even before? This is in direct contradiction to the statements in line 32ff. and thus to the motivation for this paper: “With the newest generation of reanalysis datasets, many WT records could already be extended back to the 19th century (…).” and “(…) the limit for WT classifications based on atmospheric fields is set by the 20th Century Reanalysis version 3 (…), which extends back to 1806”. Please correct these statements in the Introduction and revise the motivation for a classification based on station observations in light of the available EKF400v2 reanalysis going back to 1603.
Line 193: Which variables are the five predictors?
Chapter 2.3.3 and 2.3.4: What is the structure of the input layer? The Appendix says 6,8,9 x None (Time). Are these the number of stations used? What variables are used? In general, I miss a better description of the input variables of the machine learning methods. Are temperature and pressure time series used at all stations? What about the temporal pressure gradient? Please specify the structure of your input layers.
Chapter 2.3.3 and 2.3.4: Is the lat/lon information of the stations used as input as well? Does the machine learning model have any information on the position of the time series? If not, please discuss this.
Line 241f. What are “(…) all available pressure and temperature series”? Please specify.
Line 278: What is the advantage of the Heidke skill score? How can it be interpreted compared to overall accuracy?
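For readers unfamiliar with the score: the multi-category Heidke skill score compares the overall accuracy with the accuracy expected by chance from the marginal class frequencies, HSS = (accuracy − expected) / (1 − expected). The sketch below uses this standard definition; the function name and the imbalanced two-class example are made up for illustration and are unrelated to the manuscript's data.

```python
import numpy as np

def accuracy_and_hss(y_true, y_pred, n_classes):
    """Overall accuracy and multi-category Heidke skill score from 0-based labels."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)   # confusion matrix
    n = cm.sum()
    acc = np.trace(cm) / n
    # Agreement expected by chance given the observed and predicted class frequencies.
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    return float(acc), float((acc - expected) / (1.0 - expected))

# Hypothetical, strongly imbalanced case: class 0 occurs on 90 of 100 days.
y_true = np.array([0] * 90 + [1] * 10)
always_zero = np.zeros(100, dtype=int)
acc, hss = accuracy_and_hss(y_true, always_zero, 2)
print(acc, hss)   # 0.9 0.0
```

A classifier that always predicts the dominant class reaches 90 % accuracy here but an HSS of zero, which is why the score is informative when WT frequencies are very unequal.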
Line 353: “(…) which are mostly within the range of uncertainty of model training.” How do you quantify the range of uncertainty of model training to reach this conclusion?
Line 368f.: “The accuracy for the earliest period between 01.01.1728 and 31.12.1737 is already remarkably high with a value of 77.8 % despite the limited set of available stations.” This sentence is misleading, because it suggests that you know the accuracy of your model for the earliest period. But you can't estimate the accuracy for the early period, because you don't have labels for that time with which you could compare your classifications. If I got it right, the 77.8% indicates the accuracy of your trained model for a test set from the period 1957-2020 compared to the MeteoSwiss time series, where your model uses only the stations that are available since 1728. But your actual accuracy in the early period could be lower than that due to lower data quality in the early period, e.g. measurement errors. Please refine the statement and discuss the data quality within your time series.
Figure 5: The plots are quite small and hard to compare by eye. It could help to increase the size and/or to show the differences of the false composites and the true composites to obs composites in order to better show the differences in the pressure fields. I’m also wondering how many cases each composite plot is derived from. The numbers could be indicated above the plots.
Discussion: I miss a discussion on why including the previous days in the RNN/CNN setup didn’t help to improve the accuracy of the weather type classification. Is this in line with what the authors expected? What could be the reasons for this?
Supplement Table S2.1: Please explain the variable names.
Citation: https://doi.org/10.5194/egusphere-2024-1346-RC2
RC3: 'Comment on egusphere-2024-1346', Anonymous Referee #3, 09 Aug 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1346/egusphere-2024-1346-RC3-supplement.pdf
Data sets
Weather Type Reconstruction using Machine Learning Approaches Lucas Pfister https://boris.unibe.ch/id/eprint/195666
Model code and software
Weather Type Reconstruction using Machine Learning Approaches Lucas Pfister https://boris.unibe.ch/id/eprint/195666
Viewed
- HTML: 352
- PDF: 137
- XML: 113
- Total: 602
- Supplement: 38
- BibTeX: 14
- EndNote: 15