This work is distributed under the Creative Commons Attribution 4.0 License.
Robust weather-adaptive postprocessing using MOS random forests
Abstract. Physical numerical weather prediction models have biases and miscalibrations that can depend on the weather situation, which makes it difficult to postprocess them effectively using the traditional model output statistics (MOS) framework based on parametric regression models. Consequently, much recent work has focused on using flexible machine learning (ML) methods that are able to take additional weather-related predictors into account during postprocessing, beyond only the forecast of the variable of interest. Some of these methods have achieved impressive results, but they typically require significantly more training data than traditional MOS and are less straightforward to implement and interpret.
We propose MOS random forests, a new postprocessing method that avoids these problems by fusing traditional MOS with a powerful ML method called random forests to estimate "weather-adapted" MOS coefficients from a set of predictors. Since the assumed parametric base model contains valuable prior knowledge, much smaller training data sizes are required to obtain skillful forecasts, and model results are easy to interpret. MOS forests are straightforward to implement and typically work well, even with little or no hyperparameter tuning. For the difficult task of postprocessing daily precipitation sums in complex terrain, MOS forests outperform reference machine learning methods at most of the stations considered. Additionally, they are highly robust to changes in the data size and work well even when fewer than a hundred observations are available for training.
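As context for the discussion below, the core idea can be sketched in a few lines: a random forest grown on weather-related predictors defines, for any new case, proximity weights over the training cases (how often they share a leaf with the new case), and a simple MOS regression re-estimated with those weights yields "weather-adapted" coefficients. The toy Python sketch below only illustrates this general idea; it is not the authors' implementation (which builds on distributional MOS trees), and names such as `weather_adapted_mos` are purely illustrative.

```python
# Toy sketch (not the authors' implementation): forest proximities used as weights
# for a locally re-estimated linear MOS regression of observations on the raw forecast.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_proximity(forest, X_train, x_new):
    """Fraction of trees in which x_new lands in the same leaf as each training case."""
    train_leaves = forest.apply(X_train)               # (n_train, n_trees)
    new_leaves = forest.apply(x_new.reshape(1, -1))    # (1, n_trees)
    return (train_leaves == new_leaves).mean(axis=1)   # (n_train,)

def weather_adapted_mos(forest, X_train, fcst_train, obs_train, x_new):
    """Weighted least squares of obs ~ a + b * fcst, with forest proximities as weights."""
    w = forest_proximity(forest, X_train, x_new)
    sw = np.sqrt(w)
    A = np.column_stack([np.ones_like(fcst_train), fcst_train])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], obs_train * sw, rcond=None)
    return coef  # weather-adapted intercept a and slope b

# Synthetic usage: X holds weather predictors, fcst the raw forecast, obs the observation.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
fcst = rng.gamma(2.0, 2.0, size=300)
obs = 0.5 + 0.9 * fcst + rng.normal(scale=1.0, size=300)
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=10, random_state=0).fit(X, obs)
a, b = weather_adapted_mos(rf, X, fcst, obs, X[0])
```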
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2023-1021', Anonymous Referee #1, 20 Jun 2023
Review: "Robust weather-adaptive postprocessing using MOS random forests" by Muschinski et al.
In this paper, a new postprocessing approach for the correction of forecasts' systematic errors and a quantification of their uncertainty are presented. The method is based on the use of random forests. The method also uses a local regression approach to adjust the parameters that describe the dependence of the conditional probability density function of the observations on the forecast. The method is validated using daily precipitation data and forecasts from the GEFS.
The paper is well written, the methodology is, in general, well described, and the results are evaluated in a statistically robust way. Below, I indicate some suggestions or comments pointing out some places in which the discussion can be clarified or in which more information should be provided.
Specific comments
L40: This statement is unclear. In the case of random forests, the resulting fit would be smoothed out by the ensemble, so the steps would not be so obvious in the output.
Section 2.2. Step 2. In this section, an independence test is used to identify possible dependencies between predictors and the model parameters \theta. There are different variables and parameters. Could the authors elaborate more on how the split-variable is selected? Is it the one with the lowest p-value with respect to any of the parameters? Which is the independence test used in this implementation?
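For orientation on the generic recipe the reviewer is asking about: in model-based recursive partitioning, the split variable is commonly chosen as the one with the smallest (multiplicity-adjusted) p-value of an independence test between that variable and the per-observation score (gradient) contributions of the fitted node model. Whether and how exactly this is done in the authors' implementation is precisely the reviewer's question; the sketch below only illustrates the generic selection step, using a simple correlation test as a stand-in for the permutation-based test, and all names are hypothetical.

```python
# Generic split-variable selection (illustration only, not the authors' implementation):
# pick the candidate variable most strongly associated with the model's score contributions.
import numpy as np
from scipy import stats

def select_split_variable(scores, Z):
    """scores: (n, k) per-observation score (gradient) contributions of the node model.
       Z:      (n, p) candidate split variables.
       Returns the variable index with the smallest p-value over all score columns."""
    _, p = Z.shape
    pvals = np.ones(p)
    for j in range(p):
        # correlation test per score column; a stand-in for the permutation-based
        # independence tests used in conditional inference / model-based trees
        col_p = [stats.pearsonr(Z[:, j], scores[:, k])[1] for k in range(scores.shape[1])]
        pvals[j] = min(col_p)
    return int(np.argmin(pvals)), pvals

# Toy usage: scores of a fitted Gaussian model (columns: d logL/d mu, d logL/d sigma).
rng = np.random.default_rng(2)
Z = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * Z[:, 0] + rng.normal(size=200)
mu, sigma = y.mean(), y.std()
scores = np.column_stack([(y - mu) / sigma**2, ((y - mu)**2 - sigma**2) / sigma**3])
best, pvals = select_split_variable(scores, Z)   # should tend to pick column 0
```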
Section 2.2, Step 3: Which are the stopping criteria for the growth of the tree? What is the minimum sample size at a leaf node in the experiments reported in this paper (particularly in those experiments where a relatively small dataset is used)?
Equations 6 and 7: To my understanding, these equations describe a type of local regression in which distance is measured by the number of times two given data points fall into the same leaf in the different trees of a given forest. So the distance is specific to the problem at hand. This characteristic distinguishes this method from other methods that use random forests for postprocessing, in the sense that \theta is not directly given by the forest; instead, the forest provides a way to detect training cases whose predictors are close to the current predictors, and based on these neighbors a new set of parameters can be obtained (by refitting the model using only these weighted neighbors). Based on this, I wonder:
- What would be the performance of the proposed technique if the parameters \theta provided by the forest were used directly for the postprocessing of the forecast (i.e., what is the impact of the neighbor approach on the performance of the method)?
- What would be the performance of the method if the distance metric were replaced by the classical Euclidean norm (like in the classical nearest neighbors approach)?
- What is the variability of the weights, particularly in the small training sample cases? If the weight variance is not too high in the small-sample scenarios, then this may help to increase the robustness of the method, because model parameters would be trained with a relatively larger sample than in the other methods. Is there a way in which this variability can be controlled and possibly tuned as a hyperparameter to maximize the performance of the method?
Table 1: Could the authors elaborate more on why tppow_mean is excluded from the splitting variable list? It is not clear to me why that should be the case. Also, in the results section, the variable associated with the root split is the total column liquid condensate, which I assume is closely related to the precipitation rate (so the system is indirectly trying to use tppow as a splitting variable).
Table 1: Could the authors provide here or in the text some details about the configuration of the other methods? Since overfitting is a major concern when dealing with trees and forests, indicating the tree growth stopping criteria (or any other pruning approach) would be relevant for the comparison.
Figure 2: This figure is very interesting. However, I could not find cases in which precipitation occurred without being forecast (or maybe there is only one case in node 13). Is this because of the selected nodes, or is this a general property of the dataset?
Section 4.1: The names given to the different predictors are not clear. For example, what does pwat_mean_max mean? I assume "mean" refers to the ensemble mean, but I cannot interpret the "max". This also applies to other names: t500_sprd_min, tppow_sprd1824.
Regarding tppow_sprd1824, later in the text or in a figure caption, it is said that it corresponds to the spread over the 18–24 hour lead time period. Why did the authors choose this period to characterize the ensemble spread?
L300 “if the variable observed is not a direct output of the NWP model”. This is unclear. Why can’t physical quantities other than the ones observed be used to model the conditional probability distribution parameters?
L141: \gamma_0 is introduced here, but it has not been defined before (\sigma is used instead in the previous discussion).
Equation 7: The meaning of the denominator is not clear.
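For orientation only (the exact form of Eq. 7 cannot be verified from this discussion page): forest-based local estimation commonly uses adaptive nearest-neighbour weights of the form below (Meinshausen, 2006), where the denominator would be the size of the leaf containing the new case, so that each tree distributes a total weight of one over the training cases sharing its leaf:

$$ w_i(x) \;=\; \frac{1}{T} \sum_{t=1}^{T} \frac{\mathbf{1}\{x_i \in \ell_t(x)\}}{\lvert \ell_t(x) \rvert}, \qquad \sum_{i=1}^{n} w_i(x) = 1, $$

where $\ell_t(x)$ denotes the leaf of tree $t$ that contains $x$ and $T$ is the number of trees.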
Equation 8: Please clarify the meaning of \phi and \Phi.
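Again for orientation only: in precipitation postprocessing in the tradition of Schlosser et al. (2019), $\phi$ and $\Phi$ usually denote the density and cumulative distribution function of the standard normal distribution, entering a Gaussian distribution left-censored at zero for the (power-transformed) precipitation $y$. Under that assumption, Eq. 8 would be of the form

$$ f(y \mid \mu, \sigma) \;=\; \begin{cases} \Phi\!\left(\dfrac{-\mu}{\sigma}\right), & y = 0,\\[6pt] \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y - \mu}{\sigma}\right), & y > 0. \end{cases} $$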
L225: "rates are"
Figure 3: Please clarify the meaning of the titles of the panels (“Location” and “Scale”).
In L268 and the caption of Fig. 5, CRPS is used instead of CRPSS.
Citation: https://doi.org/10.5194/egusphere-2023-1021-RC1
- AC1: 'Reply on RC1', Thomas Muschinski, 06 Sep 2023
- RC2: 'Comment on egusphere-2023-1021', Anonymous Referee #2, 26 Jun 2023
Review of "Robust weather-adaptive postprocessing using MOS random forests"
by Thomas Muschinski, Georg J. Mayr, Achim Zeileis, and Thorsten Simon
General comments
This manuscript introduces a new type of postprocessing of numerical weather forecasts using MOS random forests. It clearly describes the methodology, highlighting both the advantages and limitations. While the structure of the paper is close to Schlosser et al. (2019), I consider that the manuscript contains enough new results to be published in Nonlinear Processes in Geophysics. The manuscript can be considered an update of Schlosser et al. (2019). I therefore recommend publication after minor revisions. Please find my specific comments and technical corrections below.
Specific comments
Line 23: What is the meaning of "homogeneous" here? Please elaborate.
Line 37: Please cite some references of random forests used to perform ML-based postprocessing.
Lines 107-108: What is the source of the citation, if it's a citation?
Line 179: It should be added that July 19, 2011 is missing for all 95 stations in the package RainTyrol (version: 0.2-0, date: 2020-01-13). This might not affect the results, but it is important to mention it.
Line 239: Please summarize the physical meaning of this, i.e., the mechanism, rather than only listing the variables. Or is it because GEFS generates more days with small amounts of rainfall than observed (which seems to be suggested by the authors)?
Lines 264-267 and Figure 5: This is an interesting result, particularly when compared to Fig. 8 of Schlosser et al. (2019), which shows a less organized spatial distribution of the best postprocessing method. However, the question of why there is a NE-SW distribution is not really discussed in the main text. Is it solely due to the topography? From the figure, it seems the terrain is lower to the NE and higher to the SW. Also, as the MOS random forest is weather adaptive, could this spatial distribution be linked to the main mode of weather variability in July (either in the real world or in the GEFS world)? It would be interesting to discuss this possibility in the manuscript.
Line 288: "new stations (or measurement instruments) are installed all the time". Is "new" here equivalent to "additional" or to "in replacement of"? If it is additional, is it to document higher altitudes in the case of complex terrain, given the costs of installation and maintenance? Is it true at the global scale? I am not sure this assertion is necessary (or accurate) here. On the other hand, if it is correct, it might in fact introduce more biases into the dataset (instrument drift, errors in transcription, system failures, ...). Please elaborate.
Line 289: How would you interpret this citation in the context of this study? Does it suggest that, because the postprocessing is weather adaptive, it is constrained by the weather of the model's world?
This study only focuses on July. It would be interesting to see the robustness of this approach in other seasons, because the regional influences (main modes of variability) vary throughout the year. This could also be linked to the comment on Fig. 5, i.e., whether we see a change in the spatial distribution of the best postprocessing method.
From this, another question would be: Is it possible to objectively define the most appropriate postprocessing, e.g., from the main mode of weather variability? Please discuss possible directions.
Technical corrections
Throughout the manuscript, the authors sometimes use the term "MOS forests" (for example, l. 12 or l. 55) and sometimes "MOS random forests" (l. 43, for example). Please review the whole manuscript to homogenize the terminology (and maybe use an acronym).
Line 8: "ML" is not previously defined.
Line 95: Maybe write "[...] the postprocessing literature is the nonhomogeneous Gaussian...".
Line 107: the sentence looks incomplete ("[...] a single MOS tree partitions the predictor space...").
Line 180: I would add a short sentence explaining how this number is defined, e.g., "median of all estimated power coefficients (Stauffer et al., 2017a)".
Line 184: It would be easier for the readers to mention that the authors are specifically referring to Table 1 of Schlosser et al. (2019).
Line 221: Please indicate where the station of Axams is located in Figure 5 (preferred), or at least refer to Fig. 8 of Schlosser et al. (2019).
Line 270: It would be better to define "PIT" and "PIT histograms" here.
Line 283: "very little data". Do the authors mean small sample size?
Figure 2: An "l" is missing in "Dashed and solid lines..."
Figure 3: What is the meaning of "Location" and "Scale"?
Figure 5: Shouldn't it be "CRPSS"? Also, the background is not described in the caption. And the size of the circles should be included in the legend.
Citation: https://doi.org/10.5194/egusphere-2023-1021-RC2
- AC2: 'Reply on RC2', Thomas Muschinski, 06 Sep 2023
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 241 | 101 | 20 | 362 | 13 | 12 |
Thomas Muschinski
Georg J. Mayr
Achim Zeileis
Thorsten Simon