This work is distributed under the Creative Commons Attribution 4.0 License.
Combining Recurrent Neural Networks with Variational Mode Decomposition and Multifractals to Predict Rainfall Time Series
Abstract. Rainfall time series prediction is essential for monitoring urban hydrological systems, but it is challenging and complex due to the extreme variability of rainfall. A hybrid deep learning model (VMD-RNN) is proposed to improve prediction performance. In this study, variational mode decomposition (VMD) is first applied to decompose the original rainfall time series into several sub-sequences according to the frequency domain, where the number of decomposed sub-sequences is determined by power spectral density (PSD) analysis. To prevent leakage of future data, the non-training time series is appended sequentially to generate the decomposed testing samples. Different recurrent neural network (RNN) variants are then used to predict the individual sub-sequences, and the final prediction is reconstructed by summing the sub-sequence predictions. These RNN variants, long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM (BiLSTM) and bidirectional GRU (BiGRU), are well suited to sequence prediction. In addition to three common evaluation criteria, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), the framework of universal multifractals (UM) is also introduced to assess prediction performance, enabling the extreme variability of the predicted rainfall time series to be characterized. The study employs two rainfall time series, with daily and hourly resolutions respectively. The results indicate that the hybrid VMD-RNN model provides a reliable one-step-ahead prediction, with better performance in predicting high and low values than a pure LSTM model without decomposition.
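For readers who want a concrete picture of the VMD-RNN workflow summarized above, the following is a minimal sketch, assuming Python with the vmdpy and Keras packages; the file name, K = 5, the window length and all hyperparameters are illustrative placeholders, not the settings used in the paper.

```python
# Minimal VMD-RNN sketch of the workflow described in the abstract.
# Assumptions: the vmdpy package for VMD, Keras (TensorFlow) for the RNN;
# all values below are illustrative placeholders.
import numpy as np
from vmdpy import VMD
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, lag):
    """Slice a 1-D series into (samples, lag, 1) inputs and next-step targets."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X[..., None], y

rain = np.loadtxt("rainfall.txt")   # hypothetical daily or hourly rainfall series
K = 5                               # number of modes (the paper selects it via PSD analysis)
lag = 7                             # length of the input window (placeholder)

# 1) Decompose the series into K sub-sequences (modes) in the frequency domain.
#    VMD arguments: signal, bandwidth constraint, noise tolerance, K, DC term, init, tolerance.
modes, _, _ = VMD(rain, 2000, 0.0, K, 0, 1, 1e-7)

# 2) Fit one RNN per mode and predict one step ahead. An LSTM stands in here for
#    the four variants (LSTM, GRU, BiLSTM, BiGRU) compared in the paper.
preds = []
for mode in modes:
    X, y = make_windows(mode, lag)
    model = Sequential([LSTM(32, input_shape=(lag, 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=50, batch_size=32, verbose=0)
    preds.append(model.predict(mode[-lag:].reshape(1, lag, 1), verbose=0)[0, 0])

# 3) Reconstruct the final one-step-ahead rainfall prediction by summing the mode predictions.
print(float(np.sum(preds)))
```

In the study itself, only the best-performing variant is kept for each sub-sequence; the sketch omits that model-selection step.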
Status: final response (author comments only)
RC1: 'Comment on egusphere-2023-2710', Anonymous Referee #1, 26 Feb 2024
The authors have used a hybrid deep-learning model to attempt to predict rainfall. While the paper is scientifically sound and quite in-depth, I can't see it as a good article for HESS because the focus is just so strongly on the Deep-Learning infrastructure. This is already evident in the Introduction which is clearly written with an audience in mind that is up-to-date with the terminology and typical issues that come with deep-learning models, whereas there is very little attention to the real-world practical problems this model is trying to solve (the abstract mentions urban runoff issues which are never mentioned anywhere in the main body, for instance). This would be fine for a journal that focuses on that particular research area, but the typical HESS reader (or at the very least, myself) will be completely lost in the methodology section. It's way too detailed in explaining the core mathematical concepts behind the model (once again, scientifically absolutely good work, but not for a hydrology-focused journal), whereas the section discussing the used dataset for validation purposes (section 3.1) is barely 10 lines long and doesn't contain any information about the type of data collected (is it radar, tipping bucket, time-integrated, point measurements, etc etc).
That aside, the outcomes of the study are also a bit disappointing from a practical point of view. The authors acknowledge that their chosen study area has a fairly typical rain pattern, which makes me wonder whether this means such a model can't be applied anywhere else without specifically training it for that area - which would defeat the purpose of using a model, in my opinion. Secondly, and perhaps most importantly: the authors conclude that with the used lead time (1 time step) the applicability of the model is severely limited for prediction purposes, nor can it handle the stochastic nature of rainfall variability all too well. A conclusion on my end would be then that it's not any better than just interpolating observational data...
I can only recommend to reject this paper for HESS. While I can't judge the details of the used methodology (not being an expert in the field of Deep Learning Models myself), I do believe this is a sound scientific paper and definitely interesting, but just submitted to the wrong journal. A journal which focuses on developing models such as this would be a much better fit, where the validation and application of this model in a real setting is not as important.
Citation: https://doi.org/10.5194/egusphere-2023-2710-RC1
AC1: 'Reply on RC1', Hai Zhou, 11 Mar 2024
Thank you for your thoughtful analysis of our manuscript. While we understand the concerns that you have raised, we believe there are compelling reasons to reconsider the suitability of the article for HESS.
- We acknowledge your concern about the manuscript’s focus on deep-learning models. Deep learning is, in fact, increasingly relevant in hydrological research, and by presenting the methodology in detail we aim to bridge the gap for readers who are not familiar with deep-learning models. In the revised manuscript, we will explicitly state the pedagogical purpose of the work and the planned future work in the Introduction section. We believe that framing the deep-learning approach as a powerful tool for addressing hydrological challenges makes the paper accessible to a broader readership.
- The concern about the model's applicability beyond the chosen study area is understandable; however, the model only has to be trained once. In principle, a new dataset from a different region or time period can be fed directly into the trained model, without repeating the training process, to obtain predictions on that dataset. For reference, the dataset used in this study is sourced from The POWER Project; according to the project, the precipitation data are derived from a combination of observations, satellite data, and atmospheric model simulations. We will explore transferability and potential adaptations to different datasets in Section 4 (result analysis).
- The point that our work could be seen as similar to interpolation is duly noted. We will introduce a traditional interpolation method as one of the benchmark methods in Section 4 and explain how our methodology differs from traditional interpolation.
- We also understand the comment about the lack of attention to real-world practical problems. The study presented in this manuscript serves as a pedagogical example and a starting point for further research that can extend to realistic forecasting and space-time nowcasting. As described in the future work, multi-step-ahead rainfall prediction is currently under investigation, and a model combining multifractals with deep learning is being developed to analyze the variability of stochastic rainfall time series.
In light of these clarifications, we believe that the pedagogical nature of our work can contribute significantly to the hydrology community by providing a deeper understanding of the application of deep-learning models and multifractal techniques in rainfall prediction. We trust that these revisions will enhance the manuscript's suitability for HESS.
Citation: https://doi.org/10.5194/egusphere-2023-2710-AC1
RC2: 'Comment on egusphere-2023-2710', Anonymous Referee #2, 06 Apr 2024
- Lines 47 and 48: “However, these pure variant models are…preprocessing. ” Please consider citing some previous studies to support your statement.
- The second to last and third to last paragraphs in the Introduction section should be in the Method section. They went into details about either a model or an evaluation index, rather than focusing on the context and motivation of this study.
- I suggest that the authors elaborate on their motivation and clarify the contribution of this work. Based on the current introduction, the model is not new, and the dataset is not new. It’s okay if this work is focused on applying a method to a dataset and this application has not been documented in previous research. But you will need to justify your decision with appropriate citations. For instance, why is that application important? It could be because of limitations from previous approaches or the good performance of some new approaches, and so on. You just need to justify this work by elaborating why it is important.
- Maybe I missed something, but why do you need steps 3 and 4? I suggest that the authors explain why they want to generate sub-sequences on combined sequences with both training and non-training sequences and then clip to get the non-training ones instead of generating the non-training sub-sequences using directly the non-training original sequences.
- Sub-section 3.3 open sources. The title of this sub-section is weird to me. Maybe consider using titles like Model Settings and Implementation.
- Result analysis. Since for each testing sub-sequence several RNN models were used and only the best result was kept for result aggregation, it will be really helpful to add a summary table showing the result of each RNN model on each sub-sequence. This will not only allow readers to understand how the eventual result was aggregated but will also bring insights into which model is the best, and so on.
- I feel that the Result section is not very well elaborated. So far there are only results but no discussion, which damaged the value of this study. How will readers benefit from reading this paper? To me what’s more important is the insights behind specific results. For instance, why are some models better than others? In what circumstances? What insights can I gain regarding model selection and tuning after reading this work? etc. I suggest the authors add more in-depth discussions (please also refer to my 6th comment) to improve the quality of this section.
Citation: https://doi.org/10.5194/egusphere-2023-2710-RC2
AC2: 'Reply on RC2', Hai Zhou, 01 May 2024
Thank you for your careful analysis of our manuscript and your detailed feedback. Our point-by-point responses to your suggestions and comments are provided below.
1. Lines 47 and 48: “However, these pure variant models are…preprocessing. ” Please consider citing some previous studies to support your statement.
We agree on the importance of supporting our statements with citations of previous studies, and we emphasise that the originality of our contribution lies in overcoming current limitations by combining ML with Variational Mode Decomposition and multifractals. The paragraph preceding this sentence (Lines 47-48) already discusses these pure variant models; however, to make this clear, we will cite these previous studies again.
2. The second to last and third to last paragraphs in the Introduction section should be in the Method section. They went into details about either a model or an evaluation index, rather than focusing on the context and motivation of this study.
The suggestion regarding the placement of two paragraphs, currently in the Introduction, is duly noted. They were positioned in the Introduction because these two paragraphs describe how our work differs from others’ studies and clarify the contribution of our work. We will therefore split their content between the Introduction (more on motivations) and the Methods (more on the details) sections.
3. I suggest that the authors elaborate on their motivation and clarify the contribution of this work. Based on the current introduction, the model is not new, and the dataset is not new. It’s okay if this work is focused on applying a method to a dataset and this application has not been documented in previous research. But you will need to justify your decision with appropriate citations. For instance, why is that application important? It could be because of limitations from previous approaches or the good performance of some new approaches, and so on. You just need to justify this work by elaborating why it is important.
We appreciate your suggestion to elaborate more on the motivation and contribution of our work. As noted in our reply to the second comment, we will improve the Introduction by providing a more comprehensive description that emphasises the importance of our work, even though Section 3.1 already explains how our work differs and why it is meaningful. We will clarify again in the Introduction that the motivation is mainly to overcome the limitations of current applications of ML to rainfall forecasting by combining ML with Variational Mode Decomposition and multifractals.
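For reference, the universal multifractal (UM) framework that we use to assess prediction performance rests on the standard moment-scaling relations of the UM literature, quoted here in generic notation (the manuscript's notation may differ slightly):

```latex
% Moment scaling of the rainfall intensity field \varepsilon_\lambda at scale ratio \lambda
\langle \varepsilon_\lambda^{\,q} \rangle \approx \lambda^{K(q)},
\qquad
K(q) = \frac{C_1}{\alpha - 1}\left(q^{\alpha} - q\right), \quad \alpha \neq 1,
```

where C1 is the mean intermittency codimension and alpha is the multifractality index; comparing these two parameters between observed and predicted series is what allows the extreme variability of the predictions to be characterized.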
4. Maybe I missed something, but why do you need steps 3 and 4? I suggest that the authors explain why they want to generate sub-sequences on combined sequences with both training and nontraining sequences and then clip to get the non-training ones instead of generating the non-training sub-sequences using directly the non-training original sequences.
We thank you for bringing up this point. Steps 3 and 4 are included because directly decomposing the non-training (testing) sequence as a whole would leak future data from the testing set into the decomposition. Because the rainfall time series is observed daily or hourly, the decomposition is repeated with the rainfall value of the next day or hour appended each time, so that only data already observed enters each decomposition. This approach mitigates the risk of exposing future data when decomposing the non-training time series. We will add part of this discussion to clarify the methodology.
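To make this appending-and-clipping procedure concrete, a minimal sketch is given below, assuming Python with the vmdpy package; the file names, K = 5 and the VMD parameter values are illustrative placeholders rather than our actual settings.

```python
# Walk-forward decomposition for steps 3 and 4: at each test step only the data
# observed so far is decomposed, so no future value can leak into the samples.
import numpy as np
from vmdpy import VMD

def decompose(series, K):
    # VMD arguments: signal, bandwidth constraint, noise tolerance, K, DC term, init, tolerance.
    modes, _, _ = VMD(series, 2000, 0.0, K, 0, 1, 1e-7)
    return modes                           # one row per mode

train = np.loadtxt("train.txt")            # training series (hypothetical file)
test = np.loadtxt("test.txt")              # non-training series (hypothetical file)
K = 5

test_samples = []
for t in range(len(test)):
    # Append only the daily/hourly values observed up to step t, never anything later.
    known = np.concatenate([train, test[:t + 1]])
    modes = decompose(known, K)
    # Clip the newest decomposed values as the testing sample for this step.
    test_samples.append(modes[:, -1])

test_samples = np.array(test_samples)      # shape: (number of test steps, K)
```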
5. Sub-section 3.3 open sources. The title of this sub-section is weird to me. Maybe consider using titles like Model Settings and Implementation.
The comment about the title of subsection 3.3 is taken into account. The subsection primarily introduces the open-source software used in this study. The suggested title ‘Model Settings and Implementation’ points in the right direction, although we do not implement a model in the classical sense but rather assemble different open-source software packages.
6. Result analysis. Since for each testing sub-sequence several RNN models were used and only the best result was kept for result aggregation, it will be really helpful to add a summary table showing the result of each RNN model on each sub-sequence. This will not only allow readers to understand how the eventual result was aggregated but will also bring insights into which model is the best, and so on.
We fully agree with your suggestion to add a summary table showing the results of each RNN model on each sub-sequence. We will include such a table to improve the clarity and interpretability of our results.
7. I feel that the Result section is not very well elaborated. So far there are only results but no discussion, which damaged the value of this study. How will readers benefit from reading this paper? To me what’s more important is the insights behind specific results. For instance, why are some models better than others? In what circumstances? What insights can I gain regarding model selection and tuning after reading this work? etc. I suggest the authors add more in-depth discussions (please also refer to my 6th comment) to improve the quality of this section.
We also agree with your feedback on the Result section. We will therefore strive to provide more in-depth discussion that explains the significance of our model and the contribution of our work to the field of hydrology.
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
452 | 197 | 38 | 687 | 23 | 24