the Creative Commons Attribution 4.0 License.
Forecasting the cost of drought events in France by super learning
Abstract. Drought events are the second most expensive type of natural disaster within the legal framework of the French natural disasters compensation scheme. In recent years, droughts have been remarkable in their geographical scale and intensity. We develop a new methodology to forecast the cost of a drought event in France. The methodology hinges on super learning and takes into account the complex dependence structure induced in the data by the spatial and temporal nature of drought events.
Status: closed
-
RC1: 'Comment on egusphere-2022-541', Anonymous Referee #1, 30 Jul 2022
- L7-L8- the statements need to be cited.
- In L16, In (Charpentier et al., 2021), I suggest the bracket be removed.
- In Line 30- I don’t know if the authors have a reason for not capitalizing the abbreviated words in OSASSL.
- In L115, L173, 207, 295- the use of “so-called” I do want to believe it is an adjectival description of the succeeding nouns, but in the various context it has been used, it sounds more of a sarcastic use than what you intended it for in the sentence. Also, the use of so-called with OSASSL, I found inappropriate as it is still a new algorithm that has not gained ground. I suggest the authors find another appropriate use for the “so-called” term in the highlighted sentences.
- The use of brackets should be well defined, example is at the end of L191, L193, there are cases of “unopened and opened parenthesis with no closing”.
- Sections 4.1: Starting the paragraph with “In fact” is a disconnect from the previous discussion
- L177- “more details to follow.” I suggest the sentence should be enclosed in a bracket rather than the hyphen.
- L190, L239- the use of a one-sentence paragraph, I suggest should be avoided altogether.
- The sentence in L190-L191 needs to be rephrased, to give an appropriate meaning of what the authors meant there.
- Section 4.2 should be properly named. “Training of what?”
- In Figure 5, the spelling of drought, not “drougth” should be checked in the map presented.
- Instead of the description with “left-hand side map” and “right-hand side map” I suggest the authors should label the maps with “Map A” and “Map B” or in any other format suitable for easy visual comprehension by readers.
- In Table 2, I think the table should be properly formatted.
- L263-267, the sentence is too long, I suggest it be split as appropriate.
- L-290- “and have been since then” should be properly rephrased.
- In summary, I suggest
- The authors need to do more editing and use more technical language in their work.
- The sectioning of the paper needs to be improved upon. L 199 is not a befitting section name. I suggest it should be either expunged or properly rephrased. Also, regarding the sections, if the authors decide to retain the structure of the sections, then they should give proper numbering to them.
- The paper is having an inadequate coherence in the flow of thought, aside from sectioning which has been mentioned earlier, the authors should, in addition, ensure a consistent flow of thought from one paragraph to the other, and from one section to the other.
- The authors should cite relevant works in their paper for validation.
I think the paper has been able to explain the process of the super learner, but they still need to do more work in explaining how their developed algorithm was able to forecast the cost of drought events and how this algorithm should be better considered than previously used ones if there were any.
Citation: https://doi.org/10.5194/egusphere-2022-541-RC1
-
CC1: 'Reply on RC1', Antoine Chambaz, 31 Aug 2022
We thank the reviewer for their report.
- 1) L7-L8- the statements need to be cited.
We rely on the 2021 annual report produced by CCR, "Les catastrophes naturelles en France, bilan 1982-2020" (2021), which is now properly cited.
- 2) In L16, In (Charpentier et al., 2021), I suggest the bracket be removed.
The guidelines about in-text citation recommend using parentheses in this case.
- 3) In Line 30- I don't know if the authors have a reason for not capitalizing the abbreviated words in OSASSL.
The text now reads "We call our algorithm the One-Step Ahead Sequential Super Learner (OSASSL)."
- 4) In L115, L173, 207, 295- the use of "so-called" I do want to believe it is an adjectival description of the succeeding nouns, but in the various context it has been used, it sounds more of a sarcastic use than what you intended it for in the sentence. Also, the use of so-called with OSASSL, I found inappropriate as it is still a new algorithm that has not gained ground. I suggest the authors find another appropriate use for the "so-called" term in the highlighted sentences.
Thank you for pointing out our awkward use of the adjective "so-called" (this blog post was quite interesting on the subject: https://thebettereditor.wordpress.com/2017/03/28/another-so-called-or-is-it-so-called-blog-post/). We certainly did not mean to be sarcastic. We removed all its occurrences.
- 5) The use of brackets should be well defined, example is at the end of L191, L193, there are cases of "unopened and opened parenthesis with no closing".
In the example provided, there is no parentheses mismatch. We checked the entire text and did not spot any such mismatch.
- 6) Sections 4.1: Starting the paragraph with "In fact" is a disconnect from the previous discussion
We removed the expression "In fact".
- 7) L177- "more details to follow." I suggest the sentence should be enclosed in a bracket rather than the hyphen.
Done, thank you for the piece of advice.
- 8) L190, L239- the use of a one-sentence paragraph, I suggest should be avoided altogether.
Following our response to comment 9, the first paragraph has been reshaped and is no longer a one-sentence paragraph. The second one-sentence paragraph has been reshaped to consist of two sentences.
- 9) The sentence in L190-L191 needs to be rephrased, to give an appropriate meaning of what the authors meant there.
The sentence is short because it uses notation introduced earlier in the text. We added a comment: "In words, at time t >= 1, the algorithm whose penalized empirical average cumulative risk is the smallest is determined and the discrete overarching Super Learner returns the output of that algorithm trained on all data till time t."
- 10) Section 4.2 should be properly named. "Training of what?"
Section 4.2 is now named "Training the discrete and continuous overarching Super Learners".
- 11) In Figure 5, the spelling of drought, not "drougth" should be checked in the map presented.
Thank you for finding this typo. It has been corrected.
- 12) Instead of the description with "left-hand side map" and "right-hand side map" I suggest the authors should label the maps with "Map A" and "Map B" or in any other format suitable for easy visual comprehension by readers.
Done, thank you for the piece of advice.
- 13) In Table 2, I think the table should be properly formatted.
We added the required top and bottom horizontal lines to both Tables 1 and 2.
- 14) L263-267, the sentence is too long, I suggest it be split as appropriate.
Thank you for the piece of advice. We split the sentence into three parts.
- 15) L-290- "and have been since then" should be properly rephrased.
We cut the original sentence into two parts.
- 16) In summary, I suggest
- 16a) The authors need to do more editing and use more technical language in their work.
The revised manuscript will be read (again) by a native English speaker before being uploaded.
- 16b) The sectioning of the paper needs to be improved upon. L 199 is not a befitting section name. I suggest it should be either expunged or properly rephrased. Also, regarding the sections, if the authors decide to retain the structure of the sections, then they should give proper numbering to them.
The sectioning of the manuscript will be improved upon.
- 16c) The paper is having an inadequate coherence in the flow of thought, aside from sectioning which has been mentioned earlier, the authors should, in addition, ensure a consistent flow of thought from one paragraph to the other, and from one section to the other.
The manuscript will be edited with special attention to the flow of thought.
- 16d) The authors should cite relevant works in their paper for validation.
Unfortunately, the relevant literature is scarce, in part because the challenge of anticipating the cost of drought events in France is of interest mostly in France, and also because the data and methodologies are very sensitive. To the best of our knowledge, Charpentier et al. (2021), recently published in NHESS, is the only published work addressing the prediction of the cost of drought events in France.
As explained in our reply to question 2 of reviewer 2, we do not address the same problem as Charpentier et al. (2021). It is very difficult to make comparisons between our results and theirs. We simply "quote Charpentier et al. (2021, end of Section 4.1) who say of their predictions for the year 2016 that they are 'severely underestimated'. Judging by their Figure 7, the underestimation by the discrete and continuous overarching Super Learners for the year 2016 is less pronounced than the underestimation by their algorithms (but we recall that they tackle a more challenging problem than us because we focus on the city-specific costs for those cities that have obtained the government declaration of natural disaster for a drought event whereas they consider all French cities)."
Furthermore, as written on page 11, "[f]or confidentiality reasons, we were not given the authorization to discuss how the overarching Super Learners fare compared to the algorithm currently deployed at CCR to predict the overall costs of drought events in France from 2007 to 2017. However, we were authorized to make a comparison for the sole year 2017. That particular year, the discrete and continuous overarching Super Learners outperform the algorithm currently deployed at CCR, with a precision of 96% (discrete overarching Super Learner), 94% (continuous overarching Super Learners) versus 83% (currently deployed algorithm)."
- I think the paper has been able to explain the process of the super learner, but they still need to do more work in explaining how their developed algorithm was able to forecast the cost of drought events and how this algorithm should be better considered than previously used ones if there were any.
Please, see our reply to question 16d.
Citation: https://doi.org/10.5194/egusphere-2022-541-CC1
-
AC1: 'Reply on CC1', Geoffrey Ecoto, 13 Sep 2022
The responses submitted by Antoine Chambaz were written by the two authors of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC1
-
AC3: 'Reply on RC1', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC3
-
RC2: 'Comment on egusphere-2022-541', Anonymous Referee #2, 07 Aug 2022
Please find my comments in the attached file.
-
CC2: 'Reply on RC2', Antoine Chambaz, 31 Aug 2022
We thank the reviewer for their report.
- In my opinion, this manuscript is not suitable for publication in its current form, particularly in a journal focused on natural-hazards research. My primary concerns can be summarized as follows:
- 1. The manuscript is missing many key details, such that it is not possible for me to determine whether the methodological approach adopted is sound. For example:
- 1a1. The data section does not describe the data used from different data sources and how these data compare in terms of resolution, quality etc.
The "Data" section spans approximately three pages. It consists of 77 lines of text and one figure. We disagree with the statement "[t]he data section does not describe the data used from different data sources". However, the reviewer makes useful suggestions that significantly improve the description. Specifically:
- 1a2. The data section does not describe [...] how these data compare in terms of resolution, quality etc.
We thank the reviewer for pointing out the need for clarification. In Section 2.1 ("Data provided by CCR's cedents"), we now explain that the insured goods and claims data gathered by CCR over the years are accurately geolocalized. We also detail the resolution of each covariate when that is relevant. As for the covariates' quality, we obtained them from the National Institute for Statistical and Economic Studies (INSEE), Geographic National Institute (IGN), French Geological Survey (BRGM) and Météo-France, four trusted public organizations that notably collect, share and analyze information about the French economy and people (INSEE), geography (IGN), geology (BRGM) and meteorology (Météo-France).
- 1a3. There is no attempt made to justify the variables included in the algorithm or their significance, leaving the reader uncertain as to whether exclusion or inclusion of more variables would have improved the performance of the algorithm. Arbitrary divisions of the data (e.g., proportions of buildings built are categorized in different time intervals) are not explained and supported.
- In Subsection "About the city-level costs of drought events" we explain our need to derive provisional city-specific costs, and how we proceed to compute them.
- In Subsection "About the city-level description" we now clarify the source of each covariate. The climatic zone and seismic zone covariates are defined by law, and we now point to the relevant articles. The proportions of buildings are defined by, and obtained from, INSEE.
Overall, as explained at the beginning of the subsection, "[a] city's multi-faceted description attempts to capture all the city's traits that, beyond the city-level SWIs (...), can explain the cost of a possible drought event." We did our best to include covariates with a potential for being relevant for the task at hand. Of course we cannot guarantee that all of them are useful.
Interestingly, some of the base learners included in the library of algorithms upon which our discrete and continuous overarching Super Learners rely incorporate data-driven routines to select more relevant covariates. Moreover, as explained in Subsection 4.1 ("Their library of algorithms"), "some of these base learners are combined (upstream) with screening algorithms. A screening algorithm is merely an algorithm that selects a subset of the covariates deemed relevant to feed the base learners. (...) In our study, we only use deterministic screening algorithms based on expert knowledge."
In Section 5 ("Discussion"), we acknowledge that the quality of our predictions strongly depends on the quality of the local description of the drought event. We discuss how the local descriptions could be improved in future work.
- 1b. There is no attempt made to introduce the very basic high-level concepts of super learning to unfamiliar readers, even to explain the concept somewhat succinctly in the abstract. This is not appropriate, given that the targeted journal is focused on natural-hazard research. Furthermore, it seems to me (based on the description provided in Section 4.2) that the authors may be evaluating the performance of the algorithm based on training (rather than test) data, which would not be appropriate.
- super learning
The guidelines about the abstract recommend that it be "short, clear, concise". If possible, it would be good indeed to include a brief description of what super learning consists in. We give it a shot in the revised version. The new abstract now reads:
"Drought events are the second most expensive type of natural disaster within the French legal framework called the natural disasters compensation scheme. In recent years, drought events have been remarkable in their geographical scale and intensity. We develop and apply a new methodology to forecast the cost of a drought event in France. The methodology hinges on super learning (van der Laan et al., 2007; Benkeser et al., 2018). Super learning is a general methodology to learn a feature of the law of the data identified through an ad hoc risk function by relying on a library of algorithms. The algorithms either compete (discrete super learning) or collaborate (continuous super learning), a cross-validation scheme allowing to determine the best performing algorithm or combination of algorithms, respectively. Our super learner takes into account the complex dependence structure induced in the data by the spatial and temporal nature of drought events."
The third paragraph of Section 3.1 ("Presentation and theoretical performance" of the One-Step Ahead Sequential Super Learner, OSASSL) summarizes what super learning consists in. The detailed description of how we implement our two OSASSLs complements the summary.
Following the reviewer's suggestion, we also added the brief description of super learning from the abstract in the first paragraph of Section 3.1.
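The difference between algorithms that compete (discrete super learning) and algorithms that collaborate (continuous super learning) can be illustrated with a minimal Python sketch. It assumes that cross-validated predictions are already available for each algorithm. The squared-error loss, the two hypothetical algorithms `algo_low` and `algo_high`, the toy numbers and the grid search over convex weights are all illustrative assumptions, not the implementation used in the manuscript.

```python
# Toy illustration only: names, data and the squared-error loss are assumptions.

def mean_squared_error(predictions, outcomes):
    """Empirical risk under squared-error loss."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def discrete_super_learner(cv_predictions, outcomes):
    """Algorithms compete: return the name with the smallest cross-validated risk."""
    risks = {name: mean_squared_error(preds, outcomes)
             for name, preds in cv_predictions.items()}
    return min(risks, key=risks.get), risks


def continuous_super_learner(cv_predictions, outcomes, steps=100):
    """Algorithms collaborate: grid-search the convex weight of a two-algorithm library."""
    (name_a, preds_a), (name_b, preds_b) = cv_predictions.items()
    best_weight, best_risk = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        combined = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        risk = mean_squared_error(combined, outcomes)
        if risk < best_risk:
            best_weight, best_risk = w, risk
    return {name_a: best_weight, name_b: 1 - best_weight}, best_risk


outcomes = [10.0, 20.0, 30.0, 40.0]
cv_predictions = {
    "algo_low": [8.0, 18.0, 28.0, 38.0],    # always 2 below the outcome
    "algo_high": [13.0, 23.0, 33.0, 43.0],  # always 3 above the outcome
}
winner, risks = discrete_super_learner(cv_predictions, outcomes)
weights, combined_risk = continuous_super_learner(cv_predictions, outcomes)
```

In this toy example the discrete version selects `algo_low`, whereas the continuous version finds a weighted combination whose risk is lower than that of either algorithm alone, because the two errors have opposite signs.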
- on training OSASSL
Super learning hinges on cross-validation to evaluate and compare the risks of the various algorithms. In the simpler case where one learns from independent and identically distributed data, one often implements V-fold cross-validation: first, the data set is split into V groups of roughly equal sizes (the "folds"); second, every algorithm is trained and tested V times, once for each fold, which is used for testing after the algorithm has been trained using all the other folds; third, the cross-validated (empirical) risk of the algorithm is defined as the average of the V fold-specific (empirical) risks obtained by testing.
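The three steps of V-fold cross-validation can be sketched as follows. This is a minimal pure-Python illustration: the base learner `fit_mean` (which ignores covariates and predicts the training mean), the squared-error loss, the interleaved fold assignment and the toy data are assumptions made for the example.

```python
# Sketch of V-fold cross-validation; the learner, loss and data are assumptions.

def fit_mean(train_outcomes):
    """Hypothetical base learner: predicts the mean of the training outcomes."""
    return sum(train_outcomes) / len(train_outcomes)


def v_fold_cv_risk(outcomes, n_folds):
    """Average of the fold-specific squared-error risks."""
    # Step 1: split the indices into n_folds groups of roughly equal size.
    folds = [list(range(v, len(outcomes), n_folds)) for v in range(n_folds)]
    fold_risks = []
    for test_idx in folds:
        # Step 2: train on all other folds, test on the held-out fold.
        held_out = set(test_idx)
        train = [y for i, y in enumerate(outcomes) if i not in held_out]
        prediction = fit_mean(train)
        fold_risks.append(
            sum((outcomes[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        )
    # Step 3: the cross-validated risk is the average over the folds.
    return sum(fold_risks) / n_folds


outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_risk = v_fold_cv_risk(outcomes, n_folds=3)
```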
In this study, we learn from a (short) time-series (with time-specific observations consisting of many dependent data-structures). We thus cannot rely on V-fold cross-validation. Instead, we rely on a sequential cross-validation scheme: sequentially at each time t, for each algorithm: all data till time (t-1) are used for training and the t-specific data are used for testing; the t-specific cross-validated (empirical) cumulative risk of the algorithm is defined as the average of the tau-specific (empirical) risks (where tau ranges between 1 and t) obtained by testing.
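The sequential scheme can be sketched in the same spirit. In the following minimal Python example, each element of `blocks` stands for the outcomes observed at one time point. The two learners `fit_mean` and `fit_last`, the squared-error loss and the toy numbers are assumptions, and the penalization of the cumulative risk mentioned elsewhere in the replies is omitted for simplicity.

```python
# Sketch of sequential cross-validation on a toy time series of observations.
# The two learners, the squared-error loss and the data are assumptions.

def fit_mean(history):
    """Hypothetical learner: predicts the mean of all past outcomes."""
    past = [y for block in history for y in block]
    return sum(past) / len(past)


def fit_last(history):
    """Hypothetical learner: predicts the mean of the most recent time block."""
    return sum(history[-1]) / len(history[-1])


def sequential_cv_risks(blocks, learners):
    """For each time t, train on the blocks before t, test on block t, and
    record the running average of the time-specific risks for each learner."""
    cumulative = {name: [] for name in learners}
    totals = {name: 0.0 for name in learners}
    for t in range(1, len(blocks)):
        for name, fit in learners.items():
            prediction = fit(blocks[:t])  # training: all data till time (t-1)
            risk_t = sum((y - prediction) ** 2 for y in blocks[t]) / len(blocks[t])
            totals[name] += risk_t
            cumulative[name].append(totals[name] / t)  # average over tau = 1..t
    return cumulative


# Each block holds the outcomes observed at one time point (toy numbers).
blocks = [[1.0, 3.0], [4.0, 6.0], [7.0, 9.0], [10.0, 12.0]]
learners = {"mean": fit_mean, "last": fit_last}
risks = sequential_cv_risks(blocks, learners)
best_at_final_time = min(learners, key=lambda name: risks[name][-1])
```

With this upward-trending toy series, the learner that tracks the latest block has the smaller cumulative risk, so a discrete selection at the final time would pick `last`.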
If the reviewer thinks that this could be useful for future readers, we would gladly include the two paragraphs above, on V-fold and sequential cross-validation, in the manuscript.
Furthermore, as explained in Section 3.1 ("Presentation and theoretical performance"), the theoretical analysis of OSASSL carried out in a companion study reveals that OSASSL manages to make up for the shortness of the time-series thanks to the manyness of each time-specific observation provided that the latter are only slightly dependent.
- 1c. There is no justification provided for the authors' exclusive focus on cities. Why not also include the costs of droughts in rural areas, when total drought costs are available (according to Figure 1)?
We thank the reviewer for pointing out that the word "city" may be misleading. We certainly must clarify that all French "communes" are considered.
According to the Cambridge Dictionary, the word "commune" can be translated to "town" or "village". However, the first definition of "town" in the same dictionary reads "a place where people live and work, containing many houses, shops, places of work, places of entertainment, etc., and usually larger than a village but smaller than a city". We finally opted for the use of the word "city" regardless of the location and size.
Would a note on the choice of the word "city" and on the fact that all "communes" are considered solve the issue?
- 1d. How is inflation factored into the observed costs, particularly those from many years ago? How can future changes in exposure and population be integrated into future projections of drought costs from these algorithms? These questions should be answered clearly within the text.
We use "constant euros". This has been clarified at the very beginning of the manuscript.
In Section 3.1 ("Presentation and theoretical performance") we make a stationarity assumption on the mean conditional cost given the (a,t)-specific collection X_{a,t} of covariates describing city a in year t and the city-level SWI Z_{a,t} describing the drought event that year. In words, we assume that the mechanism that produces a cost after a drought event conditionally on (X_{a,t} , Z_{a,t}) does not depend on (a, t), that is, remains constant throughout time and France. In view of the reviewer's question, we emphasize that (X_{a,t} , Z_{a,t}) includes (and is not limited to) a measure of exposure and a description of the population.
Under this stationarity assumption, we can use the estimator of the mean conditional cost to make predictions at any (x,z) provided that (x,z) falls in the domain of the observed (X_{a,t} , Z_{a,t}). Of course, the closer (x,z) is to the border of that domain, the less reliable the prediction. Moreover, if (x,z) falls outside the domain then, although a prediction may be made nonetheless, it cannot be trusted. So, in view of the reviewer's question and of climate change, projections of drought events for the not-too-distant future can be made.
- 2. If the problem being tackled is "less challenging" than that of a previous study (as implied by the authors in line 270), then I am doubtful on what (if anything) the present study is contributing to the state of the art in this field.
Forecasting the cost of drought events in France is an important task for CCR. For a given year the task will be carried out several times because, as time goes by, more relevant information is available.
At first, it is necessary to predict which cities will make a request for the government declaration of natural disaster for a drought event. Later on it is known that some cities did make the request and it is still necessary to predict for the others. Later still it is known exactly which cities did make the request. Note that once a request is made, there is no uncertainty for CCR about whether or not the city will obtain the government declaration of natural disaster for a drought event.
Therefore CCR currently addresses two sub-problems separately: sub-problem 1 consists in predicting which cities will make a request for the government declaration of natural disaster for a drought event; sub-problem 2 consists in predicting the cost of a drought event for those cities that obtained the government declaration of natural disaster for a drought event. In this study, we focus on sub-problem 2. In contrast, Charpentier et al. (2021) address the two sub-problems as one single problem.
Our algorithms are useful early on, when it is still necessary to predict which cities will make a request for the government declaration of natural disaster for a drought event. In that case, another algorithm (a solution to sub-problem 1) is used to predict which cities will make a request and the prediction of costs is carried out for them. Our algorithms are also useful later on, when it is known exactly which cities did make the request. In that case, of course, the other algorithm is not useful.
- 3. As seen in Figure 1, the claims data does not adequately represent the full cost of the droughts in any given year. If the purpose of the algorithm is to predict claims data, then this might be acceptable but if the purpose of the algorithm is to predict overall drought costs, then these do not seem reasonable training data to me.
We do want to predict overall drought costs. Moreover, even if we aim to forecast the cost of drought events from year t in year (t+1), the cost of the damages in a city caused by a drought event that happened in year t is still unknown in year (t+1). In Section 2.3 ("City-level data processing", subsection "About the city-level costs of drought events") we explain how city-specific costs are estimated in such a way that the sum of all the city-specific costs equals the overall cost estimated by actuarial studies.
- 4. I am generally concerned by the arbitrary equivalence of droughts and natural disasters. Droughts are not the only natural disasters that France suffers, yet this seems to be incorrectly implied in a number of cases:
We should have clarified that we focus solely on drought events.
- 4a. Inputs to the algorithm include indicators on whether there have been (successful) requests for government declarations of natural disasters -- these declarations do not necessarily indicate the occurrence of a drought.
We now systematically use the expressions "make a request for/obtain the government declaration of natural disaster for a drought event".
- 4b. Figure 5 shows errors for regions where natural disasters (rather than specifically droughts) occurred.
See the two replies above.
- 5 More minor (but still important) concerns:
- 5a. I cannot find a precise description of the aim of the study in the Introduction. (This is implicit but should be explicit for clarity).
Thank you for noting this. We clarified our objective in the introduction.
- 5b. Figure 3: The real costs shown in this figure do not seem to align with those shown in Figure 1 (e.g., the 2017 cost of >900 million shown in Figure 3 is not found in Figure 1). So what real costs are being shown here?
The real costs are reevaluated every quarter. We will make sure that we use the latest real costs in both figures.
Citation: https://doi.org/10.5194/egusphere-2022-541-CC2
-
AC2: 'Reply on CC2', Geoffrey Ecoto, 13 Sep 2022
The responses submitted by Antoine Chambaz were written by the two authors of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC2
-
AC4: 'Reply on CC2', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC4
-
AC5: 'Reply on CC2', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC5
-
AC6: 'Reply on RC2', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC6
-
CC2: 'Reply on RC2', Antoine Chambaz, 31 Aug 2022
Status: closed
-
RC1: 'Comment on egusphere-2022-541', Anonymous Referee #1, 30 Jul 2022
- L7-L8- the statements need to be cited.
- In L16, In (Charpentier et al., 2021), I suggest the bracket be removed.
- In Line 30- I don’t know if the authors have a reason for not capitalizing the abbreviated words in OSSASL.
- In L115, L173, 207, 295- the use of “so-called” I do want to believe it is an adjectival description of the succeeding nouns, but in the various context it has been used, it sounds more of a sarcastic use than what you intended it for in the sentence. Also, the use of so-called with OSASSL, I found inappropriate as it is still a new algorithm that has not gained ground. I suggest the authors find another appropriate use for the “so-called” term in the highlighted sentences.
- The use of brackets should be well defined, example is at the end of L191, L193, there are cases of “unopened and opened parenthesis with no closing”.
- Sections 4.1: Starting the paragraph with “In fact” is a disconnect from the previous discussion
- L177- “more details to follow.” I suggest the sentence should be enclosed in a bracket rather than the hyphen.
- L190, L239- the use of a one-sentence paragraph, I suggest should be avoided altogether.
- The sentence in L190-L191 needs to be rephrased, to give an appropriate meaning of what the authors meant there.
- Section 4.2 should be properly named. “Training of what?”
- In Figure 5, the spelling of drought, not “drougth” should be checked in the map presented.
- Instead of the description with “left-hand side map” and “right-hand side map” I suggest the authors should label the maps with “Map A” and “Map B” or in any other format suitable for easy visual comprehension by readers.
- In Table 2, I think the table should be properly formatted.
- L263-267, the sentence is too long, I suggest it be split as appropriate.
- L-290- “and have been since then” should be properly rephrased.
- In summary, I suggest
- The authors need to do more editing and use more technical language in their work.
- The sectioning of the paper needs to be improved upon. L 199 is not a befitting section name. I suggest it should be either expunged or properly rephrased. Also, regarding the sections, if the authors decide to retain the structure of the sections, then they should give proper numbering to them.
- The paper is having an inadequate coherence in the flow of thought, aside from sectioning which has been mentioned earlier, the authors should, in addition, ensure a consistent flow of thought from one paragraph to the other, and from one section to the other.
- The authors should cite relevant works in their paper for validation.
I think the paper has been able to explain the process of the super learner, but they still need to do more work in explaining how their developed algorithms was able to forecast the cost of drought events and how this algorithm should be better considered than previously used ones if there were any.
Citation: https://doi.org/10.5194/egusphere-2022-541-RC1 -
CC1: 'Reply on RC1', Antoine Chambaz, 31 Aug 2022
We thank the reviewer for their report.
- 1) L7-L8- the statements need to be cited.
We rely on the 2021 annual report produced by CCR, "Les catastrophes naturelles en France, bilan 1982-2020" (2021), which is now properly cited.
- 2) In L16, In (Charpentier et al., 2021), I suggest the bracket be removed.
The guidelines about in-text citation recommend to use parentheses in this case.
- 3) In Line 30- I don't know if the authors have a reason for not capitalizing the abbreviated words in OSSASL.
The text now reads "We call our algorithm the One-Step Ahead Sequential Super Learner (OSASSL)."
- 4) In L115, L173, 207, 295- the use of "so-called" I do want to believe it is an adjectival description of the succeeding nouns, but in the various context it has been used, it sounds more of a sarcastic use than what you intended it for in the sentence. Also, the use of so-called with OSASSL, I found inappropriate as it is still a new algorithm that has not gained ground. I suggest the authors find another appropriate use for the "so-called" term in the highlighted sentences.
Thank you for pointing out our awkward use of the adjective "so-called" (this blog post was quite interesting on the subject: https://thebettereditor.wordpress.com/2017/03/28/another-so-called-or-is-it-so-called-blog-post/). We certainly did not mean to be sarcastic. We removed all its occurrences.
- 5) The use of brackets should be well defined, example is at the end of L191, L193, there are cases of "unopened and opened parenthesis with no closing".
In the example provided, there is no parentheses mismatch. We checked the entire text and did not spot any such mismatch.
- 6) Sections 4.1: Starting the paragraph with "In fact" is a disconnect from the previous discussion
We removed the expression "In fact".
- 7) L177- "more details to follow." I suggest the sentence should be enclosed in a bracket rather than the hyphen.
Done, thank you for the piece of advice.
- 8) L190, L239- the use of a one-sentence paragraph, I suggest should be avoided altogether.
In response to comment 9 the first paragraph, reshaped, is not a one-sentence paragraph anymore. The second one-sentence paragraph has been reshaped to consist of two sentences.
- 9) The sentence in L190-L191 needs to be rephrased, to give an appropriate meaning of what the authors meant there.
The sentence is short because it uses notation introduced ealier in the text. We added a comment: "In words, at time t >= 1, the algorithm whose penalized empirical average cumulative risk is the smallest is determined and the discrete overarching Super Learner returns the output of that algorithm trained on all data till time t."
- 10) Section 4.2 should be properly named. "Training of what?"
Section 4.2 is now named "Training the discrete and continuous overarching Super Learners".
- 11) In Figure 5, the spelling of drought, not "drougth" should be checked in the map presented.
Thank you for finding this typo. It has been corrected.
- 12) Instead of the description with "left-hand side map" and "right-hand side map" I suggest the authors should label the maps with "Map A" and "Map B" or in any other format suitable for easy visual comprehension by readers.
Done, thank you for the piece of advice.
- 13) In Table 2, I think the table should be properly formatted.
We added the required top and bottom horizontal lines to both Tables 1 and 2.
- 14) L263-267, the sentence is too long, I suggest it be split as appropriate.
Thank you for the piece of advice. We split the sentence into three parts.
- 15) L-290- "and have been since then" should be properly rephrased.
We cut the original sentence into two parts.
- 16) In summary, I suggest
- 16a) The authors need to do more editing and use more technical language in their work.
Eventually, the revised manuscript will be read (again) by a native English speaker before being uploaded.
- 16b) The sectioning of the paper needs to be improved upon. L 199 is not a befitting section name. I suggest it should be either expunged or properly rephrased. Also, regarding the sections, if the authors decide to retain the structure of the sections, then they should give proper numbering to them.
The sectioning of the manuscript will be improved upon.
- 16c) The paper is having an inadequate coherence in the flow of thought, aside from sectioning which has been mentioned earlier, the authors should, in addition, ensure a consistent flow of thought from one paragraph to the other, and from one section to the other.
The manuscript will be edited with special attention to the flow of thought.
- 16d) The authors should cite relevant works in their paper for validation.
Unfortunately, the relevant literature is scarce, in part because the challenge of anticipating the cost of drought events in France is of interest mostly in France, and also because the data and methodologies are very sensitive. To the best of our knowledge, Charpentier et al. (2021), recently published in NHESS, is the only published work addressing the prediction of the cost of drought events in France.
As explained in our reply to question 2 of reviewer 2, we do not address the same problem as Charpentier et al. (2021). It is very difficult to make comparisons between our results and theirs. We simply "quote Charpentier et al. (2021, end of Section 4.1) who say of their predictions for the year 2016 that they are 'severely underestimated'. Judging by their Figure 7, the underestimation by the discrete and continuous overarching Super Learners for the year 2016 is less pronounced than the underestimation by their algorithms (but we recall that they tackle a more challenging problem than us because we focus on the city-specific costs for those cities that have obtained the government declaration of natural disaster for a drought event whereas they consider all French cities)."
Furthermore, as written on page 11, "[f]or confidentiality reasons, we were not given the authorization to discuss how the overarching Super Learners fare compared to the algorithm currently deployed at CCR to predict the overall costs of drought events in France from 2007 to 2017. However, we were authorized to make a comparison for the sole year 2017. That particular year, the discrete and continuous overarching Super Learners outperform the algorithm currently deployed at CCR, with a precision of 96% (discrete overarching Super Learner), 94% (continuous overarching Super Learners) versus 83% (currently deployed algorithm)."
- I think the paper has been able to explain the process of the super learner, but they still need to do more work in explaining how their developed algorithm was able to forecast the cost of drought events and how this algorithm should be better considered than previously used ones if there were any.
Please, see our reply to question 16d.
Citation: https://doi.org/10.5194/egusphere-2022-541-CC1
-
AC1: 'Reply on CC1', Geoffrey Ecoto, 13 Sep 2022
The responses submitted by Antoine Chambaz were written by the two authors of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC1
-
AC3: 'Reply on RC1', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC3
-
RC2: 'Comment on egusphere-2022-541', Anonymous Referee #2, 07 Aug 2022
Please find my comments in the attached file.
-
CC2: 'Reply on RC2', Antoine Chambaz, 31 Aug 2022
We thank the reviewer for their report.
- In my opinion, this manuscript is not suitable for publication in its current form, particularly in a journal focused on natural-hazards research. My primary concerns can be summarized as follows:
- 1. The manuscript is missing many key details, such that it is not possible for me to determine whether the methodological approach adopted is sound. For example:
- 1a1. The data section does not describe the data used from different data sources and how these data compare in terms of resolution, quality etc.
The "Data" section spans approximately three pages. It consists of 77 lines of text and one figure. We disagree with the statement "[t]he data section does not describe the data used from different data sources". However, the reviewer makes useful suggestions to significantly enhance the description. Specifically:
- 1a2. The data section does not describe [...] how these data compare in terms of resolution, quality etc.
We thank the reviewer for pointing out the need for clarification. In Section 2.1 ("Data provided by CCR's cedents"), we now explain that the insured goods and claims data gathered by CCR over the years are accurately geolocalized. We also detail the resolution of each covariate when that is relevant. As for the covariates' quality, we obtained them from the National Institute for Statistical and Economic Studies (INSEE), Geographic National Institute (IGN), French Geological Survey (BRGM) and Météo-France, four trusted public organizations that notably collect, share and analyze information about the French economy and people (INSEE), geography (IGN), geology (BRGM) and meteorology (Météo-France).
- 1a3. There is no attempt made to justify the variables included in the algorithm or their significance, leaving the reader uncertain as to whether exclusion or inclusion of more variables would have improved the performance of the algorithm. Arbitrary divisions of the data (e.g., proportions of buildings built are categorized in different time intervals) are not explained and supported.
- In Subsection "About the city-level costs of drought events" we explain our need to derive provisional city-specific costs, and how we proceed to compute them.
- In Subsection "About the city-level description" we now clarify the source of each covariate. The climatic zone and seismic zone covariates are defined by law, and we now point to the relevant articles. The proportions of buildings are defined by, and obtained from, INSEE.
Overall, as explained at the beginning of the subsection, "[a] city's multi-faceted description attempts to capture all the city's traits that, beyond the city-level SWIs (...), can explain the cost of a possible drought event." We did our best to include covariates with a potential for being relevant for the task at hand. Of course we cannot guarantee that all of them are useful.
Interestingly, some of the base learners included in the library of algorithms upon which our discrete and continuous overarching Super Learners rely incorporate data-driven routines to select more relevant covariates. Moreover, as explained in Subsection 4.1 ("Their library of algorithms"), "some of these base learners are combined (upstream) with screening algorithms. A screening algorithm is merely an algorithm that selects a subset of the covariates deemed relevant to feed the base learners. (...) In our study, we only use deterministic screening algorithms based on expert knowledge."
In Section 5 ("Discussion"), we acknowledge that the quality of our predictions strongly depends on the quality of the local description of the drought event. We discuss how the local descriptions could be improved in future work.
- 1b. There is no attempt made to introduce the very basic high-level concepts of super learning to unfamiliar readers, even to explain the concept somewhat succinctly in the abstract. This is not appropriate, given that the targeted journal is focused on natural-hazard research. Furthermore, it seems to me (based on the description provided in Section 4.2) that the authors may be evaluating the performance of the algorithm based on training (rather than test) data, which would not be appropriate.
- super learning
The guidelines about the abstract recommend that it be "short, clear, concise". If possible, it would be good indeed to include a brief description of what super learning consists in. We give it a shot in the revised version. The new abstract now reads:
"Drought events are the second most expensive type of natural disaster within the French legal framework called the natural disasters compensation scheme. In recent years, drought events have been remarkable in their geographical scale and intensity. We develop and apply a new methodology to forecast the cost of a drought event in France. The methodology hinges on super learning (van der Laan et al., 2007; Benkeser et al., 2018). Super learning is a general methodology to learn a feature of the law of the data identified through an ad hoc risk function by relying on a library of algorithms. The algorithms either compete (discrete super learning) or collaborate (continuous super learning), a cross-validation scheme allowing to determine the best performing algorithm or combination of algorithms, respectively. Our super learner takes into account the complex dependence structure induced in the data by the spatial and temporal nature of drought events."
The third paragraph of Section 3.1 ("Presentation and theoretical performance" of the One-Step Ahead Sequential Super Learner, OSASSL) summarizes what super learning consists in. The detailed description of how we implement our two OSASSLs complements the summary.
Following the reviewer's suggestion, we also added the brief description of super learning from the abstract in the first paragraph of Section 3.1.
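As a rough formalization of the "compete or collaborate" distinction, one common way to write the two variants is given below. The notation is an assumption made for illustration, not necessarily the manuscript's: $\hat{f}_1, \dots, \hat{f}_K$ are the predictors produced by the $K$ algorithms of the library, $\widehat{R}^{\mathrm{CV}}$ is the cross-validated empirical risk, and $\Delta_K$ is the set of convex weights (restricting the combination to convex weights is itself an assumption).

```latex
% Discrete super learning: the algorithms compete, the best one is kept.
\hat{k} \in \operatorname*{arg\,min}_{1 \le k \le K}
  \widehat{R}^{\mathrm{CV}}\bigl(\hat{f}_k\bigr)

% Continuous super learning: the algorithms collaborate through a weighted combination.
\hat{\alpha} \in \operatorname*{arg\,min}_{\alpha \in \Delta_K}
  \widehat{R}^{\mathrm{CV}}\Bigl(\sum_{k=1}^{K} \alpha_k \hat{f}_k\Bigr),
\qquad
\Delta_K = \Bigl\{\alpha \in [0,1]^K : \sum_{k=1}^{K} \alpha_k = 1\Bigr\}
```

The discrete variant corresponds to restricting $\alpha$ to the vertices of $\Delta_K$, so the continuous variant can do no worse in terms of cross-validated risk.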
- on training OSASSL
Super learning hinges on cross-validation to evaluate and compare the risks of the various algorithms. In the simpler case where one learns from independent and identically distributed data, one often implements V-fold cross-validation: first, the data set is split into V groups of roughly equal sizes (the "folds"); second, every algorithm is trained and tested V times, once for each fold, which is used for testing after the algorithm has been trained using all the other folds; third, the cross-validated (empirical) risk of the algorithm is defined as the average of the V fold-specific (empirical) risks obtained by testing.
In this study, we learn from a (short) time-series (with time-specific observations consisting of many dependent data-structures). We thus cannot rely on V-fold cross-validation. Instead, we rely on a sequential cross-validation scheme: sequentially at each time t, for each algorithm: all data till time (t-1) are used for training and the t-specific data are used for testing; the t-specific cross-validated (empirical) cumulative risk of the algorithm is defined as the average of the tau-specific (empirical) risks (where tau ranges between 1 and t) obtained by testing.
If the reviewer thinks this could be useful for future readers, we would gladly include the two paragraphs above in the manuscript.
Furthermore, as explained in Section 3.1 ("Presentation and theoretical performance"), the theoretical analysis of OSASSL carried out in a companion study reveals that OSASSL manages to make up for the shortness of the time-series thanks to the manyness of each time-specific observation provided that the latter are only slightly dependent.
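To make the sequential cross-validation scheme described above concrete, here is a minimal sketch under simplifying assumptions. The two toy learners, the squared-error risk and the data are hypothetical and stand in for the manuscript's library and loss. Testing starts at the second time point because no training data exist before the first one.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Batch = Sequence[float]  # all observations collected at one time point
Predictor = Callable[[], float]
Learner = Callable[[List[Batch]], Predictor]


def grand_mean_learner(history: List[Batch]) -> Predictor:
    """Toy learner: predict the mean of all past observations."""
    value = mean(y for batch in history for y in batch)
    return lambda: value


def last_batch_learner(history: List[Batch]) -> Predictor:
    """Toy learner: predict the mean of the most recent time point."""
    value = mean(history[-1])
    return lambda: value


def sequential_cv(
    batches: List[Batch], learners: Dict[str, Learner]
) -> Dict[str, List[float]]:
    """Train on all data before each time point, test on that time point,
    and track the running average of the time-specific risks."""
    totals = {name: 0.0 for name in learners}
    cumulative: Dict[str, List[float]] = {name: [] for name in learners}
    for t in range(1, len(batches)):
        history, test = batches[:t], batches[t]
        for name, learner in learners.items():
            predict = learner(history)
            # Time-specific empirical risk (squared error, an assumption).
            risk_t = mean((y - predict()) ** 2 for y in test)
            totals[name] += risk_t
            cumulative[name].append(totals[name] / t)
    return cumulative


batches = [[1.0, 3.0], [4.0, 4.0], [4.0, 6.0]]
learners = {"grand_mean": grand_mean_learner, "last_batch": last_batch_learner}
print(sequential_cv(batches, learners))
```

Unlike V-fold cross-validation, no observation is ever used to train a predictor that is then tested on earlier data, which respects the temporal ordering.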
- 1c. There is no justification provided for the authors' exclusive focus on cities. Why not also include the costs of droughts in rural areas, when total drought costs are available (according to Figure 1)?
We thank the reviewer for pointing out that the word "city" may be misleading. We certainly must clarify that all French "communes" are considered.
According to the Cambridge Dictionary, the word "commune" can be translated to "town" or "village". However, the first definition of "town" in the same dictionary reads "a place where people live and work, containing many houses, shops, places of work, places of entertainment, etc., and usually larger than a village but smaller than a city". We finally opted for the use of the word "city" regardless of the location and size.
Would a note on the choice of the word "city" and on the fact that all "communes" are considered solve the issue?
- 1d. How is inflation factored into the observed costs, particularly those from many years ago? How can future changes in exposure and population be integrated into future projections of drought costs from these algorithms? These questions should be answered clearly within the text.
We use "constant euros". This has been clarified at the very beginning of the manuscript.
In Section 3.1 ("Presentation and theoretical performance") we make a stationarity assumption on the mean conditional cost given the (a,t)-specific collection X_{a,t} of covariates describing city a in year t and the city-level SWI Z_{a,t} describing the drought event that year. In words, we assume that the mechanism that produces a cost after a drought event conditionally on (X_{a,t}, Z_{a,t}) does not depend on (a, t), that is, remains constant throughout time and France. In view of the reviewer's question, we emphasize that (X_{a,t}, Z_{a,t}) includes (and is not limited to) a measure of exposure and a description of the population.
Under this stationarity assumption, we can use the estimator of the mean conditional cost to make predictions at any (x,z) provided that (x,z) falls in the domain of the observed (X_{a,t}, Z_{a,t}). Of course, the closer (x,z) is to the border of that domain, the less reliable the prediction. Moreover, if (x,z) falls outside the domain then, although a prediction may still be made, it cannot be trusted. So, in view of the reviewer's question and of climate change, not-too-distant-future projections of drought events can be made.
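One crude way to flag predictions requested outside or near the border of the observed domain is a per-coordinate bounding-box check, sketched below. This is an illustrative assumption: the reply does not say how the domain is assessed, a bounding box overestimates the true domain, and the function names and the `margin` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]


def observed_box(points: Sequence[Sequence[float]]) -> Box:
    """Per-coordinate (min, max) over the observed covariate vectors."""
    return [(min(column), max(column)) for column in zip(*points)]


def in_domain(point: Sequence[float], box: Box, margin: float = 0.0) -> bool:
    """True if every coordinate lies inside the box shrunk by `margin`,
    expressed as a fraction of each coordinate's observed range."""
    for value, (low, high) in zip(point, box):
        pad = margin * (high - low)
        if not (low + pad <= value <= high - pad):
            return False
    return True


# Made-up observed (x, z) pairs, e.g. one covariate and one drought index.
observed = [(0.2, 10.0), (0.5, 30.0), (0.9, 20.0)]
box = observed_box(observed)

print(in_domain((0.5, 15.0), box))               # well inside the box
print(in_domain((1.2, 15.0), box))               # outside the box
print(in_domain((0.25, 15.0), box, margin=0.1))  # inside, but near the border
```

A prediction flagged by such a check is not necessarily wrong; the check only signals that the stationarity assumption offers little support at that point.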
- 2. If the problem being tackled is "less challenging" than that of a previous study (as implied by the authors in line 270), then I am doubtful on what (if anything) the present study is contributing to the state of the art in this field.
Forecasting the cost of drought events in France is an important task for CCR. For a given year the task will be carried out several times because, as time goes by, more relevant information is available.
At first, it is necessary to predict which cities will make a request for the government declaration of natural disaster for a drought event. Later on it is known that some cities did make the request and it is still necessary to predict for the others. Later still it is known exactly which cities did make the request. Note that once a request is made, there is no uncertainty for CCR about whether or not the city will obtain the government declaration of natural disaster for a drought event.
Therefore CCR currently addresses two sub-problems separately: sub-problem 1 consists in predicting which cities will make a request for the government declaration of natural disaster for a drought event; sub-problem 2 consists in predicting the cost of a drought event for those cities that obtained the government declaration of natural disaster for a drought event. In this study, we focus on sub-problem 2. In contrast, Charpentier et al. (2021) address the two sub-problems as one single problem.
Our algorithms are useful early on, when it is still necessary to predict which cities will make a request for the government declaration of natural disaster for a drought event. In that case, another algorithm (a solution to sub-problem 1) is used to predict which cities will make a request and the prediction of costs is carried out for them. Our algorithms are also useful later on, when it is known exactly which cities did make the request. In that case, of course, the other algorithm is not useful.
- 3. As seen in Figure 1, the claims data does not adequately represent the full cost of the droughts in any given year. If the purpose of the algorithm is to predict claims data, then this might be acceptable but if the purpose of the algorithm is to predict overall drought costs, then these do not seem reasonable training data to me.
We do want to predict overall drought costs. Moreover, even if we aim to forecast the cost of drought events from year t in year (t+1), the cost of the damages in a city caused by a drought event that happened in year t is still unknown in year (t+1). In Section 2.3 ("City-level data processing", subsection "About the city-level costs of drought events") we explain how city-specific costs are estimated in such a way that the sum of all the city-specific costs equals the overall cost estimated by actuarial studies.
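The constraint that city-specific costs sum to the overall cost can be illustrated with a simple proportional rescaling. This is only a sketch of one way to enforce the constraint, assumed here for illustration: the actual procedure is the one described in Section 2.3 of the manuscript, and the city names and figures below are made up.

```python
from typing import Dict


def rescale_to_total(
    provisional: Dict[str, float], overall_cost: float
) -> Dict[str, float]:
    """Scale provisional city-specific costs by a common factor so that
    they sum to the overall cost (proportional rule, an assumption)."""
    total = sum(provisional.values())
    if total <= 0:
        raise ValueError("provisional costs must have a positive sum")
    factor = overall_cost / total
    return {city: cost * factor for city, cost in provisional.items()}


# Hypothetical provisional costs and overall actuarial estimate.
provisional = {"city_a": 2.0, "city_b": 6.0}
rescaled = rescale_to_total(provisional, overall_cost=12.0)
print(rescaled, sum(rescaled.values()))
```

A common factor preserves the relative ranking of cities while matching the overall estimate.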
- 4. I am generally concerned by the arbitrary equivalence of droughts and natural disasters. Droughts are not the only natural disasters that France suffers, yet this seems to be incorrectly implied in a number of cases:
We should have clarified that we focus solely on drought events.
- 4a. Inputs to the algorithm include indicators on whether there have been (successful) requests for government declarations of natural disasters -- these declarations do not necessarily indicate the occurrence of a drought.
We now systematically use the expressions "make a request for/obtain the government declaration of natural disaster for a drought event".
- 4b. Figure 5 shows errors for regions where natural disasters (rather than specifically droughts) occurred.
See the two above replies.
- 5 More minor (but still important) concerns:
- 5a. I cannot find a precise description of the aim of the study in the Introduction. (This is implicit but should be explicit for clarity).
Thank you for noting this. We clarified our objective in the introduction.
- 5b. Figure 3: The real costs shown in this figure do not seem to align with those shown in Figure 1 (e.g., the 2017 cost of >900 million shown in Figure 3 is not found in Figure 1). So what real costs are being shown here?
The real costs are reevaluated every quarter. We will make sure that we use the latest real costs in both figures.
Citation: https://doi.org/10.5194/egusphere-2022-541-CC2
-
AC2: 'Reply on CC2', Geoffrey Ecoto, 13 Sep 2022
The responses submitted by Antoine Chambaz were written by the two authors of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC2
-
AC4: 'Reply on CC2', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC4
-
AC5: 'Reply on CC2', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC5
-
AC6: 'Reply on RC2', Geoffrey Ecoto, 13 Sep 2022
See Antoine Chambaz's reply.
Citation: https://doi.org/10.5194/egusphere-2022-541-AC6
-
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
677 | 258 | 55 | 990 | 33 | 29 |