the Creative Commons Attribution 4.0 License.
Downscaling using Deep Convolutional Autoencoders, a case study for South East Asia
Abstract. Inspired by recent advancements in the field of computer vision, specifically models for generating higher-resolution images from low-resolution images, we investigate the utility of a deep convolutional autoencoder for downscaling and bias correcting climate projections for South East Asia (SEA). Downscaled projections of 2 m surface temperature are generated using autoencoders trained with data from the Coupled Model Intercomparison Project Phase 5 (CMIP5) and data from the fifth generation ECMWF atmospheric reanalysis (ERA5) project. Using CMIP5 projections as input, three sets of downscaled data are generated using three methods of autoencoder training, which allows us to determine how autoencoder downscaling and bias correction modify temperature values. Where possible, the downscaled outputs are compared against the Southeast Asia Regional Climate Downscaling/Coordinated Regional Climate Downscaling Experiment–Southeast Asia (SEACLID/CORDEX–SEA) project and outputs from available CMIP6 experiments, to evaluate performance. The autoencoders are found to excel at the rapid generation of highly spatially resolved climate projections for surface temperature. The autoencoder generates realistic spatial features due to coastal and topographic variation that are not present in the CMIP5 projections. Additionally, the autoencoders are capable of generating forecast data with regional temperature profiles exceeding those appearing in the training set (out-of-sample extrapolation). Seasonal temperature cycles are retained after downscaling throughout the region, despite the absence of temporal information provided to the model. However, autoencoders trained to carry out bias correction tend to smooth daily average temperatures and reduce daily highs and lows beyond what can be expected to be realistic. Without bias correction, downscaled outputs show a smaller improvement in spatial resolution, but the daily temperature profiles of the CMIP5 input forecasts are maintained. Autoencoders rely on the presence of structural features in the datasets to carry out downscaling, so performance over the oceans is reduced because strong temperature gradients are absent. For this reason, ocean warming is not well represented, an artefact that is not immediately apparent in the downscaled outputs. This study demonstrates the importance of rigorous analysis of 'black-box' methods, which can generate non-obvious artefacts that could potentially create misleading results. Despite these limitations, autoencoders are clearly capable of generating much-needed high-resolution climate projections, and strategies to improve upon shortcomings are numerous and well established.
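One simple way to check for the smoothing artefact described in the abstract is to compare the day-to-day variability of a temperature series before and after downscaling. The following is a minimal sketch, not the authors' evaluation code: the function name `variability_ratio` and the sample values are hypothetical, and both series are assumed to hold daily temperatures in kelvin at a single grid point.

```python
from statistics import pstdev


def variability_ratio(input_series, output_series):
    """Ratio of output to input standard deviation (hypothetical diagnostic).

    Values well below 1 suggest the downscaled output is smoother than
    the input, i.e. daily highs and lows have been damped.
    """
    if len(input_series) != len(output_series):
        raise ValueError("series must cover the same days")
    spread_in = pstdev(input_series)
    if spread_in == 0:
        raise ValueError("input series has no variability")
    return pstdev(output_series) / spread_in


# Illustrative values only (kelvin), not data from the study.
cmip5_daily = [299.0, 301.0, 303.0, 301.0, 299.0]
mean_in = sum(cmip5_daily) / len(cmip5_daily)

# A made-up "smoothed" output: anomalies about the mean are halved.
smoothed_daily = [mean_in + 0.5 * (t - mean_in) for t in cmip5_daily]

ratio = variability_ratio(cmip5_daily, smoothed_daily)
```

A ratio close to 1 would indicate that daily variability is preserved, while a ratio well below 1 would flag the kind of damping the abstract reports for the bias-correcting autoencoders. A single number per grid point can hide seasonal or regional differences, so in practice it would likely be mapped or computed per season.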
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2022-234', Anonymous Referee #1, 02 Sep 2022
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2022/egusphere-2022-234/egusphere-2022-234-RC1-supplement.pdf
CEC1: 'Comment on egusphere-2022-234', Juan Antonio Añel, 20 Sep 2022
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
For manuscripts using training models, it is necessary that both the data used for training and the output files are available, jointly with the code. In your Zenodo repository, you include the code; however, the mentioned datasets are missing.
Therefore, you must upload the requested data to the repository and make them publicly available too. Please do so as soon as possible, as this material should be available for the Discussions stage. Indeed, your manuscript should never have been accepted for Discussions without it. Also, please note that failing to comply with this request could result in the rejection of your manuscript for publication.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2022-234-CEC1
RC2: 'Comment on egusphere-2022-234', Anonymous Referee #2, 24 Oct 2022
Summary
This study investigates the performance of autoencoders (a machine learning technique) for downscaling climate data to higher resolution. The authors specifically test three variations of an autoencoder (two which additionally bias correct CMIP data, and one which simply reconstructs CMIP data at higher resolution) over Southeast Asia and the Maritime Continent, a region of complex topography. Subject to minor revisions, I think this study will be suitable for publication.
General Comments
- The article needs to be carefully edited for clarity and grammar/language use. I have listed some examples below, but please note that these are not exhaustive.
- The use of passive voice and lack of first person voice makes some sentences very confusing (e.g., section 3.1.1). Revising with first person active voice (i.e., “we did X” instead of “X was done”) will improve clarity. In many parts of the manuscript (especially section 3.1.1) it is unclear what data is actually used, and who is doing what (i.e., there were many places where it wasn't clear whether something stated was carried out by the authors or in a previous study).
- Line 55-56: Revise for subject-verb agreement
- Line 232: sentence ends prematurely (due to the word “both” and it not containing a second item)
- In many paragraphs (e.g., first paragraph of section 4.2.3), a figure is being discussed before the figure is mentioned. This is confusing. Please move the mention of the figure earlier.
- CMIP5/CMIP6/CORDEX: I’m a bit confused by these comparisons. For example, why compare against CMIP6 when CMIP5 is the input data for downscaling? I don’t think it adds useful information and leads to me simply feeling confused. I’m craving more discussion/justification at each point where you are comparing these three. One suggestion for improved clarity is to either remove CMIP6 or change CORDEX to CORDEX (CMIP5) in figure labels. Additionally, the different emissions scenarios are even more confusing. Can you be more detailed in your motivations for comparing these three, and revise your descriptions of doing so throughout?
Specific Comments
- I suggest moving Figure 7 earlier. Perhaps even to Section 2. This is because I was immediately put off by the bias correction architectures. A downscaler should not bias correct, because preserving model spread is an important component of quantifying uncertainty. Additionally, bias correction assumes biases are stationary in time, which is not a given. Figure 7 and its accompanying discussion are a great motivator for keeping the bias correction.
- Replace the word “profiles” (used throughout) with something more appropriate. “Profiles” implies vertical structure in this field.
- Include a sentence in the abstract about the autoencoders with bias correction reducing the magnitude of warming when future data from emissions scenarios is input. I think this is an important result.
- Section 4.2.1: Something feels like it's out of order or missing. E.g., there is no discussion of Figure 12 and then the next page is mostly blank. Also, the paragraph on pg. 20 feels out of place.
- Line 388: “Ensuring that projections follow current observational trends” is too strong. It is alarming that translation dampens warming predicted from CMIP simulations. I disagree that translation “ensures” trends are followed.
Technical Corrections
- It is awkward to list figures out of the order in which they are mentioned in the text. Fig. 3 is mentioned before Fig. 1. I would re-order the figures.
- Line 228: Dates are different from those listed in line 187.
- Line 260-263: Incorrect statements are made in these sentences. I’m seeing ~2.5 K spread in minima and only ~1.5 K spread in maxima. Also, it looks like RC is overestimating daily maximum by 0.2 K instead of underestimating by 1 K as stated.
- Figure 6: MSE for which model? All models? Training MSE? Testing MSE? Be more specific in caption.
- Section 4.2: Replace “gradient” with trend. “Gradient” implies spatial slope.
- Figure 11: What difference is this? Mean difference? Max difference? Replot with colorbar centered on zero.
- Figure 13: Please include a colorbar so we can confirm that blue = colder and red = warmer and that the values are centered on zero. It can be qualitative (e.g., “cold” and “warm” instead of numbers).
- Line 371-372: I’m observing the opposite (red line, which I’m guessing is the median or mean, is lower for CORDEX). Say what the red line is in the figure caption (fig 14).
- Appendices B,C,D don’t appear to be mentioned in the text.
Citation: https://doi.org/10.5194/egusphere-2022-234-RC2
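The request in the comments above to replot difference maps with a colorbar centered on zero amounts to choosing color limits that are symmetric about zero. The following is a minimal sketch under stated assumptions: the helper name `symmetric_limits` and the sample values are hypothetical and are not taken from the manuscript or its code repository.

```python
import math


def symmetric_limits(values):
    """Return (vmin, vmax) symmetric about zero for a difference field.

    NaN entries (e.g. masked grid cells) are ignored. This is a
    hypothetical helper written for illustration only.
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        raise ValueError("no non-NaN values to scale")
    bound = max(abs(min(finite)), abs(max(finite)))
    return -bound, bound


# Illustrative temperature differences in kelvin, not data from the study.
differences = [-0.8, 0.1, 2.5, float("nan"), -1.2]
vmin, vmax = symmetric_limits(differences)
```

The resulting pair could then be passed as the `vmin` and `vmax` arguments of a plotting call together with a diverging colormap, so that zero difference maps to the neutral midpoint color and blue and red carry a consistent meaning across panels.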
Model code and software
Trained models and code accompanying 'Downscaling using Deep Convolutional Autoencoders, a case study for South East Asia' Oliver Levers https://doi.org/10.5281/zenodo.6986257
Viewed
- HTML: 767
- PDF: 246
- XML: 28
- Total: 1,041
- BibTeX: 17
- EndNote: 18