the Creative Commons Attribution 4.0 License.
Parallel SnowModel (v1.0): a parallel implementation of a Distributed Snow-Evolution Modeling System (SnowModel)
Abstract. SnowModel, a spatially distributed, snow-evolution modeling system, was parallelized using Coarray Fortran for high-performance computing architectures to allow high-resolution (1 m to 100’s of meters) simulations over large, regional to continental scale, domains. In the parallel algorithm, the model domain is split into smaller rectangular sub-domains that are distributed over multiple processor cores using one-dimensional decomposition. All of the memory allocations from the original code have been reduced to the size of the local sub-domains, allowing each core to perform fewer computations and requiring less memory for each process. A majority of the subroutines in SnowModel were simple to parallelize; however, there were certain physical processes, including blowing snow redistribution and components within the solar radiation and wind models, that required non-trivial parallelization using halo-exchange patterns. To validate the parallel algorithm and assess parallel scaling characteristics, high-resolution (100 m grid) simulations were performed over several western United States domains and over the contiguous United States (CONUS). The CONUS scaling experiment had approximately 71 % parallel efficiency; runtime decreased by a factor of 32 running on 2304 cores relative to 52 cores (the minimum number of cores that could be used to run such a large domain as a result of memory and time limitations). CONUS 100 m simulations were performed for 21 years (2000–2021) using 46,238 and 28,260 grid cells in the x and y dimensions, respectively. Each year was simulated using 1800 cores and took approximately 5 hours to run.
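The one-dimensional decomposition and halo-exchange pattern described in the abstract can be sketched in a few lines. This is an illustrative Python sketch of the general technique (contiguous row blocks per process, ghost rows copied from neighbors), not code from the SnowModel source; all names are hypothetical.

```python
def decompose_1d(ny, nprocs):
    """Split ny global rows into contiguous sub-domains, one per process.
    Returns a list of (start, stop) index pairs (0-based, stop exclusive),
    distributing any remainder rows over the first processes."""
    base, extra = divmod(ny, nprocs)
    bounds, start = [], 0
    for p in range(nprocs):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def exchange_halos(local_fields, halo=1):
    """One halo-exchange step: each process receives `halo` boundary rows
    from its lower and upper neighbors as ghost rows (simulated here with
    plain lists; in Coarray Fortran this would be a coarray copy)."""
    n = len(local_fields)
    padded = []
    for p, field in enumerate(local_fields):
        lower = local_fields[p - 1][-halo:] if p > 0 else []
        upper = local_fields[p + 1][:halo] if p < n - 1 else []
        padded.append(lower + field + upper)
    return padded
```

For example, `decompose_1d(10, 3)` yields `[(0, 4), (4, 7), (7, 10)]`; the same function applied to the CONUS y dimension (28,260 rows over 1800 cores) gives each process roughly 15 or 16 rows of the grid.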
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
- Preprint
(3137 KB) - Metadata XML
- BibTeX
- EndNote
- Final revised paper
Journal article(s) based on this preprint
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-1612', Anonymous Referee #1, 26 Aug 2023
This is a review of “Parallel SnowModel (v1.0): a parallel implementation of a Distributed Snow-Evolution Modeling System (SnowModel)”, in which the authors describe improvements made to SnowModel by adding a distributed parallel scheme. These improvements allow the model to run over larger spatial extents and for longer temporal periods.
Broadly, I like what the authors have done to use CAF to bring parallelism to an existing code. Hydrology has long lacked HPC-aware models and the science has been poorer for it.
However, I struggle to follow aspects of this manuscript and I do not find the scaling results convincing. Finally, the lack of any validation against observations gives me pause, especially given the SWE results shown are almost certainly wrong for large portions of the domain in Figure 11. I will detail these concerns below.
First, the introduction essentially fails to cite any of the European or Canadian literature on snow dynamics, blowing snow, and the existing model developments that have been made. Notable contributions from Mott, Durand, Lehning, Vionnet, Marsh, Pomeroy, Musselman, MacDonald, Morin, Fang, and Essery to name but a few are all missing and would provide valuable context to the Liston, et al modelling efforts.
Secondly, I find the mixing of methods and results very confusing. This is exceptionally bad in the Parallel Performance (S4.2) section, where multiple code revisions are described. It is not at all clear where the different ‘Distributed high Sync’, etc. variants come from. In some of these, the results presented are trivial — of course one would expect increased synchronization across more processes to incur scaling limitations. It is not clear whether the SnowTran-3D plateau at 36 processes reflects the final code or a work-in-progress version. I get the impression the authors are attempting to convey their profiling journey to optimize the code, but a) a general audience is likely not interested in all the specifics and b) the confusing layout leaves an interested reader muddled. For example, at line 386 it is unclear /what/ versions of the code were even used. This section strikes me as the crux of the results and is therefore important; however, I struggled to make my way through it. I would strongly suggest the authors split out the methodology, clearly describe what was profiled and how this shaped the CAF implementation, and then, in the results, clearly and simply show “it is faster by XYZ for domains PQR”.
In addition, the 16 timesteps are really not compelling as currently presented. I am sympathetic to the computational constraints. However, without code coverage, is there any guarantee that the code was tested in a representative manner? For example, if there were few or no melt / blowing-snow events (or if there was no snow at all!), the results would not be typical of a run. This criticism applies to the 1-month serial vs. distributed period (L333) as well. Is this a representative period of time vis-à-vis exercising the toughest numerical code paths (e.g., blowing snow, multilayer snowpacks, canopy interception) and the highest-sync code paths?
My read is that Figure 10 is the “final” code that is evaluated for scaling testing. My following comments are through this lens.
I do not find Figure 10 convincing of strong scaling. I would expect PNW to be the most difficult to simulate region with deep snow covers, and many blowing snow events. It performs weakly, with essentially plateaued scaling at 750 processes. As more non-blowing snow (and non-snow) cells are added in the CONUS domain, the scaling increases (shown in Figure 11). Essentially my read is the more non-snow cells that are added, the better the scaling. This is not a strong scaling result. Rephrased, over domains with significant snow processes, the scaling is poor.
The simulated SWE results presented in Figure 11 are suspect. This is total SWE on the ground in February, correct? In the middle of winter (February) there is snow covering much of Canada — the foothills of Alberta, the Prairies of AB, SK, and MB, and the boreal forests of AB, SK, and MB. In the simulation results shown in Figure 11a, the domain east of the continental divide, including the eastern Rockies, is shown as having zero SWE. This is almost certainly not correct. The authors note that an evaluation of the SWE data will be done at a later point, but if this number of no-op grid cells is being used for the scaling evaluation, then the scaling evaluation is not representative of a real winter simulation.
Figure 11e shows the erosion and then deposition across a ridgeline. However, in most mountain regions, this deposited snow will avalanche to a lower elevation. Given there is no avalanche model in this code and no avalanche literature is cited, these results are not compelling. Perhaps this is a ridgeline that doesn’t have avalanches. But this needs to be noted if true.
In conclusion, I like that the authors are describing making the code HPC-aware by using CAF with a simple halo exchange. I think there is value in showing the community that “legacy” models can be updated and that it is “not that hard.” Such messaging has the potential to help normalize HPC-aware code development. However, the scaling results seem to show significant limitations, and the better CONUS scaling is almost certainly due to not simulating snow (in places erroneously). As a result, I feel the authors have overstated their claim that the model exhibits strong scaling and scales efficiently. I am also concerned that the model is not producing reasonable SWE.
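The efficiency figure under discussion can be recomputed from the abstract's own numbers using the standard strong-scaling definition (achieved speedup divided by the ideal speedup relative to the baseline core count); the calculation below is illustrative, not from the manuscript.

```python
def parallel_efficiency(speedup, cores, base_cores):
    """Strong-scaling efficiency relative to a baseline core count:
    achieved speedup divided by the ideal speedup (cores / base_cores)."""
    return speedup / (cores / base_cores)

# Numbers quoted in the abstract: 32x speedup on 2304 cores vs. a
# 52-core baseline.
eff = parallel_efficiency(32, 2304, 52)  # ~0.72, close to the quoted ~71 %
```

The arithmetic is internally consistent with the abstract; the reviewer's objection is about what the efficiency number means when much of the domain is snow-free, not about the number itself.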
Specific comments follow:
L9 100’s -> 100s (not possessive)
L21 1800 cores contradicts 2304 listed above?
L34 meters -> write out the order of magnitude. Just meters could be 1000s!
L51 “can be” is a bit hedgy. I think it would be stronger to state what aspects of SnowModel make it computationally expensive — physically based, two-layer snow model with energy balance and lateral transport.
L71 dimensional?
L89 “properties” rather, these are states and fluxes
L91 This is unclear — is parallel input the only thing holding it back?
L104 missing closing ]
L131 The 23-24 period is unclear. It is perhaps made more clear in the results section, but my notes here were asking if this was the sim period or just a subset of the full year extracted? If the former, what are the initial conditions?
L147 “we hope to” I would be more firm in “we show” or similar
L166 “CAF syntax…” not clear that this adds much — other aspects of Fortran syntax are not noted. Is this just for algorithm readability later on? If the authors keep this, I suggest tightening this section as much as possible
L195 Throughout, “process’s” should be “process’” as per -> “possessive of a plural noun is formed by adding only an apostrophe when the noun ends in _s_”
L199 I know that HX has been defined by here, but I’d forgotten what this was and I would suggest considering writing it out again. Or just keep writing it out.
L200 “images” -> processes
L202 “some CAF implementations” Which ones? Why not just not support them / avoid them?
L215 is this spatially variable? if not, how do you select a representative value for something like CONUS domain?
L221 I would suggest using monospace fonts instead of italics to refer to algorithm variables
L230 I would clearly note it’s slow because of the comms overhead + mem transfer
L260 What happens if there is a wind direction discontinuity between the HX boundaries?
L285 Why maintain the serial portion if it makes the parallel code less optimal?
L286 Reading past here, I think I figured out “centralized”, but it’s not super clear; my notes at this point were confused. The coordination of all the processes working on this is not very clear to me and would benefit from a description.
L297 I’m not sure describing the non-parallel ASCII files is worthwhile. Why not simply state it needs binary files?
L300 process’
L315 How slow/bottleneck/compute intensive is this step?
L333 this needs code coverage to convince the reader that the compute intensive code paths have been stressed such that these are representative results.
L364 I struggled with this section to understand what code version was what, how it was related to the final code, how different it was, etc. Suggest cutting or at a minimum tighten significantly. I would also move the methodology descriptions into the methodology section.
L386 what are these different code versions?
L390 same code coverage criticism here
L431 “SWE-melt” suggest “ablation”
L432 Good to validate in the future, but as noted above the results as presented do not look right for mid winter across the northern US and especially Canada
L465 I believe this is over-stated
L526 Why are these scripts not available? They should be included so as to make the experiments reproducible. Where can one obtain the input met forcing?
Citation: https://doi.org/10.5194/egusphere-2023-1612-RC1
- AC3: 'Reply on RC1', Ross Mower, 11 Nov 2023
-
RC2: 'Comment on egusphere-2023-1612', Chen Zhang, 03 Sep 2023
In general, the manuscript is well written, with clear objectives, meticulous methods, and results. The study introduces a novel parallelization method to accelerate SnowModel and applies it to simulations at larger scales, which is of significant scientific value. However, I am concerned that the scientific reproducibility and presentation quality of this manuscript must be improved before publication to the standards expected for GMD. Below, I provide detailed comments on each section:
Section 2: While this section briefly introduces SnowModel and the authors' motivation for parallelization, I suggest separating the introduction to SnowModel into its own section and incorporating schematic diagrams of the model's structure. These diagrams would help readers understand the parallelization strategies discussed in Section 3.3, and the "Parallelization Motivation" could become a subsection within Section 2.2.
Section 3: This section provides a wealth of code examples and diagrams that effectively elucidate the parallelization methods. Readers with some programming background can easily grasp the details of the parallelization techniques. However, Section 3 delves excessively into minutiae, potentially causing readers to become lost in the details. Consider shortening this section, focusing on key aspects.
Section 4: The results presented in this section are somewhat confusing, raising concerns about the scientific quality and reproducibility of the study. Firstly, there is an overabundance of content related to model setup and evaluation metrics, which should not be presented as results. Furthermore, compared to Section 4.2, Sections 4.1 and 4.3 provide insufficient results and appear excessively elaborated to magnify their importance.
In Section 4.1, the description of the model setup occupies a disproportionate amount of space. The data provided to support validation conclusions are overly simplistic, such as "All variables across all processes produced RMSE values of 10-6" (Lines 341-342). I would like to see more detailed model comparisons, preferably presented in graphical form. Otherwise, consider merging this section with others.
In Section 4.2, the authors present code profiling and speedup plots for three different stages, but I couldn't discern specific differences between "Distributed High Sync" and "Distributed Low Sync." I attempted to find an explanation in Section 3.4 but failed. Without a more detailed explanation, readers will struggle to understand the scientific significance of these results. For instance, it would be helpful to clarify what code optimizations improved process communication and reduced wait times.
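The "high sync" vs. "low sync" distinction the reviewer asks about can be illustrated with a toy strong-scaling runtime model: compute time shrinks with more processes, but each synchronization point adds a fixed cost that does not. The constants and function below are hypothetical, not taken from the manuscript.

```python
def modeled_runtime(work, nprocs, syncs_per_step, sync_cost):
    """Toy strong-scaling model: perfectly divisible compute work plus a
    per-synchronization cost that does not shrink as processes are added."""
    return work / nprocs + syncs_per_step * sync_cost

# Hypothetical constants: reducing synchronization points per timestep
# ("high sync" -> "low sync") lowers the runtime floor that caps speedup.
high = modeled_runtime(work=1000.0, nprocs=512, syncs_per_step=20, sync_cost=0.5)
low = modeled_runtime(work=1000.0, nprocs=512, syncs_per_step=5, sync_cost=0.5)
```

Under this model, `high` is dominated by the synchronization floor (10 of ~12 time units), which is why fewer sync points would improve scaling; whether this is what the authors actually changed is exactly what the reviewer is asking them to explain.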
Section 4.3 displays spatial results and time series of SWE, but it lacks information on how other snow properties performed. To convincingly demonstrate that Parallel SnowModel successfully simulates distributed snow over CONUS, it is essential to provide additional output results for different variables.
Section 6: This section extensively references the work of others and highlights the relevance of this study to their work. However, I believe this content would be better placed within the Discussion section. The Conclusions section should provide a comprehensive summary of the study's work and results, offer conclusive remarks, and state the research's significance without excessive referencing.
In conclusion, the manuscript requires further improvement to meet the publication requirements of the journal, particularly regarding scientific quality and presentation quality. I therefore conclude with a major revision and hope that the revised manuscript will address the above-mentioned issues.
Citation: https://doi.org/10.5194/egusphere-2023-1612-RC2
- AC4: 'Reply on RC2', Ross Mower, 11 Nov 2023
-
AC1: 'Author Comments-2023-1612', Ross Mower, 17 Oct 2023
Please find the attached Author Comments responding to the Referee Comments previously posted.
- AC2: 'Reply on AC1', Ross Mower, 18 Oct 2023
Peer review completion
Journal article(s) based on this preprint
Data sets
Parallel-SnowModel-1.0 Ross Mower, Ethan Gutmann, and Glen Liston https://github.com/NCAR/Parallel-SnowModel-1.0
Model code and software
Parallel-SnowModel-1.0 Ross Mower, Ethan Gutmann, and Glen Liston https://github.com/NCAR/Parallel-SnowModel-1.0
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
384 | 197 | 26 | 607 | 15 | 17
Cited
Ross Mower
Ethan D. Gutmann
Jessica Lundquist
Glen E. Liston
Soren Rasmussen