This work is distributed under the Creative Commons Attribution 4.0 License.
WaterSoftHack Cybertraining: Reproducible Data Science, Machine Learning, and Cloud and Edge Computing Training for Collaborative Water Science Research
Abstract. The growing complexity and volume of data in water science demand advanced computational skills among researchers, yet significant barriers limit rapid skill acquisition. We present WaterSoftHack, a two-week cybertraining program designed to equip students, early-career professionals, and researchers with reproducible data science, machine learning, and cloud/edge computing skills for water science applications. The program combines open-access training resources, including the WaterSoft Python package, with cohort-based capstone projects. A rigorous selection process of interested candidates ensures diverse participation across academic levels, institutions, and backgrounds. The training integrates immersive, hands-on instruction with formal science communication, emphasizing reproducibility, scalability, and teamwork. Drawing on surveys and qualitative interviews from the first two years, we demonstrate notable skill advancement, collaborative synergy, and career advancement outcomes. WaterSoftHack highlights the importance of project-based, integrated cybertraining in building computational capacity and preparing a diverse, capable workforce for the data-driven future of water science and engineering.
Status: open (until 26 May 2026)
- RC1: 'Comment on egusphere-2026-491', Anonymous Referee #1, 20 Apr 2026
- RC2: 'Comment on egusphere-2026-491', Anonymous Referee #2, 15 May 2026
General comments: I think the paper offers a convincing case that short, intensive training efforts can be a valuable way to boost the skills of hydrologists at different career stages. I have some comments that I think might help the reader interpret some of the survey results more easily and highlight some of the lessons learned in the project that may help future efforts.
Specific comments:
In the paragraph at lines 74-80, it may be appropriate to cite the new WRR paper about what is taught in hydrology classes (Kelleher, Christa A., John Patrick Gannon, and Dominick Ciruzzi. "The current state of undergraduate hydrology courses in North America: A path forward." Water Resources Research 62.2 (2026): e2025WR041736.)
In a couple of places in the manuscript, large numbers of participants are mentioned (150-300 people, lines 169 and 640), but I didn’t catch how those folks engaged with the program. Even if I just missed this, I think it would be good to make it clearer how those people engaged wherever those numbers are mentioned.
Figure 2: could you make this one figure with paired bars for each of the trainings? For instance, on the institution plot, for R1 there would be two bars, one for each of the trainings. This would make it much easier to see what kind of variation there was between trainings.
Line 262: I think more details should be given about how the survey was distributed through CUAHSI… newsletter? Was it broader than that?
On the survey: the sample sizes are pretty small. I think it would be good to highlight that and the limitations associated with making conclusions based on that many responses.
Additionally, I’m a little unclear from the figure captions about who filled out the survey. Initially I read it as people CUAHSI reached via whatever mechanism was used. But Figure 3 says 13 people completed the survey, whereas Figure 4 says “participants completing pre-event survey”, so were the respondents people who were going to take the training or just general hydrologists?
Finally, do you have demographic info on the respondents? It would be good to know things like career stage, etc of people who identified these needs.
Figure 8: Similar to the previous bar chart, I think pairing the two trainings in one figure (so that, for example, the “software development” category had two bars, one for pre and one for post) would make them easier to interpret. It would be awesome if Figures 8 and 9 could be combined to compare results between trainings too… but that might be tough.
In the participant experience section: much of the interpretation of what worked well aligns closely with what the science of teaching and learning tells us about how people learn. I think this section could be strengthened by citing some of this literature and highlighting similarities and/or differences.
In the last paragraph of section 4.5 you mention the schedule changed quite a bit based on responses. I think a figure or even just description of the full timeline you arrived at for these trainings would be something people trying to do similar things could use! Basically a roadmap for what should happen by when.
One last thing in the discussion and conclusions: Events like yours definitely seem awesome and powerful and it seemed you learned a lot about how to teach these topics. I was wondering at the end: Are any of the lessons you learned relevant to a traditional classroom/semester long course? Can you give any advice for someone trying to add these topics to a course or develop a course on them? I think that could be very valuable for those of us who might not have the resources to run an intensive workshop, but do have the flexibility to teach a new course or adapt an existing one.
Thanks for writing this and developing the resources! I’ll definitely be looking for ways to use them.
Citation: https://doi.org/10.5194/egusphere-2026-491-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 467 | 190 | 37 | 694 | 31 | 38 |
This is a nice summary of an innovative educational program that is bringing new skills in data science, machine learning, and cloud computing to hydrologists. Overall, I like the piece and I think it is a nice contribution. I have a few major comments and then several minor comments that I hope will improve the readability of the manuscript.
-First, I ask the authors to consider revising the sequencing of Section 4. Right now, this is a mixture of materials that the authors have developed and participant data. The ordering was somewhat confusing, and could be modified to separate (1) general materials/approach from (2) participant data and outcomes. More specifically, I’m unsure what the point of section 4.3 is – which materials were created by the authors and which already existed? I had trouble following, and any clarifications would be appreciated. There’s also a mix of past/current/future in this section that seems confusing. Could the survey described in Section 4.2 be shared? (Is the information in Section 4.2 necessary to the main text or should it be moved to supporting information?) I defer to the authors on this, but I think there might be some ways to move things around a bit considering the audience they are writing this for to ensure it is as easy to follow as possible. There’s a lot of information in this section, and anything the authors can do to organize this is appreciated. Also, consider including a timeline figure that visually displays the information contained in this section and how a typical training “flowed” (and when participants were surveyed, etc). That said, I defer to the authors on this sequencing, and appreciate anything they can do to further clarify the flow of information in this section.
-The one piece that seemed to be missing, given this is an educational manuscript, was the intended learning objectives for the trainings. If the authors used learning objectives, please consider explaining/stating these, as much of Section 4 would describe how these are met.
-This also appears in the minor comments, but there are many figures and tables that I think could be easily combined, to centrally locate information for both years in a single figure or table for easier interpretability.
-The Discussion and Conclusions section at the end feels long and somewhat unorganized. Consider renaming Results to Results and Discussion, and including a reflection or lessons learned section here. What are future opportunities in this space, beyond just WaterSoftHack? I’d love to see the authors push beyond recommendations for their own program to some broader recommendations for others, and consider citing some literature in support of this in this section. Furthermore, what materials can others draw from that have been created through this initiative?
Minor comments:
Lines 23-24: “The growing complexity and volume of data in water science demand advanced computational skills among researchers, yet significant barriers limit rapid skill acquisition”. I would say this isn’t just a need among researchers – could this be broadened? Could you be more specific about rapid skill acquisition – for instance, I think you mean skill acquisition to use/analyze/interpret water data. Consider rephrasing this sentence if you can to be just a bit more hard-hitting.
Line 24: A transition here would be great, e.g., “To address this need …”
Line 31: “the first two years” – of what?
Line 61: The start of this paragraph seems to skip introducing the concept of point observations before jumping to satellite observations.
Line 61: I’m unsure what ‘hydrologically relevant’ means here – would just ‘hydrological data’ work?; ‘transformed rapidly’ – over what period? Would be good to bring in some of the historical literature on hydrologic observations here.
Line 76-77: I’m unsure if this is the case. Are there citations to back this up? For instance, I agree that an introductory hydrology course focuses on process theory, but many institutions have more than one hydrology course (though I don’t think there’s data for this, unfortunately; I would expect this is true at R1s, maybe true at R2s, but not the case at PUIs). Instead, you may consider framing this as: if students take an introductory hydrology course, it is focused on these topics. However, many students probably do not take additional hydrology courses, and so miss out on this additional training. (I think we have to be careful not to expect too much from a single introductory course, which is why I raise this.) I do think there may be a recently published study on hydroinformatics courses in the US, which the authors may want to review and reference in this section.
Line 96-97: I would say they don’t risk being underprepared – it is more likely that they will learn this on the job (or as MS and PhD students, at institutions that either provide training in courses or via research projects). Instead, consider reframing as a positive: Training of this kind will prepare the next generation of hydrologists to work for these agencies and with these large datasets.
Line 100-105: This section, after the first sentence, reads as the opinions of the authors and so should perhaps appear later in this manuscript (e.g., Section 2).
Line 129-130: Again, this feels out of place as though it should appear later in the manuscript (e.g., Section 2)
Line 184: Is this meant to be ‘Google Colab’?
Line 187: Could you provide a brief introduction to describe what is contained in the next two paragraphs?
Line 233: Should this just be civil engineering, water resources engineering? I’m unsure why the dash is included.
Line 235: To what field? Data science or hydrology? Or both?
Section 4.2: Is the survey included in Supporting Information? Was the survey pre-populated or open ended?
Line 283: This feels like a jump from the previous paragraph. Could you specify how the survey informed the information explained in this paragraph?
Line 578-589: Please rephrase this sentence – I think it is grammatically incorrect
Should Section 4 be Results and Discussion? It does feel as though there is some discussion in each of the sub-sections.
I’d encourage the authors to revisit the section titles and revise them to ensure they are representative of what is written in each section; a few seem very specific and only pertain to part of the text within a given section.
Line 688 on: the tense seems off in this section – please revise as needed
Recommendations for figures and tables:
-Figure 1 and Figure 2 look quite distorted. Could this be corrected? Consider combining Figures 1 and 2 to show the same observations for 2024 vs 2025 in individual subplots. Color could be used to indicate year.
-Figures 3 and 4: consider combining into one figure with panels A and B. This will ensure this information is grouped together in the publication.
-Consider combining Tables 2 and 3 into a single table, or into a single figure with panels A and B.
-Figure 10: while this does look cool, I’m not sure it is needed