the Creative Commons Attribution 4.0 License.
WaterSoftHack Cybertraining: Reproducible Data Science, Machine Learning, and Cloud and Edge Computing Training for Collaborative Water Science Research
Abstract. The growing complexity and volume of data in water science demand advanced computational skills among researchers, yet significant barriers limit rapid skill acquisition. We present WaterSoftHack, a two-week cybertraining program designed to equip students, early-career professionals, and researchers with reproducible data science, machine learning, and cloud/edge computing skills for water science applications. The program combines open-access training resources, including the WaterSoft Python package, with cohort-based capstone projects. A rigorous selection process of interested candidates ensures diverse participation across academic levels, institutions, and backgrounds. The training integrates immersive, hands-on instruction with formal science communication, emphasizing reproducibility, scalability, and teamwork. Drawing on surveys and qualitative interviews from the first two years, we demonstrate notable skill advancement, collaborative synergy, and career advancement outcomes. WaterSoftHack highlights the importance of project-based, integrated cybertraining in building computational capacity and preparing a diverse, capable workforce for the data-driven future of water science and engineering.
Status: open (until 26 May 2026)
- RC1: 'Comment on egusphere-2026-491', Anonymous Referee #1, 20 Apr 2026
This is a nice summary of an innovative educational program that is bringing new skills in data science, machine learning, and cloud computing to hydrologists. Overall, I like the piece and I think it is a nice contribution. I have a few major comments and then several minor comments that I hope will improve the readability of the manuscript.
-First, I ask the authors to consider revising the sequencing of Section 4. Right now, this section mixes materials that the authors have developed with participant data. The ordering was somewhat confusing and could be modified to separate (1) general materials and approach from (2) participant data and outcomes. More specifically, I’m unsure what the point of Section 4.3 is: which materials were created by the authors, and which already existed? I had trouble following, and any clarification would be appreciated. The section also mixes past, present, and future tense, which is confusing. Could the survey described in Section 4.2 be shared? (Is the information in Section 4.2 necessary to the main text, or should it be moved to the Supporting Information?) I defer to the authors on this, but there may be ways to rearrange the material with the intended audience in mind so that it is as easy to follow as possible. There’s a lot of information in this section, and anything the authors can do to organize it is appreciated. Also, consider including a timeline figure that visually displays the information contained in this section and how a typical training “flowed” (and when participants were surveyed, etc.).
-The one piece that seemed to be missing, given this is an educational manuscript, was the set of intended learning objectives for the trainings. If the authors used learning objectives, please consider stating and explaining them, as much of Section 4 would then describe how these are met.
-This also appears in the minor comments, but there are many figures and tables that I think could easily be combined, so that information for both years is located centrally in a single figure or table for easier interpretation.
-The Discussion and Conclusions section at the end feels long and somewhat disorganized. Consider renaming Results to Results and Discussion, and including a reflection or lessons-learned section there. What are future opportunities in this space, beyond just WaterSoftHack? I’d love to see the authors push beyond recommendations for their own program to broader recommendations for others, and consider citing supporting literature in this section. Furthermore, what materials created through this initiative can others draw from?
Minor comments:
Lines 23-24: “The growing complexity and volume of data in water science demand advanced computational skills among researchers, yet significant barriers limit rapid skill acquisition”. I would say this isn’t just a need among researchers – could this be broadened? Could you be more specific about rapid skill acquisition – for instance, I think you mean skill acquisition to use/analyze/interpret water data. Consider rephrasing this sentence if you can to be just a bit more hard-hitting.
Line 24: A transition here would be helpful, e.g., “To address this need, …”
Line 31: “the first two years” – of what?
Line 61: The start of this paragraph seems to skip introducing the concept of point observations before jumping to satellite observations.
Line 61: I’m unsure what ‘hydrologically relevant’ means here – would just ‘hydrological data’ work?; ‘transformed rapidly’ – over what period? Would be good to bring in some of the historical literature on hydrologic observations here.
Lines 76-77: I’m unsure if this is the case. Are there citations to back this up? For instance, I agree that an introductory hydrology course focuses on process theory, but many institutions offer more than one hydrology course (though I don’t think there’s data on this, unfortunately; I would expect this is true at R1s, maybe true at R2s, but not the case at PUIs). Instead, you may consider framing this as follows: if students take an introductory hydrology course, it is focused on these topics, but many students probably do not take additional hydrology courses and so miss out on this additional training. (I raise this because we have to be careful not to expect too much from a single introductory course.) I believe there may be a recently published study on hydroinformatics courses in the US, which the authors may want to review and reference in this section.
Lines 96-97: I would say they don’t risk being underprepared; it is more likely that they will learn this on the job (or as MS and PhD students, at institutions that provide training either in courses or via research projects). Instead, consider reframing this as a positive: training of this kind will prepare the next generation of hydrologists to work for these agencies and with these large datasets.
Lines 100-105: After the first sentence, this section reads as the authors’ opinion and so should perhaps appear later in the manuscript (e.g., Section 2).
Lines 129-130: Again, this feels out of place and should perhaps appear later in the manuscript (e.g., Section 2).
Line 184: Is this meant to be ‘Google Colab’?
Line 187: Could you provide a brief introduction to describe what is contained in the next two paragraphs?
Line 233: Should this just be civil engineering, water resources engineering? I’m unsure why the dash is included.
Line 235: To what field? Data science or hydrology? Or both?
Section 4.2: Is the survey included in Supporting Information? Was the survey pre-populated or open ended?
Line 283: This feels like a jump from the previous paragraph. Could you specify how the survey informed the information explained in this paragraph?
Lines 578-589: Please rephrase this sentence; I think it is grammatically incorrect.
Should Section 4 be Results and Discussion? It does feel as though there is some discussion in each of the sub-sections.
I’d encourage the authors to revisit the section titles and revise them to ensure they are representative of what is written in each section; a few seem very specific and pertain to only part of the text within a given section.
Line 688 onward: the tense seems off in this section; please revise as needed.
Recommendations for figures and tables:
-Figure 1 and Figure 2 look quite distorted. Could this be corrected? Consider combining Figures 1 and 2 to show the same observations for 2024 vs 2025 in individual subplots. Color could be used to indicate year.
-Figures 3 and 4: consider combining these into one figure with panels A and B. This will ensure the information is grouped together in the publication.
-Consider combining Tables 2 and 3 into a single table, or into a single figure with panels A and B.
-Figure 10: while this does look cool, I’m not sure it is needed.