WaterSoftHack Cybertraining: Reproducible Data Science, Machine Learning, and Cloud and Edge Computing Training for Collaborative Water Science Research
Abstract. The growing complexity and volume of data in water science demand advanced computational skills among researchers, yet significant barriers limit rapid skill acquisition. We present WaterSoftHack, a two-week cybertraining program designed to equip students, early-career professionals, and researchers with reproducible data science, machine learning, and cloud/edge computing skills for water science applications. The program combines open-access training resources, including the WaterSoft Python package, with cohort-based capstone projects. A rigorous applicant selection process ensures diverse participation across academic levels, institutions, and backgrounds. The training pairs immersive, hands-on instruction with formal science communication, emphasizing reproducibility, scalability, and teamwork. Drawing on surveys and qualitative interviews from the program's first two years, we report notable gains in participants' technical skills, effective collaboration within project teams, and positive career outcomes. WaterSoftHack highlights the importance of project-based, integrated cybertraining in building computational capacity and preparing a diverse, capable workforce for the data-driven future of water science and engineering.