https://doi.org/10.5194/egusphere-2025-5786
07 Jan 2026
Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

HARBOR - Harmonized Attributes for River Basins in One Repo: Collated River Basin Data from Multiple Collections with a Software Toolkit

Scott Peckham, Keith Jennings, Wanru Wu, Andy Wood, and Lauren Bolotin

Abstract. In the US, several federal agencies (e.g., the USGS, NOAA, USDA, EPA, and NSF) collect information that has been or continues to be measured for river basins in support of their water-related missions and goals. This information is published online in named data collections, each with its own set of attributes and objectives. A given basin often has multiple agency IDs and may appear in multiple collections, so the collections overlap. These collections represent a significant investment of time and money and are a critically important resource for hydrologic modeling and monitoring, whether used operationally or for research. Unfortunately, there is significant heterogeneity across these collections, both in the data they provide and in how they can be found and effectively accessed. It is also not uncommon for them to contain missing data or errors. Driven by the need to identify the most performant hydrologic model for any given river basin in the US from a collection of available models, the HARBOR project has two key goals. The first is to harmonize and bring together these datasets and associated resources in one place (just as many large cargo ships can be moored in the same harbor), which helps to increase awareness of them while also making it much easier to find, access, and use them. The second is to classify river basins into hydrologically similar groups, since if two river basins are hydrologically similar then it is likely that the same model in a collection will be most performant for both of them. To achieve these goals, a set of Python modules was created, one for each dataset, to augment, clean, and extract information from them.
Four river basin classification methods were applied wherever sufficient data were available: the Hydrologic Landscape Region (HLR) method, the more process-based Seasonal Water Balance (SWB) method, a simple hydrograph-based method based on modeling with the National Water Model, and a method that uses the 12 aggregated ecoregions used for the GAGES-II dataset. To address shortcomings in the SWB method, we also developed an Extended SWB method and applied it to the 9067 GAGES-II basins in CONUS.
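As an illustration of the harmonization goal described in the abstract, the following minimal Python sketch merges records for the same basin from two collections into one record per basin. It is not taken from the HARBOR toolkit: the function `harmonize`, the collection names, the field names, the basin IDs, and all values are hypothetical and chosen only for illustration. It also assumes each collection already exposes a shared basin ID, which the abstract notes is often not the case in practice.

```python
from collections import defaultdict

# Hypothetical records from two collections (all names and values invented).
collection_a = [
    {"basin_id": "B001", "area_km2": 100.0, "ecoregion": "X"},
    {"basin_id": "B002", "area_km2": None, "ecoregion": "Y"},  # missing area
]
collection_b = [
    {"basin_id": "B001", "area_km2": None, "mean_precip_mm": 900.0},
    {"basin_id": "B003", "area_km2": 55.0},
]


def harmonize(collections, key="basin_id"):
    """Merge per-collection records into one record per basin ID.

    collections: dict mapping a collection name to a list of record dicts.
    A missing value (None) is skipped, and the first non-missing value
    found for a field is kept, so later collections never overwrite it.
    """
    merged = defaultdict(lambda: {"sources": []})
    for name, records in collections.items():
        for rec in records:
            basin = merged[rec[key]]
            basin["sources"].append(name)  # track which collections list this basin
            for field, value in rec.items():
                if field == key or value is None:
                    continue
                basin.setdefault(field, value)
    return dict(merged)


basins = harmonize({"collection_a": collection_a, "collection_b": collection_b})
```

In this sketch, `basins["B001"]` combines the area from the first collection with the precipitation value from the second, and its `sources` list shows that the basin appears in both collections. A real workflow would also need to reconcile conflicting values and map between different agency IDs, which this example does not attempt.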

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 18 Feb 2026)


Data sets

The HARBOR Data Collection Repository on GitHub. Scott Dale Peckham. https://github.com/peckhams/nextgen_basin_repo

Latest update: 07 Jan 2026
Short summary
Several US federal agencies (e.g., USGS, NOAA, USDA, and NSF) collect information for river basins to support their water-related missions. The data are published online in named collections that each have their own attributes and objectives. HARBOR harmonizes and brings together all these datasets, just as many large cargo ships can be moored in one harbor. It also classifies basins based on hydrologic similarity, helping researchers find the best model for predicting their hydrologic response.