HARBOR - Harmonized Attributes for River Basins in One Repo: Collated River Basin Data from Multiple Collections with a Software Toolkit
Abstract. In the US, several federal agencies (e.g., the USGS, NOAA, USDA, EPA, and NSF) have collected, and in many cases continue to collect, measurements for river basins in support of their water-related missions and goals. This information is published online in named data collections, each with its own set of attributes and objectives. A given basin often has multiple agency IDs and may appear in several collections, so the collections overlap. They represent a significant investment of time and money and are a critically important resource for hydrologic modeling and monitoring, whether operational or research-oriented. Unfortunately, the collections are highly heterogeneous, both in the data they provide and in how they can be found and effectively accessed. They also often contain missing data or errors. Driven by the need to identify, from a set of available hydrologic models, the best-performing model for any given river basin in the US, the HARBOR project has two key goals. The first is to harmonize these datasets and their associated resources and bring them together in one place, just as many large cargo ships can be moored in the same harbor. This raises awareness of the datasets and makes them much easier to find, access, and use. The second is to classify river basins into hydrologically similar groups: if two river basins are hydrologically similar, the same model from the set is likely to perform best for both. To achieve these goals, a set of Python modules was created, one for each dataset, to augment and clean each dataset and to extract information from it.
Where sufficient data were available, four river basin classification methods were applied: the Hydrologic Landscape Region (HLR) method; the more process-based Seasonal Water Balance (SWB) method; a simple hydrograph-based method built on National Water Model simulations; and grouping by the 12 aggregated ecoregions used in the GAGES-II dataset. To address shortcomings in the SWB method, we also developed an Extended SWB method and applied it to the 9067 GAGES-II basins in the conterminous United States (CONUS).
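To make the harmonization idea concrete (a basin carrying different agency IDs in different collections), here is a minimal illustrative sketch. It is not the HARBOR implementation. It assumes a hypothetical crosswalk table that maps each (collection, native ID) pair to a common basin key; the collection names, IDs, and attribute values are made up for illustration.

```python
from collections import defaultdict


def harmonize(collections, crosswalk):
    """Merge per-collection attribute records into one record per basin.

    collections: {collection_name: {native_id: {attribute: value}}}
    crosswalk:   {(collection_name, native_id): common_basin_key}  (hypothetical)

    Attribute names are prefixed with the collection name, so values that
    disagree across collections are both kept rather than silently overwritten.
    Records with no crosswalk entry are skipped.
    """
    merged = defaultdict(dict)
    for cname, records in collections.items():
        for native_id, attrs in records.items():
            key = crosswalk.get((cname, native_id))
            if key is None:
                continue  # basin cannot be linked to a common key
            for attr, value in attrs.items():
                merged[key][f"{cname}.{attr}"] = value
    return dict(merged)


# Illustrative, made-up inputs: "A-17" and "B-9" are the same basin.
collections = {
    "collection_a": {"A-17": {"area_km2": 250.0}, "A-99": {"area_km2": 40.0}},
    "collection_b": {"B-9": {"area_km2": 251.3, "aridity": 0.8},
                     "B-10": {"aridity": 1.2}},
}
crosswalk = {
    ("collection_a", "A-17"): "basin_1",
    ("collection_b", "B-9"): "basin_1",
    ("collection_b", "B-10"): "basin_2",
}
result = harmonize(collections, crosswalk)
```

Here `basin_1` ends up with drainage areas from both collections (250.0 and 251.3), so the disagreement stays visible for later cleaning instead of one source quietly winning. `A-99` has no crosswalk entry and is dropped.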