Toward merging MOPEX and CAMELS hydrometeorological datasets: compatibility and statistical comparison

Sink, Katharine Owen; Brikowski, Tom

doi:10.5194/egusphere-2024-4182

Preprints

https://doi.org/10.5194/egusphere-2024-4182

30 Jan 2025

Abstract. This study compares two large hydrometeorological datasets, the Model Parameter Estimation Experiment (MOPEX), and the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS), focusing on 47 shared watersheds within the continental United States. The evaluation spans daily, monthly, seasonal, and annual scales for the overlapping water years of 1981 to 2000. Spatial aggregations are conducted based on Köppen-Geiger climate regions along with annual Budyko evaporative and aridity indices. Results indicate significant differences between the datasets at daily timesteps, highlighting the challenge of high temporal resolution data reconciliation; however, compatibility markedly improves with temporal aggregation at monthly, seasonal, and annual scales. While MOPEX shows a warm bias for temperature and CAMELS shows a wet bias for precipitation, statistical analyses demonstrate that both datasets are representative of climatic conditions and extreme events. Our findings validate the results of previous research employing either dataset. Furthermore, this study serves as a foundation for the merging and extension of MOPEX and CAMELS datasets.
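The pattern described in the abstract, where differences between two datasets are large at daily timesteps but shrink under temporal aggregation while a systematic offset persists, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the series `dataset_a` and `dataset_b` are synthetic, the 0.5 offset and the noise level are arbitrary, and 30-day blocks stand in for calendar months. It is not the authors' R workflow and uses no MOPEX or CAMELS data.

```python
import random
from statistics import mean, stdev


def aggregate(values, block):
    """Average consecutive, non-overlapping blocks of `block` values."""
    return [mean(values[i:i + block])
            for i in range(0, len(values) - block + 1, block)]


def paired_difference_stats(a, b):
    """Return (mean, standard deviation) of the paired differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs), stdev(diffs)


# Synthetic example (hypothetical values, not real observations).
rng = random.Random(42)
n_days, block = 720, 30  # two 360-day "years", 30-day "months"

# A shared seasonal-like signal that both synthetic datasets follow.
signal = [10.0 + 5.0 * ((d % 360) / 360.0) for d in range(n_days)]

# Each dataset adds its own independent daily noise; dataset_b also
# carries a constant offset of 0.5 relative to dataset_a.
dataset_a = [s + rng.gauss(0.0, 2.0) for s in signal]
dataset_b = [s + 0.5 + rng.gauss(0.0, 2.0) for s in signal]

daily_bias, daily_spread = paired_difference_stats(dataset_a, dataset_b)
monthly_a = aggregate(dataset_a, block)
monthly_b = aggregate(dataset_b, block)
monthly_bias, monthly_spread = paired_difference_stats(monthly_a, monthly_b)

print(f"daily:   mean diff = {daily_bias:.3f}, spread = {daily_spread:.3f}")
print(f"monthly: mean diff = {monthly_bias:.3f}, spread = {monthly_spread:.3f}")
```

Because the blocks are equal in length and cover the whole record, the mean paired difference is unchanged by aggregation, while the spread of the differences drops. With real data, gaps and unequal month lengths would make this hold only approximately.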

Received: 29 Dec 2024 – Discussion started: 30 Jan 2025

Competing interests: The contact author has declared that neither of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1641 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

01 Sep 2025

Toward merging MOPEX and CAMELS hydrometeorological datasets: compatibility and statistical comparison

Katharine Sink and Tom Brikowski

Hydrol. Earth Syst. Sci., 29, 4015–4054, https://doi.org/10.5194/hess-29-4015-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2024-4182', Anonymous Referee #1, 17 Feb 2025

Overall, this is an interesting study comparing two commonly utilized catchment data sources. The analysis for the continental US (CONUS) appears to demonstrate statistically significant differences in the aggregate regarding temperature and precipitation. The differences shown are important; however, greater attention to explaining the differences and their significance would significantly improve the manuscript. The use of machine learning is not clearly articulated in the work and its significance is not yet clear. Greater attention should be paid to discussing the impacts of these differences on future modeling efforts as well.
The manuscript would also benefit from a clear statement of the goals of the research, i.e., is the goal to show that the two data sets are equivalent and therefore can be merged? Or is it to identify where the two data sets differ and to explain why they are different, with the goal of adjusting one or the other to allow merging? See line 45 for the first time this is made clear in the text. I would suggest clearly stating this in the abstract as well.
Line 103. How will this study address “uncertainties within the data sets”? This is an unclear statement.
Lines 140-150. This is a confusing paragraph for those not intimately familiar with either data set. You state there are large discrepancies between the CAMELS SAC model ET and CAMELS-WB. Why is this important when comparing CAMELS to MOPEX, the goal of this work? Please expand this section and make it clear why these differences in ET within CAMELS are important to the goal of this work.
Tables 4 and 5: These tables need far more explaining. The text indicates that they show the internal variability of the two data sets, yet in each case, only a single mean is presented. The text is unclear as the tables do not provide the reader with any form of comparison here. The text indicates “within” the data sets, but the tables appear to provide “between” the data sets. Please expand section 4.1 to be clearer here.
Line 272. Is the fact that averaging over greater temporal scales reduces the dispersion a major finding here? It would seem like this would be an expected result.
Line 323. It’s not surprising that the variation in arid region precipitation is greater, but what does “remain the most consistent” in the text mean? Consistent between data sets? Please be specific.
Line 375: Some discussion of why these differences exist would be valuable here. A bit of speculation will be helpful and appropriate.
Line 630, Section 4.4: It is not fully apparent why machine learning validation was undertaken for this work and how it helps in the analysis. Please justify its use more clearly.

Citation: https://doi.org/10.5194/egusphere-2024-4182-RC1
- AC1: 'Reply on RC1', Katharine Sink, 10 Mar 2025
  
  Thank you for your time and suggestions. Please see the attached pdf document for our responses to each comment.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4182-AC1
RC2: 'Comment on egusphere-2024-4182', Anonymous Referee #2, 22 Apr 2025

Summary

This manuscript presents a detailed comparison between two widely used streamflow and meteorological datasets for the continental United States, MOPEX and CAMELS, investigating their consistency and discrepancies from daily to annual scales. The study is based on a carefully designed statistical analysis and is relevant to the hydrological modeling and large-sample hydrology communities. The work is rigorous, and the results are clearly communicated and well discussed. I have a few remarks and suggestions for improvement that the authors might find useful.
Specific comments
- In the abstract and elsewhere, the term ‘bias’ is used to describe the differences between MOPEX and CAMELS. Since bias is typically defined with respect to a reference or ground truth, it would be helpful to clarify that this refers to relative bias (i.e., systematic differences between datasets), rather than absolute error. While this becomes clearer within the manuscript, the abstract might mislead readers into thinking that MOPEX is definitively too warm or CAMELS too wet.
- The manuscript could benefit from a more in-depth discussion of which dataset may be more reliable under certain conditions. Lines 685–687 touch upon this subject but could be expanded. For instance, CAMELS uses Daymet meteorological forcing, which could be potentially considered more reliable for regional hydrological analyses. However, its evapotranspiration values are derived from the SAC-SMA hydrologic model and, as the authors show, can exhibit implausible behavior. These trade-offs, i.e., between modern gridded meteorological inputs and model-based ET estimates, deserve a more explicit discussion to help guide dataset selection for different hydrological applications.
- Line 725: Please provide a citation for the NCDC COOP and SNOTEL datasets used in MOPEX. Additionally, a brief explanation of the nature of these data sources, including their observational basis and common sources of uncertainty, would help readers better understand the reliability and limitations of the meteorological data used in these databases.
- Figure 2: Could the authors clarify the meaning of the blue color in the map? It's not evident from the caption or figure description.
- Section 3.2.2: Please include references for all the statistical tests used (e.g., Fligner-Killeen test, Welch’s t-test).

Citation: https://doi.org/10.5194/egusphere-2024-4182-RC2
- AC2: 'Reply on RC2', Katharine Sink, 24 Apr 2025
  
  Thank you for your time and feedback on our manuscript. We appreciate your suggestions. Please refer to the attached pdf for our responses.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4182-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (28 May 2025) by Lelys Bravo de Guenni

AR by Katharine Sink on behalf of the Authors (29 May 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (02 Jun 2025) by Lelys Bravo de Guenni

RR by Anonymous Referee #2 (23 Jun 2025)

ED: Publish as is (25 Jun 2025) by Lelys Bravo de Guenni

AR by Katharine Sink on behalf of the Authors (30 Jun 2025) Manuscript

Viewed

Total article views: 2,226 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,620	520	86	2,226	101	159

Views and downloads (calculated since 30 Jan 2025)

Month	HTML	PDF	XML	Total
Jan 2025	118	10	4	132
Feb 2025	110	28	4	142
Mar 2025	88	14	4	106
Apr 2025	90	30	10	130
May 2025	38	28	2	68
Jun 2025	34	8	6	48
Jul 2025	32	34	0	66
Aug 2025	266	20	8	294
Sep 2025	402	22	4	428
Oct 2025	44	38	2	84
Nov 2025	58	34	6	98
Dec 2025	48	38	10	96
Jan 2026	54	72	10	136
Feb 2026	68	36	8	112
Mar 2026	28	52	4	84
Apr 2026	20	25	0	45
May 2026	105	16	2	123
Jun 2026	12	4	1	17
Jul 2026	3	10	1	14
Aug 2026	2	1	0	3

Viewed (geographical distribution)

Total article views: 2,222 (including HTML, PDF, and XML) Thereof 2,222 with geography defined and 0 with unknown origin.

Latest update: 02 Aug 2026

Short summary

This study compares two prominent hydrometeorological datasets across 47 shared watersheds in the United States to assess their compatibility, using the R programming language. While daily temperature and precipitation data showed notable discrepancies, agreement improved at monthly, seasonal, and annual scales. The findings validate the use of both datasets in previous hydrological studies and provide justification for merging them into a unified dataset, enhancing water resource management nationwide.

