Ocean Model Analysis and Prediction System version 4.1i (OceanMAPSv4p1i)

Divakaran, Prasanth; Sakov, Pavel; Brassington, Gary B.; Huang, Xinmei

doi:10.5194/egusphere-2026-534

Preprints

https://doi.org/10.5194/egusphere-2026-534


29 Apr 2026


Abstract. The Ocean Model Analysis and Prediction System (OceanMAPS) is a short-range, near-global, eddy-resolving ocean forecasting system developed at the Bureau of Meteorology. OceanMAPS runs daily, producing 7-day forecasts of 3D prognostic fields of ocean currents, temperature, salinity and sea level anomaly (SLA). OceanMAPS is based on the MOM5 ocean general circulation model and uses the EnKF-C software for data assimilation. Consistent with the previous version, OceanMAPS version 4.1i (OceanMAPSv4p1i) is based on a hybrid Ensemble Kalman Filter with 48 dynamic and 144 static members. However, OceanMAPSv4p1i employs a 1-day analysis cycle in place of the 3-day cycle in OceanMAPSv4p0i. OceanMAPSv4p1i uses asynchronous data assimilation of observations, including sea surface temperature (SST; 2-hourly), SLA (12-hourly), and temperature and salinity profiles (daily). OceanMAPSv4p1i improves forecast skill and mean absolute error scores for SLA, SST and subsurface temperature. The improvements are greater in surface fields, such as SLA and SST, which have less persistence and a greater tendency. Forecast statistics show a reduction of ~10 % in SST errors and ~7–8 % in SLA errors. OceanMAPSv4p1i forecasts also better represent mesoscale ocean eddies.

Received: 29 Jan 2026 – Discussion started: 29 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Interactive discussion

Status: closed

CC1:
'Comment on egusphere-2026-534', Peter Oke, 15 May 2026
Overall assessment
This manuscript presents an update to the Ocean Model Analysis and Prediction System (OceanMAPS v4.1i), describing changes to the data assimilation cycle and evaluating forecast performance relative to the previous version (v4.0i). The transition to a higher-frequency assimilation cycle is a relevant development, and the system itself is of clear importance to the operational oceanography community.
The manuscript contains a substantial amount of detail and presents a broad evaluation of forecast performance. However, several aspects of the methodological description, justification of configuration choices, statistical evaluation, and interpretation require clarification or strengthening. Addressing these issues would significantly improve the clarity, rigour, and interpretability of the study.
A further concern relates to the level of novelty and its justification for publication. The primary change between v4.0i and v4.1i appears to be the adoption of a one-day analysis cycle (with BRT and NRT steps) in place of the previous three-day cycle. While this is an operationally relevant modification, it is not clear yet that this change alone constitutes a sufficiently substantial methodological advance to warrant publication in its current form, particularly given that the reported improvements are mixed. The results show clear gains for SST and SLA, but more limited improvements for temperature and a deterioration (or at best marginal change) in salinity. In its current form, the manuscript reads primarily as a system upgrade report rather than a clear demonstration of a scientific advance. To strengthen the contribution, the authors should more clearly articulate what new understanding is provided, or provide deeper analysis that links the changes in configuration to the observed performance differences.
Recommendation: Major revisions are required. The manuscript describes an important operational system, but improvements are needed in the clarity of the system description, justification of methodological choices, consistency of assumptions, statistical rigour of the evaluation, and balance of interpretation. The novelty of the changes and the strength of the results should also be more clearly justified. Addressing these issues would substantially strengthen the manuscript and clarify its scientific contribution.
Major comments
Clarity and completeness of system description

The description of the data assimilation cycle and forecast system (particularly Section 2.5 and Figure 1) is difficult to follow in its current form. The manuscript combines descriptions of v4.1i and v4.0i in a way that assumes familiarity with the earlier system, which makes it challenging to reconstruct how the new system actually operates.
It would improve clarity to present a clean, self-contained description of v4.1i, followed by a separate comparison with v4.0i.
Several aspects remain unclear throughout Section 2.5, including:
- how many analyses are performed per cycle (one? Or one for each ensemble?)

- how the 48 ensemble members are used in each step. Is it only for the data assimilation?

- the distinction between BRT and NRT analyses

- how the “catch-up” runs are defined and why only “selected ensemble members” (L274) are used

- the definition and purpose of run001–run004

The terminology used (e.g., “best-estimate”, “catch-up runs”, “batching”) is also ambiguous and sometimes colloquial, making the workflow difficult to follow.
A schematic diagram with a clearly explained caption of the analysis steps, along with a step-by-step description of each stage of the system, would greatly improve clarity.
Figure 1 helps explain the elements of the forecast. But the very brief caption doesn’t help the reader understand what’s shown.
Justification of configuration choices

The configuration includes a large number of parameters and empirical factors (e.g., observation errors, R-factors, K-factors, inflation, alpha parameter, flux perturbations), but there is little explanation of how these values were chosen.
For example:
- observation error magnitudes for subsurface temperature and salinity

- multipliers applied to atmospheric forcing (Table 1)

- R-factors and K-factors used to scale observation impact

- capped inflation of 2% (L242)

- alpha parameter (L241)

It is not clear whether these values are:
- derived from prior studies,

- tuned through trial-and-error,

- based on formal sensitivity experiments, or

- derived from analysis of the system’s performance and errors.

Without justification, it is difficult to assess whether these choices are appropriate or transferable to other systems. The manuscript would benefit from either supporting evidence (e.g., sensitivity studies) or a discussion of how these parameters were determined.
Treatment of observation errors

The treatment of observation errors appears inconsistent across variables.
At L226, assumed errors for subsurface temperature and salinity are very large (0.5 °C or 0.075 psu) compared to instrumental uncertainties (e.g., Argo errors are typically ~0.002 °C and ~0.01 psu). This suggests the inclusion of representation error, but this is not explicitly stated or justified.
By contrast, at L227, SST and SLA errors are taken directly from observational data files. These values likely reflect measurement uncertainty and do not include representation error associated with the model resolution. These values aren’t explicitly reported in the manuscript.
This implies that:
- representation error may be included for subsurface variables

- but not included for SST and SLA

This inconsistency requires clarification. It may also help explain why the system shows improvement for SST and SLA but weaker performance for subsurface temperature and salinity.
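To make the scale of this mismatch concrete, the sketch below assumes that instrument and representation errors are independent, so that their variances add, and backs out the representation error implied by the values quoted above. The function name `implied_representation_error` is illustrative and is not taken from the manuscript or from EnKF-C.

```python
import math


def implied_representation_error(total_error, instrument_error):
    """Representation error implied by a total and an instrument error.

    Assumes independent errors, so variances add:
        total**2 = instrument**2 + representation**2
    """
    if total_error < instrument_error:
        raise ValueError("total error must be at least the instrument error")
    return math.sqrt(total_error**2 - instrument_error**2)


# Values quoted in the comment above (manuscript L226 and typical Argo accuracy).
temp_repr = implied_representation_error(0.5, 0.002)    # degrees C
salt_repr = implied_representation_error(0.075, 0.01)   # psu

print(f"implied temperature representation error: {temp_repr:.4f} degC")
print(f"implied salinity representation error:    {salt_repr:.4f} psu")
```

Under this assumption, nearly all of the assumed subsurface error would have to be representation error, which is why an explicit statement of its origin would help.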
Static ensemble and localisation

The static ensemble is constructed from a ~1° resolution model (L207), while the localisation radius is set to 150–175 km (L224). What 1° resolution model was used? No details are provided in the paper.
At this coarse resolution:
- 1° corresponds to ~100–110 km

- therefore the localisation radius spans roughly 1–2 grid points

Given that the Gaspari and Cohn taper goes to zero at the specified radius, this implies that:
- covariances are truncated very locally

- the spatial structure of the background covariance is limited

This raises the possibility that the assimilation of SST and SLA is effectively reduced to a largely vertical projection at each location, rather than a fully spatially distributed adjustment.
The manuscript refers to quasi-dynamical consistency (L77), but it is not clear how this is achieved under these constraints. The relationship between the static ensemble resolution, localisation scale, and resulting covariance structure should be explained more clearly.
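The grid-point arithmetic above can be checked with a short sketch of the Gaspari and Cohn fifth-order taper. It assumes, as stated above, that the quoted localisation radius is the distance at which the taper reaches zero, and that 1° is about 111 km. The helper name `gaspari_cohn` and both conventions are assumptions for illustration, not details taken from the manuscript.

```python
def gaspari_cohn(distance_km, support_radius_km):
    """Gaspari-Cohn fifth-order piecewise rational taper.

    `support_radius_km` is the distance at which the taper reaches zero
    (twice the half-width c of the original formulation).
    """
    c = support_radius_km / 2.0
    z = abs(distance_km) / c
    if z <= 1.0:
        return -0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3 - (5.0 / 3.0) * z**2 + 1.0
    if z < 2.0:
        return (z**5 / 12.0 - 0.5 * z**4 + 0.625 * z**3 + (5.0 / 3.0) * z**2
                - 5.0 * z + 4.0 - 2.0 / (3.0 * z))
    return 0.0


GRID_KM = 111.0  # assumed spacing of a 1-degree grid near the equator

# Taper weights at 0, 1 and 2 grid points away, for both quoted radii.
for radius_km in (150.0, 175.0):
    weights = [gaspari_cohn(n * GRID_KM, radius_km) for n in range(3)]
    print(radius_km, [round(w, 3) for w in weights])
```

With these assumptions the first neighbouring grid point of the 1° ensemble receives only a small weight and the second receives none, which is the basis for the concern that the update is close to a vertical projection.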
Experimental design and duration

The evaluation period spans only 5.5 months (L296). Within this period:
- the reported statistics are not stationary,

- the manuscript notes that differences are “noticeable in the initial half of the experiment period” (L334), and

- identifies “seasonality in temperature bias” (L345).

This suggests that the experiment period is too short to draw robust conclusions.
A longer evaluation period (at least one full annual cycle) would allow:
- assessment across seasons

- evaluation of whether improvements are consistent or transient

If longer experiments are not feasible, the limitations of the short period should be clearly acknowledged, and conclusions framed more cautiously.
Statistical evaluation of results

The assessment relies on globally averaged statistics and visual comparisons, but no formal statistical tests are applied.
The manuscript frequently refers to “significant” improvements, but:
- no null hypothesis is defined

- no p-values or confidence intervals are provided

- no assessment is made of whether differences are statistically distinguishable

- the only error bars are in Figures 10 and 11, showing a significant overlap of error bars for results from v4.1i and v4.0i – implying that the differences may not be statistically significant.

This is particularly important given that some reported differences are small (e.g., L324, L446).
The manuscript would benefit from:
- formal hypothesis testing (e.g., two-sample tests)

- or inclusion of uncertainty estimates (e.g., standard deviation, confidence intervals)

Without this, it is difficult to determine whether reported improvements are robust or within sampling variability. Other studies comparing forecast statistics have applied such tests and been able to more rigorously distinguish between performance that is statistically significant, and performance that is not statistically significant. See, for example, Oke and Rykova 2025; https://doi.org/10.3389/fmars.2025.1729116
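As one illustration of the kind of uncertainty estimate suggested above, the sketch below computes a paired bootstrap confidence interval for the mean difference in daily MAE between two systems. The function name, the synthetic inputs and the choice of a simple (non-block) bootstrap are assumptions made for illustration. Serially correlated forecast errors would likely call for a block bootstrap or an adjusted effective sample size.

```python
import random


def paired_bootstrap_ci(err_old, err_new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(err_old - err_new).

    Inputs are error scores (e.g. daily MAE) for the same verification
    dates, so the differences are resampled as pairs.
    """
    if len(err_old) != len(err_new) or not err_old:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [o - n for o, n in zip(err_old, err_new)]
    rng = random.Random(seed)
    n = len(diffs)
    # Mean of each resample (drawn with replacement), sorted for percentiles.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lo, hi


# Synthetic daily MAE values, for illustration only (not manuscript data).
mae_new = [0.40 + 0.01 * (i % 5) for i in range(60)]
mae_old = [e + 0.05 + 0.01 * (i % 3) for i, e in enumerate(mae_new)]

mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(mae_old, mae_new)
print(f"mean MAE reduction: {mean_diff:.3f} (95% CI {ci_lo:.3f} to {ci_hi:.3f})")
```

An interval that excludes zero would indicate, under these assumptions, a difference that is distinguishable from sampling variability.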
Interpretation of results

The reported improvements are mixed:
- SST and SLA show improvement

- temperature improvements are modest

- salinity shows deterioration or negligible change

However, the interpretation tends to emphasise improvements without fully addressing the weaker performance in salinity.
In addition, the manuscript focuses on describing differences in performance but provides limited explanation of why these differences occur.
For example:
- why does increasing assimilation frequency improve SST and SLA?

- why does salinity performance degrade?

- how can improvement in some variables, and degradation in others, be explained when those variables are dynamically linked?

- what role do observation density, representation error, or model dynamics play?

A more process-based interpretation would strengthen the scientific contribution.
Forecast evaluation strategy

Most results focus on 1-day forecasts. Over such a short lead time, forecast skill may be difficult to distinguish from persistence (i.e., using the last analysis as a forecast). Figures 11 and 12 are the only figures that consider forecast lead times longer than a day.
It would be useful to include:
- longer forecast lead times (e.g., 7 days)

- or comparisons against persistence or alternative baseline analyses

This would better demonstrate the value of the dynamical system.
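A persistence baseline of the kind suggested above can be expressed as a simple skill score. The sketch below assumes a mean-squared-error skill score, 1 - MSE_forecast / MSE_persistence, where persistence uses the analysis at initialisation time as the forecast for the verifying time. The function names and sample values are illustrative only.

```python
def mse(predicted, observed):
    """Mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def skill_vs_persistence(forecast, initial_analysis, verifying):
    """MSE skill score of a forecast relative to persistence.

    1 means a perfect forecast, 0 means no better than persistence,
    and negative values mean worse than persistence.
    """
    mse_persistence = mse(initial_analysis, verifying)
    if mse_persistence == 0.0:
        raise ValueError("persistence is perfect; skill score is undefined")
    return 1.0 - mse(forecast, verifying) / mse_persistence


# Illustrative SST values (degrees C) at four locations, not manuscript data.
initial_analysis = [20.0, 21.0, 19.5, 22.0]   # analysis at forecast start
verifying = [20.5, 21.5, 19.0, 22.5]          # observations at lead time
forecast = [20.5, 21.0, 19.5, 22.5]           # model forecast at lead time

skill = skill_vs_persistence(forecast, initial_analysis, verifying)
print(skill)
```

Reporting this score by lead time for both v4.0i and v4.1i would show directly whether the dynamical forecast adds value over the last analysis.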
Detailed comments
SST and SLA forecasts improve, but salinity forecasts deteriorate and temperature differences are modest. The overall evidence for improvement is therefore mixed.
L19: Clarify whether sea-level anomaly is treated as a surface field or an integrated quantity of the water column.
L54: Since the Bluelink ReANalysis (BRAN) was used as a testbed for OceanMAPS for an extended period, this should be stated explicitly.
L81: Clarify whether the new version demonstrates improved skill associated with state-dependent, time-evolving error covariance. Maybe some examples of state-dependent covariance fields would help.
L113: SST is defined multiple times.
L179: Provide justification for the multipliers used to perturb atmospheric forcing. Were these based on estimates of forcing uncertainty? Or trial and error?
L207: Describe what model was used to construct the static ensemble. Given that the operational model is eddy-resolving (0.1°), it is not clear why a 1° model is used for this purpose.
L207–220: The description of the assimilation cycle is difficult to follow. A clearer step-by-step description or schematic would help. The description is concise, but unclear.
L224: Localisation radius of 150–175 km is small relative to the 1° ensemble grid. The implications should be discussed explicitly.
L226: Assumed observation errors for subsurface temperature and salinity are large compared to instrument accuracy. If intended to represent representation error, this should be clearly stated and justified.
L227: Clarify SST and SLA observation error values and why representation error is not included.
L229: Clarify whether “grid cell” refers to the 0.1° model grid or the 1° ensemble grid.
L237: An R-factor of 6 is applied to temperature and salinity. Clarify how this modifies the effective observation errors.
L240: Combined use of observation error, R-factor, and K-factor suggests potentially very large effective errors. It would help to report typical values actually used. Maybe a histogram.
L264: Specify which climatology is used for salinity restoring. The reference is to 2003. But many more modern climatologies are available.
L269–292 and Figure 1: The forecast system description is unclear. Clarify how many forecasts are produced and how ensemble members contribute.
L271–276: Clarify the roles of each task (BRT analysis, hindcast, NRT analysis, catch-up runs).
Figure 1: The number of ensemble members and forecasts is unclear. It looks like 48 dynamical members (but only used for the assimilation) and just 4 forecasts.
Section 2.5: Would benefit from reorganisation for clarity rather than conciseness.
L296: A longer experiment period would allow evaluation across seasons.
L299: Clarify whether surface flux perturbations are identical between systems (v4.0i and v4.1i).
L305: Specify depth range for temperature and salinity comparisons.
Section 3.1: Claims of “significant” differences are not supported by statistical testing.
Figures 2–5: Inclusion of uncertainty estimates (e.g., error bars) would help interpretation.
L333: Use consistent terminology for SLA.
L339: Clarify meaning of “consistent”.
L340: Clarify interpretation of “minimal corrections”. Are the increments smaller in v4.1i compared to v4.0i?
L345: Seasonal effects suggest need for longer evaluation period.
L345: Averaging over the full water column obscures vertical structure; depth-resolved analysis would be more informative (e.g., contour plots of MAE on time vs depth plots).
L349: Clarify whether improvements reflect seasonal variation or changes in the system itself.
Section 3.2 and Figures 6–9: Many differences are small; it is unclear which are statistically meaningful.
Figure 10: Consider including a third category for each case, reporting the number of bins with no statistically significant difference. There is a lot of white in the bottom panels of Figures 2–5, implying many bins with no significant difference.
Figures 11–12: Consider testing whether error growth differences are statistically significant (test the statistical difference of the trend in each plot).
L442: In some cases error growth appears faster in v4.1i; this should be acknowledged.
L483: Statement of overall superiority is not consistent with salinity results.
Figure 13: Provides limited additional insight and could be reconsidered.
Figure 14: The standard deviation of salinity (in Figure 14d) is large (~0.4 psu for v4.1i, ~0.32 psu for v4.0i, ~0.25 psu for the observations). Perhaps salinity is unconstrained. This would be consistent with the large assumed observation errors and degradation of salinity in v4.1i. This should be discussed.
L511: Avoid colloquial phrasing.
L539: The manuscript would benefit from the inclusion of example oceanographic fields, rather than relying exclusively on statistical comparisons. At present, the evaluation is entirely based on summary metrics, which provide limited insight into the actual structure and realism of the model output. In particular, the reader is not given a clear sense of the level of detail achieved in the “mesoscale ocean representation,” which is a stated objective of the system.
Including even a single illustrative case study would improve the paper. For example, the authors could present SST and SLA fields over a 7-day forecast period for a dynamically active region (e.g., the Tasman Sea), comparing forecasts from the new and previous versions against a suitable verifying analysis (such as OceanCurrent or another independent product). Selecting a specific event - such as an eddy interaction, eddy merger, or variability in the separation of a western boundary current - would provide a concrete demonstration of system performance.
Such examples would complement the statistical analysis and allow readers to visually assess how differences between system configurations translate into changes in the representation of ocean features. At present, the absence of any oceanographic fields limits the interpretability of the results and makes it difficult to assess the practical impact of the reported improvements.
L551–558: Consider whether this material is necessary for the paper.
Citation: https://doi.org/10.5194/egusphere-2026-534-CC1
CEC1:
'Comment on egusphere-2026-534 - No compliance with the policy of the journal', Juan Antonio Añel, 06 Jun 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, you do not share the code for the OceanMAPSv4p1i framework and its components, and you must publish them in an open repository that complies with our policy. In addition, you have archived the forecast outputs in the NCI data catalogue; however, the NCI data catalogue does not fulfil GMD’s requirements for a persistent data archive because:
- It does not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).

- It does not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy which makes removal of materials only possible in exceptional circumstances and subject to an independent curatorial decision.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
Later, if the Topical Editor decides to continue with the review or publication process of your manuscript and you are requested to upload a new version of it, then the 'Code and Data Availability' section of your manuscript must also be modified to cite the new repository locations, with corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2026-534-CEC1
- AC1: 'Reply on CEC1', Prasanth Divakaran, 15 Jun 2026
  
  We acknowledge that Peter’s review is not very positive and that the paper requires substantial revision in its current form. We also acknowledge Juan’s comments about non-compliance with the “Code and Data Policy”, and we can see that we won’t be able to meet those requirements either.
  
  After consulting with the coauthors, we decided to withdraw the paper. We will rework the paper by taking Peter’s comments on board internally.
  
  Citation: https://doi.org/10.5194/egusphere-2026-534-AC1

Interactive discussion

Status: closed

CC1:
'Comment on egusphere-2026-534', Peter Oke, 15 May 2026
Overall assessment
This manuscript presents an update to the Ocean Model Analysis and Prediction System (OceanMAPS v4.1i), describing changes to the data assimilation cycle and evaluating forecast performance relative to the previous version (v4.0i). The transition to a higher-frequency assimilation cycle is a relevant development, and the system itself is of clear importance to the operational oceanography community.
The manuscript contains a substantial amount of detail and presents a broad evaluation of forecast performance. However, several aspects of the methodological description, justification of configuration choices, statistical evaluation, and interpretation require clarification or strengthening. Addressing these issues would significantly improve the clarity, rigour, and interpretability of the study.
A further concern relates to the level of novelty and its justification for publication. The primary change between v4.0i and v4.1i appears to be the adoption of a one-day analysis cycle (with BRT and NRT steps) in place of the previous three-day cycle. While this is an operationally relevant modification, it is not clear yet that this change alone constitutes a sufficiently substantial methodological advance to warrant publication in its current form, particularly given that the reported improvements are mixed. The results show clear gains for SST and SLA, but more limited improvements for temperature and a deterioration (or at best marginal change) in salinity. In its current form, the manuscript reads primarily as a system upgrade report rather than a clear demonstration of a scientific advance. To strengthen the contribution, the authors should more clearly articulate what new understanding is provided, or provide deeper analysis that links the changes in configuration to the observed performance differences.
Recommendation: Major revisions are required. The manuscript describes an important operational system, but improvements are needed in the clarity of the system description, justification of methodological choices, consistency of assumptions, statistical rigour of the evaluation, and balance of interpretation. The novelty of the changes and the strength of the results should also be more clearly justified. Addressing these issues would substantially strengthen the manuscript and clarify its scientific contribution.
Major comments
Clarity and completeness of system description

The description of the data assimilation cycle and forecast system (particularly Section 2.5 and Figure 1) is difficult to follow in its current form. The manuscript combines descriptions of v4.1i and v4.0i in a way that assumes familiarity with the earlier system, which makes it challenging to reconstruct how the new system actually operates.
It would improve clarity to present a clean, self-contained description of v4.1i, followed by a separate comparison with v4.0i.
Several aspects remain unclear throughout Section 2.5, including:
how many analyses are performed per cycle (one? Or one for each ensemble?)

how the 48 ensemble members are used in each step. Is it only for the data assimilation?

the distinction between BRT and NRT analyses

how the “catch-up” runs are defined and why only “selected ensemble members” (L274) are used

the definition and purpose of run001–run004

The terminology used (e.g., “best-estimate”, “catch-up runs”, “batching”) is also ambiguous and sometimes colloquial, making the workflow difficult to follow.
A schematic diagram with a clearly explained caption of the analysis steps, along with a step-by-step description of each stage of the system, would greatly improve clarity.
Figure 1 helps explain the elements of the forecast. But the very brief caption doesn’t help the reader understand what’s shown.
Justification of configuration choices

The configuration includes a large number of parameters and empirical factors (e.g., observation errors, R-factors, K-factors, inflation, alpha parameter, flux perturbations), but there is little explanation of how these values were chosen.
For example:
observation error magnitudes for subsurface temperature and salinity

multipliers applied to atmospheric forcing (Table 1)

R-factors and K-factors used to scale observation impact

capped inflation of 2% (L242)

alpha parameter (L241)

It is not clear whether these values are:
derived from prior studies,

tuned through trial-and-error,

based on formal sensitivity experiments, or

derived from analysis of the system’s performance and errors.

Without justification, it is difficult to assess whether these choices are appropriate or transferable to other systems. The manuscript would benefit from either supporting evidence (e.g., sensitivity studies) or a discussion of how these parameters were determined.
Treatment of observation errors

The treatment of observation errors appears inconsistent across variables.
At L226, assumed errors for subsurface temperature and salinity are very large (0.5^oC or 0.075 psu) compared to instrumental uncertainties (e.g., Argo errors are typically ~0.002^oC and ~0.01 psu). This suggests the inclusion of representation error, but this is not explicitly stated or justified.
By contrast, at L227, SST and SLA errors are taken directly from observational data files. These values likely reflect measurement uncertainty and do not include representation error associated with the model resolution. These values aren’t explicitly stated reported in the manuscript.
This implies that:
representation error may be included for subsurface variables

but not included for SST and SLA

This inconsistency requires clarification. It may also help explain why the system shows improvement for SST and SLA but weaker performance for subsurface temperature and salinity.
Static ensemble and localisation

The static ensemble is constructed from a ~1^o resolution model (L207), while the localisation radius is set to 150–175 km (L224). What 1^o resolution model was used? No details are provided in the paper.
At this coarse resolution:
1^o corresponds to ~100–110 km

therefore the localisation radius spans roughly 1-2 grid points

Given that the Gaspari and Cohn taper goes to zero at the specified radius, this implies that:
covariances are truncated very locally

the spatial structure of the background covariance is limited

This raises the possibility that the assimilation of SST and SLA is effectively reduced to a largely vertical projection at each location, rather than a fully spatially distributed adjustment.
The manuscript refers to quasi-dynamical consistency (L77), but it is not clear how this is achieved under these constraints. The relationship between the static ensemble resolution, localisation scale, and resulting covariance structure should be explained more clearly.
Experimental design and duration

The evaluation period spans only 5.5 months (L296). Within this period:
the reported statistics are not stationary,

the manuscript notes that differences are “noticeable in the initial half of the experiment period” (L334), and

identifies “seasonality in temperature bias” (L345).

This suggests that the experiment period is too short to draw robust conclusions.
A longer evaluation period (at least one full annual cycle) would allow:
assessment across seasons

evaluation of whether improvements are consistent or transient

If longer experiments are not feasible, the limitations of the short period should be clearly acknowledged, and conclusions framed more cautiously.
Statistical evaluation of results

The assessment relies on globally averaged statistics and visual comparisons, but no formal statistical tests are applied.
The manuscript frequently refers to “significant” improvements, but:
no null hypothesis is defined

no p-values or confidence intervals are provided

no assessment is made of whether differences are statistically distinguishable

the only error bars are in Figures 10 and 11, showing a significant overlap of error bars for results from v4.1i and v4.0i – implying that the differences may not be statistically significant.

This is particularly important given that some reported differences are small (e.g., L324, L446).
The manuscript would benefit from:
formal hypothesis testing (e.g., two-sample tests)

or inclusion of uncertainty estimates (e.g., standard deviation, confidence intervals)

Without this, it is difficult to determine whether reported improvements are robust or within sampling variability. Other studies comparing forecast statistics have applied such tests and been able to more rigorously distinguish between performance that is statistically significant, and performance that is not statistically significant. See, for example, Oke and Rykova 2025; https://doi.org/10.3389/fmars.2025.1729116
Interpretation of results

The reported improvements are mixed:
SST and SLA show improvement

temperature improvements are modest

salinity shows deterioration or negligible change

However, the interpretation tends to emphasise improvements without fully addressing the weaker performance in salinity.
In addition, the manuscript focuses on describing differences in performance but provides limited explanation of why these differences occur.
For example:
why does increasing assimilation frequency improve SST and SLA?

why does salinity performance degrade?

How can improvement in some variables, and degradation in others be explained when those variables are dynamically linked?

what role do observation density, representation error, or model dynamics play?

A more process-based interpretation would strengthen the scientific contribution.
Forecast evaluation strategy

Most results focus on 1-day forecasts. Over such a short lead time, forecast skill may be difficult to distinguish from persistence (i.e., using the last analysis as a forecast). Figures 11 and 12 are the only figures that consider forecast lead times longer than a day.
It would be useful to include:
longer forecast lead times (e.g., 7 days)

or comparisons against persistence or alternative baseline analyses

This would better demonstrate the value of the dynamical system.
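A minimal sketch of the persistence comparison suggested above is given below. It uses a standard MAE-based skill score, 1 - MAE(forecast) / MAE(persistence), where persistence means holding the analysis at initial time fixed as the forecast. The function names and the toy numbers are illustrative assumptions; in practice the inputs would be forecast, initial analysis and observations collocated at the same points and valid time.

```python
import numpy as np


def mae(pred, obs):
    """Mean absolute error between two arrays of collocated values."""
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float))))


def persistence_skill(forecast, analysis_at_init, verifying_obs):
    """Skill of a forecast relative to persistence of the initial analysis.

    Returns 1 - MAE(forecast) / MAE(persistence):
    positive values mean the forecast beats persistence, zero means no gain,
    negative values mean the forecast is worse than persistence.
    """
    mae_fc = mae(forecast, verifying_obs)
    mae_ps = mae(analysis_at_init, verifying_obs)
    if mae_ps == 0.0:
        raise ValueError("persistence error is zero; skill score is undefined")
    return 1.0 - mae_fc / mae_ps


# Toy values only (not from the manuscript).
obs = [1.0, 2.0, 3.0]          # verifying observations at the forecast valid time
persistence = [0.0, 1.0, 2.0]  # analysis at initial time, held fixed
forecast = [0.5, 1.5, 2.5]     # model forecast at the valid time

skill = persistence_skill(forecast, persistence, obs)
```

Reporting this score as a function of lead time, for both system versions, would show directly where the dynamical forecast adds value over simply holding the last analysis.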
Detailed comments
SST and SLA forecasts improve, but salinity forecasts deteriorate and temperature differences are modest. The overall evidence for improvement is therefore mixed.
L19: Clarify whether sea-level anomaly is treated as a surface field or an integrated quantity of the water column.
L54: Since the Bluelink ReANalysis (BRAN) was used as a testbed for OceanMAPS for an extended period, this should be stated explicitly.
L81: Clarify whether the new version demonstrates improved skill associated with state-dependent, time-evolving error covariance. Maybe some examples of state-dependent covariance fields would help.
L113: SST is defined multiple times.
L179: Provide justification for the multipliers used to perturb atmospheric forcing. Were these based on estimates of forcing uncertainty? Or trial and error?
L207: Describe what model was used to construct the static ensemble. Given that the operational model is eddy-resolving (0.1°), it is not clear why a 1° model is used for this purpose.
L207–220: The description of the assimilation cycle is difficult to follow. A clearer step-by-step description or schematic would help. The description is concise, but unclear.
L224: Localisation radius of 150-175 km is small relative to the 1° ensemble grid. The implications should be discussed explicitly.
L226: Assumed observation errors for subsurface temperature and salinity are large compared to instrument accuracy. If intended to represent representation error, this should be clearly stated and justified.
L227: Clarify SST and SLA observation error values and why representation error is not included.
L229: Clarify whether “grid cell” refers to the 0.1° model grid or the 1° ensemble grid.
L237: An R-factor of 6 is applied to temperature and salinity. Clarify how this modifies the effective observation errors.
L240: Combined use of observation error, R-factor, and K-factor suggests potentially very large effective errors. It would help to report typical values actually used. Maybe a histogram.
L264: Specify which climatology is used for salinity restoring. The reference is to 2003. But many more modern climatologies are available.
L269–292 and Figure 1: The forecast system description is unclear. Clarify how many forecasts are produced and how ensemble members contribute.
L271–276: Clarify the roles of each task (BRT analysis, hindcast, NRT analysis, catch-up runs).
Figure 1: The number of ensemble members and forecasts is unclear. It looks like 48 dynamical members (but only used for the assimilation) and just 4 forecasts.
Section 2.5: Would benefit from reorganisation for clarity rather than conciseness.
L296: A longer experiment period would allow evaluation across seasons.
L299: Clarify whether surface flux perturbations are identical between systems (v4.0i and v4.1i).
L305: Specify depth range for temperature and salinity comparisons.
Section 3.1: Claims of “significant” differences are not supported by statistical testing.
Figures 2–5: Inclusion of uncertainty estimates (e.g., error bars) would help interpretation.
L333: Use consistent terminology for SLA.
L339: Clarify meaning of “consistent”.
L340: Clarify interpretation of “minimal corrections”. Are the increments smaller in v4.1i compared to v4.0i?
L345: Seasonal effects suggest need for longer evaluation period.
L345: Averaging over the full water column obscures vertical structure; depth-resolved analysis would be more informative (e.g., contour plots of MAE on time vs depth plots).
L349: Clarify whether improvements reflect seasonal variation or changes in the system itself.
Section 3.2 and Figures 6–9: Many differences are small; it is unclear which are statistically meaningful.
Figure 10: Consider including a third category for each case that reports the number of bins with no statistically significant difference. There is a lot of white in the bottom panels of Figures 2-5, implying many bins with no significant difference.
Figures 11–12: Consider testing whether error growth differences are statistically significant (test the statistical difference of the trend in each plot).
L442: In some cases error growth appears faster in v4.1i; this should be acknowledged.
L483: Statement of overall superiority is not consistent with salinity results.
Figure 13: Provides limited additional insight and could be reconsidered.
Figure 14: The standard deviation of salinity (in Figure 14d) is large (~0.4 psu for v4.1i, ~0.32 psu for v4.0i, ~0.25 psu for the observations). Perhaps salinity is unconstrained. This would be consistent with the large assumed observation errors and degradation of salinity in v4.1i. This should be discussed.
L511: Avoid colloquial phrasing.
L539: The manuscript would benefit from the inclusion of example oceanographic fields, rather than relying exclusively on statistical comparisons. At present, the evaluation is entirely based on summary metrics, which provide limited insight into the actual structure and realism of the model output. In particular, the reader is not given a clear sense of the level of detail achieved in the “mesoscale ocean representation,” which is a stated objective of the system.
Including even a single illustrative case study would improve the paper. For example, the authors could present SST and SLA fields over a 7-day forecast period for a dynamically active region (e.g., the Tasman Sea), comparing forecasts from the new and previous versions against a suitable verifying analysis (such as OceanCurrent or another independent product). Selecting a specific event - such as an eddy interaction, eddy merger, or variability in the separation of a western boundary current - would provide a concrete demonstration of system performance.
Such examples would complement the statistical analysis and allow readers to visually assess how differences between system configurations translate into changes in the representation of ocean features. At present, the absence of any oceanographic fields limits the interpretability of the results and makes it difficult to assess the practical impact of the reported improvements.
L551–558: Consider whether this material is necessary for the paper.
Citation: https://doi.org/10.5194/egusphere-2026-534-CC1
CEC1:
'Comment on egusphere-2026-534 - No compliance with the policy of the journal', Juan Antonio Añel, 06 Jun 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, you do not share the code for the OceanMAPSv4p1i framework and its components, and you must publish them in an open repository that complies with our policy. In addition, you have archived the forecast outputs in the NCI data catalogue; however, the NCI data catalogue does not fulfil GMD’s requirements for a persistent data archive because:
- It does not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).

- It does not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy which makes removal of materials only possible in exceptional circumstances and subject to an independent curatorial decision.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
Later, if the Topical Editor decides to continue with the review or publication process of your manuscript and you are requested to upload a new version of it, then the “Code and Data Availability” section of your manuscript must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2026-534-CEC1
- AC1: 'Reply on CEC1', Prasanth Divakaran, 15 Jun 2026
  
  We acknowledge that Peter’s review is not very positive and that the paper requires substantial revision in its current form. We also acknowledge Juan’s comments about non-compliance with the “Code and Data Policy”, and we can see that we will not be able to meet those requirements either.
  
  After consulting with the coauthors, we decided to withdraw the paper. We will rework the paper by taking Peter’s comments on board internally.
  
  Citation: https://doi.org/10.5194/egusphere-2026-534-AC1

Supplement

https://doi.org/10.5194/egusphere-2026-534-supplement

Short summary

The Ocean Forecast System (OceanMAPS), based on EnKF data assimilation at the Australian Bureau of Meteorology, has been upgraded by introducing a 1-day BRT and NRT analysis cycle in place of the previous version's single 3-day BRT analysis cycle. This design change reduces the overall latency of the analysis. Using the NRT analysis average as the forecast initial condition significantly improved the representation of mesoscale ocean eddies and reduced forecast errors.

