the Creative Commons Attribution 4.0 License.
HailCam: An Automated Imaging System for Real-Time Measurement of Hail Size Distributions and Fall Rates
Abstract. Ground-based hail observations with high temporal resolution and precise microphysical quantification remain critically scarce, limiting the validation of radar-based hail detection algorithms and convective-scale numerical models. Existing automatic hail sensors often suffer from small sampling areas, susceptibility to rain interference, and limited automation in post-event processing. We present HailCam, an intelligent hail observation instrument integrating high-definition optical imaging, automated particle collection, and real-time deep learning inference to address critical gaps in time-resolved ground-based hail microphysics measurements. The system employs a ConvNeXt-Tiny architecture with Mask R-CNN for instance segmentation, capturing hailstone number, size distribution, and number flux at one-minute intervals over a 60 cm × 60 cm sampling area. Laboratory validation using synthetic ice spheres (5–45 mm) and polystyrene foam spheres demonstrates 91 % sizing accuracy within ±5 % relative error (RMSE 0.21–1.71 mm) and counting linearity of R² = 0.9989. Field intercomparison with an OTT Parsivel² disdrometer during a nocturnal hail event on 9 May 2025 reveals consistent temporal evolution of hailfall and statistically indistinguishable size distributions (Kolmogorov-Smirnov D = 0.167–0.250, p > 0.84), though absolute counts differ due to distinct phase-discrimination methodologies. HailCam provides co-located, time-stamped measurements essential for validating radar-based hail algorithms and constraining convective-scale numerical models, particularly in complex terrain where remote sensing is challenged.
Status: open (until 05 Jun 2026)
- RC1: 'Comment on egusphere-2026-1127', Anonymous Referee #1, 20 May 2026
Dear Lyu et al.,
This manuscript presents a novel technique and device for measuring hail size and frequency, which are known to be difficult problems with few robust solutions. The results show that the device has impressive skill at both tasks. In particular, relative errors of only a few percentage points when measuring diameter (Figure 4) have the potential to be highly beneficial when populating hail size distributions, especially considering that many competing methods (such as human reporting) bin hail diameter into coarse categories of, for example, 5 mm. Overall, I am excited about the opportunities this device could bring to the atmospheric science community. Well done.
I do have a few comments and suggestions that may help improve the overall quality of the manuscript. I have two main comments below, followed by a brief section for standard editing recommendations such as spelling and grammar.
Comment #1: This is perhaps my most pressing recommendation. It appears that section 2.2 titled “Deep Learning–Based Hailstone Segmentation and Characterization” could use some expansion or refinement. In the interest of EGU AMT review aspect #6 (Is the description of experiments and calculations sufficiently complete and precise to allow their reproduction by fellow scientists?), I would prefer to see more details/explanations on the deep learning architecture and feature engineering process. In particular, section 2.2.1 could use some thought.
Firstly, the description given in the paragraph at lines 110–124 and the contents of Figure 2 leave me unsure of the exact sequence of ML architectures I would have to deploy if I were to try to reproduce this work. Lines 112 through 116 imply that the ConvNeXt-Tiny backbone’s output is what is fed into the Feature Pyramid Network (FPN). However, the hatched rectangle in Figure 2 seems to imply that ConvNeXt-Tiny makes up at least the downsampling portion of the FPN. It appears that these two architectures overlap extensively (rather than forming a sequence of two independent architectures), but this section leaves that unclear. I would recommend more written description clarifying this, and perhaps a rework of the figure itself. For the figure, a legend may help, along with text explaining exactly where ConvNeXt-Tiny comes into play, similar to the text given for other features such as the FPN.
Secondly, the instance segmentation performed by the R-CNN could be more clearly identified in Figure 2. I suspect this architecture runs on the fused feature map (yellow rectangle) and before the three final models in the rightmost section of the figure, but this is not clear. The text on this topic (e.g., lines 119 and 120) is clearer, but the figure could use a legend or text for this as well.
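To make the distinction concrete, here is a minimal shape-bookkeeping sketch of the conventional arrangement, in which the backbone supplies the bottom-up pathway of the FPN rather than acting as a separate preceding model. It uses the published ConvNeXt-Tiny stage widths and strides. The 1024 × 1024 input size, the 256-channel FPN width, and the function names are assumptions for illustration only and are not taken from the manuscript, whose actual design may differ.

```python
# Shape bookkeeping only; no deep learning library is needed.
# Assumptions (not from the manuscript): 1024 x 1024 input, 256-channel FPN.

CONVNEXT_TINY_CHANNELS = (96, 192, 384, 768)  # published ConvNeXt-Tiny stage widths
STAGE_STRIDES = (4, 8, 16, 32)                # cumulative downsampling per stage
FPN_CHANNELS = 256                            # common FPN width (assumed here)


def backbone_feature_shapes(height, width):
    """Bottom-up pathway: (channels, H, W) of each backbone stage output."""
    return [(c, height // s, width // s)
            for c, s in zip(CONVNEXT_TINY_CHANNELS, STAGE_STRIDES)]


def fpn_feature_shapes(backbone_shapes, out_channels=FPN_CHANNELS):
    """Lateral 1x1 convolutions project every stage to a common channel
    width; the spatial size of each pyramid level is unchanged."""
    return [(out_channels, h, w) for (_, h, w) in backbone_shapes]


c_levels = backbone_feature_shapes(1024, 1024)
p_levels = fpn_feature_shapes(c_levels)
for c, p in zip(c_levels, p_levels):
    print(f"backbone stage {c} -> FPN level {p}")
```

If the authors' network follows this arrangement, stating it in the text and marking the shared stages in Figure 2 would resolve the ambiguity.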
Finally, (and most critically) I would like to see in section 2.2.1 more discussion on dataset organization (training/validation/testing splits) and the chosen hyperparameters used in the various ML architectures. Starting in line 129, the authors have noted 8742 images in a training dataset. However, there is no discussion on subsequent validation or testing datasets needed for hyperparameter searches and result verification, respectively. It is not clear if the 8742 images were sliced into each of these splits, or if other images were used. Additionally, clarification on the validation split can give more confidence to readers that the optimal hyperparameters were discovered while clarification on the testing split would offer additional assurance to the integrity of any ML results.
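As an illustration of the kind of description that would help, a minimal sketch of a reproducible, disjoint train/validation/test split is given below. The 80/10/10 fractions, the seed, and the function name `split_indices` are hypothetical choices for illustration; the manuscript does not state how (or whether) the 8742 images were partitioned.

```python
import random


def split_indices(n_items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n_items-1 and cut them into disjoint
    train/validation/test lists. Fractions and seed are assumptions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_train = int(fractions[0] * n_items)
    n_val = int(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no image is dropped
    return train, val, test


train, val, test = split_indices(8742)
print(len(train), len(val), len(test))
```

If images from the same hail event are strongly correlated, splitting by event rather than by image may be the stricter choice. Either way, reporting the split sizes and the seed would let readers reproduce it.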
With regard to hyperparameters, some seem to be discussed in lines 133 to 136, but it is not clear if this is an exhaustive list for the various architectures. If it is exhaustive, something to note this would be helpful. If a hyperparameter search was performed, discussion on what the search spaces were would be helpful for reproducibility. Perhaps a table on all of the chosen hyperparameters and their spaces would solve this all at once, even if in supplemental material or appendices etc.
Comment #2: The results highlighted in Figure 8 (also lines 299 to 308), which show the differences between the disdrometer and HailCam hail size distributions (HSDs), seem to have many potentially important implications. A widely accepted mathematical form for HSDs is the gamma distribution (for example, see https://doi.org/10.1175/1520-0469(1987)044<1062:ATPROT>2.0.CO;2), so it is curious that both systems show a decreasing exponential. Some further discussion of the absence of a gamma shape may be warranted, even if it is just to acknowledge that a sample size of 2 events may not be large enough to reveal it.
It is also curious that, although both systems show a negative exponential (with no statistically significant differences), the disdrometer shows an increase in probabilities around the 8 mm mark, which is characteristic of a gamma distribution. Is something more happening, such as HailCam’s undersizing bias coming into play, or melting during the 60-second interval over which HailCam processes (perhaps hinted at in line 381 of the conclusions)? I note that the authors include considerable discussion of how the differences between HailCam and the disdrometer may stem from mechanical differences between the devices. However, a few more detailed sentences acknowledging the potential for further biases would be helpful, especially given that the authors make somewhat competing assertions in lines 308 to 310 and lines 313 to 316:
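For reference, the gamma form is commonly written as N(D) = N0 · D^μ · exp(−ΛD), which reduces to a pure exponential when μ = 0 and has an interior peak at D = μ/Λ when μ > 0. The short sketch below contrasts the two cases. The parameter values are illustrative only (chosen so that the peak falls at 8 mm) and are not fitted to any data in the manuscript; the function names are hypothetical.

```python
import math


def gamma_hsd(diameter_mm, n0, mu, lam):
    """Gamma-form size distribution N(D) = n0 * D**mu * exp(-lam * D)."""
    return n0 * diameter_mm ** mu * math.exp(-lam * diameter_mm)


def mode_diameter(mu, lam):
    """Diameter at which N(D) peaks; 0 when mu == 0 (pure exponential)."""
    return mu / lam


diameters = [d * 0.5 for d in range(2, 41)]  # 1.0 to 20.0 mm in 0.5 mm steps

# Illustrative parameters, not fitted to the manuscript's data.
exponential = [gamma_hsd(d, 1.0, 0.0, 0.4) for d in diameters]
peaked = [gamma_hsd(d, 1.0, 3.2, 0.4) for d in diameters]

is_monotonic = all(a > b for a, b in zip(exponential, exponential[1:]))
peak_d = diameters[peaked.index(max(peaked))]
print("mu = 0 decreases monotonically:", is_monotonic)
print("mu = 3.2 peaks near (mm):", peak_d,
      "analytic mode (mm):", mode_diameter(3.2, 0.4))
```

A fit of this form to both instruments' HSDs, reporting μ with its uncertainty, would show directly whether the data can distinguish a peaked distribution from an exponential one.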
“This discrepancy suggests that the disdrometer's velocity-based classification algorithm may be misidentifying larger raindrops or graupel particles as hail, whereas HailCam's imaging-based segmentation strictly enforces morphological criteria for solid-phase identification.”
“Despite the visual divergence in distribution shapes, particularly the disdrometer's enhanced probabilities in the 7–12 mm range, the cumulative distribution functions are sufficiently similar that the observed differences may plausibly arise from sampling variability rather than systematic measurement bias.”
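For readers less familiar with the test quoted above, the two-sample Kolmogorov-Smirnov statistic D is the largest absolute gap between the two empirical cumulative distribution functions. A minimal sketch follows. The diameters are made-up values for illustration, not measurements from either instrument, and the function name is hypothetical.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d_max = 0.0
    for x in a + b:  # the gap can only change at observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Made-up diameters in mm, for illustration only.
hailcam_mm = [5.1, 5.8, 6.3, 7.0, 7.4, 8.2]
disdrometer_mm = [5.5, 6.1, 7.9, 8.8, 9.5, 10.4]
d_stat = ks_two_sample_statistic(hailcam_mm, disdrometer_mm)
print("KS D =", d_stat)
```

With small samples D can take only a few discrete values and the test has limited power, so a large p-value indicates that a difference was not detected rather than that the distributions are the same.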
Technical Corrections:
Once again, great work! I look forward to the final version of the manuscript.