Performance and longevity of compact all-in-one weather stations – the good, the bad and the ugly
Abstract. We present a long-term evaluation of compact, all-in-one automatic weather stations (AiOWS) against professional-grade automatic weather stations (AWS). We examine the performance, longevity, and degradation of six AiOWS units over several years of unserviced use. The objective was to determine how closely these low-cost stations meet World Meteorological Organization (WMO) performance standards for temperature, humidity, wind, and precipitation, and to identify their weaknesses and maintenance needs.
Previous studies show the potential value of AiOWS when data are properly quality-controlled, yet their long-term reliability remains uncertain. To address this, we deployed six AiOWS units (Davis VVue, Davis VP2, METER ATMOS41, Lufft WS601, and Vaisala WXT520) alongside two collocated reference AWS meeting WMO standards. Before field installation, each unit was tested in KNMI's calibration laboratory for baseline validation. The stations were then operated in open terrain for multiple years without any servicing, simulating typical end-user neglect.
Initially, all AiOWS met manufacturer specifications. After long-term exposure, however, the sensors displayed varied durability. The Vaisala unit operated continuously for over 13 years, while others failed after four to seven years due to corrosion, component wear, and sensor drift. The METER and Davis VVue units remained largely functional, albeit with degraded performance, whereas both Davis VP2 rain gauges failed early due to reed-switch damage.
Temperature measurements were the most robust. In climate-chamber tests, new and aged sensors maintained accuracy within ±0.3 °C from -15 °C to 30 °C, drifting slightly (underestimating by 0.5–0.7 °C) above 30 °C. Field data confirmed these results, although strong solar radiation caused overestimation in summer. The Vaisala and Davis VVue units remained within WMO Class B limits after a decade. Relative humidity showed consistent deterioration: most sensors overestimated low humidity and underestimated values above 90 %, particularly the METER unit, whose bias grew markedly after five years. Wind-speed accuracy degraded through mechanical wear; cup anemometers underreported low wind speeds and failed completely in some cases, while sonic sensors (Vaisala, METER) produced erratic readings after several years, highlighting their fragility outdoors. Precipitation performance was the weakest across all models: tipping-bucket designs suffered from clogging, internal corrosion, and undercatch errors, while haptic or drip-based sensors became inaccurate as components aged or fouled.
We conclude that compact AiOWS can provide scientifically useful temperature data if properly managed, but fall short for humidity, wind, and particularly precipitation unless regularly serviced. Long-term unattended operation severely limits reliability, yet moderate maintenance can potentially restore performance to close to WMO Class A/B standards, extending their utility in dense observation networks.
The authors have performed a very interesting long-term study on the quality of "all-in-one" weather stations, or Personal Weather Stations (PWS), examining the decay of data quality over time when these stations are operated without maintenance. It is a unique study and, in my opinion, therefore worthy of publication, since it addresses a big unknown in the use of non-WMO data: quality drift over time. However, the presentation of the results, the structure of the text, and the meager application discussion leave the overall quality of the manuscript wanting. I recognize that additional experiments for such a long-term study are practically impossible, but there are nevertheless some scientific improvements to be made before the article is fully suitable for publication. Hence my recommendation of Major Revisions, with the side note that this mainly concerns the framing, structure, and presentation of the work, and not so much its experimental core.
A major point that struck me while reading the manuscript is that, while the work is really interesting, the presentation feels shallow. The authors do not go much beyond presenting some statistics on performance, and the only comparisons made are to the standard WMO station-siting table (and to their reference data at the weather field). I would have liked to see comparisons to similar studies, or to studies using PWS data. For instance, in section 3.2.1: many of the cheaper PWS brands (e.g. Netatmo) suffer from moisture retention at high RH values; moisture gets inside the sensor and oversaturates it, so RH is reported at nearly 100 % for a long time. This problem of moisture pooling inside the sensor also affects, e.g., the Netatmo sonic anemometer, which understandably degrades its performance. It would have been interesting to draw those comparisons and look a little further than just the findings in the field: what do they mean?

Similarly, the authors could dive a little deeper into the data. I understand that for wind observations a direct comparison to a different measurement height is tricky, but it would benefit the manuscript if that were at least attempted (a sketch of one possible approach follows this paragraph). As it stands, the wind results, as well as the rain results because of the equipment failures, feel fairly underwhelming and inconclusive.

On the application side, I feel the focus is too much on direct comparison to WMO guidelines and equipment, which will always be an unfair comparison. The power and the interesting use cases of PWS data lie precisely in those locations where WMO siting criteria can never be met: heterogeneous terrain and especially cities. So rather than dwell on the poor performance, I would like to see the authors' thoughts on when these data CAN help: where and how should we, as scientists or citizen scientists, deploy these stations so that they both keep running well for a longer time and provide good data? There are quite a few other studies using PWS data (the authors already mention a few) that are quite positive about their usage, but a thorough discussion of the link between this work and those studies is currently missing from the paper. Creating that connection between this well-controlled field experiment and those opportunistic-sensing studies would strengthen the field as a whole.
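To illustrate what I mean by "at least attempting" the wind comparison: under neutral stability, a simple logarithmic wind-profile adjustment could translate the reference wind to the PWS sensor height as a first-order check. A minimal Python sketch follows; the heights and roughness length are my own assumptions for illustration, not values taken from the manuscript.

```python
import numpy as np

def wind_at_height(u_ref, z_ref=10.0, z_target=2.0, z0=0.03):
    """Neutral-stability logarithmic wind-profile adjustment between heights.

    u_ref    -- wind speed at reference height z_ref (m/s)
    z_target -- height of the PWS anemometer (m); 2.0 m is an assumption here
    z0       -- aerodynamic roughness length (m); 0.03 m is a typical value
                for open grassland, used purely for illustration
    """
    return u_ref * np.log(z_target / z0) / np.log(z_ref / z0)

# Example: 5 m/s measured at 10 m corresponds to roughly 3.6 m/s at 2 m
# over short grass under neutral conditions.
print(wind_at_height(5.0))
```

Even such a crude adjustment would make the wind comparison more informative than reporting the raw mismatch between measurement heights.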
The figures do not help against that feeling of a shallow presentation: figures 4 and 5 especially are giant tables, without proper captions, that I cannot read very well in the printed version of the manuscript. The presentation idea is very nice, showing the bias over time, but providing a giant table without context is fairly overwhelming. Also, the RH colorbar is counterintuitive: a positive bias means the observations report higher RH, which one tends to associate with blue (a minor detail; a sketch of what I mean follows this paragraph). Figure 2 is of quite low resolution. Figure 6 is quite nice as an example of the level of filth that can accumulate in rain gauges, though a short explanation of the scale bar at the bottom would be welcome (I imagine it is a ruler in cm?). Table 1 could be referred to more often when WMO siting classes are mentioned in the text, e.g. in the conclusions. In that table, an overview of the measurement equipment beyond its accuracy would be helpful, e.g. the type of wind sensor, whether a radiation shield is present, single or double tipping bucket, etc., for easier comparison between the PWS brands.
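For the colorbar, a diverging colormap pinned at zero, with blue on the positive (moist) side, would match intuition. A minimal matplotlib sketch with synthetic placeholder data (the station count and value range are arbitrary assumptions, not the manuscript's data):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Synthetic RH bias matrix (stations x months); placeholder data only.
rng = np.random.default_rng(0)
bias = rng.normal(0.0, 3.0, size=(5, 120))

fig, ax = plt.subplots(figsize=(8, 2.5))
# 'RdBu' runs red -> white -> blue, so negative (dry) biases show as red and
# positive (moist) biases as blue; TwoSlopeNorm pins white at zero bias.
norm = TwoSlopeNorm(vmin=-10, vcenter=0, vmax=10)
im = ax.pcolormesh(bias, cmap='RdBu', norm=norm)
fig.colorbar(im, ax=ax, label='RH bias (%, PWS minus reference)')
ax.set_xlabel('Months since deployment')
ax.set_ylabel('Station index')
plt.show()
```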
Some smaller comments, issues and points below: