Towards Best Practices in UAV Thermal Remote Sensing in Complex Environments
Abstract. Thermal infrared (TIR) remote sensing using uncrewed aerial vehicles (UAVs) is a promising approach for measuring surface temperatures in complex environments. This study examines the challenges encountered and the lessons learned from UAV TIR surveys of a cryospheric landform in the Swiss Alps. We conducted laboratory experiments and field observations to develop, implement and evaluate the effectiveness of different correction schemes. The results reveal significant dependencies between the internal temperature of the camera and the retrieved surface temperatures, showing a non-linear bias of the UAV TIR camera towards cold, warm, and hot targets. The correction schemes produce divergent outcomes; some amplify extremes, while others reduce the temperature spatial distribution. Validation against data from in situ radiometers and ground surface temperature loggers shows that field calibration provides the most accurate results, whereas drift correction can be misleading in environments with complex topography. By addressing technical and environmental limitations, we provide best practices for UAV TIR surveys and post-processing strategies. Our findings highlight the importance of robust calibration, topographic characterisation and site-specific validation to accurately retrieve surface energy budget-relevant variables in rapidly changing mountainous environments.
Review for egusphere-2026-787 by Naegeli et al.
I appreciated the authors’ systematic effort to explore and correct biases in UAV-based thermal infrared (TIR) surveys. This is an important and ongoing methodological challenge, and the manuscript addresses a relevant problem.
However, I found the manuscript difficult to follow in its current form. The description of the correction procedures is fragmented across sections (methods vs. results), making it challenging to understand what each experiment is designed to test and how the corrections are derived. In addition, the language throughout the manuscript often lacks precision, which further complicates interpretation.
My main concern relates to the conclusions. Based on the presented analysis, it is not clear that the proposed corrections consistently improve results relative to the raw imagery. In several cases, performance appears comparable—or even degraded—after correction. This raises an important question about the necessity and applicability of the correction workflow, which is not fully addressed. I also expected a more thorough spatial analysis of validation results (e.g., identification of systematic spatial patterns), particularly given that landscape effects are introduced in the introduction but only briefly discussed later.
Despite these concerns, I believe the manuscript has scientific value. However, revisions are needed to improve structure, clarity, and the alignment between results and conclusions.
Specific comments:
Introduction: The introduction is comprehensive, but it does not clearly set up the experimental design. I recommend restructuring it to explicitly introduce the main sources of error (e.g., camera temperature effects, drift, atmospheric/emissivity factors) and link these directly to the three correction approaches tested in the study.
The objectives are currently quite broad. The “challenges and lessons learned” component would be more appropriate as a post-analysis outcome rather than a primary objective. It would help to more clearly distinguish between objectives, experiments, and expected outputs.
L91: Remove “over the Murtèl and Marmugnun rock glacier” in the first sentence, as it is repeated in L95. The paragraph would read more smoothly if it moved from a general description to the specific site.
Figure 1: The ground control point labels in panel (a) are very difficult to read—please increase their size. I also do not find panel (a) particularly informative in its current form. Consider enlarging panel (b) instead, as it better shows surface texture and terrain characteristics.
L147: I am not sure what is meant by “GST of the TIR mosaic.” Does this refer to the orthomosaic derived from the TIR data and resampled to 20 cm resolution? Please clarify.
L150: “PT100 (OMEGA SA1-RTD-4W)” is not clearly described. Please specify that this is a platinum resistance temperature sensor (or equivalent), so the reader understands the measurement type.
L170: I do not think this section is necessary. The brief definition of complex terrain in the introduction is sufficient. Instead, this section would be more useful if it provided a short overview of the three correction approaches that will be developed, to help frame the methodology.
L206: I am not clear on what is meant by “camera-target temperature dependency.” Since both the blackbody and the camera temperature appear to change together, is this experiment intended to assess errors associated with the operating temperature of the camera–target system? Please clarify exactly what this correction is designed to quantify.
L222: The notation for temperatures is confusing. Please clearly define each variable (e.g., the camera temperature, the target/blackbody temperature) and use consistent terminology throughout. It is currently difficult to follow what each term represents and what the goal of the correction is.
L230: As I understand it, this correction applies when the camera temperature changes while the target temperature remains constant—i.e., during camera warm-up at the start of a flight. Please confirm if this is correct and clarify the conditions under which this correction is relevant.
L242–245: I had difficulty following this section. Please clarify whether the same TCP is used across the “warm,” “hot,” and “very hot” conditions, the sequence of measurements (e.g., shaded pre-survey, exposed pre-survey, post-survey), and whether the images were taken with the UAV-mounted TIR camera or a separate handheld system (the figure caption suggests a different camera, but this is not clearly stated). If a separate camera is used, please specify this in the data section. Also, why are measurements limited to only three time steps?
L260: Please be consistent in how instruments are referenced. The SI-121-SS is referred to as the Apogee radiometer, but elsewhere as a “thermal infrared radiometer.” Use one consistent name. Also, please define NIST.
L276: Did the radiometer measurements overlap spatially with the GST measurements? It would be valuable to compare these in situ datasets directly, as different measurement approaches (radiometric vs. contact) may yield different results and influence validation.
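For illustration, the direct comparison suggested here could be as simple as matching the two time series on timestamp and computing bias and RMSE. The sketch below uses synthetic values and hypothetical column names ("time", "temp_c"); the actual logger and radiometer file layouts will differ.

```python
import numpy as np
import pandas as pd

# Synthetic example series: radiometric (Apogee-style) vs. contact (GST logger).
rad = pd.DataFrame({
    "time": pd.date_range("2023-08-01 10:00", periods=5, freq="10min"),
    "temp_c": [12.1, 12.4, 12.8, 13.0, 13.1],  # radiometric surface temperature
})
gst = pd.DataFrame({
    "time": pd.date_range("2023-08-01 10:00", periods=5, freq="10min"),
    "temp_c": [11.8, 12.0, 12.5, 12.9, 13.3],  # contact (logger) temperature
})

# Nearest-in-time match within a 5-minute tolerance.
merged = pd.merge_asof(rad.sort_values("time"), gst.sort_values("time"),
                       on="time", suffixes=("_rad", "_gst"),
                       tolerance=pd.Timedelta("5min"))

diff = merged["temp_c_rad"] - merged["temp_c_gst"]
bias = diff.mean()
rmse = np.sqrt((diff ** 2).mean())
print(f"bias = {bias:+.2f} degC, RMSE = {rmse:.2f} degC")
```

Even such a simple cross-check would show whether the two validation datasets agree before either is used to judge the mosaics.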
L285: Do you apply each correction individually and then in combination? In other words, are separate mosaics generated for each correction and a final fully corrected mosaic? Please clarify.
L287: Am I correct in understanding that you correlate the spatial index with the error in the corrected TIR data? Please clarify this step.
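To make this step unambiguous, it would help to state it in pseudo-code form. My reading is a per-pixel correlation between a spatial index (e.g., a terrain ruggedness value) and the error of the corrected TIR mosaic; the sketch below uses synthetic arrays, since the actual index and error fields are not given.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: a per-pixel spatial index (e.g., TRI) and the
# per-pixel error of the corrected mosaic (corrected minus reference).
spatial_index = rng.uniform(0.0, 1.0, size=500)
error = 0.5 * spatial_index + rng.normal(0.0, 0.1, size=500)

# Pearson correlation between the index and the error field.
r = np.corrcoef(spatial_index, error)[0, 1]
print(f"r = {r:.2f}")
```

If this is indeed the analysis, reporting the correlation coefficient (and its sign) per correction scheme would make the landscape-effect argument much easier to follow.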
Figure 4: The figure is difficult to read—please increase text size. Also, the schematic is somewhat unclear: background temperature and emissivity appear to contribute jointly to both emissivity and atmospheric corrections. These processes may need to be represented more clearly as separate pathways. It is also unclear where some of the terms fit in this framework.
L296: As earlier, please clarify whether this refers to the internal camera temperature or the radiometric temperature measured by the camera.
Figure 5: I find this figure difficult to interpret. For example, at ~29.9 °C, is the goal to keep the camera at a constant temperature, or is it varying? The correlation appears to be between camera temperature and blackbody-derived pixel temperature, but the interpretation is not clear. The difference between the blackbody temperature and pixel temperature is attributed to camera effects, but in panel (d), the camera temperature closely matches the blackbody temperature while the pixel temperatures still show substantial variability. This seems inconsistent with the interpretation in L292. Additionally, L301 refers to a “range” in pixel temperature, but the figure shows standard deviation. Across panels (a–e), the range of pixel values appears similar despite large differences in camera temperature variability. Overall, I am missing key information about whether camera temperature is controlled or varying in each experiment, and how this links to the conclusions drawn.
Figure 6: This figure is clearer. It effectively shows that when the camera is warmer, pixel temperatures over a stable blackbody are more variable, whereas lower camera temperatures correspond to more stable measurements.
L304: I assume the internal camera temperature changes (e.g., warms up) during flight. It would help to state this explicitly to guide interpretation of the trends shown.
L306: Please clarify what is meant by “highest camera instability.” Is this referring to the largest variance, drift, or another metric?
Figure 7: I am not fully clear on what is being shown. Are these the raw data used to derive the drift correction described in the methods? If so, this should be stated explicitly. More broadly, the manuscript structure is confusing: corrections are described in the methods, but the data used to derive them appear in the results.
Figure 8: “Raw” is not a correction scheme—please revise the caption accordingly. It would help to clearly distinguish between raw data, individual corrections, and the combined correction.
L320: Replace “overflown” with “flown over.”
L330: The comparison of validation approaches is valuable. However, the distinction between contact-based measurements (e.g., GST) and radiometric measurements should be introduced earlier in the manuscript, as it is central to interpreting these results.
L350: I find this statement somewhat misleading. While one case shows a difference of 0.03 °C versus 0.04 °C, Figure 10 suggests that, overall, the corrections do not consistently improve performance relative to raw imagery. This should be discussed more cautiously.
Figure 10: Is there a reason results are not grouped by sensor type (e.g., radiometer vs. GST)? Grouping them may help reveal patterns. The variability across validation approaches is striking. It also appears that corrections do not consistently reduce error. For example, in 2023, raw and field comparisons are very similar, and drift correction may even worsen agreement. At the same time, Figure 9 suggests that drift correction improves spatial patterns (e.g., removing the cold corner), indicating that different metrics (bias vs. spatial structure) may lead to different conclusions. This is worth discussing explicitly.
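As a concrete suggestion, the regrouping could be a simple aggregation of the validation differences by sensor type rather than by site or year. The values and column names below are illustrative only, not taken from the manuscript.

```python
import pandas as pd

# Illustrative validation table: one row per (sensor, correction scheme).
val = pd.DataFrame({
    "sensor": ["radiometer", "radiometer", "gst", "gst", "gst"],
    "scheme": ["raw", "drift", "raw", "drift", "field"],
    "bias_c": [0.04, 0.03, -0.6, -0.9, -0.2],
})

# Pool by sensor type to expose systematic radiometric-vs-contact differences.
summary = val.groupby("sensor")["bias_c"].agg(["mean", "std", "count"])
print(summary)
```

A table of this form per correction scheme would make it immediately visible whether the apparent inconsistency across validation approaches is driven by the measurement principle rather than by the corrections themselves.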
L365: Based on Figure 10, I do not think this statement is supported. Raw imagery appears to perform similarly to corrected outputs in many cases, which weakens the argument for applying all corrections.
L366: This section feels like a mix of recommendations and reflections on what did not work. Consider restructuring it under a clearer heading (e.g., “Recommendations”) and separating it from the uncertainty discussion.
L372: I do not understand this sentence. Please clarify what is meant by “inevitable” and how this relates to assessing bias. If correlations between metrics were an objective, they should be clearly presented rather than only mentioned in discussion.
L385: This is the first time it is mentioned that TCPs were intended for direct calibration of the TIR orthomosaic but were not successful. This should be introduced earlier, especially given their role in pre- and post-flight correction.
L432: Micro-slope effects and emissivity changes (particularly at angles >30°) are important. It remains unclear whether averaging over these effects produces behaviour equivalent to a flat surface, or whether biases remain.
L441: TRI is discussed but not shown in the manuscript. If it is important to the analysis, it should be included rather than only referenced.
Figure 11: This appears to be a standard field workflow and does not seem specific or novel to this study. Its added value to the manuscript is limited.