Semi-Supervised Segmentation for Mapping Urban Expansion and Hazard Exposure in Lima, Peru
Abstract. Urban expansion in rapidly growing cities increases exposure to natural hazards but remains difficult to monitor in regions with limited data. This challenge is amplified in places such as Metropolitan Lima, where global datasets of urban areas lack precision along complex and rapidly changing city boundaries. As a result, recent growth in informal and peripheral zones is not well defined. This study introduces a practical application of a semi-supervised mapping approach that combines satellite imagery with partially labeled information and targeted manual refinement to identify new built-up areas in Metropolitan Lima from 2016 to 2025. The method improves the detection of small and fragmented structures, including emerging informal settlements that global datasets frequently miss. Results show that Metropolitan Lima expanded by approximately 76 km² during the study period. A portion of this growth occurred in coastal zones exposed to tsunamis, in areas with medium to high landslide susceptibility, and on soil types where strong ground shaking is amplified during large earthquakes. These findings highlight the continued concentration of people and infrastructure in hazard-prone terrain.
This study presents a semi-supervised segmentation framework based on satellite images and partially labeled data to improve the detection of small and informal settlements in Metropolitan Lima, which are often missed by global datasets. Results show that the city expanded by about 76 km² between 2016 and 2025, with a significant share of new development occurring in areas exposed to tsunami, landslide, and seismic hazards, highlighting growing risk in hazard-prone zones. Overall, the study is well designed and comprehensive, and the findings are meaningful for risk-informed and resilient urban management. However, I have several comments and suggestions for improving the current work.
1) Lines 17-18: Please state explicitly that the correlation refers to Spearman’s rank correlation coefficient, and report the corresponding p-value to indicate its statistical significance.
2) Lines 41-42: How does the proposed semi-supervised segmentation framework differ from those already in the literature?
3) Line 45: Does “SAR” stand for Synthetic Aperture Radar? Please spell out the full name at its first appearance in the manuscript.
4) For figures with maps, it is suggested to add “N” to the north arrow, and add labels and units like “longitude (°)” and “latitude (°)” to the axes.
5) Equation (1): Why does the exponent of e include a coefficient of “5”?
6) Equations (2) and (6): The right square bracket is missing for the second term (the expected loss, E_u) on the right-hand side of the equation.
7) Line 118: Possibly a typo: should pi_m be pi_n?
8) Section 3.1: What is the spatial resolution of the images used for deep learning? Would this affect model performance, given that the resolution of the WSF dataset is 10 m?
9) Line 170 and Figure 6: Model performance in validation is typically expected to be poorer than in training, but this figure shows that the loss curves of the two stages almost overlap. It is suggested to split the dataset randomly into training and validation sets to ensure the model’s robustness. In addition, which set of model weights among the 200 epochs was chosen for the subsequent model comparison?
10) Line 193: The manuscript states that “This result is expected since WSF effectively represents consolidated urban zones worldwide”. If that is the case, both the precision and the recall of WSF should be higher than those of the proposed framework.
11) Lines 197-199: What are the possible reasons for the relatively large performance difference in these cases?
12) Figure 11(a): The uncertainty in the flood modeling process is not negligible. If possible, it is therefore suggested to use probabilistic flood inundation maps instead of deterministic maps for the subsequent exposure analysis. Please refer to the paper below.
Reference:
“Uncertainty analysis and quantification in flood insurance rate maps using Bayesian model averaging and hierarchical BMA” (https://doi.org/10.1061/JHYEFF.HEENG-58)
13) Figure 12: How are the clusters defined, and what does “Ha” on the horizontal axis denote? It is also suggested to change the vertical-axis label in Figures 12(b)-12(d) to the accumulated area for the corresponding hazard.
14) Lines 339-340: The statement about “the improved recall in peripheral and remote areas” may hold only for the urban area, according to Table 2.
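Regarding comment 1, the following is a minimal sketch of how a Spearman rank correlation and an accompanying p-value could be computed and reported. It is not the authors' procedure: the function names (average_ranks, pearson, spearman_permutation), the permutation-based p-value, and the example values are all illustrative assumptions, and the example data are hypothetical rather than taken from the manuscript.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman_permutation(x, y, n_perm=2000, seed=0):
    """Spearman's rho with a two-sided permutation p-value (assumed test)."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)  # Spearman's rho is Pearson's r on the ranks
    rng = random.Random(seed)
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count permutations at least as extreme as the observed rho.
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return rho, (hits + 1) / (n_perm + 1)


# Hypothetical values, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 5, 9, 11, 20, 21, 40]
rho, p_value = spearman_permutation(x, y)
print(f"Spearman rho = {rho:.3f}, permutation p = {p_value:.4f}")
```

Reporting the coefficient together with the p-value and the sample size would let readers judge both the strength and the significance of the stated relationship. An analytical p-value from a standard statistics package would serve equally well.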