Research Article

Capturing deprived areas using unsupervised machine learning and open data: a case study in São Paulo, Brazil

Article: 2214690 | Received 03 Dec 2022, Accepted 12 May 2023, Published online: 19 May 2023
 

ABSTRACT

Managing the rapid growth of deprived areas (commonly known as slums, informal settlements, etc.) in cities of Low- to Middle-Income Countries (LMICs) demands detailed and consistent information that is often unavailable. Recent Earth Observation (EO) mapping approaches based on supervised classification models overlook the diversity of deprived areas and require resource-intensive training sets. In this study, we analyse the potential of unsupervised machine learning (ML) models to capture the intra-urban diversity of deprived areas in São Paulo, using solely open geodata. We provide a workflow for characterising deprivation at city scale with a disaggregated approach, offering potential for scalability and transferability. First, we extract a pool of spatial features from open geospatial datasets to characterise the morphological and environmental conditions of the study area. After input preparation, we train and optimise a k-means model, coupled with a feature importance tool. Four cluster types emerged, differing in aspects of deprivation such as higher or lower accessibility to services and infrastructure, sparser or denser occupation, regular or complex morphology, and flat or steep terrain. This alternative methodology for capturing the diversity of deprived areas with open EO-based features can inform locally targeted, and thus more efficient, urban policies and interventions.
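
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' archived code: the feature names (`building_density`, `road_distance`, `slope`), the synthetic data, the silhouette criterion for choosing k, and the variance-share proxy for feature importance are all assumptions made for illustration. The abstract does not specify how the model was optimised or which feature importance tool was used.

```python
import random
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each column so no single feature dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, iters=100):
    """Plain Lloyd's algorithm with random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


def silhouette(rows, labels):
    """Mean silhouette coefficient (Euclidean distance)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean(sq_dist(rows[i], rows[j]) ** 0.5 for j in own)
        b = min(mean(sq_dist(rows[i], rows[j]) ** 0.5 for j in idx)
                for other, idx in clusters.items() if other != lab)
        scores.append((b - a) / (max(a, b) or 1.0))
    return mean(scores)


def feature_importance(rows, labels, names):
    """Proxy: share of each feature's variance explained by the clusters."""
    out = {}
    for f, name in enumerate(names):
        col = [r[f] for r in rows]
        mu = mean(col)
        total = sum((v - mu) ** 2 for v in col)
        between = 0.0
        for lab in set(labels):
            vals = [v for v, l2 in zip(col, labels) if l2 == lab]
            between += len(vals) * (mean(vals) - mu) ** 2
        out[name] = between / total if total else 0.0
    return out


# Synthetic stand-in for per-unit spatial features (hypothetical names).
names = ["building_density", "road_distance", "slope"]
rng = random.Random(42)
raw = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0), rng.gauss(0.0, 1.0)]
       for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(30)]
rows = standardise(raw)

# Try several values of k and keep the one with the highest silhouette.
results = {k: kmeans(rows, k)[0] for k in range(2, 6)}
best_k = max(results, key=lambda k: silhouette(rows, results[k]))
best_labels = results[best_k]
importance = feature_importance(rows, best_labels, names)
```

In practice a library implementation such as scikit-learn's `KMeans` would normally replace the hand-written loop; the pure-Python version is used here only to keep the sketch dependency-free.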

Acknowledgments

The authors acknowledge the support of Alexandra Pedro with the validation procedure, Dr. Raian Maretto and Dr. Flávia Feitosa for technical consultancy on machine learning models, and Dr. Caroline Gevaert for special assistance with sources of uncertainty.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The entire workflow, datasets, model input, code scripts and model output are archived and available on GitHub (https://github.com/ltrentooliveira/MSc_Archive) to ensure maximal replicability.

Additional information

Funding

This research received no external funding.