Review Article

Training data in satellite image classification for land cover mapping: a review

Article: 2341414 | Received 19 Nov 2023, Accepted 07 Apr 2024, Published online: 14 Apr 2024

ABSTRACT

The current land cover (LC) mapping paradigm relies on automatic satellite imagery classification, predominantly through supervised methods, which depend on training data to calibrate classification algorithms. Hence, training data have a critical influence on classification accuracy. Although research on specific aspects of training data in the LC classification context exists, a study that organizes and synthesizes the multiplicity of aspects and findings of this research is needed. In this article, we review the training data used for LC classification of satellite imagery. A protocol for identifying and selecting relevant documents was followed, resulting in the inclusion of 114 peer-reviewed studies. Main research topics were identified and documents were characterized according to their contribution to each topic, which allowed uncovering subtopics and categories and synthesizing the main findings regarding different aspects of the training dataset. The analysis identified four research topics, namely construction of the training dataset, sample quality, sampling design and advanced learning techniques. Subtopics included sample collection method, sample cleaning procedures, sample size, sampling method and class balance and distribution, among others. A summary of the main findings and approaches provides an overview of the research in this area, which may serve as a starting point for new LC mapping initiatives.

Introduction

Land cover (LC) is a descriptor of the Earth’s surface, a crucial element for the study of the environment (Herold et al., Citation2006). LC information is critical to comprehend the interaction between humans and the environment (Gómez et al., Citation2016) and benefits society in areas ranging from disasters and resource management to climate and agriculture, supporting environmental and socio-economic decision-making (Wulder et al., Citation2008).

Remote sensing has been acknowledged as one of the most effective means to derive LC information, since it can provide adequate spatial coverage, systematic observations and reach inaccessible areas in a cost-effective way (Rybicki et al., Citation2020; Yifang et al., Citation2015). Today, an increasing number of Earth Observation satellites provide a variety of data options with sufficient spatial and temporal resolutions to derive LC information (Gómez et al., Citation2016). Systematic observations provided by optical sensors of medium spatial resolution (10–30 m), such as Landsat and Sentinel-2, and coarse (>100 m) spatial resolution, such as MODIS, have been the main source of data for deriving LC information (Chaves et al., Citation2020).

The typical approach for deriving LC information from remote sensing data consists of producing a thematic map through image classification (Stehman & Foody, Citation2019). Classification can be divided into pixel-based and object-based approaches, with the former being predominant. The extraction of thematic information rests on the identification of spectral and/or temporal patterns in the data that can be associated with different types of LC. The current LC mapping paradigm relies on advanced classification algorithms, especially supervised methods, to extract thematic information from satellite imagery (Talukdar et al., Citation2020; Wulder et al., Citation2018).

Supervised methods require samples of the land cover classes of interest, whose labels are known a priori, in order to train the classification algorithms. These samples are referred to as training samples or training data, among other denominations, and consist of representative examples from which classifiers learn a function to predict the class label of unseen instances. Because they play a fundamental role in algorithm learning and generalization, training data have a strong influence on classification accuracy (Maxwell et al., Citation2018). Studies have shown that the selection of training data can have a larger impact on classification accuracy than the choice of classifier (Foody & Arora, Citation1997; C. Huang et al., Citation2002). Therefore, special attention should be dedicated to the study of the training data used for supervised LC classification.

Given their crucial role in the classification process, numerous studies in the field of LC classification have investigated training data and their associated aspects. Whilst some studies prioritize training data as their central focus (Fonte et al., Citation2020; Hermosilla et al., Citation2022; Paris & Bruzzone, Citation2021; Santos et al., Citation2021; Shetty et al., Citation2021; Zhou et al., Citation2020), others treat it as a secondary contribution or as complementary side experiments (Cherif et al., Citation2022; Graesser et al., Citation2022; Rodriguez-Galiano et al., Citation2012; Song et al., Citation2012; Venter & Sydenham, Citation2021), reflecting the diverse approaches taken in this subject. Studies have addressed a range of aspects related to the training sample, most notably the collection method (Bratic et al., Citation2023; Costa et al., Citation2022; Fonte et al., Citation2020; Hermosilla et al., Citation2022; Jamshidpour et al., Citation2020; Congcong Li et al., Citation2021), quality (Pelletier et al., Citation2017; Santos et al., Citation2021), size (X. Huang et al., Citation2015; Congcong Li et al., Citation2014; Shetty et al., Citation2021; Zhou et al., Citation2020) and class balance and distribution (Nguyen et al., Citation2020; Preidl et al., Citation2020; Shetty et al., Citation2021; Zhou et al., Citation2020). Forms of iteratively acquiring training data have also been investigated (Congcong Li et al., Citation2016; E. Li et al., Citation2015; Jiayi Li et al., Citation2020; J. Wang et al., Citation2015). This existing research, however, is dispersed throughout the scientific literature. Therefore, the need for a study that encapsulates, organizes and synthesizes the multiplicity of aspects, elements and methods described in previous research on training data for LC classification becomes evident.

This study serves as a systematic review, offering a structured and synthesized compendium of information on this subject. The primary objective is to identify key research topics pertaining to training data in the context of LC classification, presenting and analysing the various facets and contributions associated with each topic. The review adopts a methodological approach to identify and analyse articles that address aspects of the training dataset in the context of LC classification. Relevant documents were identified in the Scopus bibliographic database. A process of abstract and full-text screening helped exclude records that did not fit the scope of the review. We delimited our scope to encompass studies that employed a pixel-based classification approach using optical satellite images of medium to coarse spatial resolution, which represent the most usual approach and data sources used for LC classification. Then, selected documents were analysed and systematically synthesized. Finally, research topics were identified, outlining their subtopics, if applicable, and the main methods and approaches found. Such a review may benefit new LC mapping initiatives, serving as a starting point that provides an overview of what key aspects of training data are being addressed by researchers and how.

This paper is organized according to the following structure: Section 2 describes the materials and methods. Section 3 presents the results and discussion. Section 4 poses the conclusions.

Materials and methods

The review process is structured on a rigorous protocol aimed at systematically identifying and selecting relevant studies from highly ranked journals. Documents were identified from electronic databases using an appropriately formulated search query, followed by a procedure of abstract and full-text screening to select studies according to eligibility criteria. The intention was to ensure that the studies included in our review met criteria related to relevance and quality. An overview of the process is exhibited in Figure 1.

Figure 1. Overview of the process employed for determining the included articles.

Database identification and search

The electronic database chosen for searching the literature was Scopus. It is a well-established database for consulting scientific documents, with a very extensive archive and wide interdisciplinary coverage. In the context of remote sensing, Scopus covers the main publishers and sources.

The search query was designed to cover different variations of terms that might refer to training data. Additionally, the terms land cover and classification were also included using a logical AND operator. Therefore, the final query aimed to retrieve papers that mention training data in the context of land cover classification. The search identified the terms within the title, keywords and abstract and covered the period starting from 2008, which corresponds to the year when the Landsat archive became freely available and an increasing number of studies about satellite image classification began to unfold, up to November 2022. Special operators were used in order to account for variations, while simplifying the search query. For instance, “training sampl*” covers variations such as “training sample”, “training samples” and “training sampling”. The search query can be consulted in Table 1.

Table 1. Search query used to retrieve documents from Scopus.

Eligibility criteria and screening

Retrieved records were filtered based on source type. Only articles published in peer-reviewed journals were considered. Hence, conference proceedings, books, book chapters and others were removed. Furthermore, due to the large number of retrieved documents, articles were selected from the six journals with the most documents retrieved. These were high-impact and highly ranked remote sensing journals, namely Remote Sensing of Environment, ISPRS Journal of Photogrammetry and Remote Sensing, International Journal of Remote Sensing, Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing and IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Records from these journals corresponded to more than 50% of the total documents retrieved.

In a first stage, abstract screening was conducted to exclude documents outside the scope of this review or not related to training data. Subsequently, full-text screening was conducted to exclude further records. As the search included the abstract, numerous documents mentioned training data in their abstract but did not present an actual analysis or contribution targeted at training data, which resulted in the exclusion of such documents. In addition, documents focused on specific approaches or types of data were excluded. Articles that used Very High Spatial Resolution (VHSR), UAV or LiDAR data were excluded, as their spatial resolution normally targets elements instead of classes (e.g. tree crowns instead of forest), which requires a different training sampling strategy. Studies that used exclusively Synthetic Aperture Radar or hyperspectral data were also removed, due to their distinctive characteristics and their atypical usage as data sources for LC classification. Moreover, documents had to meet the inclusion criterion of pixel-based analysis. Thus, documents containing object-based, sub-pixel or super-pixel approaches were excluded. As the scope of the review was limited to LC mapping for a specific moment in time, articles focused on LC change without performing LC classification using training samples were also excluded. Finally, documents concentrated on discussing aspects more related to the learning process of algorithms (e.g. algorithmic mechanisms to put increased weight on more difficult samples) rather than the training data were excluded. It is worth noting that the inclusion strategy not only admitted documents focused on the training data but also documents focused on other aspects of the classification, with training data being a secondary yet relevant contribution.

Identification and systematization of research topics

After selecting the studies included in the review, a process of content analysis was conducted. Each document was analysed in order to identify research topics. Then, documents were characterized according to their contributions to each research topic. A single article may have had contributions to one or more topics. Having such information catalogued, a further step aimed to systematize the subtopics and categories of each research topic.

Results and discussion

This review included 114 articles from 6 high-impact and highly ranked remote sensing journals (Table 2). Remote Sensing was the journal with the most articles (55), accounting for almost half of the total number of documents. The other journals had a similar number of contributions, ranging from 7 (IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing) to 17 (ISPRS Journal of Photogrammetry and Remote Sensing). The distribution of selected documents by year of publication is exhibited in Figure 2.

Figure 2. Number of documents included by year.

Table 2. Number of documents included by journal.

In terms of the content analysis, it was possible to identify four main research topics (Figure 3) related to training data in the context of LC classification: construction of the training dataset, sample quality, sampling design and advanced learning techniques.

Figure 3. Research topics and subtopics.

Construction of the training dataset

A central element across the examined studies is how the training dataset was constructed. This review identified distinct training sample collection methods, i.e. processes employed to acquire training sampling units. Essentially, training data were collected through manual, automatic or hybrid methods, each having different subtypes (Table 3).

Table 3. Summary of the sample collection methods, their types and examples of data sources. NA: Non-Applicable.

Manual collection

The acquisition process of manual methods typically relies on collecting training data by visual interpretation of images or during field visits and ground surveys. Multiple studies performed manual training sample collection through visual interpretation of images (Belgiu & Csillik, Citation2018; Ghorbanian et al., Citation2020; Chenxi Li et al., Citation2021). In this process, analysts are required to mark points or delineate areas containing samples of the classes of interest in an image, typically within a Geographic Information System (GIS) environment. The most common data sources used for this purpose were VHSR images (Q. Li et al., Citation2020; Naboureh et al., Citation2020), including aerial photographs (Mellor et al., Citation2015) and Google Earth images (Congcong Li et al., Citation2016), although moderate and high spatial resolution (HR) images were also reported to be used (Jia et al., Citation2020; Maulik & Chakraborty, Citation2013). Some studies took advantage of auxiliary data sources to assist manual collection. Temperature and precipitation data (Congcong Li et al., Citation2016), historical imagery (Zhao et al., Citation2016) and field survey photos (J. Wang et al., Citation2015) were used to complement VHSR or HR images and confirm sample coherence. Time series of spectral data were also used to identify temporal patterns and ensure temporal consistency (Congcong Li et al., Citation2016; Zhao et al., Citation2016). Although collection through visual interpretation may eliminate the need for field visits, it is limited by the visual information captured by the image. As a result, it may prove unfeasible to differentiate certain LC classes, leading to potential interpretation errors. Additionally, visual interpretation can be a time- and labour-intensive process.

Another form of manual collection of training data consists of acquiring samples during ground surveys or field visits (Amor et al., Citation2018; Q. Zhu et al., Citation2021). Typically, samples of LC classes are identified on site and their geo-location is determined using a GPS. Additionally, photos taken on site can help sample characterization (Banks et al., Citation2015; J. Wang et al., Citation2015). A few studies reported using a combination of visual interpretation and field visits for training sample collection (Álvarez-Martínez et al., Citation2018; Nguyen et al., Citation2020; Santos et al., Citation2021; Sun et al., Citation2015). Such a combination can be useful in situations in which certain classes are indistinguishable based on visual interpretation of aerial images, for instance when distinguishing between crop types. However, field visits can be highly demanding in terms of time and resources and may not allow collecting a large amount of training data.

Automatic collection

Given the costly and labour-intensive nature of manual collection of training samples, alternative approaches have introduced automatic methods. Most of the articles examined in this review adopted this kind of method. Distinct subtypes of automatic methods were identified. The most common subtype of automatic collection was based on the extraction of training samples with the aid of pre-existing datasets, also referred to as reference datasets (Cherif et al., Citation2022; Congcong Li et al., Citation2021; Shao & Lunetta, Citation2012; Z. Zhang et al., Citation2022). This procedure relies on using the spatial and thematic information of reference datasets to determine the location and class label of training sampling units.

Reference data for automatic extraction may come from different sources. Multiple studies extracted training data automatically from existing LC maps, such as the National Land Cover Database (NLCD) (Brown et al., Citation2020; Colditz, Citation2015; Xie et al., Citation2020; Zhou et al., Citation2020), Earth Observation for Sustainable Development of forest (EOSD) land cover map (Hermosilla et al., Citation2022), Corine Land Cover (CLC) (Leinenkugel et al., Citation2019; Rybicki et al., Citation2020), GlobeLand30 (Y. Hu et al., Citation2018; Lin et al., Citation2019; Ma et al., Citation2017), MODIS land cover product (Xie et al., Citation2019; H. K. Zhang & Roy, Citation2017), GLC 2000 (Radoux et al., Citation2014), MapBiomas (Cherif et al., Citation2022) and other national LC maps (Viana et al., Citation2019). Training samples were also automatically extracted from other sources of LC data, such as the European Land Use-Land Cover Area Frame Survey (LUCAS) dataset (Ghassemi et al., Citation2022; Venter & Sydenham, Citation2021). Crowdsourced data, such as OpenStreetMap (Fonte et al., Citation2020; Liu et al., Citation2020) and farmer-submitted images (S. Wang et al., Citation2020), were also used as data sources to assist in automatic extraction of samples.

While some studies extracted training data automatically from a single pre-existing LC product (Cherif et al., Citation2022; Howard et al., Citation2012; Z. Zhang et al., Citation2022), others combined multiple datasets to enhance extraction confidence or to take advantage of products targeting specific classes (Bratic et al., Citation2023; Dash et al., Citation2023). For instance, studies in the United States used the NLCD, the Cropland Data Layer (CDL) and the National Wetland Inventory (NWI), among others, to extract training data automatically (Congcong Li et al., Citation2021; Yang et al., Citation2018). In Canada, the EOSD land cover map, the National Wetland Status dataset and the National Forest Inventory were used for automatic training sample selection (Hermosilla et al., Citation2022). Studies developed in Europe used CLC in conjunction with other products from the Copernicus Land Monitoring Service, such as the High Resolution Layers (HRL), to collect training data automatically (Leinenkugel et al., Citation2019; Rybicki et al., Citation2020). Automatic collection using pre-existing datasets allows the acquisition of larger samples. However, it may often result in the inclusion of mislabelled sampling units, thereby demanding a process of quality control, which is discussed in more detail in Section 3.2.2.
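To make the procedure concrete, the following minimal sketch illustrates automatic extraction in Python with NumPy, assuming two pre-existing LC products have already been reprojected and co-registered to the image grid; the synthetic rasters, class codes and the agreement rule stand in for real reference datasets and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for two co-registered reference LC rasters (e.g. a national LC
# map and a second thematic product); in practice these would be read from
# reprojected raster files.
ref_a = rng.integers(1, 5, size=(500, 500))                 # class codes 1..4
ref_b = np.where(rng.random((500, 500)) < 0.9, ref_a, 0)    # partly disagreeing product

# Keep only pixels where both products agree, a common confidence heuristic.
agree = ref_a == ref_b

# Draw candidate training locations per class from the agreement mask.
samples = []
for cls in np.unique(ref_a):
    rows, cols = np.nonzero(agree & (ref_a == cls))
    if rows.size == 0:
        continue
    pick = rng.choice(rows.size, size=min(500, rows.size), replace=False)
    samples.append(np.column_stack([rows[pick], cols[pick],
                                    np.full(pick.size, cls)]))

training = np.vstack(samples)  # columns: row, col, class label
print(training.shape)
```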

A less common subtype of automatic training sample collection methods was extraction based on spectral statistics. This included unsupervised collection and threshold or rule-based collection. Training data were automatically selected in an unsupervised fashion by applying clustering algorithms and labelling the resulting clusters with a quantitative method that compares the clusters with a reference map (Langford et al., Citation2019; Paris et al., Citation2022) or by spectral indices ranking (Q. Zhang et al., Citation2023). Iterative processes used spectral indices to select samples based on index values, while also including procedures designed to maximize sample diversity and ensure label consistency (Jiayi Li et al., Citation2019). Some studies relied on rules of spectral patterns (Aswatha et al., Citation2017) or empirically defined thresholds of spectral indices (Jing et al., Citation2015; E. Li et al., Citation2015) in order to extract training data automatically. A filtering process was applied in order to remove sampling units that met the thresholds for more than one class (X. Huang et al., Citation2015).
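A minimal sketch of threshold-based collection is given below, assuming NDVI and NDWI are computed from surface reflectance bands; the thresholds and class rules are illustrative placeholders, not values from the cited studies, and the final step mirrors the filtering of units that meet the thresholds of more than one class.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic surface reflectance bands in [0, 1]; real inputs would be
# atmospherically corrected imagery (e.g. Landsat red/NIR/green bands).
red, nir, green = (rng.random((400, 400)) for _ in range(3))

ndvi = (nir - red) / (nir + red + 1e-9)
ndwi = (green - nir) / (green + nir + 1e-9)

# Empirical, illustrative thresholds per class (actual values in the cited
# studies are scene- and sensor-specific).
veg_mask = ndvi > 0.6
water_mask = ndwi > 0.3

# Filtering step: discard pixels that satisfy the rules of more than one
# class, keeping only unambiguous candidates.
ambiguous = veg_mask & water_mask
veg_mask &= ~ambiguous
water_mask &= ~ambiguous

labels = np.zeros(ndvi.shape, dtype=int)  # 0 = unlabelled
labels[veg_mask] = 1                      # vegetation
labels[water_mask] = 2                    # water
print((labels == 1).sum(), (labels == 2).sum())
```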

Hybrid collection

A small group of studies used hybrid or semi-automatic methods, which combined automatic and manual collection. On the one hand, combinations took advantage of automatic extraction from high confidence pre-existing products targeted to map specific classes (Z. Zhu et al., Citation2016) to complement a predominantly manually collected sample. On the other hand, manual collection granted the flexibility to acquire training data for a small set of more complex classes, while the remaining classes had samples collected through automatic approaches (Shetty et al., Citation2021). Additionally, manual samples were used to conduct a preliminary classification, which was then used to collect training data automatically (Knorn et al., Citation2009).

Sample quality

This research topic refers to the study of the influence of noise in the training data and to what extent it affects classification accuracy. It also lists sample cleaning and refinement procedures identified among the analysed documents.

Class/label noise

With respect to mislabelling in the training data, also referred to as class or label noise, multiple studies aimed at evaluating the impact on classification accuracy when removing or adding mislabelled sampling units to the training dataset. These studies focused on comparing the accuracy obtained with and without mislabelling in the training dataset. Outlier removal procedures were employed to obtain a cleaned, less noisy sample. These procedures included change detection to ensure acquiring only unchanged pixels when collecting samples from an outdated map (Wessels et al., Citation2016), classification uncertainty ranking (Venter & Sydenham, Citation2021), removal based on thresholds of Mahalanobis distance and erosion of edge pixels (Z. Zhu et al., Citation2016) and outlier reduction based on self-organizing maps and Bayesian inference (Santos et al., Citation2021). Additionally, a group of articles investigated classification sensitivity to different levels of mislabelling by introducing artificial mislabelling in the training data (Mellor et al., Citation2015; Pelletier et al., Citation2017; Rodriguez-Galiano et al., Citation2012).
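The artificial-mislabelling experiments described above can be sketched as follows; this is an illustrative reconstruction on synthetic data, not code from the cited studies, assuming scikit-learn's RandomForestClassifier and a label-flipping routine that reassigns a fraction of training labels to a random different class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic spectral features standing in for pixel samples of 5 LC classes.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=8,
                           n_classes=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

rng = np.random.default_rng(1)
for noise_rate in [0.0, 0.05, 0.1, 0.2, 0.3]:
    y_noisy = y_tr.copy()
    flip = rng.random(y_noisy.size) < noise_rate
    # Replace flipped labels with a random *different* class (offset 1..4 mod 5).
    y_noisy[flip] = (y_noisy[flip] + rng.integers(1, 5, flip.sum())) % 5
    clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_noisy)
    print(f"noise {noise_rate:.2f}: OA = {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```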

Studies presented fairly distinct conclusions regarding the effect of mislabelling on accuracy. On the one hand, some studies suggested that no significant difference was verified when applying cleaning procedures to remove potentially mislabelled sampling units from the training dataset (Venter & Sydenham, Citation2021; Wessels et al., Citation2016; Z. Zhu et al., Citation2016). On the other hand, other studies highlighted the influence of mislabelling on classification accuracy, demonstrating that removing class noise resulted in an increase in classification accuracy (Santos et al., Citation2021). Given these divergent findings, it is difficult to draw definitive conclusions about the effect of mislabelling on classification accuracy. In terms of sensitivity to different levels of mislabelling, the introduction of artificial mislabelling revealed that classification accuracy decreased as the rate of mislabelling in the training data increased, although a certain stability was verified at low levels of mislabelling (Mellor et al., Citation2015; Pelletier et al., Citation2017; Rodriguez-Galiano et al., Citation2012). Studies also revealed that larger samples were more robust to class noise. They further concluded that the higher the amount of mislabelling, the more computationally expensive the classification became and the more uncertain the results were.

It should be noted, however, that the few aforementioned studies used the Random Forest classifier, which has been acknowledged as being robust to noise in the training data (Maxwell et al., Citation2018). This may help explain why cleaning procedures and low levels of mislabelling exhibited negligible impact on classification accuracy. It is expected that distinct classifiers would be affected by class noise differently. In this regard, this review lacked studies including a more diverse set of classifiers. Furthermore, it should be considered that other aspects might influence the impact of mislabelling on classification accuracy, namely the class nomenclature and classification dimensionality.

Pixel purity, another aspect related to label noise, was also examined. The purity of a pixel refers to the extent to which a pixel belongs to a single class and is not contaminated by other classes. Pixels that fall into the latter case are referred to as mixed pixels. A group of studies aimed at experimenting with different configurations of pixel purity (Y. Chen et al., Citation2016; Colditz, Citation2015; He et al., Citation2019; Shao & Lunetta, Citation2012). They proposed to create samples with pixels having distinct degrees of purity and evaluated the accuracy obtained with samples composed of pure versus mixed pixels. However, their results showed no convergence in terms of which configuration was preferable. It is worth noting that these studies were conducted with MODIS data. The lack of studies using data from sensors with finer spatial resolution may suggest that pixel purity has not been a concern when working with data from such sensors.

Sample cleaning and refinement procedures

This subsection presents the distinct training sample cleaning and refinement procedures found in this review. Here, most studies were limited to including such procedures as part of their methodology, without assessing the procedures' effect on classification accuracy.

Removing mislabelled sampling units from the training dataset was a recurring concern in numerous studies (Hermosilla et al., Citation2022; Congcong Li et al., Citation2021; Rybicki et al., Citation2020), especially those relying on automatic collection of training data from pre-existing reference datasets. Issues related to the reference data, including nomenclature (map legend), thematic inaccuracy, Minimum Mapping Unit (MMU) and contemporaneity were identified as potential sources of labelling errors. Therefore, various filtering and refinement protocols have been proposed to address these issues, aiming to prevent mislabelling in the training sample.

Compatibility between the legend of the reference map and the intended class nomenclature of the classification was ensured prior to extracting training data (Fonte et al., Citation2020; Ghassemi et al., Citation2022). Such harmonization of nomenclatures not only helped determine the correspondence of reference map classes with target classification nomenclature (Rybicki et al., Citation2020) but also prevented sampling in areas prone to class ambiguity (Liu et al., Citation2020; Venter & Sydenham, Citation2021).

Simply intersecting multiple datasets can improve spatial and thematic coherence during sample extraction (Rybicki et al., Citation2020; Yang et al., Citation2018; J. Zhang et al., Citation2023). More sophisticated approaches intersected multiple datasets to generate a hierarchical training pool based on the level of reliability defined by distinct intersections of datasets, preferably selecting samples from the most reliable level (Hermosilla et al., Citation2022). Agreement among distinct pre-existing products was also considered for including samples in a consensus pool, which was expanded through an iterative process (Congcong Li et al., Citation2021). A super-pixel approach was employed to collect samples from areas where eight LC products were consistent (Jin et al., Citation2022).

Another refinement strategy consisted in ensuring temporal consistency, meaning collecting only sampling units whose class was constant in different versions of the dataset (Xie et al., Citation2019, Citation2020; H. K. Zhang & Roy, Citation2017). Alternatively, some studies reinforced temporal consistency by using change detection methods (Wessels et al., Citation2016; Xie et al., Citation2020) or vegetation disturbance data (Hermosilla et al., Citation2022), therefore preventing the acquisition of potentially mislabelled sampling units when using outdated reference maps.

Regarding the spatial component, a common practice was to avoid borders between classes. To this end, processes of polygon erosion were applied (Leinenkugel et al., Citation2019; Liu et al., Citation2020; Radoux et al., Citation2014; Viana et al., Citation2019). Moreover, some studies proposed to consider only sampling units located in a homogeneous region. For instance, sampling units were collected only if all their neighbours within a given window (e.g. 3 × 3 pixels) belonged to the same class (Xie et al., Citation2019, Citation2020; H. K. Zhang & Roy, Citation2017) or if the spectral variation among the pixels was lower than a given limit (Xie et al., Citation2019). Harmonization of the MMU with the target classification resolution was also employed to prevent acquiring mislabelled training data (Howard et al., Citation2012; H. K. Zhang & Roy, Citation2017).
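As an illustration of the homogeneity criterion, the sketch below marks as eligible only pixels whose eight neighbours in a 3 × 3 window share the centre pixel's class; the reference map is synthetic and the implementation is one possible reading of the rule described in the cited studies.

```python
import numpy as np

def homogeneous_mask(label_map: np.ndarray) -> np.ndarray:
    """True where a pixel and all 8 neighbours in a 3x3 window share a class."""
    mask = np.ones(label_map.shape, dtype=bool)
    mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = False  # skip borders
    centre = label_map[1:-1, 1:-1]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            neighbour = label_map[1 + dr:label_map.shape[0] - 1 + dr,
                                  1 + dc:label_map.shape[1] - 1 + dc]
            mask[1:-1, 1:-1] &= neighbour == centre
    return mask

rng = np.random.default_rng(7)
ref = rng.integers(1, 4, size=(200, 200))   # illustrative reference class map
eligible = homogeneous_mask(ref)            # candidate sampling locations
print(eligible.mean())                      # fraction of pixels retained
```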

A further sample refinement strategy consisted in creating filtering rules based on empirical thresholds tailored for each class. Spectral indices such as the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI) and the Biophysical Composition Index (BCI), digital numbers from night-time light (NTL) datasets and the Copernicus HRL values were used to refine samples. For instance, thresholds of BCI and NTL helped detect and remove artificial surface pixels misclassified as cultivated land (Lin et al., Citation2019). Similarly, thresholds of HRL Imperviousness and Tree Cover Density helped distinguish between forested areas and artificial surfaces, whilst thresholds of NDWI and NDVI can evidence mislabelling related to water bodies (Rybicki et al., Citation2020).

Other filtering approaches used statistics-based methods to refine the automatically collected training data. Probability threshold-based outlier detection (Radoux et al., Citation2014) and the Euclidean distance between sampling units and the spectral centroid of the class (Xie et al., Citation2019) were employed to remove mislabelled training sampling units. It is worth noting that although these types of filtering strategies can amplify the probability of selecting reliable sampling units, they might eliminate diverse yet informative sampling units in the process, thereby degrading the classifier's predictive capabilities.
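A possible form of such a statistics-based filter is sketched below: sampling units farther from their class spectral centroid than a multiple of the mean within-class distance are discarded. The threshold factor and the synthetic data are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np

def centroid_filter(X: np.ndarray, y: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Keep sampling units within k times the mean Euclidean distance
    to their class spectral centroid (threshold choice is illustrative)."""
    keep = np.zeros(y.size, dtype=bool)
    for cls in np.unique(y):
        idx = np.nonzero(y == cls)[0]
        centroid = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - centroid, axis=1)
        keep[idx] = dist <= k * dist.mean()
    return keep

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 6))      # spectral features of sampling units
y = rng.integers(0, 4, size=1000)   # class labels
mask = centroid_filter(X, y)
print(mask.sum(), "of", y.size, "units retained")
```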

More advanced refinement alternatives were also identified. A clustering technique aimed to select training samples from the cluster with the fewest doubtful assignments in order to overcome the limitations associated with automatic extraction based on a pre-existing land cover and land use map (Viana et al., Citation2019). A novel and interesting alternative proposed to address mislabelling prior to sample extraction by first conducting spatial and semantic decomposition of a reference thematic map and only then proceeding to sample selection (Paris & Bruzzone, Citation2021). Such an approach helped address issues related to spatial and semantic aggregation, which arise, respectively, from aspects such as the minimum mapping unit and the level of abstraction of the map legend. The spatial decomposition relied on a clustering technique to identify pixels (i.e. sampling units) correctly associated with the class of their polygon of origin. The semantic decomposition, on the other hand, relied on a Gaussian distribution and the Mahalanobis distance to identify classes aggregated under the same semantic label.

Feature noise

Only one study included in this review addressed feature noise, i.e. noise in the satellite measurements. Based on the premise that training sampling units subjected to cloud contamination are detrimental to the classifier, an approach was proposed to recover unclear training sampling units based on their similarity with clear sampling units from the same class (T. Hu et al., Citation2018). The lack of studies focusing on feature noise might be explained by the common workflow adopted in remote sensing image classification, which addresses feature noise such as cloud contamination in a pre-processing stage prior to training data collection. Regardless of this established approach, further studies could investigate how including sampling units that were originally not noise-free might affect classification accuracy.

Sampling design

This research topic analyses the articles from the perspective of four main subtopics (Table 4): training sample size, sampling method, class balance and class distribution. As making adjustments to sample size and to configurations of class balance and distribution is relatively simple, some studies whose main contribution was related to other research topics also conducted experiments with sample size, class balance and distribution as a secondary contribution. Although in such cases the analysis is normally limited, e.g. to a single classifier, these studies could still provide an overview of the influence of sampling design on classification accuracy. On the other hand, studies with a primary emphasis on sampling design could offer a thorough investigation of this particular subtopic.

Table 4. Subtopics of the sampling design topic and their categories (NA: Non-Applicable).

Training sample size

A number of studies conducted experiments to evaluate the impact of training sample size on classification accuracy. Essentially, these experiments consisted of selecting increasingly larger subsets of the training data to train a classifier and then computing the accuracy at each step.

Previous studies suggested that the ideal amount of training sampling units might depend on aspects such as the algorithm, number of input variables and the extent and spatial variability of the study area (C. Huang et al., Citation2002). Nonetheless, a general rule found in the literature advocates that larger samples tend to yield better classification accuracies (Chenxi Li et al., Citation2021; Congcong Li et al., Citation2014; Zhou et al., Citation2020), as they are more likely to represent class variability adequately.

In this review, most studies testing different sample sizes concluded that there was a positive correlation between sample size and classification accuracy (Heydari & Mountrakis, Citation2018; Chenxi Li et al., Citation2021; Song et al., Citation2012; Zhou et al., Citation2020). In other words, increasing sample size resulted in increasing accuracy. However, the most significant increase was normally seen at the initial increments in sample size (X. Huang et al., Citation2015; Congcong Li et al., Citation2016; Xie et al., Citation2019; Z. Zhu et al., Citation2016). After a certain point, in most cases, the curves tended to reach a plateau, showing negligible gains in accuracy with further increments in the number of training sampling units. Only a small group of studies concluded that variations in sample size resulted in non-significant increases or even a small decrease in accuracy (Jing et al., Citation2015; Shetty et al., Citation2021).

Most of the studies considered in the evaluation of the impact of training sample size used the Random Forest classifier, with only a limited number of studies comparing different classifiers. These compared algorithms such as Random Forest, SVM, CART, k-Nearest Neighbors (k-NN) and Artificial Neural Networks (ANN). The results revealed that Random Forest was less sensitive to variations in training sample size. For instance, when experimenting with training data subsets of different sizes, Random Forest was found to be more stable in comparison to SVM, CART and k-NN (Jing et al., Citation2015; Shetty et al., Citation2021). Algorithms such as SVM were likely to benefit more from the increase in sample size (Paris & Bruzzone, Citation2021). In addition, experiments with SVM, k-NN, ANN and Bag Tree (a tree ensemble approach slightly different from the Random Forest implementation) revealed that the tree ensemble not only exhibited the smallest variation in accuracy after changes in the training sample size, but also had the best results with the smallest training sample (Heydari & Mountrakis, Citation2018). Studies focused specifically on Random Forest confirmed its low sensitivity to variations in training sample size. A ten-fold expansion in sample size resulted in a gain of only 3% in accuracy (Venter & Sydenham, Citation2021). Similarly, increasing the amount of training sampling units by 50 times produced gains in accuracy limited to approximately 2% (Z. Zhu et al., Citation2016).
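The sample-size experiments described above follow a simple learning-curve design, sketched below on synthetic data with a Random Forest; the subset sizes and data are illustrative stand-ins for real training pools.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic pool of labelled pixels and a fixed held-out test set.
X, y = make_classification(n_samples=20000, n_features=10, n_informative=8,
                           n_classes=5, random_state=0)
X_pool, X_te, y_pool, y_te = train_test_split(X, y, test_size=5000, random_state=0)

rng = np.random.default_rng(0)
for n in [100, 250, 500, 1000, 2500, 5000, 10000]:
    idx = rng.choice(y_pool.size, size=n, replace=False)   # growing training subset
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_pool[idx], y_pool[idx])
    print(f"n = {n:>5}: OA = {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```

Typically the printed curve rises quickly at small sizes and flattens, matching the plateau behaviour reported above.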

Sampling method

The sampling method corresponds to how the sampling units are selected from a statistical perspective. The most common sampling methods found in the remote sensing literature are simple random, stratified random and systematic (Chenxi Li et al., Citation2021). Many articles included in this review did not provide information regarding the sampling method used for training data collection. Others only mentioned random selection, without specifying whether it was simple or stratified. However, among the studies that disclosed their sampling method, it was clear that the majority preferred a stratified random sampling approach, a technique that helps reduce variance. Examples of studies that adopted stratified random sampling include (Y. Hu et al., Citation2018; Paris & Bruzzone, Citation2021; Xie et al., Citation2019; Zhong et al., Citation2021; Zhou et al., Citation2020). Articles that compared simple random, stratified random and systematic sampling concluded that the stratified approach provided better results in terms of accuracy (Colditz, Citation2015; Chenxi Li et al., Citation2021). Alternative sampling methods, such as object-oriented sampling, revealed promising results (Chenxi Li et al., Citation2021).
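The contrast between simple and stratified random selection can be illustrated with a short sketch; the skewed class proportions are hypothetical, and scikit-learn's stratify argument is used as one convenient way to preserve per-class shares.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(10000, 6))                                   # spectral features
y = rng.choice([0, 1, 2, 3], size=10000, p=[0.7, 0.2, 0.07, 0.03])  # skewed classes

# Simple random selection: rare classes may be under-represented or missed.
X_sr, _, y_sr, _ = train_test_split(X, y, train_size=500, random_state=5)

# Stratified random selection: per-class shares preserved within the sample.
X_st, _, y_st, _ = train_test_split(X, y, train_size=500, stratify=y, random_state=5)

print("simple:    ", np.bincount(y_sr))
print("stratified:", np.bincount(y_st))
```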

Class distribution and balance

Class distribution and balance are additional aspects related to sampling design. Class distribution refers to how training sampling units are allocated among classes (Z. Zhu et al., Citation2016). Essentially, sampling units can be distributed equally or unequally among classes. In the equal distribution, all classes have the same amount of training sampling units, whereas in the unequal distribution the amounts differ. Typically, the number of sampling units of a class in an unequal distribution is proportional to the area of that class in the study region (Hermosilla et al., Citation2022). Such an approach is also known as proportional distribution. Other unequal distributions might occur as a result of simple random sampling (Pierdicca et al., Citation2014). Class balance, on the other hand, refers to the difference between the amounts of training sampling units of the majority and minority classes (Z. Zhu et al., Citation2016). A sample is considered imbalanced if such a difference is very large.

Configuration of class distribution and balance in the training dataset can affect classification accuracy. Multiple studies adopted a balanced and equally distributed sample (X. Huang et al., Citation2015; Jia et al., Citation2020; Rybicki et al., Citation2020; T. Zeng et al., Citation2019), which tends to equalize the priorities of each class during training, thereby avoiding poor predictive performance for classes with a comparatively small amount of training sampling units. However, studies that compared equal and proportional configurations of class distribution reported that superior accuracies were consistently achieved using a proportional distribution (Colditz, Citation2015; Nguyen et al., Citation2020; Preidl et al., Citation2020; Shetty et al., Citation2021; Z. Zhu et al., Citation2016). Proportions were typically derived from reference LC datasets (Hermosilla et al., Citation2022; Venter & Sydenham, Citation2021). Despite exhibiting higher overall accuracy, a proportional distribution may result in an imbalanced sample, which may cause degradation of the accuracy of the minority class. Such degradation can be explained by the learning process of some classifiers, which focuses on minimizing the overall error rate, ignoring class performance. Hence, due to having fewer training sampling units, minority classes normally have less influence on the accuracy, which causes them to be overlooked in the error minimization process (Maxwell et al., Citation2018).

In order to prevent imbalance in the sample, some studies adapted the proportional distribution approach, fixing a limit of minimum and maximum sampling units per class (Fonte et al., Citation2020; Z. Zhang et al., Citation2022; Z. Zhu et al., Citation2016). However, studies lacked a proper assessment of how such limits affected the accuracy of minority classes. Other alternatives to address sample imbalance proposed an oversampling of the minority class (De Simone et al., Citation2022; Maxwell et al., Citation2018) and/or an undersampling of the majority class (Naboureh et al., Citation2020). A balanced and equally distributed training sample was reported to yield a significantly higher accuracy for the minority class, despite the slightly lower overall accuracy (Shetty et al., Citation2021). With respect to sample size, there was evidence that the positive effect of increasing sample size was greater on samples with an equal distribution in comparison with samples following a proportional configuration (Z. Zhu et al., Citation2016).
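A capped proportional allocation of the kind described above might look like the following sketch; the class areas, total sample size and minimum/maximum limits are hypothetical values chosen for illustration.

```python
import numpy as np

def allocate(class_areas: dict, total: int, n_min: int, n_max: int) -> dict:
    """Proportional-to-area allocation with a per-class floor and ceiling,
    mirroring the capped proportional strategies described above."""
    area_sum = sum(class_areas.values())
    alloc = {c: int(round(total * a / area_sum)) for c, a in class_areas.items()}
    # Note: applying the caps may make the final total deviate from `total`.
    return {c: min(max(n, n_min), n_max) for c, n in alloc.items()}

# Hypothetical class areas (km^2) for a five-class nomenclature.
areas = {"forest": 5200, "cropland": 3100, "urban": 600,
         "water": 250, "wetland": 80}
print(allocate(areas, total=10000, n_min=200, n_max=4000))
```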

It is worth pointing out the importance of considering the class distribution of the validation or test sample in the accuracy assessment. Different distributions of the test sample might result in distinct accuracy results, as demonstrated by Heydari and Mountrakis (Citation2018). The same study showed that a classifier trained with a proportional training dataset exhibited considerably higher overall accuracies when the accuracy assessment was conducted with a proportional test dataset in comparison to an equally distributed test set. In contrast, a training dataset with equal class allocation exhibited no significant difference in overall accuracy between the assessments conducted with equal and proportional distributions in the test set. In summary, when comparing different configurations of training sample class distribution, using a test set with an equal class distribution seems to be the most reasonable approach, since a test set with a proportional allocation tends to inflate the results for the proportional training sample. However, among the studies evaluated in this part of the review, only a few explained their test sample distribution. As a result, it was not possible to analyse to what extent the proportional training approach could have been favoured by proportional test sets. Nonetheless, there was evidence that the training dataset with proportional class distribution outperformed the equal allocation approach when the accuracy assessment was conducted with an equal allocation test set (Colditz, Citation2015; Shetty et al., Citation2021). Regardless of the comparative studies, test sample distribution should be chosen according to the study objectives. Stehman and Foody (Citation2019) presented an extensive discussion regarding accuracy assessment, including sampling design. They suggested, for instance, that an equal distribution should be used if all classes were intended to have similar user's accuracy and were considered equally important for the study objectives. Other test sample distribution configurations can be used if the aim is to determine an overall scene accuracy.

Advanced learning techniques

This research topic relates to specific learning strategies employed in supervised classification of land cover. Techniques were grouped as active learning, semi-supervised learning, their adaptations, and transfer learning. Active learning relies on the interaction between model and analyst to collect additional training sampling units iteratively, hence reducing the cost of manual labelling. The model conducts a classification and provides the analyst with unseen examples selected by a query criterion (e.g. classification uncertainty). Next, the analyst is responsible for labelling these examples, which are then incorporated into the previous training sample. This process is repeated through multiple iterations. The purpose of active learning is to create a training sample based on the selection of the most informative training sampling units, which can maximize generalization capabilities (Tuia et al., Citation2011). A few studies included in this review demonstrated the potential of active learning for land cover classification (Amor et al., Citation2018; Jiayi Li, Chang, et al., Citation2020). Other studies proposed adaptations to the usual active learning procedure, such as labelling additional training sampling units in an automatic fashion (X. Huang et al., Citation2015; Kim et al., Citation2017). Moreover, some studies adopted query criteria commonly used in active learning, such as classification confidence and uncertainty, to expand the training sample by selecting informative sampling units (Jiayi Li, Hu, et al., Citation2019; J. Wang et al., Citation2015; S. Zeng et al., Citation2018), despite not strictly following the active learning process.
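An uncertainty-based active learning loop can be sketched as below; since no analyst is available in a toy example, the known labels of the queried units stand in for manual annotation, and the query batch size and iteration count are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=8000, n_features=10, n_informative=8,
                           n_classes=4, random_state=2)
rng = np.random.default_rng(2)

labelled = rng.choice(y.size, size=100, replace=False)   # small initial sample
pool = np.setdiff1d(np.arange(y.size), labelled)

for it in range(5):
    clf = RandomForestClassifier(n_estimators=100, random_state=2)
    clf.fit(X[labelled], y[labelled])
    proba = clf.predict_proba(X[pool])
    # Query criterion: least confident predictions (lowest maximum probability).
    query = pool[np.argsort(proba.max(axis=1))[:50]]
    # In a real workflow an analyst labels the queried units; here the known
    # labels y[query] stand in for the analyst's annotation.
    labelled = np.concatenate([labelled, query])
    pool = np.setdiff1d(pool, query)
    print(f"iteration {it}: {labelled.size} labelled units")
```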

Semi-supervised learning, on the other hand, proposes to expand the training sample by incorporating pseudo-labels obtained from unlabelled data after an initial classification conducted with a small, human-labelled training dataset. Then, the pseudo-labelled sampling units with the highest classification confidence are added to the initial training sample, forming a new training dataset. Such an expanded training set can, thereby, leverage the predictive capabilities of future classifications. A variety of studies examined in this review successfully demonstrated the potential of this approach (E. Li et al., Citation2015; Maulik & Chakraborty, Citation2013; Naboureh et al., Citation2020). Other studies combined principles of semi-supervised and active learning to expand the training sample and improve the classification results (Jiayi Li, Chang, et al., Citation2020; Li, Gamba, et al., Citation2014). Different approaches relied on unsupervised clustering to provide pseudo-labelled data to expand the training sample (J. Hu et al., Citation2019; Langford et al., Citation2019; Paris et al., Citation2022; Q. Zhang et al., Citation2023). A lower resolution land cover product was used to transfer samples and then generate pseudo-labels for classifying higher resolution images (Y. Chen et al., Citation2023). Some approaches required no initial training sample, being convenient in situations where the collection of training data is unfeasible (H. Li et al., Citation2017).
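A minimal self-training sketch of this idea follows, assuming a confidence cut-off for accepting pseudo-labels; the cut-off value, data and classifier choice are illustrative rather than drawn from any single reviewed study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=8000, n_features=10, n_informative=8,
                           n_classes=4, random_state=4)
rng = np.random.default_rng(4)

labelled = rng.choice(y.size, size=200, replace=False)   # small human-labelled set
unlabelled = np.setdiff1d(np.arange(y.size), labelled)

X_tr, y_tr = X[labelled], y[labelled]
for it in range(3):
    clf = RandomForestClassifier(n_estimators=100, random_state=4).fit(X_tr, y_tr)
    proba = clf.predict_proba(X[unlabelled])
    confident = proba.max(axis=1) > 0.9          # confidence cut-off (illustrative)
    pseudo = unlabelled[confident]
    if pseudo.size == 0:
        break
    # Expand the training set with the most confident pseudo-labelled units.
    X_tr = np.vstack([X_tr, X[pseudo]])
    y_tr = np.concatenate([y_tr, clf.classes_[proba[confident].argmax(axis=1)]])
    unlabelled = unlabelled[~confident]
    print(f"iteration {it}: added {pseudo.size} pseudo-labels")
```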

Multiple studies employed transfer learning in order to classify land cover. Transfer learning is a machine learning technique that aims to facilitate knowledge transfer between distinct yet related problems or tasks (Bejiga et al., Citation2019). Among the different transfer learning techniques, domain adaptation (DA) can be considered one of the most suitable for LC classification problems. DA consists in exploiting information from a source domain to build models that can provide good performances on a target domain (Banerjee et al., Citation2015). In remote sensing image classification, source and target domains can be understood as images acquired on distinct geographic locations or on the same location at different points in time (Tuia et al., Citation2016). In this review, a group of studies proposed to use available labelled samples from a reference year (source domain) to create a training sample and perform classification in a target year (target domain) (Lin et al., Citation2019; Zhong et al., Citation2021; Q. Zhu et al., Citation2021). Since changes in LC might occur between the reference and target years, studies adopted strategies to mitigate mislabelling when transferring the labels. Samples would undergo change detection procedures based on, for instance, spectral angle distance (Ghorbanian et al., Citation2020; H. Huang et al., Citation2020), change vector analysis (X. Chen et al., Citation2012; Lin et al., Citation2019; Liu et al., Citation2020) and multivariate alteration detection (Wessels et al., Citation2016), to ensure that only labels of samples considered unchanged were transferred from the source to the target domain. With respect to the spatial domain, classifiers trained with samples from a given geographical area (source domain) were used to classify land cover types of a different area (target domain) (Q. Li et al., Citation2020). Some studies coupled transfer learning with elements of semi-supervised learning, aiming at deriving pseudo-labels for the target domain based on information from the source domain with the purpose of training a model for the target domain (Hao Li et al., Citation2021; Tong et al., Citation2020, Citation2023). Transfer learning was also coupled with active learning to update an LC map (Demir et al., Citation2013). A distinct type of DA found in the review proposed to analyse the source and target domains in order to identify invariant classification features, meaning features that are discriminative regardless of the domain (Bejiga et al., Citation2019; Ye et al., Citation2017). Although this approach focused more on the features, it can leverage the potential of existing labelled samples, extending their contribution to other domains.
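As an illustration of label transfer between a reference and a target year, the sketch below keeps only labels whose pixels appear spectrally unchanged according to the spectral angle; the synthetic images and the angle threshold are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

def spectral_angle(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-pixel spectral angle (radians) between two band stacks (H, W, B)."""
    dot = (a * b).sum(axis=-1)
    norm = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-12
    return np.arccos(np.clip(dot / norm, -1.0, 1.0))

rng = np.random.default_rng(9)
img_src = rng.random((300, 300, 6))                       # reference-year image
img_tgt = img_src + rng.normal(0, 0.05, img_src.shape)    # target-year image
labels_src = rng.integers(1, 5, size=(300, 300))          # reference-year labels

# Transfer only labels whose pixels appear spectrally unchanged; the angle
# threshold is illustrative and scene-dependent.
unchanged = spectral_angle(img_src, img_tgt) < 0.1
labels_tgt = np.where(unchanged, labels_src, 0)           # 0 = needs new labelling
print(unchanged.mean())
```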

Overall, active, semi-supervised and transfer learning techniques can contribute to alleviate the need for extensive training data, which is a limiting factor for the success of supervised image classification.

Conclusion, limitations and future directions

This literature review provided an overview of the current research on training data in the context of LC classification. Using a protocol of identification and selection of relevant documents, 114 peer-reviewed articles were analysed, allowing for the identification of four major research topics, namely construction of the training dataset, sample quality, sampling design and advanced learning techniques. Included documents were characterized according to each research topic, with an identification of their subtopics and categories. The review revealed the main types of studies and approaches that have been developed within the research topics.

For the construction of the training dataset, three main sample collection methods, manual, automatic and hybrid, were identified. Manual methods included photointerpretation and field surveys, while automatic methods involved extraction assisted by pre-existing datasets and spectral statistics, with the former being the most usual approach. Hybrid methods that integrated both manual and automatic approaches were also identified. In terms of sample quality, the review examined the influence of noise in training datasets, with divergent findings on its impact. Most studies utilized the Random Forest classifier, known for its robustness to noise, thus potentially biasing the results. Sample cleaning procedures were proposed to prevent mislabelling. Common filtering protocols included intersection of multiple datasets, performing change detection, eroding borders and use of spectral thresholds. Sampling design considerations encompassed sample size, sampling method, class distribution and class balance. Overall, studies showed a positive correlation between sample size and classification accuracy and stratified random sampling exhibited better results. Balanced samples were favoured, as imbalanced ones tended to degrade minority class accuracy. Proportional distribution consistently yielded superior results. The review also identified advanced learning techniques, which included active learning, semi-supervised learning and transfer learning. These showed promising results, especially in situations in which the collection of a sufficient amount of training data through conventional approaches was difficult.

While this review provided a valuable organization and synthesis of the research in the field, certain limitations were inherent in our methodological approach. The analysis of most topics was limited to identifying and presenting existing approaches, rather than performing a quantitative comparative analysis in order to provide guidelines. This was due not only to the lack of comparative studies, but also to the impossibility of comparing approaches from distinct studies, which used different class nomenclatures, classifiers, sample sizes and validation strategies. We believe that future studies could leverage the approaches catalogued in this review to conduct a comparative analysis aiming at identifying the most promising methods. Furthermore, contrary to our expectations, the reviewed studies did not address a fundamental question about training data: sample representativeness. There was insufficient material to discuss whether a homogeneous training data composition is more or less advantageous. Similarly, studies lacked an assessment of how homogeneous a sample collected by automatic methods could be. To this end, and considering the current landscape where automatic collection seems predominant, we believe future studies could investigate to what extent large training samples collected by automatic methods are representative of the classes and whether those samples are superior to a smaller, manually collected sample designed to ensure class diversity. Lastly, although the Random Forest classifier seemed to be preferred among researchers, it would be beneficial to have studies testing a more diverse set of classification algorithms.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data were collected from the Scopus website and discussed in the data collection section.

Additional information

Funding

This research was funded by Fundação para a Ciência e Tecnologia [FCT] grant number [PRT/BD/153517/2021], the Forest Research Centre and Associated Laboratory TERRA [UIDB/00239/2020]. Mário Caetano acknowledges the financial support provided by Fundação para a Ciência e a Tecnologia, Portugal [FCT] under the project [UIDB/04152/2020] - Centro de Investigação em Gestão de Informação [MagIC].

References

  • Álvarez-Martínez, J. M., Silió-Calzada, A., & Barquín, J. (2018). Can training data counteract topographic effects in supervised image classification? A sensitivity analysis in the Cantabrian Mountains (Spain). International Journal of Remote Sensing, 39(23), 8646–16. https://doi.org/10.1080/01431161.2018.1489163
  • Amor, I. B. S. B., Chehata, N., Bailly, J. S., Farah, I. R., & Lagacherie, P. (2018). Parcel-based active learning for large extent cultivated area mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(1), 79–88. https://doi.org/10.1109/JSTARS.2017.2751148
  • Aswatha, S. M., Mukherjee, J., Biswas, P. K., & Aikat, S. (2017). Toward automated land cover classification in landsat images using spectral slopes at different bands. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(3), 1096–1104. https://doi.org/10.1109/JSTARS.2016.2602390
  • Banerjee, B., Bovolo, F., Bhattacharya, A., Bruzzone, L., Chaudhuri, S., & Buddhiraju, K. M. (2015). A novel graph-matching-based approach for domain adaptation in classification of remote sensing image pair. IEEE Transactions on Geoscience and Remote Sensing, 53(7), 4045–4062. https://doi.org/10.1109/TGRS.2015.2389520
  • Banks, S., Millard, K., Pasher, J., Richardson, M., Wang, H., & Duffe, J. (2015). Assessing the potential to operationalize shoreline sensitivity mapping: Classifying multiple wide fine quadrature polarized RADARSAT-2 and landsat 5 scenes with a single random forest model. Remote Sensing, 7(10), 13528–13563. https://doi.org/10.3390/rs71013528
  • Bejiga, M., Melgani, F., & Beraldini, P. (2019). Domain adversarial neural networks for large-scale land cover classification. Remote Sensing, 11(10), 1–20. https://doi.org/10.3390/rs11101153
  • Belgiu, M., & Csillik, O. (2018). Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sensing of Environment, 204, 509–523. https://doi.org/10.1016/j.rse.2017.10.005
  • Bratic, G., Oxoli, D., & Brovelli, M. A. (2023). Map of land cover agreement: Ensambling existing datasets for large-scale training data provision. Remote Sensing, 15(15), 3774. https://doi.org/10.3390/rs15153774
  • Brown, J. F., Tollerud, H. J., Barber, C. P., Zhou, Q., Dwyer, J. L., Vogelmann, J. E., Loveland, T. R., Woodcock, C. E., Stehman, S. V., Zhu, Z., Pengra, B. W., Smith, K., Horton, J. A., Xian, G., Auch, R. F., Sohl, T. L., Sayler, K. L., Gallant, A. L., Zelenak, D., & Rover, J. (2020). Lessons learned implementing an operational continuous United States national land change monitoring capability: The Land Change Monitoring, Assessment, and Projection (LCMAP) approach. Remote Sensing of Environment, 238, 111356. https://doi.org/10.1016/j.rse.2019.111356
  • Chaves, M. E., Picoli, M. C., & Sanches, I. D. (2020). Recent applications of landsat 8/OLI and sentinel-2/MSI for land use and land cover mapping: A systematic review. Remote Sensing, 12(18), 3062. https://doi.org/10.3390/rs12183062
  • Chen, X., Chen, J., Shi, Y., & Yamaguchi, Y. (2012). An automated approach for updating land cover maps based on integrated change detection and classification methods. ISPRS Journal of Photogrammetry and Remote Sensing, 71, 86–95. https://doi.org/10.1016/j.isprsjprs.2012.05.006
  • Chen, Y., Song, X., Wang, S., Huang, J., & Mansaray, L. R. (2016). Impacts of spatial heterogeneity on crop area mapping in Canada using MODIS data. ISPRS Journal of Photogrammetry and Remote Sensing, 119, 451–461. https://doi.org/10.1016/j.isprsjprs.2016.07.007
  • Chen, Y., Zhang, G., Cui, H., Li, X., Hou, S., Ma, J., Li, Z., Li, H., & Wang, H. (2023). A novel weakly supervised semantic segmentation framework to improve the resolution of land cover product. ISPRS Journal of Photogrammetry and Remote Sensing, 196, 73–92. https://doi.org/10.1016/j.isprsjprs.2022.12.027
  • Cherif, E., Hell, M., & Brandmeier, M. (2022). DeepForest: Novel deep learning models for land use and land cover classification using multi-temporal and -modal sentinel data of the Amazon Basin. Remote Sensing, 14(19), 5000. https://doi.org/10.3390/rs14195000
  • Colditz, R. R. (2015). An evaluation of different training sample allocation schemes for discrete and continuous land cover classification using decision tree-based algorithms. Remote Sensing, 7(8), 9655–9681. https://doi.org/10.3390/rs70809655
  • Costa, H., Benevides, P., Moreira, F. D., Moraes, D., & Caetano, M. (2022). Spatially stratified and multi-stage approach for national land cover mapping based on sentinel-2 data and expert knowledge. Remote Sensing, 14(8), 1865. https://doi.org/10.3390/rs14081865
  • Dash, P., Sanders, S. L., Parajuli, P., & Ouyang, Y. (2023). Improving the accuracy of land use and land cover classification of landsat data in an agricultural watershed. Remote Sensing, 15(16), 1–24. https://doi.org/10.3390/rs15164020
  • Demir, B., Bovolo, F., & Bruzzone, L. (2013). Updating land-cover maps by classification of image time series: A novel change-detection-driven transfer learning approach. IEEE Transactions on Geoscience and Remote Sensing, 51(1), 300–312. https://doi.org/10.1109/TGRS.2012.2195727
  • De Simone, L., Ouellette, W., & Gennari, P. (2022). Operational use of EO data for national land cover official statistics in Lesotho. Remote Sensing, 14(14), 3294. https://doi.org/10.3390/rs14143294
  • Fonte, C. C., Patriarca, J., Jesus, I., & Duarte, D. (2020). Automatic extraction and filtering of openstreetmap data to generate training datasets for land use land cover classification. Remote Sensing, 12(20), 1–31. https://doi.org/10.3390/rs12203428
  • Foody, G. M., & Arora, M. K. (1997). An evaluation of some factors affecting the accuracy of classification by an artificial neural network. International Journal of Remote Sensing, 18(4), 799–810. https://doi.org/10.1080/014311697218764
  • Ghassemi, B., Dujakovic, A., Żółtak, M., Immitzer, M., Atzberger, C., & Vuolo, F. (2022). Designing a European-wide crop type mapping approach based on machine learning algorithms using LUCAS field survey and sentinel-2 data. Remote Sensing, 14(3), 541. https://doi.org/10.3390/rs14030541
  • Ghorbanian, A., Kakooei, M., Amani, M., Mahdavi, S., Mohammadzadeh, A., & Hasanlou, M. (2020). Improved land cover map of Iran using sentinel imagery within google earth engine and a novel automatic workflow for land cover classification using migrated training samples. ISPRS Journal of Photogrammetry and Remote Sensing, 167, 276–288. https://doi.org/10.1016/j.isprsjprs.2020.07.013
  • Gómez, C., White, J. C., & Wulder, M. A. (2016). Optical remotely sensed time series data for land cover classification: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 116, 55–72. https://doi.org/10.1016/j.isprsjprs.2016.03.008
  • Graesser, J., Stanimirova, R., Tarrio, K., Copati, E. J., Volante, J. N., Verón, S. R., Banchero, S., Elena, H., Abelleyra, D. D., & Friedl, M. A. (2022). Temporally-consistent annual land cover from landsat time series in the southern cone of South America. Remote Sensing, 14(16), 1–28. https://doi.org/10.3390/rs14164005
  • Hermosilla, T., Wulder, M. A., White, J. C., & Coops, N. C. (2022). Land cover classification in an era of big and open data: Optimizing localized implementation and training data selection to improve mapping outcomes. Remote Sensing of Environment, 268, 112780. https://doi.org/10.1016/j.rse.2021.112780
  • Herold, M., Latham, J. S., DiGregorio, A., & Schmullius, C. C. (2006). Evolving standards in land cover characterization. Journal of Land Use Science, 1(2–4), 157–168. https://doi.org/10.1080/17474230601079316
  • He, T., Xie, C., Liu, Q., Guan, S., & Liu, G. (2019). Evaluation and comparison of random forest and A-LSTM networks for large-scale winter wheat identification. Remote Sensing, 11(14), 1665. https://doi.org/10.3390/rs11141665
  • Heydari, S. S., & Mountrakis, G. (2018). Effect of classifier selection, reference sample size, reference class distribution and scene heterogeneity in per-pixel classification accuracy using 26 landsat sites. Remote Sensing of Environment, 204, 648–658. https://doi.org/10.1016/j.rse.2017.09.035
  • Howard, D. M., Wylie, B. K., & Tieszen, L. L. (2012). Crop classification modelling using remote sensing and environmental data in the Greater Platte River Basin, USA. International Journal of Remote Sensing, 33(19), 6094–6108. https://doi.org/10.1080/01431161.2012.680617
  • Huang, C., Davis, L. S., & Townshend, J. R. G. (2002). An assessment of support vector machines for land cover classification. International Journal of Remote Sensing, 23(4), 725–749. https://doi.org/10.1080/01431160110040323
  • Huang, H., Wang, J., Liu, C., Liang, L., Li, C., & Gong, P. (2020). The migration of training samples towards dynamic global land cover mapping. ISPRS Journal of Photogrammetry and Remote Sensing, 161, 27–36. https://doi.org/10.1016/j.isprsjprs.2020.01.010
  • Huang, X., Weng, C., Lu, Q., Feng, T., & Zhang, L. (2015). Automatic labelling and selection of training samples for high-resolution remote sensing image classification over urban areas. Remote Sensing, 7(12), 16024–16044. https://doi.org/10.3390/rs71215819
  • Hu, J., Hong, D., & Zhu, X. X. (2019). MIMA: MAPPER-Induced manifold alignment for semi-supervised fusion of optical image and polarimetric sar data. IEEE Transactions on Geoscience and Remote Sensing, 57(11), 9025–9040. https://doi.org/10.1109/TGRS.2019.2924113
  • Hu, T., Huang, X., Li, J., & Zhang, L. (2018). A novel co-training approach for urban land cover mapping with unclear landsat time series imagery. Remote Sensing of Environment, 217, 144–157. https://doi.org/10.1016/j.rse.2018.08.017
  • Hu, Y., Zhang, Q., Zhang, Y., & Yan, H. (2018). A deep convolution neural network method for land cover mapping: A case study of Qinhuangdao, China. Remote Sensing, 10(12), 2053. https://doi.org/10.3390/rs10122053
  • Jamshidpour, N., Safari, A., & Homayouni, S. (2020). A GA-based multi-view, multi-learner active learning framework for hyperspectral image classification. Remote Sensing, 12(2), 297. https://doi.org/10.3390/rs12020297
  • Jia, X., Khandelwal, A., Carlson, K. M., Gerber, J. S., West, P. C., Samberg, L. H., & Kumar, V. (2020). Automated plantation mapping in southeast asia using MODIS data and imperfect visual annotations. Remote Sensing, 12(4), 636. https://doi.org/10.3390/rs12040636
  • Jing, W., Yang, Y., Yue, X., & Zhao, X. (2015). Mapping urban areas with integration of DMSP/OLS nighttime light and MODIS data using machine learning techniques. Remote Sensing, 7(9), 12419–12439. https://doi.org/10.3390/rs70912419
  • Jin, Q., Xu, E., & Zhang, X. (2022). A fusion method for multisource land cover products based on superpixels and statistical extraction for enhancing resolution and improving accuracy. Remote Sensing, 14(7), 1676. https://doi.org/10.3390/rs14071676
  • Kim, Y., Park, N.-W., & Lee, K.-D. (2017). Self-learning based land-cover classification using sequential class patterns from past land-cover maps. Remote Sensing, 9(9), 921. https://doi.org/10.3390/rs9090921
  • Knorn, J., Rabe, A., Radeloff, V. C., Kuemmerle, T., Kozak, J., & Hostert, P. (2009). Land cover mapping of large areas using chain classification of neighboring landsat satellite images. Remote Sensing of Environment, 113(5), 957–964. https://doi.org/10.1016/j.rse.2009.01.010
  • Langford, Z. L., Kumar, J., Hoffman, F. M., Breen, A. L., & Iversen, C. M. (2019). Arctic vegetation mapping using unsupervised training datasets and convolutional neural networks. Remote Sensing, 11(1), 1–23. https://doi.org/10.3390/rs11010069
  • Leinenkugel, P., Deck, R., Huth, J., Ottinger, M., & Mack, B. (2019). The potential of open geodata for automated large-scale land use and land cover classification. Remote Sensing, 11(19), 2249. https://doi.org/10.3390/rs11192249
  • Li, E., Du, P., Samat, A., Xia, J., & Che, M. (2015). An automatic approach for urban land-cover classification from landsat-8 OLI data. International Journal of Remote Sensing, 36(24), 5983–6007. https://doi.org/10.1080/01431161.2015.1109726
  • Li, J., Gamba, P., & Plaza, A. (2014). A novel semi-supervised method for obtaining finer resolution urban extents exploiting coarser resolution maps. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(10), 4276–4287. https://doi.org/10.1109/JSTARS.2014.2355843
  • Li, C., Gong, P., Wang, J., Yuan, C., Hu, T., Wang, Q., Yu, L., Clinton, N., Li, M., Guo, J., Feng, D., Huang, C., Zhan, Z., Wang, X., Xu, B., Nie, Y., & Hackman, K. (2016). An all-season sample database for improving land-cover mapping of Africa with two classification schemes. International Journal of Remote Sensing, 37(19), 4623–4647. https://doi.org/10.1080/01431161.2016.1213923
  • Li, J., Huang, X., & Chang, X. (2020). A label-noise robust active learning sample collection method for multi-temporal urban land-cover classification and change analysis. ISPRS Journal of Photogrammetry and Remote Sensing, 163, 1–17. https://doi.org/10.1016/j.isprsjprs.2020.02.022
  • Li, J., Huang, X., Hu, T., Jia, X., & Benediktsson, J. A. (2019). A novel unsupervised sample collection method for urban land-cover mapping using landsat imagery. IEEE Transactions on Geoscience and Remote Sensing, 57(6), 3933–3951. https://doi.org/10.1109/TGRS.2018.2889109
  • Li, H., Li, J., Zhao, Y., Gong, M., Zhang, Y., & Liu, T. (2021). Cost-sensitive self-paced learning with adaptive regularization for classification of image time series. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 11713–11727. https://doi.org/10.1109/JSTARS.2021.3127754
  • Li, C., Ma, Z., Wang, L., Yu, W., Tan, D., Gao, B., Feng, Q., Guo, H., & Zhao, Y. (2021). Improving the accuracy of land cover mapping by distributing training samples. Remote Sensing, 13(22), 4594. https://doi.org/10.3390/rs13224594
  • Lin, C., Du, P., Samat, A., Li, E., Wang, X., & Xia, J. (2019). Automatic updating of land cover maps in rapidly urbanizing regions by relational knowledge transferring from Globeland30. Remote Sensing, 11(12), 1397. https://doi.org/10.3390/rs11121397
  • Li, Q., Qiu, C., Ma, L., Schmitt, M., & Zhu, X. X. (2020). Mapping the land cover of Africa at 10 m resolution from multi-source remote sensing data with google earth engine. Remote Sensing, 12(4), 1–22. https://doi.org/10.3390/rs12040602
  • Liu, D., Chen, N., Zhang, X., Wang, C., & Du, W. (2020). Annual large-scale urban land mapping based on landsat time series in google earth engine and OpenStreetMap data: A case study in the middle Yangtze River basin. ISPRS Journal of Photogrammetry and Remote Sensing, 159, 337–351. https://doi.org/10.1016/j.isprsjprs.2019.11.021
  • Li, C., Wang, J., Wang, L., Hu, L., & Gong, P. (2014). Comparison of classification algorithms and training sample sizes in urban land classification with landsat thematic mapper imagery. Remote Sensing, 6(2), 964–983. https://doi.org/10.3390/rs6020964
  • Li, H., Wang, C., Zhong, C., Zhang, Z., & Liu, Q. (2017). Mapping typical urban LULC from landsat imagery without training samples or self-defined parameters. Remote Sensing, 9(7), 1–23. https://doi.org/10.3390/rs9070700
  • Li, C., Xian, G., Zhou, Q., & Pengra, B. W. (2021). A novel automatic phenology learning (APL) method of training sample selection using multiple datasets for time-series land cover mapping. Remote Sensing of Environment, 266, 112670. https://doi.org/10.1016/j.rse.2021.112670
  • Ma, X., Tong, X., Liu, S., Luo, X., Xie, H., & Li, C. (2017). Optimized sample selection in SVM classification by combining with DMSP-OLS, landsat NDVI and GlobeLand30 products for extracting urban built-up areas. Remote Sensing, 9(3), 236. https://doi.org/10.3390/rs9030236
  • Maulik, U., & Chakraborty, D. (2013). Learning with transductive SVM for semisupervised pixel classification of remote sensing imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 77, 66–78. https://doi.org/10.1016/j.isprsjprs.2012.12.003
  • Maxwell, A. E., Warner, T. A., & Fang, F. (2018). Implementation of machine-learning classification in remote sensing: An applied review. International Journal of Remote Sensing, 39(9), 2784–2817. https://doi.org/10.1080/01431161.2018.1433343
  • Mellor, A., Boukir, S., Haywood, A., & Jones, S. (2015). Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin. ISPRS Journal of Photogrammetry and Remote Sensing, 105, 155–168. https://doi.org/10.1016/j.isprsjprs.2015.03.014
  • Naboureh, A., Li, A., Bian, J., Lei, G., & Amani, M. (2020). A hybrid data balancing method for classification of imbalanced training data within google earth engine: Case studies from mountainous regions. Remote Sensing, 12(20), 1–21. https://doi.org/10.3390/rs12203301
  • Nguyen, L. H., Joshi, D. R., Clay, D. E., & Henebry, G. M. (2020). Characterizing land cover/land use from multiple years of landsat and MODIS time series: A novel approach using land surface phenology modeling and random forest classifier. Remote Sensing of Environment, 238, 111017. https://doi.org/10.1016/j.rse.2018.12.016
  • Paris, C., & Bruzzone, L. (2021). A novel approach to the unsupervised extraction of reliable training samples from thematic products. IEEE Transactions on Geoscience and Remote Sensing, 59(3), 1930–1948. https://doi.org/10.1109/TGRS.2020.3001004
  • Paris, C., Gasparella, L., & Bruzzone, L. (2022). A scalable high-performance unsupervised system for producing large-scale High Resolution (HR) land cover Maps: The Italian Country Case Study. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15, 9146–9159. https://doi.org/10.1109/JSTARS.2022.3209902
  • Pelletier, C., Valero, S., Inglada, J., Champion, N., Sicre, C. M., & Dedieu, G. (2017). Effect of training class label noise on classification performances for land cover mapping with satellite image time series. Remote Sensing, 9(2), 173. https://doi.org/10.3390/rs9020173
  • Pierdicca, N., Chini, M., & Pelliccia, F. (2014). The contribution of SIASGE radar data integrated with optical images to support thematic mapping at regional scale. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(7), 2821–2833. https://doi.org/10.1109/JSTARS.2014.2330744
  • Preidl, S., Lange, M., & Doktor, D. (2020). Introducing APiC for regionalised land cover mapping on the national scale using sentinel-2A imagery. Remote Sensing of Environment, 240, 111673. https://doi.org/10.1016/j.rse.2020.111673
  • Radoux, J., Lamarche, C., Van Bogaert, E., Bontemps, S., Brockmann, C., & Defourny, P. (2014). Automated training sample extraction for global land cover mapping. Remote Sensing, 6(5), 3965–3987. https://doi.org/10.3390/rs6053965
  • Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., & Rigol-Sanchez, J. P. (2012). An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS Journal of Photogrammetry and Remote Sensing, 67(1), 93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002
  • Rybicki, M., Gromny, E., Malinowski, R., Lewiński, S., Jenerowicz, M., Michał, K., Nowakowski, A., Wojtkowski, C., Krupiński, M., Kraetzschmar, E., & Schauer, P. (2020). Automated production of a land cover/use map of Europe based on sentinel-2 imagery. Remote Sensing, 12(21), 1–25. https://doi.org/10.3390/rs12213523
  • Santos, L. A., Ferreira, K. R., Camara, G., Picoli, M. C. A., & Simoes, R. E. (2021). Quality control and class noise reduction of satellite image time series. ISPRS Journal of Photogrammetry and Remote Sensing, 177, 75–88. https://doi.org/10.1016/j.isprsjprs.2021.04.014
  • Shao, Y., & Lunetta, R. S. (2012). Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing, 70, 78–87. https://doi.org/10.1016/j.isprsjprs.2012.04.001
  • Shetty, S., Gupta, P. K., Belgiu, M., & Srivastav, S. K. (2021). Assessing the effect of training sampling design on the performance of machine learning classifiers for land cover mapping using multi-temporal remote sensing data and Google earth engine. Remote Sensing, 13(8), 1433. https://doi.org/10.3390/rs13081433
  • Song, X., Duan, Z., & Jiang, X. (2012). Comparison of artificial neural networks and support vector machine classifiers for land cover classification in Northern China using a SPOT-5 HRG image. International Journal of Remote Sensing, 33(10), 3301–3320. https://doi.org/10.1080/01431161.2011.568531
  • Stehman, S. V., & Foody, G. M. (2019). Key issues in rigorous accuracy assessment of land cover products. Remote Sensing of Environment, 231, 111199. https://doi.org/10.1016/j.rse.2019.05.018
  • Sun, X., Li, L., Zhang, B., Chen, D., & Gao, L. (2015). Soft urban water cover extraction using mixed training samples and support vector machines. International Journal of Remote Sensing, 36(13), 3331–3344. https://doi.org/10.1080/01431161.2015.1042594
  • Talukdar, S., Singha, P., Mahato, S., Shahfahad, Pal, S., Liou, Y. A., & Rahman, A. (2020). Land-use land-cover classification by machine learning classifiers for satellite observations—a review. Remote Sensing, 12(7), 1135. https://doi.org/10.3390/rs12071135
  • Tong, X. Y., Xia, G. S., Lu, Q., Shen, H., Li, S., You, S., & Zhang, L. (2020). Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sensing of Environment, 237, 111322. https://doi.org/10.1016/j.rse.2019.111322
  • Tong, X. Y., Xia, G. S., & Zhu, X. X. (2023). Enabling country-scale land cover mapping with meter-resolution satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 196, 178–196. https://doi.org/10.1016/j.isprsjprs.2022.12.011
  • Tuia, D., Persello, C., & Bruzzone, L. (2016). Domain adaptation for the classification of remote sensing data: An overview of recent advances. IEEE Geoscience and Remote Sensing Magazine, 4(2), 41–57. https://doi.org/10.1109/MGRS.2016.2548504
  • Tuia, D., Volpi, M., Copa, L., Kanevski, M., & Muñoz-Marí, J. (2011). A survey of active learning algorithms for supervised remote sensing image classification. IEEE Journal on Selected Topics in Signal Processing, 5(3), 606–617. https://doi.org/10.1109/JSTSP.2011.2139193
  • Venter, Z. S., & Sydenham, M. A. K. (2021). Continental-scale land cover mapping at 10 m resolution over Europe (Elc10). Remote Sensing, 13(12), 2301. https://doi.org/10.3390/rs13122301
  • Viana, C. M., Girão, I., & Rocha, J. (2019). Long-term satellite image time-series for land use/land cover change detection using refined open source data in a rural region. Remote Sensing, 11(9), 1104. https://doi.org/10.3390/rs11091104
  • Wang, S., DiTommaso, S., Faulkner, J., Friedel, T., Kennepohl, A., Strey, R., & Lobell, D. B. (2020). Mapping crop types in southeast India with smartphone crowdsourcing and deep learning. Remote Sensing, 12(18), 1–42. https://doi.org/10.3390/rs12182957
  • Wang, J., Li, C., Hu, L., Zhao, Y., Huang, H., & Gong, P. (2015). Seasonal Land Cover Dynamics in Beijing derived from landsat 8 data using a spatio-temporal contextual approach. Remote Sensing, 7(1), 865–881. https://doi.org/10.3390/rs70100865
  • Wessels, K. J., Bergh, F., Roy, D. P., Salmon, B. P., Steenkamp, K. C., MacAlister, B., Swanepoel, D., & Jewitt, D. (2016). Rapid land cover map updates using change detection and robust random forest classifiers. Remote Sensing, 8(11), 888. https://doi.org/10.3390/rs8110888
  • Wulder, M. A., Coops, N. C., Roy, D. P., White, J. C., & Hermosilla, T. (2018). Land cover 2.0. International Journal of Remote Sensing, 39(12), 4254–4284. https://doi.org/10.1080/01431161.2018.1452075
  • Wulder, M. A., White, J. C., Goward, S. N., Masek, J. G., Irons, J. R., Herold, M., Cohen, W. B., Loveland, T. R., & Woodcock, C. E. (2008). Landsat continuity: Issues and opportunities for land cover monitoring. Remote Sensing of Environment, 112(3), 955–969. https://doi.org/10.1016/j.rse.2007.07.004
  • Xie, S., Liu, L., & Yang, J. (2020). Time-series model-adjusted percentile features: Improved percentile features for land-cover classification based on landsat data. Remote Sensing, 12(18), 3091. https://doi.org/10.3390/rs12183091
  • Xie, S., Liu, L., Zhang, X., Yang, J., Chen, X., & Gao, Y. (2019). Automatic land-cover mapping using landsat time-series data based on google earth engine. Remote Sensing, 11(24), 3023. https://doi.org/10.3390/rs11243023
  • Yang, L., Jin, S., Danielson, P., Homer, C., Gass, L., Bender, S. M., Case, A., Costello, C., Dewitz, J., Fry, J., Funk, M., Granneman, B., Liknes, G. C., Rigge, M., & Xian, G. (2018). A new generation of the United States national land cover database: Requirements, research priorities, design, and implementation strategies. ISPRS Journal of Photogrammetry and Remote Sensing, 146, 108–123. https://doi.org/10.1016/j.isprsjprs.2018.09.006
  • Ye, M., Qian, Y., Zhou, J., & Tang, Y. Y. (2017). Dictionary learning-based feature-level domain adaptation for cross-scene hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(3), 1544–1562. https://doi.org/10.1109/TGRS.2016.2627042
  • Yifang, B., Gong, P., & Giri, C. (2015). Global land cover mapping using Earth observation satellite data: Recent progresses and challenges. ISPRS Journal of Photogrammetry and Remote Sensing, 103(1), 1–6. https://doi.org/10.1016/j.isprsjprs.2015.01.001
  • Zeng, S., Wang, Z., Gao, C., Kang, Z., & Feng, D. (2018). Hyperspectral image classification with global-local discriminant analysis and spatial-spectral context. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(12), 5005–5018. https://doi.org/10.1109/JSTARS.2018.2878336
  • Zeng, T., Wang, L., Zhang, Z., Wen, Q., Wang, X., & Yu, L. (2019). An integrated land cover mapping method suitable for low-accuracy areas in global land cover maps. Remote Sensing, 11(15), 1777. https://doi.org/10.3390/rs11151777
  • Zhang, J., Fu, Z., Zhu, Y., Wang, B., Sun, K., & Zhang, F. (2023). A high-performance automated large-area land cover mapping framework. Remote Sensing, 15(12), 1–22. https://doi.org/10.3390/rs15123143
  • Zhang, H. K., & Roy, D. P. (2017). Using the 500 m MODIS land cover product to derive a consistent continental scale 30 m landsat land cover classification. Remote Sensing of Environment, 197, 15–34. https://doi.org/10.1016/j.rse.2017.05.024
  • Zhang, Z., Tang, P., Hu, C., Liu, Z., Zhang, W., & Tang, L. (2022). Seeded classification of satellite image time series with lower-bounded dynamic time warping. Remote Sensing, 14(12), 1–25. https://doi.org/10.3390/rs14122778
  • Zhang, Q., Zhang, Z., Xu, N., & Li, Y. (2023). Fully automatic training sample collection for detecting multi-decadal inland/seaward urban sprawl. Remote Sensing of Environment, 298, 113801. https://doi.org/10.1016/j.rse.2023.113801
  • Zhao, Y., Feng, D., Yu, L., Wang, X., Chen, Y., Bai, Y., Hernández, H. J., Galleguillos, M., Estades, C., Biging, G. S., Radke, J. D., & Gong, P. (2016). Detailed dynamic land cover mapping of Chile: Accuracy improvement by integrating multi-temporal data. Remote Sensing of Environment, 183, 170–185. https://doi.org/10.1016/j.rse.2016.05.016
  • Zhong, B., Yang, A., Jue, K., & Wu, J. (2021). Long time series high-quality and high-consistency land cover mapping based on machine learning method at heihe river basin. Remote Sensing, 13(8), 1596. https://doi.org/10.3390/rs13081596
  • Zhou, Q., Tollerud, H., Barber, C., Smith, K., & Zelenak, D. (2020). Training data selection for annual land cover classification for the Land Change Monitoring, Assessment, And Projection (LCMAP) Initiative. Remote Sensing, 12(4), 699. https://doi.org/10.3390/rs12040699
  • Zhu, Z., Gallant, A. L., Woodcock, C. E., Pengra, B., Olofsson, P., Loveland, T. R., Jin, S., Dahal, D., Yang, L., & Auch, R. F. (2016). Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS Journal of Photogrammetry and Remote Sensing, 122, 206–221. https://doi.org/10.1016/j.isprsjprs.2016.11.004
  • Zhu, Q., Wang, Y., Liu, J., Li, X., Pan, H., & Jia, M. (2021). Tracking historical wetland changes in the china side of the amur river basin based on landsat imagery and training samples migration. Remote Sensing, 13(11), 2161. https://doi.org/10.3390/rs13112161