1,250
Views
1
CrossRef citations to date
0
Altmetric
Research Article

Regionalized classification of stand tree species in mountainous forests by fusing advanced classifiers and ecological niche model

, , , , , & show all
Article: 2211881 | Received 20 Dec 2022, Accepted 04 May 2023, Published online: 12 May 2023

ABSTRACT

Though many new remote sensing technologies have been introduced to analyze forests, regional-scale species-level mapping products are still rare, especially in large mountainous areas. Tree species abundance, low spectral separability among species and huge computing demand are hindrances for obtaining an accurate stand tree species map. This study addressed these problems by synergizing regionalization, multiple feature fusion, and model fusion and proposed a new machine learning workflow. The whole area, i.e. Yunnan province in China (approximately 390,000 km2), was firstly divided into 8 distinct floristic regions according to the distributions and phylogenetic relationships of native tree species. Thereafter, with Google Earth Engine (GEE) platform, multiple data sets, including Sentinel-2 imagery, SRTM DEM, and WorldClim bioclimatic, were collected to construct a high-dimensional feature pool for each region. Thirdly, the maximum entropy model (MaxEnt), generally used for predicting ecological niche, and three classifiers, i.e. Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost), were used to pre-classify environmental and remote sensing data, respectively. After that, two types of decision fusion strategies, parallel and serial ensembles, were applied to fuse pre-classification probability maps for each sub-region. Finally, the spatial distribution of 19 forest stand species over the whole Yunnan Province was obtained by mosaicking the best classification results from 8 sub-regions. Our method achieves an overall accuracy of 72.18% on the entire validation dataset. The decision fusion models significantly improve the classification accuracy, with the eight partitioned best fusion models improving the accuracy by 7.33%–25.39% on average compared to base classifiers. This study demonstrates that the spatial partitioning strategy and the decision fusion integrating a proper machine learning algorithm and ecological niche model can significantly improve the classification accuracy of forest stand species in montane forests.

1. Introduction

Information on the distribution of forest stand species is essential for the development of forest management policies and the sustainable use of forest resources (Fassnacht et al. Citation2014; Guo et al. Citation2020) as well as the assessment of biodiversity, habitat quality, carbon cycling, and biomass (Kollert et al. Citation2021; Ørka et al. Citation2013; Waser et al. Citation2011).

The spatial distribution of tree species is traditionally obtained through labor-intensive field surveys. Thus, inventory data are expensive and time-consuming to create, especially in a large mountainous region. Remote sensing brings opportunities for cost-efficiently and time-efficiently obtaining vegetation-related information (Grabska, Frantz, and Ostapowicz Citation2020). However, unlike other land cover types, the extremely low separability among tree species in spectral signature makes mapping tree species is still a big challenge. In this context, various data and advanced processing techniques are often needed, to meet the corresponding requirements with the desired accuracy and details of forest stand tree species mapping.

Multimodal remote sensing sensors provide key information for describing tree species. High spatial resolution multispectral images have rich spectral and textural information, and hyperspectral images with nano-scale spectral resolution are rich in spectral information (Wan et al. Citation2021); LiDAR data provide geometric information related to tree vertical structure, such as tree height, canopy diameter, and leaf area (Feng et al. Citation2020; Ke, Quackenbush, and Im Citation2010; Michaowska and Rapiński Citation2021). Current tree species classification research focuses on algorithm-driven or data-driven methods (Fassnacht et al. Citation2016). Machine learning algorithms have potent capabilities to map non-normally distributed feature into classes (Bhatt et al. Citation2022). Typical models including Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost) have been widely applied to classification. To date, with the increasing availability of earth observation data, Deep Learning (DL) models have drawn more attention. DL models can learn discriminant features from data through multi-layer neural networks, accurately capture nonlinear relationship between data and classes, and thus achieving end-to-end image classification (LeCun, Bengio, and Hinton Citation2015).

These data and classification models have been widely used to map tree species with high accuracies. For instance, very-high-resolution (VHR) Worldview-2 images and RF were used to identify the three most abundant Pine species in Galicia, with a classification accuracy of 91% (Alonso, Picos, and Armesto Citation2021). Feng et al. collaborated aerial hyperspectral with LiDAR data and used RF to identify 10 tree species in a forest farm with an OA of 96.10% (Feng et al. Citation2020). Zhang et al. applied 3DCNN to tree species classification, achieving a high accuracy of 93.14% while improving computational efficiency (Zhang, Zhao, and Zhang Citation2020). Unfortunately, these studies cannot be extended to large regions or globally due to high data collection costs.

Free-accessible images acquired by low- and medium-resolution satellites, like MODIS, Landsat, and Sentinel serials, are often used for mapping forest ecosystem over a large area. Many forest-related products, such as GLAD Forest (Hansen et al. Citation2013), FROM_GLC (Gong et al. Citation2013), and iMap World 1.0 (Liu et al. Citation2021), have been produced. However, these products treat forests as a single class or divide forests into several forest types. These researches focus on analyzing forest distribution and dynamic changes but cannot provide the spatial distribution of tree species. How to obtain spatial distribution information of large-scale tree species economically, efficiently, and accurately is still an important and urgent problem. Currently, Sentinel-2 (S2) satellite constellation may be the best choice for tree species classification in large regions. Several studies have demonstrated the high potential of unique red-edge band and high-density time series of S2 images for vegetation mapping in large regions (Grabska, Frantz, and Ostapowicz Citation2020; Welle et al. Citation2022). However, new challenges arise when classifying a geographically heterogeneous region such as the Yunnan Province.

The Yunnan province is in southwest China (), which covers an area about the size of Germany. Mapping such a large area firstly implies processing massive amounts of data, and with its cloudy and rainy climate, more images need to be processed to synthesize high-quality images. Secondly, the horizontally and vertically natural conditions are highly variable, and the increased geospatial heterogeneity of tree species’ spectra and phenology in a large area challenge the mapping process (Shirazinejad, Zoej, and Latifi Citation2022). Thirdly, Yunnan is the central part of the Indo-Burma biodiversity hotspot (Myers et al. Citation2000), the composition of tree species in this area is highly complex, and the local distribution of species is also highly variable. These factors mentioned above increase the difficulty of tree species mapping.

Figure 1. (a) Yunnan, a global biodiversity hotspot region, located in southwestern China. (b) The elevation map of the Yunnan. (c) Eight distinct floristic sub-regions of Yunnan as basic unit for classification (Li et al. Citation2015).

Figure 1. (a) Yunnan, a global biodiversity hotspot region, located in southwestern China. (b) The elevation map of the Yunnan. (c) Eight distinct floristic sub-regions of Yunnan as basic unit for classification (Li et al. Citation2015).

When studies focus on developing and optimizing the classification process of tree species in large areas, one aspect that deserves significant attention is the influence of environment on the distribution of tree species. Environmental factors, such as rainfall, temperature, geomorphology, and soil, have essential effects on the distribution of tree species (Yu et al. Citation2020). However, the spatial resolution of current environmental data is mostly lower than 1 km. For example, the TerraClimate dataset provides 4 km of temperature data (Abatzoglou et al. Citation2018), and the WorldClim dataset provides 1 km of bioclimatic variables (Hijmans et al. Citation2005). Owing to the resolution gap with remotely sensed images, environmental data are rarely incorporated in classification processes based on remotely sensed data and machine learning. Environmental data are more often used for species distribution modeling, using sample points of species associated with environmental variables to map the potential distribution areas of target species. For example, Zhou et al. used the MaxEnt model to predict the potential distribution of Cunninghamia lanceolata in China (Zhou et al. Citation2021). Although machine learning classification and ecological niche model prediction have been extensively studied, the combination of the two has rarely been explored. Researchers are therefore missing the opportunity to synergize the two domains to obtain more accurate tree species distributions (Mouta et al. Citation2021). Therefore, a rising question is how to effectively integrate ecological niche models and machine learning to do classification over such a large area.

The development of remote sensing cloud computing technology and the emergence of platforms have brought new solutions to these problems. Google Earth Engine (GEE) is a cloud computing platform, whose powerful computing capabilities have changed the traditional paradigm of remote sensing data processing and analysis (Tamiminia et al. Citation2020). Operations such as cloud masking, image composition within time windows and time series analysis are easily implemented on GEE. Meanwhile, the GEE platform also is convenient for users to employ multiple machine learning classifiers and ecological niche models to form a processing chain. In addition, spatial partitioning has been proved to improve classification accuracy for large areas with high environmental heterogeneity (Costa et al. Citation2022). It is more precise and reasonable to introduce eco-geographical regionalization strategy into classification process (Moraes et al. Citation2021).

The blooming development of multimodal sensors and technical revolution of processing platforms and methods open the door for classifying tree species over a large area. However, analyzing the data at the desired accuracy raises new challenges. In this context, we aim to build a classification chain for accurately mapping forest stand species in the whole Yunnan province by exploring floristic regionalization, possible feature-level and decision-level fusion strategies, and combinations. The objectives of this study were (ⅰ) to evaluate the effectiveness of using the floristic regionalization for forest stand tree species classification, (ⅱ) to explore the significant features for separating tree species, and (ⅲ) to develop an effective method for forest stand species classification by integrating ecological niche model and machine learning.

2. Materials

2.1. Study area

Yunnan is the most southwestern province of China () and situated in a mountainous area. It has significant environmental heterogeneity, with diverse topography, landforms, and climate. As shown in , the topography of Yunnan Province exhibits a high northwest and low southeast, with a stepped slope in the terrain, ranging in elevation from the highest mountain top 6740 m in the northwest to the lowest valley bottom 76.4 m in the southeast (Yang et al. Citation2004). The territory contains a variety of landform types including high mountains, hills, intermountain basins, river valleys, and karsts. Yunnan has the characteristics of low latitude, monsoon, and mountain plain climates and crosses seven climate zones: northern tropical, southern subtropical, middle subtropical, northern subtropical, southern temperate, middle temperate, and highland climate.

As shown in , Yunnan is bordered by the southeastern edge of the Tibetan Plateau in the northwestern part of the country and by the Southeast Asian countries in the western and southern parts, a huge meeting place of floras and formations (Li and D Citation1986). Its northwestern part is introduced by the Himalayan vegetation zone and the ancient southern continental flora, while the eastern part has numerous central and southern Chinese flora, and the northern part has the Emei, Qinling, and northwestern flora (Li and Pei Citation1991). The flora of southern Yunnan part is more closely related to the Indo-Malaysian flora, while the flora of southeastern Yunnan is more closely associated with the East Asian flora (Hua and Heil Citation2017). Geological activity and diverse topography and climate have not only forged Yunnan’s unique phytogeographic divergence and vegetation geography but also made it a well-known global biodiversity hotspot (Liu et al. Citation2021).

2.2. Multi-source data

This study used four types of data, Sentinel-2 images, WorldClim data, Digital Elevation Model (DEM) data and Forest Management Inventory data (FMI), to perform tree species classification.

2.2.1. Sentinel-2 data

High-quality images with low cloud ratio in Yunnan are often difficult to obtain due to persistent cloud cover. We counted the number of cloud-free images in three periods. As shown in , there were no valid observations for some pixels throughout 2016. In contrast, the qualified observations exceed 10 over the most tiles from 7/2015 to 12/2017 and exceed 50 from 2016 to 2020. Therefore, to obtain high-quality images and extract dense time-series related features, we used 31,440 top-of-atmosphere (TOA) images from 72 MGRS tiles between 7/2015 and 12/2020 as input data. The images from 7/2015 to 12/2017 were used to extract spectral and texture features, and the images from 2016 to 2020 were used to construct the time series data. The temporal mismatch between the extracted features and FMI data year may affect the reliability of the classification. For this problem, the details were discussed in section 5.

Figure 2. Valid observations of Sentinel-2 images in different periods. (a) 2016.1 to 2016.12; (b) 2015.7 to 2017.12; (c) 2016.1 to 2020.12.

Figure 2. Valid observations of Sentinel-2 images in different periods. (a) 2016.1 to 2016.12; (b) 2015.7 to 2017.12; (c) 2016.1 to 2020.12.

2.2.2. Bioclimatic and topographic data

The distribution of tree species on the earth’s surface is neither random nor uniform, but geographically specific and characterized by specific environmental and climatic factors (Zeb et al. Citation2021). Topographic and climatic heterogeneity has an important impact on tree species distribution. Therefore, bioclimatic and topographic-related factors are considered to be introduced for classification.

Bioclimatic data with a spatial resolution of 1 km were obtained from the WorldClim website (https://www.worldclim.org/data/worldclim21). These data consist of 19-dimensional bioclimatic variables with biological significance, such as mean annual temperature, annual precipitation, annual temperature difference, and precipitation during the rainy and dry (Hijmans et al. Citation2005). Moreover, topographic data were obtained from the Platform Space Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM).

2.2.3. Forest Management Inventory data

The Forest Management Inventory (FMI) data 2016 in China was employed as the reference data for our study. FMI aims to determine the attributes including location, natural condition, tree species composition, and distribution of each forest stand. FMI data is officially produced by local forestry and grassland administrations, under the conduction of the national administration of China. As the data has been calibrated with forest sample plots to control the error components of inventory, the data can provide a reliable reference for the different tree species.

Yunnan has about 18,000 plant species (Liu and Peng Citation2016), and it is impossible to identify all of them by using remote sensing imagery. We counted each forest stand’s distribution area in each sub-region based on FMI and selected the top 9 species in each sub-region, which resulted in 19 species in total ().

Table 1. Observed class and its acronym.

3. Methodology

This paper proposes a classification framework combining machine learning and ecological niche models to determine the spatial distribution of the major forest stand species in Yunnan Province. illustrates the whole workflow, including seven steps: (1) Spatial division; (2) Data pre-processing; (3) Feature extraction; (4) Feature reduction; (5) Classification with component classifiers; (6) Classifier fusion ensemble; and (7) Accuracy evaluation and mapping.

Figure 3. Technique workflow of research.

Figure 3. Technique workflow of research.

3.1. Floristic regionalization

We divided Yunnan Province into eight sub-regions based on the Yunnan flora system (Li et al. Citation2015). This system is produced by combining the distribution and phylogenetic relationships of seed plants of the genus 1983 in Yunnan. As shown in , the eight sub-regions were used as the basic units for independent species mapping. Feature screening, model training, and classification were performed independently for each sub-region. The goal is to make the classification and post-classification analysis applicable to local conditions and tree species, avoiding the usage of consistent feature sets for classification across the study area.

3.2. Data pre-processing

3.2.1. Sentinel-2 and topographic variables

We conducted the following steps on the acquired Sentinel-2 images to obtain cloud-free, seamless image composite. Firstly, pixels obscured and covered by clouds were removed based on the adjusted cloud score algorithm (Oreopoulos, Wilson, and Várnai Citation2011). Secondly, post-cloud masking image data from 7/2015 to 12/2017 were composed by median reducer in GEE. The composite image was used to extract spectral, vegetation index, and texture features. Thirdly, we constructed the NDVI and Red-Edge Position Index (REP) time series at 5-day period, using Sentinel-2 images from 1/2016 to 12/2020. We adopted the median of all high-quality observations available within the 5-day time window to represent the observation value. When no good quality observations were available within the 5-day period, we used linear interpolation to fill the data gaps. Although these methods address the issue of missing pixel values, there is still a significant amount of noise in the data product, which hinders further analysis and utilization of the data. To mitigate noise, we applied a Savitzky-Golay (SG) filter to smooth the NDVI and REP time series, using a moving window of size 9 and a polynomial filter order of 2.

3.2.2. Reference data

The quality and quantity of the training samples are critical for training models. FMI data were filtered to obtain accurate and representative samples. To obtain samples for forest and non-forest classification, we conducted the following three steps. Firstly, the plots of forest (pure forest, shrub economic forest, mixed forest, other shrub woodlands, other special shrubs, economic forest, open woodland, bamboo forest) and non-forest (construction land, watershed, cropland grassland, and other non-forest lands) were selected from FMI data. Secondly, the selected candidates were further purified according to the NDVI standard deviation calculated from pixels in each plot candidate. Candidates with high standard deviation value were removed, and centroids of the rest plot were selected as sample points. Finally, the rest samples were further purified through visually inspecting the overlaying of sample points and Sentinel-2 median composite image.

Similar steps were also employed for producing tree species samples. Firstly, plots, where the first dominant species accounts for more than 65% of the total stock volume, were seemed as pure and preserved. Secondly, the top 9 tree species in terms of coverage in each sub-region were counted from FMI, respectively, and corresponding plots were selected from the pure plots of each sub-region. Thirdly, with each plot as a basic unit, the standard deviations of Blue, NIR, SWIR, NDVI, and Greenness bands and the area were calculated, and plots with low standard deviations and large areas were preserved. Fourthly, centroids of the preserved sub-compartments were selected as candidate sample points and finally purified by visually inspecting the overlaying of sample points and image composite.

The samples of each category were divided into a training sample set (75%) and a validation sample set (25%), and shows the numbers of training and validation data.

Figure 4. The number of samples for each observed class.

Figure 4. The number of samples for each observed class.

3.3. Feature pools and feature reduction

3.3.1. Mining features from multiple data

Tree species differ in canopy shape, phenology, and habitat. To fully understand and accurately distinguish forest from non-forest and identify species, based on comprehensively reviewing the literature, this paper extracts various features, including spectral bands, texture features, vegetation indices, and other environmental features.

Ten bands (Blue, Green, Red, RedEdge1, RedEdge2, RedEdge3, NIR, Narrow NIR, SWIR1, SWIR2) were selected from the S2 image compositing data. Based on the 10 bands, the Greenness and Wetness components of Tasseled Cap Transform were also calculated as well as 16 vegetation indices, including Triangular Vegetation Index (TVI), MERIS Terrestrial Chlorophyll Index (MTCI), Normalized Burn Ratio (NBR), Normalized Difference Built-up Index (NDBI), Inverted Red-Edge Chlorophyll Index (IRECI), Modified Chlorophyll Absorption Ratio Index (MCARI), Land Surface Water Index (LSWI), Normalized Difference 750/705 Chl NDI (Chl NDI), Normalized Difference Red-Edge Index (NDRE), Red-Edge Position Index (REP), Normalized Difference Vegetation Index (NDVI), Normalized Difference Senescent Vegetation Index (NDSVI), Normalized Difference Tillage Index (NDTI), Normalized Difference Water Index (NDWI), and Red Edge Chlorophyll Index (CIrededge). The formulas for vegetation indices are shown in .

Table 2. The vegetation indices and corresponding formulas used in this study.

In addition, we computed six texture features (Sum Average, Correlation, Variance, Dissimilarity, Contrast, and Cluster shade) for each of the 10 spectral bands and the NDVI, so there are 66 texture features in total. Five-day interval time series data indices were computed from both NDVI and REP indices, resulting in a feature set with 146 bands (73 × 2). We also obtained 19 dimensions of bioclimatic variables from the WorldClim data and 3 topographic factors, namely Elevation, Slope, and Aspect, from the SRTM data using the GEE built-in functions.

Based on the basic features mentioned above, two feature pools were constructed. As shown in , the first feature pool consists of 37-dimensional features, for forest and non-forest classification. The second feature pool consists of 257-dimensional features () for forest stand species classification. Among them, 22-dimensional environmental features, including 19-dimensional bioclimatic features and 3-dimensional topographic features, were used as input features for MaxEnt; 237-dimensional feature sets, named the ALL Feature Set, were selected as inputs to three machine learning models, SVM, RF, and XGBoost.

Table 3. The feature candidates for the forest mask generation.

Table 4. The feature candidates for forest stand species classification.

3.3.2. Feature selection

The number and quality of features largely determine the algorithm’s efficiency. Selecting features with optimal separability is crucial for computational efficiency and classification performance. Thus, we optimized features before classification.

For the MaxEnt model, the factors were evaluated and selected by the following three steps. (1) Spearman correlation analysis was first performed on the variables. (2) Pre-training was performed using all environmental variables, and the importance of each dimensional variable was evaluated using Jackknife test. (3) The variable most significant in the Jackknife test was selected among the highly correlated variables (absolute value of correlation ≥0.8) (i.e. the model generated from all variables except this variable and the model generated using only this variable had the smallest difference in regularized training gain). Highly correlated and redundant variables were removed, and each environmental factor that dominates the prediction was finally identified. The selected features consist of an optimal feature set for the MaxEnt predicting model and called Feature Set 1 in the following section.

For SVM, RF, and XGBoost classifier, the relevance hierarchical clustering method (You et al. Citation2021) was used to select the best input features. This feature selection process is performed in four steps. The normalization method is firstly used to map all features to a distribution with mean value of 0 and standard deviation value of 1. Pre-training is secondly performed to determine the number of feature dimensions when the accuracy reaches saturation point. The feature importance of the features is evaluated by the Gini importance or mean reduction impurity calculated by the RF classifier and ranked by feature importance score. Spearman correlation analysis is performed on the most important features. The most important features are clustered into several clusters by the maximum depth threshold, and finally the features with the highest feature importance score in each cluster are retained. In this study, the maximum depth threshold is set to 0.5 to preserve the important features with high separability and low correlation among the selected features. The optimal features are selected for stand tree species classification, referred to as Feature set 2 in the following sections, and the set of best features for classification and non-forest classification, hereinafter referred to as Feature set 3.

It is worth noting that, as the above-mentioned above selection process independently operated in each floristic region, Feature Set 1, Feature Set 2, and the Feature Set 3 may be different from region to region.

3.4. Classification with component classifiers

MaxEnt derives the constraints on species distribution, then seeks the possible distribution with maximum entropy. When the entropy achieves maximum, the species occurrence probability distribution is closest to the true distribution of species (Phillips, Anderson, and Schapire Citation2006). The prediction results of MaxEnt are reflected by the probability of species occurrence. And the larger the value, the more suitable environmental conditions for species survival. Due to their superior and stable classification performance, three machine learning classifiers, including RF, SVM, and XGBoost, are the most popular and widely used in various applications (Abdi Citation2020; Grabska, Frantz, and Ostapowicz Citation2020).

The four component models are trained and validated using the same samples, and empirically adjustment method was used to obtain the hyperparameters for each classifier. The MaxEnt model classification was implemented on local computer using the maxent software (version 3.4.1), while the three machine learning classifiers were implemented on the GEE platform.

3.5. Classification by multi-classifier fusion

After the classification results of the four component classifiers are obtained, we construct two paradigms to fuse MaxEnt and the three machine learning classifiers parallelly () and serially (), respectively. The integration schematic of the two paradigms is shown in .

Figure 5. Schematic diagram of two decision fusion strategies.

Figure 5. Schematic diagram of two decision fusion strategies.

The first paradigm is called the parallel ensemble model. Classification is performed by four component classifiers parallelly and followed by decision fusion based on a weighted voting strategy. The final fusion objective function is defined as (1).

(1) Y=argmaxYiSj=14logwjli,yiMdj(1)

where S is the set of pixel locations, li,yiMdj denotes a probability predicted by the model j, that a pixel i belongs to a given class yi, and wj is a weight measuring confidence of the model j. Intuitively, the setting of wj is related to the training performance of models. Therefore, weights for each component classifier are calculated based on the training accuracies with wi=AccCi14AccCj and jwj=1.

The second integration paradigm is a serial mode, where the output of the MaxEnt classifier, in the form of class-specific possibilities, is stacked together with Feature set 2 and fed into the three machine learning classifiers, respectively. In this process, the possibility of each class is seemed as a feature.

4. Results

We implemented the proposed classification scheme in Yunnan province. First, we divided Yunnan Province into eight sub-regions. Then, the best classification features were selected for each subzone from the candidate features. Finally, two levels of classification, forest cover and tree species, were executed in each sub-region. In this section, experimental setting, the mapping results of forest cover and species are described.

4.1. Experimental setting and accuracy assessment

Many studies proved that forest-type classification could often be achieved with high accuracy (Grabska et al. Citation2019; Li et al. Citation2022). Therefore, the experiments focus on analyzing the performance of species-level classifications. As shown in , totally 11 feature-classifier combinations are compared to analyze the importance of features and the effectiveness of the proposed fusing strategies. Moreover, visual inspection and quantitative metrics, including confusion matrix and derived Overall Accuracy (OA) and Kappa, are employed for assessing classification performance.

Table 5. Feature-classifier combinations explored in this study.

4.2. Forests cover mapping

shows the number of features selected and the classification accuracy for each sub-region for both levels of forest cover and tree species classification. Depending on the spatial location, the number of features used for forest classification ranged from 8 to 11 dimensions, and the number of features for tree species classification ranged from 26 to 41 dimensions. The OA ranged from 93.91% to 98.06% at the forest level, and from 70.12% to 75.24% at the tree species level.

Figure 6. The numbers of optimal features and overall accuracies in the 8 sub-regions. Darker red tones mean higher accuracy for species classification.

Figure 6. The numbers of optimal features and overall accuracies in the 8 sub-regions. Darker red tones mean higher accuracy for species classification.

The forest and non-forest classification was firstly performed. Since the classification error in this level will transfer to the species-level classification, the classification accuracy is expected as high as possible. In this study, the OA is 96.88%, and the Kappa coefficient is 0.96, according to 27,296 reference samples. The classification results are shown in . The classification results are highly consistent with the forest distribution in the field. Therefore, this result can be used for producing forest mask.

Figure 7. Forest/non-forest classification result of Yunnan province.

Figure 7. Forest/non-forest classification result of Yunnan province.

4.3. Result at species level

In each sub-region, the accuracy of the 11 classification schemes was evaluated based on the validation samples. The OAs of each partition are shown in and .

Table 6. Accuracy assessment for different classification scenarios in each sub-region. The highest accuracies over all sub-regions are in bold.

The accuracies of the seven component classifiers do not exceed 67.40% in all eight partitions, and these accuracies may be insufficient to support further analysis and decision-making. In contrast, the integrated model significantly improves the classification accuracy. Among the integration models, the average accuracy of MXSR over eight regions is 69.51%, and the average accuracy of the three serial integration models, MS, MX, and MR, is higher than 71.2%. Overall, the classification performance of the integrated classifier is much higher than that of the component classifier. Among the ensemble models, three serial integrated ensembles are superior to the parallel integrated model.

By mosaicking the MR classification results of the eight sub-regions, we obtained the spatial distribution map of 19 target forest species in Yunnan Province at 10 m resolution (). The inset details the spatial distribution of tree species in the selected sites. The accuracy was evaluated by 53,934 validation sample points, and the overall accuracy was 72.18% with a kappa coefficient of 0.69. The heat map of confusion matrix is given in , from which the four classes of Larix gmelinii, Hevea brasiliensis, Abies fabri, and Juglans regia achieved high accuracies, while Alnus cremastogyne Burk., Quercus L., and Sassafras tsumu were classified with low accuracies.

Figure 8. The spatial details of the forest stand species map in Yunnan Province in 2016. Site a (103.279°E, 26.851°N); Site b (103.626°E, 25.910°N); Site c (103.031°E, 25.434°N); Site d (104.568°E, 23.026°N); Site e (101.347°E, 21.946°N); Site f (99.173°E, 25.318°N); Site g (99.688°E, 27.743°N).

Figure 8. The spatial details of the forest stand species map in Yunnan Province in 2016. Site a (103.279°E, 26.851°N); Site b (103.626°E, 25.910°N); Site c (103.031°E, 25.434°N); Site d (104.568°E, 23.026°N); Site e (101.347°E, 21.946°N); Site f (99.173°E, 25.318°N); Site g (99.688°E, 27.743°N).

Figure 9. Confusion matrix heatmap of tree species classification.

Figure 9. Confusion matrix heatmap of tree species classification.

5. Discussion

We have divided the Yunnan region into eight sub-regions, with features extracted from S2, SRTM, and WorldClim data, and used four component classifiers to map the spatial distribution of forest stand species. This section discusses the impacts of floristic regionalization, multiple data and the component classifiers and integrated models on the classification performance.

5.1. The necessity of floristic regionalization

Yunnan Province contains a variety of landform types including high mountains, hills, intermountain basins, river valleys, and karsts, as well as seven climate types and seven vegetation types. The corresponding physical differences in climate and biota between the North and South are equivalent to the differences from Hainan Island, China to Changchun, Northeast China (Yang et al. Citation2004). Spatial zoning is a practical solution for large areas with different growth conditions and complex tree species composition.

In summary, the floristic regionalization can bring four obvious strengths. First, zoning according to vegetation zones creates zonation with uniform ecological and spectral characteristics. Compared to the whole area, a sub-region has a higher consistency of geological history, geomorphology, climate, and vegetation composition, reducing spatial and temporal variability in the tree species’ spectral and phenological response (Cano et al. Citation2017). Second, each partition has a different species composition, and feature selection was independently performed for each partition to avoid classification with a set of feature combinations over a very large area. Third, the partitioning increases the proportion of the number of samples and alleviates the spatial and quantitative imbalance of samples to some extent. Fourth, by classifying the eight regions sequentially over the GEE platform, the computational burden and demand on storage are also reduced.

5.2. Variable importance assessment

5.2.1. The importance of Sentinel-2 data in stand species classification

The mean spectral reflectance of the 19 tree species extracted from July 2015 to December 2017 median composite images are shown below in . Although S2 can provide more detailed spectral and spatial information than Landsat images, it is still insufficient to distinguish between tree species. This conclusion is also confirmed by the OA shown in , where the classification accuracy using the features from S2 failed to reach 70%.

Figure 10. The mean spectral reflectance of 19 tree species, calculated from the Sentinel-2 bands.

Figure 10. The mean spectral reflectance of 19 tree species, calculated from the Sentinel-2 bands.

The reflectance differences among the species in the Red-Edge to SWIR1 bands are higher than in other spectral bands, and in this spectral band, the reflectance of broadleaf trees was higher than that of conifers, with higher values for Betula L. and Hevea brasiliensis, while Picea asperata Mast., Abies fabri, and Larix gmelinii had lower reflectance, and in this spectral range Sassafras tsumu and Larix gmelinii in reflectance differed significantly from the other species, indicating the usefulness of the three bands in distinguishing tree species. However, as expected, the reflectance differences between coniferous and broadleaf trees are much smaller in other bands, and there is a large overlap between the different tree species.

To evaluate the contribution of texture and time-series features to classification, based on three feature combinations, we assessed the separability between tree species. Three feature combinations include (a) raw bands and vegetation indices, (b) spectral features and texture features, and (c) spectral features and time series features.

The Jeffries-Matusita (JM) (Ma et al. Citation2021) distance between tree species is calculated and shown in . Higher JM values indicate greater separability of the two classes. In this study, the JM distances are classified into four classes: strongly separable (1.9–2.0), better separable (1.8–1.9), weakly separable (1.7–1.8), and poorly separable (<1.7). The JM distance between two different categories is expected to be greater than 1.8 to obtain a satisfactory classification. Several species do not co-exist in the same sub-region, so the JM distances were not calculated.

Figure 11. JM distances between species with (a) spectral bands and vegetation indices shown in , (b) spectral features and texture features, and (c) spectral features and time series features.

Figure 11. JM distances between species with (a) spectral bands and vegetation indices shown in Table 4, (b) spectral features and texture features, and (c) spectral features and time series features.

In Scheme a, the separability of most tree species pairs is poor due to the high similarity in spectral reflectance of the tree species. In Scheme b, separability between species improved significantly with the inclusion of textural features, but sassafras was still difficult to discriminate from Sassafras tsumu, Pinus yunnanensis, and Alnus cremastogyne Burk.. In Scheme c, the JM distances between species all reached over 1.9, suggesting that the time series feature can increase the separability of species.

5.2.2. The importance of environmental data in stand species classification

Regional arboreal species composition can be largely explained in terms of a long history of biogeographic and evolutionary events. Remote sensing-based tree species classification usually concentrates on data and methods, often ignoring the rich context that environmental data can provide.

In our study, the most important variable for stand species classification is Elevation, ranked highest in evaluation quantitative importance scores in all eight sub-regions. By overlaying the classification results with the elevation maps, the elevation percentage map of each tree species was obtained (), which also reflected the differences in elevation distribution of each tree species. For example, Larix gmelinii and Abies fabri are distributed above 3200 m, while Hevea brasiliensis and Sassafras tsumu occur at lower elevations below 1500 m. Numerous studies confirm that topographic features facilitate tree species classification, especially for species that follow a natural height gradient (Grabska, Frantz, and Ostapowicz Citation2020).

Figure 12. Distribution map of forest stand species in the altitude range.

Figure 12. Distribution map of forest stand species in the altitude range.

Each species has its own preferred temperature range, acceptable temperature extremes, and rainfall. A large amount of literature indicates that climate conditions are the key factor determining the large-scale distribution patterns of organisms (Datta, Schweiger, and Kühn Citation2020). Our study uses environmental factors and the MaxEnt model to assign the probability of tree species occurrence for each geographical location. As shown in , adding these probability layers in two forms to the process based on remote sensing image classification can improve classification accuracy. The constraint of environmental factors on species distribution increases classification performance. shows that the classification accuracy of the categories with higher environmental constraints is also higher, such as Hevea brasiliensis, Abies fabri, and Larix gmelinii. Other species may be excluded due to lower environmental tolerance in the growing areas of these species, making these species have higher homogeneity. However, in our study, the range of some tree species was over-predicted. The climate variables used indicate that these tree species should have a wider geographical range than in reality, which may be because the climate factors we used are not the main limiting factors for the geographical range of tree species (Wiens and Graham Citation2005). In addition, the characteristics of the environmental factors used have a resolution of 1 km, and the low resolution ignores the micro-scale processes of local effects of tree species. Many tree species require specific small-scale habitat attributes (Sinclair, White, and Newell Citation2010). We believe that future research needs to consider more ecological and geographical factors, such as the CHELSA dataset, which contains explicit indices for many ecological and physiological processes. Our test results in local areas show that this data can provide more accurate distribution ranges. Of course, to generate high-precision spatiotemporal microclimate, it may be necessary to combine global climate datasets with in situ microclimate measurements, long-term meteorological station data, and high-resolution remote sensing data (Bobrowski, Weidinger, and Schickhoff Citation2021).

Table 7. Comparison of classification accuracies of different methods in each sub-region.

5.3. Species mapping performance by fusion multiple models

To evaluate the mapping performance of the MR serial model, we compare the classification performance of the M, ROPT, and MR models in this section. The MR method achieves the best accuracy in the vast majority of sub-regions (7 out of 8). visually demonstrates the advantages of the MR integrated model. The accuracy difference between MR and M for the eight sub-regions is from 21.00% to 31.06% with an average difference of 25.39%. The difference in accuracy with ROPT is 11.63% at the maximum and 5.47% at the minimum. Although the M classifier obtained the lowest OA in each partition, adding the classification results of the M classifier as features to ROPT resulted in a better classifier than the component classifier MR. The great advantage in classification accuracy proves the effectiveness and potential of the MR-integrated model for mapping tree species in forest stands. In addition, the classification accuracy of the MR serial integration method was higher than 70% in 8 partitions with different ecological conditions, which proved that the method can be used for tree stand species mapping under diverse ecological conditions with good stability.

shows the S2 false color images (the combination of bands 3, 4, and 8), the FMI data, and the classification results of the three methods, from four selected regions for visual evaluation. We can find that the classification results of different classifiers differed greatly. The M classifier with low resolution environmental data for classification predicted poorly that the geographically expressed area of each tree species was larger than its actual distribution area, which is consistent with the conclusions reached in previous studies (Chandra et al. Citation2021; Pshegusov et al. Citation2022). The RF model, which uses higher resolution remote sensing data for classification, provides finer details than the M model. However, there are still a lot of misclassifications and noise due to the low separability of tree species. The MR integrated by both can alleviate the problems of both, the integrated classifier MR achieves the best classification result, and its classification results are more consistent with the FMI data. MR by taking advantage of different data and classifiers. Environmental data provide a geographical representation of the ecological niche of tree species, and remote sensing images provide macroscopic and fine scale spectral, textural, and phenological differences of tree species. Orchestrating the interplay of various data and theories with modeling has been identified as a promising approach to obtaining species information (Maréchaux et al. Citation2021). Studies on species distribution and invasions (Engler et al. Citation2013; Kattenborn et al. Citation2019) also confirm that using integrated datasets and combining ecological niche models and machine learning algorithms can benefit improve classification accuracy.

Figure 13. The spatial details of classification results in forest stand species. (a) Original image, (b) the Forest Management Inventory data, (c) M classification, (d) ROPT classification, and (e) MR classification.

Figure 13. The spatial details of classification results in forest stand species. (a) Original image, (b) the Forest Management Inventory data, (c) M classification, (d) ROPT classification, and (e) MR classification.

In addition, we collected typical tree species classification cases in recent years and compared the results of the cases with ours. A brief overview is shown in , and more detailed information can be found in the literature to read the original content (Bjerreskov, Nord-Larsen, and Fensholt Citation2021; Boschetti et al. Citation2007; Engler et al. Citation2013; Fang et al. Citation2020; Grabska, Frantz, and Ostapowicz Citation2020; Grabska et al. Citation2019; Hościło and Lewandowska Citation2019; Immitzer, Atzberger, and Koukal Citation2012; Ke, Quackenbush, and Im Citation2010; Li, Hu, and Noland Citation2013; Ma et al. Citation2021; Plakman et al. Citation2020; Shirazinejad, Zoej, and Latifi Citation2022; Sun et al. Citation2019; Wang and Ren Citation2021; Wessel, Brandmeier, and Tiede Citation2018).

Figure 14. Brief review on research cases in tree species classification.

Figure 14. Brief review on research cases in tree species classification.

Because these cases tested different area sizes, classification methods and tree species, it is not reasonable to directly compare precision. However, we can still compare individual study cases in terms of the following aspects. The species number, reported in listed study cases ranged from 3 to 19 species, the area tested ranged from 5.23 km2 to 43,000 km2, the resolution of the data used ranged from a maximum of 0.1 m to a minimum of 16 m, and the overall accuracies ranged from 61.3% to 97%. Compared with the listed cases, the number of tree species (19) and the size of the study area (394100 km2) in our study have a definite advantage. The overall accuracy of our study was 72.18%, which is low compared to some study cases. However, we did not use ultra-high resolution or additional input data (e.g. LiDAR). This accuracy is acceptable given the scope and class numbers of the study. This precision is significantly better than the results reported for the 19 tree species study (61.3%) (Fang et al. Citation2020) and comparable to the reported for the 18 tree species cases (73.25%) (Sun et al. Citation2019), but the area of the two cases is 8.6 km2 and 177 km2, respectively, which is much smaller than our study. In contrast, the proposed method can achieve accurate and stable mapping of tree species distribution in large areas under different ecological conditions.

5.4. Limitations and future work

The variety of multiple data in terms of spatial resolution, acquisition date and times may affect the classification performance and the spatial distribution patterns of the categories.

In the GEE platform, based on the geographical reference, multiple features extracted from datasets, including S2, SRTM DEM, and WorldClim bioclimatic, can be easily resampled to the same pixel size and stacked together. However, the up-sampling process of low-resolution data sets cannot provide extra information than the original data set. The resolution is usually coarse, and they often do not really describe the spatial distribution of tree species but rather to predict habitat suitability conditions at a given site (Engler et al. Citation2013). This is also the main reason that the probabilistic output of MaxEnt model was seen as an extra feature set, rather than directly assigned as a component classifier like RF or SVM. As we can see through , when the low spatial resolution probability map output from the MaxEnt model is integrated with the machine learning model, it significantly changes the spatial pattern of the output map, and species with fewer samples may be smoothed out by the dominant category. While this improves classification accuracy overall, we believe that the effect of this integration is not always positive, and that the introduction of the low spatial resolution probability maps output by the MaxEnt model may reduce the classification accuracy of certain spatial locations and specific categories, and reduce the distribution area of small sample categories.

Expected for spatial mismatch, the temporal mismatch among multiple data is another factor that may increase uncertainties of classification. To avoid missing some key tree species phenology, we decided to construct 5-day intensive time series data. However, in most areas of the study area, data from both S2A/B satellites still do not allow for 5-day repeat observations, and until March 2017, a single satellite required 10 days, which combined with the cloudy and rainy climate, resulting in fewer pixels available for image acquisition. To construct High-Spatiotemporal-Resolution time series data, we used S2 data for 5 years from 2016 to 2020, which were synthesized according to the time of image acquisition in the year (DOY, Day of Year) to construct a 5-day dense NDVI and REP time series. In this study, we used sample data from the 2016 FMI. This operation led to the mismatch in acquisition time between the sample data and the imagery used. This mismatch may introduce uncertainties.

In this study, we predicted the distribution of 19 tree species in Yunnan by integrating machine learning classifiers and ecological niche models and further experiments and validations can be conducted in other types of forests in the future. We used only four models in this experiment, and more models can be introduced in the future to increase the diversity and complementarity of models. In addition, with growing data availability, more eco-physiological and remote sensing data can be integrated into the models. This process will benefit species classification by a better understanding of the species physiological limits and selecting features more sensitive to species range limits.

6. Conclusions

Detailed and accurate mapping of forest stand species has an important practical need. To achieve effective mapping of the mountain vegetation, it is necessary to fully consider the ecological characteristics of tree species and the displayed remote sensing characteristics and to use multiple data and classification methods for integrated discrimination. In this study, the spatial distribution of 19 stand tree species in Yunnan Province was mapped based on the GEE platform, in collaboration with remote sensing image machine learning algorithm classification and ecological niche modeling, with an overall accuracy of 72.18% assessed through 53,934 validation sample points. Higher classification accuracy was obtained by integrating decision fusion models from both domains compared to supervised classification or ecological niche modeling of environmental data. The study shows that integrating machine learning algorithms and ecological niche models is effective in regions with high environmental heterogeneity. Spectral, spatial, and temporal extracted from remote sensing data and various environmental variables contribute to tree species classification.

Supplemental material

Supplemental Material

Download MS Word (8.9 MB)

Acknowledgments

We thank anonymous reviewers and editors for their valubale comments and suggestions, which greatly improved the quality of our maniucrpt.

Disclosure statement

No potential conflict of interest was reported by the authors.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/15481603.2023.2211881.

Additional information

Funding

The work was supported by the National Natural Science Foundation of China [32160369]; Major Scientific and Technological Projects of Yunnan Province [202202AD080010]; National Natural Science Foundation of China [31860182]; National Natural Science Foundation of China [32060320]; National Natural Science Foundation of China [41961053].

References

  • Abatzoglou, J. T., S. Z. Dobrowski, S. A. Parks, and K. C. Hegewisch. 2018. “TerraClimate, a High-Resolution Global Dataset of Monthly Climate and Climatic Water Balance from 1958–2015.” Scientific Data 5 (1): 1–24. doi:10.1038/sdata.2017.191.
  • Abdi, A. M. 2020. “Land Cover and Land Use Classification Performance of Machine Learning Algorithms in a Boreal Landscape Using Sentinel-2 Data.” GIScience & Remote Sensing 57 (1): 1–20. doi:10.1080/15481603.2019.1650447.
  • Ahamed, T., L. Tian, Y. Zhang, and K. C. Ting. 2011. “A Review of Remote Sensing Methods for Biomass Feedstock Production.” Biomass & bioenergy 35 (7): 2455–2469. doi:10.1016/j.biombioe.2011.02.028.
  • Alonso, L., J. Picos, and J. Armesto. 2021. “Forest Cover Mapping and Pinus Species Classification Using Very High-Resolution Satellite Images and Random Forest.” ISPRS Annals of the Photogrammetry, Remote Sensing & Spatial Information Sciences 3: 203–210. doi:10.5194/isprs-annals-V-3-2021-203-2021.
  • Bhatt, P., A. Maclean, Y. Dickinson, and C. Kumar. 2022. “Fine-Scale Mapping of Natural Ecological Communities Using Machine Learning Approaches.” Remote Sensing 14 (3): 563. doi:10.3390/rs14030563.
  • Bjerreskov, K. S., T. Nord-Larsen, and R. Fensholt. 2021. “Classification of Nemoral Forests with Fusion of Multi-Temporal Sentinel-1 and 2 Data.” Remote Sensing 13 (5): 950. doi:10.3390/rs13050950.
  • Bobrowski, M., J. Weidinger, and U. Schickhoff. 2021. “Is New Always Better? Frontiers in Global Climate Datasets for Modeling Treeline Species in the Himalayas.” Atmosphere 12: 543. doi:10.3390/atmos12050543.
  • Boschetti, M., L. Boschetti, S. Oliveri, L. Casati, and I. Canova. 2007. “Tree Species Mapping with Airborne Hyper‐spectral MIVIS Data: The Ticino Park Study Case.” International Journal of Remote Sensing 28 (6): 1251–1261. doi:10.1080/01431160600928542.
  • Cano, E., J. -P. Denux, M. Bisquert, L. Hubert-Moy, and V. Chéret. 2017. “Improved Forest-Cover Mapping Based on MODIS Time Series and Landscape Stratification.” International Journal of Remote Sensing 38 (7): 1865–1888. doi:10.1080/01431161.2017.1280635.
  • Carlson, T. N., and D. A. Ripley. 1997. “On the Relation Between NDVI, Fractional Vegetation Cover, and Leaf Area Index.” Remote Sensing of Environment 62 (3): 241–252. doi:10.1016/S0034-4257(97)00104-1.
  • Chandrasekar, K., M. Sesha Sai, P. Roy, and R. S. Dwevedi. 2010. “Land Surface Water Index (LSWI) Response to Rainfall and NDVI Using the MODIS Vegetation Index Product.” International Journal of Remote Sensing 31 (15): 3987–4005. doi:10.1080/01431160802575653.
  • Chandra, N., G. Singh, S. Lingwal, J. S. Jalal, M. S. Bisht, V. Pal, M. P. S. Bisht, B. Rawat, and L. M. Tiwari. 2021. “Ecological Niche Modeling and Status of Threatened Alpine Medicinal Plant Dactylorhiza Hatagirea D. Don in Western Himalaya.” Journal of Sustainable Forestry 41 (10): 1–17. doi:10.1080/10549811.2021.1923530.
  • Costa, H., P. Benevides, F. D. Moreira, D. Moraes, and M. Caetano. 2022. “Spatially Stratified and Multi-Stage Approach for National Land Cover Mapping Based on Sentinel-2 Data and Expert Knowledge.” Remote Sensing 14 (8): 1865. doi:10.3390/rs14081865.
  • Dash, J., and P. Curran. 2004. “The MERIS Terrestrial Chlorophyll Index.” International Journal of Remote Sensing 25 (23): 5403–5413. doi:10.1080/0143116042000274015.
  • Datta, A., O. Schweiger, and I. Kühn. 2020. “Origin of Climatic Data Can Determine the Transferability of Species Distribution Models.” NeoBiota 59: 61–76. doi:10.3897/neobiota.59.36299.
  • Duan, B., S. Fang, Y. Gong, Y. Peng, X. Wu, and R. Zhu. 2021. “Remote Estimation of Grain Yield Based on UAV Data in Different Rice Cultivars Under Contrasting Climatic Zone.” Field Crops Research 267: 108148. doi:10.1016/j.fcr.2021.108148.
  • Engler, R., L. T. Waser, N. E. Zimmermann, M. Schaub, S. Berdos, C. Ginzler, and A. Psomas. 2013. “Combining Ensemble Modeling and Remote Sensing for Mapping Individual Tree Species at High Spatial Resolution.” Forest Ecology and Management 310: 64–73. doi:10.1016/j.foreco.2013.07.059.
  • Fang, F., B. E. McNeil, T. A. Warner, A. E. Maxwell, G. A. Dahle, E. Eutsler, and J. Li. 2020. “Discriminating Tree Species at Different Taxonomic Levels Using Multi-Temporal WorldView-3 Imagery in Washington DC, USA.” Remote Sensing of Environment 246: 111811. doi:10.1016/j.rse.2020.111811.
  • Fang, P., X. Zhang, P. Wei, Y. Wang, H. Zhang, F. Liu, and J. Zhao. 2020. “The Classification Performance and Mechanism of Machine Learning Algorithms in Winter Wheat Mapping Using Sentinel-2 10 M Resolution Imagery.” Applied Sciences 10 (15): 5075. doi:10.3390/app10155075.
  • Fassnacht, F. E., H. Latifi, K. Stereńczak, A. Modzelewska, M. Lefsky, L. T. Waser, C. Straub, and A. Ghosh. 2016. “Review of Studies on Tree Species Classification from Remotely Sensed Data.” Remote Sensing of Environment 186: 64–87. doi:10.1016/j.rse.2016.08.013.
  • Fassnacht, F. E., C. Neumann, M. Forster, H. Buddenbaum, A. Ghosh, A. Clasen, P. K. Joshi, and B. Koch. 2014. “Comparison of Feature Reduction Algorithms for Classifying Tree Species with Hyperspectral Data on Three Central European Test Sites.” IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 7 (6): 2547–2561. doi:10.1109/jstars.2014.2329390.
  • Feng, B., C. Zheng, W. Zhang, L. Wang, and C. Yue. 2020. “Analyzing the Role of Spatial Features When Cooperating Hyperspectral and LiDar Data for the Tree Species Classification in a Subtropical Plantation Forest Area.” Journal of Applied Remote Sensing 14 (02): 022213. doi:10.1117/1.JRS.14.022213.
  • Gao, B. -C. 1996. “NDWI—A Normalized Difference Water Index for Remote Sensing of Vegetation Liquid Water from Space.” Remote Sensing of Environment 58 (3): 257–266. doi:10.1016/S0034-4257(96)00067-3.
  • Gong, P., J. Wang, L. Yu, Y. Zhao, Y. Zhao, L. Liang, Z. Niu, et al. 2013. “Finer Resolution Observation and Monitoring of Global Land Cover: First Mapping Results with Landsat TM and ETM+ Data.” International Journal of Remote Sensing 34 (7): 2607–2654. doi:10.1080/01431161.2012.748992.
  • Grabska, E., D. Frantz, and K. Ostapowicz. 2020. “Evaluation of Machine Learning Algorithms for Forest Stand Species Mapping Using Sentinel-2 Imagery and Environmental Data in the Polish Carpathians.” Remote Sensing of Environment 251: 112103. doi:10.1016/j.rse.2020.112103.
  • Grabska, E., P. Hostert, D. Pflugmacher, and K. Ostapowicz. 2019. “Forest Stand Species Mapping Using the Sentinel-2 Time Series.” Remote Sensing 11 (10): 1197. doi:10.3390/rs11101197.
  • Guo, Y., Z. Li, E. Chen, X. Zhang, L. Zhao, E. Xu, Y. Hou, and R. Sun. 2020. “An End-To-End Deep Fusion Model for Mapping Forests at Tree Species Levels with High Spatial Resolution Satellite Imagery.” Remote Sensing 12 (20): 3324. doi:10.3390/rs12203324.
  • Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A. Tyukavina, D. Thau, et al. 2013. “High-Resolution Global Maps of 21st-Century Forest Cover Change.” Science 342 (6160): 850–853. doi:10.1126/science.1244693.
  • Hijmans, R. J., S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis. 2005. “Very High Resolution Interpolated Climate Surfaces for Global Land Areas.” International Journal of Climatology: A Journal of the Royal Meteorological Society 25 (15): 1965–1978. doi:10.1002/joc.1276.
  • Hościło, A., and A. Lewandowska. 2019. “Mapping Forest Type and Tree Species on a Regional Scale Using Multi-Temporal Sentinel-2 Data.” Remote Sensing 11 (8): 929. doi:10.3390/rs11080929.
  • Hua, Z., and M. Heil. 2017. “Biogeographical Divergence of the Flora of Yunnan, Southwestern China Initiated by the Uplift of Himalaya and Extrusion of Indochina Block.” PLos One 7 (9): 7. doi:10.1371/journal.pone.0045601.
  • Immitzer, M., C. Atzberger, and T. Koukal. 2012. “Tree Species Classification with Random Forest Using Very High Spatial Resolution 8-Band WorldView-2 Satellite Data.” Remote Sensing 4 (9): 2661–2693. doi:10.3390/rs4092661.
  • Kattenborn, T., J. Lopatin, M. Förster, A. C. Braun, and F. E. Fassnacht. 2019. “UAV Data as Alternative to Field Sampling to Map Woody Invasive Species Based on Combined Sentinel-1 and Sentinel-2 Data.” Remote Sensing of Environment 227: 61–73. doi:10.1016/j.rse.2019.03.025.
  • Ke, Y., L. J. Quackenbush, and J. Im. 2010. “Synergistic Use of QuickBird Multispectral Imagery and LIDAR Data for Object-Based Forest Species Classification.” Remote Sensing of Environment 114 (6): 1141–1154. doi:10.1016/j.rse.2010.01.002.
  • Kollert, A., M. Bremer, M. Löw, and M. Rutzinger. 2021. “Exploring the Potential of Land Surface Phenology and Seasonal Cloud Free Composites of One Year of Sentinel-2 Imagery for Tree Species Mapping in a Mountainous Region.” International Journal of Applied Earth Observation and Geoinformation 94: 102208. doi:10.1016/j.jag.2020.102208.
  • LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–444. doi:10.1038/nature14539.
  • Li, X., and W. D. 1986. “The Plant Geography of Yunnan Province, Southwest China.” I Historical Biogeography: Journal of Biogeography 13 (5): 367–397. doi:10.2307/2844964.
  • Li, R., P. Fang, W. Xu, L. Wang, G. Ou, W. Zhang, and X. Huang. 2022. “Classifying Forest Types Over a Mountainous Area in Southwest China with Landsat Data Composites and Multiple Environmental Factors.” Forests 13 (1): 135. doi:10.3390/f13010135.
  • Li, J., B. Hu, and T. L. Noland. 2013. “Classification of Tree Species Based on Structural Features Derived from High Density LiDar Data.” Agricultural and Forest Meteorology 171-172: 104–114. doi:10.1016/j.agrformet.2012.11.012.
  • Li, R., N. J. Kraft, J. Yang, and Y. Wang. 2015. “A Phylogenetically Informed Delineation of Floristic Regions Within a Biodiversity Hotspot in Yunnan, China.” Scientific Reports 5 (1): 1–7. doi:10.1016/S0034-4257(96)00178-2.
  • Li, W., and S. Pei. 1991. “A Study of Flora of Yunnan.” Guihaia 11: 293–303.
  • Liu, H., P. Gong, J. Wang, X. Wang, G. Ning, and B. Xu. 2021. “Production of Global Daily Seamless Data Cubes and Quantification of Global Land Cover Change from 1985 to 2020-iMap World 1.0.” Remote Sensing of Environment 258: 112364. doi:10.1016/j.rse.2021.112364.
  • Liu, F., J. Hu, F. Yang, and X. Li. 2021. “Heterogeneity-Diversity Relationships in Natural Areas of Yunnan, China.” Chinese Geographical Science 31 (3): 506–521. doi:10.1007/s11769-021-1207-7.
  • Liu, Z., and H. Peng. 2016. “Notes on the Key Role of Stenochoric Endemic Plants in the Floristic Regionalization of Yunnan.” Plant Diversity 38 (6): 289–294. doi:10.1016/S0034-4257(96)00067-3.
  • Long, T., Z. Zhang, G. He, W. Jiao, C. Tang, B. Wu, X. Zhang, G. Wang, and R. Yin. 2019. “30 M Resolution Global Annual Burned Area Mapping Based on Landsat Images and Google Earth Engine.” Remote Sensing 11 (5): 489. doi:10.3390/rs11050489.
  • Main, R., M. A. Cho, R. Mathieu, M. M. O’Kennedy, A. Ramoelo, and S. Koch. 2011. “An Investigation into Robust Spectral Indices for Leaf Chlorophyll Estimation.” Isprs Journal of Photogrammetry & Remote Sensing 66 (6): 751–761. doi:10.1016/j.isprsjprs.2011.08.001.
  • Ma, M., J. Liu, M. Liu, J. Zeng, and Y. Li. 2021. “Tree Species Classification Based on Sentinel-2 Imagery and Random Forest Classifier in the Eastern Regions of the Qilian Mountains.” Forests 12 (12): 1736. doi:10.3390/f12121736.
  • Maréchaux, I., F. Langerwisch, A. Huth, H. Bugmann, X. Morin, C. P. O. Reyer, R. Seidl, et al. 2021. “Tackling Unresolved Questions in Forest Ecology: The Past and Future Role of Simulation Models.” Ecology & Evolution 11 (9): 3746–3770. doi:10.1002/ece3.7391.
  • Michaowska, M., and J. Rapiński. 2021. “A Review of Tree Species Classification Based on Airborne LiDar Data and Applied Classifiers.” Remote Sensing 13 (3): 353. doi:10.3390/rs13030353.
  • Moraes, D., P. Benevides, H. Costa, F. D. Moreira, and M Caetano. 2021. “Assessment of the Introduction of Spatial Stratification and Manual Training in Automatic Supervised Image Classification.” In Earth Resources and Environmental Remote Sensing/GIS Applications XII, LNEE, 291–298. Singapore: SPIEdoi:10.1007/978-981-16-0289-4_39.
  • Mouta, N., R. Silva, S. Pais, J. M. Alonso, J. F. Gonçalves, J. Honrado, and J. R. Vicente. 2021. “‘The Best of Two Worlds’—Combining Classifier Fusion and Ecological Models to Map and Explain Landscape Invasion by an Alien Shrub.” Remote Sensing 13 (16): 3287. doi:10.3390/rs13163287.
  • Myers, N., R. A. Mittermeier, C. G. Mittermeier, G. A. B. da Fonseca, and J. Kent. 2000. “Biodiversity Hotspots for Conservation Priorities.” Nature 403 (6772): 853–858. doi:10.1038/35002501.
  • Oreopoulos, L., M. J. Wilson, and T. Várnai. 2011. “Implementation on Landsat Data of a Simple Cloud-Mask Algorithm Developed for MODIS Land Bands.” IEEE Geoscience & Remote Sensing Letters 8 (4): 597–601. doi:10.1109/LGRS.2010.2095409.
  • Ørka, H. O., M. Dalponte, T. Gobakken, E. Næsset, and L. T. Ene. 2013. “Characterizing Forest Species Composition Using Multiple Remote Sensing Data Sources and Inventory Approaches.” Scandinavian Journal of Forest Research 28 (7): 677–688. doi:10.1080/02827581.2013.793386.
  • Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. “Maximum Entropy Modeling of Species Geographic Distributions.” Ecological Modelling 190 (3–4): 231–259. doi:10.1016/j.ecolmodel.2005.03.026.
  • Plakman, V., T. Janssen, N. Brouwer, and S. Veraverbeke. 2020. “Mapping Species at an Individual-Tree Scale in a Temperate Forest, Using Sentinel-2 Images, Airborne Laser Scanning Data, and Random Forest Classification.” Remote Sensing 12 (22): 3710. doi:10.3390/rs12223710.
  • Pshegusov, R., F. Tembotova, V. Chadaeva, Y. Sablirova, M. Mollaeva, and A. Akhomgotov. 2022. “Ecological Niche Modeling of the Main Forest-Forming Species in the Caucasus.” Forest Ecosystems 9: 100019. doi:10.1016/j.fecs.2022.100019.
  • Richardson, A. D., S. P. Duigan, and G. P. Berlyn. 2002. “An Evaluation of Noninvasive Methods to Estimate Foliar Chlorophyll Content.” The New Phytologist 153 (1): 185–194. doi:10.3390/rs12203324.
  • Rozenstein, O., N. Haymann, G. Kaplan, and J. Tanny. 2019. “Validation of the Cotton Crop Coefficient Estimation Model Based on Sentinel-2 Imagery and Eddy Covariance Measurements.” Agricultural Water Management 223: 105715. doi:10.1016/j.agwat.2019.105715.
  • Schlerf, M., C. Atzberger, and J. Hill. 2005. “Remote Sensing of Forest Biophysical Variables Using HyMap Imaging Spectrometer Data.” Remote Sensing of Environment 95 (2): 177–194. doi:10.1016/j.rse.2004.12.016.
  • Shirazinejad, G., M. Zoej, and H. Latifi. 2022. “Applying Multidate Sentinel-2 Data for Forest-Type Classification in Complex Broadleaf Forest Stands.” Forestry: An International Journal of Forest Research 95 (3): 363–379. doi:10.1093/forestry/cpac001.
  • Sinclair, S. J., M. D. White, and G. R. Newell. 2010. “How Useful are Species Distribution Models for Managing Biodiversity Under Future Climates?” Ecology and Society 15. doi:10.5751/ES-03089-150108.
  • Sun, Y., J. Huang, Z. Ao, D. Lao, and Q. Xin. 2019. “Deep Learning Approaches for the Mapping of Tree Species Diversity in a Tropical Wetland Using Airborne LiDar and High-Spatial-Resolution Remote Sensing Images.” Forests 10 (11): 1047. doi:10.3390/f10111047.
  • Tamiminia, H., B. Salehi, M. Mahdianpari, L. Quackenbush, S. Adeli, and B. Brisco. 2020. “Google Earth Engine for Geo-Big Data Applications: A Meta-Analysis and Systematic Review.” Isprs Journal of Photogrammetry & Remote Sensing 164: 152–170. doi:10.1016/j.isprsjprs.2020.04.001.
  • Wang, X., and H. Ren. 2021. “DBMF: A Novel Method for Tree Species Fusion Classification Based on Multi-Source Images.” Forests 13 (1): 33. doi:10.3390/f13010033.
  • Wan, H., Y. Tang, L. Jing, H. Li, F. Qiu, and W. Wu. 2021. “Tree Species Classification of Forest Stands Using Multisource Remote Sensing Data.” Remote Sensing 13 (1): 144. doi:10.3390/rs13010144.
  • Waser, L. T., C. Ginzler, M. Kuechler, E. Baltsavias, and L. Hurni. 2011. “Semi-Automatic Classification of Tree Species in Different Forest Ecosystems by Spectral and Geometric Variables Derived from Airborne Digital Sensor (ADS40) and RC30 Data.” Remote Sensing of Environment 115 (1): 76–85. doi:10.1016/j.rse.2010.08.006.
  • Welle, T., L. Aschenbrenner, K. Kuonath, S. Kirmaier, and J. Franke. 2022. “Mapping Dominant Tree Species of German Forests.” Remote Sensing 14 (14): 3330. doi:10.3390/rs14143330.
  • Wessel, M., M. Brandmeier, and D. Tiede. 2018. “Evaluation of Different Machine Learning Algorithms for Scalable Classification of Tree Types and Tree Species Based on Sentinel-2 Data.” Remote Sensing 10 (9): 1419. doi:10.3390/rs10091419.
  • Wiens, J. J., and C. H. Graham. 2005. “Niche Conservatism: Integrating Evolution, Ecology, and Conservation Biology.” Annual Review of Ecology, Evolution, and Systematics 36 (1): 519–539. doi:10.1146/annurev.ecolsys.36.102803.095431.
  • Wu, C., Z. Niu, Q. Tang, and W. Huang. 2008. “Estimating Chlorophyll Content from Hyperspectral Vegetation Indices: Modeling and Validation.” Agricultural and Forest Meteorology 148 (8–9): 1230–1241. doi:10.1016/j.agrformet.2008.03.005.
  • Yang, Y., K. Tian, J. Hao, S. Pei, and Y. Yang. 2004. “Biodiversity and Biodiversity Conservation in Yunnan, China.” Biodiversity & Conservation 13 (4): 813–826. doi:10.1023/B:BIOC.0000011728.46362.3c.
  • You, N., J. Dong, J. Huang, G. Du, G. Zhang, Y. He, T. Yang, Y. Di, and X. Xiao. 2021. “The 10-M Crop Type Maps in Northeast China During 2017–2019.” Scientific Data 8 (1): 1–11. doi:10.1038/s41597-021-00827-9.
  • Yu, X., D. Lu, X. Jiang, G. Li, Y. Chen, D. Li, and E. Chen. 2020. “Examining the Roles of Spectral, Spatial, and Topographic Features in Improving Land-Cover and Forest Classifications in a Subtropical Region.” Remote Sensing 12 (18): 2907. doi:10.3390/rs12182907.
  • Zeb, S. A., S. M. Khan, and Z. Ahmad. 2021. “Phytogeographic Elements and Vegetation Along the River Panjkora - Classification and Ordination Studies from the Hindu Kush Mountains Range.” The Botanical Review 87 (4): 1–25. doi:10.1007/s12229-021-09247-1.
  • Zhang, Y., I. O. Odeh, and C. Han. 2009. “Bi-Temporal Characterization of Land Surface Temperature in Relation to Impervious Surface Area, NDVI and NDBI, Using a Sub-Pixel Image Analysis.” International Journal of Applied Earth Observation and Geoinformation 11 (4): 256–264. doi:10.1016/j.jag.2009.03.001.
  • Zhang, B., L. Zhao, and X. Zhang. 2020. “Three-Dimensional Convolutional Neural Network Model for Tree Species Classification Using Airborne Hyperspectral Images.” Remote Sensing of Environment 247: 111938. doi:10.1016/j.rse.2020.111938.
  • Zheng, G., A. Bao, X. Li, L. Jiang, C. Chang, T. Chen, and Z. Gao. 2019. “The Potential of Multispectral Vegetation Indices Feature Space for Quantitatively Estimating the Photosynthetic, Non-Photosynthetic Vegetation and Bare Soil Fractions in Northern China.” Photogrammetric Engineering & Remote Sensing 85 (1): 65–76. doi:10.14358/PERS.85.1.65.
  • Zhong, L., L. Hu, and H. Zhou. 2019. “Deep Learning Based Multi-Temporal Crop Classification.” Remote Sensing of Environment 221: 430–443. doi:10.1016/j.rse.2018.11.032.
  • Zhou, Y., Z. Zhang, B. Zhu, X. Cheng, L. Yang, M. Gao, and R. Kong. 2021. “MaxEnt Modeling Based on CMIP6 Models to Project Potential Suitable Zones for Cunninghamia Lanceolata in China.” Forests 12 (6): 752. doi:10.3390/f12060752.