Research Article

An efficient and accurate deep learning method for tree species classification that integrates depthwise separable convolution and dilated convolution using hyperspectral data

Article: 2307999 | Received 09 Oct 2023, Accepted 16 Jan 2024, Published online: 23 Jan 2024

ABSTRACT

Addressing accuracy and computational complexity challenges in hyperspectral image classification for small sample and multi-species scenarios, we developed DSC-DC, a lightweight convolutional neural network. This model is based on Depthwise Separable Convolution and Dilated Convolution and was trained using the Teakettle Experimental Forest dataset (USA). In this study, DSC-DC achieved an overall accuracy (OA) of 99.83%, average accuracy (AA) of 99.64%, and Kappa coefficient of 0.9996. Compared to Support Vector Machine and K-Nearest Neighbors, it demonstrated markedly higher OA (3.88% to 7.55%) and AA (30.71% to 34.09%). Compared to Inception-V3, ResNet50, and MSR-3DCNN, DSC-DC marginally outperformed in accuracy (OA: 0.06% to 0.31%; AA: 0.32% to 3.64%) while reducing the training time by 3.5, 5, and 35 times, and the prediction time by 2, 3, and 17 times, respectively. Moreover, DSC-DC exhibits slightly superior accuracy and efficiency compared to a 5-layer optimal structure of the 3D-CNN model. The application of the DSC-DC model to the hyperspectral dataset from the Jiepai branch of the Gaofeng State Owned Forest Farm in the Guangxi province, China, further demonstrated the reliability, versatility, and practical potential of this model. This study provides a reliable and efficient reference solution for small-sample and multi-tree classification tasks.

1. Introduction

Forests, as essential terrestrial ecosystems, provide crucial services and resources for human development and well-being (Mohd Zaki and Abd Latif Citation2017). In the context of climate change and its potential impacts on forest ecosystems, it is crucial to recognize that climate change may result in shifts in the distribution range and suitable growth areas of tree species, thereby impacting the composition and distribution of forests (Chen et al. Citation2011; Iverson and McKenzie Citation2013; Thompson et al. Citation2009). Timely and accurate tree species classification and mapping aid forest managers in understanding forest resource distribution, predicting vegetation changes, optimizing management plans, and formulating sustainable development strategies (Banskota, Wynne, and Kayastha Citation2011; Boonprong et al. Citation2018; Shang and Chisholm Citation2013). In recent years, the continuous improvement and development of remote sensing technology has provided reliable sources of data for a more efficient classification of tree species. Remote sensing technology overcomes the constraints of traditional field surveys, including the extensive manpower and time requirements and the limitations imposed by terrain, climate, and safety factors. This technology enables rapid large-scale tree species classification and assessment, assisting forest managers in making timely and informed decisions (Hovi et al. Citation2016; Reitberger, Krzystek, and Stilla Citation2008; Suratno, Seielstad, and Queen Citation2009; Yu et al. Citation2014).

It has been shown that different tree species have unique morphological structures (e.g. tree height, crown shape, branch density, and leaf cover) and biochemical uptake characteristics (e.g. chlorophyll, carotenoids, and proteins). These features affect the spectral response of tree species in the different bands of remote sensing data (Asner Citation1998; Asner et al. Citation2015; Clark, Roberts, and Clark Citation2005). Therefore, learning and utilizing these subtle spectral differences is crucial for improving the accuracy of tree species classification (Fricker et al. Citation2019). Visible light and multispectral data have been proven to be useful for tree species classification and individual tree segmentation (Ferreira et al. Citation2019; Nezami et al. Citation2020; Pölönen et al. Citation2018; Valderrama-Landeros et al. Citation2017; Yu et al. Citation2017). However, it is challenging to distinguish the subtle spectral differences between different tree species types due to the limited and discontinuous spectral bands, which cause the absence of important spectral features. Therefore, previous classification results obtained using visible light and multispectral data for multiple tree species were often unsatisfactory.

The hyperspectral data provide more detailed spectral information, enabling the capture of subtle spectral differences related to biogeochemical characteristics from the rich and continuous spectral bands. This approach effectively overcomes the phenomena of the same object having different spectra and the same spectrum resulting from different objects. As such, this method allows for better distinguishing of different forest types, particularly in areas with complex forest structures and high forest heterogeneity (Cochrane Citation2000). Currently, hyperspectral data have been widely used in image segmentation (Huang and Zhang Citation2008), anomaly detection (Li, Bioucas-Dias, and Plaza Citation2012), land cover mapping (Zhang, Zhang, and Du Citation2016), and tree species mapping in tropical, subtropical, and temperate regions (Jones, Coops, and Sharma Citation2010). Although hyperspectral data only provide horizontal information, the vertical characteristics of vegetation are equally important for tree species classification (Lim et al. Citation2003). Yao et al. (Citation2022) introduced LiDAR to integrate different information of the same object: the results showed that combining the hyperspectral data and LiDAR could improve classification accuracy. While the accuracy of the hyperspectral data classification was satisfactory, there is still the disadvantage of having too many bands, leading to large computational loads and great challenges in tree species classification. Therefore, it is necessary to explore efficient methods to reduce the calculation time.

To date, machine learning and deep learning methods have been commonly used for tree species classification using hyperspectral data. Various machine learning algorithms, such as Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbor (KNN), and Artificial Neural Networks (ANN), have been employed to achieve better classification results at different scales (Dalponte, Frizzera, and Gianelle Citation2019; Kandare et al. Citation2017; Modzelewska, Fassnacht, and Stereńczak Citation2020; Nevalainen et al. Citation2017). For example, Dalponte, Frizzera, and Gianelle (Citation2019) utilized SVM to identify nine tree species with an overall accuracy (OA) of 88%. Nevalainen et al. (Citation2017) employed RF and ANN for the identification of four tree species, achieving an OA of 95%. Averaged across species, these machine learning classifiers have demonstrated accuracies ranging from 63% to 98% when applied to four to 40 tree species, using tens to occasionally hundreds of trees per species for training (Fricker et al. Citation2019). However, machine learning methods rely heavily on manual feature extraction and professional knowledge (Franklin and Ahmed Citation2018; Moisen et al. Citation2006). Consequently, these methods are not only time-consuming and labor-intensive but can also overlook overall structural and spectral information. Moreover, the quality of the extracted features depends directly on the knowledge level of the workers (Sothe et al. Citation2020). Therefore, in the context of large-scale remote sensing data with complex data structures, the automatic extraction of joint spatial-spectral features becomes a key challenge.

Compared to machine learning, deep learning has better end-to-end learning capabilities and the ability to automatically extract high-level features by effectively integrating spatial-spectral information (LeCun, Bengio, and Hinton Citation2015). Convolutional Neural Network (CNN), one of the representative algorithms of deep learning, has been widely used in tree species classification (Guo et al. Citation2022; Hartling et al. Citation2019; Natesan, Armenakis, and Vepakomma Citation2020; Sun and Shi Citation2023; Zhang et al. Citation2021). For example, Sun et al. (Citation2019a) used AlexNet, VGG16, and ResNet50 to distinguish 18 tree species. Their results showed that VGG16 had the best performance with an OA of 73.25%, followed by ResNet50 with an OA of 72.93%. Yan, Jing, and Wang (Citation2021) identified six tree species after modifying the GoogLeNet: the OA and Kappa coefficient were 82.7% and 0.79, respectively. Sun et al. (Citation2019b) used a modified ResNet50 to identify seven tree species, achieving an OA of 89.64%. Zheng et al. (Citation2023) improved a coconut tree recognition method (COCODET) based on the Faster R-CNN model, achieving an average F1 score of 86.5%. An increasing number of scholars have been building deeper and more complex network architectures to enhance tree species classification accuracy (Cao and Zhang Citation2020; Chen et al. Citation2021). This is because network depth is a critical factor that influences feature extraction and image processing (He et al. Citation2016; Simonyan and Zisserman Citation2014). However, this can also increase the computational burden and affect the real-time performance of the network. To address these issues, lightweight models have begun to emerge. For instance, the MobileNet model (Howard et al. Citation2017) introduces Depthwise Separable Convolution (DSC), a technique that replaces traditional convolutional kernels with a more lightweight structure, thereby significantly reducing the parameter count of the model. This architecture has found applications in various fields including computer accelerator design (Bai, Zhao, and Huang Citation2018), ship detection (Zhang et al. Citation2019), plant disease classification (Kc et al. Citation2019), and hyperspectral image classification (Hu, Tian, and Ge Citation2023). This innovative architecture has provided a fresh perspective for the design of lightweight deep models. Hence, this study aimed to develop a model to not only improve the accuracy of tree species classification but also achieve higher computational efficiency.

The primary objective of this study was to investigate a more efficient and accurate model for classifying multiple tree species. To achieve this, we selected the Teakettle Experimental Forest (TEF) in the Southern Sierra Nevada Mountains, USA, as our research area due to its diverse tree species information. By combining the characteristics of DSC and Dilated Convolution (DC) in the DSC-DC model, we pursued two objectives: (1) to explore a simple lightweight CNN model that improves tree classification accuracy and reduces computational time; and (2) to compare the accuracy of DSC-DC with that of SVM and KNN, and to assess its performance and efficiency against ResNet50, Inception-V3, 3D-CNN, and MSR-3DCNN.

2. Materials

2.1. Study area

This study utilized the dataset publicly available from Fricker et al. (Citation2019). The study area is the TEF, located in the southern Sierra Nevada Mountains (36°58′00″, 119°01′00″), as shown in Figure 1(A). The TEF covers an area of approximately 40 hectares, with elevations ranging from 1,935 m to 2,630 m and an average annual rainfall of around 450 mm. The dominant forest type in the area is mixed conifer forest, accounting for approximately 65% of the total forest area, mainly distributed at elevations between 1,900 m and 2,300 m. The TEF exhibits a rich and diverse tree species composition, including Pinus jeffreyi, Abies concolor, Abies magnifica, Pinus lambertiana, and Pinus contorta.

Figure 1. Location of the study area and sample distribution. (A) The location of the study area is represented by the red dot. (B) Overview map of the entire area covered in this study. (C) Zoomed view of hyperspectral data and labels in the red square area. (D) Zoomed view of the Canopy Height Model (CHM) image and labels in the red square area.

2.2. Remote sensing data

The airborne LiDAR and hyperspectral data were collected by the Airborne Observation Platform (AOP) of the National Ecological Observatory Network (NEON) using a DeHavilland DHC-6 Twin Otter aircraft in July 2017. The flight altitude was approximately 1,000 m. The hyperspectral data were acquired using the AVIRIS next-gen sensor with the parameters listed in Table 1. The LiDAR data, obtained simultaneously, were preprocessed to generate a Canopy Height Model (CHM). The CHM data were primarily used to assist in creating labeled data and validating samples in subsequent research. The original dataset spans approximately 16 km in length and 1 km in width. However, we selected only a portion of the imagery for the experiments, as shown in Figure 1(B).

Table 1. Parameters of the hyperspectral data.

2.3. Field data

The field data were collected by Fricker et al. (Citation2019) in September 2017. They used a high-precision Global Positioning System (GPS) to obtain the individual tree locations and recorded detailed information such as tree species, the diameter at breast height, and the status of being dead or alive. To facilitate the identification of tree species from the CHM, isolated trees with large canopies in the canopy layer and smaller trees with distinct separation from surrounding trees were selected as much as possible during the field survey.

2.4. Data annotation and labeling

To preprocess the hyperspectral data, atmospheric correction and orthorectification were performed, resulting in processed images. The LiDAR data were processed to generate a CHM. The CHM was used to manually digitize individual tree crowns from the TEF experimental area and strip data. This allowed the pixels within the tree crowns to receive species labels. Then, in the high-resolution aerial imagery, circular labels were created with the center point of each tree as the center, and polygons were used to label the species category. Each sample label only included the ‘pure pixel’ of the tree crown, excluding the background, soil, and other tree crowns. The seven types of tree species and one type of deadwood involved in this study are reported in Table 2. For detailed data preprocessing and label creation procedures, please refer to Fricker et al. (Citation2019).

Table 2. The scientific name, code, and number of each species.

3. Methods

3.1. Theoretical foundations

3.1.1. Depthwise Separable Convolution (DSC)

CNN is a deep learning model composed of convolutional layers, activation functions, pooling layers, and fully connected layers, among others (Lecun et al. Citation1998). Its core idea is to utilize convolutional layers and pooling layers to extract local features from images, then employ fully connected layers for higher-level abstraction and classification.

Currently, in the research on hyperspectral image classification based on CNN, the main approach is to utilize 2D-CNN or 3D-CNN for hierarchical feature extraction (Feng et al. Citation2019). Consequently, most network architectures employ standard convolution operations. These involve the element-wise multiplication of filters with local regions of the input signal followed by the summation of all results to generate the output. As the network depth increases, computational and parameter complexity significantly increase, particularly when applied to hyperspectral data with multiple channels.

In our research on hyperspectral data classification, our goal is not only to improve classification accuracy by combining spatial-spectral information for feature extraction but also to reduce time costs. As illustrated in Figure 2, the DSC proposed in the MobileNet model (Howard et al. Citation2017) addresses these challenges effectively. DSC decomposes the convolution operation into two steps: depthwise convolution and pointwise convolution. First, an independent convolution kernel is applied to each channel of the input feature map, extracting channel-specific features. Then, a 1 × 1 convolution kernel is used for pointwise convolution to linearly combine and fuse features from different channels, resulting in the final output feature map.
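The two steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the valid padding, the unit stride, and the toy input are all assumptions chosen to keep the example small.

```python
def depthwise_conv(x, dw_kernels):
    """Apply one k x k kernel per input channel (valid padding, stride 1).

    x: nested list of shape [H][W][M]; dw_kernels: nested list of shape [M][k][k].
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    k = len(dw_kernels[0])
    out_h, out_w = height - k + 1, width - k + 1
    out = [[[0.0] * channels for _ in range(out_w)] for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for c in range(channels):
                # Each channel is filtered on its own; channels are not mixed here.
                out[i][j][c] = sum(
                    x[i + di][j + dj][c] * dw_kernels[c][di][dj]
                    for di in range(k)
                    for dj in range(k)
                )
    return out


def pointwise_conv(x, pw_weights):
    """Mix channels with a 1 x 1 convolution; pw_weights has shape [N][M]."""
    return [
        [[sum(w * v for w, v in zip(row_w, pixel)) for row_w in pw_weights]
         for pixel in row]
        for row in x
    ]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


# Toy input (assumed): 4 x 4 pixels, 2 channels filled with ones and twos.
x = [[[1.0, 2.0] for _ in range(4)] for _ in range(4)]
dw = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]  # two 3 x 3 box filters
pw = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]               # N = 3 output channels
y = depthwise_separable_conv(x, dw, pw)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 3
print(y[0][0])                          # [9.0, 18.0, 27.0]
```

The depthwise step holds M kernels of size k × k and the pointwise step holds N × M weights, which is where the parameter saving over a standard N × k × k × M kernel comes from.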

Figure 2. Schematic of depthwise separable convolution.

The principle is as follows. Assume that the input feature map has dimensions $(D_I, D_I, M)$, the convolution kernel has dimensions $(N, D_K, D_K, M)$, and the output feature map has dimensions $(D_O, D_O, N)$. The computational complexity of a standard convolution is $D_O \times D_O \times D_K \times D_K \times N \times M$. By contrast, the computational complexity of DSC is $D_O \times D_O \times D_K \times D_K \times M + D_O \times D_O \times M \times N$. The ratio of the two is given in Equation (1).

$$\frac{D_O \times D_O \times D_K \times D_K \times M + D_O \times D_O \times M \times N}{D_O \times D_O \times D_K \times D_K \times N \times M} = \frac{1}{N} + \frac{1}{D_K \times D_K} \tag{1}$$

The ratio is below 1 whenever $N > 1$ and $D_K > 1$, and it approaches $1/(D_K \times D_K)$ as the number of output feature maps $N$ increases; for a 3 × 3 kernel, DSC therefore needs roughly one-ninth of the computations of a standard convolution when $N$ is large. This indicates that DSC can significantly reduce the parameter count of the model and the computational load. This advantage is particularly valuable in scenarios with limited computational resources, such as deep learning applications on mobile and embedded devices.
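As a quick numerical check of Equation (1), the following sketch counts the multiplications for both convolution types. The values 13, 3, and 30 echo the window size, kernel size, and PCA channel count reported in this study; the output channel count of 64 is a hypothetical value chosen only for illustration.

```python
import math


def standard_conv_cost(d_out, d_k, m, n):
    """Multiplications of a standard convolution: D_O * D_O * D_K * D_K * N * M."""
    return d_out * d_out * d_k * d_k * n * m


def dsc_cost(d_out, d_k, m, n):
    """Multiplications of a depthwise separable convolution."""
    depthwise = d_out * d_out * d_k * d_k * m  # one k x k kernel per input channel
    pointwise = d_out * d_out * m * n          # 1 x 1 convolution across channels
    return depthwise + pointwise


# d_out, d_k, m follow the text; n = 64 is an assumed, illustrative value.
d_out, d_k, m, n = 13, 3, 30, 64
ratio = dsc_cost(d_out, d_k, m, n) / standard_conv_cost(d_out, d_k, m, n)
closed_form = 1 / n + 1 / (d_k * d_k)

print(round(ratio, 4))                   # 0.1267
print(math.isclose(ratio, closed_form))  # True
```

Under these assumed sizes, the separable layer needs about 13% of the multiplications of the standard layer, in line with the closed-form ratio.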

3.1.2. Dilated Convolution (DC)

When performing classification tasks, enlarging the receptive field of a model can help it better understand the structure and semantics of objects, capturing more global features and improving classification accuracy (Peng et al. Citation2017). In CNN structures, one direct approach is to use larger convolution kernels to increase the receptive field. However, using larger convolution kernels can also bring some challenges. Since each convolution kernel needs to learn a set of weight parameters, as the network depth increases, the number of parameters and the computation time also increase, significantly affecting the computational efficiency of the model. This poses a challenge for practical applications of the model.

To address these issues, DC (Yu and Koltun Citation2015) was introduced. DC inserts gaps (holes) between the elements of the convolution kernel, so that the kernel samples the input at wider intervals (Figure 3). Compared to traditional convolution, DC can systematically aggregate multi-scale contextual information without losing resolution or increasing the number of parameters or computations. This leads to improved perception and feature extraction accuracy. Moreover, it effectively expands the receptive field, enhancing model performance and generalization capability, and reducing the risk of overfitting. The receptive field of a dilated convolution is computed as shown in Equation (2).

$$F_i = F_{i-1} + (k_i - 1) \times r_i \times \prod_{n=1}^{i-1} s_n \tag{2}$$

Here, $F_i$ represents the receptive field of the convolution kernel in the $i$-th convolutional layer, $k_i$ denotes the size of the convolution kernel, $r_i$ is the dilation factor, $F_{i-1}$ represents the receptive field of the convolution kernel in the $(i-1)$-th convolutional layer, and $s_n$ is the stride of the $n$-th convolutional layer.
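The recurrence in Equation (2) can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the receptive field of the input is taken as 1, and the layer stacks below are hypothetical examples rather than the DSC-DC architecture itself.

```python
def receptive_field(layers):
    """Receptive field after a stack of convolutions.

    layers: list of (kernel_size, dilation, stride) tuples in network order.
    """
    field = 1  # a single input pixel sees only itself (assumed starting value)
    jump = 1   # product of the strides of all preceding layers
    for kernel, dilation, stride in layers:
        field += (kernel - 1) * dilation * jump
        jump *= stride
    return field


# Three plain 3 x 3 convolutions versus the same stack with dilations 1, 2, 4.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]
print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Both stacks have the same number of weights, yet the dilated stack covers a 15 × 15 neighborhood instead of 7 × 7, which illustrates how dilation enlarges the receptive field without adding parameters.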

Figure 3. Schematic of dilated convolution.

3.2. DSC-DC model architecture

In this study, a lightweight convolutional neural network was designed. The proposed model utilizes DSC to reduce the number of parameters and computational complexity while maintaining good feature extraction capability. DC is employed to expand the receptive field and enhance the understanding of image structure. The proposed model, named DSC-DC CNN, aims to meet the requirements for both model performance and computational efficiency.

The convolutional layer, the core component of CNN, is a crucial factor in determining the performance of the model. In our study, we took inspiration from the Xception (Chollet Citation2017) and MobileNet (Howard et al. Citation2017) model designs. However, instead of using DSC directly, we opted for a combination of separable convolutions and depthwise convolutions in an alternating manner. Additionally, we incorporated DC operations in some of the convolutional layers to enhance the understanding of image structures.

In determining the size and number of the convolutional kernels, we mainly referred to successful CNN network models (Sandler et al. Citation2018; Szegedy et al. Citation2017; Tan and Le Citation2019; Zhang et al. Citation2018). We set the kernel size to 3 × 3, following the principle that the number of kernels in the next layer is twice the number in the previous layer. With this design approach, our model can improve performance while reducing the number of parameters, thereby enhancing efficiency.

Figure 4 provides a visual depiction of the model architecture; the algorithm flow and parameters are shown in Table 3. Our DSC-DC model differs from typical image-level classification models: inspired by previous research (Dyrmann et al. Citation2016; Fawakherji et al. Citation2020; Zhang, Zhao, and Zhang Citation2020), we adopted a pixel-level classification approach. The model extracts neighborhood cubes centered around pixels as inputs for its classification process. The model includes an input layer, seven convolutional layers, two pooling layers, one flatten layer, two dense layers, one dropout layer, and one output layer. The specific workflow is as follows.

Figure 4. Schematic diagrams of the DSC-DC model architecture.

Table 3. Algorithm flow and parameter settings for the DSC-DC model.

The input hyperspectral image (HSI) cube data for the model are represented as M ∈ S × S × L, where S denotes the spatial dimensions of the input (width and height) and L denotes the number of channels. Principal Component Analysis (PCA) is applied to reduce the spectral redundancy and dimensionality of M (Abdi and Williams Citation2010). The result of the dimensionality reduction is an HSI cube M′ ∈ S × S × L′; in this study, S is 13 and L′ is 30. Therefore, the model receives spatial-spectral cubes of size M′ ∈ 13 × 13 × 30 and their corresponding labels ι as input data through the input layer.
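The neighborhood-cube extraction described above can be sketched as follows. The sketch assumes that the image has already been reduced to 30 principal components and that positions outside the image are zero-filled; the source does not state how borders are handled, and the function name and toy image are hypothetical.

```python
def extract_cube(image, row, col, window):
    """Return the window x window x L' neighborhood centered on (row, col).

    image: nested list of shape [H][W][L']. Positions outside the image are
    zero-filled (an assumption; the border handling is not given in the text).
    """
    half = window // 2
    height, width, bands = len(image), len(image[0]), len(image[0][0])
    cube = []
    for r in range(row - half, row + half + 1):
        cube_row = []
        for c in range(col - half, col + half + 1):
            if 0 <= r < height and 0 <= c < width:
                cube_row.append(list(image[r][c]))
            else:
                cube_row.append([0.0] * bands)
        cube.append(cube_row)
    return cube


# Toy 20 x 20 image with 30 bands; every band of a pixel stores (row index + 1).
image = [[[float(r + 1)] * 30 for _ in range(20)] for r in range(20)]

cube = extract_cube(image, 10, 10, 13)
print(len(cube), len(cube[0]), len(cube[0][0]))  # 13 13 30
print(cube[6][6][0])                             # 11.0 (the center pixel)

corner = extract_cube(image, 0, 0, 13)
print(corner[0][0][0], corner[6][6][0])          # 0.0 1.0 (padding, center)
```

Each labeled pixel would yield one such 13 × 13 × 30 cube, paired with the label of its center pixel.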

Next, the M′ ∈ 13 × 13 × 30 undergoes feature extraction through a model that includes four separable convolutional layers (Layer 1, Layer 3, Layer 5, and Layer 8) and three depthwise convolutional layers (Layer 2, Layer 4, and Layer 7). Specifically, Layer 3 and Layer 7 employ dilated convolutions to expand the receptive field with the dilation rates set to 2 and 4, respectively. MaxPooling layers are applied in Layer 6 and Layer 9 to downsample the feature maps and reduce their spatial dimensions. Subsequently, a flatten layer converts the downsampled feature maps into a one-dimensional vector, which is passed to the dense layers. The ReLU activation function (Nair and Hinton Citation2010) is employed in all convolutional and dense layers to introduce non-linear transformations for extracting complex and rich features. To mitigate the risk of overfitting, a Dropout layer (Srivastava et al. Citation2014) with a dropout rate of 0.4 is introduced between the dense layers. Finally, the output layer with a softmax activation function performs multi-class classification and provides the probabilities for each class.

3.3. Experimental settings

After determining the network structure, we configured and optimized the model training parameters. A batch size of 32 was chosen, and the number of epochs was set to 100. Window sizes of 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15 were tested for model training, and the optimal window size was found to be 13 × 13. For the optimizer, we compared SGD and Adam; Adam performed better. Because the learning rate plays a crucial role in model training, we ran a grid search over 0.001, 0.003, 0.005, 0.0001, 0.0003, and 0.0005, which identified 0.0001 as the best learning rate.

Based on previous experimental experience, the dataset size, and the tree species classification task, the samples were randomly divided into training, validation, and test sets. Each sample was assigned to exactly one set, which keeps the test set independent of the training set and supports an accurate evaluation of model performance. A total of 80% of the samples were allocated to the training and validation sets in a 1:1 ratio, while the remaining 20% were reserved for testing. The sample sizes for the training set, validation set, and test set were 41,448, 41,448, and 20,725, respectively. This allocation ratio ensured an adequate number of training and validation samples for model optimization while retaining a relatively large test set for the final comprehensive performance evaluation.
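A non-overlapping 40:40:20 split of this kind can be sketched with the Python standard library. The function name, the seed, and the floor rounding are assumptions; the total of 103,621 samples is simply the sum of the three reported set sizes.

```python
import random


def split_indices(n_samples, train_frac=0.4, val_frac=0.4, seed=0):
    """Shuffle sample indices once and cut them into three disjoint sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_frac)  # floor rounding (assumed)
    n_val = int(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # everything left over
    return train, val, test


train, val, test = split_indices(103621)
print(len(train), len(val), len(test))  # 41448 41448 20725
# No index appears in more than one set.
print(len(set(train) | set(val) | set(test)) == 103621)  # True
```

With floor rounding, the resulting set sizes coincide with those reported above; because the sets are slices of one shuffled list, they cannot overlap.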

3.4. Comparative experiment

To evaluate the classification performance and feasibility of the DSC-DC model, we compared it with four classical deep learning models, Inception-V3, ResNet50, 3D-CNN (Zhang, Zhao, and Zhang Citation2020), and MSR-3DCNN (Xu et al. Citation2021), as well as two machine learning models, KNN and SVM. For the experiment, all models were implemented using Python 3.6. KNN and SVM were based on the scikit-learn framework, while all deep learning models were based on the TensorFlow and Keras open-source deep learning frameworks. The hardware configuration of the operating platform included an NVIDIA RTX A4000 GPU and an Intel(R) Xeon(R) Silver 4216 CPU @ 2.10 GHz. The training time for the deep learning methods refers to the time required to train 100 epochs. The experimental flow is shown in Figure 5.

Figure 5. Flowchart of the experiment.

To provide a comprehensive evaluation of the classification accuracy, we adopted the overall accuracy (OA), average accuracy (AA), and Kappa coefficient as quantitative evaluation metrics for each model. OA represents the ratio of correctly classified pixels to the total number of pixels in all classes. AA indicates the average accuracy for each class. The Kappa coefficient measures the agreement between the predicted and ground truth classifications, taking into account the agreement that could occur by chance. In addition, precision, recall, and the F1 score were used as evaluation metrics to assess the individual tree species. The higher the precision, recall, OA, and F1 score, the closer the predicted value is to the true value. The equations are listed as follows:

$$OA = \frac{1}{N} \sum_{i=1}^{n} x_{ii} \tag{3}$$

$$AA = \frac{1}{n} \sum_{i=1}^{n} recall_i \tag{4}$$

$$K = \frac{N \sum_{i=1}^{n} x_{ii} - \sum_{i=1}^{n} (x_{i+} \times x_{+i})}{N^2 - \sum_{i=1}^{n} (x_{i+} \times x_{+i})} \tag{5}$$

$$precision = \frac{TP}{TP + FP} \tag{6}$$

$$recall = \frac{TP}{TP + FN} \tag{7}$$

$$F1\,score = \frac{2 \times precision \times recall}{precision + recall} \tag{8}$$

where $N$ is the total number of samples, $n$ is the number of categories, $x_{ii}$ is the number of correct predictions in category $i$, $recall_i$ is the recall of category $i$, $x_{i+}$ is the number of true values in category $i$, and $x_{+i}$ is the number of predicted values in category $i$. TP means that both the true and predicted categories are positive; FP means that the true category is negative while the predicted category is positive; and FN means that the true category is positive while the predicted category is negative.
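Equations (3) to (8) can all be computed from a confusion matrix, as the sketch below shows. The function name and the 2 × 2 toy matrix are hypothetical, and the code assumes that every class has at least one true sample and at least one predicted sample so that no division by zero occurs.

```python
def classification_metrics(confusion):
    """Compute OA, AA, Kappa, and per-class precision, recall, and F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)                 # N
    diag = [confusion[i][i] for i in range(n_classes)]         # x_ii
    row_sums = [sum(row) for row in confusion]                 # x_i+ (true counts)
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]                     # x_+i (predicted counts)

    recall = [diag[i] / row_sums[i] for i in range(n_classes)]
    precision = [diag[i] / col_sums[i] for i in range(n_classes)]
    f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

    oa = sum(diag) / total
    aa = sum(recall) / n_classes
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    kappa = (total * sum(diag) - chance) / (total * total - chance)
    return {"OA": oa, "AA": aa, "Kappa": kappa,
            "precision": precision, "recall": recall, "F1": f1}


# Toy two-class confusion matrix (made-up numbers, not results from the study).
metrics = classification_metrics([[8, 2], [1, 9]])
print(round(metrics["OA"], 4), round(metrics["AA"], 4), round(metrics["Kappa"], 4))
# 0.85 0.85 0.7
```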

3.5. Model reliability and applicability verification

To further validate the reliability and applicability of the DSC-DC model, we employed a publicly available hyperspectral dataset provided by Zhang, Zhao, and Zhang (Citation2020) for performance evaluation. The dataset originates from the Jiepai branch of the Gaofeng State Owned Forest Farm in the Guangxi province, China. For more detailed information about the data, please refer to Zhang, Zhao, and Zhang (Citation2020).

In this experiment, the hyperparameter settings were kept consistent with the literature. Specifically, a batch size of 64 was employed, the number of epochs was fixed at 300, and an 11 × 11 window size was utilized, with the learning rate set to 0.0001. In actual forestry surveys, data acquisition and initial labeling are crucial for model training. A model that demands more training samples increases the workload of forestry surveys, whereas too few samples might reduce model accuracy. Therefore, to investigate the impact of sample size on the accuracy of the DSC-DC model, this study adhered to the principle of random partitioning. The training, validation, and test sets were randomly selected from each type of sample using the following three partitioning methods. (1) 2.5% of the samples were utilized for the training set, another 2.5% for the validation set, and the remaining 95% for the test set (2.5:2.5:95). (2) 10% of the samples were assigned to the training set, another 10% to the validation set, and the remaining 80% to the test set (10:10:80). (3) 40% of the samples were allocated to the training set, another 40% to the validation set, and the remaining 20% to the test set (40:40:20). The number of samples obtained using the three partitioning methods is shown in Table 4.

Table 4. Summary of training, validation, and testing samples under different partitioning methods.
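Because the three partitioning methods draw samples from each class separately, they can be sketched as a per-class (stratified) random split. The function name, the seed, the rounding rule, and the toy label list are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac, val_frac, seed=0):
    """Split sample indices per class into disjoint train, validation, and test sets."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_train = round(len(members) * train_frac)  # rounding rule is assumed
        n_val = round(len(members) * val_frac)
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]           # the remainder of this class
    return train, val, test


# Toy labels: 200 samples of class "A" and 400 of class "B" (made-up sizes).
labels = ["A"] * 200 + ["B"] * 400
sizes = {}
for name, fractions in [("2.5:2.5:95", (0.025, 0.025)),
                        ("10:10:80", (0.10, 0.10)),
                        ("40:40:20", (0.40, 0.40))]:
    train, val, test = stratified_split(labels, *fractions)
    sizes[name] = (len(train), len(val), len(test))
    print(name, sizes[name])
```

Splitting within each class keeps rare classes represented in every set, which matters most for the 2.5:2.5:95 scheme where very few training samples are drawn.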

4. Results

4.1. Impact of different optimizers on loss reduction

Figure 6 illustrates the impact of employing different optimizers on the reduction of the loss function. The results demonstrate that, when both were trained for the same number of epochs, the Adam optimizer reduced the loss value more markedly and converged faster than the SGD optimizer. This observation suggests that the Adam optimizer was more effective in optimizing the model parameters, facilitating the attainment of optimal performance.

Figure 6. Impact of different optimizers on loss reduction. (A) Adam optimizer. (B) SGD optimizer.

4.2. Effect of the dropout rate on model classification performance

To evaluate the impact of different dropout rates on the classification performance of the model, the model was tested with dropout rates of 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6. The results, which are shown in Figure 7, indicate that the highest OA and Kappa coefficient values were achieved when setting the dropout rate to 0.6. However, the AA at this setting was 0.41% lower than that at a dropout rate of 0.4. When the dropout rate was set to 0.4, the OA, AA, and Kappa coefficient values were all high, and the differences among them were minimal. Therefore, the final decision was made to set the dropout rate to 0.4.

Figure 7. Effect of the dropout rate on the overall accuracy (OA), average accuracy (AA), and Kappa coefficient, where K denotes the value of the Kappa coefficient × 100.

4.3. Effect of the input size on model classification performance

To assess the impact of the input size on the classification performance of the model, window sizes of 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15 were tested in this study. The impact of different input sizes on classification accuracy is illustrated in Figure 8. It was observed that as the window size increased, the accuracy gradually improved. The highest AA and Kappa coefficient were achieved when the window size was set to 13 × 13. When the window size was further increased to 15 × 15, there was a slight decrease in the AA and Kappa coefficient, with the OA being only 0.01% higher compared to the 13 × 13 setting.

Figure 8. Effect of the input size on model classification performance, where K denotes the value of the Kappa coefficient × 100.
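As an illustration of what a window size means for the model input, the following minimal sketch cuts a square neighborhood around a labeled pixel out of a hyperspectral cube. The function name `extract_patch`, the toy cube dimensions, and the reflect-padding of border pixels are assumptions made for this example; the border handling actually used in this study is not stated here.

```python
import numpy as np

def extract_patch(cube, row, col, window=13):
    """Return a (window, window, bands) neighborhood centered on (row, col).

    `cube` is a hyperspectral image with shape (height, width, bands).
    Border pixels are handled by reflect-padding the two spatial axes,
    which is an assumption of this sketch.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the pixel sits at the center")
    half = window // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so the patch starts at (row, col) in padded coordinates.
    return padded[row:row + window, col:col + window, :]

# Toy cube: 20 x 20 pixels with 8 bands (hypothetical sizes).
cube = np.random.default_rng(0).random((20, 20, 8))
patch = extract_patch(cube, row=0, col=5, window=13)
print(patch.shape)
```

A larger window gives the network more spatial context per sample, but the number of input values grows with the square of the window size.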


By plotting the confusion matrixes (Figure 9), further analysis revealed that when the window size was set to 5 × 5 or 7 × 7, noticeable misclassifications occurred, particularly for Abies concolor, Calocedrus decurrens, and dead trees. When the window size was set to 9 × 9, only Quercus kelloggii, Pinus contorta, and dead trees showed no misclassifications, whereas the remaining tree species still exhibited significant misclassification. As the window size increased to 11 × 11, apart from dead trees, the misclassification of tree species improved to some extent. When the window size was set to 13 × 13, the classification performance of various tree species significantly improved and most species were correctly identified, with only a few instances of misclassification for certain species. However, the classification performance slightly declined when the window size was increased to 15 × 15 compared to the 13 × 13 window size. Considering the OA, AA, Kappa coefficient, and the analysis of the confusion matrixes, this study ultimately selected the 13 × 13 window size.

Figure 9. Confusion matrixes of different input sizes, where A, B, C, D, E, F, G, H, and I represent Abies concolor, Abies magnifica, Calocedrus decurrens, Pinus jeffreyi, Pinus lambertiana, Quercus kelloggii, Pinus contorta, dead trees, and land, respectively.


4.4. Classification results of tree species under optimal parameters

We trained the samples with the optimal experimental parameters determined above; the results are shown in Table 5. Among all tree species, Abies magnifica, Pinus lambertiana, Quercus kelloggii, and Pinus contorta achieved the overall best performance, with no misclassifications observed. Calocedrus decurrens and Pinus jeffreyi achieved precision, recall, and F1 score values above 0.99, with only two misclassifications. However, Abies concolor had four misclassifications. Only one misclassification occurred for dead trees. The statistical data indicate that precision, recall, and the F1 score are all above 0.97, demonstrating that our constructed model performs well in classifying this dataset.

Table 5. Classification results of the DSC-DC model with optimal parameters.
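For readers who want to reproduce per-class statistics of this kind, the sketch below derives precision, recall, and the F1 score from a confusion matrix. The 3 × 3 matrix is invented toy data, not the study's results, and the row/column convention (rows are true classes, columns are predicted classes) is an assumption of the example.

```python
def per_class_scores(confusion):
    """Per-class (precision, recall, F1) from a square confusion matrix.

    Assumed convention: confusion[i][j] counts samples of true class i
    that were predicted as class j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

# Toy confusion matrix for three hypothetical classes.
toy = [[48, 2, 0],
       [1, 29, 0],
       [0, 0, 20]]
for label, (p, r, f1) in zip("ABC", per_class_scores(toy)):
    print(f"{label}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```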

4.5. Comparison of different model results

To further evaluate the classification performance of our model, we compared it with four classic deep learning models, Inception-V3, ResNet50, MSR-3DCNN, and 3D-CNN, as well as two machine learning models, KNN and SVM. The quantitative evaluation metrics and time consumption results for different classification methods are presented in Table 6; the prediction results are shown in Figure 10.

Figure 10. Predicted results of tree species classification for different classification methods, where (A), (B), (C), (D), (E), (F), (G), and (H) represent the study area, KNN, SVM, Inception-V3, ResNet50, MSR-3DCNN, 3D-CNN, and DSC-DC, respectively.


Table 6. Accuracy comparison of different classification methods.
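The three summary metrics compared in Table 6 can all be computed from a single confusion matrix. The following is a minimal sketch under the usual definitions (OA as the fraction of correctly classified samples, AA as the mean of per-class recall, and Cohen's Kappa as agreement corrected for chance); the matrix is toy data and the function name `overall_metrics` is hypothetical.

```python
def overall_metrics(confusion):
    """Return (OA, AA, Kappa) for a square confusion matrix.

    Assumed convention: rows are true classes, columns are predictions.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(confusion[k]) for k in range(n)]
    col_sums = [sum(confusion[i][k] for i in range(n)) for k in range(n)]

    # Overall accuracy: correctly classified samples over all samples.
    oa = sum(confusion[k][k] for k in range(n)) / total

    # Average accuracy: unweighted mean of per-class recall.
    recalls = [confusion[k][k] / row_sums[k] for k in range(n) if row_sums[k]]
    aa = sum(recalls) / len(recalls)

    # Kappa: observed agreement (OA) corrected for chance agreement.
    chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - chance) / (1 - chance)
    return oa, aa, kappa

example = [[48, 2, 0],
           [1, 29, 0],
           [0, 0, 20]]
oa, aa, kappa = overall_metrics(example)
print(f"OA={oa:.4f} AA={aa:.4f} Kappa={kappa:.4f}")
```

Because AA weights every class equally, a few errors in a small class lower AA far more than OA, which is one reason the two metrics can diverge on imbalanced datasets.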

The results of the comparison with machine learning models were consistent with our expectations. The DSC-DC model exhibited significantly higher OA, AA, and Kappa coefficient compared to KNN and SVM. Specifically, in comparison to SVM, the DSC-DC model improved the OA, AA, and Kappa coefficient by 3.88%, 30.71%, and 0.1566, respectively. Similarly, compared to KNN, the DSC-DC model achieved improvements of 7.55%, 34.09%, and 0.2974 for the OA, AA, and Kappa coefficient, respectively. Upon analyzing the predicted images, we observed that KNN and SVM exhibited notable mixed pixel phenomena, whereas our model was able to alleviate this issue to some extent.

When compared to the deep learning models, it was evident that the DSC-DC model possesses distinct advantages in terms of classification accuracy. Compared with the Inception-V3 model, the DSC-DC model improved the OA, AA, and Kappa coefficient by 0.31%, 0.81%, and 0.0008, respectively. Compared with the ResNet50 model, the DSC-DC model achieved comparable classification accuracy, with the OA and Kappa coefficient reaching the same values of 99.83% and 0.9996, respectively, while the AA was only 0.08% higher than that of the ResNet50 model. In addition, compared to the MSR-3DCNN model, the DSC-DC model exhibited a lower OA by 0.05% and a lower Kappa coefficient by 0.0001, but a higher AA by 3.64%. Lastly, in comparison to the 3D-CNN model, the differences in the OA, AA, and Kappa coefficient were generally small.

Upon closer examination of the prediction detail figure (Figure 11), it was evident that the accuracy of boundary extraction for tree species classification was higher in the DSC-DC model than in the KNN and SVM models. Furthermore, the DSC-DC model demonstrated notable proficiency in recognizing short, young trees, an aspect where other models exhibited limitations.

Figure 11. Predicted details of tree species using different classification methods, where (A), (B), (C), (D), (E), (F), (G), and (H) denote the study area, KNN, SVM, Inception-V3, ResNet50, MSR-3DCNN, 3D-CNN, and DSC-DC, respectively.


In addition to classification accuracy, the training time of the models was also compared. Under the same number of epochs, the training time of the MSR-3DCNN model was approximately 34 times longer than that of the DSC-DC model. The training time of the ResNet50 model was approximately five times longer than that of the DSC-DC model, while the training time of the Inception-V3 model was approximately 3.5 times longer than that of the DSC-DC model. In terms of prediction time, the MSR-3DCNN model took about 17 times longer than the DSC-DC model. The ResNet50 model took approximately three times longer than the DSC-DC model, while the Inception-V3 model took approximately twice as long as the DSC-DC model. Interestingly, only the 3D-CNN model exhibited comparable training and prediction times to the DSC-DC model.

We plotted the confusion matrixes to further analyze the classification performance of the DSC-DC, Inception-V3, ResNet50, MSR-3DCNN, and 3D-CNN models (Figure 12). Our model outperformed the other networks in most tree species classifications, except for Abies concolor and Pinus jeffreyi, where its performance was lower than that of other models. Additionally, the performance for Calocedrus decurrens was inferior to that of MSR-3DCNN and 3D-CNN. However, Abies magnifica, Pinus lambertiana, Quercus kelloggii, and Pinus contorta were all correctly predicted. Due to the small number of samples, the ResNet50 model exhibited severe misclassification for Pinus contorta. Surprisingly, the MSR-3DCNN model exhibited excellent recognition performance for most tree species; however, it experienced significant misclassification issues for two tree species with limited sample sizes. By contrast, our model achieved robust classification results even with a small number of samples for Quercus kelloggii and Pinus contorta. These results demonstrate that the designed DSC-DC model not only slightly outperformed other classic deep learning models in terms of classification accuracy but also had significant advantages in reducing the computational burden, thereby greatly reducing time costs.

Figure 12. Confusion matrixes of different classification methods, where A, B, C, D, E, F, G, H, and I represent Abies concolor, Abies magnifica, Calocedrus decurrens, Pinus jeffreyi, Pinus lambertiana, Quercus kelloggii, Pinus contorta, dead trees, and land, respectively.


4.6. Results of the DSC-DC model reliability and applicability validation

To further validate the reliability and applicability of our model, the DSC-DC model was employed to evaluate a publicly accessible hyperspectral dataset from the Jiepai branch of the Gaofeng State Owned Forest Farm in the Guangxi province, China. The accuracy comparison results for various dataset partitioning methods are presented in Table 7. The comparative analysis revealed that as the training sample size increased, the performance metrics improved. When the training, validation, and test sets were partitioned as 40%, 40%, and 20%, respectively, the OA, AA, and Kappa coefficient increased to 99.12%, 99.54%, and 0.9950, respectively. This indicates that the DSC-DC model exhibits a certain level of dependency on the sample size. Upon analyzing the training and prediction times, it was observed that when the training sample size is smaller, the model consumes less time, resulting in higher efficiency.

Table 7. Comparison of training accuracy for the DSC-DC model across different dataset partitioning schemes (Jiepai branch of the Gaofeng State Owned Forest Farm in the Guangxi province, China).
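The partitioning schemes in Table 7 can be mimicked with a per-class (stratified) split, sketched below. Whether the study drew its samples per class or over the whole label set is not stated, so the stratification, the function name `stratified_split`, the fixed seed, and the rule of keeping at least one training sample per class are all assumptions of this example.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac, val_frac, seed=0):
    """Split sample indices into train/validation/test lists, class by class."""
    if train_frac + val_frac >= 1.0:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Keep at least one sample per class in the training and validation
        # sets so that rare classes are not dropped entirely.
        n_train = max(1, round(len(idxs) * train_frac))
        n_val = max(1, round(len(idxs) * val_frac))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Toy labels: a large class and a small class (hypothetical counts).
labels = [0] * 200 + [1] * 40
train, val, test = stratified_split(labels, train_frac=0.025, val_frac=0.025)
print(len(train), len(val), len(test))
```

With a 2.5% training fraction, the small class contributes a single training sample here, which illustrates why categories with few labeled pixels are the first to suffer under sparse partitioning.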

The prediction results of the DSC-DC model under different dataset partitioning schemes are displayed in Figure 13. A comparative analysis revealed that when the training sample size was small, the model exhibited lower predictive performance for categories with insufficient samples, particularly in the case of building land. However, as the sample size gradually increased, the model achieved optimal performance, even for categories with limited labeled samples. Consequently, the rational partitioning of the dataset is crucial for the model to attain the best training results; striking a balance between efficiency and accuracy is particularly important for effective forest management.

Figure 13. Prediction results of the DSC-DC model under various dataset partitioning schemes. (A) Ground truth of the Jiepai branch of the Gaofeng State Owned Forest Farm in the Guangxi province, China. (B) Training, validation, and test sets partitioned as 2.5%, 2.5%, and 95%, respectively. (C) Training, validation, and test sets partitioned as 10%, 10%, and 80%, respectively. (D) Training, validation, and test sets partitioned as 40%, 40%, and 20%, respectively.


The 3DCNN model proposed by Zhang, Zhao, and Zhang (2020) has demonstrated both high accuracy and efficiency. When the training, validation, and test sets were divided into 2.5%, 2.5%, and 95%, respectively, its OA was approximately 1% higher than that of the DSC-DC model. However, the training and prediction times for the 3DCNN model were roughly 50 s longer than those of the DSC-DC model. Additionally, building land was not identified by the 3DCNN model due to the smaller sample size, whereas the DSC-DC model still achieved a recognition accuracy of 67.69%. When the training, validation, and test sets were divided into 10%, 10%, and 80%, respectively, the 3DCNN model achieved an OA of 98.38%, which was approximately 2% higher than that of the DSC-DC model. However, the training time for the 3DCNN model was 61.5 min, approximately double that of the DSC-DC model. These experimental results highlight that the DSC-DC model can efficiently perform tree species classification tasks while maintaining high accuracy. Furthermore, these findings underscore the universality and potential of the DSC-DC model proposed in this study, suggesting that it can be applied across various domains and tasks rather than being limited to specific applications.

5. Discussion

5.1. Effect of different experimental parameter settings on classification accuracy

Setting an appropriate dropout rate can enhance the stability and robustness of the model, thereby reducing the risk of overfitting. In this study, the dropout rate was tested at values of 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6; the optimal dropout rate was found to be 0.4. The results indicated that classification accuracy is influenced by the dropout rate. Dropout rates that are too high or too low can reduce accuracy, while an appropriately chosen rate can improve both accuracy and generalization ability while reducing overfitting, which is consistent with previous research (Hong et al. 2022; Srivastava et al. 2014). However, it should be noted that the best dropout rate may differ depending on the specific task and dataset.

To investigate the impact of window size on model training performance, this study tested window sizes of 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15 as inputs to the model. Ultimately, the optimal window size was determined to be 13 × 13. The results of this experiment, as well as the findings of Zhang, Zhao, and Zhang (2020), demonstrate that appropriate window size is crucial for optimizing model performance and training efficiency. A window that is too large may increase computational complexity, the number of parameters, gradient issues, and information redundancy. On the other hand, a window that is too small may overlook important global features, leading to a decrease in accuracy due to the loss of high-frequency information.

With optimal parameters and a sample split of 40% for the training set, 40% for the validation set, and 20% for the test set, the DSC-DC model consistently attained exceptional performance, exceeding 99% accuracy in both datasets for OA and AA. Moreover, the model exhibited a Kappa coefficient surpassing 0.99. These results underscore the significance of sample partitioning in model training. It became evident that the accuracy of the model improved with an increased number of training samples. Nevertheless, it is imperative to strike a balance since an excessive number of training samples also increases training times. Consequently, the judicious division of samples holds paramount importance in the context of practical applications in forestry.

5.2. Comparative analysis of different classification models

The DSC-DC model demonstrated clear advantages over machine learning models such as KNN and SVM. This can be attributed to several factors. On the one hand, it is possible that the designed model structure fully leverages the rich spatial-spectral features present in hyperspectral imagery, leading to improved classification accuracy. On the other hand, other deep learning models also exhibit higher accuracy compared to KNN and SVM. This is likely because deep learning models inherently possess stronger feature learning and representation capabilities. This advantage becomes even more pronounced when they are applied to hyperspectral data with many spectral bands, as supported by several studies (Mao et al. 2023; Mäyrä et al. 2021; Raczko and Zagajewski 2017). Finally, compared to DSC-DC, the impact of imbalanced sample sizes on SVM and KNN may be more significant, as insufficient sample sizes can prevent SVM and KNN from achieving optimal training.

Additionally, the DSC-DC model significantly reduced the computational time while improving accuracy compared to three mainstream deep learning algorithms: ResNet50, Inception-V3, and MSR-3DCNN. This research highlights the importance of utilizing DSC instead of the conventional 2D convolutional layers used in many models. This substitution saved a significant number of parameters and accelerated the training speed. Previous studies (Chollet 2017; Howard et al. 2017) have shown that this convolutional approach has fewer parameters and computational requirements without sacrificing the accuracy of experimental results; this study supports those findings.
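The parameter saving from replacing a standard 2D convolution with a depthwise separable one can be checked with simple arithmetic. In the sketch below, the kernel size and channel counts (3 × 3, 64 input channels, 128 output channels) are hypothetical values chosen for illustration, not the layer sizes of the DSC-DC model, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(std, sep, round(sep / std, 4))  # 73728 8768 0.1189
```

The ratio equals 1/c_out + 1/k², so for 3 × 3 kernels the separable layer needs roughly one ninth of the weights once the number of output channels is large.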

Comparing our model with ResNet50, we found little difference in terms of accuracy. This may be because ResNet50 combines residual connections and hierarchical feature learning, allowing it to effectively learn more complex features. Previous studies have demonstrated the strong performance of ResNet50 across various tasks (He et al. 2016). However, for the smaller sample sizes of Pinus contorta and Quercus kelloggii, our model still achieved high classification accuracy, whereas ResNet50 showed a decline in classification performance, especially for Pinus contorta. The DSC-DC model showed only a slight improvement in accuracy and efficiency compared to the 3D-CNN model. This is attributed to the evident advantages of the 3D-CNN model in handling spatial information: 3D-CNN has consistently demonstrated high accuracy and efficiency in tree species classification studies (Mäyrä et al. 2021; Nezami et al. 2020; Zhang, Zhao, and Zhang 2020).

5.3. Advantages and disadvantages of the DSC-DC model

Before finalizing the model, we attempted to construct the model by alternating DSC and 2D convolutional layers along with the inclusion of DC. However, the results were not satisfactory. Through multiple experiments, we discovered that the alternating use of separable convolutions and depthwise convolutions significantly improved the generalization ability and classification performance of the model. Firstly, separable convolutions can learn channel-specific features, while depthwise convolutions capture detailed information about channel features. The alternating use of these two operations allows for the extraction of multi-scale and multi-level features while reducing the number of parameters and improving computational efficiency. Additionally, we incorporated DC in the 3rd and 6th convolutional layers, which effectively improved classification accuracy and visual effects without compromising resolution or coverage (Yu and Koltun 2015). This model architecture not only enhances classification accuracy but also accelerates computational efficiency, providing a valuable reference for practical applications in forestry.
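To make the role of dilated convolution concrete, the sketch below computes the span covered by a dilated kernel and applies a small 1D dilated filter. It is a toy, pure-Python illustration: the kernel, the signal, and the dilation rate of 2 are hypothetical, and the dilation rates used in the DSC-DC layers are not given here.

```python
def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D dilated cross-correlation."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(effective_kernel_size(3, 2))                 # 5
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4]
```

With a dilation rate of 2, the same three weights cover five samples, so the receptive field grows without adding parameters or downsampling the input.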

The DSC-DC model can be considered for other high-dimensional spectral image classification tasks in various fields, such as agriculture, environmental monitoring, and medical imaging. These applications can benefit from the research and improvements made to the DSC-DC model, which aims to enhance classification accuracy, reduce time costs, and accelerate model training and prediction processes, thereby providing reliable and efficient solutions.

Although the DSC-DC model achieves better results compared to other methods, we recognize that the current neural network architecture may not be optimal. Therefore, in future research, we aim to improve the performance and generalization ability of the model. The following methods and techniques can be explored in subsequent tree species classification studies.

Firstly, the integration of residual networks, attention mechanisms, and graph convolutional networks may enhance the depth and breadth of the model, thereby improving feature extraction and representation and increasing classification accuracy. Secondly, due to factors such as location, seasonality, and other conditions, forestry data may exhibit performance differences across different datasets. By applying domain adaptation and transfer learning methods, knowledge transfer can be performed to enhance the generalization ability of the model. Thirdly, in addition to hyperspectral images, other types of data such as infrared images and geographical spatial information can be incorporated. Through multimodal learning and cross-modal learning, diverse features can be extracted to enhance classification performance. Finally, self-supervised learning and weakly supervised learning strategies can leverage unsupervised or partially labeled data, enabling the model to learn meaningful feature representations, particularly in cases where dataset annotation is challenging or expensive. By exploring these avenues and continuously improving the DSC-DC model, we expect to enhance the accuracy and efficiency of tree species classification, providing significant advantages for classification tasks in forestry and other fields in the future.

6. Conclusion

In this study, we presented DSC-DC, a novel Convolutional Neural Network (CNN) model based on DSC and DC for tree species classification using hyperspectral remote sensing images. The evaluation of our model was carried out using the Teakettle Experimental Forest (TEF) dataset (USA). The results showed an impressive overall accuracy (OA) of 99.83%, average accuracy (AA) of 99.64%, and Kappa coefficient of 0.9996. Compared to machine learning models such as KNN and SVM, the DSC-DC model outperformed them significantly in terms of classification performance. Compared to well-established deep learning models like Inception-V3, ResNet50, MSR-3DCNN, and 3D-CNN, the DSC-DC model exhibited not only a certain degree of superior classification accuracy but also advantages in training and prediction time efficiency. Overall, the results show that the DSC-DC model significantly improved the accuracy and computational efficiency for multi-species classification. Additionally, we evaluated the DSC-DC model using a publicly accessible hyperspectral dataset obtained from the Jiepai branch of the Gaofeng State Owned Forest Farm in the Guangxi province, China. This assessment provided additional evidence of the reliability and applicability of the DSC-DC model.

These findings demonstrate that our model exhibits high accuracy and feasibility in addressing the challenges of multi-class classification with limited samples. By reducing the time required for model training and prediction, DSC-DC has the potential to accelerate model design, parameter optimization, experimentation, and application deployment. This improved efficiency can then lead to increased productivity and resource savings.

Acknowledgments

We would like to acknowledge all the people who have contributed to this paper. The authors would like to express their gratitude to Cambridge Proofreading (https://proofreading.org/) for the expert linguistic services provided.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The original contributions presented in this study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Additional information

Funding

This research was jointly supported by the Key Research and Development Program of Yunnan Province, China (No. 202303AC100009), and the Ten Thousand Talent Plans for Young Top-notch Talents of Yunnan Province (No. YNWR-QNBJ-2018-184).

References

  • Abdi, H., and L. J. Williams. 2010. “Principal Component Analysis.” Wiley Interdisciplinary Reviews: Computational Statistics 2 (4): 433–459. https://doi.org/10.1002/wics.101.
  • Asner, G. P. 1998. “Biophysical and Biochemical Sources of Variability in Canopy Reflectance.” Remote Sensing of Environment 64 (3): 234–253. https://doi.org/10.1016/S0034-4257(98)00014-5.
  • Asner, G. P., S. L. Ustin, P. A. Townsend, R. E. Martin, and K. D. Chadwick. 2015. “Forest Biophysical and Biochemical Properties from Hyperspectral and LiDAR Remote Sensing.” In Book Forest Biophysical and Biochemical Properties from Hyperspectral and LiDAR Remote Sensing, 429–448.
  • Bai, L., Y. Zhao, and X. Huang. 2018. “A CNN Accelerator on FPGA Using Depthwise Separable Convolution.” IEEE Transactions on Circuits and Systems II: Express Briefs 65 (10): 1415–1419. https://doi.org/10.1109/TCSII.2018.2865896.
  • Banskota, A., R. H. Wynne, and N. Kayastha. 2011. “Improving Within-genus Tree Species Discrimination Using the Discrete Wavelet Transform Applied to Airborne Hyperspectral Data.” International Journal of Remote Sensing 32 (13): 3551–3563. https://doi.org/10.1080/01431161003698302.
  • Boonprong, S., C. X. Cao, W. Chen, and S. N. Bao. 2018. “Random Forest Variable Importance Spectral Indices Scheme for Burnt Forest Recovery Monitoring—Multilevel RF-VIMP.” Remote Sensing 10 (6): 807. https://doi.org/10.3390/rs10060807.
  • Cao, K. L., and X. L. Zhang. 2020. “An Improved Res-UNet Model for Tree Species Classification Using Airborne High-resolution Images.” Remote Sensing 12 (7): 1128. https://doi.org/10.3390/rs12071128.
  • Chen, I.-C., J. K. Hill, R. Ohlemüller, D. B. Roy, and C. D. Thomas. 2011. “Rapid Range Shifts of Species Associated with High Levels of Climate Warming.” Science 333 (6045): 1024–1026. https://doi.org/10.1126/science.1206432.
  • Chen, C. Y., L. H. Jing, H. Li, and Y. W. Tang. 2021. “A New Individual Tree Species Classification Method Based on the ResU-Net Model.” Forests 12 (9): 1202. https://doi.org/10.3390/f12091202.
  • Chollet, F. 2017. “Xception: Deep Learning with Depthwise Separable Convolutions.” 2017 IEEE conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26.
  • Clark, M. L., D. A. Roberts, and D. B. Clark. 2005. “Hyperspectral Discrimination of Tropical Rain Forest Tree Species at Leaf to Crown Scales.” Remote Sensing of Environment 96 (3–4): 375–398. https://doi.org/10.1016/j.rse.2005.03.009.
  • Cochrane, M. A. 2000. “Using Vegetation Reflectance Variability for Species Level Classification of Hyperspectral Data.” International Journal of Remote Sensing 21 (10): 2075–2087. https://doi.org/10.1080/01431160050021303.
  • Dalponte, M., L. Frizzera, and D. Gianelle. 2019. “Individual Tree Crown Delineation and Tree Species Classification with Hyperspectral and LiDAR Data.” PeerJ 6:e6227. https://doi.org/10.7717/peerj.6227.
  • Dyrmann, M., A. K. Mortensen, H. S. Midtiby, and R. N. Jørgensen. 2016. “Pixel-wise Classification of Weeds and Crops in Images by Using a Fully Convolutional Neural Network.” International Conference on Agricultural Engineering, Aarhus, Denmark, June 26–29.
  • Fawakherji, M., A. Youssef, D. D. Bloisi, A. Pretto, and D. Nardi. 2020. “Crop and Weed Classification Using Pixel-wise Segmentation on Ground and Aerial Images.” International Journal of Robotic Computing 2 (1): 39–57. https://doi.org/10.35708/RC1869-126258. https://hdl.handle.net/11573/1609132.
  • Feng, F., S. T. Wang, C. Y. Wang, and J. Zhang. 2019. “Learning Deep Hierarchical Spatial–Spectral Features for Hyperspectral Image Classification Based on Residual 3D-2D CNN.” Sensors 19 (23): 5276. https://doi.org/10.3390/s19235276.
  • Ferreira, M. P., F. H. Wagner, L. E. Aragão, Y. E. Shimabukuro, and C. R. de Souza Filho. 2019. “Tree Species Classification in Tropical Forests Using Visible to Shortwave Infrared WorldView-3 Images and Texture Analysis.” ISPRS Journal of Photogrammetry and Remote Sensing 149:119–131. https://doi.org/10.1016/j.isprsjprs.2019.01.019.
  • Franklin, S. E., and O. S. Ahmed. 2018. “Deciduous Tree Species Classification Using Object-based Analysis and Machine Learning with Unmanned Aerial Vehicle Multispectral Data.” International Journal of Remote Sensing 39 (15–16): 5236–5245. https://doi.org/10.1080/01431161.2017.1363442.
  • Fricker, G. A., J. D. Ventura, J. A. Wolf, M. P. North, F. W. Davis, and J. Franklin. 2019. “A Convolutional Neural Network Classifier Identifies Tree Species in Mixed-conifer Forest from Hyperspectral Imagery.” Remote Sensing 11 (19): 2326. https://doi.org/10.3390/rs11192326.
  • Guo, X. F., H. Li, L. H. Jing, and P. Wang. 2022. “Individual Tree Species Classification Based on Convolutional Neural Networks and Multitemporal High-resolution Remote Sensing Images.” Sensors 22 (9): 3157. https://doi.org/10.3390/s22093157.
  • Hartling, S., V. Sagan, P. Sidike, M. Maimaitijiang, and J. Carron. 2019. “Urban Tree Species Classification Using a WorldView-2/3 and LiDAR Data Fusion Approach and Deep Learning.” Sensors 19 (6): 1284. https://doi.org/10.3390/s19061284.
  • He, K., X. Zhang, S. Ren, and J. Sun. 2016. “Deep Residual Learning for Image Recognition.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30.
  • Hong, Q. Q., X. Y. Zhong, W. T. Chen, Z. H. Zhang, B. Li, H. Sun, T. Yang, and C. Tan. 2022. “SATNet: A Spatial Attention Based Network for Hyperspectral Image Classification.” Remote Sensing 14 (22): 5902. https://doi.org/10.3390/rs14225902.
  • Hovi, A., L. Korhonen, J. Vauhkonen, and I. Korpela. 2016. “LiDAR Waveform Features for Tree Species Classification and their Sensitivity to Tree-and Acquisition Related Parameters.” Remote Sensing of Environment 173:224–237. https://doi.org/10.1016/j.rse.2015.08.019.
  • Howard, A. G., M. L. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. 2017. “Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26.
  • Hu, Y., S. Tian, and J. Ge. 2023. “Hybrid Convolutional Network Combining Multiscale 3D Depthwise Separable Convolution and CBAM Residual Dilated Convolution for Hyperspectral Image Classification.” Remote Sensing 15 (19): 4796. https://doi.org/10.3390/rs15194796.
  • Huang, X., and L. P. Zhang. 2008. “An Adaptive Mean-shift Analysis Approach for Object Extraction and Classification from Urban Hyperspectral Imagery.” IEEE Transactions on Geoscience and Remote Sensing 46 (12): 4173–4185. https://doi.org/10.1109/TGRS.2008.2002577.
  • Iverson, L. R., and D. McKenzie. 2013. “Tree-species Range Shifts in a Changing Climate: Detecting, Modeling, Assisting.” Landscape Ecology 28 (5): 879–889. https://doi.org/10.1007/s10980-013-9885-x.
  • Jones, T. G., N. C. Coops, and T. Sharma. 2010. “Assessing the Utility of Airborne Hyperspectral and LiDAR Data for Species Distribution Mapping in the Coastal Pacific Northwest, Canada.” Remote Sensing of Environment 114 (12): 2841–2852. https://doi.org/10.1016/j.rse.2010.07.002.
  • Kandare, K., H. O. Ørka, M. Dalponte, E. Næsset, and T. Gobakken. 2017. “Individual Tree Crown Approach for Predicting Site Index in Boreal Forests Using Airborne Laser Scanning and Hyperspectral Data.” International Journal of Applied Earth Observation and Geoinformation 60:72–82. https://doi.org/10.1016/j.jag.2017.04.008.
  • Kc, K., Z. D. Yin, M. Y. Wu, and Z. L. Wu. 2019. “Depthwise Separable Convolution Architectures for Plant Disease Classification.” Computers and Electronics in Agriculture 165:104948. https://doi.org/10.1016/j.compag.2019.104948.
  • LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–444. https://doi.org/10.1038/nature14539.
  • LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. https://doi.org/10.1109/5.726791.
  • Li, J., J. M. Bioucas-Dias, and A. Plaza. 2012. “Spectral–spatial Classification of Hyperspectral Data Using Loopy Belief Propagation and Active Learning.” IEEE Transactions on Geoscience and Remote Sensing 51 (2): 844–856. https://doi.org/10.1109/TGRS.2012.2205263.
  • Lim, K., P. Treitz, M. Wulder, B. St-Onge, and M. Flood. 2003. “LiDAR Remote Sensing of Forest Structure.” Progress in Physical Geography: Earth and Environment 27 (1): 88–106. https://doi.org/10.1191/0309133303pp360ra.
  • Mao, Y. W., Y. Guo, W. F. Zhang, Y. Su, and Y. Guan. 2023. “Tree Species Classification by Combining LiDAR, Hyperspectral Data and 3D-CNN Method.” Scientia Silvae Sinicae 59 (3): 73–83. https://doi.org/10.11707/j.1001-7488.LYKX20220533.
  • Mäyrä, J., S. Keski-Saari, S. Kivinen, T. Tanhuanpää, P. Hurskainen, P. Kullberg, L. Poikolainen, et al. 2021. “Tree Species Classification from Airborne Hyperspectral and LiDAR Data Using 3D Convolutional Neural Networks.” Remote Sensing of Environment 256:112322. https://doi.org/10.1016/j.rse.2021.112322.
  • Modzelewska, A., F. E. Fassnacht, and K. Stereńczak. 2020. “Tree Species Identification Within an Extensive Forest Area with Diverse Management Regimes Using Airborne Hyperspectral Data.” International Journal of Applied Earth Observation and Geoinformation 84:101960. https://doi.org/10.1016/j.jag.2019.101960.
  • Mohd Zaki, N. A., and Z. Abd Latif. 2017. “Carbon Sinks and Tropical Forest Biomass Estimation: A Review on Role of Remote Sensing in Aboveground-biomass Modelling.” Geocarto International 32 (7): 701–716. https://doi.org/10.1080/10106049.2016.1178814.
  • Moisen, G. G., E. A. Freeman, J. A. Blackard, T. S. Frescino, N. E. Zimmermann, and T. C. Edwards. 2006. “Predicting Tree Species Presence and Basal Area in Utah: A Comparison of Stochastic Gradient Boosting, Generalized Additive Models, and Tree-based Methods.” Ecological Modelling 199 (2): 176–187. https://doi.org/10.1016/j.ecolmodel.2006.05.021.
  • Nair, V., and G. E. Hinton. 2010. “Rectified Linear Units Improve Restricted Boltzmann Machines.” 27th International Conference on Machine Learning (ICML), Haifa, Israel, June 21–24.
  • Natesan, S., C. Armenakis, and U. Vepakomma. 2020. “Individual Tree Species Identification Using Dense Convolutional Network (DenseNet) on Multitemporal RGB Images from UAV.” Journal of Unmanned Vehicle Systems 8 (4): 310–333. https://doi.org/10.1139/juvs-2020-0014.
  • Nevalainen, O., E. Honkavaara, S. Tuominen, N. Viljanen, T. Hakala, X. Yu, J. Hyyppä, et al. 2017. “Individual Tree Detection and Classification with UAV-based Photogrammetric Point Clouds and Hyperspectral Imaging.” Remote Sensing 9 (3): 185. https://doi.org/10.3390/rs9030185.
  • Nezami, S., E. Khoramshahi, O. Nevalainen, I. Pölönen, and E. Honkavaara. 2020. “Tree Species Classification of Drone Hyperspectral and RGB Imagery with Deep Learning Convolutional Neural Networks.” Remote Sensing 12 (7): 1070. https://doi.org/10.3390/rs12071070.
  • Peng, C., X. Y. Zhang, G. Yu, G. M. Luo, and J. Sun. 2017. “Large Kernel Matters – Improve Semantic Segmentation by Global Convolutional Network.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21–26.
  • Pölönen, I., L. Annala, S. Rahkonen, O. Nevalainen, E. Honkavaara, S. Tuominen, N. Viljanen, and T. Hakala. 2018. “Tree Species Identification Using 3D Spectral Data and 3D Convolutional Neural Network.” 2018 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, Netherlands, September 23–26.
  • Raczko, E., and B. Zagajewski. 2017. “Comparison of Support Vector Machine, Random Forest and Neural Network Classifiers for Tree Species Classification on Airborne Hyperspectral APEX Images.” European Journal of Remote Sensing 50 (1): 144–154. https://doi.org/10.1080/22797254.2017.1299557.
  • Reitberger, J., P. Krzystek, and U. Stilla. 2008. “Analysis of Full Waveform LIDAR Data for the Classification of Deciduous and Coniferous Trees.” International Journal of Remote Sensing 29 (5): 1407–1431. https://doi.org/10.1080/01431160701736448.
  • Sandler, M., A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. 2018. “MobileNetV2: Inverted Residuals and Linear Bottlenecks.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 18–23.
  • Shang, X., and L. A. Chisholm. 2013. “Classification of Australian Native Forest Species Using Hyperspectral Remote Sensing and Machine-Learning Classification Algorithms.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6): 2481–2489. https://doi.org/10.1109/JSTARS.2013.2282166.
  • Simonyan, K., and A. Zisserman. 2014. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, June 23–28.
  • Sothe, C., C. M. De Almeida, M. B. Schimalski, L. E. C. La Rosa, J. D. B. Castro, R. Q. Feitosa, M. Dalponte, et al. 2020. “Comparative Performance of Convolutional Neural Network, Weighted and Conventional Support Vector Machine and Random Forest for Classifying Tree Species Using Hyperspectral and Photogrammetric Data.” GIScience & Remote Sensing 57 (3): 369–394. https://doi.org/10.1080/15481603.2020.1712102.
  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15 (1): 1929–1958. https://doi.org/10.5555/2627435.2670313.
  • Sun, Y., J. F. Huang, Z. R. Ao, D. Z. Lao, and Q. C. Xin. 2019a. “Deep Learning Approaches for the Mapping of Tree Species Diversity in a Tropical Wetland Using Airborne LiDAR and High-Spatial-Resolution Remote Sensing Images.” Forests 10 (11): 1047. https://doi.org/10.3390/f10111047.
  • Sun, X., and Y. Shi. 2023. “The Image Recognition of Urban Greening Tree Species Based on Deep Learning and CAMP-MKNet Model.” Urban Forestry & Urban Greening 85:127970. https://doi.org/10.1016/j.ufug.2023.127970.
  • Sun, Y., Q. Xin, J. Huang, B. Huang, and H. Zhang. 2019b. “Characterizing Tree Species of a Tropical Wetland in Southern China at the Individual Tree Level Based on Convolutional Neural Network.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (11): 4415–4425. https://doi.org/10.1109/JSTARS.2019.2950721.
  • Suratno, A., C. Seielstad, and L. Queen. 2009. “Tree Species Identification in Mixed Coniferous Forest Using Airborne Laser Scanning.” ISPRS Journal of Photogrammetry and Remote Sensing 64 (6): 683–693. https://doi.org/10.1016/j.isprsjprs.2009.07.001.
  • Szegedy, C., S. Ioffe, V. Vanhoucke, and A. Alemi. 2017. “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.” Proceedings of the Thirty-first AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, February 4–9.
  • Tan, M. X., and Q. Le. 2019. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, June 10–15.
  • Thompson, I., B. Mackey, S. McNulty, and A. Mosseler. 2009. Forest Resilience, Biodiversity, and Climate Change: A Synthesis of the Biodiversity/Resilience/Stability Relationship in Forest Ecosystems.
  • Valderrama-Landeros, L., F. Flores-de-Santiago, J. M. Kovacs, and F. Flores-Verdugo. 2017. “An Assessment of Commonly Employed Satellite-based Remote Sensors for Mapping Mangrove Species in Mexico Using an NDVI-based Classification Scheme.” Environmental Monitoring and Assessment 190 (1): 23. https://doi.org/10.1007/s10661-017-6399-z.
  • Xu, H., W. Yao, L. Cheng, and B. Li. 2021. “Multiple Spectral Resolution 3D Convolutional Neural Network for Hyperspectral Image Classification.” Remote Sensing 13 (7): 1248. https://doi.org/10.3390/rs13071248.
  • Yan, S. J., L. H. Jing, and H. Wang. 2021. “A New Individual Tree Species Recognition Method Based on a Convolutional Neural Network and High-spatial Resolution Remote Sensing Imagery.” Remote Sensing 13 (3): 479. https://doi.org/10.3390/rs13030479.
  • Yao, Y., H. M. Qin, Z. M. Zhang, W. M. Wang, and W. Q. Zhou. 2022. “The Classification of Subtropical Forest Tree Species Based on UAV Multi-source Remote Sensing Data.” Acta Ecologica Sinica 42 (9): 3666–3677. https://doi.org/10.5846/stxb202104160991.
  • Yu, X. W., J. Hyyppä, P. Litkey, H. Kaartinen, M. Vastaranta, and M. Holopainen. 2017. “Single-sensor Solution to Tree Species Classification Using Multispectral Airborne Laser Scanning.” Remote Sensing 9 (2): 108. https://doi.org/10.3390/rs9020108.
  • Yu, F., and V. Koltun. 2015. “Multi-scale Context Aggregation by Dilated Convolutions.” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12.
  • Yu, X., P. Litkey, J. Hyyppä, M. Holopainen, and M. Vastaranta. 2014. “Assessment of Low Density Full-waveform Airborne Laser Scanning for Individual Tree Detection and Tree Species Classification.” Forests 5 (5): 1011–1031. https://doi.org/10.3390/f5051011.
  • Zhang, C., K. Xia, H. L. Feng, Y. H. Yang, and X. C. Du. 2021. “Tree Species Classification Using Deep Learning and RGB Optical Images Obtained by an Unmanned Aerial Vehicle.” Journal of Forestry Research 32 (5): 1879–1888. https://doi.org/10.1007/s11676-020-01245-0.
  • Zhang, L. P., L. F. Zhang, and B. Du. 2016. “Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art.” IEEE Geoscience and Remote Sensing Magazine 4 (2): 22–40. https://doi.org/10.1109/MGRS.2016.2540798.
  • Zhang, T., X. Zhang, J. Shi, and S. Wei. 2019. “Depthwise Separable Convolution Neural Network for High-speed SAR Ship Detection.” Remote Sensing 11 (21): 2483. https://doi.org/10.3390/rs11212483.
  • Zhang, B., L. Zhao, and X. L. Zhang. 2020. “Three-dimensional Convolutional Neural Network Model for Tree Species Classification Using Airborne Hyperspectral Images.” Remote Sensing of Environment 247:111938. https://doi.org/10.1016/j.rse.2020.111938.
  • Zhang, X. Y., X. Y. Zhou, M. X. Lin, and J. Sun. 2018. “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.” 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 18–22.
  • Zheng, J., S. Yuan, W. Wu, W. Li, L. Yu, H. Fu, and D. Coomes. 2023. “Surveying Coconut Trees Using High-resolution Satellite Imagery in Remote Atolls of the Pacific Ocean.” Remote Sensing of Environment 287:113485. https://doi.org/10.1016/j.rse.2023.113485.