Research Article

TrmGLU-Net: transformer-augmented global-local U-Net for hyperspectral image classification with limited training samples

Article: 2227993 | Received 06 Aug 2022, Accepted 16 Jun 2023, Published online: 23 Jun 2023

ABSTRACT

In recent years, deep learning methods have been widely used for the classification of hyperspectral images. However, their limited effectiveness under small-sample conditions remains a serious issue. Moreover, the current mainstream approaches based on convolutional neural networks excel at local feature extraction but are restricted by their limited receptive fields. Hence, these models are unable to capture long-distance dependencies in either the spatial or the spectral dimension. To address the above issues, this paper proposes a global-local U-Net augmented by transformers (TrmGLU-Net). First, whole hyperspectral images are input to the model for end-to-end training to capture contextual information. Then, a transformer-augmented U-Net is designed with alternating transformers and convolutional layers to perceive both global and local information. Finally, a superpixel-based label expansion method is proposed to expand the labels and improve performance under small-sample conditions. Extensive experiments on four hyperspectral scenes demonstrate that TrmGLU-Net performs better than other advanced patch-level and image-level methods with limited training samples. The relevant code will be released at https://github.com/sssssyf/TrmGLU-Net

Introduction

Hyperspectral remote sensing images have the combined characteristics of an image and a spectrum, containing both the spectral and the spatial information of surface objects. Different objects have unique spectral characteristics (S. Li et al., Citation2019). Because of this property, hyperspectral images (HSIs) have been widely used in various geoscientific tasks, such as the recognition of tree species, the precise extraction of water boundaries, and statistical analyses of land use (Y. Zhong et al., Citation2018). Such applications are inseparable from HSI classification methods. Although rich spectral and spatial information offers great possibilities for fine-scale differentiation between objects, it also poses great challenges to classification tasks. Early HSI classification research focused on mining spectral features and classification algorithms. During that time, machine learning algorithms such as support vector machines (SVMs) (Melgani & Bruzzone, Citation2004), random forests (L. Zhang et al., Citation2018), linear discriminant analysis (C. Li et al., Citation2011), and neural networks were successively applied to HSI classification problems. Facing limited computing resources, researchers explored different methods of dimensionality reduction (Agarwal et al., Citation2007; Luo et al., Citation2020; Villa et al., Citation2010) and band selection (Cai et al., Citation2020; B. Fang et al., Citation2020; Q. Wang et al., Citation2020; Zeng et al., Citation2019) to deal with the high dimensionality of hyperspectral images. Although these early studies improved the classification of hyperspectral images, three problems still must be solved for HSI classification: (1) the limited availability of labelled training samples, which leads to overfitting; (2) the high dimensionality of HSI data and the high correlation between adjacent bands, which further aggravate overfitting; and (3) the limited capability to distinguish different objects by spectral characteristics alone, because different objects may share the same spectral characteristics.

These issues severely limit classification accuracy. To improve HSI classification accuracy, researchers have incorporated spatial information into classification. The many methods that use spatial information can be broadly classified into three types. The first is represented by spatial feature extraction methods, such as extended morphological profiles (EMPs) (Benediktsson et al., Citation2005), Gabor features (Jia et al., Citation2019), and local binary patterns (W. Li et al., Citation2015). These methods account for the information of the pixels neighbouring a sample point during feature extraction and input the extracted features into a classifier to complete the classification. The second type considers the distance between samples: the closer two samples are, the higher the probability that they belong to the same class, and such correlations are used to constrain the classifier to improve accuracy. The third type focuses on the classification noise in the initial classification result of a hyperspectral image, which is mainly removed by morphological filtering and similar methods. The introduction of spatial information has improved the accuracy of HSI classification; however, the limited availability of labelled training samples continues to restrict its development and application. To this end, researchers have explored semi-supervised learning methods, such as label propagation (J. Zhang et al., Citation2020), transductive SVMs (Bruzzone et al., Citation2005), collaborative training (Wan et al., Citation2015), and active learning (Z. Wang et al., Citation2017). This activity has resulted in the rapid development of HSI classification, giving rise to spatial-spectral classification and semi-supervised classification. These methods are effective in improving the classification accuracy of hyperspectral images while addressing the availability of labelled samples to some extent. However, their classification performance relies heavily on expert experience, and they usually require manually designed, complex feature extraction rules as well as different hyperparameter settings for different data.

In recent years, as computing capabilities and data volumes continue to grow, data-driven methods represented by deep learning have gained great popularity in many tasks, including image recognition (K. He et al., Citation2016), object detection (Ren et al., Citation2017), semantic segmentation (Shelhamer et al., Citation2017), and three-dimensional (3D) reconstruction (Yao et al., Citation2019). Deep learning can automatically extract features from data for downstream tasks. Hence, researchers have introduced deep learning to HSI classification to develop more versatile classification methods. Commonly used deep learning models include autoencoders (Xing et al., Citation2016), deep belief networks (Y. Chen et al., Citation2015), convolutional neural networks (CNNs) (Hu et al., Citation2015), and recurrent neural networks (RNNs) (Mou et al., Citation2017). Among these, CNNs have shown good performance in dealing with high-dimensional images, and CNN-based methods have therefore received wide attention in HSI classification. Inspired by spatial-spectral classification, researchers take a sample point as the centre and slice a local image patch of a fixed size out of the hyperspectral image as the feature of that sample, which is then input to a two-dimensional (2D) CNN (Haut et al., Citation2019; B. Liu et al., Citation2018; L. Zhang et al., Citation2018), a 3D CNN (Y. Chen et al., Citation2016), or another model for classification. To further improve classification accuracy, more recent deep learning techniques have been adopted, such as residual learning (Liu, Yu, Zhang, et al., Citation2021), DenseNet (Huang et al., Citation2017), and attention mechanisms (Kuiliang Gao, Yu, et al., Citation2021; Xue et al., Citation2021). Considering that the bands of a hyperspectral image can be treated as time-series data, RNNs, long short-term memory models, and related architectures have also been proposed for the classification of hyperspectral images. In addition, transformer-based models have made progress in HSI classification. HSI-BERT, a bidirectional encoder representations from transformers model for the HSI classification task, was an early attempt to use the transformer to capture the global dependence among pixels (J. He et al., Citation2020). A spatial-spectral transformer (SST) was proposed to capture long-distance dependencies in sequential spectra (X. He et al., Citation2021). A novel spectral-spatial transformer network (SSTN) replaced convolutions with spectral and spatial transformer blocks for feature extraction and achieved superior performance (Z. Zhong et al., Citation2022). SpectralFormer (Hong et al., Citation2022) rethought the issue of HSI classification from a sequential perspective, using transformers alone to complete the task.

Although these methods can effectively lower the difficulty of training deep learning models with hyperspectral images, the limited availability of labelled training samples for HSI classification remains a problem. Moreover, these explorations still belong to the methods that use local image patches as input. Therefore, numerous works have been proposed in recent years to address small-sample HSI classification. For instance, data augmentation can alleviate sample scarcity, allowing the model to be trained more fully when there are not enough labels (Nalepa et al., Citation2020a). Transfer learning methods have been widely explored to give the model higher generalization ability on target data by training on source data, so that the model converges quickly on the target domain and depends less on training samples (Yifan Sun, Bing Liu, Xuchu Yu, Anzhu Yu, Kuiliang Gao et al., Citation2022); practice has verified the effectiveness of transfer learning for both convergence speed and accuracy with limited samples (J. He et al., Citation2020; Lee et al., Citation2022; Nalepa et al., Citation2020b). Few-shot learning specializes in problems with few labels and has also been explored to improve performance in small-sample HSI classification (Liu, Yu, Yu, et al., Citation2019). As a prevalent and effective few-shot learning scheme, meta-learning endows the model with the capacity of learning to learn and has been explored to improve small-sample HSI classification (Kuiliang Gao et al., Citation2020, Citation2022; Kuiliang Gao, Liu, et al., Citation2021). Besides the supervised learning paradigm, semi-supervised and unsupervised learning are defined according to the participation of unlabelled samples during training. Semi-supervised methods combine unlabelled and labelled samples for joint learning, represented by the generative adversarial network (GAN) (L. Zhu et al., Citation2018) and the graph convolution network (GCN) (Hong et al., Citation2021); the pseudo-label strategy is another representative semi-supervised method and has been widely explored to expand the training data with unlabelled samples (B. Fang et al., Citation2020). Unsupervised learning enables the network to learn feature representations for HSI classification from unlabelled samples alone. Reconstruction-based methods with encoder-decoder architectures are representative and have been widely used for spectral-spatial feature learning (Mei et al., Citation2019; S. Zhang et al., Citation2022), and a novel unsupervised framework tracks spectral variation to extract spectrum motion features (Sun, Liu, Yu, Yu, Gao, & Ding, Citation2022). As an influential branch of unsupervised learning, self-supervised learning constructs the loss function from attributes of the data instead of labels, and includes contrastive learning and generative learning. Contrastive learning uses a contrastive loss to cluster positive samples and separate negative samples, enabling networks to learn robust features without labels (Hou et al., Citation2022; B. Liu et al., Citation2021; M. Zhu et al., Citation2022). Generative learning usually trains networks by generating samples; a relevant method learns spectral-spatial features by recovering the information of masked samples (Xue et al., Citation2022). The numerous works mentioned above represent significant explorations of HSI classification with limited training samples and effectively improve the performance of deep models under small-sample conditions (Sun et al., Citation2022). However, almost all of them are patch-level methods; how to improve the performance of image-level classification methods with limited training samples is still worth exploring.

Second, deep learning methods that use local image patches as input have been successfully applied to HSI classification tasks (Sun et al., Citation2023). However, these methods suffer from two major problems: a model with local image patches as input fails to perceive the contextual information of the whole scene, and the highly overlapping image patches generate many redundant computations, which reduces classification efficiency. To address these problems, semantic segmentation models have been introduced to the classification of hyperspectral images. Specifically, whole hyperspectral images are input to the segmentation model, which produces the classification result for the whole scene. In this way, contextual information is utilized to improve classification accuracy while unnecessary computations are reduced. Classical semantic segmentation models include fully convolutional networks (FCNs) (Shelhamer et al., Citation2017), U-Net (Ronneberger et al., Citation2015), SegNet (Badrinarayanan et al., Citation2017), and DeepLab (L. C. Chen et al., Citation2018). These models usually require a large number of densely labelled samples to optimize their many network parameters for a semantic segmentation task. At the same time, global learning frameworks, which specialize in capturing contextual information, have gradually been applied in the remote sensing field to obtain better performance (M. Zhu et al., Citation2022). However, labelled samples tend to be sparse in HSI classification tasks, and, more importantly, their number is usually small, which leads to poor performance when these classical segmentation models are applied directly. For this reason, DSSNet (Pan et al., Citation2020), a segmentation network containing four convolutional layers, was proposed for HSI classification. To utilize the contextual information in HSI more fully, researchers have developed segmentation models tailored to the characteristics of HSI, such as deep fully convolutional network-based spatial distribution prediction (Jiao et al., Citation2017), spectral-spatial fully convolutional networks (SSFCN) (Xu et al., Citation2020), the fully convolutional network based on the fast patch-free global learning (FPGA) framework (Zheng et al., Citation2020), the fully convolutional network with channel and spatial attention (FCN-CSA) (Jiang et al., Citation2021), PBiNet (B. Liu & Yu, Citation2021), FOctConvPA (Yifan Sun, Bing Liu, Xuchu Yu, Anzhu Yu, Zhixiang Xue et al., Citation2022) and FullyContNet-Pyramid (D. Wang et al., Citation2022).

Owing to the use of local connections to reduce the number of parameters, CNNs tend to have restricted receptive fields and cannot perceive long-distance dependencies. This may be mitigated by enlarging the convolutional kernel or using dilated convolution, but doing so can blur feature boundaries. A transformer regards an image as sequential one-dimensional data and learns features using the self-attention mechanism; the transformer structure therefore has a global receptive field. Hence, it can be used to perceive contextual information in hyperspectral images.

However, beyond contextual information, local details are also crucial, because the purpose of HSI classification is to assign a class label to each pixel. To model both contextual information and local details, this paper proposes a global-local U-Net augmented by transformers (TrmGLU-Net) for the small-sample classification of hyperspectral images. Specifically, the hyperspectral image is input as a whole. Then, alternating transformers and convolutional layers are used to perceive both its contextual and its local information to improve the accuracy of HSI classification. This also allows the transformer's global receptive field to operate on the whole image rather than just on local patches.

The originality of this paper is reflected in the following aspects:

  • TrmGLU-Net is proposed, which takes a whole hyperspectral image as input and performs end-to-end training. TrmGLU-Net comprises alternating transformers and convolutional layers. Skip connections are used between encoders and decoders, which allow the model to improve classification accuracy by making full use of contextual and local information.

  • A superpixel-based label expansion method is proposed to improve the performance of image-level methods under the condition of small samples. With this method, the original image is partitioned into segments, and the results of superpixel segmentation are used to expand labels to effectively increase supervision information and obtain higher classification accuracy.

  • The validity of the proposed method for small sample classification of hyperspectral images is verified using four sets of hyperspectral images with artificially labelled samples. Quantitative and qualitative experiments suggest that the combination of TrmGLU-Net and the superpixel-based label expansion method can obtain better classification results than those of semi-supervised methods, providing better adaptability when only a small number of labelled samples are available for each class of objects.

The remainder of this paper is organized as follows. Section 2 introduces the proposed classification method. Section 3 describes the validation of the proposed method for small-sample classification and its adaptability to sample size through classification experiments on four sets of hyperspectral images. Section 4 concludes the paper.

TrmGLU-Net and superpixel-based label expansion

The proposed method for HSI classification combines TrmGLU-Net and superpixel-based label expansion. The following subsections introduce the architecture of TrmGLU-Net, then its components, and finally the label expansion method based on superpixel segmentation.

TrmGLU-Net architecture

The architecture of the proposed TrmGLU-Net is illustrated in Figure 1. The size of the model input is $B \times H \times W$, where $B$, $H$ and $W$ refer to the band number, height and width of the whole image, respectively. To fully utilize the information in the whole HSI, an embedding layer composed of one convolutional layer is deployed to project the HSI data into a standard dimension (64). The embedding layer guarantees that all spectral information can be modelled while avoiding huge computational consumption. The network is very concise and in general adopts the encoder-decoder scheme of U-Net. What differs is that only two stages of alternating transformers and convolutional layers are used for encoding and decoding, in response to the small sample size. First, feature transformation is performed on the input image using a conventional 2D convolution with a step size of one. During the first stage of encoding, the feature map is input into the transformer layer. Then, the output features are aggregated through two 3×3 convolutional layers, where the step size of the first convolutional layer is set to one and the step size of the second is set to two. After the second convolutional layer, the number of feature map channels is doubled. Next, the second stage of encoding begins. In this process, the original input is concatenated with the downsampled features, which helpfully supplements the missing details. Finally, the features output from the second stage of encoding are fed to the decoder through two transformers. The decoder and the encoder are similar in structure; however, whereas the encoder downsamples the feature map through a convolutional layer with a step size of two to aggregate features, the decoder upsamples the feature map using a deconvolution operation with a step size of two. The following subsections introduce the components of TrmGLU-Net and the superpixel-based label expansion method; a minimal sketch of the overall layout is given below.
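
To make the layout concrete, the following is a minimal PyTorch sketch of the encoder-decoder described above, simplified to a single downsampling stage; the class names, the `trm` placeholder standing in for the transformer layers of the next subsection, and the assumption of even H and W are ours, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """3x3 convolution + GELU; a step size (stride) of two halves H and W."""
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride=stride, padding=1), nn.GELU())

    def forward(self, x):
        return self.body(x)

class TrmGLUNetSketch(nn.Module):
    """Illustrative whole-image encoder-decoder with alternating
    transformer and convolutional layers and a U-Net skip connection."""
    def __init__(self, bands, n_classes, dim=64, trm=nn.Identity):
        super().__init__()
        self.embed = nn.Conv2d(bands, dim, 3, padding=1)  # project B bands -> 64
        self.trm_enc = trm()                              # transformer (encoding)
        self.down = nn.Sequential(ConvBlock(dim, dim),
                                  ConvBlock(dim, 2 * dim, stride=2))
        self.trm_mid = trm()                              # transformer (bottleneck)
        self.up = nn.ConvTranspose2d(2 * dim, dim, 2, stride=2)  # deconvolution
        self.fuse = ConvBlock(2 * dim, dim)               # after skip concatenation
        self.trm_dec = trm()                              # transformer (decoding)
        self.head = nn.Conv2d(dim, n_classes, 1)          # per-pixel class logits

    def forward(self, x):                      # x: (1, B, H, W), H and W even
        e0 = self.embed(x)
        e1 = self.down(self.trm_enc(e0))       # stage 1: H/2 x W/2, 2*dim channels
        d0 = self.up(self.trm_mid(e1))         # upsample back to H x W
        d0 = self.fuse(torch.cat([d0, e0], dim=1))  # skip connection adds details
        return self.head(self.trm_dec(d0))     # (1, n_classes, H, W)
```

For the University of Pavia scene, for instance, `TrmGLUNetSketch(bands=103, n_classes=9)` maps a whole-scene input of shape (1, 103, 610, 340) to per-pixel logits of shape (1, 9, 610, 340).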

Figure 1. TrmGLU-Net architecture.


Transformer

Transformers have self-attention mechanisms, giving them global receptive fields and the ability to capture long-distance dependencies. However, to process the feature maps in TrmGLU-Net with transformers, each feature map must first be converted into a sequential form. The transformer-based method originally used for image problems simply cut the image into fixed-size patches (Dosovitskiy et al., Citation2020). This is obviously not conducive to high-resolution visual tasks, especially hyperspectral image classification, which is a dense prediction task. For hyperspectral image classification, pixel-level dependencies must be established when using a transformer, but doing so leads to high computational complexity. The original transformer is not computationally efficient, and both contextual and local information must be used for HSI classification. Considering these facts, a window-based multi-head self-attention (W-MSA) model (B. Liu et al., Citation2021) was adopted in this study.

As shown in Figure 2, the feature map is first divided into non-overlapping $M \times M$ windows along the spatial dimension, and then each window is evenly divided into $k$ non-overlapping parts along the channel dimension. Thus, a total of $\frac{H \times W}{M \times M} \times k$ independent windows are obtained. Next, the feature map in each window is expanded into a 2D matrix upon which the self-attention operation is performed. Here, $k$ denotes the number of heads of the transformer. For a feature map $X$, the transformer can be expressed as:

$$X_k = \{X_k^1, X_k^2, \ldots, X_k^N\}, \qquad N = \frac{H \times W}{M \times M} \tag{1}$$

$$Y_k^i = \mathrm{Attention}\left(X_k^i W_k^Q,\ X_k^i W_k^K,\ X_k^i W_k^V\right) \tag{2}$$

$$\hat{Y}_k = \{Y_k^1, Y_k^2, \ldots, Y_k^N\} \tag{3}$$

Figure 2. Schematic of the W-MSA model.


where $X_k^i$ refers to the $i$-th window for the $k$-th head, and $W_k^Q$, $W_k^K$ and $W_k^V$ denote the projection matrices of the queries, keys and values for the $k$-th head, respectively. In the W-MSA model, the attention calculation is confined to the inside of each window, so different windows can be treated as different batches during training. The attention calculation can be formulated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}} + B\right)V \tag{4}$$

where $B$ is the relative position encoding, $d_k$ is the dimension of the keys, and softmax denotes the softmax normalization function. As in the original transformer, a fully connected layer is used after W-MSA to enhance the nonlinearity of the features. It should be noted that each window of the feature map must be flattened into a 2D matrix before being input to the transformer and is reverted to a feature map afterwards, before being input to the convolutional layer. If $M \times M = H \times W$, that is, $N = 1$, the model becomes a global multi-head attention model, which attempts to establish dependencies pixel by pixel over the whole image. Compared with global multi-head attention, the W-MSA model significantly reduces the computational cost from $O(H^2 \times W^2 \times k)$ to $O(M^2 \times H \times W \times k)$. Besides, as the size of the feature map decreases with network depth, the receptive field of each window effectively increases, attending better to both contextual and local information. Therefore, W-MSA can be used to effectively improve the accuracy of HSI classification.
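
The window partition and per-window attention can be sketched as follows in PyTorch, assuming H and W are divisible by M and the channel count is divisible by the number of heads; the learnable `bias` table stands in for the relative position encoding B of Eq. (4), and the exact projection layout is our assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WMSASketch(nn.Module):
    """Window-based multi-head self-attention over an (N, C, H, W) feature map."""
    def __init__(self, dim, heads=4, window=8):
        super().__init__()
        self.h, self.m = heads, window
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)
        # Simplified stand-in for the relative position encoding B in Eq. (4).
        self.bias = nn.Parameter(torch.zeros(heads, window**2, window**2))

    def forward(self, x):
        n, c, H, W = x.shape
        m, h = self.m, self.h
        # Partition into non-overlapping M x M windows along the spatial dims.
        x = x.view(n, c, H // m, m, W // m, m).permute(0, 2, 4, 3, 5, 1)
        x = x.reshape(-1, m * m, c)                  # (n * num_windows, M*M, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split channels into h heads: each head attends within its own window.
        split = lambda t: t.view(-1, m * m, h, c // h).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        att = q @ k.transpose(-2, -1) / (c // h) ** 0.5 + self.bias   # Eq. (4)
        out = (F.softmax(att, dim=-1) @ v).transpose(1, 2).reshape(-1, m * m, c)
        out = self.proj(out)
        # Revert the windows to the original (N, C, H, W) feature-map layout.
        out = out.view(n, H // m, W // m, m, m, c).permute(0, 5, 1, 3, 2, 4)
        return out.reshape(n, c, H, W)
```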

Convolutional layers

TrmGLU-Net employs three types of 3×3 convolutions. A convolutional layer with a step size of one is used to extract local features. A convolutional layer with a step size of two is used to reduce the spatial dimension of the feature map for feature aggregation. A transposed convolutional layer (ConvT2D) with a step size of two is used to upsample the feature map. The convolution operation can be described with the following equation.

In the $j$-th feature map of the $i$-th convolutional layer, the value $v_{ij}^{xy}$ at position $(x, y)$ can be obtained using the following equation:

$$v_{ij}^{xy} = f\left(b_{ij} + \sum_{k=1}^{m}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_i-1} w_{ijk}^{pq}\, v_{(i-1)k}^{(x+p)(y+q)}\right) \tag{5}$$

where $P_i$ and $Q_i$ denote the kernel sizes, $m$ is the number of feature maps in the $(i-1)$-th layer, $v_{(i-1)k}^{(x+p)(y+q)}$ is the value of the $k$-th feature map of the $(i-1)$-th layer at $(x+p, y+q)$, $w_{ijk}^{pq}$ is the convolution kernel connected to the $k$-th feature map of layer $i-1$, $b_{ij}$ denotes the bias, and $f(\cdot)$ is the activation function. BERT (bidirectional encoder representations from transformers) uses Gaussian error linear units (GELUs) as its activation function; therefore, the model in this paper also adopts GELUs. In ConvT2D, the feature map is first interpolated and then convolved.
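
To make the three roles concrete, the following short shape check (channel counts are illustrative) shows how each convolution type transforms a dummy feature map:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)                 # (N, C, H, W) dummy feature map

extract = nn.Sequential(nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.GELU())
aggregate = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.GELU())
upsample = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)   # ConvT2D

print(extract(x).shape)              # [1, 64, 128, 128] - local features kept
print(aggregate(x).shape)            # [1, 128, 64, 64]  - halved, channels doubled
print(upsample(aggregate(x)).shape)  # [1, 64, 128, 128] - spatial size restored
```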

Superpixel-based label expansion

Figure 3 shows that, in the image-level HSI classification task, only a small number of labelled samples are available for training, and they are sparsely distributed. Such weak supervision is likely to cause noise in the classification results and blurred boundaries between classes. Considering that the sample points within a superpixel are very likely to have consistent labels, we propose a superpixel-based label expansion method. Specifically, superpixel segmentation is performed on the hyperspectral image, and the segmentation result is used as a mask. For a superpixel containing labelled samples, the samples within that superpixel are assigned the classes of the known samples, greatly increasing the number of labelled samples. The expanded labels significantly strengthen the supervision information of the image-level task, so the network can be trained more adequately.

Figure 3. Schematic of the superpixel-based label expansion method.


The accuracy of superpixel segmentation is key to the result of the proposed label expansion method. Therefore, the segmentation results of simple linear iterative clustering (SLIC) are used to expand the small number of labels. The SLIC algorithm is simple and efficient in that it requires only the number of superpixels, K, as input to obtain the segmentation results. The SLIC algorithm is shown in Alg. 1.

Extensive evidence suggests that good superpixel segmentation results can be obtained with 10 iterations for most image data; therefore, the number of iterations in SLIC is set to 10 in the subsequent experiments. Note that the superpixel-based label expansion used here differs from common superpixel-segmentation post-processing: it uses the segmentation result to extend the labelled samples before model training. It is therefore not a post-processing step, and can instead be understood as a pseudo-label technique for addressing insufficient samples.
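
The expansion step can be sketched as follows with SLIC from scikit-image (version 0.19 or later is assumed for the `max_num_iter` and `channel_axis` arguments); the paper does not specify how a superpixel containing labels from several classes is handled, so a majority vote over its known samples is used here as one plausible choice:

```python
import numpy as np
from skimage.segmentation import slic

def expand_labels(hsi, sparse_labels, n_segments, background=0):
    """Propagate sparse point labels to whole superpixels before training.

    hsi:           (H, W, B) array, scaled to [0, 1]
    sparse_labels: (H, W) array in which `background` marks unlabelled pixels
    """
    segments = slic(hsi, n_segments=n_segments, max_num_iter=10,
                    channel_axis=-1, start_label=0)
    expanded = sparse_labels.copy()
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        known = sparse_labels[mask]
        known = known[known != background]
        if known.size:                     # superpixel holds labelled samples
            vals, counts = np.unique(known, return_counts=True)
            expanded[mask] = vals[np.argmax(counts)]   # majority class
    return expanded
```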

Experimental results and analysis

The validity of the proposed method was tested by classifying four HSI datasets. In terms of hardware, an Intel(R) Xeon(R) Gold 6152 central processing unit and an Nvidia A100 PCIE GPU with 40 GB of GPU memory and 128 GB of RAM were used. The algorithms were all implemented in Python and PyTorch. The overall accuracy (OA), average accuracy (AA), and Cohen's kappa (K) were selected as the evaluation criteria. All trials were run 10 times with different samples, and the average results were reported to smooth out errors as far as possible. Besides, a two-tailed Wilcoxon test was conducted to determine whether the experimental conclusions were statistically significant (i.e. the p-value is reported).
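
For reference, the three criteria can be computed as in the sketch below with scikit-learn, where y_true and y_pred are the flattened ground-truth and predicted labels of the test pixels; scipy.stats.wilcoxon provides the paired two-tailed test across the repeated runs:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score, confusion_matrix

def evaluate(y_true, y_pred):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa."""
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))   # mean of per-class recalls
    kappa = cohen_kappa_score(y_true, y_pred)
    return oa, aa, kappa

# Two-tailed Wilcoxon signed-rank test over the 10 repeated runs, pairing the
# per-run overall accuracies of two methods on identical sample draws:
# stat, p_value = wilcoxon(oa_runs_proposed, oa_runs_baseline)
```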

Experimental data

To test the validity of the proposed method, four HSI scenes were used for classification (i.e. a University of Pavia scene, an Indian Pines scene, a Salinas scene and a Houston scene). The scenes were obtained by the Reflective Optics Spectrographic Imaging System (ROSIS-03), the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), AVIRIS again, and the ITRES CASI-1500 sensor, respectively. To quantitatively evaluate the performance of the different classification algorithms, some regions of the images were manually labelled. Specifically, the University of Pavia scene was labelled with 9 classes and a total of 42,776 samples; the Indian Pines scene with 16 classes and a total of 10,776 samples; the Salinas scene with 16 classes and a total of 54,129 samples; and the Houston scene with 15 classes and a total of 15,029 samples. The properties of these HSIs are listed in Table 1. To verify the efficacy of the proposed method for small-sample classification, samples were randomly selected from each class as training samples (i.e. 20, 10, 10 and 20 samples per class from the University of Pavia, Indian Pines, Salinas and Houston data, respectively). The remaining samples were used as test samples. For the four HSIs, the numbers of samples used for training were 180, 160, 160 and 300, respectively. The proportions of training samples to the total numbers of samples were 0.4%, 1.5%, 0.3% and 1.9%, respectively.

Table 1. The properties of the four HSIs.

Parameter settings and analysis

The Adam optimization algorithm was used for network training, with a learning rate of 0.00005 and 150 epochs. The expansion of superpixel labels directly affects classification accuracy, and SLIC requires the number of superpixels as prior input; thus, determining the number of segments becomes a key problem. To make this process more rational, we used a coarse-to-fine tuning mechanism. Concretely, we first defined a space of candidate segment numbers, Θ = {100, 500, 1000, 2000, 5000, 10000}, to test the effect of different numbers of superpixels on classification accuracy and observe the trend. Each experiment was repeated 10 times. Figure 4 presents box plots of overall accuracy for the four HSIs with the different numbers of segments in the defined space. The results in Figure 4 show that both too few (100, 500) and too many (5000, 10000) superpixels lead to a decline in classification accuracy, except on the Houston dataset. The fewer the superpixels, the more image points each superpixel contains and the higher the probability of introducing label noise, thus decreasing classification accuracy. Conversely, the more superpixels, the fewer image points each contains; the expanded labels are then more accurate, but fewer labels are expanded, so classification accuracy drops slightly. As for the Houston scene, it is likely too complex and contains a large number of linear objects, so it is unsuitable for superpixels containing too many points. Secondly, we narrowed the range of optimal values according to the results of the previous step: (500, 5000) for the first three scenes and (5000, 13000) for the Houston scene, taking one candidate value every 100 within the range. To facilitate intuitive comparison, we fixed the training samples; the variation of classification performance with the number of segments is shown in Figure 5. The optimal range was thus further compressed; taking the University of Pavia scene as an example, its range narrowed to (1200, 1300). Next, to balance accuracy and efficiency, the interval was further reduced to 5 to determine the final optimal range. The final optimal ranges of segment numbers for the four scenes were (1200, 1300), (1100, 1300), (530, 565) and (11010, 12030), respectively, and the minimum of each range was selected as the final number of segments for each scene.
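
The coarse-to-fine procedure can be summarized by the sketch below; `train_and_score` is an assumed helper that trains the model with labels expanded under a given segment number and returns the overall accuracy, and the rule used here to narrow the search window is a simplification of the manually chosen, per-scene ranges described above:

```python
def search_n_segments(train_and_score,
                      coarse=(100, 500, 1000, 2000, 5000, 10000)):
    """Coarse-to-fine tuning of the SLIC segment number."""
    def best_of(scores):
        top = max(scores.values())
        return min(n for n, s in scores.items() if s == top)  # minimum of optimum

    # Step 1: evaluate the hard candidate space and keep the best value.
    n1 = best_of({n: train_and_score(n) for n in coarse})
    # Step 2: scan a neighbourhood of the best value at intervals of 100.
    n2 = best_of({n: train_and_score(n)
                  for n in range(max(100, n1 // 2), 2 * n1 + 1, 100)})
    # Step 3: refine at intervals of 5 around the step-2 optimum.
    return best_of({n: train_and_score(n)
                    for n in range(max(100, n2 - 100), n2 + 101, 5)})
```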

Figure 4. Overall accuracy with different numbers of segments: (a) University of Pavia; (b) Indian Pines; (c) Salinas; (d) Houston.


Figure 5. The variation trend of the overall accuracy with different numbers of segments.


Ablation study

To demonstrate the enhancing effect of the transformers and the effectiveness of superpixel-based label expansion, an ablation study was designed as follows. First, the transformers were removed from TrmGLU-Net to obtain GLU-Net for classification experiments. Then, the same experiments were conducted with TrmGLU-Net. Finally, superpixel-based label expansion was incorporated into the classification with TrmGLU-Net. Figure 6 presents the box plot distributions of classification accuracy for the different methods; the label "Augment" refers to TrmGLU-Net trained with labels expanded by superpixel segmentation. Comparing GLU-Net and TrmGLU-Net shows that removing the transformers significantly decreased the mean classification accuracy over the 10 experiments (p<0.005). This result indicates that the introduction of transformer layers promotes feature extraction, thus improving the classification accuracy of the model. We also compared TrmGLU-Net trained with the original labels against TrmGLU-Net trained with labels expanded by superpixel segmentation, and found that the label expansion significantly improved classification accuracy (p<0.005), showing the effectiveness of the proposed method.

Figure 6. Overall accuracy with three methods: (a) University of Pavia; (b) Indian Pines; (c) Salinas; (d) Houston.


Besides, as TrmGLU-Net is proposed for small-sample classification, we considered that inputting data of high dimensionality might cause overfitting; moreover, a 3-channel input is the standard form for conventional transformer models. Therefore, the original hyperspectral images were transformed by principal component analysis (PCA), and the first three principal components were taken as input. The comparative results are shown in Table 2. As can be observed, reducing the dimensionality of the data with PCA did not improve performance; the PCA step even weakened performance to an extent because of potential information loss. By contrast, the full data as input obtained higher overall accuracy on all HSI data, which shows the necessity of keeping full-dimension information. Therefore, we retained the setting of inputting the full-dimension HSI data.

Table 2. Ablation results with principal component analysis (PCA) (%).

Comparison of methods

Tables 3-6 show the per-class accuracy, overall accuracy, average accuracy, and Cohen's kappa of the different methods on the four HSIs. The experiment was repeated 10 times for each method. The mean and standard deviation over the 10 experiments are listed in the tables, and the p-values of the two-tailed Wilcoxon test are also reported. Combined with the kernel method, SVM (Melgani & Bruzzone, Citation2004) can address the overfitting caused by small sample sizes to a certain extent and has been widely used for HSI classification; therefore, the SVM with spectral features as direct input was chosen as the benchmark. The classical EMP (Benediktsson et al., Citation2005) was selected as a representative method for manual feature extraction. Given the large number of patch-level deep learning methods, we selected three popular ones for comparison (i.e. 2D-CNN (Yue et al., Citation2016), 3D-CNN (Y. Chen et al., Citation2016) and the double-branch multi-attention mechanism network (DBAM) (Ma et al., Citation2019)). Additionally, deep few-shot learning (DFSL) (Liu, Yu, Yu, et al., Citation2019) and the convolutional neural network with model-agnostic meta-learning (CNN-MAML) (Kuiliang Gao, Liu, et al., Citation2021), both specifically designed for small-sample classification, were also selected. DFSL pretrains the model on multiple previously collected HSIs, and the trained network can then be applied to target HSIs as a feature extractor. CNN-MAML optimizes the CNN model on plenty of different tasks to make it more general, so the model can better adapt to target tasks with only small samples. The last method was the latest image-level method, FCN-Pyramid (D. Wang et al., Citation2022), with an attention mechanism, which also takes the whole hyperspectral image as input and directly outputs the classification results of the whole scene. As shown in the tables, the SVM, which used only spectral features, had the lowest classification accuracy on the four HSIs, whereas the EMPs with spatial information achieved higher accuracy. The deep learning methods based on local image patches (e.g. 2D-CNN, 3D-CNN and DBAM) improved the accuracy further, with a significant increase observed for DBAM. High classification accuracy was obtained with both DFSL and CNN-MAML on the University of Pavia, Salinas and Houston scenes, whereas their performance was relatively poor on the lower-resolution Indian Pines scene. By contrast, FCN-Pyramid performed well only on the Salinas and Indian Pines scenes, which consist mainly of planar objects, and it did not offer significant advantages over DBAM and DFSL owing to the small number of labelled samples. For all four HSIs, the highest overall accuracy was obtained by the method proposed in this paper (p<0.005). For example, on the University of Pavia scene, the average overall accuracy of TrmGLU-Net+Aug was 7.52% higher than that of SVM and 3.57% higher than that of FCN-Pyramid, which is also an image-level classification method. The advantage of TrmGLU-Net+Aug became more obvious on the more challenging Indian Pines scene, where its accuracy was nearly 23.9% higher than that of SVM and 5.03% higher than that of FCN-Pyramid.

Table 3. Classification results with different methods on the University of Pavia dataset (%).

Table 4. Classification results with different methods on the Indian Pines dataset (%).

Table 5. Classification results with different methods on the Salinas dataset (%).

Table 6. Classification results with different methods on the Houston dataset (%).

Figures 7-10 show the maps obtained with the different classification algorithms. Isolated noise points often occur in the maps obtained with pixel-based or patch-level HSI classification methods. In contrast, classification maps with better visual quality were obtained with the image-level methods (e.g. FCN-Pyramid and TrmGLU-Net+Aug), though their misclassified samples are often distributed in blocks. The qualitative evaluation in Figures 7-10 is consistent with the quantitative evaluation in Tables 3-6. In both, the proposed TrmGLU-Net+Aug obtained the best classification results, demonstrating its effectiveness.

Figure 7. Classification maps with different methods on the University of Pavia dataset. (a) Ground-truth map. (b) SVM. (c) EMPs. (d) 2D-CNN. (e) 3D-CNN. (f) DBAM. (g) DFSL+SVM. (h) CNN-MAML. (i) FCN-Pyramid. (j) TrmGLU-Net+Aug.


Figure 8. Classification maps with different methods on the Indian Pines dataset. (a) Ground-truth map. (b) SVM. (c) EMPs. (d) 2D-CNN. (e) 3D-CNN. (f) DBAM. (g) DFSL+SVM. (h) CNN-MAML. (i) FCN-Pyramid. (j) TrmGLU-Net+Aug.


Figure 9. Classification maps with different methods on the Salinas dataset. (a) Ground-truth map. (b) SVM. (c) EMPs. (d) 2D-CNN. (e) 3D-CNN. (f) DBAM. (g) DFSL+SVM. (h) CNN-MAML. (i) FCN-Pyramid. (j) TrmGLU-Net+Aug.


Figure 10. Classification maps with different methods on the Houston dataset. (a) Ground-truth map. (b) SVM. (c) EMPs. (d) 2D-CNN. (e) 3D-CNN. (f) DBAM. (g) DFSL+SVM. (h) CNN-MAML. (i) FCN-Pyramid. (j) TrmGLU-Net+Aug.


Conclusion

By utilizing the contextual information in a hyperspectral image, this paper obtains the classification result of the whole scene directly by inputting the whole image into the segmentation model. A non-local TrmGLU-Net is proposed for the small-sample classification of hyperspectral images. The proposed network has a wider receptive field and can attend to both local and global features, which greatly improves the accuracy of HSI classification. To obtain high accuracy with small sample sizes, a superpixel-based label expansion method is also proposed to increase the number of labelled samples and further improve classification accuracy. The classification performance of the proposed method on four popular HSI datasets is compared with that of other classification methods, including 3D-CNN, DBAM, DFSL, CNN-MAML and FCN-Pyramid. The results show that the proposed method achieves higher classification accuracy than the others.

However, inputting the whole image imposes considerable hardware pressure, such as high GPU memory consumption. We will propose corresponding strategies to address this issue in subsequent studies.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was funded by the Natural Science Foundation of Henan Province [222300420387].

References

  • Agarwal, A., El-Ghazawi, T., El-Askary, H., & Le-Moigne, J. (2007, December 15-19). Efficient hierarchical-PCA dimension reduction for hyperspectral imagery. 2007 IEEE International Symposium on Signal Processing and Information Technology. Giza, Egypt.
  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
  • Benediktsson, J. A., Palmason, J. A., & Sveinsson, J. R. (2005). Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Transactions on Geoscience and Remote Sensing, 43(3), 480–491. https://doi.org/10.1109/TGRS.2004.842478
  • Bruzzone, L., Mingmin, C., & Marconcini, M. (2005, July 29). Transductive SVMs for semisupervised classification of hyperspectral data. Proceedings. 2005 IEEE International Geoscience and Remote Sensing Symposium, 2005. IGARSS ’05. Seoul, Korea.
  • Cai, Y., Zhang, Z., Liu, X., & Cai, Z. (2020). Efficient graph convolutional self-representation for band selection of hyperspectral image. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 4869–4880. https://doi.org/10.1109/JSTARS.2020.3018229
  • Chen, Y., Jiang, H., Li, C., Jia, X., & Ghamisi, P. (2016). Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 54(10), 6232–6251. https://doi.org/10.1109/TGRS.2016.2584107
  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848. https://doi.org/10.1109/TPAMI.2017.2699184
  • Chen, Y., Zhao, X., & Jia, X. (2015). Spectral–spatial classification of hyperspectral data based on deep belief network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6), 2381–2392. https://doi.org/10.1109/JSTARS.2015.2388577
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929. https://arxiv.org/abs/2010.11929
  • Fang, B., Li, Y., Zhang, H., & Chan, J. C.-W. (2020). Collaborative learning of lightweight convolutional neural network and deep clustering for hyperspectral image semi-supervised classification with limited training samples. ISPRS Journal of Photogrammetry and Remote Sensing, 161, 164–178. https://doi.org/10.1016/j.isprsjprs.2020.01.015
  • Gao, K., Liu, B., Yu, X., Qin, J., Zhang, P., & Tan, X. (2020). Deep relation network for hyperspectral image few-shot classification. Remote Sensing, 12(6), 923. https://doi.org/10.3390/rs12060923
  • Gao, K., Liu, B., Yu, X., & Yu, A. (2022). Unsupervised meta learning with multiview constraints for hyperspectral image small sample set classification. IEEE Transactions on Image Processing, 31, 3449–3462. https://doi.org/10.1109/TIP.2022.3169689
  • Gao, K., Liu, B., Yu, X., Zhang, P., Tan, X., & Sun, Y. (2021). Small sample classification of hyperspectral image using model-agnostic meta-learning algorithm and convolutional neural network. International Journal of Remote Sensing, 42(8), 3090–3122. https://doi.org/10.1080/01431161.2020.1864060
  • Gao, K., Yu, X., Tan, X., Liu, B., & Sun, Y. (2021). Small sample classification for hyperspectral imagery using temporal convolution and attention mechanism. Remote Sensing Letters, 12(5), 510–519. https://doi.org/10.1080/2150704X.2021.1903611
  • Haut, J. M., Paoletti, M. E., Plaza, J., Plaza, A., & Li, J. (2019). Hyperspectral image classification using random occlusion data augmentation. IEEE Geoscience and Remote Sensing Letters, 16(11), 1751–1755. https://doi.org/10.1109/LGRS.2019.2909495
  • He, X., Chen, Y., & Lin, Z. (2021). Spatial-Spectral transformer for hyperspectral image classification. Remote Sensing, 13(3), 498. https://doi.org/10.3390/rs13030498
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016, June 27-30). Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA.
  • He, J., Zhao, L., Yang, H., Zhang, M., & Li, W. (2020). HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers. IEEE Transactions on Geoscience and Remote Sensing, 58(1), 165–178. https://doi.org/10.1109/TGRS.2019.2934760
  • Hong, D., Gao, L., Yao, J., Zhang, B., Plaza, A., & Chanussot, J. (2021). Graph convolutional networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 59(7), 5966–5978. https://doi.org/10.1109/TGRS.2020.3015157
  • Hong, D., Han, Z., Yao, J., Gao, L., Zhang, B., Plaza, A., & Chanussot, J. (2022). SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–15. https://doi.org/10.1109/TGRS.2022.3172371
  • Hou, S., Shi, H., Cao, X., Zhang, X., & Jiao, L. (2022). Hyperspectral imagery classification based on contrastive learning. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–13. https://doi.org/10.1109/TGRS.2022.3215431
  • Huang, G., Liu, Z., Maaten, L. V. D., & Weinberger, K. Q. (2017, July 21-26). Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA.
  • Hu, W., Huang, Y., Wei, L., Zhang, F., & Li, H. (2015). Deep convolutional neural networks for hyperspectral image classification. Journal of Sensors, 2015, 1–12. https://doi.org/10.1155/2015/258619
  • Jiang, G., Sun, Y., & Liu, B. (2021). A fully convolutional network with channel and spatial attention for hyperspectral image classification. Remote Sensing Letters, 12(12), 1238–1249. https://doi.org/10.1080/2150704X.2021.1978582
  • Jiao, L., Liang, M., Chen, H., Yang, S., Liu, H., & Cao, X. (2017). Deep fully convolutional network-based spatial distribution prediction for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(10), 5585–5599. https://doi.org/10.1109/TGRS.2017.2710079
  • Jia, S., Wu, K., Zhu, J., & Jia, X. (2019). Spectral–spatial gabor surface feature fusion approach for hyperspectral imagery classification. IEEE Transactions on Geoscience and Remote Sensing, 57(2), 1142–1154. https://doi.org/10.1109/TGRS.2018.2864983
  • Lee, H., Eum, S., & Kwon, H. (2022). Exploring cross-domain pretrained model for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–12. https://doi.org/10.1109/TGRS.2022.3165441
  • Li, W., Chen, C., Su, H., & Du, Q. (2015). Local binary patterns and extreme learning machine for hyperspectral imagery classification. IEEE Transactions on Geoscience and Remote Sensing, 53(7), 3681–3693. https://doi.org/10.1109/TGRS.2014.2381602
  • Li, C., Chu, H., Kuo, B., & Lin, C. (2011, July 24-29). Hyperspectral image classification using spectral and spatial information based linear discriminant analysis. 2011 IEEE International Geoscience and Remote Sensing Symposium. Vancouver, BC, Canada.
  • Li, S., Song, W., Fang, L., Chen, Y., Ghamisi, P., & Benediktsson, J. A. (2019). Deep learning for hyperspectral image classification: An overview. IEEE Transactions on Geoscience and Remote Sensing, 57(9), 6690–6709. https://doi.org/10.1109/TGRS.2019.2907932
  • Liu, B., & Yu, X. (2021). Patch-free bilateral network for hyperspectral image classification using limited samples. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 10794–10807. https://doi.org/10.1109/JSTARS.2021.3121334
  • Liu, B., Yu, A., Yu, X., Wang, R., Gao, K., & Guo, W. (2021). Deep multiview learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 59(9), 7758–7772. https://doi.org/10.1109/TGRS.2020.3034133
  • Liu, B., Yu, X., Yu, A., Zhang, P., Wan, G., & Wang, R. (2019). Deep few-shot learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 57(4), 2290–2304. https://doi.org/10.1109/TGRS.2018.2872830
  • Liu, B., Yu, X., Zhang, P., & Tan, X. (2021). Deep 3D convolutional network combined with spatial-spectral features for hyperspectral image classification. IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, QC, Canada. 48, 9992–10002. https://doi.org/10.1109/ICCV48922.2021.00986
  • Liu, B., Yu, X., Zhang, P., Yu, A., Fu, Q., & Wei, X. (2018). Supervised deep feature extraction for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 56(4), 1909–1921. https://doi.org/10.1109/TGRS.2017.2769673
  • Luo, F., Zhang, L., Du, B., & Zhang, L. (2020). Dimensionality reduction with enhanced hybrid-graph discriminant learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 58(8), 5336–5353. https://doi.org/10.1109/TGRS.2020.2963848
  • Ma, W., Yang, Q., Wu, Y., Zhao, W., & Zhang, X. (2019). Double-branch multi-attention mechanism network for hyperspectral image classification. Remote Sensing, 11(11), 1307. https://doi.org/10.3390/rs11111307
  • Mei, S., Ji, J., Geng, Y., Zhang, Z., Li, X., & Du, Q. (2019). Unsupervised spatial–spectral feature learning by 3d convolutional autoencoder for hyperspectral classification. IEEE Transactions on Geoscience and Remote Sensing, 57(9), 6808–6820. https://doi.org/10.1109/TGRS.2019.2908756
  • Melgani, F., & Bruzzone, L. (2004). Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8), 1778–1790. https://doi.org/10.1109/TGRS.2004.831865
  • Mou, L., Ghamisi, P., & Zhu, X. X. (2017). Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7), 3639–3655. https://doi.org/10.1109/TGRS.2016.2636241
  • Nalepa, J., Myller, M., & Kawulok, M. (2020a). Training- and test-time data augmentation for hyperspectral image segmentation. IEEE Geoscience and Remote Sensing Letters, 17(2), 292–296. https://doi.org/10.1109/LGRS.2019.2921011
  • Nalepa, J., Myller, M., & Kawulok, M. (2020b). Transfer learning for segmenting dimensionally reduced hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 17(7), 1228–1232. https://doi.org/10.1109/LGRS.2019.2942832
  • Pan, B., Xu, X., Shi, Z., Zhang, N., Luo, H., & Lan, X. (2020). Dssnet: A simple dilated semantic segmentation network for hyperspectral imagery classification. IEEE Geoscience and Remote Sensing Letters, 17(11), 1968–1972. https://doi.org/10.1109/LGRS.2019.2960528
  • Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), 18th International Conference, Munich, Germany, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
  • Shelhamer, E., Long, J., & Darrell, T. (2017). Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640–651. https://doi.org/10.1109/TPAMI.2016.2572683
  • Sun, Y., Liu, B., Wang, R., Zhang, P., & Dai, M. (2023). Spectral–spatial MLP-like network with reciprocal points learning for open-set hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 61, 1–18. https://doi.org/10.1109/TGRS.2023.3280183
  • Sun, Y., Liu, B., Yu, X., Yu, A., Gao, K., & Ding, L. (2022). From video to hyperspectral: Hyperspectral image-level feature extraction with transfer learning. Remote Sensing, 14(20), 5118. https://doi.org/10.3390/rs14205118
  • Sun, Y., Liu, B., Yu, X., Yu, A., Gao, K., Ding, L., Zhang, B., & Plaza, A. (2022). Perceiving spectral variation: Unsupervised spectrum motion feature learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–17. https://doi.org/10.1109/TGRS.2022.3221534
  • Sun, Y., Liu, B., Yu, X., Yu, A., Xue, Z., & Gao, K. (2022). Resolution reconstruction classification: Fully octave convolution network with pyramid attention mechanism for hyperspectral image classification. International Journal of Remote Sensing, 43(6), 2076–2105. https://doi.org/10.1080/01431161.2022.2054299
  • Villa, A., Benediktsson, J. A., Chanussot, J., & Jutten, C. (2010, June 14-16). Independent component discriminant analysis for hyperspectral image classification. 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing. Reykjavik, Iceland.
  • Wang, D., Du, B., & Zhang, L. (2022). Fully contextual network for hyperspectral scene parsing. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–16. https://doi.org/10.1109/TGRS.2021.3050491
  • Wang, Z., Du, B., Zhang, L., Zhang, L., & Jia, X. (2017). A novel semisupervised active-learning algorithm for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(6), 3071–3083. https://doi.org/10.1109/TGRS.2017.2650938
  • Wang, Q., Zhang, F., & Li, X. (2020). Hyperspectral band selection via optimal neighborhood reconstruction. IEEE Transactions on Geoscience and Remote Sensing, 58(12), 8465–8476. https://doi.org/10.1109/TGRS.2020.2987955
  • Wan, L., Tang, K., Li, M., Zhong, Y., & Qin, A. K. (2015). Collaborative active and semisupervised learning for hyperspectral remote sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 53(5), 2384–2396. https://doi.org/10.1109/TGRS.2014.2359933
  • Xing, C., Ma, L., & Yang, X. (2016). Stacked denoise autoencoder based feature extraction and classification for hyperspectral images. Journal of Sensors, 2016, 1–10. https://doi.org/10.1155/2016/3632943
  • Xu, Y., Du, B., & Zhang, L. (2020). Beyond the patchwise classification: Spectral-spatial fully convolutional networks for hyperspectral image classification. IEEE Transactions on Big Data, 6(3), 492–506. https://doi.org/10.1109/TBDATA.2019.2923243
  • Xue, Z., Yu, X., Liu, B., Tan, X., & Wei, X. (2021). HResNetAM: Hierarchical residual network with attention mechanism for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 3566–3580. https://doi.org/10.1109/JSTARS.2021.3065987
  • Xue, Z., Yu, X., Yu, A., Liu, B., Zhang, P., & Wu, S. (2022). Self-supervised feature learning for multimodal remote sensing image land cover classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–15. https://doi.org/10.1109/TGRS.2022.3190466
  • Yao, Y., Luo, Z., Li, S., Shen, T., Fang, T., & Quan, L. (2019, June 15-20). Recurrent MVSNet for high-resolution multi-view stereo depth inference. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA.
  • Yue, J., Mao, S., & Li, M. (2016). A deep learning framework for hyperspectral image classification using spatial pyramid pooling. Remote Sensing Letters, 7(9), 875–884. https://doi.org/10.1080/2150704X.2016.1193793
  • Zeng, M., Cai, Y., Cai, Z., Liu, X., Hu, P., & Ku, J. (2019). Unsupervised hyperspectral image band selection based on deep subspace clustering. IEEE Geoscience and Remote Sensing Letters, 16(12), 1889–1893. https://doi.org/10.1109/LGRS.2019.2912170
  • Zhang, S., Xu, M., Zhou, J., & Jia, S. (2022). Unsupervised spatial-spectral CNN-Based feature learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–17. https://doi.org/10.1109/TGRS.2022.3153673
  • Zhang, L., Zhang, Q., Du, B., Huang, X., Tang, Y. Y., & Tao, D. (2018). Simultaneous spectral-spatial feature selection and extraction for hyperspectral images. IEEE Transactions on Cybernetics, 48(1), 16–28. https://doi.org/10.1109/TCYB.2016.2605044
  • Zhang, J., Zhang, P., Li, B., Jing, L., & Lv, T. (2020). Semisupervised feature extraction based on collaborative label propagation for hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 17(11), 1958–1962. https://doi.org/10.1109/LGRS.2019.2958410
  • Zheng, Z., Zhong, Y., Ma, A., & Zhang, L. (2020). FPGA: Fast patch-free global learning framework for fully end-to-end hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 58(8), 5612–5626. https://doi.org/10.1109/TGRS.2020.2967821
  • Zhong, Z., Li, Y., Ma, L., Li, J., Zheng, W. S., Jiang, T., & Wu, S. (2022). Spectral–spatial transformer network for hyperspectral image classification: A factorized architecture search framework. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–15. https://doi.org/10.1109/TGRS.2021.3115699
  • Zhong, Y., Wang, X., Xu, Y., Wang, S., Jia, T., Hu, X., Zhao, J., Wei, L., & Zhang, L. (2018). Mini-UAV-Borne hyperspectral remote sensing: From observation and processing to applications. IEEE Geoscience and Remote Sensing Magazine, 6(4), 46–62. https://doi.org/10.1109/MGRS.2018.2867592
  • Zhu, L., Chen, Y., Ghamisi, P., & Benediktsson, J. A. (2018). Generative adversarial networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 56(9), 5046–5063. https://doi.org/10.1109/TGRS.2018.2805286
  • Zhu, M., Fan, J., Yang, Q., & Chen, T. (2022). SC-EADNet: A self-supervised contrastive efficient asymmetric dilated network for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–17. https://doi.org/10.1109/TGRS.2022.3230829