SSC-SFN: spectral-spatial non-local segment federated network for hyperspectral image classification with limited labeled samples


ABSTRACT

Hyperspectral image (HSI) classification methods based on deep learning (DL) have performed well in numerous investigations. Although many modified superpixel-wise neural networks are utilized to enhance spatial information, their ability to mine spectral information in graph structures is insufficient. Moreover, single classifier approaches are unable to extract adequate spatial and spectral information simultaneously. For the classification of large-scale research areas, many works have relied on the use of a large number of labeled samples, leading to low efficiency and weak generalization. To address these issues, an effective spectral-spatial HSI classification approach based on spectral-spatial non-local segment federated network (SSC-SFN) was developed in this study. In this framework, deconvolution is employed to recover the data size, while the lost spatial information is replaced by up-pooling. The spectral dimensional features are updated through the generation of non-Euclidean graph structures and the non-local segment smoothing technique. The convolutional neural network and graph convolutional network techniques are coupled to exploit the available spectral and spatial structure information fully. Extensive experimental results obtained using four public benchmark datasets show that the classification accuracy of SSC-SFN can exceed 90% for large-scale HSIs with limited samples.


1. Introduction

Hyperspectral images (HSIs) with dozens or even hundreds of bands contain abundant spectral and spatial information, which facilitates extensive, in-depth studies of the Earth's surface. Hence, land use and land cover classification (Anderson 1976), environmental monitoring (Galieni et al. 2022), Earth observation management (Park and Song 2020), target detection (Li et al. 2020), and urban area planning (Hänsch and Hellwich 2021) all involve HSI classification. Problems including salt-and-pepper noise, spatial heterogeneity, and the complexity of ground surfaces make it challenging to perform accurate HSI classification with a small number of labeled samples (Abdelmoneim, Soliman, and Moghazy 2020; Irteza et al. 2021; Mukherjee and Singh 2020). A convolutional neural network (CNN) can extract rich spectral features through subsampling, while a graph convolutional network (GCN) can extract discriminative and nonlinear spectral-spatial features. However, HSI classification methods based on deep learning (DL) are limited by the fixed-format convolution kernel, which ignores many geometric details of high-dimensional data. Specifically, the fixed sliding-window operation of the CNN cannot adaptively extract important features. The primary drawback of the GCN is that spectral information is not given enough consideration and only spatial context information is mined through the graph structure (Chen, Li, and Dai 2022; Yang et al. 2022; Zhang et al. 2017). Meanwhile, spatial information is not properly utilized after data segmentation, and the generalization ability of this method is poor for large-scale data (Amirabbas et al. 2018; Cao and Wang 2017). Therefore, a more efficient and general method for classifying large-scale areas with complex backgrounds from limited samples is urgently needed.

Many DL-based image processing algorithms have been successfully extended to HSI classification in the field of remote sensing (Gong et al. 2021; Hong et al. 2020; Hong et al. 2022; Li and Ning 2020; Liu et al. 2018a; Masolele et al. 2022). CNN and GCN are two effective and popular classification algorithms, but CNN only considers the spectral information of HSI, while ignoring the equally important spatial structure information (Ahishali et al. 2021; Dong et al. 2022; Jiang et al. 2022; Ke et al. 2018; Ruan et al. 2022). In contrast, although GCN skillfully represents the spatial topological relationship between samples, the rich spectral information is not adequately mined (Li, Chen, and Cheng 2022; Lu et al. 2022; Ye et al. 2021; Zhang et al. 2022). Therefore, Li, Chen, and Cheng (2022) designed global attention blocks to optimize spectral and spatial features for enhanced learning. Wang et al. (2021) applied a GCN to extract discriminative and nonlinear spectral-spatial features from HSI, to replace linear mapping features, and gradually enhanced the feature representation capabilities. However, three problems remain. First, the adjacency matrix generates enormous computational expense when a GCN classifies a large-scale high-dimensional HSI, causing the gradient to explode or vanish (He et al. 2022b; Shahraki and Prasad 2018; Wang et al. 2021). Second, a constructed graph structure usually contains a variety of noise types associated with the complex background, which means that the edges (relationships) between nodes are sometimes not credible, and that the spatial neighborhood information obtained may be incorrect (He et al. 2022a; Qin et al. 2019; Sun et al. 2022). Finally, DL-based HSI classification methods often rely on massive numbers of labeled samples for model training to achieve satisfactory results, but the collection of numerous labeled samples is expensive and time-consuming.

Inputting superpixels into the classifier rather than pixels greatly lessens the need for labeled samples (Gao et al. 2022; Liu et al. 2018b; Xu et al. 2019). As a consequence, a large number of HSI classification methods have been developed based on superpixel segmentation techniques to enhance classification efficiency, and have achieved competitive results (Achanta et al. 2012; Alkhatib and Velez-Reyes 2019; Liu et al. 2011; Zhang et al. 2019). In Zhang, Su, and Shen (2019), combining multi-scale segmentation with classical classifiers made it possible to acquire separable, compact, anti-noise superpixels. Moreover, superpixel-level methods can exploit both the spatial features and spectral information of HSIs to achieve better results than pixel-level methods against complex background surfaces (Farooq et al. 2019; Xie et al. 2020; Zhao, Su, and Yan 2020). Many studies have shown that the rational use of superpixel techniques accelerates the convergence of deep networks and improves the generalizability of models (Jia et al. 2021; Liu et al. 2017; Mei et al. 2019a).

As described in previous work, the superpixel pooling convolutional neural network (SP-CNN) represents a breakthrough in the finite sample problem, as it combines downsampling and upsampling to obtain and recover spatial information. However, due to the instability of the model itself, SP-CNN is not suitable for regions with large-scale and complex backgrounds. Moreover, the hyperparameters of these networks are overdependent on transfer learning. The method proposed here addresses the problem of obtaining high-accuracy results with limited samples, a setting in which deep network frameworks are poorly robust because they learn from only a small number of spatial-spectral features. Non-local superpixels are exploited in the pre-processing stage of the GCN. Initially, target pixels are updated using all pixel information in each superpixel. A second fractional update for each superpixel is performed after obtaining relatively uniform patches. In addition, based on a previous theoretical comparison of the CNN and GCN, the proposed network framework (Xie et al. 2021) is adjusted to combine the advantages of both network types. SP-CNN is used to complement missing spatial structure information. In the graph structure of the non-local superpixel GCN, the spectral node features in the non-Euclidean domain are fully extracted. Superpixel-level classification significantly reduces the number of samples required. Therefore, the proposed method can accurately classify large-scale HSIs with complex backgrounds.

The main contributions of this study are as follows.
An advanced end-to-end dual-path deep network is proposed to learn small-scale features of both neatly arranged regions and large-scale irregular regions simultaneously.
A novel segment denoising strategy, namely the non-local segment method, is proposed for global smoothing inside each superpixel to obtain uniform patches.
To improve the generalization ability for large-scale data, three federated classification strategies are proposed. The strategy used to integrate spectral information and spatial context features is selected according to the data.

2. Methodology

The deep-learning-based HSI classification framework proposed in this study consists of a superpixel pooling CNN, a GCN with superpixels as vertices, the segment anything model, spectral attention models, spectral information fusion, and a fully connected network, as shown in Figure 1. Spectral information from the HSI, extracted through the federated network composed of a CNN and GCN, was input into the classifier by using different spectral fusion techniques, such as superpixel superposition, superpixel multiplication, and superpixel splicing. We used superpixels instead of pixels as vertices to improve the GCN. Furthermore, using superpixels as the final output of the federated network reduced the number of samples that needed to be classified, thereby greatly reducing the classification time.

Figure 1. Schematic of a double convolution federated network.

2.1. Superpixel pooling convolutional neural network

Data collected through remote sensing usually contains large amounts of unavoidable noise and shadow. Visual interpretation and traditional preprocessing methods cannot easily eliminate such interference, which reduces the purity of the data. Interestingly, the pivotal spectral features of the target are extracted through the convolution and pooling techniques in the CNN framework. The downsampling technique is useful for classification tasks. The outstanding performance of CNN in terms of HSI classification can be attributed to its multiple convolution and pooling steps. However, these processes do not solve the issue of complex background, as an HSI generally contains many mixed pixels. In addition, spatial structure information for the HSI is gradually lost when the number of network layers is increased excessively. In our experience, simple application of spectral information cannot support effective learning of complex samples, and the classification results obtained from the network are not ideal. To solve this issue, deconvolution and up-pooling are used to supplement the lost spatial information, followed by a search to obtain the missing details.

The output matrix of the convolution layer is transposed during downsampling, thereby reversing the recorded data. According to the feature mapping and the size of the convolution kernel, deconvolution results are easily obtained. We expand the input and output matrices into column vectors X and Y, respectively. For a given convolution kernel K, a sparse matrix C can be derived such that

$Y = CX$ (1)

Reversing Equation (1) with the transposed matrix gives the following deconvolution operation:

$X = C^{T} Y$ (2)

The size of the input matrix X is recovered by the deconvolution operation, but spectral information is lost. Traditional resampling techniques and linear interpolation methods can over-smooth the class boundary information. The proposed up-pooling (unpooling) procedure provides a solution to this problem. Unpooling is the reverse of the max pooling operation: it places zeros at all positions in the unpooling layer except the position of the maximum feature. Thus, the spatial structure information of the maximum feature is well preserved.
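Equations (1) and (2) and the unpooling step can be illustrated on a one-dimensional signal. The following is a minimal NumPy sketch, not the network used in this study; the helper names (`conv_matrix_1d`, `max_pool_with_indices`, `max_unpool`), the kernel, and the toy signal are assumptions chosen for illustration.

```python
import numpy as np

def conv_matrix_1d(kernel, n_in):
    """Build the sparse matrix C so that C @ x is a 'valid' sliding-window
    product of x with the kernel (hypothetical helper)."""
    k = len(kernel)
    n_out = n_in - k + 1
    C = np.zeros((n_out, n_in))
    for r in range(n_out):
        C[r, r:r + k] = kernel
    return C

def max_pool_with_indices(x, size=2):
    """Non-overlapping 1-D max pooling that records the position of each maximum."""
    blocks = x.reshape(-1, size)
    idx = blocks.argmax(axis=1) + np.arange(blocks.shape[0]) * size
    return x[idx], idx

def max_unpool(pooled, idx, n_out):
    """Place each pooled maximum back at its recorded position; zeros elsewhere."""
    out = np.zeros(n_out)
    out[idx] = pooled
    return out

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
C = conv_matrix_1d(np.array([1.0, -1.0]), n_in=x.size)

y = C @ x          # Eq. (1): the output is shorter than the input
x_rec = C.T @ y    # Eq. (2): the input size is recovered, the values are not

pooled, idx = max_pool_with_indices(x, size=2)
unpooled = max_unpool(pooled, idx, x.size)
```

In this toy case `x_rec` has the same length as `x` but different values, which is why the positions stored during max pooling are needed to put the strongest responses back in the right place.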

After the convolution and pooling operations, the CNN inputs the obtained one-dimensional vector into a fully connected layer to complete the activation of feature information learning for classification of each neural unit. However, the fully connected layer performs classification at the pixel level. Therefore, a large number of labeled samples are needed to train the classifier, but the acquisition of labeled samples is generally time-consuming and expensive. In addition, after continuous downsampling, spatial shape information for the target category is gradually lost. Moreover, mixed ground objects undoubtedly reduce classification accuracy. To address these issues, we employ a superpixel pooling layer before the classifier. For this process, principal component analysis (PCA) and a 1 × 1 convolution kernel are used to reduce the dimensionality of the HSI. After parametric experiments, we reduced the dimensionality of the Salinas, WHU-Hi-HanChuan, WHU-Hi-HongHu, and WHU-Hi-LongKou datasets to 30, 50, 50, and 50, respectively, and fine-tuned the data dimensions with a 1 × 1 convolution kernel to match the data output from the GCN stream. Afterward, the HSI is segmented using entropy rate superpixel (ERS) and the segment anything model (SAM), which were selected because of their fast implementation and effective recognition of edge information. According to the first law of geography, pixels within the same superpixel have a high probability of belonging to the same category; that is, homogeneity of the segmented superpixel is explainable. Therefore, it is reasonable to input each superpixel block as a single unit into the classifier. This process greatly reduces the number of samples input into the classifier. Moreover, the adaptation and texture features of the superpixel help highlight the spatial structure information of the HSI. 
The output feature map $Z \in R^{W \times H \times C}$ is the input to the superpixel pooling layer, where W represents the width of the HSI, H is the height of the HSI, and C is the number of channels. An HSI with a size of W × H is divided into M superpixels, and each superpixel $S_{i}$ contains $n_{i}$ pixels. The output of the superpixel pooling layer is an M × C matrix. The pooling function is defined as:

$f : S_{i} \to a_{i} \in R^{C}, \quad i = 1, 2, \dots, M$ (3)

where $a_{i, j}$ is the arithmetic mean of the mean and median values of the j-th channel of the $n_{i}$ pixels in superpixel $S_{i}$ (Gao et al. 2022). In Eq. (3), the median vector is used to reduce the effect of singular values on this function. Introducing the superpixel pooling layer into the CNN can effectively fuse key spectral features and spatial structure information obtained in the upsampling phase. This process also retains important spatial information from the original HSI, which is of great significance for identifying complex background information. This layer reduces the requirement for samples, essentially increasing the proportion of samples used. In other words, the adoption of superpixel pooling weakens the dependence of our proposed method on massive training sample sets, allowing this method to perform optimally with limited samples.
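The pooling rule in Eq. (3) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `superpixel_pool`, the (H, W, C) array layout, and the toy label map are assumptions, and the superpixel labels are taken as given rather than produced by ERS or SAM.

```python
import numpy as np

def superpixel_pool(Z, labels):
    """Pool a feature map over superpixels.

    Z:      (H, W, C) feature map.
    labels: (H, W) integer superpixel ids in 0..M-1.
    Returns an (M, C) matrix whose row i is the average of the channel-wise
    mean and the channel-wise median of the pixels in superpixel i.
    """
    C = Z.shape[-1]
    flat = Z.reshape(-1, C)
    lab = labels.ravel()
    M = lab.max() + 1
    A = np.zeros((M, C))
    for i in range(M):
        pix = flat[lab == i]                      # (n_i, C) pixels of S_i
        A[i] = 0.5 * (pix.mean(axis=0) + np.median(pix, axis=0))
    return A

# Toy example: a 2 x 3 image with two channels and two superpixels.
labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
channel0 = np.array([[1.0, 2.0, 10.0],
                     [9.0, 20.0, 30.0]])
Z = np.stack([channel0, np.ones_like(channel0)], axis=-1)

A = superpixel_pool(Z, labels)
```

For superpixel 0 the first channel holds the values 1, 2, and 9, whose mean is 4 and median is 2, so the pooled value is 3. The outlier 9 pulls the mean upward, and averaging with the median damps that effect.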

Table 1 lists the parameter settings in the CNN path, including the size and number of convolutional kernels, the stride, and the output size of the feature map.

Table 1. Parameter settings in the convolutional neural network path of SSC-SFN.


2.2. Spectral attention mechanism

The attention mechanism can focus on key information while ignoring minor information. When classifying images, the attention mechanism is fully compatible with the convolution process, and the highest-priority features are extracted in the form of an encoder-decoder. The attention mechanism can effectively select information that has already been obtained, thereby simplifying the complex mixture of background features. In this study, the SP-CNN is modified through the addition of a spectral attention module to enhance the interdependency between the feature map and spatial background information. Through the information extraction process of the attention module, the convolutional kernel can update the weight. Therefore, spectral-spatial features in feature maps are emphasized, which improves the accuracy of the classification framework. The spectral attention module maintains the dimensionality of the data, and the size of the output remains the same as the input vector. Given the input tensor $X \in R^{W \times H \times C}$, where W, H, and C represent the width, height, and number of channels of the HSI, respectively, the global context embedding vector Q is denoted as $Q \in R^{1 \times 1 \times C}$, where

$Q (k, :, :) = \frac{1}{H \cdot W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X (k, i, j)$ (4)

and k is the channel index of the HSI. The attention mechanism is used to establish the nonlinear relationship between network layers and improve the correlation for feature mapping. Then, the coefficient matrix T is calculated as follows:

$T = ReLU (W_{2} \, ReLU (W_{1} Q))$ (5)

where $W_{1} \in R^{\frac{C}{\omega} \times C}$ and $W_{2} \in R^{C \times \frac{C}{\omega}}$. The reduction ratio $\omega$ is introduced to balance the ability of the spectral attention model against the computational expense of the framework. Experiments demonstrated that $\omega = 16$ works well in practice. Finally, the output $Y \in R^{W \times H \times C}$ of the spectral attention model can be written as:

$Y (k, i, j) = T (k, :, :) \cdot X (k, i, j)$ (6)
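Equations (4) to (6) amount to a global average over the spatial axes, a two-layer bottleneck, and a channel-wise rescaling. The sketch below shows that data flow in NumPy under stated assumptions: the weights are random rather than trained, the array layout is (H, W, C), and the function name `spectral_attention` is hypothetical.

```python
import numpy as np

def spectral_attention(X, W1, W2):
    """Channel-wise reweighting following Eqs. (4)-(6).

    X:  (H, W, C) input tensor.
    W1: (C // omega, C) bottleneck weights.
    W2: (C, C // omega) expansion weights.
    Returns the reweighted tensor Y and the coefficient vector T.
    """
    relu = lambda v: np.maximum(v, 0.0)
    Q = X.mean(axis=(0, 1))          # Eq. (4): one average per channel, shape (C,)
    T = relu(W2 @ relu(W1 @ Q))      # Eq. (5): shape (C,)
    Y = X * T                        # Eq. (6): T is broadcast over H and W
    return Y, T

rng = np.random.default_rng(0)
C, omega = 32, 16                    # omega = 16 as reported in the text
X = rng.random((4, 5, C))
W1 = rng.standard_normal((C // omega, C))   # untrained, for illustration only
W2 = rng.standard_normal((C, C // omega))

Y, T = spectral_attention(X, W1, W2)
```

The output keeps the shape of the input, and the bottleneck width C/ω controls how many parameters the module adds.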

2.3. The GCN

The GCN extracts high-level features from constructed undirected graphs based on the correlations between vertices. Unlike a traditional CNN that only performs well with regular grid data, a GCN can handle data with an arbitrary non-Euclidean structure. However, in a traditional GCN (Kipf and Welling 2017), using pixels as vertices often leads to a large graph. This not only increases the computational burden, but also tends to yield unsatisfactory classification results, because spatial information is not considered when constructing the graph. To address this issue, in this study, we used superpixels rather than pixels as vertices in the constructed graph. Both the spatial structure information provided by the superpixels and the spectral information of the pixels contained therein were considered during graph construction.

For a given high-dimensional HSI, we used PCA (Wold, Esbensen, and Geladi 1987) to reduce its dimensionality, and used the first principal component to generate a base image. This setup has been used for many experiments (Li, Wu, and Zhang 2017; Makantasis et al. 2015; Mei et al. 2019b; Zhong et al. 2018b). Subsequently, the base image was segmented into superpixels using ERS and SAM. Considering each superpixel as a vertex, we linked two vertices using a radial basis function:

$a_{i, j} = \begin{cases} \exp \left( - \frac{({\bar{x}}_{i} - {\bar{x}}_{j})^{2}}{\sigma^{2}} \right), & \text{if } S_{i} \in N (S_{j}) \text{ or } S_{j} \in N (S_{i}) \\ 0, & \text{otherwise} \end{cases}$ (7)

where ${\bar{x}}_{i}$ is the average of the $n_{i}$ pixels in superpixel $S_{i}$, and $N (S_{i})$ denotes the set of spatial neighbors of $S_{i}$. The kernel parameter σ is set to 0.2 in our experiments.
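A possible way to build the adjacency weights of Eq. (7) from a label map is sketched below. Two points are assumptions rather than details given in the text: two superpixels are treated as spatial neighbors when they share a horizontal or vertical pixel border, and the squared difference of the mean vectors is taken as the squared Euclidean distance. The function name `superpixel_adjacency` is hypothetical.

```python
import numpy as np

def superpixel_adjacency(features, labels, sigma=0.2):
    """Radial-basis-function weights between neighboring superpixels.

    features: (H, W, C) pixel features.
    labels:   (H, W) integer superpixel ids in 0..M-1.
    Returns a symmetric (M, M) matrix with zeros for non-neighbors.
    """
    M = labels.max() + 1
    means = np.stack([features[labels == i].mean(axis=0) for i in range(M)])

    # Label pairs that meet across horizontal and vertical pixel borders.
    pairs = np.concatenate([
        np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1),
        np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1),
    ])

    A = np.zeros((M, M))
    for i, j in pairs:
        if i != j:
            dist2 = np.sum((means[i] - means[j]) ** 2)
            A[i, j] = A[j, i] = np.exp(-dist2 / sigma ** 2)
    return A

# Three superpixels in a row: 0 touches 1, 1 touches 2, 0 does not touch 2.
labels = np.array([[0, 1, 2]])
features = np.array([[[0.0], [0.2], [0.4]]])
A = superpixel_adjacency(features, labels, sigma=0.2)
```

With σ = 0.2, a difference of 0.2 between mean features gives a weight of exp(-1), so edges decay quickly as neighboring superpixels become spectrally dissimilar.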

Let G = (V, E) be the constructed undirected graph, where V and E are the sets of vertices (superpixels) and edges, respectively. Assuming that A is the adjacency matrix of G, and D is the degree matrix of A, the Laplacian matrix L can be expressed as:

$L = D - A$ (8)

To improve the generalization of the graph structure, a symmetric normalized Laplacian matrix $L_{sym}$ was created as follows:

$L_{sym} = D^{- \frac{1}{2}} L D^{- \frac{1}{2}} = I - D^{- \frac{1}{2}} A D^{- \frac{1}{2}}$ (9)

where I is the identity matrix of matching size. On this basis, for the graph convolution to act on the spectral domain, the Laplacian needs to be eigendecomposed, which provides the basis of the graph Fourier transform. Then we have:

$L = U Λ U^{- 1}$ (10)

where the columns of U are the eigenvectors of L, which form the basis of the Fourier transform, and Λ is the diagonal matrix composed of the eigenvalues of L. To realize spectral filtering on the graph, a signal processing transformation was conducted to extract the spectral features. This was done to map information to the spectral domain for processing. The convolution of two given functions, f and g, satisfies:

$F [f (t) * g (t)] = F [f (t)] \cdot F [g (t)]$ (11)

where $*$ and · denote the convolution operator and element-wise multiplication, respectively. Because L is symmetric, U is an orthogonal matrix, i.e. $U U^{T} = I$, and we can obtain:

$L = U Λ U^{- 1} = U Λ U^{T}$ (12)

According to Eq. (12), the Fourier transform GF of the function f on the graph G and its inverse can be written as:

$GF [f] = U^{T} f$ (13)

$f = U \, GF [f]$ (14)

Therefore, the convolution between f and g on a graph can be expressed as:

$G [f * g] = U \{ [U^{T} f] \cdot [U^{T} g] \}$ (15)

To reduce the computational cost of the eigendecomposition, Hammond, Vandergheynst, and Gribonval (2011) approximated the spectral filter $g_{θ} (Λ)$ by a K-th order truncated expansion of Chebyshev polynomials. This can be formulated as:

$G [f * g_{θ}] \approx \sum_{k = 0}^{K} θ_{k} T_{k} (\tilde{L}) f$ (16)

where $θ_{k}$ are the Chebyshev coefficients and $T_{k}$ is the Chebyshev polynomial of order k. The rescaled Laplacian can be expressed as $\tilde{L} = \frac{2}{λ_{\max}} L_{sym} - I$. When K = 1 and the largest eigenvalue is approximated as 2, the convolution operation of the GCN can be expressed as:

$H^{(l + 1)} = ρ ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} H^{(l)} W^{(l)} + b^{(l)})$ (17)

where $H^{(l + 1)}$ denotes the output of the l-th layer and ρ (·) is the activation function (e.g. a rectified linear activation function (ReLU) or Softplus function, which are applied in our proposed method). $\tilde{A} = A + I$ and ${\tilde{D}}_{i, i} = \sum_{j} {\tilde{A}}_{i, j}$ represent the renormalized adjacency matrix A and degree matrix D, respectively, while W and b are the weights and biases obtained by training each model layer.
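The propagation rule in Eq. (17) can be written directly with dense matrices. The sketch below is a minimal NumPy illustration with hand-picked toy values, not a trained layer; the function name `gcn_layer` and the default ReLU activation are assumptions.

```python
import numpy as np

def gcn_layer(A, H, W, b, activation=lambda v: np.maximum(v, 0.0)):
    """One graph-convolution step following Eq. (17).

    A: (M, M) adjacency matrix without self-loops.
    H: (M, F_in) vertex features of the current layer.
    W: (F_in, F_out) layer weights.
    b: (F_out,) layer bias.
    """
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # renormalized degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return activation(A_hat @ H @ W + b)

# Two connected vertices, identity features and weights, zero bias.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
H0 = np.eye(2)
W0 = np.eye(2)
b0 = np.zeros(2)

H1 = gcn_layer(A, H0, W0, b0)
```

In this two-vertex example every entry of `H1` equals 0.5: each vertex ends up with the average of its own feature and its neighbor's feature, which is the smoothing effect of the normalized adjacency matrix.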

2.4. Non-local superpixel smoothing

The commonly used GCN-based HSI classification method feeds pixel-level samples into the network, establishes a relationship between the adjacency matrix and samples, and converts the non-Euclidean features extracted by the network layer into a one-dimensional vector. A fully connected neural network is then used to classify the resulting vectors at the pixel level. Compared with CNNs, GCNs can effectively share information globally, and are favorable for acquiring and describing long-distance spatial relationships. However, the quality of the collected information is greatly degraded by the noisy data in the HSI. Moreover, the computational cost of classification is not only related to the sample size but also depends on the network depth and input dimensions. The GCN needs to take a large feature matrix and adjacency matrix as input through a complex graph structure. Therefore, images containing a large amount of data will undoubtedly increase the computational complexity of the algorithm. In addition, because the GCN only allows input of the full batch of feature matrices, all pixel-level samples must be trained simultaneously. If the input image representation is inaccurate, the final classification performance will be degraded. To lessen interference from noise, reduce the computational cost, improve classification performance, and fully learn from the correlations of nearest neighbor elements, we designed a non-local superpixel smoothing strategy, as shown in Figure 2.

Figure 2. Illustration of the non-local smoothing process used in our model.

We segmented the HSI using SAM in combination with the GCN, because this approach quickly retains edge pixels. Compared with the simple linear iterative clustering (SLIC) algorithm, the SAM algorithm is more suitable for processing image attributes, and the corresponding image size is more recognizable; this improves the quality of graph convolution. To obtain accurate category edges and refine the segmentation, we first performed PCA to reduce the dimensionality of the high-dimensional data. The first principal component, which contained the most information, was used as the base preprocessed data. Superpixel technology is based on the first law of geography, i.e. neighboring pixels have a higher probability of belonging to the same category, which is known as the principle of homogeneity of superpixels. Therefore, it is rational to consider the pixels inside a superpixel as a whole during the classification procedure.

Suppose that x is the input feature value, i is the position of the input center node, and j is the position of each of the other nodes in the same superpixel block. The non-local function is then defined as:

$y_{i} = \frac{1}{N (x)} \sum_{\forall j} f (x_{i}, x_{j}) h (x_{j})$ (18)

where f is a function for calculating the relationship between $x_{i}$ and $x_{j}$, h is a unary function representing the input feature at position j, y is the output feature of the same size as x, and N(·) is a normalization function. The non-local operation in Equation (18) uses all of the pixel information, with the corresponding weights assigned according to the neighborhood distances, followed by cyclical calculations.
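Equation (18) leaves the choice of f, h, and N open. The sketch below fills them in with common choices that are assumptions, not the authors' stated definitions: a Gaussian similarity for f, the identity for h, and the sum of the weights for N. The function name `nonlocal_smooth` and the parameter `gamma` are hypothetical.

```python
import numpy as np

def nonlocal_smooth(X, labels, gamma=1.0):
    """Smooth every pixel with all pixels of its own superpixel.

    X:      (N, C) pixel features.
    labels: (N,) superpixel id of each pixel.
    Assumed choices: f(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2),
    h(x_j) = x_j, and N(x) = sum over j of f(x_i, x_j).
    """
    Y = np.empty_like(X, dtype=float)
    for s in np.unique(labels):
        idx = np.where(labels == s)[0]
        P = X[idx]                                             # (n, C)
        d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)    # pairwise distances
        F = np.exp(-gamma * d2)                                # f(x_i, x_j)
        Y[idx] = (F @ P) / F.sum(axis=1, keepdims=True)        # weighted average
    return Y

X = np.array([[0.0], [2.0], [5.0], [5.0]])
labels = np.array([0, 0, 1, 1])
Y = nonlocal_smooth(X, labels)
```

Pixels in a perfectly uniform superpixel are left unchanged, while pixels in a mixed superpixel are pulled toward each other in proportion to their similarity, so no information crosses superpixel boundaries.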

Unlike the superpixel smoothing methods that are currently used, the non-local operation considers all the pixels inside the superpixel. Specifically, the traditional superpixel smoothing technique selects the nearest neighbor or several neighborhood layers for the iterative update process, i.e. multi-scale superpixel smoothing, which undoubtedly adds a large number of parameters and hinders the identification of the optimal solution. However, the proposed non-local superpixel method treats all nodes within a given superpixel as relational nodes of the target pixel, thereby reducing the number of parameters and minimizing the process of parameter training. In addition, a GCN needs a ‘global node relationship matrix’ that can be retained during non-local operations, thus enabling the classification network to better learn weights. It should be noted that, with this method, the edge weights of the graph are automatically learned.

2.5. Federated classification strategy

Several approaches have been developed to resolve issues caused by multi-source data interaction. However, matching the data to achieve interoperability is worthy of further consideration. When dealing with diverse structural information, multiple models need to be combined to improve classification performance. For example, a CNN can extract the spectral-spatial characteristics of an HSI, whereas a GCN focuses on the topological relationship between pixels. In general, stacking multiple models is more effective than using a single framework. Federated learning (FL) allows data fusion at target nodes, which connects key features and improves network channels (Gaba et al. 2022; Nguyen and Zettsu 2021). However, reliable mechanisms must be established for sharing information updates. Several studies have demonstrated that FL enables joint training of multiple DL frameworks to integrate latent information from multiple networks.

Combining a CNN and GCN is a highly feasible approach. A non-local GCN was used at the superpixel level to improve the simplified SP-CNN. FL of the sample features of the two DL frameworks was realized. Three federated classification strategies were used for the SSC-SFN: SSC-SFN superposition (SSC-SFN-P), SSC-SFN multiplication (SSC-SFN-M), and SSC-SFN splicing (SSC-SFN-S). These strategies were implemented at the superpixel level: the features of the CNN and GCN were extracted and fused, and the fully connected layer was then used for classification. This can be expressed as follows:

$H_{P}^{l} = P (H_{CNN}^{(l - 1)}, H_{GCN}^{(l - 1)})$ (19)

$H_{M}^{l} = M (H_{CNN}^{(l - 1)}, H_{GCN}^{(l - 1)})$ (20)

$H_{S}^{l} = S (H_{CNN}^{(l - 1)}, H_{GCN}^{(l - 1)})$ (21)

where P (·), M (·), and S (·) denote the superposition, multiplication, and splicing functions, respectively. $H_{CNN}^{(l - 1)}$ and $H_{GCN}^{(l - 1)}$ represent the feature matrices extracted by layer l-1 of the CNN and GCN, respectively. $H_{P}^{l}$, $H_{M}^{l}$, and $H_{S}^{l}$ are the results obtained using the three federated strategies. The function P (·) represents the sum of corresponding elements in the two matrices. The function M (·) refers to element-wise multiplication, i.e. the Hadamard product of the two matrices. Function M (·) can be expressed as follows:

$A * B = [a_{i, j}] * [b_{i, j}]$ (22)

where the sizes of matrix $A = [a_{i, j}]$ and matrix $B = [b_{i, j}]$ must be consistent; the elements of the result are defined as the products of the corresponding elements of the two matrices, i.e. $(A * B)_{i, j} = a_{i, j} b_{i, j}$. For function S (·), we utilized the splicing function in NumPy (Numerical Python), which changes the matrix size. The proposed solution maps the assembled matrix compression into the distance space and approximates a unique solution step-by-step, yielding a feature matrix of the same size as the input matrix M. Function S (·) can be expressed as follows:

$M = [a_{n \times n}, b_{n \times n}]_{n \times 2 n}, \quad \{ M_{n} \}_{n = 0}^{+ \infty}$ (23)
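The three fusion operators in Eqs. (19) to (23) map onto element-wise addition, element-wise multiplication, and concatenation along the feature axis. The sketch below illustrates only these operators on toy matrices; the function name `fuse` and the choice of `np.concatenate` for splicing are assumptions, and the subsequent step that compresses the spliced matrix back to the input size is not reproduced here because the text does not specify it.

```python
import numpy as np

def fuse(h_cnn, h_gcn, strategy):
    """Combine superpixel-level features from the two streams.

    h_cnn, h_gcn: (M, F) feature matrices of identical shape.
    strategy: "P" (superposition), "M" (multiplication), or "S" (splicing).
    """
    if h_cnn.shape != h_gcn.shape:
        raise ValueError("both streams must output matrices of the same size")
    if strategy == "P":
        return h_cnn + h_gcn                              # element-wise sum
    if strategy == "M":
        return h_cnn * h_gcn                              # Hadamard product, Eq. (22)
    if strategy == "S":
        return np.concatenate([h_cnn, h_gcn], axis=1)     # n x 2n matrix, Eq. (23)
    raise ValueError(f"unknown strategy: {strategy!r}")

h_cnn = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
h_gcn = np.array([[10.0, 20.0],
                  [30.0, 40.0]])

fused_p = fuse(h_cnn, h_gcn, "P")
fused_m = fuse(h_cnn, h_gcn, "M")
fused_s = fuse(h_cnn, h_gcn, "S")
```

Superposition and multiplication keep the feature width unchanged, so the classifier input size is the same for both, whereas splicing doubles the width and therefore requires either a wider classifier or a later compression step.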

3. Results

To confirm the effectiveness of the proposed SSC-SFN method, a large number of experiments were conducted on four generic hyperspectral datasets: Salinas, WHU-Hi-HanChuan, WHU-Hi-HongHu, and WHU-Hi-LongKou. These four datasets were used as benchmarks to assess the classification performance of our proposed algorithms, both quantitatively and qualitatively.

3.1. Datasets and evaluation indicators

The Salinas dataset was collected by the airborne visible/infrared imaging spectrometer (AVIRIS) sensor over Salinas Valley (California, USA). As shown in Figure 3, the image comprises 512 × 217 pixels with 204 bands and a spatial resolution of 3.7 m, and contains 16 categories. In the image analyzed herein, there are two spatially adjacent classes, ‘Grapes_U’ and ‘Vineyard_U’, with very similar spectra.

Figure 3. Salinas dataset. (a) False color image. (b) Ground-truth.

The WHU-Hi-HanChuan dataset (Zhong et al. 2018a; Zhong et al. 2020) was collected from Hanchuan, Hubei Province, China, by a Headwall Nano-Hyperspec imaging sensor (8-mm focal length) equipped on the Aibot X6 UAV V1 platform (Leica, Wetzlar, Germany). As shown in Figure 4, the study area contained seven crop species: strawberry, cowpea, soybean, sorghum, water spinach, watermelon, and greens. This dataset comprised 1,217 × 303-pixel images and 274 bands (400-1,000 nm), with a spatial resolution of about 0.109 m. Because the WHU-Hi-HanChuan dataset was collected in the afternoon when the sun was low, many areas were covered by shadow, which affected the classification.

Figure 4. WHU-Hi-HanChuan dataset. (a) False color image. (b) Ground-truth.

The third dataset used was the WHU-Hi-HongHu dataset (Zhong et al. 2018a; Zhong et al. 2020). As shown in Figure 5, this dataset was acquired with a 17-mm focal length Headwall Nano-Hyperspec imaging sensor in Honghu City, Hubei Province, China. The research area was typical of a region affected by land fragmentation, and contained 17 crop types including cotton, rape, and cabbage. This dataset comprised 940 × 475-pixel images with a spatial resolution of about 0.043 m and 270 bands (400-1,000 nm). Different varieties of the same crop type are grown in this district, for example, Chinese cabbage/cabbage and Brassica oleracea/small brassica.

Figure 5. WHU-Hi-HongHu dataset. (a) False color image. (b) Ground-truth.

The WHU-Hi-LongKou dataset was collected from Longkou Town, Hubei Province, China (Zhong et al. Citation2018a; Zhong et al. Citation2020). It was acquired with an 8-mm focal length Headwall Nano-Hyperspec imaging sensor mounted on a DJI Matrice 600 Pro UAV platform. As shown in Figure 6, the study area includes six crops: corn, cotton, sesame, broad-leaf soybean, narrow-leaf soybean, and rice. The image size is 550 × 400 pixels, with 270 bands from 400 to 1,000 nm, and the spatial resolution of the UAV-borne hyperspectral image is about 0.463 m.

Figure 6. WHU-Hi-LongKou dataset. (a) False color image. (b) Ground-truth.

In all experiments conducted in this study, the classification results were evaluated using three commonly used indices, i.e. overall accuracy (OA), average accuracy (AA), and the kappa coefficient (κ). To reduce the evaluation bias caused by the random selection of labeled samples, the mean and standard deviation of 10 independent runs were reported as the final classification results.
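The three indices can be computed from a confusion matrix. The following is a minimal NumPy sketch using the standard definitions of OA, AA (mean per-class recall), and Cohen's κ; the function name `classification_scores` and the integer label encoding (0 to `n_classes - 1`) are assumptions made for illustration, not the authors' evaluation code.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, kappa) for integer labels in [0, n_classes)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows = reference class, columns = predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    total = cm.sum()
    # Overall accuracy: fraction of correctly labeled samples.
    oa = np.trace(cm) / total
    # Average accuracy: mean per-class recall over classes present in y_true.
    support = cm.sum(axis=1)
    present = support > 0
    aa = np.mean(np.diag(cm)[present] / support[present])
    # Kappa: agreement corrected for chance agreement from the marginals.
    pe = np.sum(cm.sum(axis=0) * support) / float(total) ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

def summarize_runs(scores):
    """Mean and standard deviation of one index over independent runs."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std()
```

Calling `summarize_runs` on the OA values of the 10 runs gives the mean and standard deviation in the form reported in the tables.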

3.2. Classification results and analysis

Several state-of-the-art DL classification methods were compared, i.e. MiniGCN (Hong et al. Citation2021), FuNet-C (Hong et al. Citation2021), MDGCN (Wan et al. Citation2020), SSOGCN (Zhang et al. Citation2021), CNNCRF (Zhong et al. Citation2020), FPGA (Zheng et al. Citation2020), and the SSC-SFN-M. All of these methods are based on deep networks and use fully connected networks as the classifier. For all baseline methods, the hyperparameters recommended in the papers that originally described them were adopted. Tables 2–5 compare the results obtained by applying these algorithms to the four datasets.

Table 2. Classification accuracy (%) achieved by applying different methods to the Salinas dataset.


Table 3. Classification accuracy (%) achieved by applying different methods to the WHU-Hi-HanChuan dataset.

Table 4. Classification accuracy (%) achieved by applying different methods to the WHU-Hi-HongHu dataset.

Table 5. Classification accuracy (%) achieved by applying different methods to the WHU-Hi-LongKou dataset.

Table 2 shows the results obtained by applying several classification algorithms to the Salinas dataset. According to Table 2, MiniGCN, FuNet-C, MDGCN, SSOGCN, CNNCRF, and CEGCN could not accurately identify the ‘Vineyard_U’ class, while SSC-SFN-M had the advantage of identifying the ‘Grapes_U’ class. Because the spectral curves of ‘Vineyard_U’ and ‘Grapes_U’ were very similar, and the two classes were close spatial neighbors, spectral values alone were not sufficient to classify them and capture their boundary information, as shown in Figure 7. Therefore, to ensure excellent classification accuracy for the Salinas dataset, it was necessary to extract spatial information. Our method was capable of this, and the multiplication strategy yielded a classification accuracy of >99.4%.

Figure 7. Classification maps of the eight algorithms on the Salinas dataset.

Table 3 compares the results obtained by applying different methods to the WHU-Hi-HanChuan dataset; the best OA, AA, and κ values are in bold. The classification accuracy of the proposed method was 91.82%, about 10 percentage points higher than that of MDGCN (81.32%), showing that the proposed method outperformed its competitors. It should be noted that the dataset was affected by illumination and shadow when it was collected, so the spectral values of some categories fluctuate, which makes it particularly difficult to distinguish the ‘soybean’ and ‘water’ categories. However, the non-local superpixel network could capture changes in the distribution of the shaded regions, which made the classification performance of our proposed method for these categories superior to that of the other algorithms. Figure 8 shows large errors in partially overlapping objects and isolated areas for all methods except SSC-SFN-M, which may be attributable to the segmentation ability of superpixels.

Figure 8. Classification maps were obtained using eight algorithms with the WHU-Hi-HanChuan dataset.

The classification statistics for several methods applied to the WHU-Hi-HongHu dataset are listed in Table 4. The OA of our method was about 10% higher than that of FuNet-C, which represents an important performance improvement and confirms the utility of our framework. The OA of MDGCN, which only used graph convolution to extract spatial-spectral information, was 83.63%. This finding shows the importance of expressing spatial neighborhood relationships. In contrast, CEGCN and FPGA required a large number of labeled samples to learn the network parameters. Serious misclassifications occurred in multiple classes, such as ‘Chinese cabbage’ and ‘tuber mustard’, when the labeled samples were limited. This dataset had 22 categories, and the uneven distribution of samples was one of the major barriers to classification. Our method could not accurately identify and locate the ‘road’ and ‘Brassica chinensis’ categories. This may have been because, after the dataset was segmented, the fragmented distribution of the classes led to numerous very small patches, which weakened the influence of the global topological information on the classification. Figure 9 shows that class boundaries were difficult to identify, especially the shapes of the ‘trees’ and ‘Chinese cabbage’ classes in the upper left, and the ‘Brassica chinensis’ class in the lower right. MiniGCN, FuNet-C, SSOGCN, CNNCRF, CEGCN, and SSC-SFN-M did not achieve satisfactory classification results. In summary, the spectra of these categories were complex and the classes overlapped spatially. The framework must be improved to enhance the recognition accuracy for these categories.

Figure 9. Classification maps were obtained using eight algorithms with the WHU-Hi-HongHu dataset.

The classification results for the WHU-Hi-LongKou dataset obtained with SSC-SFN and the comparison methods are shown in Figure 10, and the detailed accuracies are recorded in Table 5. Compared with the other three datasets, this dataset is relatively easy to classify, and the inter-class differences are obvious. The OA of MiniGCN reached only 84.19%, which may be due to the lack of effectively integrated context information, resulting in feature fragmentation. For example, cotton and broad-leaf soybean, corn, and mixed weed are distinguished with less precision (Figure 10(a)). MDGCN identified a small number of broad-leaf soybean pixels as sesame, which resulted in a serious misclassification (Figure 10(c)). The proposed SSC-SFN-M model generally obtained high accuracy, with OA reaching 98.52%. SSC-SFN-M largely resolves the confusion between broad-leaf soybean and sesame, showing that the non-local segmentation features and the global context information are accurately fused.

Figure 10. Classification maps were obtained using eight algorithms with the WHU-Hi-LongKou dataset.

4. Discussion

In this section, we discuss the influence of the number of training samples on the classification results and the roles of the federated network strategies and non-local superpixels, compare the running times of several algorithms, and consider the impact of key parameters on the performance of the proposed method. All experiments were conducted using an Intel i3 9100F CPU, 16 GB memory, and an NVIDIA GeForce RTX 2080 Ti GPU.

4.1. Impact of various labeled samples on the classification results

We first compared the proposed framework with other algorithms on the four datasets using different numbers of training samples per class; for every method, more training samples gave higher classification accuracy. Specifically, we varied the number of labeled examples per class from 5 to 30, and reported the OA values obtained by all methods for all four datasets (Rao et al. Citation2019). Figure 11 shows that our method had the best performance on all four datasets, which demonstrated the effectiveness of SSC-SFN-M for obtaining spectral-spatial information, as well as its suitability for large areas with complex background information under the constraint of few samples. In addition, the multiplication federated strategy obtained stable results, which also confirmed the importance of continuing research on superpixel multiplication. In particular, there was a performance gap between FPGA and our new method, due to the advantage of combining the GCN and CNN mentioned previously. As expected, the SSC-SFN-M classification of the Salinas dataset was very accurate, as the accuracy of the other methods was already > 85% with 30 labeled samples per class. Our method had an obvious advantage in terms of its ability to operate with a limited number of labeled samples. Some comparative methods performed poorly on the three WHU-Hi datasets, indicating that these algorithms cannot easily extract complex background information, and that the proposed method performs well on complex and large datasets. More interestingly, our method achieved relatively high classification accuracy when the sample size was about 30 per class, which also confirmed the suitability of the method for HSI classification tasks.

Figure 11. The OA according to the number of training samples. (a) Salinas (b) WHU-Hi-HanChuan (c) WHU-Hi-HongHu, and (d) WHU-Hi-LongKou.
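The limited-sample protocol described above (5 to 30 labeled pixels per class) can be sketched as follows. This is only an illustration of per-class random sampling from a ground-truth map; the function name `sample_per_class` and the convention that label 0 marks unlabeled background are assumptions, not details taken from the paper.

```python
import numpy as np

def sample_per_class(labels, n_per_class, rng):
    """Draw up to n_per_class labeled pixels per class.

    labels: 2-D integer ground-truth map; 0 is assumed to mean "unlabeled".
    Returns (train_idx, test_idx) as flat pixel indices.
    """
    flat = np.asarray(labels).ravel()
    picked = []
    for c in np.unique(flat):
        if c == 0:  # assumption: background / unlabeled pixels are skipped
            continue
        idx = np.flatnonzero(flat == c)
        k = min(n_per_class, idx.size)  # small classes contribute all pixels
        picked.append(rng.choice(idx, size=k, replace=False))
    train_idx = np.sort(np.concatenate(picked))
    # Every remaining labeled pixel is kept for testing.
    labeled = np.flatnonzero(flat > 0)
    test_idx = np.setdiff1d(labeled, train_idx)
    return train_idx, test_idx
```

Repeating the draw with a different seed for each of the 10 independent runs, and for each sample size from 5 to 30, would produce accuracy curves of the kind plotted in Figure 11.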

4.2. Federated network strategy

Stacking multiple models is one way to improve the performance of HSI classification algorithms. After extracting deep features separately through CNN and GCN, we implemented three different federated classification strategies to improve the algorithm (Chen et al. Citation2016). Tables 6–9 list the classification results of the traditional methods, superpixel GCN (S-GCN), SP-CNN, and SSC-SFN using different federated classification strategies on the four datasets. From the four tables, it is apparent that CNN was better than GCN for solving the HSI classification problem. The accuracy of SP-CNN was also greatly improved, indicating that the superpixel pooling process is useful. For the Salinas, WHU-Hi-HongHu, and WHU-Hi-LongKou datasets, the multiplication strategy was more suitable than splicing and superposition. This may be because the input of SSC-SFN-M can better characterize the topological relationship between superpixels after normalizing the deep feature values. For the WHU-Hi-HanChuan dataset, superpixel splicing showed superior accuracy, likely because the best approximation value of the solution was useful for analyzing the shaded areas. Interestingly, the traditional CNN performed better than S-GCN when applied to the Salinas dataset, demonstrating that it was more suitable for fixed window feature extraction and recognition. Our experiments were all implemented in a unified environment using TensorFlow (GPU version).
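To make the three strategies concrete, the sketch below fuses two per-superpixel feature matrices by multiplication, superposition, and splicing. The exact operators are defined in the method section of the paper and are not reproduced here; this sketch assumes element-wise product, element-wise sum, and channel concatenation respectively, applied after L2 normalization of each branch. The names `fuse` and `l2_normalize` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale every row (one superpixel's feature vector) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fuse(cnn_feat, gcn_feat, strategy):
    """Combine CNN and GCN features of shape (n_superpixels, n_channels).

    Assumed interpretations (not confirmed by the source):
      "multiplication" -> element-wise product
      "superposition"  -> element-wise sum
      "splicing"       -> concatenation along the channel axis
    """
    a = l2_normalize(cnn_feat)
    b = l2_normalize(gcn_feat)
    if strategy == "multiplication":
        return a * b
    if strategy == "superposition":
        return a + b
    if strategy == "splicing":
        return np.concatenate([a, b], axis=1)
    raise ValueError(f"unknown strategy: {strategy}")
```

Under these assumptions, multiplication and superposition keep the channel count unchanged, whereas splicing doubles it, so the classifier that follows sees a wider input only in the splicing case.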

Table 6. Results obtained by applying several classification schemes to the Salinas dataset.

Table 7. Results obtained by applying several classification schemes to the WHU-Hi-HanChuan dataset.

Table 8. Results obtained by applying several classification schemes to the WHU-Hi-HongHu dataset.

Table 9. Results obtained by applying several classification schemes to the WHU-Hi-LongKou dataset.

4.3. Influence of the number of superpixels

The number of superpixels also affects the classification results of our method. Therefore, we used the four datasets to determine the optimal parameters of SSC-SFN-M. Notably, the maximum interclass variance method (i.e. OTSU) (Otsu Citation1979) automatically obtains the optimal threshold of superpixel size and thus does not rely on other methods such as transfer learning or data augmentation. Figure 12 shows how the classification results changed with the number of superpixels for the four datasets. The OA of SSC-SFN-M was excellent under all parameter conditions. In addition, the performance of SSC-SFN-M was robust because non-local superpixels can integrate spatial features. The optimal numbers of superpixels for the Salinas, WHU-Hi-HanChuan, WHU-Hi-HongHu, and WHU-Hi-LongKou datasets were 1,100, 700, 1,000, and 1,100, respectively. The optimal values differ because they are affected by many factors, including the spectral resolution, spatial resolution, and distribution characteristics of the ground features of the HSI. Generally, the higher the spatial resolution, the clearer the distribution of ground features, and the more appropriate it is to increase the number of superpixels. However, imbalance of the dataset leads to ‘refinement’ of the superpixels, and in turn to a higher misclassification rate; this occurred for the WHU-Hi-HanChuan dataset. The number of samples in each category varies greatly among the tested datasets, which impedes mining of more detailed information about spatial structures. If detailed features are to be separated during image segmentation, the number of superpixels must be sufficient, as too few superpixels will reduce the effect of superpixel segmentation. In addition, as the number of superpixels increases, time consumption and memory requirements should also be considered.

Figure 12. The OA of SSC-SFN-M according to the numbers of superpixels.
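For readers unfamiliar with the maximum interclass variance criterion, the following is a minimal NumPy sketch of Otsu's method on a 1-D set of values. It shows only the generic thresholding step; how the resulting threshold is mapped to a superpixel size in SSC-SFN is described in the method section and is not reproduced here. The function name `otsu_threshold` and the bin count are assumptions.

```python
import numpy as np

def otsu_threshold(values, n_bins=256):
    """Return the threshold that maximizes the between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()                 # probability of each bin
    w0 = np.cumsum(p)                     # weight of the lower class
    w1 = 1.0 - w0                         # weight of the upper class
    cum_mean = np.cumsum(p * centers)     # cumulative first moment
    total_mean = cum_mean[-1]
    # Between-class variance; bins where one class is empty are left at 0.
    denom = w0 * w1
    sigma_b = np.zeros_like(denom)
    valid = denom > 0
    sigma_b[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / denom[valid]
    return centers[np.argmax(sigma_b)]
```

When several neighboring bins give the same maximal variance (for example, across an empty gap between two clusters), `np.argmax` returns the first of them, so the threshold sits at the lower edge of the gap.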

4.4. Influence of network depth

Network depth is a hyperparameter that must be considered in DL frameworks, as it can have a large impact on network performance. Therefore, 10 independent experiments were conducted and OA was used as the performance criterion to determine the impact of network depth on the results of the proposed method. As shown in Figure 13, for the Salinas dataset, the performance of the SSC-SFN-M increased with network depth. The reason for this may be superpixel smoothing and the irregular edge extraction of the GCN path. Deeper networks could suppress more noise or outliers, resulting in better accuracy. For the WHU-Hi-HanChuan dataset, a network depth of 2–3 was the most reasonable. However, the results were greatly affected by shadow occlusion and mixed ground objects, indicating that networks that were too deep could not effectively eliminate such interference. For the WHU-Hi-HongHu dataset, SSC-SFN-M achieved the best performance with three network layers. Because the WHU-Hi-HongHu dataset had ‘tighter’ large-scale land cover types at the spatial level, deepening the network caused the gradient to vanish without improving performance. Thus, for real datasets with complex land cover and large amounts of data, performance degradation may occur if the SSC-SFN-M is too deep. The network depth of the SSC-SFN-M was fixed to three in all of the other experiments.

Figure 13. The OA of SSC-SFN-M according to network depth for the four datasets. (a) Salinas (b) WHU-Hi-HanChuan, (c) WHU-Hi-HongHu and (d) WHU-Hi-LongKou.

4.5. Non-local superpixel

The non-local superpixel is an important facet of the proposed method, as it effectively alleviates the problem of feeding an entire feature matrix into the GCN. Non-local smoothing reduces the computational cost and preserves important edge information through iterative smoothing. This is important for the subsequent GCN process, which must fully learn from the relationships between nearest-neighbor pixels (Gimeno et al. Citation2021). For the quantitative analysis of the role of the non-local superpixel in this section, all OA values presented are average results for 10 independent experiments. To demonstrate the smoothing effect of the non-local superpixel method, we selected several existing superpixel segmentation algorithms for comparison, including Watershed (Vincent and Soille Citation1991), Meanshift (Fukunaga and Hostetler Citation1975), Quickshift, SLIC, ERS (Liu et al. Citation2011), SLIC0 and SAM. As shown in Table 10, the non-local segment strategy showed satisfactory performance for the four experimental datasets, improving OA by approximately 10%. The reason for this improvement is that internal smoothing of superpixels can extract more critical boundary features and appropriate spatial information, indicating that this attempt at local pattern search smoothing was successful. Moreover, the non-local superpixel has lower sample requirements, which represents a breakthrough in solving the labeled sample problem. For the WHU-Hi-HanChuan dataset, the OA of the non-local segment strategy was > 8% higher than that of the other methods, demonstrating that the non-local segment strategy handles shadow areas on farmland well. SLIC0 did not perform as well as SLIC, likely because SLIC0 does not handle irregular areas of texture as well as SLIC. Interestingly, for the relatively regular Salinas dataset and the WHU-Hi-HongHu dataset, SLIC0 was advantageous, as it adds a constant compression factor to improve the shape of superpixel blocks.
In addition, SAM demonstrated the advantages of the graph structure when applied to Salinas and WHU-Hi-HanChuan datasets, providing better classification results than SLIC and SLIC0; this is one of the reasons that SAM was selected as the basic segmentation algorithm for this study. The classification results for Watershed, Meanshift, and Quickshift were not ideal, indicating that the background complexity of the four datasets is high, the recognition of objects is difficult, and the possibility of missegmentation of complex scenes is markedly increased.
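The idea of smoothing within each superpixel, and of reducing the GCN input from one row per pixel to one row per superpixel, can be illustrated with plain per-superpixel averaging. This is a deliberately simplified stand-in and not the non-local smoothing used in SSC-SFN; the function name `superpixel_mean_features` and the assumption that segment labels run from 0 to S-1 are introduced here for illustration only.

```python
import numpy as np

def superpixel_mean_features(cube, segments):
    """Average the spectra inside each superpixel.

    cube:     (H, W, B) hyperspectral image.
    segments: (H, W) integer superpixel labels, assumed to be 0..S-1.
    Returns (means, smoothed): an (S, B) node-feature matrix and an
    (H, W, B) image in which each pixel carries its superpixel mean.
    """
    h, w, b = cube.shape
    seg = np.asarray(segments).ravel()
    n_seg = int(seg.max()) + 1
    # Accumulate spectra per superpixel, then divide by the pixel counts.
    sums = np.zeros((n_seg, b), dtype=float)
    np.add.at(sums, seg, cube.reshape(-1, b))
    counts = np.bincount(seg, minlength=n_seg).astype(float)
    means = sums / np.maximum(counts, 1.0)[:, None]
    # Broadcast the means back to pixel positions.
    smoothed = means[seg].reshape(h, w, b)
    return means, smoothed
```

With roughly 1,000 superpixels, as in the experiments above, a graph built on `means` has about 1,000 nodes rather than one node per pixel, which is the source of the computational saving.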

Table 10. Results of several superpixel segmentation algorithms for four datasets.


4.6. Running time

Running time is one of the most important metrics for evaluating the practicality of an HSI classification algorithm. An efficient classification algorithm is most suitable for practical engineering applications. Table 11 reports the training and testing times of several HSI classification algorithms on the four images. FuNet-C is an improved algorithm based on MiniGCN. The advantages of the mini-patch gradually disappear after the amelioration process. In addition, the slow speed of CNNCRF may be due to the CNN not considering the spatial node relationship of the graph during image interpretation, such that CNNCRF takes longer to achieve higher accuracy, making this model more time-consuming than others. The efficiency of FPGA is relatively high, as FPGA alleviates the complexity issue in highly overlapping areas and can train FreeNet via global learning with the free-patch step. Almost all of the fast algorithms employ superpixel techniques, including MDGCN, CEGCN, and the proposed method. However, the dynamic multi-scale processing adopted by MDGCN may limit its efficiency. Moreover, CEGCN adds pixel-level features for learning, which undoubtedly increases the computational cost. Compared with several other methods, our method saves time because the combination of the superpixel segmentation technique and federated network strategies makes the classifier input smaller.

Table 11. Running time (seconds) of eight classification schemes for four datasets with 30 labeled pixels per class.


4.7. Performance and generalization ability analysis of SSC-SFN

During model training, overfitting often occurs; that is, the model fits the training data well but cannot predict data outside the training set. If the test data are used to adjust the model parameters, part of the test information is effectively known at training time, which biases the final evaluation results. Generally, a part of the training data is therefore randomly selected as validation data to evaluate the training effect of the model. We divided the experimental data into training and validation sets. The test set is independent of training and is used only for the evaluation of the final model. The validation data are taken from the training data but do not participate in training, so that the fit of the model to data outside the training set can be assessed relatively objectively. K-fold cross-validation was selected as the validation method; it divides the original data into K groups and uses each subset in turn as the validation set. K was defined as 100 in the experiment. In Figure 14, when K = 20, the model achieves stable evaluation accuracy, the results are as close as possible to the performance of the model on the test set, and the corresponding parameters can be used as the best parameters for model optimization.

Figure 14. The results of K-fold cross-validation show that when K = 20, the model performs best.
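The validation scheme described above can be sketched as a generic K-fold split of the training indices. This is a minimal illustration of the standard procedure, not the authors' code; the generator name `k_fold_indices` is hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation.

    The sample indices are shuffled once and split into k nearly equal
    folds; each fold serves exactly once as the validation set.
    """
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

Averaging the validation accuracy over the K folds gives one point of a curve such as the one in Figure 14; the held-out test set is never touched during this procedure.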

When training the model, the training data may be insufficient, they may fail to represent the distribution of the whole dataset, or the model may be overtrained; each of these often leads to overfitting. To prevent overfitting, we added a Dropout layer to the model. Specifically, each neuron in the input tensor is set to 0 with a certain probability. The keep_prob was set to 0.2 (indicating that 80% of neurons were set to 0 and the remaining 20% of neurons were divided by 0.2). The main idea of the Dropout layer is to encourage a distributed representation of the features (Zhou et al. Citation2023). During training, some nodes are randomly discarded, so that these nodes do not participate in the parameter update (the drop rate is generally set to 0.5, which switches off the output of 50% of the hidden-layer neurons). When the parameters are updated again, a new random selection is made. The Dropout layer reduces the dependence between neurons, because the random selection of nodes means that the same neurons do not always act together, which effectively prevents model overfitting.
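The parenthetical description of keep_prob corresponds to what is usually called inverted dropout. A minimal NumPy sketch follows; it illustrates the general technique rather than the TensorFlow layer used in the experiments, and the function name `inverted_dropout` is an assumption.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng, training=True):
    """Zero each activation with probability 1 - keep_prob and rescale.

    Surviving activations are divided by keep_prob so that the expected
    value of the output equals the input; at test time x passes through.
    """
    if not training:
        return x
    mask = rng.random(x.shape) < keep_prob   # True for neurons that are kept
    return np.where(mask, x / keep_prob, 0.0)
```

With keep_prob = 0.2, about 80% of the activations are zeroed and the survivors are multiplied by 5, which keeps the expected activation unchanged between training and inference.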

5. Conclusions

In this study, a spectral-spatial DL model for HSI classification based on a non-local segment federated network is described. This method exploits the homogeneity of superpixels to perform non-local smoothing within each superpixel to optimize each superpixel block. Thus, spatial information is obtained for small batches, which facilitates the classification of HSIs with complex backgrounds. The proposed SSC-SFN combines the CNN and GCN techniques to exploit the available spectral and spatial structure information. More importantly, the federated network strategies based on CNN and GCN used in this framework improve the robustness of feature recognition for HSI classification tasks. In summary, SSC-SFN solved the problem of classifying large-scale complex background research areas with limited labeled samples. Many experiments were conducted on four widely applied datasets. Compared with state-of-the-art DL algorithms, SSC-SFN-M achieved a higher OA (> 91.8%). Specifically, SSC-SFN-M improved accuracy by > 12% relative to FuNet-C. When the sample size per class was reduced from 30 to 5, the proposed method retained outstanding accuracy and had an extremely fast analysis speed, demonstrating the effectiveness and superiority of the proposed SSC-SFN method. The main contribution of SSC-SFN is integrating two paths between superpixels, to automatically obtain the optimal threshold for superpixel size (thus reducing the hyper-parameter optimization problem). It also achieves accurate and fast classification of large hyperspectral datasets with a small number of training samples.

Acknowledgements

We acknowledge all the reviewers and editors of the journal for their valuable comments, suggestions, and revisions on this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available from the corresponding author, [Taixia Wu], upon reasonable request.

Additional information

Funding

This work was supported by a grant from Key Laboratory of Emergency Satellite Engineering and Application, Ministry of Emergency Management; Science for a Better Development of Inner Mongolia Program: [Grant Number 2022EEDSKJXM003].

References

Abdelmoneim, H., M. Soliman, and H. Moghazy. 2020. “Evaluation of TRMM 3B42V7 and CHIRPS Satellite Precipitation Products as an Input for Hydrological Model Over Eastern Nile Basin.” Earth Systems and Environment 4: 685–698. https://doi.org/10.1007/s41748-020-00185-3.
Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. 2012. “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11): 2274–2282. https://doi.org/10.1109/TPAMI.2012.120.
Ahishali, M., S. Kiranyaz, T. Ince, and M. Gabbouj. 2021. “Classification of Polarimetric SAR Images Using Compact Convolutional Neural Networks.” GIScience & Remote Sensing 58 (1): 28–47. https://doi.org/10.1080/15481603.2020.1853948.
Alkhatib, M. Q., and M. Velez-Reyes. 2019. “Improved Spatial-Spectral Superpixel Hyperspectral Unmixing.” Remote Sensing 11 (20): 2374. https://doi.org/10.3390/rs11202374.
Amirabbas, D., A. Erchan, Y. Berrin, M. Andreas, and R. Christian. 2018. “GMM-Based Synthetic Samples for Classification of Hyperspectral Images With Limited Training Data.” IEEE Geoscience and Remote Sensing Letters 15 (6): 942–946. https://doi.org/10.1109/LGRS.2018.2817361.
Anderson, J. 1976. “A Land Use and Land Cover Classification System for Use With Remote Sensor Data.” US Government Printing Office 964.
Cao, J., and B. Wang. 2017. “Embedding Learning on Spectral-Spatial Graph for Semisupervised Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 14 (10): 1805–1809. https://doi.org/10.1109/LGRS.2017.2737020.
Chen, Y., H. Jiang, C. Li, X. Jia, and P. Ghamisi. 2016. “Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks.” IEEE Transactions on Geoscience and Remote Sensing 54 (10): 6232–6251. https://doi.org/10.1109/TGRS.2016.2584107.
Chen, R., G. Li, and C. Dai. 2022. “DRGCN: Dual Residual Graph Convolutional Network for Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 19: 6009205.
Dong, Y., Q. Liu, B. Du, and L. Zhang. 2022. “Weighted Feature Fusion of Convolutional Neural Network and Graph Attention Network for Hyperspectral Image Classification.” IEEE Transactions on Image Processing 31: 1559–1572. https://doi.org/10.1109/TIP.2022.3144017.
Farooq, A., X. Jia, J. Hu, and J. Zhou. 2019. “Multi-Resolution Weed Classification via Convolutional Neural Network and Superpixel Based Local Binary Pattern Using Remote Sensing Images.” Remote Sensing 11 (14): 1692. https://doi.org/10.3390/rs11141692.
Fukunaga, K., and L. Hostetler. 1975. “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition.” IEEE Transactions on Information Theory 21 (1): 32–40. https://doi.org/10.1109/TIT.1975.1055330.
Gaba, S., I. Budhiraja, V. Kumar, S. Garg, G. Kaddoum, and M. Hassan. 2022. “A Federated Calibration Scheme for Convolutional Neural Networks: Models, Applications and Challenges.” Computer Communications 192: 144–162. https://doi.org/10.1016/j.comcom.2022.05.035.
Galieni, A., N. Nicastro, A. Pentangelo, C. Platani, T. Cardi, and C. Pane. 2022. “Surveying Soil-Borne Disease Development on Wild Rocket Salad Crop by Proximal Sensing Based on High-Resolution Hyperspectral Features.” Scientific Reports 12: 5098. https://doi.org/10.1038/s41598-022-08969-5.
Gao, Q., F. Xie, D. Huang, and C. Jin. 2022. “Spectral and Spatial Reduction of Hyperspectral Image Guided by Data Reconstruction and Superpixels.” Engineering Applications of Artificial Intelligence 111: 104803. https://doi.org/10.1016/j.engappai.2022.104803.
Gimeno, P., V. Mingote, A. Ortega, A. Miguel, and E. Lleida. 2021. “Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation with Limited Training Data.” IEEE Signal Processing Letters 28: 1135–1139. https://doi.org/10.1109/LSP.2021.3084501.
Gong, H., Q. Li, C. Li, H. Dai, Z. He, W. Wang, H. Li, F. Han, A. Tuniyazi, and T. Mu. 2021. “Multiscale Information Fusion for Hyperspectral Image Classification Based on Hybrid 2D-3D CNN.” Remote Sensing 13 (12): 2268. https://doi.org/10.3390/rs13122268.
Hammond, D. K., P. Vandergheynst, and R. Gribonval. 2011. “Wavelets on Graphs via Spectral Graph Theory.” Applied and Computational Harmonic Analysis 30 (2): 129–150. https://doi.org/10.1016/j.acha.2010.04.005.
Hänsch, R., and O. Hellwich. 2021. “Fusion of Multispectral LiDAR, Hyperspectral, and RGB Data for Urban Land Cover Classification.” IEEE Geoscience and Remote Sensing Letters 18 (2): 366–370. https://doi.org/10.1109/LGRS.2020.2972955.
He, Q., X. Sun, Z. Yan, and K. Fu. 2022a. “DABNet: Deformable Contextual and Boundary-Weighted Network for Cloud Detection in Remote Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 60: 5601216.
He, Q., X. Sun, Z. Yan, B. Li, and K. Fu. 2022b. “Multi-Object Tracking in Satellite Videos with Graph-Based Multitask Modeling.” IEEE Transactions on Geoscience and Remote Sensing 60: 5619513.
Hong, S., K. Cho, S. Park, T. Kang, M. Kim, G. Nam, and J. Pyo. 2022. “Estimation of Cyanobacteria Pigments in the Main Rivers of South Korea Using Spatial Attention Convolutional Neural Network with Hyperspectral Imagery.” GIScience & Remote Sensing 59 (1): 547–567. https://doi.org/10.1080/15481603.2022.2037887.
Hong, D., L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot. 2021. “Graph Convolutional Networks for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 59 (7): 5966–5978. https://doi.org/10.1109/TGRS.2020.3015157.
Hong, D., N. Yokoya, G. Xia, J. Chanussot, and X. Zhu. 2020. “X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data.” ISPRS Journal of Photogrammetry and Remote Sensing 167: 12–23. https://doi.org/10.1016/j.isprsjprs.2020.06.014.
Irteza, S., J. Nichol, W. Shi, and S. Abbas. 2021. “NDVI and Fluorescence Indicators of Seasonal and Structural Changes in a Tropical Forest Succession.” Earth Systems and Environment 5: 127–133. https://doi.org/10.1007/s41748-020-00175-5.
Jia, S., Z. Zhan, M. Zhang, M. Xu, Q. Huang, J. Zhou, and X. Jia. 2021. “Multiple Feature-Based Superpixel-Level Decision Fusion for Hyperspectral and LiDAR Data Classification.” IEEE Transactions on Geoscience and Remote Sensing 59 (2): 1437–1452. https://doi.org/10.1109/TGRS.2020.2996599.
Jiang, M., X. Zhang, Y. Sun, W. Feng, Q. Gan, and Y. Ruan. 2022. “AFSNet: Attention-Guided Full-Scale Feature Aggregation Network for High-Resolution Remote Sensing Image Change Detection.” GIScience & Remote Sensing 59 (1): 1882–1900. https://doi.org/10.1080/15481603.2022.2142626.
Ke, H., D. Chen, X. Li, Y. Tang, T. Shah, and R. Ranjan. 2018. “Towards Brain Big Data Classification: Epileptic EEG Identification With a Lightweight VGGNet on Global MIC.” IEEE Access 6: 14722–14733. https://doi.org/10.1109/ACCESS.2018.2810882.
Kipf, T. N., and M. Welling. 2017. “Semi-supervised Classification with Graph Convolutional Networks.” International Conference on Learning Representations (ICLR), 1–14.
Li, M., H. Chen, and Z. Cheng. 2022. “An Attention-Guided Spatiotemporal Graph Convolutional Network for Sleep Stage Classification.” Life 12 (5): 622. https://doi.org/10.3390/life12050622.
Li, Z., Q. Ling, J. Wu, Z. Wang, and Z. Lin. 2020. “A Constrained Sparse-Representation-Based Spatio-Temporal Anomaly Detector for Moving Targets in Hyperspectral Imagery Sequences.” Remote Sensing 12 (17): 2783. https://doi.org/10.3390/rs12172783.
Li, X., and H. Ning. 2020. “Deep Pyramid Convolutional Neural Network Integrated with Self-attention Mechanism and Highway Network for Text Classification.” Journal of Physics: Conference Series, 1642.
Li, W., G. Wu, and F. Zhang. 2017. “Hyperspectral Image Classification Using Deep Pixel-Pair Features.” IEEE Transactions on Geoscience and Remote Sensing 55 (2): 844–853. https://doi.org/10.1109/TGRS.2016.2616355.
Liu, T., A. Abd-Elrahman, J. Morton, and V. L. Wilhelm. 2018a. “Comparing Fully Convolutional Networks, Random Forest, Support Vector Machine, and Patch-Based Deep Convolutional Neural Networks for Object-Based Wetland Mapping Using Images from Small Unmanned Aircraft System.” GIScience & Remote Sensing 55 (2): 243–264. https://doi.org/10.1080/15481603.2018.1426091.
Liu, Y., F. Condessa, J. M. Bioucas-Dias, P. Du, and A. Plaza. 2018b. “Convex Formulation for Multiband Image Classification With Superpixel-Based Spatial Regularization.” IEEE Transactions on Geoscience and Remote Sensing 56: 2704–2721. https://doi.org/10.1109/TGRS.2017.2782005.
Liu, M., O. Tuzel, S. Ramalingam, and R. Chellappa. 2011. “Entropy Rate Superpixel Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June, 2097–2104.
Liu, B., Y. Wei, Y. Zhang, and Q. Yang. 2017. “Deep Neural Networks for High Dimension, Low Sample Size Data.” Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August, 2287–2293.
Lu, Y., T. James, C. Schillaci, and A. Lipani. 2022. “Snow Detection in Alpine Regions with Convolutional Neural Networks: Discriminating Snow from Cold Clouds and Water Body.” GIScience & Remote Sensing 59 (1): 1321–1343. https://doi.org/10.1080/15481603.2022.2112391.
Makantasis, K., K. Karantzalos, A. Doulamis, and N. Doulamis. 2015. “Deep Supervised Learning for Hyperspectral Data Classification Through Convolutional Neural Networks.” IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 4959–4962. https://doi.org/10.1109/IGARSS.2015.7326945.
Masolele, R., V. Sy, D. Marcos, J. Verbesselt, F. Gieseke, K. Mulatu, Y. Moges, H. Sebrala, C. Martius, and M. Herold. 2022. “Using High-Resolution Imagery and Deep Learning to Classify Land-use Following Deforestation: A Case Study in Ethiopia.” GIScience & Remote Sensing 59 (1): 1446–1472. https://doi.org/10.1080/15481603.2022.2115619.
Mei, X., E. Pan, Y. Ma, X. Dai, J. Huang, F. Fan, Q. Du, H. Zheng, and J. Ma. 2019b. “Spectral-Spatial Attention Networks for Hyperspectral Image Classification.” Remote Sensing 11 (8): 963. https://doi.org/10.3390/rs11080963.
Mei, J., Y. Wang, L. Zhang, B. Zhang, S. Liu, P. Zhu, and Y. Ren. 2019a. “PSASL: Pixel-Level and Superpixel-Level Aware Subspace Learning for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 57 (7): 4278–4293. https://doi.org/10.1109/TGRS.2018.2890508.
Mukherjee, F., and D. Singh. 2020. “Assessing Land Use-Land Cover Change and Its Impact on Land Surface Temperature Using LANDSAT Data: A Comparison of Two Urban Areas in India.” Earth Systems and Environment 4: 385–407. https://doi.org/10.1007/s41748-020-00155-9.
Nguyen, D., and K. Zettsu. 2021. “Spatially-distributed Federated Learning of Convolutional Recurrent Neural Networks for Air Pollution Prediction.” 2021 IEEE International Conference on Big Data, 3601–3608. https://doi.org/10.1109/BigData52589.2021.9671336.
Otsu, N. 1979. “A Threshold Selection Method from Gray-Level Histograms.” IEEE Transactions on Systems, Man, and Cybernetics 9: 62–66. https://doi.org/10.1109/TSMC.1979.4310076.
Park, S., and A. Song. 2020. “Discrepancy Analysis for Detecting Candidate Parcels Requiring Update of Land Category in Cadastral Map Using Hyperspectral UAV Images: A Case Study in Jeonju, South Korea.” Remote Sensing 12 (3): 354. https://doi.org/10.3390/rs12030354.
Qin, A., Z. Shang, J. Tian, Y. Wang, T. Zhang, and Y. Tang. 2019. “Spectral-spatial Graph Convolutional Networks for Semisupervised Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 16: 241–245. https://doi.org/10.1109/LGRS.2018.2869563.
Rao, M., P. Tang, and Z. Zhang. 2019. “Spatial-Spectral Relation Network for Hyperspectral Image Classification with Limited Training Samples.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (12): 5086–5100. https://doi.org/10.1109/JSTARS.2019.2957047.
Ruan, K., S. Zhao, X. Jiang, Y. Li, J. Fei, D. Ou, Q. Tang, Z. Lu, T. Liu, and J. Xia. 2022. “A 3D Fluorescence Classification and Component Prediction Method Based on VGG Convolutional Neural Network and PARAFAC Analysis Method.” Applied Sciences 12 (10): 4886. https://doi.org/10.3390/app12104886.
Shahraki, F. F., and S. Prasad. 2018. “Graph Convolutional Neural Networks for Hyperspectral Data Classification.” IEEE Global Conference on Signal and Information Processing, 968–972.
Sun, X., P. Wang, Z. Yan, F. Xu, R. Wang, W. Diao, J. Chen, et al. 2022. “FAIR1M: A Benchmark Dataset for Fine-Grained Object Recognition in High-Resolution Remote Sensing Imagery.” ISPRS Journal of Photogrammetry and Remote Sensing 184: 116–130. https://doi.org/10.1016/j.isprsjprs.2021.12.004.
Vincent, L., and P. Soille. 1991. “Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations.” IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (6): 583–598. https://doi.org/10.1109/34.87344.
Wan, S., C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang. 2020. “Multiscale Dynamic Graph Convolutional Network for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 58 (5): 3162–3177. https://doi.org/10.1109/TGRS.2019.2949180.
Wang, H., Y. Cheng, C. Chen, and X. Wang. 2021. “Semisupervised Classification of Hyperspectral Image Based on Graph Convolutional Broad Network.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 2995–3005. https://doi.org/10.1109/JSTARS.2021.3062642.
Wold, S., K. Esbensen, and P. Geladi. 1987. “Principal Component Analysis.” Chemometrics and Intelligent Laboratory Systems 2: 37–52. https://doi.org/10.1016/0169-7439(87)80084-9.
Xie, F., Q. Gao, C. Jin, and F. Zhao. 2021. “Hyperspectral Image Classification Based on Superpixel Pooling Convolutional Neural Network with Transfer Learning.” Remote Sensing 13 (5): 930. https://doi.org/10.3390/rs13050930.
Xie, F., C. Lei, C. Jin, and A. An. 2020. “A Novel Spectral-Spatial Classification Method for Hyperspectral Image at Superpixel Level.” Applied Sciences 10 (2): 463. https://doi.org/10.3390/app10020463.
Xu, H., H. Zhang, W. He, and L. Zhang. 2019. “Superpixel-based Spatial-Spectral Dimension Reduction for Hyperspectral Imagery Classification.” Neurocomputing 360: 138–150. https://doi.org/10.1016/j.neucom.2019.06.023.
Yang, J., H. Li, W. Hu, L. Pan, and Q. Du. 2022. “Adaptive Cross-Attention-Driven Spatial–Spectral Graph Convolutional Network for Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 19: 1–5.
Ye, M., R. Ni, C. Zhang, H. Gong, T. Hu, S. Li, Y. Sun, T. Zhang, and Y. Guo. 2021. “A Lightweight Model of VGG-16 for Remote Sensing Image Classification.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 6916–6922. https://doi.org/10.1109/JSTARS.2021.3090085.
Zhang, X., S. Chen, P. Zhu, X. Tang, J. Feng, and L. Jiao. 2022. “Spatial Pooling Graph Convolutional Network for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 60: 1–15.
Zhang, Y., X. Jiang, X. Wang, and Z. Cai. 2019. “Spectral-Spatial Hyperspectral Image Classification with Superpixel Pattern and Extreme Learning Machine.” Remote Sensing 11 (17): 1983.
Zhang, H., Y. Li, Y. Zhang, and Q. Shen. 2017. “Spectral-spatial Classification of Hyperspectral Imagery Using a Dual-Channel Convolutional Neural Network.” Remote Sensing Letters 8 (5): 438–447. https://doi.org/10.1080/2150704X.2017.1280200.
Zhang, M., H. Luo, W. Song, H. Mei, and C. Su. 2021. “Spectral-Spatial Offset Graph Convolutional Networks for Hyperspectral Image Classification.” Remote Sensing 13 (21): 4342. https://doi.org/10.3390/rs13214342.
Zhang, L., H. Su, and J. Shen. 2019. “Hyperspectral Dimensionality Reduction Based on Multiscale Superpixelwise Kernel Principal Component Analysis.” Remote Sensing 11 (10): 1219. https://doi.org/10.3390/rs11101219.
Zhao, Y., F. Su, and F. Yan. 2020. “Novel Semi-Supervised Hyperspectral Image Classification Based on a Superpixel Graph and Discrete Potential Method.” Remote Sensing 12 (9): 1528.
Zheng, Z., Y. Zhong, A. Ma, and L. Zhang. 2020. “FPGA: Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 58 (8): 5612–5626. https://doi.org/10.1109/TGRS.2020.2967821.
Zhong, Y., X. Hu, C. Luo, X. Wang, J. Zhao, and L. Zhang. 2020. “WHU-Hi: UAV-Borne Hyperspectral with High Spatial Resolution (H2) Benchmark Datasets and Classifier for Precise Crop Identification Based on Deep Convolutional Neural Network with CRF.” Remote Sensing of Environment 250: 112012. https://doi.org/10.1016/j.rse.2020.112012.
Zhong, Z., J. Li, Z. Luo, and M. Chapman. 2018b. “Spectral-spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework.” IEEE Transactions on Geoscience and Remote Sensing 56 (2): 847–858. https://doi.org/10.1109/TGRS.2017.2755542.
Zhong, Y., X. Wang, Y. Xu, S. Wang, T. Jia, X. Hu, J. Zhao, L. Wei, and L. Zhang. 2018a. “Mini-UAV-borne Hyperspectral Remote Sensing: From Observation and Processing to Applications.” IEEE Geoscience and Remote Sensing Magazine 6 (4): 46–62. https://doi.org/10.1109/MGRS.2018.2867592.
Zhou, B., X. Zhang, X. Chen, M. Ren, and Z. Feng. 2023. “HyperRefiner: A Refined Hyperspectral Pansharpening Network Based on the Autoencoder and Self-Attention.” International Journal of Digital Earth 16 (1): 3268–3294. https://doi.org/10.1080/17538947.2023.2246944.
