Review Articles

A survey of deep learning-based classification methods for steady-state visual evoked potentials

Article: 2181102 | Received 08 Dec 2022, Accepted 10 Feb 2023, Published online: 03 Mar 2023

Abstract

Purpose

Steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) have attracted great interest owing to their high information transfer rate (ITR) and low training requirement. The performance of an SSVEP-based BCI depends heavily on the classification method. Deep learning (DL) provides an alternative avenue for data classification in SSVEP-based BCIs and has received increasing interest in recent years. This review aims to summarize the progress of DL-based classification methods for SSVEP data over the past decade.

Materials and method

The literature was searched and selected based on the research topics of DL and SSVEP. We categorized the methods into five classes: DL methods based on traditional neural network structures, DL methods inspired by traditional frequency recognition methods, attention mechanism-based DL models, transfer learning-based DL methods, and generative model-based recognition methods. Moreover, we analyzed the current challenges and presented future research opportunities.

Conclusions

This study provides a systematic description of the current development status of DL-based SSVEP classification methods and offers insight into future research.

1. Introduction

Over the past decades, brain-computer interface (BCI) technologies have achieved tremendous breakthroughs, enabling people who suffer from neuromuscular impairments to communicate with the outside world without using their peripheral nerves and muscles [Citation1,Citation2]. Owing to its portability, non-invasiveness and accessibility, the non-invasive BCI has become the most popular modality in the research community. Among the various imaging modalities used in non-invasive BCI systems, electroencephalography (EEG) is favored by researchers due to its high temporal resolution, low cost and the portability of the acquisition amplifier. EEG records the brain's electrical activity from the scalp, which reflects the BCI user's mental state or intention [Citation3]. Depending on the paradigm, several EEG signals can be used to build a BCI system, such as motor imagery (MI), P300 and steady-state visual evoked potential (SSVEP) [Citation4–6].

SSVEP is an EEG response evoked by a repetitive or flickering visual stimulus; it contains the fundamental frequency of the flickering stimulus as well as its harmonics [Citation7]. Compared to MI and P300, SSVEP-based BCIs can yield a high information transfer rate (ITR) and provide a large number of control commands with little user training [Citation8,Citation9]. Therefore, the SSVEP-based BCI system has experienced rapid development in the past decade. The general framework of the SSVEP-based BCI is presented in Figure 1.

Figure 1. The framework of the SSVEP-based BCI. The system includes four parts: experimental paradigm, EEG recording, frequency recognition method, and application.


The performance of SSVEP-based BCI systems depends on three major factors: (1) the stimulus paradigm, (2) the number of targets, and (3) the classification algorithm (also termed the frequency recognition method) [Citation10]. Among these three factors, the classification algorithm plays a pivotal role in achieving high-performance BCI systems. As a type of EEG signal, SSVEP is prone to contamination by various noises and artefacts. Nevertheless, a high-performance BCI requires that the system correctly recognize the target within a short time window by analyzing the EEG signal. Therefore, a robust classification method that ensures quick and accurate results is a critical research topic for practical applications. A series of SSVEP classification methods have been proposed from different angles over the past two decades.

The first typical algorithm was the intensity threshold method based on the fast Fourier transform (FFT) [Citation11]. This method requires parameter optimization, such as channel selection [Citation12]. Because they avoid this troublesome optimization procedure, training-free multi-channel algorithms became prevalent, such as the canonical correlation analysis (CCA)-based method and its variants, the multivariate synchronization index (MSI), etc. [Citation13,Citation14]. In the CCA and MSI methods, the reference signals are created from sine-cosine functions, which lack subject-specific characteristics. Later, individual template matching methods were proposed to replace the sine-cosine templates with those obtained by averaging calibration data at each frequency [Citation15]. CCA and MSI can achieve satisfactory recognition performance when there are only a few flickering targets and the signal is long enough. However, their performance drops sharply with a large number of stimulus targets or a short signal length [Citation16]. In recent years, spatial filtering methods, such as task-related component analysis (TRCA) and correlated component analysis (CORCA), have become popular owing to their excellent performance on systems with a large number of targets [Citation17,Citation18]. These methods are supervised: they need calibration data to calculate the spatial filters and reference templates, and they depend heavily on the amount of calibration data available from each BCI user. Moreover, the calibration process is time-consuming and laborious, and the quality of the collected data gradually declines as subjects become fatigued. Hence, designing a high-performance frequency recognition method that needs little or no calibration data is an urgent demand for practical BCI applications. To reduce the calibration burden, some extended variants of these state-of-the-art (SOTA) spatial filtering methods have been proposed [Citation19,Citation20].
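As an illustration of the training-free CCA approach with sine-cosine references, the following minimal NumPy sketch builds the reference signals for each candidate frequency and selects the frequency whose references have the largest canonical correlation with a multi-channel EEG segment. The function names (`make_reference`, `max_canonical_corr`, `cca_classify`), the number of harmonics and the `(n_samples, n_channels)` layout are assumptions chosen for illustration; they are not taken from the cited works.

```python
import numpy as np

def make_reference(freq, fs, n_samples, n_harmonics=2):
    """Sine-cosine reference signals, shape (n_samples, 2 * n_harmonics)."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * h * freq * t))
        cols.append(np.cos(2 * np.pi * h * freq * t))
    return np.stack(cols, axis=1)

def max_canonical_corr(x, y):
    """Largest canonical correlation between the columns of x and y.

    Uses the QR/SVD formulation: after centering, the canonical
    correlations are the singular values of Qx^T Qy (assumes both
    matrices have full column rank).
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny numerical overshoot

def cca_classify(eeg, freqs, fs, n_harmonics=2):
    """eeg: (n_samples, n_channels). Returns (best index, all scores)."""
    scores = [
        max_canonical_corr(eeg, make_reference(f, fs, eeg.shape[0], n_harmonics))
        for f in freqs
    ]
    return int(np.argmax(scores)), scores
```

No calibration data are involved here, which is why such methods are training-free; the individual template variants replace `make_reference` with averaged calibration trials.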

Deep learning (DL) has experienced rapid development and achieved great success in many fields, such as computer vision (CV), natural language processing (NLP) and recommendation systems (RS) [Citation21–23]. Unlike traditional machine learning (ML) algorithms, DL can perform feature extraction and classifier training simultaneously in an end-to-end manner, which avoids the empirical design of hand-crafted features. In the past decade, emerging DL technologies have received increasing interest in brain signal analysis, and they provide a promising methodology for decoding brain states [Citation24–28].

For the SSVEP-based BCI, DL technology provides an alternative avenue for addressing the problems of traditional ML algorithms in both user-dependent (UD) and user-independent (UI) scenarios. To date, dozens of papers on SSVEP classification methods have been published. To the best of our knowledge, however, there is no comprehensive survey summarizing the progress of DL-based classification methods for the SSVEP-based BCI. Hence, in this paper, we first review the DL methods proposed for SSVEP classification over the past decade. Then, we analyze the current challenges and discuss potential directions and frontiers.

The structure of this paper is as follows. Section 2 presents the DL-based SSVEP classification methods in a taxonomy according to the ideas of designing model architectures. The challenges and potential directions and frontiers are presented in Section 3. The conclusion is presented in the last section.

2. Overview of the DL-based SSVEP classification methods

SSVEP is a multi-dimensional EEG signal whose spatial-temporal characteristics allow it to benefit from various DL network architectures. Many DL classification methods have been proposed from different angles in the past decade. We present a taxonomy of the existing DL methods based on the ideas behind their network architectures and categorize the DL methods for SSVEP classification into five classes. Note that these classes may overlap.

2.1. Traditional neural network structures based DL methods

2.1.1. CNN models

CNN models are widely used for EEG analysis. Among them, EEGNET is a popular state-of-the-art (SOTA) CNN model [Citation29]. It is a compact convolutional neural network (compact-CNN) composed of depthwise and separable convolution operations. Waytowich et al. used the EEGNET to classify a 12-class SSVEP dataset (the UCSD SSVEP dataset [Citation30]) in the UI classification scenario and achieved about 80% accuracy [Citation31].

Besides the general-purpose EEGNET model, some CNN models were proposed to classify SSVEP data under specific conditions. For example, Aznan et al. proposed two CNN models to classify raw dry-SSVEP signals [Citation32]. The models consist of SSVEP Convolutional Unit (SCU) blocks and two fully connected layers. Each SCU block has a 1D convolutional layer, a batch normalization layer, and a max pooling layer. The proposed models achieved superior performance on a 4-class dry-SSVEP dataset in both UD and UI classification scenarios. Nguyen et al. developed a 1D-CNN model for bipolar single-channel SSVEP frequency recognition in a BCI speller system [Citation33]. The model comprises two convolutional layers and three fully connected layers. Using a 2-s time window, the proposed model achieved 97.4% average accuracy in an online experiment with an SSVEP speller with five flickering stimuli.

Augmented reality (AR) technology can superimpose visual stimuli on the real world and considerably enlarge the application scenarios of SSVEP-BCI. Compared to SSVEP-BCI on a traditional computer screen (PC), AR-based SSVEP-BCI is more susceptible to external factors [Citation34,Citation35]. To meet the real-time processing requirements, Zhao et al. proposed a CNN model for multi-target AR-SSVEP classification [Citation36]. The model comprises four convolutional layers, three pooling layers, and a fully connected layer. It achieved 80.33% accuracy with 1-s data length on a 9-class AR-SSVEP dataset in the UD classification scenario.

To develop an efficient method for visual field tests, Khok et al. presented a multi-task CNN that can be used for SSVEP classification and visual response mapping [Citation37]. The network consists of four convolution blocks and a multi-task learning block. The proposed multi-task CNN was evaluated on the Benchmark SSVEP dataset [Citation38], yielding 92% classification accuracy in the UI classification scenario.

2.1.2. Hybrid models

Owing to the different characteristics of CNN and RNN, some researchers combined these two kinds of networks to design hybrid models. For example, Attia et al. proposed a DL model based on hybrid architecture (CNN-RNN) to classify SSVEP signals in the time domain directly [Citation39]. The CNN-RNN model used a depthwise convolution to capture features from each EEG channel independently, and employed RNN to capture the temporal relationship between the extracted features. This model achieved an accuracy of 93.59% on a 4-class SSVEP dataset in the UD classification scenario. Ishizuka et al. proposed a long short-term memory (LSTM) model to decode SSVEP signals for controlling quadcopters [Citation40]. The proposed model yielded 93% accuracy with 0.5 s time window on a 5-class SSVEP dataset in UD classification scenario, which significantly outperformed other compared methods. Recently, Pan et al. proposed an efficient CNN-LSTM model (termed as SSVEPNet) with spectral normalization and label smoothing technologies [Citation41]. Among the network components, CNN was used to learn spatial-temporal features, while LSTM was employed to encode features based on their dependencies. The label smoothing technique and spectral normalization were used to suppress noisy labels and stabilize training, respectively. SSVEPNet could achieve satisfactory performance with short data length and a small quantity of training data on both UD and UI classification scenarios on two datasets.
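To make the label smoothing idea concrete, the sketch below shows one common formulation: the one-hot target is mixed with a uniform distribution over the classes before computing the cross-entropy loss. The smoothing factor `eps`, the uniform mixing rule and the function names are assumptions for illustration; the exact variant used in SSVEPNet is not specified in this survey.

```python
import numpy as np

def smooth_labels(labels, n_classes, eps=0.1):
    """Turn integer labels into smoothed targets.

    Each row keeps (1 - eps) on the true class and spreads eps
    uniformly over all classes, so rows still sum to 1.
    """
    one_hot = np.eye(n_classes)[labels]
    return one_hot * (1.0 - eps) + eps / n_classes

def cross_entropy(logits, targets):
    """Mean cross-entropy between logits and (possibly soft) targets."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With smoothed targets, a very confident prediction is penalized slightly, which discourages the network from fitting individual (possibly noisy) labels too closely.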

Several other studies have also verified the effectiveness of DL methods based on traditional neural network structures [Citation42,Citation43].

2.2. Traditional frequency recognition methods inspiring DL methods

For the traditional frequency recognition methods, some potential technologies, such as the frequency domain transform, template matching technology, and filter bank technology have significantly enhanced the SSVEP classification tasks [Citation11,Citation13,Citation44]. Based on the findings in these traditional methods, these technologies have been leveraged in the DL models.

2.2.1. Frequency domain transform

In the early traditional methods, signal processing methods, such as fast Fourier transform (FFT), have been successfully used for frequency recognition [Citation11]. The EEG signals in the time domain were transformed into the frequency domain, and the spectrum features were then used for frequency detection. Owing to the prior spectrum characteristics of SSVEP signals, some DL models were proposed to integrate the FFT into the model structure or used the spectra representation obtained by FFT as the model input. For instance, Cecotti et al. first presented a CNN-based network architecture to conduct the SSVEP frequency recognition process [Citation45]. The proposed CNN model includes the fast Fourier transform (FFT) between two hidden layers to convert the signal analysis from the time domain to the frequency domain. The mean classification accuracy on a 4-class SSVEP dataset was 95.61% with a 1-s time window. Kwak et al. proposed two simple CNN models using the frequency domain representation of the EEG as the input. With a 4-class SSVEP paradigm, they obtained 99.28% and 94.03% accuracies with 2-s time window under the static and ambulatory conditions in UD classification scenarios, respectively [Citation46]. Ravi et al. proposed a magnitude spectrum convolutional neural network (M-CNN), which employs FFT to calculate the magnitude spectrum of the EEG signals as the input of a shallow CNN model [Citation47]. One year later, Ravi et al. designed a complex spectrum convolutional neural network (C-CNN) [Citation48]. The authors utilized the FFT to convert SSVEP signals to frequency domain representation and concatenated the real part and imaginary part of the complex spectrum as the input of the C-CNN model. This model achieved high performance on the 7-class SSVEP dataset and UCSD SSVEP dataset, in both UD and UI classification scenarios.

For the FFT, the frequency spectrum resolution depends on the data length. Therefore, when the short data length is used for SSVEP classification, we may not obtain the accurate spectrum at all the stimulus frequencies. Some studies used zero-padding to obtain a high enough frequency resolution for the FFT spectrum [Citation48], but the different frequency resolutions could result in different results [Citation33]. Precise spectrum representation for the EEG data of short length needs further exploration.
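The relationship between data length, frequency resolution and zero-padding can be sketched as follows. The helper names, the band limits and the `(n_channels, n_samples)` layout are assumptions for illustration. The second function also shows the kind of complex-spectrum input described above, where the real and imaginary parts of the FFT are concatenated. Note that zero-padding only samples the spectrum on a finer grid; it does not add information beyond what the short segment contains.

```python
import numpy as np

def frequency_resolution(fs, n_samples):
    """Spacing between FFT bins in Hz for a segment of n_samples."""
    return fs / n_samples

def complex_spectrum_features(eeg, fs, n_fft, f_lo, f_hi):
    """Zero-padded complex spectrum features.

    eeg: (n_channels, n_samples). Each channel is zero-padded to n_fft
    points, bins inside [f_lo, f_hi] are kept, and the real and
    imaginary parts are concatenated along the last axis.
    Returns (features, kept_frequencies).
    """
    spec = np.fft.rfft(eeg, n=n_fft, axis=-1) / eeg.shape[-1]
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    feats = np.concatenate([spec[:, keep].real, spec[:, keep].imag], axis=-1)
    return feats, freqs[keep]
```

For example, a 0.5-s segment sampled at 250 Hz has 125 samples and a native bin spacing of 2 Hz; padding to 1250 points gives a 0.2 Hz grid, so closely spaced stimulus frequencies fall on distinct bins even though the underlying spectral peaks remain broad.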

2.2.2. Template matching technology

Template matching technology was widely used after the CCA-based method was proposed [Citation13]. The reference signal templates could be created by sine-cosine function or by averaging multiple EEG calibration trials at each stimulus frequency. In recent years, the idea of template matching was used in designing the DL models and has achieved promising results. For instance, Xing et al. proposed a comparing network based on CNN to learn the relationship between SSVEP signals and the templates at each stimulus frequency [Citation49]. This model adopted the frequency domain signal as the input and outperformed other traditional methods including CCA and TRCA on a 4-class dataset in UD training scenario. Li et al. proposed a non-linear convolutional correlation analysis (Conv-CA) model [Citation50]. In the Conv-CA model, two parallel branches were used to transform the EEG signal and reference signals, and then a correlation layer was used to calculate the correlation coefficients between the outputs of the two branches. Evaluated on the Benchmark SSVEP dataset, the Conv-CA showed better performance than TRCA algorithm with the help of the sliding window technique in UD training scenarios.

Based on the Siamese neural network (SNN), Zhang et al. proposed a Siamese correlation analysis model (SiamCA), which consists of two parallel feature extractors with tied parameters and a top decision network [Citation51]. Considering that the SSVEP signal and its corresponding reference signal may have similar temporal patterns, the SiamCA used raw EEG data and the SSVEP template as the network input. Evaluated on the Benchmark SSVEP dataset and a 4-class SSVEP dataset, the SiamCA yielded better performance than the Conv-CA in the UD classification scenario. It is worth noting that the SiamCA maintained an accuracy of about 60% even when the time window was extremely short (0.2 s). Recently, Zhang et al. further proposed a bidirectional SiamCA (bi-SiamCA) model and evaluated it on the 40-class Benchmark dataset and a 12-class public dataset [Citation52]. The bi-SiamCA includes a forward and a backward sub-network with the same structure, each composed of two LSTM-based feature extractors with shared weights and a decision network; the outputs of the two sub-networks are fused for the final decision. Both SiamCA models take the EEG data and the individual template signals as a pair of inputs and use a Siamese network to measure the similarity between them.

With a two-branch structure inspired by eCCA and TRCA, Xiao et al. proposed a fixed template network (FTN), a dynamic template network (DTN), and a data augmentation method for intra-subject classification [Citation53]. The FTN uses pre-defined sinusoidal templates, whereas the DTN uses subject-specific, dynamically updated templates. Both networks were trained with a two-stage strategy and compared with traditional frequency recognition methods and DL models on three public SSVEP datasets. The classification results indicated that DTN and FTN could achieve better performance than the compared SOTA methods.

The common idea behind the methods mentioned above is to use two parallel subnetworks to extract features from the EEG and from the reference or template signals, and then to calculate the similarity between the outputs of the two branches. It is worth noting that these models were implemented and evaluated in the UD classification scenario, which needs individual calibration data.
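The template matching idea that these two-branch models build on can be sketched without any learned components: average the calibration trials per class to form templates, then score a new trial by its correlation with each template. In the DL models above, the fixed correlation is replaced by learned feature extractors and a learned similarity or decision layer. The function names and the `(n_classes, n_trials, n_channels, n_samples)` layout below are assumptions for illustration.

```python
import numpy as np

def build_templates(calib):
    """Average calibration trials per class.

    calib: (n_classes, n_trials, n_channels, n_samples)
    returns: (n_classes, n_channels, n_samples)
    """
    return calib.mean(axis=1)

def pearson(a, b):
    """Pearson correlation between two arrays, flattened."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def template_match(trial, templates):
    """Return (index of most similar template, all similarity scores)."""
    scores = np.array([pearson(trial, tpl) for tpl in templates])
    return int(np.argmax(scores)), scores
```

Because the templates come from the user's own calibration trials, this baseline is inherently user-dependent, like the DL models discussed in this subsection.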

2.2.3. The filter bank technology

In addition to the fundamental frequency component, SSVEP also contains harmonic components that contribute to the recognition process [Citation53]. Based on this consideration, Chen et al. proposed an extended CCA method based on filter bank technology [Citation44]. Since then, this technology has become an indispensable strategy for enhancing various traditional methods. In recent years, filter bank technology has also been introduced into the design of DL models. For instance, Dang et al. proposed a multi-harmonic linkage convolutional neural network (MHLCNN) model [Citation54]. The authors extracted the fundamental and two harmonic components from the multi-channel spectra. These three components served in parallel as the inputs of three branches in the MHLCNN model. In both normal and fatigue conditions, the MHLCNN yielded better results than the compared methods on a 5-class SSVEP dataset and an 8-class steady-state motion VEP (SSMVEP) dataset. Zhao et al. proposed a filter bank convolutional neural network (FBCNN) with three parallel CNN branches, which extract and learn the harmonic features in three sub-bands [Citation55]. The complex spectra of the EEG data in the three sub-bands were fed into the three branches, respectively. The complex spectrum of a channel was created by concatenating the real and imaginary parts of the FFT spectrum of that channel. Compared with other CNN models, the FBCNN exhibited better performance on the UCSD SSVEP dataset and the Benchmark SSVEP dataset. Considering that frequency-domain features are not obvious under a short time window and that the time-difference information between channels may be ignored, Ding et al. proposed a concise time-domain-based CNN model (tCNN) and an extended filter bank tCNN (FB-tCNN) for short time-window SSVEP classification [Citation56]. In the FB-tCNN model, multiple CNN subnets with shared parameters extract features from the EEG data filtered at the corresponding predefined frequency sub-bands. Evaluated on two 4-class SSVEP datasets, the FB-tCNN outperformed CCA methods and other CNN models at short time windows.

To fully utilize the information from SSVEP harmonic components and from non-target stimulus data, Yao et al. proposed a CNN model named FB-EEGNet with multi-label technology [Citation57]. The model includes three parallel subnetworks with the EEGNET structure, and the input of each subnetwork is a sub-band signal decomposed from the raw EEG data. The FB-EEGNet achieved significantly better performance than CCA, FBCCA, C-CNN and Compact-CNN in both UD and UI classification scenarios on two SSVEP datasets. Bassi et al. proposed three DL models (FBDNNs) based on filter bank technology and deep neural networks (DNNs), i.e. an RNN model (FBRNN), a 2D CNN model (FBCNN-2D) and a 3D CNN model (FBCNN-3D), for single-channel SSVEP frequency recognition with short data length [Citation58]. The single-channel SSVEP data were decomposed into ten sub-band components with the filter bank; these were further processed into time-domain representations, 2D complex spectra and 3D complex spectrograms, which served as the inputs of FBRNN, FBCNN-2D and FBCNN-3D, respectively. To verify the effectiveness of the FBDNNs, data from the Oz channel of three public SSVEP datasets (Benchmark dataset [Citation38], BETA dataset [Citation59] and Portable dataset [Citation60]) were chosen for evaluation. The experimental results demonstrated that the FBDNNs outperformed several compared methods.

As we can see, the idea of the filter bank technology allows deep neural networks to efficiently analyze and utilize the harmonic components of SSVEP from different perspectives.
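The sub-band decomposition that feeds the parallel branches of such models can be sketched as follows. For simplicity and to stay within NumPy, this sketch uses an idealized band-pass obtained by zeroing FFT bins, which is a stand-in for the IIR/FIR band-pass filters typically used in practice; the band edges and function names are hypothetical.

```python
import numpy as np

def fft_bandpass(eeg, fs, f_lo, f_hi):
    """Idealized band-pass: zero every FFT bin outside [f_lo, f_hi].

    eeg: (n_channels, n_samples). Stand-in for a real filter design.
    """
    n = eeg.shape[-1]
    spec = np.fft.rfft(eeg, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[..., (freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=n, axis=-1)

def filter_bank(eeg, fs, bands):
    """Decompose eeg into sub-bands.

    bands: list of (f_lo, f_hi) tuples in Hz.
    returns: (n_bands, n_channels, n_samples), one slice per sub-band,
    each of which could feed one branch of a filter bank network.
    """
    return np.stack([fft_bandpass(eeg, fs, lo, hi) for lo, hi in bands])
```

Choosing sub-bands with increasing lower cut-offs, for example `[(6, 50), (14, 50), (22, 50)]`, means that later sub-bands progressively drop the fundamental and lower harmonics, so each branch sees a different mix of harmonic content.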

2.3. Attention mechanisms-based DL models

The attention mechanism has become one of the most promising strategies and has been widely utilized in various tasks and applications since 2017 [Citation61]. One of the best-known structures is the Transformer, which has achieved great success in fields such as computer vision and natural language processing [Citation22,Citation62]. In recent years, the Transformer structure has been used for EEG analysis [Citation63–65], including SSVEP-based BCI. For instance, Chen et al. proposed a Transformer-based deep neural network model termed SSVEPformer to enhance the performance of zero-calibration SSVEP-BCI [Citation66]. Inspired by previous studies, the SSVEPformer adopts the frequency spectrum of SSVEP data as the network input and employs a convolutional attention module and a channel MLP module to encode the SSVEP features. The experimental results showed that SSVEPformer achieved better performance than Compact-CNN and C-CNN on two datasets in UI classification scenarios. More significantly, the SSVEPformer achieved accuracies of about 84% and 80% on the two SSVEP datasets with a 1-s time window, respectively. Based on the Transformer, Li et al. proposed a novel temporal-frequency fusion Transformer (TFF-Former) for two BCI tasks under zero-training decoding conditions [Citation67]. In the TFF-Former model, two symmetrical Transformer streams extract temporal-spatial and frequency-spatial features, which are considered two views of the EEG data. Each Transformer stream includes a feature extractor and a cross-view module with a cross-attention mechanism. The proposed TFF-Former was validated on the Benchmark SSVEP dataset, achieving significantly better performance than the compared baseline methods.

Besides, some studies adopted the attention mechanism as part of the model to weigh or fuse the features. For example, Gao et al. utilized an attention mechanism and multiscale convolutional neural network to build a CNN model, termed an attention-based parallel multiscale convolutional neural network (AMS-CNN) [Citation68]. In this model, the attention mechanism was adopted to weigh the advanced features extracted by two sequential convolution blocks at different time steps.
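For readers unfamiliar with the mechanism, the sketch below implements generic scaled dot-product self-attention as used in the standard Transformer. It is not the convolutional attention module of SSVEPformer, the cross-attention of TFF-Former, or the weighting scheme of AMS-CNN; the function names and the interpretation of rows of `x` as tokens (for example, time steps or channels) are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    x: (n_tokens, d_model); w_q, w_k, w_v: (d_model, d_k).
    Returns (output of shape (n_tokens, d_k), attention weights of
    shape (n_tokens, n_tokens) whose rows sum to 1).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights
```

Each output row is a weighted average of the value vectors, with weights determined by how strongly that token's query matches every key; this is the sense in which attention "weighs" features.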

2.4. Transfer learning technology-based DL methods

In general, a high-performance BCI system relies heavily on the amount of calibration data. However, collecting calibration data is time-consuming and laborious. Prolonged experiments may even lead to subject fatigue, which reduces the quality of the collected data. Thus, reducing the amount of calibration data has become a hot topic in the BCI community. In recent years, transfer learning techniques have become a feasible way to address these challenging problems.

One strategy based on transfer learning technology is fine-tuning. For instance, Guney et al. proposed a novel DNN architecture and used a fine-tuning strategy to enhance intra-subject classification [Citation69]. The authors first used data from other subjects to pre-train the proposed DNN and then used calibration data from the target subject to fine-tune the pre-trained network. The whole framework was evaluated on the Benchmark and BETA SSVEP datasets, yielding remarkable performance even with a 0.4-s time window. Owing to their effectiveness, similar pre-training and fine-tuning strategies were used by later researchers to enhance intra-subject classification. For example, Rostami et al. designed a deep convolutional neural network (DCNN) for data classification in a noisy environment. The DCNN model was first pre-trained with data from the BETA dataset and then retrained and tested with individual data from the target subject [Citation70]. Although this scheme can bring significant improvement in classification performance, it still needs some calibration data from the target subject. To address this problem, Guney et al. built on the DNN model developed in [Citation69] and further proposed a scheme to achieve SSVEP classification without a user-specific training procedure [Citation71]. Specifically, they first used a pre-existing dataset to train a preliminary global model and fine-tuned it with data from each existing subject. For a new subject, the K most representative fine-tuned models were selected based on the statistical similarities between this subject and the existing subjects. The selected models were then used to perform the classification with a weighted combination strategy. Evaluated on the Benchmark and BETA datasets, the proposed method yielded impressive performance without requiring user-specific calibration data.
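The final step of the calibration-free scheme, selecting the K most similar fine-tuned models and combining their outputs, can be sketched as below. How the similarity between the new subject and each existing subject is computed is not detailed in this survey, so `similarity` is treated here as a given vector of non-negative scores; the weighting rule (weights proportional to similarity) and the function name are likewise assumptions, not the cited method.

```python
import numpy as np

def ensemble_predict(probs, similarity, k=3):
    """Weighted combination of the k most similar subject models.

    probs: (n_models, n_classes) class probabilities that each
           fine-tuned model outputs for one trial of the new subject.
    similarity: (n_models,) non-negative similarity between the new
           subject and the subject each model was fine-tuned on.
    Returns (predicted class, indices of selected models, fused probs).
    """
    top = np.argsort(similarity)[::-1][:k]      # k largest similarities
    w = similarity[top] / similarity[top].sum()  # normalize to sum to 1
    fused = (w[:, None] * probs[top]).sum(axis=0)
    return int(np.argmax(fused)), top, fused
```

Models fine-tuned on dissimilar subjects are excluded entirely, so a few poorly matching subjects cannot dominate the decision for the new user.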

Besides transferring knowledge between datasets from the same SSVEP paradigm, exploiting knowledge from datasets of similar paradigms is another way to reduce the calibration effort and accelerate the convergence of the network. For instance, Rostami et al. pre-trained a deep CNN model termed PodNet on the BETA database and then transferred it to the classification task on the Benchmark database [Citation72]. In this study, the two datasets came from similar SSVEP experiments.

Other studies tried to leverage datasets from unrelated fields, such as speech and image datasets. For example, Bassi et al. took a variant of the visual geometry group (VGG) network pre-trained on the AudioSet dataset (called VGGish), replaced the last three fully connected layers with two new ones, and retrained the model on the SSVEP dataset in a leave-one-subject-out manner [Citation73]. In this study, the EEG data were first converted to spectrograms by the short-time Fourier transform (STFT) and used as the model input. The methodology was evaluated with data from the Oz channel at 12 Hz and 15 Hz in the Benchmark dataset; it outperformed SVM and FBCCA and sped up training compared with training the model from scratch. Paula et al. transformed the EEG data into images and used CNNs with 2D kernels, pre-trained on the ImageNet dataset, as classifiers [Citation74]. Four different EEG-to-image transformation techniques were investigated, and the proposed method achieved better performance than CNNs with 1D kernels on a four-class dataset, especially with short data length.
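Converting a single-channel EEG segment into a spectrogram with the STFT, the kind of image-like input used by these transfer learning approaches, can be sketched as follows. The Hann window, the window length, the hop size and the function name are assumptions for illustration and do not reproduce the settings of the cited studies.

```python
import numpy as np

def stft_spectrogram(x, fs, win_len, hop):
    """Power spectrogram of a 1-D signal via the short-time Fourier transform.

    x: (n_samples,) single-channel signal.
    Returns (spec, freqs, times) where spec has shape
    (n_freqs, n_frames), freqs are bin centers in Hz and
    times are frame centers in seconds.
    """
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.array(list(starts)) + win_len / 2) / fs
    return spec.T, freqs, times
```

The resulting 2-D array can be treated like a one-channel image, which is what makes networks pre-trained on audio spectrograms or natural images applicable to EEG.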

2.5. Generative model-based recognition method

The DL methods need a large amount of data to achieve good generalization. For the BCI application scenes, it is time-consuming and very costly to collect enough training data. A potential solution is to create synthetic data to alleviate the data demand. In 2014, Goodfellow et al. first proposed the concept of a generative adversarial network (GAN) to generate simulated data [Citation75]. Since then, the GAN and various variants have been used in different fields.

Likewise, GAN technology has been introduced into EEG data generation for BCI systems [Citation76]. For SSVEP, Aznan et al. investigated three generative models, i.e. a deep convolutional GAN, a Wasserstein GAN, and a variational auto-encoder, to evaluate the efficacy of synthetic data for a three-class classification task. With the help of the generated data, the accuracies improved in multiple evaluation scenarios [Citation77]. One year later, Aznan et al. proposed the subject-invariant SSVEP GAN (SIS-GAN), which uses a single neural network to generate artificial EEG data for multiple SSVEP categories [Citation78]. Interestingly, by learning features that are independent of subject-specific characteristics, SIS-GAN made it possible to train with only the generated data and achieve higher classification accuracy than training with the original data. Inspired by StarGAN v2, Kwon et al. proposed a multidomain signal-to-signal conversion method to generate artificial SSVEP signals from resting EEG [Citation79].

Although the above research indicates the feasibility of the GAN for SSVEP synthetic data, only a small number of targets was considered, and the data of each frequency were generated separately in most of the studies.

3. The challenges and future research opportunities

In this section, we want to highlight the potential challenges and future research opportunities for SSVEP classification with DL methods.

3.1. The current challenges of DL models for SSVEP data

3.1.1. Enhancing performance with short data length

DL-based classification of SSVEP data has made impressive progress over the past several years. DL methods surpass conventional frequency recognition methods, such as FBCCA and TRCA [Citation31,Citation48], in both UD and UI classification scenarios when a small amount of individual calibration data is available. However, the models still cannot achieve satisfactory classification performance when the SSVEP data are shorter than 0.5 s, especially in the UI classification scenario [Citation41]. For example, the average accuracy was only 76.50% with a 0.4-s time window on the Benchmark dataset [Citation52]. Precise spectrum representation of short EEG data could help enhance the performance of DL models [Citation48].

3.1.2. Achieving high-performance with limited calibration data

Owing to their representation ability, DL models hold the promise of learning knowledge from a dataset of pre-existing subjects, which is not easy for conventional methods [Citation69,Citation71]. However, compared to the scenario with enough training data for the target subject, it is still challenging to obtain excellent classification performance with limited or no individual calibration data. For example, for the 1-s data length of the UCSD SSVEP dataset, several studies have achieved recognition accuracy of more than 90% with enough calibration data (≥12 trials) in the subject-dependent training scenario [Citation41,Citation48,Citation53]. However, with only a small amount of calibration data (≤3 trials) or without calibration data, the recognition accuracy decreased to less than 90% [Citation31,Citation41,Citation48].

3.1.3. Addressing the problem of increasing stimulus targets

The increasing number of stimulus targets also makes it more difficult for DL models to achieve superior classification performance [Citation66]. In addition, most studies in the literature used multiple channels (such as eight or nine channels), and few considered the portable scenario with a small number of channels [Citation58]. Moreover, the studies that did address the portable scenario verified their methods with only a very limited number of stimulus targets [Citation58,Citation73].

3.1.4. Implementing SSVEP-BCI system outside the laboratory

Few studies have been conducted outside the laboratory, for example in an ambulatory environment [Citation46], and most of the reported results were obtained with openly available or self-collected datasets. Compared with EEG signals collected in the laboratory, those collected in real environments contain more noise components, which hinders the frequency recognition process. Besides, a convenient, quick-to-use, portable and wearable BCI device is favored by users in practical applications [Citation42]. Few studies considered the portable scenario with fewer channels [Citation54], and those that did verified their methods with only a very limited number of stimulus targets [Citation54,Citation70]. To promote the practical application of SSVEP-BCI systems in the real world, DL-based classification methods urgently need further development, and more effort is needed to address these challenging problems in future studies.

3.2. Future research opportunities

3.2.1. Developing new neural network models

EEG signals are multi-dimensional time series with specific spatial-temporal-spectral characteristics, which differ from those of images. Therefore, directly using neural networks from the CV and NLP fields may not bring satisfactory results, although a workable solution can sometimes be obtained by adjusting the hyperparameters. Nevertheless, advanced techniques could be introduced into the design of network architectures for EEG data, such as spectral normalization and label smoothing, which regularize the networks to improve generalization [Citation41,Citation57]. Besides, beyond the common CNN and LSTM, more novel networks, such as the Transformer network [Citation66], spiking neural network (SNN) [Citation80], Capsule network [Citation81], and Siamese network [Citation52], could be integrated as backbones. For example, on the benchmark datasets with a 1-s time window, the SSVEPFormer model based on the Transformer achieved about 84% accuracy in UI classification scenarios [Citation66], and the bi-SiamCA model based on the Siamese network achieved 94.07% accuracy in UD classification scenarios [Citation52].
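As an illustration of one of the regularization techniques mentioned above, label smoothing replaces a one-hot training target with a slightly softened distribution. The sketch below shows the common uniform-smoothing variant; the function name and the default value of epsilon are assumptions made for this example and do not necessarily match the settings used in the cited studies.

```python
from typing import List


def smooth_labels(target: int, n_classes: int, epsilon: float = 0.1) -> List[float]:
    """Return a smoothed target distribution: (1 - epsilon) * one_hot + epsilon / n_classes."""
    if not 0 <= target < n_classes:
        raise ValueError("target must be a valid class index")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    off_value = epsilon / n_classes  # probability mass given to every class
    return [
        (1.0 - epsilon) + off_value if k == target else off_value
        for k in range(n_classes)
    ]


if __name__ == "__main__":
    # Four classes, true class index 2, hypothetical epsilon of 0.1.
    print(smooth_labels(2, 4, 0.1))
```

The smoothed vector still sums to one, but the network is no longer pushed toward fully confident predictions, which is the regularizing effect referred to above.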

From Section 2.2, we can identify a feasible and promising avenue for model development: transferring the operations of traditional methods, such as template matching and filter banks, into the model architecture. Several previous studies have verified the effectiveness of designing models from these perspectives [Citation49–58,Citation82]. In the CV and NLP fields, many very large models have been developed [Citation83], but training each of them is costly. It remains to be seen whether large-scale models will emerge for EEG data, and the trade-off between cost and model performance may be a critical factor. Future models should boost the classification performance on data with short length and a small number of channels.

3.2.2. Zero-calibration DL method

For both traditional methods and DL methods, the more calibration data available, the better the classification results [Citation41,Citation84]. For example, when the ratio of training data to testing data increased from 2:8 to 5:5, the averaged classification accuracy of the SSVEPNET model increased from 88.62% to 96.58% with a 1-s time window on a 12-class dataset [Citation41]. However, collecting enough calibration data is time-consuming and laborious, which hinders the practical use of BCI systems. A plug-and-play BCI system that achieves zero-calibration for new users has always been the ultimate goal of researchers in the BCI community. Many attempts toward this goal have been reported in the literature [Citation67,Citation71], but a large gap still exists between the performance in zero-calibration and subject-specific calibration scenarios. For example, on a 12-class dataset, the average accuracy reached 98.20% in subject-specific calibration scenarios, but it decreased to just 84.45% in zero-calibration scenarios [Citation41].

Currently, training a network with the data from pre-existing subjects and testing it on new subjects is a common way to pursue zero-calibration. However, owing to the large variability of EEG data among subjects, a model pre-trained on an existing dataset usually achieves suboptimal results on a new user. After fine-tuning the model with subject-specific calibration data, the results can be improved significantly [Citation53,Citation69,Citation70]. Besides, transfer learning has displayed the potential to implement zero-calibration training [Citation71], and domain adaptation has shown its effectiveness in other tasks such as motor imagery classification [Citation85,Citation86]. These technologies could be further investigated and applied to SSVEP tasks in future studies.
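The evaluation protocol described above, training on pre-existing subjects and testing on an unseen one, is often organized as a leave-one-subject-out split. The following is a minimal sketch of such a split; the function name, the dictionary layout and the subject identifiers are hypothetical and are used only for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


def leave_one_subject_out(
    trials_by_subject: Dict[str, List[Any]],
) -> Iterator[Tuple[str, List[Any], List[Any]]]:
    """Yield (held_out_subject, training_trials, test_trials) for every subject in turn."""
    for held_out in sorted(trials_by_subject):
        # Training data come only from the other subjects, so the held-out
        # subject contributes no calibration data at all.
        train = [
            trial
            for subject, trials in sorted(trials_by_subject.items())
            if subject != held_out
            for trial in trials
        ]
        test = list(trials_by_subject[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # Placeholder trial labels standing in for real EEG trials.
    data = {"s1": ["a1", "a2"], "s2": ["b1"], "s3": ["c1", "c2"]}
    for subject, train, test in leave_one_subject_out(data):
        print(subject, len(train), len(test))
```

Fine-tuning with subject-specific calibration data would correspond to moving a few of the held-out subject's trials from the test list into a separate adaptation set.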

In previous research [Citation87,Citation88], BCI researchers leveraged an incremental learning strategy to extend previously acquired knowledge when data become available from a new subject who was not originally present in the subject database. DL models could also be designed to adapt to newly arriving data in BCI application systems.

3.2.3. Cross-task learning model

Current methods are tailored to one specific task, such as SSVEP classification. On the one hand, these methods may not work well across tasks. On the other hand, they may miss the chance to learn shared information from data recorded under other tasks, which could enhance performance. Designing models that run across tasks is an interesting direction [Citation67]. For example, in order to design robust algorithms across datasets and tasks (including the SSVEP classification task) in the field of BCI, Du et al. proposed a neural network named Inception EEG-Net (IENet) based on the Inception V4 network and an Inception temporal structure [Citation89].

3.2.4. Data augmentation with DL methods

DL methods need a large amount of data to train their parameters, especially models with deep architectures. To generate enough training samples, one approach is the sliding window technique, which divides an original trial into multiple segments [Citation52], and another is to use GAN models. However, owing to the strong correlation among data segments from the same trial, the variety among samples, which is important for DL model training, is limited. In previous studies that used GAN models, the number of targets was small, such as the four targets (6 Hz, 6.67 Hz, 7.5 Hz and 10 Hz) in the study by Kwon et al. [Citation79]. Effort is needed to generate high-quality data for a large number of targets, such as the 40 targets in the benchmark dataset [Citation38], with existing GAN architectures. As is well known, for SSVEP classification methods, a longer data length yields better classification results. Accordingly, another way to augment the data is to extend the data length with DL methods. We have completed some interesting experiments and will publish the results in the near future.
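The sliding window technique mentioned above can be sketched in a few lines. This is a minimal single-channel illustration; the function name and the window and step sizes (given in samples) are assumptions made for the example rather than settings from the cited studies.

```python
from typing import List, Sequence


def sliding_windows(trial: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Cut one trial into fixed-length segments whose start points are `step` samples apart."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(trial) - window  # negative when the trial is shorter than one window
    return [list(trial[start:start + window]) for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # A toy 10-sample "trial"; real trials would hold EEG samples.
    for segment in sliding_windows(list(range(10)), window=4, step=2):
        print(segment)
```

When the step is smaller than the window, consecutive segments share `window - step` samples, which is the source of the strong correlation among augmented samples noted above.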

3.2.5. Model interpretability and adversarial robustness

In the early stage, deep neural networks were successfully used as classifiers to solve classification problems, but the rationality and interpretability behind the network architectures were not sufficiently considered, and the black-box character of deep neural networks was simply accepted. Nowadays, explainable neural networks have become a hot and important topic in the DL community. We want to know how the networks work and to explore meaningful features. Some studies have addressed model interpretability and presented detailed visualization analyses [Citation41,Citation48,Citation66]. In the future, model interpretability and visualization may become indispensable measures for evaluating networks.

In CV tasks, researchers have found that DNNs are vulnerable to adversarial attacks and have proposed some solutions [Citation90]. Undoubtedly, adversarial robustness is very important for DNNs applied in security-critical scenarios. For example, in face recognition systems, hidden attribute editing of the face image can be used to launch an adversarial attack against the face recognition model [Citation91]. For the SSVEP-BCI system, the performance of the frequency recognition method depends heavily on the frequency information. Therefore, for traditional methods such as CCA and FBCCA, adding square wave signals to the EEG data as an adversarial perturbation could mislead the classifier into outputting the result specified by the attacker [Citation92]. However, until now, no studies have analyzed the adversarial robustness of DL models for SSVEP data. To ensure the safe and reliable application of SSVEP-based BCI systems, this research should be carried out in future studies.
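The sensitivity of frequency-based recognition to periodic perturbations can be illustrated with a toy example. The sketch below does not reproduce the attack of [Citation92] and does not use CCA or FBCCA; it swaps in a much simpler spectral-peak detector, and all names, amplitudes and frequencies are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def power_at(signal: Sequence[float], freq_hz: float, fs_hz: float) -> float:
    """Normalised squared magnitude of the signal's projection onto one frequency."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    return (re * re + im * im) / (n * n)


def detect(signal: Sequence[float], candidates: Sequence[float], fs_hz: float) -> float:
    """Toy detector: pick the candidate frequency with the largest spectral power."""
    return max(candidates, key=lambda f: power_at(signal, f, fs_hz))


def square_wave(freq_hz: float, amplitude: float, n: int, fs_hz: float) -> List[float]:
    """Square wave with the given frequency and amplitude, sampled at fs_hz."""
    return [
        amplitude if math.sin(2 * math.pi * freq_hz * i / fs_hz) >= 0 else -amplitude
        for i in range(n)
    ]


if __name__ == "__main__":
    fs, n = 250.0, 250  # hypothetical sampling rate and 1-s window
    clean = [math.sin(2 * math.pi * 10.0 * i / fs) for i in range(n)]  # idealised 10 Hz response
    perturbed = [c + p for c, p in zip(clean, square_wave(12.0, 2.0, n, fs))]
    candidates = [8.0, 10.0, 12.0]
    print(detect(clean, candidates, fs), detect(perturbed, candidates, fs))
```

In this idealised setting the added square wave dominates the spectrum at its own frequency, so the detector's decision moves from the true stimulus frequency to the injected one. Whether DL models behave similarly is the open question raised above.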

3.2.6. Building complex large-scale datasets

Although some SSVEP datasets, such as Benchmark and BETA, are publicly available, they were acquired under strictly controlled conditions and can hardly simulate real scenes outside the laboratory. Like the ImageNet dataset in the CV field, we may need to build large-scale datasets in which the data are acquired in complex environments, such as ambulatory and asynchronous conditions [Citation46,Citation93], and under paradigms with large instruction sets (e.g. 160 targets [Citation16]). The tasks in these datasets will be more challenging but more valuable for advancing SSVEP-based BCI toward practical applications. Besides, data collected under wearable settings, i.e. with dry electrodes and a small number of electrodes, will help to improve the user experience and extend the application scenarios [Citation60].

4. Conclusion

In the past decades, DL-based frequency recognition methods have achieved impressive performance in both UD and UI classification scenarios, and hold promise for meeting the practical demands of BCI systems outside the laboratory. In this paper, we first presented the basic concepts of SSVEP-based BCI and briefly described the development of frequency recognition methods. Then, we reviewed the progress of DL-based methods for SSVEP data from different perspectives. Based on our understanding of existing research, we categorized these DL methods into several classes for better description and summarization, i.e. traditional neural network structures-based DL methods, traditional frequency recognition methods inspiring DL methods, attention mechanisms-based DL models, transfer learning technology-based DL methods, and generative model-based recognition methods. Although DL has advanced the classification of SSVEP data, more effort is needed to improve current methods and propose new ones. Therefore, in the last section, we analyzed the current challenges from four aspects and proposed six future research opportunities for DL researchers working on SSVEP-based BCI.

Author contributions

Y.Z., Y.P. conceived and designed this paper. Y.P., Y.Z., and J.C. wrote the paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant No. 62076209.

Notes on contributors

Yudong Pan

Yudong Pan received the B.E. degree from the Yichun University. He is currently pursuing the M.E. degree with the Southwest University of Science and Technology, China. His research interests include brain–computer interface (BCI), steady-state visual evoked potential (SSVEP), and generative adversarial network (GAN).

Yangsong Zhang

Yangsong Zhang received the Ph.D. degree in signal and information processing from the School of Life Science and Technology, University of Electronic Science and Technology of China, in 2013. He is currently a Professor with the School of Computer Science and Technology, Southwest University of Science and Technology, China. His research interests include brain–computer interface, deep learning, machine learning, medical imaging processing, etc.

References

  • Wolpaw JR, Birbaumer N, McFarland DJ, et al. Brain-computer interfaces for communication and control. Clin Neurophysiol. 2002;113(6):767–791.
  • Yao D, Qin Y, Zhang Y. From psychosomatic medicine, brain–computer interface to brain–apparatus communication. Brain-Apparatus Comm. 2022;1(1):66–88.
  • Wolpaw JR. Brain-computer interfaces (BCIs) for communication and control. Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility. 2007. p. 1–2.
  • Qi F, Li Y, Wu W. RSTFC: a novel algorithm for spatio-temporal filtering and classification of single-trial EEG. IEEE Trans. Neural Netw Learning Syst. 2015;26(12):3070–3082.
  • Jin J, Zhang H, Daly I, et al. An improved P300 pattern in BCI to catch user’s attention. J Neural Eng. 2017;14(3):036001.
  • Jiao Y, Zhang Y, Wang Y, et al. A novel multilayer correlation maximization model for improving CCA-based frequency recognition in SSVEP brain–computer interface. Int. J. Neur. Syst. 2018;28(04):1750039.
  • Zhang Y, Xu P, Liu T, et al. Multiple frequencies sequential coding for SSVEP-based brain-computer interface. PLOS One. 2012;7(3):e29519.
  • Chen X, Chen Z, Gao S, et al. A high-itr ssvep-based bci speller. Brain-Computer Interfaces. 2014;1(3-4):181–191.
  • Yin E, Zhou Z, Jiang J, et al. A dynamically optimized SSVEP brain–computer interface (BCI) speller. IEEE Trans Biomed Eng. 2015;62(6):1447–1456.
  • Nakanishi M, Wang Y, Wang YT, et al. A high-speed brain speller using steady-state visual evoked potentials. Int J Neur Syst. 2014;24(06):1450019.
  • Cheng M, Gao X, Gao S, et al. Design and implementation of a brain-computer interface with high transfer rates. IEEE Trans Biomed Eng. 2002;49(10):1181–1186.
  • Wang Y, Wang R, Gao X, et al. A practical VEP-based brain-computer interface. IEEE Trans Neural Syst Rehabil Eng. 2006;14(2):234–240.
  • Lin Z, Zhang C, Wu W, et al. Frequency recognition based on canonical correlation analysis for SSVEP-based BCIs. IEEE Trans Biomed Eng. 2006;53(12 Pt 2):2610–2614.
  • Zhang Y, Xu P, Cheng K, et al. Multivariate synchronization index for frequency recognition of SSVEP-based brain–computer interface. J Neurosci Methods. 2014;221:32–40.
  • Bin G, Gao X, Wang Y, et al. A high-speed BCI based on code modulation VEP. J Neural Eng. 2011;8(2):025015.
  • Chen Y, Yang C, Ye X, et al. Implementing a calibration-free SSVEP-based BCI system with 160 targets. J Neural Eng. 2021;18(4):046094.
  • Nakanishi M, Wang Y, Chen X, et al. Enhancing detection of SSVEPs for a high-speed brain speller using task-related component analysis. IEEE Trans Biomed Eng. 2018;65(1):104–112.
  • Zhang Y, Guo D, Li F, et al. Correlated component analysis for enhancing the performance of SSVEP-based brain-computer interface. IEEE Trans Neural Syst Rehabil Eng. 2018;26(5):948–956.
  • Zhang Y, Yin E, Li F, et al. Two-stage frequency recognition method based on correlated component analysis for SSVEP-based BCI. IEEE Trans Neural Syst Rehabil Eng. 2018;26(7):1314–1323.
  • Wong CM, Wan F, Wang B, et al. Learning across multi-stimulus enhances target recognition methods in SSVEP-based BCIs. J Neural Eng. 2020;17(1):016026.
  • He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770–778.
  • Devlin J, Chang MW, Lee K, et al. Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • Berg R, Kipf TN, Welling M. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017.
  • Faust O, Hagiwara Y, Hong TJ, et al. Deep learning for healthcare applications based on physiological signals: a review. Comput Methods Programs Biomed. 2018;161:1–13.
  • Roy Y, Banville H, Albuquerque I, et al. Deep learning-based electroencephalography analysis: a systematic review. J. Neural Eng. 2019;16(5):051001.
  • Craik A, He Y, Contreras-Vidal JL. Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng. 2019;16(3):031001.
  • Zhang X, Yao L, Wang X, et al. A survey on deep learning-based non-invasive brain signals: recent advances and new frontiers. J Neural Eng. 2021;18(3):031002.
  • Zhang Y, Cai H, Nie L, et al. An end-to-end 3D convolutional neural network for decoding attentive mental state. Neural Netw. 2021;144:129–137.
  • Lawhern VJ, Solon AJ, Waytowich NR, et al. EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018;15(5):056013.
  • Nakanishi M, Wang Y, Wang YT, et al. A comparison study of canonical correlation analysis based methods for detecting steady-state visual evoked potentials. PLOS One. 2015;10(10):e0140703.
  • Waytowich N, Lawhern VJ, Garcia JO, et al. Compact convolutional neural networks for classification of asynchronous steady-state visual evoked potentials. J. Neural Eng. 2018;15(6):066031.
  • Aznan NKN, Bonner S, Connolly J, et al. On the classification of SSVEP-based dry-EEG signals via convolutional neural networks. 2018 IEEE international conference on systems Man, and Cybernetics (SMC). Miyazaki, Japan, 07-10 October 2018. pp. 3726–3731.
  • Nguyen TH, Chung WY. A single-channel SSVEP-based BCI speller using deep learning. IEEE Access. 2019;7:1752–1763.
  • Zhang R, Xu Z, Zhang L, et al. The effect of stimulus number on the recognition accuracy and information transfer rate of SSVEP–BCI in augmented reality. J Neural Eng. 2022;19(3):036010.
  • Zhao X, Liu C, Xu Z, et al. SSVEP stimulus layout effect on accuracy of brain-computer interfaces in augmented reality glasses. IEEE Access. 2020;8:5990–5998.
  • Zhao X, Du Y, Zhang R. A CNN-based multi-target fast classification method for AR-SSVEP. Comput Biol Med. 2022;141:105042.
  • Khok HJ, Koh VTC, Guan C. Deep multi-task learning for SSVEP detection and visual response mapping. 2020 IEEE international conference on systems man, and cybernetics (SMC). Toronto, ON, Canada, 11–14 October 2020; pp. 1280–1285.
  • Wang Y, Chen X, Gao X, et al. A benchmark dataset for SSVEP-based brain–computer interfaces. IEEE Trans Neural Syst Rehabil. Eng. 2017;25(10):1746–1752.
  • Attia M, Hettiarachchi I, Hossny M, et al. A time domain classification of steady-state visual evoked potentials using deep recurrent-convolutional neural networks. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). Washington, DC, USA, 04-07 April 2018. pp. 766–769.
  • Ishizuka K, Kobayashi N, Saito K. High accuracy and short delay 1ch-ssvep quadcopter-bmi using deep learning. JRM. 2020;32(4):738–744.
  • Pan Y, Chen J, Zhang Y, et al. An efficient CNN-LSTM network with spectral normalization and label smoothing technologies for SSVEP frequency recognition. J. Neural Eng. 2022;19(5):056014.
  • Mahmood M, Mzurikwao D, Kim YS, et al. Fully portable and wireless universal brain–machine interfaces enabled by flexible scalp electronics and deep learning algorithm. Nat Mach Intell. 2019;1(9):412–422.
  • Thomas J, Maszczyk T, Sinha N, et al. Deep learning-based classification for brain-computer interfaces. 2017 IEEE international conference on systems man, and cybernetics (SMC). Banff, AB, Canada, 05-08 October 2017. pp. 234–239.
  • Chen X, Wang Y, Gao S, et al. Filter bank canonical correlation analysis for implementing a high-speed SSVEP-based brain–computer interface. J. Neural Eng. 2015;12(4):046008.
  • Cecotti H. A time–frequency convolutional neural network for the offline classification of steady-state visual evoked potential responses. Pattern Recog Lett. 2011;32(8):1145–1153.
  • Kwak NS, Müller KR, Lee SW. A convolutional neural network for steady state visual evoked potential classification under ambulatory environment. PLoS One. 2017;12(2):e0172578.
  • Ravi A, Manuel J, Heydari N, et al. A convolutional neural network for enhancing the detection of SSVEP in the presence of competing stimuli. 2019 41st annual international conference of the ieee engineering in medicine and biology society (EMBC). Berlin, Germany, 23-27 July 2019. pp. 6323–6326.
  • Ravi A, Beni NH, Manuel J, et al. Comparing user-dependent and user-independent training of CNN for SSVEP BCI. J. Neural Eng. 2020;17(2):026028.
  • Xing J, Qiu S, Ma X, et al. A CNN-based comparing network for the detection of steady-state visual evoked potential responses. Neurocomputing. 2020;403:452–461.
  • Li Y, Xiang J, Kesavadas T. Convolutional correlation analysis for enhancing the performance of SSVEP-based brain-computer interface. IEEE Trans. Neural Syst. Rehabil. Eng. 2020;28(12):2681–2690.
  • Zhang X, Qiu S, Geng M, et al. Enhancing detection of SSVEPs for high-speed brain-computer interface with a Siamese architecture. 2021 IEEE international conference on bioinformatics and biomedicine (BIBM). Houston, TX, USA, 09-12 December 2021. pp. 1623–1627.
  • Zhang X, Qiu S, Zhang Y, et al. Bidirectional siamese correlation analysis method for enhancing the detection of SSVEPs. J Neural Eng. 2022;19(4):046027.
  • Xiao X, Xu L, Yue J, et al. Fixed template network and dynamic template network: novel network designs for decoding steady-state visual evoked potentials. J Neural Eng. 2022;19(5):056049.
  • Dang W, Li M, Lv D, et al. MHLCNN: multi-harmonic linkage CNN model for SSVEP and SSMVEP signal classification. IEEE Trans Circuits Syst II Express Briefs. 2021;69(1):244–248.
  • Zhao D, Wang T, Tian Y, et al. Filter bank convolutional neural network for SSVEP classification. IEEE Access. 2021;9:147129–147141.
  • Ding W, Shan J, Fang B, et al. Filter bank convolutional neural network for short time-window steady-state visual evoked potential classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2021;29:2615–2624.
  • Yao H, Liu K, Deng X, et al. FB-EEGNet: a fusion neural network across multi-stimulus for SSVEP target detection. J Neurosci Methods. 2022;379:109674.
  • Bassi PRAS, Attux R. FBDNN: filter banks and deep neural networks for portable and fast brain-computer interfaces. Biomed Phys Eng Express. 2022;8(3):035018.
  • Liu B, Huang X, Wang Y, et al. BETA: a large benchmark database toward SSVEP-BCI application. Front. Neurosci. 2020;14:627.
  • Zhu F, Jiang L, Dong G, et al. An open dataset for wearable ssvep-based brain-computer interfaces. Sensors. 2021;21(4):1256.
  • Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In Advances in Neural Information Processing Systems, 5998–6008, 2017.
  • Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • Sun J, Xie J, Zhou H. EEG classification with transformer-based models. 2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech). Nara, Japan, 09-11 March 2021. pp. 92–93.
  • Song Y, Jia X, Yang L, et al. Transformer-based spatial-temporal feature learning for EEG decoding. arXiv preprint arXiv:2106.11170, 2021.
  • Liu J, Zhang L, Wu H, et al. Transformers for EEG emotion recognition. arXiv preprint arXiv:2110.06553, 2021.
  • Chen J, Zhang Y, Pan Y, et al. A Transformer-based deep neural network model for SSVEP classification. arXiv preprint arXiv:2210.04172, 2022.
  • Li X, Wei W, Qiu S, et al. TFF-former: temporal-Frequency fusion transformer for zero-training decoding of two BCI tasks. Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal, 10-14 October 2022. pp. 51–59.
  • Gao Z, Sun X, Liu M, et al. Attention-based parallel multiscale convolutional neural network for visual evoked potentials EEG classification. IEEE J. Biomed. Health Inform. 2021;25(8):2887–2894.
  • Guney OB, Oblokulov M, Ozkan H. A deep neural network for ssvep-based brain-computer interfaces. IEEE Trans. Biomed. Eng. 2022;69(2):932–944.
  • Rostami E, Ghassemi F, Tabanfar Z. Improving the classification of real-world SSVEP data in brain-computer interface speller systems using deep convolutional neural networks. Front. Biomed. Technol. 2022;9(4):248–254.
  • Guney OB, Ozkan H. Transfer Learning of an Ensemble of DNNs for SSVEP BCI Spellers without User-Specific Training. arXiv preprint arXiv:2209.01511, 2022.
  • Rostami E, Ghassemi F, Tabanfar Z. Transfer learning assisted PodNet for stimulation frequency detection in steady state visually evoked potential-based BCI spellers. Brain-Computer Interfaces. 2022;10(1):1–12.
  • Bassi PRAS, Rampazzo W, Attux R. Transfer learning and SpecAugment applied to SSVEP based BCI classification. Biomed Signal Process Control. 2021;67:102542.
  • de Paula PO, da S, Costa TB, et al. Classification of image encoded SSVEP-based EEG signals using convolutional neural networks. Expert Syst Appl. 2022;214:119096.
  • Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in neural information processing systems. 2014; 27: 2672–2680.
  • Hartmann KG, Schirrmeister RT, Ball T. EEG-GAN: generative adversarial networks for electroencephalograhic (EEG) brain signals. arXiv preprint arXiv:1806.01875, 2018.
  • Aznan NKN, Atapour-Abarghouei A, Bonner S, et al. Simulating brain signals: creating synthetic eeg data via neural-based generative models for improved ssvep classification. 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary, 14-19 July 2019. pp. 1–8.
  • Aznan NKN, Atapour-Abarghouei A, Bonner S, et al. Leveraging synthetic subject invariant EEG signals for zero calibration BCI. 2020 25th international conference on pattern recognition (ICPR). Milan, Italy, 10-15 January 2021. pp. 10418–10425.
  • Kwon J, Im CH. Novel signal-to-signal translation method based on StarGAN to generate artificial EEG for SSVEP-based brain-computer interfaces. Expert Syst Appl. 2022;203:117574.
  • Yu Q, Yan R, Tang H, et al. A spiking neural network system for robust sequence recognition. IEEE Trans. Neural Netw. Learning Syst. 2016;27(3):621–635.
  • Deng L, Wang X, Jiang F, et al. EEG-based emotion recognition via capsule network with channel-wise attention and LSTM models. CCF Trans. Pervasive Comp. Interact. 2021;3(4):425–435.
  • Müller-Putz GR, Scherer R, Brauneis C, et al. Steady-state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components. J. Neural Eng. 2005;2(4):123–130.
  • Sevilla J, Heim L, Ho A, et al. Compute trends across three eras of machine learning. arXiv preprint arXiv:2202.05924, 2022.
  • Gao W, Yu T, Yu JG, et al. Learning invariant patterns based on a convolutional neural network and big electroencephalography data for subject-independent P300 brain-computer interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 2021;29:1047–1057.
  • Jeon E, Ko W, Suk HI. Domain adaptation with source selection for motor-imagery based BCI. 2019 7th international winter conference on brain-computer interface (BCI). Gangwon, Korea (South), 18-20 February 2019. pp. 1–4.
  • Tang X, Zhang X. Conditional adversarial domain adaptation neural network for motor imagery EEG decoding. Entropy. 2020;22(1):96.
  • Krana M, Farmaki C, Pediaditis M, et al. SSVEP based wheelchair navigation in outdoor environments. 2021 43rd annual international conference of the IEEE engineering in medicine & biology society (EMBC). Mexico, 01-05 November 2021. pp. 6424–6427.
  • Zhang S, Ma K, Yin Y, et al. A personalized compression method for steady-state visual evoked potential EEG signals. Information. 2022;13(4):186.
  • Du Y, Liu J. IENet: a robust convolutional neural network for EEG based brain-computer interfaces. J. Neural Eng. 2022;19(3):036031.
  • Yan Z, Guo Y, Zhang C. Deep defense: training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, pp. 419–428, 2018.
  • Jia S, Yin B, Yao T, et al. Adv-Attribute: inconspicuous and Transferable Adversarial Attack on Face Recognition. arXiv preprint arXiv:2210.06871, 2022.
  • Bian R, Meng L, Wu D. SSVEP-based brain-computer interfaces are vulnerable to square wave attacks. Sci China Inf Sci. 2022;65(4):1–13.
  • Zhang X, Xu G, Mou X, et al. A convolutional neural network for the detection of asynchronous steady state motion visual evoked potential. IEEE Trans Neural Syst Rehabil Eng. 2019;27(6):1303–1311.