Lightweight multilayer interactive attention network for aspect-based sentiment analysis

Abstract

Aspect-based sentiment analysis (ABSA) aims to automatically identify the sentiment polarity of specific aspect words in a given sentence or document. Existing studies have recognised the value of interactive learning in ABSA and have developed various methods to model aspect words and their contexts through interactive learning. However, most of these methods model the interaction between aspect words and their contexts only shallowly, which may cause complex sentiment information to be missed. To address this issue, we propose a Lightweight Multilayer Interactive Attention Network (LMIAN) for ABSA. Specifically, we first employ a pre-trained language model to initialise word embedding vectors. Second, an interactive computational layer is designed to build correlations between aspect words and their contexts; the degree of correlation is calculated by multiple computational layers with neural attention models. Third, we use a parameter-sharing strategy among the computational layers, which allows the model to learn complex sentiment features with lower memory costs. Finally, we evaluate LMIAN on six publicly available sentiment analysis datasets. Extensive experiments show that LMIAN performs better than other advanced methods with relatively low memory consumption.

1. Introduction

Recently, aspect-based sentiment analysis (ABSA) (Brauwers & Frasincar, 2022; Nazir et al., 2020) has attracted wide attention as a fine-grained sentiment analysis task (Wei, Zhu, et al., 2022; Xu, Zhang, Zhu, et al., 2022). Unlike traditional sentiment analysis (Wei, Liu, et al., 2022; Zhang, Yu, et al., 2022), ABSA aims to determine the sentiment polarity (i.e. the emotional tendency) of a given sentence or document at the aspect level. Figure 1 shows an example sentence: “Despite him not being my designated server, he took over for our inexperienced waiter the rest of the dinner”. When the word “server” is viewed as the aspect, the polarity of the sentence at the aspect level is positive, as the sentence reflects a favourable attitude towards “server”. When the word “waiter” is the aspect, the expected sentiment polarity is negative. Similarly, the polarity for the word “dinner” is neutral.

Figure 1. An illustration of ABSA.

Existing ABSA methods mostly employ deep neural networks to construct sentiment classifiers. Long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997), as a mainstream neural network model, has been widely used in ABSA. For example, Ma et al. (2017) proposed an LSTM-based interactive attention network. Huang et al. (2018) built connections between aspects and contexts by introducing an attention-over-attention network based on the LSTM. However, LSTM-based networks are difficult to parallelise and suffer from problems with truncated backpropagation and vanishing gradients. Another widely used strategy is to introduce convolutional neural networks (CNNs), such as GCAE (Xue & Li, 2018) and UP-CNN (Wang et al., 2021). CNN-based ABSA methods are mostly simple and computationally efficient. However, their fixed-size convolutional windows make it difficult to learn long-distance dependencies between words. Recently, with the development of the Transformer (Vaswani et al., 2017), attention networks have become popular. For instance, Song et al. (2019) proposed an interactive learning model that avoids recurrence by using a pure attention network for encoding. Yang et al. (2021) constructed an attention network with a local context focus mechanism to identify aspect-specific sentiment polarity. Besides the methods mentioned above, various other types of neural networks have also been applied to ABSA tasks, such as CapsNet (Jiang et al., 2019) and KGAN (Zhong et al., 2022).

Although these methods are effective, we believe that a challenge remains in aspect-based sentiment analysis: how to describe complex semantic relations between aspect words and their contexts in a lightweight way. This challenge motivated us to develop a neural network that captures the complex correlations between aspect words and their contexts at a lower memory cost.

To achieve this, we developed a Lightweight Multilayer Interactive Attention Network (LMIAN) for ABSA. Our method is data-driven, parallelisable, and does not rely on external knowledge (e.g. sentiment lexicons). The core of the network is an interactive computational layer, and each computational layer contains two interactive attention models. To enable LMIAN to recognise the intricate semantic relationships between aspect words and their contexts with low memory consumption, we stack multiple computational layers and share their parameters among the layers. The final layer’s outputs are regarded as sentiment features. Our method is an end-to-end model since each component is differentiable, with the advantage of efficiently learning complex sentiment information in a lightweight way. The disadvantage of our method is that computation time grows with the number of layers. The main contributions of this study are as follows.

We propose a lightweight multilayer interactive attention network for ABSA. The network can learn intricate semantic relationships between aspect words and their contexts through multiple interactive computational layers.
We design two attention models for learning aspect-specific context representations and context-specific aspect representations. Moreover, parameters are shared among computational layers to achieve a good balance between model performance and memory cost.

2. Related works

2.1. Neural networks for ABSA

RNN-based ABSA methods have recently achieved remarkable results due to the power of recurrent neural networks (RNNs) (Funahashi & Nakamura, 1993) for mining temporal and semantic information from data (Zhang et al., 2018; Zhang, Li, et al., 2022). AdaRNN (Dong et al., 2014) applied a transferred target-dependent dependency tree to a standard recursive neural network for feature learning and achieved competitive performance. TC-LSTM (Tang, Qin, Feng, et al., 2016) significantly improved the model’s performance by capturing information about aspects in a sentence. Apart from the methods mentioned above, many LSTM-based attention models have been proposed recently. For instance, Wang et al. (2016) pioneered the combination of attention and LSTM for ABSA. RAM (Chen et al., 2017) computes long-distance sentiment features by encoding aspect words with a Bi-LSTM and an attention mechanism. Xu, Zhang, Zhang, et al. (2022) proposed an attention-BiLSTM method with transfer learning ability. However, RNN-based methods are memory-intensive and cannot perform parallel computations.

As another dominant type of neural network, convolutional neural networks (CNNs) are simple and efficient since they can be well parallelised on graphics processing units (GPUs). Examples include GCAE (Xue & Li, 2018) and UP-CNN (Wang et al., 2021). However, with a fixed-size kernel window, it is difficult to establish dependencies between distant words.

2.2. Attention networks for ABSA

Recently, with the continuous progress of the Transformer, how to apply it to ABSA has attracted the interest of researchers (Tang et al., 2020; Wu et al., 2020; Wu & Ong, 2021). Multi-Head Attention (MHA) in the Transformer can model the relationship between distant words in text. This powerful representation ability can help to make connections between aspect words and their contexts. Many MHA-based techniques for sentiment analysis have been proposed, facilitating the development of ABSA. For example, Xu et al. (2020) presented a novel sentiment analysis framework to overcome the time loss inherent in recursive structures. Zeng et al. (2019) proposed a novel ABSA framework with a local context focus. Lv et al. (2021) proposed CAMN with excellent results by introducing MHA to memory networks. However, these models ignore the memory cost caused by complex operations, which makes it challenging to balance model size and performance.

Unlike the above-mentioned methods, our method achieves a good balance between performance and size. Specifically, we select attention networks that avoid recurrent structures as the model’s framework, and stack multiple attentional computational layers to capture complex sentiment information. In addition, we introduce a parameter-sharing strategy among the computational layers to better optimise the memory cost of the deep network.

3. Lightweight multilayer interactive attention network

We present a lightweight and effective ABSA structure, namely LMIAN. First, we provide a task definition. Second, we overview the method before introducing the interactive attention models in each computational layer. Finally, we show how the technique can be applied to the ABSA tasks.

3.1. Task definition

Consider the sentence shown in Figure 1, which contains n words (n = 19 in Figure 1) and m aspects (m = 3 in Figure 1). Each aspect contains at least one word. Our task is to identify the emotional tendencies expressed towards aspect words such as “server”, “waiter”, and “dinner” in the sentence. When we process a text corpus, we first need to perform word embedding (Mikolov et al., 2013; Paccanaro & Hinton, 2001), i.e. map each word $w_{i}$ into a continuous vector $e_{i} \in R^{1 \times d}$. The word vectors are stacked to form a word embedding matrix $E \in R^{V \times d}$, where V is the vocabulary size and d is the word vector dimension; $e_{i}$ is a row of $E$.
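As a quick check of the counts quoted above, the example sentence from the introduction can be split into words and paired with its three aspects. This is only an illustrative sketch: whitespace tokenisation and the `samples` tuple layout are assumptions made here for counting purposes, not the tokenisation used by the model.

```python
# Example sentence from the introduction. Whitespace tokenisation is an
# assumption used only to reproduce the word count (BERT uses subword tokens).
sentence = ("Despite him not being my designated server, he took over for "
            "our inexperienced waiter the rest of the dinner")
aspects = ["server", "waiter", "dinner"]

words = [w.strip(",.") for w in sentence.split()]
n = len(words)      # number of words in the sentence
m = len(aspects)    # number of aspects

# Gold polarities as described in the introduction.
labels = {"server": "positive", "waiter": "negative", "dinner": "neutral"}

# One (sentence, aspect, polarity) triple per aspect; the layout is hypothetical.
samples = [(sentence, aspect, labels[aspect]) for aspect in aspects]
```

The same sentence therefore yields one classification instance per aspect, each with its own polarity label.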

In LMIAN, we utilise BERT (Devlin et al., 2019) to map context words $\{w_{c}^{1}, \dots, w_{c}^{n}\}$ and aspect words $\{w_{a}^{1}, \dots, w_{a}^{m}\}$ into a context embedding matrix $C = [e_{c}^{1}, \dots, e_{c}^{n}] \in R^{n \times d}$ and an aspect embedding matrix $A = [e_{a}^{1}, \dots, e_{a}^{m}] \in R^{m \times d}$, respectively.

3.2. An overview of the method

The structure of LMIAN, shown in Figure 2, draws inspiration from MHA in machine translation (Vaswani et al., 2017). Our method contains multiple computational layers, each consisting of two MHA modules and Point-wise Convolution Transformation (PCT) modules. The mapped aspect vectors and context vectors are the inputs of the first computational layer, and the MHA modules within the layer adaptively compute the significant relationships between them. The PCT modules transform the outputs of the MHA modules, and the results are used as inputs to the next computational layer. We stack a series of such computational layers, repeating these steps multiple times, to allow LMIAN to learn broader and more complex abstract relationships. The hidden representations output by the last computational layer contain context and aspect information in relation to each other. In the next step, they are used as features for sentiment classification.

Figure 2. The structure of the proposed LMIAN. Share indicates the parameter sharing of BERT. Pool indicates the pooling operation.

Typically, the more layers of computation are stacked, the more parameters the model has, and the more memory is consumed. To decrease the difficulty of model training, we let the parameters of each computational layer be shared. Therefore, the total number of parameters is fixed regardless of how many computational layers the model has. This strategy lays the foundation for building a lightweight multilayer interactive attention network.
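The effect of sharing can be illustrated with a rough parameter count. The sketch below is not the authors' accounting. It assumes the weight shapes given in Sections 3.3 and 3.4, that d is divisible by the number of heads (so h · d_k = d), and that each layer has two PCT modules to match its two MHA modules. It ignores the BERT encoder entirely, and the function names are hypothetical.

```python
def layer_params(d: int) -> int:
    """Trainable weights in one computational layer under the stated assumptions."""
    mha = 3 * d * d            # W^Q and W^K over all heads, plus W^O (h * d_k == d)
    pct = 2 * d * d + 2 * d    # W_1, W_2 and their bias vectors
    return 2 * mha + 2 * pct   # two MHA modules and (assumed) two PCT modules


def total_params(d: int, num_layers: int, shared: bool) -> int:
    """Total layer parameters with or without sharing across layers."""
    per_layer = layer_params(d)
    return per_layer if shared else per_layer * num_layers


# With sharing, the count does not depend on the number of stacked layers.
shared_5 = total_params(768, 5, shared=True)
unshared_5 = total_params(768, 5, shared=False)
```

Under these assumptions the unshared count grows linearly with depth, while the shared count stays at the size of a single layer. Activation memory and computation time still grow with the number of layers.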

3.3. Interactive attention

The attention mechanism can focus on key information through weight assignment. In this work, we design two interactive attention modules. They can learn specific-aspect context hidden representations and specific-context aspect hidden representations, respectively. The intuition is that coordinating aspect words and their context helps facilitate the capture of complex emotional information. The specific details are shown in Figure 3.

Figure 3. Interactive attention mechanism. Linear, T, and Concat denote linear projection, transpose operation, and concatenate operation.

Suppose there is a pair of sequences $K = \{k_{1}, \dots, k_{n}\} \in R^{n \times d}$ and $Q = \{q_{1}, \dots, q_{m}\} \in R^{m \times d}$, which we take as inputs to the attention modules. The corresponding output sequence is calculated using formula (1), where $d_{k}$ is the dimension of the sequence.

$\mathrm{Attention}(Q, K) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) K$ (1)

MHA can learn emotional information from h distinct subspaces, thus enabling the network to capture more nuanced emotional features. The MHA is calculated as follows.

$\mathrm{MHA}(Q, K) = [\mathrm{head}_{1}; \dots; \mathrm{head}_{h}] W^{O}$ (2)

$\mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K})$ (3)

where “;” denotes vector concatenation. $W_{i}^{Q} \in R^{d \times d_{k}}$, $W_{i}^{K} \in R^{d \times d_{k}}$, and $W^{O} \in R^{h d_{k} \times d}$ are learnable weights. Vector sizes in subspaces are calculated as follows, where “//” denotes integer division.

$d_{k} = d // h$ (4)

In practice, we take a pair of sequences of contexts and aspects to describe the process of attention calculation. We regard the context matrix C as Q and the aspect matrix A as K, and calculate the context-specific aspect hidden representation $A_{c}$ by:

$A_{c} = \mathrm{MHA}(C, A)$ (5)

Next, we regard the aspect matrix A as Q and the context matrix C as K, and use formula (6) to compute the aspect-specific context hidden representation $C_{a}$.

$C_{a} = \mathrm{MHA}(A, C)$ (6)

Afterward, these hidden state representations are fed into the PCT modules for further transformation.
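To make formula (1) concrete, the following is a minimal single-head sketch in plain Python. It follows the formula as written, with K serving as both keys and values. It omits the learned projections of formulas (2) and (3), and the toy matrices are made up for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K):
    """Formula (1): softmax(Q K^T / sqrt(d_k)) K, with K also used as the values."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    out = []
    for q in Q:
        # Scaled dot product of this query row with every key row.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted sum of the key rows.
        out.append([sum(w * k[j] for w, k in zip(weights, K)) for j in range(d_k)])
    return out


# Toy inputs: three query rows and two key rows, each of dimension 4.
Q = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
K = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
out = attention(Q, K)
```

The result has one row per row of Q, and each output row is a convex combination of the rows of K, weighted by how strongly that query matches each key.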

3.4. Point-wise convolution transformation

PCT is a trick mainly used to transform the information collected by the MHA modules. The ablation tests (Section 4.5.1) show that PCT helps enhance the performance of LMIAN. The inputs of the PCT modules are the outputs of the MHA modules. The PCT is calculated as follows.

$\mathrm{PCT}(H) = \sigma(H * W_{1} + b_{1}) * W_{2} + b_{2}$ (7)

where $H$ is the MHA output and $*$ denotes the convolution operation. $W_{1} \in R^{d \times d}$ and $W_{2} \in R^{d \times d}$ are trainable model parameters, and $b_{1} \in R^{d}$ and $b_{2} \in R^{d}$ are the corresponding bias terms. The symbol $\sigma$ refers to the ReLU (Nair & Hinton, 2010) activation function.
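A convolution with kernel size 1 applies the same linear map to every position independently, so formula (7) can be sketched as two per-row affine transforms with a ReLU in between. The sketch below assumes that kernel-size-1 reading of "point-wise"; the weights are toy values chosen for easy checking, not trained parameters.

```python
def pct(H, W1, b1, W2, b2):
    """Formula (7) with kernel size 1: ReLU(H W1 + b1) W2 + b2, applied row by row."""
    def affine(x, W, b):
        # x is one position's vector; W has shape (len(x), len(b)).
        return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
                for j in range(len(b))]

    out = []
    for h in H:  # each position is transformed independently of the others
        hidden = [max(0.0, v) for v in affine(h, W1, b1)]  # ReLU
        out.append(affine(hidden, W2, b2))
    return out


# Toy weights for d = 2 (hypothetical values).
W1 = [[1.0, 0.0], [0.0, -1.0]]
b1 = [0.0, 0.0]
W2 = [[2.0, 0.0], [0.0, 2.0]]
b2 = [1.0, 1.0]
H = [[1.0, 2.0], [-3.0, -4.0]]
out = pct(H, W1, b1, W2, b2)
```

Because each row is processed on its own, the output keeps the same number of positions as the input and the operation parallelises trivially across positions.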

Given $A_{c}$ and $C_{a}$, the PCT modules are applied to obtain the final context-guided aspect hidden state representation $H^{A} = \{H_{1}^{A}, \dots, H_{m}^{A}\}$ and aspect-guided context hidden state representation $H^{C} = \{H_{1}^{C}, \dots, H_{n}^{C}\}$ by:

$H^{A} = \mathrm{PCT}(A_{c}) + A$ (8)

$H^{C} = \mathrm{PCT}(C_{a}) + C$ (9)

3.5. Output layer

We pool and concatenate $H^{A}$ and $H^{C}$ to obtain the final sentiment features $X$. Finally, a softmax is employed to compute the probability distribution $Y$ over the polarity categories.

$X = [\mathrm{Pool}(H^{A}); \mathrm{Pool}(H^{C})]$ (10)

$Y = \mathrm{softmax}(X)$ (11)
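The output step can be sketched as follows. Two details are assumptions rather than statements from the text. First, mean pooling is used because the pooling type is not specified. Second, a dense projection (`W`, `b`, hypothetical names) maps the pooled features to one score per class before the softmax, since formula (11) as printed applies the softmax to X directly.

```python
import math


def mean_pool(H):
    """Average the rows of H. Mean pooling is an assumption of this sketch."""
    rows = len(H)
    return [sum(col) / rows for col in zip(*H)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(H_A, H_C, W, b):
    """Formula (10), then an assumed dense projection, then formula (11)."""
    X = mean_pool(H_A) + mean_pool(H_C)  # list concatenation: [Pool(H^A); Pool(H^C)]
    # W holds one weight row per class; this projection is hypothetical.
    logits = [sum(x * w for x, w in zip(X, row)) + bias for row, bias in zip(W, b)]
    return X, softmax(logits)


# Toy hidden states with d = 2: two aspect positions and three context positions.
H_A = [[1.0, 2.0], [3.0, 4.0]]
H_C = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0]
X, Y = output_layer(H_A, H_C, W, b)
```

Pooling removes the dependence on the number of aspect and context positions, so X always has length 2d regardless of sentence length.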

3.6. Why multilayer interactive attention is needed

In general, networks consisting of multiple computational layers have a more robust learning ability than single-layer networks (Chen et al., 2021; LeCun et al., 2015; Tang, Qin, and Liu, 2016). In LMIAN, the essence of single-layer interactive attention is a weight assignment process that may have difficulty learning complex information such as transitions, negation, and emphasis in text. Multi-layer interactive attention is capable of learning high-dimensional abstract text representations. This is because each computational layer focuses on the imported aspect words and their contexts and gradually transforms them into abstract text representations. Combining a sufficient number of such transformations makes it possible to learn highly complex functions of sentence representation for a given aspect.

3.7. Model training

In this work, the aspect input and context input of LMIAN are reconstructed as “[CLS]” + aspect + “[SEP]” and “[CLS]” + context + “[SEP]”, respectively. We update LMIAN’s parameters via backpropagation and cross-entropy loss minimisation. The loss function is as follows.

$\mathrm{loss} = - \frac{1}{m} \sum_{i = 1}^{m} \sum_{j = 1}^{c} y_{i j} \log \left( \frac{\exp(z_{i j})}{\sum_{k = 1}^{c} \exp(z_{i k})} \right)$ (12)

where m refers to the number of samples, c is the number of classes, $y_{i j}$ is the true value of the i-th sample on the j-th category, and $z_{i j}$ denotes the classifier’s output.
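Formula (12) can be evaluated directly on a small batch. The sketch below is a plain-Python illustration with made-up classifier outputs and one-hot labels. It uses the log-sum-exp form of the log-softmax for numerical stability, which is algebraically the same as the expression in the formula.

```python
import math


def cross_entropy(Z, Y):
    """Formula (12): mean over samples of -sum_j y_ij * log softmax(z_i)_j."""
    total = 0.0
    for z, y in zip(Z, Y):
        peak = max(z)
        # log of the softmax denominator, computed stably
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in z))
        total -= sum(yj * (zj - log_norm) for zj, yj in zip(z, y))
    return total / len(Z)


Z = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # toy classifier outputs, c = 3
Y = [[1, 0, 0], [0, 1, 0]]              # one-hot gold labels
loss = cross_entropy(Z, Y)
```

For the second sample all scores are equal, so its contribution is log 3, the loss of a uniform guess over three classes. The first sample contributes less because the correct class has the largest score.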

4. Experiments

4.1. Datasets

We prepare two types of sentiment analysis datasets, Chinese and English, to comprehensively evaluate the performance of LMIAN. The Chinese datasets (Che et al., 2015; Peng et al., 2018) include four categories: Notebook, Phone, Car and Camera. The English datasets consist of two categories, Laptop and Restaurant, from SemEval-2014 Task 4, subtask 2 (http://alt.qcri.org/semeval2014/task4). These datasets are collected from customer reviews of products or services. Each sample in the datasets contains three parts: review content, review object, and sentiment label. Table 1 shows details of these datasets.

Table 1. Statistical information on sentiment analysis datasets, including two English and four Chinese.

4.2. Experimental settings

In the experiments, we set the hyperparameters and initialise the weight parameters. For the word embedding vectors, we choose a pre-trained BERT-base model with an embedding size of 768 for initialisation. The remaining trainable weight matrices are initialised with the Glorot method (Glorot & Bengio, 2010). See Table 2 for the hyperparameter settings.

Table 2. Hyperparameter settings.

4.3. Model comparisons

We select eight strong methods as comparison models. Most of them achieved state-of-the-art results for ABSA at the time of their publication.

ATAE-LSTM differs from the traditional LSTM in that it utilises aspect information and applies attention at the final stage (Wang et al., 2016).
IAN is an interactive attention network. After LSTM encoding, aspect words and their context words interact through attention. This is the first work to introduce the concept of interactive learning in ABSA tasks (Ma et al., 2017).
AOA introduces attention-over-attention to capture the interactions between aspects and contexts (Huang et al., 2018).
GCAE is a CNN-based ABSA model. It selectively outputs aspect-specific sentiment features by combining convolutional neural networks and gating mechanisms (Xue & Li, 2018).
AEN eschews RNNs and is an entirely attention-based interactive network. Two attention mechanisms, intra- and inter-MHA, are proposed for context modelling and context-aware aspect modelling (Song et al., 2019).
LCF-BERT is a BERT-based, local context-focused attention model. LCF-BERT argues that local context can better emphasise sentence polarity, and proposes two focusing schemes (Zeng et al., 2019).
UP-CNN is a location-aware convolutional neural network ABSA model, which embeds location information into context vectors and extracts aspect-related features (Wang et al., 2021).
LCF-ATEPC is a multi-task learning model improved on LCF-BERT with aspect polarity classification ability (Yang et al., 2021).

4.4. Experimental results

The prediction results of LMIAN and the comparison models on the six datasets are shown in Tables 3 and 4. Several conclusions can be drawn from them.

On the Chinese datasets, LMIAN generally outperforms the comparison models, including LSTM-based and attention-based methods. On the English datasets, LMIAN also achieves highly competitive results, outperforming both the CNN-based and LSTM-based methods.
Methods that initialise word vectors with pre-trained language models usually achieve better performance on both the Chinese and English datasets.
LMIAN performs better on the Chinese datasets but slightly worse on the English datasets. Further analysis of this result suggests that longer sentences contain more noise, which is detrimental to LMIAN.

Table 3. Experimental results of LMIAN and comparison models on four Chinese datasets (%).

Table 4. Experimental results of LMIAN and comparison models on two English datasets (%).

The above analysis indicates that LMIAN is effective on ABSA tasks in both Chinese and English, and that it is more effective at predicting the polarity of short sentences.

4.5. Model analysis

4.5.1. Ablation tests

To verify that each component of LMIAN is reasonable, we construct several LMIAN variants, described below.

LMIAN w/o PCT: The PCT modules are removed from each computational layer.
LMIAN w/o MHA: The MHA modules are removed from each computational layer.
LMIAN with MHSA: The MHA modules in each computational layer are replaced with multi-head self-attention (MHSA) (Vaswani et al., 2017).
LMIAN with LSTM: The MHA modules in each computational layer are replaced with LSTM.

According to the experimental results in Table 5, the F1 scores of LMIAN are generally better than those of its variants. When the MHA or PCT modules are removed from each computational layer, the F1 scores decrease to different degrees, which suggests that both the MHA and PCT modules are meaningful. When MHA is replaced by MHSA, LMIAN performs poorly because the model loses its interactive capability. Interestingly, LMIAN with LSTM performs well on the Phone dataset. The reason may be that the sentiment features in the Phone dataset are more obvious and do not need to be captured by complex computation. However, with the introduction of LSTM, the model loses the ability to compute in parallel.

Table 5. Ablation results.

4.5.2. Effect of number of computational layers

According to the results in Figures 4 and 5, we can draw the following conclusions:

LMIAN with multiple computational layers performs better. This validates our idea that it is difficult for models to learn complex sentiment information through shallow interaction.
Stacking more computational layers does not always help, and LMIAN's performance may decrease. LMIAN with two computational layers performs best on the Notebook and Camera datasets, whereas it achieves optimal performance on the Car and Phone datasets with 3 and 5 computational layers, respectively. We speculate that some relationship may exist between the size of the dataset and the complexity of the semantics. The more complex the dataset is, the more computational layers are required.

Figure 4. Accuracy for different stacking layers on the four datasets.

Figure 5. F1 score for different stacking layers on the four datasets.

4.5.3. Model size

LMIAN with the parameter-sharing strategy takes less memory during training, achieving a good balance between the computational ability and the size of the model. The results recorded in Table 6 confirm this. We reproduce some of the stronger comparison models with the same device and code framework. IAN-BERT and AOA-BERT are LSTM-based attention models, for which memory optimisation is more difficult since hidden states need to be kept in memory to calculate attention scores. AEN-BERT and LCF-ATEPC are pure attentional encoding networks that avoid recurrence and are more lightweight than LSTM-based models. Our LMIAN constructs computational layers with attention networks and shares parameters among the computational layers.

Table 6. Parameters and memory cost of the model training on the Car dataset.

4.5.4. Case study

Lastly, we explore to what extent multiple computational layers improve LMIAN’s predictive ability. Figure 6 presents the single- and multi-layer LMIAN prediction results for three different sentences.

The model makes correct predictions more easily for sentences with explicit sentiment. For example, in the first sentence, both single- and multi-layer LMIAN successfully predict the polarity of the aspect words.
For sentences with complex information such as “transitions”, multi-layer LMIAN performs better. For instance, in the second sentence, single-layer LMIAN makes incorrect judgements because it lacks sufficient learning ability.
For sentences with implicit sentiment, like the third example, both single- and multi-layer LMIAN fail to predict correctly. This situation may require combining other modalities (e.g. images) to make a better judgement.

Figure 6. Prediction results of single- and multi-layer LMIAN on real samples. Samples extracted from the laptop dataset. The words in red are the aspect words in samples. “Gold” indicates true sentiment polarity. “√” denotes a correct prediction. “×” denotes an incorrect prediction. We use a three-layer LMIAN for testing.

5. Conclusions

To learn the complex sentiment information hidden in text effectively and at low cost, we propose a lightweight multilayer interactive attention network, namely LMIAN. The conclusions of this paper focus on the following two points:

Multilayer interactions help improve model performance. We argue that the deep interaction of aspects and contexts is beneficial to facilitate the capture of complex emotional information. The experimental results in Section 4.5.2 show that multilayer interactions improve the model performance.
LMIAN has less memory consumption. In LMIAN, we make the parameters of each layer of the network shared, aiming at learning data representations with multiple levels of abstraction at lower complexity. The experimental results in Section 4.5.3 show that LMIAN has fewer parameters and less memory consumption.

Extensive experiments show that LMIAN achieves a better balance between the model’s performance, size, and GPU memory consumption. In the future, we will further optimise our interactive attention model to achieve higher performance and lower GPU memory consumption. One of our ideas is to perform deep interaction of local features with aspect words, improving model performance by reducing interfering information.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant number 62076006], the University Synergy Innovation Programme of Anhui Province [grant number GXXT-2021-008], the Anhui Provincial Key R&D Programme [grant number 202004b11020029], and the Scientific Research Fund for Young Teachers of Anhui University of Science & Technology [grant number QNZD2021-02].

References

Brauwers, G., & Frasincar, F. (2022). A survey on aspect-based sentiment classification. ACM Computing Surveys (CSUR), 55(4), 1–37. https://doi.org/10.1145/3503044
Che, W., Zhao, Y., Guo, H., Su, Z., & Liu, T. (2015). Sentence compression for aspect-based sentiment analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(12), 2111–2124. https://doi.org/10.1109/TASLP.2015.2443982
Chen, P., Sun, Z., Bing, L., & Yang, W. (2017). Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 452–461). https://doi.org/10.18653/v1/d17-1047
Chen, Y., Zhuang, T., & Guo, K. (2021). Memory network with hierarchical multi-head attention for aspect-based sentiment analysis. Applied Intelligence, 51(7), 4287–4304. https://doi.org/10.1007/s10489-020-02069-5
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 4171–4186). https://doi.org/10.48550/arXiv.1810.04805
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., & Xu, K. (2014). Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd annual meeting of the association for computational linguistics (pp. 49–54). https://doi.org/10.3115/v1/p14-2009
Funahashi, K. I., & Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. Neural Networks, 6(6), 801–806. https://doi.org/10.1016/s0893-6080(05)80125-x
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249–256). Microtome Publishing.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Huang, B., Ou, Y., & Carley, K. M. (2018). Aspect level sentiment classification with attention-over-attention neural networks. In International conference on social computing, behavioral-cultural modeling and prediction and behavior representation in modeling and simulation (pp. 197–206). https://doi.org/10.1007/978-3-319-93372-6_22
Jiang, Q., Chen, L., Xu, R., Ao, X., & Yang, M. (2019). A challenge dataset and effective models for aspect-based sentiment analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 6280–6285). https://doi.org/10.18653/v1/d19-1654
Kingma, D. P., & Ba, J. (2015, May 7–9). Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, San Diego, CA, United States.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lv, Y., Wei, F., Cao, L., Peng, S., Niu, J., Yu, S., & Wang, C. (2021). Aspect-level sentiment analysis using context and aspect memory network. Neurocomputing, 428, 195–205. https://doi.org/10.1016/j.neucom.2020.11.049
Ma, D., Li, S., Zhang, X., & Wang, H. (2017). Interactive attention networks for aspect-level sentiment classification. 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 (pp. 4068–4074). https://doi.org/10.24963/ijcai.2017/568
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013, May 2–4). Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, Scottsdale, AZ, United States.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (pp. 807–814).
Nazir, A., Rao, Y., Wu, L., & Sun, L. (2020). Issues and challenges of aspect-based sentiment analysis: A comprehensive survey. IEEE Transactions on Affective Computing, 13(2), 845–863. https://doi.org/10.1109/TAFFC.2020.2970399
Paccanaro, A., & Hinton, G. E. (2001). Learning distributed representations of concepts using linear relational embedding. IEEE Transactions on Knowledge and Data Engineering, 13(2), 232–244. https://doi.org/10.1109/69.917563
Peng, H., Ma, Y., Li, Y., & Cambria, E. (2018). Learning multi-grained aspect target sequence for Chinese sentiment analysis. Knowledge-Based Systems, 148, 167–176. https://doi.org/10.1016/j.knosys.2018.02.034
Song, Y., Wang, J., Jiang, T., Liu, Z., & Rao, Y. (2019). Attentional encoder network for targeted sentiment classification. Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series (pp. 93–103). https://doi.org/10.1007/978-3-030-30490-4_9
Tang, D., Qin, B., Feng, X., & Liu, T. (2016). Effective LSTMs for target-dependent sentiment classification. 26th International Conference on Computational Linguistics, COLING 2016 (pp. 3298–3307). Association for Computational Linguistics.
Tang, D., Qin, B., & Liu, T. (2016). Aspect level sentiment classification with deep memory network. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 214–224). https://doi.org/10.18653/v1/d16-1021
Tang, H., Ji, D., Li, C., & Zhou, Q. (2020). Dependency graph enhanced dual-transformer structure for aspect-based sentiment classification. In Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 6578–6588). Association for Computational Linguistics.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 (pp. 5999–6009). Neural information processing systems foundation.
Wang, X., Li, F., Zhang, Z., Xu, G., Zhang, J., & Sun, X. (2021). A unified position-aware convolutional neural network for aspect based sentiment analysis. Neurocomputing, 450, 91–103. https://doi.org/10.1016/j.neucom.2021.03.092
Wang, Y., Huang, M., Zhu, X., & Zhao, L. (2016). Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 conference on empirical methods in natural language processing (pp. 606–615). https://doi.org/10.18653/v1/d16-1058
Wei, S., Zhu, G., Sun, Z., Li, X., & Weng, T. (2022). GP-GCN: Global features of orthogonal projection and local dependency fused graph convolutional networks for aspect-level sentiment classification. Connection Science, 34(1), 1785–1806. https://doi.org/10.1080/09540091.2022.2080183
Wei, Z. L., Liu, W. J., Zhu, G. L., Zhang, S. X., & Hsieh, M. Y. (2022). Sentiment classification of Chinese Weibo based on extended sentiment dictionary and organisational structure of comments. Connection Science, 34(1), 409–428. https://doi.org/10.1080/09540091.2021.2006146
Wu, Z., & Ong, D. C. (2021). Context-guided BERT for targeted aspect-based sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (pp. 14094–14102). Association for the Advancement of Artificial Intelligence.
Wu, Z., Ying, C., Dai, X., Huang, S., & Chen, J. (2020). Transformer-based multi-aspect modeling for multi-aspect multi-sentiment analysis. In CCF International Conference on Natural Language Processing and Chinese Computing (pp. 546–557). https://doi.org/10.1007/978-3-030-60457-8_45
Xu, G., Zhang, Z., Zhang, T., Yu, S., Meng, Y., & Chen, S. (2022). Aspect-level sentiment classification based on attention-BiLSTM model and transfer learning. Knowledge-Based Systems, 245, 108586. https://doi.org/10.1016/j.knosys.2022.108586
Xu, H., Zhang, S., Zhu, G., & Zhu, H. (2022). ALSEE: A framework for attribute-level sentiment element extraction towards product reviews. Connection Science, 34(1), 205–223. https://doi.org/10.1080/09540091.2021.1981825
Xu, Q., Zhu, L., Dai, T., & Yan, C. (2020). Aspect-based sentiment classification with multi-attention network. Neurocomputing, 388, 135–143. https://doi.org/10.1016/j.neucom.2020.01.024
Xue, W., & Li, T. (2018). Aspect based sentiment analysis with gated convolutional networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (pp. 2514–2523). Association for Computational Linguistics.
Yang, H., Zeng, B., Yang, J., Song, Y., & Xu, R. (2021). A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction. Neurocomputing, 419(1), 344–356. https://doi.org/10.1016/j.neucom.2020.08.001
Zeng, B., Yang, H., Xu, R., Zhou, W., & Han, X. (2019). LCF: A local context focus mechanism for aspect-based sentiment classification. Applied Sciences, 9(16), 3389. https://doi.org/10.3390/app9163389
Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1253. https://doi.org/10.1002/widm.1253
Zhang, S., Yu, H., & Zhu, G. (2022). An emotional classification method of Chinese short comment text based on ELECTRA. Connection Science, 34(1), 254–273. https://doi.org/10.1080/09540091.2021.1985968
Zhang, W., Li, X., Deng, Y., Bing, L., & Lam, W. (2022). A survey on aspect-based sentiment analysis: Tasks, methods, and challenges. arXiv preprint arXiv:2203.01054. https://doi.org/10.48550/arXiv.2203.01054
Zhong, Q., Ding, L., Liu, J., Du, B., Jin, H., & Tao, D. (2022). Knowledge graph augmented network towards multiview representation learning for aspect-based sentiment analysis. arXiv preprint arXiv:2201.04831. https://doi.org/10.48550/arXiv.2201.04831
