Multi-class multi-label classification of social media texts for typhoon damage assessment: a two-stage model fully integrating the outputs of the hidden layers of BERT

ABSTRACT

With the development of social media, it has become increasingly important to quickly and accurately identify social media texts related to disasters (e.g. typhoons) to aid in rescue and recovery efforts. Currently, multi-class classification and the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) are widely used for text classification. However, most studies on typhoon damage classification are multi-class single-label, which contradicts the reality that a social media text may correspond to multiple types of damage. Moreover, the outputs of the hidden layers of BERT are not fully utilized. This paper proposes a two-stage multi-class multi-label classification method for typhoon damage assessment by fully integrating the outputs of the hidden layers of BERT. In the first stage, sentence vectors are adopted to identify typhoon damage-related texts. In the second stage, word matrices are applied for multi-class multi-label classification to further classify the texts into five damage categories (i.e. transportation, public, electricity, forestry, and waterlogging). The two stages are trained end-to-end to identify typhoon damage from social media texts. Experiments on Sina Weibo texts posted during typhoon landfalls in Chinese coastal regions demonstrate that the proposed method can effectively improve the accuracy of text classification and comprehensively assess typhoon damage.

1. Introduction

Typhoons, extreme and recurrent weather phenomena, profoundly disrupt the daily lives of individuals and the regular functioning of society. The extent of typhoon damage is inversely related to the speed and effectiveness of emergency response (Lamsal, Harwood, and Read 2022). Therefore, timely access to information about the damage is critical for rapid and efficient rescue and recovery efforts. Traditional methods of assessing typhoon damage involve the use of remote sensing satellites and field surveys. However, remote sensing struggles to provide timely and comprehensive information on typhoon damage, especially in relation to human activities, while field surveys for damage assessment are time-consuming and labor-intensive (Rodríguez et al. 2020; Senaratne et al. 2023). These approaches are inadequate to meet the demands of emergency departments. Recently, there has been exponential growth in social media (e.g. Twitter, Sina Weibo), which has fundamentally changed the way people interact with each other. The public takes on the role of ‘sensors’, rapidly disseminating vast amounts of information about typhoon damage (Lam et al. 2023; Ogie et al. 2022). Social media, characterized by broad public participation, timely data updates, and low-cost data collection, addresses the shortcomings of traditional data sources.

Various classification methods have been developed to identify disaster-related social media texts (Vongkusolkit and Huang 2021). Typically, this classification process involves two separate steps: distinguishing relevant social media texts from irrelevant data, and then further classifying the identified relevant content. An established method, known as the text-based approach, measures the relevance of social media texts by using machine learning algorithms to search for specific keywords. However, this technique is susceptible to contextual influences and may miss relevant texts that lack certain keywords (Kumar, Singh, and Saumya 2019). For example, during a typhoon, a message such as ‘I plan to commute by rowing today’ may indicate waterlogging caused by the typhoon, even without explicit keywords such as ‘flooding’ or ‘rising water’.

With the development of deep learning, large language models pre-trained on the Transformer-based architecture are at the forefront of text classification. Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al. 2018), pre-trained on large corpora, acquires contextual text representations and demonstrates broad applicability and impressive performance in a wide range of downstream tasks. However, the outputs of different layers of BERT capture various linguistic nuances and show distinct performance across tasks (N. F. Liu, Gardner, et al. 2019). The front layers capture essential details such as character-based attributes, while the middle layers encapsulate syntactic nuances and the back layers prioritize semantic features. Most studies use only the output of the last layer of BERT, which cannot fully leverage the outputs of the hidden layers. Furthermore, multi-class single-label classification methods are often used for text classification. However, due to the complex nature of text expressions, such methods cannot fully capture the meaning of the texts. For example, the statement ‘The typhoon knocked down trees and power lines, causing a power outage and leaving the house in darkness’ denotes both forestry and electricity damage.

To overcome the above limitations, this paper proposes a multi-class multi-label classification method for typhoon damage assessment by adopting a two-stage model that fully integrates the outputs of the hidden layers of BERT. The special classification tokens ([CLS]) and the vectors of each word in the outputs of all the 12 hidden layers of BERT, referred to as sentence vectors and word matrices, are applied to represent texts in the two stages, respectively. The first stage identifies typhoon damage-related texts: Bidirectional Long Short-Term Memory (BiLSTM) is applied to effectively capture contextual information from sentence vectors. In this stage, social media texts related to typhoon damage are identified and the corresponding word matrices of the texts are passed to the second stage. The second stage performs multi-class multi-label classification: a semantic integration-based convolutional neural network for text classification (Sit-CNN) is proposed to identify patterns and features in structured data and to extract linguistic nuances from word matrices. In this stage, social media texts are classified into five specific damage categories. In our method, the outputs of the hidden layers of BERT are integrated in groups from lower to higher layers to optimize text representations at each stage. Additionally, an end-to-end classification framework is proposed to balance the two stages and improve overall performance.

In summary, the main innovative contributions are threefold:

(1)	We propose a multi-class multi-label classification method to assess typhoon damage, in which each Sina Weibo text can be assigned to several damage categories at once. Compared to existing single-class single-label or multi-class single-label methods, the proposed multi-class multi-label method is more suitable for investigating real-world typhoon damage situations.
(2)	Our proposed method fully integrates sentence vectors and word matrices of all the hidden layers of BERT by employing grouping strategies and dynamic kernels. Compared to utilizing only the output of the last layer of BERT, the proposed method provides a holistic understanding of texts at different semantic levels.
(3)	Our proposed method employs end-to-end training, allowing the model to optimize the entire task seamlessly. Compared to traditional non-deep learning approaches, our method reduces the need for manual feature engineering, making it better suited to address the requirements of typhoon damage assessment.

2. Related work

2.1. Social media for disaster analysis

Social media (e.g. Twitter, Sina Weibo) plays a vital role during disasters, providing an interactive and collaborative platform for communication and information dissemination (Huang et al. 2019; Li, Huang, and Emrich 2020). When traditional communication channels are unable to provide real-time updates or sufficient information in a flooding disaster, social media acts as an effective tool, allowing users to obtain and share information based on their needs (Tim et al. 2017). The use of social media escalates during typhoons, earthquakes, and many other natural disasters, resulting in an influx of timely information that supports the establishment of disaster situational awareness (Saroj and Pal 2020). In addition, social media facilitates dynamic interaction, as shown by the timely communication between individuals and emergency response institutions such as municipalities and government agencies during Hurricane Harvey (Ngamassi et al. 2022). In contrast to the general public, emerging influential contributors on social media platforms are gaining increasing attention by consistently disseminating disaster situation updates to the public. However, without effective oversight, social media is vulnerable to issues of authenticity and fairness. False information can easily spread widely and negatively impact rescue and recovery efforts (Goodchild and Glennon 2010). Research also suggests a bias in the use of Twitter: communities with higher disaster-related Twitter use tend to be communities with better social-geographical conditions (Zou et al. 2018).

Social media has been used to improve situational awareness during the preparation, response, impact, and recovery phases of disasters (Cimellaro, Reinhorn, and Bruneau 2010; Wu and Cui 2018; C. Zhou et al. 2024). Sentiment analysis based on social media texts can shed light on the reactions of local residents, enabling informed disaster management strategies in an earthquake (Yang et al. 2019). Moreover, topic classification provides insight into what people are focusing on. A comprehensive classification scheme including warning and advisory, casualties and damage, information sources, donations and assistance, and people has been applied to Hurricane Sandy (Imran et al. 2013; Yu et al. 2020). In contrast, some studies focus on specific topics. For example, one study analyzes the spatial distribution of the demand for relief supplies during Typhoon Haiyan's impact on the Philippines, covering refugee supplies, energy, food, clothing, medical, and water (Zhang et al. 2021). In addition, correlation analyses reveal the potential of social media for disaster management, which can contribute to the quantitative analysis of disasters. For example, one study finds a significant correlation between the geographical distribution of damage-related texts and the economic losses of disasters, and the introduction of social media and an artificial neural network greatly improves the damage assessment (Li et al. 2023). In general, the use of social media to reflect disaster situations is an active research trend. However, most existing studies categorize social media texts in a multi-class single-label manner, which contradicts the diversity and complexity of text expressions. The multi-class multi-label classification method proposed in this paper maps a single text to multiple damage categories, thus mining the content of texts comprehensively and accurately.

2.2. BERT-based model for text classification

BERT revolutionizes language processing by capturing bidirectional context within text, providing a deep understanding of word meanings within sentences (Devlin et al. 2018). Several developments have enhanced BERT's capabilities. For example, a Robustly Optimized BERT Pretraining Approach (RoBERTa) exhibits greater robustness than BERT, achieved by training on extensive data (Y. Liu, Ott, et al. 2019). A Lite BERT (ALBERT) addresses memory consumption concerns and accelerates training by modifying BERT's architecture (Lan et al. 2019). A distilled version of BERT (DistilBERT) uses knowledge distillation to reduce BERT's size by 40% and speed up inference by 60%, while retaining 97% of its language understanding capabilities (Sanh et al. 2019). SpanBERT extends BERT to better represent and infer text spans (Joshi et al. 2020). Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) introduces a more sample-efficient pre-training task, replaced token detection, improving the efficiency of the pre-training process (Clark et al. 2020).

In downstream tasks, fine-tuning models on smaller datasets for specific text classification tasks through transfer learning can leverage BERT's extensive pre-training on large amounts of data, resulting in notable benefits (El-Alami, El Alaoui, and Nahnahi 2022). Hybrid models such as BERT-BiLSTM and BERT-CNN have been applied to various text classification tasks. In these models, BERT first processes the texts to generate contextual embeddings, which are then fed into the BiLSTM or CNN layer for further processing. BERT-BiLSTM combines the bidirectional context understanding of BERT with the sequential understanding of BiLSTM. For example, in named entity recognition tasks, researchers extract valuable medical information from massive electronic health records (Dai et al. 2019). Similarly, in sentiment analysis tasks, opinions and tendencies are extracted from investor and consumer statements published on Chinese Internet platforms. This enables the inference of sentiment orientation, providing crucial technical support for understanding energy market trends during social events (Cai et al. 2020). On the other hand, BERT-CNN integrates the contextual embeddings of BERT with the pattern recognition capabilities of CNN. For example, in offense detection tasks, hate speech and offensive language are detected in user-generated content such as articles, books, and blogs (Mehta et al. 2022). In addition, in sentence recognition tasks, the study of causality contained in financial text reveals potential laws of economic activity (Wan and Li 2022). Overall, BERT is one of the mainstream text classification methods across various domains. However, most studies use only the output of the last layer of BERT, which leaves the information in the hidden layers unexploited and limits classification accuracy. Some researchers have already recognized this problem and improved model performance by incorporating all hidden-state parameters (B. Zhou et al. 2022). However, exploiting the information from the hidden layers is still an open area of research. To address this challenge, we propose a two-stage model employing grouping strategies and dynamic kernels to fully integrate the information extracted by the hidden layers of BERT.

3. Methodology

In this paper, we propose a multi-class multi-label classification method by adopting a two-stage model that fully integrates the outputs of the hidden layers of BERT, thus achieving a comprehensive understanding of social media texts for typhoon damage assessment. Figure 1 shows the architecture of the proposed method, which consists of three parts (i.e. data processing, identification of typhoon damage-related texts, and multi-class multi-label classification for five damage categories). Before classification, the texts are preprocessed and fed into BERT to obtain the sentence vectors and word matrices extracted by all the 12 hidden layers. Grouping strategies are designed to extract semantic information at different levels. The first stage is the identification of typhoon damage-related texts, where BiLSTM is adopted to extract low- to high-level semantic features from the sentence vectors. In stage 1, the model infers whether a text describes typhoon damage; if so, the corresponding word matrices of the text are passed to stage 2. The second stage is multi-class multi-label classification for five damage categories. In stage 2, Sit-CNN is proposed to extract more features from the word matrices, and the damage-related texts are further classified into five damage categories. Because the classification is multi-class multi-label, a text can correspond to multiple damage categories simultaneously. The details of each part are presented in the following sections.

Figure 1. Architecture of the multi-class multi-label classification method by adopting a two-stage model that integrates the outputs of the hidden layers of BERT.

3.1. Data processing

The social media texts are preprocessed before being fed into BERT. This process includes segmenting the texts into words using a tokenizer, adding a special token [CLS] at the beginning, converting the segmented texts into corresponding token embeddings, and filling zero vectors for shorter texts to ensure that the inputs are of consistent length. In addition, position and segment embeddings are generated, which, together with the token embeddings, constitute the inputs to BERT.
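The preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' pipeline: the character-level tokenizer, the toy vocabulary built by `build_vocab`, the reserved ids, and the attention mask are all assumptions made for illustration, whereas the real model would rely on the vocabulary and tokenizer shipped with the pre-trained BERT weights. Only the prepending of [CLS] and the padding of shorter texts to a fixed length follow the description in the text.

```python
# Hypothetical special tokens and ids; a real BERT vocabulary defines its own.
PAD, CLS, UNK = "[PAD]", "[CLS]", "[UNK]"

def build_vocab(texts):
    """Assign an integer id to every character seen in texts (ids 0-2 are reserved)."""
    vocab = {PAD: 0, CLS: 1, UNK: 2}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab

def encode(text, vocab, max_len=130):
    """Prepend [CLS], map characters to ids, then truncate or pad to max_len."""
    tokens = ([CLS] + list(text))[:max_len]
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    mask = [1] * len(ids)                 # 1 marks a real token, 0 marks padding
    n_pad = max_len - len(ids)
    return ids + [vocab[PAD]] * n_pad, mask + [0] * n_pad

vocab = build_vocab(["台风过后停电了"])
ids, mask = encode("台风停电", vocab, max_len=8)
print(ids)   # [1, 3, 4, 7, 8, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The default `max_len=130` matches the maximum length reported in Section 4.2; the short `max_len=8` in the usage lines is chosen only to keep the printed output readable.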

BERT consists of 12 transformer encoders (Figure 2). At its core is self-attention, which allows each word to consider its context by weighing the importance of other words. Multi-head attention further enriches these interactions, allowing the model to capture different aspects of the relationships between words. These elements are combined with feed-forward neural networks, residual connections and layer normalization to form layers that can learn increasingly abstract representations of the texts.

Figure 2. Architecture of BERT.

The outputs of BERT include sentence vectors and word matrices. Sentence vectors are the [CLS] vectors, which provide an overall semantic representation of a text and are therefore suitable for identifying typhoon damage-related texts. Conversely, word matrices are formed by the vectors of each word, which provide more detailed local semantic representations, ideal for multi-class multi-label classification. To effectively integrate semantic information at different levels from the outputs of all the 12 hidden layers of BERT, grouping strategies are applied to the sentence vectors and the word matrices (Table 1). Since the textual representations of BERT are abstracted layer by layer, the grouping strategies are applied from low to high levels. The grouping strategy G1 places all 12 hidden layers in a single group, so the combination of hidden layers is 1-12, and so on for the other grouping strategies. The results of the different grouping strategies are compared to obtain the best representations of the texts in the subsequent classifications.
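As a rough illustration of grouping from low to high layers, the sketch below splits the 12 layer indices into contiguous groups. The helper `group_layers` is hypothetical, and the assumption that every group has the same size is ours; the exact layer combinations of each strategy are those listed in Table 1.

```python
def group_layers(num_layers=12, num_groups=3):
    """Split layer indices 1..num_layers into contiguous groups, lowest layers first.

    Assumption: groups are equal-sized, so num_groups must divide num_layers.
    """
    if num_layers % num_groups != 0:
        raise ValueError("num_groups must divide num_layers evenly")
    size = num_layers // num_groups
    return [list(range(start, start + size))
            for start in range(1, num_layers + 1, size)]

print(group_layers(12, 1))  # [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
print(group_layers(12, 3))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Under this assumption, one group reproduces the 1-12 combination described for G1, and three groups give layers 1-4, 5-8 and 9-12.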

Table 1. Grouping strategies for the outputs of 12 hidden layers of BERT.

3.2. Stage 1: identification of typhoon damage-related texts

In the first stage, sentence vectors obtained from data processing are fed into BiLSTM for binary classification to identify typhoon damage-related texts. Figure 3 shows the architecture of BiLSTM, which consists of two layers: the forward LSTM processes the input sentence vectors from left to right, and the backward LSTM processes them from right to left. Each layer comprises memory cells that maintain a cell state and three gates: the forget gate, the input gate, and the output gate. These three gates regulate the flow of information and control what information should be remembered, forgotten, or output at each time step.

Figure 3. Architecture of BiLSTM.

In greater detail, the forget gate controls which information from the cell state should be forgotten. It can be defined as:

$f_{t} = \sigma (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})$ (1)

where $W_{f}$ is the weight matrix, $b_{f}$ is the bias vector, $h_{t - 1}$ is the hidden state at the previous time step, $x_{t}$ is the input data at the current time step, and $\sigma$ is the sigmoid function that maps the value of the forget gate to the range between 0 and 1, where 0 means forget completely and 1 means keep all information.

Similarly, the input gate, which controls what new information should be added to the cell state, can be obtained by:

$i_{t} = \sigma (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})$ (2)

The output gate, which controls which parts of the cell state should be output to the hidden state, can be formulated as:

$o_{t} = \sigma (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})$ (3)

The candidate cell state, representing new information that can be added to the cell state, uses the hyperbolic tangent function instead of the sigmoid function and can be generated from:

$\hat{c}_{t} = \tanh (W_{c} \cdot [h_{t - 1}, x_{t}] + b_{c})$ (4)

where $\tanh$ is the hyperbolic tangent function.

Then, the cell state and the hidden state at the current time step are updated by:

$c_{t} = f_{t} \odot c_{t - 1} + i_{t} \odot \hat{c}_{t}$ (5)

$h_{t} = o_{t} \odot \tanh (c_{t})$ (6)

where $c_{t - 1}$ is the cell state at the previous time step.
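Equations (1)-(6) can be traced with a small single-step sketch in plain Python. The function names, the `params` dictionary layout, and the toy dimensions (hidden size 2, input size 3) are illustrative assumptions; in practice a deep learning framework would supply the LSTM layer, and a BiLSTM would run one such recurrence in each direction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (1)-(6)."""
    concat = h_prev + x_t                                                    # [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]   # Eq. (1)
    i = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]   # Eq. (2)
    o = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], concat)]   # Eq. (3)
    c_hat = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], concat)]  # Eq. (4)
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]   # Eq. (5)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                         # Eq. (6)
    return h, c

# Toy parameters: all-zero weights and biases, so every gate equals sigmoid(0) = 0.5.
zeros = {name: [[0.0] * 5 for _ in range(2)] for name in ("W_f", "W_i", "W_o", "W_c")}
zeros.update({name: [0.0, 0.0] for name in ("b_f", "b_i", "b_o", "b_c")})
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zeros)
print(c)  # [0.5, -1.0]
```

With all-zero parameters the candidate state is zero, so the forget gate simply halves the previous cell state, which makes the effect of Equation (5) easy to verify by hand.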

For the sentence vectors of all the 12 hidden layers of BERT, the sentence vectors in each group are fed into BiLSTM to obtain the concatenated hidden states of forward and backward LSTMs. The concatenated hidden states of the last time step in each group are integrated by the group attention module to fuse semantic information at different levels. The fully connected layer determines whether the texts are related to typhoon damage through the softmax activation function.

3.3. Stage 2: multi-class multi-label classification for five damage categories

If the first stage (Section 3.2) identifies a text as related to typhoon damage, the corresponding word matrices of the text are fed into the multi-class multi-label classification for five damage categories. In this stage, we propose Sit-CNN (Figure 4) to dynamically and selectively extract semantic information at different levels.

Figure 4. Architecture of Sit-CNN.

In Sit-CNN, padding is applied to each group of word matrices to ensure consistent dimensionality after convolution. Then, the word matrices are processed by the convolutional layers, where the convolutional kernels are of size $k \times 768$ ($k$ ranges from 2 to 4, and 768 is the size of each word vector), with 12 kernels for each size to dynamically capture semantic information. The convolutional features of each kernel are averaged and summed to obtain the integrated features. These features are then expanded by a fully connected layer and reshaped to obtain the kernel attention weights, which are calculated by the softmax activation function. The convolutional features are multiplied by the respective kernel attention weights to generate selective features, which are then processed by maximum pooling and concatenated to acquire the semantic features of each group. Sit-CNN dynamically adopts and selectively emphasizes certain kernels in the convolutional layers, thus allowing the model to focus on semantic information at different levels during training and inference.

To avoid information isolation caused by group convolution, we adopt group shuffle to disrupt the semantic features of each group and integrate them through a fully connected layer. The semantic features of each group are concatenated to obtain the final semantic features of the texts. To realize the multi-class multi-label classification, the final semantic features are downscaled by a fully connected layer. Then, a sigmoid activation function is applied to infer the independent probabilities of the texts corresponding to the five damage categories. The probability threshold is set to 0.5 for each damage category to determine whether the text is associated with it. Finally, we obtain five Boolean labels representing the damage information in the texts, and each text relates to one or more damage categories.
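The final decision step, independent sigmoid probabilities followed by a 0.5 threshold per category, can be sketched as follows. The function `predict_labels` and the example logits are hypothetical, and treating a probability exactly equal to the threshold as positive is an assumption, since the text does not say how ties are handled.

```python
import math

CATEGORIES = ["transportation", "public", "electricity", "forestry", "waterlogging"]

def predict_labels(logits, threshold=0.5):
    """Map five raw scores to independent probabilities and Boolean labels."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Assumption: a probability equal to the threshold counts as positive.
    return {name: p >= threshold for name, p in zip(CATEGORIES, probs)}

labels = predict_labels([-2.0, 0.3, 1.5, 2.2, -0.7])
print(labels)
# {'transportation': False, 'public': True, 'electricity': True,
#  'forestry': True, 'waterlogging': False}
```

Because each category is thresholded independently, a text can receive several labels at once, which is what distinguishes this output layer from a softmax over mutually exclusive classes. Note that this sketch does not by itself guarantee at least one positive label per text.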

4. Experimental design and results

4.1. Dataset

China is one of the most typhoon-prone countries in the world, with coastal regions such as Guangdong, Taiwan, Fujian and Zhejiang affected by typhoons throughout the year. As China's largest public social media platform, Sina Weibo has a large number of users and a high level of participation in netizens' discussions on current events. Therefore, we adopt Sina Weibo texts as experimental data, and a total of 319,051 texts about typhoons in these four regions from 2010 to 2019 are collected, covering more than 50 typhoons. We randomly select 10,575 texts in the dataset for manual labeling, including 4486 positive samples (related to typhoon damage) and 6089 negative samples (not related to typhoon damage).

Based on the Chinese Standard for Technical Specifications for Meteorological Disaster Surveys, typhoon damage is classified into five categories: transportation, public, electricity, forestry, and waterlogging. The description and examples of the classification scheme are shown in Table 2. The positive samples are further labeled with these five damage categories, and some examples of multi-label Sina Weibo texts are shown in Table 3.

Table 2. Typhoon damage classification scheme.

Table 3. Examples of multi-label Sina Weibo texts.

Among the 4486 manually labeled texts, 1510 (33.66%) are transportation, 1653 (36.85%) are public, 1023 (22.80%) are electricity, 322 (7.18%) are forestry, and 1599 (35.64%) are waterlogging, so the damage categories are unevenly represented in the labeled samples. However, the multi-class multi-label classification in the proposed method focuses more on the multi-label combinations. Table 4 shows the number of multi-label combinations before and after data enhancement. To improve the ability of the model to recognize the presence of damage categories in texts, each multi-label combination needs to be distributed across the training, validation and testing sets. Therefore, for combinations with fewer than 40 texts, we manually duplicate texts, truncate them, and replace words with synonyms, while retaining semantic information about the corresponding damage categories. The 4486 texts related to typhoon damage are thereby expanded to 5035 positive samples. In addition, to balance the positive and negative samples, we randomly select 5035 of the 6089 texts not related to typhoon damage as negative samples. The final dataset contains 10,070 texts, with half positive and half negative samples. For each experiment, we apply stratified random sampling to select 70% of the data as the training set, 15% as the validation set, and 15% as the testing set.
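A stratified 70/15/15 split of this kind can be sketched as below. The function `stratified_split`, the fixed seed, and the choice of the multi-label combination as the stratification key are assumptions for illustration; the text states only that stratified random sampling is used. The toy samples are placeholders, not data from the study.

```python
import random
from collections import defaultdict

def stratified_split(samples, key, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split samples into train/validation/test sets, stratum by stratum."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for sample in samples:
        buckets[key(sample)].append(sample)
    train, val, test = [], [], []
    for group in buckets.values():
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test

# Placeholder samples: (text id, label combination).
samples = [(f"a{i}", "electricity+forestry") for i in range(40)]
samples += [(f"b{i}", "waterlogging") for i in range(60)]
train_set, val_set, test_set = stratified_split(samples, key=lambda s: s[1])
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

In this sketch, a combination with 40 texts contributes 28, 6 and 6 texts to the three sets, which illustrates why very rare combinations need augmenting before they can appear in every split.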

Table 4. Number of multi-label combinations before and after data enhancement.

4.2. Implementation details

In the proposed model, BERT is initialized with the pre-trained weights of ‘bert-base-chinese’ (https://huggingface.co/bert-base-chinese) and the rest of the model is initialized with random non-zero numbers. The maximum sequence length is set to 130 based on the lengths of the texts in the dataset. The training batch size and the initial learning rate are set to 32 and 2e−5, respectively.

In the first stage, the binary cross-entropy loss is applied as the loss function, denoted as $Loss_{1}$, and can be defined as:

$Loss_{1} = - [y \log (p) + (1 - y) \log (1 - p)]$ (7)

where $y$ is the actual label, which can be 0 or 1, and $p$ is the output of the binary classification, indicating the probability that the text is related to typhoon damage.
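Equation (7) for a single text can be written directly in Python. The clipping constant `eps` is an added numerical safeguard against `log(0)` and is not part of the equation.

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """Loss_1 of Equation (7) for one text with label y in {0, 1} and probability p."""
    p = min(max(p, eps), 1.0 - eps)   # assumption: clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

print(round(binary_cross_entropy(1, 0.9), 4))  # 0.1054
print(round(binary_cross_entropy(0, 0.9), 4))  # 2.3026
```

A confident correct prediction (label 1, probability 0.9) incurs a small loss, while the same probability for a text with label 0 is penalized far more heavily.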

In the multi-class multi-label classification of the second stage, the focal loss is applied independently for each category $c$ and the overall focal loss is the mean of the individual category losses, denoted as $Loss_{2}$, and can be defined as:

$Loss_{2} = - \frac{1}{C} \sum_{c = 1}^{C} (1 - p_{c})^{\gamma} \log (p_{c})$ (8)

where $C$ is the total number of damage categories, $p_{c}$ is the probability that the text belongs to a particular damage category $c$, and $\gamma$ is the focusing parameter that smoothly adjusts the rate at which easy samples are down-weighted. Here $\gamma$ is set to 2. The focal loss can effectively address the degradation of model performance caused by sample imbalance.
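A minimal sketch of Equation (8) is given below. One point is an interpretation rather than something stated in the text: Equation (8) does not show the ground-truth label explicitly, so the sketch assumes that $p_{c}$ is the probability the model assigns to the true label of category $c$ (the predicted probability for a positive label, and one minus it for a negative label). The function name and `eps` clipping are also illustrative.

```python
import math

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean focal loss over categories, following Equation (8).

    Assumption: p_c in Equation (8) is the probability of the true label,
    i.e. p for a positive label and 1 - p for a negative label.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        p_true = min(max(p_true, eps), 1.0)      # clip to avoid log(0)
        total += -((1.0 - p_true) ** gamma) * math.log(p_true)
    return total / len(probs)

easy = focal_loss([0.9, 0.1], [1, 0])   # both categories predicted confidently and correctly
hard = focal_loss([0.1, 0.9], [1, 0])   # both categories predicted confidently but wrongly
print(easy < hard)  # True
```

With $\gamma = 2$, an easy category with true-label probability 0.9 has its cross-entropy scaled by $(1 - 0.9)^{2} = 0.01$, so training is dominated by the hard, often rarer, categories. Setting `gamma=0.0` reduces the function to the ordinary mean cross-entropy.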

Moreover, the total loss of the model is the weighted sum of the binary classification loss and the multi-class multi-label classification loss, denoted as $Loss$, and can be calculated by:

$Loss = \alpha Loss_{1} + Loss_{2}$ (9)

where $\alpha$ represents the weighting factor.

Throughout the training process, we use adaptive moment estimation (Adam) as the optimizer to dynamically adjust the learning rate and accelerate parameter convergence. The maximum number of training epochs is set to 50, and the parameters are updated using a mini-batch gradient descent approach. Early stopping is applied to avoid overfitting: the model stops training if $Loss$ does not decrease within 3 epochs. The model weights with the lowest $Loss$ are selected for inference. All models are implemented using Python and pertinent packages.
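The early stopping rule can be sketched as a small function over a sequence of per-epoch losses. The function name is hypothetical, the loss values in the usage lines are made-up numbers rather than results from the experiments, and monitoring the validation loss is an assumption about which $Loss$ is tracked.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the loss has not improved for `patience` consecutive
    epochs; the weights from best_epoch would be kept for inference.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Illustrative losses only: the minimum is at epoch 3, so training stops at epoch 6.
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]))  # (6, 3)
```

In the usage example the final value 0.5 is never reached, which shows the trade-off of a short patience: training is cut off early, but a later improvement can be missed.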

4.3. Evaluation metrics

In the experiment, precision, recall, and F1-score are adopted to evaluate the performance of the model. In binary classification, precision measures the proportion of correctly identified typhoon damage-related texts among all texts that the model infers to be damage-related, which includes both true positives and false positives. Recall is the proportion of correctly identified typhoon damage-related texts among all texts that are actually damage-related, i.e. the sum of true positives and false negatives. F1-score is the harmonic mean of precision and recall, and a higher F1-score indicates a better balance between precision and recall. In multi-class multi-label classification, these metrics are calculated for each category and then averaged over all categories. This macro-averaging method treats each category equally and facilitates a comprehensive evaluation of the model's performance in identifying different typhoon damage categories.
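Macro-averaged precision, recall and F1-score for multi-label predictions can be computed as in the sketch below. The function name and the two-category toy labels are illustrative, and returning 0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over the label columns."""
    num_classes = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        # Assumption: a metric with a zero denominator is reported as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (sum(precisions) / num_classes,
            sum(recalls) / num_classes,
            sum(f1s) / num_classes)

# Toy example with two categories and four texts.
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(macro_prf(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Because the macro F1-score averages the per-category F1-scores, it is not in general the harmonic mean of the macro precision and macro recall, which is worth keeping in mind when reading the reported values.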

4.4. Experiment results

4.4.1. Model performance evaluation

In this section, we compare the effects of different grouping strategies and weighting factors on model performance. First, we examine the model performance with various grouping strategies. Considering the difference between the values of $Loss_{1}$ and $Loss_{2}$, to balance the losses in the two stages, we set the range of the weighting factor $\alpha$ from 0.01 to 0.1 and initialize $\alpha$ to 0.05 to select the best grouping strategy. Table 5 shows that the most effective grouping strategy is G3, which divides the outputs of the 12 hidden layers of BERT into 3 groups. In the first stage of identifying typhoon damage-related texts, the precision, recall and F1-score are 0.895, 0.945 and 0.919. In the second stage of multi-class multi-label classification for five damage categories, the precision, recall and F1-score are 0.890, 0.916 and 0.846. Compared to using only the output of the last layer of BERT (the second row of Table 5), our proposed two-stage method can effectively exploit the information within and between groups of the 12 hidden layers of BERT and obtain the best representations of the texts by integrating semantic features at different levels.

Table 5. Model performance of different grouping strategies.

Furthermore, we discuss how to balance the importance of the two stages in the end-to-end framework to optimize the performance of the overall model. Referring to Equation (9), we recognize that the weighting factor $\alpha$ plays a crucial role in balancing the two stages. As mentioned earlier, we set the range of $\alpha$ from 0.01 to 0.1, and take values at 0.01 intervals for comparison. Table 6 shows that $\alpha$ at 0.07 and 0.05 yields the highest F1-scores of 0.921 and 0.846 for binary and multi-class multi-label classification, respectively. Compared to training the two stages individually (the second row of Table 6), our proposed method further improves the classification performance of both stages. $\alpha$ influences the direction of the gradient update by altering the weights of the losses for the two stages, and the features learned from different stages are integrated to optimize the text representations. Ultimately, we adopt G3 and 0.05 as the best grouping strategy and weighting factor of our proposed model, which leads to optimal performance improvements.

Table 6. Model performance of different weighting factors.


In addition, to examine the generalization performance of the best model, we compare the model losses and F1-scores on the training, validation, and testing sets (Figure 5). According to the early stopping criterion used in this paper, the model stops training at epoch 7, when the validation loss reaches a minimum of 0.023 and does not decrease in the next 3 epochs. In our proposed method, the model is initialized with the pre-trained weights of ‘bert-base-chinese’ and then fine-tuned with the optimized grouping strategy and weighting factor. In the first stage of identifying typhoon damage-related texts, the best validation and testing F1-scores are 0.912 and 0.919, respectively. In the second stage of multi-class multi-label classification for five damage categories, the best validation and testing F1-scores are 0.849 and 0.846, respectively. In general, our proposed method achieves robust results on the validation and testing sets for both stages.
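The early stopping rule mentioned above can be sketched as a patience-based check on the validation loss. The function name and the loss sequence below are hypothetical; only the minimum of 0.023 and the patience of 3 epochs come from the text, and the sketch reflects one possible reading in which training halts once 3 consecutive epochs fail to improve on the best loss.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss without any improvement in between.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Patience was never exhausted; training ran through all epochs.
    return best_epoch, len(val_losses)


# Made-up validation losses whose minimum (0.023) matches the reported value.
losses = [0.080, 0.051, 0.037, 0.023, 0.025, 0.024, 0.026, 0.030]
print(early_stopping_epoch(losses))
```

The weights restored after stopping would normally be those saved at the best epoch rather than those of the final epoch.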

Figure 5. Results of model losses and F1-scores on training, validation and testing sets. (a): Model losses; (b): F1-scores of stage 1; (c): F1-scores of stage 2.

4.4.2. Overview of typhoon damage

This section provides an overview of typhoon damage. Three severe typhoons are selected for each coastal region (i.e. Guangdong, Taiwan, Fujian, and Zhejiang) (Table 7). We count the number of Sina Weibo texts related to the five damage categories (i.e. transportation, public, electricity, forestry, and waterlogging) over seven days for each typhoon. Figure 6 shows the distribution of damage categories for different typhoons over time in the four regions. The quantity curves represent the sum of the daily damage categories, and the bolded date on the horizontal axis of each figure is the time of typhoon landfall. In general, typhoon damage exhibits unimodal and bimodal distributions.
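Because a single text may carry several damage labels, the daily counts described above have to credit each text to every category it belongs to. The sketch below shows one way to do this; the record layout (a day offset relative to landfall plus a set of predicted labels) and the sample records are assumptions for illustration, not the paper's data format.

```python
from collections import Counter, defaultdict

CATEGORIES = ["transportation", "public", "electricity", "forestry", "waterlogging"]


def count_by_day(records):
    """Count texts per damage category for each day.

    records: iterable of (day_offset, labels), where labels is a set of
    category names. A multi-label text is counted once per label.
    """
    counts = defaultdict(Counter)
    for day, labels in records:
        for label in labels:
            if label in CATEGORIES:
                counts[day][label] += 1
    return counts


# Made-up classified texts: (days relative to landfall, predicted labels).
records = [
    (-1, {"transportation", "public"}),
    (0, {"waterlogging"}),
    (0, {"electricity", "waterlogging"}),
    (1, {"electricity"}),
]

daily = count_by_day(records)
# The quantity curve is the sum over categories for each day.
daily_totals = {day: sum(c.values()) for day, c in sorted(daily.items())}
print(daily_totals)
```

Note that with this counting rule the daily total can exceed the number of texts, since a text with two labels contributes to two categories.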

Figure 6. Distribution of damage categories for different typhoons over time in four regions. (a)–(c) Guangdong: Vicente, Mujigae, Hato; (d)–(f) Taiwan: Soulik, Soudelor, Megi; (g)–(i) Fujian: Trami, Fitow, Nesat; (j)–(l) Zhejiang: Haikui, Fung-wong, Lekima.

Table 7. Selected typhoons in four regions.


The unimodal distribution indicates that the number of Sina Weibo texts related to typhoon damage generally increases, peaks on the day of typhoon landfall, and then gradually decreases. Take Soulik (Figure 6(d)) and Trami (Figure 6(g)) as examples. The number of Sina Weibo texts related to typhoon damage reflects people's concern about typhoon impacts on social media, with Trami receiving more attention than Soulik. Moreover, the slope of the quantity curves reflects the time span of the discussion on typhoon impacts. Soulik's quantity curve changes drastically, while Trami's changes moderately, indicating that Trami receives attention for a longer period of time than Soulik.

The bimodal distribution is similar to the unimodal curve until the peak is reached, but the number of texts related to typhoon damage rebounds after the peak to form a secondary peak. Consider Mujigae (Figure 6(b)) and Nesat (Figure 6(i)) as examples. The first peak of Mujigae occurs on the day of typhoon landfall, and the number of texts related to typhoon damage decreases on the second day, while electricity-related damage continues to increase until October 7, forming a secondary peak. To verify that the quantity curve is consistent with reality, we examine the Sina Weibo texts classified as electricity during Mujigae's landfall and find many texts about power outages and the resulting inconvenience to daily life up to the secondary peak. In contrast, the secondary peak of Nesat is mainly about waterlogging. Similarly, when we check the Sina Weibo texts classified as waterlogging during Nesat's landfall, we find that many texts convey the sense of ‘Nesat has only just gone and here comes Haitang’. Relevant information indicates that on July 31, the day after Nesat's landfall, Haitang landed in Fujian, bringing a new round of heavy rainfall. The heavy rainfall brought by the twin typhoons one day apart makes the waterlogging serious, as reflected in the bimodal distribution of Nesat. The distribution of damage categories helps us not only understand the damage characteristics of different typhoons but also identify the causes behind them.
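The distinction between unimodal and bimodal quantity curves can be illustrated by counting strict local maxima in a seven-day series. This is only a simple heuristic with hypothetical function names and made-up daily counts; the paper describes the shapes qualitatively and does not state a detection rule.

```python
def local_peaks(counts):
    """Return indices whose value is strictly greater than both neighbours.

    Endpoints are compared against their single neighbour. Plateaus
    (equal adjacent values) are not reported as peaks.
    """
    peaks = []
    for i, value in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if value > left and value > right:
            peaks.append(i)
    return peaks


def classify_curve(counts):
    """Label a daily quantity curve by its number of local peaks."""
    n_peaks = len(local_peaks(counts))
    return {1: "unimodal", 2: "bimodal"}.get(n_peaks, "other")


# Made-up seven-day totals (not data from the paper).
single_peak = [5, 20, 60, 120, 70, 30, 10]
double_peak = [5, 30, 110, 60, 85, 40, 15]

print(classify_curve(single_peak), classify_curve(double_peak))
```

On real data, small day-to-day fluctuations would produce spurious peaks, so some smoothing or a minimum peak height would likely be needed before applying such a rule.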

4.4.3. Temporal pattern of typhoon damage

In this section, we focus on the temporal pattern of typhoon damage. Figure 7 illustrates the proportion of each damage category over seven days for all selected typhoons. Overall, the temporal pattern of typhoon damage suggests that the main categories of damage before the typhoon landfall are transportation and public, while waterlogging accounts for the largest average proportion after the typhoon landfall.

Figure 7. Proportions of each damage category over seven days for all selected typhoons. (a) Average proportions of damage categories; (b)-(h) Proportional distribution of damage categories on each day.

Before the typhoon landfall, damage-related texts are already posted on Sina Weibo, with the most prominent damage categories being transportation and public. In Figure 7(a), two days before (-2d) the typhoon landfall, the average proportions of damage-related texts about transportation and public are 44.8% and 41.3%, respectively. One day before (-1d), these proportions decrease to 34.5% and 31.3%, while the proportions for the remaining damage categories are all below 15%. Before a typhoon arrives, the traffic management authority imposes restrictions based on weather warnings, leading to road closures, flight cancelations, and the suspension of trains, subways, and buses, thereby increasing transportation-related damage. Additionally, the impending typhoon disrupts daily life, resulting in the suspension of work and school, as well as the postponement or cancelation of activities, increasing public-related damage. Texts related to typhoon damage before the day of typhoon landfall emphasize the impact of human strategies, which are essential measures for dealing with typhoons. After the typhoon landfall, the damage category with the largest average proportion is waterlogging, which is related to the fact that typhoons are usually accompanied by heavy rainfall. The damage category with the most fluctuating average proportion is electricity, while the other damage categories exhibit more stable changes in their average proportions.
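The average proportions discussed above can be sketched as follows: category counts for one day are first normalized within each typhoon, and the resulting proportions are then averaged across typhoons. Whether the paper averages per-typhoon proportions or pools the counts is not stated here, so the averaging rule is an assumption, and the counts are made up.

```python
def proportions(counts):
    """Normalize category counts so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}


def average_proportions(per_typhoon_counts):
    """Average the per-typhoon proportions of each category.

    Assumption: every typhoon is weighted equally, regardless of how
    many texts it generated.
    """
    props = [proportions(c) for c in per_typhoon_counts]
    categories = props[0].keys()
    return {c: sum(p[c] for p in props) / len(props) for c in categories}


# Made-up counts for the same relative day of two hypothetical typhoons.
typhoon_a = {"transportation": 50, "public": 30, "electricity": 10,
             "forestry": 5, "waterlogging": 5}
typhoon_b = {"transportation": 30, "public": 50, "electricity": 5,
             "forestry": 5, "waterlogging": 10}

avg = average_proportions([typhoon_a, typhoon_b])
print({category: round(value, 3) for category, value in avg.items()})
```

Equal weighting keeps a heavily discussed typhoon from dominating the average, which is one reason to normalize before averaging.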

In addition, box plots are employed to analyze the proportional distribution of damage categories on each day (Figure 7(b)–(h)). The outliers reveal the abnormal distribution of typhoon damage. Figure 7(c) and (d) show an unusual surge in texts related to public damage on the day of and the day before the typhoon landfall. This anomaly is associated with Typhoon Fung-wong, which made landfall in Zhejiang in late September. Historical typhoon data indicate that Zhejiang experiences the majority of typhoons from mid-July to mid-September. This unusually late landfall severely disrupts people's daily routines, leading to an increase in texts on Sina Weibo about work and school suspensions and event cancelations, resulting in an abnormal increase in public-related damage compared to other typhoons. By examining the temporal pattern of damage categories, we can gain insight into the major damage categories before and after the typhoon landfall, and also identify the specific proportions of typhoon damage categories.
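Box plots usually flag a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that conventional rule to one category on one day across twelve typhoons; the 1.5 factor is the common default rather than something stated in the paper, and the proportions are made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up public-damage proportions for twelve typhoons on a single day;
# the last value imitates an unusually high share for one typhoon.
public_share = [0.20, 0.22, 0.25, 0.21, 0.23, 0.24,
                0.22, 0.26, 0.20, 0.23, 0.21, 0.60]

print(iqr_outliers(public_share))
```

Plotting libraries may compute quartiles with a different interpolation method, so borderline points can be classified differently from this sketch.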

4.4.4. Spatial pattern of typhoon damage

In this section, we analyze the spatial pattern of typhoon damage. Figure 8 shows the proportional distribution of damage categories across regions and typhoons. In general, the spatial pattern of typhoon damage reveals both similarity and discrepancy.

Figure 8. Proportional distribution of damage categories across regions and typhoons.

First, there is similarity among regions. All four regions experience a high proportion of public-related damage. Moreover, Guangdong and Taiwan share a similar proportional distribution of damage categories, with a higher proportion of damage related to electricity and forestry, while Fujian and Zhejiang exhibit more damage related to transportation and waterlogging. This similarity reflects the spatial interconnectedness of typhoon damage in different regions. Second, there is also discrepancy across typhoons. The typhoons with the highest proportions in the transportation, public, electricity, forestry, and waterlogging categories are Nesat (40.3%), Fung-wong (43.3%), Mujigae (40.5%), Vicente (40.2%), and Fitow (34.5%), respectively. In contrast, the typhoons with the lowest proportions in the same categories are Mujigae (6.8%), Nesat (14.1%), Fung-wong (1.1%), Fitow (3.4%), and Mujigae (6.2%), respectively. This discrepancy reflects the varying levels of concern attributed to each typhoon, which are somewhat related to the intensity of the typhoon and the effectiveness of rescue and recovery efforts in different regions.

5. Conclusion and future research

To assess typhoon damage from social media texts, this paper proposes a multi-class multi-label classification method based on a two-stage model that fully integrates the outputs of the hidden layers of BERT. In our proposed method, the first stage adopts sentence vectors for binary classification to identify texts related to typhoon damage, while the second stage employs word matrices for multi-class multi-label classification to further classify texts into five damage categories: transportation, public, electricity, forestry, and waterlogging. BiLSTM and Sit-CNN are designed to integrate the outputs of all the hidden layers of BERT using grouping strategies and dynamic kernels, selectively extracting semantic information at different levels to optimize text representations. In an end-to-end framework, we achieve typhoon damage identification from a large number of social media texts by training and balancing both stages together. Experiments on Sina Weibo texts during typhoon landfall in Chinese coastal regions demonstrate that the proposed method can effectively improve the accuracy of text classification, achieving F1-scores of 0.919 and 0.846 for the two stages, respectively. Moreover, the spatial and temporal patterns of typhoon damage are comprehensively analyzed, providing crucial information for damage assessment and for rescue and recovery efforts.

With the advancement of large language models (LLMs), models such as Generative Pre-trained Transformers (GPT) have excelled in a variety of text mining applications. However, the new LLMs do not diminish the novelty of our proposed multi-class multi-label classification method, because the idea of optimally combining hidden layer features can also be applied to new LLMs (e.g. GPT). Moreover, the end-to-end approach for identifying typhoon damage from massive social media data remains useful in its own right. Future research could rely on the superior performance of LLMs to generate content-rich data related to typhoon damage to improve model performance, and other deep learning technologies, such as graph neural networks and multi-modal learning, could be further explored. Moreover, additional information could be incorporated, including environmental conditions (e.g. wind and precipitation), auxiliary typhoon data (e.g. scale and intensity), and other social media data (e.g. images and videos). Finally, various applications of social media texts in typhoons or other disasters could be investigated (e.g. disinformation detection and rescue route optimization), and social media data mining could be improved by handling reposts and differentiating between public and organizational users.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data were derived from the following resources available in the public domain: https://weibo.com/.

Additional information

Funding

This research was supported in part by the Guangdong Provincial Key Laboratory of Intelligent Urban Security Monitoring and Smart City Planning under Grant No. GPKLIUSMSCP-2023-KF-02, the National Natural Science Foundation of China under Grant No. 42271325, the National Key Research and Development Program of China under Grant No. 2020YFA0714103, and the Innovation Group Project of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) under Grant No. 311022018.

References

Cai, Ren, Bin Qin, Yangken Chen, Liang Zhang, Ruijiang Yang, Shiwei Chen, and Wei Wang. 2020. “Sentiment Analysis About Investors and Consumers in Energy Market Based on BERT-BiLSTM.” IEEE Access 8:171408–171415. https://doi.org/10.1109/Access.6287639.
Cimellaro, Gian Paolo, Andrei M. Reinhorn, and Michel Bruneau. 2010. “Framework for Analytical Quantification of Disaster Resilience.” Engineering Structures 32 (11): 3639–3649. https://doi.org/10.1016/j.engstruct.2010.08.008.
Clark, Kevin, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. “Electra: Pre-Training Text Encoders as Discriminators Rather Than Generators.” arXiv preprint arXiv:2003.10555: 1–18.
Dai, Zhenjin, Xutao Wang, Pin Ni, Yuming Li, Gangmin Li, and Xuming Bai. 2019. “Named Entity Recognition Using BERT BiLSTM CRF for Chinese Electronic Health Records.” In 2019 12th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics (CISP-BMEI), 1–5. IEEE.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv preprint arXiv:1810.04805: 1–16.
El-Alami, Fatima-Zahra, Said Ouatik El Alaoui, and Noureddine En Nahnahi. 2022. “A Multilingual Offensive Language Detection Method Based on Transfer Learning from Transformer Fine-Tuning Model.” Journal of King Saud University-Computer and Information Sciences 34 (8): 6048–6056. https://doi.org/10.1016/j.jksuci.2021.07.013.
Goodchild, Michael F., and J. Alan Glennon. 2010. “Crowdsourcing Geographic Information for Disaster Response: A Research Frontier.” International Journal of Digital Earth 3 (3): 231–241. https://doi.org/10.1080/17538941003759255.
Huang, Xiao, Zhenlong Li, Cuizhen Wang, and Huan Ning. 2019. “Identifying Disaster Related Social Media for Rapid Response: A Visual-Textual Fused CNN Architecture.” International Journal of Digital Earth 13 (9): 1017–1039. https://doi.org/10.1080/17538947.2019.1633425.
Imran, Muhammad, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, and Patrick Meier. 2013. “Practical Extraction of Disaster-Relevant Information From Social Media.” In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 1021–1024.
Joshi, Mandar, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. “Spanbert: Improving Pre-Training by Representing and Predicting Spans.” Transactions of the Association for Computational Linguistics 8:64–77. https://doi.org/10.1162/tacl_a_00300.
Kumar, Abhinav, Jyoti Prakash Singh, and Sunil Saumya. 2019. “A Comparative Analysis of Machine Learning Techniques for Disaster-Related Tweet Classification.” In 2019 IEEE R10 Humanitarian Technology Conference (R10-HTC)(47129), 222–227. IEEE.
Lam, Nina S. N., Michelle Meyer, Margaret Reams, Seungwon Yang, Kisung Lee, Lei Zou, Volodymyr Mihunov, et al. 2023. “Improving Social Media Use for Disaster Resilience: Challenges and Strategies.” International Journal of Digital Earth 16 (1): 3023–3044. https://doi.org/10.1080/17538947.2023.2239768.
Lamsal, Rabindra, Aaron Harwood, and Maria Rodriguez Read. 2022. “Socially Enhanced Situation Awareness From Microblogs Using Artificial Intelligence: A Survey.” ACM Computing Surveys 55 (4): 1–38. https://doi.org/10.1145/3524498.
Lan, Zhenzhong, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. “Albert: A Lite BERT for Self-Supervised Learning of Language Representations.” arXiv preprint arXiv:1909.11942: 1–17.
Li, Zhenlong, Qunying Huang, and Christopher T. Emrich. 2020. “Introduction to Social Sensing and Big Data Computing for Disaster Management.” In Social Sensing and Big Data Computing for Disaster Management, 1–7. Routledge.
Li, Shaopan, Yan Wang, Hong Huang, Lida Huang, and Yang Chen. 2023. “Study on Typhoon Disaster Assessment by Mining Data from Social Media Based on Artificial Neural Network.” Natural Hazards 116 (2): 2069–2089. https://doi.org/10.1007/s11069-022-05754-5.
Liu, Nelson F., Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. “Linguistic Knowledge and Transferability of Contextual Representations.” arXiv preprint arXiv:1903.08855: 1–22.
Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, et al. 2019. “Roberta: A Robustly Optimized Bert Pretraining Approach.” arXiv preprint arXiv:1907.11692: 1–13.
Mehta, Meet, Dhruv Gada, Riddhi Sharma, Khushi Chavan, and Pratik Kanani. 2022. “Offense Detection Using BERT and CNN.” In 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), 1–5. IEEE.
Ngamassi, Louis, Hesam Shahriari, Thiagarajan Ramakrishnan, and Shahedur Rahman. 2022. “Text Mining Hurricane Harvey Tweet Data: Lessons Learned and Policy Recommendations.” International Journal of Disaster Risk Reduction 70:102753. https://doi.org/10.1016/j.ijdrr.2021.102753.
Ogie, R. I., S. James, A. Moore, T. Dilworth, M. Amirghasemi, and J. Whittaker. 2022. “Social Media Use in Disaster Recovery: A Systematic Literature Review.” International Journal of Disaster Risk Reduction 70:102783. https://doi.org/10.1016/j.ijdrr.2022.102783.
Rodríguez, Oriol, Joan Bech, Juan de Dios Soriano, Delia Gutiérrez, and Salvador Castán. 2020. “A Methodology to Conduct Wind Damage Field Surveys for High-Impact Weather Events of Convective Origin.” Natural Hazards and Earth System Sciences 20 (5): 1513–1531. https://doi.org/10.5194/nhess-20-1513-2020.
Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter.” arXiv preprint arXiv:1910.01108: 1–5.
Saroj, Anita, and Sukomal Pal. 2020. “Use of Social Media in Crisis Management: A Survey.” International Journal of Disaster Risk Reduction 48:101584. https://doi.org/10.1016/j.ijdrr.2020.101584.
Senaratne, Hansi, Martin Mühlbauer, Stephan Götzer, Torsten Riedlinger, and Hannes Taubenböck. 2023. “Detecting Crisis Events From Unstructured Text Data Using Signal Words As Crisis Determinants.” International Journal of Digital Earth 16 (2): 4601–4620. https://doi.org/10.1080/17538947.2023.2278714.
Tim, Yenni, Shan L. Pan, Peter Ractham, and Laddawan Kaewkitipong. 2017. “Digitally Enabled Disaster Response: The Emergence of Social Media As Boundary Objects in a Flooding Disaster.” Information Systems Journal 27 (2): 197–232. https://doi.org/10.1111/isj.v27.2.
Vongkusolkit, Jirapa, and Qunying Huang. 2021. “Situational Awareness Extraction: A Comprehensive Review of Social Media Data Classification During Natural Hazards.” Annals of GIS 27 (1): 5–28. https://doi.org/10.1080/19475683.2020.1817146.
Wan, Chang-Xuan, and Bo Li. 2022. “Financial Causal Sentence Recognition Based on BERT-CNN Text Classification.” The Journal of Supercomputing 78: 6503–6527. https://doi.org/10.1007/s11227-021-04097-5.
Wu, Desheng, and Yiwen Cui. 2018. “Disaster Early Warning and Damage Assessment Analysis Using Social Media Data and Geo-Location Information.” Decision Support Systems 111:48–59. https://doi.org/10.1016/j.dss.2018.04.005.
Yang, Tengfei, Jibo Xie, Guoqing Li, Naixia Mou, Zhenyu Li, Chuanzhao Tian, and Jing Zhao. 2019. “Social Media Big Data Mining and Spatio-Temporal Analysis on Public Emotions for Disaster Mitigation.” ISPRS International Journal of Geo-Information 8 (1): 29. https://doi.org/10.3390/ijgi8010029.
Yu, Manzhu, Qunying Huang, Han Qin, Chris Scheele, and Chaowei Yang. 2020. “Deep Learning for Real-Time Social Media Text Classification for Situation Awareness–Using Hurricanes Sandy, Harvey, and Irma as Case Studies.” In Social Sensing and Big Data Computing for Disaster Management, 33–50. Routledge.
Zhang, Ting, Shi Shen, Changxiu Cheng, Kai Su, and Xiangxue Zhang. 2021. “A Topic Model Based Framework for Identifying the Distribution of Demand for Relief Supplies Using Social Media Data.” International Journal of Geographical Information Science 35 (11): 2216–2237. https://doi.org/10.1080/13658816.2020.1869746.
Zhou, Chengle, Zhi He, Anjun Lou, and Antonio Plaza. 2024. “RGB-to-HSV: A Frequency-Spectrum Unfolding Network for Spectral Super-Resolution of RGB Videos.” IEEE Transactions on Geoscience and Remote Sensing 62: 1–18.
Zhou, Bing, Lei Zou, Ali Mostafavi, Binbin Lin, Mingzheng Yang, Nasir Gharaibeh, Heng Cai, Joynal Abedin, and Debayan Mandal. 2022. “VictimFinder: Harvesting Rescue Requests in Disaster Response from Social Media with BERT.” Computers, Environment and Urban Systems 95:101824.
Zou, Lei, Nina S. N. Lam, Shayan Shams, Heng Cai, Michelle A. Meyer, Seungwon Yang, Kisung Lee, Seung-Jong Park, and Margaret A. Reams. 2018. “Social and Geographical Disparities in Twitter Use During Hurricane Harvey.” International Journal of Digital Earth 12 (11): 1300–1318. https://doi.org/10.1080/17538947.2018.1545878.
