Exploring the efficacy and reliability of automatic text summarisation systems: Arabic texts in focus


Abstract

This study compared the salient features of the three basic types of automatic text summarisation methods (ATSMs), namely extractive, abstractive, and real-time, along with the available approaches used for each type. The data set comprised 12 reports on current issues in automatic text summarisation methods and techniques across languages, with a special focus on Arabic, whose structure has been widely claimed to be problematic for most ATSMs. Three main summarizers were compared: TAAM, OTExtSum, and OntoRealSumm. In addition, a human-produced summary of the data set was prepared and then compared to the automatically generated summary. A 10-item questionnaire was built to help with the assessment of the target ATSMs. Also, ROUGE analysis was performed to assess the efficacy of all techniques in minimising the redundancy of the data set. Findings showed that the precision of the target summarizers differed considerably, as 80% of the data set has been proven to be aware of the problems underlying ATSMs. The remaining parameters were in the normal range (65–75%). In light of the equation-based assessment of ATSMs, the highest range was noted with the removal of stop words, and the lowest range was noted with POS tagging, stem weight, and stem collection. Regarding Arabic, statistical analysis proved to be the most effective summarisation method (accuracy = 57.59%; recall = 58.79%; F-Value = 57.99%). Further research is required to explore how the lexicogrammatical nature of languages and generic text structure would affect the text summarisation process.


1. Introduction

The advent of the 21st century has increased the penetration of the internet into human lives. The ease with which information can be submitted to and extracted from the internet has given birth to an information boom, viz., a huge range of information is available on the internet, and a single click can produce thousands of results on a single topic in less than a second. Such rapidly growing information databases have prompted the idea of summarising vast amounts of data to ease the extraction of useful information (Allahyari et al., 2017). The idea of developing automatic summarisation software with the capacity to summarise large volumes of data with a high degree of precision in a short time has gained marked popularity (Lin et al., 2018). The process of shortening or abstracting a specific data set using natural language processing methods and computational techniques with the primary goal of representing the essential information in the original texts is referred to as automatic summarisation.

Historically speaking, in the mid-20th century, as the data were summarised by simple statistical techniques, the output summaries had the flavour of the original text (Qaroush et al., 2021). Notable progress has been made since then, and various forms of summarisation techniques have emerged. Initially, single-document summarisation was introduced. In this method, all the detailed data present in a single document were summarised and converted into a short data set (De Maio et al., 2016). Later on, multi-document summarisation methods gained popularity. In these methods, multiple documents on different topics were assessed and summarised to produce a concise data set while maintaining the originality of the text (John et al., 2017). These summarisation methods also began to make their mark in the Arabic literary world. Many techniques of text summarisation gained popularity, and the use of methods based on machine learning was admired by many (Dutta et al., 2018). In this regard, current research acknowledges two major text summarisation techniques: extractive and abstractive.

Extractive summarisation techniques create data summaries by identifying salient information through the extraction of words and sentences from the original text (Verma & Om, 2019). Though the extractive summarisation technique manages to yield the highly significant content of actual documents by neglecting repetitions and redundancies, it can simultaneously miss the gist of the main text, and the resulting summary sometimes contains semantic mistakes as well (Khan et al., 2016). In contrast, abstractive summarisation techniques consider all the ideas and themes discussed in the entire document by developing an internal semantic representation to produce a summary with many new words that are usually not found in the original text (Song et al., 2018). In addition, advances in technology have given birth to a new method of automatic summarisation called “real-time summarisation”, which has broadened the range of summarisation techniques in advanced literature studies (Lin & Ng, 2019).

Because of the exponential growth of information driven by globalization and digitalization, it is time-consuming to monitor all published posts that relate to an object or describe how an event evolves over time. This practice could also overwhelm users with postings that are redundant and useless. In many cases involving unpredicted occurrences, such as an earthquake or an act of terrorism, users look for updates released over time (Yang et al., 2022). Multi-document summarising, which can summarise either full document sets or individual documents in the context of others that have already been summarised, is likely to be crucial in these circumstances. Multi-document summaries should, ideally, include both the essential information that is shared by all the documents and is pertinent to the user’s query, as well as additional information that is specific to some of the individual documents (Nathani et al., 2020).

Given the scope of the current study, the purpose of a text summary is to provide the reader with a reduced version of the information in a document, especially in Arabic documents (Ren et al., 2013). Data compression is required since all information sources, including news, biographical information, and historical information, include a large volume of difficult-to-read data. The amount of digital information exchanged in such a digital age is enormous. Therefore, it is essential to create machine learning algorithms that can automatically condense longer scripts and create precise summaries that the target audience can easily understand (Goldstein et al., 2011).

As the use of summarisation techniques became more common, diverse challenges emerged. One significant challenge is the repetition of data in summaries, in addition to the high level of redundancy that characterizes many automatically summarised texts. Moreover, the lack of reasonable background knowledge on the part of those who conduct the summarisation process makes reaching a high reduction rate a real challenge. A third persistent challenge is the assessment of the quality of the available text summarisers (Hahn & Mani, 2000; Jezek & Steinberger, 2008). One final challenge is the lack of scholarly consensus regarding how to construct a precise summary. Still, scientists began to find possible solutions to overcome these issues, and some promising advances have now been made (Widjanarko et al., 2018).

Although a plethora of studies have aimed at offering solutions to many of the previously mentioned challenges, there is a dearth of research on assessing the precision and accuracy of available automatic summarisation techniques. Therefore, the present study seeks to answer two major questions: (1) What are the procedures followed in recurrent automatic text summarisation methods and techniques? And (2) what are the requirements that must be met by an automatic summarisation method or technique to offer high-quality summaries of texts in different languages, especially Arabic? Towards this end, the study adopted a theoretical framework that comprises the two theoretical concepts of “automatic summarisation” and “summarisation techniques and methods” in a way that tests the following hypothesis: A comprehensive comparison of all available methods and techniques of automatic summarisation of Arabic documents offers an empirical assessment of the quality of the output summaries.

By answering these two questions and testing the previous hypothesis, the present study seeks to offer a comprehensive description of the automatic text summarisation processing underlying extractive, abstractive, and hybrid methods. Also, it aims at highlighting the technical requirements for improving the quality of the output summaries. In so doing, it contributes to the existing body of knowledge on the salient ATSMs. Indeed, few studies have compared the current ATSMs on different data sets in various languages, including Arabic. We claim that the methodology used for comparing such methods is all-inclusive, as it covers the stages of pre-processing, weight extraction, and stem classification. To verify the accuracy of the comparison process, such methods are compared in light of diverse yet complementary parameters, including the implemented equations, precision/recall, and the target database.

The remainder of the present study is structured as follows: Section 2 offers a review of related literature addressing diverse automatic summarisation techniques across languages. Section 3 describes the study method in terms of data collection and analysis procedure. Section 4 discusses the major findings. Section 5 summarizes the study and offers recommendations for further research.

2. Literature review

The unprecedented growth of online information has driven a parallel development of summarisation techniques. Although automatic text summarisation is a relatively recent discipline in natural language processing, considerable efforts have been made to develop text summarizers that offer highly precise summaries, especially for grammatically complex languages such as Arabic.

For instance, Al Qassem et al. (2017) explored the major difficulties in summarizing Arabic texts by surveying available summarisation methods and techniques. Findings highlighted the lack of a gold-standard Arabic corpus that would help with comparing current ATS methods and techniques. The existence of many dialects and the morphological complexity of Arabic were among the challenges of Arabic ATS methods. Moreover, findings affirmed that hybrid summarisation systems outperformed numerical systems and systems based on Rhetorical Structure Theory. Similar findings could be traced in Lagrini et al. (2018). In the face of common challenges of Arabic ATS methods, especially those related to the semantic component of the output summaries, Alkhudari (2020) developed an approach that considers all the semantically relevant aspects of data. Using the ROUGE measure, the developed approach proved effective in retrieving semantically relevant data.

Shimpikar and Govilkar (2017) were mainly concerned with approaching the mechanism of extractive techniques that are principally used to create a summary from a variety of text resources. In extractive techniques, a resonance method was used to determine the importance of a given phrase based on its weight and the frequency of its terms. They evaluated the accuracy of such techniques using a sequence matching technique to highlight the correlation between the related content and the overview.

When it comes to extractive summarising tasks, especially in the legal domain, the current deep learning approaches mostly focus on context-free word embedding. Such approaches are unable to obtain a deeper semantic comprehension of the text, and therefore they ultimately have a negative impact on summary performance. Mohan Kalyan et al. (2021) argued that the deep contextualized embeddings-based approach BERTSLCA is significantly effective in achieving the extractive summarization goal. The fundamental inspiration for the model came from the BERT variation BERTSUM. Initially, BERTSUM was used to receive the documents in order to obtain sentence-level word embedding in the Arabic language. Then, using a bidirectional long short-term memory (Bi-LSTM) unit, an extraction architecture was designed to capture the long-range dependencies among sentences. Finally, to improve the ability to recognise the significance of various sentences, an attention mechanism was added. Experimental findings on data from public interest lawsuits and the CAIL 2020 dataset showed that the suggested technique performs competitively in all cases.

With the increasing advancements in social media applications, Twitter and other microblogging platforms have grown to be crucial sources of situational awareness during emergencies. Thus, summarising microblogs posted during emergencies has grown to be a significant focus of research in recent years. A number of summary methods have been proposed for document summarisation in general and for summarising microblogs in particular. However, few studies have investigated which algorithms are better suited for summarising microblogs generated after mishaps. For instance, El-Kassas et al. (2021) assessed and contrasted the effectiveness of eight different kinds of extractive summarisation algorithms (ESA) used to summarise microblogs published during emergency situations. Findings affirmed the existence of different challenges to ESA covering multi-document summarisation (e.g., incoherent reference), user-specific summarisation (e.g., multilingual documents), and applications of text summarisation (e.g., lengthy documents).

Information dissemination on real-time events is a trending research area nowadays, and therefore automatic summarisation methods are becoming more popular, especially through the use of social media. This trend makes it necessary to summarise all pertinent tweets in order to rapidly and accurately grasp real-time occurrences. For instance, Rudrapal et al. (2019) suggested a two-phase summarisation method to generate an abstract summary of any social media event. The method begins by identifying essential information related to events and then explores the partial textual entailment (PTE) relationship between phrases to remove as much redundant information as possible. Next, an abstract summary is created using the key sentences that are least redundant. Findings showed that the proposed strategy performed better than both the baseline and cutting-edge event summarising approaches.

Likewise, Chellal et al. (2016) proposed a novel tweet summarising method (NTSM) in which the decision to select an incoming tweet is made as soon as the tweet becomes available. The suggested method calculates thresholds for making the decision, in contrast to conventional systems where criteria are predefined. Three criteria (the information received, its novelty, and its applicability in light of the user’s interests) are combined as a condition to determine which tweets are selected. Only tweets that meet a computational model criterion in terms of information content and novelty are included in the summary.

By addressing the various ATS components, which may include the building blocks, approaches, data sets, methods, procedures, different evaluation techniques, and directions for future research, the present study seeks to offer a comprehensive analysis of the most recurrent ATS methods so as to equip researchers with the means for more precise implementations of such methods, with a particular emphasis on Arabic.

3. Theoretical underpinnings

Nowadays, internet users are able to access a variety of data sources, not all of which contain essential information. As a result, it is practically impossible to find the most relevant and significant information among the large amount of data. In order to fully comprehend a text, attention should be paid to the pertinent information that is contained throughout a complete data set. Consequently, text summarisation was offered as an effective process to handle the non-stop accumulation of data over vast resources. Since the process of text summarisation aims at producing pertinent and important text from the gamut of information resources, all websites, text documents, and user-provided direct text feeds are regarded as text resources. The result is a readable summary containing all relevant information in a document while occupying considerably less space than the original document (Gambhir & Gupta, 2016). Hovy and Lin (1996) define an automatic summary as a text produced by software that is cohesive and incorporates a substantial amount of pertinent information from the original text. Its compression ratio is less than one-third of the original document’s length.

Yet, the generation of automatic summaries is not simple in practice, as diverse challenges emerge. The key challenges include redundancy (especially when summarising multiple documents), coreference, temporal dimension, and order of sentences (Goldstein et al., 2000). One more challenge is the rapid growth of information technologies and the enormous volume of written information (e.g., news stories, academic papers, legal documents, banking, education, social media, etc.) that is generated online and stored in numerous archives. In view of such challenges, robust automatic text summarisation (ATS) is becoming crucial. The automation of the process of text summarisation required the development of computational (NLP) methods by means of algorithms to create a subset or a summary outlining the key and significant information in a document or multiple documents.

Since the 1950s, scientists have been working to develop ATS techniques and procedures. So far, there are three basic types of ATS techniques: extractive summarisation (ES), abstractive summarisation (AS), and hybrid summarisation (HS; Figure 1). Such classification is based on the summarisation method.

Figure 1. Main text summarisation approaches (El-Kassas et al., 2020).

The extractive summarisation techniques choose the key relevant phrases from the source document(s) and join them to create the summary. The abstractive approach, by contrast, creates summaries using phrases that are distinct from the primary text after representing the input document(s) in an intermediate form. In other words, the ideas and concepts of the output summary are reinterpreted in a new form. However, abstractive techniques require considerable natural language processing that renders them very difficult to handle, and therefore, extractive summarisation techniques were developed in an effort to provide summaries that are more cohesive and useful. Notably, the extractive strategy is the main focus of the majority of related studies, and therefore more emphasis should be placed on the implementation of abstractive and hybrid techniques (Sun et al., 2021). Hybrid automatic text summarisation combines both abstractive and extractive methods through the extraction of key relative sentences to generate a new document based on a given corpus (Binwahlan et al., 2010). Moreover, as Gambhir and Gupta (2016) confirm, several extraction methods that implement a number of machine learning techniques have been developed over the course of a decade for automatic summary production. Despite all these diverse approaches, the generated summaries are still markedly different from the summaries produced by humans.

Other classifications of ATS techniques are based on the number of input documents, the nature of the output summary, the summary language, the summarisation algorithms, and the summary content. Based on the number of input documents, summarisation could handle a single document (e.g., a reading comprehension passage) or multiple documents (e.g., customers’ product reviews). As far as the nature of the output summary is concerned, summarizers could be classified into generic (a concise summary of the key information in one or more documents) or query-based, where the summary is adapted based on a user’s query (Van Lierde & Chow, 2019). Additionally, the summary language might be monolingual, multilingual, or cross-lingual, and the content could be either indicative (the general idea of a document), informative (all significant topics in a document), or evaluative (a critique of a particular topic; Bhat et al., 2017). Equally important, the algorithms implemented in the summarisation process could be supervised, where a training phase on sample data is necessary, or unsupervised, where no such training phase is necessary (Alami et al., 2021).

Due to the rapid growth of web pages and the difficulty of finding relevant information, a new type of text summarisation emerged: web-based summarisation. Different search engines such as WebInEssence, Alta Vista, and Google Fast are designed to systematically summarize the important information in several related web pages. Relatedly, the emergence of Web 2.0 applications, especially social media networks, resulted in masses of documents that are too numerous to manage. As a result, sentiment-based analysis techniques were developed, giving rise to opinion mining techniques for data clustering in terms of polarity as well as subjectivity.

Indeed, automatic text summarisation helps to save reading time, select documents when doing research, render indexing more successful, provide personalized information, and increase the number of texts that commercial abstract services can process (Ab & Sunitha, 2020). However, several issues represent real challenges to the process of automatic text summarisation. Such issues include redundancy, the way of constructing a coherent summary, and the assessment of the quality of available summarisation methods. In view of these challenges, considerable attempts have been made to address them. For instance, Sarkar (2010) offered a method for addressing redundancy when summarising multiple documents. Such a method relies on selecting the first sentence in each paragraph, and then other similar sentences are excluded unless they offer new and relevant information. Carbonell and Goldstein’s MMR (Maximal Marginal Relevance) approach is a good example in this regard (Carbonell & Goldstein, 1998).
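To make the redundancy idea concrete, the following is a minimal Python sketch of MMR-style greedy selection. It is an illustration under stated assumptions rather than the implementation used in any of the cited studies: the bag-of-words cosine similarity, the function names `cosine` and `mmr_select`, and the parameters `k` and `lam` are all hypothetical choices. At each step the candidate with the best trade-off between relevance to the query and similarity to the already selected sentences is added.

```python
from collections import Counter
from math import sqrt


def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts using raw word counts (an assumption)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick up to k sentences, trading query relevance against redundancy."""
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((cosine(s, t) for t in selected), default=0.0)
            return lam * cosine(s, query) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a lower `lam`, redundancy is penalised more heavily, so an exact duplicate of an already selected sentence tends to lose against a less relevant but novel sentence.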

4. Methods

The present study strictly follows the procedure of the comparative approach that addresses comparable phenomena represented by three major summarisation techniques: extractive summarisation, abstractive summarisation and real-time summarisation. For each technique, a summarizer is implemented: the topic-aware abstractive text summariser (TAAM), the optimal transport extractive summariser (OTExtSum), and the ontology-based real-time summariser (OntoRealSumm). To compare these three techniques, a questionnaire comprising 10 questions (numbered RQ1 to RQ10) is designed to explore researchers’ evaluations of and reflections on the most current and widely used automatic text summarisation methods (ATSMs). These questions were built on the findings of more than 90 comparative studies addressing automatic text summarisation techniques, with a special focus on summarizers of Arabic documents. Table 1 below shows the data sources used in building the questions of the questionnaire that is the present study’s main instrument.

Table 1. Data sources used for building the study’s questionnaire on ATSMs


After exploring all of the available literature on ATSMs, the questionnaire’s questions were phrased, and the purpose of each question is clarified as displayed in Table 2 below.

Table 2. Alignment of the questionnaire’s questions about ATSMs with their purposes


The data extracted from the sources used to build the questionnaire were collected from online published studies. To guarantee the reliability of the comparison process, we made sure that all the datasets were of the same length and genre. The overall percentage of each question shows the maximum awareness of the automatic text summarisation methods and techniques addressed in the present study. The data set comprised 12 reports on the same current affairs issues. Finally, the comparison process proceeded in three steps: (1) the salient features of the 12 reports were extracted by means of different equations, based on word frequency, title word, sentence length, and length location, (2) a human-produced summary of the data was prepared, in line with all three summarisation techniques, to be compared with the automatically generated summary, and (3) ROUGE analysis was performed to assess the efficacy of all techniques in minimising redundancy. Further to this, and for the purpose of producing the text summary, we followed a three-step framework (Figure 2): (1) all sentences in the documents are pre-processed and extracted, (2) the weight of each sentence is calculated using the location and length of the entire stem, in which the total word count and the different phrases used are calculated for language identification, and (3) sentences are finally categorized and evaluated with the similarity index and lowest and highest value considerations.
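As an illustration of the ROUGE comparison between an automatically generated summary and a human reference, the following Python sketch computes ROUGE-N from clipped n-gram overlap. It is a simplified sketch, not the evaluation code used in the study: whitespace tokenisation, lower-casing, and the function name `rouge_n` are assumptions, and no stemming or Arabic-specific normalisation is applied.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the cat sat", "the cat sat on the mat")` gives a recall of 0.5, because three of the six reference unigrams are covered, and a precision of 1.0.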

Figure 2. Text summarisation framework for languages (Arabic).

5. Analysis and discussion

Following the procedure of analysis stated above, the comparison of the summarisation techniques for Arabic demonstrates that the adjusted PageRank approach is superior to other approaches. LexRank and TextRank are further potential techniques. When the number of techniques was altered, performance improved until it stabilised once the interactions reached 10,000. Using the Alkhalil morphological analysis for Arabic can also improve the performance of the summarisation technique, as demonstrated in Table 3 below. To evaluate the accuracy of the data and the F-distribution (for comparing the set of statistical models), ANOVA (Analysis of Variance) was used.
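For readers unfamiliar with graph-based ranking, the sketch below shows a plain TextRank-style sentence ranking in Python. It is not the adjusted variant compared in the study, whose details are not given here. The Jaccard word overlap used as edge weight, the damping factor of 0.85, the fixed iteration count, and the function name `textrank` are all assumptions made for illustration.

```python
def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over a word-overlap similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(s.lower().split()) for s in sentences]
    # Edge weight: Jaccard word overlap between two different sentences.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and tokens[i] | tokens[j]:
                w[i][j] = len(tokens[i] & tokens[j]) / len(tokens[i] | tokens[j])
    out = [sum(row) for row in w]  # total outgoing weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j])
            for i in range(n)
        ]
    return scores
```

Sentences that share vocabulary with many other sentences accumulate higher scores, and the top-scoring sentences would then be taken as the extract.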

Table 3. Comparison of different summarisation techniques in Arabic language*


5.1. Features extraction

The feature extraction of these summarisation methods is calculated by adopting different basic equations. The first equation (Eq. 1) concerns the frequency of a particular word in a text or in a group of texts. It goes as follows:

$$\text{Word's frequency} = \frac{\text{No. of the word's occurrences in the whole document}}{\text{Total no. of words in the whole document}} \tag{1}$$

Similarly, the second equation (Eq. 2) is implemented to calculate the probability of the title word, i.e., the probability of the sentence having the target word.

$$\text{Title word's probability} = \frac{\text{Title word's count in the whole stem}}{\text{Stem length}} \tag{2}$$

The third equation (Eq. 3) is used to calculate the probability of stem length, i.e., the length of sentences in a text. Such a probability relies on the word pattern, and the result would be the best ranking sentence involving the target word.

$$\text{Stem length's probability} = \frac{\text{Total word count in the stem}}{\text{Word count in the longest stem}} \tag{3}$$

In applying Eq. 1, Eq. 2, and Eq. 3 respectively, we found that the repetitive, unmarked sentences are typically found within the paragraph, whereas the crucial, marked sentences are typically peripheral, i.e., either at the beginning or the end of the paragraph. Further to this, all sentences containing significant quantitative data are marked as crucial to the text’s overall meaning and, therefore, are more likely to appear in the text summary.
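Eq. 1 to Eq. 3 can be sketched in Python as three small functions. This is only an illustrative reading of the equations: it assumes that a "stem" corresponds to a tokenised sentence, as the text suggests, and the function names and the list-of-tokens input format are hypothetical.

```python
def word_frequency(word, document_tokens):
    """Eq. 1: occurrences of `word` divided by the total words in the document."""
    return document_tokens.count(word) / len(document_tokens)


def title_word_probability(sentence_tokens, title_tokens):
    """Eq. 2: title words found in the sentence (stem) divided by its length."""
    title = set(title_tokens)
    return sum(1 for t in sentence_tokens if t in title) / len(sentence_tokens)


def stem_length_probability(sentence_tokens, all_sentences):
    """Eq. 3: sentence length divided by the length of the longest sentence."""
    longest = max(len(s) for s in all_sentences)
    return len(sentence_tokens) / longest
```

For instance, with the title tokens `["arabic", "summarisation"]` and the sentence `["arabic", "summarisation", "is", "hard"]`, Eq. 2 yields 2/4 = 0.5, since two of the four words also appear in the title.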

The fourth equation (Eq. 4) is used to calculate the ratio of the numerical data (e.g., Roman numerals and digits) in light of the length of sentences. This step would help with offering numerical facts in the summary of a text.

$$\text{Numerical data} = \frac{\text{No. of numerical data in the stem}}{\text{Total length of the stem}} \tag{4}$$
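A possible reading of Eq. 4 is sketched below in Python. What counts as "numerical data" is an assumption here: a token is treated as numerical if it consists of digits (optionally with decimal separators or a trailing percent sign) or if it is a well-formed upper-case Roman numeral. The regular expressions and the function name `numerical_data_ratio` are hypothetical.

```python
import re

# Assumption: digits with optional separators and percent sign, e.g. "2020", "57.59%".
# In Python 3, \d also matches Arabic-Indic digits in str patterns.
_DIGITS = re.compile(r"^\d+([.,]\d+)*%?$")
# Assumption: well-formed upper-case Roman numerals only, e.g. "IV", "XII".
_ROMAN = re.compile(
    r"^(?=[MDCLXVI])M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def numerical_data_ratio(sentence_tokens):
    """Eq. 4 (as read here): numerical tokens divided by the sentence length."""
    if not sentence_tokens:
        return 0.0
    hits = sum(1 for t in sentence_tokens if _DIGITS.match(t) or _ROMAN.match(t))
    return hits / len(sentence_tokens)
```

Note that the Roman numeral pattern would also match the English pronoun "I", so a real system would need language-aware filtering.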

All given data were then extracted using different sources, and based on such data, all questions shown in Table 2 were prepared.

Having applied all previous equations to the whole data set represented by summarisation methods, findings showed that the largest recorded ranges were for “removal of stop words” and “stem making”, while the range for stem formation was lower. The lowest values included “stem by metric term”, “words collection”, “POS tagging”, “stem selection”, and “stem segments”. Table 4 offers the range of all the parameters characteristic of the text summarisation methods under investigation.

Table 4. Equation-based assessment of summarisation methods


All the parameters of the sentence-making process and its assessment were also measured by the Hidden Markov Model (HMM). This model distinguishes between summary and non-summary states. The HMM had the following organisational scheme: 2s + 1 states, alternating between s summary states (SS) and s + 1 non-summary states (NSS). Previous studies (e.g., Josef & Ježek, 2009; Shimpikar & Govilkar, 2017) permitted “hesitation” only in non-summary states and “skipping next state” only in summary states. A sample HMM with 7 nodes and s = 3 is shown in Figure 3. The transition matrix estimate, whose elements (i, j) reflect the empirical probability of switching from state i to state j, was formed by the authors using the TREC (Text REtrieval Conference) dataset as the training corpus and the maximum-likelihood estimate for each transition probability. The features were assumed to be multivariate normal as a simplification. The output function for each state was estimated from the training data. Although they made the assumption that each output function had the same covariance matrix, they estimated 2s + 1 means. Evaluation was carried out by contrasting the results with extracts produced by humans.
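The maximum-likelihood estimate of the transition matrix mentioned above amounts to counting observed transitions and normalising each row. The following Python sketch illustrates this under stated assumptions: states are encoded as integers from 0 to 2s, the labelled state sequences are hypothetical training data, and the function name `estimate_transitions` is not taken from the cited work.

```python
def estimate_transitions(state_sequences, num_states):
    """Row-normalised transition counts: entry (i, j) estimates P(next=j | current=i)."""
    counts = [[0] * num_states for _ in range(num_states)]
    for seq in state_sequences:
        # Count each observed step from one state to the next.
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never left in the training data get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

With s = 1 there are three states (non-summary, summary, non-summary). Given the sequences `[0, 0, 1, 2, 2]` and `[0, 1, 2]`, state 0 is followed once by itself and twice by state 1, so the first row of the estimate is [1/3, 2/3, 0].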

Figure 3. The Hidden Markov Model (HMM) for the extraction of three lead sentences.

5.2. Comparison of journals targeting ATSMs

Likewise, we compared all articles that have already been published in different journals on automatic text summarisation methods. To compare the diverse text summarisation methods, the following two equations are used.

$$\text{Precision} = \frac{\text{Correct Order}}{\text{Correct Order} + \text{Wrong Order}}$$

$$\text{Recall} = \frac{\text{Correct Order}}{\text{Correct Order} + \text{Missed Order}}$$
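The two equations translate directly into code. The Python sketch below takes the counts of correct, wrong, and missed items as inputs. The function names are hypothetical, and the `f_value` helper assumes that the F-Value reported in this study is the balanced harmonic mean of precision and recall, which the text does not state explicitly.

```python
def precision(correct, wrong):
    """Correct items divided by all items the system returned."""
    return correct / (correct + wrong) if correct + wrong else 0.0


def recall(correct, missed):
    """Correct items divided by all items that should have been returned."""
    return correct / (correct + missed) if correct + missed else 0.0


def f_value(p, r):
    """Balanced F-measure (assumed): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0
```

For example, 6 correct, 4 wrong, and 2 missed items give a precision of 0.6 and a recall of 0.75.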

The distribution of the articles addressing text summarisation methods across languages (mainly the graph method, rephrasing, and template-based) published in different journals is shown in Table 5. Given the parameters of precision and recall, we found that both IEEE and ACM had the largest number of articles; other journals had fewer publications.

Table 5. Comparison of the precision and recall of summarisation methods across languages including Arabic

Similar findings have been reported for Indian languages. The entire project was based on a syntactic model in which rich semantic graphs were used to identify sentences and languages. Another method of automatic summarisation was also proposed for different languages, in which linguistic processing was used to analyse each sentence through its triples in order to produce an accurate summary (Finkel et al., 2005).

As demonstrated above, most of the previous studies that addressed automatic summarisation methods in different languages employed extractive summarisation techniques. The prior context from the original content is included in the reward systems in the summary-specific continuity. Although these variables predict linguistic quality with excellent accuracy, it should be noted that the supporting context frequently contains less significant content. As a result, there is a conflict between methods for improving content and methods for improving linguistic quality, which calls for the creation of further abstractive approaches (Dernoncourt et al., 2018).

For each database, the following features are specified: the database URL, the database name, the collection of documents, the vocabulary used, the input domain (such as news or blogs), the language used in the database, the number of documents, and whether the data source allows single-document and/or multi-document summarisation. With the exception of the datasets “EASC,” “SummBank,” “CAST,” and “CNN-corpus,” whose features were taken from the relevant papers and webpages, the first five features in Table 6 were taken from Saggion (2009). The document count of a multi-document summarisation dataset is expressed in a form such as “30x10,” which denotes that the dataset has 30 document clusters with roughly 10 documents in each.

Table 6. ATSMs based on standard databases

Our comparison showed a considerable negative correlation between discourse connectives and structure/coherence. This can be explained by the fact that an extracted summary frequently lacks the necessary context. An abstractive system, however, may organise a discourse structure and include the proper connectives (Karmakar et al., 2015). The good prediction accuracy for the data analysis and the even higher precision for the performance assessment show that questions about the language quality of summaries can be answered systematically with the help of current computational approaches. Automatic assessment will make testing easier throughout system development, and data obtained outside of NIST evaluation cycles can also be reported (Martschat & Markert, 2018). An evaluation process is required to understand the summarisation methods, and all of these methods are verified even after the computer evaluation. The evaluation may be carried out with different techniques, using a suitable formula based on the correct and wrong order of the sentences (Pitler et al., 2010; Tas & Kiyani, 2017). In this regard, Elbarougy et al. (2020) affirmed that the evaluation metrics for timeline summarisation (TLS) are inadequate. They identified the flaws of these metrics through a formal and empirical analysis of the literature, and therefore created a family of alignment-based ROUGE variations that are specific to TLS. Using a set of task-specific tests, they found that these measures behaved as anticipated.
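
To make the ROUGE-style comparison between an automatic summary and a human reference more concrete, a simplified n-gram recall can be sketched in Python as follows. This is only an approximation under stated assumptions: it uses lower-casing and whitespace tokenisation, it is not the official ROUGE toolkit or the alignment-based variants mentioned above, and the function name `rouge_n_recall` and the example sentences are invented for illustration.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (clipped counts)."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"   # hypothetical human summary
candidate = "the cat lay on the mat"   # hypothetical system summary
r1 = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 5 bigrams matched
print(round(r1, 3), round(r2, 3))
```

For Arabic texts, a realistic version would need language-specific tokenisation and stemming before counting n-grams.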

All of the previous methodologies in the field of Arabic text summarisation work in a similar way, and the findings of previous studies were similar in several respects. The questionnaire method is based on determining which features from the dataset are most appropriate to use in summarising. This step can be accomplished by selecting features, discovering new features, maximising frequently used features, feature engineering, using features for semantics and linguistics, finding features to produce coherent sentences, and adding grammatical features. The problem is that any dataset needs to be pre-processed using proper stemming in addition to POS tagging in order to categorise word classes such as nouns, verbs, and adjectives, and in order to prevent word removal and tokenizing. Equally important, it is quite difficult to experiment with collaborative statistical techniques and machine learning for extracting summaries. The most recent methods may include the Hidden Markov Model (HMM) and equation-based assessment for abstractive summarisation in various situations, in order to enhance these models and their performance by supporting their coherence.

6. Conclusion and future research

This study aimed at fulfilling two main objectives: (a) describing and comparing the procedures followed in recurrent automatic text summarisation methods and techniques; and (b) outlining the requirements that an automatic summarisation method or technique must meet to offer high-quality summaries of texts in different languages, including Arabic. Regarding the first objective, our comparative method proved effective in outlining the procedures followed in extractive (OTExtSum), abstractive (TAAM), and real-time (OntoRealSumm) summarisation techniques. Regarding the second objective, the study affirmed that each summarisation technique requires different processing measures in order to offer high-quality textual summaries in different languages, including Arabic. As far as Arabic is concerned, the statistical analysis proved to be the most effective summarisation method (accuracy = 57.59%; recall = 58.79%; F-Value = 57.99%). Furthermore, the abstractive method offered more reliable summaries than the extractive method, as it managed to organise the discursive structure of the target documents. However, our assessment highlighted the limitations of the current techniques and algorithms. Further research is required to explore how the lexicogrammatical nature of languages and generic text structure would affect the text summarisation process. In addition, building an Arabic gold-standard corpus would support a more refined and systematic comparison of the current ATSMs.

Acknowledgements

This study is supported via funding from Prince Sattam bin Abdulaziz University, project number (PSAU/2023/R/1444).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Ab, A., & Sunitha, C. (2020). An overview on document summarisation techniques. International Journal of Recent Advances in Engineering & Technology, 8(3), 31–14. https://doi.org/10.46564/ijraet.2020.v08i03.007
Al Qassem, L. M., Wang, D., Al Mahmoud, Z., Barada, H., Al-Rubaie, A., & Almoosa, N. I. (2017). Automatic Arabic summarisation: A survey of methodologies and systems. Procedia Computer Science, 117, 10–18. https://doi.org/10.1016/j.procs.2017.10.088
Alami, N., Meknassi, M., En-nahnahi, N., El Adlouni, Y., & Ammor, O. (2021). Unsupervised neural networks for automatic Arabic text summarisation using document clustering and topic modeling. Expert Systems with Applications, 172, 114652. https://doi.org/10.1016/j.eswa.2021.114652
Alkhudari, A. (2020). Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis. International Journal of Engineering & Technology, 9(2), 342. https://doi.org/10.14419/ijet.v9i2.30324
Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E. D., Gutierrez, J. B., & Kochut, K. (2017). Using text summarisation methods: A brief survey. International Journal of Advanced Computational Sciences, 8, 1784–1799. https://doi.org/10.48550/arXiv.1707.02268
Bhat, I. K., Mohd, M., & Hashmy, R. (2017). SumItUp: A hybrid single-document text Summarizer. Advances in Intelligent Systems and Computing, 619–634. https://doi.org/10.1007/978-981-10-5687-1_56
Binwahlan, M. S., Salim, N., & Suanmali, L. (2010). Fuzzy swarm diversity hybrid model for text summarisation. Information Processing & Management, 46(5), 571–588. https://doi.org/10.1016/j.ipm.2010.03.004
Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. https://doi.org/10.1145/290941.291025
Chellal, A., Boughanem, M., & Dousset, B. (2016). Multi-criterion real time tweet summarisation based upon adaptive threshold. 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI). https://doi.org/10.1109/wi.2016.0045
De Maio, C., Fenza, G., Loia, V., & Parente, M. (2016). Time aware knowledge extraction for microblog summarisation on Twitter. Information Fusion, 28, 60–74. https://doi.org/10.1016/j.inffus.2015.06.004
Dernoncourt, F., Ghassemi, M., & Chang, W. (2018). A repository of corpora for summarisation. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 55–62.
Dutta, S., Chandra, V., Mehra, K., Ghatak, S., Das, A. K., & Ghosh, S. (2018). Summarizing Microblogs during emergency events: A comparison of extractive summarisation algorithms. Advances in Intelligent Systems and Computing, 859–872. https://doi.org/10.1007/978-981-13-1498-8_76
Elbarougy, R., Behery, G., & El Khatib, A. (2020). Extractive Arabic text summarisation using modified PageRank algorithm. Egyptian Informatics Journal, 21(2), 73–81. https://doi.org/10.1016/j.eij.2019.11.001
El-Kassas, W. S., Salama, C. R., Rafea, A. A., & Mohamed, H. K. (2021). Automatic text summarization: A comprehensive survey. Expert Systems with Applications, 165, 113679. https://doi.org/10.1016/j.eswa.2020.113679
Finkel, J. R., Grenager, T., & Manning, C. (2005). Incorporating non-local information into information extraction systems by gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Ann Arbor, Michigan, 363–370.
Gambhir, M., & Gupta, V. (2016). Recent automatic text summarisation techniques: A survey. Artificial Intelligence Review, 47(1), 1–66. https://doi.org/10.1007/s10462-016-9475-9
Goldstein, J., Mittal, V., & Carbonell, J. (2011). Sentence-based multi-document summarisation. Journal of American Society for Information Sciences, 21, 187–194.
Goldstein, J., Mittal, V., Carbonell, J., & Kantrowitz, M. (2000). Multi-document summarisation by sentence extraction. NAACL-ANLP 2000 Workshop on Automatic Summarisation. https://doi.org/10.3115/1567564.1567569
Hahn, U., & Mani, I. (2000). The challenges of automatic summarisation. Computer, 33(11), 29–36. https://doi.org/10.1109/2.881692
Hovy, E., & Lin, C. (1996). Automated text summarisation and the summarist system. Proceedings of a Workshop on Held at Baltimore, Maryland October, 13–15. https://doi.org/10.3115/1119089.1119121
Jezek, K., & Steinberger, J. (2008). Automatic summarizing: (The state of the art 2007 and new challenges). Znalosti, (1–12).
John, A., Premjith, P., & Wilscy, M. (2017). Extractive multi-document summarisation using population-based multicriteria optimization. Expert Systems with Applications, 86, 385–397. https://doi.org/10.1016/j.eswa.2017.05.075
Josef, S., & Ježek, K. (2009). Evaluation measures for text summarisation. Computing and Informatics, 28, 251–275.
Karmakar, S., Lad, T., & Chothani, H. (2015). A review paper on extractive techniques of text summarisation. International Research Journal of Computer Science (IRJCS), 1, 2.
Khan, A., Salim, N., & Farman, H. (2016). Clustered genetic semantic graph approach for multi-document abstractive summarisation. 2016 International Conference on Intelligent Systems Engineering (ICISE). https://doi.org/10.1109/intelse.2016.7475163
Lagrini, S., Redjimi, M., & Azizi, N. (2018). A survey of extractive Arabic text summarisation approaches. Communications in Computer and Information Science, 159–171. https://doi.org/10.1007/978-3-319-73500-9_12
Lin, H., & Ng, V. (2019). Abstractive summarisation: A survey of the state of the art. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 9815–9822. https://doi.org/10.1609/aaai.v33i01.33019815
Lin, L., Lin, C., & Lai, Y. (2018). Realtime event summarisation from tweets with inconsistency detection. Conceptual Modeling, 555–570. https://doi.org/10.1007/978-3-030-00847-5_41
Martschat, S., & Markert, K. (2018). A temporally sensitive Submodularity framework for timeline summarisation. Proceedings of the 22nd Conference on Computational Natural Language Learning. https://doi.org/10.18653/v1/k18-1023
Mohan Kalyan, V., Santhaiah, C., Naga Sri Nikhil, M., Jithendra, J., Deepthi, Y., & Krishna Rao, N. V. (2021). Extractive summarisation using frequency driven approach. Machine Learning Technologies and Applications, 183–191. https://doi.org/10.1007/978-981-33-4046-6_18
Nathani, B., Joshi, N., & Purohit, G. (2020). Design and development of unsupervised stemmer for Sindhi language. Procedia Computer Science, 167, 1920–1927.
Pitler, E., Louis, A., & Nenkova, A. (2010). Multi-document summarisation automatic linguistic quality evaluation. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 544–554.
Qaroush, A., Abu Farha, I., Ghanem, W., Washaha, M., & Maali, E. (2021). An efficient single document Arabic text summarisation using a combination of statistical and semantic features. Journal of King Saud University - Computer and Information Sciences, 33(6), 677–692.
Ren, Z., Liang, S., Meij, E., & Rijke, M. (2013). Time-aware tweet summary that is personalised. 36th International Conference on Research and Development in Information Retrieval, ser. SIGIR ‘13, New York, ACM, 513–522.
Rudrapal, D., Das, A., & Bhattacharya, B. (2019). A new approach for Twitter event summarisation based on sentence identification and partial textual entailment. Computación y Sistemas, 23, 3. https://doi.org/10.13053/cys-23-3-3275
Saggion, H. (2009). A classification algorithm for predicting the structure of summaries. Proceedings of the 2009 Workshop on Language Generation and Summarisation, Suntec, Singapore, 31–38.
Sarkar, K. (2010). Syntactic trimming of extracted sentences for improving extractive multi-document summarisation. Journal of Computing, 2, 177–184.
Shimpikar, S., & Govilkar, S. (2017). A study of Indian regional language text summarising methods. International Journal of Computer Applications, 165(11), 29–33. https://doi.org/10.5120/ijca2017914083
Song, S., Huang, H., & Ruan, T. (2018). Abstractive text summarisation using LSTM-CNN based deep learning. Multimedia Tools and Applications, 78(1), 857–875. https://doi.org/10.1007/s11042-018-5749-3
Sun, Y., Yang, F., Wang, X., & Dong, H. (2021). Automatic generation of the draft procuratorial suggestions based on an extractive summarisation method: Bertslca. Mathematical Problems in Engineering, 2021, 1–12. https://doi.org/10.1155/2021/3591894
Tas, O., & Kiyani, F. (2017). A survey automatic text summarisation. Press Academia Procedia, 2nd World Conference on Technology, Innovation and Entrepreneurship, Istanbul, Turkey, 5(1), 204–213. https://doi.org/10.17261/Pressacademia.2017.591
Van Lierde, H., & Chow, T. W. S. (2019). Query-oriented text summarisation based on hypergraph transversals. Information Processing & Management, 56(4), 1317–1338. https://doi.org/10.1016/j.ipm.2019.03.003
Verma, P., & Om, H. (2019). MCRMR: Maximum coverage and relevancy with minimal redundancy based multi-document summarisation. Expert Systems with Applications, 120, 43–56. https://doi.org/10.1016/j.eswa.2018.11.022
Widjanarko, A., Kusumaningrum, R., & Surarso, B. (2018). Multi document summarisation for the Indonesian language based on latent dirichlet allocation and significance sentence. 2018 International Conference on Information and Communications Technology (ICOIACT). https://doi.org/10.1109/icoiact.2018.8350668
Yang, S., Yim, J., Kim, J., & Shin, H. V. (2022). CatchLive: Real-time summarisation of live streams with stream content and interaction data. CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3517461
