Research Article

An Event Log Repair Method Based on Masked Transformer Model

Article: 2346059 | Received 01 Sep 2023, Accepted 15 Apr 2024, Published online: 14 May 2024

ABSTRACT

The effectiveness of business process analysis heavily relies on the quality of event logs. However, the presence of outliers and missing values often compromises the integrity of event logs, consequently exerting adverse effects on process analysis and associated decision-making. Existing log repair research mainly focuses on model-based reconstruction of missing activities, whereas few efforts approach the problem from the perspective of predicting missing activities from their surrounding context. This paper introduces a log repair approach based on a masked Transformer, which innovatively combines the self-attention mechanism of Transformers with the task of event log repair. First, by employing various masking strategies, we simulate diverse low-quality event log scenarios that may occur in practical situations. Subsequently, a Masked Language Model is trained on the preprocessed datasets to predict masked activities by leveraging contextual information within traces, thereby capturing the behavioral information of activities in the event log. Upon completion of model training, we apply it to real event log data for repair tasks. The proposed approach, originating from the perspective of event logs, does not rely on any a priori knowledge related to the business process models that generate the event logs. Experimental results demonstrate that the masked Transformer-based approach outperforms baseline methods in most event log repair tasks.

Introduction

Process mining emerges as a research discipline situated at the intersection of machine learning, data mining, process modeling, and analysis. The objective is to extract valuable insights from event logs to facilitate the discovery, monitoring, and improvement of real-world processes, particularly those that are not presupposed or predefined. In recent times, process mining has garnered significant attention in both research and practical applications as a methodology encompassing techniques for scrutinizing business processes (Aalst Citation2016). In modern information systems, event logs are readily accessible and applicable across three distinct process mining scenarios: discovery, conformance, and enhancement (Van Der Aalst Citation2011). The reliability of process mining outcomes hinges on high-quality event logs as the foundational data. Nevertheless, in practice, certain errors within event logs often prove unavoidable, especially during the construction of event logs through the integration of multiple heterogeneous data sources or manual log recording. The investigation of methods to enhance event log quality thus becomes imperative, as it directly improves the analytical precision of business processes. The adage “Garbage-in, Garbage-out” holds as much significance for process mining as for other forms of computerized data analysis. In process mining, the ‘input’ consists of event logs (Suriadi, Andrews, and Hofstede et al. Citation2017). Therefore, without ensuring the adequate quality of event logs, they should not be directly utilized for process mining (Fischer, Goel, and Andrews et al. Citation2020). Furthermore, the process models extracted from low-quality event logs may fail to faithfully represent the genuine nature of the business processes. Thus, the repair of poor-quality event logs forms an essential research field within the realm of process mining.

The self-attention mechanism has made remarkable advancements in natural language processing in recent years. The self-attention mechanism is a sequence modeling approach based on the attention mechanism, which effectively captures important information within input sequences. It has found a wide range of applications within natural language processing, image processing, audio processing, and other domains, becoming one of the current research hotspots (Vaswani, Shazeer, and Parmar et al. Citation2017). BERT (Devlin, Chang, and Lee et al. Citation2018) (Bidirectional Encoder Representations from Transformers) is a Masked Language Model (MLM) that leverages both forward and backward contextual information during the pre-training process to predict masked activities within a candidate set. Through this process, BERT can acquire semantic text representations, demonstrates a certain level of error correction capability, and can be used to capture normal patterns in the event log (Guo, Yuan, and Wu Citation2021). Considering that repairing low-quality event logs requires correcting entries that violate the behavioral constraints learned by the model, BERT’s intrinsic error correction capability is inherently applicable to log repair tasks. Additionally, the attention mechanism can enhance the interpretability of the repair results (Hao, Dong, and Wei et al. Citation2021). Inspired by the Masked Language Modeling (MLM) concept in BERT, we propose a Masked Transformer-based method for event log repair. Our method focuses on repairing event logs through Predictive Process Monitoring. The proposed method is based on the Masked Language Model (MLM) from Natural Language Processing (NLP), which is extended to the domain of event log repair. Specifically, our method does not rely on clean datasets for training; similar to previous research (Nguyen et al. Citation2019), it is assumed that the utilized event logs are complete, i.e., free of missing values and outliers. To start with, we utilize a masking mechanism to simulate the acquisition of low-quality event logs in real-world operations. Each masking strategy corresponds to a scenario involving missing values or outliers in low-quality event logs, while the masking ratio represents the proportion of noise present in the low-quality event logs. Subsequently, we employ the self-attention mechanism from the Transformer to process the input traces of low-quality event logs, capturing long-range dependencies between activities. This process allows for parallel computation, thereby enhancing training efficiency. Furthermore, the MLM task effectively utilizes contextual information around the missing portions of event log traces to predict missing values or anomalies in the event logs, ultimately achieving a high-performance event log repair model. The entire model captures the behavioral information among event log activities in an unsupervised manner, without requiring prior knowledge about the processes generating the logs. The application of our method holds relevance across various practical scenarios, including: (1) Business process optimization. Within business processes, the model proves beneficial in predicting and filling in missing activities or unrecorded stages, enhancing the analysis of business process integrity; (2) Information system and record restoration.
In information systems, the model serves to repair log data losses resulting from errors or system failures, thereby maintaining system integrity and reliability; (3) Industrial production and manufacturing. In industrial production, the recording of production processes and activities is crucial. The model assists in repairing missing records caused by equipment malfunctions or other issues, ensuring the integrity and traceability of the production process; (4) Health and medical applications. In the medical field, maintaining the integrity and accuracy of patient diagnosis and treatment records is paramount for medical decision-making and analysis. Leveraging the model to repair potential diagnostic and treatment activities that may be missing due to omissions or errors contributes to upholding the completeness and accuracy of medical records.

The main contributions of this paper are as follows:

(1) We enhance the masking strategy employed in conventional MLM tasks to more accurately replicate the characteristics of low-quality event logs encountered in various real-world scenarios.

(2) We introduce an innovative approach that integrates the self-attention mechanism from Transformers into the event log repair task, presenting a method based on the masked Transformer for log repair. To the best of our knowledge, this is the first application of the Transformer self-attention mechanism to event log repair.

The remaining sections of this paper are structured as follows. Related Work provides an overview of the research status in the field of event log repair. Preliminaries presents the necessary background for this study. The proposed method is detailed in Proposed method. Experimental evaluation describes the experimental setup and shows the results. Finally, Conclusion summarizes the contributions of this paper and discusses future directions for further research.

Related Work

The process mining manifesto (Van Der Aalst et al. Citation2012) highlighted the need for high-quality event logs for process mining. The manifesto describes five maturity levels ranging from one star to five stars. The concept of maturity refers to the level of readiness for conducting process mining analysis. At the lowest level of maturity (1-star), which involves manual event logging, errors in input events (such as incorrect timestamps or activity labels) or missing events may be encountered. At the highest level of maturity (5-star), event logs are considered complete and accurate, as events are automatically recorded by the system (e.g., process-aware information systems). The authors state that event logs rated as 3, 4, or 5-star are suitable for process mining analysis, while 1- and 2-star rated logs are probably not suited for use in a process mining analysis (Wynn and Sadiq Citation2019).

Traditional event log repair methods often start from a model-based perspective, where process discovery algorithms are initially employed to obtain process models from event logs. These models are then utilized for conformance checking to detect abnormal behaviors (Bezerra, Wainer, and van der Aalst Citation2009). In (Van der Aalst and de Medeiros Citation2005), the workflow net is mined using the α-algorithm, and specific pattern recognition methods are employed to detect potential anomalous processes, enabling fraud detection. In (Rogge-Solti, Mans, and van der Aalst et al. Citation2013), a technique based on Stochastic Petri Nets (SPN) is used to handle missing noise: alignment algorithms and Bayesian networks are employed to repair missing activities and timestamps, and probabilistic computations are performed for the activity and timestamp attributes during the repair process. However, due to the interdependencies between these attributes, the computational complexity increases. In (Wang, Song, and Lin et al. Citation2015), a Petri-net-based graph repair method is adopted to rectify inconsistent event names and detect potentially unsound structures. However, this method lacks the capability of automatic repair operations. In (Wang et al. Citation2013), a method for recommending missing activity repairs, which combines a causal net with a branching framework, is proposed. As presented in the literature, this method offers missing activity repair recommendations with a top-k recall rate. Additionally, indexing and pruning techniques are incorporated to enhance the time efficiency of the repair process. In (Bose, Mans, and van der Aalst Citation2013), an analysis is conducted on 27 types of event log quality issues related to four categories of process features that may occur in logs. Inaccuracies (recorded data that do not comply with event log behavioral constraints) and omissions (relevant data not recorded) are identified as the primary causes of event log deterioration. In (Ghionna, Greco, and Guzzo et al. Citation2008), frequent execution pattern discovery is integrated with a clustering-based anomaly detection process, taking into account the statistical characteristics of the log and the constraints associated with the process model. In (Fani Sani, Zelst, and van der Aalst Citation2018), utilizing the frequency of activity occurrences in specific contexts, abnormal behavior is detected in the given event data, and infrequent behaviors can also be removed to improve the discovered models.

Leveraging process models for event log repair allows for utilizing behavioral information within the event logs while providing strong interpretability during the log repair process. However, such methods depend on discovering a usable process model from the event log. In the real world, actual event logs tend to be large and complex, resulting in inefficient or unusable process models being discovered (such as “spaghetti-like” process models) (Chinces and Salomie Citation2015). As a result, several event log repair methods have emerged that do not rely on process models. In (Liu et al. Citation2021), a missing activity repair method based on the activity inheritance relationship in event logs is proposed. It utilizes an activity relation matrix to represent the event log, clusters the traces, and assigns each incomplete trace to the most similar cluster based on distance measures and trace counts. The missing activities in incomplete traces are then repaired based on the activity relationships observed in the complete traces within that cluster. In (Nolle et al. Citation2018), a method based on denoising autoencoders is proposed for detecting anomalies in business process data. This method does not rely on prior knowledge about the process and does not require training on a clean dataset. In (Nguyen and Comuzzi Citation2018), a method is proposed to clean and reconstruct event logs at the attribute level, using autoencoders to improve the quality of process event logs. These methods are fully automated and do not rely on prior knowledge about the process, resulting in accurate predictive outcomes. In (Guo, Guo, and Yang et al. Citation2023), a new weakly-supervised log anomaly detection framework, called LogLG, is proposed for exploring semantic links between keywords in sequences.

Many researchers have attempted to apply various deep-learning methods to event log repair tasks. However, due to the complexity and diversity of process instances in real event logs, traditional deep learning methods often struggle to effectively capture long-term dependencies between events and face issues such as vanishing gradients (LeCun, Bengio, and Hinton Citation2015). Hence, there is a demand for a novel research methodology that can automatically and swiftly rectify missing and abnormal values in event logs without relying on any a priori knowledge about the underlying processes. During the repair process, it should effectively capture the long-term dependencies between activities in the event logs while utilizing the contextual information of missing and anomalous values to capture the behavioral information between activities, thereby achieving superior event log repair. In 2017, Google introduced the Transformer architecture (Vaswani, Shazeer, and Parmar et al. Citation2017), which utilizes self-attention and multi-head attention mechanisms to handle sequential data. It efficiently captures global dependencies while avoiding the issue of gradient vanishing commonly found in traditional recurrent structures. With the growing popularity of the Transformer, numerous pre-trained deep learning models have emerged, such as BERT (Devlin, Chang, and Lee et al. Citation2018), ELMo (Peters et al. Citation2018), and GPT (Radford, Narasimhan, and Salimans et al. Citation2018). These methods have achieved notable success in natural language processing (NLP), primarily owing to the self-attention mechanism and the use of pre-training tasks. These approaches typically employ unsupervised learning to acquire language representations from large-scale text corpora and have demonstrated remarkable performance in downstream NLP tasks. These techniques have also been applied in the field of business process management. In (Bukhsh, Saeed, and Dijkman Citation2021), a business process behavior prediction model based on the Transformer model is proposed, which leverages an attention-based network to learn high-level representations from event logs. In (Moon, Park, and Jeong Citation2021), GPT-2 is used to predict the next activity in the business process and achieves performance above the benchmark. In (Chen, Fang, and Fang Citation2022), a multi-task prediction method based on BERT and transfer learning is proposed. It utilizes the attention mechanism in the Transformer to capture long-term dependencies among activities, resulting in a general representation model for traces. Subsequently, two separate models are defined for the prediction tasks of the next activity and case outcome, respectively. The pre-trained model is fine-tuned using a transfer learning strategy to train the two prediction models. This approach demonstrates excellent performance across different tasks and enables faster application to various prediction tasks, saving significant time and resources. In (van der Aa, Rebmann, and Leopold Citation2021), the detection capability of abnormal behavior is enhanced by leveraging BERT to process textual data in event logs. In (Le and Zhang Citation2021), a Transformer-based classification model is proposed to detect anomalies, which uses a BERT encoder to capture the semantic meaning of the raw log messages and a Transformer to capture contextual information from the log sequences to obtain accurate anomaly detection results. In (Almodovar, Sabrina, and Karimi et al. 
Citation2024), a model for log anomaly detection (LogFiT) built on BERT is proposed, which is trained with masked sentence prediction on normal log data only. When confronted with new log data, the model uses the top-k token prediction accuracy as a threshold to determine whether the log data deviate from normal. In (Guo et al. Citation2024), a unified Transformer-based framework for log anomaly detection (LogFormer) is proposed; it includes a pre-training stage and an adapter-based tuning stage to improve the generalization of the method to multi-domain logs. In (Zang, Guo, and Yang et al. Citation2024), a unified anomaly detection model for multi-system logs, MLAD, is proposed. For this challenging task, MLAD consists of two main components: a Transformer and a Gaussian Mixture Model (GMM). A self-attention mechanism with vector space expansion is designed to learn the internal semantic relationships of multi-system log data, and a GMM is introduced to model the complex distribution of multi-system data. The uncertainty of rare keywords is attended to through the covariance of the GMM, which helps the model avoid the problem of identical shortcuts. In (Guo, Yang, and Liu et al. Citation2023), a large language model, Owl, is proposed that allows anomaly detection on real-time systems and achieves better performance than existing methods.

An evaluation is conducted based on real-world and synthetic event logs, demonstrating the complementary nature of semantic-based anomaly detection to existing frequency-based techniques. In this study, we apply the self-attention mechanism of the Transformer to the task of event log repair and propose a masked Transformer-based approach for log repair. By leveraging the Transformer model’s self-attention mechanism, we can model the interactions between events and learn the behavioral information among them, thereby facilitating the repair of missing values and anomalies in event logs. The entire model operates unsupervised, capturing the behavioral information among event log activities without requiring prior knowledge, such as the composition of the process model that generates the event log.

Preliminaries

Basic Concepts

Process mining techniques are capable of extracting valuable information from event logs. An event log provides detailed information about the activities executed within a single process (Van Der Aalst Citation2011). Within this sub-section, we present the formal definitions of event, trace, and event log. First, an event is a record of the execution of an activity. It contains information such as the activity, resource, and timestamp (Park and Song Citation2020).

Definition 1 (Event (Park and Song Citation2020)). Let ε be the event universe. Events are characterized by various attributes (e.g., activity, originator, timestamp). Let AN be a set of attribute names. For any event e ∈ ε and any attribute name an ∈ AN, π_an(e) is the value of attribute an for event e. If e does not contain the attribute an, then π_an(e) = ⊥.

Each event is associated with a process instance (e.g., a customer, patient, or student), which has its own trace. A trace is a sequence of events, and an event log is a collection of process instances. The formal definition is as follows:

Definition 2 (Trace, Event log (Park and Song Citation2020)). Let ε* be the set of all finite sequences over ε. A trace, σ ∈ ε*, is a finite sequence of events. Each event in a trace appears only once, and time is non-decreasing. For σ = ⟨e1, e2, …, en⟩, hd^k(σ) consists of the first k elements, i.e., hd^k(σ) = ⟨e1, e2, …, ek⟩. Let C be the set of cases. An event log L is a collection of traces, i.e., L = {σ_c | c ∈ C}.

Transformer

The Transformer model is comprised of two essential components: an encoder and a decoder, both employing the multi-head attention mechanism grounded in self-attention. This research specifically directs its attention to the encoder module within the Transformer model, as illustrated in Figure 1. The encoder of the Transformer model consists of multiple identical layers, each composed of two sub-layers: a multi-head self-attention layer and a fully connected feed-forward neural network layer. In the multi-head self-attention layer, the input is a sequence of embedded vectors (e.g., one vector per activity), and the self-attention mechanism is applied to compute context vectors for each position in the sequence. These context vectors can effectively represent the entire sequence, enabling efficient processing of long sequential data. The fully connected feed-forward neural network layer operates on the context vectors obtained from the multi-head self-attention layer, producing output vectors for each position. This layer employs the ReLU activation function, enhancing the model’s capture of non-linear relationships. The output vectors of each layer serve as the input vectors for the next layer, and the final output of the encoder is a sequence of context vectors representing the input sequence. The self-attention mechanism is a computational approach that captures dependencies between different positions within a sequence. In this mechanism, each position can leverage information from all other positions in the sequence to compute its representation. The essence of self-attention is an addressing process where each position has its own query vector, key vector, and value vector, obtained from vector representations derived from the input sequence. The query vector at each position is then dot-multiplied with the key vectors of all positions, followed by normalization using the softmax function to obtain attention weights. Finally, the weighted value vectors are summed to generate the output for that position.

Figure 1. The model architecture of the Transformer Encoder (based on (Vaswani, Shazeer, and Parmar et al. Citation2017)).

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
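For concreteness, the following minimal NumPy sketch (not taken from the paper's implementation) computes scaled dot-product attention exactly as in the formula above; the shapes and random inputs are purely illustrative.

```python
# Illustrative sketch of scaled dot-product attention in NumPy.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns one context vector per position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # similarity of each position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax -> attention weights
    return weights @ V                                    # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8)); K = rng.normal(size=(5, 8)); V = rng.normal(size=(5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)        # (5, 8)
```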

The multi-head attention mechanism allows the model to attend to different subspaces of features simultaneously. In this mechanism, the model first maps the input sequence to multiple subspaces of the same dimension. Within each subspace, the model computes a distinct attention output. Finally, the outputs from different heads are concatenated and transformed through a linear layer to obtain the final output. By incorporating multiple heads, the model can effectively handle diverse types of semantic information and learn feature representations from different perspectives, thereby improving the model’s overall performance.

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^{O},\qquad \text{where } \mathrm{head}_i=\mathrm{Attention}(QW_i^{Q},\,KW_i^{K},\,VW_i^{V})$$
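The multi-head formulation can be exercised with the built-in Keras layer, as in the following illustrative sketch; the sequence length, model dimension, and number of heads are arbitrary demonstration values, not the settings used in this paper.

```python
# Sketch of multi-head self-attention using the built-in Keras layer.
import tensorflow as tf

seq_len, d_model, num_heads = 8, 64, 4
x = tf.random.normal((1, seq_len, d_model))          # a batch with one embedded trace

mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                         key_dim=d_model // num_heads)
# Self-attention: the same sequence provides queries, keys and values.
context, attn_weights = mha(query=x, value=x, key=x, return_attention_scores=True)
print(context.shape)        # (1, 8, 64)   -> Concat(head_1..head_h) W^O
print(attn_weights.shape)   # (1, 4, 8, 8) -> per-head attention weights
```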

In this way, the model can effectively consider the relationships among all positions in the input activity sequence, enabling it to capture the semantics and syntax of the sequence more effectively. The self-attention mechanism allows for parallel computation of attention weights between all positions, significantly reducing computation time. By introducing multiple heads to process the input data, the multi-head attention mechanism enhances the expressive power of the model, enabling it to better adapt to complex tasks. Moreover, the attention weights in the self-attention mechanism reflect the similarity or relevance between each position and other positions, which can be utilized to visualize the important information attended by the model during input processing and enhance model interpretability (Hao, Dong, and Wei et al. Citation2021).

Masked Language Model

In recent years, significant advancements have been made in natural language processing, with one notable achievement being pre-trained language models. Pre-trained language models typically learn language representations through unsupervised learning from large-scale text corpora and have demonstrated outstanding performance in downstream natural language processing tasks. Since the emergence of a series of pre-trained language representation models such as ELMo (Peters et al. Citation2018), GPT (Radford, Narasimhan, and Salimans et al. Citation2018), and BERT (Devlin, Chang, and Lee et al. Citation2018), pre-trained models have shown significantly superior performance over traditional models in the majority of natural language processing tasks. This advancement has garnered increasing attention and is considered one of the most significant breakthroughs in the field of NLP in recent years, representing a crucial milestone in the progress of natural language processing. The Masked Language Model (MLM) is one such pre-training task. In MLM, the model randomly masks certain words in the input text and predicts the masked words. The masking can be complete (where the masked word is replaced with a special token) or partial (where the original word is replaced with a random word). By predicting the masked words, the MLM enables the model to learn more general and comprehensive language representations, improving its performance on various natural language processing tasks. MLM is used in BERT and other pre-trained language models such as RoBERTa (Liu, Ott, and Goyal et al. Citation2019) and ALBERT (Lan, Chen, and Goodman et al. Citation2019). In this paper, we modify the masking rules of MLM to correspond to the different types of missing values and anomalies in event logs.

Predictive Process Monitoring

Our method focuses on repairing event logs through Predictive Process Monitoring. Predictive Process Monitoring (Di Francescomarino and Ghidini Citation2022) is a branch of process mining that aims at predicting the future of an ongoing (incomplete) process execution. Typical examples of predictions of the future of an execution trace relate to the outcome of a process execution, to its completion time, or to the sequence of its future activities. In recent years, deep learning methods have been increasingly employed in various predictive tasks. Evermann et al. (Evermann, Rehse, and Fettke Citation2016) made preliminary attempts to predict the next activity of a running trace using Long Short-Term Memory (LSTM) networks. Tax et al. (Tax, Verenich, and La Rosa et al. Citation2017) expanded upon this investigation by employing one-hot encoding for events and LSTMs to address diverse prediction tasks in multiple business processes. These tasks include the next activity, the timestamp of the next event, and the remaining time of the case. Camargo et al. (Camargo, Dumas, and González-Rojas Citation2019) improved both of these efforts by encoding the events with an embedding technique that includes the activity attribute and other numerical and categorical attributes, enabling the LSTM to predict the next activity, the timestamp of the next event, and their associated resource. Chen et al. (Chen, Fang, and Fang Citation2022) applied the attention mechanism in Transformers to capture long-term dependencies between activities, resulting in the development of a generalized representation model for traces. Compared with previous LSTM, Bi-LSTM, and Word2Vec methods, self-attention-based methods can better capture semantic information from raw logs and have been widely used in event log anomaly detection studies (Chen and Liao Citation2022; Guo, Lin, and Yang et al. Citation2021). Inspired by this, our approach utilizes self-attention mechanisms to effectively leverage contextual information from missing segments in event log traces. This facilitates the prediction of missing values or anomalies within the event log, ultimately establishing a high-performance event log repair model.

Proposed Method

This section will describe the proposed method for repairing event logs. We will begin by providing an overview of the entire method framework and then focus on the critical components of the method.

Overview

Our method consists of three steps: 1) constructing a dataset of low-quality event logs, 2) training the Masked Transformer-Based Event Log Repair model, and 3) repairing the low-quality event log. Figure 2 illustrates the framework of the proposed method. Our approach does not rely on a clean dataset for training. In order to generate a simulated dataset of low-quality event logs that mirrors real-life scenarios by incorporating anomalies and missing values, we assume the existence of an original high-quality event log devoid of any missing values or anomalies, denoted as L1. We first introduce various missing values and anomalies into the event log L1 using different masking strategies. The masking ratio is varied to control the proportion of noise present in the event log. Suppose the low-quality event log, obtained after introducing noise, is denoted as L2. From L2, we extract multiple traces denoted as σ, and perform feature encoding on each trace to preprocess the data into a format suitable for deep learning models.

Figure 2. The architecture of the event log repair method.


In the second step, the preprocessed training data derived from the initial step are input into the Masked Transformer-Based Event Log Repair model, which learns deep trace-level representations. This stage encompasses the identification and prediction of missing values and anomalies within the low-quality event log.

Finally, the forecasted outcomes of missing values and anomalies, derived from the second step, are stored in the low-quality event log. Subsequently, the corresponding missing values and anomalies in the log are substituted with the predicted values, culminating in the completion of the event log repair process. Let the repaired event log be denoted as L3. We aim to make the repaired event log L3 as close as possible to the high-quality event log L1 that does not contain missing values and anomalies. This study delves into the performance assessment of our proposed method across a multitude of noise injection strategies and varying noise injection ratios. The self-attention mechanism has been innovatively applied to the restoration of low-quality event logs in this manner. The model utilizes a masking mechanism to simulate the potential disruptions that event logs may encounter in the real world. During the event log restoration process, the model can simultaneously consider information at all positions in the traces, facilitating the learning of behavioral relationships between the masked activities and all unmasked activities within the traces. This approach enables the model to predict the masked activities, thereby enhancing the quality of the event logs.

Compilation of a Dataset Comprising Low-Quality Event Logs

Similar to previous research, we assume that our event logs do not contain any missing values or anomalies. We intentionally inject noise into the logs to create a dataset of low-quality event logs, thereby introducing artificial missing values and anomalies. Bose et al. (Bose, Mans, and van der Aalst Citation2013) identified four major categories of issues that impact the quality of event logs: missing data (data items not recorded in the event log), incorrect data (data items not accurately recorded in the event log), imprecise data (recorded values that are too coarse or irrelevant), and irrelevant data (data items containing unrelated information). Both missing and imprecise data can be regarded as missing values in the event log, indicating that the corresponding values do not exist. Incorrect and irrelevant data can be seen as anomalies in the event log, indicating that the corresponding values exist but are inaccurate.

During the pretraining phase, the BERT model incorporates two self-supervised tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). Within BERT, MLM involves predicting masked words in a self-supervised manner. NSP, another self-supervised task in BERT, aims to assess the coherence between two sentences, enhancing BERT’s capability in natural language understanding tasks. In this study, as activities represent the fundamental attributes in each event log, our emphasis lies in rectifying missing values and anomalies within the activity attributes of event logs. Consequently, we adopt a methodology akin to that employed by Chen et al. (Chen, Fang, and Fang Citation2022), exclusively utilizing the MLM pretraining task of the BERT model to address the repair of missing and anomalous activities in event logs.

Consider a trace σ = ⟨e1, e2, …, el⟩ of length l, with e_i ∈ ε. The activity sequence of the trace is ⟨a1, a2, …, al⟩, with a_i ∈ A. In the traditional MLM task, an 80%-10%-10% masking strategy is employed: 80% of the selected positions are directly replaced with “[MASK],” 10% are replaced with random activities, and the original activities are retained for the remaining 10%. Inspired by the masking strategy in MLM, we adopt a similar approach to inject noise into event logs that do not contain missing or abnormal activities. This allows us to create low-quality event logs that simulate real-life scenarios affected by missing and abnormal values. In the context of missing values in event logs, our approach involves replacing the original values with “[MASK]” to simulate missing values encountered in real-world operations. For instance, when adding a missing value to trace σ, the activity sequence is transformed into ⟨a1, [MASK], …, al⟩. In the case of abnormal values in event logs, our approach involves replacing the original activity names in the event log with an arbitrary activity name from the activity set to simulate abnormal values encountered in real-world operations. For instance, when adding an abnormal value to trace σ, the activity sequence is modified to ⟨a1, ak, …, al⟩, where the original activity a2 is replaced with an arbitrary activity ak from the activity set A.
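The following sketch illustrates one possible implementation of these noise-injection strategies; the function name, strategy labels, and ratios mirror the description above but are otherwise illustrative assumptions rather than the authors' code.

```python
# Sketch of the three noise-injection strategies applied to an activity sequence:
# "8_1_1"      -> 80% [MASK], 10% random activity, 10% left unchanged
# "Mask_All"   -> every selected position becomes [MASK]              (missing values)
# "Random_All" -> every selected position becomes a random activity   (anomalies)
import random

def inject_noise(activities, activity_set, mask_ratio=0.15, strategy="8_1_1", seed=None):
    rng = random.Random(seed)
    noisy = list(activities)
    n_mask = max(1, int(round(mask_ratio * len(noisy))))
    positions = rng.sample(range(len(noisy)), n_mask)
    for pos in positions:
        if strategy == "Mask_All":
            noisy[pos] = "[MASK]"
        elif strategy == "Random_All":
            noisy[pos] = rng.choice(sorted(activity_set))
        else:  # "8_1_1"
            r = rng.random()
            if r < 0.8:
                noisy[pos] = "[MASK]"
            elif r < 0.9:
                noisy[pos] = rng.choice(sorted(activity_set))
            # else: keep the original activity
    return noisy, positions

trace = ["A", "B", "D", "E"]
print(inject_noise(trace, {"A", "B", "C", "D", "E"}, mask_ratio=0.5, strategy="8_1_1", seed=1))
```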

As depicted in Figure 3, this study expands upon the conventional 80%-10%-10% masking strategy by incorporating two supplementary masking strategies. The first strategy uniformly substitutes the selected values with the “[MASK]” token, while the second strategy uniformly replaces them with random activities. These additional strategies are designed to emulate instances of low-quality event logs in real-world operational settings, particularly those attributed to the presence of missing values or abnormal values.

Figure 3. Masking strategies for different noise additions.


To investigate the performance of our method under different noise injection strategies and noise injection ratios, we extended the conventional BERT-based approach that utilizes a 15% masking ratio. We introduced additional masking ratios of 30%, 35%, 40%, and 50%. Furthermore, we compared the performance of our method against baseline methods in terms of their ability to repair these low-quality event logs. For more detailed information, please refer to Experimental evaluation.

The event log L represents a set of events, where the events are linked to form traces. We perform trace-level feature engineering to preprocess the noise-added event log L2 into a data format suitable for deep learning models. Feature engineering (Guyon and Elisseeff Citation2003) typically involves preprocessing and transforming the raw event log data to extract informative representations that capture the underlying features of the data, thereby facilitating subsequent tasks such as training and prediction using deep learning models. Firstly, we obtain a dataset grouped by traces from the event log L2, and extract activity sequences from each trace while recording the length of the longest activity sequence among these sequences. As traces represent the execution of business process flows, the obtained activity sequences are arranged in chronological order based on the occurrence time of activities. In traditional BERT training, specifying a maximum length for the input sentences is common. In the proposed method of this paper, the length of the longest extracted activity sequence is used as the specified maximum length for model input. If the length of an activity sequence in a trace is smaller than the specified model input length, it is padded with a special token (“[PAD]”). Finally, the vocabulary required for model training is generated based on the frequency of activity occurrences, and additional special tokens such as ‘[PAD]’ are added.
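A minimal preprocessing sketch is shown below; it assumes the noisy log L2 is available as a pandas DataFrame with XES-style column names (an assumption for illustration only) and reproduces the grouping, padding, and vocabulary-building steps just described.

```python
# Sketch of trace-level preprocessing: group events into traces, pad to the
# longest sequence, and build a frequency-based vocabulary.
import pandas as pd

log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "concept:name":      ["A", "B", "D", "A", "E"],
    "time:timestamp":    pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-05"]),
})

# 1) Group events into traces, activities ordered chronologically within each case.
traces = (log.sort_values("time:timestamp")
             .groupby("case:concept:name")["concept:name"]
             .apply(list).tolist())

# 2) The longest activity sequence fixes the model input length; shorter
#    sequences are padded with the special token [PAD].
max_len = max(len(t) for t in traces)
padded = [t + ["[PAD]"] * (max_len - len(t)) for t in traces]

# 3) Build the vocabulary from activity frequencies, plus special tokens.
counts = pd.Series([a for t in traces for a in t]).value_counts()
vocab = ["[PAD]", "[MASK]"] + counts.index.tolist()
token_to_id = {tok: i for i, tok in enumerate(vocab)}
print(padded, token_to_id, sep="\n")
```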

In contrast to BERT’s pretraining setup, which involves three embedding features, our proposed method does not include the Next Sentence Prediction (NSP) pretraining task. Inspired by Chen et al. (Chen, Fang, and Fang Citation2022), our method employs the element-wise sum of two embedding features as input, namely the activity embedding vector and the positional embedding vector. We encode the activity names and corresponding positions within each trace’s activity sequence. In this study, activities are considered the smallest unit of data. Firstly, the length l of the activity sequence ⟨a1, a2, …, al⟩ in trace σ is compared with the specified model input length m. If the length l of the activity sequence is greater than the model input length m, the activity sequence must be truncated so that only a subsequence of activities of length m is retained. However, in this study, the model input length m is set as the maximum length among all activity sequences, so this situation does not occur. If the length of the activity sequence is smaller than the model input length, the insufficient part is padded with ‘[PAD].’ Subsequently, the different activities in the activity sequence ⟨a1, a2, …, al, [PAD], …, [PAD]⟩ are mapped to distinct activity embeddings, denoted as ⟨E_a1, E_a2, …, E_[PAD]⟩. Because Transformers, unlike recurrent neural networks, lack an explicit notion of temporal ordering, they cannot directly capture the sequential relationships in the input sequence. Therefore, positional embedding vectors are introduced to assist the Transformer model in understanding and modeling the order information in the activity sequence. We adopt a learnable encoding scheme similar to that used for the activity attributes, where each position in the activity sequence ⟨a1, a2, …, al, [PAD], …, [PAD]⟩ corresponds to an embedding vector in ⟨E1, E2, …, Em⟩. Finally, by element-wise addition of the activity embedding vectors with their respective positional embedding vectors, we obtain the final embedding vector X = ⟨x1, x2, …, xm⟩ that serves as the input to the model, where x_i = E_{a_i} + E_i, i ∈ [1, m].
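The combined activity-plus-position embedding can be sketched as a small Keras layer, following the standard token-and-position embedding pattern; the dimensions below are illustrative, not the paper's configuration.

```python
# Sketch of the input representation: element-wise sum of a learnable activity
# embedding and a learnable positional embedding.
import tensorflow as tf

class TokenAndPositionEmbedding(tf.keras.layers.Layer):
    def __init__(self, max_len, vocab_size, d_model):
        super().__init__()
        self.tok = tf.keras.layers.Embedding(vocab_size, d_model)   # E_{a_i}
        self.pos = tf.keras.layers.Embedding(max_len, d_model)      # E_i
    def call(self, token_ids):
        positions = tf.range(start=0, limit=tf.shape(token_ids)[-1], delta=1)
        return self.tok(token_ids) + self.pos(positions)            # x_i = E_{a_i} + E_i

layer = TokenAndPositionEmbedding(max_len=8, vocab_size=20, d_model=64)
ids = tf.constant([[3, 5, 1, 0, 0, 0, 0, 0]])                        # one padded trace
print(layer(ids).shape)   # (1, 8, 64)
```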

For example, as shown in Figure 4, let us assume the extracted activity sequence in σ1 is ⟨A, B, D, E⟩, and the maximum input length for the model is 8. In this case, the activity sequence will be padded as ⟨A, B, D, E, [PAD], [PAD], [PAD], [PAD]⟩. Subsequently, activity and positional embeddings are applied to each activity and its corresponding position in the sequence to obtain the final embedding vector, which is provided as input to the Masked Transformer-Based Event Log Repair method in preparation for repairing low-quality event logs. The Masked Transformer-Based Event Log Repair method is constructed in Training the Masked Transformer-Based Event Log Repair method, while the repair of low-quality event logs is discussed in Repairing the low-quality event log.

Figure 4. An example of an input feature.


Training the Masked Transformer-Based Event Log Repair Method

The Masked Transformer-Based Event Log Repair method leverages two embedding features from Masked Language Modeling (MLM) to jointly model the activities and positions within traces. Each activity representation in the current layer is based on all the activities in the previous layer and utilizes a self-attention mechanism to obtain deep trace-level representations. This approach intuitively captures bidirectional semantic information and long-term dependencies among activities in the input traces, thereby improving the performance of event log repair tasks. The Masked Transformer-Based Event Log Repair method is built upon the MLM pretraining task within the BERT model. It consists of the following hierarchical structure: 1) a Token Embedding layer, 2) multiple layers of Transformer Encoder, and 3) an Output layer. The Output layer is detailed in Repairing the low-quality event log. As described in Compilation of a dataset comprising low-quality event logs, two embedding vectors are utilized: activity embedding vectors and positional embedding vectors. The token embedding matrix maps each activity in the activity sequence to a corresponding vector, and the position embedding matrix encodes the positional information of activities within the activity sequence into vector representations. Together, the activity embedding vectors and positional embedding vectors form the input of the model. The multiple layers of the Transformer Encoder are composed of multiple Transformer Blocks; the encoder structure, its multi-head self-attention and feed-forward sub-layers, and the underlying self-attention computation are as described in the Transformer subsection and illustrated in Figure 1.

Each Transformer Block (Vaswani, Shazeer, and Parmar et al. Citation2017) contains a multi-head attention mechanism built upon self-attention. The self-attention mechanism aids the model in better understanding the contextual relationships within the input sequence. Specifically, the model takes the input X and employs the attention mechanism to learn three distinct vectors: the activity query vector “Q”, the activity key vector “K”, and the activity value vector “V”.

$$Q_i = X_i W^{Q} + b^{Q}, \qquad K_i = X_i W^{K} + b^{K}, \qquad V_i = X_i W^{V} + b^{V}$$

The activity query vector “Q” is utilized to identify relevant vectors and determine the current position’s importance in the input sequence. The activity key vector “K” represents each position’s features in the input sequence. The activity value vector “V” is a vector representation associated with each position’s word vector. By calculating the similarity between the activity query vector “Q” and the activity key vector “K” and using the similarity scores as weights to perform a weighted average of the activity value vector “V“, the self-attention mechanism captures the correlations between different positions in the input sequence. This mechanism enables our model to leverage self-attention to learn the relationships among different activities in the input activity sequence, thus enhancing the model’s understanding of the activity behavioral information in event logs.
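Putting the pieces together, the following Keras sketch outlines one plausible realization of the overall architecture (activity-plus-position embedding, a stack of encoder blocks, and a per-position softmax output over the activity vocabulary, so masked positions can be predicted); the layer sizes, number of blocks, and other hyperparameters are illustrative assumptions, not the configuration reported in Table 2.

```python
# Architectural sketch of a masked-Transformer repair model (illustrative only).
import tensorflow as tf

class ActPosEmbedding(tf.keras.layers.Layer):
    """Sum of activity and position embeddings, as described above."""
    def __init__(self, max_len, vocab_size, d_model):
        super().__init__()
        self.tok = tf.keras.layers.Embedding(vocab_size, d_model)
        self.pos = tf.keras.layers.Embedding(max_len, d_model)
    def call(self, ids):
        return self.tok(ids) + self.pos(tf.range(tf.shape(ids)[-1]))

def transformer_block(x, d_model=64, num_heads=4, d_ff=128):
    attn = tf.keras.layers.MultiHeadAttention(num_heads, key_dim=d_model // num_heads)(x, x)
    x = tf.keras.layers.LayerNormalization()(x + attn)          # residual + norm
    ff = tf.keras.layers.Dense(d_ff, activation="relu")(x)      # feed-forward sub-layer
    ff = tf.keras.layers.Dense(d_model)(ff)
    return tf.keras.layers.LayerNormalization()(x + ff)

def build_repair_model(vocab_size, max_len, d_model=64, num_layers=2):
    token_ids = tf.keras.Input(shape=(max_len,), dtype="int32")
    x = ActPosEmbedding(max_len, vocab_size, d_model)(token_ids)
    for _ in range(num_layers):
        x = transformer_block(x, d_model)
    # Per-position probability distribution over the activity vocabulary.
    probs = tf.keras.layers.Dense(vocab_size, activation="softmax")(x)
    return tf.keras.Model(token_ids, probs)

model = build_repair_model(vocab_size=20, max_len=8)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```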

Repairing the Low-Quality Event Log

In the Output layer, predictions are made for the injected noise introduced in Compilation of a dataset comprising low-quality event logs. In this paper, the model processes multiple anomalies in batches and returns a matrix in which each row represents the probability distribution over the most likely activities for a given anomaly position. This process is typically handled using the softmax function. Assuming a sequence contains n anomalous values, where the probability distribution for each anomaly position is a k-dimensional vector, we first stack these probability distributions into an n × k matrix P. Let W represent the probability matrix obtained by applying the softmax function to each row of P, where W[i, j] is the probability value corresponding to P[i, j]. For each anomaly position i, we need to find the j that maximizes W[i, j], indicating the most likely original vocabulary term. We can determine the most probable original vocabulary term by identifying the column index j associated with the highest W[i, j].

$$j = \arg\max\big(W[i,:]\big)$$

Please note that, due to the properties of the softmax function, the elements in W[i,:] sum up to 1, and j is an integer ranging from 1 to k. Finally, we replace the activity at anomaly position i in the activity sequence with the j-th term of the vocabulary, resulting in the repaired activity sequence. We repair each activity sequence in the event log L2 in this way, and the repaired activity sequences are combined to form a new event log, the repaired event log L3. The objective of the proposed method is to make the repaired event log L3 as close as possible to the noise-free event log L1.
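The repair step itself reduces to a row-wise argmax followed by a write-back, as in the small sketch below (variable and function names are our own; the probability matrix W would come from the trained model).

```python
# Sketch of the repair step: pick the highest-probability vocabulary index per
# anomaly position and write the corresponding activity back into the trace.
import numpy as np

def repair_trace(noisy_trace, anomaly_positions, W, id_to_token):
    """W has shape (n_anomalies, vocab_size); each row already sums to 1."""
    repaired = list(noisy_trace)
    best_ids = np.argmax(W, axis=1)                 # j = argmax(W[i, :])
    for pos, j in zip(anomaly_positions, best_ids):
        repaired[pos] = id_to_token[int(j)]
    return repaired

id_to_token = {0: "[PAD]", 1: "[MASK]", 2: "A", 3: "B", 4: "C", 5: "D", 6: "E"}
W = np.array([[0.01, 0.01, 0.05, 0.80, 0.05, 0.04, 0.04]])   # one anomaly position
print(repair_trace(["A", "[MASK]", "D", "E"], [1], W, id_to_token))
# ['A', 'B', 'D', 'E']
```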

Experimental Evaluation

This section presents two experiments to evaluate our proposed event log repair method. The first experiment’s primary objective is to investigate our method’s effectiveness in repairing event logs containing different types of noise. We simulate different types of noise injected into the event log and evaluate the performance of our repair method. The second experiment aims to study the effectiveness of our method in repairing event logs with varying degrees of noise and compare it against state-of-the-art log repair methods.

Event Logs

The datasets used for the experiments in this paper are all from the publicly available event log collection of the 4TU Center for Research Data.Footnote1 Table 1 provides an overview of these datasets.

Table 1. Information of event log.

The event log HelpdeskFootnote2 concerns the ticketing management process of the help desk of an Italian software company. The Business Process Intelligence Challenge (BPIC) 2012 event logFootnote3 records a personal loan or overdraft application process in a Dutch financial institution. The application process consists of three sub-flows, namely the state of the application itself (A), the state of the work items associated with the application (W), and the state of the offer (O). In addition, the W sub-flow contains three different lifecycle transitions, i.e., start, scheduling, and completion, while the A and O sub-flows contain only events with the lifecycle transition completion. The Business Process Intelligence (BPI) 2017 challengeFootnote4 provides an event log that also pertains to a loan application process of a Dutch financial institute. A significant difference is that the company switched systems and now supports multiple offers for a single application (in contrast to 2012, where a workaround was visible in the log). The ‘O’ indicates the state changes of the offers for an application. The raw event log BPIC17_O comes from the Business Process Intelligence Challenge 2017. The Business Process Intelligence 2013 challengeFootnote5 provides Volvo IT’s event log for incident and problem management. The Business Process Intelligence 2015 challengeFootnote6 is provided by five Dutch municipalities. The data contain all building permit applications over a period of approximately four years.

Experimental Setup

We implemented the methods in this paper in Python 3.8 using the TensorFlow 2 and Keras libraries. The experimental environment comprises a Linux-based system with an Intel i9-10980XE processor, an NVIDIA RTX 3090 GPU, and 256 GB of memory.

Firstly, we sort the event log cases in chronological order to simulate the prediction model using historical cases to predict the future behavior of the current case. Then, as described in Overview, we require a low-quality event log dataset with injected noise for model training. Therefore, we inject noise into the event log according to different noise injection rules and divide it into a training set (80%) and a test set (20%) in chronological order. Within the training set, we use 10% of the data as a validation set for validation and fine-tuning during the training phase. Through this partitioning and injection process, we can construct an event log dataset with various types of noise and anomalies to support the training and evaluation of our method.
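A sketch of this chronological split is shown below; the column names and the choice to hold out the most recent 10% of the training cases for validation are assumptions for illustration.

```python
# Sketch of a chronological 80/20 train/test split by case start time,
# with 10% of the training cases reserved for validation.
import pandas as pd

def chronological_split(log, case_col="case:concept:name", time_col="time:timestamp",
                        train_frac=0.8, val_frac=0.1):
    case_start = log.groupby(case_col)[time_col].min().sort_values()
    cases = case_start.index.tolist()                 # cases ordered by first event
    n_train = int(train_frac * len(cases))
    n_val = int(val_frac * n_train)
    train_cases = cases[: n_train - n_val]
    val_cases = cases[n_train - n_val: n_train]
    test_cases = cases[n_train:]
    pick = lambda cs: log[log[case_col].isin(cs)]
    return pick(train_cases), pick(val_cases), pick(test_cases)
```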

Table 2 presents the hyperparameters used in our proposed method along with their corresponding values. We employ the Adam algorithm for model optimization and train the model using backpropagation. To prevent overfitting, we set the early stopping criterion during the training phase as 10 consecutive epochs with a loss value below 0.001 on the validation set. Additionally, the input sequence length of the model is determined based on the maximum case length in the event log to ensure the model can handle event sequences of different lengths. The selection and configuration of these hyperparameters play a crucial role in the model’s training process, ensuring its performance and effectiveness.

Table 2. Details of masked transformer-based event log repair method.
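One possible Keras reading of the early-stopping criterion stated above is sketched below; the exact mapping of “10 consecutive epochs with a loss value below 0.001” onto min_delta and patience is our interpretation, not a detail given in the paper.

```python
# Illustrative early-stopping configuration for training the repair model.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",        # watch the validation loss
    min_delta=0.001,           # changes smaller than this count as "no improvement"
    patience=10,               # tolerate 10 such epochs before stopping
    restore_best_weights=True, # roll back to the best validation checkpoint
)
# Used as: model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                    epochs=100, callbacks=[early_stop])
```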

Evaluation Metrics

The performance evaluation metrics for the model include Accuracy, Precision, Recall, and F1-score. To objectively assess the model’s performance, we utilize a confusion matrix (refer to Table 3) to classify the predicted results and calculate the model performance metrics based on the actual data categories (see details in Table 4). These metrics comprehensively evaluate the model’s performance in the classification task, offering a more comprehensive understanding of its accuracy and predictive capability.

Table 3. Confusion matrix (Sokolova and Lapalme Citation2009).

Table 4. Evaluation metrics (Sokolova and Lapalme Citation2009).
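For reference, the metrics in Tables 3 and 4 can be computed with scikit-learn as in the following sketch; the library choice, the toy labels, and the weighted averaging over activity classes are assumptions for illustration.

```python
# Sketch of computing accuracy, precision, recall and F1 for repaired activities.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = ["A", "B", "D", "E", "B", "C"]   # original activities at noisy positions
y_pred = ["A", "B", "D", "E", "C", "C"]   # activities predicted by the model

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)

print(confusion_matrix(y_true, y_pred, labels=["A", "B", "C", "D", "E"]))
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```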

Results and Discussion

This section provides a comprehensive validation of the feasibility of the Masked Transformer-Based Event Log Repair method, divided into two parts. Firstly, we investigate the impact of different masking schemes on the repair effectiveness of the proposed method in handling various types of anomalies and missing values in event logs. In the second part, we evaluate the performance of our method in predicting anomalies in event logs using accuracy metrics and compare it with the state-of-the-art event log repair methods. It is important to note that the tables presented in this section report the average performance across 10 replications of each experiment.

The Impact of Injecting Different Types of Noise

As described in Overview, there is a lack of publicly available event log datasets designed explicitly for log repair tasks. Similar to previous studies, we assume that the actual log datasets used in our experiments are anomaly-free and simulate low-quality event logs by injecting different types of anomalies and missing values. However, our injection process differs from previous research, where random injection of anomalies and missing values is commonly used to generate low-quality event logs. In this study, we transform the masking operation used in the BERT model for the masked language modeling pre-training task into the operation of injecting noise into the original event logs. The traditional masking strategy in the BERT model follows an 80%-10%-10% masking scheme, where 80% of the masked portion is directly replaced with “[MASK],” another 10% is replaced with any arbitrary activity, and the remaining 10% retains the original activities. In this paper, we consider replacing activities with “[MASK]” as a representation of missing values and randomly replacing activities with other activities to represent anomalies. Based on this foundation, we propose two additional masking strategies: 1) replacing all activities with “[MASK]” to simulate a low-quality event log where all activities are missing values, and 2) randomly replacing all activities with any other activities to simulate a low-quality event log where all activities are anomalies. By adopting these masking strategies, we aim to create event logs with different noise levels, including missing values and anomalies, to evaluate our method’s effectiveness in handling different types of noise. This approach ensures a more realistic and controllable noise injection process, aligning with rigorous experimental design and evaluation principles for log repair.

In this section, we employed a masking ratio of 15%, consistent with the masking ratio used in the BERT model, to represent the noise level in the event logs. Figure 5 illustrates the performance of our method when dealing with event logs containing different types of noise, evaluated using precision, recall, F1-score, and accuracy metrics. We conducted experiments on four real-world event log datasets (BPIC2012, BPIC2013, BPIC2017, and Helpdesk) and one artificial event log dataset (Small). To simulate different types of noise, we applied three noise injection strategies: “8_1_1” represents a noise strategy with 80% missing values, 10% anomalies, and the remaining 10% unchanged activities, which introduces both missing values and anomalies into the event log; “Mask_All” indicates that all noise in the event log consists of missing values; “Random_All” indicates that all noise in the event log consists of anomalies. To provide a comprehensive view of the experimental results, we also recorded the execution time of each training epoch for event log repair and present it in Table 5. These execution time data can serve as references for further performance optimization and efficiency improvement.

Figure 5. Performance of the model with different noise addition schemes (fixed 15% noise inclusion rate).

Table 5. Execution time reported in milliseconds (fixed 15% noise content).

As shown in Figure 5, the proposed method is effective and robust in the presence of different types of noise. It exhibits excellent repair performance on both the artificial and the real-world event log datasets and shows no significant preference for any particular noise type, indicating robust and effective feature learning. The method maintains good performance even under complex and diverse noise, and it still performs well on event logs with many self-loop activities, such as the BPIC2013 dataset, which is challenging because of the strong randomness of self-loop activities. This indicates that our method can learn key features and patterns from the event log and extract useful information. During the repair process, the method performs accurate inference and imputation, relying not only on contextual information but also on its ability to fill in anomalies and missing values.
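
The imputation step itself can be pictured as in the following hypothetical sketch, in which every “[MASK]” position is filled with the highest-scoring activity; the vocabulary, trace, and randomly initialised logits are placeholders standing in for the trained masked Transformer’s output.

```python
import torch

# Hypothetical sketch of the imputation step: every "[MASK]" position in a
# noisy trace is filled with the activity receiving the highest score from the
# trained masked Transformer. The vocabulary, trace, and randomly initialised
# logits below are placeholders standing in for the real model's output.
vocab = ["[PAD]", "[MASK]", "register", "check", "approve", "pay", "archive"]
noisy_trace = ["register", "[MASK]", "approve", "[MASK]", "archive"]

# In practice these logits come from the model, shape (trace_len, vocab_size).
logits = torch.randn(len(noisy_trace), len(vocab))

repaired = list(noisy_trace)
for pos, activity in enumerate(noisy_trace):
    if activity == "[MASK]":
        logits[pos, vocab.index("[PAD]")] = float("-inf")   # never predict special tokens
        logits[pos, vocab.index("[MASK]")] = float("-inf")
        repaired[pos] = vocab[int(torch.argmax(logits[pos]))]

print(repaired)
```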

Event Log Repair

To evaluate the feasibility of the Masked Transformer-Based Event Log Repair method, we selected three state-of-the-art autoencoder-based methods in the field of event log repair (AE (Nguyen et al. Citation2019), VAE (Nguyen et al. Citation2019), and LAE (Nguyen et al. Citation2019)) as baselines, providing a reference standard for evaluating our proposed method. Table 6 lists the hyperparameters and their respective values for the baseline models. Although hyperparameter optimization algorithms are used in the autoencoder-based methods, we did not employ such optimization in this paper. For further experimental details regarding the baseline models, please refer to the relevant content in (Nguyen et al. Citation2019).

Table 6. Details of hyperparameters for the baseline methods (Nguyen et al. Citation2019).

The results in Table 7 report the accuracy of the proposed method and the baseline models in repairing missing activities in event logs with varying noise ratios (i.e., noise rates). The proposed method performs well on the missing-activity repair task, achieving high accuracy across different noise rates and thus demonstrating robustness and adaptability to varying degrees of noise and missing data. As the noise ratio increases, both the proposed method and the baseline models show a slight decrease in repair performance, suggesting that the lower the event log quality, the more challenging the repair task becomes. The Masked Transformer-Based Event Log Repair method outperforms VAE and LAE in most scenarios. Although the proposed method slightly lags behind AE on the BPIC2013 dataset, the difference between them is not significant, indicating comparable performance on that dataset; in the majority of cases, our method outperforms the autoencoder (AE). Although our approach is effective across the various datasets, its performance is comparatively lower on BPIC2013. This may be attributed to the restricted size of that dataset, which constrains the model’s capacity to comprehensively capture the behavioral patterns within it. Based on the number of events and cases reported for each event log dataset, we conclude that the size of the BPIC2013 dataset limited the efficacy of our method. The comparison with the baseline models shows that our proposed method attains superior accuracy in repairing missing activities, underscoring its advantage for the missing-activity repair problem.

Table 7. Accuracy of missing noise repair.

Process executions exhibit substantial variability among each other (Fang et al. Citation2023). Through further experimentation, we observed that our proposed method generalizes poorly when confronted with event logs characterized by diverse variant distributions. Table 8 shows the performance of the Masked Transformer-Based Event Log Repair method on the BPIC2015 dataset. Our approach achieved notably high precision and recall values on this dataset, indicating the model’s capability to accurately predict positive instances and to capture activity relationships within the event logs, and it also achieved high accuracy (Val_Accuracy) on the validation set. However, accuracy declined substantially on the test set. This discrepancy may be attributed to the characteristics of the BPIC2015 dataset: the number of variants is close to the number of traces and the volume of event log data is limited, so the behavioral information learned during training transfers imperfectly to the test set. Furthermore, the model’s high F1-score on the test set underscores its proficiency in learning from recurring events, while indicating challenges in effectively predicting new events.

Table 8. Performance of the model on the BPIC2015 dataset.

In process mining tasks, improving the quality of event logs is crucial for process discovery. To qualitatively evaluate the impact of log quality on process discovery and to demonstrate the effectiveness of the proposed event log repair method, we selected the artificial event log “Small” as the subject of analysis. We compared the Petri net mined from the original event log without anomalies, the Petri net mined from the low-quality event log with 10% injected noise, and the Petri net mined from the event log repaired using the Masked Transformer-Based Event Log Repair method. We employed the Heuristics Miner algorithm (Weijters, van Der Aalst, and De Medeiros Citation2006) for process discovery, as it achieves a relatively high degree of fitness and precision in the presence of noise. The algorithm was run through the PM4Py library (Berti, van Zelst, and Schuster Citation2023), with the “dependency threshold” and “loop two threshold” parameters set to their default value of 0.5 and the “and threshold” parameter set to its default value of 0.65; these defaults are designed so that the algorithm performs well in a wide range of application scenarios. The experimental results are presented in Figure 6.
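
For reproducibility, the discovery step can be invoked through PM4Py’s simplified interface roughly as follows; this is a sketch in which the file path is a placeholder and the parameter values mirror the defaults quoted above.

```python
# Sketch of the process-discovery step via PM4Py's simplified interface; the
# file path is a placeholder, and the parameter values mirror the defaults
# quoted in the text.
import pm4py

log = pm4py.read_xes("small_repaired.xes")  # placeholder path to a (repaired) event log

net, initial_marking, final_marking = pm4py.discover_petri_net_heuristics(
    log,
    dependency_threshold=0.5,  # default dependency threshold
    and_threshold=0.65,        # default AND threshold
    loop_two_threshold=0.5,    # default length-two-loop threshold
)

pm4py.view_petri_net(net, initial_marking, final_marking)
```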

Figure 6. Process models mined using the Small artificial event log.

Conclusion

Event log repair is essential for the vast majority of real-world process mining applications. In this paper, we introduce a novel machine learning approach for event log repair, featuring a masked Transformer-based method. Inspired by the masked language model in BERT, our method utilizes a masking mechanism to simulate low-quality event logs with missing and anomalous values. We then employ the self-attention mechanism in Transformer to model the interactions among events and repair the missing and anomalous values in the low-quality event log. Unlike traditional event log repair methods, our proposed approach does not require pre-defined thresholds and does not rely on any prior knowledge.

In this study, we innovatively employed the masking mechanism of a masked language model to simulate low-quality event logs that may arise in real-world scenarios and, leveraging the self-attention mechanism, systematically learned activity patterns within the event logs. Through diverse masking strategies, we obtained low-quality event logs characterized by varying degrees of missing and anomalous values. Notably, our proposed method repaired low-quality event logs impartially across different scenarios, demonstrating consistent and reliable performance, and it maintained its efficacy in the face of complex and diverse noise. The experimental findings underscore the robustness and effective feature learning capabilities of the masked Transformer-based event log repair method. To compare against the baseline methods, we simulated event logs with different noise ratios (i.e., noise levels) by adjusting the masking ratios. The results show that the masked Transformer-based event log repair method outperforms the baseline methods in most event log repair tasks; for the BPIC2012 event log dataset, it achieves repair accuracies greater than 0.936 under all tested noise ratios, whereas the best performance of the baseline methods is 0.792.

Although we have demonstrated the effectiveness and superiority of the masked Transformer-based event log repair method, certain limitations remain. First, this paper focused only on repairing the activity attribute while overlooking other essential attributes of the event log, which diminishes the performance of our approach on more intricate event logs, such as the BPIC2015 event log. To address this challenge in future work, we intend to explore additional attributes within the event logs (Aversano et al. Citation2023), such as the timestamp of event occurrences and the resources required for event execution (Fang, Fang, and Lu Citation2022). We also plan to incorporate techniques for addressing uneven distribution within the event log dataset, for instance by using the generator of a Generative Adversarial Network (GAN) (van Dun et al. Citation2023) to create new traces and thereby balance the data distribution. A second limitation is that our method assumes the training dataset is free of anomalies and missing values; if the training event log contains noise, omissions, or errors, the repair model may be adversely affected, leading to inaccurate predictions of missing activities. In practical applications, domain-specific expertise may therefore help in designing more effective models and more suitable repair strategies. A third limitation arises from the diverse nature of anomalies and errors that occur in practice: the masking strategies employed in this study can only simulate the types of anomalies and missing values that are prevalent in most scenarios, and random masking is not always the optimal choice. Future research should explore more intelligent masking methods to improve the model’s adaptability to a broader range of anomalies and errors encountered in real-world operational contexts.

Data Availability

The data that support the findings of this study are available from the corresponding author, Xianwen Fang, upon reasonable request.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Supported by the National Natural Science Foundation of China [No. 61572035, 61402011, 61902002], the Key Research and Development Program of Anhui Province [2022a05020005], the Leading Backbone Talent Project in Anhui Province, China [2020-1-12], and the Anhui Province Academic and Technical Leader Foundation [No. 2022D327].


References

  • Aalst, W. V. D. 2016. Process mining: Data science in action. Berlin, Heidelberg: Springer.
  • Almodovar, C., F. Sabrina, S. Karimi, and S. Azad. 2024. LogFiT: Log anomaly detection using fine-tuned language models. IEEE Transactions on Network and Service Management 21 (2): 1715–32. doi:10.1109/TNSM.2024.3358730.
  • Aversano, L., M. L. Bernardi, M. Cimitile, M. Iammarino, and C. Verdone. 2023. A data-aware explainable deep learning approach for next activity prediction[J]. Engineering Applications of Artificial Intelligence 126:106758. doi:10.1016/j.engappai.2023.106758.
  • Berti, A., S. van Zelst, and D. Schuster. 2023. PM4Py: A process mining library for Python. Software Impacts 17:100556. doi:10.1016/j.simpa.2023.100556.
  • Bezerra, F., J. Wainer, and W. M. P. van der Aalst. 2009. Anomaly detection using process mining[C]//Enterprise, Business-Process and Information Systems Modeling: 10th International Workshop, BPMDS 2009, and 14th International Conference, EMMSAD 2009, held at CAiSE 2009, Amsterdam, The Netherlands, June 8-9, 2009. Proceedings. Springer Berlin Heidelberg. 149–61.
  • Bukhsh, Z. A., A. Saeed, and R. M. Dijkman. 2021. Processtransformer: Predictive business process monitoring with transformer network[J]. arXiv preprint arXiv:2104.00721.
  • Camargo, M., M. Dumas, and O. González-Rojas. 2019. Learning accurate LSTM models of business processes[C]//Business Process Management. 17th International Conference, BPM 2019, Vienna, Austria, September 1–6, 2019, Proceedings 17. Springer International Publishing. 286–302.
  • Chen, H., X. Fang, and H. Fang. 2022. Multi-task prediction method of business process based on BERT and transfer Learning. Knowledge-Based Systems 254:109603. doi:10.1016/j.knosys.2022.109603.
  • Chen, S., and H. Liao. 2022. Bert-log: Anomaly detection for system logs based on pre-trained language model. Applied Artificial Intelligence 36 (1):2145642. doi:10.1080/08839514.2022.2145642.
  • Chinces, D., and I. Salomie. 2015. Optimizing spaghetti process models. 2015 20th International Conference on Control Systems and Computer Science, Bucharest, Romania, 506–11. doi:10.1109/CSCS.2015.15.
  • Bose, R. P. J. C., R. S. Mans, and W. M. P. van der Aalst. 2013. Wanna improve process mining results?[C]//2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 127–34. Piscataway, NJ: IEEE.
  • Devlin, J., M. W. Chang, K. Lee, and K. Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805.
  • Di Francescomarino, C., and C. Ghidini. 2022. Predictive process monitoring. In Process Mining Handbook, LNBIP 448, 320–46.
  • Evermann, J., J. R. Rehse, and P. Fettke. 2016. A Deep Learning Approach for Predicting Process Behaviour at Runtime. In Business Process Management Workshops. BPM 2016. Lecture Notes in Business Information Processing, ed. M. Dumas and M.Fantinato, 327–38. Springer, Cham.
  • Fang, N., X. Fang, and K. Lu. 2022. Online incremental updating for model enhancement based on multi-perspective trusted intervals. Connection Science 34 (1):1956–80. doi:10.1080/09540091.2022.2088696.
  • Fang, H., W. Liu, W. Wang, S. Zhang. 2023. Discovery of process variants based on trace context tree[J]. Connection Science. 35 (1):2190499. doi:10.1080/09540091.2023.2194578.
  • Fani Sani, M., S. J. Zelst, and W. M. P. van der Aalst. 2018. Repairing outlier behaviour in event logs[C]//International Conference on Business Information Systems, 115–31. Cham: Springer.
  • Fischer, D. A., K. Goel, R. Andrews, C.G.J. van Dun, M.T. Wynn, and M. Röglinger. 2020. Enhancing Event Log Quality: Detecting and Quantifying Timestamp Imperfections. In Business Process Management. BPM 2020, ed. D.Fahland, C.Ghidini, J.Becker, and M.Dumas. Lecture Notes in Computer Science, Vol. 12168.
  • Ghionna, L., G. Greco, A. Guzzo, and L. Pontieri. 2008. Outlier Detection Techniques for Process Mining Applications. In Foundations of Intelligent Systems, ed. A. An, S. Matwin, Z. W. Raś, and D. Ślęzak, ISMIS 2008. Lecture Notes in Computer Science, Vol. 4994. Heidelberg, Berlin: Springer.
  • Guo, H., Y. Guo, J. Yang, J. Liu, Z Li, T. Zheng, L. Zheng, W. Hou, and B. Zhang. 2023. Loglg: Weakly supervised log anomaly detection via log-event graph construction[C]//International Conference on Database Systems for Advanced Applications. Cham: Springer Nature Switzerland. 490–501.
  • Guo, H., X. Lin, J. Yang, J. Bai, T. Zheng, B. Zhang, and Z. Li. 2021. Translog: A unified transformer-based framework for log anomaly detection[J]. arXiv preprint arXiv:2201.00016.
  • Guo, H., J. Yang, J. Liu, et al. 2023. Owl: A large language model for it operations[J]. arXiv preprint arXiv:2309.09298.
  • Guo, H., J. Yang, J. Liu, J. Bai, B. Wang, Z. Li, T. Zheng, B. Zhang, J. Peng, Q. Tian, et al. 2024. LogFormer: A pre-train and tuning pipeline for log anomaly detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence 38 (1):135–43. doi:10.1609/aaai.v38i1.27764. arXiv preprint arXiv:2401.04749
  • Guo, H., S. Yuan, and X. Wu. 2021. Logbert: Log anomaly detection via BERT. 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 1–8. doi:10.1109/IJCNN52387.2021.9534113.
  • Guyon, I., and A. Elisseeff. 2003. An introduction to variable and feature selection[J]. Journal of Machine Learning Research 3 (Mar):1157–82.
  • Hao, Y., L. Dong, F. Wei, and K. Xu. 2021. Self-attention attribution: Interpreting information interactions inside transformer[C]//Proceedings of the AAAI Conference on Artificial Intelligence 35 (14): 12963–12971.
  • Lan, Z., M. Chen, S. Goodman, K. Gimpel, P.Sharma, and R. Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942.
  • LeCun, Y., Y. Bengio, and G. Hinton. 2015. Deep learning[J]. Nature 521 (7553):436–44. doi:10.1038/nature14539.
  • Le, V. H., and H. Zhang. 2021. Log-based anomaly detection without log parsing. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, 492–504. doi:10.1109/ASE51524.2021.9678773.
  • Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, et al. 2019. Roberta: A robustly optimized bert pretraining approach[J]. arXiv preprint arXiv:1907.11692.
  • Liu, J., J. Xu, R. Zhang, S. Reiff-Marganiec. 2021. A repairing missing activities approach with succession relation for event logs[J]. Knowledge and Information Systems. 63 (2):477–95. doi:10.1007/s10115-020-01524-6.
  • Moon, J., G. Park, and J. P.-O. Jeong. 2021. Prediction of process using one-way language model based on nlp approach[J]. Applied Sciences 11 (2):864. doi:10.3390/app11020864.
  • Nguyen, H. T. C., and M. Comuzzi. 2018. Event log reconstruction using autoencoders[C]//international conference on service-oriented computing. Heidelberg, Berlin, Cham:Springer, 335–50.
  • Nguyen, H. T. C., S. Lee, J. Kim, J. Ko, and M. Comuzzi. 2019. Autoencoders for improving quality of process event logs[J]. Expert Systems with Applications 131:132–47. doi:10.1016/j.eswa.2019.04.052.
  • Nolle, T., S. Luettgen, A. Seeliger, and M. Mühlhäuser. 2018. Analyzing business process anomalies using autoencoders[J]. Machine Learning 107 (11):1875–93. doi:10.1007/s10994-018-5702-8.
  • Park, G., and M. Song. 2020. Predicting performances in business processes using deep neural networks. Decision Support Systems 129:113191. doi:10.1016/j.dss.2019.113191.
  • Peters, M. E., M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologie. New Orleans, Louisiana: Association for Computational Linguistics, 2227–2237.
  • Radford, A., K. Narasimhan, T. Salimans, and I. Sutskever. 2018. Improving language understanding by generative pre-training[J].
  • Rogge-Solti, A., R. S. Mans, W. M. P. van der Aalst, and M. Weske. 2013. Improving Documentation by Repairing Event Logs. In The Practice of Enterprise Modeling, ed. J. Grabis, M. Kirikova, J. Zdravkovic, and J. Stirna, Vol. 165. PoEM 2013. Lecture Notes in Business Information Processing. Heidelberg, Berlin: Springer. doi:10.1007/978-3-642-41641-5_10.
  • Sokolova, M., and G. Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Information Processing and Management 45 (4):427–37. doi:10.1016/j.ipm.2009.03.002.
  • Suriadi, S., R. Andrews, AHM ter. Hofstede, and M.T. Wynn. 2017. Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs[J]. Information Systems 64: 132–150.
  • Tax, N., I. Verenich, M. La Rosa, and M. Dumas. 2017. Predictive business process monitoring with LSTM neural networks[C]//Advanced Information Systems Engineering: 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings 29. Springer International Publishing. 477–92.
  • Van Der Aalst, W. 2011. Process mining: Discovery, conformance and enhancement of business processes[M]. Heidelberg: Springer.
  • Van Der Aalst, W., A. Adriansyah, A. K. A. De Medeiros, F. Arcieri, T. Baier, T. Blickle, and M. Wynn. 2012. Process mining manifesto. In Business Process Management Workshops: BPM 2011 International Workshops, Clermont-Ferrand, France, August 29, 2011. 169–94. Springer: Clermont-Ferrand, France.
  • Van der Aalst, W. M. P., and A. K. A. de Medeiros. 2005. Process mining and security: Detecting anomalous process executions and checking process conformance. Electronic Notes in Theoretical Computer Science 121:3–21. doi:10.1016/j.entcs.2004.10.013.
  • van der Aa, H., A. Rebmann, and H. Leopold. 2021. Natural language-based detection of semantic execution anomalies in event logs. Information Systems 102:101824. doi:10.1016/j.is.2021.101824.
  • van Dun, C., L. Moder, W. Kratsch, and M. Röglinger. 2023. ProcessGAN: Supporting the creation of business process improvement ideas through generative machine learning[J]. Decision Support Systems 165:113880. doi:10.1016/j.dss.2022.113880.
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. 2017. Attention is all you need[J]. Advances in Neural Information Processing Systems 30.
  • Wang, J., S. Song, X. Lin, X. Zhu, and J. Pei. 2015. Cleaning structured event logs: A graph repair approach[C]//2015 IEEE 31st International Conference on Data Engineering, 30–41. IEEE.
  • Wang, J., S. Song, X. Zhu, and X. Lin. 2013. Efficient recovery of missing events[J]. Proceedings of the VLDB Endowment. 6 (10):841–852. doi:10.14778/2536206.2536212.
  • Weijters, A. J. M. M., W. M. P. van Der Aalst, and A. K. A. De Medeiros. 2006. Process mining with the HeuristicsMiner algorithm. BETA Working Paper Series, Eindhoven University of Technology.
  • Wynn, M. T., and S. Sadiq. 2019. Responsible process mining-a data quality perspective[C]//business process management: 17th International Conference, BPM 2019, Vienna, Austria, September 1–6, 2019, Proceedings 17. Springer International Publishing. 10–15.
  • Zang, R., H. Guo, J. Yang, J. Liu, Z. Li, T. Zheng, X. Shi, L. Zheng, and B.Zhang. 2024. MLAD: A unified model for multi-system log anomaly detection[J]. arXiv preprint arXiv:2401.07655.