Privacy-Preserving Process Mining: A Blockchain-Based Privacy-Aware Reversible Shared Image Approach


ABSTRACT

Deeper integration of cross-organizational business process sharing and process mining has advanced the Industrial Internet, but privacy breaches and data security risks limit its use. Existing studies usually preserve privacy by scrambling or anonymizing event data. However, scrambling mechanisms and random noise injection corrupt the process information in event logs and degrade process mining outcomes. This research presents a blockchain-based privacy-aware reversible shared image approach for privacy-preserving process mining, built on chaotic image encryption and privacy-aware theory. The method avoids data loss, reduces disclosure risk, resists correlation attacks, and supports encrypted sharing. First, process data is converted into color images with chaotic image encryption to safeguard privacy and allow reversible reproduction. Second, the on-chain-off-chain paradigm keeps information management lightweight. Finally, attribute encryption of multi-view event data provides correlation resistance and on-demand encrypted data sharing. Simulations on common datasets show that: 1. The system performance of the proposed method outperforms the baseline method by 57%. 2. The strategy greatly enhances the privacy of categorical and numerical data. 3. It performs better in event data privacy protection and in process mining fitness and precision. The proposed method ensures the secure flow of cross-organizational information in the Industrial Internet and provides a novel privacy-secure computational approach for the growing field of Artificial Intelligence.

Introduction

With the rapid development of the big data-driven industrial internet, the need for large-scale data integration and sharing has made privacy disclosure in business process management more acute (Yin 2023). Business processes are widely used in management to describe a series of activities carried out to achieve a process goal (Batista, Martínez-Ballesté, and Solanas 2022); these activities are recorded as events in an organization’s information system and represented as well-defined process execution steps. All events generated by a process are stored chronologically in an event log (Shraga et al. 2022). Event data contains information that is beneficial to the organization (Kan, Fang, and Gong 2023), and its effective use can help to detect deviations, identify bottlenecks, suggest corrective countermeasures, optimize resources, minimize costs, or reduce time, thereby increasing efficiency and competitiveness.

To extract structured, interpretable processes from event logs, research in the field of Process Mining (PM) (van der Aalst 2012) aims to mine knowledge from existing event logs in order to discover, monitor, and improve real processes. PM techniques fall into three main types. 1) Process discovery (Shraga et al. 2022), which uses the information in event logs to represent and visualize process models from different perspectives (e.g., control flow, organization, time, and data). 2) Consistency checking (Nagy and Werner-Stark 2022), which assesses deviations between existing (and ideal) process models and the behavior reflected in event logs. 3) Process enhancement (Marin-Castro and Tello-Leal 2021), which uses the information recorded in event logs about actual process executions to extend or improve existing process models so that they better reflect the real situation.

Typically, event logs contain properties related to the activity performed, the resource responsible for that activity (usually an individual), and the timestamp at which it was completed. In some application areas, however, event logs may contain not only personally identifiable information (e.g., full name, national identification number, passport number) or confidential personal information (e.g., health status, ethnicity, sexual orientation) (Yogarajan, Pfahringer, and Mayo 2020) but also confidential information about the organization’s operations (e.g., economic transactions, critical technology) (Zhang 2023). If not properly managed, the publication of such sensitive event logs can significantly compromise people’s privacy (Majeed and Lee 2021). To protect user privacy, reduce the occurrence of such incidents, and promote the common development of centralized systems in various industries, relevant legislation has been issued on the one hand, such as the General Data Protection Regulation (GDPR) adopted in the European Union (Torre et al. 2021) and the Data Security Law of the People’s Republic of China promulgated by the Standing Committee of the 13th National People’s Congress. On the other hand, in the context of the industrial Internet, there is an urgent need to develop privacy protection technology oriented toward data privacy and usability (Usha Lawrance and Nayahi Jesudhasan 2021). Despite this, privacy issues receive little consideration in the PM field, and existing countermeasures rely mainly on pseudonymization or encryption.

To this end, Privacy-preserving Process Mining (PPPM) aims to protect personal or confidential information from disclosure to unauthorized parties during the PM process. Typical privacy protection technologies include differential privacy (Elkoumy et al. 2022), blockchain (Bhattacharya et al. 2021), and anonymization (Zhang et al. 2022). These techniques imply distortion (i.e., conversion) of event data, which often reduces the quality of the process model and thus affects the utility of PM results. Therefore, the main challenge of PPPM is to distort event data enough to defend against specific attacks and to reduce or avoid disclosure risks (identity disclosure, attribute disclosure) while minimizing the impact on the quality of PM results. Among these technologies, blockchain can address trust and privacy protection well owing to its privacy protection strategy and data encryption. In recent years, privacy protection methods based on blockchain technology have risen rapidly (Lv et al. 2021) and are widely used in finance (Raddatz et al. 2021), education, transportation, industry, medicine, and other fields. At the same time, enterprises face a large number of information interaction problems under the collaborative development of Industrial Internet 4.0 and need to strengthen the supervision and traceability of circulating information (Fu et al. 2022). A supply chain management (SCM) system connects companies with external suppliers and manufacturers, helping companies fully automate their overall process operations. However, since the behavior and status of each entity in the supply chain differ, the blockchain-based supply chain management system (BC-SCM) (Al-Farsi, Rathore, and Bakiras 2021; Fu et al. 2022) came into being to improve the production and circulation of enterprises and the utilization of resources. It combines the basic characteristics of blockchain and supply chain systems (open sharing, decentralization, tamper-proofing, and traceability), prompting enterprises to adapt quickly to the dynamic needs of the market (Hussain et al. 2021). However, the large volume of data and the original enterprise-centered management style lead to differences in data integrity, formats, contents, and standards, so the direct use of BC-SCM as a management tool for cross-organizational process data may cause efficiency problems.

Moreover, current reviews of BC-SCM focus on data-driven approaches (Kshetri 2018), protection of stakeholders, and interoperability across chains, and BC-SCM still faces challenges such as vulnerability of data to tampering, difficulty in sharing data across organizations, and leakage of private information (Yan et al. 2023). Mannhardt et al. outlined two general privacy challenges related to process mining: technical privacy challenges and organizational privacy challenges (Mannhardt, Petersen, and Fradinho Duarte de Oliveira 2019). Majid Rafiei discusses the challenges of applying group-based privacy protection techniques directly to event data and their inability to adapt to cross-organizational process data-sharing scenarios in complex industrial environments (Rafiei and van der Aalst 2021). P. Liu et al. proposed a trace privacy data distribution scheme (LORDP) based on local optimization and R-trees, aiming to handle sensitive data while improving trace validity (P. Liu et al. 2023). Xu et al. proposed C-fDRL, a framework for context-aware joint deep reinforcement learning (fDRL) that maintains context-aware privacy for task offloading (Xu et al. 2023). Therefore, with the development of modern process management technologies, it is necessary to study blockchain-based privacy protection methods for business processes in supply chain systems, in order to enhance the quality and efficiency of enterprise collaboration while reducing the risk of privacy leakage in the context of Industrial Internet 4.0.

Research on cross-organizational privacy-preserving approaches for process mining faces two main issues:

The circulation and release of large amounts of end-to-end process data in cross-organizational environments (Liu et al. 2019).
Efficient business process management while ensuring the accuracy of process mining results (Pika et al. 2020).

To solve the cross-organizational supply chain data sharing problem, Quansi Wen et al. proposed a supply chain-oriented data sharing scheme (Wen et al. 2019), which provides a feasible and reliable direction for decentralized intelligent supply chains. Yogarajan et al. provide a comprehensive review of papers on end-to-end automated de-identification of electronic health records, showing the advances made in improving overall system accuracy and in identifying an individual’s protected health information, as well as the challenges that remain regarding the risks to patient protection and privacy (Yogarajan, Pfahringer, and Mayo 2020). However, challenges remain in modern business process management. Chaotic encryption (Fang et al. 2022), the main technique for image encryption, offers randomness, ergodicity, determinism, and sensitivity to initial conditions; chaotic image encryption (Cai et al. 2022) is therefore a suitable tool for privacy-aware data sharing of business processes in the supply chain. On the one hand, business process management aims to bridge the gap between traditional data mining and process modeling (van der Aalst 2012), which can improve efficiency and reduce errors, guaranteeing end-to-end information security and trusted communication among stakeholders in the supply chain process. On the other hand, the pseudo-randomness and sequence unpredictability of chaotic image coding further enhance the security and tamper resistance of privacy information management.

To address the above problems, a privacy-aware reversible information-sharing image method for process data, based on blockchain and chaotic image cryptography, is proposed in conjunction with business process management. It improves the privacy reliability of stakeholders’ process information and the security of distributed data sharing between BC-SCM systems through:

Setting up three system environment permissions to achieve secure storage, circulation, and distribution of process data across organizations.
Building a privacy-aware information sharing framework for lightweight information management in on-chain-off-chain mode.
Fully reversible privacy protection of process data based on the principles of chaotic system encryption and attribute encryption.

Motivation

With the rapid development of modern information systems, the scope of business process information flows has become broader and crosses multiple organizational boundaries (Kerzel 2021). The deep integration of cross-organizational information sharing and AI has enabled the further development of services such as smart healthcare, but privacy breaches and data security issues have hindered their widespread use (W. Liu et al. 2023). In cross-organizational collaboration environments (Bu et al. 2021), each participating organization must disclose some of its process information to support collaboration, and the interactive sharing of process information involves privacy compliance issues for various attribute data (e.g., GPS location, work hours, salary, and illness). However, information sharing can go wrong, and sensitive information can easily be exposed because of a lack of trust (Lu et al. 2020). Therefore, to achieve trusted process data sharing, the core issue that big data integration upgrades must address is the privacy protection of information flows between cross-organizational systems. Moreover, in cross-organizational scenarios the event logs required for process mining are scattered across different information systems with insufficiently uniform formats and attributes, which increases the difficulty of data integration for unified management and collaborative development. According to the Process Mining Manifesto (van der Aalst et al. 2012) and a recent survey on cross-organizational process mining (Liu et al. 2020), there are two main settings for cross-organizational mining. The first emphasizes collaborative environments, where multiple organizations with different responsibilities typically execute a single business process. In the second, different organizations adopt and execute a basic process while sharing information and infrastructure. As a result, privacy protection and process data sharing face significant challenges in fulfilling cross-organizational collaboration requirements. However, there is currently no privacy-secure process data interaction method that is fully applicable to cross-organizational sharing scenarios while satisfying the query requirements of different users.

Consider an event log ($L$) that records process activities performed by public space organizations (e.g., hospitals, businesses). The event log consists of a set of traces ($T = \langle t_{1}, \dots, t_{m} \rangle$), where $m$ ($m > 0$) is the number of traces in $L$. Each trace $t_{j} \in T$ ($1 \leq j \leq m$) contains a sequence of events $\langle e_{1}, \dots, e_{n} \rangle$ with each $e_{i} \in E$, where $n$ ($n > 0$) is the number of events in $t_{j}$. The events in a trace are ordered chronologically and represent one instance (i.e., case) of the process execution. Each event $e_{i} \in E$ describes a step of the process execution and has multiple attributes such as event ID, case ID, activity executed, and timestamp. For example, Table 1 shows a hospital drug supply event log containing 5 traces, 10 events, and 8 attributes. Such event logs contain a variety of personally identifiable information (attributes such as doctor and patient directly expose individual information, and “TDF” is a specialized medication for treating Hepatitis B), as well as private data that can be tied to an individual under certain association conditions (attribute association analysis reveals an individual’s private information: “Peter” is a specialist in HIV disease treatment, and patient “Alice” underwent the “HIVtest” activity, which relates the patient to the disease). It would be highly irresponsible for an organization to release event logs without modification, as confidential data is associated with personally identifiable information that may compromise privacy.
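The formalization above can be mirrored in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Event`, its field names, the helper `build_traces`, and the sample rows are all illustrative assumptions (only the names "Peter", "Alice", "HIVtest", and "TDF" come from the example in the text).

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """One event e_i; field names are illustrative, not taken from the article."""
    event_id: int
    case_id: str
    activity: str
    timestamp: datetime
    attributes: dict = field(default_factory=dict)


def build_traces(events):
    """Group events by case ID and sort each trace chronologically."""
    traces = {}
    for event in events:
        traces.setdefault(event.case_id, []).append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e.timestamp)
    return traces


# Hypothetical rows, loosely modeled on the hospital drug supply example.
events = [
    Event(2, "c1", "Prescribe", datetime(2023, 1, 5, 10, 30),
          {"doctor": "Peter", "patient": "Alice", "drug": "TDF"}),
    Event(1, "c1", "HIVtest", datetime(2023, 1, 5, 9, 0),
          {"doctor": "Peter", "patient": "Alice"}),
    Event(3, "c2", "Register", datetime(2023, 1, 6, 8, 15),
          {"patient": "Bob"}),
]
log = build_traces(events)  # maps case ID -> chronologically ordered trace
```

Even this toy log shows the association risk described above: joining the `doctor` and `activity` attributes of case `c1` is enough to link a named patient to a sensitive test.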

Table 1. Example of a hospital drug supply event log.


Such a log contains a variety of personally identifiable information, as well as private data associated with a person under certain association conditions. It would be highly irresponsible for an institution to publish $L$ without modification, as confidential data is associated with personally identifiable information that could jeopardize privacy. For example, Table 1 shows a hospital drug supply event log, which contains 5 traces, 10 events, and 8 attributes. On the one hand, sensitive attributes such as drug, doctor, and patient directly expose individual information (e.g., “TDF” is a special drug for hepatitis B). On the other hand, the analysis of associations between attributes can also reveal private information about individuals (e.g., analyzing the associations between doctors and related testing activities shows that “Peter” is a specialist in HIV disease treatment and that patient “Alice” underwent the “HIVtest” activity).

In conclusion, achieving trusted connection and sharing of cross-organizational business process information is not only a traditional information security issue but also the key to building a trusted data ecology, respecting the privacy claims and business interests of each participant, and facilitating the orderly flow of real data across subjects. Constructing a new framework for privacy-aware information sharing therefore faces two main issues: (1) with numerous subsystems and huge amounts of information (Essa et al. 2019), how to share process information across organizations fairly and efficiently so that all parties can adjust production capacity in time and maximize benefits; (2) in the absence of a centralized information management system in traditional management organizations, how to ensure the accuracy of process mining results while safeguarding stakeholder privacy.

Next, around these two issues, section “A blockchain-based framework for process data privacy protection” constructs the blockchain-based process data privacy protection framework and outlines its functions, section “Blockchain-based reversible privacy-preserving image approach for process data” develops the blockchain-based process data privacy protection image approach in detail, section “Evaluation” evaluates the blockchain-based process data privacy-aware image approach through three public event log sets and a synthetic log, and section “Summary” concludes.

A Blockchain-Based Framework for Process Data Privacy Protection

In this section, a privacy-enhanced image framework for process data is constructed by combining blockchain, business process management, and chaotic image encryption, based on the requirements of secure cross-organizational process data sharing and the logic of the BC-SCM system, as shown in Figure 1. It forms the basis for improving the degree of privacy in the publication of event log data while ensuring the accuracy of process mining results. First, the framework separates three environments: the forbidden environment of the process generation system; the internal environment of the process owner, whose main purpose in collecting and storing event data is to support the execution of the process; and the external environment of the process analyst, who uses the event data for analytical processing for purposes such as process monitoring, analysis, and improvement. Second, to avoid compromising personal or organizational privacy, access to event data, both from the forbidden environment to the internal environment and from the external environment, is protected by a query interface. Importantly, the interface implements a data distribution mechanism that encrypts and transforms event data, thus ensuring privacy.

Figure 1. Image method for process data privacy protection system framework.

The data flow between the three system environments is shown in Figure 1. First, the original process data are collected and stored in the forbidden environment. Behavioral coding and chaotic image encryption are combined to transform 2-dimensional event log data into 3-dimensional image data and to further merge and confuse the images; the large volume of deconstructed and reorganized event data is then stored in encrypted form in the internal environment under a new process data image management method, achieving unified management of cross-organizational event log data. Then, based on blockchain distributed system technology in the internal environment, the unique index $ID$ of each merged and reconstructed color image is placed on the chain, while the encrypted image data are stored off the chain; this on-chain-off-chain mode realizes lightweight blockchain information management and builds a privacy-aware information sharing framework. Finally, depending on the user or need, personnel from the external environment can decrypt the relevant desensitized data in the internal environment with the authorization of both parties that have data sharing interests. In addition, the government and other regulatory authorities hold the unique index code $ID$ and the synthetic image authorized by the internal environment, and can access all the original process data in the forbidden environment using the public key, the private key, and the initial value of the chaotic system.

Next, section “Overview” outlines the three main implementation steps of the framework, and section “System Function Analysis” briefly describes the main features that the framework can achieve.

Overview

The specific information-sharing procedure of the blockchain-based privacy-aware image method for supply chain process data is shown in Figure 2, and the algorithms in the figure are described in detail in section “Blockchain-based reversible privacy-preserving image approach for process data”.

Figure 2. Image methods for privacy-aware sharing of process data.

As shown in Figure 2, the workflow for building a supply chain process data privacy-aware information sharing system based on chaotic image encryption technology has the following three main steps:

Step 1: Building a blockchain-based supply chain process system. The business process event log of each node enterprise at each level is the data source subject, where the first level enterprise is the raw material supplier, the second level is the manufacturer, the third level is the distributor, and the fourth level is the agent or institution.

Step 2: Data conversion and storage. ① The temporal structure of the event log is categorized by the importance of the attributes (which are given different permissions) and then transformed into a spatial structure and encrypted. Each company collects its own process data concerning product costs, prices, users, product managers, quantities, suppliers, etc. The event log information with different attribute categories is transformed and encrypted into color image data in the internal environment using Algorithm 1: Log-image Data Conversion (L-IDC) and the system public key (i.e., the initial parameters $x_{0}$, $μ$ of the chaotic system). ② Sub-image merging and generation of the unique image index $ID$. The encrypted sub-images are merged into part of an image group and stored in each enterprise information system. Further, after the initial value of the chaotic system is updated using the image matrix obtained by Algorithm L-IDC to generate a new chaotic sequence, the sub-images are merged and confused using Algorithm 2: Image Merge Diffusion Encryption (IMDE), which also generates the unique image index string $ID$. ③ Uploading $ID$ to the chain. All primary nodes of the supply chain network load the unique index string $ID$ of the privacy-aware event log images onto the blockchain and store the confused, encrypted color images off-chain in the internal environment of each primary node. This on-chain and off-chain storage keeps blockchain management lightweight.

Step 3: Publication of sensitive data. Using the $ID$ uploaded to the blockchain by each major node together with the chaotically encrypted images stored off-chain, an authorized user obtains either the desired part of the data (Algorithm 4: DPDAP, Decryption of Process Data with Attribute Permission) or all of the real data (Algorithm 3: DAPD, Decrypt All Process Data), decrypting with the chaotic system public key and the private keys of the relevant attributes.
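The difference between partial release (DPDAP) and full release (DAPD) in Step 3 can be illustrated with per-attribute keys. The sketch below is only an assumption-laden illustration, not the article's Algorithms 3 and 4: it assumes the chaotic system is the logistic map $x_{n+1} = μ x_{n} (1 - x_{n})$ (this section gives the parameters $x_{0}$ and $μ$ but does not name the map), uses a plain XOR keystream instead of the image-based scheme, and all function names and key values are hypothetical.

```python
def logistic_keystream(x0, mu, n, burn_in=100):
    """n bytes from the logistic map x <- mu*x*(1-x) (assumed chaotic system)."""
    x = x0
    stream = bytearray()
    for i in range(burn_in + n):
        x = mu * x * (1.0 - x)
        if i >= burn_in:  # skip the transient before emitting bytes
            stream.append(int(x * 256) % 256)
    return bytes(stream)


def xor_with_key(data, key):
    """XOR data with the keystream of key=(x0, mu); applying it twice restores data."""
    stream = logistic_keystream(key[0], key[1], len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


def encrypt_event(event, attribute_keys):
    """Encrypt every attribute value under that attribute's own key."""
    return {name: xor_with_key(str(value).encode("utf-8"), attribute_keys[name])
            for name, value in event.items()}


def decrypt_event(cipher_event, granted_keys):
    """Decrypt only the attributes whose keys the user has been granted."""
    return {name: xor_with_key(blob, granted_keys[name]).decode("utf-8")
            for name, blob in cipher_event.items() if name in granted_keys}


# Hypothetical keys (x0, mu); the article does not publish concrete values.
keys = {"activity": (0.31, 3.99), "doctor": (0.47, 3.98), "patient": (0.59, 3.97)}
event = {"activity": "HIVtest", "doctor": "Peter", "patient": "Alice"}

cipher = encrypt_event(event, keys)
full_view = decrypt_event(cipher, keys)                                # all keys held
partial_view = decrypt_event(cipher, {"activity": keys["activity"]})  # one key held
```

A user holding every key recovers the whole event, while a user holding only the `activity` key recovers that attribute and nothing else, which is the on-demand behavior the step describes.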

System Function Analysis

The blockchain-based process data privacy-aware supply chain system achieves the five functions of data integration, conversion, encryption, storage, and query, as well as the goals of secure storage of process data, secure flow of information across organizations, and on-demand data release. The systematic approach to achieving the five functions is as follows:

Integration. Enterprise data managers obtain the process data of each enterprise through IoT scanning, covering anti-counterfeiting, storage, transportation, and sales. Automated information collection reduces the errors caused by human operations, and traceability improves drug safety supervision.
Conversion. Chaotic systems and image coding are used to convert the 2D temporal information of different process attributes into 3D color images, which makes data leakage more difficult.
Encryption. Enterprises use a unified image encryption method (Singh and Singh 2022) to convert each type of supply chain data into a unique data format, store it in an append-only database, and generate a unique, non-repeating eight-character string password from the image information for each sub-image.
Storage. Using blockchain technology, enterprise data managers upload the string groups generated from the process data of each node enterprise in the supply chain to the blockchain network through the upload data interface of smart contracts, where they are stored in blockchain blocks; the 2D information of the event logs is transformed into 3D images and stored in the internal environment of each supply chain node.
Query. Regulators, supply chain subjects, and users obtain permissions and keys according to attribute requirements to query desensitized process data (e.g., drug information).

This section shows three specific implementation steps of the privacy-aware information-sharing framework for blockchain business processes. Next, the method for data managers to transform, encrypt, and privacy-aware publish the event data stored in the database by data type is detailed in section “Blockchain-Based Reversible Privacy-Preserving Image Approach for Process Data”.

Blockchain-Based Reversible Privacy-Preserving Image Approach for Process Data

According to section “Overview”, there are three steps to construct a reversible privacy-aware information sharing system for supply chain process data based on chaotic image encryption: 1) building a blockchain-based supply chain business process system; 2) data conversion and storage; and 3) reversible privacy-aware data release. This section details the specific implementation of the three steps.

Image Conversion of Process Data

To apply process data privacy protection in complex data-sharing scenarios, 2D data of different types of process information are transformed into 3D image data and encrypted based on behavioral coding and chaotic image processing. The data conversion process is shown in Figure 3.

Figure 3. Process data conversion methods.

Anonymization of Activities

To prevent activity names from revealing personal privacy in a case, we anonymize ordinary and privacy-related activities with chaotic sequences computed from the public key and the private key, respectively. First, the activities in the event log are classified into ordinary and private activities using expert domain knowledge. Next, each activity is converted into a numerical sequence using ASCII encoding. Finally, the chaotic sequence computed from the public key confuses the value sequences of ordinary activities, and the chaotic sequence computed from the private key confuses the value sequences of private activities. As a result, both the names and the order of the activities are completely anonymized.
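The three steps above (classify, ASCII-encode, confuse with a key-dependent chaotic sequence) can be sketched as follows. This is a hedged illustration rather than the article's Algorithm L-IDC: it assumes a logistic map as the chaotic system, and it uses a common chaotic-confusion construction (permute positions by the sort order of the chaotic values, then mask each code by XOR). The key values, the activity name "Register", and all function names are hypothetical.

```python
def logistic_sequence(x0, mu, n, burn_in=100):
    """n floats in (0, 1) from the logistic map (assumed chaotic system)."""
    x = x0
    values = []
    for i in range(burn_in + n):
        x = mu * x * (1.0 - x)
        if i >= burn_in:
            values.append(x)
    return values


def confuse_codes(codes, key):
    """Permute and mask a list of ASCII codes with one chaotic sequence."""
    seq = logistic_sequence(key[0], key[1], len(codes))
    order = sorted(range(len(codes)), key=lambda i: seq[i])  # chaotic permutation
    return [codes[i] ^ (int(seq[i] * 256) % 256) for i in order]


def restore_codes(cipher, key):
    """Invert confuse_codes using the same key."""
    seq = logistic_sequence(key[0], key[1], len(cipher))
    order = sorted(range(len(cipher)), key=lambda i: seq[i])
    codes = [0] * len(cipher)
    for position, i in enumerate(order):
        codes[i] = cipher[position] ^ (int(seq[i] * 256) % 256)
    return codes


# Hypothetical keys (x0, mu) and expert classification of activities.
PUBLIC_KEY = (0.3141, 3.99)
PRIVATE_KEY = (0.2718, 3.97)
PRIVATE_ACTIVITIES = {"HIVtest"}


def anonymize_activity(name):
    """Private activities use the private key, ordinary ones the public key."""
    key = PRIVATE_KEY if name in PRIVATE_ACTIVITIES else PUBLIC_KEY
    return confuse_codes(list(name.encode("ascii")), key)


def recover_activity(cipher, key):
    return bytes(restore_codes(cipher, key)).decode("ascii")


hidden = anonymize_activity("HIVtest")
```

Because the same sequence drives both the permutation and the mask, anyone holding the matching key can reverse the transformation exactly, which is the reversibility property the approach relies on.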

Anonymization of Resources

To prevent disclosure of personal privacy in cases through event resources (e.g., in Table 1, Peter is a specialist in HIV disease treatment and prescribes medication for the patient, Alice), resources and their temporal relationships are anonymized using chaotic sequences derived from the private key. First, ASCII encoding transforms the resource attributes in the event log into a sequence of numerical values. Then, the chaotic sequence derived from the private key confuses the transformed sequence of values.

Anonymization of Timestamps

To prevent activity cycle times from linking cases to individuals, we inject chaotic sequence noise derived from the public key using a random timestamp offset method. Anonymization focuses on two parts of the timestamp: 1) The case start time, i.e., the timestamp of the first event of the case. As shown in Figure 3, the initial offset timestamp is calculated by introducing $λ_{shift}$ into the original timestamp. 2) The execution timestamp of every other event after the case start time. After the initial offset of the timestamp, chaotic sequence noise is first introduced into the time intervals between events, shown as $Δ_{1}$, $Δ_{2}$, and $Δ_{3}$ in Figure 3. Subsequently, random noise, denoted by $c_{1}$, $c_{2}$, and $c_{3}$, is added to the length of each time interval. To maintain the order of events, we bind the random offset of the timestamp to the size of the interval between two events, as shown in Figure 3 ($Δ_{1} + λ_{shift}$, etc.). Finally, since the event order has already been anonymized in the activity anonymization phase, an attack that associates an activity with a timestamp is ineffective.
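The offset scheme can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's procedure: the logistic map stands in for the chaotic system, the noise $c_{i}$ is bounded by a hypothetical `max_ratio` fraction of the interval $Δ_{i}$ (one simple way to "bind" the offset to the interval size so that order is preserved), and the key, the shift, and the timestamps are invented.

```python
from datetime import datetime, timedelta


def logistic_sequence(x0, mu, n, burn_in=100):
    """n floats in (0, 1) from the logistic map (assumed chaotic system)."""
    x = x0
    values = []
    for i in range(burn_in + n):
        x = mu * x * (1.0 - x)
        if i >= burn_in:
            values.append(x)
    return values


def anonymize_timestamps(times, lam_shift, key, max_ratio=0.5):
    """Shift the case start by lam_shift, then perturb each interval.

    The noise c_i lies within +/- max_ratio * delta_i, so every perturbed
    interval stays positive and the event order is preserved.
    """
    if not times:
        return []
    noise = logistic_sequence(key[0], key[1], len(times) - 1)
    shifted = [times[0] + lam_shift]
    for i in range(1, len(times)):
        delta = times[i] - times[i - 1]
        c = delta * ((2.0 * noise[i - 1] - 1.0) * max_ratio)
        shifted.append(shifted[-1] + delta + c)
    return shifted


# Hypothetical case with four events and a hypothetical shift and key.
case_times = [
    datetime(2023, 1, 5, 9, 0),
    datetime(2023, 1, 5, 9, 40),
    datetime(2023, 1, 5, 10, 30),
    datetime(2023, 1, 5, 12, 0),
]
lam_shift = timedelta(days=3, hours=2)
anonymized = anonymize_timestamps(case_times, lam_shift, key=(0.3141, 3.99))
```

With `max_ratio` below 1, each perturbed interval keeps at least `(1 - max_ratio)` of its original length, so the anonymized timestamps remain strictly increasing whenever the originals are.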

Anonymization of Attribute Values

To prevent correlations between attributes in the event log from linking cases to individuals, chaotic sequences computed from different private keys are used to anonymize all categorical and numerical data and their temporal relationships. First, the attributes in the event log are classified into sensitive and nonsensitive attributes using expert domain knowledge. Then, the chaotic sequence computed from the private key confuses and diffuses both the numerical values of the sensitive attributes and the numerical sequences obtained by transforming their non-numerical values. Finally, the chaotic sequence derived from the public key confuses and diffuses the value sequences obtained by transforming the nonsensitive attributes.


The implementation of log-image conversion is given in Algorithm 1: Log-image Data Conversion (L-IDC). The privacy level of this method is directly related to the attribute group size (k): the larger k is, the higher the privacy level achieved. As shown in Figure 3, ASCII encoding is combined with chaotic image encryption to convert event data into a machine-readable numerical encoding, which facilitates encryption and spatial transformation. Using the trichromatic principle of color images, the encoded matrices of multiple event data are confused, encrypted, and stored in a visual, uniformly formatted RGB image. In addition, during the conversion of activities, an entropy value is calculated by combining the activities with the initial model, so that the degree of confusion of the event data, i.e., some behavioral information, is retained.

For example, the 512 event log records in the Hospital Drug Supply Event Log are transformed by Algorithm L-IDC into privacy-aware color images, such as the two color images of size $64 \times 72 \times 3$ in Figure 4. The method brings together complex data from different sources, formats, attributes, and natural characteristics in a physically organic way and unifies them into image types to support comprehensive data sharing.

Figure 4. Event log to color image example.
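The idea of encoding event records as ASCII, encrypting them with a chaotic keystream, and packing the result into RGB pixels can be sketched as follows. This is only an illustration under stated assumptions, not Algorithm L-IDC itself: the logistic-map keystream, the comma and newline serialization, the zero padding, and the function names are all hypothetical, and the sketch assumes attribute values contain no commas or newlines.

```python
def keystream(x0, mu, n, burn_in=100):
    """Return n pseudo-random bytes from the logistic map x -> mu * x * (1 - x)."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1 - x)
    stream = []
    for _ in range(n):
        x = mu * x * (1 - x)
        stream.append(int(x * 256) % 256)
    return stream


def log_to_image(rows, width, x0, mu):
    """Encode rows as ASCII, XOR with the keystream, and pack into RGB pixels."""
    text = "\n".join(",".join(row) for row in rows)
    data = list(text.encode("ascii"))
    row_bytes = width * 3
    padded_len = -(-len(data) // row_bytes) * row_bytes  # round up to full image rows
    data += [0] * (padded_len - len(data))
    cipher = [b ^ k for b, k in zip(data, keystream(x0, mu, len(data)))]
    pixels = [tuple(cipher[i:i + 3]) for i in range(0, len(cipher), 3)]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def image_to_log(image, x0, mu):
    """Invert log_to_image with the same key parameters."""
    cipher = [channel for row in image for pixel in row for channel in pixel]
    plain = bytes(b ^ k for b, k in zip(cipher, keystream(x0, mu, len(cipher))))
    text = plain.rstrip(b"\x00").decode("ascii")
    return [line.split(",") for line in text.split("\n")]


rows = [
    ["c1", "Register", "2023-01-01T08:00", "nurse_a"],
    ["c1", "Dispense", "2023-01-01T09:30", "pharm_b"],
]
image = log_to_image(rows, width=4, x0=0.3141, mu=3.99)
print(len(image), "rows of", len(image[0]), "RGB pixels")
print(image_to_log(image, x0=0.3141, mu=3.99) == rows)
```

A plain XOR keystream provides confusion only; the diffusion step that the article applies afterwards is not part of this sketch.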

Image Obfuscation Merging and Diffusion Encryption

To provide comprehensive data sharing, Algorithm L-IDC unifies multiple complex attribute data into image types. However, the resulting number of images is large, which is not conducive to query and management. Therefore, the images are compressed and merged using Algorithm 2: Image Merge Diffusion Encryption (IMDE), and the initial value of the chaotic system (private key) is updated using the image information to further diffuse the encryption.

For example, the image matrix generated by transforming the 512 event log records encrypted by Algorithm L-IDC in section “Image Conversion of Process Data” is used to update the initial value of the chaotic system (public key $x {^{'}}_{0}$ , $μ^{'}$ ). The two images are then diffusively transformed into one color image of size $96 \times 96 \times 3$ using Algorithm IMDE; the transformation process is shown in Figure 5. The unique index code $ID$ generated at the same time is “b[mFhLq=”. Chaotic encryption is further used to confuse and diffuse the images, producing a higher level of image encryption while halving the number of images and index codes, which reduces the amount of data stored on and off the chain.

Figure 5. Example of a confusion and diffusion event log diagram.
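The merge-and-diffuse step can be sketched in Python as below. This is a hedged illustration, not Algorithm IMDE: the images are treated as flat byte lists, the key update rule (shifting the initial value by the mean pixel intensity), the chained-XOR diffusion, and the index code derived from a SHA-256 digest are all assumptions. Note that two $64 \times 72 \times 3$ images hold 27,648 bytes in total, the same as one $96 \times 96 \times 3$ image, so simple concatenation is consistent with the sizes in the example.

```python
import base64
import hashlib


def keystream(x0, mu, n, burn_in=100):
    """Return n pseudo-random bytes from the logistic map x -> mu * x * (1 - x)."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1 - x)
    stream = []
    for _ in range(n):
        x = mu * x * (1 - x)
        stream.append(int(x * 256) % 256)
    return stream


def merge_and_diffuse(img_a, img_b, x0, mu):
    """Merge two flat byte images and encrypt them with chained-XOR diffusion.

    Returns (cipher, updated_x0, index_id); updated_x0 is needed to decrypt.
    """
    merged = list(img_a) + list(img_b)
    # Assumed update rule: shift the initial value by the mean pixel intensity.
    offset = sum(merged) / (255 * len(merged) + 1)
    updated_x0 = (x0 + offset) % 1 or x0
    cipher, prev = [], 0
    for pixel, key in zip(merged, keystream(updated_x0, mu, len(merged))):
        prev = pixel ^ key ^ prev  # each output byte depends on all earlier bytes
        cipher.append(prev)
    digest = hashlib.sha256(bytes(cipher)).digest()
    index_id = base64.b64encode(digest[:6]).decode("ascii")  # 6 bytes -> 8 characters
    return cipher, updated_x0, index_id


def undiffuse_and_split(cipher, updated_x0, mu, len_a):
    """Invert merge_and_diffuse and return the two original byte lists."""
    plain, prev = [], 0
    for byte, key in zip(cipher, keystream(updated_x0, mu, len(cipher))):
        plain.append(byte ^ key ^ prev)
        prev = byte
    return plain[:len_a], plain[len_a:]


a, b = [10, 200, 33, 47, 0, 255], [1, 2, 3, 4, 5, 6]
cipher, key, index_id = merge_and_diffuse(a, b, x0=0.3141, mu=3.99)
print(index_id, undiffuse_and_split(cipher, key, 3.99, len(a)) == (a, b))
```

Because the key depends on the plaintext image, the updated initial value has to be kept by whoever is allowed to decrypt.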

Storage

Algorithm IMDE sharply reduces the number of event log images; its final output is the merged image and the image's unique index code. Based on blockchain technology, the merged and fully desensitized event log image, which contains the real privacy-aware data, is stored off-chain in a forbidden environment. The corresponding 8-character unique index IDs of the images calculated by Algorithm IMDE are stored in the blockchain network through the on-chain smart contract interface. When process mining is required, each of the other node enterprises can, once authorized by both parties, use the unique $ID$ numbers to query the required portion of the privacy-aware event logs. This addresses the information transparency requirements of the supply chain while keeping the real privacy-aware data off-chain.

For example, Figure 6 shows the storage procedure for the updated image and the $ID$ index code after obfuscation, merging, and encryption by Algorithm IMDE in section “Image Obfuscation Merging and Diffusion Encryption”.

Figure 6. Stored procedures for event log images and index IDs.

Using asymmetric encryption to protect private data mitigates two main problems that arise when publishing sensitive data: 1) data tampering or loss due to human factors, and 2) insufficient security of the publishing process for privacy-aware event log image data. First, the anti-modification and anti-deletion permissions of the forbidden environment guarantee the integrity of the image data. Second, the secure publishing model, which combines the key stored on the blockchain with the image data stored in the forbidden environment, secures access to the privacy-aware event logs. In addition, the lightweight information security management mode of on-chain and off-chain interaction uses $ID$ indexing to avoid the security problems caused by direct contact with the data, and greatly reduces the data bloat of the public ledger.
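The on-chain/off-chain split can be illustrated with a toy Python model. It is a sketch under assumptions and not the system described in the article: the `Ledger` class is a plain in-memory hash chain standing in for the blockchain, a dictionary stands in for the off-chain forbidden environment, and storing an image digest next to the index ID is an added assumption used here to demonstrate tamper detection.

```python
import hashlib
import json


class Ledger:
    """Toy append-only hash chain standing in for the on-chain index."""

    def __init__(self):
        self.blocks = []

    def append(self, index_id, image_digest):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"id": index_id, "digest": image_digest, "prev": prev_hash}
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        self.blocks.append({**body, "hash": hashlib.sha256(encoded).hexdigest()})

    def lookup(self, index_id):
        return next((b for b in self.blocks if b["id"] == index_id), None)


def publish(ledger, store, index_id, image_bytes):
    """Keep the image off-chain and only its ID and digest on-chain."""
    store[index_id] = image_bytes
    ledger.append(index_id, hashlib.sha256(image_bytes).hexdigest())


def fetch(ledger, store, index_id, authorized_ids):
    """Return the off-chain image if the caller is authorized and it is intact."""
    if index_id not in authorized_ids:
        raise PermissionError(f"no authorization for {index_id!r}")
    block = ledger.lookup(index_id)
    if block is None or index_id not in store:
        raise KeyError(index_id)
    image = store[index_id]
    if hashlib.sha256(image).hexdigest() != block["digest"]:
        raise ValueError("off-chain image does not match the on-chain digest")
    return image


ledger, store = Ledger(), {}
publish(ledger, store, "b[mFhLq=", b"encrypted image bytes")
print(fetch(ledger, store, "b[mFhLq=", authorized_ids={"b[mFhLq="}))
```

Only the short index entry grows the ledger, which is the lightweight property the paragraph above describes; the image itself never touches the chain.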

Privacy-Aware Data Publishing

Regulators, supply chain participants, and other parties query the desensitized process data, drug information, or all process data according to their attribute requirements. Accordingly, data release is divided into two access methods: full decryption and decryption by attribute permission.

Full Decryption

Government and other regulatory authorities can use the index $ID$ stored on the blockchain, the corresponding off-chain image data, and the public key to read all transaction data and check whether any enterprise in the supply chain complies with laws and regulations. The full decryption process is shown in Figure 7, and the algorithm is described as Algorithm 3: Decrypt All Process Data (DAPD).

Figure 7. Full decryption process of supply chain event log.

Algorithm DAPD can recover all event logs of all nodes in the supply chain one by one, so the amount of data obtained is extremely large. The process is time-consuming and takes up much space, making it difficult for ordinary users to obtain useful information. However, government and other regulatory authorities typically check compliance one enterprise at a time rather than inspecting all enterprises at once. As a result, they usually execute Algorithm DAPD on a small amount of input data, and the decryption process is secure and efficient.

Decryption with Attribute Permissions

By combining the unique index $ID$ stored in the blockchain with the corresponding off-chain image data, all stakeholders can read only the part of the supply chain transaction information that contains no sensitive information. Users with greater information needs can use the private key to access the remaining transaction data, in which some sensitive information stays hidden. For example, a drug distributor needs to access some of the process data of Hospital A, under the authorization of hospital management, to provide a more reasonable drug supply to Hospital A. The drug’s name, quantity, and pricing are process data that can be obtained with permission and used for further statistics. However, sensitive data about patients and hospitals, such as the patients who use the drug and the prices at which drugs are bought, cannot be accessed. The decryption process with attribute permissions is shown in Figure 8, and the algorithm is described as Algorithm 4: Decryption of Process Data with Attribute Permission (DPDAP).

Figure 8. Process data decryption process with attribute permissions.

For example, the 512 event logs are encrypted by Algorithm L-IDC and Algorithm IMDE, and access is then permitted to all attribute data except the “drug” attribute. Some of the event logs after decryption and publication are shown in Table 2. The attribute in the sixth column of Table 2 is “drug”; its data is published as pseudo-random values, which are meaningless for data mining but retain the sensitive attribute value in encrypted form. The method handles any other single attribute or combination of attributes in the same way.

Table 2. “Drug” attribute without decryption permission data distribution.
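Decryption by attribute permission can be sketched in Python as follows. This is an illustrative assumption rather than Algorithm DPDAP: each attribute is encrypted with its own hypothetical `(x0, mu)` logistic-map key, and a reader decrypts only the attributes for which a key was granted, while the other columns remain ciphertext (shown as hexadecimal strings).

```python
def keystream(x0, mu, n, burn_in=100):
    """Return n pseudo-random bytes from the logistic map x -> mu * x * (1 - x)."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1 - x)
    stream = []
    for _ in range(n):
        x = mu * x * (1 - x)
        stream.append(int(x * 256) % 256)
    return stream


def xor_bytes(data, key):
    """XOR data with the keystream generated from an (x0, mu) key."""
    x0, mu = key
    return bytes(b ^ k for b, k in zip(data, keystream(x0, mu, len(data))))


def encrypt_row(row, keys):
    """Encrypt every attribute value with the key assigned to that attribute."""
    return {attr: xor_bytes(value.encode("ascii"), keys[attr]).hex()
            for attr, value in row.items()}


def publish_row(encrypted_row, granted_keys):
    """Decrypt only the attributes whose keys were granted to the reader."""
    published = {}
    for attr, cipher_hex in encrypted_row.items():
        if attr in granted_keys:
            plain = xor_bytes(bytes.fromhex(cipher_hex), granted_keys[attr])
            published[attr] = plain.decode("ascii")
        else:
            published[attr] = cipher_hex  # stays unreadable without the key
    return published


keys = {"activity": (0.11, 3.99), "drug": (0.23, 3.99), "quantity": (0.37, 3.99)}
row = {"activity": "Dispense", "drug": "Amoxicillin", "quantity": "20"}
encrypted = encrypt_row(row, keys)
granted = {attr: key for attr, key in keys.items() if attr != "drug"}
print(publish_row(encrypted, granted))
```

In this simplified form, equal plaintext values under the same key produce equal ciphertexts; a real scheme would need per-record variation to avoid leaking value frequencies.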

Evaluation

This section evaluates a blockchain-based privacy-aware image approach for processing data using three public event log sets and a synthetic log. It explores the impact of applying the approach on privacy and utility. On the one hand, the privacy analysis compares the privacy-aware event logs with the original event logs. The analysis assesses the degree of privacy impact on classification and numerical type attributes. On the other hand, utility analysis compares the similarity of specific results obtained from privacy-aware event logs with the same type of results obtained from the original event logs. We evaluate five main attributes: control flow, resources, timestamps, unique identifiers, and sensitive attributes. In particular, for utility analysis, the control flow perspective focuses on process discovery, and the resource perspective performs social network discovery. It is worth noting that utility analysis is highly dependent on the underlying algorithm that generates a particular result, and privacy analysis provides a more general assessment.

The impact of the proposed image privacy-aware information encryption framework on process mining is evaluated by publishing privacy-aware data and applying process mining methods frequently used in the healthcare field. First, the characteristics of the event logs used in the evaluation are described. Next, the attribute-specific encryption methods applicable to the logs are discussed. Then, the system throughput (TPS) is calculated to analyze the system performance. To measure the degree of privacy of the event logs after decryption under different requirements, the Wasserstein Distance, also known as Earth Mover’s Distance (EMD), is used to assess whether the distribution $SE$ of sensitive values is close to their distribution $S$ in the whole table, and information entropy is used to measure the randomness of the information distribution in an image. Finally, the impact of this privacy-aware information-sharing framework on multiple process mining results when publishing data with different attributes is summarized, including process discovery, consistency analysis, and organizational mining analysis.

Datasets

The experiments use three real event logs and one synthetic event log, which are publicly available in the 4TU Data Research Center and described in Table 3. The first real event log is a sepsis case log (Sepsis), which contains events from hospital sepsis cases. The second real event log is the hospital log (Hospital), whose cases represent the paths of gynecological patients. The third real log is BPIC 2017, a log of the loan application process of a Dutch financial institution. The three real event logs have multiple data attributes, but most of these attributes have many missing values. The fourth log is a synthetic hospital drug supply and use log (Drugs). Table 3 presents the characteristics of the four logs, with the number of cases ranging from 1050 to 31,509 and the number of events ranging from 5214 to 1,202,267. In Sepsis and Hospital, most cases follow a unique process path.

Table 3. Descriptive statistics of event logs.

Experiment Setup

Experiments are conducted to evaluate the privacy and utility of the proposed blockchain-based image method system framework for process data privacy protection (referred to as IM-BDPA). We ran the experiments on a single machine with an Intel(R) Core(TM) i7 processor and 16 GB of RAM. Any experiment that did not finish within 24 hours was terminated. In the healthcare domain, encrypting certain attributes in event logs may affect the process mining results for event logs with certain characteristics. For example, if encryption scrambles resource information, the log can no longer be analyzed from an organizational perspective. For the decryption of attributes under different requirements, i.e., event logs with different characteristics, plug-ins of the open-source process mining framework ProM are used for process mining performance analysis. In the experiments, the privacy-aware event logs produced by attribute privacy encryption and scrambling are decrypted using the corresponding attribute permissions, and the resulting decrypted logs are evaluated and the results generalized. The experiments cover three main aspects:

System performance.
Privacy of event logs after encryption with different attributes.
Process mining results, consistency analysis, performance analysis and organizational mining analysis of event logs after image decryption, i.e., data utility.

Multiple privacy-aware event logs are created by privacy encrypting and scrambling one attribute at a time, so that each log targets only one specific privacy threat, and the percentage of cases and events affected by the protection is counted. Attribute value encryption and scrambling are applied to both classification-type and numerical attributes, and different encryption methods are used for different attribute features. First, for classification-type attributes: 1) Activity. The events associated with an activity are encrypted and scrambled if the activity is not performed in at least $k$ cases. 2) Resource. The resource attribute is encrypted and scrambled if the resource is not involved in at least $k$ cases. 3) The remaining attributes, such as medical examinations, diagnoses, and drugs. $k$ different records are randomly selected from the values of the attribute for encryption and scrambling. Second, for numerical attributes: 1) Timestamp. Timestamps are generalized to two granularity levels, “year” and “year & month.” For the coarser level, “year,” only the event year is kept, and the rest of the timestamp is encrypted and scrambled. For “year & month,” the year and month of the event are kept, and the rest of the timestamp is encrypted and scrambled. Here $k = \{year, year \& month\}$ . Note that the timestamps order the events of each case in these logs and that generalizing all timestamps does not change the order of events. 2) Unique identifier (ID). The last $k$ bits of the attribute value are cryptographically scrambled. 3) The rest of the data attributes. Sepsis contains many data attributes with information related to diagnosis and treatment, three of which (CRP, Leucocytes, and Lactic-Acid) were selected as being closely related to sepsis. Data attribute values associated with fewer than $k$ cases are privacy encrypted.
Here $k$ is the privacy intensity: the higher the value, the higher the degree of privacy. In the experiments, $k = \{2, 10, 100, year, year \& month\}$ .
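The case-count rule for activities and resources, and the timestamp generalization, can be sketched in Python as below. The event representation (dictionaries with a `case` key), the function names, and the string formats used for the generalized timestamps are assumptions made for illustration; the sketch only identifies which values would be encrypted and how many cases and events that touches.

```python
from collections import defaultdict
from datetime import datetime


def rare_values(events, attribute, k):
    """Return the values of `attribute` that occur in fewer than k distinct cases."""
    cases_per_value = defaultdict(set)
    for event in events:
        cases_per_value[event[attribute]].add(event["case"])
    return {value for value, cases in cases_per_value.items() if len(cases) < k}


def affected_share(events, attribute, k):
    """Return the fractions of cases and of events touched by the rule."""
    rare = rare_values(events, attribute, k)
    hit_events = [e for e in events if e[attribute] in rare]
    hit_cases = {e["case"] for e in hit_events}
    all_cases = {e["case"] for e in events}
    return len(hit_cases) / len(all_cases), len(hit_events) / len(events)


def generalize_timestamp(timestamp, level):
    """Keep only the year, or the year and month, of a timestamp."""
    if level == "year":
        return timestamp.strftime("%Y")
    if level == "year & month":
        return timestamp.strftime("%Y-%m")
    raise ValueError(f"unknown granularity level: {level!r}")


events = [
    {"case": "c1", "activity": "A"}, {"case": "c1", "activity": "B"},
    {"case": "c2", "activity": "A"}, {"case": "c2", "activity": "C"},
    {"case": "c3", "activity": "A"}, {"case": "c3", "activity": "B"},
]
print(rare_values(events, "activity", k=2))
print(affected_share(events, "activity", k=2))
print(generalize_timestamp(datetime(2014, 3, 9, 10, 5), "year & month"))
```

Raising `k` can only enlarge the set of rare values, which matches the reported trend that more cases and events are affected as $k$ grows.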

Results and Analysis

This section evaluates and summarizes the experimental results. Section “System Performance” analyzes the system performance, section “Privacy Metrics” analyzes the privacy of the event log after attribute encryption, and section “Utility Metrics” analyzes the data utility of the event log after image decryption.

System Performance

Throughput is selected as the system performance index to compare the efficiency and performance of this paper’s algorithm, IM-BDPA, with the traditional PBFT consensus algorithm in the same experimental environment. TPS is the number of requests the system model can handle per unit of time and is calculated as in Equation (1) (Lin et al. 2021), where $Trade_{Δt}$ is the number of requests processed by the system model during the interval $Δt$ and $Δt$ is the duration of that interval.

(1) $Throughput = \frac{Trade_{Δt}}{Δt}$

To evaluate the system performance, the number of nodes participating in the consensus process is used as the independent variable. The number of nodes is gradually increased from 4 to 30, and the system throughput is calculated at each step. The results are shown in Figure 9.

Figure 9. System TPS.

As Figure 9 shows, across the range of node counts, the IM-BDPA algorithm used in this article achieves an average throughput improvement of 57% over the PBFT algorithm, owing to its lower communication complexity.

Privacy Metrics

Earth Mover’s Distance

To prevent an attacker who knows the global distribution of sensitive attributes from compromising privacy, t-closeness extends k-anonymity by restricting the distribution of sensitive values: the distribution $SE$ of sensitive values in any equivalence class $E$ must be close to their distribution $S$ in the whole table. In particular, the Wasserstein Distance is used to check whether the distance $D (S, SE)$ between the distributions is less than a threshold value $t$ . The Wasserstein Distance, also known as Earth Mover’s Distance (EMD) (Rösel et al. 2021), measures the minimum cost (the minimum average distance moved) required to transform the distribution $S$ into the distribution $SE$ .

EMD of numerical attributes. Let $r_{i} = S_{i} - SE_{i}$ $(i = 1, 2, \dots, m)$ ; the distance between $S$ and $SE$ is then calculated as in Equation (2) (Rösel et al. 2021). The term for $i = m$ in the sum vanishes because both distributions sum to 1.
(2) $D (S, SE) = \frac{1}{m - 1} \left(|r_{1}| + |r_{1} + r_{2}| + \dots + |r_{1} + \dots + r_{m - 1}|\right) = \frac{1}{m - 1} \sum_{i = 1}^{m} \left|\sum_{j = 1}^{i} r_{j}\right|$
EMD of classification attributes. The distance between any two values of a classification attribute is defined as 1, and it is easy to prove that this is a metric. Since the distance between any two values is 1, for each value where $S_{i} - SE_{i} > 0$ , we only need to move the surplus mass to other values. This gives Equation (3) (Rösel et al. 2021).
(3) $D (S, SE) = - \sum_{S_{i} < SE_{i}} (S_{i} - SE_{i})$
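Equations (2) and (3) translate directly into a few lines of Python. The sketch below assumes that both distributions are given as equal-length lists of probabilities over the same ordered set of $m$ values; the function names are illustrative only.

```python
def emd_numerical(s, se):
    """Equation (2): ordered-distance EMD between two distributions."""
    m = len(s)
    total, running = 0.0, 0.0
    for s_i, se_i in zip(s, se):
        running += s_i - se_i   # partial sum r_1 + ... + r_i
        total += abs(running)
    return total / (m - 1)


def emd_categorical(s, se):
    """Equation (3): equal-distance EMD between two distributions."""
    return -sum(s_i - se_i for s_i, se_i in zip(s, se) if s_i < se_i)


s = [0.5, 0.5, 0.0]
se = [0.0, 0.5, 0.5]
print(emd_numerical(s, se))    # 0.5
print(emd_categorical(s, se))  # 0.5
```

For the numerical case, moving all mass from the first to the last value gives the maximum distance of 1, for example `emd_numerical([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])`.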

Assuming $EL$ is the original event log, $EL^{'}$ is the privacy-aware event log, and $ps \in PS$ is the analysis perspective, the degree of privacy is calculated as in Equation (4) (Rösel et al. 2021), where $ul (r, {\overline{EL}}_{ps}, {\overline{EL^{'}}}_{ps})$ is the EMD distance between the two event logs projected on the given perspective $ps$ and $r \in RA$ is the redistribution function. The value of $du (EL, EL^{'})$ lies between 0 and 1, and the higher the value, the higher the degree of privacy.

(4) $du (EL, EL^{'}) = 1 - \min_{r \in RA} ul (r, {\overline{EL}}_{ps}, {\overline{EL^{'}}}_{ps})$

The IM-BDPA algorithm is applied to each of the four event logs from the data, resource, and activity perspectives, and the degree of privacy impact of IM-BDPA is calculated for each perspective. For the three classification attributes and the five data attributes contained in the event logs, decryption permission is granted to only one attribute perspective at a time; the resulting privacy-aware event logs are then compared with the original event log by computing the EMD distance between them. Figure 10 shows the comparative results of the privacy degree analysis.

Figure 10. Graph of the degree of privacy impact of IM-BDPA on different perspectives.

As shown in Figure 10, the privacy-aware event logs $EL^{'}$ all have privacy degree values greater than 0.7 compared to the original event logs $EL$ . The average privacy degree is 0.8728 for classification attributes and 0.9108 for numerical attributes, both of which meet the privacy protection requirements. In addition, the shaded area shows that, for the eight selected attributes, the IM-BDPA algorithm affects the privacy of the four event logs to different degrees, with the largest impact on Sepsis and the smallest on BPIC 2017.

Information Entropy

The information entropy reflects the randomness of the information distribution in the image (Ahuja et al. 2023). The more uniform the distribution of grayscale values, the closer the image is to a random information source and the closer its information entropy is to 8. Information entropy is calculated as in Equation (5) (Ahuja et al. 2023), where $N$ is the number of different values in the image, the set of values is $(s_{0}, s_{1}, \dots, s_{N - 1})$ , and $P (s_{i})$ is the probability that $s_{i}$ appears in the image $S$ . The logarithm is taken to base 2, so 256 equally likely grayscale values give the maximum of 8.

(5) $H (S) = - \sum_{i = 0}^{N - 1} P (s_{i}) \log_{2} [P (s_{i})]$

The randomness of the encrypted event log images is therefore measured using the local Shannon entropy. First, 25 non-overlapping image blocks are extracted from the ciphertext image, and the information entropy of each block is calculated using Equation (5). The local Shannon entropy is then obtained as the average of these values. The information entropy of the ciphertext images obtained with the encryption algorithm is shown in Table 4: the information entropy of each event log image is close to that of a random image (8). Therefore, after the event log is transformed into an image, the interrelationships between the original temporal logic and the attributes are destroyed, and the randomness approaches that of a random image, which improves security and privacy.
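The local Shannon entropy measurement can be sketched in Python as follows. The sketch assumes a single 8-bit channel given as a 2D list and takes the blocks in row-major order; the block size and the block selection strategy are assumptions, since the text above only fixes the number of blocks (25).

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Equation (5) with base-2 logarithm, in bits per value."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_shannon_entropy(channel, block_size, num_blocks):
    """Mean entropy of the first num_blocks non-overlapping square blocks."""
    height, width = len(channel), len(channel[0])
    entropies = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            if len(entropies) == num_blocks:
                break
            block = [channel[r][c]
                     for r in range(top, top + block_size)
                     for c in range(left, left + block_size)]
            entropies.append(shannon_entropy(block))
    if len(entropies) < num_blocks:
        raise ValueError("image too small for the requested number of blocks")
    return sum(entropies) / num_blocks


# Synthetic channel in which every 16 x 16 block contains each value 0..255 once.
channel = [[(r % 16) * 16 + (c % 16) for c in range(32)] for r in range(32)]
print(local_shannon_entropy(channel, block_size=16, num_blocks=4))  # 8.0
```

A constant block has entropy 0, so values close to 8 for every block indicate that the pixel values are nearly uniformly distributed, which is the criterion used for the ciphertext images.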

Table 4. Information entropy of the event log images.

Utility Metrics

Following the experimental method described in Section 5.2, the IM-BDPA algorithm is applied to different perspectives of the Drugs, Sepsis, and Hospital event logs, and the experimental results are then analyzed.

The proportion of privacy impact on cases and events:

To measure the proportion of cases and events affected by privacy protection in different event logs, multiple perspectives of the Drugs, Sepsis, and Hospital event logs were encrypted using the image-based privacy-aware method, and the affected proportions were computed. Figure 11 shows the percentage of cases and events affected by privacy encryption and scrambling in the three event logs; a larger percentage means that more cases or events are affected. For cases, few are affected by activity privacy encryption in Sepsis, while more are affected by resource privacy encryption in Drugs. For events, few are affected by activity privacy encryption in Sepsis, while more are affected by activity privacy encryption in Hospital. Privacy encryption of timestamps affects all events in the three logs. Moreover, the number of cases or events affected by activity, resource, and data privacy encryption increases exponentially as the value of $k$ increases.

Figure 11. Proportion of cases and events affected in the event log.

Fitness and precision:

Fitness indicates the portion of the event log that can be replayed on the model, and precision indicates the proportion of the model’s behavior that is observed in the event log. The inductive mining plug-in in ProM is used to mine process models from the original and privacy-aware event logs of Drugs, Sepsis, and Hospital. For each log, two process models are discovered: ① a model that captures the mainstream process paths, using the plug-in’s default setting of 0.8, and ② a model that captures all process paths. The fitness and precision values are then calculated to assess the quality of the discovered process models. The results for the different process paths are shown in Figure 12.

Figure 12. Impact of privacy-aware data on process discovery results.

Figure 12 shows how the fitness and precision of the process models discovered from the three original event logs and from the privacy-aware event logs vary with the value of $k$ , where $k = 0$ denotes the original event log. First, the precision of the mainstream process model is much higher than that of the all-paths process model, while the opposite holds for fitness. Second, privacy-aware data for the timestamp attribute has no impact on the quality of the discovered model, because only the temporal attributes are encrypted and the event order is left unchanged. Finally, the fitness and precision curves of the privacy-aware event logs are nearly flat and stay close to the values of the original event logs, i.e., the privacy encryption framework has a small impact on process model quality.

Average trace fitness value:

An alignment-based process consistency analysis method takes a canonical process model and an event log as inputs, aligns the log with the model, and calculates the average trace fitness value. For each of the original event logs Drugs, Sepsis, and Hospital, the model of the mainstream process paths discovered by inductive mining in the process discovery step above is used as the model input. The privacy-aware event logs, encrypted with the privacy-preserving framework and then decrypted according to different attribute requirements, are used as the event log input. The resulting average trace fitness values are shown in Figure 13.

Figure 13. Average trace fitness values for the event logs.

As shown in Figure 13, the privacy-aware data for the activity attribute has a negligible effect on the average trace fitness of the model and the logs, because the proportion of events affected by privacy encryption and scrambling is small. Timestamp privacy encryption does not affect the average trace fitness either, because the order of the encrypted events remains unchanged. Although the average trace fitness is not affected, the fitness value of an individual trace in the privacy-aware log may differ from that of the corresponding trace in the original log.

Pearson correlation coefficient:

The Mine for a Handover of Work Social Network plug-in in ProM is used to discover social networks from the original and the privacy-aware event logs. In these networks, nodes represent resources, and the weight of the arc connecting two nodes is determined by the frequency with which work is handed over between the two resources. The discovered social networks are then compared, and their Pearson correlation coefficients are calculated. The resource privacy-aware event log contains encrypted resource data, but the frequency of handovers between resources remains the same. The impact of resource privacy-aware data on the social network discovery results is shown in Figure 14.

Figure 14. Impact of resource privacy-aware data on social network discovery result.

As can be seen from Figure 14, the Pearson correlation coefficients all remain above 0.994 as the value of $k$ varies, even at $k = 100$ , where most of the resource attribute values are encrypted. The effect of resource privacy-aware data on the weights of the remaining inter-resource arcs is therefore negligible: the social networks discovered from resource privacy-aware logs contain suppressed resources whose values are merely encrypted, while the frequency of handovers between resources is almost unchanged.
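The comparison of handover-of-work networks can be sketched in pure Python. This does not reproduce the ProM plug-in: the sketch counts only direct handovers between consecutive events, represents each trace as a list of resource names, and assumes that a `pseudonym` mapping from original to encrypted resource names is available so that arcs can be matched before computing the Pearson correlation coefficient.

```python
import math
from collections import Counter


def handover_counts(traces):
    """Count direct handovers (a -> b) between consecutive events of each trace."""
    counts = Counter()
    for resources in traces:
        for a, b in zip(resources, resources[1:]):
            counts[(a, b)] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant arc weights")
    return cov / (dev_x * dev_y)


def network_correlation(original_traces, private_traces, pseudonym):
    """Correlate arc weights after renaming original resources to pseudonyms."""
    renamed = Counter()
    for (a, b), weight in handover_counts(original_traces).items():
        renamed[(pseudonym[a], pseudonym[b])] += weight
    private = handover_counts(private_traces)
    arcs = sorted(set(renamed) | set(private))
    return pearson([renamed[arc] for arc in arcs], [private[arc] for arc in arcs])


original = [["ann", "bob", "cat"], ["ann", "bob"], ["cat", "ann"]]
pseudonym = {"ann": "r1", "bob": "r2", "cat": "r3"}
private = [[pseudonym[r] for r in trace] for trace in original]
print(network_correlation(original, private, pseudonym))
```

When resource names are only replaced by pseudonyms, the arc weights are unchanged and the coefficient is 1 up to floating-point rounding, which is the intuition behind the high coefficients reported above.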

Summary

This article discusses the practicality and privacy leakage of end-to-end process data interaction in a cross-organizational environment in the context of the Industrial Internet, with the aim of protecting privacy by balancing utility and privacy according to demand. Based on the BC-SCM system, we propose a blockchain-based reversible privacy-aware image method for process data that combines business process management with chaotic image encryption technology. The method enhances the reliability, privacy, and security of supply chain business process information shared among distributed parties. First, three different system environment permissions meet different privacy needs, greatly reducing the risk of data leakage while controlling the decryption level as required. Second, behavioral coding is combined with chaotic image encryption techniques to transform 2D event log data with different attribute formats into consistent 3D image data, which is then merged, confused, and diffused. Large amounts of event data are thus stored in encrypted form under a new process data management approach that unifies the management of privacy-aware cross-organizational event log data. Then, based on blockchain distributed system technology, the unique index code of the merged and reconstructed color image is stored on-chain and the encrypted image data is stored off-chain, so that the on-chain and off-chain mode realizes lightweight blockchain information management. Finally, based on the initial value of the chaotic system and different attribute decryption requirements, the method meets the practical need for reversible privacy protection, with either on-demand decryption or decryption of all process data.
The article constructs a privacy-conscious information sharing framework that achieves the goals of fully reversible privacy protection as well as secure storage, circulation, and distribution of cross-organizational process data, providing comprehensive privacy guarantees for the secure circulation of cross-organizational information in the Industrial Internet, as well as new privacy-secure underlying computational approaches for the booming artificial intelligence.

However, the proposed approach has limitations. First, setting different environment access permissions according to different needs provides more refined access control, but it also directly increases the complexity of the system: managing and maintaining the different permissions may require more time and resources. Second, directly converting event log data into images ignores the behavioral relationships between activities, so the converted event log data is no longer directly suitable for other process mining methods. Future work will therefore consider integrating behavioral relationships and the importance of event log attributes into the privacy protection process, enabling more secure business process management that provides more accurate, effective, and standardized basic computation and interpretability for artificial intelligence applications and keeps them consistent with business objectives.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, Xianwen Fang, upon reasonable request.

Additional information

Funding

Supported by the Natural Science Research Project of Universities in Anhui Province [YJS20210370], National Natural Science Foundation, China [No. 61572035, 61402011], Key Research and Development Program of Anhui Province [2022a05020005], the Leading Backbone Talent Project in Anhui Province, China (2020-1-12), and the Open Project Program of the Key Laboratory of Embedded System and Service Computing of Ministry of Education [No.ESSCKF2021-05].

References

Ahuja, B., R. Doriya, S. Salunke, M. F. Hashmi, A. Gupta, and N. D. Bokde. 2023. HDIEA: High dimensional color image encryption architecture using five-dimensional Gauss-logistic and Lorenz system. Connection Science 35 (1):2175792. doi:10.1080/09540091.2023.2175792.
Al-Farsi, S., M. M. Rathore, and S. Bakiras. 2021. Security of blockchain-based supply chain management systems: Challenges and opportunities. Applied Sciences 11 (12):5585. doi:10.3390/app11125585.
Batista, E., A. Martínez-Ballesté, and A. Solanas. 2022. Privacy-preserving process mining: A microaggregation-Based approach. Journal of Information Security and Applications 68:103235. doi:10.1016/j.jisa.2022.103235.
Bhattacharya, P., S. Tanwar, U. Bodkhe, S. Tyagi, and N. Kumar. 2021. BinDaaS: Blockchain-Based deep-learning as-a-service in healthcare 4.0 applications. IEEE Transactions on Network Science and Engineering 8 (2):1242–32. doi:10.1109/TNSE.2019.2961932.
Bu, F., C. Hu, Q. Zhang, C. Bai, L. T. Yang, and T. Baker. 2021. A cloud-edge-aided incremental high-order possibilistic c-means algorithm for medical data clustering. IEEE Transactions on Fuzzy Systems 29 (1):148–55. doi:10.1109/TFUZZ.2020.3022080.
Cai, H., J. Sun, Z. Gao, and H. Zhang. 2022. A novel multi-wing chaotic system with FPGA implementation and application in image encryption. Journal of Real-Time Image Processing 19 (4):775–90. doi:10.1007/s11554-022-01220-4.
Elkoumy, G., A. Pankova, and M. Dumas. 2022. Differentially private release of event logs for process mining. arXiv:2201.03010. http://arxiv.org/abs/2201.03010.
Essa, Y. M., A. El-Mahalawy, G. Attiya, and A. El-Sayed. 2019. Parallel and distributed powerset generation using big data processing. Applied Artificial Intelligence 33 (13):1133–56. doi:10.1080/08839514.2019.1665262.
Fang, P., H. Liu, C. Wu, and M. Liu. 2022. A survey of image encryption algorithms based on chaotic system. The Visual Computer 39 (5):1975–2003. doi:10.1007/s00371-022-02459-5.
Fu, J., B. Cao, X. Wang, P. Zeng, W. Liang, and Y. Liu. 2022. BFS: A blockchain-based financing scheme for logistics company in supply chain finance. Connection Science 34 (1):1929–55. doi:10.1080/09540091.2022.2088698.
Hussain, M., W. Javed, O. Hakeem, A. Yousafzai, A. Younas, M. J. Awan, H. Nobanee, and A. M. Zain. 2021. Blockchain-based IoT devices in supply chain management: A systematic literature review. Sustainability 13 (24):13646. doi:10.3390/su132413646.
Kan, D., X. Fang, and Z. Gong. 2023. Event log privacy based on differential petri nets. Applied Artificial Intelligence 37 (1):2175109. doi:10.1080/08839514.2023.2175109.
Kerzel, U. 2021. Enterprise AI canvas integrating artificial intelligence into business. Applied Artificial Intelligence 35 (1):1–12. doi:10.1080/08839514.2020.1826146.
Kshetri, N. 2018. 1 Blockchain’s roles in meeting key supply chain management objectives. International Journal of Information Management 39:80–89. doi:10.1016/j.ijinfomgt.2017.12.005.
Lin, H., J. Hu, X. Wang, M. F. Alhamid, and M. J. Piran. 2021. Toward secure data fusion in industrial IoT using transfer learning. IEEE Transactions on Industrial Informatics 17 (10):7114–22. doi:10.1109/TII.2020.3038780.
Liu, C., H. Duan, Q. Zeng, M. Zhou, F. Lu, and J. Cheng. 2019. Towards comprehensive support for privacy preservation cross-organization business process mining. IEEE Transactions on Services Computing 12 (4):639–53. doi:10.1109/TSC.2016.2617331.
Liu, W., Y. He, X. Wang, Z. Duan, W. Liang, and Y. Liu. 2023. BFG: Privacy protection framework for internet of medical things based on blockchain and federated learning. Connection Science 35 (1):2199951. doi:10.1080/09540091.2023.2199951.
Web of Science ®Google Scholar
Liu, C., H. Li, Q. Zeng, T. Lu, C. Li, and C. Huang. 2020. Cross-organization emergency response process mining: An approach based on petri nets. Mathematical Problems in Engineering 2020:1–12. doi:10.1155/2020/8836007.
Web of Science ®Google Scholar
Liu, P., D. Wu, Z. Shen, and H. Wang. 2023. Trajectory privacy data publishing scheme based on local optimisation and R-tree. Connection Science 35 (1):2203880. doi:10.1080/09540091.2023.2203880.
Web of Science ®Google Scholar
Lu, Y., X. Huang, Y. Dai, S. Maharjan, and Y. Zhang. 2020. Blockchain and federated learning for privacy-preserved data sharing in industrial IoT. IEEE Transactions on Industrial Informatics 16 (6):4177–86. doi:10.1109/TII.2019.2942190.
Web of Science ®Google Scholar
Lv, Z., L. Qiao, M. S. Hossain, and B. J. Choi. 2021. Analysis of using blockchain to protect the privacy of drone big data. IEEE Network 35 (1):44–49. doi:10.1109/MNET.011.2000154.
Web of Science ®Google Scholar
Majeed, A., and S. Lee. 2021. Anonymization techniques for privacy preserving data publishing: A comprehensive survey. IEEE Access 9:8512–45. doi:10.1109/ACCESS.2020.3045700.
Web of Science ®Google Scholar
Mannhardt, F., S. Petersen, and M. Fradinho Duarte de Oliveira. 2019. A trust and privacy framework for smart manufacturing environments. Journal of Ambient Intelligence and Smart Environments 11 (3):201–19. doi:10.3233/AIS-190521.
Web of Science ®Google Scholar
Marin-Castro, H. M., and E. Tello-Leal. 2021. Event log preprocessing for process mining: A review. Applied Sciences 11 (22):10556. doi:10.3390/app112210556.
Google Scholar
Nagy, Z., and A. Werner-Stark. 2022. An alignment-based multi-perspective online conformance checking technique. Acta Polytechnica Hungarica 19 (4):105–27. doi:10.12700/APH.19.4.2022.4.6.
Web of Science ®Google Scholar
Pika, A., M. T. Wynn, S. Budiono, A. H. M. Ter Hofstede, W. M. P. van der Aalst, and H. A. Reijers. 2020. Privacy-preserving process mining in healthcare. International Journal of Environmental Research and Public Health 17 (5):1612. doi:10.3390/ijerph17051612.
PubMed Web of Science ®Google Scholar
Raddatz, N., J. Coyne, P. Menard, and R. E. Crossler. 2021. Becoming a blockchain user: Understanding consumers’ benefits realisation to use blockchain-based applications. European Journal of Information Systems 32 (2):287–314. doi:10.1080/0960085X.2021.1944823.
Web of Science ®Google Scholar
Rafiei, M., and W. M. P. van der Aalst. 2021. Group-based privacy preservation techniques for process mining. Data & Knowledge Engineering 134:101908. doi:10.1016/j.datak.2021.101908.
Web of Science ®Google Scholar
Rösel, F., S. A. Fahrenkrog-Petersen, H. van der Aa, and M. Weidlich. 2021. A distance measure for privacy-preserving process mining based on feature learning. arXiv:2107.06578. http://arxiv.org/abs/2107.06578.
Google Scholar
Shraga, R., A. Gal, D. Schumacher, A. Senderovich, and M. Weidlich. 2022. Process discovery with context-aware process trees. Information Systems 106:101533. doi:10.1016/j.is.2020.101533.
Web of Science ®Google Scholar
Singh, K. N., and A. K. Singh. 2022. Towards integrating image encryption with compression: A survey. ACM Transactions on Multimedia Computing, Communications and Applications 18 (3):1–21. doi:10.1145/3498342.
Web of Science ®Google Scholar
Torre, D., M. Alferez, G. Soltana, M. Sabetzadeh, and L. Briand. 2021. Modeling data protection and privacy: Application and experience with GDPR. Software and Systems Modeling 20 (6):2071–87. doi:10.1007/s10270-021-00935-5.
Web of Science ®Google Scholar
Usha Lawrance, J., and J. V. Nayahi Jesudhasan. 2021. Privacy preserving parallel clustering based anonymization for big data using MapReduce framework. Applied Artificial Intelligence 35 (15):1587–620. doi:10.1080/08839514.2021.1987709.
Web of Science ®Google Scholar
van der Aalst, W. 2012. Process mining: Overview and opportunities. ACM Transactions on Management Information Systems 3 (2):1–17. doi:10.1145/2229156.2229157.
Google Scholar
van der Aalst, W., A. Adriansyah, A. K. A. de Medeiros, F. Arcieri, T. Baier, T. Blickle, J. C. Bose, P. van den Brand, R. Brandtjen, J. Buijs, et al. 2012. Process mining manifesto. In Business process management workshops, ed. F. Daniel, K. Barkaoui, and S. Dustdar, vol. 99, 169–94. Berlin Heidelberg: Springer. doi: 10.1007/978-3-642-28108-2_19.
Google Scholar
Wen, Q., Y. Gao, Z. Chen, and D. Wu. 2019. A blockchain-based data sharing scheme in the supply chain by IIoT. 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS), 695–700. doi: 10.1109/ICPHYS.2019.8780161.
Google Scholar
Xu, Y., M. Z. A. Bhuiyan, T. Wang, X. Zhou, and A. K. Singh. 2023. C-FDRL: Context-aware privacy-preserving offloading through federated deep reinforcement learning in cloud-enabled IoT. IEEE Transactions on Industrial Informatics 19 (2):1155–64. doi:10.1109/TII.2022.3149335.
Web of Science ®Google Scholar
Yan, X., P. Yin, Y. Tang, and S. Feng. 2023. A remote sensing encrypted data search method based on a novel double-chain. Connection Science 35 (1):2165638. doi:10.1080/09540091.2023.2165638.
Web of Science ®Google Scholar
Yin, W. 2023. Zero-knowledge proof intelligent recommendation system to protect students’ data privacy in the digital age. Applied Artificial Intelligence 37 (1):2222495. doi:10.1080/08839514.2023.2222495.
Web of Science ®Google Scholar
Yogarajan, V., B. Pfahringer, and M. Mayo. 2020. A review of automatic end-to-end de-identification: Is high accuracy the only metric? Applied Artificial Intelligence 34 (3):251–69. doi:10.1080/08839514.2020.1718343.
Web of Science ®Google Scholar
Zhang, Y. 2023. Privacy-preserving with zero trust computational intelligent hybrid technique to English education model. Applied Artificial Intelligence 37 (1):2219560. doi:10.1080/08839514.2023.2219560.
Web of Science ®Google Scholar
Zhang, X., L. Qi, W. Dou, Q. He, C. Leckie, R. Kotagiri, and Z. Salcic. 2022. MRMondri-an: Scalable multidimensional anonymisation for big data privacy preservation. IEEE Transactions on Big Data 8 (1):125–39. doi:10.1109/TBDATA.2017.2787661.
Web of Science ®Google Scholar
