Research Article

PrivBCS: a privacy-preserving and efficient crowdsourcing system with fine-grained worker selection based on blockchain

Article: 2202837 | Received 02 Mar 2023, Accepted 10 Apr 2023, Published online: 16 Jun 2023

Abstract

Crowdsourcing takes advantage of human intelligence to solve complex problems that computers cannot handle. People actively participate in computational tasks for rewards, especially tasks that are relatively simple for humans but challenging for computers. However, most traditional crowdsourcing systems rely on a central organisation to handle the work associated with crowdsourcing. This may lead to privacy and security issues such as information leakage, single point of attack and unfair judgement. Meanwhile, the high service fees of existing centralised crowdsourcing platforms reduce people's motivation to participate in crowdsourcing tasks. This paper therefore proposes PrivBCS, a blockchain-based decentralised crowdsourcing system that does not rely on third-party agencies and requires only low transaction fees. PrivBCS protects the confidentiality of crowdsourcing data and the identity privacy of participants by cryptographic means, such as Paillier, El-Gamal and CP-ABE. It selects the most suitable worker to perform tasks by setting access control policies on task data and matching workers' attributes while ensuring the anonymity of workers' identities. Considering that the data volume of crowdsourcing tasks may be large and that the block capacity of the blockchain has an upper limit, an on-chain and off-chain collaborative storage scheme is proposed, which can increase the throughput of the blockchain system and improve its scalability to a certain extent. In PrivBCS, we implement the transaction process of crowdsourcing tasks through smart contracts. The experimental results and security analysis in this paper demonstrate the effectiveness, efficiency, and feasibility of PrivBCS.

1. Introduction

Nowadays, almost all citizens own smartphones and tablet PCs, etc. These devices have several built-in sensors, such as microphones, cameras, pedometers, which allow people to shift from the traditional data collection mode to the crowdsensing mode. Crowdsensing refers to a new model of data acquisition that combines the idea of crowdsourcing and the sensing capability of mobile devices, which is a typical service model of crowdsourcing technology for sensing tasks. In this way, service providers can get the sensory data collected by mobile devices uploaded by users across the country and process it to provide services to users. Crowdsensing focuses on using sensing devices to obtain information from the physical world and can be seen as a data collection channel. But crowdsourcing focuses on using human intelligence to get online solutions and can be seen as a form of organisation for data acquisition. We can simply understand that crowdsourcing is a group of people working together to accomplish the same thing. Crowdsourcing divides a complex task into several simple subtasks, then assigns these subtasks to workers on the crowdsourcing platform to complete, and finally aggregates the results submitted by the workers to get the final distributed solution. The concept of crowdsourcing first appears in an article written by Jeff Howe (Jeff, Citation2006) in 2006. Over time, the Internet and the sharing economy have flourished, and crowdsourcing has been widely used as a new business model in a variety of industries. Unlike traditional outsourcing, crowdsourcing provides a very flexible idea of distributed problem solving, especially for solving certain problems that are complex for computers but simple for human communities, which brings great economic benefits and research value to society. 
In crowdsourcing platforms, a requester can openly collect solutions by posting crowdsourcing tasks, and a worker can apply for suitable tasks for rewards based on his or her area of expertise and interests. Tasks can be non-spatial (e.g. image annotation, written translation) or spatio-temporal (e.g. environmental monitoring, delivery services). Some of the more popular crowdsourcing platforms are Amazon Mechanical Turk, CrowdFlower (Citation2023), TopCoder (Citation2023), TaskRabbit (Citation2023), Upwork (Citation2023), etc. These crowdsourcing platforms are usually centralised, i.e. crowdsourcing tasks must be posted, assigned and submitted through a third-party central organisation that sits between requesters and workers. This may lead to a series of problems, such as privacy disclosure (sensitive data is stored centrally), denial-of-service attacks (the central system is vulnerable to attacks by malicious parties), single point of attack (the central system suddenly becomes inaccessible when its facilities fail), unfair evaluation (due to opaque black-box operations), and free riding or false reporting (by users colluding with the system administrator). Therefore, it is necessary to introduce the idea of decentralisation into the crowdsourcing platform.

Blockchain, as a decentralised technology, is jointly operated and maintained by a large number of nodes or servers. Data is stored in a decentralised manner on multiple nodes, which makes it difficult for an attacker to locate and attack a majority of them. At the same time, the consensus mechanism of the blockchain network ensures that the majority of nodes are in agreement, so a few malicious nodes cannot affect the normal operation of the whole network. In recent years, blockchain, as an emerging integrated technology, has become an effective and reasonable alternative to trusted third-party central institutions because of its openness, transparency, decentralisation, traceability and tamper resistance, and these technical advantages have been brought into crowdsourcing systems. Blockchain-based crowdsourcing solutions have been proposed by researchers. A typical blockchain crowdsourcing system consists mainly of a task requester, a worker, and a blockchain, where the posting of tasks (requester-triggered), the assignment of tasks, the submission of solutions (worker-triggered) and the distribution of rewards are all carried out by interacting with the blockchain to trigger the smart contracts deployed on it. For example, Li et al. (Citation2019) proposed CrowdBC, a blockchain-based decentralised crowdsourcing framework that enables fair transactions between requesters and workers. It also avoids the single point of attack that traditional crowdsourcing is prone to (the possibility of attackers taking down the nodes of the whole Ethereum network through a network attack is minimal). CrowdBC can achieve fair trading and evaluation in a decentralised environment, but there are still some flaws and practical limitations in its design and implementation (e.g. data leakage, transaction size limitation). On the one hand, owing to the nature of the blockchain (openness and transparency), the transaction records on it are visible to all participants. 
In this open environment, the identity information, expertise and other sensitive data of the participants of crowdsourcing tasks on the chain are at risk of exposure, which can cause irreparable damage if the data is exploited by malicious users. On the other hand, the underlying technology of blockchain itself has certain limitations: the size of transaction data blocks and the volume of transactions that can be processed per second are limited, which reduces the efficiency of crowdsourcing transactions. Blockchain systems require nodes throughout the network to maintain complete backups of data. As more blocks are added, the cost of storing data also increases, and nodes face challenges in storage and data transmission in the blockchain network.

This paper proposes a novel privacy-preserving crowdsourcing system based on blockchain with fine-grained worker selection, called PrivBCS, in which participants, i.e. task requesters and workers, are able to conduct crowdsourcing transactions in a secure and private manner. Because the datasets and solutions of crowdsourcing tasks may be too large to be stored efficiently in smart contracts, we also propose an on-chain and off-chain collaborative storage scheme, which can largely improve the scalability of the blockchain crowdsourcing system. To be specific, the details of crowdsourcing tasks and the solutions submitted by workers are recorded as ciphertext on the blockchain, while the specific datasets are encrypted and then stored with an off-chain third party, such as IPFS or a public cloud. The contributions of our proposed scheme are as follows:

  1. Providing PrivBCS, a privacy-preserving, blockchain-based crowdsourcing trading platform for task requesters and workers. This platform does not need to rely on any centralised platform to execute crowdsourcing trading processes. It uses smart contracts to achieve automatic execution of transactions between requesters and workers, ensuring fairness and decentralisation of the crowdsourcing platform. Moreover, users of the platform only need to pay a small transaction fee to participate in crowdsourcing activities, instead of paying expensive service fees to traditional crowdsourcing platforms, which greatly increases workers' motivation and participation.

  2. Designing a fine-grained worker access control scheme that integrates the spatial location and reputation values of task performers to select appropriate workers to perform crowdsourcing tasks.

  3. During the process of crowdsourcing task execution, adopting an on-chain and off-chain collaborative storage mechanism to improve the scalability of the blockchain crowdsourcing platform and reduce the consumption of smart contracts. The on-chain storage stores the crowdsourcing task requirements and the encrypted download address of the data, and the specific datasets of the task files are stored in the off-chain IPFS repository.

  4. Using cryptographic schemes to encrypt sensitive user information (e.g. user location information) and crowdsourcing datasets to ensure user privacy and task confidentiality.

  5. Deploying the smart contract locally and testing its deployment cost and running cost, and measuring the time overhead of the encryption and decryption operations involved in each phase. The experimental results show the scalability and feasibility of PrivBCS.

The rest of the paper is organised as follows. Section 2 briefly presents current work on crowdsourcing and blockchain crowdsourcing. Section 3 introduces the blockchain and smart contracts, the problems faced by blockchain crowdsourcing systems, the cryptographic primitives associated with the scheme, the system models and problem statements, the security model and the system notation. Section 4 describes the overall framework proposed in this paper and designs a specific blockchain crowdsourcing scheme based on the framework. Section 5 conducts a formal security analysis, and Section 6 analyses the performance of the scheme through specific experiments. Finally, Section 7 concludes the paper and discusses future work.

2. Related works

This section will review the crowdsourcing effort and briefly discuss where improvements can still be made.

Centralised crowdsourcing platforms: Since Jeff Howe introduced the concept of crowdsourcing in 2006, it has received a good deal of attention from the community as a distributed solution. Existing centralised crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), Upwork, and CrowdFlower, are relatively successful and popular. However, they are all vulnerable to false-reporting attacks and single point of attack, so users' interests can be compromised. A service recommendation algorithm is proposed in Chen et al. (Citation2022). AMT collects solution data in plaintext, which can lead to data leakage; in addition, the pseudo-IDs in AMT can be linked by malicious users, exposing users' identities. To address the privacy protection problem in AMT, Salehi et al. (Citation2015) designed a privacy-preserving framework, Dynamo, whose pseudo-IDs can only be linked by publishers, thus providing anonymity, but it still suffers from other flaws of AMT (e.g. a lack of data privacy, and Dynamo's legitimacy relies on all participants in the platform being honest and trustworthy). Xu et al. (Citation2019) proposed EPTD, an approach that guarantees the reliability and confidentiality of crowdsensing data by means of truth discovery algorithms and cryptographic tools, but ignores the confidentiality of user identities. To et al. (Citation2017) and Gisdakis et al. (Citation2016) focused on the privacy of users during crowdsourcing tasks, and Gisdakis et al. (Citation2016) also set different permissions for viewing and handling transactions. Although the above schemes address the privacy of crowdsourcing tasks to some extent, they still ignore the fairness and security of crowdsourcing transactions and are vulnerable to single point of attack problems.

Blockchain-based crowdsourcing platform: There are many studies on privacy and security of blockchain platforms (Fan et al., Citation2022; Liang, Fan et al., Citation2020). A dynamic secret-sharing mechanism is used in Liang et al. (Citation2019) to protect the security of data transmission in blockchain IoT. Liang et al. (Citation2021) uses homomorphic encryption to secure private data in copyright transactions. Liang, Yang et al. (Citation2022) proposed a federated blockchain-based privacy protection scheme for personal data. BFC (Zhang et al., Citation2019) is a crowdsourcing scheme based on blockchain, which uses commitment, security hash and homomorphic encryption technologies to protect data confidentiality and transaction fairness. SenseChain (Kadadha et al., Citation2020) uses blockchain technology and smart contract modules to achieve the goals of reliable selection of workers, fair distribution of rewards, and reduction of high transaction costs, playing the role of a decentralised multi-requester/multi-worker crowdsourcing framework. zkCrowd (Zhu et al., Citation2020) is a crowdsourcing platform based on a hybrid blockchain architecture, integrating a public chain and a sub chain. Their work focuses on communication and consensus protocols to guard the privacy of communication and transactions. MCS-Chain (Feng & Yan, Citation2019) is a completely decentralised mobile crowdsourcing system. It redesigns the consensus mechanism, incentive method and node architecture at the bottom of the blockchain system, but does not consider users' privacy and transactional data confidentiality. TSWCrowd (Gao et al., Citation2020) uses blockchain technology to build a distributed crowdsourcing framework. It can ensure the reliable distribution of crowdsourcing tasks and maintain the legitimate interests of both sides of crowdsourcing. However, it ignores the confidentiality of task data and the privacy of users during the transaction. 
CrowdBC (Li et al., Citation2019) is a general crowdsourcing platform based on the blockchain. Although it meets the normal transaction function, it does not provide a task allocation mechanism or a privacy protection mechanism for solutions. SecBCS (Lin, He, Zeadally et al., Citation2020) uses group signature technology to provide anonymity for users, and uses AES to protect the confidentiality of crowdsourcing results, but it relies on a trusted execution environment. Zebralancer (Lu et al., Citation2018) is an anonymous and private decentralised crowdsourcing system. It has designed a universal prefix linkable anonymous authentication scheme. This scheme provides anonymity and accountability mechanisms, which can solve the identity and data disclosure problems of the centralised crowdsourcing system. However, due to excessive computing operations and high costs on smart contracts, the platform may be overloaded, thus reducing the efficiency of the system. Ghaffaripour and Miri (Citation2020) uses the promise of zero knowledge proof (zk-SNARK) to protect participants' private information, but its computational overhead is too large for practical applicability. In conclusion, although the above-mentioned related solutions are able to address the shortcomings of centralised crowdsourcing to some extent, they also bring new challenges that need to be addressed urgently. Most existing blockchain crowdsourcing systems do not take into account the fact that crowdsourcing task data files and solution data files can often be too large and hence cannot be efficiently stored in smart contracts. There is no explicit access right for workers to view and execute crowdsourcing tasks, which leads to many unauthorised behaviours.

To effectively solve the above problems, our research aims to ensure the data privacy of both sides of the transaction and the confidentiality of the crowdsourcing task based on the blockchain-based fair crowdsourcing mechanism by using cryptographic tools; design a fine-grained worker selection scheme to enhance user's motivation to participate in crowdsourcing; design an on-chain and off-chain collaborative storage model, where each node only needs to store part of the block data, which greatly reduces the storage space of the nodes and improves the efficiency of the crowdsourcing task. Details will be presented in subsequent sections.

3. Preliminaries

This section will introduce the relevant techniques used to build PrivBCS in this paper and explain some of the more important symbolic notations.

3.1. Blockchain and smart contract

Blockchain is essentially an append-only distributed public ledger formed by cryptographically linked blocks that are connected in chronological order according to certain rules and maintained with a consensus algorithm. A typical blockchain structure is depicted in Figure 1.

Figure 1. A typical block structure.


As a chained ledger system, blockchain is decentralised, open, transparent, and traceable. It can effectively solve the trust problems of the centralised system and guarantee the validity of data. Each block contains a number of transaction records, and the blocks are linked chronologically into a chain, making the transaction records traceable but unchangeable. Miners compete with each other for the right to add new blocks to the blockchain network by contributing computing power, and earn rewards for doing so. A smart contract is essentially a section of executable program code that performs a specific task when pre-defined conditions are met. Smart contracts deployed on the blockchain are executed automatically in a distributed environment based on the program code, without the need to establish mutual trust between participating nodes.
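The "traceable but unchangeable" property can be illustrated with a minimal hash-chained ledger. This is only a sketch under simplifying assumptions: there is no consensus, mining or networking, and the helper names `block_hash`, `append_block` and `is_valid` are hypothetical, not part of PrivBCS.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over the block's canonical JSON encoding."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> dict:
    """Append a block that commits to the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": prev,
    }
    chain.append(block)
    return block


def is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["requester posts task"])
append_block(chain, ["worker submits solution"])
append_block(chain, ["reward paid"])
print(is_valid(chain))   # True

# Rewriting an earlier block changes its hash and breaks the link from the next block.
chain[1]["transactions"] = ["forged solution"]
print(is_valid(chain))   # False
```

In a real blockchain the same check is performed independently by every node, which is what makes a silent rewrite of history impractical.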

3.2. Privacy and storage challenges in BCS

This paper highlights the privacy, efficiency and performance issues that need to be considered in building a secure blockchain crowdsourcing platform, PrivBCS (based on existing research, e.g. Feng et al., Citation2019; Liang, Li et al., Citation2020; Liang, Xiao et al., Citation2022; Liang, Yang et al., Citation2022; Lin, He, Huang et al., Citation2020). As a new economic model that provides people with a platform for information exchange and problem solving, crowdsourcing systems should meet the basic security, privacy and functional requirements, namely availability, integrity, fairness, security, confidentiality, and efficiency.

3.2.1. Privacy and data leakage issues during crowdsourcing task

It is well known that all nodes in a blockchain network need to keep a network-wide and verifiable public ledger with data being publicly accessible. While this open and transparent feature is extremely attractive for some data copyright protection software, there are data privacy issues for crowdsourcing applications, especially when the crowdsourcing dataset is sensitive, which will lead to the leakage of sensitive information and harm the users' interests. For example, in spatio-temporal crowdsourcing transactions, the user's information can be acquired by analysing the transaction flow in the blockchain, thus causing the leakage of user privacy. A dynamic group key negotiation scheme (Xu et al., Citation2022) is proposed to address the communication efficiency and privacy issues.

Moreover, in a blockchain crowdsourcing environment, transactions are not confirmed in real time but with a delay (e.g. usually about 12 seconds in Ethereum), which allows malicious workers to steal data that has been submitted to the chain and present it as their own solution to get paid. This “free-riding” phenomenon reduces the incentive for other members of the system to participate in task resolution, and harms the interests of others. Therefore, in this decentralised environment, the privacy of data should be considered.

3.2.2. Storage limitations

Blockchain data is redundantly replicated to each distributed node. Block size is limited (e.g. to about 1 MB in Bitcoin), and if large files are sent and stored using smart contracts, each participating node needs to perform validation operations, the cost of which is high (e.g. gas cost), so the performance and efficiency of handling large files on the blockchain are relatively low. In addition, blockchain provides storage that only allows adding transactions but not modifying or deleting them. As more and more data accumulates within the blockchain network over time, the cost for nodes to process and store data also increases, and miner nodes need more storage space and higher bandwidth to mine. Off-chain technology can therefore be used to store the less frequently needed data off-chain, and we adopt an on-chain and off-chain collaborative storage scheme in this paper. The whole architecture is divided into three parts: blockchain layer, user layer and storage layer, as shown in Figure 2.

Figure 2. System architecture of PrivBCS.


The storage layer we use in this paper is IPFS, but other cloud storage can also be used to store data. The user layer mainly consists of requesters and executors of tasks. The blockchain layer is mainly responsible for the logical implementation of crowdsourcing transactions and the initialisation and distribution of user keys.
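The division of labour between the blockchain layer and the storage layer can be sketched as follows. This is a minimal illustration, not the PrivBCS implementation: a Python dictionary stands in for IPFS, a list stands in for contract storage, a plain SHA-256 hex digest stands in for a real IPFS content identifier, and the blob is assumed to be already encrypted. In PrivBCS the download address itself is also encrypted before it goes on-chain, which is omitted here. All function names are hypothetical.

```python
import hashlib

off_chain_store = {}   # stand-in for IPFS: content address -> encrypted bytes
on_chain_ledger = []   # stand-in for contract storage: small records only


def put_off_chain(encrypted_blob: bytes) -> str:
    """Store the blob off-chain and return its content-derived address."""
    address = hashlib.sha256(encrypted_blob).hexdigest()
    off_chain_store[address] = encrypted_blob
    return address


def publish_task(description: str, encrypted_blob: bytes) -> dict:
    """Keep only the task requirements and the data address on-chain."""
    record = {
        "description": description,
        "data_address": put_off_chain(encrypted_blob),
    }
    on_chain_ledger.append(record)
    return record


def fetch_task_data(record: dict) -> bytes:
    """Download the blob and check it against the address recorded on-chain."""
    blob = off_chain_store[record["data_address"]]
    if hashlib.sha256(blob).hexdigest() != record["data_address"]:
        raise ValueError("off-chain data does not match on-chain address")
    return blob


blob = b"\x00" * 5_000_000            # pretend: 5 MB of already-encrypted task data
record = publish_task("label 10k images", blob)
print(len(record["data_address"]))    # 64
print(fetch_task_data(record) == blob)  # True
```

The on-chain record stays a few dozen bytes regardless of the dataset size, which is the scalability argument made above.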

3.3. Cryptographic primitives

This section will study the crucial cryptographic primitives used in this paper, namely, CP-ABE (Bethencourt et al., Citation2007), Paillier (Catalano et al., Citation2001) and El-Gamal (Elgamal, Citation1985).

3.3.1. Ciphertext-policy attribute-based encryption

The focus of an attribute-based encryption scheme is the access control structure, and there are three main access structures commonly used in attribute-based encryption schemes: the access tree structure, LSSS, and the AND-gate access structure. The first structure is used in this paper. The access tree structure can be regarded as the access structure obtained by extending $(t,n)$ threshold gates over a tree, and this data structure describes the matching strategy during the computation of the algorithm.

In our design, we use CP-ABE to protect the confidentiality of crowdsourcing tasks and participant identities. The data owner sets the access policy based on the user's attributes and thus decides that only users whose attributes satisfy the access policy can successfully perform decryption and access the ciphertext. It consists of the following components:

  1. ABE_Setup($\lambda$). This algorithm takes the security parameter $\lambda$ as input and outputs the relevant public pairing parameters. The trusted Key Management Center (KMC) is responsible for executing this initialisation algorithm. The algorithm first initialises a multiplicative cyclic group $G_1$ of prime order $r$ (the bit length of $r$ is $\lambda$), and lets $e: G_1 \times G_1 \rightarrow G_T$ be a bilinear map. Let $g$ be a generator of $G_1$. A random number $\alpha \in \mathbb{Z}_r$ is selected and $Y = e(g,g)^{\alpha}$ is computed; a random number $\beta \in \mathbb{Z}_r$ is selected and $g^{\beta}$ is computed. The system master key $MK = g^{\alpha}$ and the system public key $PK = (Y, g^{\beta})$ are returned. Each attribute corresponds to a group element of $G_1$, which can either be pre-selected or derived with a uniform hash function; we chose the latter approach.

  2. ABE_KeyGen($MK, S, K$). This is a PPT algorithm that generates the private key $SK$ based on the set of attributes $S$ uploaded by the worker and the pseudo-private key factor $K$. Detailed steps are as follows.

    1. The worker selects a random number $x \in \mathbb{Z}_r$, calculates $K = g^{x}$, and sends $\{S, K\}$ to the KMC.

    2. The KMC picks a random number $t \in \mathbb{Z}_r$ and computes $D_0 = g^{t}$ and $D' = K^{\alpha + \beta t} = g^{\alpha x} g^{x \beta t}$. For each attribute $att \in S$ in the list of user attributes $S$, it computes $D_{att} = H(att)^{t}$, and returns the worker's pseudo-private key $SK' = (D', D_0, \{D_{att}\}_{att \in S})$.

    3. The worker receives $SK'$, then calculates $D = (D')^{1/x} = g^{\alpha} g^{\beta t}$, and gets the private key: $SK = (D, D_0, \{D_{att}\}_{att \in S}) = (D = g^{\alpha} g^{\beta t},\ D_0 = g^{t},\ \forall att \in S: D_{att} = H(att)^{t})$.

    In this way, a random number $x$ generated by the worker himself is added to the key generation process, and the Key Management Center (KMC) returns only a pseudo-private key to workers. Finally, only the worker who knows the random number $x$ is able to recover the final real private key $SK$.

  3. ABE_Enc($PK, M, \Gamma$). This is a PPT algorithm that takes as input the public key $PK$, the message $M$ and the access tree structure $\Gamma$. The algorithm picks a random number $s \in \mathbb{Z}_r$ and, for the plaintext message $M \in G_T$, computes $C = M \cdot e(g,g)^{\alpha s}$ and $C_0 = g^{s}$. Taking $s$ as the secret, the access tree is split from the root node downwards such that, for the leaf node attribute $i$ (the corresponding node subscript is $index(i)$), the corresponding secret slice is $\lambda_i$. Attributes contained in the access control policy tree $\Gamma$ are represented in binary form and mapped to group elements using the hash function $H: \{0,1\}^{*} \rightarrow G_1$. Let $N$ be the set of leaf nodes of the access control structure tree. For each leaf node attribute $i$, select $r_i \in \mathbb{Z}_r$ at random and compute $C_{i1} = g^{\beta \lambda_i} H(i)^{-r_i}$ and $C_{i2} = g^{r_i}$. Finally, the ciphertext $CT = (C, C_0, \{C_{i1}, C_{i2}\}_{i \in N})$ is returned.

  4. ABE_Dec($PK, CT, SK$). This algorithm takes as input the public parameter $PK$, the ciphertext $CT$ containing the access structure $\Gamma$, and the private key $SK$. The decryption is successful only if the set of attributes $S$ of the private key $SK$ satisfies the ciphertext access structure tree $\Gamma$. This paper defines $I \subseteq N$ with $I = \{i : i \in (S \cap N)\}$. For each attribute $i$ that appears both in the set of key attributes $S$ and in the set of leaf node attributes of the ciphertext access tree $\Gamma$, compute $P_i = e(C_{i1}, D_0)\, e(C_{i2}, D_i) = e(g,g)^{\beta t \lambda_i}$, where $i \in I$. The recursion is evaluated over the tree starting from the root node, and the secret is embedded in the exponent throughout, so $P_i$ is treated as the secret slice in the implementation. The secret slice of each intermediate node is therefore obtained by raising the slice of each child node to its Lagrange interpolation coefficient and multiplying the results together, and finally the secret value of the root node is recovered in the form $e(g,g)^{\beta t s}$. Next, $e(C_0, D) = e(g,g)^{\alpha s}\, e(g,g)^{\beta t s}$ is calculated to obtain $e(g,g)^{\alpha s}$ and then $M = C / e(g,g)^{\alpha s}$, where

     $$\frac{e(C_0, D)}{\prod_{i \in I} e(C_{i1}, D_0)\, e(C_{i2}, D_i)} = \frac{e(g,g)^{\alpha s}\, e(g,g)^{\beta t s}}{\prod_{i \in I} e\big(g^{\beta \lambda_i} H(i)^{-r_i}, g^{t}\big)\, e\big(g^{r_i}, H(i)^{t}\big)} = e(g,g)^{\alpha s}.$$

     Here the product over $i \in I$ stands for the recursive Lagrange recombination described above, which yields $e(g,g)^{\beta t s}$.
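The top-down splitting of $s$ in ABE_Enc and the bottom-up Lagrange recombination in ABE_Dec can be sketched without any pairing library by working directly in $\mathbb{Z}_r$. The following is an illustrative sketch only: in the actual scheme the slices live in the exponent of $e(g,g)$ rather than in the clear, the modulus `R` is a toy prime standing in for the group order, each attribute is assumed to appear at most once in the tree, and the tree encoding `(k, children)` and the attribute strings are hypothetical.

```python
import secrets

R = 2**127 - 1  # toy prime modulus standing in for the group order r


def share(node, secret, shares):
    """Split `secret` top-down; leaves are attribute names, gates are (k, children)."""
    if isinstance(node, str):
        shares[node] = secret
        return
    k, children = node
    # Random polynomial of degree k-1 with constant term `secret`.
    coeffs = [secret] + [secrets.randbelow(R) for _ in range(k - 1)]
    for x, child in enumerate(children, start=1):
        y = sum(c * pow(x, e, R) for e, c in enumerate(coeffs)) % R
        share(child, y, shares)


def recover(node, shares, attrs):
    """Return the node's secret if `attrs` satisfies the subtree, else None."""
    if isinstance(node, str):
        return shares[node] if node in attrs else None
    k, children = node
    points = []
    for x, child in enumerate(children, start=1):
        y = recover(child, shares, attrs)
        if y is not None:
            points.append((x, y))
    if len(points) < k:
        return None
    points = points[:k]
    total = 0
    for xi, yi in points:
        # Lagrange coefficient of point xi evaluated at 0.
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % R
                den = den * (xi - xj) % R
        total = (total + yi * num * pow(den, -1, R)) % R
    return total


# Policy: any 2 of {translation skill, reputation >= 4, (city A or city B)}.
policy = (2, ["skill:translation", "reputation>=4", (1, ["city:A", "city:B"])])
s = secrets.randbelow(R)
slices = {}
share(policy, s, slices)

print(recover(policy, slices, {"skill:translation", "city:B"}) == s)  # True
print(recover(policy, slices, {"city:A"}))                            # None
```

A worker whose attribute set does not satisfy the tree never collects enough points at some gate, so the root secret, and hence the blinding factor $e(g,g)^{\beta t s}$, stays out of reach.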

3.3.2. Paillier cryptosystem

Paillier, whose security is based on the hardness of the composite residuosity class problem, is a homomorphic encryption algorithm that supports additive homomorphism and scalar-multiplication homomorphism. It consists of three algorithms (key generation, encryption, decryption) as follows:

  1. P_KGen(). This algorithm independently selects two random prime numbers $p$ and $q$ satisfying $\gcd(pq, (p-1)(q-1)) = 1$ (this property is guaranteed when the two primes are of equal length). We calculate $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, and then randomly select an integer $g \in \mathbb{Z}_{N^2}^{*}$ such that $N$ divides the order of $g$, which is ensured by checking $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$. Here $L(u) = (u-1)/N$, $\gcd(a,b)$ denotes the greatest common divisor, $\mathbb{Z}_{N^2}$ is the set of non-negative integers less than $N^2$, and $\mathbb{Z}_{N^2}^{*}$ is the set of integers in $\mathbb{Z}_{N^2}$ that are coprime with $N^2$. We set the public key $PK = (N, g)$ and the private key $SK = \lambda$.

  2. P_Enc(). This algorithm chooses a random number $r \in \mathbb{Z}_N^{*}$ and, for any plaintext message $M \in \mathbb{Z}_N$, uses the public key $PK = (N, g)$ to perform encryption as follows: $C = Enc(M, r, PK) = g^{M} r^{N} \bmod N^2$, with ciphertext $C \in \mathbb{Z}_{N^2}^{*}$. Since $r$ is selected at random, Paillier is a probabilistic encryption scheme: the same message $M$ encrypted with the same public key $PK$ yields different ciphertexts, all of which decrypt to the same message. This ensures the semantic security of the ciphertext, which means that a malicious party cannot obtain any information about the message $M$ from the ciphertext.

  3. P_Dec(). This algorithm decrypts $C$ with $SK = \lambda$ and gets the plaintext $M \leftarrow Dec(C, SK)$. Specifically, $M = Dec(C, SK) = \dfrac{L(C^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N$.

Paillier has two important features, additive homomorphism and scalar-multiplication homomorphism. To be specific, additive homomorphism means that the product of the ciphertexts of two messages $M_1$ and $M_2$ is a ciphertext of the sum of $M_1$ and $M_2$. For scalar multiplication, the Paillier algorithm supports only the multiplication of an encrypted message by a plaintext constant, i.e. $E(M_1) \cdot E(M_2) = E(M_1 + M_2)$ and $E(km) = E(m)^{k}$.
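The three algorithms and the two homomorphic properties can be checked with a toy implementation. This is a sketch for illustration only: the primes 1009 and 1013 are far too small for any real use, $g = N + 1$ is one common valid choice of generator rather than a randomly selected one, and the function names are hypothetical.

```python
import math
import secrets


def L(u: int, n: int) -> int:
    """L(u) = (u - 1) / N, an exact division for the values used below."""
    return (u - 1) // n


def p_keygen(p: int, q: int):
    n = p * q
    assert math.gcd(n, (p - 1) * (q - 1)) == 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                           # a common valid choice of g
    return (n, g), lam


def p_enc(pk, m: int) -> int:
    n, g = pk
    while True:                                         # pick r in Z_N^*
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def p_dec(pk, lam: int, c: int) -> int:
    n, g = pk
    n2 = n * n
    mu = pow(L(pow(g, lam, n2), n), -1, n)              # 1 / L(g^lam mod N^2) mod N
    return L(pow(c, lam, n2), n) * mu % n


pk, sk = p_keygen(1009, 1013)                           # toy primes, insecure
n2 = pk[0] ** 2
c1, c2 = p_enc(pk, 1200), p_enc(pk, 345)

print(p_dec(pk, sk, c1))                 # 1200
print(p_dec(pk, sk, c1 * c2 % n2))       # 1545  (E(M1) * E(M2) = E(M1 + M2))
print(p_dec(pk, sk, pow(c1, 7, n2)))     # 8400  (E(M)^k = E(k * M))
```

Note that two ciphertexts cannot be multiplied together to obtain the product of their plaintexts, which is why Paillier alone cannot evaluate an expression that mixes additions and multiplications of encrypted values.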

3.3.3. El-Gamal cryptosystem

El-Gamal is a cryptosystem whose security is based on the hardness of the discrete logarithm problem and the Diffie-Hellman problem, and it consists of three parts:

  1. EG_KGen(). This algorithm randomly chooses a large prime number $p$ satisfying the security requirement and a prime number $q$ less than $p$ to generate the finite field $\mathbb{Z}_p$ and the generator $g \in \mathbb{Z}_p^{*}$, and then randomly selects $x$ $(1 < x < p-1)$ and calculates $y \equiv g^{x} \bmod p$. Finally, the public key $PK$ is $(y, g, p)$ and the private key $SK$ is $x$.

  2. EG_Enc(). The plaintext message $M$ is encrypted. This algorithm selects a number $k$ that is coprime to $(p-1)$ and computes $C_1 = g^{k} \bmod p$ and $C_2 = y^{k} M \bmod p$. The ciphertext is $C = E(M) = (C_1, C_2)$.

  3. EG_Dec(). The plaintext $M$ can be calculated from the ciphertext $C$: $M = D(C) = C_2 \cdot C_1^{-x} \bmod p$.

Since each message is encrypted with a different random number $k$, El-Gamal is a probabilistic encryption scheme. El-Gamal has an important property, namely multiplicative homomorphism: multiplying the ciphertext of $m_1$ by the ciphertext of $m_2$ component-wise is equivalent to multiplying $m_1$ and $m_2$ before encrypting, i.e. $E_{pk}(m_1) \cdot E_{pk}(m_2) = E_{pk}(m_1 m_2)$.
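The multiplicative property can be verified with a toy implementation of the three algorithms. This is a sketch with deliberately tiny, insecure parameters: $p = 2579$ and $g = 2$ are textbook-sized illustration values (with $g$ assumed to generate $\mathbb{Z}_p^{*}$), and the function names are hypothetical.

```python
import math
import secrets

P = 2579   # toy prime, far too small for real use
G = 2      # assumed generator of Z_p^* for this illustration


def eg_keygen():
    x = secrets.randbelow(P - 3) + 2          # private key, 1 < x < p - 1
    return pow(G, x, P), x                    # (y, x) with y = g^x mod p


def eg_enc(y: int, m: int):
    while True:                               # k coprime to p - 1
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            break
    return pow(G, k, P), pow(y, k, P) * m % P


def eg_dec(x: int, c) -> int:
    c1, c2 = c
    # C1^(-x) = C1^(p-1-x) mod p by Fermat's little theorem.
    return c2 * pow(c1, P - 1 - x, P) % P


def eg_mul(ca, cb):
    """Component-wise product of two ciphertexts."""
    return ca[0] * cb[0] % P, ca[1] * cb[1] % P


y, x = eg_keygen()
ca, cb = eg_enc(y, 12), eg_enc(y, 34)

print(eg_dec(x, ca))               # 12
print(eg_dec(x, eg_mul(ca, cb)))   # 408  (E(m1) * E(m2) = E(m1 * m2))
```

Plaintexts must be non-zero elements of $\mathbb{Z}_p$, and products are only meaningful modulo $p$, so the toy modulus limits the example to small values.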

4. The proposed scheme

This section presents the proposed blockchain crowdsourcing scheme with fine-grained worker selection and data privacy protection in this paper. The goal is to select the most suitable worker Wi to perform crowdsourcing tasks in a fair and just manner without disclosing sensitive information about the requesters R and the worker Wi. To protect the sensitive information of users, all private data of workers and requesters (e.g. location information, reputation value) will be encrypted locally by the person himself before sending to the blockchain crowdsourcing system. Then a series of calculations will be done to choose the optimal worker to perform the task, assuming that the worker's reputation value vi represents his personal expertise. The higher the reputation value, the better the worker's expertise and ability to complete the task. The reputation value of Wi is updated after each task is completed. Meanwhile, in the process of spatio-temporal crowdsourcing task assignment, the closer the spatial distance between R and Wi, the better the execution of the task. For spatio-temporal crowdsourcing tasks, execution time and spatial distance are two key factors for selecting suitable workers. The main difference between traditional crowdsourcing and spatial crowdsourcing is that the latter requires the performer to constantly move around in the real world to access information, so factors such as time, location and environment come into play, such as real-time taxi hailing services (Didi Taxi) and take-out ordering services (Meituan delivery).

In this paper, we combine the worker's reputation value and the spatial distance between the worker and the crowdsourcing task. Of two workers $W_i$ and $W_j$, worker $W_i$ is considered more suitable for the crowdsourcing task when the following inequality holds:

$$\frac{d(R, W_i)}{v_i} < \frac{d(R, W_j)}{v_j} \tag{1}$$

where $d(R, W)$ denotes the distance between two entities. It is worth noting that all operations should be performed on ciphertexts. The computational cost of fully homomorphic encryption (FHE) is extremely high, which limits its application in practice. Therefore, this paper uses partially homomorphic encryption schemes (Paillier and El-Gamal), which are much more computationally efficient than FHE but have a limitation of their own: each scheme supports only partial homomorphism and cannot evaluate the above inequality completely on its own. The solution to this problem is presented in the following subsections.

4.1. System model and problem statement

Our scheme combines blockchain, smart contracts, and cryptographic algorithms (e.g. CPABE, Paillier, El-Gamal) to protect the privacy of participating users and task data in distributed crowdsourcing systems, and to select optimal workers at fine granularity while keeping identities anonymous. The system model of PrivBCS is depicted in Figure 3.

Figure 3. The structure of PrivBCS.

The system model involves the following entities, each with its own responsibilities:

  1. Task publisher: As the initiator of a crowdsourcing transaction, the requester packages his needs into a task, whose content broadly includes the following parts: task description, execution conditions, task budget, data download address and expiration date of task submission. The task description briefly describes the problem that the requester needs solved, i.e. the specific task or solution that the worker needs to complete. The execution conditions list what workers need in order to be eligible to apply for the published task (e.g. skills, reputation value, distance). The task budget states the total reward that can be given to the worker for this task. After the crowdsourcing task is published, workers apply for the tasks they are interested in and obtain the specific task data, which is stored off-chain, through the download address provided. They then submit the task solution within the validity period and get the reward. Late submissions are considered ineligible.

  2. Workers: In a blockchain crowdsourcing system, workers solve the crowdsourcing tasks posted by the requester and submit their solutions to the blockchain within the given validity period, with the expectation of receiving a reward.

  3. Blockchain and smart contracts: The blockchain is the decentralised platform on which crowdsourcing tasks are published. It acts as a broker by encoding the execution conditions and access control policies into smart contracts deployed on the blockchain; the contracts execute automatically when their conditions are triggered, realising an automated transaction process. Task requesters and workers interact through smart contracts, and user information and transaction records are stored permanently in the smart contracts. In this paper, the proposed scheme uses a permissioned blockchain (e.g. Hyperledger).

  4. Off-chain storage system (the Interplanetary File System, IPFS): IPFS is a decentralised, content-addressable distributed storage and transfer protocol. When a user uploads a file to the IPFS network, the file is split into several “data blocks” that are stored on different nodes around the world and de-duplicated by hash. This makes the data more secure, efficient and cost effective. IPFS generates a hash for each piece of uploaded data, which can be used to query and download it at any time. The user can therefore look up the data file and verify its integrity and authenticity using the data hash and digital signature stored on the chain.

  5. Key Management Center (KMC): The KMC provides users with identity authorisation and the initialisation function of encryption scheme, generates keys for users, participates in the access control process of workers, and acts as the computing center.

We make the following assumptions: (1) the KMC in our scheme is a trusted third party (a node in the permissioned chain serves this function), and the communication channel between users and the KMC is secure; (2) requesters and workers in the crowdsourcing process are honest but curious, which means they are curious about other users' information but do not reveal important information. Some symbols used in this scheme are briefly described in Table 1.

Table 1. Symbols used in our scheme.

4.2. Overview

This section provides a rough overview of the generic processes involved in a blockchain crowdsourcing system.

4.2.1. Initialisation

The trusted KMC is responsible for executing the initialisation algorithm by selecting security parameters λ as input and initialising the system. That is, for CPABE, BC invokes the algorithm ABE_Setup() to generate MK and PK, and any user who obtains PK can pre-set the access policy and encrypt the message by invoking the algorithm ABE_Enc(). The recipient of the ciphertext requests the decryption key from the BC (which has the MK that can trigger the algorithm ABE_KeyGen()).

Before using the blockchain system for crowdsourcing transactions, each $R$ and $W_i$ ($1 \le i \le n$) needs to register with the assistance of the Key Management Center (KMC) to generate unique identifiers $ID_R$ and $ID_{W_i}$, and authenticated key pairs $(PK_R, SK_R)$ and $(PK_{W_i}, SK_{W_i})$. In addition, at this stage, $W_i$ is also given an initial reputation value $v_i$, which indicates the reputation and expertise of each user (the initial value is assumed to be 1, and a higher value indicates higher expertise). The identifier indicates a user who has been registered in the crowdsourcing system, and the public-private key pair is used to authorise transactions posted by users.

$R$ and $W_i$ ($1 \le i \le n$) each apply to the KMC for data encryption keys. The homomorphic encryption key generation algorithms EG_KGen() and P_KGen() are invoked to generate the homomorphic encryption public-private key pairs $(P\_pk_R, P\_sk_R)$, $(EG\_pk_R, EG\_sk_R)$, $(P\_pk_{W_i}, P\_sk_{W_i})$ and $(EG\_pk_{W_i}, EG\_sk_{W_i})$, which are used to process crowdsourcing transactions.

4.2.2. Task publishing

In the task publishing phase, the requester $R$ locally invokes an encryption algorithm to encrypt the crowdsourcing task dataset before storing it in a third-party repository (e.g. IPFS). The storage address is recorded, and the crowdsourcing task information (task description, execution reward, task expiration date, data download address, etc.) is embedded in a new transaction, which is then broadcast. The miner verifies the legitimacy of the transaction and publishes the legitimate transaction to the blockchain. When workers see a crowdsourcing task that has been posted but not yet solved, they can apply to perform it. The blockchain crowdsourcing system then filters the received worker requests and selects the most suitable worker for the task according to the preset rules.

4.2.3. Solution submission

In this phase, the selected worker gets access to the crowdsourcing dataset and the task requirements, and is assumed to complete the task within the valid time frame. The worker encrypts the solution to the crowdsourcing task and stores it in IPFS, then uploads the storage address to the blockchain and waits for the requester to check it. If the check passes, the worker is paid the agreed reward, and the reputation value is updated.

4.3. Detailed construction

This subsection describes our proposed scheme in more detail; the data flow of PrivBCS is depicted in Figure 4. Note that the CPABE algorithm used in this paper needs to be performed by the Key Management Center (KMC), and this responsibility can be taken up by a permissioned node in the blockchain.

Figure 4. The data flow of PrivBCS.

4.3.1. Stage1–System initialisation

During the system initialisation phase, the Key Management Center (KMC) generates MK and calculates common parameters by following the steps below.

First, the KMC takes a security parameter $\lambda$ as input and chooses a cyclic group $G_1$ of large prime order $p$ with generator $g$. Let $e: G_1 \times G_1 \to G_T$ be a cryptographic bilinear mapping. By calling ABE_Setup($1^{\lambda}$) $\to$ (PK, MK), the KMC generates $MK = g^{\alpha}$, $PK = (Y, g^{\beta})$ and the system public parameters $PParam = \{e, g, G_1, G_T, Z_r\}$. The KMC stores PK on the chain by calling a smart contract, at which point both $R$ and $W_i$ can access it from the chain. The KMC then calls the P_KGen() $\to$ $(PK_P, SK_P)$ algorithm of the Paillier cryptosystem and the EG_KGen() $\to$ $(PK_{EG}, SK_{EG})$ algorithm of the El-Gamal cryptosystem to generate the respective public-private key pairs.

4.3.2. Stage2–User registration

Users (requester $R$ and workers $W$) in a blockchain crowdsourcing system are required to register on the platform and obtain a unique identity ($ID_R$ and $ID_{W_i}$). The requester $R$ then uses his/her unique identifier $ID_R$ to generate his/her public-private key pair by the algorithm KeyGen(MK, PK, $ID_R$) $\to$ $(PK_R, SK_R)$, and $W_i$ uses his/her unique identifier $ID_{W_i}$ to generate his/her public-private key pair by the algorithm KeyGen(MK, PK, $ID_{W_i}$) $\to$ $(PK_{W_i}, SK_{W_i})$. The BC system also runs the key generation algorithms of the homomorphic encryption schemes and generates the public-private key pairs $(P\_pk_R, P\_sk_R)$, $(EG\_pk_R, EG\_sk_R)$, $(P\_pk_{W_i}, P\_sk_{W_i})$ and $(EG\_pk_{W_i}, EG\_sk_{W_i})$ for $R$ and $W_i$. In addition, the user's reputation value $v_i$ is initialised uniformly. At the end of this phase, $R$ and $W_i$ hold their personal information, $R = \{ID_R, PK_R, SK_R\}$ and $W_i = \{ID_{W_i}, PK_{W_i}, SK_{W_i}, value_i\}$, where $W_i \in W$, $i = 1, 2, \ldots, n$.

4.3.3. Stage3–Publish task

The requester $R$ divides the crowdsourcing task Task(metaData/rawData) into two parts. The original data rawData of the crowdsourcing task is stored off-chain (e.g. in IPFS): a symmetric AES key $AK$ is generated and used to encrypt rawData, i.e. AES.Enc(rawData) $\to$ CT, and the ciphertext CT is stored in IPFS, which returns a unique hash address, so that the confidentiality of the task data is guaranteed. What is saved on the blockchain is the crowdsourcing task metadata metaData, which contains the storage address ctAddr of the original task data in the distributed storage platform (such as IPFS) and the digital signature of the requester. At the same time, $R$ captures its current location $P_R = (x_R, y_R)$, runs the encryption algorithm locally and uploads $E(x_R^2 + y_R^2)$, $E(x_R)$ and $E(y_R)$ to the blockchain. $R$ embeds the details of the above crowdsourcing task, taskInfo, into a blockchain transaction, generates a signature on the transaction with his $SK_R$, and then broadcasts the transaction $tx_R = \{ID_{tx_R}, taskInfo, ctVerify, sign_{tx}\}$. The blockchain miner verifies the validity and legitimacy of the transaction, and if it passes the verification, it is added to a new block and uploaded to the chain. The task information is $taskInfo = \{ID_R, ctAddr, tReq, condition, tExpire, tReward, E(x_R^2 + y_R^2), E(x_R), E(y_R)\}$, where $ID_{tx_R}$ is the unique identifier of the transaction, $ID_R$ is the identifier of $R$, ctAddr is the address where the original data rawData of the crowdsourcing task is stored, tReq describes the issues that need attention during the execution of the task, condition is the selection condition that the worker must satisfy, tExpire is the valid time period $[t_1, t_2]$ for submitting the task solution, tReward is the reward paid to the worker, ctVerify is the check code of the task ciphertext, and $sign_{tx}$ is the digital signature generated for the transaction using the private key $SK_R$ of $R$ in the blockchain.
Algorithm 1 and Algorithm 2 describe the crowdsourcing task publishing as well as the validation.
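The shape of the published transaction can be illustrated with a short Python sketch. It is not the paper's Algorithm 1 or 2: the field names mirror taskInfo, but the use of SHA-256 for ctVerify and for the transaction identifier is an assumption, and the HMAC below is only a stand-in for the requester's digital signature with $SK_R$. The helper names `ct_verify`, `build_tx` and `ciphertext_intact` are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, asdict

@dataclass
class TaskInfo:
    id_r: str            # ID_R
    ct_addr: str         # off-chain address of the encrypted rawData
    t_req: str           # issues needing attention during execution
    condition: str       # selection condition for workers
    t_expire: tuple      # validity period [t1, t2]
    t_reward: int        # reward for the worker
    enc_location: tuple  # (E(x_R^2 + y_R^2), E(x_R), E(y_R)) as opaque values

def ct_verify(ciphertext: bytes) -> str:
    """Check code of the task ciphertext (SHA-256 is an assumed choice)."""
    return hashlib.sha256(ciphertext).hexdigest()

def build_tx(task: TaskInfo, ciphertext: bytes, signing_key: bytes) -> dict:
    """Assemble tx_R = {IDtx, taskInfo, ctVerify, sign}."""
    body = {"taskInfo": asdict(task), "ctVerify": ct_verify(ciphertext)}
    payload = json.dumps(body, sort_keys=True).encode()
    return {
        "IDtx": hashlib.sha256(payload).hexdigest(),
        **body,
        # Placeholder only: a real system would sign with the private key SK_R.
        "sign": hmac.new(signing_key, payload, hashlib.sha256).hexdigest(),
    }

def ciphertext_intact(tx: dict, ciphertext: bytes) -> bool:
    """Recompute the digest of the downloaded ciphertext and compare."""
    return hmac.compare_digest(tx["ctVerify"], ct_verify(ciphertext))

task = TaskInfo("ID_R", "placeholder-ipfs-address", "label images",
                "reputation >= 2", (1000, 2000), 30, (111, 222, 333))
tx = build_tx(task, b"encrypted raw data", b"demo-key")
print(ciphertext_intact(tx, b"encrypted raw data"))
```

Only the fixed-size address and digest travel on-chain in this sketch, while the bulky ciphertext stays in off-chain storage.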

At the end of this phase, the crowdsourcing task is successfully published to the blockchain and workers can access, request and download the task. It is worth noting that in this paper, we assume that R needs to ensure that he has a certain balance (higher than the reward given for the task execution) in his account as a deposit for the crowdsourcing task before posting it, in order to prevent the requester from launching a false report attack. The deposit cannot be redeemed before the specified time, and the bonus payment awarded to the worker will be automatically deducted and the remaining deposit will be returned to the account after the task transaction is completed.

4.3.4. Stage4–Request task

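Suppose the location of the task $T$ posted by $R$ is $P_t = (x_t, y_t)$ and the location of $W_i$ is $P_i = (x_i, y_i)$. The encrypted squared Euclidean distance between them can be calculated as follows:

$$\begin{aligned} E(d^2(T, W_i)) &= E[(x_t - x_i)^2 + (y_t - y_i)^2] \\ &= E(x_t^2 + y_t^2) \cdot E(x_i^2 + y_i^2) \cdot E(-2 x_t x_i) \cdot E(-2 y_t y_i) \\ &= E(x_t^2 + y_t^2) \cdot E(x_i^2 + y_i^2) \cdot E(x_t)^{-2 x_i} \cdot E(y_t)^{-2 y_i} \end{aligned} \tag{2}$$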
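Equation (2) can be exercised with a toy Paillier implementation. The sketch below assumes the standard Paillier construction with generator $g = n + 1$ and deliberately tiny primes; the names `enc`, `dec` and `encrypted_sq_distance` are hypothetical and not taken from the paper.

```python
import math
import random

# Toy Paillier key (small primes assumed for illustration; insecure in practice).
p, q = 1009, 1013
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)                                # valid for g = n + 1

def enc(m):
    """c = (1 + m*n) * r^n mod n^2, with r coprime to n."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) * pow(r, n, n2) % n2

def dec(c):
    """m = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    return (pow(c, lam, n2) - 1) // n * mu % n

def encrypted_sq_distance(task_ct, xi, yi):
    """Worker side of Eq. (2): combine E(xt^2+yt^2), E(xt), E(yt) with own (xi, yi)."""
    e_sq, e_x, e_y = task_ct
    return (e_sq * enc(xi * xi + yi * yi)
            * pow(e_x, -2 * xi, n2) * pow(e_y, -2 * yi, n2)) % n2

# Requester publishes the encrypted task location.
xt, yt = 30, 40
task_ct = (enc(xt * xt + yt * yt), enc(xt), enc(yt))

# A worker at (33, 44) computes E(d^2) without learning (xt, yt).
xi, yi = 33, 44
e_d2 = encrypted_sq_distance(task_ct, xi, yi)
print(dec(e_d2) == (xt - xi) ** 2 + (yt - yi) ** 2)
```

Multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a known exponent scales the plaintext, which is why the cross terms $-2 x_t x_i$ and $-2 y_t y_i$ appear as exponents.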

Lemma 4.1

Let $W = \{W_1, W_2, \ldots, W_n\}$ be the set of $n$ workers, where $W_i$ denotes one of the $n$ workers who apply for a crowdsourcing task, $W_x$ denotes the worker who eventually applies successfully and performs the task, and $V$ is the product of the reputation values $v_i$ ($1 \le i \le n$) of all workers, that is, $V = \prod_{i=1}^{n} v_i$. Let $v_i' = V / v_i$, $1 \le i \le n$. For any two workers $W_i \in W$ and $W_j \in W$, $d(R, W_i)/v_i < d(R, W_j)/v_j$ holds if and only if $v_i' \, d(T, W_i) < v_j' \, d(T, W_j)$ holds. The proof is as follows:

$$\frac{d(T, W_i)}{v_i} < \frac{d(T, W_j)}{v_j} \;\Leftrightarrow\; \frac{d(T, W_i)}{v_i} V < \frac{d(T, W_j)}{v_j} V \;\Leftrightarrow\; d(T, W_i)\, v_i' < d(T, W_j)\, v_j' \tag{3}$$
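As a small worked instance of the lemma, consider three workers with the hypothetical values below (chosen only for illustration). Multiplying each ratio by $V$ rescales every term by the same positive factor, so the ordering is unchanged.

```latex
\begin{align*}
  (v_1, v_2, v_3) &= (2, 3, 5), \qquad V = 2 \cdot 3 \cdot 5 = 30, \\
  (v_1', v_2', v_3') &= (V/v_1,\; V/v_2,\; V/v_3) = (15, 10, 6), \\
  (d_1, d_2, d_3) &= (4, 9, 12), \\
  d_i / v_i &: \quad 2 \;<\; 2.4 \;<\; 3 \quad (\text{workers } 1, 3, 2), \\
  v_i' \, d_i &: \quad 60 \;<\; 72 \;<\; 90 \quad (\text{workers } 1, 3, 2).
\end{align*}
```

Both rankings select worker 1, and the second one involves only integer multiplications, which suits the homomorphic operations available.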

Based on the above lemma, for a worker $W_i$ who applies for the task $T$, we transform the calculation of $cmp_i = d(T, W_i)/v_i$ into the calculation of $cmp_i' = v_i' \, d(T, W_i)$, where $v_i' = v_1 v_2 \cdots v_{i-1} v_{i+1} \cdots v_n = V / v_i$. Workers who want to perform the crowdsourcing task apply to BC: each worker encrypts his reputation value $v_i$ with the El-Gamal cryptosystem and his location $P_i = (x_i, y_i)$ with the Paillier cryptosystem, and then uploads the ciphertexts $E(v_i)$, $E(x_i^2 + y_i^2)$, $E(x_i)$ and $E(y_i)$ to the blockchain BC. The worker's identifier $ID_{W_i}$ should also be encrypted (or signed) and uploaded to the blockchain to be used later in the CPABE encryption and decryption phase. At this point, BC holds $E(v_i)$ for all workers. After receiving all the requests, BC multiplies the received $E(v_i)$ to obtain $E(V) = \prod_{i=1}^{n} E(v_i)$, and then sends $E(V)$ to the KMC, which decrypts it to obtain $V$. Next, $V$ and $E(x_t^2 + y_t^2)$, $E(x_t)$, $E(y_t)$ are broadcast to all workers, so that each worker $W_i$ ($1 \le i \le n$) obtains the encrypted task location and can calculate the encrypted distance between itself and the crowdsourcing task. Each worker then calculates $E(cmp_i'^2) = E(v_i'^2 \, d^2(T, W_i)) = E(d^2(T, W_i))^{v_i'^2}$ from $v_i'$ and $E(d^2(T, W_i))$, and sends the result to the BC system. Finally, BC selects the optimal worker from the received $E(cmp_i'^2)$ according to the following condition:

$$W_x = \min\{d(T, W_i)/v_i,\ 0 < i \le n\} \;\Leftrightarrow\; \min\{v_i' \, d(T, W_i),\ 0 < i \le n\} \;\Leftrightarrow\; \min\{cmp_i'^2,\ 0 < i \le n\} \tag{4}$$

Assuming that $W_x$ is the final winner, $W_x$ will then perform the crowdsourcing task. At this point BC already has information about $W_x$, i.e. $ID_{W_x}$, $v_x$, and $P_x = (x_x, y_x)$. BC constructs the access control structure $\Gamma$ based on the attributes $\{ID_{W_x}, (x_x, y_x), v_x\}$, and then obtains PK from the blockchain, the off-chain source address ctAddr of the crowdsourcing task, and the symmetric AES key $AK$. Finally, the proposed scheme forms the plaintext message $M = \{ctAddr, AK\}$ and calls the ABE_Enc(PK, M, $\Gamma$) $\to$ CT algorithm for encryption.
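The selection rule of Equation (4) can be sketched in plaintext Python. In PrivBCS the quantities are ciphertexts, so the code below only illustrates the arithmetic, not the encrypted protocol; the function name `select_worker` and the sample workers are hypothetical.

```python
def select_worker(task_pos, workers):
    """Pick the worker minimising cmp'^2 = v'^2 * d^2, with v' = V / v.

    workers: list of (worker_id, (x, y), reputation) with positive integer
    reputation values. Squared distances avoid any square root.
    """
    V = 1
    for _, _, v in workers:
        V *= v                                  # V = product of all v_i

    best_id, best_score = None, None
    for worker_id, (x, y), v in workers:
        v_prime = V // v                        # exact: V is a multiple of v
        d2 = (task_pos[0] - x) ** 2 + (task_pos[1] - y) ** 2
        score = v_prime ** 2 * d2               # cmp'^2
        if best_score is None or score < best_score:
            best_id, best_score = worker_id, score
    return best_id

workers = [("W1", (3, 4), 2), ("W2", (6, 8), 5), ("W3", (0, 9), 3)]
winner = select_worker((0, 0), workers)
print(winner)
```

Here the ratios $d/v$ are 2.5, 2 and 3, so W2 has the smallest ratio, and it also has the smallest $cmp'^2$ (3600, against 5625 and 8100).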

4.3.5. Stage5–View request

After receiving the broadcast from BC, Wi checks whether he meets the execution conditions of the crowdsourcing task, i.e. the worker's private attribute set S meets the access structure predefined by the publisher. If it meets the predefined access structure, Wi can successfully access datasets of crowdsourcing task outside blockchain, and then execute the crowdsourcing task according to the requirements. The detailed steps are described below.

  1. Wi verifies the digital signature of the transaction.

  2. $W_i$ selects a random number $x \in Z_r$ and the set of attributes $S$, calculates $g^x$, and sends $J = (S, g^x)$ to the KMC.

  3. The KMC then calls ABE_KeyGen(MK, S, J) $\to$ $SK'$ to generate the pseudo-private key $SK' = (D', D_0, \{D_j\}_{j \in S})$ and sends it to $W_i$.

  4. After $W_i$ receives $SK'$, he calculates $D = (D')^{1/x} = g^{\alpha} g^{\beta t}$ and obtains the final key $SK = (D, D_0, \{D_j\}_{j \in S})$.

  5. Wi uses the private key SK to decrypt the ciphertext message CT, obtains the off-chain storage address of the crowdsourcing task data file and the symmetric key AK, downloads the crowdsourcing task data file according to the address and decrypts it.

In this stage, the proposed scheme adds a random number $x$ chosen by $W_i$ to the generation of the private key. As the value of $x$ is known only to the worker himself, neither BC nor the KMC can recover SK. This ensures the confidentiality of the data and the anonymity of the user.
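The blinding idea can be sketched in a toy prime-order group. This is not the CPABE key generation itself: the single exponent `k` stands in for the KMC's secret material, and the small group ($g = 2$ of order $q = 11$ in $Z_{23}^*$) is an assumption made so the arithmetic is easy to follow.

```python
import random

# Toy prime-order subgroup: g = 2 has order q = 11 modulo p = 23 (assumed values).
p, q, g = 23, 11, 2

# Worker: pick a secret x and send only g^x to the KMC.
x = random.randrange(1, q)
blinded = pow(g, x, p)

# KMC: works on the blinded value; k is a hypothetical secret exponent.
k = 7
d_pseudo = pow(blinded, k, p)        # D' = (g^x)^k, still masked by x

# Worker: remove the blinding with 1/x computed modulo the group order.
x_inv = pow(x, -1, q)
d_final = pow(d_pseudo, x_inv, p)    # D = (D')^(1/x) = g^k

print(d_final == pow(g, k, p))
```

The KMC only ever sees $g^x$ and produces $D'$, so without $x$ it cannot derive the unblinded component $D$ that the worker ends up holding.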

4.3.6. Stage6–Perform task

If the worker satisfies the preset access structure, then after obtaining the task he produces a solution Sol_rawData for the crowdsourcing task according to the task information and requirements, encrypts Sol_rawData using $PK_R$ of the requester $R$, stores it in IPFS, and obtains the storage address $Addr_{sol}$. Simultaneously, the solution metadata $Sol\_metaData = \{ID_{W_i}, ID_{tx_R}, Addr_{sol}\}$ is uploaded to BC, and the crowdsourcing task can be regarded as complete. Note that the metadata of the solution is stored on-chain, including the identifier $ID_{W_i}$ of the worker performing the task, the identifier $ID_{tx_R}$ of the crowdsourcing transaction, and a fixed hash value $Addr_{sol}$ that points to the address where the solution is stored off-chain; the details of the solution and the data are encrypted and stored in the off-chain database. Specifically:

  1. The worker $W_i$ encrypts the original data Sol_rawData of the crowdsourcing task solution with $PK_R$ of the requester $R$, generating the solution ciphertext $cipherSol_{W_i} \leftarrow Enc(Sol\_rawData, PK_R)$, uploads it to the third-party storage center and obtains the solution storage address $Addr_{sol}$;

  2. $W_i$ integrates the above information as the answer to the crowdsourcing task request $T$, $Sol\_metaData = \{ID_{W_i}, ID_{tx_R}, Addr_{sol}\}$, in which $ID_{tx_R}$ identifies the crowdsourcing task performed by the worker $W_i$.

  3. $W_i$ calls library functions to generate $ID_{tx_{W_i}}$ as a unique identifier for the transaction $tx_{W_i}$, and then uses $SK_{W_i}$ to generate a digital signature $sign_{W_i}$ for the solution ciphertext $cipherSol_{W_i}$, where $tx_{W_i} = \{ID_{tx_{W_i}}, PK_{W_i}, Sol\_metaData, sign_{W_i}\}$.

  4. Within the task validity period $[t_1, t_2]$ given by the requester $R$, $W_i$ broadcasts the transaction $tx_{W_i}$ to the whole blockchain network. Finally, the miners check whether the transaction is legitimate, and if it is, they add it to a new block.

4.3.7. Stage7–Check and receive task solutions

$R$ receives the transaction $tx_{W_i}$ and determines whether it falls within the valid time slot $[t_1, t_2]$ of the task. If it does, the requester $R$ verifies the signature using $PK_{W_i}$ of $W_i$, retrieves the solution data stored off-chain in IPFS by using the provided pointer index, and verifies the ciphertext data $cipherSol_{W_i}$ using the public key $PK_{W_i}$ of $W_i$. The data is encrypted with $PK_R$ to ensure its privacy and confidentiality. By decrypting the ciphertext data with $SK_R$, $R$ obtains the plaintext of the final solution.

4.3.8. Stage8–Update information

After R receives the solution from Wi, BC will trigger the information update mechanism, which includes the account balance of R and Wi, and the worker's vi. That is, R will receive a valid solution and redeem the deposit, and Wi will be paid accordingly.
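The deposit and settlement bookkeeping described for PrivBCS can be sketched as follows. The `Account` class, the function names and in particular the reputation increment of 1 are hypothetical; the paper does not specify how $v_i$ is updated.

```python
from dataclasses import dataclass

@dataclass
class Account:
    balance: int
    locked: int = 0
    reputation: int = 1   # initial reputation value is assumed to be 1

def lock_deposit(requester: Account, deposit: int, reward: int) -> None:
    """Lock a deposit that must exceed the task reward before publishing."""
    if deposit <= reward:
        raise ValueError("deposit must be higher than the reward")
    if requester.balance < deposit:
        raise ValueError("insufficient balance for the deposit")
    requester.balance -= deposit
    requester.locked += deposit

def settle(requester: Account, worker: Account, deposit: int, reward: int,
           reputation_delta: int = 1) -> None:
    """Pay the worker from the deposit and return the remainder to the requester."""
    requester.locked -= deposit
    worker.balance += reward
    requester.balance += deposit - reward
    worker.reputation += reputation_delta   # update rule is an assumption

requester, worker = Account(balance=100), Account(balance=0)
lock_deposit(requester, deposit=50, reward=30)
settle(requester, worker, deposit=50, reward=30)
print(requester.balance, worker.balance, worker.reputation)
```

In the actual system this logic would live in a smart contract so that neither party can skip the payment or the refund.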

5. Security analysis

This section will discuss the privacy and security of PrivBCS from the aspect of on-chain and off-chain security.

5.1. On-chain security analysis

This work will then analyse the trustworthiness, privacy, cipher integrity and traceability of crowdsourcing tasks on the chain to ensure the reliability, confidentiality and traceability of task data in transactions to the best extent.

  1. Reliability: To ensure the security of the system master key, most existing CPABE schemes require a key center to generate private keys from the user's attribute set. Building trust in such a centralised platform is costly, and the platform is prone to single point of attack as well as leakage of users' attributes and key security problems. In this paper, blockchain is used to replace the traditional trusted center, so that parties can transact without first establishing trust, and fair execution of transactions is ensured.

  2. Privacy: To achieve fine-grained worker access control, most current CPABE solutions expose access policies or attribute sets, which poses a great threat to privacy protection. To safeguard the privacy of users' attributes, it is necessary to hide the access policies or attributes. In terms of attribute privacy, users' sensitive information and related attributes are encrypted locally before uploading to the blockchain. The location information of each user and the unique identifier used for access control are also handled in this way. At the same time, the data source files used for transactions will also be encrypted and uploaded. Therefore, privacy protection (identity privacy, location privacy, data privacy) of requesters and workers can be achieved even in a transparent environment of blockchain.

  3. Ciphertext integrity: To improve the scalability of the crowdsourcing blockchain system, we encrypt the specific data of the crowdsourcing task and store the encrypted data in the IPFS under the chain, and then add the ciphertext address to the blockchain transaction. In addition, this paper adds the ciphertext digital digest to the transaction, so it can verify the integrity of the ciphertext by the hash value.

  4. Traceability: The nature of blockchain ensures the traceability of crowdsourcing transactions.

5.2. Off-chain security analysis

To improve the scalability of the blockchain crowdsourcing system and the efficiency of crowdsourcing transactions, PrivBCS will choose to encrypt and save the crowdsourcing task data files to IPFS, which relies mainly on Distributed Hash Tables (DHTs) to retrieve the file locations. It does not matter whether IPFS is a trusted storage entity or not, because we use cryptographic storage in the article, i.e. task data and solutions are encrypted before uploading. At the same time, the proposed scheme has adopted an off-chain database to store the source files of crowdsourcing tasks and solutions, which will greatly cut the overhead of the blockchain network and enhance the performance of the system.

5.3. Free-riding attack

On a public blockchain network platform, due to the underlying technical characteristics, there is a delay in the confirmation time of transactions in the network. Malicious workers can take advantage of this inherent flaw to obtain solutions from other honest workers and submit them as their own, so that they get paid without working. This phenomenon is known as “free-riding”; it can greatly harm the interests of task publishers and honest workers, and reduce user motivation and participation. To overcome it, this paper encrypts the task data and solutions and stores them off-chain, and then uploads the off-chain access address to the blockchain. The source address is access-controlled so that only workers whose attribute sets meet the preset conditions can access the task download address and obtain the encrypted source file. The workers' task solutions are submitted to the blockchain in a similar manner, so that “free-riding” by malicious workers is inhibited to some extent.

6. Performance evaluation

In this section, the experimental environment will be introduced, and then the performance of PrivBCS and the experimental results will be discussed.

6.1. Experimental setting

This section presents the experimental environment of PrivBCS and the cost of the scheme at each stage of operation (e.g. the time to encrypt and decrypt messages and the overhead of access control). To demonstrate the feasibility and effectiveness of PrivBCS, we deploy smart contracts locally and test our proposed scheme using Solidity, Java, Python and JavaScript, together with relevant cryptographic libraries (e.g. jpbc, pycrypto). Considering that Solidity may be limited in some encryption and decryption operations, and to reduce the overhead on smart contracts, the cryptographic primitives (e.g. Paillier, El-Gamal, CPABE) used in our proposal are executed off-chain. The on-chain part focuses on the implementation of the transaction logic and the design of the smart contracts. The experimental environment is a 64-bit Windows operating system with an Intel Core i5-10400 CPU and 16 GB of memory.

6.2. Performance comparison

Designing a decentralised, privacy-secure and high-performance crowdsourcing system is the main goal of PrivBCS. This paper focuses on the privacy protection of users during crowdsourcing transactions, and on how to complete the transactions as efficiently and inexpensively as possible. Therefore, this paper proposes to encrypt the original data of the crowdsourcing task and store it off-chain, and then upload the off-chain address of the task to the blockchain (solutions are submitted in the same way). Access conditions are set on the task data, so that only workers who meet the conditions are qualified to access its content, thus achieving fine-grained worker access control. The mechanism of on-chain and off-chain collaborative storage substantially reduces the execution overhead of smart contracts, while overcoming the drawback that smart contracts cannot effectively store large files. This improves the operational efficiency of the system and offers a new direction for the evolution of blockchain crowdsourcing systems.

To highlight the advantages of our proposed scheme, Table 2 compares PrivBCS with existing crowdsourcing systems in terms of functionality. The comparison covers the following six aspects: whether decentralisation is guaranteed, whether task assignment is optimised, whether the confidentiality of task data and the privacy of users are protected, whether there is fine-grained worker access control and whether the data is stored collaboratively on-chain and off-chain. “✓” means that the scheme supports this function, and “✗” means that the scheme does not provide this function. Among the crowdsourcing platforms compared with PrivBCS, none except CrowdBC has an on-chain and off-chain collaborative storage mechanism. Although CrowdBC provides off-chain storage, it cannot protect data confidentiality during transaction execution and does not support fine-grained access control of data. In contrast, the PrivBCS scheme proposed in this paper not only guarantees the confidentiality of data in the transaction process, but also provides fine-grained access control to the data. The encrypted data is associated with the access structure, and the decryption key is associated with the user's attributes. Only users whose attributes satisfy the access structure can decrypt the ciphertext and get access to the original off-chain data of the task. In general, PrivBCS can improve the security and scalability of blockchain crowdsourcing systems and reduce the execution overhead of smart contracts.

Table 2. Comparison of existing crowdsourcing systems.

6.3. Experimental results

In the task request phase, workers encrypt their identifier, reputation value and location separately and upload them to BC. BC then multiplies the received encrypted reputation values $E(v_i)$ ($1 \le i \le n$) to get $E(V)$, which is decrypted to obtain $V$, and broadcasts $V$ and the encrypted location of the task to the workers who requested the task. Each $W_i$ ($1 \le i \le n$) computes $cmp_i'$ ($1 \le i \le n$) from the obtained data and sends it to BC, which selects the most suitable worker to perform the task. BC then sets the access control structure so that only the selected worker is eligible to access the task source file. Workers check whether they are eligible to view the task source file and satisfy the execution conditions by providing their own private attribute sets.

In this paper, the original data of the task are stored under the chain, while the blockchain only holds the fixed hash value returned by IPFS, pointing to the original data of the task stored in IPFS. Therefore, the performance of PrivBCS is mainly affected by the time overhead of message encryption and decryption and data access control compared to those systems that store all data on the blockchain.

For the time overhead of encryption and decryption, the experiments test the performance of PrivBCS with different numbers of workers, setting the number of workers to {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}. As depicted in Figure 5, the experiment measured the time overhead of encrypting the worker information and selecting the most suitable worker for different numbers of workers, as well as the time cost of constructing the access control structure for fine-grained control of the workers' permissions. Experimental results show that, with data encrypted, the access control operation takes only about 0.18 s per worker, while the average time to select the most suitable worker is $8.21 \times 10^{-3}$ s. As shown in Figure 6, the experiment measured the time cost of computing $V$ and $cmp_i'$ ($1 \le i \le n$) when data is encrypted. The average calculation time of $V$ is $7.67 \times 10^{-3}$ ms, and the average calculation time of $cmp_i'$ is 1.07 ms. For the time cost of file upload to IPFS, the experiments were conducted with file sizes set to {10, 50, 100, 200, 300, 400, 500, 600, 700, 800} (in MB). As shown in Figure 7, the upload time increases with the file size while the network remains stable. It takes only about 0.064 s to upload a 10 MB file, and about 5.5 s to upload an 800 MB file. The experimental results show that PrivBCS is feasible and the time cost is acceptable. The proposed scheme distributes off-chain data computation and cryptographic operations to the participants, instead of making BC responsible for processing all the data of the transaction, which reduces network bandwidth and traffic overhead. The on-chain logic layer is responsible for the normal operation of crowdsourcing transactions, while the off-chain extension layer is responsible for storing the raw data of tasks and solutions. This “hybrid architecture” model of on-chain and off-chain data collaboration improves the scalability of the blockchain and enhances the performance of the system.
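As a quick arithmetic check on the reported upload times, the effective throughput at the two quoted file sizes can be computed directly. The snippet uses only the two data points stated in the text; the variable names are hypothetical.

```python
# Reported upload times: file size in MB -> seconds (values quoted in the text).
uploads = {10: 0.064, 800: 5.5}

# Effective throughput in MB/s for each reported point.
throughput = {size: size / seconds for size, seconds in uploads.items()}

for size, rate in throughput.items():
    print(f"{size} MB: {rate:.1f} MB/s")
```

Both points correspond to roughly 145 to 156 MB/s, which is consistent with the upload time growing approximately linearly with file size in this setup.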

Figure 5. Time overhead for encryption and decryption.

Figure 6. Computational time cost of encrypted data.

Figure 7. Time cost of file upload.

7. Conclusion and future work

This paper proposes PrivBCS, a decentralised blockchain-based crowdsourcing system that addresses several defects of traditional crowdsourcing mechanisms, such as single points of attack, high transaction costs and opaque transactions. Using blockchain technology and smart contracts, PrivBCS moves crowdsourcing transactions onto the blockchain for processing, which makes the transactions fair and transparent. It can therefore ensure fair transactions among crowdsourcing participants without the involvement of third-party organisations. At the same time, the proposed scheme protects the privacy of users' sensitive information and the confidentiality of transaction data during crowdsourcing transactions by cryptographic means such as Paillier, El-Gamal, and CPABE. When a worker applies for a task, the worker's identifier is encrypted and used for key generation in the access control phase. In addition, a random number known only to the worker is added during key generation to better protect the anonymity of the worker's identity. Task files can be very large, whereas both the block size of the blockchain and the amount of data that smart contracts can process are bounded, so handling such files on-chain would make transactions inefficient. Therefore, PrivBCS stores the source files of crowdsourcing tasks and solutions in an off-chain extension layer and stores only their storage addresses on-chain, which allows smart contracts to store and access large files efficiently. This on-chain and off-chain collaborative storage approach has been tested and shown to reduce network and on-chain storage overhead, thereby improving the performance of the system.
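
The on-chain and off-chain storage split described above can be illustrated with a minimal in-memory sketch. This is not the PrivBCS implementation: the class names `OffChainStore` and `OnChainLedger` are hypothetical, a Python dictionary stands in for both IPFS and smart-contract storage, and a SHA-256 hex digest stands in for an IPFS content identifier (IPFS uses its own CID format).

```python
import hashlib


class OffChainStore:
    """Hypothetical stand-in for the off-chain layer: a content-addressed blob store."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself (assumption: SHA-256 hex digest).
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on retrieval so that off-chain tampering is detectable.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("off-chain data does not match its on-chain address")
        return data


class OnChainLedger:
    """Hypothetical stand-in for contract storage: keeps only small address records."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, task_id: str, address: str) -> None:
        self._records[task_id] = address

    def lookup(self, task_id: str) -> str:
        return self._records[task_id]


store = OffChainStore()
ledger = OnChainLedger()

task_file = b"x" * 1_000_000  # placeholder for an already encrypted task file
address = store.put(task_file)       # large payload goes off-chain
ledger.register("task-1", address)   # only the short address goes on-chain

retrieved = store.get(ledger.lookup("task-1"))
print(len(task_file), "bytes off-chain;", len(address), "hex characters on-chain")
```

In this sketch the on-chain record is 64 hexadecimal characters regardless of the file size, which is the property that keeps large task files from consuming block capacity.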

In future work, we will concentrate on designing a more equitable and effective task allocation scheme. For fine-grained worker selection, attention should be paid to the weights of workers' attributes and to hierarchical access control of data. In addition, threshold values should be set according to task requirements to cap the number of executors for a crowdsourcing task, and searchable encryption schemes between workers and tasks deserve further study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was partially supported by the National Key Research and Development Program of China under Grant 2021YFA1000600, the National Natural Science Foundation of China under Grant 62072170, the Science and Technology Project of Department of Communications of Hunan Provincial under Grant 202101, the Key Research and Development Program of Hunan Province under Grant 2022GK2015, and the Hunan Provincial Natural Science Foundation of China under Grant 2021JJ30141.

References

  • Bethencourt, J., Sahai, A., & Waters, B. (2007). Ciphertext-policy attribute-based encryption. In IEEE Symposium on Security and Privacy (sp '07) (pp. 321–334). doi:10.1109/SP.2007.11
  • Catalano, D., Gennaro, R., Howgrave-Graham, N., & Nguyen, P. Q. (2001). Paillier's cryptosystem revisited. In Proceedings of the 8th ACM Conference on Computer and Communications Security (pp. 206–214). New York, USA: Association for Computing Machinery. doi:10.1145/501983.502012
  • Chen, X., Liang, W., Xu, J., Wang, C., Li, K. C., & Qiu, M. (2022). An efficient service recommendation algorithm for cyber-physical-social systems. IEEE Transactions on Network Science and Engineering, 9(6), 3847–3859. doi:10.1109/TNSE.2021.3092204
  • Crowdflower. (2023). https://appen.com/join-our-crowd/.
  • Elgamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4), 469–472. doi:10.1109/TIT.1985.1057074
  • Fan, Y., Lin, X., Liang, W., Wang, J., Tan, G., Lei, X., & Jing, L. (2022). TraceChain: A blockchain-based scheme to protect data confidentiality and traceability. Software: Practice and Experience, 52(1), 115–129. doi:10.1002/spe.2753
  • Feng, Q., He, D., Zeadally, S., Khan, M. K., & Kumar, N. (2019). A survey on privacy protection in blockchain system. Journal of Network and Computer Applications, 126, 45–58. doi:10.1016/j.jnca.2018.10.020
  • Feng, W., & Yan, Z. (2019). MCS-Chain: Decentralized and trustworthy mobile crowdsourcing based on blockchain. Future Generation Computer Systems, 95, 649–666. doi:10.1016/j.future.2019.01.036
  • Gao, L., Cheng, T., & Gao, L. (2020). TSWCrowd: A decentralized task-select-worker framework on blockchain for spatial crowdsourcing. IEEE Access, 8, 220682–220691. doi:10.1109/ACCESS.2020.3043040
  • Ghaffaripour, S., & Miri, A. (2020). A decentralized, privacy-preserving and crowdsourcing-based approach to medical research. In IEEE International Conference on Systems, Man, and Cybernetics (smc) (pp. 4510–4515). doi:10.1109/SMC42975.2020.9283027
  • Gisdakis, S., Giannetsos, T., & Papadimitratos, P. (2016). Security, privacy, and incentive provision for mobile crowd sensing systems. IEEE Internet of Things Journal, 3(5), 839–853. doi:10.1109/JIOT.2016.2560768
  • Jeff, H. (2006). The rise of crowdsourcing. Wired Magazine, 14(6), 1–4.
  • Kadadha, M., Otrok, H., Mizouni, R., Singh, S., & Ouali, A. (2020). SenseChain: A blockchain-based crowdsensing framework for multiple requesters and multiple workers. Future Generation Computer Systems, 105, 650–664. doi:10.1016/j.future.2019.12.007
  • Li, M., Weng, J., Yang, A., Lu, W., Zhang, Y., Hou, L., & Deng, R. H. (2019). CrowdBC: A blockchain-based decentralized framework for crowdsourcing. IEEE Transactions on Parallel and Distributed Systems, 30(6), 1251–1266. doi:10.1109/TPDS.2018.2881735
  • Liang, W., Fan, Y., Li, K. C., Zhang, D., & Gaudiot, J. L. (2020). Secure data storage and recovery in industrial blockchain network environments. IEEE Transactions on Industrial Informatics, 16(10), 6543–6552. doi:10.1109/TII.2020.2966069
  • Liang, W., Li, K. C., Long, J., Kui, X., & Zomaya, A. Y. (2020). An Industrial network intrusion detection algorithm based on multifeature data clustering optimization model. IEEE Transactions on Industrial Informatics, 16(3), 2063–2071. doi:10.1109/TII.2019.2946791
  • Liang, W., Tang, M., Long, J., Peng, X., Xu, J., & Li, K. C. (2019). A secure fabric blockchain-based data transmission technique for industrial internet-of-things. IEEE Transactions on Industrial Informatics, 15(6), 3582–3592. doi:10.1109/TII.2019.2907092
  • Liang, W., Xiao, L., Zhang, K., Tang, M., He, D., & Li, K. C. (2022). Data fusion approach for collaborative anomaly intrusion detection in blockchain-Based systems. IEEE Internet of Things Journal, 9(16), 14741–14751. doi:10.1109/JIOT.2021.3053842
  • Liang, W., Yang, Y., Yang, C., Hu, Y., Xie, S., Li, K. C., & Cao, J. (2022). PDPChain: A consortium blockchain-based privacy protection scheme for personal data. IEEE Transactions on Reliability, 72(2), 586–598. doi:10.1109/TR.2022.3190932
  • Liang, W., Zhang, D., Lei, X., Tang, M., Li, K. C., & Zomaya, A. Y. (2021). Circuit copyright blockchain: blockchain-based homomorphic encryption for IP circuit protection. IEEE Transactions on Emerging Topics in Computing, 9(3), 1410–1420. doi:10.1109/TETC.2020.2993032
  • Lin, C., He, D., Huang, X., Xie, X., & Choo, K. K. R. (2020). Blockchain-based system for secure outsourcing of bilinear pairings. Information Sciences, 527, 590–601. doi:10.1016/j.ins.2018.12.043
  • Lin, C., He, D., Zeadally, S., Kumar, N., & Choo, K. K. R. (2020). SecBCS: a secure and privacy-preserving blockchain-based crowdsourcing system. Science China Information Sciences, 63(3), 1–14. doi:10.1007/s11432-019-9893-2
  • Lu, Y., Tang, Q., & Wang, G. (2018). ZebraLancer: Private and anonymous crowdsourcing system atop open blockchain. In IEEE 38th International Conference on Distributed Computing Systems (ICDCS) (pp. 853–865). doi:10.1109/ICDCS.2018.00087
  • Salehi, N., Irani, L. C., Bernstein, M. S., Alkhatib, A., Ogbe, E., & Milland, K. (2015). We are dynamo: Overcoming stalling and friction in collective action for crowd workers. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 1621–1630). New York, USA: Association for Computing Machinery.
  • To, H., Ghinita, G., Fan, L., & Shahabi, C. (2017). Differentially private location protection for worker datasets in spatial crowdsourcing. IEEE Transactions on Mobile Computing, 16(4), 934–949. doi:10.1109/TMC.2016.2586058
  • Taskrabbit. (2023). https://www.taskrabbit.com/.
  • Topcoder. (2023). https://www.topcoder.com/.
  • Upwork. (2023). https://www.upwork.com/.
  • Xu, G., Li, H., Liu, S., Wen, M., & Lu, R. (2019). Efficient and privacy-preserving truth discovery in mobile crowd sensing systems. IEEE Transactions on Vehicular Technology, 68(4), 3854–3865. doi:10.1109/TVT.2019.2895834
  • Xu, Z., Liang, W., Li, K. C., Xu, J., Zomaya, A. Y., & Zhang, J. (2022). A time-sensitive token-based anonymous authentication and dynamic group key agreement scheme for Industry 5.0. IEEE Transactions on Industrial Informatics, 18(10), 7118–7127. doi:10.1109/TII.2021.3129631
  • Zhang, J., Cui, W., Ma, J., & Yang, C. (2019). Blockchain-based secure and fair crowdsourcing scheme. International Journal of Distributed Sensor Networks, 15(7), 15501477198. doi:10.1177/1550147719864890
  • Zhu, S., Cai, Z., Hu, H., Li, Y., & Li, W. (2020). zkCrowd: A hybrid blockchain-based crowdsourcing platform. IEEE Transactions on Industrial Informatics, 16(6), 4196–4205. doi:10.1109/TII.2019.2941735