SQL queries over encrypted databases: a survey

Abstract

Limited by the local storage resource, data users have to encrypt their data and outsource the encrypted databases to cloud servers to enjoy low-cost, professional data management services, which promotes the rapid development of outsourcing database technology. Despite this, the complex underlying setting and loosely coupled database architecture lead to various security risks and performance bottlenecks, while there is currently no work to achieve a comprehensive evaluation of existing encrypted database solutions from the aspects of underlying settings, security levels, functions, etc. In this work, we first propose an evaluation model to assess SQL functionalities and security from multiple dimensions. Secondly, we categorise the existing SQL query schemes into three categories: software-based construction, hardware-based construction, and hybrid-based construction, that is, a combination of software and hardware components. On this basis, we analyse the framework, advantages, and limitations of classic and state-of-the-art schemes. Finally, we summarise the software-based and hardware-based approaches from dimensions of SQL functionality, security, and efficiency, thus clarifying their ideal application scenarios. Notably, SQL query schemes that exhibit minimal equality of pair leakage and support strong obliviousness can achieve higher levels of security. In addition, hardware-based solutions can achieve more complex SQL queries and superior performance without designing complex and functionally-limited cryptographic tools.

1. Introduction

The rise of cloud computing has revolutionised the way organisations store and manage data. The scalability and cost efficiency offered by cloud services have led to the widespread adoption of outsourced databases. Despite the compelling benefits of cloud computing (Armbrust et al., Citation2010), unauthorised access to outsourced databases by malicious database administrators or external attackers poses security problems (Chu et al., Citation2013; Latif et al., Citation2014; Mell & Grance, Citation2009).

In response to these problems, encrypted databases have emerged as a promising solution, which ensures data confidentiality while retaining the convenience and efficiency of cloud services. Once the databases are encrypted and outsourced to the cloud servers, it becomes difficult to perform meaningful operations over the encrypted data. Therefore, efficient SQL queries (Chamberlin & Boyce, Citation1974) over encrypted databases have been studied by many researchers (Carbunar & Sion, Citation2011; L. Chen et al., Citation2023; Kerschbaum et al., Citation2013; J. Li, Liu et al., Citation2015; Mironov et al., Citation2017; Pang & Ding, Citation2014; Zhu et al., Citation2021). CryptDB (Popa et al., Citation2011) utilised Property-Preserving Encryption (PPE) to perform various SQL query operations. EnclaveDB (Priebe et al., Citation2018) hosted sensitive data along with natively compiled queries in an enclave, enabling efficient and secure SQL queries over encrypted databases. Huawei's GaussDB (Zhu et al., Citation2021) integrates software-mode encryption algorithms and hardware-mode confidentiality calculations to realise rich SQL query functions. Since then, many further studies have emerged (Eskandarian & Zaharia, Citation2017; Hahn et al., Citation2019; Jutla & Patranabis, Citation2022; Kamara & Moataz, Citation2018; M. Li et al., Citation2023; Lv et al., Citation2023; Shafieinejad et al., Citation2022; Sun et al., Citation2021). These studies rely on different techniques, such as cryptographic primitives and trusted hardware.

Despite these advances, current encrypted database solutions still face significant challenges, especially in architecture and security. Loosely coupled database architecture and complex underlying settings lead to various security risks and performance bottlenecks. In addition, existing work lacks a comprehensive evaluation of SQL query schemes from the dimensions of architecture, SQL functionality, and security level. As a result, there is no clear understanding of the trade-offs between functionality and security, or of the feasibility of deploying such solutions in the real world, which hinders further development of encrypted databases.

To address the current gaps in encrypted database research, we need a clear classification of existing SQL query schemes. Additionally, we need to establish a comprehensive evaluation model to fully assess the SQL query schemes from the aspects of SQL functionality, performance, and security level. It is also important to develop a security level evaluation system that can provide guidance and benchmarks for comparing and selecting SQL query schemes.

1.1. Our contributions

In this paper, we summarise the state-of-the-art SQL query solutions on encrypted databases and evaluate them from multiple aspects. Our contributions are listed as follows.

We propose a novel evaluation model that comprehensively analyses the functionalities supported by different SQL query schemes and their security regarding confidentiality, privacy, and obliviousness. We also carefully evaluate the security of different schemes by integrating factors such as leakage profile and obliviousness into a unified security assessment scale, ranging from “low” to “higher”.
We comparatively analyse software and hardware-based SQL query schemes from dimensions of functionality, security, and efficiency. We also delve into software algorithm-based schemes. We summarise the granularity of protection and application scenarios of schemes using different encryption algorithms depending on the data sensitivity and the requirements of the SQL query function.

1.2. Organisation

This paper is organised as follows. The system and threat models of a generic secure SQL query scheme are presented in Section 2. We also propose an evaluation model capable of analysing the SQL functionalities of different schemes and the security level they provide. In Section 3, we discuss the secure query schemes based on software algorithms. In Section 4, we discuss the classic trusted hardware-based SQL query schemes. In Section 5, we discuss the hybrid schemes combining software and hardware. In Section 6, we first compare some recent and classic secure SQL query schemes and then comparatively analyse the software and hardware-based approaches from dimensions of functionalities, security, and efficiency. We also compare the functionality and security of software-based schemes which use different cryptographic algorithms. Finally, the conclusion and prospects are presented in Section 7.

2. Models and definitions

In this section, we summarise the system and threat models for secure SQL query schemes. These models provide formal definitions for generic secure SQL queries and describe the capabilities of potential adversaries. Following this, we introduce a novel evaluation model capable of analysing both the functionality and security of various SQL query schemes.

2.1. System model

Similar to the model in Poddar et al. (Citation2019), our model is illustrated in Figure 1. Specifically, such a model involves two entities: the cloud server and the data owner.

Data Owner: The data owner, an entity with limited resources, opts to outsource their databases to the cloud server. To address security concerns, the database must be locally encrypted before outsourcing. Subsequently, when conducting a SQL query, the parameters within the query are encrypted and sent to the server. Finally, the data owner employs their private key to decrypt the query results returned by the server, ultimately obtaining the final results.
Cloud Server: The cloud server is an entity with seemingly inexhaustible resources, responsible for performing data retrieval within the encrypted database and ultimately returning the matching records to the data owner.

Figure 1. System model.

Definition 2.1

Generic Secure SQL Query Scheme

Let DB represent a relational database, and EDB represent its encrypted counterpart. The formal definition for generic secure SQL query schemes is given as follows:

$K \leftarrow KeyGen (1^{k})$ . The key generation algorithm inputs a security parameter $1^{k}$ and generates a random key K which is managed by the user.
$EDB \leftarrow EDBSetup (K, DB)$ . The data owner uses key K to encrypt $DB$ and generates an encrypted database $EDB$ which is then outsourced to the cloud server.
$ER \leftarrow QueryExec (EDB, q)$ . The cloud server performs SQL query q issued by the user and returns the encrypted results ER to the user.
$R \leftarrow Dec (K, ER)$ . The user performs the necessary decryption to obtain the plaintext results R.
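
The four algorithms of Definition 2.1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`key_gen`, `edb_setup`, `query_exec`, `dec`, `token`), the toy SHA-256 keystream cipher, and the use of deterministic HMAC tokens for equality predicates are not taken from any scheme surveyed here, and the construction is not secure for real use. Note that the deterministic tokens already reveal which rows share a value.

```python
import hashlib
import hmac
import json
import secrets


def key_gen(k: int = 32) -> bytes:
    """KeyGen(1^k): sample a random k-byte key, kept by the data owner."""
    return secrets.token_bytes(k)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _enc(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def _dec(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def token(key: bytes, column: str, value) -> bytes:
    """Deterministic token for the equality predicate column = value.
    (The toy reuses one key for tokens and encryption; a real design would not.)"""
    return hmac.new(key, f"{column}={value}".encode(), hashlib.sha256).digest()


def edb_setup(key: bytes, db: list) -> list:
    """EDBSetup(K, DB): encrypt each row and attach its equality tokens."""
    edb = []
    for row in db:
        tags = {token(key, col, val) for col, val in row.items()}
        edb.append((tags, _enc(key, json.dumps(row).encode())))
    return edb


def query_exec(edb: list, q: bytes) -> list:
    """QueryExec(EDB, q): the server matches the token; it never sees plaintext."""
    return [ct for tags, ct in edb if q in tags]


def dec(key: bytes, er: list) -> list:
    """Dec(K, ER): the data owner decrypts the returned rows."""
    return [json.loads(_dec(key, ct)) for ct in er]


# Hypothetical example data.
db = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}, {"id": 3, "city": "Oslo"}]
K = key_gen()
EDB = edb_setup(K, db)
ER = query_exec(EDB, token(K, "city", "Oslo"))
R = dec(K, ER)
```

Here `R` holds the two rows whose `city` is `Oslo`, while the server only ever handled `EDB` and the opaque token.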

2.2. Threat model

The threat model for SQL queries over encrypted databases can generally be categorised into three types based on the server's behaviour: honest, semi-honest (Goldreich et al., Citation2019), and malicious models, each reflecting varying degrees of malicious intent.

Honest Model: In the honest model, the server not only faithfully executes the protocol, but also does not pry into the data owner's privacy.
Semi-honest Model: In the semi-honest model, the server follows the protocol honestly but remains curious about the owner's data.
Malicious Model: In the malicious model, the server not only fails to run the protocol honestly but may also deliberately corrupt the data, engaging in activities such as malicious listening and tampering.

This paper primarily focuses on the semi-honest and malicious models, where the security risks are mainly from malicious listening and tampering. To prevent the cloud server from listening or tampering with sensitive data, encryption and verification techniques have been extensively researched (Benabbas et al., Citation2011; S. Chen et al., Citation2023; X. Chen et al., Citation2021; X. Chen, Li, Weng et al., Citation2015; Eltayesh, Citation2017; Jutla & Patranabis, Citation2022; J. Li et al., Citation2022; Lv et al., Citation2023; Reddy & Kumari, Citation2022; Z. Zhang et al., Citation2019).

Encryption Techniques: Encryption provides a fundamental layer of confidentiality, rendering data opaque to unauthorised parties (Jutla & Patranabis, Citation2022; J. Li, Liu et al., Citation2015; M. Li et al., Citation2023; Lv et al., Citation2023; Popa et al., Citation2011). Whether a server is semi-honest or malicious, without the correct encryption keys, sensitive data remains secure.
Verification: Verification extends the cryptographic defense by enabling data owners to validate the integrity and authenticity of returned query results (Benabbas et al., Citation2011; S. Chen et al., Citation2023; X. Chen et al., Citation2021; Eltayesh, Citation2017). This is particularly crucial in the malicious model where a server could tamper with results.

Although encryption and verification ensure that the service provider cannot access or tamper with the data, access control mechanisms are essential to prevent external attackers from accessing the encrypted database (S. Chen et al., Citation2023; X. Chen, Li, Huang et al., Citation2015; Hur & Noh, Citation2010; J. Li, Chen et al., Citation2015; Wei et al., Citation2021). They serve as gatekeepers that govern data accessibility within an encrypted database. These mechanisms, which encompass user authentication, identity-based access regulation, and additional advanced control strategies, ensure that only verified users can access and execute SQL queries.

2.3. Evaluation model

We assume that $T_{A} = (A_{1}, A_{2}, \dots, A_{m})$ and $T_{B} = (B_{1}, B_{2}, \dots, B_{n})$ are two different tables, where $A_{1}, A_{2}, \dots, A_{m}$ and $B_{1}, B_{2}, \dots, B_{n}$ are different attributes within the tables. Specifically, $T_{A}$ is a set of m-tuples (or rows) $(a_{1}, a_{2}, \dots, a_{m})$ , where $a_{i}$ is an element in the domain of attribute $A_{i}$ . Similarly, $T_{B}$ is a set of n-tuples $(b_{1}, b_{2}, \dots, b_{n})$ , where $b_{i}$ is an element in the domain of attribute $B_{i}$ . In this paper, we propose an evaluation model to analyse different schemes from two distinct aspects: the type of SQL functionality and the level of security. However, it's important to note that our model does not explicitly take into account aspects of data integrity and freshness.

2.3.1. SQL functionality

Definition 2.2

Equality Query

For a table $T_{A}$, an equality query on $T_{A}$ is defined as: $\sigma_{A_{i} = v}(T_{A}) = \{t \mid t \in T_{A} \text{ and } t(A_{i}) = v\}$, where $t(A_{i})$ is the value of tuple t on attribute $A_{i}$, and v is a specific value used for filtering the rows.

Definition 2.3

Join

For table $T_{A}$ and table $T_{B}$, where $A_{i}$ and $B_{j}$ are join attributes, the join operation is defined as: $T_{A} \bowtie T_{B} = \{\widehat{t_{a} t_{b}} \mid t_{a} \in T_{A} \text{ and } t_{b} \in T_{B} \text{ and } t_{a}(A_{i}) = t_{b}(B_{j})\}$, where $\widehat{t_{a} t_{b}}$ is a combined tuple $(a_{1}, a_{2}, \dots, a_{m}, b_{1}, b_{2}, \dots, b_{n})$, and $t_{a}(A_{i})$ is the value of tuple $t_{a}$ on attribute $A_{i}$. Similarly, $t_{b}(B_{j})$ is the value of tuple $t_{b}$ on attribute $B_{j}$.
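
As a plaintext reference point for Definitions 2.2 and 2.3, the following sketch evaluates an equality query and an equi-join over tables modelled as lists of Python dictionaries. The helper names and the sample tables are hypothetical, and the join assumes the two tables use distinct attribute names so that merging tuples does not overwrite values.

```python
def select_eq(table, attr, v):
    """Equality query: keep the tuples t with t[attr] == v."""
    return [t for t in table if t[attr] == v]


def equi_join(ta, tb, ai, bj):
    """Join: concatenate every pair (t_a, t_b) with t_a[ai] == t_b[bj].
    Nested loops mirror the 'subset of the Cartesian product' reading."""
    return [{**a, **b} for a in ta for b in tb if a[ai] == b[bj]]


# Hypothetical example tables.
customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bo"}]
orders = [
    {"oid": 10, "customer": 2},
    {"oid": 11, "customer": 1},
    {"oid": 12, "customer": 2},
]

ann = select_eq(customers, "name", "Ann")
joined = equi_join(customers, orders, "cid", "customer")
```

An encrypted scheme has to return the same result sets while the server sees only ciphertexts, which is what makes the leakage notions later in this section relevant.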

Definition 2.4

Group By

For a table $T_{A}$, the group by operation on attribute $A_{i}$ can be defined as: $\gamma_{A_{i}}(T_{A}) = \{(A_{i}, G) \mid \forall t \in T_{A},\ t \in G \Leftrightarrow t(A_{i}) = v\}$, where G is a subset of tuples in $T_{A}$ that share the same value v of attribute $A_{i}$, and $\gamma_{A_{i}}$ represents a set of such groups for the entire table $T_{A}$.

Definition 2.5

Order By

For a table $T_{A}$ with an attribute $A_{i}$, the order by operation can be defined as: $\tau_{A_{i}}(T_{A}) = \{(t_{1}, t_{2}, \dots, t_{m}) \mid \forall k \in [1, m - 1],\ t_{k}(A_{i}) \leq t_{k+1}(A_{i})\}$, where $t_{k}(A_{i})$ denotes the value of the kth tuple in the ordered list on attribute $A_{i}$, and m is the total number of tuples in $T_{A}$. The formula shows ascending order; in general, the tuples are arranged in ascending or descending order as specified in the query.
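
Definitions 2.4 and 2.5 can be read operationally as follows. This is a plaintext sketch with hypothetical helper names and sample data; it keys each group by the shared value v, which is one natural reading of the definition.

```python
from collections import defaultdict


def group_by(table, attr):
    """Group by: map each distinct value v of attr to the tuples sharing it."""
    groups = defaultdict(list)
    for t in table:
        groups[t[attr]].append(t)
    return dict(groups)


def order_by(table, attr, descending=False):
    """Order by: return the tuples sorted on attr (ascending by default)."""
    return sorted(table, key=lambda t: t[attr], reverse=descending)


# Hypothetical example table.
staff = [
    {"name": "Ann", "dept": "ops", "salary": 50},
    {"name": "Bo", "dept": "dev", "salary": 70},
    {"name": "Cy", "dept": "ops", "salary": 40},
]

groups = group_by(staff, "dept")
ascending = order_by(staff, "salary")
```

Grouping only needs equality tests, whereas ordering needs comparisons; this difference is why the two operations expose different patterns when executed over ciphertexts.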

Definition 2.6

Range Query

For a table $T_{A}$ and an interval $(a, b)$, the range query on $T_{A}$ can be defined as: $\sigma_{a < A_{i} < b}(T_{A}) = \{t \mid t \in T_{A} \text{ and } a < t(A_{i}) < b\}$, where $t(A_{i})$ is the value of tuple t on attribute $A_{i}$ and $(a, b)$ is an interval of values taken on attribute $A_{i}$. Range queries expand the querying capabilities by enabling the selection of data that falls within a specific numerical or chronological range.

Definition 2.7

Aggregation

For a table $T_{A}$, an aggregation operation with respect to an aggregate function f over attribute $A_{i}$ can be defined as: $\alpha_{f(A_{i})}(T_{A}) = f(t(A_{i}) \mid t \in T_{A})$, where f could be SUM, COUNT, AVG, MIN, or MAX, and it computes a single value from the set of $A_{i}$ values across all tuples t in $T_{A}$.

Definition 2.8

String Pattern Match

For a table $T_{A}$ and a pattern string P, the string pattern match query can be defined as: $\sigma_{A_{i} \text{ LIKE } P}(T_{A}) = \{t \mid t \in T_{A} \text{ and } P \text{ matches } t(A_{i})\}$, where $t(A_{i})$ is the string value of tuple t on attribute $A_{i}$, and the pattern match is based on SQL's LIKE operator, which may include wildcard characters.
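
The remaining three operators (Definitions 2.6 to 2.8) can be sketched in the same plaintext style. The helper names and data are hypothetical. The LIKE translation handles only the two standard wildcards (`%` for any sequence of characters, `_` for exactly one character), matches case-sensitively, and ignores ESCAPE clauses; real DBMSs differ on these details.

```python
import re


def select_range(table, attr, a, b):
    """Range query over the open interval (a, b)."""
    return [t for t in table if a < t[attr] < b]


AGGREGATES = {
    "SUM": sum,
    "COUNT": len,
    "MIN": min,
    "MAX": max,
    "AVG": lambda xs: sum(xs) / len(xs),
}


def aggregate(table, attr, f):
    """Aggregation: apply f to the values of attr across all tuples."""
    return AGGREGATES[f]([t[attr] for t in table])


def like_to_regex(pattern):
    """Translate a LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_like(table, attr, pattern):
    """String pattern match: keep tuples whose attr fully matches the pattern."""
    rx = like_to_regex(pattern)
    return [t for t in table if rx.fullmatch(t[attr])]


# Hypothetical example table.
people = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
    {"name": "alina", "age": 41},
]

in_range = select_range(people, "age", 25, 41)
total_age = aggregate(people, "age", "SUM")
al_names = select_like(people, "name", "al%")
```

Because the interval is open, the tuples with ages 25 and 41 fall outside `in_range`.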

2.3.2. Security

We first define three aspects of security: data confidentiality, query privacy, and data and program obliviousness. Additionally, we describe the corresponding risks under each security definition, including data leakages and attacks, and outline the security goals to address these issues.

(1) Confidentiality.

Data confidentiality protects data from unauthorised access and disclosure, including means for protecting personal privacy and proprietary information. We refer to the adaptive security definition given by Curtmola et al. (Citation2006) and use the leakage profile to define the security.

Definition 2.9

Confidentiality

Let $\Pi$ = (KeyGen, EDBSetup, QueryExec, Dec) be a secure SQL query scheme and $L$ be a leakage function. For an efficient adversary $A$ and a simulator $S$, we define the experiments $\mathrm{Real}_{A}^{\Pi}$ and $\mathrm{Ideal}_{A, S}^{\Pi}$ as follows.

$\mathrm{Real}_{A}^{\Pi}$: $A$ chooses $DB$, and $\lambda$ denotes the security parameter. The game runs $EDBSetup$ and gives $EDB$ to $A$. Then $A$ chooses a series of queries $q_{1}, \dots, q_{m}$. For all $i \in [m]$, the game runs $QueryExec(EDB, q_{i})$ and gives the output to $A$. Eventually, $A$ outputs a bit b, which is also the output of $\mathrm{Real}_{A}^{\Pi}$.
$\mathrm{Ideal}_{A, S}^{\Pi}$: $A$ chooses $DB$. The simulator $S$ initialises a query list q. The game runs $EDB \leftarrow S(L(DB))$ and gives $EDB$ to $A$. Then $A$ chooses a series of queries $q_{1}, \dots, q_{m}$. For all $i \in [m]$, the game runs $S(L(DB, q_{i}))$ and gives the output to $A$. Eventually, $A$ outputs a bit b.

We say that $\Pi$ is $L$-semantically secure if for all probabilistic polynomial-time adversaries $A$ there exists an efficient simulator $S$ and a negligible function $neg$ such that $| \Pr[\mathrm{Real}_{A}^{\Pi}(\lambda) = 1] - \Pr[\mathrm{Ideal}_{A, S}^{\Pi}(\lambda) = 1] | \leq neg(\lambda)$. Under this security model, an adversary cannot obtain any information about user data except through the leakage profiles generated by the leakage function $L$.

(2) Privacy.

Privacy is the most critical concern when performing SQL queries over encrypted databases. To ensure the protection of sensitive information, it is crucial to prevent unauthorised parties from gaining in-depth insights into various aspects of the data. Therefore, we define the size pattern, equality pattern, join pattern, and order pattern to quantify privacy protection.

Size Pattern (SP). The size pattern typically pertains to the size of the tables retrieved by the query. In most cases, size pattern leakage is not considered a significant risk, since it is challenging for an attacker to extract sensitive information based solely on the size of the table.
Equality Pattern (EP). The equality pattern pertains to the usage of the equality operator $(=)$ in selection queries, particularly in the WHERE clause, to filter out rows that meet certain conditions. This pattern is often used to match a specific attribute or column value.
Join Pattern (JP). The join pattern refers to the equality condition used for tuple matching when performing primary-key/foreign-key join operations on two tables. Specifically, during a join operation, the Database Management System (DBMS) needs to select a subset of the Cartesian product using the equality condition. The join pattern reveals whether a pair of tuples have equal values on the join attribute. Although join patterns do not reveal the underlying plaintext data, the frequency information they reveal can be exploited by attackers to extract valuable data.
Order Pattern (OP). The order pattern refers to the relative order of the encrypted data. Many schemes employ order-preserving encryption (OPE) (Agrawal et al., Citation2004; Boldyreva et al., Citation2011) to enable SQL functions that involve comparison operations, such as range queries, group by and aggregation. OPE ensures that $E (m_{i}) < E (m_{j})$ for $m_{i} < m_{j}$ where E is the encryption algorithm. While the order facilitates a wide range of SQL functions, it is also valid information that an attacker can exploit.

Definition 2.10

Size Pattern Leakage

For a SQL query on table $T_{A} = (A_{1}, A_{2}, \dots, A_{m})$, the size pattern leakage (SPL) is defined as: $SPL = m \cdot |t| \text{ or } m \cdot |t_{\sigma}|$, where m is the number of attributes in $T_{A}$, and $|t|$ and $|t_{\sigma}|$ denote the number of all tuples in the table and the number of tuples satisfying the query, respectively.

It is difficult for attackers to launch an effective attack using only SPL. In cases where the database size and query result size are sensitive data, padding is utilised to hide the size pattern.

Definition 2.11

Equality Pattern Leakage

For a table $T_{A}$ and an attribute $A_{i}$, the equality pattern leakage (EPL) can be defined as: $EPL = \{t_{1}(A_{i}) = t_{2}(A_{i})\} \text{ if } \epsilon(t_{1}(A_{i})) = \epsilon(t_{2}(A_{i}))$, where $t_{1}(A_{i})$ and $t_{2}(A_{i})$ represent the plaintext values of attribute $A_{i}$ for records $t_{1}$ and $t_{2}$, and $\epsilon(t_{1}(A_{i}))$ and $\epsilon(t_{2}(A_{i}))$ represent the corresponding encrypted values.
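
To make equality pattern leakage concrete, the sketch below encrypts a column with a deterministic keyed function and shows what an observer of the ciphertexts alone can compute. HMAC-SHA256 is used here only as a stand-in for a deterministic encryption scheme (it is not decryptable), and the key and the column values are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def det_tag(key: bytes, value: str) -> str:
    """Deterministic stand-in for encryption: equal inputs give equal outputs."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def equality_pattern(ciphertexts):
    """What the server learns: which row indices carry identical ciphertexts."""
    groups = defaultdict(list)
    for i, c in enumerate(ciphertexts):
        groups[c].append(i)
    return [rows for rows in groups.values() if len(rows) > 1]


key = b"demo-key-held-by-the-data-owner"  # hypothetical key
column = ["flu", "cold", "flu", "asthma", "flu"]  # hypothetical plaintexts
encrypted = [det_tag(key, v) for v in column]

leak = equality_pattern(encrypted)
```

Without the key, the observer still learns that rows 0, 2 and 4 hold the same value, which is exactly the statement $t_{1}(A_{i}) = t_{2}(A_{i})$ in the definition.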

Definition 2.12

Join Pattern Leakage

For a primary-key/foreign-key join operation $T_{A} \bowtie T_{B}$ on tables $T_{A}$ and $T_{B}$, the join pattern leakage (JPL) can be simply defined as: $\begin{aligned} JPL[t_{a}, t_{b}] & = 1 \text{ if } t_{a}(A_{i}) = t_{b}(B_{j}) \\ JPL[t_{a}, t_{b}] & = 0 \text{ if } t_{a}(A_{i}) \neq t_{b}(B_{j}) \end{aligned}$ where $t_{a}(A_{i})$ is the value of tuple $t_{a}$ on attribute $A_{i}$ and $t_{b}(B_{j})$ is the value of tuple $t_{b}$ on attribute $B_{j}$. $A_{i}$ and $B_{j}$ are join attributes with the same domain. JPL = 1 indicates that the two tuples have equal values on the join attribute, whereas JPL = 0 indicates that the two tuples have different values on the join attribute.

While attackers cannot directly obtain sensitive data through JPL, it is important to note that the frequency of items can still be deduced using JPL, providing a powerful weapon for launching inference attacks. For example, consider two tables, Customers and Orders, that share a common field CustomerID. The Customers table contains unique CustomerIDs, while the Orders table may contain multiple entries of the same CustomerID, representing multiple orders placed by the same customer. Performing the join operation on these two tables using the equality condition (CustomerID in Customers = CustomerID in Orders) reveals the equality of pairs that can be utilised by attackers to infer the frequency of items. If a specific encrypted CustomerID appears multiple times in the result of the join query, it indicates that the corresponding customer has placed multiple orders. This leakage enables the attacker to deduce the frequency of orders placed by each customer, despite the inability to discern the exact customer identities or order details.
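
The Customers and Orders example can be reproduced in a few lines. The sketch assumes, purely for illustration, that both CustomerID columns are encrypted deterministically under the same key (modelled with HMAC-SHA256), so that the server can evaluate the join condition on ciphertexts; the identifiers and the key are hypothetical.

```python
import hashlib
import hmac


def det(key: bytes, value: str) -> bytes:
    """Deterministic stand-in for encrypting a join attribute."""
    return hmac.new(key, value.encode(), hashlib.sha256).digest()


key = b"hypothetical-join-key"
customers = ["c1", "c2", "c3"]      # unique CustomerIDs
orders = ["c2", "c1", "c2", "c2"]   # CustomerID of each order

enc_customers = [det(key, c) for c in customers]
enc_orders = [det(key, o) for o in orders]

# JPL[t_a, t_b] = 1 iff the two tuples agree on the join attribute.
jpl = [[1 if ec == eo else 0 for eo in enc_orders] for ec in enc_customers]

# Summing each row gives the number of orders per (still encrypted) customer.
orders_per_customer = [sum(row) for row in jpl]
```

The row sums are the frequency information that the inference attacks discussed next take as input.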

We present two common types of inference attacks (Blackstone et al., Citation2019; Naveed et al., Citation2015) that utilise the frequency information revealed by JPL. In these attacks, attackers can effectively deduce private data by combining publicly available auxiliary datasets with frequency information. This approach enables them to launch sophisticated and effective inference attacks as follows.

Frequency Analysis Attack (Lacharité & Paterson, Citation2015). In frequency analysis attacks, attackers exploit pattern information leaked during database queries to obtain frequency information which is then used to infer or recover sensitive data within the database. Specifically, attackers can utilise frequency information along with publicly available auxiliary information to correlate the frequencies of the two datasets. Through this correlation, they can deduce private data by performing one-to-one sorting.
$l_{p}$ -Optimisation Attack (Lacharité & Paterson, Citation2015). In $l_{p}$ -Optimisation attacks, attackers similarly correlate the frequency information obtained from the pattern information with the frequencies of the public auxiliary dataset to deduce private data. Specifically, attackers minimise a designated cost function, typically $l_{p}$ distance, between ciphertext and plaintext histograms.
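
A rank-matching version of the frequency analysis attack fits in a few lines. The ciphertext labels and the auxiliary distribution below are made up for illustration, and the sketch assumes deterministic ciphertexts whose frequencies are all distinct; with ties or noisy auxiliary data the guesses can be wrong.

```python
from collections import Counter


def frequency_attack(ciphertexts, auxiliary):
    """Match the i-th most frequent ciphertext to the i-th most frequent
    value of the auxiliary (public) dataset."""
    ct_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    pt_ranked = [p for p, _ in Counter(auxiliary).most_common()]
    return dict(zip(ct_ranked, pt_ranked))


# Hypothetical encrypted column (opaque labels stand in for ciphertexts).
encrypted_column = ["x9", "k2", "x9", "m7", "x9", "k2"]
# Hypothetical public statistics for the same attribute.
auxiliary = ["O"] * 45 + ["A"] * 40 + ["AB"] * 15

guess = frequency_attack(encrypted_column, auxiliary)
```

The $l_{p}$-optimisation attack replaces this rank matching with the assignment that minimises a cost between the two histograms.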

Definition 2.13

Order Pattern Leakage

For a table $T_{A}$ and an attribute $A_{i}$, the order pattern leakage (OPL) can be defined as: $OPL = \{t_{1}(A_{i}) > t_{2}(A_{i})\} \text{ if } \epsilon(t_{1}(A_{i})) > \epsilon(t_{2}(A_{i}))$, where $t_{1}(A_{i})$ and $t_{2}(A_{i})$ represent the plaintext values of attribute $A_{i}$ for records $t_{1}$ and $t_{2}$, and $\epsilon(t_{1}(A_{i}))$ and $\epsilon(t_{2}(A_{i}))$ represent the corresponding encrypted values.

OPL opens up another attack surface. We show two common attacks that exploit the order information.

Sorting Attack (Naveed et al., Citation2015). In sorting attacks, attackers use the relative order of the encrypted data to deduce sensitive data. Specifically, with enough different attribute values in the encrypted attribute columns, attackers can sort the encrypted data to correspond one-to-one with the message space to recover the private data.
Cumulative Attack (Naveed et al., Citation2015). In cumulative attacks, attackers leverage the auxiliary datasets and the relative order of encrypted data to recover the sensitive data. Specifically, if a ciphertext in the encrypted column exceeds $90\%$ of the other ciphertexts, attackers can recover the plaintext by mapping this ciphertext to the data in the auxiliary dataset that exceeds $90\%$ of the remaining data.
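
The sorting attack is simple enough to sketch directly. The order-preserving mapping below is a toy lookup table built from sorted random integers, not a real OPE scheme, and the column is hypothetical; the attack assumes the column is dense, meaning every element of the message space appears at least once.

```python
import random


def toy_ope_table(domain, seed):
    """Toy order-preserving map: the i-th smallest plaintext is sent to the
    i-th smallest of some distinct random integers."""
    rng = random.Random(seed)
    codes = sorted(rng.sample(range(10**6), len(domain)))
    return dict(zip(sorted(domain), codes))


def sorting_attack(ciphertexts, message_space):
    """Recover a dense column by aligning sorted distinct ciphertexts with
    the sorted message space."""
    distinct = sorted(set(ciphertexts))
    if len(distinct) != len(message_space):
        raise ValueError("column is not dense; the one-to-one alignment fails")
    mapping = dict(zip(distinct, sorted(message_space)))
    return [mapping[c] for c in ciphertexts]


message_space = list(range(1, 13))  # e.g. months of the year
ope = toy_ope_table(message_space, seed=7)
plaintext_column = [3, 12, 1, 7, 7, 2, 5, 4, 6, 8, 9, 10, 11, 3]
encrypted_column = [ope[m] for m in plaintext_column]

recovered = sorting_attack(encrypted_column, message_space)
```

The attacker never touches the key: knowledge of the message space and the ciphertext order is sufficient to recover the whole column.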

If a scheme reveals equality, join, or order patterns when executing SQL queries over encrypted databases, then even if it supports rich SQL functionalities, this advantage comes at the expense of leaking a substantial amount of valid information. Such schemes have been shown to be vulnerable to the inference attacks described above.

Overall, the primary objective of designing a secure SQL query scheme is to prevent the leakage of equality, join, and order patterns. For example, when executing SQL queries involving equality such as selection, join, and group by, it is crucial to restrict the amount of information an attacker can learn about equality of pairs. This can be achieved by limiting the leakage of equality conditions and frequency information to only those tuples that match the SQL query, rather than exposing all tuples in the table. Similarly, for the order pattern, the schemes should aim to prevent attackers from deducing any information about the plaintext data, except for the order of the ciphertext, particularly when supporting SQL functions that involve comparison operations like range queries. The m-ORE proposed by Lv et al. (Citation2023) enables comparison over ciphertexts while hiding the most significant differing bits. Regarding the size pattern, most schemes do not prioritise hiding it, as it is challenging for attackers to effectively infer sensitive information solely based on metadata such as table sizes in the database. However, if the metadata, such as the database size, is deemed sensitive and needs to be concealed, the size pattern can be hidden using padding techniques. This involves adding extra characters to a field or column, ensuring that all encrypted values have a uniform length, regardless of the original plaintext length.
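
Field-level padding as described above can be sketched as follows. The two-byte length prefix and the zero-byte filler are choices made for this illustration, and the padded value would be encrypted afterwards with a length-preserving cipher so that all ciphertexts in the column end up the same size.

```python
def pad_field(value: bytes, width: int) -> bytes:
    """Pad value to a fixed width, prefixing its true length (2 bytes)."""
    if len(value) > width or width > 0xFFFF:
        raise ValueError("value does not fit the chosen field width")
    return len(value).to_bytes(2, "big") + value + b"\x00" * (width - len(value))


def unpad_field(padded: bytes) -> bytes:
    """Recover the original value using the stored length prefix."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]


names = [b"Li", b"Alexandria", b"Bo"]  # hypothetical column values
padded = [pad_field(n, width=16) for n in names]
```

Every entry of `padded` is 18 bytes long, so the length of an encrypted field no longer depends on the plaintext. The cost is extra storage, which grows with the gap between the chosen width and the typical value length.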

(3) Obliviousness.

Obliviousness is primarily aimed at enhancing the security and privacy of access patterns. Without such measures, malicious adversaries could exploit these patterns to launch leakage abuse attacks (Blackstone et al., Citation2019; Grubbs et al., Citation2017; Islam et al., Citation2012) on query schemes.

Access Pattern. The access pattern refers to the traces left by users during the process of querying records in a table. This includes the locations of accessed data blocks, the order of data block requests, the frequency of access to the same data block, and specific methods of reading or writing data.

Leakage of the access pattern reveals the precise location of the underlying database elements whenever a user issues a query on a data block. This allows attackers to infer the importance of the stored data from the user's access pattern, e.g. by counting the frequency of accesses to each data block. At the same time, attackers can also match two consecutive access patterns to infer the correlation between data queries and even the content of the encrypted data. Access patterns therefore give attackers powerful information for launching effective attacks.

Access Pattern Attack (Islam et al., Citation2012). In access pattern attacks, attackers can gain insights into encrypted data by observing the patterns of access to that data. For example, if a particular piece of data is accessed frequently, attackers might infer that it's highly relevant or sensitive. They could potentially guess the contents of encrypted records or infer which records contain certain keywords based on the frequency and access pattern.
File Injection Attack (Y. Zhang et al., Citation2016). In file injection attacks, attackers trick a user into encrypting and uploading specially crafted files to a searchable encrypted database or cloud storage. These files are designed by attackers and contain specific data patterns. Once these files are part of the user's encrypted database, attackers monitor the access patterns or query results when the user performs queries. Based on which injected files are accessed during these searches, attackers can infer the nature of the user's queries and potentially gain insights into the user's private data.
Side-channel Attack (Van Schaik et al., Citation2021). In side-channel attacks, attackers obtain information from the physical implementation of the cryptosystem, diverging from traditional attacks that rely on brute force or theoretical weaknesses in the algorithm. Side-channel attacks are essentially indirect attacks, exploiting information that is unintentionally exposed rather than directly attacking the system. Access pattern leakage is one such unintentional exposure. For example, attackers observing repeated access to a particular data location could infer that this location contains critical data, thus making it a target for future attacks. The granularity of access patterns can significantly impact the potential for side-channel attacks. For instance, if the access patterns reveal detailed information about individual data items, they could provide a rich source of information for attackers. On the other hand, if the access patterns are coarse-grained, revealing only high-level information, the potential for side-channel attacks may be reduced. The relationship between side-channel attacks and access pattern leakage also has a temporal aspect. Attackers may need to observe the system over a period of time to discern meaningful patterns. This makes systems with frequent data access potentially more vulnerable to these types of attacks.

Definition 2.14

Obliviousness

Let $l = ((op_{1}, addr_{1}, data_{1}), (op_{2}, addr_{2}, data_{2}), \dots, (op_{n}, addr_{n}, data_{n}))$ be a sequence of operations of length n that accesses the data in the table. Each operation $op_{i}$ represents a read operation $read(addr_{i})$ or a write operation $write(addr_{i}, data_{i})$. For two sequences of operations $l_{1}$ and $l_{2}$ of the same length, we say that the access pattern is successfully hidden if, for all probabilistic polynomial-time algorithms, there exists a negligible function neg such that $| \Pr[A(l_{1}) = 1] - \Pr[A(l_{2}) = 1] | \leq neg(\lambda)$, where $A(l_{i})$ denotes the access sequence generated for a given sequence of operations $l_{i}$.
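
The intuition behind Definition 2.14 can be shown by comparing the address traces of two ways of reading a block. The linear scan below is the simplest way to hide which address is wanted; it is an illustrative baseline chosen here, not a technique attributed to any surveyed scheme, and it models only the sequence of addresses touched (not timing or other side channels).

```python
def direct_read(store, addr, trace):
    """Non-oblivious read: touches only the requested address."""
    trace.append(("read", addr))
    return store[addr]


def linear_scan_read(store, addr, trace):
    """Oblivious-style read: touches every address in a fixed order, so the
    trace is the same whichever block is actually wanted."""
    result = None
    for i, block in enumerate(store):
        trace.append(("read", i))
        if i == addr:
            result = block
    return result


store = ["blk0", "blk1", "blk2", "blk3"]  # hypothetical encrypted blocks

direct_a, direct_b = [], []
direct_read(store, 1, direct_a)
direct_read(store, 3, direct_b)

scan_a, scan_b = [], []
value_a = linear_scan_read(store, 1, scan_a)
value_b = linear_scan_read(store, 3, scan_b)
```

The direct traces differ and so reveal the target, while the two scan traces are identical. The price is that every access costs time linear in the size of the store.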

The secure SQL schemes should be designed to provide obliviousness while preserving privacy, ensuring that an attacker cannot learn sensitive information from access pattern differences, thus achieving a higher level of security.

3. Software-based secure SQL query schemes

In this section, we discuss the secure SQL query schemes based on software algorithms that utilise cryptographic primitives to provide data security and privacy. According to the utilisation of different cryptographic primitives, we further categorise the secure SQL schemes into three types: those based on property-preserving encryption (PPE), searchable symmetric encryption (SSE), and structured encryption (STE). We discuss the representative schemes in Sections 3.1, 3.2, and 3.3, respectively.

3.1. Schemes based on property-preserving encryption

Property-preserving encryption (Pandey & Rouselakis, Citation2012) is a category of encryption schemes that permits public computation on encrypted data based on a pre-specified property, such as OPE (Boldyreva et al., Citation2011) and deterministic encryption (Bellare et al., Citation2007) (DET). Specifically, DET ensures that $ε(m_{i}) = ε(m_{j})$ for $m_{i} = m_{j}$, and OPE ensures that $ε(m_{i}) > ε(m_{j})$ for $m_{i} > m_{j}$, where ε is the encryption algorithm. Through the properties they preserve, PPE-based schemes (Alsirhani et al., Citation2017; Fuller et al., Citation2017; Papadimitriou et al., Citation2016; Popa et al., Citation2011, Citation2014; Tu et al., Citation2013) allow performing operations such as range queries and equality comparisons on encrypted data without decryption. To better understand the SQL functionality supported by such schemes along with the leakage, we introduce CryptDB (Popa et al., Citation2011), a well-established PPE-based scheme, as follows.
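The two preserved properties can be illustrated with a minimal sketch. It rests on loud assumptions: HMAC stands in for DET (it cannot be decrypted), and a keyed, strictly increasing lookup table stands in for OPE. Neither is the construction of Bellare et al. or Boldyreva et al., and the names `det_encrypt` and `build_ope_table` are hypothetical.

```python
import hashlib
import hmac
import random


def det_encrypt(key: bytes, message: bytes) -> bytes:
    """Toy DET: a keyed hash, so equal plaintexts map to equal outputs.

    Assumption: HMAC is only a stand-in; it cannot be decrypted.
    """
    return hmac.new(key, message, hashlib.sha256).digest()


def build_ope_table(key: bytes, domain_size: int, max_gap: int = 1000) -> list[int]:
    """Toy OPE: a keyed, strictly increasing map from 0..domain_size-1 to integers."""
    rng = random.Random(key)  # keyed PRNG for repeatability; NOT cryptographically secure
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, max_gap)  # positive step keeps ciphertexts ordered
        table.append(current)
    return table


key = b"demo-key"
ages = [34, 27, 34, 61]

det_column = [det_encrypt(key, str(a).encode()) for a in ages]
ope_table = build_ope_table(key, domain_size=128)
ope_column = [ope_table[a] for a in ages]

# The server sees only det_column and ope_column, yet it can still compare them.
same_age = det_column[0] == det_column[2]
order_kept = (sorted(range(4), key=lambda i: ope_column[i])
              == sorted(range(4), key=lambda i: ages[i]))
```

The same comparisons that make equality and range predicates executable on the server are what an attacker observes, which is why these properties reappear below as leakage.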

CryptDB places an intermediate proxy server between the DBMS and the user client. This proxy handles essential operations including SQL query rewriting, key delivery, and result decryption. Notably, CryptDB introduces a multi-layered encryption scheme called “onion encryption”, as shown in Figure 2. Each onion layer is a specific encryption type that supports particular query functions, and the outer layers provide stronger security than the inner layers. When an application initiates a predicate computation on a column, the CryptDB proxy determines which onion layer the computation requires. If the column is not currently at that layer, the proxy sends the necessary keys to the server, which decrypts the ciphertext layer by layer until the required layer is reached, enabling the desired computation. The encryption methods used in the various onions are shown in Table 1. The layer to which a column is peeled is determined by the SQL queries issued by the application. For example, DET supports equality-based queries, and OPE supports range queries. CryptDB also introduces other encryption schemes, such as Join, OPE-Join, and Word Search, to facilitate join and search operations.
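The layer-by-layer adjustment can be sketched as a small state machine. This is only an illustration under assumptions: the equality onion is taken to be ordered RND, DET, JOIN from the outside in, the operation-to-layer mapping is hypothetical, and no real encryption is performed.

```python
from dataclasses import dataclass, field

# Outermost layer first. Assumed order for an equality onion.
EQ_ONION = ["RND", "DET", "JOIN"]

# Hypothetical mapping from query operation to the layer it needs.
REQUIRED_LAYER = {"projection": "RND", "equality": "DET", "equi_join": "JOIN"}


@dataclass
class ColumnOnion:
    layers: list = field(default_factory=lambda: list(EQ_ONION))

    @property
    def current(self) -> str:
        return self.layers[0]

    def peel_to(self, target: str) -> list:
        """Strip outer layers until `target` is exposed; return what was stripped."""
        stripped = []
        while EQ_ONION.index(self.current) < EQ_ONION.index(target):
            stripped.append(self.layers.pop(0))  # the proxy would send this layer's key
        return stripped


column = ColumnOnion()
first = column.peel_to(REQUIRED_LAYER["equality"])   # strips RND, exposes DET
again = column.peel_to(REQUIRED_LAYER["equality"])   # already at DET, nothing to do
third = column.peel_to(REQUIRED_LAYER["equi_join"])  # strips DET, exposes JOIN
```

In this sketch a stripped layer is never restored, so a column stays at the weakest layer that any past query required.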

Figure 2. Onion encryption.

Table 1. Encryption schemes used in onion.

While CryptDB provides rich SQL functionality, it also has concerns regarding efficiency and security.

Inefficiency. One notable disadvantage of CryptDB is its inherent inefficiency. The outermost layer in an onion is usually non-deterministic, and when performing almost any SQL query, the server needs to decrypt the entire column of data to strip off the outermost layer of the onion.
Huge Leakage (Grubbs et al., Citation2017; Naveed et al., Citation2015). In terms of security, CryptDB reveals equality and order patterns when using DET and OPE, respectively. Specifically, because the onion wraps the deterministic ciphertext in a probabilistic ciphertext, CryptDB is semantically secure and does not reveal any equality pattern when no join operation is performed. However, once the user initiates a join query, the onion is stripped away and the data is again in an insecure deterministic layer, which reveals the frequency information of all the records in the columns. The severe leakage makes CryptDB vulnerable to the frequency analysis, $ℓ_{p}$ -optimisation, sorting attacks, and cumulative attacks mentioned in Section 2.6.
Uncertainty of the Proxy Side. CryptDB assumes that cloud servers are semi-honest but relies on an honest proxy. A compromised proxy can leak plaintext queries, data, and even secret keys.

Arasu et al. (Citation2013) first proposed the Cipherbase scheme to support full SQL functionality. The scheme extends an industrial-grade database engine (Microsoft, Citation2022) (Microsoft SQL Server) to achieve orthogonal security. To further improve the efficiency of SQL queries, Tu et al. (Citation2013) proposed the Monomi scheme, which splits query execution into two parts, one over the encrypted data stored in the cloud and the other over the plaintext data located at the client. However, Monomi's reliance on costly encryption makes it unsuitable for large-scale big-data scenarios. To analyse large datasets efficiently, Papadimitriou et al. (Citation2016) proposed Seabed, which uses additively symmetric homomorphic encryption. Alsirhani et al. (Citation2017) further improved security by splitting the encrypted columns among various cloud providers through a distribution technique.

3.2. Schemes based on searchable symmetric encryption

Searchable symmetric encryption allows efficient searching over encrypted documents or records without decryption (Amorim & Costa, Citation2023; Curtmola et al., Citation2006; Song et al., Citation2000). SSE relies on index mechanisms coupled with trapdoor information to support access to confidential data. In such a scheme, the user's primary objective is to construct an index that supports efficient retrieval of data while preserving privacy. Trapdoors, in turn, are derived mainly from user queries and are typically formed from key terms or phrases extracted from the query. The formal definition of SSE is as follows, and its construction is shown in Figure 3.

Figure 3. Searchable symmetric encryption.

Definition 3.1

Searchable Symmetric Encryption

Let $Δ = \{W_{1}, W_{2}, \dots, W_{p}\}$ be a keyword dictionary and $R = (R_{1}, R_{2}, \dots, R_{n})$ be the records in the tables. An SSE scheme is a tuple of five polynomial-time algorithms $(KeyGen, Enc, Trapdoor, Search, Dec)$ such that:

$K \leftarrow KeyGen (λ)$ : is a probabilistic algorithm that inputs a security parameter λ, and outputs a random key K.
$(I, C) \leftarrow Enc (K, R)$ : is a probabilistic algorithm that inputs a secret key K and the plaintext records R, and outputs an index I and ciphertext $C = (C_{1}, C_{2}, \dots, C_{n})$ .
$T_{W} \leftarrow Trapdoor (K, W)$ : is a deterministic algorithm that inputs a key K and the search keywords W in SQL queries, and outputs the trapdoor information $T_{W}$ .
$D_{W} \leftarrow Search (I, T_{W})$ : is a deterministic algorithm to find records that contain keywords W. It inputs an index I and trapdoor information $T_{W}$ , and outputs a collection of identifiers $D_{W}$ .
$D_{i} \leftarrow Dec (K, C_{i})$ : is a deterministic algorithm running on the client side. It inputs a secret key K and the ciphertext file $C_{i}$ found by the identifier, and outputs the search results $D_{i}$ .
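The five algorithms can be sketched as follows. This is a toy under stated assumptions, not a secure scheme: the index is a plain inverted index keyed by an HMAC of each keyword, record encryption uses an HMAC-derived keystream as a stand-in cipher, and records are tokenised by whitespace. All function names simply mirror Definition 3.1.

```python
import hashlib
import hmac
import os


def _prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (assumption): XOR with an HMAC-derived keystream."""
    stream = bytearray()
    for block in range((len(data) + 31) // 32):
        stream += _prf(key, nonce + block.to_bytes(4, "big"))
    return bytes(a ^ b for a, b in zip(data, stream))


def keygen(security_bytes: int = 32) -> bytes:
    return os.urandom(security_bytes)


def enc(key: bytes, records: list[str]):
    """Return (index I, ciphertexts C). I maps keyword trapdoors to record ids."""
    k_index, k_data = _prf(key, b"index"), _prf(key, b"data")
    index, ciphertexts = {}, []
    for rid, record in enumerate(records):
        nonce = os.urandom(16)
        ciphertexts.append(nonce + _xor_stream(k_data, nonce, record.encode()))
        for word in set(record.split()):
            index.setdefault(_prf(k_index, word.encode()), []).append(rid)
    return index, ciphertexts


def trapdoor(key: bytes, word: str) -> bytes:
    return _prf(_prf(key, b"index"), word.encode())


def search(index: dict, td: bytes) -> list[int]:
    """Runs on the server: it sees only the trapdoor and the matching ids."""
    return index.get(td, [])


def dec(key: bytes, ciphertext: bytes) -> str:
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _xor_stream(_prf(key, b"data"), nonce, body).decode()


K = keygen()
I, C = enc(K, ["alice paris", "bob london", "carol paris"])
ids = search(I, trapdoor(K, "paris"))
results = [dec(K, C[i]) for i in ids]
```

Because `trapdoor` is deterministic, repeating a query produces the same trapdoor, which is the query-equality (search pattern) leakage discussed for SSE-based schemes.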

In this section, we discuss two SSE-based schemes, Secure-Join (Section 3.2.1) and JXT (Section 3.2.2), specifically tailored for the effective handling of a critical category of join queries within the SQL.

3.2.1. Secure-join: leakage-optimised join scheme

The Secure-Join scheme (Hahn et al., Citation2019) combines SSE and Key-Policy Attribute-Based Encryption (Goyal et al., Citation2006) (KP-ABE) to provide fine-grained security. The scheme utilises both SSE and ABE encryption to minimise information leakage, and ABE also reduces storage overheads.

The process of performing a join using Secure-Join is illustrated in Figure 4. First, the join attribute is encrypted using SSE. If the join attribute is unique, such as a primary key, the deterministic encryption algorithm SSE.Token is applied. If the join attribute is not unique, the probabilistic encryption algorithm SSE.Enc is applied. The resulting token and ciphertext, collectively known as the SSE-value, are then further blinded using ABE to minimise potential leakage. The blinded ciphertext is then outsourced to the cloud. When a user initiates a join query, the client generates a query token by transforming the filter predicates in the query into attribute policies using ABE.KeyGen. Upon receiving the query, the server invokes the decryption algorithm ABE.Dec to filter the records. Finally, the join matching is executed by SSE on the filtered data.

Figure 4. Secure-join.

In terms of security, Secure-Join outperforms PPE-based schemes like CryptDB. It only reveals the equality patterns of records that meet the join criteria. However, inherent drawbacks exist within the Secure-Join scheme:

Limited Join Support and High Overhead (Shafieinejad et al., Citation2022). Secure-Join supports only primary-key/foreign-key joins and incurs a significant overhead, with a time complexity of $O(n^{2})$.
Absence of Obliviousness. The scheme lacks an additional entity to prevent access pattern leakage, rendering it susceptible to potential access pattern attacks (Islam et al., Citation2012).
Super-Additive Leakage (Shafieinejad et al., Citation2022). This is the most serious drawback. Specifically, the leakage caused by a collection of queries may be larger than the sum of the leakage caused by each query.

Therefore, although it provides a higher level of security than CryptDB, Secure-Join still falls short in terms of security protection, as it does not provide obliviousness and can lead to super-additive leakage. To address these limitations, Shafieinejad et al. (Citation2022) enhanced the Secure-Join scheme with Function-Hiding Inner Product Encryption. This extension not only broadens join support to include hash joins but also reduces the time complexity to $O(n)$. Critically, it confines the leakage caused by a sequence of queries to the transitive closure of the sum of individual query leakages, effectively mitigating super-additive leakage risks.
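To make the closure bound concrete, the following sketch computes the transitive closure of a set of leaked equality pairs with a union-find structure. It illustrates the notion only and does not model the actual leakage of Secure-Join or its extension; the row identifiers `r1`, `s1`, and `t1` are hypothetical.

```python
from itertools import combinations


def transitive_closure(equal_pairs):
    """Return every pair implied by the given equalities (as frozensets)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equal_pairs:
        parent[find(a)] = find(b)  # merge the two equivalence classes

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return {frozenset(p) for g in groups.values() for p in combinations(sorted(g), 2)}


# Query 1 leaks r1 = s1, query 2 leaks s1 = t1 (hypothetical rows).
leak_q1 = {("r1", "s1")}
leak_q2 = {("s1", "t1")}
closure = transitive_closure(leak_q1 | leak_q2)
```

Here the closure contains the implied pair `r1 = t1` in addition to the two leaked pairs. A scheme whose combined leakage stays within this set reveals nothing beyond what the individual queries already imply.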

Similar to the Secure-Join scheme, many schemes use attribute-based encryption to implement access control mechanisms (N. Chen et al., Citation2022; S. Chen et al., Citation2023; J. Li et al., Citation2022; Premkamal et al., Citation2019). The ciphertext-policy attribute-based encryption (CP-ABE) scheme proposed by J. Li et al. (Citation2022) realises fine-grained access control while solving the problem of user key abuse. The revocable attribute-based encryption scheme that protects the data integrity (RABE-DI) proposed by S. Chen et al. (Citation2023) can dynamically set user access privileges while verifying data integrity.

3.2.2. JXT: queries over join without pre-computation

The Join Cross-Tags (Jutla & Patranabis, Citation2022) (JXT) scheme is an enhancement of the Oblivious Cross-Tags (Cash et al., Citation2013) (OXT) scheme of Cash et al. This symmetric-key solution supports join operations on encrypted databases without pre-computation, a significant step for SSE. While OXT is based on key-value pairs, JXT uses a similar structure to implement SQL queries on structured data. Therefore, we first briefly introduce the OXT scheme.

The OXT scheme is the first sublinear index-based framework capable of handling conjunctive keyword queries. Its design incorporates two core index structures, TSet and XSet, both of which are essential for retrieval. The TSet serves as an inverted index that expedites the retrieval of individual keywords. In contrast, the XSet is a repository for the hash values of keyword-document pairs. The server uses a series of intersecting trapdoors corresponding to the user's query, enabling it to return the documents that contain the complete set of queried keywords.
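The interplay of the two structures can be sketched as follows. This is a strongly simplified toy: real OXT stores encrypted identifiers in the TSet and derives XSet tags with blinded exponentiations, whereas this sketch uses plain HMAC values and plaintext document identifiers, so it leaks far more. The names `build_index`, `make_query`, and `server_search` are hypothetical.

```python
import hashlib
import hmac


def _h(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def build_index(key: bytes, docs: dict):
    """TSet: keyword tag -> document ids. XSet: tags of (keyword, document) pairs."""
    tset, xset = {}, set()
    for doc_id, words in sorted(docs.items()):
        for w in words:
            tset.setdefault(_h(key, "stag:" + w), []).append(doc_id)
            xset.add(_h(_h(key, "xkey:" + w), doc_id))
    return tset, xset


def make_query(key: bytes, first: str, *others: str):
    """Client side: one TSet tag for the first keyword, one XSet key per other keyword."""
    return _h(key, "stag:" + first), [_h(key, "xkey:" + w) for w in others]


def server_search(tset, xset, stag, xkeys):
    """Server side: fetch candidates via TSet, then filter by XSet membership."""
    candidates = tset.get(stag, [])
    return [d for d in candidates if all(_h(xk, d) in xset for xk in xkeys)]


key = b"demo-key"
docs = {"d1": {"cloud", "sql"}, "d2": {"cloud"}, "d3": {"cloud", "sql", "join"}}
tset, xset = build_index(key, docs)

cloud_and_sql = server_search(tset, xset, *make_query(key, "cloud", "sql"))
cloud_and_join = server_search(tset, xset, *make_query(key, "cloud", "join"))
```

The server's work is proportional to the number of documents matching the first keyword rather than to the size of the database, which is the sense in which the search is sublinear.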

JXT preserves full compatibility with OXT while extending it to join queries. For a join query denoted as $w_{1} \land w_{2}$, where $w_{1}$ and $w_{2}$ represent attribute-value pairs originating from two distinct tables, the server proceeds as follows. First, it retrieves the TSet entries for the records that match the individual keywords $w_{1}$ and $w_{2}$. It then uses the XSet for membership testing to determine whether records within the respective TSets satisfy the join condition. For each successful membership test, the server sends the corresponding TSet entry to the client. Finally, the client decrypts the returned entries.

In terms of leakage, JXT reveals result pattern, query equality, size pattern, conditional intersection pattern, and join-attribute distribution pattern. These are considered benign, especially if join attributes are primary keys with high entropy (Jutla & Patranabis, Citation2022). A possible advancement includes future research directed at supporting joins over low-entropy attributes and extending the scheme to support multi-table joins without pre-computation.

We also investigated other SSE-based approaches (Cash et al., Citation2014; L. Chen et al., Citation2023; Goh, Citation2003; M. Li et al., Citation2021). Many SSE-based schemes tend to focus on queries with limited functionality. In contrast, Pappas et al. (Citation2014) propose a scalable and secure database management system that efficiently supports arbitrary Boolean queries. Additionally, we note that most SSE schemes are designed for a single user, whereas M. Li et al. (Citation2021) support multi-user sharing of SQL queries over encrypted databases by applying the Diffie-Hellman protocol to the Trapdoor algorithm of standard SSE. The DSSE-DC scheme proposed by L. Chen et al. (Citation2023) avoids redundant storage in cloud environments while improving query efficiency by using a deduplication mechanism and inner product matching.

3.3. Schemes based on structured encryption

The structured encryption scheme (Chase & Kamara, Citation2010) is a symmetric key encryption scheme used to encrypt data structures. Drawing parallels with SSE (Song et al., Citation2000), STE shares the foundational concept of leveraging indexes for efficient query operations. However, what sets STE apart is its extension of index structures to encompass a broader array of data structures, including multi-maps, dictionaries, and graphs.

Definition 3.2

Structured Encryption

Let T be an abstract data type and δ be a data structure of type T. A structured encryption scheme is a tuple of five polynomial-time algorithms $(KeyGen, Enc, Token, Query, Dec)$ such that:

$K \leftarrow KeyGen (λ)$ : is a probabilistic algorithm that inputs a security parameter λ, and outputs a secret key K.
$(γ, c) \leftarrow Enc (K, δ, M)$ : is a probabilistic algorithm that inputs a secret key K, a data structure δ, and a sequence of data M which is another form of plaintext space corresponding to the data structure. It outputs a sequence of ciphertext c and an encrypted data structure γ.
$τ \leftarrow Token (K, q)$ : is a probabilistic algorithm that inputs a key K and a query q. It outputs a token τ.
$(J, v) := Query (γ, τ)$ : is a deterministic algorithm that inputs a token τ and an encrypted data structure γ. It outputs a set of pointers J to the ciphertext that matches the query q, and a sequence of semi-private data v.
$m_{j} := Dec (K, c_{j})$ : is a deterministic algorithm that inputs a secret key K and a ciphertext $c_{j}$ . It outputs a message $m_{j}$ .
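As an illustration of this interface, the sketch below encrypts a multi-map from query labels to pointers. It is a toy under assumptions: it follows the common counter-based dictionary idea, uses an HMAC-derived keystream as a stand-in cipher, returns no semi-private data $v$, and makes `token` deterministic for simplicity even though the definition allows a probabilistic algorithm.

```python
import hashlib
import hmac
import os


def _prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (assumption): XOR with an HMAC-derived keystream."""
    stream = bytearray()
    for block in range((len(data) + 31) // 32):
        stream += _prf(key, nonce + block.to_bytes(4, "big"))
    return bytes(a ^ b for a, b in zip(data, stream))


def keygen() -> bytes:
    return os.urandom(32)


def enc(key: bytes, delta: dict, messages: list):
    """delta maps a query label to pointers into messages. Returns (gamma, c)."""
    gamma = {}
    for q, pointers in delta.items():
        k_q = _prf(key, b"tok:" + q.encode())
        for i, ptr in enumerate(pointers):
            mask = int.from_bytes(_prf(k_q, b"msk:%d" % i)[:4], "big")
            gamma[_prf(k_q, b"loc:%d" % i)] = ptr ^ mask  # hidden location, masked pointer
    k_data = _prf(key, b"data")
    c = []
    for m in messages:
        nonce = os.urandom(16)
        c.append(nonce + _xor(k_data, nonce, m.encode()))
    return gamma, c


def token(key: bytes, q: str) -> bytes:
    return _prf(key, b"tok:" + q.encode())


def query(gamma: dict, tau: bytes):
    """Server side: walk the counter until a location is missing. Returns (J, v)."""
    pointers, i = [], 0
    while _prf(tau, b"loc:%d" % i) in gamma:
        mask = int.from_bytes(_prf(tau, b"msk:%d" % i)[:4], "big")
        pointers.append(gamma[_prf(tau, b"loc:%d" % i)] ^ mask)
        i += 1
    return pointers, []


def dec(key: bytes, c_j: bytes) -> str:
    return _xor(_prf(key, b"data"), c_j[:16], c_j[16:]).decode()


K = keygen()
M = ["alice, paris", "bob, london", "carol, paris"]
gamma, c = enc(K, {"city=paris": [0, 2], "city=london": [1]}, M)
J, v = query(gamma, token(K, "city=paris"))
rows = [dec(K, c[j]) for j in J]
```

Before any token is issued, `gamma` reveals only its total number of entries. The pointers for a label become visible to the server only once the corresponding token is queried.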

We then discuss two STE-based approaches in Sections 3.3.1 and 3.3.2.

3.3.1. SPX: SQL on structurally-encrypted databases

SPX, a relational database encryption framework proposed by Kamara and Moataz (Citation2018), employs structured encryption for secure SQL query execution. SPX creates encrypted databases in both row-wise and column-wise formats, supporting queries in Heuristic Normal Form (HNF). Specifically, the encrypted database builds its index structures from multi-maps and dictionaries: $MM_{R}$ for row-wise mapping, $MM_{C}$ for column-wise mapping, $MM_{V}$ for handling equality relations among values, and a dictionary $DX$ encapsulating the $MM_{C}$ of each column. These structures are encrypted using dictionary and multi-map encryption schemes before being outsourced to the server. The combination of selection, projection, and cross-product operations in relational algebra, known as SPC algebra (Chandra & Merlin, Citation1977; Codd, Citation1970), forms the basis for SPX's SQL queries. To mitigate the exponential time overhead associated with SPC normal form queries, Kamara and Moataz introduced HNF, which also addresses the challenges posed by the complexity of SQL.

Compared to JXT (Jutla & Patranabis, Citation2022), SPX has lower computational and communication complexity for certain extreme cases where the result of join queries is empty (Jutla & Patranabis, Citation2022). However, SPX has pre-computation of joins at setup, which might lead to storage blowup for large databases.

In terms of security, SPX reveals the size pattern, equality pattern, access pattern, and search pattern. SPX exhibits a lower level of leakage than PPE-based schemes (Alsirhani et al., Citation2017; Fuller et al., Citation2017; Papadimitriou et al., Citation2016; Popa et al., Citation2011, Citation2014; Tu et al., Citation2013) because it confines equality pattern leakage to records meeting the specified query criteria.

3.3.2. STI: partially precomputed joins

When the server conducts full precomputation of equi-joins, significant drawbacks emerge. Not only does the server learn the equality patterns within rows, but for the client, the burden of downloading and decrypting a massive number of rows becomes overwhelming. To address this challenge, numerous approaches delegate a portion of the computation to the client (Ciriani et al., Citation2009; Damiani et al., Citation2003; Tu et al., Citation2013), aiming to minimise information leakage. However, none of these schemes effectively support join operations.

In response to this limitation, Cash et al. proposed partially precomputed joins (Cash et al., Citation2021) based on a structured index (STI), which is an extension of STE. Specifically, the server does not preemptively perform joins on all rows in the table and then perform predicate filtering. Instead, the server stores only the rows from the input table that appear in the join result, and the client downloads only those rows to compute the join. This framework not only mitigates the risk of curious servers learning extensive join patterns but also significantly reduces the computational overhead on the client's end.

While both STI and SPX process analogous queries, STI demonstrates less leakage by leveraging the HashSet technique derived from OXT (Cash et al., Citation2013). However, compared to JXT, which does not need partial pre-computation of joins during setup, STI exhibits higher query complexity.

Another notable STE-based scheme is the ARQ framework (Espiritu et al., Citation2022), which does not rely on trusted hardware or cryptographic primitives. Instead, it integrates a management structure for plaintext data with STE, supporting efficient aggregate range queries over encrypted databases.

4. Hardware-based secure SQL query schemes

Software algorithms-based solutions have limitations in supporting full SQL capabilities. A range of trusted hardware-based solutions (Antonopoulos et al., Citation2020; Bajaj & Sion, Citation2011; Eskandarian & Zaharia, Citation2017; M. Li et al., Citation2023; Priebe et al., Citation2018; Zheng et al., Citation2017) effectively addresses this challenge. In Section 4.1, we explore a database engine that uses a secure coprocessor. In Section 4.2, we describe an enclave-based database. In Section 4.3, we discuss the first database system that provides data security at an industrial-strength level. In Section 4.4, we delve into the oblivious query processing, which helps to protect access patterns.

The secure SQL query schemes based on trusted hardware utilise the Trusted Execution Environments (Sabt et al., Citation2015) (TEE) that protect private data at the hardware level. Intel SGX is the most popular Trusted Execution Environment that provides secure data processing inside the cloud. Enclaves in SGX are protected regions in CPUs, where a remotely attested piece of code can run without interference from a potentially adversarial hypervisor and OS.

4.1. TrustedDB

TrustedDB (Bajaj & Sion, Citation2011) is the first trusted hardware-based approach that performs SQL queries securely by leveraging a server-hosted trusted hardware Secure Coprocessor (SCPU) during the query processing phase. The SQL query process is shown in Figure 5. It begins with the client distinguishing between private and public attributes in the database. Private attributes are encrypted, with decryption restricted to either the client or the SCPU. Subsequently, the client encrypts the SQL query using the public key of the SCPU and transmits it to the server agent. Upon receiving the encrypted query, the server forwards it to the request handler within the SCPU. The request handler then decrypts the query and passes it to the query parser. The query parser then performs query optimisation and separates out the components involving private attributes. These components are directed to the database engine inside the SCPU for execution, while the remainder of the query is forwarded to the untrusted server to enhance overall performance.
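The separation step performed by the query parser can be sketched as a simple partition of predicates. This is a minimal illustration under assumptions: queries are reduced to a flat list of conjunctive predicates, and the attribute names and the `split_query` helper are hypothetical. It does not reflect TrustedDB's actual optimiser.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    attribute: str
    operator: str
    value: object


def split_query(predicates, private_attributes):
    """Partition predicates into the SCPU part and the untrusted-host part."""
    scpu_part = [p for p in predicates if p.attribute in private_attributes]
    host_part = [p for p in predicates if p.attribute not in private_attributes]
    return scpu_part, host_part


# Hypothetical schema: salary and ssn are private, the rest are public.
private = {"salary", "ssn"}
where_clause = [
    Predicate("department", "=", "sales"),
    Predicate("salary", ">", 50000),
    Predicate("city", "=", "Paris"),
]
scpu_part, host_part = split_query(where_clause, private)
```

Only the predicate on `salary` needs the scarce SCPU resources in this example, which reflects the performance motivation for pushing public sub-queries to the untrusted host.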

Figure 5. TrustedDB.

TrustedDB is not only more cost-effective than software algorithm-based solutions for linear processing queries but also imposes no limitations on SQL functionality. In terms of security, TrustedDB is more secure than software-based schemes because it reveals only the size pattern and access pattern. However, since TrustedDB places only a small portion of the query processing engine in trusted hardware, it does not guarantee the SQL query's privacy, data integrity, and freshness (Priebe et al., Citation2018).

4.2. EnclaveDB

EnclaveDB (Priebe et al., Citation2018) addresses the shortcomings of TrustedDB by building a secure DBMS. It not only safeguards the SQL query's privacy but also protects the integrity of metadata and intermediate data. The architecture is shown in Figure 6.

Figure 6. EnclaveDB.

EnclaveDB is similar to TrustedDB in its query process for sensitive and public data. Initially, queries undergo pre-compilation on the client side. The resulting compiled queries are loaded into the enclave during the initialisation phase of the database. When a user's query traverses the proxy module, all query parameters are encrypted and transmitted to the SQL server. If the query pertains to sensitive data, it is executed by invoking the compiled queries stored inside the enclave. In contrast, if the query involves non-sensitive data, the SQL server outside the enclave manages it like a conventional database.

Due to the involvement of a trusted enclave, EnclaveDB supports rich SQL functionality and provides higher security protection than software algorithm-based schemes by hiding the equality and order pattern. However, EnclaveDB cannot protect access patterns due to the lack of obliviousness. In terms of performance, EnclaveDB shows high efficiency under the assumption of having a sufficiently large enclave memory. However, aligning with this requirement is currently challenging due to existing hardware limitations, which may affect practical applications.

4.3. Always encrypted DB

Microsoft's Always Encrypted (Antonopoulos et al., Citation2020) (AE) database, available in SQL Server, supports rich SQL functionality by leveraging secure enclaves. It supports two enclave types: Windows Virtualisation-based Security (Microsoft, Citation2023a) and Intel SGX (Costan & Devadas, Citation2016). It is the first industrial-strength database system that provides data confidentiality.

AE's SQL query process is shown in Figure 7. It begins with a client submitting a plaintext query to the driver, which encrypts query parameters using the corresponding keys and sends them to the database engine. The system then decrypts the data in the enclave and completes the SQL query. AE employs a two-level key system: the column encryption key (CEK) and the column master key (CMK). A CEK encrypts the data in a column, and a CMK encrypts one or more CEKs. CMKs are securely stored in a trusted key store outside the database engine, such as the Windows certificate store or Azure Key Vault (Microsoft, Citation2023b). To execute a secure SQL query, the process involves accessing the key store, encrypting parameters, determining the need for an enclave, performing enclave computation, and decrypting query results using the CEK.
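The two-level key system can be sketched as follows. The cipher here is a toy HMAC-derived keystream used purely as a stand-in, not the algorithms AE actually employs, and the helpers `seal` and `unseal` are hypothetical. The sketch only shows how a CMK wraps a CEK and how the CMK can be rotated without touching the encrypted column.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy encryption (assumption): random nonce plus XOR keystream, no integrity."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


cmk = os.urandom(32)            # column master key, held in the external key store
cek = os.urandom(32)            # column encryption key
wrapped_cek = seal(cmk, cek)    # only the wrapped CEK leaves the client in this sketch
cell = seal(cek, b"4111 1111 1111 1111")   # an encrypted column value

# Rotating the CMK re-wraps the CEK; the column data is not re-encrypted.
new_cmk = os.urandom(32)
rewrapped_cek = seal(new_cmk, unseal(cmk, wrapped_cek))
plaintext = unseal(unseal(new_cmk, rewrapped_cek), cell)
```

The design choice illustrated is that bulk data depends only on the CEK, so replacing a CMK costs one small re-encryption per CEK rather than a pass over every encrypted column.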

Figure 7. Always encrypted DB.

AE supports rich SQL functionality including equality queries, joins, grouping, aggregation, and string pattern matching, and the involvement of a trusted enclave helps improve the security level. However, as with TrustedDB and EnclaveDB, security leaves much to be desired as obliviousness is not provided. Additionally, it is worth noting that AE does not provide data integrity and protection against denial-of-service attacks (Peng & Liu, Citation2007).

4.4. ObliDB

The software algorithms-based approaches exhibit inherent vulnerabilities that can lead to the leakage of access patterns. While TEE offers data isolation in trusted hardware-based schemes, it remains compromised as adversaries can trace accessed memory, thereby exposing the access pattern within the enclave. Consequently, TrustedDB, EnclaveDB, and AE prove inadequate in safeguarding access patterns from potential side-channel attacks (Van Schaik et al., Citation2021). Several schemes (Arasu & Kaushik, Citation2013; Eskandarian & Zaharia, Citation2017; Krastnikov et al., Citation2020; Zheng et al., Citation2017) have therefore started to focus on providing obliviousness.

The oblivious RAM (ORAM) proposed by Goldreich and Ostrovsky (Goldreich & Ostrovsky, Citation1996) offers the capability to hide access patterns. The ORAM utilises “oblivious operations” which intentionally obfuscate the underlying memory access patterns. However, the impracticality of deploying ORAM in real cloud models arises due to its polylogarithmic computation complexity and size overhead (Bindschaedler et al., Citation2015). Oblivious query processing (Arasu & Kaushik, Citation2013) offers a series of oblivious query processing algorithms to protect access patterns but with poor performance.

ObliDB, introduced by Eskandarian and Zaharia (Citation2017), not only provides obliviousness across multiple database access methods but also is faster and incurs less overhead than ORAM. ObliDB's architecture is illustrated in Figure 8. Unlike EnclaveDB, which stores tables in the enclave, ObliDB encrypts the tables and stores them in untrusted memory because the protected memory capacity in the enclave is not large enough. Its access pattern protection rests on running oblivious query processing algorithms inside secure enclaves. These algorithms access the memory blocks that contain sensitive data uniformly, so that an observer cannot tell from the accesses which records a query touched.

Figure 8. ObliDB.

ObliDB provides obliviousness for two primary data storage methods: flat linear tables and structured indexes. The former organises records in contiguous memory blocks, mandating access to all blocks for both read and write operations to obfuscate access patterns. Even in scenarios where a record fails to satisfy the predicate conditions in a query, a dummy write to the corresponding memory block becomes imperative to maintain uniformity in access patterns across the dataset. In contrast, the structured index storage approach employs a combination of ORAM and the B+ tree structure to ensure obliviousness. All data resides within the leaf nodes of the B+ tree, ensuring each lookup accesses an equivalent number of nodes. Moreover, when a query entails a fixed number of memory accesses, ORAM steps in to mask the access patterns effectively.
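The flat-table strategy can be sketched as a scan that reads and writes every block regardless of the predicate. This is an illustration under assumptions: blocks are plain Python values rather than encrypted memory, the `trace` list stands in for what an observer of memory accesses would see, and Python-level branching is of course not constant-time. The function name `oblivious_update` is hypothetical.

```python
def oblivious_update(table, predicate, update, trace):
    """Touch every block once: real write on a match, dummy write otherwise."""
    for i in range(len(table)):
        row = table[i]
        trace.append(("read", i))
        # A real system would re-encrypt the block so both cases look identical.
        table[i] = update(row) if predicate(row) else row
        trace.append(("write", i))


employees = [("alice", 50), ("bob", 70), ("carol", 90)]

table_a, trace_a = list(employees), []
oblivious_update(table_a, lambda r: r[1] > 60, lambda r: (r[0], r[1] + 5), trace_a)

table_b, trace_b = list(employees), []
oblivious_update(table_b, lambda r: r[0] == "alice", lambda r: (r[0], 0), trace_b)
```

Both runs produce the same access trace even though they modify different rows, so the sequence of touched blocks reveals only the table size. The cost is a full pass over the table for every operation.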

In terms of SQL query functionality, ObliDB supports a range of operations, including selection, join, aggregation, group by, and range queries. Regarding security, ObliDB leaks the least among the schemes discussed so far: because it hides the access pattern, it reveals only the sizes of the input and output tables, providing the highest security level. However, since ObliDB also uses enclaves, its performance is likewise limited by the small amount of protected memory in the enclave.

Opaque (Zheng et al., Citation2017) also provides obliviousness by introducing distributed oblivious relational operators while supporting rich SQL functionality. However, ObliDB and Opaque can only support primary-foreign key joins in join operations. Krastnikov et al. (Citation2020) proposed an efficient oblivious join algorithm based on sorting networks that does not use expensive ORAM and specifically protects access patterns in equi-joins.

5. Hybrid approach: combine software and hardware

With the development of secure hardware technologies, there arises a current need to balance the functionality and security of encrypted database queries. Therefore, hybrid schemes have recently emerged (Cao et al., Citation2020; Zhu et al., Citation2021). They combine the advantages of both software and hardware approaches. This strategic convergence facilitates comprehensive application scenarios, accommodating diverse business contexts, including public and hybrid cloud environments, through seamless transitions between hardware and software modes. In this section, we present two cloud database products: Huawei's GaussDB (Section 5.1) and Alibaba's PolarDB (Section 5.2), which employ integrated hardware and software solutions to address the evolving landscape of secure data management.

5.1. GaussDB

Huawei's GaussDB (Zhu et al., Citation2021) achieves rich SQL functionality through the integration of cryptographic algorithms in the software mode and confidential computation in the hardware mode. The software mode, operating in the insecure Rich Execution Environment (REE), includes data encryption and indexing, supporting common SQL queries like equality and range queries. On the other hand, the hardware mode, running in a TEE, caters to complex queries such as aggregation and string pattern matching. To facilitate cross-platform access to TEE resources, GaussDB utilises SecGear (Chenmaodong & Huawei Co. Ltd, Citationn.d.), a confidential computing security application development kit, as a virtual TEE, with current support for Intel SGX and ARM TrustZone (Pinto & Santos, Citation2019).

GaussDB's SQL query process is similar to that of AE, which we discussed in Section 4.3. The driver parses the query, determines the encryption type, rewrites the query, and transmits it to the database engine. The engine then selects the execution mode based on query type and computational cost. When the software mode is selected, the cryptographic algorithm is directly invoked for query execution. In the hardware mode, a trusted channel between the driver and the trusted hardware is established for key exchange.

Although GaussDB ensures data confidentiality, it does not provide data integrity and freshness, and it does not prevent malicious adversaries from tampering with query results. Since GaussDB does not provide obliviousness, it is less secure than ObliDB, and side-channel attacks and access pattern attacks enable adversaries to learn much useful information.

5.2. PolarDB

Alibaba's encrypted database PolarDB (Cao et al., Citation2020) integrates hardware and software, supporting both pure cryptography and trusted hardware solutions, so that users can choose according to their business needs. Users are free to specify the encrypted columns for each scheme, and non-encrypted data is not affected, which provides a fine-grained trade-off between security and performance. At the hardware level, PolarDB supports Intel SGX and FPGA shield cards and provides rich SQL functions including numerical computation, string manipulation, and range queries. At the software level, pure cryptographic algorithms support a few SQL functions, such as equality and join queries, in a TEE-free environment.

By combining hardware and software, PolarDB offers a flexible, high-performance, and secure database service with vast storage capacity. However, like pure hardware solutions, PolarDB is not resistant to side-channel attacks and access pattern attacks.

6. Comparison

So far, we have discussed software algorithm-based, trusted hardware-based, and hybrid schemes. To satisfy diverse user requirements regarding functionality and security when executing SQL queries over encrypted databases, we first use our evaluation model to compare some recent and classic schemes in Section 6.1. Then, we conduct a comprehensive analysis and comparison of software algorithm-based and trusted hardware-based schemes in Section 6.2. Finally, in Section 6.3, we compare and assess the functionality and security of software-mode schemes that use different cryptographic algorithms.

Table 2. Comparison of existing encrypted databases

6.1. Comparison of existing secure SQL query schemes

We utilise our proposed evaluation model to compare some recent and classic secure SQL query schemes from dimensions of SQL functionalities, technologies or primitives, potential leakages, and attacks they may face. The detailed evaluation results are shown in Table 2.

To comprehensively assess the security levels of different SQL query schemes, it is essential to evaluate two crucial metrics: the number of revealed equality of pairs and the availability of obliviousness. The number of revealed equality of pairs serves as a fundamental criterion for measuring the security strength of a scheme. A scheme with minimal or no equality of pair leakage protects the access pattern and ensures that the equality relation of an attribute between tuples is not accessible to attackers, which rules out numerous effective inference attacks. In contrast, a scheme that exhibits significant equality of pair leakage poses a higher risk of exposing relationships and patterns within the encrypted data, leading to privacy breaches. The availability of obliviousness in a SQL query scheme is the other critical aspect to consider. A scheme providing obliviousness has the same access pattern for every data block, making it impossible for an attacker to obtain any useful information from the access pattern. In contrast, schemes without obliviousness are vulnerable to access pattern attacks. Therefore, SQL query schemes that exhibit minimal or no equality of pair leakage and offer strong obliviousness can achieve higher levels of security. We combine these two factors into a single security rating with four levels, ranging from “low” to “higher”, defined as follows:

Low Security Level. The SQL query schemes classified under this level offer minimal protection and lack any form of obliviousness. These schemes reveal all equality of pairs within the tables, which means that the cloud server is privy to all access and equality patterns, whether or not the records satisfy the SQL query. This level is susceptible to the most basic forms of inference attacks, such as frequency analysis and $\ell_{p}$-optimisation, because its operation patterns are transparent and it lacks significant protective layers.
Moderate Security Level. The SQL query schemes at this level do not provide obliviousness and reveal only partial equality of pairs, specifically for the records that satisfy the SQL query. Common attacks at this level include more sophisticated access pattern attacks that focus on the subset of data revealed by queries.
High Security Level. High-security level schemes significantly reduce the leakage by not revealing any equality of pairs. SQL query schemes that are based on trusted hardware typically reside at this security level, offering enhanced privacy protection and demonstrating resilience against inference attacks. However, their lack of obliviousness leaves them susceptible to more complex side-channel attacks.
Higher Security Level. The highest level in our model not only protects all equality of pairs but also incorporates obliviousness. SQL query schemes in this category are the most secure and are designed to withstand sophisticated attacks, including those that leverage indirect information such as frequency and access patterns. They obfuscate access patterns by using ORAM or differential privacy.
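The four levels above amount to a lookup on the two metrics. The following sketch encodes that lookup; the `Leakage` enum and the `security_level` function are illustrative names, not part of the evaluation model itself, and combinations that the four definitions do not cover (for example, an oblivious scheme that still leaks equality of pairs) are deliberately left unclassified.

```python
from enum import Enum


class Leakage(Enum):
    ALL = "all"          # every equality of pair in the tables is revealed
    PARTIAL = "partial"  # only pairs among records satisfying the query
    NONE = "none"        # no equality of pair is revealed


def security_level(leakage: Leakage, oblivious: bool) -> str:
    """Map (equality-of-pair leakage, obliviousness) to one of the four levels."""
    if not oblivious:
        return {
            Leakage.ALL: "low",
            Leakage.PARTIAL: "moderate",
            Leakage.NONE: "high",
        }[leakage]
    if leakage is Leakage.NONE:
        return "higher"
    # The level definitions do not describe this combination.
    raise ValueError("combination not covered by the four-level model")


levels = [
    security_level(Leakage.ALL, oblivious=False),
    security_level(Leakage.PARTIAL, oblivious=False),
    security_level(Leakage.NONE, oblivious=False),
    security_level(Leakage.NONE, oblivious=True),
]
```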

6.2. Comparative analysis of software and hardware-based schemes

We analyse and summarise the advantages and limitations of the software-based and hardware-based approaches from three dimensions of SQL functionality, security and performance, elucidating their application scenarios.

SQL Functionality. Secure SQL query schemes based solely on cryptographic tools have limitations in handling complex SQL queries. For example, OPE encryption schemes restrict functionality to range queries, lacking the complete set of operators available in a standard database query. In contrast, the data inside TEE is naturally secure, and hardware-based databases avoid building complex cryptographic schemes by executing SQL queries directly on plaintext within the trusted hardware. Therefore, they can support more complex SQL functionalities and computations such as joins, aggregations, order by, group by, etc.
Security. Security threats to software-based schemes primarily stem from the leakage caused by cryptographic algorithms. For example, DET and OPE algorithms reveal equality and order patterns, respectively, which attackers can exploit to launch effective inference attacks. Hardware-based schemes, while offering a higher security level, still raise concerns: the hardware itself has inherent vulnerabilities, and trusted hardware often does not defend against side-channel attacks. Additionally, if vulnerabilities exist in the code running within the TEE, an attacker could exploit them to compromise the database's security. Despite these concerns, trusted hardware-based schemes generally maintain a higher security level than software algorithm-based schemes because they reveal less exploitable information.
Efficiency. Hardware-based solutions demonstrate superior efficiency compared to software-based solutions for several reasons. In hardware-based approaches, data can be decrypted in an isolated and secure environment, such as an Enclave, and operations can be executed efficiently. In contrast, cryptographic primitive-based methods, like fully homomorphic encryption (Gentry, Citation2009), usually involve higher computational overhead (Cooney, Citation2009). These methods require complex mathematical operations to maintain the encrypted state of data during processing, which can lead to reduced performance. Therefore, in scenarios that require efficient computation and querying, hardware-based query solutions should be given priority.
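To make the equality leakage of DET concrete, the sketch below runs a rank-based frequency analysis against a deterministically protected column. It is a toy under stated assumptions: an HMAC tag stands in for a DET ciphertext (it is deterministic but not decryptable), the column values and the attacker's auxiliary data are invented, and the attack simply pairs ciphertexts and plaintexts by frequency rank.

```python
import hashlib
import hmac
from collections import Counter


def det_tag(key: bytes, value: str) -> str:
    """Deterministic stand-in for DET: equal plaintexts give equal tags."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def frequency_attack(ciphertexts: list[str], auxiliary: list[str]) -> dict[str, str]:
    """Pair ciphertexts with plaintexts by descending frequency rank."""
    ct_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    pt_ranked = [p for p, _ in Counter(auxiliary).most_common()]
    return dict(zip(ct_ranked, pt_ranked))


key = b"demo-key-not-for-production"
# Invented column with a skewed value distribution.
column = ["flu"] * 5 + ["asthma"] * 3 + ["gout"] * 1
encrypted = [det_tag(key, v) for v in column]

# Invented auxiliary data with a similar distribution, known to the attacker.
auxiliary = ["flu"] * 50 + ["asthma"] * 30 + ["gout"] * 10

guess = frequency_attack(encrypted, auxiliary)
recovered = [guess[c] for c in encrypted]
```

The attacker never sees the key, yet the repeated tags alone are enough to rebuild the column when the auxiliary distribution matches.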

The choice between hardware and software solutions should hinge on a judicious evaluation of the context. Where trusted hardware is available and complex SQL functionality is required, trusted hardware-based solutions are the better fit. Software solutions are preferable where only common SQL queries are required or where the deployment environment cannot be tightly controlled.

Moreover, an emerging trend is the symbiotic integration of software and hardware solutions, aiming to combine the flexibility of software algorithms with the robust security benefits of hardware. This combination can be particularly effective when dealing with complex query types or meeting stringent security requirements.

6.3. Comparative analysis of PPE, SSE and STE-based schemes

Software algorithm-based schemes rely on several cryptographic methods, each with its own characteristics in terms of the SQL functionality it supports and the level of security it offers. In our comparative analysis, we examine the features of PPE, SSE, and STE, offering insight into the ideal application scenarios and the granularity of protection for each.

PPE-based schemes. The granularity of protection in PPE-based schemes is the coarsest among these three types of schemes, because PPE schemes preserve some properties of the plaintext to allow specific types of queries. They are specialised for efficiency in specific types of queries, such as equality-match or range queries. They offer a compromised security level due to their property-preserving nature, which exposes them to a variety of leakage-abuse attacks (Blackstone et al., Citation2019; Grubbs et al., Citation2017; Naveed et al., Citation2015). They are suitable for scenarios that require high query efficiency and in which the data is not particularly sensitive.
SSE-based schemes. SSE-based schemes offer a medium level of granularity in protection. They facilitate a more extensive range of search-related operations without preserving the plaintext properties, which improves security over PPE. However, the search pattern and access pattern can still present a potential vector for exploitation (Islam et al., Citation2012; Y. Zhang et al., Citation2016). SSE-based schemes strike a balance between functionalities and security. They are well-suited to scenarios that require a broader search capability with moderate security requirements.
STE-based schemes. STE schemes provide the finest granularity of protection. They encrypt both the data and their structures, which prevents attackers from making inferences based on access or search patterns. This level of protection is the most comprehensive, but it also requires more computational resources, which may impact the efficiency of query processing. They are specifically tailored to handle structured data, enabling rich queries over encrypted databases that maintain the nuanced relationships inherent in complex data structures. They are indispensable for scenarios demanding high-security assurances paired with the need to perform queries on encrypted structured data, such as hierarchical data or graph-based datasets.
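The search pattern and access pattern leakage mentioned for SSE can be seen in a toy encrypted index. The sketch below is a simplified, counter-based dictionary construction written for illustration only: the key derivation labels, the XOR masking of document identifiers, and the `Server` class are assumptions, and it offers none of the guarantees of a vetted SSE scheme.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def keyword_keys(master: bytes, word: str) -> tuple[bytes, bytes]:
    """Derive the per-keyword search token: a label key and a mask key."""
    return prf(master, b"label|" + word.encode()), prf(master, b"mask|" + word.encode())


def build_index(master: bytes, docs: dict[int, list[str]]) -> dict[bytes, int]:
    """Client side: store each (keyword, counter) under a pseudorandom label."""
    postings: dict[str, list[int]] = {}
    for doc_id, words in sorted(docs.items()):
        for word in set(words):
            postings.setdefault(word, []).append(doc_id)
    index: dict[bytes, int] = {}
    for word, ids in postings.items():
        k_label, k_mask = keyword_keys(master, word)
        for counter, doc_id in enumerate(ids):
            ctr = counter.to_bytes(4, "big")
            pad = int.from_bytes(prf(k_mask, ctr)[:4], "big")
            index[prf(k_label, ctr)] = doc_id ^ pad  # doc_id assumed < 2**32
    return index


class Server:
    """Untrusted server: holds the index and logs what each search reveals."""

    def __init__(self, index: dict[bytes, int]) -> None:
        self.index = index
        self.log: list[tuple[bytes, tuple[int, ...]]] = []

    def search(self, k_label: bytes, k_mask: bytes) -> list[int]:
        ids, counter = [], 0
        while True:
            ctr = counter.to_bytes(4, "big")
            label = prf(k_label, ctr)
            if label not in self.index:
                break
            pad = int.from_bytes(prf(k_mask, ctr)[:4], "big")
            ids.append(self.index[label] ^ pad)
            counter += 1
        # Same token twice -> search pattern; returned ids -> access pattern.
        self.log.append((k_label, tuple(ids)))
        return ids


master = b"demo-master-key"
docs = {1: ["alice", "bob"], 2: ["alice"], 3: ["carol"]}
server = Server(build_index(master, docs))
first = server.search(*keyword_keys(master, "alice"))
second = server.search(*keyword_keys(master, "alice"))
```

Before any query, the server holds only pseudorandom labels and masked identifiers. After the two searches, its log shows that the same token was issued twice and which documents matched, which is the leakage that the attacks cited above exploit.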

Choosing an appropriate cryptographic scheme requires a deep understanding of the data's structure and sensitivity, the specific requirements of the anticipated SQL queries, and the necessary security model. For example, while PPE might suffice for high-speed querying over non-sensitive data, STE is the first choice for databases that hold sensitive structured data and require comprehensive security.

7. Conclusion

In this work, we first propose an evaluation model to assess SQL functionality and security. Then, we classify secure SQL query schemes into three categories and analyse their architectures, advantages, and limitations. Finally, we compare hardware-based and software-based schemes from multiple aspects. Our comprehensive evaluation helps guide stakeholders in making informed decisions.

Based on the results analysed in this paper, we identify the following two future research directions:

Enhancements to TEE-Based Schemes. Future research should focus on optimising the enclave page cache (EPC) memory management. This is critical to enhance the performance of TEE-based schemes.
Efficient Access Pattern Hiding Structures. There is also a need to develop lightweight ORAM or new obfuscation methods. Such advancements are essential for improving efficiency in big data contexts.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Agrawal, R., Kiernan, J., Srikant, R., & Xu, Y. (2004). Order preserving encryption for numeric data. In Proceedings of the 2004 ACM SIGMOD international conference on management of data (pp. 563–574).
Alsirhani, A., Bodorik, P., & Sampalli, S. (2017). Improving database security in cloud computing by fragmentation of data. In 2017 International conference on computer and applications (ICCA) (pp. 43–49).
Amorim, I., & Costa, I. (2023). Leveraging searchable encryption through homomorphic encryption: A comprehensive analysis. Mathematics, 11(13), 2948. https://doi.org/10.3390/math11132948
Antonopoulos, P., Arasu, A., Singh, K. D., Eguro, K., Gupta, N., Jain, R., Kaushik, R., Kodavalla, H., Kossmann, D., Ogg, N., & Ramamurthy, R. (2020). Azure SQL database always encrypted. In Proceedings of the 2020 ACM SIGMOD international conference on management of data (pp. 1511–1525).
Arasu, A., Blanas, S., Eguro, K., Kaushik, R., Kossmann, D., Ramamurthy, R., & Venkatesan, R. (2013). Orthogonal security with cipherbase. In Conference on innovative data systems research.
Arasu, A., & Kaushik, R. (2013). Oblivious query processing. arXiv preprint arXiv:1312.4012.
Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., & Zaharia, M. (2010). A view of cloud computing. Communications of the ACM, 53(4), 50–58. https://doi.org/10.1145/1721654.1721672
Bajaj, S., & Sion, R. (2011). TrustedDB: A trusted hardware based database with privacy and data confidentiality. In Proceedings of the 2011 ACM SIGMOD international conference on management of data (pp. 205–216).
Barber, R., Lohman, G., Pandis, I., Raman, V., Sidle, R., Attaluri, G., Chainani, N., Lightstone, S., & Sharpe, D. (2014). Memory-efficient hash joins. Proceedings of the VLDB Endowment, 8(4), 353-364. https://doi.org/10.14778/2735496.2735499
Bellare, M., Boldyreva, A., & O'Neill, A. (2007). Deterministic and efficiently searchable encryption. In Advances in cryptology-CRYPTO 2007: 27th annual international cryptology conference (pp. 535–552).
Benabbas, S., Gennaro, R., & Vahlis, Y. (2011). Verifiable delegation of computation over large datasets. In Annual cryptology conference (pp. 111–131).
Bindschaedler, V., Naveed, M., Pan, X., Wang, X. F., & Huang, Y. (2015). Practicing oblivious access on cloud storage: The gap, the fallacy, and the new way forward. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (pp. 837–849).
Blackstone, L., Kamara, S., & Moataz, T. (2019). Revisiting leakage abuse attacks. IACR Cryptology ePrint Archive, 2019, 1175. https://doi.org/10.14722/ndss.2020.23103
Boldyreva, A., Chenette, N., Lee, Y., & O'Neill, A. (2009). Order-preserving symmetric encryption. In Advances in cryptology-EUROCRYPT 2009: 28th annual international conference on the theory and applications of cryptographic techniques (pp. 224–241).
Boldyreva, A., Chenette, N., & O'Neill, A. (2011). Order-preserving encryption revisited: Improved security analysis and alternative solutions. In Advances in cryptology–CRYPTO 2011: 31st annual cryptology conference (pp. 578–595).
Cao, W., Liu, Y., Cheng, Z., Zheng, N., Li, W., Wu, W., Ouyang, L., Wang, P., Wang, Y., Kuan, R., & Liu, Z. (2020). POLARDB meets computational storage: Efficiently support analytical workloads in cloud-native relational database. In 18th USENIX conference on file and storage technologies (FAST 20) (pp. 29–41).
Carbunar, B., & Sion, R. (2011). Toward private joins on outsourced data. IEEE Transactions on Knowledge and Data Engineering, 24(9), 1699–1710. https://doi.org/10.1109/TKDE.2011.142
Cash, D., Jaeger, J., Jarecki, S., Jutla, C., Krawczyk, H., Roşu, M.-C., & Steiner, M. (2014). Dynamic searchable encryption in very-large databases: Data structures and implementation. IACR Cryptology ePrint Archive, 2014, 853. https://doi.org/10.14722/NDSS.2014.23264
Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Roşu, M.-C., & Steiner, M. (2013). Highly-scalable searchable symmetric encryption with support for boolean queries. In Advances in cryptology–CRYPTO 2013: 33rd annual cryptology conference (pp. 353–373).
Cash, D., Ng, R., & Rivkin, A. (2021). Improved structured encryption for SQL databases via hybrid indexing. In International conference on applied cryptography and network security (pp. 480–510).
Chamberlin, D. D., & Boyce, R. F. (1974). SEQUEL: A structured English query language. In Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on data description, access and control (pp. 249–264).
Chandra, A. K., & Merlin, P. M. (1977). Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the ninth annual ACM symposium on theory of computing (pp. 77–90).
Chase, M., & Kamara, S. (2010). Structured encryption and controlled disclosure. In Advances in cryptology-ASIACRYPT 2010: 16th international conference on the theory and application of cryptology and information security (pp. 577–594).
Chen, L., Li, J., & Li, J. (2023). Toward forward and backward private dynamic searchable symmetric encryption supporting data deduplication and conjunctive queries. IEEE Internet of Things Journal, 10(19), 17408–17423. https://doi.org/10.1109/JIOT.2023.3274390
Chen, N., Li, J., Zhang, Y., & Guo, Y. (2022). Efficient CP-ABE scheme with shared decryption in cloud storage. IEEE Transactions on Computers, 71(1), 175–184. https://doi.org/10.1109/TC.2020.3043950
Chen, S., Li, J., Zhang, Y., & Han, J. (2023). Efficient revocable attribute-based encryption with verifiable data integrity. IEEE Internet of Things Journal. https://doi.org/10.1109/JIOT.2023.3325996
Chen, X., Li, H., Li, J., Wang, Q., Huang, X., Susilo, W., & Xiang, Y. (2021). Publicly verifiable databases with all efficient updating operations. IEEE Transactions on Knowledge and Data Engineering, 33(12), 3729–3740. https://doi.org/10.1109/TKDE.2020.2975777
Chen, X., Li, J., Huang, X., Ma, J., & Lou, W. (2015). New publicly verifiable databases with efficient updates. IEEE Transactions on Dependable and Secure Computing, 12(5), 546–556. https://doi.org/10.1109/TDSC.2014.2366471
Chen, X., Li, J., Weng, J., Ma, J., & Lou, W. (2015). Verifiable computation over large database with incremental updates. IEEE Transactions on Computers, 65(10), 3184–3195. https://doi.org/10.1109/TC.2015.2512870
Chenmaodong and Huawei Co. Ltd. (n.d.). secGear. Retrieved February 22, 2021, from https://gitee.com/src-openeuler/secGear.
Chu, C.-K., Zhu, W.-T., Han, J., Liu, J. K., Xu, J., & Zhou, J. (2013). Security concerns in popular cloud storage services. IEEE Pervasive Computing, 12(4), 50–57. https://doi.org/10.1109/MPRV.2013.72
Ciriani, V., di Vimercati, S. D. C., Foresti, S., Jajodia, S., Paraboschi, S., & Samarati, P. (2009). Keep a few: Outsourcing data while maintaining confidentiality. In Computer Security–ESORICS 2009: 14th European symposium on research in computer security (pp. 440–455).
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387. https://doi.org/10.1145/362384.362685
Cooney, M. (2009). IBM touts encryption innovation; new technology performs calculations on encrypted data without decrypting it. Computer World. https://www.computerworld.com/article/2526031/ibm-toutsencryption-innovation.html
Costan, V., & Devadas, S. (2016). Intel SGX explained. IACR Cryptology ePrint Archive, 2016, 86.
Curtmola, R., Garay, J., Kamara, S., & Ostrovsky, R. (2006). Searchable symmetric encryption: Improved definitions and efficient constructions. In Proceedings of the 13th ACM conference on computer and communications security (pp. 79–88).
Damiani, E., Vimercati, S. D. C., Jajodia, S., Paraboschi, S., & Samarati, P. (2003). Balancing confidentiality and efficiency in untrusted relational DBMSs. In Proceedings of the 10th ACM conference on computer and communications security (pp. 93–102).
Eltayesh, F. (2017). Verifiable outsourced database model: A game-theoretic approach [Ph.D. Dissertation]. Concordia University.
Eskandarian, S., & Zaharia, M. (2017). ObliDB: Oblivious query processing using hardware enclaves. arXiv preprint arXiv:1710.00458.
Espiritu, Z., Markatou, E. A., & Tamassia, R. (2022). Time-and space-efficient aggregate range queries over encrypted databases. Proceedings on Privacy Enhancing Technologies, 2022(4), 684–704. https://doi.org/10.56553/popets-2022-0128
Fuller, B., Varia, M., Yerukhimovich, A., Shen, E., Hamlin, A., Gadepally, V., Shay, R., Mitchell, J. D., & Cunningham, R. K. (2017). SoK: Cryptographically protected database search. In 2017 IEEE symposium on security and privacy (SP) (pp. 172–191).
Gentry, C. (2009). Fully homomorphic encryption using ideal lattices. In Proceedings of the forty-first annual ACM symposium on Theory of computing (pp. 169–178).
Goh, E.-J. (2003). Secure indexes. IACR Cryptology ePrint Archive, 2003, 216.
Goldreich, O., Micali, S., & Wigderson, A. (2019). How to play any mental game, or a completeness theorem for protocols with honest majority. In Providing sound foundations for cryptography: On the work of Shafi Goldwasser and Silvio Micali (pp. 307–328).
Goldreich, O., & Ostrovsky, R. (1996). Software protection and simulation on oblivious RAMs. Journal of the ACM (JACM), 43(3), 431–473. https://doi.org/10.1145/233551.233553
Goyal, V., Pandey, O., Sahai, A., & Waters, B. (2006). Attribute-based encryption for fine-grained access control of encrypted data. In Proceedings of the 13th ACM conference on computer and communications security (pp. 89–98).
Grubbs, P., Sekniqi, K., Bindschaedler, V., Naveed, M., & Ristenpart, T. (2017). Leakage-abuse attacks against order-revealing encryption. In 2017 IEEE symposium on security and privacy (SP) (pp. 655–672).
Gu, Q., & Liu, P. (2007). Denial of service attacks. Handbook of Computer Networks: Distributed Networks, Network Planning, Control, Management, and New Trends and Applications, 3, 454–468. https://doi.org/10.1002/9781118256107
Hahn, F., Loza, N., & Kerschbaum, F. (2019). Joins over encrypted data with fine granular security. In 2019 IEEE 35th international conference on data engineering (ICDE) (pp. 674–685).
Hur, J., & Noh, D. K. (2010). Attribute-based access control with efficient revocation in data outsourcing systems. IEEE Transactions on Parallel and Distributed Systems, 22(7), 1214–1221. https://doi.org/10.1109/TPDS.2010.203
Islam, M. S., Kuzu, M., & Kantarcioglu, M. (2012). Access pattern disclosure on searchable encryption: Ramification, attack and mitigation. In NDSS (Vol. 20, p. 12).
Jutla, C., & Patranabis, S. (2022). Efficient searchable symmetric encryption for join queries. In International conference on the theory and application of cryptology and information security (pp. 304–333).
Kamara, S., & Moataz, T. (2018). SQL on structurally-encrypted databases. In Advances in cryptology–ASIACRYPT 2018: 24th international conference on the theory and application of cryptology and information security (pp. 149–180).
Kerschbaum, F., Härterich, M., Grofig, P., Kohler, M., Schaad, A., Schröpfer, A., & Tighzert, W. (2013). Optimal re-encryption strategy for joins in encrypted databases. In Data and applications security and privacy XXVII: 27th annual IFIP WG 11.3 conference (pp. 195–210).
Krastnikov, S., Kerschbaum, F., & Stebila, D. (2020). Efficient oblivious database joins. arXiv preprint arXiv:2003.09481.
Lacharité, M.-S., & Paterson, K. G. (2015). A note on the optimality of frequency analysis vs. ℓp-optimization. IACR Cryptology ePrint Archive, 2015, 1158.
Latif, R., Abbas, H., Assar, S., & Ali, Q. (2014). Cloud computing risk assessment: A systematic literature review. Future Information Technology: FutureTech 2013, 285–295. https://doi.org/10.1007/978-3-642-40861-8
Li, J., Chen, X., & Huang, X. (2015). New attribute-based authentication and its application in anonymous cloud access service. International Journal of Web and Grid Services, 11(1), 125–141. https://doi.org/10.1504/IJWGS.2015.067161
Li, J., Liu, Z., Chen, X., Xhafa, F., Tan, X., & Wong, D. S. (2015). L-EncDB: A lightweight framework for privacy-preserving data queries in cloud computing. Knowledge-Based Systems, 79, 18–26. https://doi.org/10.1016/j.knosys.2014.04.010
Li, J., Zhang, Y., Ning, J., Huang, X., Poh, G. S., & Wang, D. (2022). Attribute based encryption with privacy protection and accountability for CloudIoT. IEEE Transactions on Cloud Computing, 10(2), 762–773. https://doi.org/10.1109/TCC.2020.2975184
Li, M., Du, R., & Jia, C. (2021). A multi-user shared searchable encryption scheme supporting SQL query. In Security and privacy in new computing environments: Third EAI international conference (pp. 406–422).
Li, M., Zhao, X., Chen, L., Tan, C., Li, H., Wang, S., Mi, Z., Xia, Y., Li, F., & Chen, H. (2023). Encrypted databases made secure yet maintainable. In 17th USENIX symposium on operating systems design and implementation (OSDI 23) (pp. 117–133).
Lv, C., Wang, J., Sun, S.-F., Wang, Y., Qi, S., & Chen, X. (2023). Towards practical multi-client order-revealing encryption: Improvement and application. IEEE Transactions on Dependable and Secure Computing, 1–16. https://doi.org/10.1109/tdsc.2023.3268652
Mell, P., & Grance, T. (2009). Effectively and securely using the cloud computing paradigm. NIST, Information Technology Laboratory, 2(8), 304–311.
Microsoft (2022). https://azure.microsoft.com/en-us/.
Microsoft (2023a). Virtualization-based security (VBS). https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-vbs.
Microsoft (2023b). What's new for azure key vault. https://learn.microsoft.com/en-us/azure/key-vault/general/whats-new.
Mironov, I., Segev, G., & Shahaf, I. (2017). Strengthening the security of encrypted databases: Non-transitive joins. In Theory of cryptography: 15th international conference (pp. 631–661).
Mishra, P., Poddar, R., Chen, J., Chiesa, A., & Popa, R. A. (2018). Oblix: An efficient oblivious search index. In 2018 IEEE symposium on security and privacy (SP) (pp. 279–296).
Naveed, M., Kamara, S., & Wright, C. V. (2015). Inference attacks on property-preserving encrypted databases. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (pp. 644–655).
Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In International conference on the theory and applications of cryptographic techniques (pp. 223–238).
Pandey, O., & Rouselakis, Y. (2012). Property preserving symmetric encryption. In Advances in cryptology–EUROCRYPT 2012: 31st annual international conference on the theory and applications of cryptographic techniques (pp. 375–391).
Pang, H., & Ding, X. (2014). Privacy-preserving ad-hoc equi-join on outsourced data. ACM Transactions on Database Systems (TODS), 39(3), 1–40. https://doi.org/10.1145/2629501
Papadimitriou, A., Bhagwan, R., Chandran, N., Ramjee, R., Haeberlen, A., Singh, H., Modi, A., & Badrinarayanan, S. (2016). Big data analytics over encrypted datasets with seabed. In 12th USENIX symposium on operating systems design and implementation (OSDI 16) (pp. 587–602).
Pappas, V., Krell, F., Vo, B., Kolesnikov, V., Malkin, T., Geol Choi, S., George, W., Keromytis, A., & Bellovin, S. (2014). Blind seer: A scalable private DBMS. In 2014 IEEE symposium on security and privacy (pp. 359–374).
Pinto, S., & Santos, N. (2019). Demystifying arm trustzone: A comprehensive survey. ACM Computing Surveys (CSUR), 51(6), 1–36. https://doi.org/10.1145/3291047
Poddar, R., Boelter, T., & Popa, R. A. (2019). Arx: An encrypted database using semantically secure encryption. Proceedings of the VLDB Endowment, 12(11), 1664–1678. https://doi.org/10.14778/3342263.3342641
Popa, R. A., Redfield, C. M. S., Zeldovich, N., & Balakrishnan, H. (2011). CryptDB: Protecting confidentiality with encrypted query processing. In Proceedings of the twenty-third ACM symposium on operating systems principles (pp. 85–100).
Popa, R. A., Stark, E., Valdez, S., Helfer, J., Zeldovich, N., & Balakrishnan, H. (2014). Building web applications on top of encrypted data using Mylar. In 11th USENIX symposium on networked systems design and implementation (NSDI 14) (pp. 157–172).
Premkamal, P. K., Pasupuleti, S. K., & Alphonse, P. J. A. (2019). A new verifiable outsourced ciphertext-policy attribute based encryption for big data privacy and access control in cloud. Journal of Ambient Intelligence and Humanized Computing, 10(7), 2693–2707. https://doi.org/10.1007/s12652-018-0967-0
Priebe, C., Vaswani, K., & Costa, M. (2018). EnclaveDB: A secure database using SGX. In 2018 IEEE symposium on security and privacy (SP) (pp. 264–278).
Reddy, K. S., & Kumari, K. P. (2022). A scheme for verifying integrity of SQL query processing on encrypted databases. CVR Journal of Science and Technology, 22(1), 1–6.
Sabt, M., Achemlal, M., & Bouabdallah, A. (2015). Trusted execution environment: What it is, and what it is not. In 2015 IEEE trustcom/BigDataSE/ISPA (Vol. 1, pp. 57–64).
Shafieinejad, M., Gupta, S., Liu, J. Y., Karabina, K., & Kerschbaum, F. (2022). Equi-joins over encrypted data for series of queries. In 2022 IEEE 38th international conference on data engineering (ICDE) (pp. 1635–1648).
Song, D. X., Wagner, D., & Perrig, A. (2000). Practical techniques for searches on encrypted data. In Proceeding 2000 IEEE symposium on security and privacy (pp. 44–55).
Sun, Y., Wang, S., Li, H., & Li, F. (2021). Building enclave-native storage engines for practical encrypted databases. Proceedings of the VLDB Endowment, 14(6), 1019–1032. https://doi.org/10.14778/3447689.3447705
Tu, S., Kaashoek, M. F., Madden, S., & Zeldovich, N. (2013). Processing analytical queries over encrypted data. Proceedings of the VLDB Endowment, 6(5), 289–300. https://doi.org/10.14778/2535573.2488336
Van Schaik, S., Minkin, M., Kwong, A., Genkin, D., & Yarom, Y. (2021). CacheOut: Leaking data on Intel CPUs via cache evictions. In 2021 IEEE symposium on security and privacy (SP) (pp. 339–354).
Vinayagamurthy, D., Gribov, A., & Gorbunov, S. (2019). StealthDB: A scalable encrypted database with full SQL query support. Proceedings on Privacy Enhancing Technologies, 2019(3), 370–388. https://doi.org/10.2478/popets-2019-0052
Wei, J., Chen, X., Huang, X., Hu, X., & Susilo, W. (2021). RS-HABE: Revocable-storage and hierarchical attribute-based access scheme for secure sharing of e-health records in public cloud. IEEE Transactions on Dependable and Secure Computing, 18(5), 2301–2315.
Zhang, Y., Katz, J., & Papamanthou, C. (2016). All your queries are belong to us: The power of file-injection attacks on searchable encryption. In 25th USENIX security symposium (USENIX Security 16) (pp. 707–720).
Zhang, Z., Chen, X., Li, J., Tao, X., & Ma, J. (2019). HVDB: A hierarchical verifiable database scheme with scalable updates. Journal of Ambient Intelligence and Humanized Computing, 10(8), 3045–3057. https://doi.org/10.1007/s12652-018-0757-8
Zheng, W., Dave, A., Beekman, J. G., Popa, R. A., Gonzalez, J. E., & Stoica, I. (2017). Opaque: An oblivious and encrypted distributed analytics platform. In 14th USENIX symposium on networked systems design and implementation (NSDI 17) (pp. 283–298).
Zhu, J., Cheng, K., Liu, J., & Guo, L. (2021). Full encryption: An end to end encryption mechanism in GaussDB. Proceedings of the VLDB Endowment, 14(12), 2811–2814. https://doi.org/10.14778/3476311.3476351
