Privacy Preserving Set Intersection Protocol Secure Against Malicious Behaviors

Yingpeng Sang, Hong Shen
School of Computer Science, The University of Adelaide, Adelaide, South Australia, 5005, Australia
{yingpeng.sang, hong.shen}@adelaide.edu.au

Abstract

When datasets are distributed across different sources, finding their intersection while preserving the privacy of the datasets is a widely required task. In this paper, we address the Privacy Preserving Set Intersection (PPSI) problem, in which each party learns no elements other than the intersection of the parties' private datasets. We propose an efficient protocol in the malicious model, where the adversary may control an arbitrary number of parties and execute the protocol for its own benefit. A related work in [12] has a correctness probability of $((\mathcal{N}-1)/\mathcal{N})^{N}$ ($N$ is the number of parties and $\mathcal{N}$ is the size of the encryption scheme's plaintext space) and a computation complexity of $O(N^2 S^2 \lg \mathcal{N})$ ($S$ is the size of each party's data set). Our PPSI protocol in the malicious model has a correctness probability of $((\mathcal{N}-1)/\mathcal{N})^{N-1}$, and achieves a computation cost of $O(c^2 S^2 \lg \mathcal{N})$ ($c$ is the number of malicious parties and $c \le N-1$).

Keywords: cryptographic protocol, privacy preservation, distributed datasets, set intersection, zero-knowledge proof.

1 Introduction

For datasets distributed across different sources, the intersection of these sets is often required to gain useful information. In this paper, we address the problem of Privacy Preserving Set Intersection (PPSI), in which there are $N$ ($N \ge 2$) parties, each party $P_i$ ($i = 1, \ldots, N$) has a set (or multiset) $T_i$ with $|T_i| = S$, and all parties want to learn the intersection $\mathit{TI} = T_1 \cap \cdots \cap T_N$ without gleaning any information other than what can be computed from a coalition of parties' inputs and outputs.

Our construction in this paper is based on the malicious model ([8]), where a probabilistic polynomial-time (PPT) bounded adversary maliciously controls an arbitrary number of parties.
A malicious party can arbitrarily deviate from the protocol. Specifically, it may exhibit the following malicious behaviors: refusing to participate in the protocol when the protocol is first invoked, arbitrarily substituting its original local input or intermediate computations, and aborting the protocol once it has obtained the desired result. In this paper, we take measures such as zero-knowledge proofs to prevent the malicious behaviors.

In [18] we proposed an efficient PPSI protocol for the semi-honest model, and solved PPSI by efficiently constructing and evaluating polynomials whose roots are elements of the set intersection. This paper extends [18] to the malicious model. In comparison with a previous PPSI protocol in the malicious model from [12], our protocol has higher correctness and lower complexity. Specifically:

1) Our protocol is correct with probability $((\mathcal{N}-1)/\mathcal{N})^{N-1}$, where $\mathcal{N}$ is the size of the input space of Paillier's cryptosystem and $N$ is the number of parties. The PPSI protocol in [12] is correct with probability $((\mathcal{N}-1)/\mathcal{N})^{N}$.

2) Our PPSI protocol has a computation cost of $O(c^2 S^2 \lg \mathcal{N})$ ($c$ is the number of malicious parties and $c \le N-1$), while the protocol in [12] has a computation cost of $O(N^2 S^2 \lg \mathcal{N})$.

The remainder of the paper is organized as follows: Section 2 discusses some related work. The problem of PPSI is formally defined in Section 3. Section 4 lists the basic tools for our protocol. Section 5 proposes the PPSI protocol for the malicious model. In Section 6 we analyze the security of the protocol. In Section 7 we compare our protocol with the related work in terms of computation and communication costs. Section 8 concludes the paper.

2 Related Work

General solutions have been provided for the secure multiparty computation (SMC) problem ([9], [19]). In general SMC, the function to be computed is represented by a circuit, and every gate of the circuit is privately evaluated. However, when this general solution is used for a specific problem, the large size of the circuit and the high cost of evaluating all gates result in a protocol that is much less efficient than the non-private protocol for this problem. Therefore, many efficient private protocols for specific problems have been proposed based on the specific properties of these problems.

PPSI can be traced back to the specific problem of private equality test (PET) in the two-party case, where each party has a single element and wants to test whether the two elements are equal without publishing them. The problem of PET was considered in [1], [4], [14] and [15]. PET solutions cannot simply be reused for the multi-party case of PPSI, because too much sensitive information would be leaked, e.g. any two parties would learn the intersection of their private sets.

A solution for the multi-party case of PPSI was first proposed in [7]. The solution is based on evaluating polynomials representing elements in the sets. In [11], another solution for PPSI was proposed, in which the polynomial representing each set is multiplied by a random polynomial of the same degree. Both protocols are based on the semi-honest model. The PPSI protocol in [11] was made secure in the malicious model in [12].

In [18], we obtained a PPSI protocol in the semi-honest model with lower cost than [7] and [11]. We multiplied the polynomial representing each set by a random polynomial whose degree is as low as possible without compromising the security of the solution. We also multiplied the randomized polynomials by a nonsingular matrix to improve the correctness of our solution. In this paper, we extend our PPSI protocol to the malicious model.
The PPSI protocol in [18] is only secure when the adversary controls one party. In this paper we strengthen the security of the protocol against an adversary who maliciously controls an arbitrary number of parties, using zero-knowledge proofs and an optimized degree for the random polynomials.

3 Problem Definition

3.1 Computational indistinguishability

In SMC, security against both types of adversaries (semi-honest and malicious) is argued by the computational indistinguishability of the views in the ideal model and the real model ([8, 13]). Let an ensemble $X = \{X_n\}_{n \in \mathbb{N}}$ be a sequence of random variables $X_n$ for $n \in \mathbb{N}$, ranging over strings of length $\mathrm{poly}(n)$. Two ensembles $X = \{X_n\}_{n \in \mathbb{N}}$ and $Y = \{Y_n\}_{n \in \mathbb{N}}$ are computationally indistinguishable, denoted by $X \stackrel{c}{\equiv} Y$, if for every PPT algorithm $A$ and every $c > 0$ there exists an integer $n_0$ such that for all $n \ge n_0$,

$$\left| \Pr[A(X_n) = 1] - \Pr[A(Y_n) = 1] \right| < \frac{1}{n^c}.$$

$\Pr[A(x) = 1]$ is the probability that $A$ outputs 1 on input $x$.

3.2 Privacy Preserving Set Intersection

Suppose all sets held by the parties are subsets of a common set $T$. We should first prevent the dictionary attack, in which an adversary may defraud the honest party of its inputs using $T$. Therefore, we assume that each party holds a set (or multiset) of the same size $S$ with $S \ll |T|$, such that given two arbitrarily selected subsets $T_i$ and $T_{i'}$, the probability that an input $a \in T_i$ equals any input $a' \in T_{i'}$ is negligible (i.e., $S/|T| \approx 0$).

We use $T(i, j)$ to denote the $j$-th element on $P_i$, and $\overline{T}$ to denote $(T_1, \ldots, T_N)$. $c$ ($1 \le c \le N-1$) is the total number of colluding parties. $I$ is the index set of the $c$ colluding parties, $I = \{i_1, \ldots, i_c\}$. $\bar{I}$ is the index set of the honest parties, $\bar{I} = \{1, \ldots, N\} \setminus I$. $Z_{\mathcal{N}}$ is the plaintext space of Paillier's cryptosystem. $K$ is the input length of $T(i, j)$, i.e. $K = \lg \mathcal{N}$. In Definition 1 we define the intersection function $f$.
Definition 1 (Intersection Function $f$) The intersection function $f$ is an $N$-ary function $(\{0,1\}^{K \cdot S})^{N} \to (\{0,1\})^{N \cdot S}$, i.e., $f(\overline{T}) = \{ f_{ij}(\overline{T}) \mid i = 1, \ldots, N,\ j = 1, \ldots, S \}$, where $f_{ij}(\overline{T}) = 1$ if $T(i, j) \in \mathit{TI}$, and $f_{ij}(\overline{T}) = 0$ if $T(i, j) \notin \mathit{TI}$.

Below we define the problem of privacy preserving set intersection in the malicious model.

Definition 2 (PPSI in the malicious model) Let $\Pi$ be an $N$-party protocol for computing $f$. Let a pair $(I, A)$, where $A$ is a PPT algorithm, represent an adversary in the real model. The joint execution of $\Pi$ under $(I, A)$ in the real model, denoted $\mathrm{REAL}_{\Pi,I,A}(\overline{T})$, is defined as the output sequence resulting from the interaction among the parties in the execution of $\Pi$. Let a pair $(I, B)$, where $B$ is a PPT algorithm, represent an adversary in the ideal model, where a trusted third party is available. The joint execution of $f$ under $(I, B)$ in the ideal model, denoted $\mathrm{IDEAL}_{f,I,B}(\overline{T})$, is defined as the output pair of $B$ and the honest parties in the ideal execution. $\Pi$ is said to securely solve the problem of privacy preserving set intersection in the malicious model if, for every PPT algorithm $A$ (representing a real-model adversary strategy), there exists a PPT algorithm $B$ (representing an ideal-model adversary strategy) such that

$$\{\mathrm{IDEAL}_{f,I,B}(\overline{T})\} \stackrel{c}{\equiv} \{\mathrm{REAL}_{\Pi,I,A}(\overline{T})\}. \quad (1)$$
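To make Definition 1 concrete, the following minimal Python sketch computes the intersection function in the clear, which is what the trusted third party of the ideal model would output. The function name `intersection_function` and the sample sets are illustrative assumptions, not part of the paper.

```python
from typing import List


def intersection_function(sets: List[List[int]]) -> List[List[int]]:
    """Plaintext version of f from Definition 1 (hypothetical helper).

    sets[i][j] plays the role of T(i, j). The result holds the bit f_ij:
    1 if T(i, j) lies in the intersection of all sets, 0 otherwise.
    """
    common = set(sets[0]).intersection(*(set(t) for t in sets[1:]))
    return [[1 if x in common else 0 for x in t] for t in sets]


# Three parties, S = 3 elements each (sample data, chosen for illustration).
sets = [[3, 7, 11], [7, 11, 20], [11, 7, 5]]
bits = intersection_function(sets)
print(bits)
```

Each party only needs its own row of bits; the protocol of Section 5 aims to deliver exactly this information without a trusted party.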

4 Basic Tools

4.1 Paillier's Cryptosystem

We use Paillier's cryptosystem ([16]) for the following properties: 1) it is an additive homomorphic encryption scheme: given two encryptions $E(m_1)$ and $E(m_2)$, $E(m_1 + m_2) = E(m_1) \cdot E(m_2)$; 2) given an encryption $E(m)$ and a scalar $a$, $E(a \cdot m) = E(m)^a$; 3) $(N, N)$-threshold decryption can be supported (by [5], [6]). The corresponding secret key is shared by a group of $N$ parties, and decryption cannot be performed by any single party; it requires all parties to act together.

4.2 Calculations on encrypted polynomials

In our protocol, we need to do some calculations on encrypted polynomials. For a polynomial $f(x) = \sum_{i=0}^{m} a_i x^i$, we use $E(f(x))$ to denote the sequence of encrypted coefficients $\{E(a_i) \mid i = 0, \ldots, m\}$. Given $E(f(x))$, where $E(\cdot)$ is an additive homomorphic encryption scheme (e.g. Paillier's), the following computations can be made (they have also been used in [7] and [11]):

1) At a value $v$, we can evaluate $E(f(x))$: $E(f(v)) = E(a_m v^m + a_{m-1} v^{m-1} + \cdots + a_0) = E(a_m)^{v^m} E(a_{m-1})^{v^{m-1}} \cdots E(a_0)$.

2) Given $E(f(x))$ and a scalar $c$, we can compute $E(c \cdot f(x)) = \{E(a_m)^c, \ldots, E(a_0)^c\}$.

3) Given $E(f(x))$ and $E(g(x))$ with $g(x) = \sum_{j=0}^{m} b_j x^j$, we can compute $E(f(x) + g(x)) = \{E(a_m)E(b_m), \ldots, E(a_0)E(b_0)\}$.

4) Given $f(x)$ and $E(g(x))$, we can compute $E(f(x) \cdot g(x))$. Suppose that $g(x) = \sum_{j=0}^{n} b_j x^j$ and $f(x) \cdot g(x) = \sum_{k=0}^{m+n} c_k x^k$; then $E(c_k) = E(a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0) = E(b_k)^{a_0} \cdots E(b_0)^{a_k}$. Here $a_i$ and $b_j$ are treated as zero if $i > m$ or $j > n$.

4.3 Zero-Knowledge Proofs

We can efficiently construct the following proofs from proofs of knowledge on statements about discrete logarithms ([2, 10]), whose completeness and soundness have been argued in the respective related work. Our constructions compose the basic proofs using AND ($\wedge$) operations, the closure of which is argued in [3]. These proofs are used in our protocol for the malicious model.

1) Proof of knowing the plaintext of an encryption $E(a)$, $PK\{a : E(a)\}$. This is from [2].

2) Proof of the nonsingularity of an encrypted matrix, $PK\{R : \mathcal{D} \ne E(0) \wedge \mathcal{D} = E(\det(R)) \wedge \mathcal{R} = E(R)\}$. $R$ is an $N \times N$ matrix, $E(R)$ denotes the encrypted entries of $R$, and $\det(R)$ is the determinant of $R$. This is from the proof of correct multiplications in [2] and the private equality test in [10].

3) Proof of correct matrix multiplication, $PK\{R : \mathcal{G} = E(F \cdot R) \wedge \mathcal{F} = E(F) \wedge \mathcal{R} = E(R)\}$. $F = (f_1, \ldots, f_N)$ is a vector of polynomials, $R$ is an $N \times N$ matrix, and $E(R)$ denotes the encrypted entries of $R$. This is also from the proof of correct multiplications in [2].

4) Proof of correct polynomial evaluation, $PK\{v : E(f(v))\}$, given a common encrypted polynomial $E(f)$. Suppose $f(x) = \sum_{i=0}^{m} a_i x^i$; the proof can be constructed by parallel executions of $m$ rounds of proving knowledge of the plaintext $v^i$ and $m$ rounds of proving the correct multiplication $E(a_i v^i)$.

5 Protocol for Privacy Preserving Set Intersection in the malicious model

5.1 Main Idea

Our protocol for PPSI is based on evaluating randomized polynomials representing the intersection. Basically it includes the following steps:

1) Each $P_i$ computes a polynomial $f_i$ to represent its set $T_i$: $f_i = (x - T(i, 1)) \cdots (x - T(i, S))$. It then randomizes $f_i$ to $f_i \sum_{j=1}^{N} r_{i,j}$ with the help of the other parties, where $r_{i,j}$ is a random polynomial generated by $P_j$. The parties obtain a polynomial vector $F = (f_1 \sum_{j=1}^{N} r_{1,j}, \ldots, f_N \sum_{j=1}^{N} r_{N,j})$.

2) Each $P_i$ randomly generates a nonsingular $N \times N$ matrix $R_i$; then $F R_1$, $F R_1 R_2$, ..., $F R_1 \cdots R_N$ are computed on $P_1, P_2, \ldots, P_N$ respectively. Finally, the parties obtain $G = F R$, where $R = \prod_{i=1}^{N} R_i$ is nonsingular. Let $R_{uv}$ ($1 \le u, v \le N$) be the $(u, v)$-th entry of $R$. The resulting $G$ is another polynomial vector $(g_1, \ldots, g_N)$ as follows:

$$\begin{aligned} g_1 &= f_1 \sum_{j=1}^{N} r_{1,j} R_{11} + \cdots + f_N \sum_{j=1}^{N} r_{N,j} R_{N1} \\ &\;\;\vdots \\ g_N &= f_1 \sum_{j=1}^{N} r_{1,j} R_{1N} + \cdots + f_N \sum_{j=1}^{N} r_{N,j} R_{NN} \end{aligned} \quad (2)$$

3) Each $P_i$ evaluates $(g_1, \ldots, g_N)$ at the element $T(i, j)$. If $g_k(T(i, j)) = 0$ for all $k = 1, \ldots, N$, then $P_i$ determines $T(i, j) \in \mathit{TI}$; otherwise, $P_i$ determines $T(i, j) \notin \mathit{TI}$.

5.2 Malicious Behaviors

A semi-honest party cannot learn another party's private inputs, provided that $f_i$, $F$ and $G$ are all encrypted and only the final evaluations $g_k(T(i, j))$ are decrypted. However, a malicious party may compromise the correctness of the protocol by substituting his inputs with another party's, or by replacing the intermediate computations. We prevent these malicious behaviors using the zero-knowledge proofs of Section 4.3, under which each party either behaves in a semi-honest manner or is detected as cheating. We do not consider preventing malicious behaviors such as independently and arbitrarily selecting inputs from the input space, or quitting the protocol at any step. The malicious behaviors considered in our protocol include the following:

1) In step 1), a malicious $P_i$ may just send $E(f_{i'})$ received from $P_{i'}$ to the other parties. Then $P_i$ can easily substitute his inputs with the inputs of $P_{i'}$, and intentionally influence the correctness of the protocol. Therefore, each party $P_i$ should prove that he knows at least one plaintext of $E(f_i)$, with proof 1) of Section 4.3. Suppose $f_i = \sum_{l=0}^{S} t_{i,l} x^l$. Because $t_{i,S}$ is always 1, we make each party prove that he knows the plaintext of $E(t_{i,S-1})$. If the proof is accepted, the verifiers also accept that the prover generated the inputs (i.e. the roots of $f_i$) independently. A malicious $P_i$ may generate some coefficients by himself and substitute the other coefficients with those received from another party. However, it is easy to verify that this kind of behavior cannot succeed in substituting any root of $f_i$ with another party's inputs. Thus, it is unnecessary for the verifiers to check coefficients other than the second one.
2) In step 1), a malicious party may encrypt a polynomial whose coefficients are all zeros. By Equation (2) he will learn the intersection of all parties, but the other parties will only learn the intersection of all parties except the malicious one, and the correctness of the protocol will be compromised. To prevent this, the honest parties should reset the leading coefficient of the polynomials received from others to $E(1)$. They cannot prevent the other coefficients from being intentionally set to zero, but this kind of behavior can be treated as the malicious party having independently selected his own inputs.

3) In step 2), a malicious party may generate a singular matrix $R_i$. Then $G(T(i, j)) = (0, \ldots, 0)$ no longer implies that $f_l(T(i, j)) = 0$ for all $l = 1, \ldots, N$, and the correctness of the protocol will be compromised. Therefore, each party $P_i$ should prove that the $R_i$ it generates is nonsingular, with proof 2) of Section 4.3.

4) In step 2), a malicious party may perform the multiplication with a matrix $R_i'$ other than the committed matrix $R_i$. Each party should prove that he performs the matrix multiplication correctly with the matrix $R_i$ he has committed to, with proof 3) of Section 4.3.

5) In step 3), a malicious party may evaluate some inputs in some $f_i$ from an honest party instead of in the $g_k$, and may then guess some roots of $f_i$. Each party should prove that his evaluation is correct by proof 4) of Section 4.3.

5.3 The Protocol

The protocol is given in Fig. 1. Below we illustrate some details of the steps.

5.3.1 The Degree of $r_{i,j}$

In Step 1), each $P_j$ ($j \ne i$) computes $E(f_i r_{i,j})$ and sends it back to $P_i$, so no one knows the coefficients of $\sum_{j=1}^{N} r_{i,j}$ without decryption. In Step 2), $E(G) = E(F \cdot R)$, and no one knows the entries of $R$. Let $\mathrm{degree}(r_{i,j})$ and $\mathrm{degree}(\sum_{j=1}^{N} r_{i,j})$ be $d$. The degree $d$ should be large enough to prevent the polynomial interpolation attacks of an adversary who controls $c$ parties, and small enough to keep the complexity low.
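The homomorphic operations behind Step 1), i.e. computing $E(f_i r_{i,j})$ from $E(f_i)$ and a plaintext $r_{i,j}$ and later evaluating the encrypted result, can be sketched with a toy Paillier instance in Python. This is only an illustration under stated assumptions: the primes are tiny and insecure, decryption uses a single key instead of threshold decryption, the zero-knowledge proofs are omitted, the polynomial `r` is fixed rather than uniformly random so that the printed values are reproducible, and all function names are hypothetical.

```python
import math
import random

# Toy Paillier parameters (assumption: tiny primes, NOT secure).
P, Q = 1009, 1013
MOD = P * Q                    # plays the role of the plaintext-space size
MOD2 = MOD * MOD
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, MOD)         # valid because the generator is MOD + 1


def encrypt(m):
    """E(m) = (1 + MOD)^m * r^MOD mod MOD^2 for a random unit r."""
    r = random.randrange(1, MOD)
    while math.gcd(r, MOD) != 1:
        r = random.randrange(1, MOD)
    return (1 + (m % MOD) * MOD) * pow(r, MOD, MOD2) % MOD2


def decrypt(c):
    return (pow(c, LAM, MOD2) - 1) // MOD * MU % MOD


def add(c1, c2):               # E(m1 + m2) = E(m1) * E(m2)
    return c1 * c2 % MOD2


def scalar_mul(c, a):          # E(a * m) = E(m)^a
    return pow(c, a % MOD, MOD2)


def mul_plain_by_encrypted(f, enc_g):
    """Operation 4) of Section 4.2: E(f * g) from plaintext f and E(g).
    Coefficient lists are ordered from degree 0 upwards."""
    out = [encrypt(0) for _ in range(len(f) + len(enc_g) - 1)]
    for i, a in enumerate(f):
        for j, cb in enumerate(enc_g):
            out[i + j] = add(out[i + j], scalar_mul(cb, a))
    return out


def eval_encrypted(enc_f, v):
    """Operation 1) of Section 4.2: E(f(v)) from E(f) and plaintext v."""
    acc = encrypt(0)
    for i, c in enumerate(enc_f):
        acc = add(acc, scalar_mul(c, pow(v, i, MOD)))
    return acc


# f_i = (x - 3)(x - 7) = x^2 - 10x + 21 represents the set {3, 7}.
enc_f = [encrypt(a) for a in [21, -10, 1]]
r = [5, 2, 9]                  # fixed stand-in for a random r_{i,j}
enc_fr = mul_plain_by_encrypted(r, enc_f)

at_root = decrypt(eval_encrypted(enc_fr, 7))    # 7 is a root of f_i
off_root = decrypt(eval_encrypted(enc_fr, 5))   # f_i(5) * r(5) = -4 * 240
print(at_root, off_root == (-960) % MOD)
```

Evaluating the randomized polynomial at an element of the set decrypts to zero, while any other point yields a value masked by $r_{i,j}$, which is why only the zero pattern of the final evaluations carries information.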
The following lemma gives the optimized value of $d$.

Lemma 1 To prevent polynomial interpolation attacks, the degree of each $r_{i,j}$ in $G$ of Equation (2) should be at least $(c-1)S$ when $S \le \frac{N^2 + N - 1}{c}$, and at least $(c-1)S - (1+N) + \frac{1+cS}{N}$ when $S > \frac{N^2 + N - 1}{c}$.

Proof If $\mathrm{degree}(r_{i,j}) = d$, then $\mathrm{degree}(g_k) = S + d$. The polynomial interpolation attacks of an adversary can be the following:

1) Calculating the interpolating polynomial $g_k$ given the $cS$ points $(T(i, j), g_k(T(i, j)))$. It is easy to verify that if the adversary knows the $S + d + 1$ coefficients of each $g_k$, he will also know the coefficients of the $f_i$ of the honest parties by another interpolation. If $S + d + 1 \ge cS + 1$, i.e. $d_{\min} = (c-1)S$, this attack can be prevented.

2) Calculating the interpolating polynomial group $G$ given the $cS$ points $(T(i, j), g_k(T(i, j)))$. If this succeeds, the adversary directly gets the coefficients of the $f_i$ of the honest parties. In each $g_k$, the malicious party has the same $N - c$ unknown polynomials $f_i$ and the same $N$ unknown polynomials $\sum_{j=1}^{N} r_{i,j}$. He also has $N^2$ unknown entries in the matrix $R$. Each $\sum_{j=1}^{N} r_{i,j}$ can be treated as a single polynomial of degree $d$. Then in $G$, the malicious party has in total $(N-c)S + N(d+1) + N^2$ unknowns. Therefore if $(N-c)S + N(d+1) + N^2 \ge NcS + 1$, i.e. $d_{\min} = (c-1)S - (1+N) + \frac{1+cS}{N}$, this attack can be prevented.

Therefore, if $(c-1)S \ge (c-1)S - (1+N) + \frac{1+cS}{N}$, i.e. $S \le \frac{N^2 + N - 1}{c}$, then $d_{\min} = (c-1)S$; otherwise, $d_{\min} = (c-1)S - (1+N) + \frac{1+cS}{N}$.

Figure 1. Protocol for Privacy Preserving Set Intersection in the malicious model

Inputs: There are $N$ ($N \ge 2$) parties, $c$ ($1 \le c \le N-1$) of whom may collude in a malicious manner. Each party has a private set of $S$ elements, denoted $T_i$. Each party holds the public key and its own share of the secret key for the threshold HE cryptosystem.

Output: Each party $P_i$ knows $\mathit{TI} = T_1 \cap \cdots \cap T_N$.

Steps:

1) Computing $E(F)$: For $i = 1, \ldots, N$,
   1.1) $P_i$ computes $f_i = (x - T(i, 1)) \cdots (x - T(i, S)) = \sum_{l=0}^{S} t_{i,l} x^l$, encrypts the coefficients to get $E(f_i)$, and sends $E(f_i)$ to all the other parties with the proof $PK\{t_{i,S-1} : E(t_{i,S-1})\}$.
   1.2) For $j = 1, \ldots, N$, each $P_j$ sets the leading coefficient of $E(f_i)$ to $E(1)$ and generates a random polynomial $r_{i,j} = \sum_{m=0}^{d} \alpha_{i,j,m} x^m$, in which $d$ is given by Lemma 1 and each $\alpha_{i,j,m}$ is uniformly selected from $Z_{\mathcal{N}}$. $P_j$ computes $E(f_i r_{i,j})$ and sends it to all the other parties.
   1.3) All $P_j$ for $j = 1, \ldots, N$ compute $E(f_i \sum_{j=1}^{N} r_{i,j})$.
   In the end, all $P_i$ for $i = 1, \ldots, N$ get $E(F)$, in which $F = (f_1 \sum_{j=1}^{N} r_{1,j}, \ldots, f_N \sum_{j=1}^{N} r_{N,j})$.

2) Computing $E(G)$: For $i = 1, \ldots, N$,
   2.1) $P_i$ generates a random nonsingular matrix $R_i$, and sends $\mathcal{R}_i = E(R_i)$ and $\mathcal{D}_i = E(\det(R_i))$ to all the other parties, with the proof of the nonsingularity of the encrypted matrix $PK\{R_i : \mathcal{D}_i \ne E(0) \wedge \mathcal{D}_i = E(\det(R_i)) \wedge \mathcal{R}_i = E(R_i)\}$.
   2.2) $P_i$ computes $\mathcal{G}_i = E(F R_1 \cdots R_i)$ and sends it to all the other parties, with the proof of correct matrix multiplication $PK\{R_i : \mathcal{G}_i = E(F R_1 \cdots R_i) \wedge \mathcal{G}_{i-1} = E(F R_1 \cdots R_{i-1}) \wedge \mathcal{R}_i = E(R_i)\}$.
   In the end, all $P_i$ for $i = 1, \ldots, N$ get $E(G) = E(F \prod_{i=1}^{N} R_i)$.

3) Decryption and Evaluation:
   3.1) Every $P_i$ evaluates $T(i, j)$ in $E(G)$, and proves the correct evaluation by $PK\{T(i, j) : E(G(T(i, j)))\}$ for $j = 1, \ldots, S$.
   3.2) $P_i$ decrypts the vector $E(G(T(i, j)))$. If it is a zero vector, $T(i, j) \in \mathit{TI}$; otherwise, $T(i, j) \notin \mathit{TI}$.

5.3.2 The Correctness of PPSI Protocol

$P_i$ determines whether $T(i, j) \in \mathit{TI}$ by decrypting the vector $(g_1(T(i, j)), \ldots, g_N(T(i, j)))$ to see whether it is a zero vector. The following lemma gives the correctness probability of this determination.

Lemma 2 (The Correctness of PPSI Protocol)
- Completeness: If $T(i, j) \in \mathit{TI}$, then $(g_1(T(i, j)), \ldots, g_N(T(i, j)))$ is a zero vector.
- Soundness: If $(g_1(T(i, j)), \ldots, g_N(T(i, j)))$ is a zero vector, then $T(i, j) \in \mathit{TI}$ with probability $((\mathcal{N}-1)/\mathcal{N})^{N-1}$.

Proof Completeness: If $T(i, j) \in \mathit{TI}$, then $T(i, j)$ is a root of all $f_i$ for $i = 1, \ldots, N$, so $F(T(i, j)) = (f_1(T(i, j)) \sum_{j=1}^{N} r_{1,j}, \ldots, f_N(T(i, j)) \sum_{j=1}^{N} r_{N,j}) = (0, \ldots, 0)$ and $G(T(i, j)) = F(T(i, j)) R = (0, \ldots, 0)$.

Soundness: Because the $R_i$ for $i = 1, \ldots, N$ are generated to be nonsingular, $R = \prod_{i=1}^{N} R_i$ is also nonsingular. If $G(T(i, j)) = (0, \ldots, 0)$, the linear system $F(T(i, j)) R = (0, \ldots, 0)$ has only one solution, $F(T(i, j)) = (0, \ldots, 0)$, i.e., $f_l(T(i, j)) \sum_{j=1}^{N} r_{l,j}(T(i, j)) = 0$ for $l = 1, \ldots, N$. Suppose $\sum_{j=1}^{N} r_{l,j} = \sum_{j=1}^{N} \sum_{m=0}^{d} \alpha_{l,j,m} x^m$. The coefficients of each $r_{l,j}$ are uniformly selected from $Z_{\mathcal{N}}$, so each $\alpha_{l,j,m}$ is uniformly distributed over $Z_{\mathcal{N}}$, and the value of $\sum_{j=1}^{N} r_{l,j}$ at $T(i, j)$ is also uniformly distributed. The probability that $T(i, j)$ is a root of $\sum_{j=1}^{N} r_{l,j}$ is therefore $1/\mathcal{N}$. Suppose that for some $T(i, j)$ and all $l \in \{1, \ldots, N\}$, $f_l(T(i, j)) \sum_{j=1}^{N} r_{l,j}(T(i, j)) = 0$. Because $f_l(T(i, j))$ must be 0 when $l = i$, the probability that $f_l(T(i, j)) = 0$ for all $l \ne i$ is $p = (1 - 1/\mathcal{N})^{N-1}$. $N$ is the number of parties, and in practice $N \ll \mathcal{N}$. When $\mathcal{N}$ is large enough, $p \approx 1$, i.e. if $G(T(i, j)) = (0, \ldots, 0)$, then overwhelmingly $T(i, j) \in \mathit{TI}$.
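The completeness and soundness argument above can be checked numerically with a plaintext sketch in Python. It rests on several assumptions that are not in the paper: encryption and zero-knowledge proofs are left out, a prime field (the constant `PRIME`) stands in for the Paillier plaintext space, the nonsingular matrix is sampled as a product of triangular matrices, and $G$ is evaluated as $F(v) \cdot R$, which equals evaluating each $g_k$ at $v$ by linearity. All names are hypothetical.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1, stand-in for the plaintext space (assumption)


def poly_from_roots(roots):
    """Coefficients (degree 0 first) of prod (x - root) over Z_PRIME."""
    coeffs = [1]
    for root in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, a in enumerate(coeffs):
            nxt[k] = (nxt[k] - root * a) % PRIME
            nxt[k + 1] = (nxt[k + 1] + a) % PRIME
        coeffs = nxt
    return coeffs


def poly_eval(coeffs, v):
    acc = 0
    for a in reversed(coeffs):       # Horner's rule
        acc = (acc * v + a) % PRIME
    return acc


def random_nonsingular(n, rng):
    """Unit lower-triangular times upper-triangular with nonzero diagonal,
    so the determinant is nonzero modulo PRIME."""
    low = [[1 if u == v else (rng.randrange(PRIME) if u > v else 0)
            for v in range(n)] for u in range(n)]
    up = [[rng.randrange(1, PRIME) if u == v
           else (rng.randrange(PRIME) if u < v else 0)
           for v in range(n)] for u in range(n)]
    return [[sum(low[u][k] * up[k][v] for k in range(n)) % PRIME
             for v in range(n)] for u in range(n)]


def build_G(sets, d, rng):
    n = len(sets)
    f = [poly_from_roots(t) for t in sets]
    # Coefficient-wise sum of the n random degree-d polynomials r_{i,j}.
    r_sum = [[sum(rng.randrange(PRIME) for _ in range(n)) % PRIME
              for _ in range(d + 1)] for _ in range(n)]
    R = random_nonsingular(n, rng)

    def G(v):
        Fv = [poly_eval(f[i], v) * poly_eval(r_sum[i], v) % PRIME
              for i in range(n)]
        return [sum(Fv[u] * R[u][k] for u in range(n)) % PRIME
                for k in range(n)]
    return G


rng = random.Random(2007)
sets = [[3, 7, 11], [7, 11, 20], [11, 7, 5]]
G = build_G(sets, d=3, rng=rng)      # d = (c-1)S with c = 2, S = 3
for i, t in enumerate(sets):
    print(i, [(v, not any(G(v))) for v in t])
```

For the elements 7 and 11 the vector is always zero (completeness). For the other elements it is nonzero except with probability on the order of 1/`PRIME`, mirroring the soundness bound of Lemma 2.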

6 Security Analysis

The security of our protocol can be proved by the ideal-vs.-real simulation paradigm, as in Theorem 1.

Theorem 1 Assuming that the threshold Paillier encryption is semantically secure and that the zero-knowledge proofs cannot be forged, the PPSI protocol in Fig. 1 is a secure protocol $\Pi$ in the sense of Definition 2, which securely solves the problem of PPSI when the number of malicious parties is arbitrary.

Proof Suppose $A$ and $B$ are adversarial strategies in the real and ideal model respectively, and they control the same set of parties $P_I$ ($1 \le |I| \le N-1$) during the executions. $A$ denotes the adversarial strategy in the PPSI protocol in Fig. 1. We need to prove that the views of $A$ and $B$ are computationally indistinguishable, in order to prove that the two joint executions satisfy $\{\mathrm{IDEAL}_{f,I,B}(\overline{T})\} \stackrel{c}{\equiv} \{\mathrm{REAL}_{\Pi,I,A}(\overline{T})\}$.

Due to the semantic security of Paillier's cryptosystem, $A$ cannot learn the plaintexts of the encryptions received from the honest parties. By Lemma 1, $A$ cannot learn more elements than $\mathit{TI}$ on the honest parties. By the security of the zero-knowledge proofs, $A$ can neither extract information other than the statements in each zero-knowledge proof, nor convince the honest parties of any false statement.

Next we analyze the view of $B$. In the ideal model, the honest parties (denoted by $P_{\bar{I}}$, $\bar{I} = \{1, \ldots, N\} \setminus I$) and the malicious parties controlled by $B$ compute $f$ with the help of the trusted third party (TTP). $B$ can be constructed using $A$ as a subroutine as follows:

1) Computing $F$:

1.1) $B$ invokes $A$. $A$ intentionally generates $f_j$ for each party $P_j$ it controls, and sends $E(f_j)$ to $B$. $B$ sets the leading coefficient of $E(f_j)$ to $E(1)$, and emulates the proof $PK\{t_{j,S-1} : E(t_{j,S-1})\}$ to check whether a verifier would be convinced that $P_j$ knows the plaintext of $E(t_{j,S-1})$. If the verifier would be convinced, $B$ sends $f_j$ to the TTP; otherwise he aborts. The honest parties in $P_{\bar{I}}$ send their polynomials directly to the TTP.
1.2) The TTP encrypts all $f_i$ for $i = 1, \ldots, N$ and sends them to the honest parties and $B$. $B$ invokes $A$ again and forwards all $E(f_i)$ to $A$. For $i = 1, \ldots, N$, $A$ generates a random polynomial $r_{i,j}$ for each party $P_j$ it controls, computes $E(f_i r_{i,j})$ and sends them to $B$. $B$ sends all $E(f_i r_{i,j})$ to the TTP. Each honest party $P_j$ in $P_{\bar{I}}$ generates a random polynomial $r_{i,j}$ for $i = 1, \ldots, N$ and sends all $f_i r_{i,j}$ directly to the TTP.

1.3) The TTP decrypts the $E(f_i r_{i,j})$ received from $B$, sums $f_i r_{i,j}$ for $j = 1, \ldots, N$, and gets $F = (f_1 \sum_{j=1}^{N} r_{1,j}, \ldots, f_N \sum_{j=1}^{N} r_{N,j})$.

2) Computing $G$: The TTP sets $R_0$ to be an identity matrix. For $j = 1, \ldots, N$, the TTP computes $G_j = F R_0 \cdots R_{j-1}$, and sends $E(G_j)$ to $P_j$. The following two steps are repeated:

2.1) If $P_j$ is controlled by $B$, $B$ invokes $A$. $A$ generates a random matrix $R_j$ for $P_j$, and sends $\mathcal{R}_j = E(R_j)$ and $\mathcal{D}_j = E(\det(R_j))$ to $B$. $B$ checks whether a verifier would be convinced that $R_j$ is nonsingular by emulating the proof $PK\{R_j : \mathcal{D}_j \ne E(0) \wedge \mathcal{D}_j = E(\det(R_j)) \wedge \mathcal{R}_j = E(R_j)\}$. Then $A$ computes $\mathcal{G}_{j+1} = E(F R_0 \cdots R_j)$, and sends it to $B$. $B$ checks whether a verifier would be convinced that the matrix multiplication is correct by emulating the proof $PK\{R_j : \mathcal{G}_{j+1} = E(F R_0 \cdots R_j) \wedge \mathcal{G}_j = E(F R_0 \cdots R_{j-1}) \wedge \mathcal{R}_j = E(R_j)\}$. If the verifier would be convinced by both proofs, $B$ sends $R_j$ to the TTP; otherwise he aborts.

2.2) If $P_j$ is an honest party, he generates a nonsingular matrix $R_j$ and sends it directly to the TTP.

3) Evaluation: Every $P_i$ sends its inputs to the TTP. The TTP evaluates each $T(i, j)$ in $G$, and sends $G(T(i, j))$ back to each party. If $G(T(i, j))$ is a zero vector, $P_i$ determines $T(i, j) \in \mathit{TI}$; otherwise, it determines $T(i, j) \notin \mathit{TI}$.

According to the above procedure, under the assumption that the Paillier encryption is semantically secure and the zero-knowledge proofs cannot be forged, the view of $B$ is computationally indistinguishable from the view of $A$, so the joint executions satisfy $\{\mathrm{IDEAL}_{f,I,B}(\overline{T})\} \stackrel{c}{\equiv} \{\mathrm{REAL}_{\Pi,I,A}(\overline{T})\}$.
7 Comparisons with Related Work

7.1 Complexity of PPSI Protocol

7.1.1 Complexity without Zero-knowledge Proofs

Each Paillier encryption and decryption requires a cost of $2 \lg \mathcal{N}$ mod-muls (modular multiplications mod $\mathcal{N}^2$). Each exponentiation has the same cost as an encryption. The length of each encryption is $2 \lg \mathcal{N}$ bits. We first leave the zero-knowledge proofs aside, and measure the computation cost in mod-muls and the communication cost in bits.

1) Computation Cost: In step 1.1) of our protocol, each $P_i$ computes $S \cdot 2\lg\mathcal{N}$ mod-muls for $E(f_i)$. In step 1.2), each party $P_j$ computes $(S+1)(d+1) \cdot 2\lg\mathcal{N} + Sd$ mod-muls for $E(f_i r_{i,j})$, for $i = 1, \ldots, N$. In step 1.3), each $P_i$ computes $(N-1)(S+d+1)$ mod-muls to get the sum $E(f_i \sum_{j=1}^{N} r_{i,j})$. In step 2.2), each party computes $(S+d+1)^2 \cdot 2\lg\mathcal{N} + (N-1)(S+d+1)$ mod-muls for $G_i$. In step 3.1), each party computes $(S+d+1)S \cdot 2\lg\mathcal{N} + S(S+d)$ mod-muls for the evaluations. In step 3.2), each party computes $S \cdot 2\lg\mathcal{N}$ mod-muls for the decryptions. Thus in total each party needs $[S + (S+1)(d+1) + (S+d+1)^2 + (S+d+2)S] \cdot 2\lg\mathcal{N} + Sd + (N-1)(S+d+1) + (N-1)(S+d+1) + S(S+d)$ mod-muls. By Lemma 1, $d$ can be denoted by $O(Sc)$, so the computation complexity of each party is $O(cS^2 \lg\mathcal{N})$. Because the computations in each step can be executed in parallel on all parties, the total computation of one party represents the computation complexity of the whole protocol.
2) Communication Cost: In step 1.1) each party sends $S$ encryptions. In step 1.2) each party sends $(N-1)(S+d+1)$ encryptions. In step 1.3) each party sends $(S+d+1)$ encryptions to $P_1$. In step 2.2) each party sends $(S+d+1)$ encryptions. Thus the total communication cost of all parties is $O(c^2 S \lg\mathcal{N})$ bits.
7.1.2 Complexity with Zero-knowledge Proofs
The proofs in Section 4.3 are built from basic blocks such as the proof of knowing the plaintext, the proof of correct multiplication, and the private equality test, all of which have a computation cost of $O(\lg\mathcal{N})$ mod-muls and a communication cost of $O(\lg\mathcal{N})$ bits, according to [2] and [10]. The major cost is in Step 3.1), which needs a proof of size $O(cS^2)$ for each party. Thus the protocol has a computation cost of $O(cS^2 \lg\mathcal{N})$ and a communication cost of $O(c^2 S^2 \lg\mathcal{N})$ bits.
7.1.3 Complexity after Speedup
Speedup techniques can be employed in the PPSI protocol.
If all parties are sure that any coalition contains at most $c$ ($1 \le c \le N-1$) malicious parties, in Step 1) of the protocol each $E(f_i)$ can be randomized as $E(f_i \sum_{j=1}^{c+1} r_{i,j})$ by sending $E(f_i)$ to any $c$ parties, instead of to all the other $N-1$ parties. In Step 2), $E(G)$ can be computed as $E(F \prod_{i=1}^{c+1} R_i)$. Then the computation cost can be reduced to $O(c^2 S^2 \lg\mathcal{N})$.
7.2 Related Work in [12]
In Kissner's protocol for PPSI ([12]), a single polynomial $F = \sum_{l=1}^{N} f_l \sum_{k=1}^{N} r_{l,k}$ is constructed and evaluated on each $T(i, j)$. Here $f_l$ is a polynomial representing the elements on $P_l$, and $r_{l,k}$ is a polynomial uniformly selected by $P_k$. If $F(T(i, j)) = 0$ then $P_i$ determines $T(i, j) \in TI$; otherwise he determines $T(i, j) \notin TI$. The computation and communication costs after speedup are shown in Table 1 for comparison with our PPSI protocol.

Table 1. Comparisons of solutions for the PPSI problem in the malicious model
                    Computation Cost               Communication Cost
Our Protocol        $O(c^2 S^2 \lg\mathcal{N})$    $O(c^2 S^2 \lg\mathcal{N})$
Protocol in [12]    $O(N^2 S^2 \lg\mathcal{N})$    $O(N^2 S^2 \lg\mathcal{N})$

In Kissner's protocol, wrong determinations happen in two kinds of instances:
1) If $\exists\, l \neq i$ such that $\sum_{k=1}^{N} r_{l,k}(T(i, j)) = 0$, then whether $f_l(T(i, j)) = 0$ is uncertain. Each $\sum_{k=1}^{N} r_{l,k}(T(i, j))$ can be regarded as a random number over $Z_{\mathcal{N}}$, so the probability that $\forall\, l \neq i$, $\sum_{k=1}^{N} r_{l,k}(T(i, j)) \neq 0$ is $(1 - \frac{1}{\mathcal{N}})^{N-1}$, and the probability that $\exists\, l \neq i$, $\sum_{k=1}^{N} r_{l,k}(T(i, j)) = 0$ is $1 - (1 - \frac{1}{\mathcal{N}})^{N-1}$.
2) Even if $\forall\, l \neq i$, $\sum_{k=1}^{N} r_{l,k}(T(i, j)) \neq 0$, $F(T(i, j))$ can be regarded as a random number over $Z_{\mathcal{N}}$, and it is possible that the linear combination of the nonzero $f_l(T(i, j))$ is zero. The probability that $\forall\, l \neq i$, $f_l(T(i, j)) \neq 0$ but $F(T(i, j)) = 0$ is $(1 - \frac{1}{\mathcal{N}})^{N-1} \cdot \frac{1}{\mathcal{N}}$.
Thus, the probability of wrongly determining $T(i, j) \in TI$ is $1 - (1 - \frac{1}{\mathcal{N}})^{N-1} + (1 - \frac{1}{\mathcal{N}})^{N-1} \cdot \frac{1}{\mathcal{N}}$, and the correctness probability is its complement, $(1 - \frac{1}{\mathcal{N}})^{N-1}(1 - \frac{1}{\mathcal{N}}) = (\frac{\mathcal{N}-1}{\mathcal{N}})^{N}$.
8 Concluding Remarks
We address the problem of Privacy Preserving Set Intersection (PPSI) among $N$ parties.
The problem is solved by constructing polynomials that represent the elements in the set intersection, and evaluating these polynomials to determine whether an element is in the set intersection, without publishing the dataset of any party. Our protocol is proved to be correct with overwhelming probability, and secure in the malicious model assuming a coalition of an arbitrary number $c$ ($1 \le c \le N-1$) of malicious parties. In comparison with the related work in [12], our protocol has lower computation cost and achieves a higher correctness probability in its determinations.

References
[1] F. Boudot, B. Schoenmakers and J. Traoré, A Fair and Efficient Solution to the Socialist Millionaires' Problem, in Discrete Applied Mathematics, 111(1-2), pp. 23-36, 2001.

[2] R. Cramer, I. Damgård, and J. Nielsen, Multiparty Computation from Threshold Homomorphic Encryption, in Advances in Cryptology - EUROCRYPT 2001, LNCS, Springer, vol. 2045, pp. 280-300, 2001.
[3] A. De Santis, G. Di Crescenzo, G. Persiano, and M. Yung, On Monotone Formula Closure of SZK, in Proc. of the 35th Annual Symposium on Foundations of Computer Science, pp. 454-465, IEEE Computer Society, 1994.
[4] R. Fagin, M. Naor, and P. Winkler, Comparing Information without Leaking It, in Communications of the ACM, 39(5): 77-85, 1996.
[5] P. Fouque, G. Poupard and J. Stern, Sharing Decryption in the Context of Voting or Lotteries, in Proc. of the 4th International Conference on Financial Cryptography, pp. 90-104, 2000.
[6] P. Fouque and D. Pointcheval, Threshold Cryptosystems Secure against Chosen-ciphertext Attacks, in Proc. of Asiacrypt 2001, pp. 351-368, 2001.
[7] M. Freedman, K. Nissim and B. Pinkas, Efficient Private Matching and Set Intersection, in Proc. of Eurocrypt '04, LNCS, Springer, vol. 3027, pp. 1-19, 2004.
[8] O. Goldreich, Foundations of Cryptography: Volume 2, Basic Applications, Cambridge University Press, 2004.
[9] O. Goldreich, S. Micali, and A. Wigderson, How to Play Any Mental Game, in Proc. of 19th STOC, pp. 218-229, 1987.
[10] M. Jakobsson and A. Juels, Mix and Match: Secure Function Evaluation via Ciphertexts, in ASIACRYPT 2000, pp. 162-177, 2000.
[11] L. Kissner and D. Song, Privacy-Preserving Set Operations, in Advances in Cryptology - CRYPTO 2005, LNCS, Springer, vol. 3621, pp. 241-257, 2005.
[12] L. Kissner and D. Song, Privacy-Preserving Set Operations, in Technical Report CMU-CS-05-113, Carnegie Mellon University, June 2005.
[13] Y. Lindell, Parallel Coin-Tossing and Constant-Round Secure Two-Party Computation, in Journal of Cryptology, 16(3): pp. 143-184, 2003.
[14] H. Lipmaa, Verifiable Homomorphic Oblivious Transfer and Private Equality Test, in Advances in Cryptology - ASIACRYPT 2003, pp. 416-433, 2003.
[15] M. Naor and B. Pinkas, Oblivious Transfer and Polynomial Evaluation, in Proc. of the 31st Annual ACM Symposium on Theory of Computing, pp. 245-254, 1999.
[16] P. Paillier, Public-key Cryptosystems based on Composite Degree Residuosity Classes, in Proc. of EUROCRYPT 1999, pp. 573-584, 2000.
[17] D. Randall, Efficient Generation of Random Nonsingular Matrices, in Random Structures and Algorithms, vol. 4(1), pp. 111-118, 1993.
[18] Y. Sang, H. Shen, Y. Tan and N. Xiong, Efficient Protocols for Privacy Preserving Matching Against Distributed Datasets, accepted by the 8th International Conference on Information and Communications Security (ICICS '06), LNCS, 2006.
[19] A.C. Yao, Protocols for Secure Computations, in Proc. of the 23rd Annual IEEE Symposium on Foundations of Computer Science, pp. 160-164, 1982.