A Privacy Preserving Markov Model for Sequence Classification


Suxin Guo
Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, U.S.A.

Sheng Zhong
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Aidong Zhang
Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, U.S.A.

ABSTRACT

Sequence classification has attracted much interest in recent years because it differs from traditional classification tasks and has wide applications in many fields, such as bioinformatics. As it is not easy to define specific features for sequence data as in traditional feature based classification, many methods have been developed to utilize the particular characteristics of sequences. One common way of classifying sequence data is to use probabilistic generative models, such as the Markov model, to learn the probability distribution of sequences in each class. One thing that should be considered in research on sequence classification is privacy. In many cases, especially in the bioinformatics field, the sequence data contains sensitive information, which obstructs the mining of the data. For example, the DNA and protein sequences of individuals are highly sensitive and should not be released without protection. In the real world, however, data is usually distributed among different parties, and training only on their own data may not give the parties strong enough models. This raises a problem when several parties, each holding a set of sequences, want to learn Markov models on the union of their data but do not want to reveal their data to others because of privacy concerns. In this paper, we address this problem and propose a method to train Markov models, from the first order up to order k where k > 1, on sequence data distributed among parties without revealing each party's private sequences to the others. We apply homomorphic encryption to protect the sensitive information.
Categories and Subject Descriptors
E.3 [Data]: Data Encryption - public key cryptosystems; I.5.2 [Pattern Recognition]: Design Methodology - classifier design and evaluation; J.3 [Computer Applications]: Life and Medical Science - biology and genetics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
BCB '13, September, Washington, DC, USA. Copyright 2013 ACM /13/09...$

General Terms
Algorithms, Security

Keywords
Data Security, Markov Model, Sequence Classification

1. INTRODUCTION

Sequence classification has been a hot topic in many fields, including bioinformatics, text mining, speech recognition and others. In this work, we focus on the classification of symbolic sequences, where a sequence, such as a DNA or protein sequence, is an ordered list of symbols drawn from a finite alphabet. For example, protein sequences are composed of symbols from an alphabet of 20 amino acids. The task of sequence classification is to train a classifier that assigns class labels to sequences.

In traditional feature based classification tasks, a sample typically has a vector of features that represents it. For sequence data, however, the features are not explicit, so traditional feature based classifiers cannot be used directly. Many methods have been developed to take advantage of the particular characteristics of sequences to improve the classification performance. One common way of doing this is to train probabilistic generative models on sequence data.
The Markov model is one of the most popular of these models because it captures the probability distribution of each class well, and also because of its cost efficiency and decent accuracy.

One problem in sequence classification is that privacy must be taken into consideration, especially in the bioinformatics field. DNA and protein sequences are usually collected from particular individuals and thus contain sensitive information about those people, such as genetic markers for diseases [19]. Because of this, the sequences are mostly anonymized. However, even after anonymization the sequences still face the threat of re-identification. For example, in many cases a sequence can be de-anonymized and linked to its human contributor by the recognition of certain markers [19].

The same type of sequence data is usually generated or collected by more than one organization, so the data is likely to be distributed among different organizations. If the organizations use only their own data to learn the classifiers, there is no privacy violation. But this is not practical, because with limited data the learned model may not be strong enough. A more reasonable way is for the organizations to collaborate and learn the models on the union of their data. This is where the privacy problem arises, because no one is willing to reveal his or her data to others. For example, since various kinds of cancer are related to mutations in human protein sequences, medical institutes may collect both normal and mutated protein sequences from their patients and volunteer donors so that they can learn on the data. Then, for newly arriving sequences, the institutes can identify whether they contain mutations. Obviously, it is better for the institutes to cooperate with each other and learn on the union of their data, because they can get stronger models in this way. But the private information within their sequences may stop them from sharing the data.

In this paper, we address the problem of learning Markov models on sequence data distributed among different parties with privacy concerns, and propose a method to learn the models without revealing each party's sequences. We not only deal with ordinary Markov models of the first order, but also extend the method to preserve privacy for Markov models of order k, where k is larger than one. We make this extension because [2] has shown that Markov models of higher order can improve the accuracy of sequence classification.

The rest of this paper is organized as follows. We present the related work in Section 2 and the technical preliminaries in Section 3, which includes the background knowledge about the Markov model and the cryptographic tools we need. The details of our method are explained in Section 4. In Section 5 we show the experimental results, and finally Section 6 concludes the paper.

2. RELATED WORK

In recent decades, people have gradually become aware of the privacy problems that lie in data analysis methods. Many data mining and machine learning algorithms have been extended to be privacy preserving. Most of these approaches fall into two categories.
The methods in the first category protect privacy by data perturbation techniques, such as randomization [1, 17], rotation [4] and resampling [16]. As the original data is perturbed, methods of this kind usually suffer a certain accuracy loss. The approaches in the second category apply cryptographic techniques to protect data during the computations [23, 10]. As the sensitive information is encrypted rather than changed in these approaches, there is typically no accuracy loss. Our work follows the second approach and applies homomorphic encryption to protect data.

In the cryptographic category, some secure building blocks are very commonly used, such as secure sum [5], secure comparison [6, 7], secure division [8], secure scalar product [5, 8, 12], secure matrix multiplication [ ], etc. The data mining and machine learning algorithms that have been enhanced with privacy solutions include decision tree classification [23, 28], k-means clustering [30, 18], gradient descent methods [31] and others.

Our work is not the first to consider the privacy problem of the Markov model. [22] proposed a method to outsource Markov chains securely without revealing sensitive information. But our problem setting is different from theirs. They consider the scenario in which the Markov model has already been learned and is known to one party. Another party has the test queries, which are to be tested against the model. Both parties encrypt their own information and send it to an untrusted server, which performs the testing procedure securely. In our case, by contrast, the Markov model is not known at the beginning, and our goal is to learn the model with training data distributed among different parties. All the computations are done by the data owners, not by a server. Hence the method from [22] cannot be directly applied to our setting.

3. TECHNICAL PRELIMINARIES

3.1 Markov Model For Sequence Classification

We briefly introduce the Markov model and how it is used for sequence classification. We start with the ordinary first order Markov model and then explain the model of order k, where k > 1.

3.1.1 Markov Model of the First Order

We have a set of states of size $m$, denoted by $\Sigma$, which we can consider as an alphabet. A Markov chain is a sequence of states, each drawn from the state alphabet, with the Markov property: each state depends only on its previous state and not on any others. For a sequence $S$ of length $n$, we denote the $i$-th element of $S$ by $S_i$ and the value of the $i$-th element by $s_i$ [2], so that each $s_i$ is from the state alphabet. With the Markov property, we have:

$$P(S_{i+1} = s_{i+1} \mid S_i = s_i, S_{i-1} = s_{i-1}, \ldots, S_1 = s_1) = P(S_{i+1} = s_{i+1} \mid S_i = s_i).$$

That is, the probability of state $s_{i+1}$ given all the previous states is the same as the probability of state $s_{i+1}$ given only state $s_i$. Thus the probability of sequence $S$ is:

$$P(S) = P(S_n = s_n \mid S_{n-1} = s_{n-1}) \, P(S_{n-1} = s_{n-1} \mid S_{n-2} = s_{n-2}) \cdots P(S_1 = s_1) = P(S_1 = s_1) \prod_{i=2}^{n} P(S_i = s_i \mid S_{i-1} = s_{i-1}).$$

We simplify the notation of the above equation as follows:

$$P(S) = P(s_n \mid s_{n-1}) \, P(s_{n-1} \mid s_{n-2}) \cdots P(s_1) = P(s_1) \prod_{i=2}^{n} P(s_i \mid s_{i-1}). \tag{1}$$

To train the Markov models for sequence classification, each element in the alphabet of the sequences is considered as a state. For example, in the classification of protein sequence data, each of the 20 amino acids is treated as a state, and $\Sigma$ is the set of amino acids, of size 20. We need to calculate the following probabilities. For any state $s_a$ in the state alphabet, its prior probability is:

$$P(s_a) = \frac{\mathrm{count}(s_a)}{\sum_{s_j \in \Sigma} \mathrm{count}(s_j)},$$

where $\mathrm{count}(s_a)$ is the number of times $s_a$ appears in the training set and $\sum_{s_j \in \Sigma} \mathrm{count}(s_j)$ is the total number of times that all the states in the alphabet appear in the
training set, which in this case is the total size of the training set, that is, the sum of the lengths of all the sequences in the set.

For any two states $s_a$ and $s_b$ in the state alphabet, the transition probability that $s_b$ occurs given $s_a$ is:

$$P(s_b \mid s_a) = \frac{\mathrm{count}(s_a s_b)}{\mathrm{count}(s_a)},$$

where $\mathrm{count}(s_a s_b)$ is the number of times $s_a$ is followed by $s_b$ in the training set.

After we calculate the transition probability of every pair of states in the alphabet $\Sigma$, we get an $m \times m$ transition matrix, and the training process is complete. To test a sequence against a Markov model and examine how likely it is that the sequence was generated from the model, we just need to follow Equation 1 and calculate the product of all the needed probabilities, which can be found in the transition matrix and the priors.

For each class, such a Markov model is trained on the data of that class. Then, to test a sequence and identify its class label, we calculate its probability under every class model and assign it to the class with the highest probability.

3.1.2 Markov Model of Order k

The Markov model of order k is an extension of the Markov model of order 1 in which each state depends on its previous $k$ states, not just one. With this extension, the probability of sequence $S$ becomes:

$$P(S) = P(s_1 s_2 \ldots s_k) \prod_{i=k+1}^{n} P(s_i \mid s_{i-1} s_{i-2} \ldots s_{i-k}).$$

In this case, the priors are not the probabilities of every single state but the probabilities of every $k$-gram. Here a $k$-gram means an ordered combination of $k$ symbols from the alphabet. For any $k$-gram $s_1 \ldots s_k$, its prior probability is:

$$P(s_1 \ldots s_k) = \frac{\mathrm{count}(s_1 \ldots s_k)}{\mathrm{count}(\text{all } k\text{-grams})},$$

where $\mathrm{count}(s_1 \ldots s_k)$ is the number of times that the $k$-gram $s_1 \ldots s_k$ appears in the training set at any position, and $\mathrm{count}(\text{all } k\text{-grams})$ is the total number of times that all possible $k$-grams appear in the training set.

For any state $s_a$ and any $k$-gram $s_1 \ldots s_k$, the transition probability that $s_a$ occurs given $s_1 \ldots s_k$ is:

$$P(s_a \mid s_1 \ldots s_k) = \frac{\mathrm{count}(s_1 \ldots s_k s_a)}{\mathrm{count}(s_1 \ldots s_k)},$$

where $\mathrm{count}(s_1 \ldots s_k s_a)$ is the number of times $s_1 \ldots s_k$ is followed by $s_a$ in the training set.

In this case, the transition matrix is not of size $m \times m$ but of size $m^k \times m$, because the number of all possible $k$-grams is $m^k$. The rest of the procedure is the same as for the first order Markov model. We also train a model for each class, and the test sequences are tested against every class model.

3.2 Privacy Protection of the Markov Model

We assume that each party has a set of sequences and that the parties want to learn the Markov models collaboratively on the union of their data. We develop our secure solution under the semi-honest model, which is widely used in articles of this area [ ]. In this model, the parties are assumed to be honest but curious: they follow the protocols correctly, but they try to derive the private information of others from the intermediate results they obtain during the execution of the protocols. This is a reasonable assumption in privacy preserving data mining, because the goal of all the parties is to get accurate mining results, so they are not willing to corrupt the protocols and get invalid results.

3.3 Cryptographic Tools

3.3.1 Homomorphic Cryptographic Scheme

In this paper, we apply an additive homomorphic asymmetric cryptographic system to perform the encryptions and decryptions of the data. In an asymmetric cryptographic system, we have a pair of keys: a public key for encryption and a private key for decryption. We denote the encryption of integer $x_1$ by $E(x_1)$. A cryptographic scheme is additive homomorphic if there are operators $\oplus$ and $\otimes$ such that for any two integers $x_1$, $x_2$ and any constant $a$, we have

$$E(x_1 + x_2) = E(x_1) \oplus E(x_2), \qquad E(a \times x_1) = a \otimes E(x_1).$$
This means that with an additive homomorphic cryptographic system, we can compute the encrypted sum of integers directly from the encryptions of these integers.

3.3.2 ElGamal Cryptographic System

There are several additive homomorphic cryptographic schemes [32, 26]. In this work, we apply a variant of the ElGamal scheme [11], which is semantically secure under the Diffie-Hellman Assumption [3].

The ElGamal cryptographic system is a multiplicative homomorphic asymmetric cryptographic system. With this system, the encryption of an integer $f$ is the pair

$$E(f) = (f \cdot y^r, \; g^r),$$

where $g$ is a generator, $x$ is the private key, $y$ is the public key with $y = g^x$, and $r$ is a random integer. We call the first part of the pair $c_1$ and the second part $c_2$, so that $c_1 = f \cdot y^r$ and $c_2 = g^r$. To decrypt $E(f)$, we compute $s = c_2^x = g^{rx} = g^{xr} = y^r$. Then we compute $c_1 \cdot s^{-1} = f \cdot y^r \cdot y^{-r} = f$ and obtain the cleartext $f$.

In the variant of the ElGamal scheme we use, the integer $f$ is encrypted as

$$E(f) = (g^f \cdot y^r, \; g^r).$$

The only difference between the original ElGamal scheme and this variant is that $f$ in the first part is changed to $g^f$. With this change, the variant is an additive homomorphic cryptosystem such that

$$E(x_1 + x_2) = E(x_1) \times E(x_2), \qquad E(a \times x_1) = E(x_1)^a,$$

where ciphertexts are multiplied and exponentiated componentwise.

To decrypt $E(f)$, we follow the same procedure as in the original ElGamal algorithm. Because of the change, however, the above decryption yields $g^f$ instead of $f$. To obtain $f$ from $g^f$, we have to perform an exhaustive search, that is, try every possible $f$ and look for the one that matches $g^f$.
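The additive variant described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Crypto++ implementation: the group parameters `P`, `Q` and `G` are toy values chosen here for readability and are far too small to be secure, and the function names (`keygen`, `encrypt`, `add`, `scale`, `decrypt`) and the search bound `max_plain` are hypothetical.

```python
import secrets

# Toy group parameters (assumptions for illustration only; far too small to be secure).
P = 2039   # safe prime, P = 2 * Q + 1
Q = 1019   # prime order of the subgroup generated by G
G = 4      # 4 = 2^2 is a quadratic residue mod P, so it generates the order-Q subgroup


def keygen():
    """Return (x, y): private key x and public key y = g^x."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(f, y):
    """E(f) = (g^f * y^r, g^r) for a fresh random r."""
    r = secrets.randbelow(Q - 1) + 1
    return (pow(G, f, P) * pow(y, r, P)) % P, pow(G, r, P)


def add(ca, cb):
    """E(x1 + x2): multiply two ciphertexts componentwise."""
    return (ca[0] * cb[0]) % P, (ca[1] * cb[1]) % P


def scale(c, a):
    """E(a * x1): raise both ciphertext components to the constant a."""
    return pow(c[0], a, P), pow(c[1], a, P)


def decrypt(c, x, max_plain):
    """Compute g^f = c1 * s^(-1), then search f in [0, max_plain] (max_plain < Q)."""
    s = pow(c[1], x, P)                      # s = c2^x = y^r
    g_f = (c[0] * pow(s, P - 2, P)) % P      # s^(-1) via Fermat's little theorem
    candidate = 1                            # g^0
    for f in range(max_plain + 1):
        if candidate == g_f:
            return f
        candidate = (candidate * G) % P
    raise ValueError("plaintext outside the searched range")
```

With `x, y = keygen()`, the call `decrypt(add(encrypt(3, y), encrypt(4, y)), x, 20)` returns 7 and `decrypt(scale(encrypt(5, y), 3), x, 20)` returns 15. The linear scan in `decrypt` is the exhaustive search the text mentions, which is why the plaintext range has to stay small.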

Note that the time needed for this exhaustive search is reasonable, because we only need to search all possible values of the plaintext, which is not a big range in our case.

We assume that the private key is additively shared by all the parties and that no party knows the complete private key. The parties need to coordinate with each other to perform decryptions, and the ciphertexts can be exposed to every party because no party can decrypt them without the help of the others. The private key is shared in this way: suppose there are two parties, A and B. A has one part of the private key, $x_A$, and B has the other part, $x_B$, such that $x_A + x_B = x$, where $x$ is the complete private key. In the decryption, we need to compute

$$s = c_2^{x} = c_2^{x_A + x_B} = c_2^{x_A} \cdot c_2^{x_B}.$$

Party A calculates $s_A = c_2^{x_A}$ and party B calculates $s_B = c_2^{x_B}$, so that $s = s_A \cdot s_B$. We then need

$$c_1 \cdot s^{-1} = c_1 \cdot (s_A \cdot s_B)^{-1} = c_1 \cdot s_A^{-1} \cdot s_B^{-1}.$$

Party A computes $c_1 \cdot s_A^{-1}$ and sends it to party B. Then party B computes $c_1 \cdot s_A^{-1} \cdot s_B^{-1} = c_1 \cdot s^{-1} = g^f$ and sends it to A. In this way, both parties get the decrypted result. Since party B performs its part of the decryption last, it obtains the final result first. If it does not send the result to A, the decrypted result is known only to party B. The order of the parties in the decryption can be changed, so if the result should be known to only one party, that party should perform its part of the decryption last.

3.3.3 Secure Scalar Product Computation

We apply the secure scalar product protocol in [12] to compute the scalar product of two vectors. Given two $d$-dimensional vectors, $x = (x_1, x_2, \ldots, x_d)$ from party A and $y = (y_1, y_2, \ldots, y_d)$ from party B, the protocol securely computes additive shares $p_A$, held by party A, and $p_B$, held by party B, such that

$$p_A + p_B = x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_d y_d.$$

3.3.4 Secure Logsum Computation

In this work, we also need the secure logsum computation proposed in [27]. The input consists of two $d$-dimensional vectors, $x = (x_1, x_2, \ldots, x_d)$ from party A and $y = (y_1, y_2, \ldots, y_d)$ from party B, such that $x + y = \log z = (\log z_1, \log z_2, \ldots, \log z_d)$, with logarithms taken to base 10. The output consists of two additive shares, $s_A$ held by party A and $s_B$ held by party B, such that

$$s_A + s_B = \log\Big(\sum_{i=1}^{d} z_i\Big) = \log\Big(\sum_{i=1}^{d} 10^{x_i + y_i}\Big).$$

The basic idea of the secure logsum algorithm is as follows. First, party A computes the vector $10^{x - q}$, where $q$ is a random number generated by A, and party B computes the vector $10^{y}$. Second, the two parties apply the secure scalar product protocol to calculate the scalar product of the two vectors $10^{x - q}$ and $10^{y}$. The result $\phi = \sum_{i=1}^{d} 10^{x_i + y_i - q}$ is known only to party B. Finally, party B computes $s_B = \log \phi = \log\big(\sum_{i=1}^{d} 10^{x_i + y_i}\big) - q$ and party A has $s_A = q$, so that $s_A + s_B = \log\big(\sum_{i=1}^{d} 10^{x_i + y_i}\big) = \log\big(\sum_{i=1}^{d} z_i\big)$.

4. PRIVACY PRESERVING MARKOV MODEL FOR SEQUENCE CLASSIFICATION

In this section, we present how to securely learn the Markov models for sequence classification on data distributed between two parties, A and B. The method can be extended to the case in which the number of parties is larger than two; for simplicity, we consider only the two-party case here. We start with the first order Markov model and then extend it to the Markov model of order k, where k > 1.

4.1 Markov Model of the First Order

As mentioned in Section 3, training the Markov model for each class consists of counting the occurrences of single states and of combinations of states in the class, and calculating the prior and transition probabilities. Let $C$ be the set of all class labels, which is of size $l$. Then, for each class value $c_j \in C$, we compute the prior probabilities of states and the transition probabilities.
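The share arithmetic of the secure logsum computation from Section 3.3.4 can be checked with a short plaintext simulation. This is only a sketch of the arithmetic, under stated assumptions: nothing is encrypted here, the scalar product is computed in the clear rather than with the secure scalar product protocol, the function name `logsum_shares` is hypothetical, and the range from which the mask `q` is drawn is an arbitrary choice.

```python
import math
import random


def logsum_shares(x, y, rng=random):
    """Return (s_a, s_b) with s_a + s_b = log10(sum_i 10^(x_i + y_i)).

    Plaintext simulation only: in the protocol, the scalar product below is
    computed with the secure scalar product protocol and revealed only to B.
    """
    q = rng.uniform(0.0, 3.0)                        # A's random mask (arbitrary range)
    vec_a = [10.0 ** (xi - q) for xi in x]           # A's vector 10^(x - q)
    vec_b = [10.0 ** yi for yi in y]                 # B's vector 10^y
    phi = sum(a * b for a, b in zip(vec_a, vec_b))   # phi = sum_i 10^(x_i + y_i - q)
    s_b = math.log10(phi)                            # B's share: log10(sum_i z_i) - q
    s_a = q                                          # A's share
    return s_a, s_b


# Example: z = (100, 900), split arbitrarily so that x + y = log10(z).
x = [0.5, 1.0]
y = [math.log10(100) - 0.5, math.log10(900) - 1.0]
s_a, s_b = logsum_shares(x, y)
total = s_a + s_b   # approximately log10(1000) = 3, up to floating-point error
```

Neither share alone reveals the sum: `s_a` is just the mask, and `s_b` is the logarithm of the sum shifted by that mask.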
For any state $s_a$ in the state alphabet, its prior probability in class $c_j$ is:

$$P(s_a \mid c_j) = \frac{\mathrm{count}(s_a, c_j)}{\mathrm{count}(c_j)},$$

where $\mathrm{count}(s_a, c_j)$ is the number of times $s_a$ appears in the sequences belonging to class $c_j$, and $\mathrm{count}(c_j)$ is the total number of times that all the states in the alphabet appear in the sequences belonging to class $c_j$, which in this case is the sum of the lengths of all the sequences belonging to class $c_j$.

When the data is distributed between parties A and B, we have:

$$\mathrm{count}(s_a, c_j) = \mathrm{count}_A(s_a, c_j) + \mathrm{count}_B(s_a, c_j),$$

where $\mathrm{count}_A(s_a, c_j)$ is the number of times $s_a$ appears in the sequences belonging to class $c_j$ in the data of party A, and $\mathrm{count}_B(s_a, c_j)$ is the corresponding number in the data of party B. To get the total number of occurrences of $s_a$, we add up the numbers of times it appears in the data of both parties. Similarly, we have:

$$\mathrm{count}(c_j) = \mathrm{count}_A(c_j) + \mathrm{count}_B(c_j).$$

So the prior probability of state $s_a$ in class $c_j$ is:

$$P(s_a \mid c_j) = \frac{\mathrm{count}(s_a, c_j)}{\mathrm{count}(c_j)} = \frac{\mathrm{count}_A(s_a, c_j) + \mathrm{count}_B(s_a, c_j)}{\mathrm{count}_A(c_j) + \mathrm{count}_B(c_j)},$$

where $\mathrm{count}_A(s_a, c_j)$ and $\mathrm{count}_A(c_j)$ are held by party A, and $\mathrm{count}_B(s_a, c_j)$ and $\mathrm{count}_B(c_j)$ are held by party B.

Although the two parties can encrypt their own values and exchange them, it is still hard to calculate $P(s_a \mid c_j)$, because an additively homomorphic cryptosystem does not support the secure computation of division between two encrypted integers. So we calculate $\log P(s_a \mid c_j)$ instead of $P(s_a \mid c_j)$, which turns the division into a subtraction:

$$\log P(s_a \mid c_j) = \log\big(\mathrm{count}_A(s_a, c_j) + \mathrm{count}_B(s_a, c_j)\big) - \log\big(\mathrm{count}_A(c_j) + \mathrm{count}_B(c_j)\big).$$

The problem then becomes how to securely calculate $\log(a + b)$, where $a$ is held by party A and $b$ is held by party B. Here we utilize the secure logsum protocol, which takes two $d$-dimensional vectors, $x = (x_1, x_2, \ldots, x_d)$ from party A and $y = (y_1, y_2, \ldots, y_d)$ from party B, as input, where $x + y = \log z = (\log z_1, \log z_2, \ldots, \log z_d)$, and outputs two additive shares, $s_A$ held by party A and $s_B$ held by party B, such that $s_A + s_B = \log\big(\sum_{i=1}^{d} z_i\big) = \log\big(\sum_{i=1}^{d} 10^{x_i + y_i}\big)$. We feed the secure logsum protocol with two 2-dimensional vectors: $x = (\log a, 0)$ from party A and $y = (0, \log b)$ from party B. In this case, $x + y = \log z = (\log z_1, \log z_2) = (\log a, \log b)$. The output of the secure logsum protocol is then $s_A + s_B = \log(z_1 + z_2) = \log(a + b)$.

Following this procedure, $\log\big(\mathrm{count}_A(s_a, c_j) + \mathrm{count}_B(s_a, c_j)\big)$ and $\log\big(\mathrm{count}_A(c_j) + \mathrm{count}_B(c_j)\big)$ are calculated by the two parties with the secure logsum protocol and shared in this way:

$$\log\big(\mathrm{count}_A(s_a, c_j) + \mathrm{count}_B(s_a, c_j)\big) = s_1^A + s_1^B,$$
$$\log\big(\mathrm{count}_A(c_j) + \mathrm{count}_B(c_j)\big) = s_2^A + s_2^B,$$

where $s_1^A$ and $s_2^A$ are held by party A, and $s_1^B$ and $s_2^B$ are held by party B. Then we have:

$$\log P(s_a \mid c_j) = (s_1^A + s_1^B) - (s_2^A + s_2^B) = (s_1^A - s_2^A) + (s_1^B - s_2^B).$$

$s_1^A - s_2^A$ can be computed by party A and $s_1^B - s_2^B$ by party B. The two parties then exchange the two values, and both of them can get $\log P(s_a \mid c_j)$ and calculate $P(s_a \mid c_j)$.

For any two states $s_a$ and $s_b$ in the state alphabet, the transition probability that $s_b$ occurs given $s_a$ in class $c_j$ is:

$$P(s_b \mid s_a, c_j) = \frac{\mathrm{count}(s_a s_b, c_j)}{\mathrm{count}(s_a, c_j)},$$

where $\mathrm{count}(s_a s_b, c_j)$ is the number of times $s_a$ is followed by $s_b$ in the sequences belonging to class $c_j$. Following the same procedure as for the prior probabilities, both parties can get the transition probabilities securely:

$$P(s_b \mid s_a, c_j) = \frac{\mathrm{count}_A(s_a s_b, c_j) + \mathrm{count}_B(s_a s_b, c_j)}{\mathrm{count}_A(s_a, c_j) + \mathrm{count}_B(s_a, c_j)}.$$

With all the prior probabilities and transition probabilities computed, both parties have the Markov models of every class. Since the models are known, every party can test its own sequences against the models by itself. The training process of the privacy preserving Markov model of the first order is summarized in Algorithm 1.

Algorithm 1 Privacy Preserving Markov Model of Order 1
Input: Party A has a set of sequences $D_A$ and party B has a set of sequences $D_B$;
Output: The Markov models of every class, where each model contains the prior probabilities of every state and the transition matrix;
1: for each class $c_j$ do
2:   Party A counts the total number of times that all the states in the alphabet appear in the sequences in $D_A$ that belong to class $c_j$, which is $\mathrm{count}_A(c_j)$, and party B counts $\mathrm{count}_B(c_j)$ from $D_B$ in the same way;
3:   for each state $s_a$ in the state alphabet do
4:     Party A counts the occurrences of $s_a$ in the sequences in $D_A$ that belong to class $c_j$, $\mathrm{count}_A(s_a, c_j)$, and party B counts $\mathrm{count}_B(s_a, c_j)$ from $D_B$ in the same way;
5:     Parties A and B jointly compute the logarithm of the prior probability of $s_a$ in $c_j$, $\log P(s_a \mid c_j)$, from the counts they have with the help of the secure logsum protocol, and then compute $P(s_a \mid c_j)$;
6:     for each state $s_b$ in the state alphabet do
7:       Party A counts the number of times $s_a$ is followed by $s_b$ in the sequences in $D_A$ that belong to class $c_j$, $\mathrm{count}_A(s_a s_b, c_j)$, and party B counts $\mathrm{count}_B(s_a s_b, c_j)$ from $D_B$ in the same way;
8:       Parties A and B jointly compute the logarithm of the transition probability that $s_b$ occurs given $s_a$ in class $c_j$, $\log P(s_b \mid s_a, c_j)$, from the counts they have with the help of the secure logsum protocol, and then compute $P(s_b \mid s_a, c_j)$;
9:     end for
10:   end for
11: end for

4.2 Markov Model of Order k

The training process of the Markov model of order k follows the same pattern as the training process of the Markov model of order 1. For any $k$-gram $s_1 \ldots s_k$, its prior probability in class $c_j$ is:

$$P(s_1 \ldots s_k \mid c_j) = \frac{\mathrm{count}(s_1 \ldots s_k, c_j)}{\mathrm{count}(\text{all } k\text{-grams in } c_j)},$$

where $\mathrm{count}(s_1 \ldots s_k, c_j)$ is the number of times that the $k$-gram $s_1 \ldots s_k$ appears in the sequences belonging to class $c_j$ at any position, and $\mathrm{count}(\text{all } k\text{-grams in } c_j)$ is the total number of times that all possible $k$-grams appear in the sequences belonging to class $c_j$. The two parties can compute the probability securely from their counts with the same method as in the training of the first order Markov model:

$$P(s_1 \ldots s_k \mid c_j) = \frac{\mathrm{count}_A(s_1 \ldots s_k, c_j) + \mathrm{count}_B(s_1 \ldots s_k, c_j)}{\mathrm{count}_A(\text{all } k\text{-grams in } c_j) + \mathrm{count}_B(\text{all } k\text{-grams in } c_j)}.$$

For any state $s_a$ and any $k$-gram $s_1 \ldots$
s k the transition probability that s a happens given s 1... s k in class c j is: P (s a s 1... s k c j) = count(s1... s k s a c j) count(s 1... s k c j) where count(s 1... s k s a c j) is the number of times s 1... s k is followed by s a in the sequences belonging to class c j. The probability can be computed by: P (s a s 1... s k c j) = count A(s 1... s k s a c j) + count B(s 1... s k s a c j). count A(s 1... s k c j) + count B(s 1... s k c j) The training process of the privacy preserving Markov model of order k is summarized in Algorithm EXPERIMENTS The experimental results are presented in this section. All the algorithms are implemented with the Crypto++ library in the C++ language and the communications between parties are implemented with socket API. The experiments are conducted on a Red Hat server with 16 x 2.27 GHz CPUs and 24G of memory. We use two real-world datasets to test our algorithms. The first dataset which is from [25] is a set of inorganic

6 Algorithm 2 Privacy Preserving Markov Model of Order k Input: Party A has a set of sequences D A and party B has a set of sequences D B; Output: The Markov models of every class where each model contains the prior probabilities of every k-gram and the transition matrix; 1: for each class c j do 2: Party A counts the sum of the number of times that all possible k-grams appear in the sequences in D A that belong to class c j which is count A(all k grams in c j) and party B counts count B(all k grams in c j) from D B in the same way; 3: for each possible k-gram s 1... s k do 4: Party A counts the occurrence times of s 1... s k in the sequences in D A that belong to class c j count A(s 1... s k c j) and party B counts count B(s 1... s k c j) from D B in the same way; 5: Parties A and B jointly compute the logarithm of the prior probability of s 1... s k in c j log P (s 1... s k c j) with the counts they have under the help of the secure logsum protocol and then compute P (s 1... s k c j); 6: for each state s a in the state alphabet do 7: Party A counts the number of times s 1... s k is followed by s a in the sequences in D A that belong to class c j count A(s 1... s k s a c j) and party B counts count B(s 1... s k s a c j) from D B in the same way; 8: Parties A and B jointly compute the logarithm of the transition probability that s a happens given s 1... s k in class c j log P (s a s 1... s k c j) with the counts they have under the help of the secure logsum protocol and then compute P (s a s 1... s k c j); 9: end for 10: end for 11: end for accuracy of the privacy preserving approach. The problem is that the cryptosystem we are using only support operations on non-negative integers. But in the secure logsum protocol we need to perform calculations like 10 x q and this may introduce real numbers which are not integers. 
When encrypting such real numbers we need to convert them into integers by multiplying them with a magnitude of 10 and then round the products to integers. After the decryption we divide the numbers by the magnitude to recover the original numbers. This is the only step that causes accuracy loss in this work. It is clear that the larger the magnitude we use the smaller the accuracy loss is. In Table 1 we show how the errors which represent the differences between the results of the privacy preserving approach and the results of the ideal centralized approach reduce when the magnitude increases. There are two kinds of errors: the probability errors and the classification errors. We define a probability error e p to be the relative error between a probability calculated with the privacy preserving approach p 1 and a probability calculated with the centralized approach p 2 such that e p = p 2 p 1 /p 2. For each prior probability and transition probability that is computed in the Markov models we can get such an error. The probability errors in Table 1 are the average of the errors calculated for each dataset and for each magnitude. The classification error is defined as e c = n e/n where n e is the number of test sequences classified to different classes by the two approaches and n is the total number of test sequences. Since each test sequence is classified to the class with the highest probability the comparisons among probabilities rather than the values of the probabilities themselves count more in the classification. Thus the classification result is not affected very much by the probability errors. We have the following observation from our experiment results in Table 1: Although there are probability errors the classification results are always correct in these two datasets. materials binding peptide sequences. It contains 25 quartzbiding peptide sequences each of which is either a strong binder or a weak binder. 
There are 10 strong binder sequences and 15 weak binder sequences. All of the sequences are of the same length which is 12. The second dataset is the SCOP (structural classification of proteins) dataset from [24]. The approach in [21] is applied to preprocess the data and we get proteins from seven families which are A B C D E F and G. Here we pick protein sequences from classes A and G to test our algorithm. Class A contains 23 sequences with different lengths. The length of the shortest one is 160 and the length of the longest one is 177. Class G consists of 20 protein sequences with lengths from 45 to 83. We present the performance of our algorithms in two aspects: the accuracy and the running time. Since our work only focuses on the privacy preservation of the Markov models we evaluate the accuracy of our approach by comparing the result of our privacy preserving approach with the result of the ideal centralized algorithm. Although protecting data with cryptographic approaches should not lose any information and thus the privacy preserving approach should give exactly the same result as the original algorithm there is a practical issue that reduces the Table 1: Errors in the Probabilities and the Classification Results Magnitude Probability Error Classification Error Dataset 1 Dataset 2 Dataset 1 Dataset Table 1 shows that when the magnitude becomes larger the probability errors becomes smaller. For people who want to learn perfect models it seems that a very large magnitude would be a good choice. But there is a problem that large magnitude also causes high computation cost and long training time so we need to find a balance between the accuracy and efficiency. Table 2 presents how the running time increases with the magnitude. We get these time durations by training a first order Markov model for one class on each of the two datasets. The running time is affected not only by the magnitude but also by the order of the Markov models. 
When the value of k increases, the training time of a Markov model of order k also increases. Table 3 shows the training time of a Markov model of order k for the cases k = 1 and k = 2. For each k, the total training time of a model is $t$; the time spent on the secure logsum calculations is denoted by $t_l$ and the time spent on other communications is denoted by $t_c$. All the times in Table 3 are obtained when the magnitude is set to [...].

Table 2: Running Time Affected by the Magnitude

Magnitude    Dataset 1    Dataset 2
[...]        [...]s       70s
[...]        [...]s       72s
[...]        [...]s       84s
[...]        [...]s       208s

Table 3: Running Time Affected by the Order

             Order 1                  Order 2
Dataset      t_l    t_c    t          t_l     t_c     t
Dataset 1    42s    34s    77s        830s    672s    1503s
Dataset 2    50s    34s    84s        841s    673s    1515s

We can see from Table 3 that, when training a Markov model, the time spent on the secure logsum calculations and other communications, $t_l + t_c$, dominates the total time $t$. The time of the local operations, such as the parties computing the counts in their own data, is negligible. Hence the training time depends little on the size of the training data and much more on the value of k and the size of the alphabet m. This is because the number of secure logsum calculations and communications, which dominates the overall time, is determined by the number of probabilities to be computed, including the prior probabilities and the transition probabilities. As can be seen in Section 4, the calculation of each probability requires one secure logsum computation and some communications between parties.

The number of probabilities to be computed is determined by k and m. The number of prior probabilities for a Markov model of order k is $m^k$, because we need to compute a prior probability for each k-gram and the number of all possible k-grams is $m^k$. The number of transition probabilities is $m^k \cdot m = m^{k+1}$, because there is a transition probability for every pair of a k-gram and a state. When the value of k increases, the number of probabilities increases exponentially. Denote the number of probabilities when the value of k is i by $np_i$; then $np_{i+1} = np_i \cdot m$.
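The counting argument above can be checked with a short sketch. The helper name `num_probabilities` is hypothetical; the alphabet size m = 20 and the total times are the values given in the text and in Table 3.

```python
def num_probabilities(m: int, k: int) -> int:
    """Number of probabilities in an order-k Markov model over m symbols."""
    priors = m ** k           # one prior probability per k-gram
    transitions = m ** k * m  # one transition per (k-gram, next state) pair
    return priors + transitions


m = 20  # size of the amino acid alphabet
counts = {k: num_probabilities(m, k) for k in (1, 2, 3)}
print(counts)  # {1: 420, 2: 8400, 3: 168000}

# Each increment of k multiplies the number of probabilities by m.
assert counts[2] == counts[1] * m
assert counts[3] == counts[2] * m

# Total training times t from Table 3, in seconds: (order 1, order 2).
table3_t = {"Dataset 1": (77, 1503), "Dataset 2": (84, 1515)}
for name, (t_order1, t_order2) in table3_t.items():
    print(name, round(t_order2 / t_order1, 1))
```

The measured ratios come out somewhat below m = 20, which is in line with the text's description of the order 2 times as around 20 times their order 1 counterparts.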
Thus the training time when k = i + 1 should also be about m times the training time when k = i. Table 3 supports this conclusion: all the times $t_l$, $t_c$ and $t$ for order 2 are around 20 times their counterparts for order 1, where 20 is the size of the amino acid alphabet shared by the two datasets. On the other hand, when the value of k is fixed, the difference between the times taken to train a model on the two datasets is not significant, though the sizes of the two datasets are very different. This is because the size of the data does not affect the running time as much as the size of the alphabet does.

From the above discussion, we can see that it is not affordable for ordinary computers to train a Markov model of order k when k is very large. This problem lies not only in the privacy preserving solution but also in the original Markov model of order k [2]. Fortunately, the Markov model of order k can give decent accuracy even when k is very small.

6. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a method that enables two parties to securely train Markov models of order k on the union of their sequence data without revealing either party's information to the other. We evaluated the method with two real-world datasets and showed that the information loss in our privacy preserving algorithm is very low. We also analyzed the running time of the algorithm. Although we focus on the sequence classification task here, the proposed privacy preserving Markov model method can be extended and used in other fields, and this will be our future work.

7. ACKNOWLEDGMENTS

The materials published in this paper are partially supported by the National Science Foundation under Grants No. [...], No. [...] and No. [...].

8. REFERENCES

[1] R. Agrawal and R. Srikant. Privacy-Preserving Data Mining.
[2] C. Andorf, A. Silvescu, D. Dobbs, and V. Honavar. Learning classifiers for assigning protein sequences to gene ontology functional families. In Proceedings of the Fifth International Conference On Knowledge Based Computer Systems (KBCS).
[3] D. Boneh. The Decision Diffie-Hellman Problem, volume 1423. Springer-Verlag.
[4] K. Chen and L. Liu. Privacy preserving data classification with rotation perturbation. In Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM '05, Washington, DC, USA. IEEE Computer Society.
[5] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu. Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations Newsletter, 4(2).
[6] I. Damgard, M. Fitzi, E. Kiltz, J. B. Nielsen, and T. Toft. Unconditionally Secure Constant-Rounds Multi-party Computation for Equality, Comparison, Bits and Exponentiation, volume 3876. Springer.
[7] I. Damgard, M. Geisler, and M. Kroigard. Homomorphic encryption and secure comparison. International Journal of Applied Cryptography, 1.
[8] W. Du and M. Atallah. Privacy-Preserving Cooperative Statistical Analysis, page 102. IEEE Computer Society.
[9] W. Du, Y. Y. S. Han, and S. Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification, volume 233. Lake Buena Vista, Florida.
[10] W. Du and Z. Zhan. Building decision tree classifier on private data. Reproduction.
[11] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4).
[12] B. Goethals, S. Laur, H. Lipmaa, and T. Mielikäinen. On private scalar product computation for privacy-preserving data mining. Science, 3506.
[13] O. Goldreich. Foundations of Cryptography, volume 1. Cambridge University Press.
[14] S. Han and W. K. Ng. Privacy-preserving linear fisher discriminant analysis. In Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining, PAKDD '08, Berlin, Heidelberg. Springer-Verlag.
[15] S. Han, W. K. Ng, and P. S. Yu. Privacy-preserving singular value decomposition. IEEE 25th International Conference on Data Engineering.
[16] G. R. Heer. A bootstrap procedure to preserve statistical confidentiality in contingency tables. In Proceedings of the International Seminar on Statistical Confidentiality.
[17] Z. Huang, W. Du, and B. Chen. Deriving private information from randomized data. Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD '05.
[18] G. Jagannathan and R. N. Wright. Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. ACM.
[19] S. Jha, L. Kruger, and V. Shmatikov. Towards practical privacy for genomic computation. IEEE Symposium on Security and Privacy (SP 2008).
[20] M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Transactions on Knowledge and Data Engineering, 16(9).
[21] A. Kumar and L. Cowen. Augmented training of hidden markov models to recognize remote homologs via simulated evolution. Bioinformatics, 25(13).
[22] P. Lin and K. S. Candan. Access-private outsourcing of markov chain and random walk based data analysis applications. In Proceedings of the 22nd International Conference on Data Engineering Workshops.
[23] Y. Lindell and B. Pinkas. Privacy preserving data mining. Journal of Cryptology, 15(3).
[24] A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. Scop: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology.
[25] E. E. Oren, C. Tamerler, D. Sahin, M. Hnilova, U. O. S. Seker, M. Sarikaya, and R. Samudrala. A novel knowledge-based approach to design inorganic-binding peptides. Bioinformatics, 23(21).
[26] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. Computer, 1592.
[27] P. Smaragdis and M. Shashanka. A framework for secure speech recognition. IEEE Transactions On Audio Speech And Language Processing, 15(4).
[28] Z. Teng and W. Du. A hybrid multi-group privacy-preserving approach for building decision trees. In Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining, PAKDD '07, Berlin, Heidelberg. Springer-Verlag.
[29] J. Vaidya and C. Clifton. Privacy-preserving outlier detection, volume 41. IEEE.
[30] J. Vaidya, W. Lafayette, and C. Clifton. Privacy-preserving k-means clustering over vertically partitioned data. Security.
[31] L. Wan, W. K. Ng, S. Han, and V. C. S. Lee. Privacy-preservation for gradient descent methods. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '07.
[32] S. Zhong. Privacy-preserving algorithms for distributed mining of frequent itemsets. Information Sciences, 177(2).


More information

Privacy-preserving weighted Slope One predictor for Item-based Collaborative Filtering

Privacy-preserving weighted Slope One predictor for Item-based Collaborative Filtering Privacy-preserving weighted Slope One predictor for Item-based Collaborative Filtering Anirban Basu 1, Hiroaki Kikuchi 1, and Jaideep Vaidya 2 1 Graduate School of Engineering, Tokai University, 1117,

More information

Cryptographic Protocols Notes 2

Cryptographic Protocols Notes 2 ETH Zurich, Department of Computer Science SS 2018 Prof. Ueli Maurer Dr. Martin Hirt Chen-Da Liu Zhang Cryptographic Protocols Notes 2 Scribe: Sandro Coretti (modified by Chen-Da Liu Zhang) About the notes:

More information

Computers and Mathematics with Applications

Computers and Mathematics with Applications Computers and Mathematics with Applications 61 (2011) 1261 1265 Contents lists available at ScienceDirect Computers and Mathematics with Applications journal homepage: wwwelseviercom/locate/camwa Cryptanalysis

More information

On The Security of The ElGamal Encryption Scheme and Damgård s Variant

On The Security of The ElGamal Encryption Scheme and Damgård s Variant On The Security of The ElGamal Encryption Scheme and Damgård s Variant J. Wu and D.R. Stinson David R. Cheriton School of Computer Science University of Waterloo Waterloo, ON, Canada {j32wu,dstinson}@uwaterloo.ca

More information

Secure Computation of Hidden Markov Models and Secure Floating-Point Arithmetic in the Malicious Model

Secure Computation of Hidden Markov Models and Secure Floating-Point Arithmetic in the Malicious Model Noname manuscript No. (will be inserted by the editor) Secure Computation of Hidden Markov Models and Secure Floating-Point Arithmetic in the Malicious Model Mehrdad Aliasgari Marina Blanton Fattaneh Bayatbabolghani

More information

THE CONJUGACY SEARCH PROBLEM IN PUBLIC KEY CRYPTOGRAPHY: UNNECESSARY AND INSUFFICIENT

THE CONJUGACY SEARCH PROBLEM IN PUBLIC KEY CRYPTOGRAPHY: UNNECESSARY AND INSUFFICIENT THE CONJUGACY SEARCH PROBLEM IN PUBLIC KEY CRYPTOGRAPHY: UNNECESSARY AND INSUFFICIENT VLADIMIR SHPILRAIN AND ALEXANDER USHAKOV Abstract. The conjugacy search problem in a group G is the problem of recovering

More information

What are we talking about when we talk about post-quantum cryptography?

What are we talking about when we talk about post-quantum cryptography? PQC Asia Forum Seoul, 2016 What are we talking about when we talk about post-quantum cryptography? Fang Song Portland State University PQC Asia Forum Seoul, 2016 A personal view on postquantum cryptography

More information

Secure Equality and Greater-Than Tests with Sublinear Online Complexity

Secure Equality and Greater-Than Tests with Sublinear Online Complexity Secure Equality and Greater-Than Tests with Sublinear Online Complexity Helger Lipmaa 1 and Tomas Toft 2 1 Institute of CS, University of Tartu, Estonia 2 Dept. of CS, Aarhus University, Denmark Abstract.

More information

Cryptographic Multilinear Maps. Craig Gentry and Shai Halevi

Cryptographic Multilinear Maps. Craig Gentry and Shai Halevi Cryptographic Multilinear Maps Craig Gentry and Shai Halevi China Summer School on Lattices and Cryptography, June 2014 Multilinear Maps (MMAPs) A Technical Tool A primitive for building applications,

More information

1 Secure two-party computation

1 Secure two-party computation CSCI 5440: Cryptography Lecture 7 The Chinese University of Hong Kong, Spring 2018 26 and 27 February 2018 In the first half of the course we covered the basic cryptographic primitives that enable secure

More information

CS 4770: Cryptography. CS 6750: Cryptography and Communication Security. Alina Oprea Associate Professor, CCIS Northeastern University

CS 4770: Cryptography. CS 6750: Cryptography and Communication Security. Alina Oprea Associate Professor, CCIS Northeastern University CS 4770: Cryptography CS 6750: Cryptography and Communication Security Alina Oprea Associate Professor, CCIS Northeastern University March 26 2017 Outline RSA encryption in practice Transform RSA trapdoor

More information

9 Knapsack Cryptography

9 Knapsack Cryptography 9 Knapsack Cryptography In the past four weeks, we ve discussed public-key encryption systems that depend on various problems that we believe to be hard: prime factorization, the discrete logarithm, and

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 3: Query Processing Query Processing Decomposition Localization Optimization CS 347 Notes 3 2 Decomposition Same as in centralized system

More information

Discrete logarithm and related schemes

Discrete logarithm and related schemes Discrete logarithm and related schemes Martin Stanek Department of Computer Science Comenius University stanek@dcs.fmph.uniba.sk Cryptology 1 (2017/18) Content Discrete logarithm problem examples, equivalent

More information

Solving Systems of Modular Equations in One Variable: How Many RSA-Encrypted Messages Does Eve Need to Know?

Solving Systems of Modular Equations in One Variable: How Many RSA-Encrypted Messages Does Eve Need to Know? Solving Systems of Modular Equations in One Variable: How Many RSA-Encrypted Messages Does Eve Need to Know? Alexander May, Maike Ritzenhofen Faculty of Mathematics Ruhr-Universität Bochum, 44780 Bochum,

More information

Compartmented Secret Sharing Based on the Chinese Remainder Theorem

Compartmented Secret Sharing Based on the Chinese Remainder Theorem Compartmented Secret Sharing Based on the Chinese Remainder Theorem Sorin Iftene Faculty of Computer Science Al. I. Cuza University Iaşi, Romania siftene@infoiasi.ro Abstract A secret sharing scheme starts

More information

The Cramer-Shoup Cryptosystem

The Cramer-Shoup Cryptosystem The Cramer-Shoup Cryptosystem Eileen Wagner October 22, 2014 1 / 28 The Cramer-Shoup system is an asymmetric key encryption algorithm, and was the first efficient scheme proven to be secure against adaptive

More information

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography Peter Schwabe October 21 and 28, 2011 So far we assumed that Alice and Bob both have some key, which nobody else has. How

More information

A Framework for Secure Speech Recognition

A Framework for Secure Speech Recognition A Framework for Secure Speech Recognition Paris Smaragdis *, Senior Member, IEEE and Madhusudana Shashanka, Student Member, IEEE Abstract In this paper we present a process which enables privacy-preserving

More information

Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers

Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers Sarani Bhattacharya and Debdeep Mukhopadhyay Indian Institute of Technology Kharagpur PROOFS 2016 August

More information

Secure and Private Sequence Comparisons

Secure and Private Sequence Comparisons Secure and Private Sequence Comparisons Mikhail J. Atallah Department of Computer Sciences and CERIAS Purdue University West Lafayette, IN 47907 mja@cs.purdue.edu Florian Kerschbaum Department of Computer

More information

CS 282A/MATH 209A: Foundations of Cryptography Prof. Rafail Ostrovsky. Lecture 7

CS 282A/MATH 209A: Foundations of Cryptography Prof. Rafail Ostrovsky. Lecture 7 CS 282A/MATH 209A: Foundations of Cryptography Prof. Rafail Ostrovsky Lecture 7 Lecture date: Monday, 28 February, 2005 Scribe: M.Chov, K.Leung, J.Salomone 1 Oneway Trapdoor Permutations Recall that a

More information

Introduction to Modern Cryptography. Benny Chor

Introduction to Modern Cryptography. Benny Chor Introduction to Modern Cryptography Benny Chor RSA Public Key Encryption Factoring Algorithms Lecture 7 Tel-Aviv University Revised March 1st, 2008 Reminder: The Prime Number Theorem Let π(x) denote the

More information

Lecture 28: Public-key Cryptography. Public-key Cryptography

Lecture 28: Public-key Cryptography. Public-key Cryptography Lecture 28: Recall In private-key cryptography the secret-key sk is always established ahead of time The secrecy of the private-key cryptography relies on the fact that the adversary does not have access

More information

CPSC 467: Cryptography and Computer Security

CPSC 467: Cryptography and Computer Security CPSC 467: Cryptography and Computer Security Michael J. Fischer Lecture 11 October 7, 2015 CPSC 467, Lecture 11 1/37 Digital Signature Algorithms Signatures from commutative cryptosystems Signatures from

More information

CPSC 467b: Cryptography and Computer Security

CPSC 467b: Cryptography and Computer Security CPSC 467b: Cryptography and Computer Security Instructor: Michael Fischer Lecture by Ewa Syta Lecture 13 March 3, 2013 CPSC 467b, Lecture 13 1/52 Elliptic Curves Basics Elliptic Curve Cryptography CPSC

More information

Test Generation for Designs with Multiple Clocks

Test Generation for Designs with Multiple Clocks 39.1 Test Generation for Designs with Multiple Clocks Xijiang Lin and Rob Thompson Mentor Graphics Corp. 8005 SW Boeckman Rd. Wilsonville, OR 97070 Abstract To improve the system performance, designs with

More information