arxiv: v2 [cs.cr] 19 Jan 2011

Size: px
Start display at page:

Download "arxiv: v2 [cs.cr] 19 Jan 2011"

Transcription

1 Interactive Privacy via the Median Mechanism Aaron Roth Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 1517 Tim Roughgarden Stanford University 353 Serra Mall Stanford, CA arxiv: v [cs.cr] 19 Jan 011 ABSTRACT We define a new interactive differentially private mechanism the median mechanism for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for non-interactive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a super-polynomial factor, even in the non-interactive setting. Categories and Subject Descriptors F. [ANALYSIS OF ALGORITHMS AND PROB- LEM COMPLEXITY]: Miscellaneous General Terms Theory, Algorithms Supported in part by an NSF Graduate Research Fellowship. Portions of this work were done while visiting Stanford University. Supported in part by NSF CAREER Award CCF , an ONR Young Investigator Award, an ONR PECASE Award, an AFOSR MURI grant, and an Alfred P. Sloan Fellowship. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. STOC 10, June 5 8, 010, Cambridge, Massachusetts, USA. Copyright 010 ACM /10/06...$ INTRODUCTION Managing a data set with sensitive but useful information, such as medical records, requires reconciling two objectives: providing utility to others, perhaps in the form of aggregate statistics; and respecting the privacy of individuals who contribute to the data set. The field of private data analysis, and in particular work on differential privacy, provides a mathematical foundation for reasoning about this utilityprivacy trade-off and offers methods for non-trivial data analysis that are provably privacy-preserving in a precise sense. For a recent survey of the field, see Dwork [Dwo08]. More precisely, consider a domain X and database size n. A mechanism is a randomized function from the set X n of databases to some range. For a parameter α > 0, a mechanism M is α-differentially private if, for every database D and fixed subset S of the range of M, changing a single component of D changes the probability that M outputs something in S by at most an e α factor. The output of a differentially private mechanism (and any analysis or privacy attack that follows) is thus essentially independent of whether or not a given individual opts in or opts out of the database. Achieving differential privacy requires sufficiently noisy answers [DN03]. For example, suppose we re interested in the result of a query f a function from databases to some range that simply counts the fraction of database elements that satisfy some predicate ϕ on X. A special case of a result in Dwork et al. [DMNS06] asserts that the following mechanism is α-differentially private: if the underlying database is D, output f(d)+, where the output perturbation is drawn from the Laplace distribution Lap( 1 ) with nα density p(y) = nα exp( nα y ). Among all α-differentially private mechanisms, this one (or rather, a discretized analog of it) maximizes user utility in a strong sense [GRS09]. What if we care about more than a single one-dimensional statistic? Suppose we re interested in k predicate queries f 1,...,f k, where k could be large, even super-polynomial in n. A natural solution is to use an independent Laplace perturbation for each query answer [DMNS06]. To maintain α-differential privacy, the magnitude of noise has to scale linearly with k, with each perturbation drawn from Lap( k nα ). Put another way, suppose one fixes usefulness parameters ǫ, δ, and insists that the mechanism is (ǫ, δ)-useful, meaning that the outputs are within ǫ of the correct query answers with probability at least 1 δ. This constrains the magnitude of the Laplace noise, and the privacy parameter α now suffers linearly with the number k of answered queries. This

2 dependence limits the use of this mechanism to a sublinear k = o(n) number of queries. Can we do better than independent output perturbations? For special classes of queries like predicate queries, Blum, Ligett, and Roth [BLR08] give an affirmative answer (building on techniques of Kasiviswanathan et al. [KLN + 08]). Specifically, in [BLR08] the exponential mechanism of Mc- Sherry and Talwar [MT07] is used to show that, for fixed usefulness parameters ǫ, δ, the privacy parameter α only has to scale logarithmically with the number of queries. 1 This permits simultaneous non-trivial utility and privacy guarantees even for an exponential number of queries. Moreover, this dependence on log k is necessary in every differentially private mechanism (see the full version of [BLR08]). The mechanism in [BLR08] suffers from two drawbacks, however. First, it is non-interactive: it requires all queries f 1,...,f k tobegivenupfront, andcomputes(noisy)outputs of all of them at once. By contrast, independent Laplace output perturbations can obviously be implemented interactively, with the queries arriving online and each answered immediately. There is good intuition for why the noninteractive setting helps: outperforming independent output perturbations requires correlating perturbations across multiple queries, and this is clearly easier when the queries are known in advance. Indeed, prior to the present work, no interactive mechanism better than independent Laplace perturbations was known. Second, the mechanism in [BLR08] is inefficient. Here by efficient we mean has running time polynomial in n, k, and X ; Dwork et al. [DNR + 09] prove that this is essentially the best one could hope for (under certain cryptographic assumptions). The mechanism in [BLR08] is not efficient because it requires sampling from a non-trivial probability distribution over an unstructured space of exponential size. Dwork et al. [DNR + 09] recently gave an efficient (non-interactive) mechanism that is better than independent Laplace perturbations, in that the privacy parameter α of the mechanism scales as logk with the numberof queries k (for fixed usefulness parameters ǫ, δ). Very recently, Hardt and Talwar [HT10] gave upper and lower bounds for answering noninteractive linear queries which are tight in a related setting. These bounds are not tight in our setting, however, unless the number of queries is small with respect to the size of the database. When the number of queries is large, our mechanism actually yields error significantly less than required in general by their lower bound 3. This is not a contradiction, because when translated into the setting of [HT10], our database size n becomes a sparsity parameter that is not considered in their bounds. 1 More generally, linearly with the VC dimension of the set of queries, which is always at most log k. Or rather, it computes a compact representation of these outputs in the form of a synthetic database. 3 We give a mechanism for answering k counting queries with coordinate-wise error O(n /3 logk(log X /α) 1/3 ). This is less error than required by their lower bound of roughly Ω( klog( X /k)/α) unless k Õ((nα/log X )4/3 ). We can take k to be as large as k = Ω( (nα/log X )1/3 ), in which case our upper bound is a significant improvement as are the upper bounds of [BLR08] and [DNR + 09]. 1.1 Our Results We define a new interactive differentially private mechanism for answering k arbitrary predicate queries, called the median mechanism. 4 The basic implementation of the median mechanism interactively answers queries f 1,...,f k that arrive online, is (ǫ, δ)-useful, and has privacy α that scales with logklog X ; see Theorem 4.1 for the exact statement. These privacy and utility guarantees hold even if an adversary can adaptively choose each f i after seeing the mechanism s first i 1 answers. This is the first interactive mechanism better than the Laplace mechanism, and its performance is close to the best possible even in the non-interactive setting. The basic implementation of the median mechanism is not efficient, and we give an efficient implementation with a somewhat weaker utility guarantee. (The privacy guarantee is as strong as in the basic implementation.) This alternative implementation runs in time polynomial in n, k, and X, and satisfies the following (Theorem 5.1): for every sequence f 1,...,f k of predicate queries, for all but a negligible fraction of input distributions, the efficient median mechanism is (ǫ, δ)-useful. This is the first efficient mechanism with a non-trivial utility guarantee and polylogarithmic privacy cost, even in the non-interactive setting. 1. The Main Ideas The key challenge to designing an interactive mechanism that outperforms the Laplace mechanism lies in determining the appropriate correlations between different output perturbations on the fly, without knowledge of future queries. It is not obvious that anything significantly better than independent perturbations is possible in the interactive setting. Our median mechanism and our analysis of it can be summarized, at a high level, by three facts. First, among any set of k queries, we prove that there are O(logklog X ) hard queries, the answers to which completely determine the answers to all of the other queries (up to ±ǫ). Roughly, this holds because: (i) by a VC dimension argument, we can focus on databases over X of size only O(logk); and (ii) every time we answer a hard query, the number of databases consistent with the mechanism s answers shrinks by a constant factor, and this number cannot drop below 1 (because of the true input database). Second, we design a method to privately release an indicator vector which distinguishes between hard and easy queries online. We note that a similar private indicator vector technique was used by Dwork et al. [DNR + 09]. Essentially, the median mechanism deems a query easy if a majority of the databases that are consistent (up to ±ǫ) with the previous answers of the mechanism would answer the current query accurately. The median mechanism answers the small number of hard queries using independent Laplace perturbations. It answers an easy query (accurately) using the median query result given by databases that are consistent with previous answers. A key intuition is that if a user knows that query i is easy, then it can generate the mechanism s answer on its own. Thus answering an easy query communicates only a single new bit of information: that the query is easy. Finally, we show how to release the classification of queries as easy and hard with 4 The privacy guarantee is (α,τ)-differential privacy for a negligible function τ; see Section for definitions.

3 low privacy cost; intuitively, this is possible because (independent of the database) there can be only O(logklog X ) hard queries. Our basic implementation of the median mechanism is not efficient for the same reasons as for the mechanism in [BLR08]: it requires non-trivial sampling from a set of super-polynomial size. For our efficient implementation, we pass to fractional databases, represented as fractional histograms with components indexed by X. Here, we use the random walk technology of Dyer, Frieze, and Kannan [DFK91] for convex bodies to perform efficient random sampling. To explain why our utility guarantee no longer holdsforeveryinputdatabase, recallthefirstfactusedinthe basic implementation: every answer to a hard query shrinks the number of consistent databases by a constant factor, and this number starts at X O(logk) and cannot drop below 1. With fractional databases (where polytope volumes play the role of set sizes), the lower bound of 1 on the set of consistent (fractional) databases no longer holds. Nonetheless, we prove a lower bound on the volume of this set for almost all fractional histograms (equivalently probability distributions), which salvages the O(logklog X ) bound on hard queries for databases drawn from such distributions.. PRELIMINARIES We briefly formalize the setting of the previous section and record some important definitions. We consider some finite domain X, and define a database D to be an unordered set of elements from X (with multiplicities allowed). We write n = D to denote the size of the database. We consider the set of Boolean functions (predicates) f : X {0,1}. We abuse notation and define a predicate query f(d) : X [0,1] as {x D : f(x) = 1} / D, the function that computes the fraction of elements of D that satisfy predicate f. We say that an answer a i to a query f i is ǫ-accurate with respect to database D if f i(d) a i ǫ. A mechanism M(D,(f 1,...,f k )) is a function from databases and queries to distributions over outputs. In this paper, we consider mechanisms that answer predicate queries numerically, and so the range of our mechanisms is R k. 5 Definition 1. A mechanism M is (ǫ, δ)-useful if for every sequence of queries (f 1,...,f k ) and every database D, with probability at least 1 δ it provides answers a 1,...,a k that are ǫ-accurate for f 1,...,f k and D. Recall that differential privacy means that changing the identity of a single element of the input database does not affect the probability of any outcome by more than a small factor. Formally, givenadatabase D, we say thatadatabase D of the same size is a neighbor of D if it differs in only a single element: D D = D 1. Definition. A mechanism M satisfies (α, τ)-differential privacy if for every subset S R k, every set of queries (f 1,...,f k ), and every pair of neighboring databases D,D : Pr[M(D) S] e α Pr[M(D ) S]+τ. We are generally interested in the case where τ is a negligible function of some of the problem parameters, meaning one that goes to zero faster than x c for every constant c. 5 From ǫ-accurate answers, one can efficiently reconstruct a synthetic database that is consistent (up to ±ǫ) with those answers, if desired [DNR + 09]. Finally, the sensitivity of a real-valued query is the largest difference between its values on neighboring databases. For example, the sensitivity of every non-trivial predicate query is precisely 1/n. 3. THE MEDIAN MECHANISM: BASIC IMPLEMENTATION We now describe the median mechanism and our basic implementation of it. As described in the Introduction, the mechanism is conceptually simple. It classifies queries as easy or hard, essentially according to whether or not a majority of the databases consistent with previous answers tohard queries would give an accurate answer toit (in which case the user already knows the answer ). Easy queries are answered using the corresponding median value; hard queries are answered as in the Laplace mechanism. To explain the mechanism precisely, we need to discuss a number of parameters. We take the privacy parameter α, the accuracy parameter ǫ, and the number k of queries as input; these are hard constraints on the performance of our mechanism. 6 Our mechanism obeys these constraints with a value of δ that is inverse polynomial in k and n, and a value of τ that is negligible in k and n, provided n is sufficiently large (at least polylogarithmic in k and X, see Theorem 4.1). Of course, such a result can be rephrased as a nearly exponential lower bound on the number of queries k that can be successfully answered as a function of the database size n. 7 The median mechanism is shown in Figure 1, and it makes use of several additional parameters. For our analysis, we set their values to: α = m = lnkln 1 ǫ ǫ ; (1) ( α 70mln X = Θ αǫ log X logklog 1 ǫ ) ; () ( ) γ = 4 α ǫn ln k log X log α = Θ klog 1 ǫ. (3) αǫ 3 n The denominator in () can be thought of as our privacy cost as a function of the number of queries k. Needless to say, we made no effort to optimize the constants. The value r i in Step (a) of the median mechanism is defined as S C r i = i 1 exp( ǫ 1 f i(d) f i(s) ). (4) C i 1 For the Laplace perturbations in Steps (a) and (d), recall that the distribution Lap(σ) has the cumulative distribution function F(x) = 1 F( x) = 1 1 e x/σ. (5) 6 We typically think of α,ǫ as small constants, though our results remain meaningful for some sub-constant values of α and ǫ as well. We always assume that α is at least inverse polynomial in k. Note that when α or ǫ is sufficiently small (at most c/n for a small constant c, say), simultaneously meaningful privacy and utility is clearly impossible. 7 In contrast, the number of queries that the Laplace mechanism can privately and usefully answer is at most linear.

4 1. Initialize C 0 = { databases of size m over X }.. For each query f 1,f,...,f k in turn: (a) Define r i as in (4) and let ˆr i = r i +Lap( ). ǫnα (b) Let t i = 3 + j γ, where j {0,1,..., 1 3 } is 4 γ 0 chosen with probability proportional to j. (c) If ˆr i t i, set a i to be the median value of f i on C i 1. (d) If ˆr i < t i, set a i to be f i(d)+lap( 1 nα ). (e) If ˆr i < t i, set C i to the databases S of C i 1 with f i(s) a i ǫ/50; otherwise C i = C i 1. (f) If ˆr j < t j for more than 0mlog X values of j i, then halt and report failure. Figure 1: The Median Mechanism. The motivation behind the mechanism s steps is as follows. The set C i is the set of size-m databases consistent (up to ±ǫ/50) with previous answers of the mechanism to hard queries. The focus on databases with the small size m is justified by a VC dimension argument, see Proposition 4.6. Steps (a) and (b) choose a random value ˆr i and arandomthresholdt i. The valuer i instep(a) is ameasure of how easy the query is, with higher numbers being easier. A more obvious measure would be the fraction of databases S in C i 1 for which f i(s) f i(d) ǫ, but this is a highly sensitive statistic (unlike r i, see Lemma 4.9). The mechanism uses the perturbed value ˆr i rather than r i to privately communicate which queries are easy and which are hard. In Step (b), we choose the threshold t i at random between 3/4 and 9/10. This randomly shifted threshold ensures that, for every database D, there is likely to be a significant gap between r i and t i; such gaps are useful when optimizing the privacy guarantee. Steps (c) and (d) answer easy and hard queries, respectively. Step (e) updates the set of databases consistent with previous answers to hard queries. We prove in Lemma 4.7 that Step (f) occurs with at most inverse polynomial probability. Finally, we note that the median mechanism is defined as if the total number of queries k is (approximately) known in advance. This assumption can be removed by using successively doubling guesses of k; this increases the privacy cost by an O(logk) factor. 4. ANALYSIS OF MEDIAN MECHANISM This section proves the following privacy and utility guarantees for the basic implementation of the median mechanism. Theorem 4.1. For every sequence of adaptively chosen predicate queries f 1,...,f k arriving online, the median mechanism is (ǫ,δ)-useful and (α,τ)-differentially private, where τ is a negligible function of k and X, and δ is an inverse polynomial function of k and n, provided the database size n satisfies n 30ln k α log k α ǫ = Θ( log X log 3 klog 1 ǫ αǫ 3 ). (6) We prove the utility and privacy guarantees in Sections 4.1 and 4., respectively Utility of the Median Mechanism Here we prove a utility guarantee for the median mechanism. Theorem 4.. The median mechanism is (ǫ, δ)-useful, where δ = kexp( Ω(ǫnα )). Note that under assumption (6), δ is inverse polynomial in k and n. We give the proof of Theorem 4. in three pieces: with high probability, every hard query is answered accurately (Lemma 4.4); every easy query is answered accurately (Lemmas 4.3 and 4.5); and the algorithm does not fail (Lemma 4.7). The next two lemmas follow from the definition of the Laplace distribution (5), our choice of δ, and trivial union bounds. Lemma 4.3. With probability at least 1 δ, ri ˆri 1/100 for every query i. Lemma 4.4. With probability at least 1 δ, every answer to a hard query is (ǫ/100)-accurate for D. The next lemma shows that median answers are accurate for easy queries. Lemma 4.5. If r i ˆr i 1/100 for every query i, then every answer to an easy query is ǫ-accurate for D. Proof. For a query i, let G i 1 = {S C i 1 : f i(d) f i(s) ǫ} denote the databases of C i 1 on which the result of query f i is ǫ-accurate for D. Observe that if G i 1.51 C i 1, then the median value of f i on C i 1 is an ǫ- accurate answer for D. Thus proving the lemma reduces to showing that ˆr i 3/4 only if G i 1.51 C i 1. Consider a query i with G i 1 <.51 C i 1. Using (4), we have S C r i = i 1 exp( ǫ 1 f i(d) f i(s) ) C i 1 Gi 1 +e 1 C i 1 \G i 1 C i 1 ( e ) Ci 1 C i 1 < Since r i ˆr i 1/100 for every query i by assumption, the proof is complete. Our final lemma shows that the median mechanism does not fail and hence answers every query, with high probability; this will conclude our proof of Theorem 4.. We need the following preliminary proposition, which instantiates the standard uniform convergence bound with the fact that the VC dimension of every set of k predicate queries is at most log k [Vap96]. Recall the definition of the parameter m from (1). 8 If desired, in Theorem 4.1 we can treat n as a parameter and solve for the error ǫ. The maximum error on any query (normalized by the database size) is O(logklog 1/3 X /n 1/3 α 1/3 ); the unnormalized error is a factor of n larger.

5 Proposition 4.6 (Uniform Convergence Bound). For every collection of k predicate queries f 1,...,f k and every database D, a database S obtained by sampling points from D uniformly at random will satisfy f i(d) f i(s) ǫ for all i except with probability δ, provided S 1 ǫ ( logk +log δ ). In particular, there exists a database S of size m such that for all i {1,...,k}, f i(d) f i(s) ǫ/400. In other words, the results of k predicate queries on an arbitrarily large database can be well approximated by those on a database with size only O(logk). Lemma 4.7. If r i ˆr i 1/100 for every query i and every answer to a hard query is (ǫ/100)-accurate for D, then the median mechanism answers fewer than 0m log X hard queries (and hence answers all queries before terminating). Proof. The plan is to track the contraction of C i as hard queries are answered by the median mechanism. Initially we have C 0 X m. If the median mechanism answers a hard query i, then the definition of the mechanism and our hypotheses yield r i ˆr i < ti We then claim that the size of the set C i = {S C i 1 : f i(s) a i ǫ/50} is at most 94 Ci 1. For if not, 100 S C r i = i 1 exp( ǫ 1 f i(s) f i(d) ) C i 1 94 ( 100 exp 1 ) > , which is a contradiction. Iterating now shows that the number of consistent databases decreases exponentially with the number of hard queries: ( ) h 94 C k X m (7) 100 if h of the k queries are hard. On the other hand, Proposition 4.6 guarantees the existence of a database S C 0 for which f i(s ) f i(d) ǫ/100 for every query f i. Since all answers a i produced by the median mechanism for hard queries i are (ǫ/100)- accurate for D by assumption, f i(s ) a i f i(s ) f i(d) + f i(d) a i ǫ/50. This shows that S C k and hence C k 1. Combining this with (7) gives as desired. h mln X ln(50/47) < 0mln X, 4. Privacy of the Median Mechanism This section establishes the following privacy guarantee for the median mechanism. Theorem 4.8. The median mechanism is (α, τ)- differentially private, where τ is a negligible function of X and k when n is sufficiently large (as in (6)). We can treat the median mechanism as if it has two outputs: a vector of answers a R k, and a vector d {0,1} k such that d i = 0 if i is an easy query and d i = 1 if i is a hard query. A key observation in the privacy analysis is that answers to easy queries are a function only of the previous output of the mechanism, and incur no additional privacy cost beyond the release of the bit d i. Moreover, the median mechanism is guaranteed to produce no more than O(m log X ) answers to hard queries. Intuitively, what we need to show is that the vector d can be released after an unusually small perturbation. Our first lemma states that the small sensitivity of predicate queries carries over, with a /ǫ factor loss, to the r- function defined in (4). Lemma 4.9. The function r i(d) = ( S C exp( ǫ 1 f(d) f(s) )/ C has sensitivity ǫn for every fixed set C of databases and predicate query f. Proof. Let D and D be neighboring databases. Then S C r i(d) = exp( ǫ 1 f(d) f(s) ) C i S C i exp( ǫ 1 ( f(d ) f(s) n 1 )) C i ( ) 1 = exp r i(d ) ǫn ( 1+ ) r i(d ) ǫn r i(d )+ ǫn where the first inequality follows from the fact that the (predicate) query f has sensitivity 1/n, the second from the fact that e x 1+x when x [0,1], and the third from the fact that r i(d ) 1. The next lemma identifies nice properties of typical executions of the median mechanism. Consider an output (d,a) of the median mechanism with a database D. From D and(d,a), we canuniquelyrecoverthevaluesr 1,...,r k computed (via (4)) in Step (a) of the median mechanism, with r i depending only on the first i 1 components of d and a. We sometimes write such a value as r i(d,(d,a)), or as r i(d) if an output (d,a) has been fixed. Call a possible threshold t i good for D and (d,a) if d i = 0 and r i(d,(d,a)) t i +γ, where γ is defined as in (3). Call a vector t of possible thresholds good for D and (d,a) if all but 180mln X of the thresholds are good for D and (d,a). Lemma For every database D, with all but negligible (exp( Ω(logklog X /ǫ ))) probability, the thresholds t generated by the median mechanism are good for its output (d,a). Proof. The idea is to charge the probability of bad thresholds to that of answering hard queries, which are strictly limited by the median mechanism. Since the median mechanism only allows 0mln X of the d i s to be 1, we only need to bound the number of queries i with output d i = 0 and threshold t i satisfying r i < t i + γ, where r i is the value computed by the median mechanism in Step (a) when it answers the query i. Let Y i be the indicator random variable corresponding to the (larger) event that r i < t i +γ. Define Z i to be 1 if and

6 only if, when answering the ith query, the median mechanism chooses a threshold t i and a Laplace perturbation i such that r i+ i < t i (i.e., the queryis classified as hard). If the median mechanism fails before reaching query i, then we define Y i = Z i = 0. Set Y = k i=1 Yi and Z = k i=1zi. We can finishtheproofbyshowingthaty is atmost160mln X except with negligible probability. Consider a query i and condition on the event that r i 9 ; this event depends only on the results of previous queries. In this case, Y i = 1 only if t i = 9/10. But this 10 occurs with probability 3/0γ, which using (3) and (6) is at most 1/k. 9 Therefore, the expected contribution to Y coming from queries i with r i 9 is at most 1. Since ti 10 is selected independently at random for each i, the Chernoff bound implies that the probability that such queries contribute more than mln X to Y is exp( Ω((mlog X ) )) = exp( Ω((logk) (log X ) /ǫ 4 )). Now condition on the event that r i < 9. Let Ti denote 10 the threshold choices that would cause Y i to be 1, and let s i be the smallest such; since r i < 9, Ti. For every ti 10 T i, t i > r i γ; hence, for every t i T i \{s i}, t i > r i. Also, our distribution on the j s in Step (b) ensures that Pr[t i T i \ {s i}] 1 Pr[ti Ti]. Since the Laplace distribution is symmetric around zero and the random choices i,t i are independent, we have E[Z i] = Pr[t i > r i + i] Pr[t i > r i] Pr[ i 0] 1 4 Pr[ti > ri γ] = 1 E[Yi]. (8) 4 The definition of the median mechanism ensures that Z 0m ln X with probability 1. Linearity of expectation, inequality (8), and the Chernoff bound imply that queries with r i < 9 contribute at most 159mln X to Y with probability at least 1 exp( Ω(logklog X /ǫ )). The proof is 10 complete. We can now prove Theorem 4.8. Proof of Theorem 4.8: Recall Definition and fix a database D, queries f 1,...,f k, and a subset S of possible mechanism outputs. For simplicity, we assume that all perturbations are drawn from a discretized Laplace distribution, so that the median mechanism has a countable range; the continuous case can be treated using similar arguments. Then, we can think of S as a countable set of output vector pairs (d,a) with d {0,1} k and a R k. We write MM(D,f) = (d,a) for the event that the median mechanism classifies the queries f = (f 1,...,f k ) according to d and outputs the numerical answers a. If the mechanism computes thresholds t while doing so, we write MM(D,f) = (t,d,a). Let G((d,a),D) denote the vectors that would be good thresholds for (d,a) and D. (Recall that D and (d, a) uniquely define the corresponding r i(d,(d,a)) s.) We have Pr[MM(D,f) S] = Pr[MM(D,f) = (d,a)] (d,a) S 9 For simplicity, we ignore the normalizing constant in the distribution over j s in Step (b), which is Θ(1). τ + = τ + (d,a) S (d,a) S t G((d,a),D) Pr[MM(D,f) = (t,d,a)] Pr[MM(D,f) = (t,d,a)] with some t good for (d,a),d, and where τ is the negligible function of Lemma We complete the proof by showing that, for every neighboring database D, possible output (d,a), and thresholds t good for (d,a) and D, Pr[MM(D,f) = (t,d,a)] e α Pr[MM(D,f) = (t,d,a)]. (9) Fix a neighboring database D, a target output (d,a), and thresholds t good for (d, a) and D. The probability that the median mechanism chooses the target thresholds t is independent of the underlying database, and so is the same on both sides of (9). For the rest of the proof, we condition on the event that the median mechanism uses the thresholds t (both with database D and database D ). Let E i denote the event that MM(D,f) classifies the first i queries in agreement with the target output (i.e., query j i is deemed easy if and only if d j = 0) and that its first i answers are a 1,...,a i. Let E i denote the analogous event for MM(D,f). Observe that E k,e k are the relevant events on the left- and right-hand sides of (9), respectively (after conditioning on t). If (d,a) is such that the median mechanism would fail after the lth query, then the following proof should be applied to E l,e l instead of E k,e k. We next give a crude upper bound on the ratio Pr[E i E i 1]/Pr[E i E i 1] that holds for every query (see (10), below), followed by a much better upper bound for queries with good thresholds. Imagine running the median mechanism in parallel on D,D and condition on the events E i 1,E i 1. The set C i 1 is then the same in both runs of the mechanism, and r i(d),r i(d ) are now fixed. Let b i (b i) be 0 if MM(D,f) (MM(D,f)) classifies query i as easy and 1 otherwise. Since r i(d ) [r i(d) ± ] (Lemma 4.9) and a perturbation with distribution Lap( ) is added to these values ǫn α ǫn before comparing to the threshold t i (Step (a)), Pr[b i = 0 E i 1] e α Pr[b i = 0 E i 1] and similarly for the events where b i,b i = 1. Suppose that the target classification is d i = 1 (a hard query), and let s i and s i denote the random variables f i(d) + Lap( 1 ) α n and f i(d ) + Lap( 1 ), respectively. Independence of the α n Laplace perturbations in Steps (a) and (d) implies that and Pr[E i E i 1] = Pr[b i = 1 E i 1] Pr[s i = a i E i 1] Pr[E i E i 1] = Pr[b i = 1 E i 1] Pr[s i = a i E i 1]. Since the predicate query f i has sensitivity 1/n, we have Pr[E i E i 1] e α Pr[E i E i 1] (10) when d i = 1. Now suppose that d i = 0, and let m i denote the median value of f i on C i 1. Then Pr[E i E i 1] is either 0 (if m i a i) or Pr[b i = 0 E i 1] (if m i = a i); similarly, Pr[E i E i 1] is either 0 or Pr[b i = 0 E i 1]. Thus the bound in (10) continues to hold (even with e α replaced by e α ) when d i = 0.

7 Since α is not much smaller than the privacy target α (recall ()), we cannot afford to suffer the upper bound in (10) for many queries. Fortunately, for queries i with good thresholds we can do much better. Consider a query i such that t i is good for (d,a) and D and condition again on E i 1,E i 1, which fixes C i 1 and hence r i(d). Goodness implies that d i = 0, so the arguments from the previous paragraph also apply here. We can therefore assume that the median value m i of f i on C i 1 equals a i and focus on bounding Pr[b i = 0 E i 1] in terms of Pr[b i = 0 E i 1]. Goodness also implies that r i(d) t i + γ and hence r i(d ) t i + γ ǫn ti + γ (by Lemma 4.9). Recalling from (3) the definition of γ, we have Pr[b i = 0 E i 1] Pr[r i ˆr i < γ ] = 1 1 e γα ǫn/4 = 1 α 4k (11) and of course, Pr[b i = 0 E i 1] 1. Applying (10) to the bad queries at most 180mln X of them, since t is good for (d,a) and D and (11) to the rest, we can derive Pr[E k ] = k Pr[E i : E i 1] i=1 e 360α mln X }{{} (1 α 4k ) k }{{} e α/ by () e α Pr[E k], (1+ α k )k e α/ k Pr[E i : E i 1] which completes the proof of both the inequality (9) and the theorem. 5. THE MEDIAN MECHANISM: EFFICIENT IMPLEMENTATION The basic implementation of the median mechanism runs in time X Θ(log klog(1/ǫ)/ǫ). This section provides an efficient implementation, running in time polynomial in n, k, and X, although with a weaker usefulness guarantee. Theorem 5.1. Assume that the database size n satisfies (6). For every sequence of adaptively chosen predicate queries f 1,...,f k arriving online, the efficient implementation of the median Mechanism is (α, τ)-differentially private for a negligible function τ. Moreover, for every fixed set f 1,...,f k of queries, it is (ǫ,δ)-useful for all but a negligible fraction of fractional databases (equivalently, probability distributions). Specifically, our mechanism answers exponentially many queries for all but an O( X m ) fraction of probability distributions over X drawn from the unit l 1 ball, and from databases drawn from such distributions. Thus our efficient implementation always guarantees privacy, but for a given set of queries f 1,...,f k, there might be a negligibly small fraction of fractional histograms for which our mechanism is not useful for all k queries. We note however that even for the small fraction of fractional histograms for which the efficient median mechanism may not satisfy our usefulness guarantee, it does not output incorrect answers: it merely halts after having answered a i=1 sufficiently large number of queries using the Laplace mechanism. Therefore, even for this small fraction of databases, the efficient median mechanism is an improvement over the Laplace mechanism: in the worst case, it simply answers every query using the Laplace mechanism before halting, and in the best case, it is able to answer many more queries. We give a high-level overview of the proof of Theorem 5.1 which we then make formal. First, why isn t the median mechanism a computationally efficient mechanism? Because C 0 has super-polynomial size X m, and computing r i in Step (a), the median value in Step (c), and the set C i in Step (e) could require time proportional to C 0. An obvious idea is to randomly sample elements of C i 1 to approximately compute r i and the median value of f i on C i 1; while it is easy to control the resulting sampling error and preserve the utility and privacy guarantees of Section 4, it is not clear how to sample from C i 1 efficiently. We show how to implement the median mechanism in polynomial time by redefining the sets C i to be sets of probability distributions over points in X that are consistent (up ) with the hard queries answered up to the ith query. Each set C i will be a convex polytope in R X defined by the intersection of at most O(m log X ) halfspaces, and hence it will be possible to sample points from C i approximately uniformly at random in time poly( X,m) via the grid walk of Dyer, Frieze, and Kannan [DFK91]. Lemmas 4.3, 4.4, and 4.5 still hold (trivially modified to accommodate sampling error). We have to reprove Lemma 4.7, in a somewhat weaker form: that for all but a diminishing fraction of input databases D, the median mechanism does not abort except with probability kexp( Ω(ǫnα )). As for our privacy analysis of the median mechanism, it is independent of the representation of the sets C i and the mechanisms failure probability, and so it need not be repeated the efficient implementation is provably private for all input databases and query sequences. We now give a formal analysis of the efficient implementation. to ± ǫ Redefining the sets C i We redefine the sets C i to represent databases that can contain points fractionally, as opposed to the finite set of small discrete databases. Equivalently, we can view the sets C i as containing probability distributions over the set of points X. We initialize C 0 to be the l 1 ball of radius m in R X, mb X 1, intersected with the non-negative orthant: C 0 = {F R X : F 0, F 1 m}. Each dimension i in R X corresponds to an element x i X. Elements F C 0 can be viewed as fractional histograms. Note that integral points in C 0 correspond exactly to databases of size at most m. We generalize our query functions f i to fractional histograms in the natural way: f i(f) = 1 F j. m j:f i (x j )=1 The update operation after a hard query i is answered is the same as in the basic implementation: { C i F C i 1 : f i(f) a i ǫ }. 50

8 Note that each updating operation after a hard query merely intersects C i 1 with the pair of halfspaces: F j ma ǫm i+ and F j ma ǫm i ; j:f i (x j )=1 j:f i (x j )=1 and so C i is a convex polytope for each i. Dyer, Kannan, and Frieze [DFK91] show how to δ- approximate arandom sample from aconvexbodyk R X intime polynomialin X andtherunningtimeofamembership oracle for K, where δ can be taken to be exponentially small (which is more than sufficient for our purposes). Their algorithm has two requirements: 1. There must be an efficient membership oracle which can in polynomial time determine whether a point F R X lies in K.. K must be well rounded : B X K X B X, where B X is the unit l ball in R X. Since C i is given as the intersection of a set of explicit halfspaces, we have a simple membership oracle to determine whether a given point F C i: we simply check that F lies on the appropriate side of each of the halfspaces. This takes timepoly( X,m), sincethenumberofhalfspacesdefiningc i is linear in the number of answers to hard queries given before time i, which is never more than 0mln X. Moreover, for each i we have C i C 0 mb X 1 mb X X B X. Finally, we can safely assume that B X C i by simply considering the convex set C i = C i+b X instead. This will not affect our results. Therefore, we can implement the median mechanism in time poly( X,k) by using sets C i as defined in this section, andsamplingfrom themusingthegrid walk of[dfk91]. Estimation error in computingr i and the median value of f i on C i 1 by random sampling rather than brute force is easily controlled via the Chernoff bound and can be incorporated into the proofs of Lemmas 4.3 and 4.5 in the obvious way. It remains to prove a continuous version of Lemma 4.7 to show that the efficient implementation of the median mechanism is (ǫ,δ)-useful on all but a negligibly small fraction of fractional histograms F. 5. Usefulness for Almost All Distributions We now prove an analogue of Lemma 4.7 to establish a usefulness guarantee for the efficient version of the median mechanism. Definition 3. With respect to any set of k queries f 1,...,f k and for any F C 0, define Good ǫ(f ) = {F C 0 : max f i(f) f i(f ) ǫ} i {1,,...,k} as the set of points that agree up to an additive ǫ factor with F on every query f i. Since databases D X can be identified with their corresponding histogram vectors F R X, we can also write Good ǫ(d) when the meaning is clear from context. For any F, Good ǫ(f ) is a convex polytope contained inside C 0. We will prove that the efficient version of the median mechanism is (ǫ,δ)-useful for a database D if Vol(Good ǫ/100 (D)) 1 X m. (1) We first prove that (1) holds for almost every fractional histogram. For this, we need a preliminary lemma. Lemma 5.. Let L denote the set of integer points inside C 0. Then with respect to an arbitrary set of k queries, C 0 Good ǫ/400 (F). F L Proof. Every rational valued point F C 0 corresponds to some (large) database D X by scaling F to an integer-valued histogram. Irrational points can be arbitrarily approximated by such a finite database. By Proposition 4.6, for every set of k predicates f 1,...,f k, there is a database F X with F = m such that for each i, f i(f ) f i(f) ǫ/400. Recalling that the histograms corresponding to databases of size at most m are exactly the integer points in C 0, the proof is complete. Lemma 5.3. All but an X m fraction of fractional histograms F satisfy Vol(Good ǫ/00 (F)) 1 X m. Proof. Let { B = F L : Vol(Good ǫ/400(f)) 1 }. X m Consider a randomly selected fractional histogram F C 0. For any F B we have: Pr[F Good ǫ/400 (F)] = Vol(Good ǫ/400(f)) < 1 X m Since B L X m, by a union bound we can conclude 1 that except with probability, F Good X m ǫ/400 (F) for any F B. However, by Lemma 5., F Good ǫ/400 (F ) for somef L. Therefore, exceptwithprobability1/ X m, F L \ B. Thus, since Good ǫ/400 (F ) Good ǫ/00 (F ), except with negligible probability, we have: Vol(Good ǫ/00 (F )) Vol(Good ǫ/400(f )) 1 X m. We are now ready to prove the analogue of Lemma 4.7 for the efficient implementation of the median mechanism. Lemma 5.4. For every set of k queries f 1,...,f k, for all but an O( X m ) fraction of fractional histograms F, the efficient implementation of the median mechanism guarantees that: The mechanism answers fewer than 40m log X hard queries, except with probability kexp( Ω(ǫnα )), Proof. We assume that all answers to hard queries are ǫ/100 accurate, and that r i ˆr i 1 for every i. By 100 Lemmas 4.3 and 4.4 the former adapted to accommodate approximating r i via random sampling we are in this case except with probability kexp( Ω(ǫnα )). We analyze how the volume of C i contracts with the number of hard queries answered. Suppose the mechanism answers a hard query at time i. Then: r i ˆr i < ti

9 Recall C i = {F C i 1 : f i(f) a i ǫ/50}. Suppose that Vol(C i) 94 Vol(Ci 1). Then 100 exp( ǫ 1 f C r i = i 1 i(f) f i(d) )df Vol(C i 1) 94 ( 100 exp 1 ) > , a contradiction. Therefore, we have ( ) h 94 C k, (13) 100 if h of the k queries are hard. Since all answers to hard queries are ǫ/100 accurate, it must be that Good ǫ/100 (D) C k. Therefore, for an input database D that satisfies (1) and this is all but an O( X m ) fraction of them, by Lemma 5.3 we have Vol(C k ) Vol(Good ǫ/100 (D)) Vol(C0). (14) X m Combining inequalities (13) and (14) yields as claimed. h mln X ln < 40mln X, Lemmas 4.4, 4.5, and 5.4 give the following utility guarantee. Theorem 5.5. For every set f 1,...,f k of queries, for all but a negligible fraction of fractional histograms F, the efficient implementation of the median mechanism is (ǫ,δ)- useful with δ = kexp( Ω(ǫnα )). 5.3 Usefulness for Finite Databases Fractional histograms correspond to probability distributions over X. Lemma 5.3 shows that most probability distributions are good for the efficient implementation of the Median Mechanism; in fact, moreis true. Wenextshowthat finite databases sampled from randomly selected probability distributions also have good volume properties. Together, these lemmas show that the efficient implementation of the median mechanism will be able to answer nearly exponentially many queries with high probability, in the setting in which the private database D is drawn from some typical population distribution. DatabaseSample( D ): 1. Select a fractional point F C 0 uniformly at random.. Sample and return a database D of size D bydrawing each x D independently at random from the probability distribution over X induced by F (i.e. sample x i X with probability proportional to F i). Lemma 5.6. For D as in (6) (as required for the Median Mechanism), a database sampled by DatabaseSample( D ) satisfies (1) except with probability at most O( X m ). Proof. By lemma 5.3, except with probability X m, the fractional histogram F selected in step 1 satisfies Vol(Good ǫ/00 (F)) 1 X m. By lemma 4.6, when we sample a database D of size D O((log X log 3 klog1/ǫ)/ǫ 3 ) from the probability distribution induced by F, except with probability δ = O(k X log3 k/ǫ ), Good ǫ/00 (F) Good ǫ/100 (D), which gives us condition (1). We would like an analogue of lemma 5.3 that holds for all but a diminishing fraction of finite databases (which correspond to lattice points within C 0) rather than fractional points in C 0, but it is not clear how uniformly randomly sampled lattice points distribute themselves with respect to the volume of C 0. If n >> X, then the lattice will be fine enough to approximate the volume of C 0, and lemma 5.3 will continue to hold. We now show that small uniformly sampled databases will also be good for the efficient version of the median mechanism. Here, small means n = o( X ), which allows for databases which are still polynomial in the size of X. A tighter analysis is possible, but we opt instead to give a simple argument. Lemma 5.7. For every n such that n satisfies (6) and n = o( X ), all but an O(n / X ) fraction of databases D of size D = n satisfy condition (1). Proof. We proceed by showing that our DatabaseSample procedure, which we know via lemma 5.6 generates databases that satisfy (1) with high probability, is close to uniform. Note that DatabaseSample first selects a probability distribution F uniformly at random from the positive quadrant of the l 1 ball, and then samples D from F. For any particular database D with D = n we write Pr U[D = D ] to denote the probability of generating D when we sample a database uniformly at random, and we write Pr N[D = D ] to denote the probability of generating D when we sample a database according to DatabaseSample. Let R denote the event that D contains no duplicate elements. We begin by noting by symmetry that: Pr U[D = D R] = Pr N[D = D R] We first argue that Pr U[R] and Pr N[R] are both large. We immediately have that the expected number of repetitions in database D when drawn from the uniform distribution is ( n ) / X, and so Pr U[ R] n X. We now consider PrN[R]. Since F is a uniformly random point in the positive quadrant of the l 1 ball, each coordinate F i has the marginal of a Beta distribution: F i β(1, X 1). (See, for example, [Dev86] Chapter 5). X ( X +1) Therefore, E[Fi ] = and so the expected number of repetitions in database D when drawn from DatabaseSample is ( n) X i=1 E[F i ] = (n ) n. Therefore, X +1 X Pr N[ R] n X. Finally, let B be the event that database D fails to satisfy (1). We have: Pr[B] = Pr [B R] Pr[R]+Pr[B R] Pr[ R] U U U U U = Pr [B R] Pr[R]+Pr[B R] Pr[ R] N U U U Pr N [B R] Pr U [R]+Pr U [ R] PrU[R] Pr[B] N Pr +Pr[ R] N[R] U PrN[B] 1 n X = O( n X ) + n X

10 where the last equality follows from lemma 5.6, which states that Pr N[B] is negligibly small. We observe that we can substitute either of the above lemmas for lemma 5.3 in the proof of lemma 5.4 to obtain versions of Thoerem 5.5: Corollary 5.8. For every set f 1,...,f k of queries, for all but a negligible fraction of databases sampled by DatabaseSample, the efficient implementation of the median mechanism is (ǫ,δ)-useful with δ = kexp( Ω(ǫnα )). Corollary 5.9. For every set f 1,...,f k of queries, for all but an n / X fraction of uniformly randomly sampled databases of size n, the efficient implementation of the median mechanism is (ǫ,δ)-useful with δ = kexp( Ω(ǫnα )). 6. CONCLUSION We have shown that in the setting of predicate queries, interactivity does not pose an information theoretic barrier to differentially private data release. In particular, our dependence on the number of queries k nearly matches the optimal dependence of log k achieved in the offline setting by [BLR08]. We remark that our dependence on other parameters is not necessarily optimal: in particular, [DNR + 09] achieves a better (and optimal) dependence on ǫ. We have also shown how to implement our mechanism in time poly( X, k), although at the cost of sacrificing worst-case utility guarantees. The question of an interactive mechanism with poly( X, k) runtime and worst-case utility guarantees remains an interesting open question. More generally, although the lower bounds of [DNR + 09] seem to preclude mechanisms with run-time poly(log X ) from answering a superlinear number of generic predicate queries, the question of achieving this runtime for specific query classes of interest (offline or online) remains largely open. Recently a representation-dependent impossibility result for the class of conjunctions was obtained by Ullman and Vadhan [UV10]: either extending this to a representation-independent impossibility result, or circumventing it by giving an efficient mechanism with a novel output representation would be very interesting. Acknowledgments The first author wishes to thank a number of people for useful discussions, including Avrim Blum, Moritz Hardt, Katrina Ligett, Frank McSherry, and Adam Smith. He would particularly like to thank Moritz Hardt for suggesting trying to prove usefulness guarantees for a continuous version of the BLR mechanism, and Avrim Blum for suggesting the distribution from which we select the threshold in the median mechanism. 7. REFERENCES [BLR08] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In Proceedings of the 40th annual ACM symposium on Theory of computing, pages ACM New York, NY, USA, 008. [Dev86] L. Devroye. Non-uniform random variate generation [DFK91] M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM (JACM), 38(1):1 17, [DMNS06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Theory of Cryptography Conference TCC, volume 3876 of Lecture Notes in Computer Science, page 65. Springer, 006. [DN03] I. Dinur and K. Nissim. Revealing information while preserving privacy. In nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pages 0 10, 003. [DNR + 09] C. Dwork, M. Naor, O. Reingold, G.N. Rothblum, and S. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the 41st annual ACM symposium on Symposium on theory of computing, pages ACM New York, NY, USA, 009. [Dwo08] C. Dwork. Differential privacy: A survey of results. In Proceedings of Theory and Applications of Models of Computation, 5th International Conference, TAMC 008, volume 4978 of Lecture Notes in Computer Science, page 1. Springer, 008. [GRS09] A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. In Proceedings of the 41st annual ACM symposium on Symposium on theory of computing, pages ACM New York, NY, USA, 009. [HT10] M. Hardt and K. Talwar. On the Geometry of Differential Privacy. In The 4nd ACM Symposium on the Theory of Computing, 010. STOC 10, 010. [KLN + 08] S.P. Kasiviswanathan, H.K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What Can We Learn Privately? In IEEE 49th Annual IEEE Symposium on Foundations of Computer Science, 008. FOCS 08, pages , 008. [MT07] F. McSherry and K. Talwar. Mechanism design via differential privacy. In Proceedings of the 48th Annual Symposium on Foundations of Computer Science, 007. [UV10] J. Ullman and S. Vadhan. PCPs and the Hardness of Generating Synthetic Data. Manuscript, 010. [Vap96] V. Vapnik. Structure of statistical learning theory. Computational Learning and Probabilistic Reasoning, page 3, 1996.

Answering Many Queries with Differential Privacy

Answering Many Queries with Differential Privacy 6.889 New Developments in Cryptography May 6, 2011 Answering Many Queries with Differential Privacy Instructors: Shafi Goldwasser, Yael Kalai, Leo Reyzin, Boaz Barak, and Salil Vadhan Lecturer: Jonathan

More information

Iterative Constructions and Private Data Release

Iterative Constructions and Private Data Release Iterative Constructions and Private Data Release Anupam Gupta 1, Aaron Roth 2, and Jonathan Ullman 3 1 Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA. 2 Department of Computer

More information

What Can We Learn Privately?

What Can We Learn Privately? What Can We Learn Privately? Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan Homin Lee Kobbi Nissim Adam Smith Los Alamos UT Austin Ben Gurion Penn State To appear in SICOMP

More information

A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis

A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis Moritz Hardt Center for Computational Intractability Department of Computer Science Princeton University Email: mhardt@cs.princeton.edu

More information

Universally Utility-Maximizing Privacy Mechanisms

Universally Utility-Maximizing Privacy Mechanisms Universally Utility-Maximizing Privacy Mechanisms Arpita Ghosh Tim Roughgarden Mukund Sundararajan August 15, 2009 Abstract A mechanism for releasing information about a statistical database with sensitive

More information

Faster Algorithms for Privately Releasing Marginals

Faster Algorithms for Privately Releasing Marginals Faster Algorithms for Privately Releasing Marginals Justin Thaler Jonathan Ullman Salil Vadhan School of Engineering and Applied Sciences & Center for Research on Computation and Society Harvard University,

More information

1 Differential Privacy and Statistical Query Learning

1 Differential Privacy and Statistical Query Learning 10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 5: December 07, 015 1 Differential Privacy and Statistical Query Learning 1.1 Differential Privacy Suppose

More information

A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis

A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis Moritz Hardt Guy N. Rothblum Abstract We consider statistical data analysis in the interactive setting. In this setting a trusted

More information

Privacy of Numeric Queries Via Simple Value Perturbation. The Laplace Mechanism

Privacy of Numeric Queries Via Simple Value Perturbation. The Laplace Mechanism Privacy of Numeric Queries Via Simple Value Perturbation The Laplace Mechanism Differential Privacy A Basic Model Let X represent an abstract data universe and D be a multi-set of elements from X. i.e.

More information

CS264: Beyond Worst-Case Analysis Lecture #15: Smoothed Complexity and Pseudopolynomial-Time Algorithms

CS264: Beyond Worst-Case Analysis Lecture #15: Smoothed Complexity and Pseudopolynomial-Time Algorithms CS264: Beyond Worst-Case Analysis Lecture #15: Smoothed Complexity and Pseudopolynomial-Time Algorithms Tim Roughgarden November 5, 2014 1 Preamble Previous lectures on smoothed analysis sought a better

More information

A Learning Theory Approach to Noninteractive Database Privacy

A Learning Theory Approach to Noninteractive Database Privacy A Learning Theory Approach to Noninteractive Database Privacy AVRIM BLUM, Carnegie Mellon University KATRINA LIGETT, California Institute of Technology AARON ROTH, University of Pennsylvania In this article,

More information

CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms

CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms Tim Roughgarden March 9, 2017 1 Preamble Our first lecture on smoothed analysis sought a better theoretical

More information

Calibrating Noise to Sensitivity in Private Data Analysis

Calibrating Noise to Sensitivity in Private Data Analysis Calibrating Noise to Sensitivity in Private Data Analysis Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith Mamadou H. Diallo 1 Overview Issues: preserving privacy in statistical databases Assumption:

More information

CIS 700 Differential Privacy in Game Theory and Mechanism Design January 17, Lecture 1

CIS 700 Differential Privacy in Game Theory and Mechanism Design January 17, Lecture 1 CIS 700 Differential Privacy in Game Theory and Mechanism Design January 17, 2014 Lecture 1 Lecturer: Aaron Roth Scribe: Aaron Roth Intro to Differential Privacy for Game Theorists, and Digital Goods Auctions.

More information

CIS 800/002 The Algorithmic Foundations of Data Privacy September 29, Lecture 6. The Net Mechanism: A Partial Converse

CIS 800/002 The Algorithmic Foundations of Data Privacy September 29, Lecture 6. The Net Mechanism: A Partial Converse CIS 800/002 The Algorithmic Foundations of Data Privacy September 29, 20 Lecturer: Aaron Roth Lecture 6 Scribe: Aaron Roth Finishing up from last time. Last time we showed: The Net Mechanism: A Partial

More information

Lecture 1 Introduction to Differential Privacy: January 28

Lecture 1 Introduction to Differential Privacy: January 28 Introduction to Differential Privacy CSE711: Topics in Differential Privacy Spring 2016 Lecture 1 Introduction to Differential Privacy: January 28 Lecturer: Marco Gaboardi Scribe: Marco Gaboardi This note

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex bodies is hard Navin Goyal Microsoft Research India navingo@microsoft.com Luis Rademacher Georgia Tech lrademac@cc.gatech.edu Abstract We show that learning a convex body in R d, given

More information

Bounds on the Sample Complexity for Private Learning and Private Data Release

Bounds on the Sample Complexity for Private Learning and Private Data Release Bounds on the Sample Complexity for Private Learning and Private Data Release Amos Beimel Hai Brenner Shiva Prasad Kasiviswanathan Kobbi Nissim June 28, 2013 Abstract Learning is a task that generalizes

More information

b + O(n d ) where a 1, b > 1, then O(n d log n) if a = b d d ) if a < b d O(n log b a ) if a > b d

b + O(n d ) where a 1, b > 1, then O(n d log n) if a = b d d ) if a < b d O(n log b a ) if a > b d CS161, Lecture 4 Median, Selection, and the Substitution Method Scribe: Albert Chen and Juliana Cook (2015), Sam Kim (2016), Gregory Valiant (2017) Date: January 23, 2017 1 Introduction Last lecture, we

More information

Exploiting Metric Structure for Efficient Private Query Release

Exploiting Metric Structure for Efficient Private Query Release Exploiting Metric Structure for Efficient Private Query Release Zhiyi Huang Aaron Roth Abstract We consider the problem of privately answering queries defined on databases which are collections of points

More information

Bounds on the Sample Complexity for Private Learning and Private Data Release

Bounds on the Sample Complexity for Private Learning and Private Data Release Bounds on the Sample Complexity for Private Learning and Private Data Release Amos Beimel 1,, Shiva Prasad Kasiviswanathan 2, and Kobbi Nissim 1,3, 1 Dept. of Computer Science, Ben-Gurion University 2

More information

arxiv: v1 [cs.cr] 28 Apr 2015

arxiv: v1 [cs.cr] 28 Apr 2015 Differentially Private Release and Learning of Threshold Functions Mark Bun Kobbi Nissim Uri Stemmer Salil Vadhan April 28, 2015 arxiv:1504.07553v1 [cs.cr] 28 Apr 2015 Abstract We prove new upper and lower

More information

Some Sieving Algorithms for Lattice Problems

Some Sieving Algorithms for Lattice Problems Foundations of Software Technology and Theoretical Computer Science (Bangalore) 2008. Editors: R. Hariharan, M. Mukund, V. Vinay; pp - Some Sieving Algorithms for Lattice Problems V. Arvind and Pushkar

More information

Testing Problems with Sub-Learning Sample Complexity

Testing Problems with Sub-Learning Sample Complexity Testing Problems with Sub-Learning Sample Complexity Michael Kearns AT&T Labs Research 180 Park Avenue Florham Park, NJ, 07932 mkearns@researchattcom Dana Ron Laboratory for Computer Science, MIT 545 Technology

More information

CMPUT651: Differential Privacy

CMPUT651: Differential Privacy CMPUT65: Differential Privacy Homework assignment # 2 Due date: Apr. 3rd, 208 Discussion and the exchange of ideas are essential to doing academic work. For assignments in this course, you are encouraged

More information

1 Hoeffding s Inequality

1 Hoeffding s Inequality Proailistic Method: Hoeffding s Inequality and Differential Privacy Lecturer: Huert Chan Date: 27 May 22 Hoeffding s Inequality. Approximate Counting y Random Sampling Suppose there is a ag containing

More information

Learning symmetric non-monotone submodular functions

Learning symmetric non-monotone submodular functions Learning symmetric non-monotone submodular functions Maria-Florina Balcan Georgia Institute of Technology ninamf@cc.gatech.edu Nicholas J. A. Harvey University of British Columbia nickhar@cs.ubc.ca Satoru

More information

Lecture 7: Passive Learning

Lecture 7: Passive Learning CS 880: Advanced Complexity Theory 2/8/2008 Lecture 7: Passive Learning Instructor: Dieter van Melkebeek Scribe: Tom Watson In the previous lectures, we studied harmonic analysis as a tool for analyzing

More information

Lecture 20: Introduction to Differential Privacy

Lecture 20: Introduction to Differential Privacy 6.885: Advanced Topics in Data Processing Fall 2013 Stephen Tu Lecture 20: Introduction to Differential Privacy 1 Overview This lecture aims to provide a very broad introduction to the topic of differential

More information

Sampling Contingency Tables

Sampling Contingency Tables Sampling Contingency Tables Martin Dyer Ravi Kannan John Mount February 3, 995 Introduction Given positive integers and, let be the set of arrays with nonnegative integer entries and row sums respectively

More information

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.

More information

Faster Private Release of Marginals on Small Databases

Faster Private Release of Marginals on Small Databases Faster Private Release of Marginals on Small Databases Karthekeyan Chandrasekaran School of Engineering and Applied Sciences Harvard University karthe@seasharvardedu Justin Thaler School of Engineering

More information

Lecture Notes on Linearity (Group Homomorphism) Testing

Lecture Notes on Linearity (Group Homomorphism) Testing Lecture Notes on Linearity (Group Homomorphism) Testing Oded Goldreich April 5, 2016 Summary: These notes present a linearity tester that, on input a description of two groups G,H and oracle access to

More information

CS261: A Second Course in Algorithms Lecture #12: Applications of Multiplicative Weights to Games and Linear Programs

CS261: A Second Course in Algorithms Lecture #12: Applications of Multiplicative Weights to Games and Linear Programs CS26: A Second Course in Algorithms Lecture #2: Applications of Multiplicative Weights to Games and Linear Programs Tim Roughgarden February, 206 Extensions of the Multiplicative Weights Guarantee Last

More information

A Note on Differential Privacy: Defining Resistance to Arbitrary Side Information

A Note on Differential Privacy: Defining Resistance to Arbitrary Side Information A Note on Differential Privacy: Defining Resistance to Arbitrary Side Information Shiva Prasad Kasiviswanathan Adam Smith Department of Computer Science and Engineering Pennsylvania State University e-mail:

More information

From Batch to Transductive Online Learning

From Batch to Transductive Online Learning From Batch to Transductive Online Learning Sham Kakade Toyota Technological Institute Chicago, IL 60637 sham@tti-c.org Adam Tauman Kalai Toyota Technological Institute Chicago, IL 60637 kalai@tti-c.org

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow

More information

Boosting and Differential Privacy

Boosting and Differential Privacy Boosting and Differential Privacy Cynthia Dwork Guy N. Rothblum Salil Vadhan Abstract Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct privacy-preserving

More information

On the complexity of approximate multivariate integration

On the complexity of approximate multivariate integration On the complexity of approximate multivariate integration Ioannis Koutis Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 USA ioannis.koutis@cs.cmu.edu January 11, 2005 Abstract

More information

Lecture 11- Differential Privacy

Lecture 11- Differential Privacy 6.889 New Developments in Cryptography May 3, 2011 Lecture 11- Differential Privacy Lecturer: Salil Vadhan Scribes: Alan Deckelbaum and Emily Shen 1 Introduction In class today (and the next two lectures)

More information

Privacy-preserving logistic regression

Privacy-preserving logistic regression Privacy-preserving logistic regression Kamalika Chaudhuri Information Theory and Applications University of California, San Diego kamalika@soe.ucsd.edu Claire Monteleoni Center for Computational Learning

More information

CS369N: Beyond Worst-Case Analysis Lecture #7: Smoothed Analysis

CS369N: Beyond Worst-Case Analysis Lecture #7: Smoothed Analysis CS369N: Beyond Worst-Case Analysis Lecture #7: Smoothed Analysis Tim Roughgarden November 30, 2009 1 Context This lecture is last on flexible and robust models of non-worst-case data. The idea is again

More information

On the Complexity of Optimal K-Anonymity

On the Complexity of Optimal K-Anonymity On the Complexity of Optimal K-Anonymity Adam Meyerson Department of Computer Science University of California Los Angeles, CA awm@cs.ucla.edu Ryan Williams Computer Science Department Carnegie Mellon

More information

Rational Proofs against Rational Verifiers

Rational Proofs against Rational Verifiers Rational Proofs against Rational Verifiers Keita Inasawa Kenji Yasunaga March 24, 2017 Abstract Rational proofs, introduced by Azar and Micali (STOC 2012), are a variant of interactive proofs in which

More information

An Improved Approximation Algorithm for Requirement Cut

An Improved Approximation Algorithm for Requirement Cut An Improved Approximation Algorithm for Requirement Cut Anupam Gupta Viswanath Nagarajan R. Ravi Abstract This note presents improved approximation guarantees for the requirement cut problem: given an

More information

Lecture 2: Program Obfuscation - II April 1, 2009

Lecture 2: Program Obfuscation - II April 1, 2009 Advanced Topics in Cryptography Lecture 2: Program Obfuscation - II April 1, 2009 Lecturer: S. Goldwasser, M. Naor Scribe by: R. Marianer, R. Rothblum Updated: May 3, 2009 1 Introduction Barak et-al[1]

More information

On the Impact of Objective Function Transformations on Evolutionary and Black-Box Algorithms

On the Impact of Objective Function Transformations on Evolutionary and Black-Box Algorithms On the Impact of Objective Function Transformations on Evolutionary and Black-Box Algorithms [Extended Abstract] Tobias Storch Department of Computer Science 2, University of Dortmund, 44221 Dortmund,

More information

Probabilistically Checkable Arguments

Probabilistically Checkable Arguments Probabilistically Checkable Arguments Yael Tauman Kalai Microsoft Research yael@microsoft.com Ran Raz Weizmann Institute of Science ran.raz@weizmann.ac.il Abstract We give a general reduction that converts

More information

Preliminary Results on Social Learning with Partial Observations

Preliminary Results on Social Learning with Partial Observations Preliminary Results on Social Learning with Partial Observations Ilan Lobel, Daron Acemoglu, Munther Dahleh and Asuman Ozdaglar ABSTRACT We study a model of social learning with partial observations from

More information

Privacy in Statistical Databases

Privacy in Statistical Databases Privacy in Statistical Databases Individuals x 1 x 2 x n Server/agency ) answers. A queries Users Government, researchers, businesses or) Malicious adversary What information can be released? Two conflicting

More information

CPSC 467b: Cryptography and Computer Security

CPSC 467b: Cryptography and Computer Security CPSC 467b: Cryptography and Computer Security Michael J. Fischer Lecture 10 February 19, 2013 CPSC 467b, Lecture 10 1/45 Primality Tests Strong primality tests Weak tests of compositeness Reformulation

More information

1 Distributional problems

1 Distributional problems CSCI 5170: Computational Complexity Lecture 6 The Chinese University of Hong Kong, Spring 2016 23 February 2016 The theory of NP-completeness has been applied to explain why brute-force search is essentially

More information

Discriminative Learning can Succeed where Generative Learning Fails

Discriminative Learning can Succeed where Generative Learning Fails Discriminative Learning can Succeed where Generative Learning Fails Philip M. Long, a Rocco A. Servedio, b,,1 Hans Ulrich Simon c a Google, Mountain View, CA, USA b Columbia University, New York, New York,

More information

CS264: Beyond Worst-Case Analysis Lecture #14: Smoothed Analysis of Pareto Curves

CS264: Beyond Worst-Case Analysis Lecture #14: Smoothed Analysis of Pareto Curves CS264: Beyond Worst-Case Analysis Lecture #14: Smoothed Analysis of Pareto Curves Tim Roughgarden November 5, 2014 1 Pareto Curves and a Knapsack Algorithm Our next application of smoothed analysis is

More information

Privacy-Preserving Data Mining

Privacy-Preserving Data Mining CS 380S Privacy-Preserving Data Mining Vitaly Shmatikov slide 1 Reading Assignment Evfimievski, Gehrke, Srikant. Limiting Privacy Breaches in Privacy-Preserving Data Mining (PODS 2003). Blum, Dwork, McSherry,

More information

Lecture 29: Computational Learning Theory

Lecture 29: Computational Learning Theory CS 710: Complexity Theory 5/4/2010 Lecture 29: Computational Learning Theory Instructor: Dieter van Melkebeek Scribe: Dmitri Svetlov and Jake Rosin Today we will provide a brief introduction to computational

More information

Lower Bounds for Testing Bipartiteness in Dense Graphs

Lower Bounds for Testing Bipartiteness in Dense Graphs Lower Bounds for Testing Bipartiteness in Dense Graphs Andrej Bogdanov Luca Trevisan Abstract We consider the problem of testing bipartiteness in the adjacency matrix model. The best known algorithm, due

More information

Optimal Noise Adding Mechanisms for Approximate Differential Privacy

Optimal Noise Adding Mechanisms for Approximate Differential Privacy Optimal Noise Adding Mechanisms for Approximate Differential Privacy Quan Geng, and Pramod Viswanath Coordinated Science Laboratory and Dept. of ECE University of Illinois, Urbana-Champaign, IL 680 Email:

More information

Generalization in Adaptive Data Analysis and Holdout Reuse. September 28, 2015

Generalization in Adaptive Data Analysis and Holdout Reuse. September 28, 2015 Generalization in Adaptive Data Analysis and Holdout Reuse Cynthia Dwork Vitaly Feldman Moritz Hardt Toniann Pitassi Omer Reingold Aaron Roth September 28, 2015 arxiv:1506.02629v2 [cs.lg] 25 Sep 2015 Abstract

More information

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization Katrina Ligett HUJI & Caltech joint with Seth Neel, Aaron Roth, Bo Waggoner, Steven Wu NIPS 2017

More information

Approximation Metrics for Discrete and Continuous Systems

Approximation Metrics for Discrete and Continuous Systems University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University

More information

On the Relation Between Identifiability, Differential Privacy, and Mutual-Information Privacy

On the Relation Between Identifiability, Differential Privacy, and Mutual-Information Privacy On the Relation Between Identifiability, Differential Privacy, and Mutual-Information Privacy Weina Wang, Lei Ying, and Junshan Zhang School of Electrical, Computer and Energy Engineering Arizona State

More information

On the Impossibility of Black-Box Truthfulness Without Priors

On the Impossibility of Black-Box Truthfulness Without Priors On the Impossibility of Black-Box Truthfulness Without Priors Nicole Immorlica Brendan Lucier Abstract We consider the problem of converting an arbitrary approximation algorithm for a singleparameter social

More information

Approximate and Probabilistic Differential Privacy Definitions

Approximate and Probabilistic Differential Privacy Definitions pproximate and Probabilistic Differential Privacy Definitions Sebastian Meiser University College London, United Kingdom, e-mail: s.meiser@ucl.ac.uk July 20, 208 bstract This technical report discusses

More information

CSC 5170: Theory of Computational Complexity Lecture 13 The Chinese University of Hong Kong 19 April 2010

CSC 5170: Theory of Computational Complexity Lecture 13 The Chinese University of Hong Kong 19 April 2010 CSC 5170: Theory of Computational Complexity Lecture 13 The Chinese University of Hong Kong 19 April 2010 Recall the definition of probabilistically checkable proofs (PCP) from last time. We say L has

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

Lecture 6: September 22

Lecture 6: September 22 CS294 Markov Chain Monte Carlo: Foundations & Applications Fall 2009 Lecture 6: September 22 Lecturer: Prof. Alistair Sinclair Scribes: Alistair Sinclair Disclaimer: These notes have not been subjected

More information

2 Making the Exponential Mechanism Exactly Truthful Without

2 Making the Exponential Mechanism Exactly Truthful Without CIS 700 Differential Privacy in Game Theory and Mechanism Design January 31, 2014 Lecture 3 Lecturer: Aaron Roth Scribe: Aaron Roth Privacy as a Tool for Mechanism Design for arbitrary objective functions)

More information

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,

More information

CSC 2429 Approaches to the P vs. NP Question and Related Complexity Questions Lecture 2: Switching Lemma, AC 0 Circuit Lower Bounds

CSC 2429 Approaches to the P vs. NP Question and Related Complexity Questions Lecture 2: Switching Lemma, AC 0 Circuit Lower Bounds CSC 2429 Approaches to the P vs. NP Question and Related Complexity Questions Lecture 2: Switching Lemma, AC 0 Circuit Lower Bounds Lecturer: Toniann Pitassi Scribe: Robert Robere Winter 2014 1 Switching

More information

Recoverabilty Conditions for Rankings Under Partial Information

Recoverabilty Conditions for Rankings Under Partial Information Recoverabilty Conditions for Rankings Under Partial Information Srikanth Jagabathula Devavrat Shah Abstract We consider the problem of exact recovery of a function, defined on the space of permutations

More information

Lecture 12: Lower Bounds for Element-Distinctness and Collision

Lecture 12: Lower Bounds for Element-Distinctness and Collision Quantum Computation (CMU 18-859BB, Fall 015) Lecture 1: Lower Bounds for Element-Distinctness and Collision October 19, 015 Lecturer: John Wright Scribe: Titouan Rigoudy 1 Outline In this lecture, we will:

More information

CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics

CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics CS364A: Algorithmic Game Theory Lecture #16: Best-Response Dynamics Tim Roughgarden November 13, 2013 1 Do Players Learn Equilibria? In this lecture we segue into the third part of the course, which studies

More information

Hardness of Embedding Metric Spaces of Equal Size

Hardness of Embedding Metric Spaces of Equal Size Hardness of Embedding Metric Spaces of Equal Size Subhash Khot and Rishi Saket Georgia Institute of Technology {khot,saket}@cc.gatech.edu Abstract. We study the problem embedding an n-point metric space

More information

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria Tim Roughgarden November 4, 2013 Last lecture we proved that every pure Nash equilibrium of an atomic selfish routing

More information

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms Tim Roughgarden March 3, 2016 1 Preamble In CS109 and CS161, you learned some tricks of

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

arxiv: v1 [cs.ds] 3 Feb 2018

arxiv: v1 [cs.ds] 3 Feb 2018 A Model for Learned Bloom Filters and Related Structures Michael Mitzenmacher 1 arxiv:1802.00884v1 [cs.ds] 3 Feb 2018 Abstract Recent work has suggested enhancing Bloom filters by using a pre-filter, based

More information

Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments

Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments François Le Gall The University of Tokyo Technical version available at arxiv:1407.0085 [quant-ph]. Background. Triangle finding

More information

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015

CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring February 2015 CSCI 5520: Foundations of Data Privacy Lecture 5 The Chinese University of Hong Kong, Spring 2015 3 February 2015 The interactive online mechanism that we discussed in the last two lectures allows efficient

More information

Topics in Theoretical Computer Science: An Algorithmist's Toolkit Fall 2007

Topics in Theoretical Computer Science: An Algorithmist's Toolkit Fall 2007 MIT OpenCourseWare http://ocw.mit.edu 18.409 Topics in Theoretical Computer Science: An Algorithmist's Toolkit Fall 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Not all counting problems are efficiently approximable. We open with a simple example.

Not all counting problems are efficiently approximable. We open with a simple example. Chapter 7 Inapproximability Not all counting problems are efficiently approximable. We open with a simple example. Fact 7.1. Unless RP = NP there can be no FPRAS for the number of Hamilton cycles in a

More information

Last time, we described a pseudorandom generator that stretched its truly random input by one. If f is ( 1 2

Last time, we described a pseudorandom generator that stretched its truly random input by one. If f is ( 1 2 CMPT 881: Pseudorandomness Prof. Valentine Kabanets Lecture 20: N W Pseudorandom Generator November 25, 2004 Scribe: Ladan A. Mahabadi 1 Introduction In this last lecture of the course, we ll discuss the

More information

An Algorithmic Proof of the Lopsided Lovász Local Lemma (simplified and condensed into lecture notes)

An Algorithmic Proof of the Lopsided Lovász Local Lemma (simplified and condensed into lecture notes) An Algorithmic Proof of the Lopsided Lovász Local Lemma (simplified and condensed into lecture notes) Nicholas J. A. Harvey University of British Columbia Vancouver, Canada nickhar@cs.ubc.ca Jan Vondrák

More information

Fingerprinting Codes and the Price of Approximate Differential Privacy

Fingerprinting Codes and the Price of Approximate Differential Privacy 2014 ACM Symposium on the on on the Theory of of of Computing Fingerprinting Codes and the ice of Approximate Differential ivacy Mark Bun Harvard University SEAS mbun@seas.harvard.edu Jonathan Ullman Harvard

More information

Lecture 5: Derandomization (Part II)

Lecture 5: Derandomization (Part II) CS369E: Expanders May 1, 005 Lecture 5: Derandomization (Part II) Lecturer: Prahladh Harsha Scribe: Adam Barth Today we will use expanders to derandomize the algorithm for linearity test. Before presenting

More information

A Linear Round Lower Bound for Lovasz-Schrijver SDP Relaxations of Vertex Cover

A Linear Round Lower Bound for Lovasz-Schrijver SDP Relaxations of Vertex Cover A Linear Round Lower Bound for Lovasz-Schrijver SDP Relaxations of Vertex Cover Grant Schoenebeck Luca Trevisan Madhur Tulsiani Abstract We study semidefinite programming relaxations of Vertex Cover arising

More information

A Las Vegas approximation algorithm for metric 1-median selection

A Las Vegas approximation algorithm for metric 1-median selection A Las Vegas approximation algorithm for metric -median selection arxiv:70.0306v [cs.ds] 5 Feb 07 Ching-Lueh Chang February 8, 07 Abstract Given an n-point metric space, consider the problem of finding

More information

The sample complexity of agnostic learning with deterministic labels

The sample complexity of agnostic learning with deterministic labels The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College

More information

Computational Learning Theory

Computational Learning Theory CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful

More information

Lecture 1: 01/22/2014

Lecture 1: 01/22/2014 COMS 6998-3: Sub-Linear Algorithms in Learning and Testing Lecturer: Rocco Servedio Lecture 1: 01/22/2014 Spring 2014 Scribes: Clément Canonne and Richard Stark 1 Today High-level overview Administrative

More information

1 Approximate Counting by Random Sampling

1 Approximate Counting by Random Sampling COMP8601: Advanced Topics in Theoretical Computer Science Lecture 5: More Measure Concentration: Counting DNF Satisfying Assignments, Hoeffding s Inequality Lecturer: Hubert Chan Date: 19 Sep 2013 These

More information

Polynomial time Prediction Strategy with almost Optimal Mistake Probability

Polynomial time Prediction Strategy with almost Optimal Mistake Probability Polynomial time Prediction Strategy with almost Optimal Mistake Probability Nader H. Bshouty Department of Computer Science Technion, 32000 Haifa, Israel bshouty@cs.technion.ac.il Abstract We give the

More information

On the Complexity of the Minimum Independent Set Partition Problem

On the Complexity of the Minimum Independent Set Partition Problem On the Complexity of the Minimum Independent Set Partition Problem T-H. Hubert Chan 1, Charalampos Papamanthou 2, and Zhichao Zhao 1 1 Department of Computer Science the University of Hong Kong {hubert,zczhao}@cs.hku.hk

More information

Enabling Accurate Analysis of Private Network Data

Enabling Accurate Analysis of Private Network Data Enabling Accurate Analysis of Private Network Data Michael Hay Joint work with Gerome Miklau, David Jensen, Chao Li, Don Towsley University of Massachusetts, Amherst Vibhor Rastogi, Dan Suciu University

More information

k-protected VERTICES IN BINARY SEARCH TREES

k-protected VERTICES IN BINARY SEARCH TREES k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

Differentially Private Multi-party Computation

Differentially Private Multi-party Computation ifferentially Private Multi-party Computation Peter Kairouz, Sewoong Oh, Pramod Viswanath epartment of Electrical and Computer Engineering University of Illinois at Urbana-Champaign {kairouz2, pramodv}@illinois.edu

More information

Monotone Submodular Maximization over a Matroid

Monotone Submodular Maximization over a Matroid Monotone Submodular Maximization over a Matroid Yuval Filmus January 31, 2013 Abstract In this talk, we survey some recent results on monotone submodular maximization over a matroid. The survey does not

More information

Faster Johnson-Lindenstrauss style reductions

Faster Johnson-Lindenstrauss style reductions Faster Johnson-Lindenstrauss style reductions Aditya Menon August 23, 2007 Outline 1 Introduction Dimensionality reduction The Johnson-Lindenstrauss Lemma Speeding up computation 2 The Fast Johnson-Lindenstrauss

More information