arXiv: v3 [cs.LG] 3 Dec 2017


Context-Aware Generative Adversarial Privacy

Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, and Ram Rajagopal

Abstract

Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics.
Keywords: Generative Adversarial Privacy; Generative Adversarial Networks; Privatizer Network; Adversarial Network; Statistical Data Privacy; Differential Privacy; Information Theoretic Privacy; Mutual Information Privacy; Error Probability Games; Machine Learning

1 Introduction

The explosion of information collection across a variety of electronic platforms is enabling the use of inferential machine learning (ML) and artificial intelligence to guide consumers through a myriad of choices and decisions in their daily lives. In this era of artificial intelligence, data is quickly becoming the most valuable resource [25]. Indeed, large scale datasets provide tremendous utility in helping researchers design state-of-the-art machine learning algorithms that can learn from and make predictions on real life data. Scholars and researchers are increasingly demanding access to larger datasets that allow them to learn more sophisticated models. Unfortunately, more often than not, in addition to containing public information that can be published, large scale datasets also contain private information about participating individuals (see Figure 1). Thus, data collection and curation organizations are reluctant to release such datasets before carefully sanitizing them, especially in light of recent public policies on data sharing [28, 62].

To protect the privacy of individuals, datasets are typically anonymized before their release. This is done by stripping off personally identifiable information (e.g., first and last name, social security number, IDs, etc.) [50, 69, 77]. Anonymization, however, does not provide immunity against correlation and linkage attacks [36, 61]. Indeed, several successful attempts to re-identify individuals from anonymized datasets have been reported in the past ten years. For instance, [61] were able to successfully de-anonymize watch histories in the Netflix Prize, a public recommender system competition.
In a more recent attack, [78] showed that participants of an anonymized DNA study were identified by linking their DNA data with the publicly available Personal Genome Project dataset. Even more recently, [30] successfully designed re-identification attacks on anonymized fMRI imaging datasets. Other anonymization techniques, such as generalization [11, 32, 49] and suppression [41, 68, 86], also cannot prevent an adversary from performing the sensitive linkages or recovering private information from published datasets [31].

Footnote (author affiliations): C. Huang and L. Sankar are with the School of Electrical, Computer, and Energy Engineering at Arizona State University, Tempe, AZ. P. Kairouz, X. Chen, and R. Rajagopal are with the Department of Electrical Engineering at Stanford University, Stanford, CA. Equal contributions.

[Figure 1: An example privacy preserving mechanism for smart meter data. The database D holds entries (rows 1 to n) of original meter data X and private data Y (income, occupancy); a perturbation is applied to the meter data.]

Addressing the shortcomings of anonymization techniques requires data randomization. In recent years, two randomization-based approaches with provable statistical privacy guarantees have emerged: (a) context-free approaches that assume worst-case dataset statistics and adversaries; (b) context-aware approaches that explicitly model the dataset statistics and the adversary's capabilities.

Context-free privacy. One of the most popular context-free notions of privacy is differential privacy (DP) [21, 22, 23]. DP, quantified by a leakage parameter $\epsilon$ (see footnote 1), restricts distinguishability between any two neighboring datasets from the published data. DP provides strong, context-free theoretical guarantees against worst-case adversaries. However, training machine learning models on randomized data with DP guarantees often leads to a significantly reduced utility and comes with a tremendous hit in sample complexity [18, 19, 20, 29, 37, 42, 43, 47, 64, 82, 87, 93, 94] in the desired leakage regimes. For example, learning population level histograms under local DP suffers from a stupendous increase in sample complexity by a factor proportional to the size of the dictionary [20, 42, 43].

Context-aware privacy. Context-aware privacy notions have been so far studied by information theorists under the rubric of information theoretic (IT) privacy [4, 5, 6, 8, 10, 12, 13, 14, 15, 44, 45, 46, 51, 57, 65, 67, 70, 71, 72, 84, 92].
IT privacy has predominantly been quantified by mutual information (MI), which models how well an adversary, with access to the released data, can refine its belief about the private features of the data. Recently, Issa et al. introduced maximal leakage (MaxL) to quantify leakage to a strong adversary capable of guessing any function of the dataset [40]. They also showed that their adversarial model can be generalized to encompass local DP (wherein the mechanism ensures limited distinction for any pair of entries, a stronger DP notion without a neighborhood constraint [20, 88]) [39]. When one restricts the adversary to guessing specific private features (and not all functions of these features), the resulting adversary is a maximum a posteriori (MAP) adversary that has been studied by Asoodeh et al. in [6, 7, 8, 9]. Context-aware data perturbation techniques have also been studied in privacy preserving cloud computing [16, 17, 48].

Compared to context-free privacy notions, context-aware privacy notions achieve a better privacy-utility tradeoff by incorporating the statistics of the dataset and placing reasonable restrictions on the capabilities of the adversary. However, using information theoretic quantities (such as MI) as privacy metrics requires learning the parameters of the privatization mechanism in a data-driven fashion that involves minimizing an empirical information theoretic loss function. This task is remarkably challenging in practice [3, 33, 56, 81, 96].

Generative adversarial privacy. Given the challenges of existing privacy approaches, we take a fundamentally new approach towards enabling private data publishing with guarantees on both privacy and utility. Instead of adopting worst-case, context-free notions of data privacy (such as differential privacy), we introduce a novel context-aware model of privacy that allows the designer to cleverly add noise where it matters.
An inherent challenge in taking a context-aware privacy approach is that it requires having access to priors, such as joint distributions of public and private variables. Such information is hardly ever present in practice. To overcome this issue, we take a data-driven approach to context-aware privacy. We leverage recent advancements in generative adversarial networks (GANs) to introduce a unified framework for context-aware privacy called generative adversarial privacy (GAP). Under GAP, the parameters of a generative model, representing the privatization mechanism, are learned from the data itself.

Footnote 1: Smaller $\epsilon \in [0, \infty)$ implies smaller leakage and stronger privacy guarantees.

[Figure 2: Generative Adversarial Privacy. The privatizer takes $(X, Y)$ and a noise sequence and outputs $\hat{X} = g(X, Y)$; the adversary outputs $\hat{Y} = h(g(X, Y))$.]

1.1 Our Contributions

We investigate a setting where a data holder would like to publish a dataset $\mathcal{D}$ in a privacy preserving fashion. Each row in $\mathcal{D}$ contains both private variables (represented by $Y$) and public variables (represented by $X$). The goal of the data holder is to generate $\hat{X}$ in a way such that: (a) $\hat{X}$ is as good a representation of $X$ as possible, and (b) an adversary cannot use $\hat{X}$ to reliably infer $Y$. To this end, we present GAP, a unified framework for context-aware privacy that includes existing information-theoretic privacy notions. Our formulation is inspired by GANs [34, 55, 73] and error probability games [58, 59, 60, 66, 74]. It includes two learning blocks: a privatizer, whose task is to output a sanitized version of the public variables (subject to some distortion constraints); and an adversary, whose task is to learn the private variables from the sanitized data. The privatizer and adversary achieve their goals by competing in a constrained minimax, zero-sum game. On the one hand, the privatizer (a conditional generative model) is designed to minimize the adversary's performance in inferring $Y$ reliably. On the other hand, the adversary (a classifier) seeks to find the best inference strategy that maximizes its performance. This generative adversarial framework is represented in Figure 2.

At the core of GAP is a loss function (see footnote 2) that captures how well an adversary does in terms of inferring the private variables. Different loss functions lead to different adversarial models. We focus our attention on two types of loss functions: (a) a 0-1 loss that leads to a maximum a posteriori probability (MAP) adversary, and (b) an empirical log-loss that leads to a minimum cross-entropy adversary. Ultimately, our goal is to show that our data-driven approach can provide privacy guarantees against a MAP adversary.
However, derivatives of a 0-1 loss function are ill-defined. To overcome this issue, the ML community uses the more analytically tractable log-loss function. We do the same by choosing the log-loss function as the adversary's loss function in the data-driven framework. We show that it leads to a performance that matches the performance of game-theoretically optimal mechanisms under a MAP adversary. We also show that GAP recovers mutual information privacy when a log-loss function is used (see Section 2.2).

To showcase the power of our context-aware, data-driven framework, we investigate two simple, albeit canonical, statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. Under the binary data model, both $X$ and $Y$ are binary. Under the binary Gaussian mixture model, $Y$ is binary whereas $X$ is conditionally Gaussian. For both models, we derive and compare the performance of game-theoretically optimal privatization mechanisms with those that are directly learned from data (in a generative adversarial fashion).

For the above-mentioned statistical dataset models, we present two approaches towards designing privacy mechanisms: (i) private-data dependent (PDD) mechanisms, where the privatizer uses both the public and private variables, and (ii) private-data independent (PDI) mechanisms, where the privatizer only uses the public variables. We show that the PDD mechanisms lead to a superior privacy-utility tradeoff.

Footnote 2: We quantify the adversary's performance via a loss function and the quality of the released data via a distortion function.

1.2 Related Work

In practice, a context-free notion of privacy (such as DP) is desirable because it places no restrictions on the dataset statistics or the adversary's strength. This explains why DP has been remarkably successful in the past ten years, and has been deployed in an array of systems, including Google's Chrome browser [27] and Apple's iOS [90]. Nevertheless, because of its strong context-free nature, DP has suffered from a sequence of impossibility results. These results have made the deployment of DP with a reasonable leakage parameter practically impossible. Indeed, it was recently reported that Apple's DP implementation suffers from several limitations, the most notable of which is Apple's use of unacceptably large leakage parameters [79].

Context-aware privacy notions can exploit the structure and statistics of the dataset to design mechanisms matched to both the data and adversarial models. In this context, information-theoretic metrics for privacy are naturally well suited. In fact, the adversarial model determines the appropriate information metric: an estimating adversary that minimizes mean square error is captured by $\chi^2$-squared measures [13], a belief refining adversary is captured by MI [71], an adversary that can make a hard MAP decision for a specific set of private features is captured by the Arimoto MI of order $\infty$ [7, 9], and an adversary that can guess any function of the private features is captured by the maximal (over all distributions of the dataset for a fixed support) Sibson information of order $\infty$ [39, 40].

Information-theoretic metrics, and in particular MI privacy, allow the use of Fano's inequality and its variants [85] to bound the rate of learning the private variables for a variety of learning metrics, such as error probability and minimum mean-squared error (MMSE). Despite the strength of MI in providing statistical utility as well as capturing a fairly strong adversary that involves refining beliefs, in the absence of priors on the dataset, using MI as an empirical loss function leads to computationally intractable procedures when learning the optimal parameters of the privatization mechanism from data.
Indeed, training algorithms with empirical information theoretic loss functions is a challenging problem that has been explored in specific learning contexts, such as determining randomized encoders for the information bottleneck problem [3] and designing deep auto-encoders using a rate-distortion paradigm [33, 81, 96]. Even in these specific contexts, variational approaches were taken to minimize/maximize a surrogate function instead of minimizing/maximizing an empirical mutual information loss function directly [76]. In an effort to bridge theory and practice, we present a general data-driven framework to design privacy mechanisms that can capture a range of information-theoretic privacy metrics as loss functions. We will show how our framework leads to very practical (generative adversarial) data-driven formulations that match their corresponding theoretical formulations.

In the context of publishing datasets with privacy and utility guarantees, a number of similar approaches have been recently considered. We briefly review them and clarify how our work is different. In [91], the authors consider linear privatizer and adversary models by adding noise in directions that are orthogonal to the public features in the hope that the spaces of the public and private features are orthogonal (or nearly orthogonal). This allows the privatizer to achieve full privacy without sacrificing utility. However, this work is restrictive in the sense that it requires the public and private features to be nearly orthogonal. Furthermore, this work provides no rigorous quantification of privacy and only investigates a limited class of linear adversaries and privatizers.

DP-based obfuscators for data publishing have been considered in [35, 54]. The author in [35] considers a deterministic, compressive mapping of the input data with differentially private noise added either before or after the mapping.
The mapping rule is determined by a data-driven methodology to design minimax filters that allow non-malicious entities to learn some public features from the filtered data, while preventing malicious entities from learning other private features. The approach in [54] relies on using deep auto-encoders to determine the relevant feature space to add differentially private noise to, eliminating the need to add noise to the original data. After the noise is added, the original signal is reconstructed. These novel approaches leverage minimax filters and deep auto-encoders to incorporate a notion of context-aware privacy and achieve better privacy-utility tradeoffs while using DP to enforce privacy. However, DP will still incur an insurmountable utility cost since it assumes worst-case dataset statistics. Our approach captures a broader class of randomization-based mechanisms via a generative model which allows the privatizer to tailor the noise to the statistics of the dataset.

Our work is also closely related to adversarial neural cryptography [1], learning censored representations [26], and privacy preserving image sharing [64], in which adversarial learning is used to learn how to protect communications by encryption or hide/remove sensitive information. Similar to these problems, our model includes a minimax formulation and uses adversarial neural networks to learn privatization schemes. However, in [26, 64], the authors use non-generative auto-encoders to remove sensitive information, which do not have an obvious generative interpretation. Instead, we use a GANs-like approach to learn privatization schemes that prevent an adversary from inferring the private data. Moreover, these papers consider a Lagrangian formulation for the utility-privacy tradeoff that the obfuscator computes. We go beyond these works by studying a game-theoretic setting with constrained optimization, which provides a specific privacy guarantee for a fixed distortion. We also compare the performance of the privatization schemes learned in an adversarial fashion with the game-theoretically optimal ones.

We use conditional generative models to represent privatization schemes. Generative models have recently received a lot of attention in the machine learning community [34, 38, 55, 73, 75]. Ultimately, deep generative models hold the promise of discovering and efficiently internalizing the statistics of the target signal to be generated. State-of-the-art generative models are trained in an adversarial fashion [34, 55]: the generated signal is fed into a discriminator which attempts to distinguish whether the data is real (i.e., sampled from the true underlying distribution) or synthetic (i.e., generated from a low dimensional noise sequence). Training generative models in an adversarial fashion has proven to be successful in computer vision and enabled several exciting applications. Analogous to how the generator is trained in GANs, we train the privatizer in an adversarial fashion by making it compete with an attacker.

1.3 Outline

The remainder of our paper is organized as follows. We formally present our GAP model in Section 2. We also show how, as a special case, it can recover several information theoretic notions of privacy. We then study a simple (but canonical) binary dataset model in Section 3. In particular, we present theoretically optimal PDD and PDI privatization schemes, and show how these schemes can be learned from data using a generative adversarial network. In Section 4, we investigate binary Gaussian mixture dataset models, and provide a variety of privatization schemes. We comment on their theoretical performance and show how their parameters can be learned from data in a generative adversarial fashion.
Our proofs are deferred to sections A, B, and C of the Appendix. We conclude our paper in Section 5 with a few remarks and interesting extensions.

2 Generative Adversarial Privacy Model

We consider a dataset $\mathcal{D}$ which contains both public and private variables for $n$ individuals (see Figure 1). We represent the public variables by a random variable $X \in \mathcal{X}$, and the private variables (which are typically correlated with the public variables) by a random variable $Y \in \mathcal{Y}$. Each dataset entry contains a pair of public and private variables denoted by $(X, Y)$. Instances of $X$ and $Y$ are denoted by $x$ and $y$, respectively. We assume that each entry pair $(X, Y)$ is distributed according to $P(X, Y)$, and is independent from other entry pairs in the dataset. Since the dataset entries are independent of each other, we restrict our attention to memoryless mechanisms: privacy mechanisms that are applied on each data entry separately. Formally, we define the privacy mechanism as a randomized mapping given by $g(X, Y): \mathcal{X} \times \mathcal{Y} \to \mathcal{X}$.

We consider two different types of privatization schemes: (a) private data dependent (PDD) schemes, and (b) private data independent (PDI) schemes. A privatization mechanism is PDD if its output is dependent on both $Y$ and $X$. It is PDI if its output only depends on $X$. PDD mechanisms are naturally superior to PDI mechanisms. We show, in sections 3 and 4, that there is a sizeable gap in performance between these two approaches.

In our proposed GAP framework, the privatizer is pitted against an adversary. We model the interactions between the privatizer and the adversary as a non-cooperative game. For a fixed $g$, the goal of the adversary is to reliably infer $Y$ from $g(X, Y)$ using a strategy $h$. For a fixed adversarial strategy $h$, the goal of the privatizer is to design $g$ in a way that minimizes the adversary's capability of inferring the private variable from the perturbed data.
The optimal privacy mechanism is obtained as an equilibrium point at which both the privatizer and the adversary cannot improve their strategies by unilaterally deviating from the equilibrium point.

2.1 Formulation

Given the output $\hat{X} = g(X, Y)$ of a privacy mechanism $g(X, Y)$, we define $\hat{Y} = h(g(X, Y))$ to be the adversary's inference of the private variable $Y$ from $\hat{X}$. To quantify the effect of adversarial inference, for a given public-private pair $(x, y)$, we model the loss of the adversary as

$$\ell(h(g(X = x, Y = y)), Y = y): \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}.$$

Therefore, the expected loss of the adversary with respect to (w.r.t.) $X$ and $Y$ is defined to be

$$L(h, g) \triangleq E[\ell(h(g(X, Y)), Y)], \quad (1)$$

where the expectation is taken over $P(X, Y)$ and the randomness in $g$ and $h$.

Intuitively, the privatizer would like to minimize the adversary's ability to learn $Y$ reliably from the published data. This can be trivially done by releasing an $\hat{X}$ independent of $X$. However, such an approach provides no utility for data analysts who want to learn non-private variables from $\hat{X}$. To overcome this issue, we capture the loss incurred by privatizing the original data via a distortion function $d(\hat{x}, x): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, which measures how far the original data $X = x$ is from the privatized data $\hat{X} = \hat{x}$. Thus, the average distortion under $g(X, Y)$ is $E[d(g(X, Y), X)]$, where the expectation is taken over $P(X, Y)$ and the randomness in $g$.

On the one hand, the data holder would like to find a privacy mechanism $g$ that is both privacy preserving (in the sense that it is difficult for the adversary to learn $Y$ from $\hat{X}$) and utility preserving (in the sense that it does not distort the original data too much). On the other hand, for a fixed choice of privacy mechanism $g$, the adversary would like to find a (potentially randomized) function $h$ that minimizes its expected loss, which is equivalent to maximizing the negative of the expected loss. To achieve these two opposing goals, we model the problem as a constrained minimax game between the privatizer and the adversary:

$$\min_{g(\cdot)} \max_{h(\cdot)} \; -L(h, g) \quad (2)$$
$$\text{s.t. } E[d(g(X, Y), X)] \le D,$$

where the constant $D \ge 0$ determines the allowable distortion for the privatizer and the expectation is taken over $P(X, Y)$ and the randomness in $g$ and $h$.

2.2 GAP under Various Loss Functions

The above formulation places no restrictions on the adversary. Indeed, different loss functions and decision rules lead to different adversarial models.
In what follows, we will discuss a variety of loss functions under hard and soft decision rules, and show how our GAP framework can recover several popular information theoretic privacy notions.

Hard Decision Rules. When the adversary adopts a hard decision rule, $h(g(X, Y))$ is an estimate of $Y$. Under this setting, we can choose $\ell(h(g(X, Y)), Y)$ in a variety of ways. For instance, if $Y$ is continuous, the adversary can attempt to minimize the difference between the estimated and true private variable values. This can be achieved by considering a squared loss function

$$\ell(h(g(X, Y)), Y) = (h(g(X, Y)) - Y)^2, \quad (3)$$

which is known as the $\ell_2$ loss. In this case, one can verify that the adversary's optimal decision rule is $h^* = E[Y \mid g(X, Y)]$, which is the conditional mean of $Y$ given $g(X, Y)$. Furthermore, under the adversary's optimal decision rule, the minimax problem in (2) simplifies to

$$\min_{g(\cdot)} -\mathrm{mmse}(Y \mid g(X, Y)) = -\max_{g(\cdot)} \mathrm{mmse}(Y \mid g(X, Y)),$$

subject to the distortion constraint. Here $\mathrm{mmse}(Y \mid g(X, Y))$ is the resulting minimum mean square error (MMSE) under $h^* = E[Y \mid g(X, Y)]$. Thus, under the $\ell_2$ loss, GAP provides privacy guarantees against an MMSE adversary.

On the other hand, when $Y$ is discrete (e.g., age, gender, political affiliation, etc.), the adversary can attempt to maximize its classification accuracy. This is achieved by considering a 0-1 loss function [63] given by

$$\ell(h(g(X, Y)), Y) = \begin{cases} 0 & \text{if } h(g(X, Y)) = Y \\ 1 & \text{otherwise.} \end{cases} \quad (4)$$

In this case, one can verify that the adversary's optimal decision rule is the maximum a posteriori probability (MAP) decision rule: $h^* = \arg\max_{y \in \mathcal{Y}} P(y \mid g(X, Y))$, with ties broken uniformly at random. Moreover, under the MAP decision rule, the minimax problem in (2) reduces to

$$\min_{g(\cdot)} -\left(1 - \max_{y \in \mathcal{Y}} P(y, g(X, Y))\right) = \min_{g(\cdot)} \max_{y \in \mathcal{Y}} P(y, g(X, Y)) - 1, \quad (5)$$

subject to the distortion constraint. Thus, under a 0-1 loss function, the GAP formulation provides privacy guarantees against a MAP adversary.

Soft Decision Rules. Instead of a hard decision rule, we can also consider a broader class of soft decision rules where $h(g(X, Y))$ is a distribution over $\mathcal{Y}$; i.e., $h(g(X, Y)) = P_h(y \mid g(X, Y))$ for $y \in \mathcal{Y}$. In this context, we can analyze the performance under a log-loss

$$\ell(h(g(X, Y)), y) = \log \frac{1}{P_h(y \mid g(X, Y))}. \quad (6)$$

In this case, the objective of the adversary simplifies to

$$\max_{h(\cdot)} -E\left[\log \frac{1}{P_h(y \mid g(X, Y))}\right] = -H(Y \mid g(X, Y)),$$

and the maximization is attained at $P_h^*(y \mid g(X, Y)) = P(y \mid g(X, Y))$. Therefore, the optimal adversarial decision rule is determined by the true conditional distribution $P(y \mid g(X, Y))$, which we assume is known to the data holder in the game-theoretic setting. Thus, under the log-loss function, the minimax optimization problem in (2) reduces to

$$\min_{g(\cdot)} -H(Y \mid g(X, Y)) = \min_{g(\cdot)} I(g(X, Y); Y) - H(Y),$$

subject to the distortion constraint. Thus, under the log-loss in (6), GAP is equivalent to using MI as the privacy metric [12].

The 0-1 loss captures a strong guessing adversary; in contrast, log-loss or information-loss models a belief refining adversary. Next, we consider a more general $\alpha$-loss function [52] that allows continuous interpolation between these extremes via

$$\ell(h(g(X, Y)), y) = \frac{\alpha}{\alpha - 1}\left(1 - P_h(y \mid g(X, Y))^{1 - \frac{1}{\alpha}}\right), \quad (7)$$

for any $\alpha > 1$. As shown in [52], for very large $\alpha$ ($\alpha \to \infty$), this loss approaches that of the 0-1 (MAP) adversary.
As $\alpha$ decreases, the convexity of the loss function encourages the estimator $\hat{Y}$ to be probabilistic, as it increasingly rewards correct inferences of lesser and lesser likely outcomes (in contrast to a hard decision rule by a MAP adversary of the most likely outcome) conditioned on the revealed data. As $\alpha \to 1$, (7) yields the logarithmic loss, and the optimal belief $P_{\hat{Y}}$ is simply the posterior belief. Denoting $H^a_\alpha(Y \mid g(X, Y))$ as the Arimoto conditional entropy of order $\alpha$, one can verify that [52]

$$\max_{h(\cdot)} -E\left[\frac{\alpha}{\alpha - 1}\left(1 - P_h(y \mid g(X, Y))^{1 - \frac{1}{\alpha}}\right)\right] = -H^a_\alpha(Y \mid g(X, Y)),$$

which is achieved by an $\alpha$-tilted conditional distribution [52]

$$P_h^*(y \mid g(X, Y)) = \frac{P(y \mid g(X, Y))^\alpha}{\sum_{y \in \mathcal{Y}} P(y \mid g(X, Y))^\alpha}.$$

Under this choice of a decision rule, the objective of the minimax optimization in (2) reduces to

$$\min_{g(\cdot)} -H^a_\alpha(Y \mid g(X, Y)) = \min_{g(\cdot)} I^a_\alpha(g(X, Y); Y) - H_\alpha(Y), \quad (8)$$

where $I^a_\alpha$ is the Arimoto mutual information and $H_\alpha$ is the Rényi entropy. Note that as $\alpha \to 1$, we recover the classical MI privacy setting and when $\alpha \to \infty$, we recover the 0-1 loss.
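The quantities in this section can be evaluated by hand for a small example. The following sketch does so under assumptions that are not taken from the paper: a toy binary model in which $X = Y$ with probability `rho`, a PDI mechanism that flips $X$ with probability `s` (so the expected Hamming distortion equals `s`), and illustrative function names. It computes the MAP adversary's accuracy, the conditional entropy $H(Y \mid \hat{X})$ attained by the optimal log-loss adversary, the $\alpha$-loss in (7), and the $\alpha$-tilted belief.

```python
import math

def joint_after_flip(p_y1, rho, s):
    """Joint P(y, x_hat) for the toy model (all parameters are assumptions):
    P(Y=1) = p_y1, X = Y with prob. rho, and the mechanism flips X with prob. s."""
    p_match = rho * (1.0 - s) + (1.0 - rho) * s  # P(x_hat = y | y)
    joint = {}
    for y in (0, 1):
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        for x_hat in (0, 1):
            joint[(y, x_hat)] = p_y * (p_match if x_hat == y else 1.0 - p_match)
    return joint

def map_accuracy(joint):
    """Accuracy of the MAP adversary: sum over x_hat of max_y P(y, x_hat)."""
    return sum(max(joint[(0, xh)], joint[(1, xh)]) for xh in (0, 1))

def conditional_entropy(joint):
    """H(Y | X_hat) in nats: expected log-loss of the optimal soft adversary."""
    h = 0.0
    for xh in (0, 1):
        p_xh = joint[(0, xh)] + joint[(1, xh)]
        for y in (0, 1):
            p = joint[(y, xh)]
            if p > 0.0:
                h -= p * math.log(p / p_xh)
    return h

def alpha_loss(p_true, alpha):
    """alpha-loss (7), where p_true is the belief assigned to the true label."""
    return alpha / (alpha - 1.0) * (1.0 - p_true ** (1.0 - 1.0 / alpha))

def alpha_tilted(posterior, alpha):
    """alpha-tilted version of a posterior given as a list of probabilities."""
    powered = [p ** alpha for p in posterior]
    total = sum(powered)
    return [p / total for p in powered]

if __name__ == "__main__":
    for s in (0.0, 0.1, 0.3, 0.5):
        j = joint_after_flip(0.5, 0.9, s)
        print(s, round(map_accuracy(j), 3), round(conditional_entropy(j), 3))
```

In this toy setting, increasing the flip probability lowers the MAP accuracy and raises $H(Y \mid \hat{X})$, at the price of a larger distortion. Numerically, `alpha_loss` is close to $-\log p$ for $\alpha$ near 1 and close to $1 - p$ for very large $\alpha$, which matches the two limits discussed above.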

2.3 Data-driven GAP

So far, we have focused on a setting where the data holder has access to $P(X, Y)$. When $P(X, Y)$ is known, the data holder can simply solve the constrained minimax optimization problem in (2) (theoretical version of GAP) to obtain a privatization mechanism that would perform best against a chosen type of adversary. In the absence of $P(X, Y)$, we propose a data-driven version of GAP that allows the data holder to learn privatization mechanisms directly from a dataset of the form $\mathcal{D} = \{(x_{(i)}, y_{(i)})\}_{i=1}^{n}$. Under the data-driven version of GAP, we represent the privacy mechanism via a conditional generative model $g(X, Y; \theta_p)$ parameterized by $\theta_p$. This generative model takes $(X, Y)$ as inputs and outputs $\hat{X}$. In the training phase, the data holder learns the optimal parameters $\theta_p^*$ by competing against a computational adversary: a classifier modeled by a neural network $h(g(X, Y; \theta_p); \theta_a)$ parameterized by $\theta_a$. After convergence, we evaluate the performance of the learned $g(X, Y; \theta_p^*)$ by computing the maximal probability of inferring $Y$ under the MAP adversary studied in the theoretical version of GAP.

We note that in theory, the functions $h$ and $g$ can (in general) be arbitrary; i.e., they can capture all possible learning algorithms. However, in practice, we need to restrict them to a rich hypothesis class. Figure 3 shows an example of the GAP model in which the privatizer and adversary are modeled as multi-layer randomized neural networks. For a fixed $h$ and $g$, we quantify the adversary's empirical loss using a continuous and differentiable function

$$L_{\mathrm{EMP}}(\theta_p, \theta_a) = \frac{1}{n} \sum_{i=1}^{n} \ell(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a), y_{(i)}), \quad (9)$$

where $(x_{(i)}, y_{(i)})$ is the $i$-th row of $\mathcal{D}$ and $\ell(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a), y_{(i)})$ is the adversary loss in the data-driven context. The optimal parameters for the privatizer and adversary are the solution to

$$\min_{\theta_p} \max_{\theta_a} \; -L_{\mathrm{EMP}}(\theta_p, \theta_a) \quad (10)$$
$$\text{s.t. } E_{\mathcal{D}}[d(g(X, Y; \theta_p), X)] \le D,$$

where the expectation is taken over the dataset $\mathcal{D}$ and the randomness in $g$.
In keeping with the now common practice in machine learning, in the data-driven approach for GAP, one can use the empirical log-loss function [80, 95] given by (9) with

$$\ell(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a), y_{(i)}) = -y_{(i)} \log h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a) - (1 - y_{(i)}) \log(1 - h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a)),$$

which leads to a minimum cross-entropy adversary. As a result, the empirical loss of the adversary is quantified by the cross-entropy

$$L_{\mathrm{XE}}(\theta_p, \theta_a) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_{(i)} \log h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a) + (1 - y_{(i)}) \log(1 - h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a)) \right]. \quad (11)$$

An alternative loss that can be readily used in this setting is the $\alpha$-loss introduced in Section 2.2. In the data-driven context, the $\alpha$-loss can be written as

$$\ell(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a), y_{(i)}) = \frac{\alpha}{\alpha - 1} \left( y_{(i)} \left(1 - h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a)^{1 - \frac{1}{\alpha}}\right) + (1 - y_{(i)}) \left(1 - (1 - h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a))^{1 - \frac{1}{\alpha}}\right) \right), \quad (12)$$

for any constant $\alpha > 1$. As discussed in Section 2.2, the $\alpha$-loss captures a variety of adversarial models and recovers both the log-loss (when $\alpha \to 1$) and 0-1 loss (when $\alpha \to \infty$). Furthermore, (12) suggests that $\alpha$-leakage can be used as a surrogate (and smoother) loss function for the 0-1 loss (when $\alpha$ is relatively large).

The minimax optimization problem in (10) is a two-player non-cooperative game between the privatizer and the adversary. The strategies of the privatizer and adversary are given by $\theta_p$ and $\theta_a$, respectively. Each player chooses the strategy that optimizes its objective function w.r.t. what its opponent does. In particular, the privatizer must expect that if it chooses $\theta_p$, the adversary will choose a $\theta_a$ that maximizes the negative of its own loss function based on the choice of the privatizer. The optimal privacy mechanism is given by the equilibrium of the privatizer-adversary game.
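The empirical losses (11) and (12) are straightforward to compute once the adversary's outputs are available. The sketch below is illustrative only: it assumes binary labels and that `probs[i]` already holds the adversary's output $h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a)$, read as its estimate of $P(y_{(i)} = 1)$; the function names and the sample values are hypothetical.

```python
import math

def cross_entropy(labels, probs):
    """Empirical cross-entropy (11) for binary labels in {0, 1}."""
    n = len(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n

def empirical_alpha_loss(labels, probs, alpha):
    """Average of the per-sample alpha-loss (12); requires alpha > 1."""
    exponent = 1.0 - 1.0 / alpha
    scale = alpha / (alpha - 1.0)
    n = len(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        total += scale * (y * (1.0 - p ** exponent)
                          + (1 - y) * (1.0 - (1.0 - p) ** exponent))
    return total / n

if __name__ == "__main__":
    labels = [1, 0, 1, 1]            # hypothetical private labels
    probs = [0.9, 0.2, 0.6, 0.4]     # hypothetical adversary outputs
    print("cross-entropy:", cross_entropy(labels, probs))
    for alpha in (1.0001, 2.0, 10.0, 1e6):
        print(alpha, empirical_alpha_loss(labels, probs, alpha))
```

For $\alpha$ close to 1 the second function agrees with the cross-entropy, and for very large $\alpha$ it approaches the average of one minus the belief assigned to the true label, a smooth stand-in for the 0-1 loss.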

Figure 3: A multi-layer neural network model for the privatizer and adversary

In practice, we can learn the equilibrium of the game using an iterative algorithm presented in Algorithm 1. We first maximize the negative of the adversary's loss function in the inner loop to compute the parameters of $h$ for a fixed $g$. Then, we minimize the privatizer's loss function, which is modeled as the negative of the adversary's loss function, to compute the parameters of $g$ for a fixed $h$. To avoid over-fitting and ensure convergence, we alternate between training the adversary for $k$ epochs and training the privatizer for one epoch. This results in the adversary moving towards its optimal solution for small perturbations of the privatizer [34].

To incorporate the distortion constraint into the learning algorithm, we use the penalty method [53] and the augmented Lagrangian method [24] to replace the constrained optimization problem by a series of unconstrained problems whose solutions asymptotically converge to the solution of the constrained problem. Under the penalty method, the unconstrained optimization problem is formed by adding a penalty to the objective function. The added penalty consists of a penalty parameter $\rho_t$ multiplied by a measure of violation of the constraint. The measure of violation is non-zero when the constraint is violated and is zero if the constraint is not violated. Therefore, in Algorithm 1, the constrained optimization problem of the privatizer can be approximated by a series of unconstrained optimization problems with the loss function

$$\ell(\theta_p, \theta_a^{t+1}) = -\frac{1}{M}\sum_{i=1}^{M} \ell\big(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a^{t+1}), y_{(i)}\big) + \rho_t \max\Big\{0,\; \frac{1}{M}\sum_{i=1}^{M} d\big(g(x_{(i)}, y_{(i)}; \theta_p), x_{(i)}\big) - D\Big\}, \tag{13}$$

where $\rho_t$ is a penalty coefficient which increases with the number of iterations $t$.
For convex optimization problems, the solution to the series of unconstrained problems will eventually converge to the solution of the original constrained problem [53].

The augmented Lagrangian method is another approach to enforce equality constraints by penalizing the objective function whenever the constraints are not satisfied. Different from the penalty method, the augmented Lagrangian method combines the use of a Lagrange multiplier and a quadratic penalty term. Note that this method is designed for equality constraints. Therefore, we introduce a slack variable $\delta$ to convert the inequality distortion constraint into an equality constraint. Using the augmented Lagrangian method, the constrained optimization problem of the privatizer can be replaced by a series of unconstrained problems with the loss function given by

$$\ell(\theta_p, \theta_a^{t+1}, \delta) = -\frac{1}{M}\sum_{i=1}^{M} \ell\big(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a^{t+1}), y_{(i)}\big) + \frac{\rho_t}{2}\Big(\frac{1}{M}\sum_{i=1}^{M} d\big(g(x_{(i)}, y_{(i)}; \theta_p), x_{(i)}\big) + \delta - D\Big)^2 - \lambda_t\Big(\frac{1}{M}\sum_{i=1}^{M} d\big(g(x_{(i)}, y_{(i)}; \theta_p), x_{(i)}\big) + \delta - D\Big), \tag{14}$$

where $\rho_t$ is a penalty coefficient which increases with the number of iterations $t$ and $\lambda_t$ is updated according to the rule $\lambda_{t+1} = \lambda_t - \rho_t\big(\frac{1}{M}\sum_{i=1}^{M} d(g(x_{(i)}, y_{(i)}; \theta_p), x_{(i)}) + \delta - D\big)$. For convex optimization problems, the solution to the series of unconstrained problems formulated by the augmented Lagrangian method also converges to the solution of the original constrained problem [24].
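The two constraint-handling objectives in (13) and (14) can be sketched on precomputed minibatch quantities. This is a minimal illustration, assuming the per-sample adversary losses and per-sample distortions have already been evaluated and are passed in as lists; the function names and the way the slack variable `delta` is supplied are assumptions made for the example, not the authors' implementation.

```python
def penalty_objective(adv_losses, distortions, D, rho):
    """Privatizer loss under the penalty method, following (13)."""
    m = len(adv_losses)
    neg_adv_loss = -sum(adv_losses) / m               # negative of the adversary's loss
    violation = max(0.0, sum(distortions) / m - D)    # zero when the constraint holds
    return neg_adv_loss + rho * violation


def augmented_lagrangian_objective(adv_losses, distortions, D, rho, lam, delta):
    """Privatizer loss under the augmented Lagrangian method, following (14)."""
    m = len(adv_losses)
    neg_adv_loss = -sum(adv_losses) / m
    residual = sum(distortions) / m + delta - D       # equality-constraint residual
    return neg_adv_loss + 0.5 * rho * residual ** 2 - lam * residual


def update_multiplier(lam, rho, distortions, D, delta):
    """Multiplier update: lambda_{t+1} = lambda_t - rho_t * residual."""
    residual = sum(distortions) / len(distortions) + delta - D
    return lam - rho * residual
```

In a training loop, one would typically evaluate one of these objectives on each minibatch, increase `rho` over iterations, and (for the augmented Lagrangian variant) call `update_multiplier` between outer iterations.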

Algorithm 1 Alternating minimax privacy preserving algorithm

Input: dataset $\mathcal{D}$, distortion parameter $D$, iteration number $T$
Output: Optimal privatizer parameter $\theta_p$
procedure Alternate Minimax($\mathcal{D}$, $D$, $T$)
    Initialize $\theta_p^1$ and $\theta_a^1$
    for $t = 1, \ldots, T$ do
        Random minibatch of $M$ datapoints $\{x_{(1)}, \ldots, x_{(M)}\}$ drawn from full dataset
        Generate $\{\hat{x}_{(1)}, \ldots, \hat{x}_{(M)}\}$ via $\hat{x}_{(i)} = g(x_{(i)}, y_{(i)}; \theta_p^t)$
        Update the adversary parameter $\theta_a^{t+1}$ by stochastic gradient ascent for $k$ epochs
            $\theta_a^{t+1} = \theta_a^{t} + \alpha_t \nabla_{\theta_a}\Big(-\frac{1}{M}\sum_{i=1}^{M} \ell(h(\hat{x}_{(i)}; \theta_a), y_{(i)})\Big), \quad \alpha_t > 0$
        Compute the descent direction $\nabla_{\theta_p} \ell(\theta_p, \theta_a^{t+1})$, where
            $\ell(\theta_p, \theta_a^{t+1}) = -\frac{1}{M}\sum_{i=1}^{M} \ell(h(g(x_{(i)}, y_{(i)}; \theta_p); \theta_a^{t+1}), y_{(i)})$
            subject to $\frac{1}{M}\sum_{i=1}^{M} \big[d(g(x_{(i)}, y_{(i)}; \theta_p), x_{(i)})\big] \le D$
        Perform line search along $\nabla_{\theta_p} \ell(\theta_p, \theta_a^{t+1})$ and update
            $\theta_p^{t+1} = \theta_p^{t} - \alpha_t \nabla_{\theta_p} \ell(\theta_p, \theta_a^{t+1})$
        Exit if solution converged
    return $\theta_p^{t+1}$

2.4 Our Focus

Our GAP framework is very general and can be used to capture many notions of privacy via various decision rules and loss functions. In the rest of this paper, we investigate GAP under 0-1 loss for two simple dataset models: (a) the binary data model (Section 3), and (b) the binary Gaussian mixture model (Section 4). Under the binary data model, both $X$ and $Y$ are binary. Under the binary Gaussian mixture model, $Y$ is binary whereas $X$ is conditionally Gaussian. We use these results to validate that the data-driven version of GAP can discover theoretically optimal privatization schemes.

In the data-driven approach of GAP, since $P(X, Y)$ is typically unknown in practice and our objective is to learn privatization schemes directly from data, we have to consider the empirical (data-driven) version of (5). Such an approach immediately hits a roadblock because taking derivatives of a 0-1 loss function w.r.t. the parameters of $h$ and $g$ is ill-defined. To circumvent this issue, similar to the common practice in the ML literature, we use the empirical log-loss (see Equation (11)) as the loss function for the adversary. We derive game-theoretically optimal mechanisms for the 0-1 loss function, and use them as a benchmark against which we compare the performance of the data-driven GAP mechanisms.

3 Binary Data Model

In this section, we study a setting where both the public and private variables are binary valued random variables. Let $p_{i,j}$ denote the joint probability of $(X, Y) = (i, j)$, where $i, j \in \{0, 1\}$. To prevent an adversary from correctly inferring the private variable $Y$ from the public variable $X$, the privatizer applies a randomized mechanism on $X$ to generate the privatized data $\hat{X}$. Since both the original and privatized public variables are binary, the distortion between $x$ and $\hat{x}$ can be quantified by the Hamming distortion; i.e., $d(\hat{x}, x) = 1$ if $\hat{x} \neq x$ and $d(\hat{x}, x) = 0$ if $\hat{x} = x$. Thus, the expected distortion is given by $\mathbb{E}[d(\hat{X}, X)] = P(\hat{X} \neq X)$.

3.1 Theoretical Approach for Binary Data Model

The adversary's objective is to correctly guess $Y$ from $\hat{X}$. We consider a MAP adversary who has access to the joint distribution of $(X, Y)$ and the privacy mechanism. The privatizer's goal is to privatize $X$ in a way that minimizes the adversary's probability of correctly inferring $Y$ from $\hat{X}$ subject to the distortion constraint. We first focus on private-data dependent (PDD) privacy mechanisms that depend on both $Y$ and $X$. We later consider private-data independent (PDI) privacy mechanisms that only depend on $X$.

PDD Privacy Mechanism

Let $g(X, Y)$ denote a PDD mechanism. Since $X$, $Y$, and $\hat{X}$ are binary random variables, the mechanism $g(X, Y)$ can be represented by the conditional distribution $P(\hat{X} \mid X, Y)$ that maps the public and private variable pair $(X, Y)$ to an output $\hat{X}$ given by

$$P(\hat{X} = 0 \mid X = 0, Y = 0) = s_{0,0}, \quad P(\hat{X} = 0 \mid X = 0, Y = 1) = s_{0,1},$$
$$P(\hat{X} = 1 \mid X = 1, Y = 0) = s_{1,0}, \quad P(\hat{X} = 1 \mid X = 1, Y = 1) = s_{1,1}.$$

Thus, the marginal distribution of $\hat{X}$ is given by

$$P(\hat{X} = 0) = \sum_{X,Y} P(\hat{X} = 0 \mid X, Y) P(X, Y) = s_{0,0}\, p_{0,0} + s_{0,1}\, p_{0,1} + (1 - s_{1,0})\, p_{1,0} + (1 - s_{1,1})\, p_{1,1},$$
$$P(\hat{X} = 1) = \sum_{X,Y} P(\hat{X} = 1 \mid X, Y) P(X, Y) = (1 - s_{0,0})\, p_{0,0} + (1 - s_{0,1})\, p_{0,1} + s_{1,0}\, p_{1,0} + s_{1,1}\, p_{1,1}.$$
If $\hat{X} = 0$, the adversary's inference accuracy for guessing $\hat{Y} = 1$ is

$$P(Y = 1, \hat{X} = 0) = \sum_{X} P(X, Y = 1) P(\hat{X} = 0 \mid X, Y = 1) = p_{1,1}(1 - s_{1,1}) + p_{0,1}\, s_{0,1}, \tag{15}$$

and the inference accuracy for guessing $\hat{Y} = 0$ is

$$P(Y = 0, \hat{X} = 0) = \sum_{X} P(X, Y = 0) P(\hat{X} = 0 \mid X, Y = 0) = p_{1,0}(1 - s_{1,0}) + p_{0,0}\, s_{0,0}. \tag{16}$$

Let $s = \{s_{0,0}, s_{0,1}, s_{1,0}, s_{1,1}\}$. For $\hat{X} = 0$, the MAP adversary's inference accuracy is given by

$$P_d^{(B)}(s, \hat{X} = 0) = \max\{P(Y = 1, \hat{X} = 0),\; P(Y = 0, \hat{X} = 0)\}. \tag{17}$$

Similarly, if $\hat{X} = 1$, the MAP adversary's inference accuracy is given by

$$P_d^{(B)}(s, \hat{X} = 1) = \max\{P(Y = 1, \hat{X} = 1),\; P(Y = 0, \hat{X} = 1)\}, \tag{18}$$

where

$$P(Y = 1, \hat{X} = 1) = \sum_{X} P(X, Y = 1) P(\hat{X} = 1 \mid X, Y = 1) = p_{1,1}\, s_{1,1} + p_{0,1}(1 - s_{0,1}),$$
$$P(Y = 0, \hat{X} = 1) = \sum_{X} P(X, Y = 0) P(\hat{X} = 1 \mid X, Y = 0) = p_{1,0}\, s_{1,0} + p_{0,0}(1 - s_{0,0}). \tag{19}$$
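The joint probabilities in (15), (16), and (19) are easy to tabulate numerically. The sketch below assumes the joint distribution and the mechanism are stored in dictionaries keyed by `(x, y)`, with `s[(x, y)]` the probability of releasing $\hat{X} = x$ unchanged; it sums the two maxima in (17) and (18) to obtain the MAP adversary's overall accuracy, and also evaluates the expected Hamming distortion. The function names are illustrative only.

```python
def map_accuracy_pdd(p, s):
    """MAP accuracy: sum over xhat of max_y P(Y = y, Xhat = xhat).

    p[(x, y)] is the joint probability P(X = x, Y = y);
    s[(x, y)] is P(Xhat = x | X = x, Y = y), the probability of not flipping X.
    """
    total = 0.0
    for xhat in (0, 1):
        joint = []
        for y in (0, 1):
            mass = 0.0
            for x in (0, 1):
                keep = s[(x, y)]
                prob_xhat = keep if xhat == x else 1.0 - keep
                mass += p[(x, y)] * prob_xhat
            joint.append(mass)  # P(Y = y, Xhat = xhat)
        total += max(joint)
    return total


def expected_distortion_pdd(p, s):
    """Expected Hamming distortion P(Xhat != X) of the mechanism."""
    return sum(p[key] * (1.0 - s[key]) for key in p)


# Example joint distribution (illustrative values only).
p = {(1, 1): 0.5625, (0, 1): 0.0625, (1, 0): 0.1875, (0, 0): 0.1875}
identity = {key: 1.0 for key in p}   # release X unchanged
coin_flip = {key: 0.5 for key in p}  # Xhat independent of (X, Y)
acc_identity = map_accuracy_pdd(p, identity)
acc_coin_flip = map_accuracy_pdd(p, coin_flip)
```

Releasing $X$ unchanged gives zero distortion but the highest adversary accuracy, while a mechanism that makes $\hat{X}$ independent of $(X, Y)$ drives the accuracy down to the larger prior of $Y$ at the cost of high distortion.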

As a result, for a fixed privacy mechanism $s$, the MAP adversary's inference accuracy can be written as

$$P_d^{(B)} = \max_{h(\cdot)} P(h(g(X, Y)) = Y) = P_d^{(B)}(s, \hat{X} = 0) + P_d^{(B)}(s, \hat{X} = 1).$$

Thus, the optimal PDD privacy mechanism is determined by solving

$$\min_{s}\; P_d^{(B)}(s, \hat{X} = 0) + P_d^{(B)}(s, \hat{X} = 1) \quad \text{s.t.} \quad P(\hat{X} = 0, X = 1) + P(\hat{X} = 1, X = 0) \le D, \quad s \in [0, 1]^4. \tag{20}$$

Notice that the above constrained optimization problem is a four dimensional optimization problem parameterized by $p = \{p_{0,0}, p_{0,1}, p_{1,0}, p_{1,1}\}$ and $D$. Interestingly, we can formulate (20) as a linear program (LP) given by

$$\begin{aligned}
\min_{s_{1,1}, s_{0,1}, s_{1,0}, s_{0,0}, t_0, t_1} \quad & t_0 + t_1 \\
\text{s.t.} \quad & 0 \le s_{1,1}, s_{0,1}, s_{1,0}, s_{0,0} \le 1 \\
& p_{1,1}(1 - s_{1,1}) + p_{0,1}\, s_{0,1} \le t_0 \\
& p_{1,0}(1 - s_{1,0}) + p_{0,0}\, s_{0,0} \le t_0 \\
& p_{1,1}\, s_{1,1} + p_{0,1}(1 - s_{0,1}) \le t_1 \\
& p_{1,0}\, s_{1,0} + p_{0,0}(1 - s_{0,0}) \le t_1 \\
& p_{1,1}(1 - s_{1,1}) + p_{0,1}(1 - s_{0,1}) + p_{1,0}(1 - s_{1,0}) + p_{0,0}(1 - s_{0,0}) \le D,
\end{aligned} \tag{21}$$

where $t_0$ and $t_1$ are two slack variables representing the maxima in (17) and (18), respectively. The optimal mechanism can be obtained by numerically solving (21) using any off-the-shelf LP solver.

PDI Privacy Mechanism

In the previous section, we considered PDD privacy mechanisms. Although we were able to formulate the problem as a linear program with four variables, determining a closed form solution for such a highly parameterized problem is not analytically tractable. Thus, we now consider the simple (yet meaningful) class of PDI privacy mechanisms. Under PDI privacy mechanisms, the Markov chain $Y \to X \to \hat{X}$ holds. As a result, $P(Y, \hat{X} = \hat{x})$ can be written as

$$P(Y, \hat{X} = \hat{x}) = \sum_{X} P(Y, \hat{X} = \hat{x} \mid X) P(X) \tag{22}$$
$$= \sum_{X} P(Y \mid X) P(\hat{X} = \hat{x} \mid X) P(X) \tag{23}$$
$$= \sum_{X} P(Y, X) P(\hat{X} = \hat{x} \mid X), \tag{24}$$

where the second equality is due to the conditional independence property of the Markov chain $Y \to X \to \hat{X}$. For the PDI mechanisms, the privacy mechanism $g(X, Y)$ can be represented by the conditional distribution $P(\hat{X} \mid X)$.
To make the problem more tractable, we focus on a slightly simpler setting in which $Y = X \oplus N$, where $N \in \{0, 1\}$ is a random variable independent of $X$ and follows a Bernoulli distribution with parameter $q$. In this setting, the joint distribution of $(X, Y)$ can be computed as

$$P(X = 1, Y = 1) = P(Y = 1 \mid X = 1) P(X = 1) = p(1 - q), \tag{25}$$
$$P(X = 0, Y = 1) = P(Y = 1 \mid X = 0) P(X = 0) = (1 - p) q, \tag{26}$$
$$P(X = 1, Y = 0) = P(Y = 0 \mid X = 1) P(X = 1) = p q, \tag{27}$$
$$P(X = 0, Y = 0) = P(Y = 0 \mid X = 0) P(X = 0) = (1 - p)(1 - q). \tag{28}$$

Let $s = \{s_0, s_1\}$ in which $s_0 = P(\hat{X} = 0 \mid X = 0)$ and $s_1 = P(\hat{X} = 1 \mid X = 1)$. The joint distribution of $(Y, \hat{X})$ is given by

$$P(Y = 1, \hat{X} = 0) = p(1 - q)(1 - s_1) + (1 - p) q s_0,$$
$$P(Y = 0, \hat{X} = 0) = p q (1 - s_1) + (1 - p)(1 - q) s_0,$$
$$P(Y = 1, \hat{X} = 1) = p(1 - q) s_1 + (1 - p) q (1 - s_0),$$
$$P(Y = 0, \hat{X} = 1) = p q s_1 + (1 - p)(1 - q)(1 - s_0).$$

Using the above joint probabilities, for a fixed $s$, we can write the MAP adversary's inference accuracy as

$$P_d^{(B)} = \max_{h(\cdot)} P(h(g(X, Y)) = Y) = \max\{P(Y = 1, \hat{X} = 0), P(Y = 0, \hat{X} = 0)\} + \max\{P(Y = 1, \hat{X} = 1), P(Y = 0, \hat{X} = 1)\}. \tag{29}$$

Therefore, the optimal PDI privacy mechanism is given by the solution to

$$\min_{s}\; P_d^{(B)} \quad \text{s.t.} \quad P(\hat{X} = 0, X = 1) + P(\hat{X} = 1, X = 0) \le D, \quad s \in [0, 1]^2, \tag{30}$$

where the distortion in (30) is given by $(1 - s_0)(1 - p) + (1 - s_1) p$. By (29), $P_d^{(B)}$ can be considered as a sum of two functions, where each function is a maximum of two linear functions. Therefore, it is convex in $s_0$ and $s_1$ for different values of $p$, $q$ and $D$.

Theorem 1. For fixed $p$, $q$ and $D$, there exist infinitely many PDI privacy mechanisms that achieve the optimal privacy-utility tradeoff. If $q = \frac{1}{2}$, any privacy mechanism that satisfies $\{s_0, s_1 \mid p s_1 + (1 - p) s_0 \ge 1 - D,\; s_0, s_1 \in [0, 1]\}$ is optimal. If $q \neq \frac{1}{2}$, the optimal PDI privacy mechanism is given as follows:

If $1 - D > \max\{p, 1 - p\}$, the optimal privacy mechanism is given by $\{s_0, s_1 \mid p s_1 + (1 - p) s_0 = 1 - D,\; s_0, s_1 \in [0, 1]\}$. The adversary's accuracy of correctly guessing the private variable is

$$\begin{cases} (1 - 2q)(1 - D) + q & \text{if } q < \frac{1}{2} \\ (2q - 1)(1 - D) + 1 - q & \text{if } q > \frac{1}{2}. \end{cases} \tag{31}$$

Otherwise, the optimal privacy mechanism is given by $\{s_0, s_1 \mid \max\{\min\{p, 1 - p\}, 1 - D\} \le p s_1 + (1 - p) s_0 \le \max\{p, 1 - p\},\; s_0, s_1 \in [0, 1]\}$ and the adversary's accuracy of correctly guessing the private variable is

$$\begin{cases} p(1 - q) + (1 - p) q & \text{if } p \ge \frac{1}{2}, q < \frac{1}{2} \text{ or } p \le \frac{1}{2}, q > \frac{1}{2} \\ p q + (1 - p)(1 - q) & \text{if } p \ge \frac{1}{2}, q > \frac{1}{2} \text{ or } p \le \frac{1}{2}, q < \frac{1}{2}. \end{cases} \tag{32}$$

Proof sketch: The proof of Theorem 1 is provided in Appendix A. We briefly sketch the proof details here. For the special case $q = \frac{1}{2}$, the solution is trivial since the private variable $Y$ is independent of the public variable $X$.
Thus, the optimal solution is given by any $s_0, s_1$ that satisfies the distortion constraint $\{s_0, s_1 \mid p s_1 + (1 - p) s_0 \ge 1 - D,\; s_0, s_1 \in [0, 1]\}$. For $q \neq \frac{1}{2}$, we separate the optimization problem in (30) into four subproblems based on the decision of the adversary. We then compute the optimal privacy mechanism of the privatizer in each subproblem. Summarizing the optimal solutions to the subproblems for different values of $p$, $q$ and $D$ yields Theorem 1.

Remark: Note that if $1 - D > \max\{p, 1 - p\}$, i.e., $D < \min\{p, 1 - p\}$, the privacy guarantee achieved by the optimal PDI mechanism (the MAP adversary's accuracy of correctly guessing the private variable) decreases linearly with $D$. For $D \ge \min\{p, 1 - p\}$, the optimal PDI mechanism achieves a constant privacy guarantee regardless of $D$. However, in this case, the privatizer can just use the optimal privacy mechanism with $D = \min\{p, 1 - p\}$ to optimize the privacy guarantee without further sacrificing utility.
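As a sanity check, the closed-form accuracies in (31) and (32) can be compared against a brute-force search over PDI mechanisms. The sketch below evaluates (29) on a grid of $(s_0, s_1)$ values that satisfy the distortion constraint in (30); the grid resolution, the numerical tolerance, and the function names are assumptions made for the example.

```python
def map_accuracy_pdi(p, q, s0, s1):
    """MAP adversary accuracy from (29) for Y = X xor N with N ~ Bernoulli(q)."""
    y1_x0 = p * (1 - q) * (1 - s1) + (1 - p) * q * s0        # P(Y=1, Xhat=0)
    y0_x0 = p * q * (1 - s1) + (1 - p) * (1 - q) * s0        # P(Y=0, Xhat=0)
    y1_x1 = p * (1 - q) * s1 + (1 - p) * q * (1 - s0)        # P(Y=1, Xhat=1)
    y0_x1 = p * q * s1 + (1 - p) * (1 - q) * (1 - s0)        # P(Y=0, Xhat=1)
    return max(y1_x0, y0_x0) + max(y1_x1, y0_x1)


def theorem1_accuracy(p, q, D):
    """Closed-form optimal accuracy from (31) and (32); requires q != 1/2."""
    if q == 0.5:
        raise ValueError("Theorem 1 gives these expressions for q != 1/2 only")
    if 1 - D > max(p, 1 - p):
        if q < 0.5:
            return (1 - 2 * q) * (1 - D) + q
        return (2 * q - 1) * (1 - D) + 1 - q
    first_case = (p >= 0.5 and q < 0.5) or (p <= 0.5 and q > 0.5)
    if first_case:
        return p * (1 - q) + (1 - p) * q
    return p * q + (1 - p) * (1 - q)


def grid_min_accuracy(p, q, D, steps=100):
    """Brute-force minimum of (29) over feasible (s0, s1) on a uniform grid."""
    best = 1.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            s0, s1 = i / steps, j / steps
            distortion = (1 - s0) * (1 - p) + (1 - s1) * p
            if distortion <= D + 1e-9:  # small tolerance for rounding
                best = min(best, map_accuracy_pdi(p, q, s0, s1))
    return best
```

For example, `theorem1_accuracy(0.75, 0.25, 0.1)` and `grid_min_accuracy(0.75, 0.25, 0.1)` can be compared directly; repeating the comparison for a larger distortion such as `D = 0.3` exercises the saturated regime described in the remark.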

Figure 4: Neural network structure of the privatizer and adversary for binary data model

3.2 Data-driven Approach for Binary Data Model

In practice, the joint distribution of $(X, Y)$ is often unknown to the data holder. Instead, the data holder has access to a dataset $\mathcal{D}$, which is used to learn a good privatization mechanism in a generative adversarial fashion. In the training phase, the data holder learns the parameters of the conditional generative model (representing the privatization scheme) by competing against a computational adversary represented by a neural network. The details of both neural networks are provided later in this section. When convergence is reached, we evaluate the performance of the learned privatization scheme by computing the accuracy of inferring $Y$ under a strong MAP adversary that: (a) has access to the joint distribution of $(X, Y)$, (b) has knowledge of the learned privacy mechanism, and (c) can compute the MAP rule. Ultimately, the data holder's hope is to learn a privatization scheme that matches the one obtained under the game-theoretic framework, where both the adversary and privatizer are assumed to have access to $P(X, Y)$. To evaluate our data-driven approach, we compare the mechanisms learned in an adversarial fashion on $\mathcal{D}$ with the game-theoretically optimal ones.

Since the private variable $Y$ is binary, we use the empirical log-loss function for the adversary (see Equation (11)). For a fixed $\theta_p$, the adversary learns the optimal $\theta_a^*$ by maximizing $L_{\mathrm{XE}}(h(g(X, Y; \theta_p); \theta_a), Y)$ given in Equation (11). For a fixed $\theta_a$, the privatizer learns the optimal $\theta_p^*$ by minimizing $L_{\mathrm{XE}}(h(g(X, Y; \theta_p); \theta_a), Y)$ subject to the distortion constraint (see Equation (10)). Since both $X$ and $Y$ are binary variables, we can use the privatizer parameter $\theta_p$ to represent the privacy mechanism $s$ directly.
For the adversary, we define $\theta_a = (\theta_{a,0}, \theta_{a,1})$, where $\theta_{a,0} = P(Y = 0 \mid \hat{X} = 0)$ and $\theta_{a,1} = P(Y = 1 \mid \hat{X} = 1)$. Thus, given a privatized public variable input $g(x_{(i)}, y_{(i)}; \theta_p) \in \{0, 1\}$, the output belief of the adversary guessing $y_{(i)} = 1$ can be written as $(1 - \theta_{a,0})(1 - g(x_{(i)}, y_{(i)}; \theta_p)) + \theta_{a,1}\, g(x_{(i)}, y_{(i)}; \theta_p)$.

For PDD privacy mechanisms, we have $\theta_p = s = \{s_{0,0}, s_{0,1}, s_{1,0}, s_{1,1}\}$. Given the fact that both $x_{(i)}$ and $y_{(i)}$ are binary, we use two simple neural networks to model the privatizer and the adversary. As shown in Figure 4, the privatizer is modeled as a two-layer neural network parameterized by $s$, while the adversary is modeled as a two-layer neural network classifier. From the perspective of the privatizer, the belief of an adversary guessing $y_{(i)} = 1$ conditioned on the input $(x_{(i)}, y_{(i)})$ is given by

$$h(g(x_{(i)}, y_{(i)}; s); \theta_a) = \theta_{a,1} P(\hat{x}_{(i)} = 1) + (1 - \theta_{a,0}) P(\hat{x}_{(i)} = 0), \tag{33}$$

where

$$P(\hat{x}_{(i)} = 1) = x_{(i)} y_{(i)} s_{1,1} + (1 - x_{(i)}) y_{(i)} (1 - s_{0,1}) + x_{(i)} (1 - y_{(i)}) s_{1,0} + (1 - x_{(i)})(1 - y_{(i)})(1 - s_{0,0}),$$
$$P(\hat{x}_{(i)} = 0) = x_{(i)} y_{(i)} (1 - s_{1,1}) + (1 - x_{(i)}) y_{(i)} s_{0,1} + x_{(i)} (1 - y_{(i)})(1 - s_{1,0}) + (1 - x_{(i)})(1 - y_{(i)}) s_{0,0}.$$

Furthermore, the expected distortion is given by

$$\mathbb{E}_{\mathcal{D}}[d(g(X, Y; s), X)] = \frac{1}{n}\sum_{i=1}^{n} \big[ x_{(i)} y_{(i)} (1 - s_{1,1}) + x_{(i)} (1 - y_{(i)})(1 - s_{1,0}) + (1 - x_{(i)}) y_{(i)} (1 - s_{0,1}) + (1 - x_{(i)})(1 - y_{(i)})(1 - s_{0,0}) \big]. \tag{34}$$

Similar to the PDD case, we can also compute the belief of guessing $y_{(i)} = 1$ conditional on the input $(x_{(i)}, y_{(i)})$ for the PDI schemes. Observe that in the PDI case, $\theta_p = s = \{s_0, s_1\}$. Therefore, we have

$$h(g(x_{(i)}, y_{(i)}; s); \theta_a) = \theta_{a,1}\big[x_{(i)} s_1 + (1 - x_{(i)})(1 - s_0)\big] + (1 - \theta_{a,0})\big[(1 - x_{(i)}) s_0 + x_{(i)} (1 - s_1)\big]. \tag{35}$$

Under PDI schemes, the expected distortion is given by

$$\mathbb{E}_{\mathcal{D}}[d(g(X, Y; s), X)] = \frac{1}{n}\sum_{i=1}^{n} \big[ x_{(i)} (1 - s_1) + (1 - x_{(i)})(1 - s_0) \big]. \tag{36}$$

Thus, we can use Algorithm 1 proposed in Section 2.3 to learn the optimal PDD and PDI privacy mechanisms from the dataset.

3.3 Illustration of Results

We now evaluate our proposed GAP framework using synthetic datasets. We focus on the setting in which $Y = X \oplus N$, where $N \in \{0, 1\}$ is a random variable independent of $X$ and follows a Bernoulli distribution with parameter $q$. We generate two synthetic datasets with $(p, q)$ equal to $(0.75, 0.25)$ and $(0.5, 0.25)$, respectively. Each synthetic dataset used in this experiment contains 10,000 training samples and 2,000 test samples. We use TensorFlow [2] to train both the privatizer and the adversary using the Adam optimizer with a learning rate of 0.01 and a minibatch size of 200. The distortion constraint is enforced by the penalty method provided in (13).

Figure 5: Privacy-distortion tradeoff for binary data model. Panels (a) and (b): performance of privacy mechanisms against MAP adversary for p = 0.5 and p = 0.75 (accuracy versus distortion). Panels (c) and (d): performance of privacy mechanisms under MI privacy metric for p = 0.5 and p = 0.75 (privacy loss in bits versus distortion).

Figure 5a illustrates the performance of both optimal PDD and PDI privacy mechanisms against a strong theoretical MAP adversary when $(p, q) = (0.5, 0.25)$.
It can be seen that the inference accuracy of the MAP adversary reduces as the distortion increases for both optimal PDD and PDI privacy mechanisms. As one would expect, the PDD privacy mechanism achieves a lower inference accuracy for the adversary, i.e., better privacy, than the PDI mechanism. Furthermore, when the distortion is higher than some threshold, the inference accuracy of the MAP adversary saturates regardless of the distortion. This is due to the fact that the correlation between the private variable and the privatized public variable cannot be further reduced once the distortion is larger than the saturation threshold. Therefore, increasing distortion will not further reduce the accuracy of the MAP adversary. We also observe that the privacy mechanism obtained via the data-driven approach performs very well when pitted against the MAP adversary (maximum accuracy difference around 3% compared to the theoretical approach). In other words, for the binary data model, the data-driven version of GAP can yield privacy mechanisms that perform as well as the mechanisms computed under the theoretical version of GAP, which assumes that the privatizer has access to the underlying distribution of the dataset.

Figure 5b shows the performance of both optimal PDD and PDI privacy mechanisms against the MAP adversary for $(p, q) = (0.75, 0.25)$. Similar to the equal prior case, we observe that both PDD and PDI privacy mechanisms reduce the accuracy of the MAP adversary as the distortion increases and saturate when the distortion goes above a certain threshold. It can be seen that the saturation thresholds for both PDD and PDI privacy mechanisms in Figure 5b are lower than the equal prior case plotted in Figure 5a. The reason is that when $(p, q) = (0.75, 0.25)$, the correlation between $Y$ and $X$ is weaker than the equal prior case. Therefore, it requires less distortion to achieve the same privacy. We also observe that the performance of the GAP mechanism obtained via the data-driven approach is comparable to the mechanism computed via the theoretical approach.

The performance of the GAP mechanism obtained using the log-loss function (i.e., MI privacy) is plotted in Figure 5c and 5d.
Similar to the MAP adversary case, as the distortion increases, the mutual information between the private variable and the privatized public variable achieved by the optimal PDD and PDI mechanisms decreases as long as the distortion is below some threshold. When the distortion goes above the threshold, the optimal privacy mechanism is able to make the private variable and the privatized public variable independent regardless of the distortion. Furthermore, the values of the saturation thresholds are very close to what we observe in Figure 5a and 5b.

4 Binary Gaussian Mixture Model

Thus far, we have studied a simple binary dataset model. In many real datasets, the sample space of variables often takes more than just two possible values. It is well known that the Gaussian distribution is a flexible approximation for many distributions [89]. Therefore, in this section, we study a setting where $Y \in \{0, 1\}$ and $X$ is a Gaussian random variable whose mean and variance are dependent on $Y$. Without loss of generality, let $\mathbb{E}[X \mid Y = 1] = -\mathbb{E}[X \mid Y = 0] = \mu$ and $P(Y = 1) = p$. Thus, $X \mid Y = 0 \sim \mathcal{N}(-\mu, \sigma_0^2)$ and $X \mid Y = 1 \sim \mathcal{N}(\mu, \sigma_1^2)$.

Similar to the binary data model, we study two privatization schemes: (a) private-data independent (PDI) schemes (where $\hat{X} = g(X)$), and (b) private-data dependent (PDD) schemes (where $\hat{X} = g(X, Y)$). In order to have a tractable model for the privatizer, we assume $g(X, Y)$ is realized by adding an affine function of an independently generated random noise to the public variable $X$. The affine function enables controlling both the mean and variance of the privatized data. In particular, we consider $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1 + (1 - Y)\gamma_0 N + Y\gamma_1 N$, in which $N$ is a one dimensional random variable and $\beta_0, \beta_1, \gamma_0, \gamma_1$ are constant parameters. The goal of the privatizer is to sanitize the public data $X$ subject to the distortion constraint $\mathbb{E}_{\hat{X}, X}\|\hat{X} - X\|_2^2 \le D$.
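The affine noise-adding mechanism and its distortion can be illustrated with a small simulation. The sketch below assumes standard Gaussian noise for $N$ (the choice made later in Section 4.1), draws samples from the mixture model, applies $g(X, Y)$, and estimates the mean squared distortion; with this choice of noise the estimate should be close to $p(\beta_1^2 + \gamma_1^2) + (1 - p)(\beta_0^2 + \gamma_0^2)$. All function names and parameter values are illustrative.

```python
import random


def sample_dataset(n, p, mu, sigma0, sigma1, rng):
    """Draw n pairs (x, y) with P(Y=1)=p, X|Y=0 ~ N(-mu, sigma0^2), X|Y=1 ~ N(mu, sigma1^2)."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < p else 0
        x = rng.gauss(mu, sigma1) if y == 1 else rng.gauss(-mu, sigma0)
        data.append((x, y))
    return data


def privatize(x, y, params, rng):
    """g(X, Y) = X + (1-Y)*beta0 - Y*beta1 + (1-Y)*gamma0*N + Y*gamma1*N, with N ~ N(0, 1)."""
    beta0, beta1, gamma0, gamma1 = params
    noise = rng.gauss(0.0, 1.0)
    if y == 0:
        return x + beta0 + gamma0 * noise
    return x - beta1 + gamma1 * noise


def empirical_distortion(data, params, rng):
    """Average squared distance between privatized and original samples."""
    total = 0.0
    for x, y in data:
        total += (privatize(x, y, params, rng) - x) ** 2
    return total / len(data)


rng = random.Random(0)
p, params = 0.75, (1.0, 0.5, 0.3, 0.2)
data = sample_dataset(20000, p, mu=3.0, sigma0=2.0, sigma1=1.0, rng=rng)
estimate = empirical_distortion(data, params, rng)
expected = p * (params[1] ** 2 + params[3] ** 2) + (1 - p) * (params[0] ** 2 + params[2] ** 2)
```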
4.1 Theoretical Approach for Binary Gaussian Mixture Model

We now investigate the theoretical approach under which both the privatizer and the adversary have access to $P(X, Y)$. To make the problem more tractable, let us consider a slightly simpler setting in which $\sigma_0 = \sigma_1 = \sigma$. We will relax this assumption later when we take a data-driven approach. We further assume that $N$ is a standard Gaussian random variable. One might, rightfully, question our choice of focusing on adding (potentially $Y$-dependent) Gaussian noise. Though other distributions can be considered, our approach is motivated by the following two reasons:

(a) Even though it is known that adding Gaussian noise is not the worst case noise adding mechanism for non-Gaussian $X$ [74], identifying the optimal noise distribution is mathematically intractable. Thus, for tractability and ease of analysis, we choose Gaussian noise.

(b) Adding Gaussian noise to each data entry preserves the conditional Gaussianity of the released dataset.

In what follows, we will analyze a variety of PDI and PDD mechanisms.

PDI Gaussian Noise Adding Privacy Mechanism

We consider a PDI noise adding privatization scheme which adds an affine function of the standard Gaussian noise to the public variable. Since the privacy mechanism is PDI, we have $g(X, Y) = X + \beta + \gamma N$, where $\beta$ and $\gamma$ are constant parameters and $N \sim \mathcal{N}(0, 1)$. Using the classical Gaussian hypothesis testing analysis [83], it is straightforward to verify that the optimal inference accuracy (i.e., probability of detection) of the MAP adversary is given by

$$P_d^{(G)} = p\, Q\Big(-\frac{\alpha}{2} + \frac{1}{\alpha} \ln\Big(\frac{1 - p}{p}\Big)\Big) + (1 - p)\, Q\Big(-\frac{\alpha}{2} - \frac{1}{\alpha} \ln\Big(\frac{1 - p}{p}\Big)\Big), \tag{37}$$

where $\alpha = \frac{2\mu}{\sqrt{\gamma^2 + \sigma^2}}$ and $Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp\big(-\frac{u^2}{2}\big)\, du$. Moreover, since $\mathbb{E}_{\hat{X}, X}[d(\hat{X}, X)] = \beta^2 + \gamma^2$, the distortion constraint is equivalent to $\beta^2 + \gamma^2 \le D$.

Theorem 2. For a PDI Gaussian noise adding privatization scheme given by $g(X, Y) = X + \beta + \gamma N$, with $\beta \in \mathbb{R}$ and $\gamma \ge 0$, the optimal parameters are given by

$$\beta^* = 0, \quad \gamma^* = \sqrt{D}. \tag{38}$$

Let $\alpha^* = \frac{2\mu}{\sqrt{D + \sigma^2}}$. For this optimal scheme, the accuracy of the MAP adversary is

$$P_d^{(G)*} = p\, Q\Big(-\frac{\alpha^*}{2} + \frac{1}{\alpha^*} \ln\Big(\frac{1 - p}{p}\Big)\Big) + (1 - p)\, Q\Big(-\frac{\alpha^*}{2} - \frac{1}{\alpha^*} \ln\Big(\frac{1 - p}{p}\Big)\Big). \tag{39}$$

The proof of Theorem 2 is provided in Appendix B. We observe that the PDI Gaussian noise adding privatization scheme which minimizes the inference accuracy of the MAP adversary with distortion upper-bounded by $D$ is to add a zero-mean Gaussian noise with variance $D$.

PDD Gaussian Noise Adding Privacy Mechanism

For PDD privatization schemes, we first consider a simple case in which $\gamma_0 = \gamma_1 = 0$. Without loss of generality, we assume that both $\beta_0$ and $\beta_1$ are non-negative. The privatized data is given by $\hat{X} = X + (1 - Y)\beta_0 - Y\beta_1$. This is a PDD mechanism since $\hat{X}$ depends on both $X$ and $Y$. Intuitively, this mechanism privatizes the data by shifting the two Gaussian distributions (under $Y = 0$ and $Y = 1$) closer to each other.
Under this mechanism, it is easy to show that the adversary's MAP probability of inferring the private variable $Y$ from $\hat{X}$ is given by $P_d^{(G)}$ in (37) with $\alpha = \frac{2\mu - (\beta_1 + \beta_0)}{\sigma}$. Observe that since $d(\hat{X}, X) = ((1 - Y)\beta_0 - Y\beta_1)^2$, we have $\mathbb{E}_{\hat{X}, X}[d(\hat{X}, X)] = (1 - p)\beta_0^2 + p\beta_1^2$. Thus, the distortion constraint implies $(1 - p)\beta_0^2 + p\beta_1^2 \le D$.

Theorem 3. For a PDD privatization scheme given by $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1$, $\beta_0, \beta_1 \ge 0$, the optimal parameters are given by

$$\beta_0^* = \sqrt{\frac{pD}{1 - p}}, \quad \beta_1^* = \sqrt{\frac{(1 - p)D}{p}}. \tag{40}$$

For this optimal PDD privatization scheme, the accuracy of the MAP adversary is given by (37) with $\alpha = \frac{2\mu - \big(\sqrt{\frac{(1 - p)D}{p}} + \sqrt{\frac{pD}{1 - p}}\big)}{\sigma}$.

The proof of Theorem 3 is provided in Appendix C. When $P(Y = 1) = P(Y = 0) = \frac{1}{2}$, we have $\beta_0 = \beta_1 = \sqrt{D}$, which implies that the optimal privacy mechanism for this particular case is to shift the two Gaussian distributions closer to each other equally by $\sqrt{D}$ regardless of the variance $\sigma^2$. When $P(Y = 1) = p > \frac{1}{2}$, the Gaussian distribution with a lower prior probability, in this case, $X \mid Y = 0$, gets shifted $\frac{p}{1 - p}$ times more than $X \mid Y = 1$.

Next, we consider a slightly more complicated case in which $\gamma_0 = \gamma_1 = \gamma \ge 0$. Thus, the privacy mechanism is given by $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1 + \gamma N$, where $N \sim \mathcal{N}(0, 1)$. Intuitively, this mechanism privatizes the data by shifting the two Gaussian distributions (under $Y = 0$ and $Y = 1$) closer to each other and adding another Gaussian noise $N \sim \mathcal{N}(0, 1)$ scaled by a constant $\gamma$. In this case, the MAP probability of inferring the private variable $Y$ from $\hat{X}$ is given by (37) with $\alpha = \frac{2\mu - (\beta_1 + \beta_0)}{\sqrt{\gamma^2 + \sigma^2}}$. Furthermore, the distortion constraint is equivalent to $(1 - p)\beta_0^2 + p\beta_1^2 + \gamma^2 \le D$.

Theorem 4. For a PDD privatization scheme given by $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1 + \gamma N$ with $\beta_0, \beta_1, \gamma \ge 0$, the optimal parameters $\beta_0^*, \beta_1^*, \gamma^*$ are given by the solution to

$$\min_{\beta_0, \beta_1, \gamma}\; \frac{2\mu - \beta_0 - \beta_1}{\sqrt{\gamma^2 + \sigma^2}} \quad \text{s.t.} \quad (1 - p)\beta_0^2 + p\beta_1^2 + \gamma^2 \le D, \quad \beta_0, \beta_1, \gamma \ge 0. \tag{41}$$

Using this optimal scheme, the accuracy of the MAP adversary is given by (37) with $\alpha = \frac{2\mu - \beta_0^* - \beta_1^*}{\sqrt{(\gamma^*)^2 + \sigma^2}}$.

Proof. Similar to the proofs of Theorem 2 and 3, we can compute the derivative of $P_d^{(G)}$ w.r.t. $\alpha$. It is easy to verify that $P_d^{(G)}$ is monotonically increasing with $\alpha$. Therefore, the optimal mechanism is given by the solution to (41). Substituting the optimal parameters into (37) yields the MAP probability of inferring the private variable $Y$ from $\hat{X}$.

Remark: Note that the objective function in (41) only depends on $\beta_0 + \beta_1$ and $\gamma$. We define $\beta = \beta_0 + \beta_1$. Thus, the above objective function can be written as

$$\min_{\beta, \gamma}\; \frac{2\mu - \beta}{\sqrt{\gamma^2 + \sigma^2}}. \tag{42}$$

It is straightforward to verify that the determinant of the Hessian of (42) is always non-positive. Therefore, the above optimization problem is non-convex in $\beta$ and $\gamma$.

Finally, we consider the PDD Gaussian noise adding privatization scheme given by $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1 + (1 - Y)\gamma_0 N + Y\gamma_1 N$, where $N \sim \mathcal{N}(0, 1)$. This PDD mechanism is the most general one in the Gaussian noise adding setting and includes the two previous mechanisms. The objective of the privatizer is to minimize the adversary's probability of correctly inferring $Y$ from $g(X, Y)$ subject to the distortion constraint given by $p((\beta_1)^2 + (\gamma_1)^2) + (1 - p)((\beta_0)^2 + (\gamma_0)^2) \le D$.
As we have discussed in the remark after Theorem 4, the problem becomes non-convex even for the simpler case in which $\gamma_0 = \gamma_1 = \gamma$. In order to obtain the optimal parameters for this case, we first show that the optimal privacy mechanism lies on the boundary of the distortion constraint.

Proposition 1. For the privacy mechanism given by $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1 + (1 - Y)\gamma_0 N + Y\gamma_1 N$, the optimal parameters $\beta_0^*, \beta_1^*, \gamma_0^*, \gamma_1^*$ satisfy $p((\beta_1^*)^2 + (\gamma_1^*)^2) + (1 - p)((\beta_0^*)^2 + (\gamma_0^*)^2) = D$.

Proof. We prove the above statement by contradiction. Assume that the optimal parameters satisfy $p((\beta_1^*)^2 + (\gamma_1^*)^2) + (1 - p)((\beta_0^*)^2 + (\gamma_0^*)^2) < D$. Let $\tilde{\beta}_1 = \beta_1^* + c$, where $c > 0$ is chosen so that $p((\tilde{\beta}_1)^2 + (\gamma_1^*)^2) + (1 - p)((\beta_0^*)^2 + (\gamma_0^*)^2) = D$. Since the inference accuracy is monotonically decreasing with $\beta_1$, the resultant inference accuracy can only be lower for replacing $\beta_1^*$ with $\tilde{\beta}_1$. This contradicts the assumption that $p((\beta_1^*)^2 + (\gamma_1^*)^2) + (1 - p)((\beta_0^*)^2 + (\gamma_0^*)^2) < D$. Using the same type of analysis, we can show that any parameter that deviates from $p((\beta_1^*)^2 + (\gamma_1^*)^2) + (1 - p)((\beta_0^*)^2 + (\gamma_0^*)^2) = D$ is suboptimal.

Let $e_0^2 = (\beta_0^*)^2 + (\gamma_0^*)^2$ and $e_1^2 = (\beta_1^*)^2 + (\gamma_1^*)^2$. Since the optimal parameters of the privatizer lie on the boundary of the distortion constraint, we have $p e_1^2 + (1 - p) e_0^2 = D$. This implies $(e_0, e_1)$ lies on the boundary of an ellipse parametrized by $p$ and $D$. Thus, we have $e_1 = \sqrt{\frac{D}{p}}\, \frac{1 - \epsilon^2}{1 + \epsilon^2}$ and $e_0 = \sqrt{\frac{D}{1 - p}}\, \frac{2\epsilon}{1 + \epsilon^2}$, where $\epsilon \in [0, 1]$. Therefore, the optimal parameters satisfy

$$(\beta_0^*)^2 + (\gamma_0^*)^2 = \Big[\sqrt{\frac{D}{1 - p}}\, \frac{2\epsilon}{1 + \epsilon^2}\Big]^2, \quad (\beta_1^*)^2 + (\gamma_1^*)^2 = \Big[\sqrt{\frac{D}{p}}\, \frac{1 - \epsilon^2}{1 + \epsilon^2}\Big]^2. \tag{43}$$

Figure 6: Neural network structure of the privatizer and adversary for binary Gaussian mixture model

This implies $(\beta_i^*, \gamma_i^*)$, $i \in \{0, 1\}$ lie on the boundary of two circles parametrized by $D$, $p$ and $\epsilon$. Thus, we can write $\beta_0^*, \beta_1^*, \gamma_0^*, \gamma_1^*$ as

$$\beta_0^* = \sqrt{\frac{D}{1 - p}}\, \frac{2\epsilon}{1 + \epsilon^2}\, \frac{1 - w_0^2}{1 + w_0^2}, \quad \beta_1^* = \sqrt{\frac{D}{p}}\, \frac{1 - \epsilon^2}{1 + \epsilon^2}\, \frac{1 - w_1^2}{1 + w_1^2},$$
$$\gamma_0^* = \sqrt{\frac{D}{1 - p}}\, \frac{4\epsilon\, w_0}{(1 + \epsilon^2)(1 + w_0^2)}, \quad \gamma_1^* = \sqrt{\frac{D}{p}}\, \frac{1 - \epsilon^2}{1 + \epsilon^2}\, \frac{2 w_1}{1 + w_1^2}, \tag{44}$$

where $\epsilon, w_0, w_1 \in [0, 1]$. The optimal parameters $\beta_0^*, \beta_1^*, \gamma_0^*, \gamma_1^*$ can be computed by a grid search in the cube parametrized by $\epsilon, w_0, w_1 \in [0, 1]$ that minimizes the accuracy of the MAP adversary. In the following section, we will use this general PDD Gaussian noise adding privatization scheme in our data-driven simulations and compare the performance of the privacy mechanisms obtained by both theoretical and data-driven approaches.

4.2 Data-driven Approach for Binary Gaussian Mixture Model

To illustrate our data-driven GAP approach, we assume the privatizer only has access to the dataset $\mathcal{D}$ but does not know the joint distribution of $(X, Y)$. Finding the optimal privacy mechanism becomes a learning problem. In the training phase, we use the empirical log-loss function $L_{\mathrm{XE}}(h(g(X, Y; \theta_p); \theta_a), Y)$ provided in (11) for the adversary. Thus, for a fixed privatizer parameter $\theta_p$, the adversary learns the optimal parameter $\theta_a^*$ that maximizes $L_{\mathrm{XE}}(h(g(X, Y; \theta_p); \theta_a), Y)$. On the other hand, the optimal parameter for the privacy mechanism is obtained by solving (10). After convergence, we use the learned data-driven GAP mechanism to compute the accuracy of inferring the private variable under a strong MAP adversary. We evaluate our data-driven approach by comparing the mechanisms learned in an adversarial fashion on $\mathcal{D}$ with the game-theoretically optimal ones in which both the adversary and privatizer are assumed to have access to $P(X, Y)$.

We consider the PDD Gaussian noise adding privacy mechanism given by $g(X, Y) = X + (1 - Y)\beta_0 - Y\beta_1 + (1 - Y)\gamma_0 N + Y\gamma_1 N$.
Similar to the binary setting, we use two neural networks to model the privatizer and the adversary. As shown in Figure 6, the privatizer is modeled by a two-layer neural network with parameters $\beta_0, \beta_1, \gamma_0, \gamma_1 \in \mathbb{R}$. The adversary, whose goal is to infer $Y$ from the privatized data $\hat{X}$, is modeled by a three-layer neural network classifier with leaky ReLU activations. The random noise is drawn from a standard Gaussian distribution, $N \sim \mathcal{N}(0, 1)$.

In order to enforce the distortion constraint, we use the augmented Lagrangian method to penalize the learning objective when the constraint is not satisfied. In the binary Gaussian mixture model setting, the augmented Lagrangian method uses two parameters, namely $\lambda_t$ and $\rho_t$, to approximate the constrained optimization problem by a series of unconstrained problems. Intuitively, a large value of $\rho_t$ forces the distortion constraint to be binding, whereas $\lambda_t$ is an estimate of the Lagrangian multiplier. To obtain the optimal solution of the constrained optimization problem, we solve a series of unconstrained problems given by (14).
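The roles of $\lambda_t$ and $\rho_t$ can be illustrated with a generic augmented Lagrangian penalty for an inequality constraint of the form "distortion $\le D$". The sketch below is one common textbook form and is an assumption: the exact expression in (14) is not reproduced in this section and may differ. The names `penalized_objective` and `update_multipliers`, and the growth factor for $\rho_t$, are hypothetical.

```python
def constraint_violation(distortion, D):
    """Amount by which the distortion exceeds the budget D (zero if feasible)."""
    return max(0.0, distortion - D)


def penalized_objective(privatizer_loss, distortion, D, lam, rho):
    """Unconstrained surrogate: loss + lam * v + (rho / 2) * v^2, with v the violation.

    privatizer_loss stands for the quantity the privatizer minimizes
    (for example, the adversary's negative log-loss).
    """
    v = constraint_violation(distortion, D)
    return privatizer_loss + lam * v + 0.5 * rho * v ** 2


def update_multipliers(distortion, D, lam, rho, growth=2.0):
    """One outer-loop step: raise lam by rho * violation and increase rho.

    The multiplicative growth schedule for rho is an illustrative choice.
    """
    v = constraint_violation(distortion, D)
    return lam + rho * v, rho * growth


# Example: a privatizer that overshoots a budget of D = 3 by one unit of distortion.
print(penalized_objective(1.0, 4.0, 3.0, lam=5.0, rho=10.0))
print(update_multipliers(4.0, 3.0, lam=5.0, rho=10.0))
```

In this form the penalty vanishes whenever the constraint holds, so a feasible privatizer is judged on its privacy loss alone, while repeated violations increase both $\lambda_t$ and $\rho_t$ and make further violations progressively more expensive.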

Table 1: Synthetic datasets

Dataset    P(Y = 1)    X | Y = 0    X | Y = 1
                       N(-3, 1)     N(3, 1)
                       N(-3, 4)     N(3, 1)
                       N(-3, 1)     N(3, 1)
                       N(-3, 4)     N(3, 1)

4.3 Illustration of Results

We use synthetic datasets to evaluate our proposed GAP framework. We consider the four synthetic datasets shown in Table 1. Each synthetic dataset used in this experiment contains 20,000 training samples and 2,000 test samples. We use TensorFlow to train both the privatizer and the adversary using the Adam optimizer with a learning rate of 0.01 and a minibatch size of 200.

Figure 7: Privacy-distortion tradeoff for binary Gaussian mixture model (accuracy versus distortion). (a) Performance of PDD mechanisms against MAP adversary for p = 0.5. (b) Performance of PDD mechanisms against MAP adversary for p = 0.75.

Figures 7a and 7b illustrate the performance of the optimal PDD Gaussian noise adding mechanisms against the strong theoretical MAP adversary when $P(Y = 1) = 0.5$ and $P(Y = 1) = 0.75$, respectively. The optimal mechanisms obtained by both the theoretical and the data-driven approaches reduce the inference accuracy of the MAP adversary as the distortion increases. Similar to the binary data model, we observe that the accuracy of the adversary saturates when the distortion crosses some threshold. Moreover, for the binary Gaussian mixture setting, the privacy mechanism obtained through the data-driven approach also performs very well when pitted against the MAP adversary (the maximum accuracy difference is around 6% compared with the theoretical approach). In other words, for the binary Gaussian mixture model, the data-driven approach for GAP can generate privacy mechanisms that are comparable, in terms of performance, to those of the theoretical approach, which assumes the privatizer has access to the underlying distribution of the data.
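The MAP adversary's accuracy discussed above can be evaluated numerically. Under the PDD mechanism, $\hat{X} \mid Y = 0 \sim \mathcal{N}(\mu_0 + \beta_0, \sigma_0^2 + \gamma_0^2)$ and $\hat{X} \mid Y = 1 \sim \mathcal{N}(\mu_1 - \beta_1, \sigma_1^2 + \gamma_1^2)$, because the added noise is Gaussian and independent of $X$, and the MAP accuracy is $\int \max\{(1 - p) f_0(\hat{x}),\, p f_1(\hat{x})\}\, d\hat{x}$. The sketch below approximates this integral with the trapezoidal rule; the function names, the integration range of ten standard deviations, and the grid resolution are illustrative assumptions rather than the procedure used in the paper.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def map_accuracy(p, mu0, var0, mu1, var1,
                 beta0, beta1, gamma0, gamma1, num_points=200001):
    """Accuracy of the MAP adversary on the privatized variable.

    Integrates max{(1 - p) * f0, p * f1} over a grid covering both
    privatized class-conditional densities.
    """
    m0, v0 = mu0 + beta0, var0 + gamma0 ** 2
    m1, v1 = mu1 - beta1, var1 + gamma1 ** 2
    lo = min(m0 - 10.0 * np.sqrt(v0), m1 - 10.0 * np.sqrt(v1))
    hi = max(m0 + 10.0 * np.sqrt(v0), m1 + 10.0 * np.sqrt(v1))
    grid = np.linspace(lo, hi, num_points)
    integrand = np.maximum((1.0 - p) * gaussian_pdf(grid, m0, v0),
                           p * gaussian_pdf(grid, m1, v1))
    dx = grid[1] - grid[0]
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:])) * dx)


# No privatization versus a shift that makes both classes overlap completely.
print(map_accuracy(0.5, -3.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0))
print(map_accuracy(0.5, -3.0, 1.0, 3.0, 1.0, 3.0, 3.0, 0.0, 0.0))
```

When the two privatized distributions coincide, the adversary can do no better than guessing the more likely class, so the accuracy falls to $\max\{p, 1 - p\}$. This is consistent with the saturation of the adversary's accuracy at large distortion noted above. The same function could serve as the objective of a grid search over privatizer parameters.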
Figures 8 to 13 show the privatization schemes for different datasets. The intuition behind this Gaussian noise adding mechanism is to shift the distributions of $X \mid Y = 0$ and $X \mid Y = 1$ closer together and to scale their variances to preserve privacy. When $P(Y = 0) = P(Y = 1)$ and $\sigma_0 = \sigma_1$, the privatizer shifts and scales the two distributions almost equally. Furthermore, the resultant $\hat{X} \mid Y = 0$ and $\hat{X} \mid Y = 1$ have very similar distributions. We also observe that if $P(Y = 0) \neq P(Y = 1)$, the public variable whose corresponding private variable has a lower prior probability gets shifted more. It is also worth mentioning that when $\sigma_0 \neq \sigma_1$, the public variable with a lower variance gets scaled more.

The optimal privacy mechanisms obtained via the data-driven approach under different datasets are presented in Tables 2 to 5. In each table, $D$ is the maximum allowable distortion. $\beta_0$, $\beta_1$, $\gamma_0$, and $\gamma_1$ are the parameters of the privatizer neural network. These learned parameters dictate the statistical model of the privatizer, which is used to sanitize the dataset. We use acc to denote the inference accuracy of the adversary on a test dataset and xent to denote the converged cross-entropy of the adversary. The column titled distance represents the average distortion $\mathbb{E}_{\mathcal{D}}\big[\|X - \hat{X}\|^2\big]$ that results from sanitizing the test dataset via the learned privatization scheme. $P_{\text{detect}}$ is the MAP adversary's inference accuracy under the learned privatization scheme, assuming that the adversary: (a) has access to the joint distribution of $(X, Y)$, (b) has knowledge of the learned privatization scheme, and (c) can compute the MAP rule. $P_{\text{detect-theory}}$ is the lowest inference accuracy we get if the privatizer had access to the joint distribution of $(X, Y)$ and used this information to compute the parameters of the privatization scheme via the grid-search approach described just before Section 4.2.

Figure 8: Raw test samples, equal variance

Figure 9: Prior P(Y = 1) = 0.5, X | Y = 1 ~ N(3, 1), X | Y = 0 ~ N(-3, 1). Panels: (a) D = 1, (b) D = 3, (c) D = 8.

Figure 10: Prior P(Y = 1) = 0.75, X | Y = 1 ~ N(3, 1), X | Y = 0 ~ N(-3, 1). Panels: (a) D = 1, (b) D = 3, (c) D = 8.

Figure 11: Raw test samples, unequal variance

Figure 12: Prior P(Y = 1) = 0.5, X | Y = 1 ~ N(3, 1), X | Y = 0 ~ N(-3, 4). Panels: (a) D = 1, (b) D = 3, (c) D = 8.

Figure 13: Prior P(Y = 1) = 0.75, X | Y = 1 ~ N(3, 1), X | Y = 0 ~ N(-3, 4). Panels: (a) D = 1, (b) D = 3, (c) D = 8.

Table 2: Prior P(Y = 1) = 0.5, X | Y = 1 ~ N(3, 1), X | Y = 0 ~ N(-3, 1)

D    β_0    β_1    γ_0    γ_1    acc    xent    distance    P_detect    P_detect-theory
