Differential Privacy Preservation for Deep Auto-Encoders: an Application of Human Behavior Prediction
NhatHai Phan 1, Yue Wang 2, Xintao Wu 3, Dejing Dou 1 — {haiphan,dou}@cs.uoregon.edu, ywang91@uncc.edu, xintaowu@uark.edu — 1 University of Oregon, USA; 2 University of North Carolina at Charlotte, USA; 3 University of Arkansas, USA

Abstract

In recent years, deep learning has spread beyond both academia and industry with many exciting real-world applications. The development of deep learning has presented obvious privacy issues. However, there has been a lack of scientific study about privacy preservation in deep learning. In this paper, we concentrate on the auto-encoder, a fundamental component in deep learning, and propose the deep private auto-encoder (dPA). Our main idea is to enforce ε-differential privacy by perturbing the objective functions of the traditional deep auto-encoder, rather than its results. We apply the dPA to human behavior prediction in a health social network. Theoretical analysis and thorough experimental evaluations show that the dPA is highly effective and efficient, and it significantly outperforms existing solutions.

Introduction

Deep learning is a timely and promising area of machine learning research. In general, deep learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text (Deng and Yu 2014). With significant success in recent years, the applications of deep learning are being expanded into other fields such as social media (Yuan et al. 2013), social network analysis (Perozzi, Al-Rfou, and Skiena 2014), bioinformatics (Chicco, Sadowski, and Baldi 2014), and medicine and healthcare (Song et al. 2014). This presents obvious issues about protecting privacy in deep learning models when they are built on users' personal and highly sensitive data, such as clinical records, user profiles, photos, etc. Present research efforts lack sufficient deep learning techniques that incorporate privacy concerns. Therefore, developing a privacy-preserving deep
learning model is an urgent demand. Releasing sensitive results of statistical analysis and data mining while protecting privacy has been studied in the past few decades. One state-of-the-art approach to the problem is ε-differential privacy (Dwork et al. 2006), which works by injecting random noise into the released statistical results computed from the underlying sensitive data, such that the distribution of the noisy results is relatively insensitive to any change of a single record in the original dataset. (Copyright © 2016, Association for the Advancement of Artificial Intelligence, www.aaai.org. All rights reserved.) This ensures that the adversary cannot infer any information about any particular record with high confidence (controlled by parameter ε), even if the adversary possesses all the remaining tuples of the sensitive data. In this paper, we propose to develop an ε-differential privacy preserving deep learning model. The structure of deep learning models, e.g., deep auto-encoders (Bengio 2009) and deep belief networks (Hinton, Osindero, and Teh 2006), usually contains multiple layers of neurons. At each layer, they use different objective functions (e.g., cross-entropy error, energy-based functions) and algorithms (e.g., contrastive divergence (Hinton 2002), back-propagation) to learn the optimal parameters for output tasks such as classification and prediction. In addition, the designs of deep learning models are varied and dependent on application domains. It is difficult, therefore, to design a unified ε-differential privacy solution that covers all deep learning models. Deep auto-encoders (dAs) (Bengio 2009) are one of the fundamental deep learning models, and they have been used in many applications such as healthcare and medicine, natural language processing, etc. (Deng and Yu 2014). In this paper, we aim at deriving an ε-differential privacy approach to deep auto-encoders (dAs) for a binomial classification task. There are several potential solutions for ε-differential privacy that target regression, which is similar to the data reconstruction and cross-entropy
error functions in deep auto-encoders (Zhang et al. 2012; Chaudhuri and Monteleoni 2008; Chaudhuri, Monteleoni, and Sarwate 2011; Lei 2011; Smith 2011). The most pertinent algorithm that we can adapt in our work is the functional mechanism (FM) (Zhang et al. 2012). The FM enforces ε-differential privacy by perturbing the objective function of the optimization problem, e.g., logistic regression, rather than its results. By leveraging the functional mechanism to perturb the objective functions in deep auto-encoders so that ε-differential privacy is preserved, we propose a novel ε-differential Private Auto-encoder (PA). First, the cross-entropy error functions of the data reconstruction and softmax layer are approximated to polynomial forms by using Taylor expansion (Arfken 1985). Second, we inject noise into these polynomial forms so that ε-differential privacy is satisfied in the training phases. Third, in the PA, we add a normalization layer on top of the hidden layer to protect ε-differential privacy when stacking multiple PAs. Multiple PAs can be stacked on each other
to produce a deep private auto-encoder (dPA). To evaluate its performance, we apply the dPA model to human behavior prediction in a real health social network. Experimental results demonstrate that the dPA model achieves accurate results with prediction power comparable to the unperturbed results, and it outperforms the existing solutions.

Preliminaries and Related Works

In this section, we briefly revisit the differential privacy definition, the functional mechanism (Zhang et al. 2012), and formally introduce the deep auto-encoder. Let D be a database that contains n tuples t_1, t_2, ..., t_n and d+1 attributes X_1, X_2, ..., X_d, Y. For each tuple t_i = (x_{i1}, x_{i2}, ..., x_{id}, y_i), we assume without loss of generality that Σ_{j=1}^{d} x_{ij}^2 ≤ 1, where x_{ij} ≥ 0 and y_i follows a binomial distribution. Our objective is to construct a deep auto-encoder ρ from D that (i) takes x_i = (x_{i1}, x_{i2}, ..., x_{id}) as input, and (ii) outputs a prediction of y_i that is as accurate as possible. The model function ρ contains a model parameter vector ω. To evaluate whether ω leads to an accurate model, a cost function f_D(ω) is often used to measure the difference between the original and predicted values of y_i. As the released model parameter ω may disclose sensitive information of D, to protect privacy, we require that the model training be performed with an algorithm which satisfies ε-differential privacy.

Differential Privacy

Differential privacy has been significantly studied from the theoretical perspective, e.g., (Chaudhuri and Monteleoni 2008; Kifer and Machanavajjhala 2011). There are also extensive studies on the applicability of enforcing differential privacy in particular analysis tasks, e.g., collaborative recommendation (McSherry and Mironov 2009), logistic regression (Chaudhuri and Monteleoni 2008), publishing contingency tables (Xiao, Wang, and Gehrke 2010), and spectral graph analysis (Wang, Wu, and Wu 2013) in social network analysis. The mechanisms for achieving differential privacy mainly include the classic approach of adding Laplacian noise (Dwork et al. 2006),
the exponential mechanism (McSherry and Talwar 2007), and the functional perturbation approach (Chaudhuri and Monteleoni 2008).

Definition 1 (ε-Differential Privacy (Dwork et al. 2006)). A randomized algorithm A fulfills ε-differential privacy iff for any two databases D and D' differing in at most one tuple, and for any output O of A, we have:

  Pr(A(D) = O) ≤ e^ε · Pr(A(D') = O)   (1)

The privacy budget ε controls the amount by which the distributions induced by two neighboring datasets may differ. Smaller values of ε enforce a stronger privacy guarantee of A. A general method for computing an approximation to any function f (on D) while preserving ε-differential privacy is the Laplace mechanism (Dwork et al. 2006), where the output of f is a vector of real numbers. In particular, the mechanism exploits the global sensitivity of f over any two neighboring datasets (differing in at most one record), denoted S_f(D). Given S_f(D), the Laplace mechanism ensures ε-differential privacy by injecting noise η into each value in the output of f(D), where η is drawn i.i.d. from a Laplace distribution with zero mean and scale S_f(D)/ε. In (Shokri and Shmatikov 2015), Shokri et al. proposed a distributed training method to preserve privacy in neural networks, and extended it to preserve differential privacy for distributed deep neural networks. Their method aims at preserving ε-differential privacy for each training epoch of each participant. Even though their method is reasonable in the context of distributed deep neural networks, their approach may consume too much privacy budget to ensure model accuracy when the number of training epochs, the number of participants, and the number of shared parameters are large.

Functional Mechanism Revisited

The functional mechanism (Zhang et al. 2012) is an extension of the Laplace mechanism. It achieves ε-differential privacy by perturbing the objective function f_D(ω) and then releasing the model parameter ω̄ that minimizes the perturbed objective function f̄_D(ω) instead of the original one.
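As a concrete illustration of the Laplace mechanism described above, the following minimal sketch (assuming NumPy; the function name is ours, not the paper's) releases a query answer by adding noise drawn from Lap(S_f(D)/ε) to each coordinate:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a (possibly vector-valued) query answer under epsilon-DP by
    adding Laplace noise with scale sensitivity/epsilon to each coordinate."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                        size=np.shape(true_value))
    return np.asarray(true_value, dtype=float) + noise

# Example: a counting query has global sensitivity 1 (adding or removing one
# tuple changes the count by at most 1), so noise is drawn from Lap(1/epsilon).
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
```

Note the privacy/utility trade-off visible in the scale parameter: a smaller ε yields a larger noise scale and hence noisier released answers.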
The functional mechanism exploits the polynomial representation of f_D(ω). The model parameter ω is a vector that contains d values ω_1, ..., ω_d. Let φ(ω) denote a product of ω_1, ..., ω_d, namely, φ(ω) = ω_1^{c_1} ω_2^{c_2} ··· ω_d^{c_d} for some c_1, ..., c_d ∈ ℕ. Let Φ_j (j ∈ ℕ) denote the set of all products of ω_1, ..., ω_d with degree j, i.e., Φ_j = { ω_1^{c_1} ω_2^{c_2} ··· ω_d^{c_d} | Σ_{l=1}^{d} c_l = j }. By the Stone–Weierstrass theorem, any continuous and differentiable f(t_i, ω) can always be written as a polynomial of ω_1, ..., ω_d, for some J ∈ [0, ∞], i.e., f(t_i, ω) = Σ_{j=0}^{J} Σ_{φ∈Φ_j} λ_{φt_i} φ(ω), where λ_{φt_i} ∈ ℝ denotes the coefficient of φ(ω) in the polynomial. For instance, the polynomial expression of the loss function in linear regression is: f(x_i, ω) = (y_i − x_i^T ω)^2 = y_i^2 − Σ_{j=1}^{d} (2 y_i x_{ij}) ω_j + Σ_{1≤j,l≤d} (x_{ij} x_{il}) ω_j ω_l. We can see that it only involves monomials in Φ_0 = {1}, Φ_1 = {ω_1, ..., ω_d}, and Φ_2 = {ω_i ω_j | i, j ∈ [1, d]}. Each φ(ω) has its own coefficient; e.g., for ω_j, its polynomial coefficient is λ_{φt_i} = −2 y_i x_{ij}. Similarly, f_D(ω) can also be expressed as a polynomial of ω_1, ..., ω_d:

  f_D(ω) = Σ_{j=0}^{J} Σ_{φ∈Φ_j} Σ_{t_i∈D} λ_{φt_i} φ(ω)   (2)

Lemma 1 (Zhang et al. 2012). Let D and D' be any two neighboring datasets. Let f_D(ω) and f_{D'}(ω) be the objective functions of regression analysis on D and D', respectively. The following inequality holds:

  Δ = Σ_{j=1}^{J} Σ_{φ∈Φ_j} ‖ Σ_{t_i∈D} λ_{φt_i} − Σ_{t_i'∈D'} λ_{φt_i'} ‖_1 ≤ 2 max_t Σ_{j=1}^{J} Σ_{φ∈Φ_j} ‖λ_{φt}‖_1

where t_i, t_i', or t is an arbitrary tuple. To achieve ε-differential privacy, f_D(ω) is perturbed by injecting Laplace noise Lap(Δ/ε) into its polynomial coefficients λ_φ.
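The coefficient-perturbation idea can be sketched on the linear-regression loss above (assuming NumPy; all names are illustrative, and for simplicity the sensitivity bound is evaluated from the per-tuple coefficient magnitudes of the given sample rather than derived symbolically from the data-domain assumptions):

```python
import numpy as np

def perturbed_linreg_coeffs(X, y, epsilon, rng=None):
    """Functional-mechanism sketch for the squared loss
    f(t_i, w) = (y_i - x_i^T w)^2: perturb the aggregated degree-1 and
    degree-2 monomial coefficients with Lap(Delta/epsilon), then minimize
    the perturbed quadratic instead of the original loss."""
    rng = rng or np.random.default_rng()
    lam1 = -2.0 * X.T @ y        # coefficients of w_j, summed over tuples
    lam2 = X.T @ X               # coefficients of w_j w_l, summed over tuples
    # Lemma 1 style bound: Delta <= 2 * max_t (sum of |per-tuple coefficients|)
    deg1 = (2.0 * np.abs(y)[:, None] * np.abs(X)).sum(axis=1)
    deg2 = np.abs(X).sum(axis=1) ** 2
    delta = 2.0 * np.max(deg1 + deg2)
    scale = delta / epsilon
    lam1_noisy = lam1 + rng.laplace(0.0, scale, size=lam1.shape)
    lam2_noisy = lam2 + rng.laplace(0.0, scale, size=lam2.shape)
    # Minimize w^T L2 w + L1^T w: gradient (L2 + L2^T) w + L1 = 0.
    # (The noisy quadratic may be non-convex; this sketch ignores that.)
    lam2_sym = 0.5 * (lam2_noisy + lam2_noisy.T)
    w_bar = np.linalg.solve(2.0 * lam2_sym, -lam1_noisy)
    return w_bar, delta
```

With a very large privacy budget the noise vanishes and the result approaches the ordinary least-squares solution, which is a useful sanity check on the construction.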
The model parameter ω̄ is then derived to minimize the perturbed function f̄_D(ω), where Δ = 2 max_t Σ_{j=1}^{J} Σ_{φ∈Φ_j} ‖λ_{φt}‖_1, according to Lemma 1. When the polynomial form of an objective function (e.g., logistic regression) contains terms with unbounded degrees, Zhang et al. (2012) developed an approximate polynomial form based on Taylor expansion (Arfken 1985). For instance, given the cost function f(t_i, ω) of regression analysis, assume that there exist 2m functions f_1, ..., f_m and g_1, ..., g_m such that f(t_i, ω) = Σ_{l=1}^{m} f_l(g_l(t_i, ω)), and each g_l(t_i, ω) is a polynomial function of ω, where m ∈ ℕ is the number of functions f and g. Given the above decomposition of f(t_i, ω), we can apply the Taylor expansion on each f_l(·) to obtain:

  f(t_i, ω) = Σ_{l=1}^{m} Σ_{R=0}^{∞} (f_l^{(R)}(z_l) / R!) (g_l(t_i, ω) − z_l)^R   (3)

where each z_l is a real number. Thus, the objective function f_D(ω) can be written as a polynomial function, i.e.,

  f_D(ω) = Σ_{i=1}^{n} Σ_{l=1}^{m} Σ_{R=0}^{∞} (f_l^{(R)}(z_l) / R!) (g_l(t_i, ω) − z_l)^R   (4)

Deep Auto-Encoders

The deep auto-encoder (Bengio 2009) is one of the fundamental models in deep learning (Deng and Yu 2014). An auto-encoder has a layer of input units and a layer of data reconstruction units, fully connected to a layer of hidden units, but no connections within a layer (Fig. 1a). An auto-encoder is trained to encode the input x_i = {x_{i1}, ..., x_{id}} in some representation so that the input can be reconstructed from that representation. The auto-encoder aims to minimize the negative log-likelihood of the reconstruction, given the encoding of x_i:

  RE(t_i, W) = −log P(x_i | x̃_i, W) = −Σ_{j=1}^{d} [ x_{ij} log x̃_{ij} + (1 − x_{ij}) log(1 − x̃_{ij}) ]   (5)

where W is a weight matrix (i.e., from now on W will be used to denote the parameters instead of ω), x̃ = σ(hW), h = σ(W^T x)   (6), and σ(·) is the sigmoid function. The loss function on the whole dataset D is the summation over all the data points:

  RE(D, W) = Σ_{i=1}^{n} RE(x_i, W) = Σ_{i=1}^{n} Σ_{j=1}^{d} [ x_{ij} log(1 + e^{−W_j h_i}) + (1 − x_{ij}) log(1 + e^{W_j h_i}) ]   (7)

In the following, we blur the distinction between x_i and t_i: t_i indicates the set of attributes in x_i, and t_i and x_i will be used interchangeably.
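The equivalence between the cross-entropy form (Eq. 5) and the softplus form (Eq. 7) of the reconstruction loss can be checked numerically; a short sketch assuming NumPy, with illustrative names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruction_error(x, W):
    """Cross-entropy reconstruction loss of one auto-encoder (Eq. 5):
    encode with h = sigmoid(W^T x), decode with xt = sigmoid(W h)."""
    h = sigmoid(W.T @ x)
    xt = sigmoid(W @ h)
    return -np.sum(x * np.log(xt) + (1.0 - x) * np.log(1.0 - xt))

def reconstruction_error_softplus(x, W):
    """Equivalent log(1 + e^{+-W_j h}) form used in Eq. 7, obtained from
    log sigmoid(z) = -log(1 + e^{-z}) and log(1 - sigmoid(z)) = -log(1 + e^{z})."""
    h = sigmoid(W.T @ x)
    z = W @ h
    return np.sum(x * np.log1p(np.exp(-z)) + (1.0 - x) * np.log1p(np.exp(z)))
```

The two functions agree to floating-point precision, which is exactly the rewriting step the polynomial approximation below starts from.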
Multiple auto-encoders can be stacked to produce a deep auto-encoder. On top of the deep auto-encoder, we add an output layer which includes a single binomial variable ŷ to predict Y. The output variable ŷ is fully linked to the hidden variables of the highest hidden layer, denoted h^{(k)}, by weighted connections W^{(k)}, where k is the number of hidden layers in the deep auto-encoder. We use the logistic function as the activation function of ŷ, i.e., ŷ = σ(W^{(k)} h^{(k)}). A loss function that appropriately deals with the binomial problem is the cross-entropy error. Let Y_T be the set of labeled data points used to train the model; the cross-entropy error function is given by

  C(Y_T, θ) = −Σ_{i=1}^{|Y_T|} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ]   (8)

We can use the layer-wise unsupervised training algorithm (Bengio et al. 2007) and back-propagation to train deep auto-encoders.

Algorithm 1: Pseudo-code of a dPA model
1. Derive the polynomial approximation of the data reconstruction function RE(D, W) (Eq. 7), denoted as R̂E(D, W).
2. Perturb R̂E(D, W) by using the functional mechanism (FM) (Zhang et al. 2012); the perturbed function is denoted as R̄E(D, W).
3. Compute W̄ = arg min_W R̄E(D, W).
4. Private Auto-encoder (PA) stacking.
5. Derive and perturb the polynomial approximation of the cross-entropy error C(θ) (Eq. 8); the perturbed function is denoted as C̄(θ). Compute θ̄ = arg min_θ C̄(θ).
6. Return θ̄.

Deep Private Auto-Encoder (dPA)

In this section, we formally present our dPA model, a deep auto-encoder model under ε-differential privacy. Our algorithm to construct the dPA model includes six main steps (Algorithm 1). We leverage the functional mechanism to enforce ε-differential privacy in our dPA model. Therefore, in the first step (Algorithm 1), we derive a polynomial approximation of the data reconstruction function RE(D, W) (Eq. 7). The polynomial approximation is denoted as R̂E(D, W). This is a non-trivial task due to the following key challenges and requirements: (1) the data reconstruction function is complicated (compared with linear or logistic regression analysis); and (2) the error of the approximation method must be bounded and
independent of the data size. The second requirement guarantees the potential to use unlabeled data in a dPA model. As shown in the next sections, our approximation method theoretically satisfies this requirement. In the second step, the functional mechanism is used to perturb the approximation function R̂E(D, W); the perturbed function is denoted as R̄E(D, W). In this step, we introduce a new result of global sensitivity computation for auto-encoders. In the third step, we train the model to obtain the optimal perturbed parameters W̄ by using gradient descent. That results in private auto-encoders (PAs). In the fourth step, we stack private auto-encoders (PAs) to construct the deep private auto-encoder (dPA). Before each stacking operation, we introduce a normalization layer, denoted as h̄ (Fig. 1b), which guarantees that all the data assumptions and ε-differential privacy will be satisfied when training the next hidden layer. The normalization layer is placed on top of each private auto-encoder. In the fifth step, we introduce new results in deriving and perturbing (i.e., by using the functional mechanism) the polynomial approximation of the cross-entropy error C(θ) (Eq. 8) in the softmax layer for prediction tasks. The perturbed function is denoted as C̄(θ). In the sixth step, the back-propagation algorithm is used to fine-tune all the parameters in the deep private auto-encoder. In our framework, ε-differential privacy is preserved in our dPA model, since it is enforced at every layer and training step. Let us first derive the polynomial approximation form of RE(D, W), denoted R̂E(D, W), as follows.

Polynomial Approximation

To explain how we apply the Taylor expansion, and how Equations 3 and 4 are related to RE(D, W), recall the loss function shown in Eq. 7. For j ∈ {1, ..., d}, let f_{1j}, f_{2j}, g_{1j}, and g_{2j} be four functions defined as follows:

  g_{1j}(t_i, W_j) = W_j h_i,  g_{2j}(t_i, W_j) = W_j h_i   (9)
  f_{1j}(z_j) = x_{ij} log(1 + e^{−z_j}),  f_{2j}(z_j) = (1 − x_{ij}) log(1 + e^{z_j})

Then we have RE(t_i, W) = Σ_{j=1}^{d} [ f_{1j}(g_{1j}(t_i, W_j)) + f_{2j}(g_{2j}(t_i, W_j)) ]. By Equations 3 and 4, we have

  R̂E(D, W) = Σ_{i=1}^{n} Σ_{j=1}^{d} Σ_{l=1}^{2} Σ_{R=0}^{∞} (f_{lj}^{(R)}(z_l) / R!) (g_{lj}(t_i, W_j) − z_l)^R

where each z_l is a real number. By setting z_l = 0, the above equation can be simplified as:

  R̂E(D, W) = Σ_{i=1}^{n} Σ_{j=1}^{d} Σ_{l=1}^{2} Σ_{R=0}^{∞} (f_{lj}^{(R)}(0) / R!) (W_j h_i)^R   (10)

There are two complications in Eq. 10 that prevent us from applying it for private data reconstruction analysis. First, the equation involves an infinite summation. Second, the term f_{lj}^{(R)}(0) involved in the equation does not have a closed-form solution. We address these two issues by presenting an approximation approach that reduces the degree of the summation. Our approximation approach works by truncating the Taylor series in Eq. 10 to remove all polynomial terms with order larger than 2. This leads to a new objective function with low-order polynomials:

  R̂E(D, W) = Σ_{i=1}^{n} Σ_{j=1}^{d} [ Σ_{l=1}^{2} f_{lj}^{(0)}(0) + ( Σ_{l=1}^{2} f_{lj}^{(1)}(0) ) W_j h_i + ( Σ_{l=1}^{2} f_{lj}^{(2)}(0) / 2! ) (W_j h_i)^2 ]   (11)

Figure 1: Examples of (a) an auto-encoder, (b) the Private Auto-encoder (PA), and (c) the Deep Private Auto-encoder (dPA).

Perturbation of Objective Functions

In this section, we employ the functional mechanism to perturb the objective function R̂E(D, W) (i.e., Eq. 11) by injecting Laplace noise into its polynomial coefficients. The perturbed function is denoted as R̄E(D, W). Then we derive the model parameter W̄ that minimizes the perturbed function R̄E(D, W). First of all, we need to derive the sensitivity, denoted Δ, of R̂E on D. In Eq. 11, Σ_{l=1}^{2} f_{lj}^{(0)}(0), Σ_{l=1}^{2} f_{lj}^{(1)}(0), and Σ_{l=1}^{2} f_{lj}^{(2)}(0)/2! essentially are the polynomial coefficients of the function R̂E given the database D. To be concise, we denote {λ_j^{(0)}, λ_j^{(1)}, λ_j^{(2)}} as the set of coefficients, where for t_i ∈ D: λ_{jt_i}^{(0)} = Σ_{l=1}^{2} f_{lj}^{(0)}(0), λ_{jt_i}^{(1)} = Σ_{l=1}^{2} f_{lj}^{(1)}(0), and λ_{jt_i}^{(2)} = Σ_{l=1}^{2} f_{lj}^{(2)}(0)/2!. We are now ready to state the following lemma.

Lemma 2. Let D and D' be any two neighboring databases. Let R̂E(D, W) and R̂E(D', W) be the objective functions of auto-encoder learning on D and D', respectively; then the global sensitivity of R̂E over any two neighboring databases satisfies:

  Δ = 2 max_t Σ_{j=1}^{d} Σ_{R=0}^{2} ‖λ_{jt}^{(R)}‖_1 ≤ d(b + b²/4)   (12)

where b is the number of hidden variables. We leave the detailed proof of Lemma 2 to the Appendix. We use gradient descent to train the perturbed model R̄E(D, W).

Stacking Private Auto-Encoders

We have presented Private Auto-Encoders (PAs), the ε-differential privacy preserving algorithm for auto-encoders. To construct a deep Private Auto-Encoder (dPA) in which ε-differential privacy is satisfied, we need to stack multiple PAs on top of each other. The hidden units of the lower layer are considered as the input of the next PA (Fig. 1). Since our assumption on the input (i.e., Σ_{j=1}^{b} h_{ij}^2 ≤ 1) may not hold for this input to the next PA, we add a normalization layer, denoted as h̄, on top of each PA. Each variable h̄_{ij} can be directly computed from h_{ij} as follows:

  h̄_{ij} = (h_{ij} − γ_j) / ((φ_j − γ_j) √b)   (13)
where γ_j and φ_j denote the minimum and maximum values in the domain of H_j = {h_{1j}, ..., h_{nj}}. Now we have Σ_{j=1}^{b} h̄_{ij}^2 ≤ 1. On top of the dPA model, we add a softmax layer for either classification or prediction. To guarantee that privacy is preserved at every layer, we need to develop an ε-differential privacy preserving softmax layer.

Perturbation of the Softmax Layer

We focus on a binomial prediction problem for the softmax layer. This layer contains a single binomial variable ŷ to predict Y (Fig. 1c). A loss function that appropriately deals with this task is the cross-entropy error, given in Eq. 8. We can preserve ε-differential privacy for the softmax layer by deriving a polynomial approximation of the cross-entropy error function, which is then perturbed by using the functional mechanism. The function C(Y_T, θ) in Eq. 8 can be rewritten as

  C(Y_T, θ) = Σ_{i=1}^{|Y_T|} [ y_i log(1 + e^{−W_{(k)} h_{i(k)}}) + (1 − y_i) log(1 + e^{W_{(k)} h_{i(k)}}) ]

where W_{(k)} and h_{i(k)} are the states of the weights W and hidden variables h at the k-th normalization layer; h_{i(k)} is derived from the tuple t_i by propagating t_i through the structure of the dPA (Fig. 1b). In addition, the four functions in Eq. 9 become: g_1(t_i, W_{(k)}) = W_{(k)} h_{i(k)}, g_2(t_i, W_{(k)}) = W_{(k)} h_{i(k)}, f_1(z) = y_i log(1 + e^{−z}), f_2(z) = (1 − y_i) log(1 + e^{z}). The polynomial approximation of C(Y_T, θ) is:

  Ĉ(Y_T, θ) = Σ_{i=1}^{|Y_T|} Σ_{l=1}^{2} Σ_{R=0}^{2} (f_l^{(R)}(0) / R!) (W_{(k)} h_{i(k)})^R   (14)

The global sensitivity of Ĉ(Y_T, θ) over Y_T is given in the following lemma.

Lemma 3. Let Y_T and Y_T' be any two neighboring sets of labeled data points. Let Ĉ(Y_T, θ) and Ĉ(Y_T', θ) be the objective functions of learning on Y_T and Y_T', respectively; then the global sensitivity of Ĉ over Y_T satisfies:

  C_Δ ≤ |h_{(k)}| + |h_{(k)}|²/4   (15)

where |h_{(k)}| is the number of hidden variables in h_{(k)}. We leave the proof of Lemma 3 to the Appendix. After computing the sensitivity C_Δ, the functional mechanism is used to achieve differential privacy for the objective function Ĉ(Y_T, θ). The perturbed function is denoted as C̄(Y_T, θ).
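The normalization layer of Eq. 13 can be sketched as follows (assuming NumPy; names are ours). Because each rescaled unit lies in [0, 1/√b], every row automatically satisfies the assumption Σ_j h̄_{ij}² ≤ 1 that the next stacked PA requires of its input:

```python
import numpy as np

def normalize_hidden(H):
    """Normalization layer (Eq. 13): rescale each hidden column j into
    [0, 1/sqrt(b)] using its per-column minimum gamma_j and maximum phi_j,
    so that every row has squared norm at most 1."""
    _, b = H.shape
    gamma = H.min(axis=0)                           # per-unit minimum
    phi = H.max(axis=0)                             # per-unit maximum
    span = np.where(phi > gamma, phi - gamma, 1.0)  # guard constant columns
    return (H - gamma) / (span * np.sqrt(b))
```

Each column attains 0 at its minimum and 1/√b at its maximum, so the sum of b squared entries per row is at most b · (1/b) = 1.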
The back-propagation algorithm is then used to train all the parameters in the deep Private Auto-Encoder (dPA), aiming at optimizing the function C̄(Y_T, θ).

Approximation Error Bounds

The following lemma shows how much error our approximation approaches, R̂E(D, W) (Eq. 11) and Ĉ(Y_T, θ) (Eq. 14), incur. The error depends only on the structure of the function and the number of attributes of the dataset. In addition, the average error of the approximations is always bounded.

Lemma 4. Given the four polynomial functions R̂E(D, W), RE(D, W), Ĉ(Y_T, θ), and C(Y_T, θ), the average error of the approximations is always bounded as follows:

  RE(D, Ŵ) − RE(D, W̄) ≤ ((e² + 2e − 1) / (e(1 + e)²)) · d   (16)
  C(Y_T, θ̂) − C(Y_T, θ̄) ≤ (e² + 2e − 1) / (e(1 + e)²)   (17)

where W̄ = arg min_W RE(D, W), Ŵ = arg min_W R̂E(D, W), θ̄ = arg min_θ C(Y_T, θ), and θ̂ = arg min_θ Ĉ(Y_T, θ). We leave the detailed proof of Lemma 4 to the Appendix. Lemma 4 shows that the error of the approximation of the data reconstruction in an auto-encoder is a product of a small constant and the number of attributes d (Eq. 16). This is reasonable, since the more attributes need to be reconstructed, the more error is injected into the results. The error is completely independent of the number of tuples or data instances. This sufficiently guarantees that our approximation of the auto-encoder can be applied to large datasets. In the experiment section, we will show that our approximation approach leads to accurate results. In addition, the error of the approximation of the cross-entropy error function is a small constant; therefore, the error is independent of the number of labeled data points in this supervised training step.

Experiments

To validate the proposed dPA model, we have developed a dPA-based human behavior prediction model (dPAH) on a real health social network. Our starting observation is that a human behavior is the outcome of behavior determinants such as self-motivation, social influences, and environmental events. This observation is rooted in human agency in social cognitive theory (Bandura 1989). In the dPAH model, these human behavior
determinants are integrated together.

Our Health Social Network Data were collected from Oct 2010 to Aug 2011 as a collaboration between PeaceHealth Laboratories, SK Telecom Americas, and the University of Oregon to record daily physical activities, social activities (text messages, competitions, etc.), biomarkers, and biometric measures (cholesterol, BMI, etc.) for a group of 254 overweight and obese individuals. Physical activities, including information about the number of walking and running steps, were reported via a mobile device carried by each user. All users enrolled in an online social network, allowing them to friend and communicate with each other. Users' biomarkers and biometric measures were recorded via daily/weekly/monthly medical tests performed at home individually or at our laboratories. In total, we consider three groups of attributes: (1) Behaviors: #joining competitions, #exercising days, #goals set, #goals achieved, Σ(distances), avg(speeds); (2) #Inbox Messages: Encouragement, Fitness, Followup, Competition, Games, Personal, Study protocol, Progress report, Technique, Social network, Meetups, Goal, Wellness meter, Feedback, Heckling, Explanation, Invitation, Notice, Technical fitness, Physical; and (3) Biomarkers and Biometric Measures: Wellness Score, BMI, BMI slope, Wellness Score slope.
Figure 2: Prediction accuracy vs. dataset cardinality — (a) Weekly dataset, (b) Daily dataset.

dPA-based Human Behavior Prediction (dPAH). Given a tuple t_i, x_{i1}, ..., x_{id} are the personal attributes and y_i is a binomial parameter which indicates whether a user increases or decreases his/her exercise. To describe the dPAH model, we adjust the notations x_i and y_i a little to denote the temporal dimension and our social network information. Specifically, x_{ut} = {x_{1ut}, ..., x_{dut}} denotes the d attributes of user u at time point t. Meanwhile, y_{ut} denotes the status of user u at time point t: y_{ut} = 1 denotes that u increases exercise at time t, otherwise y_{ut} = 0. The human behavior prediction task is to predict the statuses of all the users at the next time point t + 1 given M past time points t − M + 1, ..., t. Given a user u, to model self-motivation and social influences, we add an aggregation of his/her past attributes and the effects from his/her friends into the activation function of the hidden units at the current time t. The hidden variables h at time t now become

  h_t = σ(W_1^T x_{ut} + â_{ut})   (18)

where

  â_{ut} = { Σ_{k=1}^{N} A_{e,t−k}^T x_{u,t−k} | 1 ≤ e ≤ b } + η_u Σ_{v∈F_u} ψ_t(v, u) / |F_u|

In the above equation, â_{ut} includes a set of dynamic biases for each hidden unit in h_t, i.e., |h_t| = |â_{ut}|. In addition, A_{e,t−k}^T is a matrix of weights which connects x_{u,t−k} with the e-th hidden variable in h_t. ψ_t(v, u) is the probability that v influences u on physical activity at time t; it is derived from the CPP model (Phan et al. 2014), an efficient social influence model in health social networks. F_u is the set of friends of u in the social network. η_u is a parameter which represents the ability to observe the explicit social influences from neighboring users of user u. N is the number of time points in the past, i.e., N < M. In essence, the first aggregation term is the effect of user u's past attributes on the current attributes. The second aggregation term captures the effect of social influences from neighboring users on user u.
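The activation of Eq. 18 can be sketched as follows (assuming NumPy; the array shapes and names are our illustrative choices, and the CPP influence probabilities are abstracted into a precomputed scalar mean over the user's friends):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_activation(x_ut, past_x, W1, A, eta_u, influence):
    """Sketch of Eq. 18: hidden state with dynamic biases a_hat.
    x_ut      : (d,)       current attributes of user u at time t
    past_x    : (N, d)     attributes at times t-1 .. t-N
    W1        : (d, b)     input-to-hidden weights
    A         : (N, b, d)  per-lag weights linking past inputs to hidden units
    eta_u     : scalar     weight on observed social influence
    influence : scalar     mean influence probability over u's friends"""
    N = past_x.shape[0]
    # self-motivation: aggregate each past time step through its own weights
    self_term = sum(A[k] @ past_x[k] for k in range(N))   # shape (b,)
    a_hat = self_term + eta_u * influence                 # dynamic biases
    return sigmoid(W1.T @ x_ut + a_hat)
```

The social-influence term enters every hidden unit as a shared additive bias in this sketch, while the self-motivation term differs per hidden unit through the per-lag weight matrices.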
The environmental events (i.e., #competitions, #meet-up events) are included in the personal attributes; thus, the effect of environmental events is well embedded into the model. The parameters and input of our dPAH model are W = {W_1, A, η_u} and x = {x_{ut}, x_{u,t−k}, Σ_{v∈F_u} ψ_t(v,u)/|F_u|}. The model includes two PAs and a softmax layer for the human behavior prediction task. The model has been trained on two granular levels of our health social network, the daily and weekly datasets.

Figure 3: Prediction accuracy vs. privacy budget ε — (a) Weekly dataset, (b) Daily dataset.

Both datasets contain 30 attributes, 254 users, 3,7 messages, 1,883 friend connections, 11,458 competitions, etc. In addition, the daily dataset contains 300 time points, while the weekly dataset contains 48 time points. The number of hidden units and the number of previous time intervals N are set to 200 and 3, respectively. All the learning rates are set to 10⁻³.

Competitive Models. We compare the proposed dPAH model with three types of models. (1) Deep learning models for human behavior prediction: the CRBM (Taylor, Hinton, and Roweis 2006) and SctRBM (Li et al. 2014; Phan et al. 2015) are competitive models; none of them enforces ε-differential privacy. (2) Deep Auto-Encoder (dA) and Truncated Deep Auto-Encoder (TdA): dA and TdA are two algorithms derived from our analysis; they perform the analysis but do not enforce ε-differential privacy. dA directly outputs the model parameters that minimize the objective functions, and TdA returns the parameters obtained from the approximate objective functions with truncated polynomial terms. (3) Methods for regression analysis under ε-differential privacy: this experimental evaluation includes three regression analysis approaches under ε-differential privacy, namely, Functional Mechanism (FM) (Zhang et al. 2012), DPME (Lei 2011), and Filter-Priority (FP) (Cormode 2011). FM and DPME are the state-of-the-art methods for regression analysis under ε-differential privacy, while FP is an ε-differentially private technique for generating
synthetic data that can also be used for regression tasks. For FM, DPME, and FP, we use the implementations provided by their respective authors, and we set all internal parameters to their recommended values.

Accuracy vs. Dataset Cardinality. Fig. 2 shows the prediction accuracy of each algorithm as a function of the dataset cardinality. In this experiment, we vary the size of M, which can also be considered as the sampling rate of the dataset; ε is 1.0 in this experiment. In both datasets, there is a gap between the prediction accuracy of dPAH and that of dA and its truncated version TdA, but the gap narrows rapidly as the dataset cardinality increases. In addition, the performance of FM, FP, and DPME improves with the dataset cardinality, which is consistent with the theoretical results in (Zhang et al. 2012; Lei 2011; Cormode 2011). Nevertheless, even when we use all tuples in the dataset, the accuracies of FM, FP, and DPME are still lower than those of dPAH, dA, and TdA. With a very small sampling rate, the performance of the dPAH model is slightly lower than that of the SctRBM, which
is a non-privacy-enforcing deep learning model for human behavior prediction. However, the dPAH is significantly better than the SctRBM when the sampling rate goes just a bit higher, i.e., > 0.3. This is a significant result, since 0.3 is a small sampling rate.

Accuracy vs. Privacy Budget. Fig. 3 plots the prediction accuracy of each model as a function of the privacy budget ε. The prediction accuracies of dA, TdA, CRBM, and SctRBM remain unchanged for all ε, because none of them enforces ε-differential privacy. Since a smaller ε requires a larger amount of noise to be injected, the other four models incur less accurate prediction results when ε decreases. dPAH outperforms FM, FP, and DPME in all cases. In addition, it is relatively robust against changes of ε. In fact, the dPAH model is competitive even with privacy non-enforcing models such as CRBM and SctRBM.

Conclusions and Future Works

We have developed an approach for differentially private deep auto-encoders. Our approach conducts both sensitivity analysis and noise insertion on the data reconstruction and cross-entropy error objective functions. The proposed dPA model achieves accurate prediction results when the objective functions can be represented as finite polynomials. Our experiments in an application of human behavior prediction in a real health social network validate our theoretical results and demonstrate the effectiveness of our approach. Our work can be extended in the following directions. First, our current mechanism has the potential to be extended to different deep learning models, since it can be applied to each layer and stacked. Second, we plan to study whether there exist alternative analytical tools which can better approximate the objective functions. Third, it is worth studying how one might extract private information from deep neural networks.

Acknowledgment

This work is supported by the NIH grant R01GM to the SMASH project. Wu is also supported by NSF grant, and Dou is also supported by NSF grant. We thank Xiao
Xiao and Rebecca Sacks for their contributions.

References

Apostol, T. 1967. Calculus. John Wiley & Sons.
Arfken, G. 1985. Mathematical Methods for Physicists (Third Edition). Academic Press.
Bandura, A. 1989. Human agency in social cognitive theory. The American Psychologist.
Bengio, Y.; Lamblin, P.; Popovici, D.; and Larochelle, H. 2007. Greedy layer-wise training of deep networks. In NIPS.
Bengio, Y. 2009. Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1):1–127.
Chaudhuri, K., and Monteleoni, C. 2008. Privacy-preserving logistic regression. In NIPS'08.
Chaudhuri, K.; Monteleoni, C.; and Sarwate, A. D. 2011. Differentially private empirical risk minimization. J. Mach. Learn. Res. 12.
Chicco, D.; Sadowski, P.; and Baldi, P. 2014. Deep autoencoder neural networks for gene ontology annotation predictions. In ACM BCB'14.
Cormode, G. 2011. Personal privacy vs. population privacy: learning to attack anonymization. In KDD'11.
Deng, L., and Yu, D. 2014. Deep learning: Methods and applications. Technical Report MSR-TR.
Dwork, C.; McSherry, F.; Nissim, K.; and Smith, A. 2006. Calibrating noise to sensitivity in private data analysis. Theory of Cryptography.
Hinton, G.; Osindero, S.; and Teh, Y.-W. 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18(7).
Hinton, G. E. 2002. Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8).
Kifer, D., and Machanavajjhala, A. 2011. No free lunch in data privacy. In SIGMOD'11.
Lei, J. 2011. Differentially private M-estimators. In NIPS.
Li, X.; Du, N.; Li, H.; Li, K.; Gao, J.; and Zhang, A. 2014. A deep learning approach to link prediction in dynamic networks. In SIAM'14.
McSherry, F., and Mironov, I. 2009. Differentially private recommender systems. In KDD'09. ACM.
McSherry, F., and Talwar, K. 2007. Mechanism design via differential privacy. In FOCS'07.
Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In KDD'14.
Phan, N.; Dou, D.; Xiao, X.; Piniewski, B.; and Kil, D. 2014. Analysis of physical activity propagation in a health social network. In CIKM'14.
Phan, N.;
Dou, D.; Piniewski, B.; and Kil, D. 2015. Social restricted Boltzmann machine: Human behavior prediction in health social networks. In ASONAM'15.
Shokri, R., and Shmatikov, V. 2015. Privacy-preserving deep learning. In CCS'15.
Smith, A. 2011. Privacy-preserving statistical estimation with optimal convergence rates. In STOC'11.
Song, Y.; Ni, D.; Zeng, Z.; He, L.; et al. 2014. Automatic vaginal bacteria segmentation and classification based on superpixel and deep learning. Journal of Medical Imaging and Health Informatics 4(5).
Taylor, G.; Hinton, G.; and Roweis, S. 2006. Modeling human motion using binary latent variables. In NIPS'06.
Wang, Y.; Wu, X.; and Wu, L. 2013. Differential privacy preserving spectral graph analysis. In PAKDD (2).
Xiao, X.; Wang, G.; and Gehrke, J. 2010. Differential privacy via wavelet transforms. In ICDE'10.
Yuan, Z.; Sang, J.; Liu, Y.; and Xu, C. 2013. Latent feature learning in social media network. In ACM MM'13.
Zhang, J.; Zhang, Z.; Xiao, X.; Yang, Y.; and Winslett, M. 2012. Functional mechanism: regression analysis under differential privacy. PVLDB 5(11).

Appendices

Proof of Lemma 2

Proof 1. Assume that $D$ and $D'$ differ in the last tuple. Let $t_n$ ($t'_n$) be the last tuple in $D$ ($D'$). Then,

$$\Delta = \sum_j \sum_{R=0}^{2} \Big\| \sum_{t_i \in D} \lambda^{(R)}_{jt_i} - \sum_{t'_i \in D'} \lambda^{(R)}_{jt'_i} \Big\| = \sum_j \sum_{R=0}^{2} \big\| \lambda^{(R)}_{jt_n} - \lambda^{(R)}_{jt'_n} \big\|.$$

We can show that $\lambda^{(0)}_{jt_n} = \sum_{l=1}^{2} f^{(0)}_{lj}(0) = x_{nj}\log 2 + (1 - x_{nj})\log 2 = \log 2$. Similarly, we can show that $\lambda^{(0)}_{jt'_n} = \log 2$. As a result, $\lambda^{(0)}_{jt_n} = \lambda^{(0)}_{jt'_n}$. Therefore

$$\Delta = \sum_j \sum_{R=1}^{2} \big\| \lambda^{(R)}_{jt_n} - \lambda^{(R)}_{jt'_n} \big\| \le \sum_j \sum_{R=1}^{2} \Big( \big\|\lambda^{(R)}_{jt_n}\big\| + \big\|\lambda^{(R)}_{jt'_n}\big\| \Big) \le 2 \max_t \sum_j \Big[ \big|\tfrac{1}{2} - x_{tj}\big| \sum_{e=1}^{b} h_{te} + \tfrac{1}{8} \sum_{e,q} h_{te} h_{tq} \Big] \le 2\Big(\tfrac{1}{2} d b + \tfrac{1}{8} d b^2\Big) = d\Big(b + \tfrac{b^2}{4}\Big),$$

where $h_{te}$ is the state of the $e$-th hidden variable derived from the tuple $t$, i.e., $h = \sigma(W^T x)$, and $b$ is the number of hidden variables.

Proof of Lemma 3

Proof 2. By applying Lemma 2, we can compute the sensitivity $\Delta_C$ of $\widehat{C}(Y_T, \theta)$ on the set of labeled data points $Y_T$ as follows:

$$\Delta_C \le 2 \max_t \Big[ \big|\tfrac{1}{2} - y_i\big| \sum_{e=1}^{|h^{(k)}|} h_{te(k)} + \tfrac{1}{8} \sum_{e,q} h_{te(k)} h_{tq(k)} \Big] \le |h^{(k)}| + \tfrac{1}{4}|h^{(k)}|^2,$$

where $h_{te(k)}$ is the state of the $e$-th hidden variable at the $k$-th normalization layer, and it is derived from the tuple $t$. $|h^{(k)}|$ is the number of hidden variables in $h^{(k)}$.

Proof of Lemma 4

Proof 3. Let $\bar{W} = \arg\min_W RE(D, W)$ and $\hat{W} = \arg\min_W \widehat{RE}(D, W)$. Let $U = \max_W \big(RE(D, W) - \widehat{RE}(D, W)\big)$ and $S = \min_W \big(RE(D, W) - \widehat{RE}(D, W)\big)$. We have that $U \ge RE(D, \hat{W}) - \widehat{RE}(D, \hat{W})$ and $\forall W: S \le RE(D, W) - \widehat{RE}(D, W)$. Therefore, we have

$$RE(D, \hat{W}) - \widehat{RE}(D, \hat{W}) - RE(D, \bar{W}) + \widehat{RE}(D, \bar{W}) \le U - S,$$
$$RE(D, \hat{W}) - RE(D, \bar{W}) \le U - S + \big(\widehat{RE}(D, \hat{W}) - \widehat{RE}(D, \bar{W})\big).$$

In addition, $\widehat{RE}(D, \hat{W}) - \widehat{RE}(D, \bar{W}) \le 0$ since $\hat{W}$ minimizes $\widehat{RE}$, so $RE(D, \hat{W}) - RE(D, \bar{W}) \le U - S$. If $U \ge 0$ and $S \le 0$, then we have:

$$RE(D, \hat{W}) - RE(D, \bar{W}) \le U - S. \quad (19)$$

Eq. 19 holds for every $W$; therefore, it still holds for $\bar{W}$. Eq. 19 shows that the error incurred by truncating the Taylor series approximate function depends on the maximum and minimum values of $RE(D, W) - \widehat{RE}(D, W)$. To quantify the magnitude of the error, we first rewrite $RE(D, W) - \widehat{RE}(D, W)$ as

$$RE(D, W) - \widehat{RE}(D, W) = \sum_j \big( RE(D, W_j) - \widehat{RE}(D, W_j) \big) = \sum_j \sum_i \sum_{l=1}^{2} \sum_{R=3}^{\infty} \frac{f^{(R)}_{lj}(z_{lj})}{R!} \big( g_{lj}(t_i, W_j) - z_{lj} \big)^R.$$

To derive the minimum and maximum values of the function above, we look into the remainder of the Taylor expansion for each $j$. Let $z_j \in [z_{lj} - 1, z_{lj} + 1]$.
According to the well-known result (Apostol 1967), $\frac{1}{|D|}\big(RE(D, W_j) - \widehat{RE}(D, W_j)\big)$ must be in the interval

$$\Big[ \sum_l \min_{z_j} f^{(3)}_{lj}(z_j)(z_j - z_{lj})^3,\; \sum_l \max_{z_j} f^{(3)}_{lj}(z_j)(z_j - z_{lj})^3 \Big].$$

If $\max_{z_j} f^{(3)}_{lj}(z_j)(z_j - z_{lj})^3 \ge 0$ and $\min_{z_j} f^{(3)}_{lj}(z_j)(z_j - z_{lj})^3 \le 0$, then we have that

$$\frac{1}{|D|}\big(RE(D, W) - \widehat{RE}(D, W)\big) \le d \sum_l \Big( \max_{z_j} f^{(3)}_{lj}(z_j)(z_j - z_{lj})^3 - \min_{z_j} f^{(3)}_{lj}(z_j)(z_j - z_{lj})^3 \Big).$$

This analysis applies to the case of an auto-encoder as follows. First, for the functions $f_{1j}(z_j) = x_{ij}\log(1 + e^{z_j})$ and $f_{2j}(z_j) = (1 - x_{ij})\log(1 + e^{z_j})$, we have $f^{(3)}_{1j}(z_j) = \frac{2 x_{ij} e^{z_j}}{(1 + e^{z_j})^3}$ and $f^{(3)}_{2j}(z_j) = \frac{(1 - x_{ij}) e^{z_j}(e^{z_j} - 1)}{(1 + e^{z_j})^3}$. It can be verified that $\min_{z_j} f^{(3)}_{1j}(z_j) = \frac{-2e}{(1+e)^3} < 0$, $\max_{z_j} f^{(3)}_{1j}(z_j) = \frac{2e}{(1+e)^3} > 0$, $\min_{z_j} f^{(3)}_{2j}(z_j) = \frac{1-e}{e(1+e)^3} < 0$, and $\max_{z_j} f^{(3)}_{2j}(z_j) = \frac{e(e-1)}{(1+e)^3} > 0$. Thus, the average error of the approximation is at most

$$RE(D, \hat{W}) - RE(D, \bar{W}) \le \Big[ \Big( \frac{2e}{(1+e)^3} - \frac{-2e}{(1+e)^3} \Big) + \Big( \frac{e(e-1)}{(1+e)^3} - \frac{1-e}{e(1+e)^3} \Big) \Big] d = \frac{e^2 + 2e - 1}{e(1+e)^2}\, d.$$

Therefore, Eq. 16 holds. Eq. 17 can be proved in a similar way. The main difference is that there is only one output variable, $\hat{y}$, in the function $\widehat{C}(Y_T, \theta)$; meanwhile, there are $d$ output variables (i.e., $|x| = d$) in the data reconstruction function $RE(D, W)$. Thus, by replacing the functions $RE(D, W)$ and $\widehat{RE}(D, W)$ with $C(Y_T, \theta)$ and $\widehat{C}(Y_T, \theta)$ in the proof of Eq. 16, we obtain that

$$C(Y_T, \hat{\theta}) - C(Y_T, \bar{\theta}) \le \frac{e^2 + 2e - 1}{e(1+e)^2}.$$
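The two closed-form constants derived in these proofs can be sanity-checked numerically. The sketch below (illustrative helper names such as `lemma2_lhs` are ours, not from the paper) evaluates the per-tuple expression from the proof of Lemma 2 against the bound $d(b + b^2/4)$, and confirms that the four extreme remainder values in the proof of Lemma 4 sum exactly to $(e^2 + 2e - 1)/(e(1+e)^2)$:

```python
import math
import random

def lemma2_lhs(x, h):
    """Per-tuple sensitivity term from the proof of Lemma 2:
    2 * sum_j ( |1/2 - x_j| * sum_e h_e + (1/8) * sum_{e,q} h_e * h_q )."""
    s1 = sum(h)
    s2 = sum(he * hq for he in h for hq in h)
    return 2.0 * sum(abs(0.5 - xj) * s1 + 0.125 * s2 for xj in x)

def lemma2_bound(d, b):
    """Closed-form bound d * (b + b^2 / 4)."""
    return d * (b + b * b / 4.0)

# The bound is tight for binary x and hidden states all equal to 1.
assert lemma2_lhs([1, 0], [1.0, 1.0]) == lemma2_bound(2, 2) == 6.0

# Random binary inputs and hidden states in [0, 1] never exceed the bound.
random.seed(0)
d, b = 6, 4
for _ in range(1000):
    x = [random.randint(0, 1) for _ in range(d)]
    h = [random.random() for _ in range(b)]
    assert lemma2_lhs(x, h) <= lemma2_bound(d, b) + 1e-9

# Lemma 4 constant: the sum of (max - min) remainder values for f_1 and f_2
# equals (e^2 + 2e - 1) / (e * (1 + e)^2).
e = math.e
per_term = (2 * e / (1 + e) ** 3 - (-2 * e) / (1 + e) ** 3) \
         + (e * (e - 1) / (1 + e) ** 3 - (1 - e) / (e * (1 + e) ** 3))
closed = (e ** 2 + 2 * e - 1) / (e * (1 + e) ** 2)
assert abs(per_term - closed) < 1e-12
```

Equality in the Lemma 2 check is attained when every hidden state equals 1 and each $x_{tj}$ is binary, which is why the bound cannot be tightened without further assumptions on $h$.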
arxiv: v1 [cs.db] 1 Aug 2012
Functiona Mechanism: Regression Anaysis under Differentia Privacy arxiv:208.029v [cs.db] Aug 202 Jun Zhang Zhenjie Zhang 2 Xiaokui Xiao Yin Yang 2 Marianne Winsett 2,3 ABSTRACT Schoo of Computer Engineering
More informationStochastic Variational Inference with Gradient Linearization
Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,
More informationBayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?
Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine
More informationStatistical Learning Theory: A Primer
Internationa Journa of Computer Vision 38(), 9 3, 2000 c 2000 uwer Academic Pubishers. Manufactured in The Netherands. Statistica Learning Theory: A Primer THEODOROS EVGENIOU, MASSIMILIANO PONTIL AND TOMASO
More informationBP neural network-based sports performance prediction model applied research
Avaiabe onine www.jocpr.com Journa of Chemica and Pharmaceutica Research, 204, 6(7:93-936 Research Artice ISSN : 0975-7384 CODEN(USA : JCPRC5 BP neura networ-based sports performance prediction mode appied
More informationCS229 Lecture notes. Andrew Ng
CS229 Lecture notes Andrew Ng Part IX The EM agorithm In the previous set of notes, we taked about the EM agorithm as appied to fitting a mixture of Gaussians. In this set of notes, we give a broader view
More informationDo Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix
VOL. NO. DO SCHOOLS MATTER FOR HIGH MATH ACHIEVEMENT? 43 Do Schoos Matter for High Math Achievement? Evidence from the American Mathematics Competitions Genn Eison and Ashey Swanson Onine Appendix Appendix
More informationExplicit overall risk minimization transductive bound
1 Expicit overa risk minimization transductive bound Sergio Decherchi, Paoo Gastado, Sandro Ridea, Rodofo Zunino Dept. of Biophysica and Eectronic Engineering (DIBE), Genoa University Via Opera Pia 11a,
More informationA. Distribution of the test statistic
A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch
More information(This is a sample cover image for this issue. The actual cover is not yet available at this time.)
(This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna
More informationA Brief Introduction to Markov Chains and Hidden Markov Models
A Brief Introduction to Markov Chains and Hidden Markov Modes Aen B MacKenzie Notes for December 1, 3, &8, 2015 Discrete-Time Markov Chains You may reca that when we first introduced random processes,
More informationHow the backpropagation algorithm works Srikumar Ramalingam School of Computing University of Utah
How the backpropagation agorithm works Srikumar Ramaingam Schoo of Computing University of Utah Reference Most of the sides are taken from the second chapter of the onine book by Michae Nieson: neuranetworksanddeepearning.com
More information8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING
8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity
More informationAn Algorithm for Pruning Redundant Modules in Min-Max Modular Network
An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai
More informationSeparation of Variables and a Spherical Shell with Surface Charge
Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation
More informationMoreau-Yosida Regularization for Grouped Tree Structure Learning
Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State
More informationFirst-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries
c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische
More informationMultilayer Kerceptron
Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,
More informationThe EM Algorithm applied to determining new limit points of Mahler measures
Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,
More informationA Better Way to Pretrain Deep Boltzmann Machines
A Better Way to Pretrain Deep Botzmann Machines Rusan Saakhutdino Department of Statistics and Computer Science Uniersity of Toronto rsaakhu@cs.toronto.edu Geoffrey Hinton Department of Computer Science
More information$, (2.1) n="# #. (2.2)
Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier
More informationASummaryofGaussianProcesses Coryn A.L. Bailer-Jones
ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe
More informationMATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES
MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is
More informationA Novel Learning Method for Elman Neural Network Using Local Search
Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama
More informationStatistical Learning Theory: a Primer
??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa
More informationExpectation-Maximization for Estimating Parameters for a Mixture of Poissons
Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating
More informationLecture Note 3: Stationary Iterative Methods
MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or
More informationAsymptotic Properties of a Generalized Cross Entropy Optimization Algorithm
1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete
More informationChapter 7 PRODUCTION FUNCTIONS. Copyright 2005 by South-Western, a division of Thomson Learning. All rights reserved.
Chapter 7 PRODUCTION FUNCTIONS Copyright 2005 by South-Western, a division of Thomson Learning. A rights reserved. 1 Production Function The firm s production function for a particuar good (q) shows the
More informationAsynchronous Control for Coupled Markov Decision Systems
INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of
More informationActive Learning & Experimental Design
Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection
More informationCryptanalysis of PKP: A New Approach
Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in
More informationMARKOV CHAINS AND MARKOV DECISION THEORY. Contents
MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After
More informationOnline Appendices for The Economics of Nationalism (Xiaohuan Lan and Ben Li)
Onine Appendices for The Economics of Nationaism Xiaohuan Lan and Ben Li) A. Derivation of inequaities 9) and 10) Consider Home without oss of generaity. Denote gobaized and ungobaized by g and ng, respectivey.
More informationFitting Algorithms for MMPP ATM Traffic Models
Fitting Agorithms for PP AT Traffic odes A. Nogueira, P. Savador, R. Vaadas University of Aveiro / Institute of Teecommunications, 38-93 Aveiro, Portuga; e-mai: (nogueira, savador, rv)@av.it.pt ABSTRACT
More informationConvolutional Networks 2: Training, deep convolutional networks
Convoutiona Networks 2: Training, deep convoutiona networks Hakan Bien Machine Learning Practica MLP Lecture 8 30 October / 6 November 2018 MLP Lecture 8 / 30 October / 6 November 2018 Convoutiona Networks
More informationNOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS
NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs
More informationMelodic contour estimation with B-spline models using a MDL criterion
Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex
More informationRelated Topics Maxwell s equations, electrical eddy field, magnetic field of coils, coil, magnetic flux, induced voltage
Magnetic induction TEP Reated Topics Maxwe s equations, eectrica eddy fied, magnetic fied of cois, coi, magnetic fux, induced votage Principe A magnetic fied of variabe frequency and varying strength is
More information17 Lecture 17: Recombination and Dark Matter Production
PYS 652: Astrophysics 88 17 Lecture 17: Recombination and Dark Matter Production New ideas pass through three periods: It can t be done. It probaby can be done, but it s not worth doing. I knew it was
More informationSupplement of Limited-memory Common-directions Method for Distributed Optimization and its Application on Empirical Risk Minimization
Suppement of Limited-memory Common-directions Method for Distributed Optimization and its Appication on Empirica Risk Minimization Ching-pei Lee Po-Wei Wang Weizhu Chen Chih-Jen Lin I Introduction In this
More informationarxiv: v1 [math.ca] 6 Mar 2017
Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear
More informationSTA 216 Project: Spline Approach to Discrete Survival Analysis
: Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing
More informationNotes on Backpropagation with Cross Entropy
Notes on Backpropagation with Cross Entropy I-Ta ee, Dan Gowasser, Bruno Ribeiro Purue University October 3, 07. Overview This note introuces backpropagation for a common neura network muti-cass cassifier.
More informationFrom Margins to Probabilities in Multiclass Learning Problems
From Margins to Probabiities in Muticass Learning Probems Andrea Passerini and Massimiiano Ponti 2 and Paoo Frasconi 3 Abstract. We study the probem of muticass cassification within the framework of error
More informationHow the backpropagation algorithm works Srikumar Ramalingam School of Computing University of Utah
How the backpropagation agorithm works Srikumar Ramaingam Schoo of Computing University of Utah Reference Most of the sides are taken from the second chapter of the onine book by Michae Nieson: neuranetworksanddeepearning.com
More informationAppendix for Stochastic Gradient Monomial Gamma Sampler
3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 3 3 33 34 35 36 37 38 39 4 4 4 43 44 45 46 47 48 49 5 5 5 53 54 Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing
More informationTheory of Generalized k-difference Operator and Its Application in Number Theory
Internationa Journa of Mathematica Anaysis Vo. 9, 2015, no. 19, 955-964 HIKARI Ltd, www.m-hiari.com http://dx.doi.org/10.12988/ijma.2015.5389 Theory of Generaized -Difference Operator and Its Appication
More informationA SIMPLIFIED DESIGN OF MULTIDIMENSIONAL TRANSFER FUNCTION MODELS
A SIPLIFIED DESIGN OF ULTIDIENSIONAL TRANSFER FUNCTION ODELS Stefan Petrausch, Rudof Rabenstein utimedia Communications and Signa Procesg, University of Erangen-Nuremberg, Cauerstr. 7, 958 Erangen, GERANY
More informationDetermining The Degree of Generalization Using An Incremental Learning Algorithm
Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c
More information14 Separation of Variables Method
14 Separation of Variabes Method Consider, for exampe, the Dirichet probem u t = Du xx < x u(x, ) = f(x) < x < u(, t) = = u(, t) t > Let u(x, t) = T (t)φ(x); now substitute into the equation: dt
More informationA Fundamental Storage-Communication Tradeoff in Distributed Computing with Straggling Nodes
A Fundamenta Storage-Communication Tradeoff in Distributed Computing with Stragging odes ifa Yan, Michèe Wigger LTCI, Téécom ParisTech 75013 Paris, France Emai: {qifa.yan, michee.wigger} @teecom-paristech.fr
More informationAST 418/518 Instrumentation and Statistics
AST 418/518 Instrumentation and Statistics Cass Website: http://ircamera.as.arizona.edu/astr_518 Cass Texts: Practica Statistics for Astronomers, J.V. Wa, and C.R. Jenkins, Second Edition. Measuring the
More informationLearning Fully Observed Undirected Graphical Models
Learning Fuy Observed Undirected Graphica Modes Sides Credit: Matt Gormey (2016) Kayhan Batmangheich 1 Machine Learning The data inspires the structures we want to predict Inference finds {best structure,
More informationAppendix for Stochastic Gradient Monomial Gamma Sampler
Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing theorem to characterize the stationary distribution of the stochastic process with SDEs in (3) Theorem 3
More informationFORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS
FORECASTING TEECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODES Niesh Subhash naawade a, Mrs. Meenakshi Pawar b a SVERI's Coege of Engineering, Pandharpur. nieshsubhash15@gmai.com
More informationUniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete
Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity
More informationHaar Decomposition and Reconstruction Algorithms
Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate
More informationSmoothness equivalence properties of univariate subdivision schemes and their projection analogues
Numerische Mathematik manuscript No. (wi be inserted by the editor) Smoothness equivaence properties of univariate subdivision schemes and their projection anaogues Phiipp Grohs TU Graz Institute of Geometry
More informationA Solution to the 4-bit Parity Problem with a Single Quaternary Neuron
Neura Information Processing - Letters and Reviews Vo. 5, No. 2, November 2004 LETTER A Soution to the 4-bit Parity Probem with a Singe Quaternary Neuron Tohru Nitta Nationa Institute of Advanced Industria
More informationSVM-based Supervised and Unsupervised Classification Schemes
SVM-based Supervised and Unsupervised Cassification Schemes LUMINITA STATE University of Pitesti Facuty of Mathematics and Computer Science 1 Targu din Vae St., Pitesti 110040 ROMANIA state@cicknet.ro
More informationCentralized Coded Caching of Correlated Contents
Centraized Coded Caching of Correated Contents Qianqian Yang and Deniz Gündüz Information Processing and Communications Lab Department of Eectrica and Eectronic Engineering Imperia Coege London arxiv:1711.03798v1
More informationII. PROBLEM. A. Description. For the space of audio signals
CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time
More informationAn approximate method for solving the inverse scattering problem with fixed-energy data
J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999
More information1. Measurements and error calculus
EV 1 Measurements and error cacuus 11 Introduction The goa of this aboratory course is to introduce the notions of carrying out an experiment, acquiring and writing up the data, and finay anayzing the
More informationApproximated MLC shape matrix decomposition with interleaf collision constraint
Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence
More informationAlgorithms to solve massively under-defined systems of multivariate quadratic equations
Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations
More informationUnconditional security of differential phase shift quantum key distribution
Unconditiona security of differentia phase shift quantum key distribution Kai Wen, Yoshihisa Yamamoto Ginzton Lab and Dept of Eectrica Engineering Stanford University Basic idea of DPS-QKD Protoco. Aice
More informationSchedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness
1 Scheduabiity Anaysis of Deferrabe Scheduing Agorithms for Maintaining Rea-Time Data Freshness Song Han, Deji Chen, Ming Xiong, Kam-yiu Lam, Aoysius K. Mok, Krithi Ramamritham UT Austin, Emerson Process
More informationParagraph Topic Classification
Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA
More informationChemical Kinetics Part 2
Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate
More informationSupport Vector Machine and Its Application to Regression and Classification
BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views
More informationOptimality of Inference in Hierarchical Coding for Distributed Object-Based Representations
Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,
More informationConsistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems
Consistent inguistic fuzzy preference reation with muti-granuar uncertain inguistic information for soving decision making probems Siti mnah Binti Mohd Ridzuan, and Daud Mohamad Citation: IP Conference
More informationA Comparison Study of the Test for Right Censored and Grouped Data
Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Voume 19, 213 http://acousticasociety.org/ ICA 213 Montrea Montrea, Canada 2-7 June 213 Underwater Acoustics Session 1pUW: Seabed Scattering: Measurements and Mechanisms
More informationMath 124B January 31, 2012
Math 124B January 31, 212 Viktor Grigoryan 7 Inhomogeneous boundary vaue probems Having studied the theory of Fourier series, with which we successfuy soved boundary vaue probems for the homogeneous heat
More informationSoft Clustering on Graphs
Soft Custering on Graphs Kai Yu 1, Shipeng Yu 2, Voker Tresp 1 1 Siemens AG, Corporate Technoogy 2 Institute for Computer Science, University of Munich kai.yu@siemens.com, voker.tresp@siemens.com spyu@dbs.informatik.uni-muenchen.de
More informationBayesian Unscented Kalman Filter for State Estimation of Nonlinear and Non-Gaussian Systems
Bayesian Unscented Kaman Fiter for State Estimation of Noninear and Non-aussian Systems Zhong Liu, Shing-Chow Chan, Ho-Chun Wu and iafei Wu Department of Eectrica and Eectronic Engineering, he University
More informationResearch of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance
Send Orders for Reprints to reprints@benthamscience.ae 340 The Open Cybernetics & Systemics Journa, 015, 9, 340-344 Open Access Research of Data Fusion Method of Muti-Sensor Based on Correation Coefficient
More informationCONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION
CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization
More informationReichenbachian Common Cause Systems
Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,
More informationSection 6: Magnetostatics
agnetic fieds in matter Section 6: agnetostatics In the previous sections we assumed that the current density J is a known function of coordinates. In the presence of matter this is not aways true. The
More informationAppendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model
Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this
More informationChemical Kinetics Part 2. Chapter 16
Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates
More informationSome Measures for Asymmetry of Distributions
Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester
More informationOn the Goal Value of a Boolean Function
On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor
More informationMat 1501 lecture notes, penultimate installment
Mat 1501 ecture notes, penutimate instament 1. bounded variation: functions of a singe variabe optiona) I beieve that we wi not actuay use the materia in this section the point is mainy to motivate the
More informationVALIDATED CONTINUATION FOR EQUILIBRIA OF PDES
SIAM J. NUMER. ANAL. Vo. 0, No. 0, pp. 000 000 c 200X Society for Industria and Appied Mathematics VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW
More informationT.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA
ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network
More informationPartial permutation decoding for MacDonald codes
Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics
More informationSUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS
ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia
More informationResearch Article On the Lower Bound for the Number of Real Roots of a Random Algebraic Equation
Appied Mathematics and Stochastic Anaysis Voume 007, Artice ID 74191, 8 pages doi:10.1155/007/74191 Research Artice On the Lower Bound for the Number of Rea Roots of a Random Agebraic Equation Takashi
More informationFRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)
1 FRST 531 -- Mutivariate Statistics Mutivariate Discriminant Anaysis (MDA) Purpose: 1. To predict which group (Y) an observation beongs to based on the characteristics of p predictor (X) variabes, using
More informationC. Fourier Sine Series Overview
12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Voume 9, 23 http://acousticasociety.org/ ICA 23 Montrea Montrea, Canada 2-7 June 23 Architectura Acoustics Session 4pAAa: Room Acoustics Computer Simuation II 4pAAa9.
More informationGeneral Decay of Solutions in a Viscoelastic Equation with Nonlinear Localized Damping
Journa of Mathematica Research with Appications Jan.,, Vo. 3, No., pp. 53 6 DOI:.377/j.issn:95-65...7 Http://jmre.dut.edu.cn Genera Decay of Soutions in a Viscoeastic Equation with Noninear Locaized Damping
More informationFormulas for Angular-Momentum Barrier Factors Version II
BNL PREPRINT BNL-QGS-06-101 brfactor1.tex Formuas for Anguar-Momentum Barrier Factors Version II S. U. Chung Physics Department, Brookhaven Nationa Laboratory, Upton, NY 11973 March 19, 2015 abstract A
More informationVALIDATED CONTINUATION FOR EQUILIBRIA OF PDES
VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized
More informationTwo view learning: SVM-2K, Theory and Practice
Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk
More information