Privacy and Fault-Tolerance in Distributed Optimization
Nitin Vaidya, University of Illinois at Urbana-Champaign
Acknowledgements: Shripad Gade, Lili Su
argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Applications
- f_i(x) = cost for robot i to go to location x
- Minimize the total cost of rendezvous: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Applications
- Learning: minimize the cost Σ_i f_i(x)
Outline: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Distributed Optimization
Client-Server Architecture
[Figure: server connected to clients with local costs f_1(x), …, f_4(x)]
Client-Server Architecture
- Server maintains an estimate x_k
- Client i knows f_i(x)
In iteration k+1:
- Client i: downloads x_k from the server, then uploads its gradient ∇f_i(x_k)
- Server: x_{k+1} = x_k − α_k Σ_i ∇f_i(x_k)
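The iteration above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the quadratic costs f_i(x) = ||x − c_i||² and the centers are hypothetical choices made so the minimizer is easy to verify.

```python
import numpy as np

# Hypothetical quadratic costs f_i(x) = ||x - c_i||^2 for illustration.
centers = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])]

def grad_f(i, x):
    """Gradient of f_i(x) = ||x - c_i||^2, i.e. 2 (x - c_i)."""
    return 2.0 * (x - centers[i])

x = np.zeros(2)            # server's estimate x_k
for k in range(1, 2001):
    alpha = 1.0 / k        # diminishing step size alpha_k
    # Each client downloads x_k and uploads its gradient at x_k.
    grads = [grad_f(i, x) for i in range(len(centers))]
    # Server update: x_{k+1} = x_k - alpha_k * sum_i grad f_i(x_k)
    x = x - alpha * np.sum(grads, axis=0)

# Minimizer of sum_i ||x - c_i||^2 is the mean of the centers.
print(x)
```

The server never needs the functions f_i themselves, only the uploaded gradients; that observation is what makes the privacy question in the next section relevant.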
Variations
- Stochastic
- Asynchronous
Peer-to-Peer Architecture
[Figure: network of agents with local costs f_1(x), …, f_5(x)]
Peer-to-Peer Architecture
- Each agent maintains a local estimate x
- Consensus step with neighbors
- Apply own gradient to own estimate: x_{k+1} = x_k − α_k ∇f_i(x_k)
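A sketch of the consensus-plus-gradient iteration, under assumed choices not in the slides: a 5-agent ring, scalar costs f_i(x) = (x − c_i)², and a doubly stochastic weight matrix W for the averaging step.

```python
import numpy as np

# Hypothetical ring of 5 agents with scalar costs f_i(x) = (x - c_i)^2.
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(c)

# Doubly stochastic weights over the ring (self + two neighbors).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                      # x[i] is agent i's local estimate
for k in range(1, 5001):
    alpha = 1.0 / k
    x = W @ x                        # consensus step with neighbors
    x = x - alpha * 2.0 * (x - c)    # each agent applies its own gradient

print(x)  # all local estimates end up near argmin sum_i (x - c_i)^2
```

Because W is doubly stochastic, the averaging step preserves the network-wide mean, and the diminishing step size drives all agents to the common minimizer (here, mean(c) = 3).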
Outline: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Privacy
- In each iteration, client i uploads its gradient ∇f_i(x_k) to the server
- The server observes the gradients → privacy is compromised
- Goal: achieve privacy and yet collaboratively optimize
Related Work
- Cryptographic methods (homomorphic encryption)
- Function transformation
- Differential privacy
Differential Privacy
- Client i uploads a noisy gradient ∇f_i(x_k) + ε_k
- Trade-off: privacy versus accuracy
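A minimal sketch of the idea on this slide: the client perturbs its gradient before uploading. The Gaussian noise and the `sigma` values are illustrative assumptions, not a calibrated differential-privacy mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(grad, sigma):
    """Client-side perturbation: upload the true gradient plus zero-mean
    Gaussian noise. Larger sigma -> more privacy, less accuracy
    (the trade-off noted on the slide)."""
    return grad + rng.normal(0.0, sigma, size=grad.shape)

# Example: true gradient of a hypothetical f_i at the current estimate.
g = np.array([2.0, -1.0])
print(noisy_gradient(g, sigma=0.1))   # close to g
print(noisy_gradient(g, sigma=10.0))  # heavily obscured
```

The noise is zero-mean, so averaged over many iterations the server's updates still track the true gradients, but any single upload reveals little.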
Proposed Approach
- Motivated by secret sharing
- Exploit diversity: multiple servers / neighbors
Proposed Approach
- Client-server: privacy preserved if only a subset of the servers is adversarial
- Peer-to-peer: privacy preserved if only a subset of the neighbors is adversarial
- Key idea: structured noise that cancels over servers/neighbors
Intuition
- Two servers maintain estimates x_1 and x_2
- Each client simulates multiple clients: client i splits its cost into f_{i1} and f_{i2}, with f_{i1}(x) + f_{i2}(x) = f_i(x)
- The shares f_{ij}(x) are not necessarily convex
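The splitting idea can be made concrete. The particular split below (a quadratic plus/minus a sine term) is a hypothetical example chosen to show that the shares reconstruct f_1 exactly while neither share is convex on its own.

```python
import numpy as np

# Hypothetical convex cost f_1(x) = x^2 split between two servers as
# f_11 + f_12 = f_1, where each share alone need not be convex.
def f1(x):
    return x**2

def f11(x):
    return x**2 / 2 + np.sin(5 * x)   # second derivative 1 - 25 sin(5x): not convex

def f12(x):
    return x**2 / 2 - np.sin(5 * x)   # second derivative 1 + 25 sin(5x): not convex

xs = np.linspace(-2, 2, 9)
# The shares always sum back to f_1 ...
print(np.allclose(f11(xs) + f12(xs), f1(xs)))
# ... but each server, seeing only its own shares, learns a function
# that reveals neither f_1 nor even its convexity.
```

This is why the claim later in the deck needs "suitable assumptions": each server runs gradient steps on a sum of non-convex shares, yet the servers' combined behavior optimizes the convex total.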
Algorithm
- Each server maintains an estimate
In each iteration:
- Client i: downloads the estimates from the corresponding servers, then uploads the gradient of its share to each server
- Each server updates its estimate using the received gradients
- Servers periodically exchange estimates to perform a consensus step
Claim
- Under suitable assumptions, the servers eventually reach consensus in argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Privacy
- Server 1 observes gradients of f_{11} + f_{21} + f_{31}; Server 2 observes gradients of f_{12} + f_{22} + f_{32}
- Server 1 may learn f_{11}, f_{21}, f_{31}, and the sum f_{12} + f_{22} + f_{32}
- Not sufficient to learn any f_i
f_{11}(x) + f_{12}(x) = f_1(x)
- Function splitting is not necessarily practical
- Structured randomization as an alternative
Structured Randomization
- Multiplicative or additive noise in the gradients
- The noise cancels over the servers
Multiplicative Noise
- Two servers maintain estimates x_1 and x_2
- Client 1 uploads α ∇f_1(x_1) to Server 1 and β ∇f_1(x_2) to Server 2, with α + β = 1
- It suffices for this invariant to hold over a larger number of iterations
- The noise from client i to server j is not zero-mean
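A sketch of the multiplicative scheme, under the simplifying assumption (for illustration) that both servers evaluate the client at the same point, so the two shares visibly sum to the true gradient. The sampling range for α is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_gradient(grad, rng):
    """Client i splits its gradient multiplicatively: server 1 receives
    alpha * grad, server 2 receives beta * grad, with alpha + beta = 1.
    Each share alone is a biased (non-zero-mean) version of the gradient,
    but the shares sum back to the true gradient across the servers."""
    alpha = rng.uniform(-2.0, 3.0)   # arbitrary split; need not lie in [0, 1]
    beta = 1.0 - alpha
    return alpha * grad, beta * grad

g = np.array([1.5, -0.5])            # true gradient of a hypothetical f_i
g1, g2 = split_gradient(g, rng)
print(np.allclose(g1 + g2, g))       # the "noise" cancels over the servers
```

Unlike differential privacy, neither share is an unbiased estimate of the gradient, which is exactly the point on the slide: the perturbation seen by any single server is not zero-mean.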
Claim
- Under suitable assumptions, the servers eventually reach consensus in argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Peer-to-Peer Architecture
Reminder
- Each agent maintains a local estimate x
- Consensus step with neighbors
- Apply own gradient to own estimate: x_{k+1} = x_k − α_k ∇f_i(x_k)
Proposed Approach
- Each agent shares a noisy estimate with its neighbors, e.g., x + ε_1 to one neighbor and x + ε_2 to another
- Scheme 1: the noise cancels over the neighbors, ε_1 + ε_2 = 0 (over iterations)
- Scheme 2: the noise cancels network-wide
Peer-to-Peer Architecture
- Poster today: Shripad Gade
Outline: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Fault-Tolerance
- Some agents may be faulty
- Need to produce the correct output despite the faults
Byzantine Fault Model
- No constraint on the misbehavior of a faulty agent
- May send bogus messages
- Faulty agents may collude
Peer-to-Peer Architecture
- f_i(x) = cost for robot i to go to location x
- A faulty agent may choose an arbitrary cost function
Peer-to-Peer Architecture
Client-Server Architecture
Fault-Tolerant Optimization
- The original problem argmin_{x ∈ X} Σ_{i=1}^S f_i(x) is not meaningful
- Optimize the cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} f_i(x)
- Impossible!
Fault-Tolerant Optimization
- Optimize a weighted cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} α_i f_i(x)
- With each α_i as close to 1/|good| as possible
- With t Byzantine faulty agents: t weights may be 0
- With n agents in total: at least n − 2t weights are guaranteed to be > 1/(2(n − t))
Centralized Algorithm
- Of the n agents, any t may be faulty
- How to filter out the cost functions of faulty agents?
Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions
- Discard the smallest t and the largest t gradients
- The mean of the remaining gradients = the gradient of G at x
The virtual function G(x) is convex → can optimize easily
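The three steps above can be sketched directly. The sample gradient values and outlier magnitudes are hypothetical, chosen only to show that t Byzantine values cannot move the trimmed mean outside the range of the honest gradients.

```python
import numpy as np

def trimmed_mean_gradient(gradients, t):
    """Gradient of the virtual function G at the current point (scalar case):
    sort the n reported gradients, discard the smallest t and the largest t,
    and average the rest."""
    g = np.sort(np.asarray(gradients, dtype=float))
    assert len(g) > 2 * t, "need n > 2t reported gradients"
    return g[t:len(g) - t].mean()

# 5 honest gradients plus 2 Byzantine outliers (t = 2).
honest = [1.0, 1.2, 0.8, 1.1, 0.9]
byzantine = [1e6, -1e6]
print(trimmed_mean_gradient(honest + byzantine, t=2))  # stays near 1.0
```

Even if the Byzantine agents report extreme values, every surviving gradient after trimming lies between two honest gradients, which is the key fact behind the convexity of G.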
Peer-to-Peer Fault-Tolerant Optimization
- Gradient filtering similar to the centralized algorithm; requires rich enough connectivity; correlation between the functions helps
- The vector case is harder; redundancy between the functions helps
Summary: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Thanks! disc.ece.illinois.edu
Distributed Peer-to-Peer Optimization
- Each agent maintains a local estimate x
In each iteration:
- Compute a weighted average with the neighbors' estimates
- Apply own gradient to own estimate: x_{k+1} = x_k − α_k ∇f_i(x_k)
- The local estimates converge to argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
RSS: Locally Balanced Perturbations
- Perturbations add to zero (locally, per node)
- Perturbations are bounded (≤ Δ)
Algorithm:
- Node j selects perturbations d_{j,i} such that Σ_i d_{j,i} = 0 and |d_{j,i}| ≤ Δ
- Node j shares w_{j,i} = x_j + d_{j,i} with node i
- Consensus and (stochastic) gradient descent
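The selection step can be sketched as below. The particular sampling scheme (uniform draws recentered to sum to zero) is an assumed construction; any choice satisfying the two constraints on the slide would do.

```python
import numpy as np

rng = np.random.default_rng(2)

def locally_balanced_perturbations(num_neighbors, delta, rng):
    """Node j draws one perturbation d_{j,i} per neighbor i such that
    they sum to zero and each satisfies |d_{j,i}| <= delta."""
    d = rng.uniform(-delta / 2, delta / 2, size=num_neighbors)
    d -= d.mean()                    # enforce the sum-to-zero constraint
    return d

delta = 1.0
x_j = 3.7                            # node j's current local estimate
d = locally_balanced_perturbations(4, delta, rng)
w = x_j + d                          # w[i] = x_j + d_{j,i}, sent to neighbor i
print(np.isclose(d.sum(), 0.0))      # perturbations cancel locally
print(np.all(np.abs(d) <= delta))    # bounded by Delta
```

Each neighbor sees a different perturbed copy of x_j, but a node averaging all of j's outgoing messages would recover x_j exactly, which is what keeps the consensus step unbiased.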
RSS: Network Balanced Perturbations
- Perturbations add to zero (over the network)
- Perturbations are bounded (≤ Δ)
Algorithm:
- Node j computes its perturbation d_j: it sends s_{j,i} to each neighbor i, adds the received s_{i,j}, and subtracts the sent s_{j,i}, so d_j = Σ received − Σ sent
- Node j shares the obfuscated state w_j = x_j + d_j with its neighbors
- Consensus and (stochastic) gradient descent
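A sketch of the network-balanced scheme on a small assumed graph: every pairwise offset s_{j,i} is subtracted at the sender and added at the receiver, so the perturbations cancel over the network even though no single node's perturbation is zero.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 4-node graph, given as directed neighbor pairs (j, i).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# Node j sends a random offset s_{j,i} to each neighbor i.
sent = {(j, i): rng.uniform(-1.0, 1.0) for (j, i) in edges}

# d_j = (sum of received offsets) - (sum of sent offsets).
d = np.zeros(n)
for (j, i), s in sent.items():
    d[j] -= s   # node j sent s_{j,i}
    d[i] += s   # node i received s_{j,i}

x = rng.uniform(0.0, 5.0, size=n)      # local estimates
w = x + d                              # obfuscated states shared with neighbors
# Every s_{j,i} is added once and subtracted once, so the perturbations
# cancel network-wide and the average state is preserved.
print(np.isclose(d.sum(), 0.0))
print(np.isclose(w.mean(), x.mean()))
```

Preserving the network-wide average is exactly what the consensus-plus-gradient iteration needs; individual d_j can be large as long as they cancel in aggregate.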
Convergence
Let x̄_T = Σ_{k=1}^T α_k x_k / Σ_{k=1}^T α_k with α_k = 1/√k. Then
f(x̄_T) − f(x*) ≤ O(log T / √T) + O(Δ² log T / √T)
- Asymptotic convergence of the iterates to the optimum
- Privacy-convergence trade-off
- Stochastic gradient updates work too
Function Sharing
- Let the f_i(x) be bounded-degree polynomials
Algorithm:
- Node j shares a polynomial s_{j,i}(x) with node i
- Node j obfuscates its cost using p_j(x) = Σ_i s_{i,j}(x) − Σ_i s_{j,i}(x)
- Node j uses f̂_j(x) = f_j(x) + p_j(x) and runs distributed gradient descent
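Since the costs are bounded-degree polynomials, the shares can be exchanged as coefficient vectors. The 3-node fully connected setup, the degree-2 costs, and the uniform share coefficients below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical: 3 nodes, each cost f_j a degree-2 polynomial stored as
# coefficients [c0, c1, c2], meaning c0 + c1*x + c2*x^2.
f = np.array([[1.0, -2.0, 1.0],
              [0.0,  4.0, 2.0],
              [3.0,  0.0, 1.0]])
n = 3

# Node j sends a random polynomial s_{j,i} to every other node i.
s = {(j, i): rng.uniform(-1.0, 1.0, size=3)
     for j in range(n) for i in range(n) if i != j}

# p_j = sum of received shares - sum of sent shares; f_hat_j = f_j + p_j.
f_hat = f.copy()
for (j, i), poly in s.items():
    f_hat[j] -= poly   # node j subtracts what it sent
    f_hat[i] += poly   # node i adds what it received

# The obfuscated costs differ from the originals node by node, but their
# sum is unchanged, so distributed gradient descent on the f_hat_j's
# optimizes the same global objective.
print(np.allclose(f_hat.sum(axis=0), f.sum(axis=0)))
```

Working with coefficients makes the cancellation exact: each shared polynomial is added to one node's cost and subtracted from another's, mirroring the network-balanced perturbation scheme at the level of functions rather than states.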
Function Sharing: Convergence
- The function-sharing iterates converge to the correct optimum (since Σ_j f̂_j(x) = f(x))
- Privacy: if the vertex connectivity of the graph is ≥ f, then no group of f nodes can estimate the true functions f_i (or any good subset of them)
- If p_j(x) is similar to f_j(x), then it can hide f_j(x) well