Privacy and Fault-Tolerance in Distributed Optimization
Nitin Vaidya, University of Illinois at Urbana-Champaign
Acknowledgements: Shripad Gade, Lili Su
argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Applications
- f_i(x) = cost for robot i to go to location x
- Minimize the total cost of rendezvous: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Applications
- Learning: minimize the cost Σ_i f_i(x)
Outline: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Distributed Optimization
Client-Server Architecture
[Figure: server connected to clients with local costs f_1(x), …, f_4(x)]
Client-Server Architecture
- Server maintains an estimate x_k
- Client i knows f_i(x)
In iteration k+1:
- Client i: downloads x_k from the server, then uploads its gradient ∇f_i(x_k)
- Server: x_{k+1} = x_k − α_k Σ_i ∇f_i(x_k)
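The iteration above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the quadratic costs f_i(x) = ||x − c_i||² and the centers are hypothetical choices made so the minimizer is easy to verify.

```python
import numpy as np

# Hypothetical quadratic costs f_i(x) = ||x - c_i||^2 for illustration.
centers = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])]

def grad_f(i, x):
    """Gradient of f_i(x) = ||x - c_i||^2, i.e. 2 (x - c_i)."""
    return 2.0 * (x - centers[i])

x = np.zeros(2)            # server's estimate x_k
for k in range(1, 2001):
    alpha = 1.0 / k        # diminishing step size alpha_k
    # Each client downloads x_k and uploads its gradient at x_k.
    grads = [grad_f(i, x) for i in range(len(centers))]
    # Server update: x_{k+1} = x_k - alpha_k * sum_i grad f_i(x_k)
    x = x - alpha * np.sum(grads, axis=0)

# Minimizer of sum_i ||x - c_i||^2 is the mean of the centers.
print(x)
```

The server never needs the functions f_i themselves, only the uploaded gradients; that observation is what makes the privacy question in the next section relevant.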
Variations
- Stochastic
- Asynchronous
Peer-to-Peer Architecture
[Figure: network of agents with local costs f_1(x), …, f_5(x)]
Peer-to-Peer Architecture
- Each agent maintains a local estimate x
- Consensus step with neighbors
- Apply own gradient to own estimate: x_{k+1} = x_k − α_k ∇f_i(x_k)
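A sketch of the consensus-plus-gradient iteration, under assumed choices not in the slides: a 5-agent ring, scalar costs f_i(x) = (x − c_i)², and a doubly stochastic weight matrix W for the averaging step.

```python
import numpy as np

# Hypothetical ring of 5 agents with scalar costs f_i(x) = (x - c_i)^2.
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(c)

# Doubly stochastic weights over the ring (self + two neighbors).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                      # x[i] is agent i's local estimate
for k in range(1, 5001):
    alpha = 1.0 / k
    x = W @ x                        # consensus step with neighbors
    x = x - alpha * 2.0 * (x - c)    # each agent applies its own gradient

print(x)  # all local estimates end up near argmin sum_i (x - c_i)^2
```

Because W is doubly stochastic, the averaging step preserves the network-wide mean, and the diminishing step size drives all agents to the common minimizer (here, mean(c) = 3).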
Outline: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Privacy
- In each iteration, client i uploads its gradient ∇f_i(x_k) to the server
- The server observes the gradients → privacy is compromised
- Goal: achieve privacy and yet collaboratively optimize
Related Work
- Cryptographic methods (homomorphic encryption)
- Function transformation
- Differential privacy
Differential Privacy
- Client i uploads a noisy gradient ∇f_i(x_k) + ε_k
- Trade-off: privacy versus accuracy
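A minimal sketch of the idea on this slide: the client perturbs its gradient before uploading. The Gaussian noise and the `sigma` values are illustrative assumptions, not a calibrated differential-privacy mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(grad, sigma):
    """Client-side perturbation: upload the true gradient plus zero-mean
    Gaussian noise. Larger sigma -> more privacy, less accuracy
    (the trade-off noted on the slide)."""
    return grad + rng.normal(0.0, sigma, size=grad.shape)

# Example: true gradient of a hypothetical f_i at the current estimate.
g = np.array([2.0, -1.0])
print(noisy_gradient(g, sigma=0.1))   # close to g
print(noisy_gradient(g, sigma=10.0))  # heavily obscured
```

The noise is zero-mean, so averaged over many iterations the server's updates still track the true gradients, but any single upload reveals little.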
Proposed Approach
- Motivated by secret sharing
- Exploit diversity: multiple servers / neighbors
Proposed Approach
- Client-server: privacy preserved if only a subset of the servers is adversarial
- Peer-to-peer: privacy preserved if only a subset of the neighbors is adversarial
- Key idea: structured noise that cancels over servers/neighbors
Intuition
- Two servers maintain estimates x_1 and x_2
- Each client simulates multiple clients: client i splits its cost into f_{i1} and f_{i2}, with f_{i1}(x) + f_{i2}(x) = f_i(x)
- The shares f_{ij}(x) are not necessarily convex
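The splitting idea can be made concrete. The particular split below (a quadratic plus/minus a sine term) is a hypothetical example chosen to show that the shares reconstruct f_1 exactly while neither share is convex on its own.

```python
import numpy as np

# Hypothetical convex cost f_1(x) = x^2 split between two servers as
# f_11 + f_12 = f_1, where each share alone need not be convex.
def f1(x):
    return x**2

def f11(x):
    return x**2 / 2 + np.sin(5 * x)   # second derivative 1 - 25 sin(5x): not convex

def f12(x):
    return x**2 / 2 - np.sin(5 * x)   # second derivative 1 + 25 sin(5x): not convex

xs = np.linspace(-2, 2, 9)
# The shares always sum back to f_1 ...
print(np.allclose(f11(xs) + f12(xs), f1(xs)))
# ... but each server, seeing only its own shares, learns a function
# that reveals neither f_1 nor even its convexity.
```

This is why the claim later in the deck needs "suitable assumptions": each server runs gradient steps on a sum of non-convex shares, yet the servers' combined behavior optimizes the convex total.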
Algorithm
- Each server maintains an estimate
In each iteration:
- Client i: downloads the estimates from the corresponding servers, then uploads the gradient of its share to each server
- Each server updates its estimate using the received gradients
- Servers periodically exchange estimates to perform a consensus step
Claim
- Under suitable assumptions, the servers eventually reach consensus in argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Privacy
- Server 1 observes gradients of f_{11} + f_{21} + f_{31}; Server 2 observes gradients of f_{12} + f_{22} + f_{32}
- Server 1 may learn f_{11}, f_{21}, f_{31}, and the sum f_{12} + f_{22} + f_{32}
- Not sufficient to learn any f_i
f_{11}(x) + f_{12}(x) = f_1(x)
- Function splitting is not necessarily practical
- Structured randomization as an alternative
Structured Randomization
- Multiplicative or additive noise in the gradients
- The noise cancels over the servers
Multiplicative Noise
- Two servers maintain estimates x_1 and x_2
- Client 1 uploads α ∇f_1(x_1) to Server 1 and β ∇f_1(x_2) to Server 2, with α + β = 1
- It suffices for this invariant to hold over a larger number of iterations
- The noise from client i to server j is not zero-mean
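A sketch of the multiplicative scheme, under the simplifying assumption (for illustration) that both servers evaluate the client at the same point, so the two shares visibly sum to the true gradient. The sampling range for α is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_gradient(grad, rng):
    """Client i splits its gradient multiplicatively: server 1 receives
    alpha * grad, server 2 receives beta * grad, with alpha + beta = 1.
    Each share alone is a biased (non-zero-mean) version of the gradient,
    but the shares sum back to the true gradient across the servers."""
    alpha = rng.uniform(-2.0, 3.0)   # arbitrary split; need not lie in [0, 1]
    beta = 1.0 - alpha
    return alpha * grad, beta * grad

g = np.array([1.5, -0.5])            # true gradient of a hypothetical f_i
g1, g2 = split_gradient(g, rng)
print(np.allclose(g1 + g2, g))       # the "noise" cancels over the servers
```

Unlike differential privacy, neither share is an unbiased estimate of the gradient, which is exactly the point on the slide: the perturbation seen by any single server is not zero-mean.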
Claim
- Under suitable assumptions, the servers eventually reach consensus in argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Peer-to-Peer Architecture
Reminder
- Each agent maintains a local estimate x
- Consensus step with neighbors
- Apply own gradient to own estimate: x_{k+1} = x_k − α_k ∇f_i(x_k)
Proposed Approach
- Each agent shares a noisy estimate with its neighbors, e.g., x + ε_1 to one neighbor and x + ε_2 to another
- Scheme 1: the noise cancels over the neighbors, ε_1 + ε_2 = 0 (over iterations)
- Scheme 2: the noise cancels network-wide
Peer-to-Peer Architecture
- Poster today: Shripad Gade
Outline: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Fault-Tolerance
- Some agents may be faulty
- Need to produce the correct output despite the faults
Byzantine Fault Model
- No constraint on the misbehavior of a faulty agent
- May send bogus messages
- Faulty agents may collude
Peer-to-Peer Architecture
- f_i(x) = cost for robot i to go to location x
- A faulty agent may choose an arbitrary cost function
Peer-to-Peer Architecture
Client-Server Architecture
Fault-Tolerant Optimization
- The original problem argmin_{x ∈ X} Σ_{i=1}^S f_i(x) is not meaningful
- Optimize the cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} f_i(x)
- Impossible!
Fault-Tolerant Optimization
- Optimize a weighted cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} α_i f_i(x)
- With each α_i as close to 1/|good| as possible
- With t Byzantine faulty agents: t weights may be 0
- With n agents in total: at least n − 2t weights are guaranteed to be > 1/(2(n − t))
Centralized Algorithm
- Of the n agents, any t may be faulty
- How to filter out the cost functions of faulty agents?
Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions
- Discard the smallest t and the largest t gradients
- The mean of the remaining gradients = the gradient of G at x
The virtual function G(x) is convex → can optimize easily
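The three steps above can be sketched directly. The sample gradient values and outlier magnitudes are hypothetical, chosen only to show that t Byzantine values cannot move the trimmed mean outside the range of the honest gradients.

```python
import numpy as np

def trimmed_mean_gradient(gradients, t):
    """Gradient of the virtual function G at the current point (scalar case):
    sort the n reported gradients, discard the smallest t and the largest t,
    and average the rest."""
    g = np.sort(np.asarray(gradients, dtype=float))
    assert len(g) > 2 * t, "need n > 2t reported gradients"
    return g[t:len(g) - t].mean()

# 5 honest gradients plus 2 Byzantine outliers (t = 2).
honest = [1.0, 1.2, 0.8, 1.1, 0.9]
byzantine = [1e6, -1e6]
print(trimmed_mean_gradient(honest + byzantine, t=2))  # stays near 1.0
```

Even if the Byzantine agents report extreme values, every surviving gradient after trimming lies between two honest gradients, which is the key fact behind the convexity of G.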
Peer-to-Peer Fault-Tolerant Optimization
- Gradient filtering similar to the centralized algorithm; requires rich enough connectivity; correlation between the functions helps
- The vector case is harder; redundancy between the functions helps
Summary: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
- Distributed Optimization
- Privacy
- Fault-Tolerance
Thanks! disc.ece.illinois.edu
Distributed Peer-to-Peer Optimization
- Each agent maintains a local estimate x
In each iteration:
- Compute a weighted average with the neighbors' estimates
- Apply own gradient to own estimate: x_{k+1} = x_k − α_k ∇f_i(x_k)
- The local estimates converge to argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
RSS: Locally Balanced Perturbations
- Perturbations add to zero (locally, per node)
- Perturbations are bounded (≤ Δ)
Algorithm:
- Node j selects perturbations d_{j,i} such that Σ_i d_{j,i} = 0 and |d_{j,i}| ≤ Δ
- Node j shares w_{j,i} = x_j + d_{j,i} with node i
- Consensus and (stochastic) gradient descent
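The selection step can be sketched as below. The particular sampling scheme (uniform draws recentered to sum to zero) is an assumed construction; any choice satisfying the two constraints on the slide would do.

```python
import numpy as np

rng = np.random.default_rng(2)

def locally_balanced_perturbations(num_neighbors, delta, rng):
    """Node j draws one perturbation d_{j,i} per neighbor i such that
    they sum to zero and each satisfies |d_{j,i}| <= delta."""
    d = rng.uniform(-delta / 2, delta / 2, size=num_neighbors)
    d -= d.mean()                    # enforce the sum-to-zero constraint
    return d

delta = 1.0
x_j = 3.7                            # node j's current local estimate
d = locally_balanced_perturbations(4, delta, rng)
w = x_j + d                          # w[i] = x_j + d_{j,i}, sent to neighbor i
print(np.isclose(d.sum(), 0.0))      # perturbations cancel locally
print(np.all(np.abs(d) <= delta))    # bounded by Delta
```

Each neighbor sees a different perturbed copy of x_j, but a node averaging all of j's outgoing messages would recover x_j exactly, which is what keeps the consensus step unbiased.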
RSS: Network Balanced Perturbations
- Perturbations add to zero (over the network)
- Perturbations are bounded (≤ Δ)
Algorithm:
- Node j computes its perturbation d_j: it sends s_{j,i} to each neighbor i, adds the received s_{i,j}, and subtracts the sent s_{j,i}, so d_j = Σ received − Σ sent
- Node j shares the obfuscated state w_j = x_j + d_j with its neighbors
- Consensus and (stochastic) gradient descent
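A sketch of the network-balanced scheme on a small assumed graph: every pairwise offset s_{j,i} is subtracted at the sender and added at the receiver, so the perturbations cancel over the network even though no single node's perturbation is zero.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 4-node graph, given as directed neighbor pairs (j, i).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# Node j sends a random offset s_{j,i} to each neighbor i.
sent = {(j, i): rng.uniform(-1.0, 1.0) for (j, i) in edges}

# d_j = (sum of received offsets) - (sum of sent offsets).
d = np.zeros(n)
for (j, i), s in sent.items():
    d[j] -= s   # node j sent s_{j,i}
    d[i] += s   # node i received s_{j,i}

x = rng.uniform(0.0, 5.0, size=n)      # local estimates
w = x + d                              # obfuscated states shared with neighbors
# Every s_{j,i} is added once and subtracted once, so the perturbations
# cancel network-wide and the average state is preserved.
print(np.isclose(d.sum(), 0.0))
print(np.isclose(w.mean(), x.mean()))
```

Preserving the network-wide average is exactly what the consensus-plus-gradient iteration needs; individual d_j can be large as long as they cancel in aggregate.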
Convergence
Let x̄_T = Σ_{k=1}^T α_k x_k / Σ_{k=1}^T α_k with α_k = 1/√k. Then
f(x̄_T) − f(x*) ≤ O(log T / √T) + O(Δ² log T / √T)
- Asymptotic convergence of the iterates to the optimum
- Privacy-convergence trade-off
- Stochastic gradient updates work too
Function Sharing
- Let the f_i(x) be bounded-degree polynomials
Algorithm:
- Node j shares a polynomial s_{j,i}(x) with node i
- Node j obfuscates its cost using p_j(x) = Σ_i s_{i,j}(x) − Σ_i s_{j,i}(x)
- Node j uses f̂_j(x) = f_j(x) + p_j(x) and runs distributed gradient descent
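Since the costs are bounded-degree polynomials, the shares can be exchanged as coefficient vectors. The 3-node fully connected setup, the degree-2 costs, and the uniform share coefficients below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical: 3 nodes, each cost f_j a degree-2 polynomial stored as
# coefficients [c0, c1, c2], meaning c0 + c1*x + c2*x^2.
f = np.array([[1.0, -2.0, 1.0],
              [0.0,  4.0, 2.0],
              [3.0,  0.0, 1.0]])
n = 3

# Node j sends a random polynomial s_{j,i} to every other node i.
s = {(j, i): rng.uniform(-1.0, 1.0, size=3)
     for j in range(n) for i in range(n) if i != j}

# p_j = sum of received shares - sum of sent shares; f_hat_j = f_j + p_j.
f_hat = f.copy()
for (j, i), poly in s.items():
    f_hat[j] -= poly   # node j subtracts what it sent
    f_hat[i] += poly   # node i adds what it received

# The obfuscated costs differ from the originals node by node, but their
# sum is unchanged, so distributed gradient descent on the f_hat_j's
# optimizes the same global objective.
print(np.allclose(f_hat.sum(axis=0), f.sum(axis=0)))
```

Working with coefficients makes the cancellation exact: each shared polynomial is added to one node's cost and subtracted from another's, mirroring the network-balanced perturbation scheme at the level of functions rather than states.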
Function Sharing: Convergence
- The function-sharing iterates converge to the correct optimum (since Σ_j f̂_j(x) = f(x))
- Privacy: if the vertex connectivity of the graph is ≥ f, then no group of f nodes can estimate the true functions f_i (or any good subset of them)
- If p_j(x) is similar to f_j(x), then it can hide f_j(x) well