Privacy and Fault-Tolerance in Distributed Optimization. Nitin Vaidya University of Illinois at Urbana-Champaign

1 Privacy and Fault-Tolerance in Distributed Optimization Nitin Vaidya University of Illinois at Urbana-Champaign

2 Acknowledgements: Shripad Gade, Lili Su

3 $\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$

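4 Applications
- $f_i(x)$ = cost for robot $i$ to go to location $x$
- Minimize total cost of rendezvous: $\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$
(Figure: robots at $x_1$ and $x_2$ with cost functions $f_1(x)$ and $f_2(x)$, and a rendezvous location $x$.)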

5 Applications: Learning
- Minimize cost $\sum_i f_i(x)$
(Figure: local cost functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $f_4(x)$.)

6 Outline
$\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$
- Distributed Optimization
- Privacy
- Fault-tolerance
(Figure: a network of five agents holding local functions $f_1, \dots, f_5$, shown once per topic.)

7 Distributed Optimization
(Figure: a server with clients $f_1, f_2, f_3$, and a peer-to-peer network of agents $f_1, \dots, f_5$.)

8 Client-Server Architecture
(Figure: a server connected to clients with local cost functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $f_4(x)$.)

11 Client-Server Architecture
- Server maintains estimate $x_k$
- Client $i$ knows $f_i(x)$
In iteration $k+1$:
- Client $i$
  - downloads $x_k$ from the server
  - uploads gradient $\nabla f_i(x_k)$
- Server updates $x_{k+1} \leftarrow x_k - \alpha_k \sum_i \nabla f_i(x_k)$
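The client-server iteration can be sketched in a few lines of Python. This is only an illustration, not the talk's implementation: the scalar quadratic costs, the step-size schedule `alpha0 / sqrt(k + 1)`, and the names `client_server_gd`, `client_grads`, and `centers` are all assumptions.

```python
def client_server_gd(client_grads, x0, steps=200, alpha0=0.1):
    """Server loop: x_{k+1} = x_k - alpha_k * sum_i grad f_i(x_k)."""
    x = x0
    for k in range(steps):
        alpha_k = alpha0 / (k + 1) ** 0.5        # diminishing step size (assumed schedule)
        uploaded = [g(x) for g in client_grads]  # each client downloads x_k and uploads its gradient
        x = x - alpha_k * sum(uploaded)          # server aggregates and updates its estimate
    return x


# Hypothetical scalar costs f_i(x) = (x - c_i)^2, so grad f_i(x) = 2 * (x - c_i).
centers = [1.0, 2.0, 6.0]
client_grads = [lambda x, c=c: 2.0 * (x - c) for c in centers]
x_final = client_server_gd(client_grads, x0=0.0)
print(x_final)  # approaches the minimizer of the summed cost, the mean of the centers
```

Note that the server sees every client's raw gradient in this loop, which is the privacy concern raised later in the talk.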

12 Variations
- Stochastic
- Asynchronous

13 Peer-to-Peer Architecture
(Figure: a peer-to-peer network of agents, with local cost functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $f_4(x)$ shown.)

14 Peer-to-Peer Architecture
- Each agent maintains a local estimate $x$
- Consensus step with neighbors
- Apply own gradient to own estimate: $x_{k+1} \leftarrow x_k - \alpha_k \nabla f_i(x_k)$
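A minimal sketch of the two peer-to-peer steps (consensus, then own gradient) follows. The ring topology, the doubly stochastic weight matrix `W`, the quadratic costs, the `alpha0 / (k + 1)` step size, and the choice to evaluate the gradient at the averaged estimate are all assumptions made for illustration.

```python
def p2p_gradient_descent(weights, grads, x0, steps=5000, alpha0=0.4):
    """Each agent averages with its neighbors, then applies its own gradient."""
    n = len(x0)
    x = list(x0)
    for k in range(steps):
        alpha_k = alpha0 / (k + 1)  # diminishing step size (assumed schedule)
        # Consensus step: weighted average of the neighbors' estimates.
        mixed = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Gradient step: agent i uses only its own cost function f_i.
        x = [mixed[i] - alpha_k * grads[i](mixed[i]) for i in range(n)]
    return x


# Hypothetical ring of 4 agents; each row and column of W sums to 1.
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
centers = [1.0, 2.0, 6.0, 3.0]  # f_i(x) = (x - c_i)^2
grads = [lambda x, c=c: 2.0 * (x - c) for c in centers]
estimates = p2p_gradient_descent(W, grads, x0=[0.0] * 4)
print(estimates)  # all four local estimates end up near the mean of the centers
```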

15 Outline
$\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$
- Distributed Optimization
- Privacy
- Fault-tolerance

18 (Figure: clients $f_1, f_2, f_3$ upload gradients $\nabla f_i(x_k)$ to the server.)
- Server observes gradients ⇒ privacy compromised
- Achieve privacy and yet collaboratively optimize

19 Related Work
- Cryptographic methods (homomorphic encryption)
- Function transformation
- Differential privacy

21 Differential Privacy
(Figure: each client uploads a noisy gradient $\nabla f_i(x_k) + \varepsilon_k$ to the server.)
- Trade off privacy against accuracy

22 Proposed Approach
- Motivated by secret sharing
- Exploit diversity: multiple servers / neighbors

23 Proposed Approach
(Figure: clients $f_1, f_2, f_3$ connected to Server 1 and Server 2.)
- Privacy if a subset of servers is adversarial

24 Proposed Approach
(Figure: a peer-to-peer network of agents $f_1, \dots, f_5$.)
- Privacy if a subset of neighbors is adversarial

25 Proposed Approach
- Structured noise that cancels over servers/neighbors

28 Intuition
(Figure: Server 1 holds estimate $x_1$ and Server 2 holds estimate $x_2$; clients $f_1, f_2, f_3$ are split into $f_{11}, f_{12}, f_{21}, f_{22}, f_{31}, f_{32}$.)
- Each client simulates multiple clients
- $f_{11}(x) + f_{12}(x) = f_1(x)$
- $f_{ij}(x)$ not necessarily convex
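To make the splitting idea concrete, here is a small sketch in which one client's convex cost is split into two parts that sum back to the original, with one part deliberately non-convex. The helper `split_cost`, the masking function `mask`, and the quadratic `f1` are hypothetical choices; the slides do not specify how the split is generated.

```python
def split_cost(f, mask):
    """Split f into two parts whose sum is f; `mask` is any function of x."""
    part_a = lambda x: f(x) + mask(x)
    part_b = lambda x: -mask(x)
    return part_a, part_b


f1 = lambda x: (x - 1.0) ** 2   # convex cost of client 1 (hypothetical)
mask = lambda x: 3.0 * x ** 2   # arbitrary masking function (hypothetical)
f11, f12 = split_cost(f1, mask)

for x in (-2.0, 0.0, 1.5):
    assert abs(f11(x) + f12(x) - f1(x)) < 1e-12   # the parts add back to f_1

# f12(x) = -3x^2 is concave: its midpoint value exceeds the average of the endpoints.
assert f12(0.0) > 0.5 * (f12(-1.0) + f12(1.0))
```

A server that sees only `f11` or only `f12` sees a function that differs from `f1` by the unknown mask.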

30 Algorithm
- Each server maintains an estimate
In each iteration:
- Client $i$
  - downloads estimates from the corresponding server
  - uploads gradient of $f_i$
- Each server updates its estimate using the received gradients
- Servers periodically exchange estimates to perform a consensus step

31 Claim
- Under suitable assumptions, servers eventually reach consensus in $\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$

33 Privacy
(Figure: Server 1 receives $f_{11} + f_{21} + f_{31}$; Server 2 receives $f_{12} + f_{22} + f_{32}$.)
- Server 1 may learn $f_{11}$, $f_{21}$, $f_{31}$, and $f_{12} + f_{22} + f_{32}$
- Not sufficient to learn $f_i$

34 $f_{11}(x) + f_{12}(x) = f_1(x)$
- Function splitting not necessarily practical
- Structured randomization as an alternative

35 Structured Randomization
- Multiplicative or additive noise in gradients
- Noise cancels over servers

40 Multiplicative Noise
(Figure: Server 1 holds estimate $x_1$ and Server 2 holds estimate $x_2$; client 1 uploads $\alpha \nabla f_1(x_1)$ to Server 1 and $\beta \nabla f_1(x_2)$ to Server 2.)
- $\alpha + \beta = 1$
- Suffices for this invariant to hold over a larger number of iterations
- Noise from client $i$ to server $j$ is not zero-mean
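The following sketch illustrates why the invariant $\alpha + \beta = 1$ lets the noise cancel. It makes several simplifying assumptions that are not from the slides: scalar quadratic costs, weights drawn uniformly from $[-1, 2]$, a `0.2 / sqrt(k + 1)` step size, and a consensus (averaging) step between the two servers after every iteration rather than periodically.

```python
import random


def two_server_multiplicative(centers, steps=300, seed=0):
    """Clients with f_i(x) = (x - c_i)^2 upload randomly weighted gradients to two servers."""
    rng = random.Random(seed)
    x1 = x2 = 0.0                              # estimates held by Server 1 and Server 2
    for k in range(steps):
        step = 0.2 / (k + 1) ** 0.5            # diminishing step size (assumed schedule)
        g1 = g2 = 0.0
        for c in centers:
            alpha = rng.uniform(-1.0, 2.0)     # random multiplicative weight (assumed distribution)
            beta = 1.0 - alpha                 # invariant: alpha + beta = 1
            g1 += alpha * 2.0 * (x1 - c)       # what Server 1 receives from this client
            g2 += beta * 2.0 * (x2 - c)        # what Server 2 receives from this client
        x1 -= step * g1
        x2 -= step * g2
        x1 = x2 = 0.5 * (x1 + x2)              # consensus step (here: after every iteration)
    return x1, x2


x1, x2 = two_server_multiplicative([1.0, 2.0, 6.0])
print(x1, x2)  # both servers end near the mean of the centers
```

Because both servers hold the same estimate at the start of each iteration in this simplified version, the averaged update equals a plain gradient step on the summed cost, even though neither server ever sees an unscaled gradient.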

41 Claim
- Under suitable assumptions, servers eventually reach consensus in $\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$

42 Peer-to-Peer Architecture
(Figure: a peer-to-peer network of agents $f_1, \dots, f_5$.)

43 Reminder
- Each agent maintains a local estimate $x$
- Consensus step with neighbors
- Apply own gradient to own estimate: $x_{k+1} \leftarrow x_k - \alpha_k \nabla f_i(x_k)$

45 Proposed Approach
- Each agent shares a noisy estimate with neighbors
  - Scheme 1: noise cancels over neighbors
  - Scheme 2: noise cancels network-wide
(Figure: an agent sends $x + \varepsilon_1$ to one neighbor and $x + \varepsilon_2$ to another, with $\varepsilon_1 + \varepsilon_2 = 0$ over iterations.)

46 Peer-to-Peer Architecture
- Poster today: Shripad Gade

47 Outline
$\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$
- Distributed Optimization
- Privacy
- Fault-tolerance

48 Fault-Tolerance
- Some agents may be faulty
- Need to produce correct output despite the faults

49 Byzantine Fault Model
- No constraint on misbehavior of a faulty agent
- May send bogus messages
- Faulty agents can collude

50 Peer-to-Peer Architecture
- $f_i(x)$ = cost for robot $i$ to go to location $x$
- Faulty agent may choose an arbitrary cost function
(Figure: robots at $x_1$ and $x_2$ with cost functions $f_1(x)$ and $f_2(x)$.)

51 Peer-to-Peer Architecture
(Figure: a peer-to-peer network of agents $f_1, \dots, f_5$.)

52 Client-Server Architecture
(Figure: clients $f_1, f_2, f_3$ upload gradients $\nabla f_i(x_k)$ to the server.)

55 Fault-Tolerant Optimization
- The original problem is not meaningful: $\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$
- Optimize cost over only non-faulty agents: $\arg\min_{x \in X} \sum_{i\ \text{good}} f_i(x)$
  - Impossible!

58 Fault-Tolerant Optimization
- Optimize weighted cost over only non-faulty agents: $\arg\min_{x \in X} \sum_{i\ \text{good}} \alpha_i f_i(x)$
- With $\alpha_i$ as close to $1/|\text{good}|$ as possible
- With $t$ Byzantine faulty agents: $t$ weights may be 0
- With $t$ Byzantine agents and $n$ total agents: at least $n-2t$ weights guaranteed to be $> \frac{1}{2(n-t)}$
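As a quick sanity check of the weight guarantee, consider illustrative values $n = 10$ and $t = 2$ (not from the slides), reading the bound as $1/(2(n-t))$ and assuming exactly $t$ agents are faulty so that $|\text{good}| = n - t$:

```latex
n - 2t = 10 - 4 = 6, \qquad
\frac{1}{2(n-t)} = \frac{1}{16}, \qquad
\frac{1}{|\text{good}|} = \frac{1}{n-t} = \frac{1}{8}.
```

Under these assumptions, at least 6 of the 8 non-faulty agents receive a weight above $1/16$, which is half of the ideal uniform weight $1/8$.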

59 Centralized Algorithm
- Of the $n$ agents, any $t$ may be faulty
- How to filter cost functions of faulty agents?

65 Centralized Algorithm: Scalar argument $x$
Define a virtual function $G(x)$ whose gradient is obtained as follows. At a given $x$:
- Sort the gradients of the $n$ local cost functions
- Discard the smallest $t$ and largest $t$ gradients
- Mean of remaining gradients = gradient of $G$ at $x$
Virtual function $G(x)$ is convex → can optimize easily
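The sort, discard, and average steps translate directly into a short routine. The sketch below pairs it with plain gradient descent; the quadratic honest costs, the constant bogus gradient reported by the faulty agent, the step-size schedule, and the function names are assumptions for illustration only.

```python
def filtered_gradient(gradients, t):
    """Sort scalar gradients, drop the t smallest and t largest, average the rest."""
    n = len(gradients)
    if n <= 2 * t:
        raise ValueError("need n > 2t gradients")
    kept = sorted(gradients)[t:n - t]
    return sum(kept) / len(kept)


def robust_descent(grad_fns, t, x0=0.0, steps=200, alpha0=0.3):
    """Gradient descent on the virtual function G using the filtered gradient."""
    x = x0
    for k in range(steps):
        alpha_k = alpha0 / (k + 1) ** 0.5   # diminishing step size (assumed schedule)
        x -= alpha_k * filtered_gradient([g(x) for g in grad_fns], t)
    return x


# Hypothetical setup: four honest agents with f_i(x) = (x - c_i)^2 and one
# Byzantine agent that always reports a huge gradient.
honest = [lambda x, c=c: 2.0 * (x - c) for c in (1.0, 2.0, 3.0, 4.0)]
byzantine = lambda x: 1e6
x_robust = robust_descent(honest + [byzantine], t=1)
print(x_robust)  # stays inside the range spanned by the honest minimizers
```

In this run the filter discards the bogus gradient together with one honest gradient, so the result minimizes an equally weighted sum of three honest costs while the fourth honest agent gets weight 0, in line with the statement that up to $t$ weights may be 0.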

66 Peer-to-Peer Fault-Tolerant Optimization
- Gradient filtering similar to centralized algorithm
  - requires rich enough connectivity
  - correlation between functions helps
- Vector case harder
  - redundancy between functions helps

67 Summary
$\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$
- Distributed Optimization
- Privacy
- Fault-tolerance

68 Thanks! disc.ece.illinois.edu

73 Distributed Peer-to-Peer Optimization
- Each agent maintains a local estimate $x$
In each iteration:
- Compute weighted average with neighbors' estimates
- Apply own gradient to own estimate: $x_{k+1} \leftarrow x_k - \alpha_k \nabla f_i(x_k)$
- Local estimates converge to $\arg\min_{x \in X} \sum_{i=1}^{S} f_i(x)$

74 RSS: Locally Balanced Perturbations
- Add to zero (locally per node)
- Bounded ($\le \Delta$)
Algorithm
- Node $j$ selects $d_k^{j,i}$ such that $\sum_i d_k^{j,i} = 0$ and $|d_k^{j,i}| \le \Delta$
- Shares $x_k^j + d_k^{j,i}$ with node $i$
- Consensus and (stochastic) gradient descent
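One simple way to draw perturbations that add to zero locally and stay within the bound is to center random draws and rescale them. This construction, and the names `locally_balanced`, `x_j`, and `shared`, are assumptions for illustration; the slide does not say how the perturbations are sampled.

```python
import random


def locally_balanced(num_neighbors, delta, rng):
    """Draw one perturbation per neighbor with zero sum and magnitude at most delta."""
    raw = [rng.uniform(-1.0, 1.0) for _ in range(num_neighbors)]
    mean = sum(raw) / num_neighbors
    centered = [r - mean for r in raw]            # now the values sum to zero
    biggest = max(abs(c) for c in centered)
    scale = delta / biggest if biggest > 0 else 0.0
    return [c * scale for c in centered]          # rescaling keeps the sum at zero


rng = random.Random(1)
x_j = 2.0                                         # node j's current estimate (hypothetical)
d = locally_balanced(3, delta=0.5, rng=rng)       # one perturbation per neighbor
shared = [x_j + d_i for d_i in d]                 # the value sent to each neighbor
```

Each neighbor receives a different perturbed copy of the estimate, while the average of the copies sent by node $j$ still equals its true estimate.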

75 RSS: Network Balanced Perturbations
- Add to zero (over network)
- Bounded ($\le \Delta$)
Algorithm
- Node $j$ computes perturbation $d_k^j$
  - sends $s^{j,i}$ to node $i$
  - adds received $s^{i,j}$ and subtracts sent $s^{j,i}$: $d_k^j = \sum_i s^{i,j} - \sum_i s^{j,i}$
- Obfuscated state $x_k^j + d_k^j$ is shared with neighbors
- Consensus and (stochastic) gradient descent
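The received-minus-sent construction can be sketched as follows. Every number sent over an edge appears once with a plus sign (at the receiver) and once with a minus sign (at the sender), so the perturbations cancel over the network. The ring graph, the uniform sampling, and the per-message bound `delta / (2 * max_degree)` used to keep each perturbation within $\Delta$ are assumptions, not details from the slide.

```python
import random


def network_balanced(neighbors, delta, rng):
    """Return per-node perturbations d[j] that sum to zero over the whole network."""
    max_degree = max(len(nbrs) for nbrs in neighbors.values())
    bound = delta / (2 * max_degree)   # assumed way to keep |d[j]| <= delta
    # s[(j, i)] is the random number node j sends to its neighbor i.
    s = {(j, i): rng.uniform(-bound, bound) for j in neighbors for i in neighbors[j]}
    d = {}
    for j in neighbors:
        received = sum(s[(i, j)] for i in neighbors[j])
        sent = sum(s[(j, i)] for i in neighbors[j])
        d[j] = received - sent
    return d


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # hypothetical undirected graph
d = network_balanced(ring, delta=0.5, rng=random.Random(7))
states = {0: 1.0, 1: 2.0, 2: 6.0, 3: 3.0}             # hypothetical local estimates
shared = {j: states[j] + d[j] for j in ring}           # obfuscated states sent to neighbors
```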

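76 Convergence
Let $\hat{x}_T^j = \sum_{k=1}^{T} \alpha_k x_k^j / \sum_{k=1}^{T} \alpha_k$ and $\alpha_k = 1/\sqrt{k}$. Then
$f(\hat{x}_T^j) - f(x^*) \le O\left(\frac{\log T}{\sqrt{T}}\right) + O\left(\frac{\Delta^2 \log T}{\sqrt{T}}\right)$
- Asymptotic convergence of iterates to optimum
- Privacy-convergence trade-off
- Stochastic gradient updates work too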

77 Function Sharing
- Let $f_i(x)$ be bounded-degree polynomials
Algorithm
- Node $j$ shares $s^{j,i}(x)$ with node $i$
- Node $j$ obfuscates using $p_j(x) = \sum_i s^{i,j}(x) - \sum_i s^{j,i}(x)$
- Use $\hat{f}_j(x) = f_j(x) + p_j(x)$ and run distributed gradient descent
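A minimal sketch of function sharing with polynomials stored as coefficient lists is shown below. It assumes the obfuscating polynomial is the sum of received polynomials minus the sum of sent ones, that coefficients are drawn uniformly at random, and that every polynomial has the same number of coefficients; the graph and the quadratic costs are hypothetical.

```python
import random


def obfuscate_polynomials(coeffs, neighbors, rng, scale=10.0):
    """coeffs[j] lists the coefficients of f_j; returns the coefficients of f_j + p_j."""
    size = len(next(iter(coeffs.values())))
    # s[(j, i)] is the random polynomial node j shares with its neighbor i.
    s = {(j, i): [rng.uniform(-scale, scale) for _ in range(size)]
         for j in neighbors for i in neighbors[j]}
    hidden = {}
    for j in neighbors:
        p = [0.0] * size
        for i in neighbors[j]:
            # p_j = (sum of received polynomials) - (sum of sent polynomials)
            p = [pc + rc - sc for pc, rc, sc in zip(p, s[(i, j)], s[(j, i)])]
        hidden[j] = [fc + pc for fc, pc in zip(coeffs[j], p)]
    return hidden


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # hypothetical undirected graph
# f_j(x) = (x - c_j)^2 = c_j^2 - 2 c_j x + x^2, stored as [constant, linear, quadratic].
coeffs = {j: [c * c, -2.0 * c, 1.0] for j, c in enumerate([1.0, 2.0, 6.0, 3.0])}
hidden = obfuscate_polynomials(coeffs, ring, random.Random(3))
total_true = [sum(coeffs[j][m] for j in ring) for m in range(3)]
total_hidden = [sum(hidden[j][m] for j in ring) for m in range(3)]
```

Since `total_hidden` matches `total_true` up to rounding, running distributed gradient descent on the obfuscated functions targets the same overall objective.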

78 Function Sharing - Convergence
- Function Sharing iterates converge to the correct optimum ($\sum_j \hat{f}_j(x) = f(x)$)
- Privacy: if the vertex connectivity of the graph is $\ge f$, then no group of $f$ nodes can estimate the true functions $f_i$ (or any good subset)
- If $p_j(x)$ is similar to $f_j(x)$, then it can hide $f_j(x)$ well
