Privacy and Fault-Tolerance in Distributed Optimization. Nitin Vaidya, University of Illinois at Urbana-Champaign


Acknowledgements: Shripad Gade, Lili Su

argmin_{x ∈ X} Σ_{i=1}^S f_i(x)

Applications
- f_i(x) = cost for robot i to go to location x
- Minimize the total cost of rendezvous: argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
[Figure: robots at locations x_1, x_2 with costs f_1(x), f_2(x)]

Applications
- Learning: minimize the cost Σ_i f_i(x)
[Figure: agents with local costs f_1(x), f_2(x), f_3(x), f_4(x)]

Outline
argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Distributed Optimization, Privacy, Fault-tolerance

Distributed Optimization
[Figure: a client-server architecture (clients f_1, f_2, f_3 with a central server) and a peer-to-peer architecture (networked agents f_1, ..., f_5)]

Client-Server Architecture
[Figure: a server connected to clients with local costs f_1(x), f_2(x), f_3(x), f_4(x)]

Client-Server Architecture
- Server maintains an estimate x_k
- Client i knows f_i(x)

Client-Server Architecture
- Server maintains an estimate x_k
- Client i knows f_i(x)
In iteration k+1:
- Client i: downloads x_k from the server and uploads its gradient ∇f_i(x_k)

Client-Server Architecture
- Server maintains an estimate x_k
- Client i knows f_i(x)
In iteration k+1:
- Client i: downloads x_k from the server and uploads its gradient ∇f_i(x_k)
- Server: x_{k+1} ← x_k − α_k Σ_i ∇f_i(x_k)
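
To make the iteration concrete, here is a minimal sketch of this client-server loop, with made-up quadratic client costs and an illustrative step size (neither is from the talk):

```python
import numpy as np

# Hypothetical quadratic client costs: f_i(x) = ||x - c_i||^2, so grad f_i(x) = 2(x - c_i).
centers = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0])]

def grad_f(i, x):
    return 2.0 * (x - centers[i])

x = np.zeros(2)                          # server's estimate x_k
for k in range(1, 201):
    alpha = 0.1 / np.sqrt(k)             # diminishing step size (illustrative choice)
    # Each client i downloads x_k and uploads its gradient grad f_i(x_k).
    grads = [grad_f(i, x) for i in range(len(centers))]
    # Server update: x_{k+1} = x_k - alpha_k * sum_i grad f_i(x_k)
    x = x - alpha * np.sum(grads, axis=0)

print(x)  # approaches the minimizer of sum_i f_i(x), i.e., the mean of the centers
```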

Variations
- Stochastic
- Asynchronous
- ...

Peer-to-Peer Architecture
[Figure: networked agents with local costs f_1(x), f_2(x), f_3(x), f_4(x)]

Peer-to-Peer Architecture
- Each agent maintains a local estimate x
- Consensus step with neighbors
- Apply own gradient to own estimate: x_{k+1} ← x_k − α_k ∇f_i(x_k)
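
A rough sketch of these peer-to-peer steps, under illustrative assumptions (ring topology, doubly stochastic averaging weights, scalar quadratic costs):

```python
import numpy as np

n = 5
centers = np.arange(n, dtype=float)          # agent i's cost: f_i(x) = (x - i)^2, grad = 2(x - i)

# Ring topology with doubly stochastic weights: 1/2 on self, 1/4 on each neighbor.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                              # x[i] = agent i's local estimate
for k in range(1, 501):
    alpha = 0.5 / np.sqrt(k)
    x = W @ x                                # consensus step with neighbors
    x = x - alpha * 2.0 * (x - centers)      # apply own gradient to own estimate

print(x)  # all entries approach argmin_x sum_i (x - i)^2, i.e., mean(centers) = 2.0
```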

Outline
argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Distributed Optimization, Privacy, Fault-tolerance

Server
[Figure: clients f_1, f_2, f_3 upload gradients ∇f_i(x_k) to the server]

Server
[Figure: clients f_1, f_2, f_3 upload gradients ∇f_i(x_k) to the server]
The server observes the gradients → privacy compromised

Server
[Figure: clients f_1, f_2, f_3 upload gradients ∇f_i(x_k) to the server]
The server observes the gradients → privacy compromised
Goal: achieve privacy and yet collaboratively optimize

Related Work
- Cryptographic methods (homomorphic encryption)
- Function transformation
- Differential privacy

Differential Privacy
Client i uploads ∇f_i(x_k) + ε_k
[Figure: clients f_1, f_2, f_3 upload noisy gradients to the server]

Differential Privacy
Client i uploads ∇f_i(x_k) + ε_k
Trade-off: privacy versus accuracy
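
A hedged sketch of the differentially-private variant: each client perturbs the gradient it uploads with Laplace noise (the noise scale, cost functions, and step size are placeholders, not the talk's choices):

```python
import numpy as np

rng = np.random.default_rng(0)
centers = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0])]

def noisy_grad(i, x, scale):
    """Client i uploads grad f_i(x_k) + eps_k with eps_k ~ Laplace(0, scale)."""
    true_grad = 2.0 * (x - centers[i])
    return true_grad + rng.laplace(0.0, scale, size=x.shape)

x = np.zeros(2)
for k in range(1, 201):
    alpha = 0.1 / np.sqrt(k)
    grads = [noisy_grad(i, x, scale=1.0) for i in range(len(centers))]
    x = x - alpha * np.sum(grads, axis=0)

print(x)  # a larger noise scale gives more privacy but a less accurate final estimate
```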

Proposed Approach
- Motivated by secret sharing
- Exploit diversity: multiple servers / neighbors

Proposed Approach
[Figure: clients f_1, f_2, f_3 connected to Server 1 and Server 2]
Privacy if a subset of the servers is adversarial

Proposed Approach
[Figure: peer-to-peer network of agents f_1, ..., f_5]
Privacy if a subset of the neighbors is adversarial

Proposed Approach
- Structured noise that cancels over servers/neighbors

Intuition
[Figure: Server 1 holds estimate x_1, Server 2 holds estimate x_2; clients f_1, f_2, f_3]

Intuition
[Figure: Server 1 holds estimate x_1, Server 2 holds estimate x_2]
Each client simulates multiple clients: client i splits into shares f_i1 and f_i2 (i.e., f_11, f_12, f_21, f_22, f_31, f_32)

Intuition
[Figure: Server 1 holds estimate x_1, Server 2 holds estimate x_2; shares f_11, f_12, f_21, f_22, f_31, f_32]
f_11(x) + f_12(x) = f_1(x), and the shares f_ij(x) are not necessarily convex
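
A tiny illustration of the splitting idea (the particular split is an arbitrary example): a convex f_1 is written as the sum of two shares that individually need not be convex.

```python
# f_1(x) = x^2 is convex. Split it as f_1 = f_11 + f_12 with a non-convex piece:
#   f_11(x) = x^2 + x^3   (not convex on all of R)
#   f_12(x) = -x^3
def f11(x): return x**2 + x**3
def f12(x): return -x**3

# The shares always sum back to f_1:
for x in (-2.0, -0.5, 0.0, 1.5):
    assert abs((f11(x) + f12(x)) - x**2) < 1e-12
```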

Algorithm
- Each server maintains an estimate
In each iteration:
- Client i: downloads the estimates from the corresponding servers and uploads gradients of f_i
- Each server updates its estimate using the received gradients

Algorithm
- Each server maintains an estimate
In each iteration:
- Client i: downloads the estimates from the corresponding servers and uploads gradients of f_i
- Each server updates its estimate using the received gradients
- The servers periodically exchange estimates to perform a consensus step

Claim
- Under suitable assumptions, the servers eventually reach consensus in argmin_{x ∈ X} Σ_{i=1}^S f_i(x)

Privacy
Server 1 observes f_11 + f_21 + f_31; Server 2 observes f_12 + f_22 + f_32
[Figure: shares f_11, f_12, f_21, f_22, f_31, f_32 split across Server 1 and Server 2]

Privacy
Server 1 observes f_11 + f_21 + f_31; Server 2 observes f_12 + f_22 + f_32
- Server 1 may learn f_11, f_21, f_31 and f_12 + f_22 + f_32
- Not sufficient to learn f_i

f_11(x) + f_12(x) = f_1(x)
- Function splitting is not necessarily practical
- Structured randomization as an alternative

Structured Randomization
- Multiplicative or additive noise in gradients
- Noise cancels over servers

Multiplicative Noise
[Figure: Server 1 holds estimate x_1, Server 2 holds estimate x_2; clients f_1, f_2, f_3]

Multiplicative Noise
Client 1 uploads α ∇f_1(x_1) to Server 1 and β ∇f_1(x_2) to Server 2, with α + β = 1

Multiplicative Noise
Client 1 uploads α ∇f_1(x_1) to Server 1 and β ∇f_1(x_2) to Server 2, with α + β = 1
It suffices for this invariant to hold over a larger number of iterations

Multiplicative Noise
Client 1 uploads α ∇f_1(x_1) to Server 1 and β ∇f_1(x_2) to Server 2, with α + β = 1
The noise from client i to server j is not zero-mean
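
A small sketch of the multiplicative scheme for one client and two servers: the client draws weights α, β with α + β = 1 and uploads α ∇f_1(x_1) to Server 1 and β ∇f_1(x_2) to Server 2 (the quadratic cost and the distribution of α are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
c1 = np.array([2.0, -1.0])            # hypothetical cost for client 1: f_1(x) = ||x - c1||^2

def grad_f1(x):
    return 2.0 * (x - c1)

def shares_for_servers(x1, x2):
    """Return (upload to Server 1, upload to Server 2) with weights summing to 1."""
    alpha = rng.uniform(-2.0, 3.0)    # each share alone looks like a noisy gradient...
    beta = 1.0 - alpha                # ...but alpha + beta = 1, so nothing cancels in aggregate
    return alpha * grad_f1(x1), beta * grad_f1(x2)

# If both servers hold the same estimate, the two shares sum to the true gradient:
x = np.zeros(2)
g1, g2 = shares_for_servers(x, x)
print(g1 + g2)        # equals grad f_1(x) up to floating point
print(grad_f1(x))
```

Note that an individual share α ∇f_1 is not a zero-mean perturbation of ∇f_1, matching the remark on the slide above.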

Claim
- Under suitable assumptions, the servers eventually reach consensus in argmin_{x ∈ X} Σ_{i=1}^S f_i(x)

Peer-to-Peer Architecture
[Figure: networked agents f_1, ..., f_5]

Reminder
- Each agent maintains a local estimate x
- Consensus step with neighbors
- Apply own gradient to own estimate: x_{k+1} ← x_k − α_k ∇f_i(x_k)

Proposed Approach
- Each agent shares a noisy estimate with its neighbors
Scheme 1: noise cancels over neighbors
Scheme 2: noise cancels network-wide

Proposed Approach
- Each agent shares a noisy estimate with its neighbors
Scheme 1: noise cancels over neighbors
Scheme 2: noise cancels network-wide
An agent sends x + ε_1 to one neighbor and x + ε_2 to another, with ε_1 + ε_2 = 0 (over iterations)

Peer-to-Peer Architecture
- Poster today: Shripad Gade

Outline
argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Distributed Optimization, Privacy, Fault-tolerance

Fault-Tolerance
- Some agents may be faulty
- Need to produce the correct output despite the faults

Byzantine Fault Model
- No constraint on the misbehavior of a faulty agent
- May send bogus messages
- Faulty agents can collude

Peer-to-Peer Architecture
- f_i(x) = cost for robot i to go to location x
- A faulty agent may choose an arbitrary cost function
[Figure: robots at locations x_1, x_2 with costs f_1(x), f_2(x)]

Peer-to-Peer Architecture
[Figure: networked agents f_1, ..., f_5]

Client-Server Architecture
[Figure: clients f_1, f_2, f_3 upload gradients ∇f_i(x_k) to the server]

Fault-Tolerant Optimization
- The original problem argmin_{x ∈ X} Σ_{i=1}^S f_i(x) is not meaningful

Fault-Tolerant Optimization
- The original problem argmin_{x ∈ X} Σ_{i=1}^S f_i(x) is not meaningful
- Optimize the cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} f_i(x)

Fault-Tolerant Optimization
- The original problem argmin_{x ∈ X} Σ_{i=1}^S f_i(x) is not meaningful
- Optimize the cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} f_i(x)
Impossible!

Fault-Tolerant Optimization
- Optimize a weighted cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} α_i f_i(x)
- With each α_i as close to 1/|good| as possible

Fault-Tolerant Optimization
- Optimize a weighted cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} α_i f_i(x)
With t Byzantine faulty agents: t of the weights may be 0

Fault-Tolerant Optimization
- Optimize a weighted cost over only the non-faulty agents: argmin_{x ∈ X} Σ_{i good} α_i f_i(x)
With t Byzantine agents out of n in total: at least n − 2t weights are guaranteed to be > 1/(2(n − t))

Centralized Algorithm
- Of the n agents, any t may be faulty
- How to filter out the cost functions of faulty agents?

Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows.

Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions

Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions
- Discard the smallest t and the largest t gradients

Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions
- Discard the smallest t and the largest t gradients
- The mean of the remaining gradients = the gradient of G at x

Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions
- Discard the smallest t and the largest t gradients
- The mean of the remaining gradients = the gradient of G at x
The virtual function G(x) is convex

Centralized Algorithm: Scalar Argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x:
- Sort the gradients of the n local cost functions
- Discard the smallest t and the largest t gradients
- The mean of the remaining gradients = the gradient of G at x
The virtual function G(x) is convex → it can be optimized easily
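
A compact sketch of the gradient filter that defines G in the scalar case (the example gradients and the injected outliers are made up):

```python
def filtered_gradient(gradients, t):
    """Gradient of the virtual function G at x: sort the n local gradients,
    discard the t smallest and t largest, and average the rest."""
    g = sorted(gradients)
    kept = g[t:len(g) - t]
    return sum(kept) / len(kept)

# Example: n = 7 gradients reported at some x, up to t = 2 of them Byzantine.
grads = [1.0, 1.2, 0.9, 1.1, -100.0, 250.0, 1.05]   # two injected outliers
print(filtered_gradient(grads, t=2))                # 1.05, close to the honest values
```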

Peer-to-Peer Fault-Tolerant Optimization
- Gradient filtering similar to the centralized algorithm: requires rich enough connectivity; correlation between the functions helps
- The vector case is harder: redundancy between the functions helps

Summary
argmin_{x ∈ X} Σ_{i=1}^S f_i(x)
Distributed Optimization, Privacy, Fault-tolerance

Thanks! disc.ece.illinois.edu


Distributed Peer-to-Peer Optimization
- Each agent maintains a local estimate x
In each iteration:
- Compute a weighted average with the neighbors' estimates

Distributed Peer-to-Peer Optimization
- Each agent maintains a local estimate x
In each iteration:
- Compute a weighted average with the neighbors' estimates
- Apply own gradient to own estimate: x_{k+1} ← x_k − α_k ∇f_i(x_k)

Distributed Peer-to-Peer Optimization
- Each agent maintains a local estimate x
In each iteration:
- Compute a weighted average with the neighbors' estimates
- Apply own gradient to own estimate: x_{k+1} ← x_k − α_k ∇f_i(x_k)
- The local estimates converge to argmin_{x ∈ X} Σ_{i=1}^S f_i(x)

RSS: Locally Balanced Perturbations
- Perturbations add to zero (locally, per node)
- Bounded (≤ Δ)
Algorithm:
- Node j selects d_{j,i}(k) such that Σ_i d_{j,i}(k) = 0 and |d_{j,i}(k)| ≤ Δ
- Node j shares w_{j,i}(k) = x_j(k) + d_{j,i}(k) with node i (see the sketch below)
- Consensus and (stochastic) gradient descent
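
A sketch of the locally balanced perturbation step for one node j (drawing and then re-centering uniform noise is one simple way to satisfy the zero-sum and boundedness constraints; it is not claimed to be the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(2)

def locally_balanced_perturbations(num_neighbors, delta):
    """Draw d_{j,i} for each neighbor i with sum_i d_{j,i} = 0 and |d_{j,i}| <= delta."""
    d = rng.uniform(-delta / 2, delta / 2, size=num_neighbors)
    d -= d.mean()                       # enforce the zero-sum constraint
    assert np.all(np.abs(d) <= delta)   # re-centering keeps each entry within the bound
    return d

x_j = 3.7                               # node j's current estimate
d = locally_balanced_perturbations(num_neighbors=4, delta=0.5)
w = x_j + d                             # w_{j,i} = x_j + d_{j,i}: a different value per neighbor
print(d.sum())                          # ~0: the perturbations cancel locally at node j
print(w)
```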

RSS: Network Balanced Perturbations
- Perturbations add to zero (over the network)
- Bounded (≤ Δ)
Algorithm:
- Node j computes a perturbation d_j(k): it sends s_{j,i} to each neighbor i, adds the received s_{i,j}, and subtracts the sent s_{j,i}, so d_j(k) = Σ received − Σ sent
- Node j shares the obfuscated state w_j(k) = x_j(k) + d_j(k) with its neighbors
- Consensus and (stochastic) gradient descent

Convergence
Let x̄_j(T) = Σ_k α_k x_j(k) / Σ_k α_k with α_k = 1/√k. Then
f(x̄_j(T)) − f(x*) ≤ O(log T / √T) + O(Δ² log T / √T)
- Asymptotic convergence of the iterates to the optimum
- Privacy-convergence trade-off
- Stochastic gradient updates work too

Function Sharing
- Let the f_i(x) be bounded-degree polynomials
Algorithm:
- Node j shares s_{j,i}(x) with node i
- Node j obfuscates its function using p_j(x) = Σ_i s_{i,j}(x) − Σ_i s_{j,i}(x)
- Use f̂_j(x) = f_j(x) + p_j(x) and run distributed gradient descent (see the sketch below)
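
A minimal sketch of function sharing with the polynomials represented by coefficient vectors (the three example polynomials, the complete graph, and the random shares are all invented for illustration):

```python
import numpy as np

# Bounded-degree polynomials as coefficient vectors [a0, a1, a2] for a0 + a1*x + a2*x^2.
f = {1: np.array([1.0, -2.0, 1.0]),    # f_1(x) = 1 - 2x + x^2
     2: np.array([0.0,  0.0, 2.0]),    # f_2(x) = 2x^2
     3: np.array([4.0,  1.0, 0.5])}    # f_3(x) = 4 + x + 0.5x^2

rng = np.random.default_rng(3)
nodes = list(f)
# s[j][i]: the polynomial share node j sends to node i (random coefficients).
s = {j: {i: rng.normal(size=3) for i in nodes if i != j} for j in nodes}

# p_j = (sum of received shares) - (sum of sent shares); f_hat_j = f_j + p_j.
f_hat = {}
for j in nodes:
    received = sum(s[i][j] for i in nodes if i != j)
    sent = sum(s[j][i] for i in nodes if i != j)
    f_hat[j] = f[j] + (received - sent)

# The obfuscation cancels over the network: sum_j f_hat_j == sum_j f_j.
print(sum(f_hat.values()))
print(sum(f.values()))
```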

Function Sharing: Convergence
- The function-sharing iterates converge to the correct optimum (since Σ_j f̂_j(x) = f(x))
- Privacy: if the vertex connectivity of the graph is at least f, then no group of f nodes can estimate the true functions f_i (or any good subset of them)
- If p_j(x) is similar to f_j(x), it can hide f_j(x) well