Consensus and Distributed Inference Rates Using Network Divergence
Slide 1/26: Consensus and Distributed Inference Rates Using Network Divergence
Anand D., Department of Electrical and Computer Engineering, The State University of New Jersey
DIMACS, August 23, 2017
(Joint work with Tara Javidi and Anusha Lalitha, UCSD)
Work sponsored by NSF under award CCF
Slide 2/26: The model: finite hypothesis testing
A simple model: estimate a global parameter θ. Each agent takes observations over time conditioned on θ and can perform local updates followed by communication with neighbors. Main focus: a simple rule and its rate of convergence.
Slide 3/26: Model
A set of $n$ nodes and a set of hypotheses $\Theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$.
Observations $X_i^{(t)}$ are i.i.d. over time.
Fixed, known distributions $\{f_i(\cdot\,;\theta_1), f_i(\cdot\,;\theta_2), \ldots, f_i(\cdot\,;\theta_M)\}$ at each node $i$.
$\theta^* \in \Theta$ is a fixed, global, unknown parameter: $X_i^{(t)} \sim f_i(\cdot\,;\theta^*)$.
GOAL: parametric inference of the unknown $\theta^*$.
Slide 4/26: Hypothesis Testing
If $\theta^*$ is globally identifiable, then collecting all observations $X^{(t)} = \{X_1^{(t)}, X_2^{(t)}, \ldots, X_n^{(t)}\}$ at a central location yields a centralized hypothesis testing problem, with exponentially fast convergence to the true hypothesis.
Can this be achieved locally, with low-dimensional observations?
Slide 5/26: Example: Low-dimensional Observations
If all observations are not collected centrally, node 1 individually cannot learn $\theta^*$ ⟹ nodes must communicate.
Slide 6/26: Distributed Hypothesis Testing
Define $\bar\Theta_i = \{\theta \in \Theta : f_i(\cdot\,;\theta) = f_i(\cdot\,;\theta^*)\}$.
$\theta \in \bar\Theta_i$ ⟹ $\theta$ and $\theta^*$ are observationally equivalent for node $i$.
Suppose $\{\theta^*\} = \bar\Theta_1 \cap \bar\Theta_2 \cap \cdots \cap \bar\Theta_n$ (global identifiability).
GOAL: parametric inference of the unknown $\theta^*$.
Slide 7/26: Learning Rule
At $t = 0$, node $i$ begins with an initial estimate vector $q_i^{(0)} > 0$, where the components of $q_i^{(t)}$ form a probability distribution on $\Theta$.
At $t > 0$, node $i$ draws $X_i^{(t)}$.
Slide 8/26: Learning Rule
Node $i$ computes a belief vector $b_i^{(t)}$ via the Bayesian update
$$b_i^{(t)}(\theta) = \frac{f_i\big(X_i^{(t)};\theta\big)\, q_i^{(t-1)}(\theta)}{\sum_{\theta' \in \Theta} f_i\big(X_i^{(t)};\theta'\big)\, q_i^{(t-1)}(\theta')}$$
and sends the message $Y_i^{(t)} = b_i^{(t)}$ to its neighbors.
Slide 9/26: Learning Rule
Node $i$ receives messages from its neighbors at the same time and updates $q_i^{(t)}$ via averaging of log beliefs,
$$q_i^{(t)}(\theta) = \frac{\exp\big(\sum_{j=1}^n W_{ij} \log b_j^{(t)}(\theta)\big)}{\sum_{\theta' \in \Theta} \exp\big(\sum_{j=1}^n W_{ij} \log b_j^{(t)}(\theta')\big)},$$
where the weight $W_{ij}$ denotes the influence of node $j$ on the estimate of node $i$. Set $t \leftarrow t + 1$ and repeat.
Slide 10/26: In a picture
[Diagram: $q_i^{(t)}$ → local observation $X_i^{(t)} \sim f_i(\cdot)$ → Bayesian update to $b_i^{(t)}$ → exchange messages $\{b_j^{(t)}\}$ with neighbors → average log-beliefs → $q_i^{(t+1)}$.]
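The whole loop fits in a few lines of numerical code. Below is a minimal sketch of one round of the rule in Python; the names (`learning_rule_step`, `likelihoods`, etc.) are ours, not the talk's, and the likelihoods are passed as callables for generality.

```python
import numpy as np

def learning_rule_step(q, x, likelihoods, W):
    """One round of the rule: local Bayesian update, then weighted
    geometric averaging (arithmetic averaging of log-beliefs).

    q           : (n, M) array, q[i, m] = q_i^{(t-1)}(theta_m); rows sum to 1
    x           : length-n array of current observations X_i^{(t)}
    likelihoods : likelihoods[i][m](x_i) returns f_i(x_i; theta_m)
    W           : (n, n) row-stochastic weight matrix
    """
    n, M = q.shape
    # Bayesian update: b_i(theta) proportional to f_i(x_i; theta) * q_i(theta).
    lik = np.array([[likelihoods[i][m](x[i]) for m in range(M)] for i in range(n)])
    b = lik * q
    b /= b.sum(axis=1, keepdims=True)
    # Social update: q_i(theta) proportional to exp(sum_j W_ij log b_j(theta)).
    log_q = W @ np.log(b)
    q_new = np.exp(log_q - log_q.max(axis=1, keepdims=True))  # stabilized exponentiation
    return q_new / q_new.sum(axis=1, keepdims=True)
```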
Slide 11/26: An example
When connected in a network and using the proposed learning rule, node 1 learns $\theta^*$.
Slide 12/26: Assumptions
Assumption 1: For every pair $\theta \neq \theta^*$, $f_i(\cdot\,;\theta^*) \neq f_i(\cdot\,;\theta)$ for at least one node $i$, i.e., the KL divergence $D\big(f_i(\cdot\,;\theta^*)\,\|\,f_i(\cdot\,;\theta)\big) > 0$.
Assumption 2: The stochastic matrix $W$ is irreducible.
Assumption 3: For all $i \in [n]$, the initial estimate $q_i^{(0)}(\theta) > 0$ for every $\theta \in \Theta$.
Slide 13/26: Network Divergence
The eigenvector centrality $v = [v_1, v_2, \ldots, v_n]$ is the left eigenvector of $W$ corresponding to eigenvalue 1. The central quantity of interest is what we call the network divergence:
$$K(\theta^*, \theta) = \sum_{j=1}^n v_j\, D\big(f_j(\cdot\,;\theta^*)\,\|\,f_j(\cdot\,;\theta)\big).$$
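Given $W$ and the per-node KL divergences, the network divergence is a single weighted sum. A small sketch (function and argument names are ours):

```python
import numpy as np

def network_divergence(W, kl):
    """K(theta*, theta) = sum_j v_j * D(f_j(.;theta*) || f_j(.;theta)).

    W  : (n, n) irreducible row-stochastic matrix
    kl : length-n array, kl[j] = D(f_j(.;theta*) || f_j(.;theta))
    """
    # Eigenvector centrality: left eigenvector of W for eigenvalue 1,
    # i.e. a right eigenvector of W^T, normalized to sum to 1.
    eigvals, eigvecs = np.linalg.eig(W.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    v = v / v.sum()
    return float(v @ kl)
```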
Slide 14/26: Convergence Results
Let $\theta^*$ be the unknown fixed parameter and suppose Assumptions 1–3 hold.
Theorem (rate of rejecting $\theta \neq \theta^*$): Every node $i$'s estimate of each $\theta \neq \theta^*$ converges to 0 exponentially fast, almost surely. Mathematically,
$$\lim_{t \to \infty} -\frac{1}{t} \log q_i^{(t)}(\theta) = K(\theta^*, \theta) \quad \mathbb{P}\text{-a.s.},$$
where $K(\theta^*, \theta) = \sum_{j=1}^n v_j D\big(f_j(\cdot\,;\theta^*)\,\|\,f_j(\cdot\,;\theta)\big)$.
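The theorem is easy to check numerically. Below is a toy simulation (all parameters are our own choices, not from the talk): two nodes, two hypotheses, Bernoulli observations, node 2 uninformative. The empirical decay rate of $q_1^{(t)}(\theta_2)$ should approach the network divergence $K(\theta_1, \theta_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([[0.3, 0.7],    # node 1: Bernoulli mean under theta_1, theta_2
              [0.5, 0.5]])   # node 2: uninformative (same law under both)
W = np.array([[0.5, 0.5],
              [0.5, 0.5]])   # doubly stochastic, so v = [1/2, 1/2]
true_m, T = 0, 2000          # theta* = theta_1
q = np.full((2, 2), 0.5)
for t in range(T):
    x = rng.random(2) < p[:, true_m]       # X_i^{(t)} ~ f_i(.; theta*)
    lik = np.where(x[:, None], p, 1 - p)   # f_i(x_i; theta_m)
    b = lik * q
    b /= b.sum(axis=1, keepdims=True)      # Bayesian update
    q = np.exp(W @ np.log(b))
    q /= q.sum(axis=1, keepdims=True)      # average log-beliefs

# Bernoulli KL divergences D(f_j(.;theta_1) || f_j(.;theta_2)) per node:
kl = p[:, 0] * np.log(p[:, 0] / p[:, 1]) \
     + (1 - p[:, 0]) * np.log((1 - p[:, 0]) / (1 - p[:, 1]))
K = 0.5 * kl.sum()                # network divergence for this W
rate = -np.log(q[0, 1]) / T       # empirical rate of rejecting theta_2 at node 1
print(f"empirical rate {rate:.3f} vs network divergence {K:.3f}")
```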
Slide 15/26: Example: Network-wide Learning
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and $\theta^* = \theta_1$. If $i$ and $j$ are connected, $W_{ij} = 1/\mathrm{degree}(i)$, otherwise $W_{ij} = 0$. For the example network (consistent with a $3 \times 3$ grid, where centrality is proportional to degree),
$$v = \left[\tfrac{1}{12}, \tfrac{1}{8}, \tfrac{1}{12}, \tfrac{1}{8}, \tfrac{1}{6}, \tfrac{1}{8}, \tfrac{1}{12}, \tfrac{1}{8}, \tfrac{1}{12}\right].$$
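For a random walk on an undirected graph, the left eigenvector for eigenvalue 1 is proportional to the degree sequence. The sketch below reproduces the $v$ above under our reading of the figure as a $3 \times 3$ grid (the graph itself is an assumption; any graph with this degree sequence gives the same $v$):

```python
import numpy as np

# Adjacency of a 3x3 grid graph, nodes numbered row-major.
n = 9
A = np.zeros((n, n))
for r in range(3):
    for c in range(3):
        i = 3 * r + c
        if c < 2:
            A[i, i + 1] = A[i + 1, i] = 1   # horizontal edge
        if r < 2:
            A[i, i + 3] = A[i + 3, i] = 1   # vertical edge
W = A / A.sum(axis=1, keepdims=True)        # W_ij = 1/degree(i) for neighbors
v = A.sum(axis=1) / A.sum()                 # centrality ~ degree for this W
print(np.allclose(v @ W, v))                # True: v is the left eigenvector
print(v)  # [1/12, 1/8, 1/12, 1/8, 1/6, 1/8, 1/12, 1/8, 1/12]
```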
Slide 16/26: Example
Two placements of the informative node: $\bar\Theta_1 = \{\theta^*\}$ with $\bar\Theta_i = \Theta$ for $i \neq 1$, versus $\bar\Theta_5 = \{\theta^*\}$ with $\bar\Theta_i = \Theta$ for $i \neq 5$. When only node $j$ is informative, $K(\theta^*, \theta) = v_j\, D\big(f_j(\cdot\,;\theta^*)\,\|\,f_j(\cdot\,;\theta)\big)$, so a more central informative node yields a faster rate.
Slide 17/26: Corollaries
Corollary (lower bound on rate of convergence to $\theta^*$): For every node $i$, the rate at which the error in the estimate of $\theta^*$ goes to zero is governed by the worst-case network divergence:
$$\lim_{t \to \infty} -\frac{1}{t} \log\big(1 - q_i^{(t)}(\theta^*)\big) = \min_{\theta \neq \theta^*} K(\theta^*, \theta) \quad \mathbb{P}\text{-a.s.}$$
Slide 18/26: Corollaries
Corollary (lower bound on rate of learning): The rate of learning $\lambda$ across the network can be lower bounded as
$$\lambda \geq \min_{\theta^* \in \Theta}\, \min_{\theta \neq \theta^*} K(\theta^*, \theta) \quad \mathbb{P}\text{-a.s.},$$
where
$$\lambda = \liminf_{t \to \infty} -\frac{1}{t} \log e_t, \qquad e_t = \frac{1}{2} \sum_{i=1}^n \big\|q_i^{(t)}(\cdot) - \mathbb{1}_{\theta^*}(\cdot)\big\|_1 = \sum_{i=1}^n \sum_{\theta \neq \theta^*} q_i^{(t)}(\theta).$$
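In code, the network-wide error $e_t$ collapses to one line, since each row of the estimate matrix sums to 1 (a sketch with our own names):

```python
import numpy as np

def total_error(q, true_m):
    """e_t = (1/2) sum_i ||q_i - 1_{theta*}||_1 = sum_i sum_{theta != theta*} q_i(theta).

    q      : (n, M) matrix of current estimates, rows summing to 1
    true_m : column index of theta*
    """
    return float(q.sum() - q[:, true_m].sum())  # total mass off the true hypothesis
```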
Slide 19/26: Example: Periodicity
[Figure: which hypotheses node 1 and node 2 can each distinguish.]
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and $\theta^* = \theta_1$. The underlying graph is periodic:
$$W = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Slide 20/26: Example: Networks with Large Mixing Times
[Figure: which hypotheses node 1 and node 2 can each distinguish.]
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and $\theta^* = \theta_1$. The underlying graph is aperiodic, with a $2 \times 2$ stochastic matrix $W$ chosen so that the chain mixes slowly.
Slide 21/26: Concentration Result
Assumption 4: For $k \in [n]$, $x \in \mathcal{X}_k$, and any given $\theta_i, \theta_j \in \Theta$ with $\theta_i \neq \theta_j$, the log-likelihood ratio $\log \frac{f_k(x;\theta_i)}{f_k(x;\theta_j)}$ is bounded, by a finite constant denoted $L$.
Theorem: Under Assumptions 1–4, for every $\epsilon > 0$ there exists a $T$ such that for all $t \geq T$, every $\theta \neq \theta^*$, and every $i \in [n]$, we have
$$\Pr\Big(\log q_i^{(t)}(\theta) \geq -\big(K(\theta^*, \theta) - \epsilon\big)t\Big) \leq \gamma(\epsilon, L, t)$$
and
$$\Pr\Big(\log q_i^{(t)}(\theta) \leq -\big(K(\theta^*, \theta) + \epsilon\big)t\Big) \leq \gamma(\epsilon, L, t),$$
where $\gamma(\epsilon, L, t) = 2 \exp\big(-\tfrac{\epsilon^2 t}{2 L^2 d}\big)$.
Slide 22/26: Related Work and Contribution
Jadbabaie et al. use a local Bayesian update of beliefs followed by averaging of the beliefs themselves. They show exponential convergence with no closed form for the convergence rate ['12] and provide an upper bound on the learning rate ['13]. We average the log beliefs instead and provide a lower bound on the learning rate $\lambda$. Our lower bound exceeds their upper bound ⟹ our learning rule converges faster.
Shahrampour and Jadbabaie ['13] formulated a stochastic-optimization learning problem and obtained a dual-based learning rule for doubly stochastic $W$, with a closed-form lower bound on the rate of identifying $\theta^*$. Using our rule we achieve the same lower bound (from Corollary 1):
$$\min_{\theta \neq \theta^*} \frac{1}{n} \sum_{j=1}^n D\big(f_j(\cdot\,;\theta^*)\,\|\,f_j(\cdot\,;\theta)\big).$$
(For doubly stochastic $W$, $v = \frac{1}{n}\mathbb{1}$, so the network divergence reduces to this average.)
Slide 23/26: Related Work and Contribution
An update rule similar to ours was used by Rahnama Rad and Tahbaz-Salehi (2010) to show that each node's belief converges in probability to the true parameter, though under certain analytic assumptions. For a general model and discrete parameter spaces, we show almost-sure, exponentially fast convergence.
Shahrampour et al. and Nedić et al. (independently) showed that our learning rule coincides with a learning rule based on distributed stochastic optimization ($W$ irreducible and aperiodic).
Slide 24/26: Summing up: hypothesis testing and semi-Bayes
[Diagram: $q_i^{(t)}$ → get local data $X_i^{(t)}$ → local (Bayesian) update → send message → social update using neighbors' messages $\{Y_j^{(t)}\}$ → $q_i^{(t+1)}$.]
A combination of local Bayesian updates and averaging.
Network divergence: an intuitive measure of the rate of convergence.
Posterior consistency gives a Bayesio-frequentist analysis.
Slide 25/26: Looking forward
Continuous distributions and parameters.
Applications to distributed optimization.
Further limiting messages via coordinate descent (Anand D. and Javidi '15).
Time-varying parameters and distributed stochastic filtering.
Slide 26/26: Thank You!
More information