Learning distributions and hypothesis testing via social learning
Anand D., Department of Electrical and Computer Engineering, The State University of New Jersey. September 29, 2015. (Joint work with Tara Javidi and Anusha Lalitha (UCSD).) Work sponsored by NSF under award CCF.
Introduction
Some philosophical questions
- How do we (as a network of social agents) make common choices or inferences about the world?
- If I want to help you learn, should I tell you my evidence or just my opinion?
- How much do we need to communicate with each other?
Which may have some applications (?)
- Distributed monitoring in networks (estimating a state).
- Hypothesis testing or detection using multi-modal sensors.
- Models for vocabulary evolution.
- Social learning in animals.
Estimation
First simple model: estimate a histogram of local data.
- Each agent starts with a single color.
- Agents pass messages to learn the histogram of initial colors, or to sample from that histogram.
- Main focus: simple protocols with limited communication.
Hypothesis testing
Second simple model: estimate a global parameter $\theta$.
- Each agent takes observations over time, conditioned on $\theta$.
- Agents can do local updates followed by communication with neighbors.
- Main focus: a simple rule and its rate of convergence.
Social learning
Social learning focuses on simple models for how (human) networks can form consensus opinions:
- Consensus-based DeGroot model: gossip, average consensus, etc.
- Bayesian social learning (Acemoglu et al., Bala and Goyal): agents make decisions and are observed by other agents.
- Opinion dynamics, where agents change beliefs based on the beliefs of nearby neighbors.
On limited messages
Both of our problems involve some sort of average-consensus step. In the first part we are interested in exchanging approximate messages.
- Lots of work on quantized consensus (Aysal-Coates-Rabbat, Carli et al., Kashyap et al., Lavaei and Murray, Nedic et al., Srivastava and Nedic, Zhu and Martinez).
- Time-varying network topologies (even more references).
- Pretty mature area at this point.
A roadmap
[Figure: a network of agents with pairwise trust weights $W_{ij}$ between neighboring agents $i$ and $j$.]
- Social sampling and estimating histograms.
- Distributed hypothesis testing and network divergence.
- Some ongoing work and future ideas.
Social sampling and merging opinions
A.D., T. Javidi, "Distributed Learning of Distributions via Social Sampling," IEEE Transactions on Automatic Control, 60(1), January 2015.
Consensus and dynamics in networks
- A collection of individuals or agents.
- Agents observe part of a global phenomenon.
- A network of connections for communication.
Phenomena vs. protocols
Engineering view: focus on algorithms; minimize communication cost; how much do we lose vs. centralized?
Phenomenological view: focus on modeling; simple protocols; what behaviors emerge?
Why simple protocols?
We are more interested in developing simple models that can exhibit different phenomena:
- simple source models;
- simple communication that uses fewer resources;
- simple update rules that are easier to analyze.
Communication and graph
[Figure: agents broadcasting messages $Y_i(t)$ to their neighbors.]
- The $n$ agents are arranged in a connected graph $G$.
- Agent $i$ broadcasts to its neighbors $N_i$ in the graph.
- The message $Y_i(t)$ lies in a discrete set.
The problem
- Each agent starts with $\theta_i \in \{1, 2, \dots, M\}$.
- Agent $i$ knows $\theta_i$ (no noise).
- Agents maintain estimates $Q_i(t)$ of the empirical distribution $\Pi$ of $\{\theta_i\}$.
Social sampling
We model the messages as random samples from local estimates.
1. Update rule from $Q_i(t-1)$ to $Q_i(t)$:
$$Q_i(t) = W_i\big(Q_i(t-1),\, X_i(t),\, Y_i(t-1),\, \{Y_j(t-1) : j \in N_i\},\, t\big).$$
2. Build a sampling distribution on $\{0, 1, \dots, M\}$: $P_i(t) = V_i(Q_i(t), t)$.
3. Sample the message: $Y_i(t) \sim P_i(t)$.
Social sampling
[Figure: agents $i$ and $j$ exchange sampled messages $Y_i(t)$ and $Y_j(t)$, weighted by trust values $W_{ij}$ and $W_{ji}$; the local estimates evolve from $Q_i(0), Q_j(0)$ to $Q_i(t), Q_j(t)$ as $t \to \infty$.]
Possible phenomena
[Figure: node estimates of a single bin vs. time, for several nodes, compared against the true value.]
- Coalescence: all agents converge to singletons.
- Consensus: agents converge to a common $\hat{\Pi} \neq \Pi$.
- Convergence: agents converge to $\Pi$.
Linear update rule
$$Q_i(t) = A_i(t)\, Q_i(t-1) + B_i(t)\, Y_i(t-1) + \sum_{j \in N_i} W_{ij}(t)\, Y_j(t-1)$$
- A linear update rule combining $Y_i \sim P_i$ and $Q_i$.
- It exhibits different behavior depending on $A_i(t)$, $B_i(t)$, and $W(t)$.
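To make the protocol concrete, here is a minimal Python sketch of one round of social sampling with the linear update, instantiating the three-step protocol above. It assumes one-hot (indicator) messages and weights chosen so that every row of the estimate matrix remains a probability distribution; the function name and array layout are illustrative, not from the paper.

```python
import numpy as np

def social_sampling_round(Q, W, A, B, rng):
    """One round of social sampling with the linear update (sketch).

    Q : (n, M+1) array; row i is agent i's estimate, a distribution
        on the symbols {0, 1, ..., M}.
    W : (n, n) trust matrix; W[i, j] > 0 only if j is a neighbor of i.
    A, B : length-n weights on the old estimate and on the agent's own
        message; rows stay distributions when A[i] + B[i] + W[i].sum() = 1.
    """
    n, k = Q.shape
    # Steps 2-3 of the protocol: sample a message index from P_i (taken
    # to be Q_i here; the censored variant below modifies this step).
    msgs = np.array([rng.choice(k, p=Q[i]) for i in range(n)])
    Y = np.eye(k)[msgs]  # messages as indicator vectors
    # Step 1: linear update Q_i <- A_i Q_i + B_i Y_i + sum_j W_ij Y_j.
    return A[:, None] * Q + B[:, None] * Y + W @ Y
```

With decaying message weights (e.g. $B_i(t)$ and $W_{ij}(t)$ of order $1/t$), this is the regime analyzed on the next slide.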
Convergence
Main idea: massage the update rule into matrix form,
$$Q(t+1) = Q(t) + \delta(t)\big[\, H Q(t) + C(t) + M(t) \,\big],$$
with
1. step size $\delta(t) = 1/t$,
2. perturbation $C(t) = O(\delta(t))$,
3. martingale difference term $M(t)$.
This is a stochastic approximation: it converges to a fixed point of $H$.
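For readers unfamiliar with the stochastic-approximation argument, here is a brief sketch of the standard conditions behind this claim (a summary of the usual Robbins-Monro/ODE reasoning, not taken from the slides):

```latex
% Step sizes \delta(t) = 1/t satisfy the Robbins--Monro conditions
\[
  \sum_{t \ge 1} \delta(t) = \infty, \qquad \sum_{t \ge 1} \delta(t)^2 < \infty,
\]
% so the martingale noise M(t) averages out and the O(\delta(t))
% perturbation C(t) vanishes; the iterates then track the mean ODE
\[
  \dot{Q} = H Q,
\]
% and converge almost surely to a stable equilibrium of this ODE,
% i.e., a fixed point of the update map.
```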
Example: censored updates
Suppose we make the distribution $P_i(t)$ a censored version of $Q_i(t)$:
$$P_{i,m}(t) = Q_{i,m}(t)\, \mathbf{1}\big( Q_{i,m}(t) > \delta(t)(1 - W_{ii}) \big), \qquad m = 1, \dots, M,$$
$$P_{i,0}(t) = \sum_{m=1}^{M} Q_{i,m}(t)\, \mathbf{1}\big( Q_{i,m}(t) \le \delta(t)(1 - W_{ii}) \big).$$
The agent sends $Y_i(t) = 0$ if it samples a rare element of $Q_i$.
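These two formulas transcribe directly into Python; a small sketch (variable names are mine):

```python
import numpy as np

def censored_sampling_dist(q_i, delta_t, w_ii):
    """Censored sampling distribution P_i(t) over {0, 1, ..., M}.

    q_i : length-M estimate over the bins 1..M.
    Bins with mass at most delta_t * (1 - w_ii) are zeroed out and
    their total mass is placed on the censoring symbol 0.
    """
    thresh = delta_t * (1.0 - w_ii)
    keep = q_i > thresh
    p0 = q_i[~keep].sum()               # mass of rare bins moves to symbol 0
    return np.concatenate(([p0], np.where(keep, q_i, 0.0)))
```

Sampling $Y_i(t)$ from this distribution means an agent never broadcasts a marginal opinion; it sends the "no comment" symbol 0 instead.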
Phenomena captured by the censored model
[Figure: node estimates of a single bin vs. time, converging to the true value.]
- The censored distribution $P_i(t)$ guards against marginal opinions.
- The sampled messages $Y_i(t) \sim P_i(t)$ are simple messages.
- The decaying weights $\delta(t)$ represent the solidifying of opinions.
Result: all estimates converge almost surely to $\Pi$.
Future directions
- Find the rate of convergence and the dependence of the rate on the parameters.
- Investigate the robustness of the update rule to noise and perturbations.
- Continuous distributions?
- Other message passing algorithms?
- Distributed optimization?
Non-Bayesian social learning
A. Lalitha, T. Javidi, A.D., "Social Learning and Distributed Hypothesis Testing," arXiv preprint (math.ST), October 2014.
Model
- A set of $n$ nodes.
- A set of hypotheses $\Theta = \{\theta_1, \theta_2, \dots, \theta_M\}$.
- Observations $X_i^{(t)}$ are i.i.d., with fixed, known distributions $\{f_i(\cdot\,; \theta_1), f_i(\cdot\,; \theta_2), \dots, f_i(\cdot\,; \theta_M)\}$.
- $\theta^* \in \Theta$ is a fixed, global unknown parameter: $X_i^{(t)} \sim f_i(\cdot\,; \theta^*)$.
GOAL: parametric inference of the unknown $\theta^*$.
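A toy instance of this model in Python, useful for trying out the learning rule below (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 2 nodes, M = 2 hypotheses, binary observations.
# f[i, m] is node i's observation distribution under hypothesis theta_m.
f = np.array([
    [[0.9, 0.1], [0.5, 0.5]],   # node 1 can tell theta_1 from theta_2
    [[0.5, 0.5], [0.5, 0.5]],   # node 2's observations are uninformative
])
theta_star = 0                   # index of the true hypothesis

# One observation per node at time t, drawn from f_i(. ; theta*).
X = np.array([rng.choice(2, p=f[i, theta_star]) for i in range(2)])
```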
Hypothesis Testing
- If $\theta^*$ is globally identifiable, then collecting all observations $X^{(t)} = \{X_1^{(t)}, X_2^{(t)}, \dots, X_n^{(t)}\}$ at a central location yields a centralized hypothesis testing problem.
- The centralized test converges to the true hypothesis exponentially fast.
- Can this be achieved locally, with low-dimensional observations?
Example: Low-dimensional Observations
If all observations are not collected centrally, node 1 individually cannot learn $\theta^*$ $\Rightarrow$ the nodes must communicate.
Distributed Hypothesis Testing
- Define $\bar{\Theta}_i = \{\theta \in \Theta : f_i(\cdot\,;\theta) = f_i(\cdot\,;\theta^*)\}$.
- $\theta \in \bar{\Theta}_i$ $\Rightarrow$ $\theta$ and $\theta^*$ are observationally equivalent for node $i$.
- Suppose $\{\theta^*\} = \bar{\Theta}_1 \cap \bar{\Theta}_2 \cap \dots \cap \bar{\Theta}_n$.
GOAL: parametric inference of the unknown $\theta^*$.
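A toy illustration of this identifiability condition (the equivalence sets below are hypothetical, not from the slides):

```python
# Local indistinguishability sets for 3 nodes and 4 hypotheses.
Theta_bar = [
    {"theta1", "theta2"},   # node 1 cannot tell theta1 from theta2
    {"theta1", "theta3"},   # node 2 cannot tell theta1 from theta3
    {"theta1", "theta4"},   # node 3 cannot tell theta1 from theta4
]

# No single node identifies the truth, but jointly the network does.
print(set.intersection(*Theta_bar))  # {'theta1'}
```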
Learning Rule
- At $t = 0$, node $i$ begins with an initial estimate vector $q_i^{(0)} > 0$, whose components form a probability distribution on $\Theta$.
- At $t > 0$, node $i$ draws $X_i^{(t)}$ and computes the belief vector $b_i^{(t)}$ via the Bayesian update
$$b_i^{(t)}(\theta) = \frac{f_i\big(X_i^{(t)};\, \theta\big)\, q_i^{(t-1)}(\theta)}{\sum_{\theta' \in \Theta} f_i\big(X_i^{(t)};\, \theta'\big)\, q_i^{(t-1)}(\theta')}.$$
- It sends the message $Y_i^{(t)} = b_i^{(t)}$ and receives messages from its neighbors at the same time.
- It updates $q_i^{(t)}$ by averaging log beliefs,
$$q_i^{(t)}(\theta) = \frac{\exp\big( \sum_{j=1}^{n} W_{ij} \log b_j^{(t)}(\theta) \big)}{\sum_{\theta' \in \Theta} \exp\big( \sum_{j=1}^{n} W_{ij} \log b_j^{(t)}(\theta') \big)},$$
where the weight $W_{ij}$ denotes the influence of node $j$ on the estimate of node $i$.
- Set $t \leftarrow t + 1$ and repeat.
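The two formulas above translate into a few lines of vectorized Python; this is an illustrative sketch (function name and array layout are mine), not the authors' reference implementation:

```python
import numpy as np

def learning_rule_step(q, lik, W):
    """One step of the learning rule: Bayesian update + log-belief averaging.

    q   : (n, M) array; row i is node i's current estimate over Theta.
    lik : (n, M) array; lik[i, m] = f_i(X_i(t); theta_m) for the fresh
          observation at node i.
    W   : (n, n) row-stochastic influence matrix.
    """
    # Local Bayesian update to get the belief vectors b_i(t).
    b = lik * q
    b /= b.sum(axis=1, keepdims=True)
    # Geometric averaging of neighbors' beliefs in the log domain.
    log_q = W @ np.log(b)
    log_q -= log_q.max(axis=1, keepdims=True)   # for numerical stability
    q_new = np.exp(log_q)
    return q_new / q_new.sum(axis=1, keepdims=True)
```

Iterating this map with observations drawn as in the toy model above drives every row of q toward the indicator of $\theta^*$, at the exponential rate given in the convergence results below.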
In a picture
[Figure: node $i$'s estimate $Q_i(\cdot, t)$ and a local observation $X_i(t) \sim f_i(\cdot)$ feed a Bayesian update producing the belief $b_i(\cdot, t)$; the beliefs $\{b_j(\cdot, t)\}$ received from neighbors are combined by averaging log beliefs to form $Q_i(\cdot, t+1)$.]
An example
When connected in a network and using the proposed learning rule, node 1 learns $\theta^*$.
Assumptions
Assumption 1: For every pair $\theta \neq \theta^*$, $f_i(\cdot\,;\theta^*) \neq f_i(\cdot\,;\theta)$ for at least one node $i$, i.e., the KL divergence $D\big(f_i(\cdot\,;\theta^*)\, \|\, f_i(\cdot\,;\theta)\big) > 0$.
Assumption 2: The stochastic matrix $W$ is irreducible.
Assumption 3: For all $i \in [n]$, the initial estimate satisfies $q_i^{(0)}(\theta) > 0$ for every $\theta \in \Theta$.
Convergence Results
Let $\theta^*$ be the unknown fixed parameter, and suppose Assumptions 1-3 hold. The eigenvector centrality $v = [v_1, v_2, \dots, v_n]$ is the left eigenvector of $W$ for eigenvalue 1.

Theorem (rate of rejecting $\theta \neq \theta^*$): Every node $i$'s estimate of each $\theta \neq \theta^*$ converges to 0 almost surely, exponentially fast:
$$\lim_{t \to \infty} \frac{1}{t} \log q_i^{(t)}(\theta) = -K(\theta^*, \theta) \quad \text{P-a.s.},$$
where the network divergence is $K(\theta^*, \theta) = \sum_{j=1}^{n} v_j\, D\big(f_j(\cdot\,;\theta^*)\, \|\, f_j(\cdot\,;\theta)\big)$.
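The network divergence is just a centrality-weighted sum of per-node KL divergences; a small sketch with made-up distributions (names and numbers are mine):

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def network_divergence(v, f_star, f_theta):
    """K(theta*, theta) = sum_j v_j * D(f_j(.;theta*) || f_j(.;theta))."""
    return sum(v[j] * kl(f_star[j], f_theta[j]) for j in range(len(v)))

# Made-up example: two nodes; only node 1 can tell the hypotheses apart.
v = np.array([0.5, 0.5])                       # eigenvector centralities
f_star  = np.array([[0.9, 0.1], [0.5, 0.5]])   # rows: f_j(. ; theta*)
f_theta = np.array([[0.5, 0.5], [0.5, 0.5]])   # rows: f_j(. ; theta)
print(network_divergence(v, f_star, f_theta))  # only node 1 contributes
```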
Example: Network-wide Learning
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and $\theta^* = \theta_1$. If $i$ and $j$ are connected, $W_{ij} = 1/\mathrm{degree}(i)$; otherwise $W_{ij} = 0$.
$$v = \big[\tfrac{1}{12},\ \tfrac{1}{8},\ \tfrac{1}{12},\ \tfrac{1}{8},\ \tfrac{1}{6},\ \tfrac{1}{8},\ \tfrac{1}{12},\ \tfrac{1}{8},\ \tfrac{1}{12}\big].$$
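The stated centrality vector is consistent with nine nodes on a 3x3 grid (the slide's figure is not reproduced in the transcription, so the grid topology is an assumption here). A quick numerical check:

```python
import numpy as np

# Build a 3x3 grid graph and the degree-normalized weight matrix.
n = 9
adj = np.zeros((n, n))
for r in range(3):
    for c in range(3):
        i = 3 * r + c
        if c < 2:
            adj[i, i + 1] = adj[i + 1, i] = 1   # horizontal edge
        if r < 2:
            adj[i, i + 3] = adj[i + 3, i] = 1   # vertical edge
W = adj / adj.sum(axis=1, keepdims=True)        # W_ij = 1/degree(i)

# Left eigenvector of W for eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(W.T)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v /= v.sum()
print(np.round(v, 4))   # ~ [1/12, 1/8, 1/12, 1/8, 1/6, 1/8, 1/12, 1/8, 1/12]
```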
Example
Left: $\bar{\Theta}_1 = \{\theta^*\}$ and $\bar{\Theta}_i = \Theta$ for $i \neq 1$. Right: $\bar{\Theta}_5 = \{\theta^*\}$ and $\bar{\Theta}_i = \Theta$ for $i \neq 5$.
Corollaries
Recall the theorem: every node $i$'s estimate of each $\theta \neq \theta^*$ converges to 0 almost surely at the exponential rate $K(\theta^*, \theta)$.

Lower bound on the rate of convergence to $\theta^*$: For every node $i$, the rate at which the error in the estimate of $\theta^*$ goes to zero satisfies
$$\lim_{t \to \infty} \frac{1}{t} \log \big( 1 - q_i^{(t)}(\theta^*) \big) = -\min_{\theta \neq \theta^*} K(\theta^*, \theta) \quad \text{P-a.s.}$$
Lower bound on the rate of learning: The rate of learning $\lambda$ across the network can be lower bounded as
$$\lambda \ge \min_{\theta^* \in \Theta}\, \min_{\theta \neq \theta^*} K(\theta^*, \theta) \quad \text{P-a.s.},$$
where
$$\lambda = \liminf_{t \to \infty} -\frac{1}{t} \log e_t, \qquad e_t = \frac{1}{2} \sum_{i=1}^{n} \big\| q_i^{(t)}(\cdot) - \mathbf{1}_{\theta^*}(\cdot) \big\|_1 = \sum_{i=1}^{n} \sum_{\theta \neq \theta^*} q_i^{(t)}(\theta).$$
Example: Periodicity
[Figure: a two-node network; node 1 can distinguish one subset of the hypotheses and node 2 the other.]
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and $\theta^* = \theta_1$. The underlying graph is periodic:
$$W = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Example: Networks with Large Mixing Times
[Figure: a two-node network; node 1 can distinguish one subset of the hypotheses and node 2 the other.]
$\Theta = \{\theta_1, \theta_2, \theta_3, \theta_4\}$ and $\theta^* = \theta_1$. The underlying graph is aperiodic, with a slowly mixing
$$W = \begin{pmatrix} 0.99 & 0.01 \\ 0.01 & 0.99 \end{pmatrix}.$$
Concentration Result
Assumption 4: For every $k \in [n]$, every $x \in \mathcal{X}_k$, and every pair $\theta_i \neq \theta_j$ in $\Theta$, the log-likelihood ratio $\log \frac{f_k(x;\,\theta_i)}{f_k(x;\,\theta_j)}$ is bounded; denote the bound by $L$.

Theorem: Under Assumptions 1-4, for every $\epsilon > 0$ there exists a $T$ such that for all $t \ge T$, every $\theta \neq \theta^*$, and every $i \in [n]$, we have
$$\Pr\Big( \log q_i^{(t)}(\theta) \ge -\big(K(\theta^*, \theta) - \epsilon\big)\, t \Big) \le \gamma(\epsilon, L, t)$$
and
$$\Pr\Big( \log q_i^{(t)}(\theta) \le -\big(K(\theta^*, \theta) + \epsilon\big)\, t \Big) \le \gamma(\epsilon, L, t),$$
where $L$ is a finite constant and $\gamma(\epsilon, L, t) = 2 \exp\big( -\tfrac{\epsilon^2 t}{2 L^2 d} \big)$.
Related Work and Contribution
Jadbabaie et al. use a local Bayesian update of beliefs followed by averaging of the beliefs:
- They show exponential convergence with no closed form for the convergence rate ['12], and provide an upper bound on the learning rate ['13].
- We average the log beliefs instead, and provide a lower bound on the learning rate $\lambda$. Our lower bound on the learning rate is greater than their upper bound $\Rightarrow$ our learning rule converges faster.
Shahrampour and Jadbabaie ['13] formulated learning as a stochastic optimization problem and obtained a dual-based learning rule for doubly stochastic $W$, with a closed-form lower bound on the rate of identifying $\theta^*$. Using our rule we achieve the same lower bound (from Corollary 1):
$$\min_{\theta \neq \theta^*} \frac{1}{n} \sum_{j=1}^{n} D\big( f_j(\cdot\,;\theta^*)\, \|\, f_j(\cdot\,;\theta) \big).$$
An update rule similar to ours was used by Rahnama Rad and Tahbaz-Salehi (2010) to show that each node's belief converges in probability to the true parameter, though under certain analytic assumptions. For a general model and discrete parameter spaces, we show almost-sure, exponentially fast convergence.
Shahrampour et al. and Nedic et al. (independently) showed that our learning rule coincides with a learning rule based on distributed stochastic optimization (for $W$ irreducible and aperiodic).
Summing up: social sampling to estimate histograms
[Figure: agents exchange sampled messages weighted by trust; local estimates $Q_i(t)$ evolve from $Q_i(0)$ as $t \to \infty$.]
- A simple model of randomized message exchange.
- A unified analysis captures different qualitative behaviors.
- A censoring rule achieves consensus to the true histogram.
Hypothesis testing and semi-Bayes
[Figure: pipeline for node $i$: get data $X_i(t)$, local update, send message, social update with $\{Y_j(t)\}$, mapping $Q_i(t)$ to $Q_i(t+1)$.]
- A combination of local Bayesian updates and averaging.
- Network divergence: an intuitive measure of the rate of convergence.
- Posterior consistency gives a Bayesio-frequentist analysis.
Looking forward
- Continuous distributions and parameters.
- Applications to distributed optimization.
- The time-varying case.
Thank You!