Probability Models of Information Exchange on Networks Lecture 3

Similar documents
Probability Models of Information Exchange on Networks Lecture 1

Probability Models of Information Exchange on Networks Lecture 2. Elchanan Mossel UC Berkeley All rights reserved

Probability Models of Information Exchange on Networks Lecture 5

Asymptotic Learning on Bayesian Social Networks

Social Choice and Networks

Social Choice and Social Networks. Some non martingale learning models. Draft All rights reserved

Social Choice and Social Network. Aggregation of Biased Binary Signals. (Draft)

Logic and Artificial Intelligence Lecture 7

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Opinion Exchange Dynamics. Elchanan Mossel and Omer Tamuz. Copyright All rights reserved to the authors

Mixing in Product Spaces. Elchanan Mossel

Agreement Theorems from the Perspective of Dynamic-Epistemic Logic

CO759: Algorithmic Game Theory Spring 2015

Disjointness and Additivity

Social Choice and Social Networks. Aggregation of General Biased Signals (DRAFT)

Quick Tour of Basic Probability Theory and Linear Algebra

Midterm 2 Review. CS70 Summer Lecture 6D. David Dinh 28 July UC Berkeley

Economics of Networks Social Learning

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

Limiting Distributions

A PECULIAR COIN-TOSSING MODEL

Tractable Bayesian Social Learning on Trees

Lecture Notes 3 Convergence (Chapter 5)

Recitation 9: Loopy BP

Preliminary Results on Social Learning with Partial Observations

CS181 Midterm 2 Practice Solutions

Arrow Theorem. Elchanan Mossel. Draft All rights reserved

Bayesian Learning in Social Networks

CS145: Probability & Computing

Making Consensus Tractable

Lecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

CSE 312 Final Review: Section AA

6.207/14.15: Networks Lecture 24: Decisions in Groups

Social Learning Equilibria

Statistics: Learning models from data

Lecture 13: Spectral Graph Theory

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

Limiting Distributions

11. Learning graphical models

Lectures 2 3 : Wigner s semicircle law

Algorithmic Game Theory and Applications

Information Aggregation in Complex Dynamic Networks

Lecture 4. 1 Examples of Mechanism Design Problems

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Learning in Linear Games over Networks

Consensus and Distributed Inference Rates Using Network Divergence

Proving the central limit theorem

Definitions and Proofs

2 2 + x =

Lecture notes on statistical decision theory Econ 2110, fall 2013

Lecture 4: Graph Limits and Graphons

Bayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

Variational Inference (11/04/13)

Lecture 14 October 13

Non Linear Invariance & Applications. Elchanan Mossel. U.C. Berkeley

The central limit theorem

Economics 3012 Strategic Behavior Andy McLennan October 20, 2006

ECE534, Spring 2018: Solutions for Problem Set #4 Due Friday April 6, 2018

Game Theory Lecture 10+11: Knowledge

Notes 6 : First and second moment methods

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

Introduction to Probability

MAS223 Statistical Inference and Modelling Exercises

Contracts under Asymmetric Information

Distributed Systems Gossip Algorithms

Primer on statistics:

COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017

Learning distributions and hypothesis testing via social learning

Lecture 8: Continuous random variables, expectation and variance

Example continued. Math 425 Intro to Probability Lecture 37. Example continued. Example

T Machine Learning: Basic Principles

Optimal Auctions with Correlated Bidders are Easy

ERROR IN LINEAR INTERPOLATION

Competitive Equilibrium

Notes 15 : UI Martingales

Midterm #1. Lecture 10: Joint Distributions and the Law of Large Numbers. Joint Distributions - Example, cont. Joint Distributions - Example

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory

Undirected graphical models

Machine Learning 4771

Alvaro Rodrigues-Neto Research School of Economics, Australian National University. ANU Working Papers in Economics and Econometrics # 587

We set up the basic model of two-sided, one-to-one matching

Probability Background

Introduction to Formal Epistemology Lecture 5

Bayesian Machine Learning - Lecture 7

Robust Knowledge and Rationality

Iterated Strict Dominance in Pure Strategies

Naïve Bayes classification

CS6220: DATA MINING TECHNIQUES

Lectures 2 3 : Wigner s semicircle law

Lecture 3. 1 Polynomial-time algorithms for the maximum flow problem

1 Terminology and setup

At the start of the term, we saw the following formula for computing the sum of the first n integers:

MAS113 Introduction to Probability and Statistics

lecture notes October 22, Probability Theory

Notes 18 : Optional Sampling Theorem

{X i } realize. n i=1 X i. Note that again X is a random variable. If we are to

Most data analysis starts with some data set; we will call this data set P. It will be composed of a set of n

Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009

Transcription:

Probability Models of Information Exchange on Networks Lecture 3 Elchanan Mossel UC Berkeley All Right Reserved

The Bayesian View of the Jury Theorem Recall: we assume 0/1 with prior probability (0.5,0.5). Each voter receives signal x i which is correct with probability p independently. Note that if this is indeed the case, then after the vote has been cast, all voters can calculate: P[s = 1 x]/ P[s = 0 x]. Obtain posterior probability of 1,0. Everybody agree about the posterior. Can this be extended to networks?

Bayesian Exchange on Networks Setup: S 2 {0,1} with P[S = 1] = ½ (apriori). Distributions of signals D 0, D 1 A (directed) social network G = (V,E) of size n with self loops.

Bayesian Exchange on Networks Setup: S 2 {0,1} with P[S = 1] = ½ (apriori). Distributions of signals D 0, D 1 A (directed) social network G = (V,E) of size n with self loops. At time 0: agents receive ind. signals X(i,0) from D S Let F(i,0) = ¾(X(i)) At each discrete time step t 1: Agent i declares X(i,t) = P[S = 1 F(i,t-1)] = E[S F(i,t-1)] Let F(i,t) = ¾(X(j,s) : (i j) 2 E, s t)

Bayesian Exchange on Networks Setup: S 2 {0,1} with P[S = 1] = ½ (apriori). Distributions of signals D 0, D 1 A (directed) social network G = (V,E) of size n with self loops. At time 0: agents receive ind. signals X(i,0) from D S Let F(i,0) = ¾(X(i)) At each discrete time step t 1: Agent i declares X(i,t) = E[S F(i,t-1)] Let F(i,t) = ¾(X(j,s) : (i j) 2 E, s t) Q1: Do agents converge? Q2: If they do, what to do they converge to?

Convergence of a Single Agent At time 0: agent i, receives a signal X(i,0) from D S Let F(i,0) = ¾(X(i)) At each discrete time step t 1: Agent i declares X(i,t) = E[S F(i,t-1)] Let F(i,t) = ¾(X(j,s) : (i j) 2 E, s t) Claim: Each agent converges.

Convergence of a Single Agent At time 0: agent i, receives a signal X(i,0) from D S Let F(i,0) = ¾(X(i)) At each discrete time step t 1: Agent i declares X(i,t) = E[S F(i,t-1)] Let F(i,t) = ¾(X(j,s) : (i j) 2 E, s t) Claim: Each agent converges. Pf: X(i,t) is a bounded martingale.

Convergence of a Single Agent At time 0: agent i, receives a signal X(i,0) from D S Let F(i.0) = ¾(X(i)) At each discrete time step t 1: Agent i declares X(i,t) = E[S F(i,t-1)] Let F(i,t) = ¾(X(j,s) : (i j) 2 E, s t) Claim: Each agent converges. Pf: X(i,t) is a bounded martingale. Note: Doesn t use anything about Network structure or Independence of signals. Q: Do agents agree in the limit?

Bayesian Exchange on Networks Q1: Do agents converge to the same belief? Not necessarily. For example disconnected graph.

Bayesian Exchange on Networks Q1: Do agents converge to the same belief? Not necessarily. For example disconnected graph. Or even graph that is not strongly connected.

Agreement Q1: Do agents converge to the same belief? Not necessarily. For example disconnected graph. Or even graph that is not strongly connected. Thm (Aumann 76, Geanakoplos & Polemarchakis 82, Parikh, Krasucki): If the graph G is strongly connected, all agents will a.s. converge to the same value. Recall: Strongly connected means that for every pair of vertices there is a directed path connecting them.

Agreement Proof Proof Sketch: : Let X(i) = lim X(i,t) = E[S F(i)], where F(i) = lim F(i,t). X(i) is the function closest to S in L 2 (F(i)).

Agreement Proof Proof Sketch: : Let X(i) = lim X(i,t) = E[S F(i)], where F(i) = lim F(i,t). X(i) is the function closest to S in L 2 (F(i)). If i j in G then X(j) 2 L 2 (F(i)). Therefore: X(i)-S 2 X(j)-S 2. Strong connectivity ) 8 i,j: X(i)-S 2 = X(j)-S 2

Agreement Proof Proof Sketch: : Let X(i) = lim X(i,t) = E[S F(i)], where F(i) = lim F(i,t). X(i) is the function closest to S in L 2 (F(v)). If i j in G then X(j) 2 L 2 (F(i)). Therefore: X(i)-S 2 X(j)-S 2. Strong connectivity ) 8 i,j: X(i)-S 2 = X(j)-S 2 If i j, P[X(i) X(j)] > 0 then Z = 0.5(X(i)+X(j)) 2 F(i) and Z closer to S than either X(i) or X(j). Strong connectivity ) 8 i,j: X(i) = X(j).

Agreement History Note: Result and proof above did not use independence of signals. Aumann 76: notion of common knowledge: If two people have the same priors and their posterior for a given event are common knowledge, then these posteriors must be equal Critique of Bayesian economics. Geanakoplos & Polemarchakis 82: Dynamics with two individuals. Parikh, Krasucki: Networks

The Learning Problem Assume G is strongly connected. Do agents learn? Strongest possible sense of learning: Do as well as if all see all signals. Strongest possible sense of non-learning: Do not do better than random. Consider: dependent / independent signals.

Non-Learning Assume G is strongly connected. Do agents learn? Example of non learning: G = ({1,2}, {(1,2)}) S(1) and S(2) uniform +/- with S = S(1) S(2) X(1,t) = X(2,t) = 0 for all t. Or G = K n where ½ of the vertices get S(1) and ½ get S(2). A lot of information but it is all lost. How about if the the signals are (conditionally) independent?

Learning with independent signals Assume G is strongly connected & The signals X(i) are i.i.d. conditionally on S. Let X = lim t X(i,t) = limit common belief. Thm (M, Sly, Tamuz): X = E[S X(1),,X(n)]

Learning with independent signals Assume G is strongly connected & The signals X(i) are i.i.d. conditionally on S. Let X = lim t X(i,t) = limit common belief. Thm (M, Sly, Tamuz): X = E[S X(1),,X(n)] Agents aggregate optimally! Statement and proof work for any model where the posterior beliefs are common knowledge.

Learning with independent signals Assume G is strongly connected & The signals X(i) are i.i.d. conditionally on S. Let X = lim t X(i,t) = limit common belief. Thm (M, Sly, Tamuz): X = E[S X(1),,X(n)] Agents aggregate optimally! Statement and proof work for any model where the posterior beliefs are common knowledge. The proof uses Chebyshev s sum inequality: if f and g are strictly increasing then E[f(X) g(x)] E[f(X)] E[g(X))] and equality means that g = c f.

Agreeing on beliefs implies learning - proof Belief Learning Theorem (M. Sly and Tamuz (2012)) If there exists a random variable X such that X = X i := E [S F i (1)] for all i then all agents learned optimally: X = P [S =1 X (1),...,X (n)]. 23 / 27

Proof Sketch I Z i := log P [S =1 X (i)] P [S =0 X (i)i] = log P [X (i) S = 1] P [X (i) S = 0], Z = X i Z i so P [S =1 X (1),...,X (n)] = L(Z) where L(x) =e x /(e x + e x ). 24 / 27

Proof Sketch I Z i := log so P [S =1 X (i)] P [S =0 X (i)i] where L(x) =e x /(e x + e x ). I Since X is F i measurable = log P [X (i) S = 1] P [X (i) S = 0], P [S =1 X (1),...,X (n)] = L(Z) X = E [L(Z) F i ]=E [L(Z) X ]. Z = X i Z i 24 / 27

Proof Sketch I Z i := log so P [S =1 X (i)] P [S =0 X (i)i] where L(x) =e x /(e x + e x ). I Since X is F i measurable = log P [X (i) S = 1] P [X (i) S = 0], P [S =1 X (1),...,X (n)] = L(Z) X = E [L(Z) F i ]=E [L(Z) X ]. I Hence since Z i is F i measurable E [Z i L(Z) X ]=E [E [Z i L(Z) F i ] X ] = E [Z i X X ] = E [Z i X ] E [L(Z) X ] Z = X i Z i 24 / 27

Proof concluded E [Z i L(Z) X ]=E [Z i X ] E [L(Z) X ] I Summing over i we get that E [Z L(Z) X ]=E [Z X ] E [L(Z) X ] 25 / 27

Proof concluded E [Z i L(Z) X ]=E [Z i X ] E [L(Z) X ] I Summing over i we get that E [Z L(Z) X ]=E [Z X ] E [L(Z) X ] I Since L(x) is strictly increasing this implies that Z is constant conditional on X,i.e.Z is X measurable so X = E [L(Z) X ]=E [L(Z) Z] =L(Z) I So the agreed value X equal to the optimal estimator L(Z) as needed. 25 / 27

Some Open Problems Open problem 1: Is the learning theorem valid under weaker conditions on the distributions? Open problem 2: How quick is the convergence to the agreed value? Open problem 3: Are there good algorithms to compute the dynamics? We will now look at problems 2 and 3 in some simple special cases.

Learning in finite probability spaces Question: Assume the probability space of the state of the world and the signals is finite. Does the learning process converge in a finite number of iterations?

Learning in finite probability spaces Claim: Consider the process on a graph G with n vertices and assume that the size of the probability space (including S and all signals) is a finite M. Then the learning process converges in at most M n iteration.

Learning in finite probability spaces Claim: Consider the process on a graph G with n vertices and assume that the size of the probability space Ω (including S and all signals) is a finite M. Then the learning process converges in at most M n iteration. Pf: The information F i (t) may be encoded by S i (t) ½ Ω. Satisfying: S i (t+1) µ S i (t). If S i (t+1) = S i (t) for some t and all i then the process has converged. Argument is close to that of Geanakoplos & Polemarchakis 82. Open problem: Can this bound be improved?

An Example of Learning in Finite Spaces Example: X 2 [n 2 ] with uniform prior. The signals are: Player 1: X 2 {1,n}, X 2 {n+1,..,2n} etc. Player 2: X 2 {1,,n+1},X 2 {n+2,..,2n+2},, X 2 {n 2 } True value is X is sampled to be 1. The event the players are estimating S = 1(X 2 {1,n+2,2n+3, 3n + 4,, n 2-1, n 2 }). Q: What will happen?

An Example of Learning in Finite Spaces Example: X 2 [n 2 ] with uniform prior. The signals are: Player 1: X 2 {1,n}, X 2 {n+1,..,2n} etc. Player 2: X 2 {1,,n+1},X 2 {n+2,..,2n+2},, X 2 {n 2 } True value is X is sampled to be 1. The event the players are estimating S = 1(X 2 {1,n+2,2n+3, 3n + 4,, n 2-1, n 2 }). What will happen? Player 1 will say 1/n Player 2 will say 1/(n+1) Player 1 learns that it is not n 2 but will still say 1/n. Player 2 learns that player 1 was not in the last group but will still say 1/(n+1). Q: How tight is the bound?

Next Example We will talk about a Gaussian model which is: Computationally feasible Has rapid convergence. Converges to the optimal answer for every connected network. Following model was studied in P. DeMarzo, D. Vayanos, and J. Zwiebel. and by Mossel and Tamuz.

The Gaussian Model The original signals are N(µ =?,1). In each iteration Each agent action reveals her current estimate of µ to her neighbors. E.g. set price by min utility (x- µ) 2 Each agent calculates a new estimate of µ based on her neighbors broadcasts. Assume agents know the graph structure. Repeat ad infinitum Assume agents know the graph structure. Example: interval of length 4.

Utopia Network Learns Avg(X v ) Variance of this estimator is 1/n. Could be achieved if everyone was friends with everyone. Technical comments: This is both the ML estimator & Bayesian estimator with uniform prior on (-1,1)

Results For every connected network: The best estimator is reached within n 2 rounds where n = #nodes (DVZ) Convergence time can be improved to 2* n * diameter (MT) All computations are efficient (MT)

Pf: ML and Min Variance. Claim 1: At each iteration X v (t) = Bayes Estimator = Maximum Like estimator Moreover, X v (t) 2 L v (t), where L v (t)= span { X w (0),,X w (t-1) : w ~ v} X v (t) is a minimizer of {Var(X) : X 2 L v (t), E[X] = µ} Claim: Can be calculated efficiently

Pf: ML and Min Variance. Cor: Var(X v (t)) decreases with time Note: If X v (t) X u (t), dim of either L v or L u goes up by 1 (v ~ u) ) Converges in n 2 rounds. Claim: Weight that agent gives own estimator has to be at least 1/n (prove it!) ) converges to optimal estimator

Convergence in 2n*d steps Claim: If an agent u estimator X remains constant for 2*d steps t,t+1, t+2d then the process has converged. Pf: Let L = L u (t+2d) Let v be a neighbor of u. X t+1 (v), X t+2d-1 (v) 2 L. X 2 L v (t+1) So X t+1 (v) = = X t+2d-1 (v) = X If w is a neighbor of w then: X t+2 (v) = = X t+2d-2 (v) = X By induction at time t+d all estimators are X. Open: Is there a bound that depends only on d?

Some open problems When is learning achieved? e.g. positively correlated signals? General statements about convergence times? More models where convergence time can be estimated? Efficiency of computations?

Truncated information Why could we analyze the cases so far? A main feature was that agents declarations were martingales. A more difficult case is where agents declarations are more limited. Example: +/- actions / declarations. This will be discussed next lectures.