Fast Convergence in Semi-Anonymous Potential Games


Holly Borowski, Jason R. Marden, and Eric W. Frew

(This research was supported by AFOSR grants #FA and #FA, by ONR grant #N, and by the NASA Aeronautics Scholarship Program. H. Borowski is a graduate research assistant with the Department of Aerospace Engineering, University of Colorado, Boulder, holly.borowski@colorado.edu. J. R. Marden is with the Department of Electrical, Computer, and Energy Engineering, University of Colorado, Boulder, CO 80309, jason.marden@colorado.edu. Corresponding author. E. W. Frew is with the Department of Aerospace Engineering, University of Colorado, Boulder, CO 80309, eric.frew@colorado.edu.)

Abstract: The log-linear learning algorithm has been extensively studied in both the game theoretic and distributed control literature. A central appeal of log-linear learning for distributed control is that it often guarantees that the agents' behavior will converge in probability to the optimal configuration. However, a central issue with log-linear learning for this purpose is that the worst case convergence time can be prohibitively long, e.g., exponential in the number of players. In this work, we formalize a modified log-linear learning algorithm that exhibits a worst case convergence time that is roughly linear in the number of players. We prove this characterization for semi-anonymous potential games with limited populations, that is, potential games in which each agent's utility function can be expressed as a function of aggregate behavior within each population.

I. INTRODUCTION

Game theoretic learning algorithms have gained traction as a powerful design tool for distributed control systems [6], [8], [15], [23], [25]. Here, a static game is repeated over time and agents are given opportunities to revise their strategies in response to their local objective functions and other agents' behavior. The emergent behavior associated with such revision strategies has been studied extensively in the existing literature, e.g., fictitious play [5], [14], [16], regret matching [11], and log-linear learning [1], [3], [22], among many others. While many existing results characterize desirable asymptotic guarantees, the convergence times associated with these algorithms remain uncharacterized or have been shown to be prohibitively long [4], [10], [12], [22]. Characterizing convergence rates is a key component of determining whether a game theoretic algorithm is desirable for system control.

The class of games known as potential games [17] has received significant research attention with regard to distributed control. A game is a potential game if each agent's local objective function is aligned with some global system-level objective function. While potential games may not naturally emerge in many social systems, it is important to highlight that for engineering systems there are methods for designing local agent objective functions such that (i) the resulting game is a potential game and (ii) the optimal behavior corresponds to a (pure) Nash equilibrium of this designed game [2], [15], [24]. Consequently, there has been significant research aimed at deriving distributed learning algorithms that converge to these efficient Nash equilibria in potential games. One such learning algorithm that accomplishes this task is termed log-linear learning [3].
Log-linear learning can be interpreted as a perturbed, or noisy, best reply process in which individual agents typically select the best action given their beliefs regarding the behavior of the other agents. However, agents periodically make mistakes and select suboptimal actions, with a probability that decays exponentially in the associated payoff loss. As the noise level tends to zero, the structure of this perturbation model ensures that the resulting process has a unique stationary distribution with full support on the efficient Nash equilibria. Thus, by designing the agents' local objective functions appropriately, log-linear learning can be employed to derive distributed control laws with highly desirable asymptotic guarantees.

Unfortunately, the convergence rates associated with log-linear learning in its nominal form have been shown to be exponential in the game size [22]. This stems from an inherent tension between desirable asymptotic behavior and resulting convergence rates: small noise levels are necessary for ensuring that the bulk of the mass of the stationary distribution lies on the efficient Nash equilibria; however, small noise levels also make it difficult to leave inefficient Nash equilibria, which ultimately slows convergence. Nonetheless, positive results regarding convergence rates of log-linear learning and its variants have recently begun to emerge for specific problem instantiations [13], [18], [22]. In [22], the authors introduce a log-linear learning variant and show that for a class of potential games, i.e., congestion games with symmetric agents and singleton strategies, the convergence time grows roughly linearly in the number of agents. Similar positive results were derived in [18], which focuses on a different class of potential games: local interaction games with a sufficiently sparse interaction structure. Accordingly, the two dominant questions that emerge are the following: (i) Do such positive results extend to broader classes of games? (ii) Is it possible to design systems to fit into one of these two classes?
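To make the preceding description of the logit update concrete, here is a minimal illustrative sketch (the code and instance are ours, not part of the original analysis): an agent samples among its actions with probabilities proportional to $e^{\beta \cdot \text{payoff}}$, so larger $\beta$ concentrates the choice on payoff-maximizing actions and $\beta \to 0$ gives uniform play.

```python
import math, random

def logit_choice(utilities, beta):
    """Sample an action with probability proportional to exp(beta * payoff).

    utilities: dict mapping each candidate action to its payoff, holding the
    other agents' actions fixed.
    """
    actions = list(utilities)
    u_max = max(utilities.values())  # shift payoffs for numerical stability
    weights = [math.exp(beta * (utilities[a] - u_max)) for a in actions]
    return random.choices(actions, weights=weights)[0]

print(logit_choice({"a": 1.0, "b": 0.5}, beta=5.0))  # returns "a" about 92% of the time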

The focus of this paper is on the first question. In particular, we introduce a class of games termed semi-anonymous potential games. Such games can be parameterized by populations of agents, where each agent's local objective function can be evaluated using only information regarding the agent's own decision and the aggregate behavior within each of the populations. The classical congestion game, as introduced in [21], represents a semi-anonymous potential game where a population corresponds to a collection of agents sharing a common set of paths connecting their source and destination. First, we introduce a variant of log-linear learning for semi-anonymous potential games that closely resembles the algorithm introduced in [22]. Next, we prove that for semi-anonymous potential games, the convergence time associated with this algorithm grows roughly linearly in the number of agents, provided that the number of populations is bounded. This result can be viewed as an extension of [22], which focuses purely on the single-population setting.

II. PRELIMINARIES

Consider a game with agents $N = \{1, 2, \ldots, n\}$ where each agent $i \in N$ has a finite action set denoted by $A_i$ and a utility function $U_i : A \to \mathbb{R}$, where $A = \prod_{i \in N} A_i$ denotes the set of joint actions. We will frequently express an action profile $a \in A$ as $(a_i, a_{-i})$, where $a_{-i} = (a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$ denotes the collection of actions of all agents other than agent $i$. Similarly, we let $A_{-i} = A_1 \times \cdots \times A_{i-1} \times A_{i+1} \times \cdots \times A_n$ denote the action sets of all players excluding player $i$. We will denote a game $G$ by the tuple $G = (N, \{A_i\}_{i \in N}, \{U_i\}_{i \in N})$; for brevity, we refer to $G$ by $G = (N, \{A_i\}, \{U_i\})$.

Definition 1: A game $G$ is a semi-anonymous potential game if there exists a partition $\bar{N} = (N_1, N_2, \ldots, N_m)$ of $N$ such that the following three conditions are satisfied:

(i) For any population $N_i \in \bar{N}$ and agents $i, j \in N_i$ we have $A_i = A_j$. Accordingly, we say population $N_i$ has action set $A_{N_i} = \{a_{N_i}^1, a_{N_i}^2, \ldots, a_{N_i}^{s_i}\}$, where $s_i$ denotes the number of actions available to population $N_i$.

(ii) The utility function of any agent $i \in N$ can be expressed as a lower-dimensional function of the form $U_i : A_i \times X \to \mathbb{R}$, where $X = X_1 \times \cdots \times X_m$ and
$$X_j = \left\{ \left( \frac{v_j^1}{n}, \frac{v_j^2}{n}, \ldots, \frac{v_j^{s_j}}{n} \right) : \sum_{k=1}^{s_j} v_j^k = |N_j| \right\}$$
for any $j \in \{1, \ldots, m\}$, with each $v_j^k$ a nonnegative integer.

(iii) There exists a potential function $\phi : X \to \mathbb{R}$ such that for any $a \in A$ and any agent $i \in N$ with action $a_i' \in A_i$,
$$U_i(a) - U_i(a_i', a_{-i}) = \phi(x(a)) - \phi(x(a_i', a_{-i})), \qquad (1)$$
where $x : A \to X$ captures each population's aggregate behavior in $a$. More specifically, $x(a) = (x_1(a), x_2(a), \ldots, x_m(a))$ where, for any $j \in \{1, \ldots, m\}$,
$$x_j(a) = \frac{1}{n} \Big( \big| \{ i \in N_j : a_i = a_{N_j}^k \} \big| \Big)_{k = 1, \ldots, s_j}. \qquad (2)$$

If each agent $i \in N$ is alone in its respective partition, the definition of semi-anonymous potential games is equivalent to that of exact potential games in [17].

Example 1: A congestion game is an example of a semi-anonymous potential game. Consider a congestion game with players $N = \{1, \ldots, n\}$ and roads $R = \{r_1, r_2, \ldots, r_k\}$. Road $r \in R$ has congestion function $C_r : \mathbb{Z}_+ \to \mathbb{R}$, where $C_r(k)$ is the congestion on road $r$ when $k$ users select it. Player $i$'s action set is $A_i \subseteq 2^R$, and he selects a set of roads in $A_i$ to minimize his cost
$$J_i(a_i, a_{-i}) = \sum_{r \in a_i} C_r(|\{a\}_r|),$$
where $\{a\}_r$ is the set of players choosing road $r \in R$ under joint action profile $a$. This game has potential function $\phi : A \to \mathbb{R}$,
$$\phi(a) = \sum_{r \in R} \sum_{k=1}^{|\{a\}_r|} C_r(k). \qquad (3)$$
When player action sets are asymmetric, a congestion game is a semi-anonymous potential game. In the case where player action sets are symmetric, the congestion game is an anonymous potential game.
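As a quick sanity check of Example 1, the following sketch (a toy instance of our own choosing, with singleton actions, not from the paper) verifies the potential property: a unilateral deviation changes a player's cost by exactly the change in Rosenthal's potential (3).

```python
import itertools

# Toy congestion game: 3 players, 3 roads, each action a single road.
C = {"r1": lambda k: k, "r2": lambda k: 2 * k, "r3": lambda k: k ** 2}
roads = list(C)

def cost(i, a):
    """Player i's congestion cost under joint profile a (singleton actions)."""
    load = sum(1 for aj in a if aj == a[i])
    return C[a[i]](load)

def phi(a):
    """Rosenthal potential: for each road, C_r(1) + ... + C_r(load_r)."""
    return sum(sum(C[r](k) for k in range(1, list(a).count(r) + 1)) for r in roads)

# Check the potential property for every profile and unilateral deviation.
for a in itertools.product(roads, repeat=3):
    for i in range(3):
        for dev in roads:
            b = list(a); b[i] = dev; b = tuple(b)
            assert cost(i, a) - cost(i, b) == phi(a) - phi(b)
print("potential property verified")
```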
III. MAIN RESULTS

Consider the following algorithm, termed modified log-linear learning, which is a mild variant of the algorithm introduced in [22]. Let $a(t)$ represent the action profile at time $t \geq 0$. For any population $N_j \in \bar{N}$, each agent $i \in N_j$ has a Poisson clock of rate $\alpha n / z_j^i(t)$, where $z_j^i(t) = |\{k \in N_j : a_k(t) = a_i(t)\}|$; hence, a player has a higher update rate if the player is not using a common action. If player $i$'s clock ticks, he chooses action $a_i' \in A_i = A_{N_j}$ probabilistically according to
$$\text{Prob}[a_i(t^+) = a_i'] = \frac{e^{\beta U_i(a_i',\, a_{-i}(t))}}{\sum_{a_i'' \in A_i} e^{\beta U_i(a_i'',\, a_{-i}(t))}} = \frac{e^{\beta \phi(x(a_i',\, a_{-i}(t)))}}{\sum_{a_i'' \in A_i} e^{\beta \phi(x(a_i'',\, a_{-i}(t)))}}, \qquad (4)$$
where $a_i(t^+)$ indicates the player's new action choice after the update, and the parameter $\beta$ dictates how likely a player is to choose a high payoff action: as $\beta \to \infty$, payoff maximizing actions are chosen, and as $\beta \to 0$, players choose from their action sets with uniform probability. Accordingly, the new action profile is of the form $a(t^+) = (a_i(t^+), a_{-i}(t))$.

Update rates are the only variation between this algorithm and the continuous time implementation of standard logit response. In the standard logit response dynamics, all agents have a fixed Poisson clock of rate 1, which yields $n$ expected updates per second.
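The revision process above can be simulated event by event. The following illustrative sketch is ours (names such as `phi_of_profile` are placeholders, not from the paper): it samples which agent's clock ticks next using the rates $\alpha n / z$, then resamples that agent's action according to (4).

```python
import math, random

def mll_step(actions, pops, pop_actions, phi_of_profile, alpha, beta):
    """One revision event of modified log-linear learning; returns elapsed time.

    actions: list, actions[i] is agent i's current action
    pops: list, pops[i] is agent i's population label
    pop_actions: dict mapping a population label to its common action set
    """
    n = len(actions)
    # Each agent's clock rate is alpha*n / z, where z counts same-population
    # agents currently playing the same action as the agent (z >= 1 always).
    rates = []
    for i in range(n):
        z = sum(1 for k in range(n) if pops[k] == pops[i] and actions[k] == actions[i])
        rates.append(alpha * n / z)
    dt = random.expovariate(sum(rates))             # time until the next tick
    i = random.choices(range(n), weights=rates)[0]  # whose clock ticks
    # Resample agent i's action with the probabilities from Eq. (4).
    cands = pop_actions[pops[i]]
    weights = []
    for b in cands:
        trial = actions[:i] + [b] + actions[i + 1:]
        weights.append(math.exp(beta * phi_of_profile(trial)))
    actions[i] = random.choices(cands, weights=weights)[0]
    return dt
```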

The expected number of updates per second for the modified log-linear learning algorithm is lower bounded by $m \alpha n$ and upper bounded by $\sigma \alpha n$; hence, the number of updates is on the order of the number of updates in the standard logit response dynamics for any given $\alpha$. To achieve an expected update rate which is at least as fast as the standard log-linear learning update rate, we may assume that $\alpha \geq 1/m$, so that the expected number of updates per second is at least $n$. Note that for any $\alpha$, the proposed dynamics give rise to an ergodic, reversible Markov process.

The following theorem extends the results of [22] to semi-anonymous potential games.

Theorem 1: Let $G = (N, \{A_i\}, \{U_i\})$ be a semi-anonymous limited population potential game with populations $\{N_1, N_2, \ldots, N_m\}$. Define $n_i := |N_i|$, $\sigma := \sum_{i=1}^m |A_{N_i}|$, and $s := \max_{i \in \{1, \ldots, m\}} |A_{N_i}|$. Let the potential function $\phi$ be such that $\phi(x) \in [0, 1]$ for all $x \in X$. Further, let $\phi$ be $\lambda$-Lipschitz, i.e.,
$$|\phi(x) - \phi(y)| \leq \lambda \|x - y\|, \quad \forall x, y \in X, \qquad (5)$$
and let $m + \sum_{i=1}^m n_i \geq \sigma$. For any $\epsilon \in (0, 1)$ and any initial state, under modified log-linear learning, if
$$\beta \geq \max\left\{ \frac{4m(s-1)}{\epsilon} \log(ms),\ \frac{4m(s-1)}{\epsilon} \log\frac{8ms\lambda}{\epsilon} \right\} \qquad (6)$$
then
$$E[\phi(x(a(t)))] \geq \sup_{x \in X} \phi(x) - \epsilon \qquad (7)$$
for all
$$t \geq \frac{m s^2 c_2\, e^{3\beta}\, m(m(s-1))^2\, n}{4\alpha} \left( \log\log (n+1)^{ms-m} + \log\beta + \log\frac{1}{\epsilon} \right), \qquad (8)$$
where $c_2$ is a constant that depends only on $s$. The proof of Theorem 1 is provided in Appendix D.

The results in [22] focus on a special class of semi-anonymous potential games with a single population, i.e., $m = 1$. Consequently, the bounds given in [22] are better than the bounds provided in Theorem 1, owing to the fact that the aggregate state space $X$ grows significantly with the number of populations. The impact of this larger state space is realized in the dependence of the mixing time bound on the number of populations $m$, both directly and indirectly through the requirement on the parameter $\beta$ in Equation (6). However, for a fixed number of populations, the mixing time grows as $n \log\log n$.

IV. ILLUSTRATIVE EXAMPLE

We present an example which supports our theorem that, as $n$ increases, the time to convergence in modified log-linear learning grows as $\Theta(n \log\log n)$. Consider a resource allocation game with resources $R = \{r_1, r_2, r_3\}$ and agents $N = \{1, 2, \ldots, n\}$ distributed into two populations, $N_1$ and $N_2$. Let $n$ be even and suppose each population contains $n/2$ agents. A player in population $N_1$ may choose one resource from $\{r_1, r_2\}$, and a player in population $N_2$ may choose one resource from $\{r_2, r_3\}$. A state $x$ can be represented as a tuple of the form $x = (x_1, x_2, x_3, x_4)$, where $n x_1$ and $n x_2$ are the numbers of players from $N_1$ choosing resources $r_1$ and $r_2$ respectively. Likewise, $n x_3$ and $n x_4$ are the numbers of players from $N_2$ choosing resources $r_2$ and $r_3$ respectively. For $x = (x_1, x_2, x_3, x_4)$, the welfare for each resource in $R$ is given by
$$W_1(x) = \frac{e^{x_1}}{2e}, \qquad W_2(x) = \frac{e^{2x_2 + 2x_3}}{2e^2}, \qquad W_3(x) = \frac{e^{1.5 x_4}}{2e},$$
and the system welfare for state $x$ is given by
$$W(x) = W_1(x) + W_2(x) + W_3(x). \qquad (9)$$
By defining each agent's utility as its marginal contribution to the system welfare, i.e.,
$$U_i(a_i, a_{-i}) = W\big(x(a_i, a_{-i})\big) - W\big(x(\emptyset, a_{-i})\big),$$
where $\emptyset$ denotes the null action in which player $i$ selects no resource, the resulting game will be a semi-anonymous potential game with potential function $W$. It is straightforward to verify that the allocation that optimizes the system welfare is $a_i = r_2$ for all $i \in N$, i.e., $x^{\text{opt}} = (0, 1/2, 1/2, 0)$.

We simulated our proposed algorithm with rate $\alpha = 1/\sigma$, starting from an inefficient Nash equilibrium $x^{\text{ne}} = (1/2, 0, 0, 1/2)$.
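The optimality and equilibrium claims above are easy to test numerically. The sketch below is ours and assumes the welfare functions as written above (reconstructed from the garbled source); it enumerates all states for a small $n$, confirms $x^{\text{opt}}$ maximizes $W$, and checks that no single agent can improve its marginal-contribution utility at $x^{\text{ne}}$.

```python
import math

# Welfare as reconstructed above; states are (x1, x2, x3, x4) with
# x1 + x2 = x3 + x4 = 1/2 (two populations of n/2 agents each).
n = 8
W = lambda x1, x2, x3, x4: (math.exp(x1) / (2 * math.e)
                            + math.exp(2 * x2 + 2 * x3) / (2 * math.e ** 2)
                            + math.exp(1.5 * x4) / (2 * math.e))

states = [(k1 / n, (n // 2 - k1) / n, k2 / n, (n // 2 - k2) / n)
          for k1 in range(n // 2 + 1) for k2 in range(n // 2 + 1)]
print(max(states, key=lambda x: W(*x)))   # -> (0.0, 0.5, 0.5, 0.0) = x_opt

# Check x_ne = (1/2, 0, 0, 1/2) is a Nash equilibrium under marginal-
# contribution utilities: no single agent gains by switching resources.
x = (0.5, 0.0, 0.0, 0.5)
stay_N1 = W(*x) - W(x[0] - 1 / n, x[1], x[2], x[3])            # agent on r1
move_N1 = W(x[0] - 1 / n, x[1] + 1 / n, x[2], x[3]) - W(x[0] - 1 / n, x[1], x[2], x[3])
stay_N2 = W(*x) - W(x[0], x[1], x[2], x[3] - 1 / n)            # agent on r3
move_N2 = W(x[0], x[1], x[2] + 1 / n, x[3] - 1 / n) - W(x[0], x[1], x[2], x[3] - 1 / n)
assert stay_N1 >= move_N1 and stay_N2 >= move_N2
```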
Using Theorem 1, we examine the time it takes for the total variation distance between the empirical frequency and the stationary distribution to drop below $\epsilon/2 = 0.05$, i.e., the smallest $t$ such that
$$\|\nu(t) - \pi\|_{TV} := \frac{1}{2} \sum_{x \in X} |\nu_x(t) - \pi_x| \leq 0.05, \qquad (10)$$
where the empirical frequency for a given state $x$ is defined by
$$\nu_x(t) = \frac{1}{t} \int_0^t \mathbb{I}\{x(\tau) = x\}\, d\tau.$$
The indicator function $\mathbb{I}$ returns 1 if the state $x(\tau)$ is $x$ and 0 otherwise, so that the component $\nu_x(t)$ is the fraction of time from $\tau = 0$ to $\tau = t$ that the system has been in state $x$. Setting $\beta = 50$ ensures that $E_\pi[W(x)] \geq \max_x W(x) - \epsilon/2$, so that, when $\|\nu(t) - \pi\|_{TV} \leq \epsilon/2$, we have $E_{\nu(t)}[W(x)] \geq \max_x W(x) - \epsilon$ per Theorem 1. Simulation results are shown in Figure 1 for an average over 100 simulations with $n$ ranging from 4 to 100. Average convergence times are bounded below by $7 n \log\log n$ and above by $100\, n \log\log n$, with the constant in the upper bound decreasing for larger $n$. These results support our main theorem.
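The convergence metric (10) can be sketched in code as follows. This is our own illustration (discrete revision events standing in for the continuous-time clock; `step` and `state_index` are hypothetical helpers, not from the paper): it accumulates the empirical frequency and reports when its total variation distance to the Gibbs distribution $\pi_x \propto e^{\beta \phi(x)}$ falls below the target.

```python
import numpy as np

def tv_distance(nu, pi):
    """Total variation distance between two distributions, per Eq. (10)."""
    return 0.5 * np.abs(nu - pi).sum()

def time_to_mix(step, state_index, pi, x0, target=0.05, max_steps=10**6):
    """Run the chain from x0 until the empirical frequency is TV-close to pi."""
    counts = np.zeros(len(pi))
    state = x0
    for t in range(1, max_steps + 1):
        state = step(state)                 # one modified log-linear update
        counts[state_index(state)] += 1
        if t % 1000 == 0 and tv_distance(counts / t, pi) <= target:
            return t
    return None
```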

Fig. 1: Average convergence times with respect to the number of players. The figure plots the average convergence time over 100 simulations (time to reach within 0.05 of the stationary distribution) against the number of players $n$, together with the curves $7 n \log\log n$ and $100\, n \log\log n$.

V. CONCLUSION

We have extended the results of [22] to define dynamics for a class of semi-anonymous limited population potential games, whose player utility functions may be written as functions of aggregate behavior within each population. For games with a small number of fixed allowable actions and a small number of fixed populations, the time it takes to come arbitrarily close to a potential function maximizer is roughly linear in the number of players.

REFERENCES

[1] C. Alós-Ferrer and N. Netzer. The logit-response dynamics. Games and Economic Behavior, 68(2):413-427, 2010.
[2] G. Arslan, J. R. Marden, and J. S. Shamma. Autonomous vehicle-target assignment: a game theoretical formulation. ASME Journal of Dynamic Systems, Measurement and Control, 129(5):584-596, 2007.
[3] L. E. Blume. The statistical mechanics of strategic interaction. Games and Economic Behavior, 1993.
[4] G. Ellison. Basins of attraction, long-run stochastic stability, and the speed of step-by-step evolution. The Review of Economic Studies, 2000.
[5] D. P. Foster and H. P. Young. On the nonconvergence of fictitious play in coordination games. Games and Economic Behavior, 25(1):79-96, October 1998.
[6] M. J. Fox and J. S. Shamma. Communication, convergence, and stochastic stability in self-assembly. In 49th IEEE Conference on Decision and Control (CDC), December 2010.
[7] A. Frieze and R. Kannan. Log-Sobolev inequalities and sampling from log-concave distributions. The Annals of Applied Probability, 1998.
[8] T. Goto, T. Hatanaka, and M. Fujita. Potential game theoretic attitude coordination on the circle: Synchronization and balanced circular formation. In 2010 IEEE International Symposium on Intelligent Control, pages 1314-1319, September 2010.
[9] G. Grimmett and D. Stirzaker. Probability and Random Processes. Oxford University Press, New York, 3rd edition, 2001.
[10] S. Hart and Y. Mansour. How long to equilibrium? The communication complexity of uncoupled equilibrium procedures. Games and Economic Behavior, 69:107-126, May 2010.
[11] S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127-1150, 2000.
[12] M. Kandori, G. J. Mailath, and R. Rob. Learning, mutation, and long run equilibria in games. Econometrica, 61(1):29-56, 1993.
[13] G. E. Kreindler and H. P. Young. Fast convergence in evolutionary equilibrium selection. 2011.
[14] J. R. Marden, G. Arslan, and J. S. Shamma. Joint strategy fictitious play with inertia for potential games. IEEE Transactions on Automatic Control, 54(2):208-220, 2009.
[15] J. R. Marden and A. Wierman. Distributed welfare games. Operations Research, 2008.
[16] D. Monderer and L. S. Shapley. Fictitious play property for games with identical interests. Journal of Economic Theory, 68:258-265, 1996.
[17] D. Monderer and L. S. Shapley. Potential games. Games and Economic Behavior, 14:124-143, 1996.
[18] A. Montanari and A. Saberi. The spread of innovations in social networks. Proceedings of the National Academy of Sciences, 107(47):20196-20201, 2010.
[19] R. Montenegro and P. Tetali. Mathematical Aspects of Mixing Times in Markov Chains, volume 1. Now Publishers, 2006.
[20] J. R. Norris. Markov Chains. Cambridge University Press, Cambridge, 1997.
[21] R. W. Rosenthal. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2:65-67, 1973.
[22] D. Shah and J. Shin. Dynamics in congestion games. In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2010.
[23] M. Staudigl. Stochastic stability in asymmetric binary choice coordination games. Games and Economic Behavior, 75(1):372-401, May 2012.
[24] D. H. Wolpert and K. Tumer. Optimal payoff functions for members of collectives. Advances in Complex Systems, 4:265-279, 2001.
[25] M. Zhu and S. Martinez. Distributed coverage games for mobile visual sensors (I): Reaching the set of Nash equilibria. In Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, 2009.

APPENDIX A
Markov chain preliminaries

The following Markov chain preliminaries are taken from [20], [22]. Let $\{X_k : k \in \mathbb{N}\}$ be a discrete time Markov chain over finite state space $\Omega$ with probability transition matrix $M$, so that the distribution of $X_k$ at time $k \in \mathbb{N}$ satisfies
$$\mu(k) = \mu(0) M^k. \qquad (11)$$
A measure $\pi$ on $\Omega$ is called a stationary distribution of $M$ if $\pi_x \geq 0$ for all $x \in \Omega$, $\sum_{x \in \Omega} \pi_x = 1$, and
$$\pi = \pi M. \qquad (12)$$
A Markov chain with transition matrix $M$ is irreducible if, for all $x, y \in \Omega$, there exists $k \geq 0$ such that $M^k(x, y) > 0$. The chain is aperiodic if, for all $x \in \Omega$, $\gcd\{k : M^k(x, x) > 0\} = 1$. An irreducible and aperiodic Markov chain has a unique stationary distribution $\pi$, such that $\lim_{k \to \infty} \mu(0) M^k = \pi$ for any initial distribution $\mu(0)$. Finally, $M$ is reversible with respect to distribution $\pi$ if $\pi_x M(x, y) = \pi_y M(y, x)$ for all $x, y \in \Omega$. If $M$ is reversible with respect to $\pi$, then $\pi$ is a stationary distribution for $M$ [9].

A continuous time Markov chain $\{X_t\}_{t \geq 0}$ over finite state space $\Omega$ may be written in terms of a corresponding discrete time chain with transition matrix $M$ [9]:
$$\mu(t) = \mu(0)\, e^{t(M - I)} = \mu(0)\, e^{-t} \sum_{k \geq 0} \frac{t^k M^k}{k!}, \quad t \geq 0. \qquad (13)$$
We refer to the matrix $M$ as the kernel of the process $X_t$.
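To see Equation (13) in action numerically, here is a small illustrative sketch (a toy 3-state kernel of our own choosing, not from the paper), evolving $\mu(t) = \mu(0) e^{t(M - I)}$ with a matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

M = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])   # discrete-time kernel of the process
mu0 = np.array([1.0, 0.0, 0.0])      # initial distribution

for t in (0.0, 1.0, 10.0, 100.0):
    mut = mu0 @ expm(t * (M - np.eye(3)))
    print(t, np.round(mut, 4))       # converges to the stationary (0.25, 0.5, 0.25)
```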

The following definitions and theorems regarding distances between measures and mixing times are taken from [19], [22]. Let $\mu, \nu$ be measures on finite state space $\Omega$. Total variation distance is defined as
$$\|\mu - \nu\|_{TV} := \frac{1}{2} \sum_{x \in \Omega} |\mu_x - \nu_x|, \qquad (14)$$
and the relative entropy (Kullback-Leibler divergence) is defined as
$$D(\mu : \nu) := \sum_{x \in \Omega} \mu_x \log \frac{\mu_x}{\nu_x}. \qquad (15)$$
Then
$$D(\mu : \nu) \geq 2\, \|\mu - \nu\|_{TV}^2. \qquad (16)$$
For a continuous time Markov chain with kernel $M$ and stationary distribution $\pi$, the distribution $\mu(t)$ obeys
$$D(\mu(t) : \pi) \leq e^{-4 t \rho(M)}\, D(\mu(0) : \pi), \quad t \geq 0, \qquad (17)$$
where $\rho(M)$ is the Sobolev constant of $M$, defined by
$$\rho(M) := \inf_{f : \Omega \to \mathbb{R},\ L(f) \neq 0} \frac{E(f, f)}{L(f)} \qquad (18)$$
with
$$E(f, f) := \frac{1}{2} \sum_{x, y \in \Omega} (f(x) - f(y))^2\, M(x, y)\, \pi_x \qquad (19)$$
$$L(f) := E_\pi\!\left[ f^2 \log \frac{f^2}{E_\pi[f^2]} \right]. \qquad (20)$$
Here $E_\pi$ denotes the expectation with respect to distribution $\pi$. Then, for a Markov chain with initial distribution $\mu(0)$ and stationary distribution $\pi$, the total variation and relative entropy mixing times are
$$T_{TV}(\epsilon) := \min\{ t : \|\mu(t) - \pi\|_{TV} \leq \epsilon \} \qquad (21)$$
$$T_D(\epsilon) := \min\{ t : D(\mu(t) : \pi) \leq \epsilon \} \qquad (22)$$
respectively. From [19], Corollary 2.6 and the remark following it,
$$T_D(\epsilon) \leq \frac{1}{4 \rho(M)} \left( \log\log \frac{1}{\pi_{\min}} + \log \frac{1}{\epsilon} \right), \qquad (23)$$
where $\pi_{\min} := \min_{x \in \Omega} \pi_x$. Applying (16),
$$T_{TV}(\epsilon) \leq T_D(2\epsilon^2) \leq \frac{1}{4 \rho(M)} \left( \log\log \frac{1}{\pi_{\min}} + \log \frac{1}{2\epsilon^2} \right). \qquad (24)$$
Hence, a lower bound on the Sobolev constant $\rho(M)$ provides an upper bound on the mixing time for the Markov chain.

APPENDIX B
Markov chain defined by modified log-linear learning

The following Markov chain $M$ over state space $X$ is the kernel of the continuous time modified log-linear learning process. Let $e_i^j \in \mathbb{R}^{s_i}$ be the $j$th standard basis vector of length $s_i$, for $j \in \{1, \ldots, s_i\}$. From state $x = (x_{N_i}, x_{-N_i})$, transition to the next state as follows:

- Choose a population $N_i \in \{N_1, N_2, \ldots, N_m\}$ with probability $s_i / \sigma$.
- Choose an action $a_{N_i}^j \in A_{N_i}$ with probability $1 / s_i$.
- If $x_{N_i}^j > 0$, choose a player $p \in \{p \in N_i : a_p = a_{N_i}^j\}$ uniformly at random to update according to (4), i.e., transition to $(x_{N_i} + \frac{1}{n}(e_i^k - e_i^j),\, x_{-N_i})$ with probability
$$\frac{e^{\beta \phi(x_{N_i} + \frac{1}{n}(e_i^k - e_i^j),\ x_{-N_i})}}{\sum_{l=1}^{s_i} e^{\beta \phi(x_{N_i} + \frac{1}{n}(e_i^l - e_i^j),\ x_{-N_i})}}$$
for each $k \in \{1, 2, \ldots, s_i\}$.

Using this to define transition probabilities in $M$: for any allowable transition from state $x = (x_{N_i}, x_{-N_i})$ to state $y = (x_{N_i} + \frac{1}{n}(e_i^k - e_i^j),\, x_{-N_i})$ in which a player from population $N_i$ updates his action,
$$M(x, y) = \frac{s_i}{\sigma} \cdot \frac{1}{s_i} \cdot \frac{e^{\beta \phi(y)}}{\sum_{l=1}^{s_i} e^{\beta \phi(x_{N_i} + \frac{1}{n}(e_i^l - e_i^j),\ x_{-N_i})}} = \frac{1}{\sigma} \cdot \frac{e^{\beta \phi(y)}}{\sum_{l=1}^{s_i} e^{\beta \phi(x_{N_i} + \frac{1}{n}(e_i^l - e_i^j),\ x_{-N_i})}}. \qquad (25)$$
For a transition of any other form, $M(x, y) = 0$. Applying (13) to the chain with kernel $M$ and global clock rate $\alpha \sigma n$, modified log-linear learning evolves as
$$\mu(t) = \mu(0)\, e^{\alpha \sigma n t (M - I)}. \qquad (26)$$

APPENDIX C
Theorem 1 supporting lemmas

To prove Theorem 1 we require three supporting lemmas. With Lemma 1 we accomplish two tasks. First, we establish the stationary distribution for modified log-linear learning as a function of $\beta$. Second, we characterize how large $\beta$ must be to ensure that the expected value of the potential function is within $\epsilon/2$ of the potential function maximum. Lemma 2 establishes a lower bound on the Sobolev constant for the Markov chain $M$; a lower bound for the Sobolev constant associated with a given Markov chain provides an upper bound on its mixing time, as in Equation (24). Finally, Lemma 3 bounds the time for the continuous time modified log-linear learning process to come within $\epsilon/2$ of its stationary distribution. The lemmas and method of proof for Theorem 1 follow the structure of the supporting lemmas and proof for Theorem 3 in [22].

Lemma 1: For the potential game $G = (N, \{A_i\}, \{U_i\})$ described in Section II with state space $X$ and potential function $\phi : X \to \mathbb{R}$, the stationary distribution under modified log-linear learning is
$$\pi_x = \frac{e^{\beta \phi(x)}}{\sum_{z \in X} e^{\beta \phi(z)}}, \quad x \in X. \qquad (27)$$
Moreover, if $\phi$ is $\lambda$-Lipschitz, i.e., $|\phi(x) - \phi(y)| \leq \lambda \|x - y\|$ for all $x, y \in X$, and
$$\beta \geq \max\left\{ \frac{4m(s-1)}{\epsilon} \log(ms),\ \frac{4m(s-1)}{\epsilon} \log\frac{8ms\lambda}{\epsilon} \right\}, \qquad (28)$$
then
$$E_\pi[\phi(x)] \geq \sup_{x \in X} \phi(x) - \epsilon/2. \qquad (29)$$
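The reversibility claim in Lemma 1 is easy to test numerically. The sketch below (a toy single-population instance of our own choosing) builds the Appendix B kernel for $m = 1$ and confirms that $\pi_x \propto e^{\beta \phi(x)}$ satisfies $\pi = \pi M$.

```python
import itertools, math
import numpy as np

n, s, beta = 4, 3, 2.0
states = [x for x in itertools.product(range(n + 1), repeat=s) if sum(x) == n]
idx = {x: i for i, x in enumerate(states)}
phi = lambda x: (x[0] / n) ** 2          # an arbitrary potential on aggregates

M = np.zeros((len(states), len(states)))
for x in states:
    for j in range(s):                    # action j chosen w.p. 1/s (sigma = s here)
        if x[j] == 0:
            M[idx[x], idx[x]] += 1 / s    # no mover available: stay put
            continue
        nbrs = [tuple(v + (k == l) - (j == l) for l, v in enumerate(x)) for k in range(s)]
        w = np.array([math.exp(beta * phi(y)) for y in nbrs])
        for y, p in zip(nbrs, w / w.sum()):
            M[idx[x], idx[y]] += p / s    # move one agent from action j to k

pi = np.array([math.exp(beta * phi(x)) for x in states]); pi /= pi.sum()
print(np.allclose(pi @ M, pi))            # True: pi is stationary
```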

6 Moreover, if φ is λ-lipschitz, i.e., φ(x φ(y λ x y, x, y, and j 4m(s β max log ms, then 4m(s log 8msλ ff (8 E π [φ(x] sup φ(x /. (9 x The proof of this lemma follows closely with the structure of the proof in [] of Lemma 6. Proof: Using Equation (5 with π x e βφ(x z eβφ(z, we see that π x M(x, y π y M(y, x for all x, y. Hence M is reversible with distribution π, implying that π is its stationary distribution. For the second part of the proof, define the following: L β E π[φ(x], C β x e βφ(x, x arg max φ(x, B(x, δ {x : x x δ} x where δ [0, ] is constant. Because π is of exponential form with normalization factor C β, the derivative of log C β with respect to β is L β. Moreover, L β is monotonically increasing in β; thus we may use the Mean Value Theorem as follows: L β β (log C β log C 0 β log C β β log x eβφ(x φ(x + β log x eβ(φ(x φ(x (a φ(x + β log x B(x,δ e βδλ φ(x + β log B(x, δ e βδλ φ(x δλ + ( B(x β log, δ where (a is from the fact that φ is λ-lipschitz and the definition of B(x, δ. Using intermediate results in the proof of Lemma 6 of [], B(x, δ and are bounded as follows: m ( si B(x δ(ni +, δ, and (30 ms i i m (n i + si i To see the lower bound for B(x, δ, begin with state x (x N i, x N i and perform the following steps to obtain a new state, x (x Ni, x N i. Step : For each i {,,..., m}, j {,,... s i } either 0 x j N i + δ(k+ n ms i n i /n for all k {0,,..., n i }. or 0 x j N i δ(k+ n ms i n i /n for all k {0,,..., n i }. Step : Performing operations satisfying step for each N i and each j {,,..., s i } and then adding or subtracting an appropriate to each x si N i so that s i k xk N i n i /n results in a state x satisfying x x δ. ( si The term δ(ni+ ms i counts the number of ways to do steps and within a single population N i. The product in (30 counts the number of ways to perform any variation of the above steps over all populations {N, N,..., N m }. This counting argument provides a lower bound on the size of B(x, δ as it clearly does not account for all states within the set. Now, L β φ(x δλ + ( B(x β log, δ si ( m δ(ni+ φ(x δλ + β log i ms i m i (n i + si φ(x δλ + m β log ( δ ms i φ(x δλ + i m(s β log si ( δ ms Consider two cases: (i λ /4, and (ii λ > /4. For case (i, choose δ and let β 4m(s log ms. Then, ( L β φ(x m(s δ δλ + log β ms φ(x m(s /4 log ms β In [], the size of the set B(x, δ for a single population is bounded by (δ(n + /s s, which we extend to m populations using the product above. Similarly, the size of the single population state space is bounded by (n+ s, which we extend to the multiple population state space.

For case (i), choose $\delta = 1$ and let $\beta \geq \frac{4m(s-1)}{\epsilon} \log(ms)$. Then
$$L_\beta \geq \phi(x^*) - \delta\lambda - \frac{m(s-1)}{\beta} \log \frac{ms}{\delta} \geq \phi(x^*) - \frac{\epsilon}{4} - \frac{m(s-1)\, \epsilon \log(ms)}{4m(s-1) \log(ms)} = \phi(x^*) - \frac{\epsilon}{2}.$$
For case (ii), note that $\lambda > \epsilon/4$ implies $\epsilon/(4\lambda) < 1$, so we may choose $\delta = \epsilon/(4\lambda)$. Let $\beta \geq \frac{4m(s-1)}{\epsilon} \log \frac{8\lambda m s}{\epsilon}$. Then
$$L_\beta \geq \phi(x^*) - \delta\lambda - \frac{m(s-1)}{\beta} \log \frac{ms}{\delta} = \phi(x^*) - \frac{\epsilon}{4} - \frac{m(s-1)}{\beta} \log \frac{4\lambda ms}{\epsilon} \geq \phi(x^*) - \frac{\epsilon}{4} - \frac{\epsilon}{4} = \phi(x^*) - \frac{\epsilon}{2},$$
so that $E_\pi[\phi(x)] \geq \sup_{x \in X} \phi(x) - \epsilon/2$ as desired.

Lemma 2: For the Markov chain $M$ defined in Appendix B, if $\phi : X \to [0, 1]$ and $m + \sum_{i=1}^m n_i \geq \sigma$, then
$$\rho(M) \geq \frac{e^{-3\beta}}{c_1\, m (m(s-1))^2\, n^2}$$
for some constant $c_1$ which depends only on $s$.

Proof: Using the technique of [22], we compare the Sobolev constants for the chain $M$ and a similar random walk on a convex set. We follow these steps:

Step 1: Let $M^1$ be the Markov chain $M$ as defined in Appendix B with $\beta = 0$. Let $\pi^1$ be the stationary distribution of $M^1$. We use comparison techniques to lower bound the Sobolev constant $\rho(M)$ in terms of $\rho(M^1)$, obtaining the bound $\rho(M) \geq e^{-3\beta} \rho(M^1)$.

Step 2: Define $M^2$ by restricting transitions within $M^1$ to movement to and from certain holding actions. We use comparison techniques to lower bound $\rho(M^1)$ in terms of $\rho(M^2)$, obtaining the bound $\rho(M^1) \geq \rho(M^2)/s^2$.

Step 3: We consider a random walk over a convex set $C$ which is equivalent to $M^2$. For the random walk, we lower bound $E(f, f)$ and upper bound $L(f)$ for any function $f : C \to \mathbb{R}$. This yields a lower bound on its Sobolev constant per (18), and hence on $\rho(M^2)$.

Step 4: Finally, we combine the string of inequalities to obtain a bound on the Sobolev constant $\rho(M)$:
$$\rho(M) \geq e^{-3\beta} \rho(M^1) \geq \frac{e^{-3\beta}}{s^2} \rho(M^2) \geq \frac{e^{-3\beta}}{A s^2\, 2^{\sigma - m} (\sigma - m) \sum_{i=1}^m n_i^2} \geq \frac{e^{-3\beta}}{c_1\, m (m(s-1))^2\, n^2},$$
where $A$ is a constant, and $c_1$ is a constant depending only on $s$.

Step 1 ($M$ to $M^1$): Note that in $M^1$ an updating agent chooses his next action uniformly at random. Per Equation (27) with $\beta = 0$, the stationary distribution $\pi^1$ of $M^1$ is the uniform distribution over $X$. Let $x, y \in X$. We bound $\pi_x / \pi^1_x$ and $M(x, y)/M^1(x, y)$ in order to use Corollary 3.5 in [19]: since $\phi(x) \in [0, 1]$ for all $x \in X$,
$$e^{-\beta} \leq \frac{\pi_x}{\pi^1_x} = \frac{e^{\beta\phi(x)}\, |X|}{\sum_{y} e^{\beta\phi(y)}} \leq e^{\beta}. \qquad (31)$$
Similarly, for $y = (x_{N_i} + \frac{1}{n}(e_i^k - e_i^j),\, x_{-N_i})$,
$$e^{-\beta} \leq \frac{M(x, y)}{M^1(x, y)} = \frac{s_i\, e^{\beta\phi(y)}}{\sum_{l=1}^{s_i} e^{\beta\phi(x_{N_i} + \frac{1}{n}(e_i^l - e_i^j),\ x_{-N_i})}} \leq e^{\beta}. \qquad (32)$$
For a transition to any $y$ not of the form above, $M(x, y) = M^1(x, y) = 0$. Using this fact and Equations (31) and (32), we apply Corollary 3.5 in [19] to lower bound $\rho(M)$:
$$\rho(M) \geq e^{-3\beta} \rho(M^1). \qquad (33)$$

Step 2 ($M^1$ to $M^2$): Next, consider the Markov chain $M^2$ on $X$, where transitions from state $x$ to $y$ occur as follows:

- Choose a population $N_i$ with probability $s_i / \sigma$.
- Choose $j \in \{1, \ldots, s_i\}$ and $\kappa \in \{-1, 1\}$, each uniformly at random.
- If $\kappa = 1$ and $x^j_{N_i} > 0$, then $y = (x_{N_i} + \frac{1}{n}(e_i^{s_i} - e_i^j),\, x_{-N_i})$.
- If $\kappa = -1$ and $x^{s_i}_{N_i} > 0$, then $y = (x_{N_i} + \frac{1}{n}(e_i^j - e_i^{s_i}),\, x_{-N_i})$.

That is, for each population $N_i$, action $a^{s_i}_{N_i}$ is treated as a holding bin; allowable transitions in $M^2$ involve a single agent moving into or out of his population's holding bin. Since $M^2(x, y) = M^2(y, x)$ for any $x, y \in X$, $M^2$ is reversible with respect to the uniform distribution over $X$. Hence the stationary distribution is uniform, and $\pi^2 = \pi^1$.

For a transition $x$ to $y$ in which an agent from population $N_i$ changes his action, $M^1(x, y) \geq \frac{1}{s_i} M^2(x, y)$, implying
$$M^1(x, y) \geq \frac{1}{s} M^2(x, y), \quad \forall x, y \in X, \qquad (34)$$
since $s \geq s_i$ for all $i \in \{1, \ldots, m\}$. Using (34) and the fact that $\pi^1 = \pi^2$, we use Corollary 3.5 from [19] to bound $\rho(M^1)$:
$$\rho(M^1) \geq \frac{1}{s^2} \rho(M^2). \qquad (35)$$

Step 3 ($M^2$ to a random walk): The following random walk on
$$C = \left\{ (z_1, \ldots, z_{\sigma - m}) \in \mathbb{Z}_+^{\sigma - m} : \sum_{j=1}^{s_i - 1} z_i^j \leq n_i,\ i = 1, \ldots, m \right\}$$
is equivalent to $M^2$ (the holding bin coordinate of each population is dropped, and the remaining $\sigma - m$ coordinates count the agents playing each non-holding action). For $x \in C$, transition to $y$ as follows: choose $i \in [\sigma - m]$ and $\kappa \in \{-1, 1\}$, each uniformly at random, and set
$$y = \begin{cases} x + \kappa e_i & \text{if } x + \kappa e_i \in C \\ x & \text{otherwise.} \end{cases}$$
The stationary distribution of this random walk is uniform, i.e., $\pi_x = 1/|C|$ for all $x \in C$. Our aim is to lower bound the Sobolev constant of this walk, denoted $\rho(M^3)$, which, using the above steps, provides a lower bound on $\rho(M)$ and hence an upper bound on the mixing time of our modified log-linear learning algorithm. We first recall the definition of the Sobolev constant for the random walk $M^3$:
$$\rho(M^3) := \inf_{f : C \to \mathbb{R},\ L(f) \neq 0} \frac{E(f, f)}{L(f)}. \qquad (36)$$
Let $g : C \to \mathbb{R}$ be an arbitrary function. To lower bound $\rho(M^3)$, we will first lower bound $E(g, g)$ and then upper bound $L(g)$. The ratio of these bounds then serves as a lower bound for the ratio $E(g, g)/L(g)$. Since this bound is established for an arbitrary function from $C$ to $\mathbb{R}$, it lower bounds the infimum over all such functions, i.e., the Sobolev constant.

In the forthcoming analysis, we take advantage of a theorem due to [7], which applies to an extension of a given function $g : C \to \mathbb{R}$ to a function defined over the convex hull of $C$; here we formally define this extension. Let $K$ be the convex hull of $C$. Given $g : C \to \mathbb{R}$, we follow the procedure of [7], [22] to extend $g$ to a function $g_\epsilon : K \to \mathbb{R}$. For $x \in C$, let $C(x)$ and $C(x, \epsilon)$ be the $(\sigma - m)$-dimensional cubes of center $x$ and sides $1$ and $1 - \epsilon$ respectively. For sufficiently small $\epsilon > 0$ and $z \in C(x)$, define $g_\epsilon : K \to \mathbb{R}$ by
$$g_\epsilon(z) = \begin{cases} g(x) & \text{if } z \in C(x, \epsilon) \\ \frac{1}{2}(1 + \eta(z))\, g(x) + \frac{1}{2}(1 - \eta(z))\, g(y) & \text{otherwise,} \end{cases} \qquad (37)$$
where $y \in C$ is a point such that $D = C(x) \cap C(y)$ is the closest face of $C(x)$ to $z$ (if more than one $y$ satisfies this condition, one such point may be chosen arbitrarily), and $\eta(z) = \frac{2}{\epsilon} \mathrm{dist}(z, D) \in [0, 1)$. The dist function represents standard Euclidean distance in $\mathbb{R}^{\sigma - m}$. Define
$$I := \int_K \|\nabla g_\epsilon(z)\|^2\, dz \qquad (38)$$
$$J := \int_K g_\epsilon^2(z) \log \frac{g_\epsilon^2(z)\, \mathrm{vol}(K)}{\int_K g_\epsilon^2(y)\, dy}\, dz. \qquad (39)$$
Applying Theorem 1 of [7] to $K \subset \mathbb{R}^{\sigma - m}$, whose diameter is at most $\big( \sum_{i=1}^m n_i^2 \big)^{1/2}$, if $m + \sum_{i=1}^m n_i \geq \sigma$,
$$I \geq \frac{J}{A \sum_{i=1}^m n_i^2}. \qquad (40)$$
We lower bound $E(g, g)$ in terms of $I$ and then upper bound $L(g)$ in terms of $J$ to obtain a lower bound on their ratio with Equation (40). Since these bounds are derived for an arbitrary function $g : C \to \mathbb{R}$, the desired lower bound on the Sobolev constant follows.

Lower bounding $E(g, g)$ in terms of $I$: the extension $g_\epsilon$ is constant on each inner cube $C(x, \epsilon)$, so the gradient is supported on the boundary regions $C(x) \setminus C(x, \epsilon)$, where by construction $\|\nabla g_\epsilon(z)\| = |g(x) - g(y)|/\epsilon$ for the neighbor $y$ whose shared face is closest to $z$. Each such region has volume at most $\frac{\epsilon}{2}(1 + O(\epsilon))$, so
$$I = \sum_{x \in C} \int_{C(x)} \|\nabla g_\epsilon(z)\|^2\, dz \leq \frac{1 + O(\epsilon)}{2\epsilon} \sum_{\substack{x, y \in C \\ x \sim y}} (g(x) - g(y))^2,$$
where the sum is over ordered pairs of neighboring states.

Since $\pi_x M^3(x, y) = \frac{1}{2(\sigma - m)|C|}$ when $x \sim y$ and $0$ otherwise,
$$E(g, g) = \frac{1}{2} \sum_{x, y} (g(x) - g(y))^2\, M^3(x, y)\, \pi_x = \frac{1}{4(\sigma - m)|C|} \sum_{\substack{x, y \in C \\ x \sim y}} (g(x) - g(y))^2.$$
Combining the two displays, $I \leq \frac{2(\sigma - m)|C|}{\epsilon}(1 + O(\epsilon))\, E(g, g)$, and hence
$$E(g, g) \geq \frac{\epsilon\, I}{2(\sigma - m)|C|}\, (1 - O(\epsilon)). \qquad (41)$$

Before bounding $L(g)$ in terms of $J$, we claim:
$$\mathrm{vol}(K) \geq \frac{|C|}{2^{\sigma - m}}. \qquad (42)$$
To see this, note that for each $x \in C$, $\mathrm{vol}(K \cap C(x)) \geq 2^{-(\sigma - m)}$, as the smallest possible volume for this intersection occurs at a corner [22]. Moreover, for any $x, y \in C$, either $C(x) = C(y)$ or the two cubes intersect on a set of measure zero. Therefore, $\mathrm{vol}(K) \geq |C|\, 2^{-(\sigma - m)}$ as desired.

Now we upper bound $L(g)$ in terms of $J$, which was defined in Equation (39). Letting $\tilde{\nu}_x := \mathrm{vol}(C(x) \cap K)$, as $\epsilon \to 0$ the cubes $C(x)$ cover $K$ and $g_\epsilon(z) \to g(x)$ for $z \in C(x)$, so that
$$J \to \sum_{x \in C} \tilde{\nu}_x\, g^2(x) \log \frac{g^2(x)\, \mathrm{vol}(K)}{\sum_{y \in C} \tilde{\nu}_y\, g^2(y)}.$$
Using the variational identity $E_\pi[f \log(f / E_\pi f)] = \min_{c > 0} E_\pi[f \log(f/c) - f + c]$, whose minimand is pointwise nonnegative, and noting that $\sum_{x} \tilde{\nu}_x = \mathrm{vol}(K)$ while $\tilde{\nu}_x \geq 2^{-(\sigma - m)} = 2^{-(\sigma - m)}\, |C|\, \pi_x$,
$$J = \min_{c > 0} \sum_{x \in C} \tilde{\nu}_x \left[ g^2(x) \log \frac{g^2(x)}{c} - g^2(x) + c \right] \geq \frac{|C|}{2^{\sigma - m}} \min_{c > 0} \sum_{x \in C} \pi_x \left[ g^2(x) \log \frac{g^2(x)}{c} - g^2(x) + c \right] = \frac{|C|}{2^{\sigma - m}}\, L(g),$$
where the last equality holds because the minimum is attained at $c = E_\pi[g^2]$. Then
$$L(g) \leq \frac{2^{\sigma - m}}{|C|}\, J\, (1 + O(\epsilon)). \qquad (43)$$

Step 4 (Combining inequalities): Recall inequalities (40), (41), and (43):
$$I \geq \frac{J}{A \sum_{i=1}^m n_i^2}, \qquad E(g, g) \geq \frac{\epsilon\, I}{2(\sigma - m)|C|}(1 - O(\epsilon)), \qquad L(g) \leq \frac{2^{\sigma - m}}{|C|}\, J\, (1 + O(\epsilon)).$$
Using these, and the fact that $g : C \to \mathbb{R}$ was chosen arbitrarily,
$$\frac{E(f, f)}{L(f)} \geq \frac{\epsilon\, (1 - O(\epsilon))}{2 A\, 2^{\sigma - m} (\sigma - m) \sum_{i=1}^m n_i^2}, \quad \forall f : C \to \mathbb{R}. \qquad (44)$$
Therefore, fixing $\epsilon$ sufficiently small and absorbing it, together with the $O(\epsilon)$ corrections and the factor 2, into the constant $A$,
$$\rho(M^2) = \min_{f : C \to \mathbb{R}} \frac{E(f, f)}{L(f)} \geq \frac{1}{A\, 2^{\sigma - m} (\sigma - m) \sum_{i=1}^m n_i^2}. \qquad (45)$$
Combining equations (33), (35), and (45), which related the Sobolev constants of the Markov chains $M$, $M^1$, and $M^2$,
$$\rho(M) \geq e^{-3\beta} \rho(M^1) \geq \frac{e^{-3\beta}}{s^2} \rho(M^2) \geq \frac{e^{-3\beta}}{A s^2\, 2^{\sigma - m} (\sigma - m) \sum_{i=1}^m n_i^2} \stackrel{(a)}{\geq} \frac{e^{-3\beta}}{c_1\, m (m(s-1))^2\, n^2},$$
where $c_1$ is a constant depending only on $s$. Inequality (a) uses the facts that $\sigma \leq ms$, so that $\sigma - m \leq m(s-1)$, and $\sum_{i=1}^m n_i^2 \leq n^2$.

Lemma 3: For the Markov chain defined by modified log-linear learning with kernel $M$ and stationary distribution $\pi$, if $m + \sum_{i=1}^m n_i \geq \sum_{i=1}^m s_i$, then $\|\mu(t) - \pi\|_{TV} \leq \epsilon/2$ for all
$$t \geq \frac{m s^2 c_2\, e^{3\beta}\, m(m(s-1))^2\, n}{4\alpha} \left( \log\log (n+1)^{ms-m} + \log\beta + \log\frac{1}{\epsilon} \right),$$
where $c_2 = c_2(s)$ is a constant depending only on $s$.

Proof: The proof of Lemma 3 follows the corresponding derivation in [22]. Applying Equation (24) to the continuous time Markov chain defined by modified log-linear learning, with kernel $M$ as defined in Appendix B and update rate at least $\alpha m n$,
$$T_{TV}(\epsilon/2) \leq \frac{1}{4 \alpha m n\, \rho(M)} \left( \log\log \frac{1}{\pi_{\min}} + \log \frac{2}{\epsilon^2} \right)$$
$$\leq \frac{m s^2 c_2\, e^{3\beta}\, m(m(s-1))^2\, n}{4\alpha} \left( \log\log \frac{1}{\pi_{\min}} + \log \frac{1}{\epsilon} \right)$$
$$\stackrel{(a)}{\leq} \frac{m s^2 c_2\, e^{3\beta}\, m(m(s-1))^2\, n}{4\alpha} \left( \log\log\left( e^{\beta} \prod_{i=1}^m (n_i + 1)^{s_i - 1} \right) + \log \frac{1}{\epsilon} \right)$$
$$\stackrel{(b)}{\leq} \frac{m s^2 c_2\, e^{3\beta}\, m(m(s-1))^2\, n}{4\alpha} \left( \log\log (n+1)^{ms-m} + \log\beta + \log \frac{1}{\epsilon} \right)$$
for a constant $c_2$ depending only on $s$, which also absorbs the constant factors from Lemma 2 and from $\log\frac{2}{\epsilon^2} \leq 2\log\frac{1}{\epsilon} + 1$, where (a) and (b) are due to the following:
(a) We use the fact that $\pi_x = e^{\beta\phi(x)} / \sum_y e^{\beta\phi(y)} \geq e^{-\beta}/|X|$ since $\phi(x) \in [0, 1]$, so $1/\pi_{\min} \leq e^{\beta}|X|$;
(b) We use the fact that $|X| \leq \prod_{i=1}^m (n_i + 1)^{s_i - 1} \leq (n+1)^{ms-m}$, together with $\log(\beta + \log V) \leq \log\beta + \log\log V$ for $\beta$ and $\log V$ at least 2.

APPENDIX D
Proof of Theorem 1

We now use the above lemmas to prove Theorem 1, in a similar manner as the proof of Theorem 3 in [22].

Proof: From Lemma 1, if $\phi$ is $\lambda$-Lipschitz, i.e., $|\phi(x) - \phi(y)| \leq \lambda\|x - y\|$ for all $x, y \in X$, and
$$\beta \geq \max\left\{ \frac{4m(s-1)}{\epsilon} \log(ms),\ \frac{4m(s-1)}{\epsilon} \log\frac{8ms\lambda}{\epsilon} \right\},$$
then $E_\pi[\phi(x)] \geq \sup_{x \in X} \phi(x) - \epsilon/2$. From Lemma 3, if $m + \sum_{i=1}^m n_i \geq \sum_{i=1}^m s_i$ and
$$t \geq \frac{m s^2 c_2\, e^{3\beta}\, m(m(s-1))^2\, n}{4\alpha} \left( \log\log (n+1)^{ms-m} + \log\beta + \log\frac{1}{\epsilon} \right),$$
then $\|\mu(t) - \pi\|_{TV} \leq \epsilon/2$. Then
$$E[\phi(x(a(t)))] = E_{\mu(t)}[\phi(x)] \geq E_\pi[\phi(x)] - \|\mu(t) - \pi\|_{TV} \max_x \phi(x) \stackrel{(a)}{\geq} \sup_{x \in X} \phi(x) - \frac{\epsilon}{2} - \frac{\epsilon}{2} = \sup_{x \in X} \phi(x) - \epsilon,$$
where (a) is from the fact that $\phi(x) \in [0, 1]$.


More information

Correlated Equilibria: Rationality and Dynamics

Correlated Equilibria: Rationality and Dynamics Correlated Equilibria: Rationality and Dynamics Sergiu Hart June 2010 AUMANN 80 SERGIU HART c 2010 p. 1 CORRELATED EQUILIBRIA: RATIONALITY AND DYNAMICS Sergiu Hart Center for the Study of Rationality Dept

More information

Mean-Field optimization problems and non-anticipative optimal transport. Beatrice Acciaio

Mean-Field optimization problems and non-anticipative optimal transport. Beatrice Acciaio Mean-Field optimization problems and non-anticipative optimal transport Beatrice Acciaio London School of Economics based on ongoing projects with J. Backhoff, R. Carmona and P. Wang Thera Stochastics

More information

Bayesian Persuasion Online Appendix

Bayesian Persuasion Online Appendix Bayesian Persuasion Online Appendix Emir Kamenica and Matthew Gentzkow University of Chicago June 2010 1 Persuasion mechanisms In this paper we study a particular game where Sender chooses a signal π whose

More information

Population Games and Evolutionary Dynamics

Population Games and Evolutionary Dynamics Population Games and Evolutionary Dynamics (MIT Press, 200x; draft posted on my website) 1. Population games 2. Revision protocols and evolutionary dynamics 3. Potential games and their applications 4.

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Survival of Dominated Strategies under Evolutionary Dynamics. Josef Hofbauer University of Vienna. William H. Sandholm University of Wisconsin

Survival of Dominated Strategies under Evolutionary Dynamics. Josef Hofbauer University of Vienna. William H. Sandholm University of Wisconsin Survival of Dominated Strategies under Evolutionary Dynamics Josef Hofbauer University of Vienna William H. Sandholm University of Wisconsin a hypnodisk 1 Survival of Dominated Strategies under Evolutionary

More information

Long Run Equilibrium in Discounted Stochastic Fictitious Play

Long Run Equilibrium in Discounted Stochastic Fictitious Play Long Run Equilibrium in Discounted Stochastic Fictitious Play Noah Williams * Department of Economics, University of Wisconsin - Madison E-mail: nmwilliams@wisc.edu Revised March 18, 2014 In this paper

More information

Information Choice in Macroeconomics and Finance.

Information Choice in Macroeconomics and Finance. Information Choice in Macroeconomics and Finance. Laura Veldkamp New York University, Stern School of Business, CEPR and NBER Spring 2009 1 Veldkamp What information consumes is rather obvious: It consumes

More information

Pseudo-Potential Games

Pseudo-Potential Games Pseudo-Potential Games Burkhard C. Schipper Department of Economics University of Bonn preliminary: July 2004 Abstract The notion of pseudo-potential game is due to Dubey, Haimanko and Zapechelnyuk (2002).

More information

Convergence and No-Regret in Multiagent Learning

Convergence and No-Regret in Multiagent Learning Convergence and No-Regret in Multiagent Learning Michael Bowling Department of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 bowling@cs.ualberta.ca Abstract Learning in a multiagent

More information

Distributed Learning based on Entropy-Driven Game Dynamics

Distributed Learning based on Entropy-Driven Game Dynamics Distributed Learning based on Entropy-Driven Game Dynamics Bruno Gaujal joint work with Pierre Coucheney and Panayotis Mertikopoulos Inria Aug., 2014 Model Shared resource systems (network, processors)

More information

Irrational behavior in the Brown von Neumann Nash dynamics

Irrational behavior in the Brown von Neumann Nash dynamics Irrational behavior in the Brown von Neumann Nash dynamics Ulrich Berger a and Josef Hofbauer b a Vienna University of Economics and Business Administration, Department VW 5, Augasse 2-6, A-1090 Wien,

More information

Asymptotics of Efficiency Loss in Competitive Market Mechanisms

Asymptotics of Efficiency Loss in Competitive Market Mechanisms Page 1 of 40 Asymptotics of Efficiency Loss in Competitive Market Mechanisms Jia Yuan Yu and Shie Mannor Department of Electrical and Computer Engineering McGill University Montréal, Québec, Canada jia.yu@mail.mcgill.ca

More information

Deterministic Calibration and Nash Equilibrium

Deterministic Calibration and Nash Equilibrium University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2-2008 Deterministic Calibration and Nash Equilibrium Sham M. Kakade Dean P. Foster University of Pennsylvania Follow

More information

Large Deviations and Stochastic Stability in the Small Noise Double Limit, I: Theory

Large Deviations and Stochastic Stability in the Small Noise Double Limit, I: Theory March 204 505 Center for Mathematical Economics Working Papers Large Deviations and Stochastic Stability in the Small oise Double Limit, I: Theory Willian H. Sandholm and Mathias Staudigl Center for Mathematical

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games

Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games Gabriel Y. Weintraub, Lanier Benkard, and Benjamin Van Roy Stanford University {gweintra,lanierb,bvr}@stanford.edu Abstract

More information

Volume 35, Issue 2. Cournot Equilibrium Uniqueness in Case of Concave Industry Revenue: a Simple Proof

Volume 35, Issue 2. Cournot Equilibrium Uniqueness in Case of Concave Industry Revenue: a Simple Proof Volume 35, Issue 2 Cournot Equilibrium Uniqueness in Case of Concave Industry Revenue: a Simple Proof Pierre von Mouche Wageningen Universiteit Federico Quartieri Universita degli studi di Napoli Federico

More information

Combinatorial Agency of Threshold Functions

Combinatorial Agency of Threshold Functions Combinatorial Agency of Threshold Functions Shaili Jain 1 and David C. Parkes 2 1 Yale University, New Haven, CT shaili.jain@yale.edu 2 Harvard University, Cambridge, MA parkes@eecs.harvard.edu Abstract.

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

Approximate Nash Equilibria with Near Optimal Social Welfare

Approximate Nash Equilibria with Near Optimal Social Welfare Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 015) Approximate Nash Equilibria with Near Optimal Social Welfare Artur Czumaj, Michail Fasoulakis, Marcin

More information

Generalized Mirror Descents with Non-Convex Potential Functions in Atomic Congestion Games

Generalized Mirror Descents with Non-Convex Potential Functions in Atomic Congestion Games Generalized Mirror Descents with Non-Convex Potential Functions in Atomic Congestion Games Po-An Chen Institute of Information Management National Chiao Tung University, Taiwan poanchen@nctu.edu.tw Abstract.

More information

Threshold Policy for Global Games with Noisy Information Sharing

Threshold Policy for Global Games with Noisy Information Sharing 05 IEEE 54th Annual Conference on Decision and Control CDC December 5-8, 05 Osaka, Japan Threshold olicy for Global Games with Noisy Information Sharing Hessam Mahdavifar, Ahmad Beirami, Behrouz Touri,

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

Mean Field Competitive Binary MDPs and Structured Solutions

Mean Field Competitive Binary MDPs and Structured Solutions Mean Field Competitive Binary MDPs and Structured Solutions Minyi Huang School of Mathematics and Statistics Carleton University Ottawa, Canada MFG217, UCLA, Aug 28 Sept 1, 217 1 / 32 Outline of talk The

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information