Synchronization of coupled oscillators is a game. Prashant G. Mehta 1

Size: px

Start display at page:

Download "Synchronization of coupled oscillators is a game. Prashant G. Mehta 1"

Elijah Simpson
5 years ago
Views:

1 CSL COORDINATED SCIENCE LABORATORY Synchronization of coupled oscillators is a game Prashant G. Mehta 1 1 Coordinated Science Laboratory Department of Mechanical Science and Engineering University of Illinois at Urbana-Champaign GERAD seminar, April 20, 2010 Acknowledgment: AFOSR, NSF

2 Huibing Yin Sean P. Meyn Uday V. Shanbhag H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, Synchronization of coupled oscillators is a game, ACC 2010 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

3 Acknowledgement Thanks to Minyi Huang, Roland P. Malhame, and Peter Caines Minyi Huang Roland P. Malhame Peter Caines M. Huang, P. Caines, and R. Malhame, Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-nash equilibria, IEEE TAC, 2007 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

4 Millennium bridge Video of London Millennium bridge from youtube [11] S. H. Strogatz et al., Nature, 2005 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

5 Classical Kuramoto model dθ i (t) = ( ω i + κ N N j=1 sin(θ j (t) θ i (t)) ) dt + σ dξ i (t), i = 1,...,N ω i taken from distribution g(ω) over [1 γ,1 + γ] γ measures the heterogeneity of the population κ measures the strength of coupling [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

6 Classical Kuramoto model dθ i (t) = ( ω i + κ N N j=1 sin(θ j (t) θ i (t)) ) dt + σ dξ i (t), i = 1,...,N ω i taken from distribution g(ω) over [1 γ,1 + γ] γ measures the heterogeneity of the population κ measures the strength of coupling [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

7 Classical Kuramoto model dθ i (t) = ( ω i + κ N N j=1 sin(θ j (t) θ i (t)) ) dt + σ dξ i (t), i = 1,...,N ω i taken from distribution g(ω) over [1 γ,1 + γ] γ measures the heterogeneity of the population κ measures the strength of coupling [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

8 Classical Kuramoto model dθ i (t) = ( ω i + κ N N j=1 sin(θ j (t) θ i (t)) ) dt + σ dξ i (t), i = 1,...,N ω i taken from distribution g(ω) over [1 γ,1 + γ] γ measures the heterogeneity of the population κ measures the strength of coupling κ 0.3 Synchrony Locking R κ < κ c (γ) Incoherence γ [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

9 Movies of incoherence and synchrony solution Incoherence Synchrony P. G. Mehta (UIUC) GERAD Apr. 20, / 75

10 Problem statement Dynamics of i th oscillator dθ i = (ω i + u i (t))dt + σ dξ i, i = 1,...,N, t 0 u i (t) control i th oscillator seeks to minimize 1 T η i (u i ;u i ) = lim E[ c(θ i ;θ i ) + 1 T T 0 }{{} 2 Ru2 i ]ds }{{} cost of anarchy cost of control θ i = (θ j ) j i R control penalty c( ) cost function c(θ i ;θ i ) = 1 N j i c (θ i,θ j ), c 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

11 Problem statement Dynamics of i th oscillator dθ i = (ω i + u i (t))dt + σ dξ i, i = 1,...,N, t 0 u i (t) control i th oscillator seeks to minimize 1 T η i (u i ;u i ) = lim E[ c(θ i ;θ i ) + 1 T T 0 }{{} 2 Ru2 i ]ds }{{} cost of anarchy cost of control θ i = (θ j ) j i R control penalty c( ) cost function c(θ i ;θ i ) = 1 N j i c (θ i,θ j ), c 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

12 Problem statement Dynamics of i th oscillator dθ i = (ω i + u i (t))dt + σ dξ i, i = 1,...,N, t 0 u i (t) control i th oscillator seeks to minimize 1 T η i (u i ;u i ) = lim E[ c(θ i ;θ i ) + 1 T T 0 }{{} 2 Ru2 i ]ds }{{} cost of anarchy cost of control θ i = (θ j ) j i R control penalty c( ) cost function c(θ i ;θ i ) = 1 N j i c (θ i,θ j ), c 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

13 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

14 Quiz Motivation Why a game? In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

15 Quiz Motivation Why a game? In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

16 Quiz Motivation Why a game? In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

17 Quiz Motivation Why a game? In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

18 Quiz Motivation Why a game? In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

19 Rational irrationality Motivation Why a game? behavior that, on the individual level, is perfectly reasonable but that, when aggregated in the marketplace, produces calamity. Examples Millennium bridge Financial market John Cassidy, Rational Irrationality: The real reason that capitalism is so crash-prone, The New Yorker, 2009 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

20 Rational irrationality Motivation Why a game? behavior that, on the individual level, is perfectly reasonable but that, when aggregated in the marketplace, produces calamity. Examples Millennium bridge Financial market John Cassidy, Rational Irrationality: The real reason that capitalism is so crash-prone, The New Yorker, 2009 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

21 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

22 Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dv dt = g T m 2 (V) h (V E T ) g h r (V E h )... dh dt = h (V) h τ h (V) dr dt = r (V) r τ r (V) Voltage Neural spike train time [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

23 Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dv dt = g T m 2 (V) h (V E T ) g h r (V E h )... dh dt = h (V) h τ h (V) dr dt = r (V) r τ r (V) Voltage Neural spike train time [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

24 r Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dv dt = g T m 2 (V) h (V E T ) g h r (V E h )... dh dt = h (V) h τ h (V) dr dt = r (V) r τ r (V) Voltage Neural spike train time r Limit cyle h h V 0 v [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

25 r Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dv dt = g T m 2 (V) h (V E T ) g h r (V E h )... dh dt = h (V) h τ h (V) dr dt = r (V) r τ r (V) Voltage Neural spike train time r Limit cyle Normal form reduction h h V 0 v θ i = ω i + u i Φ(θ i ) [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

26 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

27 Problems and results Finite oscillator model Problem statement Dynamics of i th oscillator dθ i = (ω i + u i (t))dt + σ dξ i, i = 1,...,N, t 0 u i (t) control i th oscillator seeks to minimize 1 T η i (u i ;u i ) = lim E[ c(θ i ;θ i ) + 1 T T 0 }{{} 2 Ru2 i ]ds }{{} cost of anarchy cost of control θ i = (θ j ) j i R control penalty c( ) cost function c(θ i ;θ i ) = 1 N j i c (θ i,θ j ), c 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

28 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

29 Problems and results Main results 1. Synchronization is a solution of game R 1/ Synchrony Locking γ R > R c (γ) Incoherence γ dθ i = (ω i + u i )dt + σ dξ i 1 T η i (u i ;u i ) = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]ds 0 Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

30 Problems and results Main results 1. Synchronization is a solution of game R 1/ Synchrony Locking κ 0.3 Synchrony Locking 0.25 R γ R > R c (γ) Incoherence γ κ < κ c (γ) Incoherence γ dθ i = (ω i + u i )dt + σ dξ i 1 T η i (u i ;u i ) = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]ds 0 ( ) dθ i = ω i + κ N N sin(θ j θ i ) dt + σ dξ i j=1 Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

31 Problems and results Main results 2. Kuramoto control is approximately optimal 6 k = 0.01; R = Population Density 0 π 2 π θ u i = A i R 1 N sin(θ θ j (t)) j i Control laws ω = 1 Kuramoto A * A i t Learning algorithm: da i = ε... dt Yin et.al. CDC 2010 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

32 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

33 Derivation of model Overview of model derivation Overview dθ i = (ω i + u i (t))dt + σ dξ i 1 η i (u i ;u i ) = lim T T T 0 E[ c(θ i,t) Ru2 i ]ds Mass Influence Influence 1 Mean-field approximation Assumption: N c(θ i ;θ i (t)) = 1 N c (θ i,θ j ) c(θ,t) j i 2 Optimal control of single oscillator Decentralized control structure [5] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM] P. G. Mehta (UIUC) GERAD Apr. 20, / 75

34 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

35 Derivation of model Derivation steps Single oscillator with given cost Dynamics of the oscillator dθ i = (ω i + u i (t))dt + σ dξ i, t 0 The cost function is assumed known 1 T η i (u i ; c) = lim E[ c(θ i ;θ i ) + 1 T T 2 Ru2 i (s)]ds 0 c(θ i (s),s) HJB equation: t h i + ω i θ h i = 1 2R ( θ h i ) 2 c(θ,t) + η i σ θθ h i Optimal control law: u i (t) = ϕ i (θ,t) = 1 R θ h i (θ,t) [1] D. P. Bertsekas (1995); [9] S. P. Meyn, IEEE TAC, 1997 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

36 Derivation of model Derivation steps Single oscillator with optimal control Dynamics of the oscillator ( dθ i (t) = ω i 1 ) R θ h i (θ i,t) dt + σ dξ i (t) Fokker-Planck equation for pdf p(θ,t,ω i ) FPK: t p + ω i θ p = 1 R θ [p( θ h)] + σ θθ p [7] A. Lasota and M. C. Mackey, Chaos, Fractals and Noise, Springer 1994 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

37 Derivation of model Mean-field Approximation Derivation steps HJB equation for population t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η(ω) σ θθ h h(θ,t,ω) Population density t p + ω θ p = 1 R θ [p( θ h)] + σ 2 Enforce cost consistency 2π c(θ,t) = c (θ,ϑ)p(ϑ,t,ω)g(ω)dϑ dω Ω 0 1 N j i c (θ,ϑ) 2 2 θθ p p(θ,t,ω) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

38 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

39 Summary Derivation of model PDE model HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π Mean-field approx.: c(ϑ, t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 1 Bellman s optimality principle (H,J,B) 2 Propagation of chaos (F,P,K, Mckean, Vlasov,... ) 3 Mean-field approximation (Boltzmann, Kac,... ) 4 Connection to Nash game (Weintraub, HCM, Altman,... ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

40 Summary Derivation of model PDE model HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π Mean-field approx.: c(ϑ, t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 1 Bellman s optimality principle (H,J,B) 2 Propagation of chaos (F,P,K, Mckean, Vlasov,... ) 3 Mean-field approximation (Boltzmann, Kac,... ) 4 Connection to Nash game (Weintraub, HCM, Altman,... ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

41 Summary Derivation of model PDE model HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π Mean-field approx.: c(ϑ, t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 1 Bellman s optimality principle (H,J,B) 2 Propagation of chaos (F,P,K, Mckean, Vlasov,... ) 3 Mean-field approximation (Boltzmann, Kac,... ) 4 Connection to Nash game (Weintraub, HCM, Altman,... ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

42 Summary Derivation of model PDE model HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π Mean-field approx.: c(ϑ, t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 1 Bellman s optimality principle (H,J,B) 2 Propagation of chaos (F,P,K, Mckean, Vlasov,... ) 3 Mean-field approximation (Boltzmann, Kac,... ) 4 Connection to Nash game (Weintraub, HCM, Altman,... ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

43 Summary Derivation of model PDE model HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π Mean-field approx.: c(ϑ, t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 1 Bellman s optimality principle (H,J,B) 2 Propagation of chaos (F,P,K, Mckean, Vlasov,... ) 3 Mean-field approximation (Boltzmann, Kac,... ) 4 Connection to Nash game (Weintraub, HCM, Altman,... ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

44 Derivation of model PDE model 1. Solution of PDE gives ε-nash equilibrium Optimal control law ε-nash property (as N ) u o i = 1 R θ h(θ(t),t,ω) ω=ωi η i (u o i ;u o i) η i (u i ;u o i) + O( 1 N ), i = 1,...,N. So, we look for solutions of PDEs. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

45 Derivation of model PDE model 1. Solution of PDE gives ε-nash equilibrium Optimal control law ε-nash property (as N ) u o i = 1 R θ h(θ(t),t,ω) ω=ωi η i (u o i ;u o i) η i (u i ;u o i) + O( 1 N ), i = 1,...,N. So, we look for solutions of PDEs. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

46 Derivation of model PDE model 1. Solution of PDE gives ε-nash equilibrium Optimal control law ε-nash property (as N ) u o i = 1 R θ h(θ(t),t,ω) ω=ωi η i (u o i ;u o i) η i (u i ;u o i) + O( 1 N ), i = 1,...,N. So, we look for solutions of PDEs. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

47 Derivation of model PDE model 2. Incoherence solution (PDE) Incoherence solution h(θ,t,ω) = h 0 (θ) := 0 p(θ,t,ω) = p 0 (θ) := 1 2π incoherence h(θ,t,ω) = 0 t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h t p + ω θ p = 1 R θ [p( θ h)] + σ θθ p 2π c(θ,t) = c (θ,ϑ)p(ϑ,t,ω)g(ω)dϑ dω Ω 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

48 Derivation of model PDE model 2. Incoherence solution (PDE) Incoherence solution h(θ,t,ω) = h 0 (θ) := 0 p(θ,t,ω) = p 0 (θ) := 1 2π incoherence h(θ,t,ω) = 0 t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h p(θ,t,ω) = 1 2π tp + ω θ p = 1 R θ [p( θ h)] + σ θθ p 2π c(θ,t) = c (θ,ϑ)p(ϑ,t,ω)g(ω)dϑ dω Ω 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

49 Derivation of model PDE model 2. Incoherence solution (PDE) ( ) ϑ θ Assume c (ϑ,θ) = c (ϑ θ) = 1 2 sin2 2 Incoherence solution h(θ,t,ω) = h 0 (θ) := 0 p(θ,t,ω) = p 0 (θ) := 1 2π Optimal control u = 1 R θ h = 0 Average cost 2π ( ) 1 θ ϑ 1 c(θ,t) = 2 sin2 g(ω)dϑ dω Ω 0 2 2π η (ω) = c(θ,t) = 1 4 =: η 0 for all ω Ω incoherence soln. No cost of control P. G. Mehta (UIUC) GERAD Apr. 20, / 75

50 Derivation of model PDE model 2. Incoherence solution (Finite population) Closed-loop dynamics dθ i = (ω i + u }{{} i )dt + σ dξ i (t) =0 Average cost 1 T η i = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]dt 0 }{{} =0 1 = lim T N 1 T ( E[ 1 θi (t) θ j (t) j i T 2 sin2 0 2 = 1 N 2π ( E[ 1 θi (t) ϑ 2 sin2 j i 0 2 ) ]dt ) ] 1 2π dϑ = N N η incoherence 1 ε-nash property η i (u o i ;u o i) η i (u i ;u o i) + O( 1 N ), i = 1,...,N. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

51 Derivation of model PDE model 2. Incoherence solution (Finite population) Closed-loop dynamics dθ i = (ω i + u }{{} i )dt + σ dξ i (t) =0 Average cost 1 T η i = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]dt 0 }{{} =0 1 = lim T N 1 T ( E[ 1 θi (t) θ j (t) j i T 2 sin2 0 2 = 1 N 2π ( E[ 1 θi (t) ϑ 2 sin2 j i 0 2 ) ]dt ) ] 1 2π dϑ = N N η incoherence 1 ε-nash property η i (u o i ;u o i) η i (u i ;u o i) + O( 1 N ), i = 1,...,N. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

52 Derivation of model PDE model 3. Synchronization is a solution of game R 1/ Synchrony Locking η(ω) η(ω) = η0 c R < Rc ω= 0.95 ω= 1 ω= 1.05 γ R > R c (γ) 0.15 Incoherence γ R > Rc 0.15 η(ω) < η R 1/ 2 dθ i = (ω i + u i )dt + σ dξ i 1 T η i (u i ;u i ) = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]ds 0 η(ω) = minη i (u i ;u o u i i) t = Synchrony solution of P. G. Mehta (UIUC) GERAD Apr. 20, / 75

53 Derivation of model PDE model 3. Synchronization is a solution of game R 1/ Synchrony Locking η(ω) η(ω) = η0 incoherence soln. c R < Rc ω= 0.95 ω= 1 ω= 1.05 γ R > R c (γ) 0.15 Incoherence γ R > Rc 0.15 η(ω) < η R 1/ 2 dθ i = (ω i + u i )dt + σ dξ i 1 T η i (u i ;u i ) = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]ds 0 η(ω) = minη i (u i ;u o u i i) t = Synchrony solution of P. G. Mehta (UIUC) GERAD Apr. 20, / 75

54 Derivation of model PDE model 3. Synchronization is a solution of game γ R 1/ 2 Synchrony Locking R > R c (γ) 0.15 Incoherence γ η(ω) R > Rc η(ω) = η0 R < Rc η(ω) < η R 1/ 2 c ω= 0.95 ω= 1 ω= 1.05 synchrony soln. dθ i = (ω i + u i )dt + σ dξ i 1 T η i (u i ;u i ) = lim E[c(θ i ;θ i ) + 1 T T 2 Ru2 i ]ds 0 η(ω) = minη i (u i ;u o u i i) t = Synchrony solution of P. G. Mehta (UIUC) GERAD Apr. 20, / 75

55 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

56 Analysis of phase transition Overview of the steps Incoherence solution HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π c(ϑ,t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 ( ) ϑ θ Assume c (ϑ,θ) = c (ϑ θ) = 1 2 sin2 2 Incoherence solution h(θ,t,ω) = h 0 (θ) := 0 p(θ,t,ω) = p 0 (θ) := 1 2π P. G. Mehta (UIUC) GERAD Apr. 20, / 75

57 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

58 Analysis of phase transition Linearization and spectra Bifurcation analysis Linearized PDE (about incoherence solution) ( t z(θ,t,ω) = ω θ h c σ 2 2 θθ 2 h ) ω θ p + 1 2πR θθ 2 h + σ 2 2 θθ 2 p =: L R z(θ,t,ω) Spectrum of the linear operator 1 Continuous spectrum {S (k) } + k= S (k) := {λ C λ = ± σ 2 } 2 k2 kωi for all ω Ω 2 Discrete spectrum Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

59 Analysis of phase transition Linearization and spectra Bifurcation analysis Linearized PDE (about incoherence solution) ( t z(θ,t,ω) = ω θ h c σ 2 2 θθ 2 h ) ω θ p + 1 2πR θθ 2 h + σ 2 2 θθ 2 p =: L R z(θ,t,ω) Spectrum of the linear operator 1 Continuous spectrum {S (k) } + k= S (k) := {λ C λ = ± σ 2 } 2 k2 kωi for all ω Ω 2 Discrete spectrum Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

60 Analysis of phase transition Linearization and spectra Bifurcation analysis Linearized PDE (about incoherence solution) ( t z(θ,t,ω) = ω θ h c σ 2 2 θθ 2 h ) ω θ p + 1 2πR θθ 2 h + σ 2 2 θθ 2 p =: L R z(θ,t,ω) Spectrum of the linear operator 1 Continuous spectrum {S (k) } + k= S (k) := {λ C λ = ± σ 2 } 2 k2 kωi for all ω Ω 2 Discrete spectrum imag γ = 0.1 R decreases k=1 k=1 k=2 k= real Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

61 Analysis of phase transition Linearization and spectra Bifurcation analysis Linearized PDE (about incoherence solution) ( t z(θ,t,ω) = ω θ h c σ 2 2 θθ 2 h ) ω θ p + 1 2πR θθ 2 h + σ 2 2 θθ 2 p =: L R z(θ,t,ω) Spectrum of the linear operator 1 Continuous spectrum {S (k) } + k= S (k) := {λ C λ = ± σ 2 } 2 k2 kωi for all ω Ω 2 Discrete spectrum imag γ = 0.1 R decreases k=1 k=1 k=2 k= real Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

62 Analysis of phase transition Bifurcation analysis Bifurcation diagram (Hamiltonian Hopf) Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) Stability proof [3] Dellnitz et al., Int. Series Num. Math., 1992 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

63 Analysis of phase transition Bifurcation analysis Bifurcation diagram (Hamiltonian Hopf) Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) Stability proof Bifurcation point (a) imag Cont. spectrum; ind. of R Disc. spectrum; fn. of R real [3] Dellnitz et al., Int. Series Num. Math., 1992 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

64 Analysis of phase transition Bifurcation analysis Bifurcation diagram (Hamiltonian Hopf) Characteristic eqn: 1 8R Ω g(ω) (λ σ ωi)(λ + dω + 1 = 0. σ ωi) Stability proof Bifurcation point (a) ) R R > Rc(γ) (c) Incoherence imag Cont. spectrum; ind. of R Disc. spectrum; fn. of R real Synchrony γ [3] Dellnitz et al., Int. Series Num. Math., 1992 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

65 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

66 Analysis of phase transition Numerical solution of PDEs Numerics incoherence Incoherence; R = 60 incoherence P. G. Mehta (UIUC) GERAD Apr. 20, / 75

67 Analysis of phase transition Numerical solution of PDEs Numerics incoherence Incoherence; R = 60 incoherence synchrony Synchrony; R = 10 synchrony P. G. Mehta (UIUC) GERAD Apr. 20, / 75

68 Bifurcation diagram Analysis of phase transition Numerics R 1/ Synchrony Locking η(ω) 0.25 R < Rc η(ω) < η0 ω = 0.95 ω = 1 ω = γ R > R c (γ) 0.15 Incoherence γ 0.15 R > Rc η(ω) = η R 1/2 dθ i = (ω i + u i )dt + σ dξ i 1 η i (u i ;u i ) = lim T T T 0 E[c(θ i ;θ i ) Ru2 i ]ds P. G. Mehta (UIUC) GERAD Apr. 20, / 75

69 Bifurcation diagram Analysis of phase transition Numerics R 1/ Synchrony Locking η(ω) 0.25 incoherence soln. R < Rc η(ω) < η0 ω = 0.95 ω = 1 ω = γ R > R c (γ) 0.15 Incoherence γ 0.15 R > Rc η(ω) = η R 1/2 dθ i = (ω i + u i )dt + σ dξ i 1 η i (u i ;u i ) = lim T T T 0 E[c(θ i ;θ i ) Ru2 i ]ds P. G. Mehta (UIUC) GERAD Apr. 20, / 75

70 Bifurcation diagram Analysis of phase transition Numerics R 1/ Synchrony Locking η(ω) 0.25 R < Rc η(ω) < η0 ω = 0.95 ω = 1 ω = 1.05 γ 0.2 R > R c (γ) 0.15 Incoherence γ R > Rc η(ω) = η synchrony soln. R 1/2 dθ i = (ω i + u i )dt + σ dξ i 1 η i (u i ;u i ) = lim T T T 0 E[c(θ i ;θ i ) Ru2 i ]ds P. G. Mehta (UIUC) GERAD Apr. 20, / 75

71 p R = θ Analysis of phase transition Numerics Bifurcation diagram with another cost c (θ,ϑ) = 0.25(1 cos(θ ϑ)) R 1/2 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

72 p R = θ Analysis of phase transition Numerics Bifurcation diagram with another cost c (θ,ϑ) = 0.25(1 cos(θ ϑ)) R = p R 1/ θ P. G. Mehta (UIUC) GERAD Apr. 20, / 75

73 p R = θ Analysis of phase transition Numerics Bifurcation diagram with another cost c (θ,ϑ) = 0.25(1 cos(θ ϑ)) c (θ,ϑ) = 1 cos(θ ϑ) cos3(θ ϑ) R 1/ R 1/2 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

74 p R = θ Analysis of phase transition Numerics Bifurcation diagram with another cost c (θ,ϑ) = 0.25(1 cos(θ ϑ)) c (θ,ϑ) = 1 cos(θ ϑ) cos3(θ ϑ) p R 1/ θ P. G. Mehta (UIUC) GERAD Apr. 20, / 75

75 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

76 Learning Comparison to Kuramoto law Q-function approximation Control law u = ϕ(θ,t,ω) = 1 R θ h(θ,t,ω) Population Density Control laws ω = 0.95 ω = 1 ω = π 2π θ Equivalent control law in Kuramoto oscillator u (Kur) i = κ N N j=1 sin(θ j (t) θ i ) N κ 0 sin(ϑ 0 + t θ i ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

77 Learning Comparison to Kuramoto law Q-function approximation Control law u = ϕ(θ,t,ω) = 1 R θ h(θ,t,ω) Population Density π 2 π θ Control laws ω = 0.95 ω = 1 ω = 1.05 Kuramoto Equivalent control law in Kuramoto oscillator u (Kur) i = κ N N j=1 sin(θ j (t) θ i ) N κ 0 sin(ϑ 0 + t θ i ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

78 Learning Q-function approximation Optimality equation Optimal control law u i = 1 R θ h i (θ,t) Parameterization: min{c(θ;θ i (t)) + 1 u i 2 Ru2 i + D ui h i (θ,t)} = ηi }{{} =: H i (θ,u i ;θ i (t)) Kuramoto law u (Kur) i = κ N j i sin(θ i θ j (t)) H (A i,φ i ) i (θ,u i ;θ i (t)) = c(θ;θ i (t))+ 1 2 Ru2 i +(ω i 1+u i )A i S (φ i) + σ 2 2 A ic (φ i) where S (φ) (θ,θ i ) = 1 N sin(θ θ j φ), C (φ) (θ,θ i ) = 1 j i N cos(θ θ j φ) j i Approx. optimal control: u (A i,φ i ) i = argmin{h (A i,φ i ) i (θ,u i ;θ i (t))} = A i u i R S(φ i) (θ,θ i ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

79 Learning Q-function approximation Optimality equation Optimal control law u i = 1 R θ h i (θ,t) Parameterization: min{c(θ;θ i (t)) + 1 u i 2 Ru2 i + D ui h i (θ,t)} = ηi }{{} =: H i (θ,u i ;θ i (t)) Kuramoto law u (Kur) i = κ N j i sin(θ i θ j (t)) H (A i,φ i ) i (θ,u i ;θ i (t)) = c(θ;θ i (t))+ 1 2 Ru2 i +(ω i 1+u i )A i S (φ i) + σ 2 2 A ic (φ i) where S (φ) (θ,θ i ) = 1 N sin(θ θ j φ), C (φ) (θ,θ i ) = 1 j i N cos(θ θ j φ) j i Approx. optimal control: u (A i,φ i ) i = argmin{h (A i,φ i ) i (θ,u i ;θ i (t))} = A i u i R S(φ i) (θ,θ i ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

80 Learning Q-function approximation Optimality equation Optimal control law u i = 1 R θ h i (θ,t) Parameterization: min{c(θ;θ i (t)) + 1 u i 2 Ru2 i + D ui h i (θ,t)} = ηi }{{} =: H i (θ,u i ;θ i (t)) Kuramoto law u (Kur) i = κ N j i sin(θ i θ j (t)) H (A i,φ i ) i (θ,u i ;θ i (t)) = c(θ;θ i (t))+ 1 2 Ru2 i +(ω i 1+u i )A i S (φ i) + σ 2 2 A ic (φ i) where S (φ) (θ,θ i ) = 1 N sin(θ θ j φ), C (φ) (θ,θ i ) = 1 j i N cos(θ θ j φ) j i Approx. optimal control: u (A i,φ i ) i = argmin{h (A i,φ i ) i (θ,u i ;θ i (t))} = A i u i R S(φ i) (θ,θ i ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

81 Learning Q-function approximation Optimality equation Optimal control law u i = 1 R θ h i (θ,t) Parameterization: min{c(θ;θ i (t)) + 1 u i 2 Ru2 i + D ui h i (θ,t)} = ηi }{{} =: H i (θ,u i ;θ i (t)) Kuramoto law u (Kur) i = κ N j i sin(θ i θ j (t)) H (A i,φ i ) i (θ,u i ;θ i (t)) = c(θ;θ i (t))+ 1 2 Ru2 i +(ω i 1+u i )A i S (φ i) + σ 2 2 A ic (φ i) where S (φ) (θ,θ i ) = 1 N sin(θ θ j φ), C (φ) (θ,θ i ) = 1 j i N cos(θ θ j φ) j i Approx. optimal control: u (A i,φ i ) i = argmin{h (A i,φ i ) i (θ,u i ;θ i (t))} = A i u i R S(φ i) (θ,θ i ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

82 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm

83 Learning Steepest descent algorithm Bellman error: Pointwise: Simple gradient descent algorithm ẽ(a i,φ i ) = 2 k=1 L (A i,φ i ) (θ,t) = min{h (A i,φ i ) u i } η (A i,φ i i L (A i,φ i ), ϕ k (θ) 2 da i = ε dẽ(a i,φ i ), dt da i Theorem (Convergence) dφ i dt = ε dẽ(a i,φ i ) dφ i Assume population is in synchrony. The i th oscillator updates according to ( ). Then A i (t) A = 1 2σ 2 The pointwise Bellman error L (A i,0) (θ,t) = ε(r)cos2(θ t) where ε(r) = 1 16Rσ 4 P. G. Mehta (UIUC) GERAD Apr. 20, / 75 i ) ( )

84 Phase transition Learning Steepest descent algorithm Suppose all oscillators use approx. optimal control law: u i = A R 1 N sin(θ i θ j (t)) j i then the phase transition boundary is { 1 if γ = 0 2σ R c (γ) = 4 ( ) 1 2γ 4σ 2 γ tan 1 if γ > 0 σ k = 0.01; R = PDE Learning 5 A * A i R Synchrony Incoherence t γ P. G. Mehta (UIUC) GERAD Apr. 20, / 75

85 Learning Comparison of control laws Steepest descent algorithm From PDE: u i = ϕ(θ,t,ω) = 1 ω=ωi R θ h(θ,t,ω) From learning: u i = A R 1 N sin(θ i θ j (t) φ i ) j i ω=ωi Population Density Population Density θ π 2 π 0 π 2 π θ Control laws PDE ω = 0.95 ω = 0.1 ω = 1.05 Learning ω = 0.95 ω = 0.1 ω = 1.05 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

86 Learning Comparison of average cost Steepest descent algorithm Average Cost PDE Learning ω = 0.95 ω = 0.1 ω = P. G. Mehta (UIUC) GERAD Apr. 20, / 75

87 Thank you! Website: Huibing Yin Sean P. Meyn Uday V. Shanbhag H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, Synchronization of coupled oscillators is a game, ACC 2010

88 Part I Apendix

89 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

90 Infinit population limit Appendix Game-theoretic results (ε-nash) Recall the PDE model t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h t p + ω θ p = 1 R θ [p( θ h)] + σ 2 2 θθ 2 p 2π c(ϑ,t) = c (ϑ,θ)p(θ,t,ω)g(ω)dθ dω Ω 0 Solution h(θ,t,ω) Control u o i (t) := 1 R θ h(θ,t,ω) ω=ωi By construction: η i (u o i ; c) η i (u i ; c) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

91 Finite population Appendix Game-theoretic results (ε-nash) Finite population cost [ c (N) 1 i (ϑ,t) := E N c ( θ i (t);θ i(t) o θ i (t) = ϑ )] j i θ i o = (θ1 o,...,θo i 1,θo i+1,...,θo N) with θi o (t) obtained from dynamics with control u i = u o i (t) Lemma 1: for each i = 1,...,N [ 1 T ] max lim sup c(ϑ,s) c (N) i (ϑ,s) ds ϑ [0,2π] T T 0 = O( 1 N ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

92 ε-nash equilibrium Appendix Game-theoretic results (ε-nash) Theorem The control {u o i (t) = 1 R θ h(θ(t),t,ω) ω=ωi } N i=1 equilibrium with respect to the finite population cost is an ε-nash η (POP) i (u i ;u o 1 T i) = lim T T 0 i.e., for any adapted control u i, c (N) i Ru2 i ds; η (POP) i (u o i ;u o i) η (POP) i (u i ;u o i) + O( 1 ), N i = 1,...,N. c (N) i (ϑ,t) := E[c(θ i (t);θ o i(t)) θ i (t) = ϑ] P. G. Mehta (UIUC) GERAD Apr. 20, / 75

93 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

94 Stability proof Appendix Stability dx = ax + by, a > 0 dt dy = cx + ay, bc < 0 dt FPK eqn HJB eqn with boundary conditions x(0) = x 0 and y( ) = 0 t x(t) = e at x 0 + b e a(t τ) y(τ)dτ, y(t) = c t 0 e a(t τ) x(τ)dτ. For x 0 = 0, equilibrium solution x(t) 0, y(t) 0 Characteristic equation σ 2 = a 2 + bc. Stable in the sense that The equilibrium solution is asymptotically stable if for any initial perturbation x 0, the solution x(t) 0 as t. Return to spectrum P. G. Mehta (UIUC) GERAD Apr. 20, / 75

95 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

96 Appendix Algorithm for solving PDEs Numerics HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π c(ϑ,t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 V(θ,t,ω) = h(θ,t s,ω) Iterative algorithm s Backward time Forward time t P. G. Mehta (UIUC) GERAD Apr. 20, / 75

97 Appendix Algorithm for solving PDEs Numerics HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π c(ϑ,t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 V(θ,t,ω) = h(θ,t s,ω) Iterative algorithm s Backward time Forward time t P. G. Mehta (UIUC) GERAD Apr. 20, / 75

98 Appendix Algorithm for solving PDEs Numerics HJB: t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h h(θ,t,ω) FPK: t p + ω θ p = 1 R θ [p( θ h )] + σ 2 2 θθ 2 p p(θ,t,ω) 2π c(ϑ,t) = c (ϑ,θ) p(θ,t,ω) g(ω)dθ dω Ω 0 V(θ,t,ω) = h(θ,t s,ω) Iterative algorithm s Backward time Forward time t P. G. Mehta (UIUC) GERAD Apr. 20, / 75

99 Appendix Numerical solution of PDEs Numerics Incoherence; R = 60 incoherence ) R R > Rc(γ) (c) Incoherence 20 Synchrony γ p(θ,t,ω) at given position θ 0.2 γ = 0.01; Iter = γ = 0.01; Iter = γ = 0.01; Iter = p(θ,t,ω) p(θ,t,ω) p(θ,t,ω) t t t p(θ,t,ω) p(θ,t,ω) p(θ,t,ω) t t t Iter = 1 Iter=4 Iter=19 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

100 Appendix Numerical solution of PDEs Numerics synchrony Synchrony; R = 10 synchrony p(θ,t,ω) at given position θ 0.25 γ = 0.01; Iter = γ = 0.01; Iter = 4 1 γ = 0.01; Iter = 19 p(θ,t,ω) p(θ,t,ω) p(θ,t,ω) t t t 1 p(θ,t,ω) p(θ,t,ω) p(θ,t,ω) t t t Iter = 1 Iter=4 Iter=19 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

101 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

102 Linearization Appendix Linearization/Spectra ( t z(θ,t,ω) = Eigenvector problem λz = L R Z ω θ h c σ θθ h ω θ p + 1 2πR 2 θθ h + σ θθ p Fourier series expansion with respect to θ H = + k= H k (ω)e ikθ, P = + k= P k (ω)e ikθ Decomposition of the linear operator L R = k L (k) R ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

103 Linearization (cont.) Appendix Linearization/Spectra Individual operator ( σ 2 L (1) R := 2 ωi π 4 Ω g(ω)dω L (k) R := 1 2πR σ 2 2 ( ωi σ 2 2 k2 kωi 0 ) k2 2πR σ 2 2 k2 kωi L ( k) R = L (k) R Continuous spectrum {S (k) } + k= S (k) := {λ C λ = ± σ 2 } 2 k2 kωi for all ω Ω ), k 2 Discrete spectrum coincides with the discrete spectrum of L (±1) R P. G. Mehta (UIUC) GERAD Apr. 20, / 75

104 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

105 Neuron Appendix Modeling P. G. Mehta (UIUC) GERAD Apr. 20, / 75

106 Neuron Appendix Modeling Equivalent circuits P. G. Mehta (UIUC) GERAD Apr. 20, / 75

107 Finite oscillator model Appendix Modeling Dynamics of i th oscillator [10]: dθ i = (ω i + u i (t))dt + σ dξ i, i = 1,...,N, t 0 (1) θ i phase of the oscillator {ξ i } mutually independent Wiener processes ω i constant and chosen independently according to a fixed distribution with density g u i (t) control P. G. Mehta (UIUC) GERAD Apr. 20, / 75

108 Appendix Finite oscillator model (cont.) Modeling i th oscillator seeks to minimize η (POP) 1 T i (u i ;u i ) = lim E[ c(θ i ;θ i ) + 1 T T 0 }{{} 2 Ru2 i ]ds }{{} cost of anarchy cost of control θ i = (θ j ) j i R control penalty c( ) cost function c(θ i ;θ i ) = 1 N j i c (θ i,θ j ), c 0 {u i } N i=1 Nash equilibrium if u i minimizes η (POP) i (u i ;u i), for i = 1,...,N. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

109 Intuition Appendix Modeling i th oscillator seeks to minimize η (POP) 1 T i (u i ;u i ) = lim E[ c(θ i ;θ i ) + 1 T T 0 }{{} 2 Ru2 i ]ds }{{} cost of anarchy cost of control Trade-off between reducing the two costs Cost associated with θ i θ j : c(θ i ;θ i ) = 1 N j i c (θ i θ j ) Cost associated with control: 1 2 Ru2 i Qualitative macro-behavior change when R varies ) R R > Rc(γ) (c) Incoherence Synchrony P. G. Mehta (UIUC) GERAD Apr. 20, / 75 γ

110 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

111 PDE model (as N ) Appendix Multiple oscillators HJB equation obtain optimal control t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h (2) h(θ,t,ω) relative value function η (ω) average optimal cost Optimal feed-back control law ϕ(θ,t,ω) := 1 R θ h(θ,t,ω) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

112 PDE model (as N ) Appendix Multiple oscillators HJB equation obtain optimal control t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h (2) h(θ,t,ω) relative value function η (ω) average optimal cost Optimal feed-back control law ϕ(θ,t,ω) := 1 R θ h(θ,t,ω) Fokker-Planck-Kolmogorov (FPK) equation dθ i (t) = (ω i + u i (t))dt + σ dξ i, i = 1,...,N ) replaced by: t p + θ ((ω + ϕ )p = σ 2 2 θθ 2 p p(θ,t,ω) probability density function [10] Strogatz et al., Journal of Statistical Physics, 1991 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

113 PDE model (as N ) Appendix Multiple oscillators HJB equation obtain optimal control t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h (2) h(θ,t,ω) relative value function η (ω) average optimal cost Optimal feed-back control law ϕ(θ,t,ω) := 1 R θ h(θ,t,ω) Fokker-Planck-Kolmogorov (FPK) equation dθ i (t) = (ω i + u i (t))dt + σ dξ i, i = 1,...,N ) replaced by: t p + θ ((ω + ϕ )p = σ 2 2 θθ 2 p p(θ,t,ω) probability density function [10] Strogatz et al., Journal of Statistical Physics, 1991 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

114 PDE model (as N ) Appendix Multiple oscillators HJB equation obtain optimal control t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h (2) h(θ,t,ω) relative value function η (ω) average optimal cost Optimal feed-back control law ϕ(θ,t,ω) := 1 R θ h(θ,t,ω) FPK equation evolution of population phase angles t p + ω θ p = 1 R θ [p( θ h)] + σ θθ p (3) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

115 PDE model (as N ) Appendix Multiple oscillators HJB equation obtain optimal control t h + ω θ h = 1 2R ( θ h) 2 c(θ,t) + η σ θθ h (2) h(θ,t,ω) relative value function η (ω) average optimal cost Optimal feed-back control law ϕ(θ,t,ω) := 1 R θ h(θ,t,ω) FPK equation evolution of population phase angles t p + ω θ p = 1 R θ [p( θ h)] + σ 2 2 θθ 2 p (3) 2π Generate cost: c (ϑ,θ)p(θ,t,ω)g(ω)dθ dω Ω 0 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

116 Consistency Appendix Multiple oscillators Consistency requirement on deterministic mass influence 2π c(ϑ,t) = c (ϑ,θ)p(θ,t,ω)g(ω)dθ dω Ω 0 Recall: c(θ i ;θ i ) = 1 N j i c (θ i,θ j ) P. G. Mehta (UIUC) GERAD Apr. 20, / 75

117 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

118 Q-learning Appendix Learning Dynamics of i th d agent dt x i = a i x i + b i u i i th agent seeks to minimize Control u i = k i xx i k i zz T η i = lim e γs ( (x i (s) z(s)) 2 + u 2 i (s) ) ds T 0 (individual state) (ensemble state) Agent [8] P. Mehta and S. Meyn, CDC, 2009 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

119 Control comparison Appendix Learning PDE Learning ω = 0.95 ω = 0.1 ω = Control Amplitude Control Phase P. G. Mehta (UIUC) GERAD Apr. 20, / 75

120 6 Appendix Game-theoretic results (ε-nash) Stability Numerics Linearization/Spectra Modeling Multiple oscillators Learning The New Yorker excerpt 7 Bibliography

121 Rational irrationality Appendix The New Yorker excerpt The Prisoner s Dilemma is the obverse of Adam Smith s theory of the invisible hand, in which the free market coördinates the behavior of self-seeking individuals to the benefit of all. Each businessman intends only his own gain, Smith wrote in The Wealth of Nations, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention. But in a market environment the individual pursuit of self-interest, however rational, can give way to collective disaster. The invisible hand becomes a fist. John Cassidy, Rational Irrationality: The real reason that capitalism is so crash-prone, The New Yorker, 2009 P. G. Mehta (UIUC) GERAD Apr. 20, / 75

122 Bibliography Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, Massachusetts, Eric Brown, Jeff Moehlis, and Philip Holmes. On the phase reduction and response dynamics of neural oscillator populations. Neural Computation, 16(4): , M. Dellnitz, J.E. Marsden, I. Melbourne, and J. Scheurle. Generic bifurcations of pendula. Int. Series Num. Math., 104: , J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1: , Minyi Huang, Peter E. Caines, and Roland P. Malhame. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

123 Bibliography Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-nash equilibria. IEEE transactions on automatic control, 52(9): , Y. Kuramoto. International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics. Springer-Verlag, Andrzej Lasota and Michael C. Mackey. Chaos, Fractals and Noise. Springer, P. Mehta and S. Meyn. Q-learning and Pontryagin s Minimum Principle. To appear, 48th IEEE Conference on Decision and Control, December Sean P. Meyn. P. G. Mehta (UIUC) GERAD Apr. 20, / 75

124 Bibliography The policy iteration algorithm for average reward markov decision processes with general state space. IEEE Transactions on Automatic Control, 42(12): , December S. H. Strogatz and R. E. Mirollo. Stability of incoherence in a population of coupled oscillators. Journal of Statistical Physics, 63: , May Steven H. Strogatz, Daniel M. Abrams, Bruno Eckhardt, and Edward Ott. Theoretical mechanics: Crowd synchrony on the millennium bridge. Nature, 438:43 44, P. G. Mehta (UIUC) GERAD Apr. 20, / 75

(1) I. INTRODUCTION (2)

(1) I. INTRODUCTION (2) 920 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 4, APRIL 2012 Synchronization of Coupled Oscillators is a Game Huibing Yin, Member, IEEE, Prashant G. Mehta, Member, IEEE, Sean P. Meyn, Fellow,