MEAN FIELD GAMES WITH MAJOR AND MINOR PLAYERS


René Carmona, Department of Operations Research & Financial Engineering and PACM, Princeton University. CEMRACS, Luminy, July 17, 2017.

MFG WITH MAJOR AND MINOR PLAYERS SET-UP R.C. - G. Zhu, R.C. - P. Wang

State equations
\[
\begin{cases}
dX^0_t = b_0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + \sigma_0(t, X^0_t, \mu_t, \alpha^0_t)\,dW^0_t \\
dX_t = b(t, X_t, \mu_t, X^0_t, \alpha_t, \alpha^0_t)\,dt + \sigma(t, X_t, \mu_t, X^0_t, \alpha_t, \alpha^0_t)\,dW_t
\end{cases}
\]

Costs
\[
\begin{cases}
J^0(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T f_0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + g_0(X^0_T, \mu_T)\Big] \\
J(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T f(t, X_t, \mu_t, X^0_t, \alpha_t, \alpha^0_t)\,dt + g(X_T, \mu_T)\Big]
\end{cases}
\]

OPEN LOOP VERSION OF THE MFG PROBLEM

The controls used by the major player and the representative minor player are of the form:
\[
\alpha^0_t = \phi^0(t, W^0_{[0,T]}), \qquad \alpha_t = \phi(t, W^0_{[0,T]}, W_{[0,T]}), \tag{1}
\]
for deterministic progressively measurable functions \(\phi^0 : [0,T] \times C([0,T]; \mathbb{R}^d) \to A^0\) and \(\phi : [0,T] \times C([0,T]; \mathbb{R}^d) \times C([0,T]; \mathbb{R}^d) \to A\).

THE MAJOR PLAYER BEST RESPONSE

Assume the representative minor player uses the open loop control given by \(\phi : (t, w^0, w) \mapsto \phi(t, w^0, w)\). The major player minimizes
\[
J^{\phi,0}(\alpha^0) = \mathbb{E}\Big[\int_0^T f_0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + g_0(X^0_T, \mu_T)\Big]
\]
under the dynamical constraints:
\[
\begin{cases}
dX^0_t = b_0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + \sigma_0(t, X^0_t, \mu_t, \alpha^0_t)\,dW^0_t \\
dX_t = b(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \alpha^0_t)\,dt + \sigma(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \alpha^0_t)\,dW_t,
\end{cases}
\]
where \(\mu_t = \mathcal{L}(X_t \mid W^0_{[0,t]})\) is the conditional distribution of \(X_t\) given \(W^0_{[0,t]}\).

Major player problem as the search for:
\[
\phi^{0,*}(\phi) = \arg\inf_{\alpha^0_t = \phi^0(t, W^0_{[0,T]})} J^{\phi,0}(\alpha^0) \tag{2}
\]
Optimal control of the conditional McKean-Vlasov type!
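The conditional McKean-Vlasov dynamics above can be approximated by a particle system: simulate N minor players driven by independent idiosyncratic noises but a single common noise W0, and use their empirical measure as a proxy for the conditional law. A minimal Euler-scheme sketch; the linear mean-reverting coefficients below are illustrative assumptions, not the lecture's model:

```python
import numpy as np

def simulate_conditional_mkv(N=500, T=1.0, n_steps=100, seed=0):
    """Euler scheme for a toy major/minor system with linear coefficients.

    Hypothetical dynamics (placeholders, not from the slides):
      dX0_t = (-X0_t + m_t) dt + 0.5 dW0_t            (major player)
      dX_t  = (-X_t + 0.5*X0_t + 0.5*m_t) dt + dW_t   (minor players)
    where m_t, the conditional mean of X_t given W0, is approximated by the
    empirical mean of the N minor particles: since they all share W0, their
    empirical measure approximates the conditional law mu_t.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X0 = 0.0
    X = rng.standard_normal(N)          # minor players' initial states
    for _ in range(n_steps):
        m = X.mean()                    # empirical proxy for the conditional mean
        dW0 = np.sqrt(dt) * rng.standard_normal()      # common noise increment
        dW = np.sqrt(dt) * rng.standard_normal(N)      # idiosyncratic increments
        X0 = X0 + (-X0 + m) * dt + 0.5 * dW0
        X = X + (-X + 0.5 * X0 + 0.5 * m) * dt + dW
    return X0, X
```

Increasing N tightens the approximation of the conditional law, at linear cost per time step.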

THE REP. MINOR PLAYER BEST RESPONSE

The system against which the best response is sought comprises a major player and a field of minor players different from the representative minor player. The major player uses strategy \(\alpha^0_t = \phi^0(t, W^0_{[0,T]})\); the representative of the field of minor players uses strategy \(\alpha_t = \phi(t, W^0_{[0,T]}, W_{[0,T]})\).

State dynamics
\[
\begin{cases}
dX^0_t = b_0(t, X^0_t, \mu_t, \phi^0(t, W^0_{[0,T]}))\,dt + \sigma_0(t, X^0_t, \mu_t, \phi^0(t, W^0_{[0,T]}))\,dW^0_t \\
dX_t = b(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,dt + \sigma(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,dW_t,
\end{cases}
\]
where \(\mu_t = \mathcal{L}(X_t \mid W^0_{[0,t]})\) is the conditional distribution of \(X_t\) given \(W^0_{[0,t]}\). Given \(\phi^0\) and \(\phi\), this is an SDE of (conditional) McKean-Vlasov type.

THE REP. MINOR PLAYER BEST RESPONSE (CONT.)

The representative minor player chooses a strategy \(\bar\alpha_t = \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]})\) to minimize
\[
J^{\phi^0,\phi}(\bar\alpha) = \mathbb{E}\Big[\int_0^T f(t, \bar X_t, \mu_t, X^0_t, \bar\alpha_t, \phi^0(t, W^0_{[0,T]}))\,dt + g(\bar X_T, \mu_T)\Big],
\]
where the dynamics of the virtual state \(\bar X_t\) are given by:
\[
d\bar X_t = b(t, \bar X_t, \mu_t, X^0_t, \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,dt + \sigma(t, \bar X_t, \mu_t, X^0_t, \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,d\bar W_t,
\]
for a Wiener process \(\bar W = (\bar W_t)_{0 \le t \le T}\) independent of the other Wiener processes.

This optimization problem is NOT of McKean-Vlasov type: it is a classical optimal control problem with random coefficients.
\[
\phi^*(\phi^0, \phi) = \arg\inf_{\bar\alpha_t = \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]})} J^{\phi^0,\phi}(\bar\alpha)
\]

NASH EQUILIBRIUM

Search for a fixed point of the best response map:
\[
(\hat\phi^0, \hat\phi) = \big(\phi^{0,*}(\hat\phi),\, \phi^*(\hat\phi^0, \hat\phi)\big).
\]
Fixed point in a space of controls, not measures!

CLOSED LOOP VERSIONS OF THE MFG PROBLEM

Closed Loop Version. Controls of the major player and the representative minor player are of the form:
\[
\alpha^0_t = \phi^0(t, X^0_{[0,T]}, \mu_t), \qquad \alpha_t = \phi(t, X^0_{[0,T]}, \mu_t, X_{[0,T]}),
\]
for deterministic progressively measurable functions \(\phi^0 : [0,T] \times C([0,T]; \mathbb{R}^d) \times \mathcal{P}_2(\mathbb{R}^d) \to A^0\) and \(\phi : [0,T] \times C([0,T]; \mathbb{R}^d) \times \mathcal{P}_2(\mathbb{R}^d) \times C([0,T]; \mathbb{R}^d) \to A\).

Markovian Version. Controls of the major player and the representative minor player are of the form:
\[
\alpha^0_t = \phi^0(t, X^0_t, \mu_t), \qquad \alpha_t = \phi(t, X^0_t, \mu_t, X_t),
\]
for deterministic feedback functions \(\phi^0 : [0,T] \times \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \to A^0\) and \(\phi : [0,T] \times \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \times \mathbb{R}^d \to A\).

NASH EQUILIBRIUM

Search for a fixed point of the best response map:
\[
(\hat\phi^0, \hat\phi) = \big(\phi^{0,*}(\hat\phi),\, \phi^*(\hat\phi^0, \hat\phi)\big).
\]

CONTRACT THEORY: A STACKELBERG VERSION R.C. - D. Possamaï - N. Touzi

State equation
\[
dX_t = \sigma(t, X_t, \nu_t, \alpha_t)\big[\lambda(t, X_t, \nu_t, \alpha_t)\,dt + dW_t\big],
\]
where \(X_t\) is the agent's output, \(\alpha_t\) the agent's effort (control), and \(\nu_t\) the distribution of the output and effort (control) of the agent.

Rewards
\[
\begin{cases}
J^0(\xi) = \mathbb{E}\big[U_P(X_{[0,T]}, \nu_T, \xi)\big] \\
J(\xi, \alpha) = \mathbb{E}\Big[\int_0^T f(t, X_t, \nu_t, \alpha_t)\,dt + U_A(\xi)\Big].
\end{cases}
\]
Given the choice of a contract \(\xi\) by the Principal, each agent in the field of exchangeable agents chooses an effort level \(\alpha_t\) and meets his/her reservation price; we get the field of agents in a (mean field) Nash equilibrium. The Principal then chooses the contract to maximize his/her expected utility.

LINEAR QUADRATIC MODELS

State dynamics
\[
\begin{cases}
dX^0_t = (L_0 X^0_t + B_0 \alpha^0_t + F_0 \bar X_t)\,dt + D_0\,dW^0_t \\
dX_t = (L X_t + B \alpha_t + F \bar X_t + G X^0_t)\,dt + D\,dW_t
\end{cases}
\]
where \(\bar X_t = \mathbb{E}[X_t \mid \mathcal{F}^0_t]\), with \((\mathcal{F}^0_t)_t\) the filtration generated by \(W^0\).

Costs
\[
J^0(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T \big[(X^0_t - H_0 \bar X_t - \eta_0)^\top Q_0 (X^0_t - H_0 \bar X_t - \eta_0) + \alpha^{0\top}_t R_0\, \alpha^0_t\big]\,dt\Big]
\]
\[
J(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T \big[(X_t - H \bar X_t - H_1 X^0_t - \eta)^\top Q (X_t - H \bar X_t - H_1 X^0_t - \eta) + \alpha^\top_t R\, \alpha_t\big]\,dt\Big]
\]
in which \(Q_0, Q, R_0, R\) are symmetric matrices, and \(R_0, R\) are assumed to be positive definite.

EQUILIBRIA

Open Loop Version: optimization problems + fixed point = large FBSDE; the affine FBSDE is solved by a large matrix Riccati equation.

Closed Loop Version: the fixed point step is more difficult. The search is limited to controls of the form
\[
\alpha^0_t = \phi^0(t, X^0_t, \bar X_t) = \phi^0_0(t) + \phi^0_1(t) X^0_t + \phi^0_2(t) \bar X_t
\]
\[
\alpha_t = \phi(t, X_t, X^0_t, \bar X_t) = \phi_0(t) + \phi_1(t) X_t + \phi_2(t) X^0_t + \phi_3(t) \bar X_t
\]
Optimization problems + fixed point = large FBSDE; the affine FBSDE is solved by a large matrix Riccati equation. The solutions are not the same!
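As a minimal illustration of the final step (solving a Riccati equation), here is a backward Euler sweep for the scalar Riccati ODE of a one-dimensional LQ problem; the coefficients a, b, q, r, g are illustrative placeholders, not the (large matrix) system of the slides:

```python
def solve_riccati(a=-1.0, b=1.0, q=1.0, r=1.0, g=0.5, T=1.0, n_steps=1000):
    """Backward sweep for the scalar Riccati ODE of the LQ problem
        dX = (a X + b alpha) dt + sigma dW,
        cost E[ int_0^T (q X^2 + r alpha^2) dt + g X_T^2 ]:
        -P'(t) = 2 a P - (b^2 / r) P^2 + q,   P(T) = g.
    The optimal feedback is alpha_t = -(b / r) P(t) X_t."""
    dt = T / n_steps
    P = g                               # start from the terminal condition
    path = [P]
    for _ in range(n_steps):            # step backward in time with Euler
        P = P + dt * (2 * a * P - (b ** 2 / r) * P ** 2 + q)
        path.append(P)
    return P, path                      # P is P(0); path runs from t=T to t=0
```

With these placeholder values the sweep relaxes toward the stationary solution of 2aP - (b^2/r)P^2 + q = 0, here sqrt(2) - 1.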

APPLICATION TO BEE SWARMING

\(V^{0,N}_t\): velocity of the (major player) streaker bee at time \(t\); \(V^{i,N}_t\): velocity of the \(i\)-th worker bee, \(i = 1, \dots, N\), at time \(t\).

Linear dynamics
\[
\begin{cases}
dV^{0,N}_t = \alpha^0_t\,dt + \Sigma_0\,dW^0_t \\
dV^{i,N}_t = \alpha^i_t\,dt + \Sigma\,dW^i_t
\end{cases}
\]

Minimization of quadratic costs
\[
J^0 = \mathbb{E}\Big[\int_0^T \big(\lambda_0 |V^{0,N}_t - \nu_t|^2 + \lambda_1 |V^{0,N}_t - \bar V^N_t|^2 + (1 - \lambda_0 - \lambda_1)\, |\alpha^0_t|^2\big)\,dt\Big]
\]
where \(\bar V^N_t := \frac{1}{N}\sum_{i=1}^N V^{i,N}_t\) is the average velocity of the followers, \([0,T] \ni t \mapsto \nu_t \in \mathbb{R}^d\) is a deterministic function (the leader's free will), and \(\lambda_0\), \(\lambda_1\) are positive real numbers satisfying \(\lambda_0 + \lambda_1 \le 1\);
\[
J^i = \mathbb{E}\Big[\int_0^T \big(l_0 |V^{i,N}_t - V^{0,N}_t|^2 + l_1 |V^{i,N}_t - \bar V^N_t|^2 + (1 - l_0 - l_1)\, |\alpha^i_t|^2\big)\,dt\Big]
\]
with \(l_0, l_1 \ge 0\) and \(l_0 + l_1 \le 1\).

SAMPLE TRAJECTORIES IN EQUILIBRIUM

\(\nu(t) := [2\pi \sin(2\pi t),\, 2\pi \cos(2\pi t)]\)

FIGURE: Optimal velocities and trajectories of the followers and the leader. Left: \(k_0 = 0.8\), \(k_1 = 0.19\), \(l_0 = 0.19\), \(l_1 = 0.8\). Right: \(k_0 = 0.8\), \(k_1 = 0.19\), \(l_0 = 0.8\), \(l_1 = 0.19\).

SAMPLE TRAJECTORIES IN EQUILIBRIUM

\(\nu(t) := [2\pi \sin(2\pi t),\, 2\pi \cos(2\pi t)]\)

FIGURE: Optimal velocities and trajectories of the followers and the leader. Left: \(k_0 = 0.19\), \(k_1 = 0.8\), \(l_0 = 0.19\), \(l_1 = 0.8\). Right: \(k_0 = 0.19\), \(k_1 = 0.8\), \(l_0 = 0.8\), \(l_1 = 0.19\).

CONDITIONAL PROPAGATION OF CHAOS

FIGURE: Conditional correlation of the followers' velocities for increasing numbers N of followers.

FINITE STATE SPACES: A CYBER SECURITY MODEL Kolokoltsov - Bensoussan, R.C. - P. Wang

N computers in a network (minor players); one hacker / attacker (major player). The action of the major player affects the minor players' states (even when N >> 1). The major player feels only \(\mu^N_t\), the empirical distribution of the minor players' states.

Finite state space: each computer is in one of 4 states:
protected & infected
protected & susceptible to be infected
unprotected & infected
unprotected & susceptible to be infected
so the state is a continuous time Markov chain in \(E = \{DI, DS, UI, US\}\).

Each player's action is intended to affect the rates of change from one state to another so as to minimize the expected costs
\[
J(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T (k_D 1_D + k_I 1_I)(X_t)\,dt\Big]
\]
\[
J^0(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T \big(f^0(\mu_t) + k_H\, \phi^0(\mu_t)\big)\,dt\Big].
\]

FINITE STATE MEAN FIELD GAMES

State Dynamics: \((X_t)_t\) a continuous time Markov chain in \(E\) with Q-matrix \((q_t(x, x'))_{t, x, x' \in E}\).

Mean field structure of the Q-matrix:
\[
q_t(x, x') = \lambda_t(x, x', \mu, \alpha).
\]

Control space: \(A \subset \mathbb{R}^k\); sometimes \(A\) is finite, e.g. \(A = \{0, 1\}\), or a function space.

Control strategies in feedback form: \(\alpha_t = \phi(t, X_t)\), for some \(\phi : [0,T] \times E \to A\).

Mean field interaction through empirical measures: \(\mu \in \mathcal{P}(E) \cong (\mu(\{x\}))_{x \in E}\).

Kolmogorov-Fokker-Planck equation: if \(\mu_t = \mathcal{L}(X_t)\),
\[
\partial_t \mu_t(\{x\}) = [L^{\mu_t, \phi(t,\cdot), *}_t \mu_t](\{x\}) = \sum_{x' \in E} \mu_t(\{x'\})\, \hat q^{\mu_t, \phi(t,\cdot)}_t(x', x) = \sum_{x' \in E} \mu_t(\{x'\})\, \lambda_t(x', x, \mu_t, \phi(t, x')), \qquad x \in E.
\]
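Freezing the rates (no dependence on the measure or the control) reduces the Kolmogorov-Fokker-Planck equation to the linear ODE d/dt μ_t = μ_t Q, which is easy to integrate. A sketch with placeholder rates, not the model's:

```python
import numpy as np

def kfp_flow(Q, mu0, T=10.0, n_steps=1000):
    """Forward Euler integration of the Kolmogorov-Fokker-Planck ODE
    d/dt mu_t = mu_t Q (row-vector convention) for a continuous time
    Markov chain with a constant Q-matrix.  In the slides the rates
    depend on mu_t and on the feedback phi; freezing them gives this
    linear ODE."""
    mu = np.asarray(mu0, dtype=float)
    dt = T / n_steps
    for _ in range(n_steps):
        mu = mu + dt * (mu @ Q)   # rows of Q sum to 0, so mass is conserved
    return mu

# Toy 4-state Q-matrix with zero row sums (placeholder rates):
Q = np.array([[-1.0, 1.0, 0.0, 0.0],
              [0.5, -0.5, 0.0, 0.0],
              [0.0, 0.0, -1.0, 1.0],
              [0.0, 0.0, 2.0, -2.0]])
mu_T = kfp_flow(Q, [0.25, 0.25, 0.25, 0.25])
```

Because each row of Q sums to zero, the Euler step conserves total mass exactly, and for small enough dt the iterates remain probability vectors.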

FINITE STATE MEAN FIELD GAMES: OPTIMIZATION

Hamiltonian
\[
H(t, x, \mu, h, \alpha) = \sum_{x' \in E} \lambda_t(x, x', \mu, \alpha)\, h(x') + f(t, x, \mu, \alpha).
\]
Hamiltonian minimizer
\[
\hat\alpha(t, x, \mu, h) = \arg\inf_{\alpha \in A} H(t, x, \mu, h, \alpha).
\]
Minimized Hamiltonian
\[
H^*(t, x, \mu, h) = \inf_{\alpha \in A} H(t, x, \mu, h, \alpha) = H(t, x, \mu, h, \hat\alpha(t, x, \mu, h)).
\]
HJB Equation
\[
\partial_t u^\mu(t, x) + H^*(t, x, \mu_t, u^\mu(t, \cdot)) = 0, \qquad 0 \le t \le T, \; x \in E,
\]
with terminal condition \(u^\mu(T, x) = g(x, \mu_T)\).
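With a finite action set such as A = {0, 1}, the Hamiltonian minimizer can be computed by direct enumeration. A sketch with hypothetical rate and cost functions; the 2-state example is illustrative, not the cyber-security model:

```python
def minimized_hamiltonian(lam, f, t, x, mu, h, actions=(0, 1)):
    """Brute-force minimization of H(t, x, mu, h, a) over a finite action set.

    lam(t, x, xp, mu, a): jump rate from x to xp (diagonal = -(row sum));
    f(t, x, mu, a): running cost.  Returns (H*, alpha_hat)."""
    best = None
    for a in actions:
        Hval = sum(lam(t, x, xp, mu, a) * h[xp] for xp in h) + f(t, x, mu, a)
        if best is None or Hval < best[0]:
            best = (Hval, a)
    return best

# Toy 2-state example: effort a switches "off" -> "on" at rate a,
# at running cost 0.5 * a (all numbers are placeholders).
def lam(t, x, xp, mu, a):
    if x == "off":
        return {"on": a, "off": -a}[xp]   # diagonal entry = -(row sum)
    return 0.0

f = lambda t, x, mu, a: 0.5 * a
Hstar, a_hat = minimized_hamiltonian(lam, f, 0.0, "off", None, {"on": 0.0, "off": 1.0})
```

Here switching is worthwhile (the value gap between the states exceeds the effort cost), so the minimizer picks a = 1.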

TRANSITION RATES: Q-MATRICES

For \(\alpha^0 = 0\), the off-diagonal entries of \(\lambda_t(\cdot, \cdot, \mu, \alpha, 0)\) are (rows and columns ordered DI, DS, UI, US):
DI → DS: \(q^D_{rec}\)
DS → DI: \(\alpha\, q^D_{inf} + \beta_{DD}\, \mu(\{DI\}) + \beta_{UD}\, \mu(\{UI\})\)
UI → US: \(q^U_{rec}\)
US → UI: \(\alpha\, q^U_{inf} + \beta_{UU}\, \mu(\{UI\}) + \beta_{DU}\, \mu(\{DI\})\)
For \(\alpha^0 = 1\), \(\lambda_t(\cdot, \cdot, \mu, \alpha, 1)\) has the same entries multiplied by the factor \(\lambda\). In both cases each diagonal entry \(*\) is the negative of the sum of the other entries of the row in which \(*\) appears.
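A sketch of how such a rate matrix can be assembled, with the diagonal filled as described above; all numerical rate constants are placeholders, and the treatment of the factor λ follows the slide:

```python
import numpy as np

DI, DS, UI, US = 0, 1, 2, 3   # state ordering as in the slides

def q_matrix(mu, alpha, attack, q_rec_D=0.5, q_rec_U=0.4, q_inf_D=0.3,
             q_inf_U=0.6, b_DD=0.1, b_UD=0.2, b_UU=0.3, b_DU=0.1, lam=2.0):
    """Assemble the 4x4 rate matrix described above (placeholder values).

    `attack` in {0, 1} is the major player's action; when attack == 1 every
    rate is multiplied by lam, as in the second table.  Each diagonal entry
    is set to minus the sum of the other entries of its row."""
    Q = np.zeros((4, 4))
    Q[DI, DS] = q_rec_D                                          # recovery
    Q[DS, DI] = alpha * q_inf_D + b_DD * mu[DI] + b_UD * mu[UI]  # infection
    Q[UI, US] = q_rec_U
    Q[US, UI] = alpha * q_inf_U + b_UU * mu[UI] + b_DU * mu[DI]
    if attack:
        Q = lam * Q
    Q[np.diag_indices(4)] = -Q.sum(axis=1)   # diagonal = -(row sums)
    return Q
```

The row sums are zero by construction, which is the defining property of a Q-matrix.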

EQUILIBRIUM DISTRIBUTION OVER TIME WITH CONSTANT ATTACKER

FIGURE: Three plots of the time evolution of the state distribution μ(t), with components μ(t)[DI], μ(t)[DS], μ(t)[UI], μ(t)[US] plotted against time t.

EQUILIBRIUM OPTIMAL FEEDBACK φ(t, ·)

FIGURE: Time evolution of the optimal feedback function. From left to right: φ(t, DI), φ(t, DS), φ(t, UI), and φ(t, US).

CONVERGENCE MAY BE ELUSIVE

From left to right, the time evolution of the distribution μ_t for the parameters given in the text, after an increasing number of iterations of the successive solutions of the HJB equation and the Kolmogorov-Fokker-Planck equation.
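A common remedy when the alternating HJB/KFP iteration fails to settle is to damp the update. The sketch below illustrates the principle on a scalar stand-in for the update map: plain Picard iteration oscillates forever, while the damped version contracts. This illustrates the damping idea only, not the lecture's scheme:

```python
def damped_fixed_point(F, x0, theta, tol=1e-10, max_iter=1000):
    """Iterate x <- (1 - theta) * x + theta * F(x); theta = 1 is plain Picard.
    Returns the last iterate and the number of iterations performed."""
    x = x0
    for k in range(max_iter):
        x_new = (1.0 - theta) * x + theta * F(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# Scalar stand-in for the HJB -> KFP update map; its fixed point is 0.5.
F = lambda m: 1.0 - m
x_plain, n_plain = damped_fixed_point(F, 0.2, theta=1.0)  # oscillates, never settles
x_damp, n_damp = damped_fixed_point(F, 0.2, theta=0.6)    # contracts to the fixed point
```

With theta = 0.6 the damped map x ↦ 0.6 − 0.2x is a contraction with rate 0.2, so the iteration converges in a handful of steps even though the undamped map is not contracting at all.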

THE MASTER EQUATION

\[
\partial_t U(t, x, \mu) + H^*(t, x, \mu, U(t, \cdot, \mu)) + \sum_{x' \in E} h^*(t, \mu, U(t, \cdot, \mu))(x')\, \frac{\partial U(t, x, \mu)}{\partial \mu(\{x'\})} = 0,
\]
where the \(\mathbb{R}^E\)-valued function \(h^*\) is defined on \([0,T] \times \mathcal{P}(E) \times \mathbb{R}^E\) by:
\[
h^*(t, \mu, u) = \int_E \lambda_t\big(x, \cdot, \mu, \hat\alpha(t, x, \mu, u)\big)\,d\mu(x) = \sum_{x \in E} \lambda_t\big(x, \cdot, \mu, \hat\alpha(t, x, \mu, u)\big)\,\mu(\{x\}).
\]
System of Ordinary Differential Equations (ODEs): if and when the master equation is solved,
\[
\partial_t \mu_t(\{x\}) = h^*(t, \mu_t, U(t, \cdot, \mu_t))(x).
\]

μ_t EVOLUTION FROM THE MASTER EQUATION

FIGURE: Time evolution of the state distribution μ(t), components μ(t)[DI], μ(t)[DS], μ(t)[UI], μ(t)[US]. As before, we used the initial conditions μ₀ = (0.25, 0.25, 0.25, 0.25) on the left and μ₀ = (1, 0, 0, 0) on the right.

IN THE PRESENCE OF A MAJOR (HACKER) PLAYER

FIGURE: Time evolution, in equilibrium, of the distribution μ_t of the states of the computers in the network for the initial condition μ₀ = (0.25, 0.25, 0.25, 0.25): when the major player is not rewarded for its attacks, i.e. when f⁰(μ) ≡ 0 (leftmost plot); in the absence of a major player, i.e. v = 0 (middle plot); and with f⁰(μ) = k(μ({UI}) + μ({DI})) with k = 0.5 (rightmost plot).

POA BOUNDS FOR CONTINUOUS TIME MFGS

Price of Anarchy bounds compare the Social Welfare of a Nash Equilibrium to what a Central Planner would achieve (Koutsoupias-Papadimitriou).

Usual game model for cyber security: a zero-sum game between attacker and network manager; compute the expected cost to the network for protection.

MFG model for cyber security: let the individual computer owners take care of their own security; hope for a Nash equilibrium; compute the expected cost to the network for protection.

How much worse the Nash equilibrium does is the PoA.

POA BOUNDS FOR CONTINUOUS TIME MFGS WITH FINITE STATE SPACES

\(X_t = (X^1_t, \dots, X^N_t)\): state at time \(t\), with \(X^i_t \in \{e_1, \dots, e_d\}\). Use distributed feedback controls, for the state to be a continuous time Markov chain, with dynamics given by Q-matrices \((q_t(x, x'))_{t, x, x' \in E}\).

Empirical measures
\[
\mu^N_x = \frac{1}{N}\sum_{i=1}^N \delta_{x^i} = \sum_{l=1}^d p_l\, \delta_{e_l}
\]
where \(p_l = \#\{i;\, 1 \le i \le N,\, x^i = e_l\}/N\) is the proportion of elements \(x^i\) of the sample which are equal to \(e_l\).

Cost Functionals: player \(i\) minimizes
\[
J^i(\alpha^1, \dots, \alpha^N) = \mathbb{E}\Big[\int_0^T f(t, X^i_t, \mu^{N-1}_{X^{-i}_t}, \alpha^i_t)\,dt + g(X^i_T, \mu^{N-1}_{X^{-i}_T})\Big].
\]

SOCIAL COST

If the N players use distributed Markovian control strategies of the form \(\alpha^i_t = \phi(t, X^i_t)\), we define the cost (per player) to the system as the quantity
\[
J^{(N)}_\phi = \frac{1}{N}\sum_{i=1}^N J^i(\alpha^1, \dots, \alpha^N).
\]
In the limit \(N \to \infty\) the social cost should be
\[
\lim_{N \to \infty} J^{(N)}_\phi = \lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N \mathbb{E}\Big[\int_0^T f(t, X^i_t, \mu^N_{X_t}, \phi(t, X^i_t))\,dt + g(X^i_T, \mu^N_{X_T})\Big]
= \lim_{N \to \infty} \mathbb{E}\Big[\int_0^T \langle f(t, \cdot, \mu^N_{X_t}, \phi(t, \cdot)),\, \mu^N_{X_t}\rangle\,dt + \langle g(\cdot, \mu^N_{X_T}),\, \mu^N_{X_T}\rangle\Big], \tag{3}
\]
if we use the notation \(\langle \varphi, \nu \rangle\) for the integral \(\int \varphi(z)\,\nu(dz)\). Now if the \(\mu^N_{X_t}\) converge toward a deterministic \(\mu_t\), the social cost becomes:
\[
SC_\phi(\mu) = \int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)),\, \mu_t\rangle\,dt + \langle g(\cdot, \mu_T),\, \mu_T\rangle.
\]
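Once the flow (μ_t)_t is known, the limiting social cost SC_φ(μ) is a plain integral against it and can be evaluated by a Riemann sum. A sketch; the discretization choice and the constant-data sanity check are illustrative:

```python
import numpy as np

def social_cost(mu_path, f, g, T, states):
    """Riemann-sum evaluation of
        SC_phi(mu) = int_0^T <f(t, ., mu_t), mu_t> dt + <g(., mu_T), mu_T>
    on a finite state space.  mu_path has shape (n_steps + 1, |E|); the
    feedback phi(t, x) is assumed to be folded into f."""
    n_steps = mu_path.shape[0] - 1
    dt = T / n_steps
    running = 0.0
    for k in range(n_steps):            # left-endpoint rule on [0, T]
        mu = mu_path[k]
        running += dt * sum(mu[i] * f(k * dt, x, mu) for i, x in enumerate(states))
    mu_T = mu_path[-1]
    terminal = sum(mu_T[i] * g(x, mu_T) for i, x in enumerate(states))
    return running + terminal

# Sanity check with constant data: f = 1 and g = 2 give SC = T + 2.
mu_path = np.full((11, 2), 0.5)
sc = social_cost(mu_path, lambda t, x, mu: 1.0, lambda x, mu: 2.0,
                 T=1.0, states=["a", "b"])
```

Since each μ_t sums to one, constant running and terminal costs integrate exactly to T and the terminal value, which makes the check easy to verify by hand.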

ASYMPTOTIC REGIME N = ∞

Two alternatives:

\(\phi\) is the optimal feedback function for a MFG equilibrium for which \(\mu = (\mu_t)_t\) is the equilibrium flow of statistical distributions of the state, in which case we use the notation \(SC_{MFG}\) for \(SC_\phi(\mu)\); in equilibrium
\[
\mathbb{E}\Big[\int_0^T f(t, X_t, \mu_t, \phi(t, X_t))\,dt + g(X_T, \mu_T)\Big] = \int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)),\, \mathcal{L}(X_t)\rangle\,dt + \langle g(\cdot, \mu_T),\, \mathcal{L}(X_T)\rangle = SC_\phi(\mu) = SC_{MFG}.
\]

\(\phi\) is the feedback (chosen by a central planner) minimizing the social cost \(SC_\phi(\mu)\) without having to be an MFG Nash equilibrium, in which case we use the notation \(SC_{MKV}\) for \(SC_\phi(\mu)\):
\[
\hat\phi = \arg\inf_\phi \int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)),\, \mu_t\rangle\,dt + \langle g(\cdot, \mu_T),\, \mu_T\rangle
\]
where \(\mu_t\) satisfies the Kolmogorov-Fokker-Planck forward dynamics.

POA: SOCIAL COST COMPUTATION

Minimize
\[
\int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)),\, \mu_t\rangle\,dt + \langle g(\cdot, \mu_T),\, \mu_T\rangle
\]
under the dynamical constraint
\[
\partial_t \mu_t(\{x\}) = [L^{\mu_t, \phi(t,\cdot), *}_t \mu_t](\{x\}) = \sum_{x' \in E} \mu_t(\{x'\})\,\lambda(t, x', x, \mu_t, \phi(t, x')), \qquad x \in E.
\]
An ODE in the d-dimensional probability simplex \(S_d \subset \mathbb{R}^d\)!

Hamiltonian
\[
H(t, \mu, \varphi, \phi) = \langle \varphi, [L^{\mu,\phi,*}_t \mu]\rangle + \langle f(t, \cdot, \mu, \phi(\cdot)),\, \mu\rangle = \langle L^{\mu,\phi}_t \varphi + f(t, \cdot, \mu, \phi(\cdot)),\, \mu\rangle
\]
and minimized Hamiltonian
\[
H^*(t, \mu, \varphi) = \inf_{\phi \in \tilde A} H(t, \mu, \varphi, \phi).
\]
Assume the infimum is attained for a unique \(\hat\phi\):
\[
H^*(t, \mu, \varphi) = H(t, \mu, \varphi, \hat\phi(t, \mu, \varphi)) = \langle L^{\mu, \hat\phi(t,\mu,\varphi)}_t \varphi + f(t, \cdot, \mu, \hat\phi(t, \mu, \varphi)(\cdot)),\, \mu\rangle.
\]
HJB equation
\[
\partial_t v(t, \mu) + H^*\Big(t, \mu, \frac{\delta v(t, \mu)}{\delta \mu}\Big) = 0, \qquad v(T, \mu) = \langle g(\cdot, \mu),\, \mu\rangle.
\]

REMARKS ON DERIVATIVES W.R.T. MEASURES

Standard identification \(\mathcal{P}(E) \ni \mu \leftrightarrow p = (p_1, \dots, p_d) \in S_d\) with \(p_i = \mu(\{e_i\})\) for \(i = 1, \dots, d\), i.e. \(\mu = \sum_{i=1}^d p_i\, \delta_{e_i}\).

\(\delta v / \delta \mu\) makes sense when \(v\) is defined on an open neighborhood of the probability simplex \(S_d\): \(\partial v(t, \mu) / \partial \mu(\{x'\})\) is the derivative of \(v\) with respect to the weight \(\mu(\{x'\})\).

Important Remark: \(L^{\mu,\phi}_t \varphi\) does not change if we add a constant to the function \(\varphi\), and neither does \(\hat\phi(t, \mu, \varphi)\).

Consequence (for numerical purposes): we can identify the HJB equation with
\[
\partial_t v(t, \mu) + H^*\Big(t, \mu, \Big(\frac{\partial v(t, \mu)}{\partial \mu(\{x'\})} - \frac{\partial v(t, \mu)}{\partial \mu(\{x\})}\Big)_{x' \in E}\Big) = 0
\]
for \(x' \ne x\), with \(\partial v(t, \cdot)/\partial \mu(\{x'\})\) the partial derivative of \(v(t, \cdot)\) with respect to \(\mu(\{x'\})\) whenever \(v(t, \cdot)\) is regarded as a smooth function of the \((d-1)\)-tuple \((\mu(\{x'\}))_{x' \in E \setminus \{x\}}\), which we can see as an element of the \((d-1)\)-dimensional domain
\[
S_{d-1,-} = \Big\{(p_1, \dots, p_{d-1}) \in [0,1]^{d-1} : \sum_{i=1}^{d-1} p_i \le 1\Big\}.
\]

MFGS OF TIMING WITH MAJOR AND MINOR PLAYERS

The example of corporate bonds.

Major player = bond issuer. The bond is callable: the major player (issuer) chooses a stopping time at which to pay off the investors, stop coupon payments to the investors, and refinance its debt on better terms.

Minor players = field of investors. The bond is convertible: each minor player (investor) chooses a stopping time at which to convert the bond certificate into a fixed number (the conversion ratio) of stock shares, if and when owning the stock is more profitable, creating dilution of the stock.