Tracking Adversarial Targets


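A. Proofs

Proof of Lemma 3. Consider the Bellman equation
$$\lambda + V_{\pi,l}(x, a) = l(x, a) + V_{\pi,l}\big(Ax + Ba,\ \pi(Ax + Ba)\big).$$
We prove the lemma by showing that the given quadratic form is the unique solution of the Bellman equation. Let
$$z = \begin{pmatrix} x \\ a \end{pmatrix}, \qquad z' = \begin{pmatrix} Ax + Ba \\ -K(Ax + Ba) + c \end{pmatrix}.$$
We have that
$$z' = \begin{pmatrix} I \\ -K \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} \begin{pmatrix} x \\ a \end{pmatrix} + \begin{pmatrix} 0 \\ c \end{pmatrix}.$$
We guess a quadratic form for the value functions and write
$$\lambda + z^\top P z + L^\top z = (x - g)^\top Q (x - g) + a^\top a + z'^\top P z' + L^\top z', \qquad P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}, \quad L = \begin{pmatrix} L_1 \\ L_2 \end{pmatrix}.$$
The above equation has a solution if
$$\begin{aligned}
P &= \begin{pmatrix} A^\top \\ B^\top \end{pmatrix} \begin{pmatrix} I & -K^\top \end{pmatrix} P \begin{pmatrix} I \\ -K \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} + \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix}, \\
L^\top &= L^\top \begin{pmatrix} I \\ -K \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} + 2 \begin{pmatrix} 0 & c^\top \end{pmatrix} P \begin{pmatrix} I \\ -K \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} - 2 g^\top \begin{pmatrix} Q & 0 \end{pmatrix}, \\
\lambda &= g^\top Q g + c^\top P_{22}\, c + L_2^\top c.
\end{aligned} \tag{12}$$
The nonzero eigenvalues of $\begin{pmatrix} I \\ -K \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix}$ coincide with those of $\begin{pmatrix} A & B \end{pmatrix} \begin{pmatrix} I \\ -K \end{pmatrix} = A - BK$, and $\|A - BK\| < 1$. This implies that the iterative equations (12) have a unique solution. Thus, the quadratic form is the solution of the Bellman equation.

Proof of Lemma 4. From Lemma 3, we have that
$$\begin{aligned}
P_t &= \begin{pmatrix} A^\top \\ B^\top \end{pmatrix} \begin{pmatrix} I & -K_t^\top \end{pmatrix} P_t \begin{pmatrix} I \\ -K_t \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} + \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix}, \\
L_t^\top &= L_t^\top \begin{pmatrix} I \\ -K_t \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} + 2 \begin{pmatrix} 0 & c_t^\top \end{pmatrix} P_t \begin{pmatrix} I \\ -K_t \end{pmatrix} \begin{pmatrix} A & B \end{pmatrix} - 2 g_t^\top \begin{pmatrix} Q & 0 \end{pmatrix}.
\end{aligned}$$
Notice that the value of $P_t$ depends only on the values of $A$, $B$, and $K_t$, which in turn, by Lemma 2, depend only on $\{K_1, P_1, \ldots, P_{t-1}\}$. Thus, the matrix $P_t$ is determined by $K_1$ independently of the adversarial choices $\{g_1, \ldots, g_t\}$.

In the absence of adversarial vectors, the optimal policy has the form $\pi(x) = -K_* x$, where $K_* = (I + B^\top S B)^{-1} B^\top S A$ and $S$ is the solution of the Riccati equation. Consider a problem where $g_1 = g_2 = \dots = 0$, $c_1 = c_2 = \dots = 0$, and $K_1 = K_*$ is the gain matrix of the optimal policy. Then $V_1$ is the value function of the optimal policy. Because $\pi_2$ is the greedy policy with respect to $V_1$, it is the optimal policy, and thus $K_2$ is also the gain matrix of the optimal policy, so $K_2 = K_*$. Repeating the same argument shows that all gain matrices are the same. Thus, if we choose $K_1$ to be the optimal gain matrix in the non-adversarial problem, we will get $K_t = K_*$ and hence $P_1 = P_2 = \dots = P_*$.

Proof of Lemma 7. First we prove (i). Under policy $\pi_t(x) = -K_* x + c_t$, we have that
$$\big(x_{\pi_t},\ \pi_t(x_{\pi_t})\big) = \big(A x_{\pi_t} + B\pi_t(x_{\pi_t}),\ \pi_t(A x_{\pi_t} + B\pi_t(x_{\pi_t}))\big).$$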
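The fixed-point relation that opens the proof of Lemma 7 pins down the limiting state: for a policy $\pi(x) = -Kx + c$ it gives $x_\pi = (I - A + BK)^{-1} B c$. The sketch below checks this for a scalar system by iterating the closed loop. It is only an illustration; the function names and the numerical values of `A`, `B`, `K`, `c` are assumptions chosen so that $|A - BK| < 1$, not quantities from the paper.

```python
def limiting_state(A, B, K, c):
    """Fixed point of x = (A - B*K)*x + B*c for a scalar system."""
    return B * c / (1.0 - A + B * K)


def simulate_state(A, B, K, c, steps, x0=0.0):
    """Iterate x <- A*x + B*a with a = -K*x + c, starting from x0."""
    x = x0
    for _ in range(steps):
        a = -K * x + c      # action chosen by the linear policy
        x = A * x + B * a   # linear dynamics
    return x


# Hypothetical numbers with |A - B*K| = 0.6 < 1.
A, B, K, c = 0.9, 0.5, 0.6, 0.3
x_pi = limiting_state(A, B, K, c)
x_sim = simulate_state(A, B, K, c, steps=200)
```

Because the closed-loop map contracts with factor $|A - BK|$, the simulated state approaches `x_pi` geometrically from any starting point.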

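Thus, by (7),
$$\lambda_t = (x_{\pi_t} - g_t)^\top Q (x_{\pi_t} - g_t) + (-K_* x_{\pi_t} + c_t)^\top (-K_* x_{\pi_t} + c_t).$$
Because $x_{\pi_t} = (A - BK_*) x_{\pi_t} + B c_t$, we have $x_{\pi_t} = (I - A + BK_*)^{-1} B c_t$, and so
$$\lambda_t = g_t^\top Q g_t + c_t^\top \Big(I + B^\top (I - A + BK_*)^{-\top} (Q + K_*^\top K_*) (I - A + BK_*)^{-1} B\Big) c_t - 2\big(g_t^\top Q + c_t^\top K_*\big)(I - A + BK_*)^{-1} B c_t.$$
Then (5) implies that
$$L_{t,2}^\top = -2\big(g_t^\top Q + c_t^\top K_*\big)(I - A + BK_*)^{-1} B, \qquad P_{*,22} = I + B^\top (I - A + BK_*)^{-\top} (Q + K_*^\top K_*) (I - A + BK_*)^{-1} B.$$
By Lemmas 2–4,
$$c_t = -\frac{1}{2(t-1)} P_{*,22}^{-1} \sum_{s=1}^{t-1} L_{s,2} = \frac{1}{t-1} P_{*,22}^{-1} B^\top (I - A + BK_*)^{-\top} \sum_{s=1}^{t-1} \big(Q g_s + K_*^\top c_s\big) = \frac{1}{t-1} \sum_{s=1}^{t-1} \big(D g_s + H c_s\big), \tag{13}$$
where $H = P_{*,22}^{-1} B^\top (I - A + BK_*)^{-\top} K_*^\top$.

To obtain a bound on $\max_t \|c_t\|$ from the above equation, we need to show that $\|H\|$ is sufficiently smaller than one. Let
$$N = (I - A + BK_*)^{-1}, \qquad M = K_* N B, \qquad L = (I + M^\top M)^{-1} M^\top.$$
We have that
$$\|H\| = \big\|\big(I + B^\top N^\top (Q + K_*^\top K_*) N B\big)^{-1} M^\top\big\| \le \big\|\big(I + B^\top N^\top K_*^\top K_* N B\big)^{-1} M^\top\big\| = \big\|(I + M^\top M)^{-1} M^\top\big\| = \|L\|, \tag{14}$$
and
$$LL^\top = (I + M^\top M)^{-1} M^\top M (I + M^\top M)^{-1} = (I + M^\top M)^{-1} (M^\top M + I - I)(I + M^\top M)^{-1} = (I + M^\top M)^{-1}\big(I - (I + M^\top M)^{-1}\big).$$
Because $M^\top M \preceq \lambda_{\max}(M^\top M)\, I$, $\|N\| \le 1/(1-\rho)$, and $\|M^\top M\| \le \|K_*\|^2 \|B\|^2/(1-\rho)^2$, we get that
$$LL^\top = (I + M^\top M)^{-1}\big(I - (I + M^\top M)^{-1}\big) \preceq \Big(1 - \frac{1}{1 + \|M^\top M\|}\Big) I \preceq \Big(1 - \frac{1}{1 + \|K_*\|^2\|B\|^2/(1-\rho)^2}\Big) I = \frac{\|K_*\|^2\|B\|^2/(1-\rho)^2}{1 + \|K_*\|^2\|B\|^2/(1-\rho)^2}\, I.$$
By (14) and the above inequality, we get that
$$\|H\| \le \|L\| = \sqrt{\lambda_{\max}(LL^\top)} \le \frac{\|K_*\|\|B\|/(1-\rho)}{\sqrt{1 + \|K_*\|^2\|B\|^2/(1-\rho)^2}}.$$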
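As a sanity check on the bound $\|H\| \le m/\sqrt{1+m^2}$ with $m = \|K_*\|\|B\|/(1-\rho)$, the sketch below evaluates both sides for a scalar system, where every matrix in the proof reduces to a number. The function name and the parameter values are hypothetical; the code assumes $\rho = |A - BK| < 1$ and $Q \ge 0$.

```python
import math


def h_and_bound(A, B, K, Q):
    """Scalar versions of H = P22^{-1} B N K and of the bound m / sqrt(1 + m^2)."""
    r = A - B * K
    rho = abs(r)
    if rho >= 1.0:
        raise ValueError("closed loop must be a contraction: |A - B*K| < 1")
    N = 1.0 / (1.0 - r)                      # N = (I - A + BK)^{-1}
    M = K * N * B                            # M = K N B
    P22 = 1.0 + B * N * (Q + K * K) * N * B  # P_{*,22} as stated in the proof
    H = M / P22                              # H = P22^{-1} M^T (scalar case)
    m = abs(K) * abs(B) / (1.0 - rho)
    bound = m / math.sqrt(1.0 + m * m)
    return H, bound


# Hypothetical numbers: r = 0.6, so rho = 0.6.
H, bound = h_and_bound(A=0.9, B=0.5, K=0.6, Q=1.0)
```

The bound is strictly below one for every finite `m`, which is what the proof needs in order to invert $1 - \|H\|$.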

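Let $v = 1/(1 - \|H\|)$. We get that
$$v \le \frac{1}{1 - \dfrac{\|K_*\|\|B\|/(1-\rho)}{\sqrt{1 + \|K_*\|^2\|B\|^2/(1-\rho)^2}}} = \frac{\sqrt{1 + \|K_*\|^2\|B\|^2/(1-\rho)^2}}{\sqrt{1 + \|K_*\|^2\|B\|^2/(1-\rho)^2} - \|K_*\|\|B\|/(1-\rho)} = \sqrt{1 + \frac{\|K_*\|^2\|B\|^2}{(1-\rho)^2}} \left(\sqrt{1 + \frac{\|K_*\|^2\|B\|^2}{(1-\rho)^2}} + \frac{\|K_*\|\|B\|}{1-\rho}\right) = H'.$$
Now we are ready to bound $\|c_t\|$. By (13), we get that for any $t$,
$$\|c_t\| \le \|D\| G + \frac{1}{t-1} \sum_{s=1}^{t-1} \|H\| \|c_s\| \le \|D\| G + \|H\| \max_{s < t} \|c_s\|.$$
Thus, $\max_t \|c_t\| \le \|D\| G + \|H\| \max_t \|c_t\|$, and thus
$$\max_t \|c_t\| \le \frac{\|D\| G}{1 - \|H\|} \le \|D\| G H' = C.$$

Proof of (ii). First we write $c_t$ in terms of $c_{t-1}$:
$$\begin{aligned}
c_t &= \frac{1}{t-1} \sum_{s=1}^{t-1} (D g_s + H c_s) \\
&= \frac{1}{t-1} (D g_{t-1} + H c_{t-1}) + \frac{t-2}{t-1} \cdot \frac{1}{t-2} \sum_{s=1}^{t-2} (D g_s + H c_s) \\
&= \frac{1}{t-1} (D g_{t-1} + H c_{t-1}) + \frac{t-2}{t-1}\, c_{t-1} \\
&= \frac{1}{t-1} \big(D g_{t-1} + ((t-2) I + H)\, c_{t-1}\big).
\end{aligned}$$
This implies that
$$c_t - c_{t-1} = \frac{1}{t-1} \big(D g_{t-1} - (I - H)\, c_{t-1}\big).$$
Then we use the facts that $\|c_{t-1}\| \le C$ and $\|H\| < 1$ to obtain
$$\|c_t - c_{t-1}\| \le \frac{\|D\| G + 2C}{t-1}.$$

Proof of Lemma 8. Let $f_\pi : \mathcal{X} \to \mathcal{X}$ be the transition function under policy $\pi = (K, c)$, i.e. $f_\pi(x) = (A - BK)x + Bc$. Let $\epsilon_{k,t} = \|x_k - x_{\pi_t}\|$ and $\epsilon_t = \|x_t - x_{\pi_t}\|$ denote the difference between the state variable and the limiting state under the chosen policy. We write^4
$$\epsilon_{k,t} = \big\|f_{\pi_{k-1}}(x_{k-1}) - f_{\pi_t}(x_{k-1}) + f_{\pi_t}(x_{k-1}) - x_{\pi_t}\big\| = \big\|f_{\pi_{k-1}}(x_{k-1}) - f_{\pi_t}(x_{k-1}) + f_{\pi_t}(x_{k-1}) - f_{\pi_t}(x_{\pi_t})\big\|.$$
From this decomposition, we get that
$$\begin{aligned}
\epsilon_{k,t} &\le \|B (c_{k-1} - c_t)\| + \|f_{\pi_t}(x_{k-1}) - f_{\pi_t}(x_{\pi_t})\| \\
&\le \|B\| \|c_{k-1} - c_t\| + \rho \|x_{k-1} - x_{\pi_t}\| \\
&\le \|B\| \sum_{s=k-1}^{t-1} \frac{\|D\| G + 2C}{s} + \rho \|x_{k-1} - x_{\pi_t}\|.
\end{aligned}$$

4. A similar decomposition, but with a different norm, was used in Even-Dar et al. (2009, proof of Lemma 5.2) to bound the difference between the stationary distribution of the chosen policy and the distribution of the state variable in a finite MDP problem.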
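The averaging recursion for the offsets and the one-step decomposition in the proof of Lemma 8 can be exercised together on a scalar system. The sketch below simulates $c_t = \frac{1}{t-1}\sum_{s<t}(D g_s + H c_s)$ and $x_t = (A - BK)x_{t-1} + B c_{t-1}$, then checks the two bounds of Lemma 7 and the inequality $\epsilon_{k,t} \le \|B\|\|c_{k-1} - c_t\| + \rho\,\epsilon_{k-1,t}$. All names are hypothetical, and the values of `D` and `H`, the initial offset $c_1 = 0$, and the target sequence are assumptions made for illustration. For simplicity the constant is taken as $C = |D|G/(1-|H|)$, the tighter of the two bounds in the proof.

```python
def simulate(A, B, K, D, H, targets):
    """Run the offset recursion and the closed loop for a scalar system.

    Returns lists c and x with c[i] = c_{i+1} and x[i] = x_{i+1}.
    """
    c = [0.0]      # c_1 = 0 is an assumed initial offset
    x = [0.0]      # x_1 = 0, as in the proof
    running = 0.0  # running sum of D*g_s + H*c_s over s < t
    for t in range(2, len(targets) + 1):
        running += D * targets[t - 2] + H * c[t - 2]
        x.append((A - B * K) * x[t - 2] + B * c[t - 2])
        c.append(running / (t - 1))
    return c, x


def check_bounds(A, B, K, D, H, targets, tol=1e-12):
    """Return (offsets bounded, increments bounded, one-step inequality holds)."""
    c, x = simulate(A, B, K, D, H, targets)
    G = max(abs(g) for g in targets)
    rho = abs(A - B * K)
    C = abs(D) * G / (1.0 - abs(H))
    bounded = all(abs(ct) <= C + tol for ct in c)
    increments = all(
        abs(c[i] - c[i - 1]) <= (abs(D) * G + 2.0 * C) / i + tol
        for i in range(1, len(c))
    )
    T = len(c)
    x_lim = B * c[T - 1] / (1.0 - A + B * K)  # limiting state of the last policy
    one_step = all(
        abs(x[k] - x_lim)
        <= abs(B) * abs(c[k - 1] - c[T - 1]) + rho * abs(x[k - 1] - x_lim) + tol
        for k in range(1, T)
    )
    return bounded, increments, one_step


# Hypothetical adversary: targets of magnitude 1 that flip sign every 5 steps.
targets = [1.0 if (s // 5) % 2 == 0 else -1.0 for s in range(200)]
result = check_bounds(A=0.9, B=0.5, K=0.6, D=0.5, H=0.24, targets=targets)
```

Since $c_t$ is a running average, its increments shrink like $1/t$ even when the targets keep switching, which is what lets the state track the moving limiting state.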

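Thus,
$$\begin{aligned}
\epsilon_t &\le \|B\| (\|D\| G + 2C) \sum_{k=2}^{t} \rho^{t-k} \sum_{s=k-1}^{t-1} \frac{1}{s} + \rho^{t-1} \|x_1 - x_{\pi_t}\| \\
&\le \|B\| (\|D\| G + 2C) \sum_{s=1}^{t-1} \frac{1}{s} \sum_{k=2}^{s+1} \rho^{t-k} + \frac{\rho^{t-1} \|B\| C}{1-\rho} \\
&\le \frac{\|B\| (\|D\| G + 2C)}{1-\rho} \sum_{s=1}^{t-1} \frac{\rho^{t-1-s}}{s} + \frac{\rho^{t-1} \|B\| C}{1-\rho},
\end{aligned}$$
where the second step follows from Equation (7), Lemma 7, and the fact that $x_1 = 0$. If $t - 1 - s > \log t / \log(1/\rho)$, then $\rho^{t-1-s} < 1/t$, and we get that
$$\begin{aligned}
\sum_{s=1}^{t-1} \frac{\rho^{t-1-s}}{s} &= \sum_{s:\, t-1-s \le \log t/\log(1/\rho)} \frac{\rho^{t-1-s}}{s} + \sum_{s:\, t-1-s > \log t/\log(1/\rho)} \frac{\rho^{t-1-s}}{s} \\
&\le \frac{\log t / \log(1/\rho)}{t - 1 - \log t/\log(1/\rho)} + \frac{1}{t} \sum_{s=1}^{t-1} \frac{1}{s} \\
&\le \frac{\log t}{\log(1/\rho)\,\big(t - 1 - \log t/\log(1/\rho)\big)} + \frac{1 + \log t}{t}.
\end{aligned}$$
Thus,
$$\epsilon_t \le \frac{\|B\| (\|D\| G + 2C)}{1-\rho} \left( \frac{1 + \log t}{t} + \frac{\log t}{\log(1/\rho)\,\big(t - 1 - \log t/\log(1/\rho)\big)} \right) + \frac{\rho^{t-1} \|B\| C}{1-\rho}.$$

To prove the second part of the lemma, let $u_T = \log T / \log(1/\rho)$. We have that
$$\sum_{t > u_T + 1}^{T} \frac{1}{t - 1 - \log T/\log(1/\rho)} = \sum_{t > u_T + 1}^{T} \frac{1}{t - 1 - u_T} \le \sum_{t=1}^{T} \frac{1}{t} \le 1 + \log T. \tag{15}$$
Thus, by (8) and (15),
$$\sum_{t=1}^{T} \epsilon_t = \sum_{t \le u_T + 1} \epsilon_t + \sum_{t > u_T + 1} \epsilon_t \le \frac{4 \|B\| C}{1-\rho} \cdot \frac{\log T}{\log(1/\rho)} + \frac{\|B\| C}{(1-\rho)^2} + \frac{\|B\| (\|D\| G + 2C)}{1-\rho} (1 + \log T) \left(1 + \log T + \frac{\log T}{\log(1/\rho)}\right).$$

The fact that all gain matrices are identical greatly simplifies the boundedness proof.

Proof of Lemma 11. First, it is easy to verify that $P_{*,22} \succeq I$, and thus the Hessian of $V_t$ with respect to $a$ satisfies $2 P_{*,22} \succeq 2I$. The gradient of the value function can be written as
$$\nabla_a V_t(x_\pi, a) = 2\big(P_{*,22}\, a + P_{*,21}\, x_\pi\big) + L_{t,2}.$$
Thus, $\|\nabla_a V_t(x_\pi, a)\| \le F$ for any $\|a\| \le U$.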
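The gradient formula at the start of the proof of Lemma 11 follows from differentiating the quadratic form $V(x, a) = z^\top P z + L^\top z$ with $z = (x, a)$ and symmetric $P$. The sketch below compares it with a central finite difference for scalar $x$ and $a$. The function names and the entries of `P` and `L` are made up for illustration, with $P_{22} \ge 1$ as the lemma requires.

```python
def value(P, L, x, a):
    """V(x, a) = z' P z + L' z for z = (x, a); P is a symmetric 2x2 nested list."""
    return (P[0][0] * x * x + 2.0 * P[0][1] * x * a + P[1][1] * a * a
            + L[0] * x + L[1] * a)


def grad_a(P, L, x, a):
    """Closed-form derivative of V in a: 2 * (P22 * a + P21 * x) + L2."""
    return 2.0 * (P[1][1] * a + P[1][0] * x) + L[1]


def finite_difference(P, L, x, a, h=1e-3):
    """Central first and second differences of V in the action a."""
    up, mid, down = value(P, L, x, a + h), value(P, L, x, a), value(P, L, x, a - h)
    return (up - down) / (2.0 * h), (up - 2.0 * mid + down) / (h * h)


# Hypothetical symmetric P with P22 = 1.5 >= 1, and a hypothetical L.
P = [[2.0, 0.5], [0.5, 1.5]]
L = [0.3, -0.7]
g_exact = grad_a(P, L, x=0.8, a=-0.4)
g_fd, curvature = finite_difference(P, L, x=0.8, a=-0.4)
```

The second difference recovers the curvature $2P_{22}$, so $P_{22} \ge 1$ gives curvature of at least 2 in the action.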

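Proof of (i). By (8), $\|x_t\| \le X$, and by Lemma 7, $\|c_t\| \le C$. Thus, all actions are bounded by
$$\|a_t\| = \|-K_* x_t + c_t\| \le \|K_*\| X + C = U.$$

Proof of (ii) and (iii). By Lemma 6,
$$\|-K_* x_{\pi_t} + c_t\| \le \|K_*\| X + C = U.$$
Similarly,
$$\|-K_* x_\pi + c\| \le \|K_*\| X + C = U.$$

Proof of (iv). By (4) and the fact that $K_t = K_*$ and $P_t = P_*$, we get that
$$\|L_t\| \le \frac{2}{1-\rho} \big(G \|Q\| + \rho\, C \|P_*\|\big).$$
Further, by (2), for any policy $\pi \in \Pi$ and any action satisfying $\|a\| \le U$, the value functions are bounded by
$$|V_t(x_\pi, a)| = \left| \begin{pmatrix} x_\pi \\ a \end{pmatrix}^{\!\top} P_* \begin{pmatrix} x_\pi \\ a \end{pmatrix} + L_t^\top \begin{pmatrix} x_\pi \\ a \end{pmatrix} \right| \le \|P_*\| (X + U)^2 + \frac{2}{1-\rho} \big(G \|Q\| + \rho\, C \|P_*\|\big)(X + U) = V.$$

Proof of Lemma 13. For policy $\pi = (K, c)$, we have
$$l_t(x, \pi(x)) = x^\top (Q + K^\top K)\, x - 2 (c^\top K + g_t^\top Q)\, x + c^\top c + g_t^\top Q g_t.$$
Define $S = Q + K^\top K$ and $d_t^\top = 2 (c^\top K + g_t^\top Q)$. Writing $x_t^\pi$ for the state at time $t$ under $\pi$, we have
$$\begin{aligned}
\gamma_T &= \sum_{t=1}^{T} \Big( x_t^{\pi\top} S x_t^\pi - d_t^\top x_t^\pi - \big(x_\pi^\top S x_\pi - d_t^\top x_\pi\big) \Big) \\
&= \sum_{t=1}^{T} d_t^\top (x_\pi - x_t^\pi) + \sum_{t=1}^{T} \big(x_t^{\pi\top} S x_t^\pi - x_\pi^\top S x_\pi\big) \\
&= \sum_{t=1}^{T} d_t^\top (x_\pi - x_t^\pi) + \sum_{t=1}^{T} \big(\|S^{1/2} x_t^\pi\| - \|S^{1/2} x_\pi\|\big)\big(\|S^{1/2} x_t^\pi\| + \|S^{1/2} x_\pi\|\big).
\end{aligned}$$
Thus,
$$\begin{aligned}
\gamma_T &\le \sum_{t=1}^{T} \Big( \|d_t\| \|x_\pi - x_t^\pi\| + \|S^{1/2} (x_t^\pi - x_\pi)\| \big(\|S^{1/2} x_t^\pi\| + \|S^{1/2} x_\pi\|\big) \Big) \\
&\le \sum_{t=1}^{T} \Big( \|d_t\| + \|S^{1/2}\| \big(\|S^{1/2} x_t^\pi\| + \|S^{1/2} x_\pi\|\big) \Big) \|x_t^\pi - x_\pi\| \\
&\le Z \sum_{t=1}^{T} \|x_t^\pi - x_\pi\|.
\end{aligned}$$
We get the desired result by Lemma 6.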
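The proof of Lemma 13 starts from the expansion of the loss along a linear policy, $l_t(x, \pi(x)) = x^\top S x - d_t^\top x + c^\top c + g_t^\top Q g_t$. The sketch below checks that expansion for scalars, assuming the loss $l_t(x, a) = (x - g_t)^\top Q (x - g_t) + a^\top a$ used in the Bellman equation of Lemma 3. The function names and numbers are illustrative only.

```python
def loss(Q, g, x, a):
    """Tracking loss (x - g)' Q (x - g) + a' a for scalar state and action."""
    return Q * (x - g) ** 2 + a * a


def loss_along_policy(Q, K, c, g, x):
    """Expanded form S*x^2 - d*x + c^2 + Q*g^2 with S = Q + K^2, d = 2*(c*K + g*Q)."""
    S = Q + K * K
    d = 2.0 * (c * K + g * Q)
    return S * x * x - d * x + c * c + Q * g * g


# Hypothetical parameters and a handful of test states.
Q, K, c, g = 2.0, 0.6, 0.3, -1.2
states = [-2.0, -0.5, 0.0, 1.0, 3.5]
gaps = [abs(loss(Q, g, x, -K * x + c) - loss_along_policy(Q, K, c, g, x))
        for x in states]
```

The two expressions agree up to floating-point rounding, so the loss along the policy is a fixed quadratic in $x$, which is what lets the proof split $\gamma_T$ into a linear part and a difference of squares.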