Identifiability & Adaptive Control of Markov Chains. Ben Pence, Scott Moura. December 12, 2009.
Slide 1 (title): Ben Pence, Scott Moura. EECS 558 Project. December 12, 2009.
Slide 2 (motivation and outline): Stochastic system with unknown parameters; we want to identify the parameters and adapt the control based on the parameter estimate. Outline: identifiability, estimation, and adaptive control; results with identifiability (Mandl, 1973); results without identifiability (Borkar and Varaiya, 1979, 1982); conclusions.
Slide 3: Model: controlled Markov chain. P_ij(u, θ) is the transition probability from state i to state j, for i, j ∈ S; θ ∈ A is the unknown parameter, with A the parameter space; u ∈ U is the control action, with U the control space. Identifiability condition: if θ ≠ β in A, then P_ij(u, θ) ≠ P_ij(u, β) for every u ∈ U.
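When S, U, and A are all finite, the identifiability condition can be checked by direct enumeration. A minimal sketch follows; the transition family P, the state/action labels, and the parameter set below are illustrative assumptions, not the slides' example:

```python
import numpy as np

S = [0, 1]            # states (illustrative)
U = [0, 1]            # control actions (illustrative)
A = [0.1, 0.2, 0.3]   # candidate parameter values (illustrative)

def P(u, theta):
    """Transition matrix P_ij(u, theta) for a 2-state chain (illustrative)."""
    p = theta * (u + 1)          # stays in (0, 1) for these U and A
    return np.array([[p, 1 - p],
                     [1 - p, p]])

def identifiable(P, U, A, tol=1e-12):
    """Identifiability per the slides: theta != beta implies
    P(u, theta) != P(u, beta) for EVERY control u."""
    for i, theta in enumerate(A):
        for beta in A[i + 1:]:
            for u in U:
                if np.allclose(P(u, theta), P(u, beta), atol=tol):
                    return False  # some u cannot distinguish theta from beta
    return True

print(identifiable(P, U, A))      # True for this illustrative family
```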
Slide 4: Examples. [Three example 2-state transition matrices P_ij(u, θ), with u ∈ {1, 2} or u ∈ {1, 4} and θ ∈ {0.1, 0.2, 0.3}, each classified as identifiable or not identifiable.]
Slide 5: Maximum likelihood estimator. Which value of θ makes the observed state trajectory most likely? $\hat{\theta}_t = \arg\max_{\theta} \prod_{s=0}^{t-1} P(x_s, x_{s+1}; u_s, \theta)$, where each factor is the probability of transitioning from x_s to x_{s+1} given u_s and θ.
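Because A is finite, the maximizer can be found by enumerating the candidate parameters. A minimal sketch, reusing the illustrative P and A from the previous block (log-probabilities are summed rather than multiplying raw probabilities, for numerical stability):

```python
import numpy as np

def mle(xs, us, P, A):
    """argmax over theta in A of prod_s P(x_s, x_{s+1}; u_s, theta),
    computed as a sum of log transition probabilities."""
    best_theta, best_ll = None, -np.inf
    for theta in A:
        ll = sum(np.log(P(u, theta)[xs[s], xs[s + 1]])
                 for s, u in enumerate(us))
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```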
Slide 6: Adaptive control. For each θ ∈ A, a stationary Markov policy (SMP) has been prespecified: U_t = g_θ(X_t). At each time step, apply the SMP corresponding to the current parameter estimate: U_t = g_{θ̂_t}(X_t).
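A minimal sketch of this certainty-equivalence loop, reusing P, A, and mle from the sketches above. The policy table g is a hypothetical stand-in (constant in x per θ, echoing the shape of the control laws on the later slides, with actions relabeled {0, 1}):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed SMP per candidate theta (illustrative).
g = {0.1: lambda x: 1, 0.2: lambda x: 0, 0.3: lambda x: 1}

def adaptive_control(P, A, theta_true, T=200, x0=0):
    x, xs, us = x0, [x0], []
    theta_hat = A[0]                      # arbitrary initial estimate
    for _ in range(T):
        u = g[theta_hat](x)               # control from the current estimate
        x = rng.choice(2, p=P(u, theta_true)[x])  # sample next state
        xs.append(x); us.append(u)
        theta_hat = mle(xs, us, P, A)     # re-estimate after each transition
    return theta_hat

print(adaptive_control(P, A, theta_true=0.2))
```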
Slide 7 (outline divider): results with identifiability (Mandl, 1973).
Slide 8: Theorem 1. For the model under consideration, with the assumptions previously outlined (including identifiability), there exists a random time T such that for t ≥ T, $\hat{\theta}_t = \theta^*$: the parameter estimate reaches a limit point. Moreover $\theta^* = \theta^0$: the limit point is the true parameter value! Consequently $P(i, j; g_{\theta^*}(i), \theta^*) = P(i, j; g_{\theta^0}(i), \theta^0)$ for all $i, j \in S$: the closed-loop transition probabilities are the same as if we had known the true parameter θ^0.
Slide 9: Simulation with identifiability. Markov chain: S = {1, 2}, U = {1, 2}, A = {0.1, 0.2, 0.3}. [Plot: parameter estimate θ̂_t versus time t converges to the true parameter value.] Control law: g_θ(i) = 2 for θ = 0.1; g_θ(i) = 1 for θ = 0.2; g_θ(i) = 2 for θ = 0.3.
Slide 10 (outline divider): results without identifiability (Borkar and Varaiya, 1979, 1982).
Slide 11: Hypothesis: the identifiability assumption is too restrictive. [Example transition matrix P_ij(u, θ) with u ∈ {1, 2} and θ ∈ {0.1, 0.2, 0.3}, classified as identifiable or not identifiable.] Question: what happens if we relax this assumption?
Slide 12: Theorem 2. For the model under consideration, with the assumptions previously outlined except identifiability, there exists a random time T such that for t ≥ T, $\hat{\theta}_t = \theta^*$: the parameter estimate reaches a limit point. But this limit point is not necessarily the true parameter! However, $P(i, j; g_{\theta^*}(i), \theta^*) = P(i, j; g_{\theta^*}(i), \theta^0)$ for all $i, j \in S$: the closed-loop transition probabilities are the same for (1) the true system and (2) an imaginary system where the parameter really was θ*, when the control law corresponding to θ* is applied to both.
Slide 13: Simulation without identifiability. Markov chain: S = {1, 2}, U = {1, 2}, A = {0.1, 0.2, 0.3}; true parameter θ^0 = 0.2. [Plots: control u and parameter estimate θ̂_t versus time t. Lack of identifiability results in the control getting stuck at u = 2, and the estimate converges to an INCORRECT parameter value.] Control law: g_θ(i) = 2 for θ = 0.1; g_θ(i) = 1 for θ = 0.2; g_θ(i) = 2 for θ = 0.3.
Slide 14: Recall the objectives: estimate the unknown parameter; satisfactorily control the Markov chain? Source of difficulty: the scheme only uses the control U_k = g_{θ*}(X_k), so it can only identify the subset {P_ij(u) : u = g_{θ*}(i)} of all transition probabilities {P_ij(u) : u ∈ U}. The adaptive controller should probe outside this subset.
Slide 15: Main idea: probe outside the identified subset of transition probabilities by perturbing the control signals randomly. Key assumption, partial identifiability: for θ ≠ β in A, the MC is partially identifiable if in every open set O ⊆ U there exists u ∈ O such that P_ij(u, θ) ≠ P_ij(u, β).
Slide 16: Construction of randomized controls. Let the current parameter estimate be θ, with control u = g_θ(x). Construct an open ball B_ε(u_t) of small radius ε > 0 centered at each u_t. Construct a probability measure μ on U which assigns probabilities to open balls. Perform an independent experiment to generate the random control u_t from B_ε(g_θ(x_t)) using μ.
Slide 17: Theorem 3. For the model under consideration, with the assumptions previously outlined including partial identifiability, then under any ε-randomization of g: $\lim_{t \to \infty} \hat{\theta}_t = \theta^*$, and $\theta^* = \theta^0$: the limit point is the true parameter value! Moreover $P(i, j; g_{\theta^*}(i), \theta^*) = P(i, j; g_{\theta^0}(i), \theta^0)$ for all $i, j \in S$: the closed-loop transition probabilities are the same as if we had known the true parameter θ^0.
Slide 18: Simulation with randomized control. Markov chain: S = {1, 2}, U = {1, 2}, A = {0.1, 0.2, 0.3}; true parameter θ^0 = 0.2. [Plots: control u and parameter estimate θ̂_t versus time t. The randomized control explores outside of u = 2 (control probes), and the estimate converges to the true parameter.] Randomized control law: P(u = 2 | θ̂ ∈ {0.1, 0.3}) = 0.75, P(u = 1 | θ̂ ∈ {0.1, 0.3}) = 0.25, P(u = 1 | θ̂ = 0.2) = 0.75, P(u = 2 | θ̂ = 0.2) = 0.25.
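A minimal sketch of such a probing controller for this finite U, reusing the hypothetical policy table g from the earlier sketch. Flipping to the other action with probability ε = 0.25 reproduces the 0.75/0.25 split above; in a continuum U one would instead sample from B_ε(g_θ(x)) via μ:

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_control(theta_hat, x, eps=0.25):
    u = g[theta_hat](x)        # nominal certainty-equivalence control
    if rng.random() < eps:     # with probability eps, probe elsewhere
        u = 1 - u              # the only other action when U = {0, 1}
    return u
```

Replacing `u = g[theta_hat](x)` in the earlier adaptive-control loop with `u = randomized_control(theta_hat, x)` turns it into the probing scheme of Theorem 3.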
Slide 19: Recall the objectives: estimate the unknown parameter (achieved); satisfactorily control the Markov chain (achieved). Summary: the restrictive identifiability condition was relaxed through local probing.
Slide 20 (outline divider): conclusions.
Slide 21: Conclusions. We can simultaneously identify and control a Markov chain with unknown parameters. Under the identifiability condition, the MLE converges to the true parameter in closed loop. Without the identifiability condition, the MLE may not converge to the true parameter. With probing control, only partial identifiability is needed for convergence to the true parameter.
Slide 22: Future work. Theoretical questions: What if θ^0 ∉ A? Can the control randomization be stopped eventually? Alternative estimation methods? Application areas: adaptive control of hybrid vehicles through drive-cycle identification; algorithms for calculating the maximum likelihood estimate.
Slide 23: Thank you for your attention! Questions? Comments? Suggestions?
Slide 24: Appendix slides.
Slide 25 (appendix): Maximum likelihood estimator. $L_t(\theta) = \prod_{s=0}^{t-1} P(X_s, X_{s+1}; U_s, \theta)$, and $\hat{\theta}_t = \arg\max_{\theta \in A} L_t(\theta)$. Likelihood ratio: $\Lambda_t(\theta) = L_t(\theta) / L_t(\theta^0)$.
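Since the logarithm is monotone, maximizing L_t(θ) is equivalent to maximizing the log-likelihood, which is what one would compute in practice because products of many probabilities underflow:

```latex
\log L_t(\theta) = \sum_{s=0}^{t-1} \log P(X_s, X_{s+1}; U_s, \theta),
\qquad
\hat{\theta}_t = \arg\max_{\theta \in A} \log L_t(\theta).
```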
Slide 26 (appendix): Construction of randomized controls. Suppose the current parameter estimate is θ and the corresponding control is u = g_θ(x). Construct an open ball with small radius ε > 0 centered at u, denoted B_ε(u_t), for each u ∈ U. Construct a probability measure μ on U which assigns probabilities to open balls. Perform an independent experiment to generate the random control u_t from B_ε(g_θ(x_t)) using μ. The sequence {u_t, t = 0, 1, ...} is called the ε-randomization of g.
Slide 27 (appendix outline): Model and problem formulation; main results under identifiability (1st paper); results without identifiability (2nd paper); control randomization (3rd paper); concluding remarks.
Slide 28 (appendix): Motivation: the identifiability assumption is too restrictive. It requires that for every θ ≠ β in A there exists at least one i ∈ S such that $[P(i,1;u,\theta)\ P(i,2;u,\theta)\ \cdots\ P(i,I;u,\theta)] \neq [P(i,1;u,\beta)\ P(i,2;u,\beta)\ \cdots\ P(i,I;u,\beta)]$ for all $u \in U$. Question: what happens if we relax this assumption?
Slide 29 (appendix outline): Model and problem formulation; main results under identifiability (1st paper); results without identifiability (2nd paper); control randomization (3rd paper); concluding remarks.
Slide 30 (appendix): Controlled Markov chain (MC): finite state space S = {1, 2, ..., I}; finite action space U; finite parameter space A; transition probabilities {P(i, j; u, θ) : i, j ∈ S, u ∈ U, θ ∈ A}; stationary Markov policy (SMP) U_t = g(X_t); perfect recall.
Slide 31 (appendix): Problem statement: estimate the unknown parameter; adaptively control the Markov chain; analyze the asymptotic convergence properties. Does the estimate converge or not? What assumptions are necessary? Quantify the performance of the adaptive MC.
Slide 32 (appendix): Assumptions: 1. the controlled MC is irreducible for every u ∈ U and θ ∈ A; 2. the true parameter θ^0 ∈ A; 3. the SMP corresponding to each θ, g_θ, is precomputed; 4. identifiability condition: if θ ≠ β in A, then P_ij(u, θ) ≠ P_ij(u, β) for every u ∈ U.
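One plausible reading of assumption 1 is that P(·, ·; u, θ) is irreducible for every u and θ. For finite chains this can be checked with the standard reachability test, sketched here against the illustrative P, U, and A from the earlier sketches:

```python
import numpy as np

def irreducible(M, tol=1e-12):
    # Standard test: a finite chain with transition matrix M is irreducible
    # iff (I + adjacency)^(n-1) is entrywise positive, i.e. every state can
    # reach every other state.
    n = M.shape[0]
    reach = np.linalg.matrix_power(np.eye(n) + (M > tol), n - 1)
    return bool(np.all(reach > 0))

# Check assumption 1 for the illustrative family:
assert all(irreducible(P(u, th)) for u in U for th in A)
```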
Slide 33 (appendix outline): Model and problem formulation; main results under identifiability (1st paper); results without identifiability (2nd paper); control randomization (3rd paper); concluding remarks.