Zhongping Jiang Tengfei Liu



1 Zhongping Jiang Tengfei Liu


2 F.L. Lewis, NAI
Moncrief-O'Donnell Chair, UTA Research Institute (UTARI), The University of Texas at Arlington, USA, and Qian Ren Consulting Professor, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China
Output Regulation of Heterogeneous MAS - Reinforcement Learning Solution for Unknown Systems
Supported by: China Qian Ren Program, NEU; China Education Ministry Project 111 (No. B08015); US NSF; ONR
Work of Reza Modares, S. Nageshrao, and Zuo Shan, with David Song and Dr. A. Davoudi
Talk available online at

3 Output Synchronization of Heterogeneous MAS

4 Heterogeneous Multi-Agents
Agents: $\dot{x}_i = A_i x_i + B_i u_i$, $y_i = C_i x_i$
Leader node: $\dot{z}_0 = S z_0$, $y_0 = R z_0$
Output regulation error: $\varepsilon_i(t) = y_i(t) - y_0(t) \to 0$
Output regulator equations: $A_i \Pi_i + B_i \Gamma_i = \Pi_i S$, $C_i \Pi_i = R$
The agent dynamics are different, and the state dimensions can be different. The output regulator equations capture the common core of all the agents' dynamics and define a synchronization manifold.

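5 Heterogeneous Multi-Agents
Agents: $\dot{x}_i = A_i x_i + B_i u_i$, $y_i = C_i x_i$
Leader: $\dot{z}_0 = S z_0$, $y_0 = R z_0$
Output regulator equations: $A_i \Pi_i + B_i \Gamma_i = \Pi_i S$, $C_i \Pi_i = R$
Tracking error: $x_i - \Pi_i z_0 \to 0$
Output regulation error: $\varepsilon_i(t) = y_i(t) - y_0(t) \to 0$
$\Pi_i$ is the insertion map of $S$ in $A_i$.
The agent dynamics are different, and the state dimensions can be different. The output regulator equations capture the common core of all the agents' dynamics and define a synchronization manifold.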
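When the matrices are known, the output regulator equations are linear in the unknowns and can be solved directly. The sketch below is a model-based illustration only (the talk's aim is to avoid needing the model); the function name and the example matrices are assumptions made for this example, not taken from the slides. It stacks $A\Pi + B\Gamma = \Pi S$ and $C\Pi = R$ into one linear system using Kronecker products with row-major vectorization.

```python
import numpy as np

def solve_regulator_equations(A, B, C, S, R):
    """Solve A Pi + B Gamma = Pi S, C Pi = R for (Pi, Gamma).

    Uses row-major vectorization: vec(M X N) = kron(M, N.T) @ vec(X).
    """
    n, m = B.shape            # agent state and input dimensions
    nz = S.shape[0]           # leader state dimension
    ny = C.shape[0]           # output dimension
    Iz = np.eye(nz)
    top = np.hstack([np.kron(A, Iz) - np.kron(np.eye(n), S.T), np.kron(B, Iz)])
    bottom = np.hstack([np.kron(C, Iz), np.zeros((ny * nz, m * nz))])
    lhs = np.vstack([top, bottom])
    rhs = np.concatenate([np.zeros(n * nz), R.reshape(-1)])
    sol, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    Pi = sol[: n * nz].reshape(n, nz)
    Gamma = sol[n * nz:].reshape(m, nz)
    return Pi, Gamma

# Hypothetical example: second-order agent tracking a harmonic-oscillator leader.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
R = np.array([[1.0, 0.0]])
Pi, Gamma = solve_regulator_equations(A, B, C, S, R)
```

For this example the solution can be checked by hand: $C\Pi = R$ fixes the first row of $\Pi$, and the remaining entries follow row by row from $A\Pi + B\Gamma = \Pi S$.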

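6 Two Control Methods
Output regulator equations: $A_i \Pi_i + B_i \Gamma_i = \Pi_i S$, $C_i \Pi_i = R$
Control Method #1
$\dot{z}_i = S z_i + c \left[ \sum_{j=1}^{N} a_{ij} (z_j - z_i) + g_i (z_0 - z_i) \right]$
$u_i = K_{1i} (x_i - \Pi_i z_i) + \Gamma_i z_i = K_{1i} x_i + (\Gamma_i - K_{1i} \Pi_i) z_i \equiv K_{1i} x_i + K_{2i} z_i$
Control Method #2 (more intriguing)
Local neighborhood output tracking error: $e_{yi} = \sum_{j \in N_i} a_{ij} (y_j - y_i) + g_i (y_0 - y_i)$
Compensator: $\dot{z}_i = F_i z_i + G_i e_{yi}$, $u_i = K_i x_i + H_i z_i$
Either $u_i = K_i x_i + H_i z_i = K_i x_i + (\Gamma_i - K_i \Pi_i) z_i$, in which case one must know the agent's dynamics and the leader's dynamics $(S, R)$,
or assume a p-copy in the compensator, in which case $K_i$ and $H_i$ are independent.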
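The distributed observer in Control Method #1 can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the slides: three agents on a directed chain, a harmonic-oscillator leader, coupling gain $c = 2$, and forward-Euler integration. Each agent updates its estimate $z_i$ of the leader state using only its neighbors' estimates and, if pinned ($g_i > 0$), the leader itself.

```python
import numpy as np

S = np.array([[0.0, 1.0], [-1.0, 0.0]])        # leader dynamics (assumed example)
Adj = np.array([[0.0, 0.0, 0.0],               # a_ij: agent i listens to agent j
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
g = np.array([1.0, 0.0, 0.0])                  # only agent 1 sees the leader
c = 2.0                                        # coupling gain

def simulate_observers(t_end=20.0, h=1e-3):
    z0 = np.array([1.0, 0.0])                  # leader state
    z = np.zeros((3, 2))                       # one row per agent estimate z_i
    for _ in range(int(round(t_end / h))):
        # sum_j a_ij (z_j - z_i) + g_i (z_0 - z_i), computed for all agents at once
        coupling = Adj @ z - Adj.sum(axis=1, keepdims=True) * z + g[:, None] * (z0 - z)
        dz = z @ S.T + c * coupling
        dz0 = S @ z0
        z, z0 = z + h * dz, z0 + h * dz0       # forward-Euler step
    return z, z0

z_final, z0_final = simulate_observers()
estimation_error = np.max(np.abs(z_final - z0_final))
```

In this example every agent has a directed path from the leader, so the estimation errors decay and each $z_i$ ends up tracking $z_0$.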

7 Data-driven Adaptive Solution of o/p Reg Eqs in Real Time
Output regulator equations: $A_i \Pi_i + B_i \Gamma_i = \Pi_i S$, $C_i \Pi_i = R$
Solve the o/p regulator eqs online, using measured data, by Reinforcement Learning.

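8 Optimal Output Synchronization of Heterogeneous MAS Using Off-policy IRL
Nageshrao, Modares, Lopes, Babuska, Lewis
MAS: $\dot{x}_i = A_i x_i + B_i u_i$, $y_i = C_i x_i$
Leader: $\dot{z}_0 = S z_0$, $y_0 = R z_0$
Optimal Tracker Problem
Augmented systems: $X_i(t) = [x_i(t)^T \; z_0^T]^T \in \mathbb{R}^{n_i + p}$, $\dot{X}_i = T_i X_i + B_{1i} u_i$, with $T_i = \begin{bmatrix} A_i & 0 \\ 0 & S \end{bmatrix}$, $B_{1i} = \begin{bmatrix} B_i \\ 0 \end{bmatrix}$
Performance index: $V(X_i(t)) = \int_t^{\infty} e^{-\gamma_i (\tau - t)} X_i^T (C_{1i}^T Q_i C_{1i} + K_i^T W_i K_i) X_i \, d\tau = X_i(t)^T P_i X_i(t)$, where $C_{1i} = [C_i \; -R]$ so that $C_{1i} X_i = y_i - y_0$
Control: $u_i = K_{1i} x_i + K_{2i} z_0 = K_i X_i$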

9 Optimal Tracker Solution by Reinforcement Learning
Tracker ARE: $T^T P + P T - \gamma P + C_1^T Q C_1 - P B_1 W^{-1} B_1^T P = 0$, $K = [K_1, K_2] = -W^{-1} B_1^T P$
Algorithm 1. On-policy IRL state-feedback algorithm
Policy evaluation: solve the IRL Bellman equation
$e^{-\gamma \delta t} X(t + \delta t)^T P^k X(t + \delta t) - X(t)^T P^k X(t) = -\int_t^{t + \delta t} e^{-\gamma (\tau - t)} \left[ (y - y_0)^T Q (y - y_0) + X^T (K^k)^T W K^k X \right] d\tau$
Policy update: $K^{k+1} = [K_1^{k+1}, K_2^{k+1}] = -W^{-1} B_1^T P^k$
Theorem. Algorithm 1 converges to the solution to the ARE.
The Bellman equation is solved using RLS or batch LS. It requires a Persistence of Excitation (PE) condition that may be hard to satisfy.
Must know $B_1$.
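As a reference point for the learning algorithms, the discounted tracker ARE can be solved with a known model by policy iteration: evaluate the current gain through a Lyapunov equation, then update the gain. The sketch below is such a model-based baseline, not the on-policy IRL algorithm itself; the helper names and the example matrices are assumptions made for illustration. It uses the fact that the discount term can be absorbed by shifting $T$ to $T - (\gamma/2) I$.

```python
import numpy as np

def lyap(Ac, M):
    """Solve Ac^T P + P Ac + M = 0 by Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    return np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)

def tracker_are(A, B, C, S, R, Q, W, gamma, iters=20):
    n, nz, m = A.shape[0], S.shape[0], B.shape[1]
    T = np.block([[A, np.zeros((n, nz))], [np.zeros((nz, n)), S]])
    B1 = np.vstack([B, np.zeros((nz, m))])
    C1 = np.hstack([C, -R])                    # C1 @ X = y - y0
    K = np.zeros((m, n + nz))                  # assumes T - (gamma/2) I is Hurwitz
    for _ in range(iters):
        Ac = T - 0.5 * gamma * np.eye(n + nz) + B1 @ K
        P = lyap(Ac, C1.T @ Q @ C1 + K.T @ W @ K)   # policy evaluation
        K = -np.linalg.solve(W, B1.T @ P)           # policy update
    residual = (T.T @ P + P @ T - gamma * P + C1.T @ Q @ C1
                - P @ B1 @ np.linalg.solve(W, B1.T @ P))
    return P, K, residual

# Hypothetical example: stable scalar agent, harmonic-oscillator leader.
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]]); R = np.array([[1.0, 0.0]])
Q = np.array([[10.0]]); W = np.array([[1.0]])
P, K, residual = tracker_are(A, B, C, S, R, Q, W, gamma=0.5)
```

The zero initial gain is admissible here only because the example agent is stable and the discount shifts the leader's eigenvalues into the left half-plane; an unstable agent would need a stabilizing initial gain.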

10 On-policy RL
Target policy: the policy that we are learning about.
Behavior policy: the policy that generates actions and behavior.
[Figure: reference input, a single block labeled "Target and behavior policy", and the system]
The target policy and the behavior policy are the same.

11 Off-policy RL
Humans can learn optimal policies while actually applying suboptimal policies.
[Figure: reference input, separate "Target policy" and "Behavior policy" blocks, and the system]
The target policy and the behavior policy are different.

12 Off-policy RL
Tracker dynamics: $\dot{X} = T X + B_1 u$
Rewrite as $\dot{X} = (T + B_1 K^k) X + B_1 (u - K^k X) \equiv T_k X + B_1 (u - K^k X)$
Now the Bellman equation becomes
$e^{-\gamma \delta t} X(t + \delta t)^T P^k X(t + \delta t) - X(t)^T P^k X(t) = -\int_t^{t + \delta t} e^{-\gamma (\tau - t)} \left[ (y - y_0)^T Q (y - y_0) + X^T (K^k)^T W K^k X \right] d\tau - 2 \int_t^{t + \delta t} e^{-\gamma (\tau - t)} (u - K^k X)^T W K^{k+1} X \, d\tau$
The second integral is the extra term, containing $K^{k+1}$.
Algorithm 2. Off-policy IRL data-based algorithm: iterate on this equation and solve for $P^k$ and $K^{k+1}$ simultaneously at each step.
Note about probing noise: if $u = K^k X + e$, then $(u - K^k X) = e$.
Do not have to know any dynamics, neither the agent's ($\dot{x}_i = A_i x_i + B_i u_i$, $y_i = C_i x_i$) nor the leader's ($\dot{z}_0 = S z_0$, $y_0 = R z_0$).
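The data-based iteration can be sketched numerically. The example below is an illustration under several assumptions that are not in the slides: a scalar agent with a single input, a harmonic-oscillator leader, a behavior policy that is pure sinusoidal probing, and RK4 integration to generate the data. The model matrices `T` and `B1` appear only in the simulator; the learner sees the state samples and the discounted integrals $\int e^{-\gamma(\tau - t)} X X^T d\tau$ and $\int e^{-\gamma(\tau - t)} u X \, d\tau$, collected once and reused at every iteration. The sign of the $K^{k+1}$ term follows from $K^{k+1} = -W^{-1} B_1^T P^k$.

```python
import math
import numpy as np

# Hypothetical data: scalar agent, harmonic-oscillator leader, single input.
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]]); R = np.array([[1.0, 0.0]])
Q = np.array([[10.0]]); W = np.array([[1.0]]); gamma = 0.5
T = np.block([[A, np.zeros((1, 2))], [np.zeros((2, 1)), S]])  # simulator only
B1 = np.vstack([B, np.zeros((2, 1))])                          # simulator only
C1 = np.hstack([C, -R])   # stands in for measuring y - y0
N = 3
iu = np.triu_indices(N)
sym_scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal P_ij appear twice

def probe(t):
    """Behavior input: a sum of sinusoids (assumed probing signal)."""
    return sum(math.sin(w * t) for w in (0.5, 1.3, 2.9, 4.1))

def rhs(t, s, t0):
    """State X plus the running discounted integrals of X X^T and u X."""
    X, u, w = s[:N], probe(t), math.exp(-gamma * (t - t0))
    return np.concatenate([T @ X + B1[:, 0] * u, w * np.kron(X, X), w * u * X])

def rk4_step(t, s, h, t0):
    k1 = rhs(t, s, t0)
    k2 = rhs(t + h / 2, s + h / 2 * k1, t0)
    k3 = rhs(t + h / 2, s + h / 2 * k2, t0)
    k4 = rhs(t + h, s + h * k3, t0)
    return s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def collect(intervals=200, dt=0.1, substeps=50):
    X, t, h, data = np.array([1.0, 1.0, 0.0]), 0.0, dt / substeps, []
    for _ in range(intervals):
        s, t0 = np.concatenate([X, np.zeros(N * N + N)]), t
        for _ in range(substeps):
            s = rk4_step(t, s, h, t0)
            t += h
        Xn = s[:N]
        dXX = math.exp(-gamma * dt) * np.outer(Xn, Xn) - np.outer(X, X)
        data.append((dXX, s[N:N + N * N].reshape(N, N), s[N + N * N:]))
        X = Xn
    return data

def off_policy_irl(data, iters=10):
    K = np.zeros((1, N))   # initial target gain; admissible for this example
    for _ in range(iters):
        Qk = C1.T @ Q @ C1 + K.T @ W @ K
        Phi, y = [], []
        for dXX, Ixx, Iux in data:
            k_row = 2.0 * W[0, 0] * (Iux - (K @ Ixx).ravel())  # multiplies K^{k+1}
            Phi.append(np.concatenate([dXX[iu] * sym_scale, k_row]))
            y.append(-np.sum(Qk * Ixx))
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        P = np.zeros((N, N))
        P[iu] = theta[:len(iu[0])]
        P = P + P.T - np.diag(np.diag(P))
        K = theta[len(iu[0]):].reshape(1, N)
    return P, K

P, K = off_policy_irl(collect())
are_residual = (T.T @ P + P @ T - gamma * P + C1.T @ Q @ C1
                - P @ B1 @ np.linalg.solve(W, B1.T @ P))
```

The ARE residual is computed with the true model only to check the learned $P$; it plays no part in the learning loop. Whether the least-squares problem is well conditioned depends on the probing signal, which is the excitation requirement in practice.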

13 Theorem. Off-policy Algorithm 2 converges to the solution to the ARE
$T^T P + P T - \gamma P + C_1^T Q C_1 - P B_1 W^{-1} B_1^T P = 0$
Theorem (o/p reg eq solution). Let $P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$. Then the solution to the output regulator equations $A \Pi + B \Gamma = \Pi S$ is given by
$\Pi = -P_{11}^{-1} P_{12}$, $\Gamma = K_2 - K_1 P_{11}^{-1} P_{12} = K_2 + K_1 \Pi$
Do not have to know the agent dynamics or the leader's dynamics $(S, R)$.

14 Observer for Leader's State and Dynamics
To avoid knowledge of the leader's state in $u_i = K_{1i} x_i + K_{2i} z_0 = K_i X_i$, use an adaptive observer for the leader's state:
$\dot{z}_i = \hat{S}_i z_i + c \left[ \sum_{j=1}^{N} a_{ij} (z_j - z_i) + g_i (z_0 - z_i) \right]$
$\dot{\hat{S}}_{\mathrm{vec},i} = -\Gamma_{S_i} (I_q \otimes z_i) \left[ \sum_{j=1}^{N} a_{ij} (z_j - z_i) + g_i (z_0 - z_i) \right]$
Then use the control $u_i = K_{1i} x_i + K_{2i} z_i \equiv K_i \hat{X}_i \equiv K_i \begin{bmatrix} x_i \\ z_i \end{bmatrix}$
So this is Control Method #1.
Do not have to know the leader's dynamics $(S, R)$.

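16 Two Control Methods
Output regulator equations: $A_i \Pi_i + B_i \Gamma_i = \Pi_i S$, $C_i \Pi_i = R$
We just found a Controller #1 using data-based control:
$\dot{z}_i = S z_i + c \left[ \sum_{j=1}^{N} a_{ij} (z_j - z_i) + g_i (z_0 - z_i) \right]$
$u_i = K_{1i} (x_i - \Pi_i z_i) + \Gamma_i z_i = K_{1i} x_i + (\Gamma_i - K_{1i} \Pi_i) z_i \equiv K_{1i} x_i + K_{2i} z_i$
Now we seek a controller using Method #2 (more intriguing).
Local neighborhood output tracking error: $e_{yi} = \sum_{j \in N_i} a_{ij} (y_j - y_i) + g_i (y_0 - y_i)$
Compensator: $\dot{z}_i = F_i z_i + G_i e_{yi}$, $u_i = K_i x_i + H_i z_i$
Either $u_i = K_i x_i + H_i z_i = K_i x_i + (\Gamma_i - K_i \Pi_i) z_i$,
or assume a p-copy in the compensator, in which case $K_i$ and $H_i$ are independent.
Do not want to solve the o/p reg eqs or know any dynamics.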

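17 Overall Dynamics Structure
Agents; Leader; Compensator; Control input; Local neighborhood output error; Global form
[Equations not recoverable from the source. The surviving global-form fragment involves $\mathrm{diag}\{C_i\}\, x$ and $(L + G) \otimes I_p$.]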
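The step from the local neighborhood output error to a global form can be checked numerically. With graph Laplacian $L = D - \mathcal{A}$ and pinning matrix $G = \mathrm{diag}\{g_i\}$, stacking $e_i = \sum_j a_{ij}(y_j - y_i) + g_i (y_0 - y_i)$ over all agents gives $e = -((L + G) \otimes I_p)(y - 1_N \otimes y_0)$; this follows because $L 1_N = 0$. The sketch below compares the two forms on random data. The graph, the dimensions, and the function names are assumptions for illustration, and the slide's own global form may be written in terms of $\mathrm{diag}\{C_i\} x$ rather than $y$.

```python
import numpy as np

def local_errors(Adj, g, y, y0):
    """e_i = sum_j a_ij (y_j - y_i) + g_i (y0 - y_i), agent by agent."""
    e = np.zeros_like(y)
    for i in range(len(g)):
        for j in range(len(g)):
            e[i] += Adj[i, j] * (y[j] - y[i])
        e[i] += g[i] * (y0 - y[i])
    return e

def global_errors(Adj, g, y, y0):
    """Stacked form: e = -((L + G) kron I_p) (y - 1_N kron y0)."""
    num_agents, p = y.shape
    L = np.diag(Adj.sum(axis=1)) - Adj
    G = np.diag(g)
    stacked = (y - y0).reshape(-1)             # agent outputs stacked one after another
    return (-np.kron(L + G, np.eye(p)) @ stacked).reshape(num_agents, p)

rng = np.random.default_rng(0)
Adj = np.array([[0.0, 1.0, 0.0, 0.0],
                [1.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0, 0.0]])
g = np.array([1.0, 0.0, 0.0, 0.5])
y = rng.standard_normal((4, 2))                # four agents, two outputs each
y0 = rng.standard_normal(2)
e_local = local_errors(Adj, g, y, y0)
e_global = global_errors(Adj, g, y, y0)
```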

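18 Global Dynamics
[Equation not recoverable from the source. The surviving fragment involves $A = \mathrm{diag}\{A_i\}$, the stacked states $x$ and $z$, $H$, and $(L + G) \otimes I_p$.]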

19 Problems to Get Local Design Procedure


22 Heterogeneous Multi-agent Dynamics
Leader Dynamics
Output Regulation Problem

23 Local Neighborhood Output Error
Compensator 1: local state feedback
Compensator 2: local output feedback
Compensator 3: no local system feedback

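24 Local Systems with Interactions from Neighbors
Rewrite the local o/p error to separate the local dynamics from the interaction terms.
Local dynamics: $\frac{d}{dt} \begin{bmatrix} x_i \\ z_i \end{bmatrix} = \begin{bmatrix} A_i + B_i K_i & B_i H_i \\ G_i C_i & F_i \end{bmatrix} \begin{bmatrix} x_i \\ z_i \end{bmatrix} + \text{(interaction terms)}$
Interaction terms: these enter the compensator row through $G_i$, with the neighbors' outputs $C_j x_j$, $j \in N_i$, weighted by $a_{ij}$, and the leader output $R z_0$ weighted by $g_i$. [Scalings and signs of these terms are not recoverable from the source.]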

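25 State Regulation Errors
State regulation error: the local state minus $X z_0$, where $X$ solves the o/p reg eqs $A X + G R = X S$, $C X = R$.
Error dynamics: driven by the local closed-loop matrix $\begin{bmatrix} A_i + B_i K_i & B_i H_i \\ G_i C_i & F_i \end{bmatrix}$ and by neighbor terms weighted by $a_{ij} C_j$, $j \in N_i$. [The full expression is not recoverable from the source.]
Local transfer functions: $T(s) = C (sI - A)^{-1} G$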
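A local transfer function of the form $T(s) = C (sI - A)^{-1} G$ is what a small-gain argument bounds, and its peak gain over frequency can be estimated by a sweep. The sketch below is a rough estimate on a frequency grid, not an exact $H_\infty$-norm computation, and the function names and the first-order test system are assumptions for illustration. The specific small-gain bound used in the talk is not reproduced here.

```python
import numpy as np

def freq_response(A, G, C, omega):
    """Evaluate T(j*omega) = C (j*omega*I - A)^{-1} G."""
    n = A.shape[0]
    return C @ np.linalg.solve(1j * omega * np.eye(n) - A, G.astype(complex))

def peak_gain(A, G, C, omegas):
    """Largest singular value of T(j*omega) over the given grid."""
    return max(np.linalg.norm(freq_response(A, G, C, w), 2) for w in omegas)

# Hypothetical first-order example: T(s) = 1 / (s + 1), whose peak gain is 1 at omega = 0.
A = np.array([[-1.0]])
G = np.array([[1.0]])
C = np.array([[1.0]])
omegas = np.concatenate([[0.0], np.logspace(-3, 3, 400)])
gain = peak_gain(A, G, C, omegas)
```

A grid sweep can miss a sharp resonance between grid points, so for lightly damped local dynamics the grid would need to be refined around the resonant frequencies.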

26 State Regulation Errors
Error Dynamics: Global Form

27 Main Theorem
Local Design
Interaction
Small-Gain Condition

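28 Local Design Procedure
Closed-loop systems: $\begin{bmatrix} A_i + B_i K_i & B_i H_i \\ G_i C_i & F_i \end{bmatrix}$
Open-loop systems: $\begin{bmatrix} A_i + B_i K_i & B_i H_i \\ G_i C_i & F_i \end{bmatrix} = \begin{bmatrix} A_i & 0 \\ G_i C_i & F_i \end{bmatrix} + \begin{bmatrix} B_i \\ 0 \end{bmatrix} \begin{bmatrix} K_i & H_i \end{bmatrix}$
Then the problem is formulated as a static state-feedback design for the open-loop system $\hat{A} = \begin{bmatrix} A_i & 0 \\ G_i C_i & F_i \end{bmatrix}$, $\hat{B} = \begin{bmatrix} B_i \\ 0 \end{bmatrix}$, with output matrix $\hat{C}$ and the interaction input entering through $G_z$. The static state-feedback gain is $[K_i \; H_i]$.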
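The reformulation as static state feedback rests on a block identity: the closed-loop matrix with blocks $A + BK$, $BH$, $GC$, $F$ equals an open-loop matrix plus an input matrix times the stacked gain $[K \; H]$. The sketch below checks that identity on random matrices; the dimensions and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, nz = 3, 2, 1, 2                       # hypothetical dimensions
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
F = rng.standard_normal((nz, nz))              # compensator dynamics
G = rng.standard_normal((nz, p))               # compensator input matrix
K = rng.standard_normal((m, n))
H = rng.standard_normal((m, nz))

# Closed-loop local matrix acting on the stacked state [x; z].
closed_loop = np.block([[A + B @ K, B @ H], [G @ C, F]])

# Open-loop pair and the single static gain on the stacked state.
A_open = np.block([[A, np.zeros((n, nz))], [G @ C, F]])
B_open = np.vstack([B, np.zeros((nz, m))])
K_static = np.hstack([K, H])

difference = closed_loop - (A_open + B_open @ K_static)
```

Because the identity holds for any $K$ and $H$, designing the pair $(K, H)$ is the same as designing one static gain for the pair `(A_open, B_open)`.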

29 Optimal Local Design Algorithm
H-infinity design

30 Off-policy RL Algorithm to Solve Output Regulation for Heterogeneous MAS
[Off-policy Bellman equation not recoverable from the source. It equates a discounted difference of the quadratic form in $P$ over one interval to discounted integrals involving $Q + K^T R K$, the updated gain, and the interaction terms.]
This is a standard parameter ID equation from Adaptive Control. It is linear in the unknown parameters.
Solve it using batch LS or RLS to get the updated values of $P$, $K$, and the remaining unknown parameters.
This solves the o/p regulation problem for heterogeneous MAS without solving the o/p reg equations and without knowing any agent's or the leader's dynamics.
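Since the equation is linear in the unknown parameters, each data interval contributes one row $\phi_k^T \theta = y_k$, and the parameters can be found by batch least squares or updated recursively. The sketch below shows both on synthetic, noise-free data. The regressor, the parameter vector, and the initial covariance scale `delta` are assumptions for illustration, not the quantities from the slide.

```python
import numpy as np

def batch_ls(Phi, y):
    """Batch least squares: minimize ||Phi @ theta - y||."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def rls(Phi, y, delta=1e6):
    """Recursive least squares, processing one regressor row at a time."""
    n = Phi.shape[1]
    theta = np.zeros(n)
    Pcov = delta * np.eye(n)                   # large initial covariance: weak prior
    for phi, yk in zip(Phi, y):
        gain = Pcov @ phi / (1.0 + phi @ Pcov @ phi)
        theta = theta + gain * (yk - phi @ theta)
        Pcov = Pcov - np.outer(gain, phi @ Pcov)
    return theta

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0, 0.5, 3.0])   # hypothetical parameters
Phi = rng.standard_normal((50, 4))             # hypothetical regressor rows
y = Phi @ theta_true
theta_batch = batch_ls(Phi, y)
theta_rls = rls(Phi, y)
```

Batch LS needs all rows stored before solving, while RLS refines the estimate as each interval's data arrives, which suits online use. Both need regressor rows that are rich enough to make the parameters identifiable.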

