Data-Enabled Predictive Control : In the Shallows of the DeePC

1 Data-Enabled Predictive Control: In the Shallows of the DeePC
Florian Dörfler, Automatic Control Laboratory, ETH Zürich

2 Acknowledgements
Linbin Huang, Jeremy Coulson, John Lygeros
Brainstorming: P. Mohajerin Esfahani, B. Recht, R. Smith, B. Bamieh, and M. Morari
1/34

3 Big, deep, intelligent and so on
unprecedented availability of computation, storage, and data
theoretical advances in optimization, statistics, and machine learning
... and big-data frenzy
increasing importance of data-centric methods in all of science / engineering
Make up your own opinion, but machine learning works too well to be ignored.
2/34

4 Feedback: our central paradigm
[Figure: feedback loop between the physical world and information technology. Sensing ("making sense of the world") feeds inference and data science; automation and control drives actuation ("making a difference to the world").]
3/34

5 Control in a data-rich world
ever-growing trend in CS and robotics: data-driven control by-passing models
canonical problem: black/gray-box system control based on I/O samples
Q: Why give up physical modeling and reliable model-based algorithms?
Data-driven control is a viable alternative when
  models are too complex to be useful (e.g., fluid dynamics & building automation)
  first-principle models are not conceivable (e.g., human-in-the-loop & perception)
  modeling & system ID is too cumbersome (e.g., robotics & power applications)
Central promise: It is often easier to learn control policies directly from data, rather than learning a model. Example: PID
[Figure: data-driven controller connected to a system with inputs u_1, u_2 and outputs y_1, y_2]
4/34

6 Snippets from the literature
1. reinforcement learning / or stochastic adaptive control / or approximate dynamic programming
with key mathematical challenges
  (approximate/neuro) DP to learn approx. value/Q-function or optimal policy
  (stochastic) function approximation
  exploration-exploitation trade-offs
and practical limitations
  inefficiency: computation & samples
  complex and fragile algorithms
  safe real-time exploration
ø suitable for physical control systems with real-time & safety constraints?
[Figure: reinforcement learning control loop around an unknown system, with signals action, observation, and reward estimate]
5/34

7 Snippets from the literature cont'd
2. gray-box safe learning & control
  robust: conservative & complex control
  adaptive: hard & asymptotic performance
  contemporary learning algorithms (e.g., MPC + Gaussian processes / RL): non-conservative, optimal, & safe
  ø limited applicability: need a-priori safety
3. Sequential system ID + control
  ID with uncertainty quantification followed by robust control design
  recent finite-sample & end-to-end ID + control pipelines out-performing RL
  ø ID seeks best but not most useful model
  ø easier to learn policies than models
[Figures: robust/adaptive control loop with uncertain plant (u, y); system with inputs u_1, u_2 and outputs y_1, y_2]
6/34

8 Key take-aways
claim: easier to learn controllers from data rather than models
data-driven approach is no silver bullet (see previous ø)
predictive models are preferable over data (even approximate)
  models are tidied-up, compressed, & de-noised representations
  model-based methods vastly out-perform model-agnostic ones
ø deadlock?
a useful ML insight: non-parametric methods are often preferable over parametric ones (e.g., basis functions vs. kernels)
build a predictive & non-parametric model directly from raw data?
7/34

9 Colorful idea
[Figure: impulse response samples $y_1, \dots, y_7$ for the input $u_1 = 1$, $u_2 = u_3 = \dots = 0$ and initial state $x_0 = 0$]
If you had the impulse response of an LTI system, then...
  can build state-space system identification (Kalman-Ho realization)
  ... but can also build predictive model directly from raw data:
$$ y_{\mathrm{future}}(t) = \begin{bmatrix} y_1 & y_2 & y_3 & \dots \end{bmatrix} \begin{bmatrix} u_{\mathrm{future}}(t) \\ u_{\mathrm{future}}(t-1) \\ u_{\mathrm{future}}(t-2) \\ \vdots \end{bmatrix} $$
  model predictive control from data: dynamic matrix control (DMC)
today: can we do so with arbitrary, finite, and corrupted I/O samples?
8/34
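The prediction formula on this slide is a discrete convolution of the impulse response with the input. A minimal sketch in Python/NumPy, assuming a hypothetical scalar first-order system with D = 0 (the values of `a`, `b`, `c` are illustrative and not taken from the slides):

```python
import numpy as np

def simulate(a, b, c, u):
    """Output of x+ = a*x + b*u, y = c*x, started from x_0 = 0."""
    x, y = 0.0, []
    for uk in u:
        y.append(c * x)
        x = a * x + b * uk
    return np.array(y)

# Hypothetical system parameters (assumption, for illustration only).
a, b, c = 0.8, 1.0, 0.5
N = 20

# Impulse response: u_1 = 1, u_2 = u_3 = ... = 0, x_0 = 0.
impulse = np.zeros(N)
impulse[0] = 1.0
h = simulate(a, b, c, impulse)

# Predict the response to an arbitrary input directly from the impulse data:
# y(t) = h_1 u(t) + h_2 u(t-1) + ...
rng = np.random.default_rng(2)
u = rng.standard_normal(N)
y_pred = np.convolve(h, u)[:N]

# Reference: run the recursion itself.
y_true = simulate(a, b, c, u)
```

The convolution reproduces the simulated output without ever forming a state-space model, which is the idea behind dynamic matrix control.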

10 Contents
I. Data-Enabled Predictive Control (DeePC): Basic Idea
   J. Coulson, J. Lygeros, and F. Dörfler. Data-Enabled Predictive Control: In the Shallows of the DeePC. arxiv.org/abs/
II. From Heuristics & Numerical Promises to Theorems
   J. Coulson, J. Lygeros, and F. Dörfler. Regularized and Distributionally Robust Data-Enabled Predictive Control. arxiv.org/abs/
III. Application: End-to-End Automation in Energy Systems
   L. Huang, J. Coulson, J. Lygeros, and F. Dörfler. Data-Enabled Predictive Control for Grid-Connected Power Converters. arxiv.org/abs/

11 Preview
complex 2-area power system: large ($n \approx 10^2$), nonlinear, noisy, stiff, & with input constraints
control objective: damping of inter-area oscillations via HVDC, but without model
[Figure: time series with three phases: no control, collect data, control]
seek method that works reliably, can be efficiently implemented, & certifiable
automating ourselves
9/34

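12 Behavioral view on LTI systems
Definition: A discrete-time dynamical system is a 3-tuple $(\mathbb{Z}_{\ge 0}, \mathbb{W}, \mathcal{B})$ where
(i) $\mathbb{Z}_{\ge 0}$ is the discrete-time axis,
(ii) $\mathbb{W}$ is a signal space, and
(iii) $\mathcal{B} \subseteq \mathbb{W}^{\mathbb{Z}_{\ge 0}}$ is the behavior.
Definition: The dynamical system $(\mathbb{Z}_{\ge 0}, \mathbb{W}, \mathcal{B})$ is
(i) linear if $\mathbb{W}$ is a vector space & $\mathcal{B}$ is a subspace of $\mathbb{W}^{\mathbb{Z}_{\ge 0}}$,
(ii) time-invariant if $\mathcal{B} \subseteq \sigma\mathcal{B}$, where $\sigma w_t = w_{t+1}$, and
(iii) complete if $\mathcal{B}$ is closed ($\mathbb{W}$ is finite dimensional).
In the remainder we focus on discrete-time LTI systems.
[Figure: system block with input u and output y]
10/34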

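13 Behavioral view cont'd
$\mathcal{B}$ = set of trajectories in $\mathbb{W}^{\mathbb{Z}_{\ge 0}}$ & $\mathcal{B}_T$ is restriction to $t \in [0, T]$
A system $(\mathbb{Z}_{\ge 0}, \mathbb{W}, \mathcal{B})$ is controllable if any two trajectories $w^1, w^2 \in \mathcal{B}$ can be patched with a trajectory $w \in \mathcal{B}_T$.
[Figure: trajectories $w^1$ and $w^2$ patched by $w$ over a window of length T]
I/O: $\mathcal{B} = \mathcal{B}^u \times \mathcal{B}^y$ where $\mathcal{B}^u = (\mathbb{R}^m)^{\mathbb{Z}_{\ge 0}}$ and $\mathcal{B}^y \subseteq (\mathbb{R}^p)^{\mathbb{Z}_{\ge 0}}$ are the spaces of input and output signals, $w = \mathrm{col}(u, y) \in \mathcal{B}$
different parametric representations: state space, kernel, image, ...
kernel representation (ARMA):
$$ \mathcal{B} = \left\{ \mathrm{col}(u, y) \in (\mathbb{R}^{m+p})^{\mathbb{Z}_{\ge 0}} \ \text{s.t.}\ b_0 u + b_1 \sigma u + \dots + b_n \sigma^n u + a_0 y + a_1 \sigma y + \dots + a_n \sigma^n y = 0 \right\} $$
11/34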

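14 LTI systems and matrix time series
foundation of state-space subspace system ID & signal recovery algorithms
[Figure: sampled input u(t) with samples $u_1, \dots, u_7$ and sampled output y(t) with samples $y_1, \dots, y_7$]
$(u(t), y(t))$ satisfy recursive difference equation (ARMA / kernel representation)
$$ b_0 u_t + b_1 u_{t+1} + \dots + b_n u_{t+n} + a_0 y_t + a_1 y_{t+1} + \dots + a_n y_{t+n} = 0 $$
under assumptions, $[\, b_0 \ a_0 \ b_1 \ a_1 \ \dots \ b_n \ a_n \,]$ spans left nullspace of Hankel matrix (collected from data)
$$ \mathcal{H}_L\begin{pmatrix} u \\ y \end{pmatrix} = \begin{bmatrix}
\left(\begin{smallmatrix} u_1 \\ y_1 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_2 \\ y_2 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_3 \\ y_3 \end{smallmatrix}\right) & \cdots & \left(\begin{smallmatrix} u_{T-L+1} \\ y_{T-L+1} \end{smallmatrix}\right) \\
\left(\begin{smallmatrix} u_2 \\ y_2 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_3 \\ y_3 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_4 \\ y_4 \end{smallmatrix}\right) & \cdots & \vdots \\
\left(\begin{smallmatrix} u_3 \\ y_3 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_4 \\ y_4 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_5 \\ y_5 \end{smallmatrix}\right) & \cdots & \vdots \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\left(\begin{smallmatrix} u_L \\ y_L \end{smallmatrix}\right) & \cdots & \cdots & \cdots & \left(\begin{smallmatrix} u_T \\ y_T \end{smallmatrix}\right)
\end{bmatrix} $$
12/34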
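The left-nullspace statement can be checked numerically. The sketch below assumes a hypothetical scalar system $y_{t+1} = 0.5\,y_t + u_t$ (so $n = 1$ and, in the notation of this slide, $b_0 = 1$, $a_0 = 0.5$, $b_1 = 0$, $a_1 = -1$) and a Hankel depth $L = n + 1 = 2$. The helper `block_hankel` is an illustrative name, not something from the slides.

```python
import numpy as np

def block_hankel(w, L):
    """Hankel matrix with L block rows built from samples w of shape (T, q)."""
    w = np.asarray(w, dtype=float).reshape(len(w), -1)
    return np.column_stack([w[j:j + L].ravel() for j in range(len(w) - L + 1)])

# Data from the assumed system y_{t+1} = 0.5 y_t + u_t with random input.
rng = np.random.default_rng(1)
T = 40
u = rng.standard_normal(T)
y = np.zeros(T)
for t in range(T - 1):
    y[t + 1] = 0.5 * y[t] + u[t]

# Each column is (u_t, y_t, u_{t+1}, y_{t+1}).
H = block_hankel(np.column_stack([u, y]), 2)

# ARMA coefficients ordered as [b_0, a_0, b_1, a_1].
r = np.array([1.0, 0.5, 0.0, -1.0])
residual = np.abs(r @ H).max()

# The left nullspace is recovered from the SVD of the data matrix alone.
U, s, _ = np.linalg.svd(H)
rank = int(np.sum(s > 1e-9 * s[0]))
null_vec = U[:, rank:]
```

Here `null_vec` has a single column that is parallel to `r`, so the difference equation is recovered (up to scaling) from raw data.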

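15 The Fundamental Lemma
Definition: The signal $u = \mathrm{col}(u_1, \dots, u_T) \in \mathbb{R}^{mT}$ is persistently exciting of order $L$ if
$$ \mathcal{H}_L(u) = \begin{bmatrix} u_1 & \cdots & u_{T-L+1} \\ \vdots & \ddots & \vdots \\ u_L & \cdots & u_T \end{bmatrix} $$
is of full row rank, i.e., if the signal is sufficiently rich and long ($T - L + 1 \ge mL$).
Fundamental lemma [Willems et al., '05]: Let $T, t \in \mathbb{Z}_{>0}$. Consider a controllable LTI system $(\mathbb{Z}_{\ge 0}, \mathbb{R}^{m+p}, \mathcal{B})$ and a $T$-sample long trajectory $\mathrm{col}(u^d, y^d) \in \mathcal{B}_T$, where $u^d$ is persistently exciting of order $t + n$ (prediction span + # states). Then
$$ \mathrm{colspan}\left( \mathcal{H}_t\begin{pmatrix} u^d \\ y^d \end{pmatrix} \right) = \mathcal{B}_t . $$
13/34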
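Persistency of excitation is a rank condition on the input Hankel matrix, so it is easy to test on recorded data. A minimal sketch, assuming scalar inputs and using illustrative helper names (`block_hankel`, `is_persistently_exciting`) that do not come from the slides:

```python
import numpy as np

def block_hankel(w, L):
    """Hankel matrix with L block rows built from samples w of shape (T, m)."""
    w = np.asarray(w, dtype=float).reshape(len(w), -1)
    return np.column_stack([w[j:j + L].ravel() for j in range(len(w) - L + 1)])

def is_persistently_exciting(u, L):
    """True if the Hankel matrix of depth L built from u has full row rank."""
    H = block_hankel(u, L)
    return bool(np.linalg.matrix_rank(H) == H.shape[0])

rng = np.random.default_rng(0)
u_random = rng.standard_normal(50)   # rich signal
u_const = np.ones(50)                # every Hankel column is identical

pe_random = is_persistently_exciting(u_random, 5)
pe_const = is_persistently_exciting(u_const, 5)
```

A random input passes the test, while a constant input fails it for any order above one because its Hankel matrix has rank one.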

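16 Cartoon of Fundamental Lemma
[Figure: sampled input u(t) and output y(t) trajectories with samples $u_1, \dots, u_7$ and $y_1, \dots, y_7$]
persistently exciting, controllable LTI, sufficiently many samples
parametric state-space model:
$$ x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k + D u_k $$
non-parametric model from raw data:
$$ \mathrm{colspan} \begin{bmatrix}
\left(\begin{smallmatrix} u_1 \\ y_1 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_2 \\ y_2 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_3 \\ y_3 \end{smallmatrix}\right) & \cdots \\
\left(\begin{smallmatrix} u_2 \\ y_2 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_3 \\ y_3 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_4 \\ y_4 \end{smallmatrix}\right) & \cdots \\
\left(\begin{smallmatrix} u_3 \\ y_3 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_4 \\ y_4 \end{smallmatrix}\right) & \left(\begin{smallmatrix} u_5 \\ y_5 \end{smallmatrix}\right) & \cdots \\
\vdots & \vdots & \vdots &
\end{bmatrix} $$
all trajectories constructible from finitely many previous trajectories
14/34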

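17 Data-driven simulation [Markovsky & Rapisarda '08]
Problem: predict future output $y \in \mathbb{R}^{p T_{\mathrm{future}}}$ based on
  input signal $u \in \mathbb{R}^{m T_{\mathrm{future}}}$ (to predict forward)
  past data $\mathrm{col}(u^d, y^d) \in \mathcal{B}_{T_{\mathrm{data}}}$ (to form Hankel matrix)
Assume: $\mathcal{B}$ controllable & $u^d$ persistently exciting of order $T_{\mathrm{future}} + n$
Solution: given $(u_1, \dots, u_{T_{\mathrm{future}}})$ compute $g$ & $(y_1, \dots, y_{T_{\mathrm{future}}})$ from
$$ \begin{bmatrix}
u^d_1 & u^d_2 & \cdots & u^d_{T-N+1} \\
\vdots & \vdots & \ddots & \vdots \\
u^d_{T_{\mathrm{future}}} & u^d_{T_{\mathrm{future}}+1} & \cdots & u^d_T \\
y^d_1 & y^d_2 & \cdots & y^d_{T-N+1} \\
\vdots & \vdots & \ddots & \vdots \\
y^d_{T_{\mathrm{future}}} & y^d_{T_{\mathrm{future}}+1} & \cdots & y^d_T
\end{bmatrix} g = \begin{bmatrix} u_1 \\ \vdots \\ u_{T_{\mathrm{future}}} \\ y_1 \\ \vdots \\ y_{T_{\mathrm{future}}} \end{bmatrix} $$
Issue: predicted output is not unique, need to set initial conditions!
15/34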

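18 Refined problem: predict future output $y \in \mathbb{R}^{p T_{\mathrm{future}}}$ based on
  initial trajectory $\mathrm{col}(u_{\mathrm{ini}}, y_{\mathrm{ini}}) \in \mathbb{R}^{(m+p) T_{\mathrm{ini}}}$ (to estimate initial $x_{\mathrm{ini}}$)
  input signal $u \in \mathbb{R}^{m T_{\mathrm{future}}}$ (to predict forward)
  past data $\mathrm{col}(u^d, y^d) \in \mathcal{B}_{T_{\mathrm{data}}}$ (to form Hankel matrix)
Assume: $\mathcal{B}$ controllable & $u^d$ persist. exciting of order $T_{\mathrm{ini}} + T_{\mathrm{future}} + n$
Solution: given $(u_1, \dots, u_{T_{\mathrm{future}}})$ & $\mathrm{col}(u_{\mathrm{ini}}, y_{\mathrm{ini}})$ compute $g$ & $(y_1, \dots, y_{T_{\mathrm{future}}})$ from
$$ \begin{bmatrix} U_p \\ Y_p \\ U_f \\ Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \\ y \end{bmatrix} $$
if $T_{\mathrm{ini}} \ge$ lag of system, then $y$ is unique
$$ \begin{bmatrix} U_p \\ U_f \end{bmatrix} = \begin{bmatrix}
u^d_1 & \cdots & u^d_{T - T_{\mathrm{future}} - T_{\mathrm{ini}} + 1} \\
\vdots & \ddots & \vdots \\
u^d_{T_{\mathrm{ini}}} & \cdots & u^d_{T - T_{\mathrm{future}}} \\
u^d_{T_{\mathrm{ini}}+1} & \cdots & u^d_{T - T_{\mathrm{future}} + 1} \\
\vdots & \ddots & \vdots \\
u^d_{T_{\mathrm{ini}} + T_{\mathrm{future}}} & \cdots & u^d_T
\end{bmatrix}, \qquad
\begin{bmatrix} Y_p \\ Y_f \end{bmatrix} = \begin{bmatrix}
y^d_1 & \cdots & y^d_{T - T_{\mathrm{future}} - T_{\mathrm{ini}} + 1} \\
\vdots & \ddots & \vdots \\
y^d_{T_{\mathrm{ini}}} & \cdots & y^d_{T - T_{\mathrm{future}}} \\
y^d_{T_{\mathrm{ini}}+1} & \cdots & y^d_{T - T_{\mathrm{future}} + 1} \\
\vdots & \ddots & \vdots \\
y^d_{T_{\mathrm{ini}} + T_{\mathrm{future}}} & \cdots & y^d_T
\end{bmatrix} $$
16/34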
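The refined data-driven simulation can be sketched in a few lines of NumPy. Everything below is an illustrative assumption rather than material from the slides: the two-state SISO system `(A, B, C)` with D = 0, the data length, the horizons, and the helper names. The state-space model is used only to generate data and to check the prediction.

```python
import numpy as np

def block_hankel(w, L):
    """Hankel matrix with L block rows built from samples w of shape (T, q)."""
    w = np.asarray(w, dtype=float).reshape(len(w), -1)
    return np.column_stack([w[j:j + L].ravel() for j in range(len(w) - L + 1)])

def simulate(A, B, C, x0, u):
    """Outputs of x+ = A x + B u, y = C x for a scalar input sequence u."""
    x, y = np.array(x0, dtype=float), []
    for uk in u:
        y.append(C @ x)
        x = A @ x + B * uk
    return np.array(y)

# Hypothetical controllable and observable system (n = 2, lag = 2).
A = np.array([[0.9, 0.2], [-0.1, 0.8]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
T, T_ini, T_f = 60, 2, 5
rng = np.random.default_rng(3)

# Offline: one persistently exciting experiment, then partition the Hankel matrices.
u_d = rng.standard_normal(T)
y_d = simulate(A, B, C, [0.0, 0.0], u_d)
Hu, Hy = block_hankel(u_d, T_ini + T_f), block_hankel(y_d, T_ini + T_f)
Up, Uf = Hu[:T_ini], Hu[T_ini:]
Yp, Yf = Hy[:T_ini], Hy[T_ini:]

# Online: a fresh trajectory from an unknown initial state.
u_new = rng.standard_normal(T_ini + T_f)
y_new = simulate(A, B, C, rng.standard_normal(2), u_new)
u_ini, y_ini, u_f = u_new[:T_ini], y_new[:T_ini], u_new[T_ini:]

# Solve [Up; Yp; Uf] g = [u_ini; y_ini; u] and read off y = Yf g.
lhs = np.vstack([Up, Yp, Uf])
rhs = np.concatenate([u_ini, y_ini, u_f])
g = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
y_pred = Yf @ g
err = np.max(np.abs(y_pred - y_new[T_ini:]))
```

With noise-free data and $T_{\mathrm{ini}}$ at least the lag, `err` is at the level of floating-point round-off: the initial trajectory pins down the hidden state, so the prediction is unique even though `g` is not.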

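19 Output Model Predictive Control
The canonical receding-horizon MPC optimization problem:
$$ \begin{aligned}
\underset{u,\,x,\,y}{\text{minimize}} \quad & \sum_{k=0}^{T_{\mathrm{future}}-1} \|y_k - r_{t+k}\|_Q^2 + \|u_k\|_R^2 \\
\text{subject to} \quad & x_{k+1} = A x_k + B u_k, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}, \\
& y_k = C x_k + D u_k, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}, \\
& x_{k+1} = A x_k + B u_k, \quad \forall k \in \{-T_{\mathrm{ini}}-1, \dots, -1\}, \\
& y_k = C x_k + D u_k, \quad \forall k \in \{-T_{\mathrm{ini}}-1, \dots, -1\}, \\
& u_k \in \mathcal{U}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}, \\
& y_k \in \mathcal{Y}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}
\end{aligned} $$
quadratic cost with $R \succ 0$, $Q \succeq 0$ & ref. $r$
model for prediction over $k \in [0, T_{\mathrm{future}}-1]$
model for estimation (many variations)
hard operational or safety constraints
For a deterministic LTI plant and an exact model of the plant, MPC is the gold standard of control: safe, optimal, tracking, ...
17/34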

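20 Data-Enabled Predictive Control
DeePC uses non-parametric and data-based Hankel matrix time series as prediction/estimation model inside MPC optimization problem:
$$ \begin{aligned}
\underset{g,\,u,\,y}{\text{minimize}} \quad & \sum_{k=0}^{T_{\mathrm{future}}-1} \|y_k - r_{t+k}\|_Q^2 + \|u_k\|_R^2 \\
\text{subject to} \quad & \begin{bmatrix} U_p \\ Y_p \\ U_f \\ Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \\ y \end{bmatrix}, \\
& u_k \in \mathcal{U}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}, \\
& y_k \in \mathcal{Y}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}
\end{aligned} $$
quadratic cost with $R \succ 0$, $Q \succeq 0$ & ref. $r$
non-parametric model for prediction and estimation
hard operational or safety constraints
Hankel matrix with $T_{\mathrm{ini}} + T_{\mathrm{future}}$ rows from past data,
$$ \begin{bmatrix} U_p \\ U_f \end{bmatrix} = \mathcal{H}_{T_{\mathrm{ini}} + T_{\mathrm{future}}}(u^d) \quad \text{and} \quad \begin{bmatrix} Y_p \\ Y_f \end{bmatrix} = \mathcal{H}_{T_{\mathrm{ini}} + T_{\mathrm{future}}}(y^d), $$
collected offline (could be adapted online)
past $T_{\mathrm{ini}} \ge$ lag samples $(u_{\mathrm{ini}}, y_{\mathrm{ini}})$ for $x_{\mathrm{ini}}$ estimation, updated online
18/34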
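Without input and output constraints, the DeePC problem is an equality-constrained least-squares problem that plain NumPy can solve. The sketch below makes several assumptions that are not in the slides: $\mathcal{U}$ and $\mathcal{Y}$ are the whole space, the weights are scalars (`q`, `r_w`), the plant is a hypothetical two-state SISO system, and the constraint is eliminated with a nullspace parametrization $g = g_0 + N z$. A constrained version would need a QP solver instead.

```python
import numpy as np

def block_hankel(w, L):
    """Hankel matrix with L block rows built from samples w of shape (T, q)."""
    w = np.asarray(w, dtype=float).reshape(len(w), -1)
    return np.column_stack([w[j:j + L].ravel() for j in range(len(w) - L + 1)])

def simulate(A, B, C, x0, u):
    """Outputs and final state of x+ = A x + B u, y = C x (scalar input)."""
    x, y = np.array(x0, dtype=float), []
    for uk in u:
        y.append(C @ x)
        x = A @ x + B * uk
    return np.array(y), x

# Hypothetical plant, used only to generate data and to verify the result.
A = np.array([[0.9, 0.2], [-0.1, 0.8]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
T, T_ini, T_f = 80, 2, 10
q, r_w = 10.0, 0.1                      # assumed scalar weights Q and R
rng = np.random.default_rng(4)

# Offline data and Hankel partition.
u_d = rng.standard_normal(T)
y_d, _ = simulate(A, B, C, np.zeros(2), u_d)
Hu, Hy = block_hankel(u_d, T_ini + T_f), block_hankel(y_d, T_ini + T_f)
Up, Uf, Yp, Yf = Hu[:T_ini], Hu[T_ini:], Hy[:T_ini], Hy[T_ini:]

# Most recent T_ini samples and a constant reference.
u_ini = rng.standard_normal(T_ini)
y_ini, x_now = simulate(A, B, C, np.array([1.0, -1.0]), u_ini)
ref = np.ones(T_f)

# Parametrize all g with [Up; Yp] g = [u_ini; y_ini] as g = g0 + N z.
E = np.vstack([Up, Yp])
e = np.concatenate([u_ini, y_ini])
g0 = np.linalg.lstsq(E, e, rcond=None)[0]
_, s, Vt = np.linalg.svd(E)
N = Vt[int(np.sum(s > 1e-9 * s[0])):].T

# Minimize q*||Yf g - ref||^2 + r_w*||Uf g||^2 over z (stacked least squares).
M = np.vstack([np.sqrt(q) * Yf @ N, np.sqrt(r_w) * Uf @ N])
d = np.concatenate([np.sqrt(q) * (ref - Yf @ g0), -np.sqrt(r_w) * (Uf @ g0)])
z = np.linalg.lstsq(M, d, rcond=None)[0]
g = g0 + N @ z
u_opt, y_pred = Uf @ g, Yf @ g

# Check: the plant driven by u_opt reproduces the predicted outputs.
y_true, _ = simulate(A, B, C, x_now, u_opt)
```

In a receding-horizon loop only the first entry of `u_opt` would be applied before `u_ini` and `y_ini` are updated and the problem is solved again.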

21 Correctness for LTI Systems
Theorem: Consider a controllable LTI system and the DeePC & MPC optimization problems with persistently exciting data of order $T_{\mathrm{ini}} + T_{\mathrm{future}} + n$. Then the feasible sets of DeePC & MPC coincide.
Corollary: If $\mathcal{U}$, $\mathcal{Y}$ are convex, then also the trajectories coincide.
[Figure: aerial robotics case study]
19/34

22 Thus, MPC carries over to DeePC... at least in the nominal case. Beyond LTI, what about measurement noise, corrupted past data, and nonlinearities?

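23 Noisy real-time measurements
$$ \begin{aligned}
\underset{g,\,u,\,y}{\text{minimize}} \quad & \sum_{k=0}^{T_{\mathrm{future}}-1} \|y_k - r_{t+k}\|_Q^2 + \|u_k\|_R^2 + \lambda_y \|\sigma_y\|_1 \\
\text{subject to} \quad & \begin{bmatrix} U_p \\ Y_p \\ U_f \\ Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ \sigma_y \\ 0 \\ 0 \end{bmatrix}, \\
& u_k \in \mathcal{U}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}, \\
& y_k \in \mathcal{Y}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}
\end{aligned} $$
Solution: add slack to ensure feasibility with $\ell_1$-penalty
for $\lambda_y$ sufficiently large, $\sigma_y \neq 0$ only if constraint infeasible
c.f. sensitivity analysis over randomized sims
[Figure: sensitivity analysis; panels: average cost (cost) and average constraint violations (duration of violations in s)]
20/34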

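24 Hankel matrix corrupted by noise
$$ \begin{aligned}
\underset{g,\,u,\,y}{\text{minimize}} \quad & \sum_{k=0}^{T_{\mathrm{future}}-1} \|y_k - r_{t+k}\|_Q^2 + \|u_k\|_R^2 + \lambda_g \|g\|_1 \\
\text{subject to} \quad & \begin{bmatrix} U_p \\ Y_p \\ U_f \\ Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \\ y \end{bmatrix}, \\
& u_k \in \mathcal{U}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}, \\
& y_k \in \mathcal{Y}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}
\end{aligned} $$
Solution: add an $\ell_1$-penalty on $g$
intuition: $\ell_1$ sparsely selects {Hankel matrix columns} = {past trajectories} = {motion primitives}
c.f. sensitivity analysis over randomized sims
[Figure: sensitivity analysis; panels: average cost (cost) and average constraint violations (duration of violations in s)]
21/34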

25 Towards nonlinear systems...
Idea: lift nonlinear system to large/$\infty$-dimensional bi-/linear system
  Carleman, Volterra, Fliess, Koopman, Sturm-Liouville methods
  exploit size rather than nonlinearity and find features in data
  exploit size, collect more data, & build a larger Hankel matrix
  regularization singles out relevant features / basis functions
case study: regularization for $g$ and $\sigma_y$
[Figure: DeePC position trajectories $x_{\mathrm{DeePC}}$, $y_{\mathrm{DeePC}}$, $z_{\mathrm{DeePC}}$ against references $x_{\mathrm{ref}}$, $y_{\mathrm{ref}}$, $z_{\mathrm{ref}}$ and constraints; axes: m vs. s]
22/34

26 recall the central promise : it is easier to learn control policies directly from data, rather than learning a model

27 Comparison to system ID + MPC
Setup: nonlinear stochastic quadcopter model with full state info
DeePC + $\ell_1$-regularization for $g$ and $\sigma_y$
MPC: system ID via prediction error method + nominal MPC
[Figure: single fig-8 run; DeePC and MPC position trajectories (x, y, z) against references and constraints; axes: m vs. s]
[Figure: random sims; histograms of number of simulations over cost and over duration constraints violated, DeePC vs. System ID + MPC]
23/34

28 from heuristics & numerical promises to theorems

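29 Robust problem formulation
1. the nominal problem (without $g$-regularization)
$$ \begin{aligned}
\underset{g,\,u,\,y}{\text{minimize}} \quad & \sum_{k=0}^{T_{\mathrm{future}}-1} \|y_k - r_{t+k}\|_Q^2 + \|u_k\|_R^2 + \lambda_y \|\sigma_y\|_1 \\
\text{subject to} \quad & \begin{bmatrix} \hat U_p \\ \hat Y_p \\ \hat U_f \\ \hat Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ \hat y_{\mathrm{ini}} \\ u \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ \sigma_y \\ 0 \\ 0 \end{bmatrix}, \\
& u_k \in \mathcal{U}, \quad \forall k \in \{0, \dots, T_{\mathrm{future}}-1\}
\end{aligned} $$
where $\hat{\ }$ denotes measured & thus possibly corrupted data
2. an abstraction of this problem
$$ \underset{g \in G}{\text{minimize}} \quad f\big(\hat U_f g, \hat Y_f g\big) + \lambda_y \big\| \hat Y_p g - \hat y_{\mathrm{ini}} \big\|_1 \quad \text{where } G = \big\{ g : \hat U_p g = u_{\mathrm{ini}} \ \&\ \hat U_f g \in \mathcal{U} \big\} $$
24/34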

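30 3. a further abstraction
$$ \underset{g \in G}{\text{minimize}} \ c\big(\hat\xi, g\big) = \underset{g \in G}{\text{minimize}} \ \mathbb{E}_{\hat P}\big[c(\xi, g)\big] $$
with $G = \big\{ g : \hat U_p g = u_{\mathrm{ini}} \ \&\ \hat U_f g \in \mathcal{U} \big\}$, measured $\hat\xi = \big(\hat Y_p, \hat Y_f, \hat y_{\mathrm{ini}}\big)$, & $\hat P = \delta_{\hat\xi}$ denotes the empirical distribution from which we obtained $\hat\xi$
4. the solution $g^\star$ of the above problem gives poor out-of-sample performance for the problem we really want to solve: $\mathbb{E}_P\big[c(\xi, g^\star)\big]$ where $P$ is the unknown probability distribution of $\xi$
5. distributionally robust formulation
$$ \inf_{g \in G} \ \sup_{Q \in \mathbb{B}_\epsilon(\hat P)} \mathbb{E}_Q\big[c(\xi, g)\big] $$
where the ambiguity set $\mathbb{B}_\epsilon(\hat P)$ is an $\epsilon$-Wasserstein ball centered at $\hat P$:
$$ \mathbb{B}_\epsilon(\hat P) = \Big\{ P : \inf_{\Pi} \int \big\| \xi - \hat\xi \big\|_W \, d\Pi \le \epsilon \Big\} \quad \text{where } \Pi \text{ has marginals } \hat P \text{ and } P $$
25/34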

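31 5. distributionally robust formulation
$$ \inf_{g \in G} \ \sup_{Q \in \mathbb{B}_\epsilon(\hat P)} \mathbb{E}_Q\big[c(\xi, g)\big] $$
where the ambiguity set $\mathbb{B}_\epsilon(\hat P)$ is an $\epsilon$-Wasserstein ball centered at $\hat P$:
$$ \mathbb{B}_\epsilon(\hat P) = \Big\{ P : \inf_{\Pi} \int \big\| \xi - \hat\xi \big\|_W \, d\Pi \le \epsilon \Big\} \quad \text{where } \Pi \text{ has marginals } \hat P \text{ and } P $$
Theorem: Under minor technical conditions:
$$ \inf_{g \in G} \ \sup_{Q \in \mathbb{B}_\epsilon(\hat P)} \mathbb{E}_Q\big[c(\xi, g)\big] \ \equiv \ \min_{g \in G} \ c\big(\hat\xi, g\big) + \epsilon \lambda_y \|g\|_W^\star $$
Cor: $\ell_\infty$-robustness in trajectory space corresponds to $\ell_1$-regularization of DeePC
Proof uses methods by Kuhn & Esfahani: semi-infinite problem becomes finite after marginalization & for discrete worst case
[Figure: cost vs. $\epsilon$]
26/34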

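32 Relation to system ID & MPC
1. regularized DeePC problem
$$ \underset{g,\ u \in \mathcal{U},\ y \in \mathcal{Y}}{\text{minimize}} \ f(u, y) + \lambda_g \|g\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} U_p \\ Y_p \\ U_f \\ Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \\ y \end{bmatrix} $$
2. standard model-based MPC (ARMA parameterization)
$$ \underset{u \in \mathcal{U},\ y \in \mathcal{Y}}{\text{minimize}} \ f(u, y) \quad \text{subject to} \quad y = K \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \end{bmatrix} $$
3. subspace ID: $y = Y_f g^\star$ where $g^\star = g^\star(u_{\mathrm{ini}}, y_{\mathrm{ini}}, u)$ solves
$$ \arg\min_g \ \|g\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} U_p \\ Y_p \\ U_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \end{bmatrix} $$
4. equivalent prediction error ID
$$ \underset{K}{\text{minimize}} \ \sum_j \left\| y^d_j - K \begin{bmatrix} u^d_{\mathrm{ini},j} \\ y^d_{\mathrm{ini},j} \\ u^d_j \end{bmatrix} \right\|^2 \quad \text{so that} \quad y = K \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \end{bmatrix} = Y_f g^\star $$
27/34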
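The link between the subspace predictor (item 3) and the prediction error estimate (item 4) follows from the pseudoinverse. A short derivation sketch, where the shorthand $M$ and $b$ is introduced here for illustration and the linear system in item 3 is assumed to be consistent:

```latex
% Shorthand (not from the slides):
%   M := [U_p; Y_p; U_f],   b := col(u_ini, y_ini, u)
\begin{aligned}
g^\star &= \arg\min_g \|g\|_2^2 \ \ \text{s.t.}\ \ M g = b
         \;=\; M^{\dagger} b, \\
y &= Y_f g^\star \;=\; \underbrace{Y_f M^{\dagger}}_{=:\,K}\, b, \\
K &= Y_f M^{\dagger} \in \arg\min_{K} \|Y_f - K M\|_F^2
   \;=\; \arg\min_{K} \sum_j \Big\| y^d_j - K
     \begin{bmatrix} u^d_{\mathrm{ini},j} \\ y^d_{\mathrm{ini},j} \\ u^d_j \end{bmatrix} \Big\|^2 .
\end{aligned}
```

The sum over $j$ runs over the columns of the Hankel matrices, so the least-squares multi-step predictor and the minimum-norm choice of $g$ give the same output prediction.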

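33 subsequent ID & MPC
$$ \underset{u \in \mathcal{U},\ y \in \mathcal{Y}}{\text{minimize}} \ f(u, y) \quad \text{subject to} \quad y = K \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \end{bmatrix} \quad \text{where } K \text{ solves } \arg\min_K \sum_j \left\| y_j - K \begin{bmatrix} u_{\mathrm{ini},j} \\ y_{\mathrm{ini},j} \\ u_j \end{bmatrix} \right\|^2 $$
equivalently
$$ \underset{u \in \mathcal{U},\ y \in \mathcal{Y}}{\text{minimize}} \ f(u, y) \quad \text{subject to} \quad \begin{bmatrix} y \\ u \end{bmatrix} = \begin{bmatrix} Y_f \\ U_f \end{bmatrix} g \quad \text{where } g \text{ solves } \arg\min_g \|g\|_2^2 \ \ \text{subject to} \ \begin{bmatrix} U_p \\ Y_p \\ U_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \end{bmatrix} $$
regularized DeePC
$$ \underset{g,\ u \in \mathcal{U},\ y \in \mathcal{Y}}{\text{minimize}} \ f(u, y) + \lambda_g \|g\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} U_p \\ Y_p \\ U_f \\ Y_f \end{bmatrix} g = \begin{bmatrix} u_{\mathrm{ini}} \\ y_{\mathrm{ini}} \\ u \\ y \end{bmatrix} $$
feasible set of ID & MPC $\subseteq$ feasible set for DeePC
DeePC $\le$ MPC + $\lambda_g \cdot$ ID
easier to learn control policies from data rather than models
28/34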

34 application: end-to-end automation in energy systems

35 Grid-connected converter control
Task: control converter (nonlinear, noisy & constrained) without a model of the grid, line, passives, or inner loops
[Figure: three-phase VSC connected through an LCL filter and a line to the AC grid (power part), with PLL, abc/dq transformations, and PI current control loop (control part); DeePC inputs u1, u2 and outputs y1, y2, y3]
[Figure: three time-series panels, each showing an open-loop phase followed by a DeePC phase]
DeePC tracking constant dq-frame references subject to constraints
timeline: open loop, inject noise, collect data, activate DeePC
29/34

36 Effect of regularizations
[Figure: DeePC time-domain cost and optimization cost as functions of $\lambda_g$, DeePC vs. Sys ID + MPC]
DeePC time-domain cost $= \sum_k \|y_k - r_k\|_Q^2 + \|u_k\|_R^2$ (closed-loop measurements)
Optimization cost $= \sum_k \|y_k - r_k\|_Q^2 + \|u_k\|_R^2 + \lambda_g \|g\|_2$ (closed-loop measurements)
30/34

37 Data length
$T_{\mathrm{ini}} = 40$, $T_{\mathrm{future}} = 30$
[Figure: DeePC time-domain cost and PEM-MPC time-domain cost vs. data length $T$, with Sys ID + MPC for comparison; time-series panels]
works like a charm for $T$ large, but $\mathrm{card}(g) = T - T_{\mathrm{ini}} - T_{\mathrm{future}} + 1$
(possibly?) prohibitive on µDSP
31/34

38 Power system case study
extrapolation from previous case study: const. voltage grid control
complex 2-area power system: large ($n \approx 10^2$), nonlinear, noisy, stiff, & with input constraints
control objective: damping of inter-area oscillations via HVDC
real-time closed-loop MPC & DeePC become prohibitive (on laptop)
choose $T$, $T_{\mathrm{ini}}$, and $T_{\mathrm{future}}$ wisely
[Figure: time series with and without control ("no control")]
32/34

39 Choice of time constants
[Figure: closed-loop time series for ($T_{\mathrm{ini}} = 200$, $T_{\mathrm{future}} = 80$), ($T_{\mathrm{ini}} = 10$, $T_{\mathrm{future}} = 10$), and ($T_{\mathrm{ini}} = 5$, $T_{\mathrm{future}} = 10$); DeePC time-domain cost and PEM-MPC time-domain cost vs. $T_{\mathrm{ini}}$]
choose $T$ sufficiently large
short horizon $T_{\mathrm{future}} \approx 10$
$T_{\mathrm{ini}} \ge 10$ estimates sufficiently rich model complexity
time-domain cost $= \sum_k \|y_k - r_k\|_Q^2 + \|u_k\|_R^2$ (closed-loop measurements)
33/34

40 Summary & conclusions
fundamental lemma from behavioral systems
matrix time series serves as predictive model
data-enabled predictive control (DeePC)
✓ certificates for deterministic LTI systems
✓ distributional robustness via regularizations
✓ outperforms ID + MPC in optimization metric
open: certificates for nonlinear & stochastic setup
open: adaptive extensions, explicit policies, ...
open: applications to building automation, bio, etc.
[Figure: grid-connected converter schematic (power part and control part) with DeePC inputs u1, u2 and outputs y1, y2, y3]
Why have these powerful ideas not been mixed long before?
Willems '07: "[MPC] has perhaps too little system theory and too much brute force computation in it."
The other side often proclaims "behavioral systems theory is beautiful but did not prove utterly useful".
34/34
