FPGA Implementation of a Predictive Controller

1 FPGA Implementation of a Predictive Controller
SIAM Conference on Optimization 2011, Darmstadt, Germany
Minisymposium on embedded optimization
Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan
May 18, 2011

2 Contents
- MPC Problem Formulation
- Field Programmable Gate Array (FPGA)
- Algorithms for Quadratic Programming
- Implementation Details
- Results
- Related Work

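3 Optimal control problem

\[
\min_{\theta}\; x_N^T Q x_N + \sum_{k=0}^{N-1} \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} \tag{1}
\]

subject to

\[
\begin{aligned}
x_0 &= x, && && \text{(2a)} \\
x_{k+1} &= A x_k + B u_k && \text{for } k = 0, 1, 2, \ldots, N-1, && \text{(2b)} \\
J x_k + E u_k &\le d && \text{for } k = 0, 1, 2, \ldots, N-1, && \text{(2c)}
\end{aligned}
\]

with $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}^m$.

Goal: accelerate the computation of the optimal $\theta$ so that MPC can be implemented at faster sampling rates.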
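As a small illustration of cost (1), the following Python sketch evaluates the objective along a given trajectory. The function names `quad` and `mpc_cost` and the numerical values are hypothetical and chosen only for this example; the terminal weight is taken to be the same `Q` as the stage weight, as written on the slide.

```python
def quad(x, M, y):
    """Bilinear form x^T M y for plain Python lists."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def mpc_cost(xs, us, Q, R, S):
    """Cost (1): terminal term x_N^T Q x_N plus the sum of stage costs.

    xs holds x_0..x_N (N+1 states), us holds u_0..u_{N-1} (N inputs).
    The stage cost [x;u]^T [[Q,S],[S^T,R]] [x;u] expands to
    x^T Q x + 2 x^T S u + u^T R u.
    """
    stage = sum(quad(x, Q, x) + 2.0 * quad(x, S, u) + quad(u, R, u)
                for x, u in zip(xs, us))  # zip stops after x_{N-1}
    return quad(xs[-1], Q, xs[-1]) + stage


# Scalar example (n = m = 1, N = 2) with made-up numbers.
cost = mpc_cost(xs=[[1.0], [2.0], [3.0]], us=[[1.0], [-1.0]],
                Q=[[1.0]], R=[[2.0]], S=[[0.5]])
print(cost)
```

With these numbers each of the two stage costs evaluates to 4 and the terminal term to 9.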

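4 Quadratic Programming Formulation

\[
\min_{\theta}\; \tfrac{1}{2}\theta^T H \theta \quad \text{subject to} \quad F\theta = f, \quad G\theta \le g
\]

where

\[
\theta := \begin{bmatrix} x_0^T & u_0^T & x_1^T & u_1^T & x_2^T & u_2^T & \cdots & x_{N-1}^T & u_{N-1}^T & x_N^T \end{bmatrix}^T \in \mathbb{R}^{N(n+m)+n},
\]

\[
H := \begin{bmatrix} I_N \otimes \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} & 0 \\ 0 & Q \end{bmatrix},
\]

\[
F := \begin{bmatrix} -I_n & & & & \\ A & B & -I_n & & \\ & & \ddots & & \\ & & A & B & -I_n \end{bmatrix}, \qquad
f := \begin{bmatrix} -x \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]

\[
G := I_N \otimes \begin{bmatrix} J & E \end{bmatrix}, \qquad g := \mathbf{d} := \mathbf{1}_N \otimes d.
\]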
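The equality constraint $F\theta = f$ is the dynamics (2a)-(2b) stacked over the horizon. The Python sketch below builds $F$ and $f$ under the sign convention $-x_0 = -x$ and $A x_k + B u_k - x_{k+1} = 0$, which is an assumption of this example, and checks that a simulated trajectory satisfies the constraint. The helper names `build_equality_constraints` and `rollout` and the system matrices are hypothetical.

```python
def build_equality_constraints(A, B, x_init, N):
    """Return (F, f) so that F theta = f encodes x_0 = x_init and
    x_{k+1} = A x_k + B u_k, with theta = [x_0, u_0, ..., u_{N-1}, x_N]."""
    n, m = len(A), len(B[0])
    cols = N * (n + m) + n
    F = [[0.0] * cols for _ in range((N + 1) * n)]
    f = [0.0] * ((N + 1) * n)
    for i in range(n):                      # first block row: -I_n x_0 = -x_init
        F[i][i] = -1.0
        f[i] = -x_init[i]
    for k in range(N):                      # block row k+1: A x_k + B u_k - x_{k+1} = 0
        r0, c0 = (k + 1) * n, k * (n + m)
        for i in range(n):
            for j in range(n):
                F[r0 + i][c0 + j] = A[i][j]
            for j in range(m):
                F[r0 + i][c0 + n + j] = B[i][j]
            F[r0 + i][c0 + n + m + i] = -1.0
    return F, f


def rollout(A, B, x_init, inputs):
    """Simulate the dynamics and return the stacked vector theta."""
    theta, x = [], list(x_init)
    for u in inputs:
        theta += x + list(u)
        x = [sum(A[i][j] * x[j] for j in range(len(x)))
             + sum(B[i][j] * u[j] for j in range(len(u)))
             for i in range(len(x))]
    return theta + x


# Made-up double-integrator-like system with n = 2, m = 1, N = 3.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
x_init = [1.0, 0.0]
theta = rollout(A, B, x_init, [[1.0], [-0.5], [0.25]])
F, f = build_equality_constraints(A, B, x_init, N=3)
residual = [sum(F[r][c] * theta[c] for c in range(len(theta))) - f[r]
            for r in range(len(F))]
print(max(abs(r) for r in residual))
```

The residual is zero up to rounding for any trajectory generated by the dynamics, which is what makes $\theta$ feasible for the equality constraints.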

6 What is an FPGA?
- Reconfigurable logic blocks
- Reconfigurable interconnect
- Other reconfigurable hard blocks
  - On-chip memories
  - Embedded multipliers

Advantages for embedded real-time applications
- Deterministic execution time
- Computational/energy efficiency
- Much lower cost than an ASIC at low volumes

Disadvantages
- Clock frequency < 350MHz
- Hardware design process

7 Is MPC suitable for FPGA computation?
- Parallelisation opportunities
  - Level 2 BLAS operations
- Deep pipelining is necessary to maintain high clock frequency

10 Is MPC suitable for FPGA computation?
- Cycle accurate completion guarantee
  - No jitter
- Compute-bound application
  - $O((n+m)^3)$ compute operations
  - $O(n+m)$ I/O operations
- Fixed-point computation is faster and uses fewer resources

11 Algorithms for Quadratic Programming
- Active-Set methods
  - Worst-case exponential complexity
  - Varying matrix structure
- Interior-Point methods
  - Polynomial complexity
  - Predictable matrix structure
  - S. Mehrotra: solves two systems of linear equations every iteration
  - S. Wright [1]: solves one system of linear equations

[1] Applying new optimization algorithms to model predictive control. In Proc. Int. Conf. Chemical Process Control, Jan.

12 Why iterative linear solvers?
- Small number of division operations
- Matrix vector multiplications
  - Easy to parallelise
- Trade off between computation time and accuracy
- Conserve matrix structure (no fill-in)
  - Allows exploiting fine structure to reduce memory requirements

Examples
- Conjugate Gradient (CG) for SPD matrices
- Minimum Residual (MINRES) for indefinite symmetric matrices

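13 Infeasible Primal-Dual Interior-Point algorithm

Initialization: $(\theta_0, \nu_0, \lambda_0, s_0)$ with $[\lambda_0^T \; s_0^T]^T > 0$
for $k = 0$ to $I_{IP} - 1$ do
  Linearization:
  \[
  A_k := \begin{bmatrix} H + G^T W_k G & F^T \\ F & 0 \end{bmatrix}, \qquad
  b_k := \begin{bmatrix} -(H + G^T W_k G)\theta_k - F^T \nu_k - G^T(\lambda_k - W_k g + \sigma\mu s_k^{-1}) \\ -F\theta_k + f \end{bmatrix}
  \]
  Solve $A_k z_k = b_k$ for $z_k =: \begin{bmatrix} \Delta\theta_k \\ \Delta\nu_k \end{bmatrix}$
  Compute
  \[
  \Delta\lambda_k := W_k\bigl(G(\theta_k + \Delta\theta_k) - g\bigr) + \sigma\mu s_k^{-1}, \qquad
  \Delta s_k := -s_k - \bigl(G(\theta_k + \Delta\theta_k) - g\bigr)
  \]
  Line search:
  \[
  \alpha_k := \max\left\{ \alpha \in (0,1] : \begin{bmatrix} \lambda_k + \alpha\Delta\lambda_k \\ s_k + \alpha\Delta s_k \end{bmatrix} > 0 \right\}
  \]
  Update: $(\theta_{k+1}, \nu_{k+1}, \lambda_{k+1}, s_{k+1}) := (\theta_k, \nu_k, \lambda_k, s_k) + \alpha_k(\Delta\theta_k, \Delta\nu_k, \Delta\lambda_k, \Delta s_k)$
end for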
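A minimal Python sketch of this iteration on a two-variable QP may help make the steps concrete. Several things here are assumptions of the sketch and not taken from the slides: $W_k$ is taken to be the diagonal matrix with entries $\lambda_k / s_k$, $\mu$ is the average complementarity $\lambda_k^T s_k$ divided by the number of inequalities, $\sigma = 0.35$ is an arbitrary fixed value, the step is shortened by a factor 0.99 to keep the iterates strictly positive, the search directions are derived from the Newton step on the perturbed KKT conditions, and a dense Gaussian elimination stands in for the iterative solver. All function names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (stand-in solver)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= t * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def interior_point(H, F, f, G, g, iters=25, sigma=0.35):
    """Infeasible primal-dual iteration for min 0.5 th'H th, F th = f, G th <= g."""
    nt, ne, ni = len(H), len(F), len(G)
    Ft = [list(col) for col in zip(*F)]
    Gt = [list(col) for col in zip(*G)]
    theta, nu, lam, s = [0.0] * nt, [0.0] * ne, [1.0] * ni, [1.0] * ni
    for _ in range(iters):
        mu = sum(l * v for l, v in zip(lam, s)) / ni
        w = [l / v for l, v in zip(lam, s)]            # diagonal of W_k (assumed)
        Phi = [[H[i][j] + sum(G[r][i] * w[r] * G[r][j] for r in range(ni))
                for j in range(nt)] for i in range(nt)]
        A = [Phi[i] + Ft[i] for i in range(nt)] + [F[i] + [0.0] * ne for i in range(ne)]
        c = [lam[r] - w[r] * g[r] + sigma * mu / s[r] for r in range(ni)]
        top = [-(p + q + t) for p, q, t in
               zip(matvec(Phi, theta), matvec(Ft, nu), matvec(Gt, c))]
        bot = [fi - v for fi, v in zip(f, matvec(F, theta))]
        z = solve(A, top + bot)
        d_theta, d_nu = z[:nt], z[nt:]
        viol = [v - gi for v, gi in
                zip(matvec(G, [a + b for a, b in zip(theta, d_theta)]), g)]
        d_lam = [w[r] * viol[r] + sigma * mu / s[r] for r in range(ni)]
        d_s = [-s[r] - viol[r] for r in range(ni)]
        alpha = 1.0                                    # line search on (lambda, s) > 0
        for v, dv in list(zip(lam, d_lam)) + list(zip(s, d_s)):
            if dv < 0.0:
                alpha = min(alpha, 0.99 * (-v / dv))
        theta = [a + alpha * b for a, b in zip(theta, d_theta)]
        nu = [a + alpha * b for a, b in zip(nu, d_nu)]
        lam = [a + alpha * b for a, b in zip(lam, d_lam)]
        s = [a + alpha * b for a, b in zip(s, d_s)]
    return theta, nu, lam, s


# Made-up QP: min 0.5*(t1^2 + t2^2)  s.t.  t1 + t2 = 1,  t1 <= 0.2
theta, nu, lam, s = interior_point(
    H=[[1.0, 0.0], [0.0, 1.0]], F=[[1.0, 1.0]], f=[1.0], G=[[1.0, 0.0]], g=[0.2])
print(theta)
```

For this toy problem the inequality is active at the solution, so the iterates should approach $\theta = (0.2, 0.8)$. The number of iterations is fixed in advance, which mirrors the fixed loop bound $I_{IP}$ in the algorithm above.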

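14 Coefficient Matrix $A_k$

After variable re-ordering:

\[
\left[\begin{array}{cccccccccccc}
 & -I \\
-I & Q_0 & S & A^T \\
 & S^T & R_0 & B^T \\
 & A & B & & -I \\
 & & & -I & Q_1 & S & A^T \\
 & & & & S^T & R_1 & B^T \\
 & & & & A & B & \ddots \\
 & & & & & & & -I & Q_{N-1} & S & A^T \\
 & & & & & & & & S^T & R_{N-1} & B^T \\
 & & & & & & & & A & B & & -I \\
 & & & & & & & & & & -I & Q_N
\end{array}\right]
\]

- Banded
- Symmetric
- Indefinite
- Size: $Z := N(2n + m) + 2n$
- Halfband: $M := 2n + m$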
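The size and halfband figures can be checked numerically. The sketch below builds only the nonzero pattern of the re-ordered matrix, assuming the variable ordering $(\nu_0, x_0, u_0, \nu_1, x_1, u_1, \ldots, \nu_N, x_N)$ and dense $A$, $B$, $Q_k$, $R_k$ and $S$ blocks (both assumptions of this example), and measures how many diagonals are needed on one side of the band, counting the main diagonal. The function names are hypothetical.

```python
def kkt_pattern(n, m, N):
    """Boolean nonzero pattern of the re-ordered coefficient matrix."""
    block = 2 * n + m
    Z = N * block + 2 * n
    P = [[False] * Z for _ in range(Z)]

    def fill(r0, c0, rows, cols, diag=False):
        # Mark a block and its transpose so the pattern stays symmetric.
        for i in range(rows):
            for j in range(cols):
                if not diag or i == j:
                    P[r0 + i][c0 + j] = True
                    P[c0 + j][r0 + i] = True

    for k in range(N + 1):
        nu, x = k * block, k * block + n
        fill(nu, x, n, n, diag=True)          # -I coupling nu_k and x_k
        fill(x, x, n, n)                      # Q_k
        if k < N:
            u, nu_next = x + n, (k + 1) * block
            fill(x, u, n, m)                  # S
            fill(u, u, m, m)                  # R_k
            fill(x, nu_next, n, n)            # A^T (and A below the diagonal)
            fill(u, nu_next, m, n)            # B^T (and B below the diagonal)
    return P


def halfband(P):
    """Number of diagonals on one side of the band, main diagonal included."""
    return 1 + max(abs(i - j) for i, row in enumerate(P)
                   for j, nz in enumerate(row) if nz)


P = kkt_pattern(n=2, m=1, N=3)
print(len(P), halfband(P))
```

With this counting convention the full band holds $2M - 1$ diagonals, and the matrix dimension does not depend on the number of constraints, only on $n$, $m$ and $N$.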

16 Matrix storage
- The matrix is symmetric and held in compressed diagonal storage (CDS); each column of the CDS array is stored in a separate on-chip memory
- In-band zeros and ones do not need to be stored
- Constant columns consist of repeated blocks and are constant for all problems being solved simultaneously
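As a software analogue of this storage scheme, the sketch below keeps only the main diagonal and the $M - 1$ superdiagonals of a symmetric banded matrix, one Python list per diagonal, and computes a matrix-vector product from them. This is an illustration of compressed diagonal storage in general, not the hardware layout itself; the names `to_cds` and `cds_matvec` and the test matrix are made up.

```python
def to_cds(A, M):
    """Extract the main diagonal and M-1 superdiagonals of a symmetric matrix.

    diags[d][i] holds A[i][i + d]; the lower triangle is implied by symmetry.
    """
    Z = len(A)
    return [[A[i][i + d] for i in range(Z - d)] for d in range(M)]


def cds_matvec(diags, x):
    """Compute y = A x using only the stored diagonals."""
    y = [0.0] * len(x)
    for d, diag in enumerate(diags):
        for i, a in enumerate(diag):
            y[i] += a * x[i + d]          # entry A[i][i+d] in the upper triangle
            if d > 0:
                y[i + d] += a * x[i]      # mirrored entry A[i+d][i]
    return y


# Symmetric indefinite tridiagonal example (halfband M = 2).
A = [[2, -1, 0, 0],
     [-1, 0, 3, 0],
     [0, 3, 1, 4],
     [0, 0, 4, -2]]
x = [1, 2, 3, 4]
diags = to_cds(A, M=2)
y = cds_matvec(diags, x)
dense = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
print(y, dense)
```

Each inner loop over one diagonal is independent of the others, which is the property that lets a hardware design read all diagonals in parallel from separate memories.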

18 Reduction in storage requirements
[Figure: reduction in storage requirements]

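19 MINRES implementation

Hardware architecture for computing $Aq_i$
[Figure: one RAM per matrix column (RAM column 1, ..., RAM column M-1, RAM column M), delay elements $z^{-(M-1)}$, $z^{-(M-2)}$, ..., multipliers fed by the vector, and a reduction stage of depth $\log_2(2M-1)$]

\[
\text{latency} = 2Z + M + k_1 \log_2(2M - 1) + k_2, \qquad \text{throughput} = Z, \qquad
\#\text{problems} = \frac{2Z + M + k_1 \log_2(2M - 1) + k_2}{Z}
\]

MINRES iteration
$q_1 = b$, $\beta_1 = \|q_1\|_2$
for $i = 1$ to $I_{MR}$ do
  $q_i = q_i / \beta_i$
  $z = A q_i$
  $\alpha = q_i^T z$
  $q_{i+1} = z - \alpha q_i - \beta_i q_{i-1}$
  $\beta_{i+1} = \|q_{i+1}\|_2$
  $\vdots$
  $\gamma_{i+1} = \delta / \rho_1$
  $\sigma_{i+1} = \beta_{i+1} / \rho_1$
  $w_i = (q_i - \rho_3 w_{i-2} - \rho_2 w_{i-1}) / \rho_1$
  $x_i = x_{i-1} + \gamma_{i+1} \eta w_i$
  $\eta = -\sigma_{i+1} \eta$
end for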
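The MINRES loop can be sketched in plain Python as follows. The slide elides the scalar updates for $\delta$, $\rho_1$, $\rho_2$ and $\rho_3$; here they are filled in with the standard Givens-rotation recurrences from the textbook form of MINRES, which is an assumption of this sketch and not taken from the slides. The starting guess $x_0 = 0$, the early exit on Lanczos breakdown, and the function name `minres` are also choices made for the example.

```python
import math


def minres(A, b, max_iter, tol=1e-12):
    """Textbook MINRES for a symmetric (possibly indefinite) matrix A, x_0 = 0."""
    n = len(b)
    x = [0.0] * n
    q_prev, q = [0.0] * n, list(b)             # q holds the unnormalised q_1
    beta = math.sqrt(sum(v * v for v in q))
    eta = beta
    gamma_prev, gamma = 1.0, 1.0
    sigma_prev, sigma = 0.0, 0.0
    w_prev2, w_prev = [0.0] * n, [0.0] * n
    for _ in range(max_iter):
        if beta < tol:                         # Krylov space exhausted
            break
        q = [v / beta for v in q]
        z = [sum(a * v for a, v in zip(row, q)) for row in A]
        alpha = sum(a * v for a, v in zip(q, z))
        q_next = [zi - alpha * qi - beta * qp
                  for zi, qi, qp in zip(z, q, q_prev)]
        beta_next = math.sqrt(sum(v * v for v in q_next))
        # Givens rotation scalars (standard recurrences, assumed here).
        delta = gamma * alpha - gamma_prev * sigma * beta
        rho1 = math.sqrt(delta * delta + beta_next * beta_next)
        rho2 = sigma * alpha + gamma_prev * gamma * beta
        rho3 = sigma_prev * beta
        gamma_prev, gamma = gamma, delta / rho1
        sigma_prev, sigma = sigma, beta_next / rho1
        w = [(qi - rho3 * a - rho2 * c) / rho1
             for qi, a, c in zip(q, w_prev2, w_prev)]
        x = [xi + gamma * eta * wi for xi, wi in zip(x, w)]
        eta = -sigma * eta
        w_prev2, w_prev = w_prev, w
        q_prev, q, beta = q, q_next, beta_next
    return x


# Symmetric indefinite 2x2 system with exact solution (1, -0.5).
x = minres([[1.0, 0.0], [0.0, -2.0]], [1.0, 1.0], max_iter=5)
print(x)
```

Per iteration the only operation involving the matrix is the product `A q`, with a handful of dot products and scalar divisions around it, which is why the hardware effort concentrates on the banded matrix-vector multiplier.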

20 QP solver design overview
- Maximise throughput: $\text{latency}_{IP} = 2 \times \text{latency}_{Stage2}$ (solves $2 \times \#\text{problems}$)
  - For large problems, a sequential implementation of Stage 1 is sufficient for $\text{latency}_{Stage1} < \text{latency}_{Stage2}$
- Minimise latency: $\text{latency}_{IP} = \text{latency}_{Stage1} + \text{latency}_{Stage2}$ (solves 1 problem)

21 Number of free parallel channels
[Figure: number of parallel channels against number of states (n) and number of inputs (m)]

[1] An FPGA Implementation of a Sparse Quadratic Programming Solver for Constrained Predictive Control. In Proc. ACM/SIGDA Symposium on Field Programmable Gate Arrays, Mar.

22 Performance
Hardware: Xilinx Virtex 6 SX 250MHz (40nm)
Software: Intel Core2 2.5GHz, 3GB RAM, 4MB L2 Cache (45nm)
Test problem: 3 inputs, 3 outputs, 20 steps, state and input constraints

[Figure: time per interior point iteration (seconds) against number of states, n. Curves: CPU measured; FPGA latency (2 × #problems); FPGA throughput (2 × #problems); FPGA latency (1 problem)]

For small problems there is no performance improvement. For the largest problem, the improvement is:
- Red curve: 14x
- Black curve: 36x
- Blue curve: 85x

23 Filling the pipeline

Parallel Multiplexed MPC [1][2]
- Each thread optimizes over a subset of the m inputs, assuming a fixed value for the rest
- Effect on the size of the problem: $m \rightarrow \dfrac{m}{2 \times \#\text{problems}}$

Parallel Move Blocking MPC [3]
- The horizon N is split into blocks
- Each independent thread solves a problem with a different splitting pattern to guarantee recursive feasibility
- Effect on the size of the problem: $N \rightarrow \dfrac{N}{2 \times \#\text{problems}}$

[1] MPC for Deeply Pipelined FPGA Implementation: Algorithms and Circuitry. In IET Control Theory and Applications.
[2] Parallel MPC for Real-time FPGA-based Implementation. In Proc. IFAC World Congress, Aug.
[3] Parallel Move Blocking Model Predictive Control. Submitted to Conference on Decision and Control, Dec.

24 Filling the pipeline
Other possible strategies:
- Distributed algorithms
- Sampling faster than the computational delay
- Moving horizon estimation

25 Questions
