FPGA Implementation of a Predictive Controller

SIAM Conference on Optimization 2011, Darmstadt, Germany
Minisymposium on embedded optimization
Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan
May 18, 2011

Contents
- MPC Problem Formulation
- Field Programmable Gate Array (FPGA)
- Algorithms for Quadratic Programming
- Implementation Details
- Results
- Related Work

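Optimal control problem

\[
\min_{\theta} \; x_N^T Q x_N + \sum_{k=0}^{N-1} \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} \tag{1}
\]

subject to

\[
\begin{aligned}
x_0 &= x, && \text{(2a)} \\
x_{k+1} &= A x_k + B u_k \quad \text{for } k = 0, 1, 2, \dots, N-1, && \text{(2b)} \\
J x_k + E u_k &\le d \quad \text{for } k = 0, 1, 2, \dots, N-1, && \text{(2c)}
\end{aligned}
\]

with $x_k \in \mathbb{R}^n$, $u_k \in \mathbb{R}^m$.

Goal: accelerate the computation of the optimal value $\theta$ such that MPC can be implemented at faster sampling rates.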

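Quadratic Programming Formulation

\[
\min_{\theta} \; \tfrac{1}{2} \theta^T H \theta \quad \text{subject to} \quad F\theta = f, \quad G\theta \le g
\]

where

\[
\theta := [\, x_0^T \; u_0^T \; x_1^T \; u_1^T \; x_2^T \; u_2^T \; \dots \; x_{N-1}^T \; u_{N-1}^T \; x_N^T \,]^T \in \mathbb{R}^{N(n+m)+n},
\]

\[
H := \begin{bmatrix} I_N \otimes \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} & 0 \\ 0 & Q \end{bmatrix},
\]

\[
F := \begin{bmatrix} -I_n \\ A & B & -I_n \\ & & \ddots \\ & & A & B & -I_n \end{bmatrix}, \quad
f := \begin{bmatrix} -x \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]

\[
G := \begin{bmatrix} I_N \otimes [\, J \;\; E \,] & 0 \end{bmatrix}, \quad g := \mathbf{d} := \mathbf{1}_N \otimes d.
\]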
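The block structure of the QP data can be made concrete with a short NumPy sketch. It is only an illustration under stated assumptions: the function name `build_qp`, the argument names `Q_N` (terminal weight) and `x_init` (the measured state $x$ in (2a)), and the dense storage are all hypothetical choices, not part of the talk. The minus signs on the identity blocks follow from writing (2a) and (2b) as $F\theta = f$.

```python
import numpy as np

def build_qp(A, B, Q, R, S, Q_N, J, E, d, x_init, N):
    """Assemble dense H, F, f, G, g for theta = [x0,u0,...,x_{N-1},u_{N-1},x_N]."""
    n, m = B.shape
    nz = N * (n + m) + n                      # length of theta

    # Cost: N copies of the stage block, then the terminal block.
    stage = np.block([[Q, S], [S.T, R]])
    H = np.zeros((nz, nz))
    H[:N * (n + m), :N * (n + m)] = np.kron(np.eye(N), stage)
    H[N * (n + m):, N * (n + m):] = Q_N

    # Equality constraints: -x0 = -x_init and A x_k + B u_k - x_{k+1} = 0.
    F = np.zeros(((N + 1) * n, nz))
    F[:n, :n] = -np.eye(n)
    for k in range(N):
        r = (k + 1) * n                       # first row of block k+1
        c = k * (n + m)                       # first column of x_k
        F[r:r + n, c:c + n] = A
        F[r:r + n, c + n:c + n + m] = B
        F[r:r + n, c + n + m:c + 2 * n + m] = -np.eye(n)
    f = np.zeros((N + 1) * n)
    f[:n] = -x_init

    # Inequality constraints act on (x_k, u_k) for k < N; x_N is unconstrained.
    stage_rows = J.shape[0]
    G = np.hstack([np.kron(np.eye(N), np.hstack([J, E])),
                   np.zeros((N * stage_rows, n))])
    g = np.tile(d, N)
    return H, F, f, G, g
```

Any trajectory generated by $x_{k+1} = A x_k + B u_k$ from `x_init` should satisfy `F @ theta == f` up to rounding, which is a convenient sanity check on the sign conventions.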

The second build of this slide adds two overlay labels: RESULT, marking the decision vector $\theta$, and DATA, marking the problem data.

What is an FPGA?
- Reconfigurable logic blocks
- Reconfigurable interconnect
- Other reconfigurable hard blocks
  - On-chip memories
  - Embedded multipliers

Advantages for embedded real-time applications
- Deterministic execution time
- Computational/energy efficiency
- Much reduced low-volume cost compared to ASIC

Disadvantages
- Clock frequency < 350 MHz
- Hardware design process

Is MPC suitable for FPGA computation?
- Parallelisation opportunities
  - Level 2 BLAS operations
- Deep pipelining is necessary to maintain high clock frequency

Is MPC suitable for FPGA computation?
- Cycle-accurate completion guarantee
  - No jitter
- Compute-bound application
  - $O((n+m)^3)$ compute operations
  - $O(n+m)$ I/O operations
- Fixed-point computation is faster and uses fewer resources

Algorithms for Quadratic Programming
- Active-set methods
  - Worst-case exponential complexity
  - Varying matrix structure
- Interior-point methods
  - Polynomial complexity
  - Predictable matrix structure
  - S. Mehrotra: solves two systems of linear equations every iteration
  - S. Wright [1]: solves one system of linear equations

[1] Applying new optimization algorithms to model predictive control. In Proc. Int. Conf. Chemical Process Control, Jan 1996.

Why iterative linear solvers?
- Small number of division operations
- Matrix-vector multiplications
  - Easy to parallelise
- Trade-off between computation time and accuracy
- Conserve matrix structure (no fill-in)
  - Allows exploiting fine structure to reduce memory requirements

Examples
- Conjugate Gradient (CG) for SPD matrices
- Minimum Residual (MINRES) for indefinite symmetric matrices
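The points above can be seen in a minimal pure-Python sketch of the conjugate gradient method, the SPD example named on the slide. It is not the MINRES solver used in the talk, and the function names are hypothetical. Each iteration costs one matrix-vector product and two divisions, never modifies the matrix (so no fill-in), and the iteration count trades run time against accuracy.

```python
def matvec(A, x):
    """Dense matrix-vector product; the rows can be processed in parallel."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, iterations):
    """Approximate the solution of A x = b for symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the initial guess x = 0
    p = list(r)                 # first search direction
    rr = dot(r, r)
    for _ in range(iterations):
        if rr == 0.0:           # already exact, nothing left to correct
            break
        Ap = matvec(A, p)       # the only matrix access in the loop
        alpha = rr / dot(p, Ap)                 # division 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                      # division 2
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], 2)` reaches the exact solution of this 2 x 2 system up to rounding, while a single iteration returns a coarser approximation.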

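Infeasible Primal-Dual Interior-Point algorithm

Initialization: $(\theta_0, \nu_0, \lambda_0, s_0)$ with $[\lambda_0^T \; s_0^T]^T > 0$
for $k = 0$ to $I_{IP} - 1$ do
  Linearization:
  \[
  A_k := \begin{bmatrix} H + G^T W_k G & F^T \\ F & 0 \end{bmatrix}, \quad
  b_k := \begin{bmatrix} -(H + G^T W_k G)\theta_k - F^T \nu_k - G^T(\lambda_k - W_k g + \sigma\mu s_k^{-1}) \\ -F\theta_k + f \end{bmatrix}
  \]
  Solve $A_k z_k = b_k$ for $z_k =: \begin{bmatrix} \Delta\theta_k \\ \Delta\nu_k \end{bmatrix}$
  Compute:
  \[
  \Delta\lambda_k := W_k\,(G(\theta_k + \Delta\theta_k) - g) + \sigma\mu s_k^{-1}, \quad
  \Delta s_k := -s_k - (G(\theta_k + \Delta\theta_k) - g)
  \]
  Line search:
  \[
  \alpha_k := \max_{\alpha \in (0,1]} \alpha \; : \; \begin{bmatrix} \lambda_k + \alpha\Delta\lambda_k \\ s_k + \alpha\Delta s_k \end{bmatrix} > 0
  \]
  Update: $(\theta_{k+1}, \nu_{k+1}, \lambda_{k+1}, s_{k+1}) := (\theta_k, \nu_k, \lambda_k, s_k) + \alpha_k(\Delta\theta_k, \Delta\nu_k, \Delta\lambda_k, \Delta s_k)$
end for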
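The line-search step of the algorithm is a simple ratio test, sketched below in plain Python. The function name and the `shrink` factor are assumptions added for illustration: because the positivity condition is strict, the step is pulled back slightly from the boundary whenever a component would otherwise reach zero, which is a common practical choice and not something stated on the slide.

```python
def line_search(lam, dlam, s, ds, shrink=0.99):
    """Largest step in (0, 1] keeping lam + a*dlam and s + a*ds positive.

    lam and s must be strictly positive on entry. shrink < 1 is a
    hypothetical safety factor that keeps the iterate off the boundary.
    """
    v = list(lam) + list(s)
    dv = list(dlam) + list(ds)
    # Only components that decrease can reach zero; each one hits zero
    # at the step length -v_i / dv_i.
    ratios = [-vi / dvi for vi, dvi in zip(v, dv) if dvi < 0]
    if not ratios:
        return 1.0              # the full step stays positive
    return min(1.0, shrink * min(ratios))
```

With `lam = [1, 2]`, `dlam = [-2, 1]`, `s = [1, 1]`, `ds = [0.5, -0.5]`, the first multiplier would reach zero at a step of 0.5, so the sketch returns a value just below that.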

Coefficient Matrix $A_k$

After variable re-ordering:

\[
\begin{bmatrix}
 & -I \\
-I & Q_0 & S & A^T \\
 & S^T & R_0 & B^T \\
 & A & B & & -I \\
 & & & -I & Q_1 & \ddots \\
 & & & & \ddots & \ddots & \ddots \\
 & & & & & \ddots & Q_{N-1} & S & A^T \\
 & & & & & & S^T & R_{N-1} & B^T \\
 & & & & & & A & B & & -I \\
 & & & & & & & & -I & Q_N
\end{bmatrix}
\]

- Banded
- Symmetric
- Indefinite
- Size: $Z := N(2n + m) + 2n$
- Halfband: $M := 2n + m$

Matrix storage

[Figure: sparsity plots of the banded matrix and of its CDS representation]

- Columns of the symmetric CDS matrix are stored in separate on-chip memories
- In-band zeros and ones do not need to be stored
- Constant columns consist of repeated blocks and are constant for all problems being solved simultaneously
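The storage scheme can be sketched in a few lines of Python, reading CDS as compressed diagonal storage. This is an assumption-laden illustration, not the hardware layout: the helper names are hypothetical, the halfband M is taken to include the main diagonal, and the further savings from skipping in-band zeros, ones and repeated constant blocks are not modelled. Each stored diagonal plays the role of one column memory.

```python
def to_cds(A, M):
    """Keep diagonals 0..M-1 of a symmetric matrix: cds[d][i] = A[i][i+d]."""
    Z = len(A)
    return [[A[i][i + d] for i in range(Z - d)] for d in range(M)]

def cds_matvec(cds, x):
    """Compute y = A x using only the stored upper diagonals."""
    y = [0.0] * len(x)
    for d, diag in enumerate(cds):
        for i, a in enumerate(diag):
            y[i] += a * x[i + d]            # entry A[i][i+d]
            if d > 0:
                y[i + d] += a * x[i]        # mirrored entry A[i+d][i]
    return y
```

For a Z x Z matrix this stores roughly Z·M numbers instead of Z², and every diagonal can be read independently of the others, which is what makes one memory per column attractive.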

Reduction in storage requirements

MINRES implementation

Hardware architecture for computing $A q_i$

[Figure: column RAMs 1 to M and the input vector feed $2M-1$ multipliers through delay elements; an adder tree of depth $\log_2(2M-1)$ sums the products]

\[
\text{latency} = 2Z + M + k_1 \log_2(2M-1) + k_2, \qquad \text{throughput} = Z,
\]

\[
\#\text{problems} = \frac{2Z + M + k_1 \log_2(2M-1) + k_2}{Z}
\]

MINRES iteration:

$q_1 = b$, $\beta_1 = \|q_1\|_2$
for $i = 1$ to $I_{MR}$ do
  $q_i = q_i / \beta_i$
  $z = A q_i$
  $\alpha = q_i^T z$
  $q_{i+1} = z - \alpha q_i - \beta_i q_{i-1}$
  $\beta_{i+1} = \|q_{i+1}\|_2$
  $\vdots$
  $\gamma_{i+1} = \delta / \rho_1$
  $\sigma_{i+1} = \beta_{i+1} / \rho_1$
  $w_i = (q_i - \rho_3 w_{i-2} - \rho_2 w_{i-1}) / \rho_1$
  $x_i = x_{i-1} + \gamma_{i+1} \eta w_i$
  $\eta = -\sigma_{i+1} \eta$
end for
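The latency expressions can be evaluated for a given problem size with the sketch below. Several things here are assumptions rather than facts from the talk: the function name, the rounding up of the adder-tree depth and of the problem count to whole numbers, the reading of "throughput = Z" as one matrix-vector product accepted every Z cycles, and the values passed for the constants $k_1$ and $k_2$, which the slide does not define. Z and M are the matrix size and halfband from the coefficient-matrix slide.

```python
def minres_pipeline(n, m, N, k1, k2):
    """Return (Z, M, latency in cycles, problems needed to fill the pipeline)."""
    Z = N * (2 * n + m) + 2 * n          # matrix dimension
    M = 2 * n + m                        # halfband, main diagonal included
    # Integer ceil(log2(2M - 1)): depth of a binary adder tree (assumed rounding).
    tree_depth = (2 * M - 2).bit_length()
    latency = 2 * Z + M + k1 * tree_depth + k2
    # Independent problems needed to keep the pipeline busy (assumed rounded up).
    problems = -(-latency // Z)
    return Z, M, latency, problems
```

As a purely illustrative case, `minres_pipeline(2, 1, 3, 10, 51)` gives Z = 19, M = 5, a latency of 134 cycles and 8 interleaved problems; the constants 10 and 51 are made up for the example.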

QP solver design overview
- Maximise throughput: $\text{latency}_{IP} = 2 \cdot \text{latency}_{Stage2}$ (solves $2 \times \#\text{problems}$)
  - For large problems, a sequential implementation of Stage 1 is sufficient for $\text{latency}_{Stage1} < \text{latency}_{Stage2}$
- Minimise latency: $\text{latency}_{IP} = \text{latency}_{Stage1} + \text{latency}_{Stage2}$ (solves 1 problem)

Number of free parallel channels

[Figure: number of parallel channels as a function of the number of states (n) and the number of inputs (m)]

[1] An FPGA Implementation of a Sparse Quadratic Programming Solver for Constrained Predictive Control. In Proc. ACM/SIGDA Symposium on Field Programmable Gate Arrays, Mar 2011.

Performance
- Hardware: Xilinx Virtex 6 SX 475T @ 250 MHz (40 nm)
- Software: Intel Core2 Q8300 @ 2.5 GHz, 3 GB RAM, 4 MB L2 cache (45 nm)

[Figure: time per interior-point iteration (seconds) against number of states n, logarithmic axes. Curves: CPU measured; FPGA latency (2 × #problems); FPGA throughput (2 × #problems); FPGA latency (1 problem)]

Test problems: 3 inputs, 3 outputs, 20 steps, state and input constraints.

For small problems there is no performance improvement. For the largest problem, the improvement is:
- Red curve: 14x
- Black curve: 36x
- Blue curve: 85x

Filling the pipeline

Parallel Multiplexed MPC [1][2]
- Each thread optimizes over a subset of the m inputs assuming a fixed value for the rest.
- Effect on the size of the problem: $m \to \dfrac{m}{2\,\#\text{problems}}$

Parallel Move Blocking MPC [3]
- The horizon N is split into blocks
- Each independent thread solves a problem with a different splitting pattern to guarantee recursive feasibility
- Effect on the size of the problem: $N \to \dfrac{N}{2\,\#\text{problems}}$

[1] MPC for Deeply Pipelined FPGA Implementation: Algorithms and Circuitry. In IET Control Theory and Applications, 2011.
[2] Parallel MPC for Real-time FPGA-based Implementation. In Proc. IFAC World Congress, Aug 2011.
[3] Parallel Move Blocking Model Predictive Control. Submitted to Conference on Decision and Control, Dec 2011.

Filling the pipeline

Other possible strategies:
- Distributed algorithms
- Sampling faster than the computational delay
- Moving horizon estimation

Questions