FPGA Implementation of a Predictive Controller


SIAM Conference on Optimization 2011, Darmstadt, Germany
Minisymposium on embedded optimization
Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan
May 18, 2011

Contents
- MPC Problem Formulation
- Field Programmable Gate Array (FPGA)
- Algorithms for Quadratic Programming
- Implementation Details
- Results
- Related Work

Optimal control problem

$$\min_{\theta}\; x_N^T Q x_N + \sum_{k=0}^{N-1} \begin{bmatrix} x_k \\ u_k \end{bmatrix}^T \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} \qquad (1)$$

subject to

$$x_0 = x \qquad (2a)$$
$$x_{k+1} = A x_k + B u_k \quad \text{for } k = 0, 1, 2, \ldots, N-1 \qquad (2b)$$
$$J x_k + E u_k \le d \quad \text{for } k = 0, 1, 2, \ldots, N-1 \qquad (2c)$$
$$x_k \in \mathbb{R}^n, \quad u_k \in \mathbb{R}^m$$

Goal: accelerate the computation of the optimal θ so that MPC can be implemented at faster sampling rates.

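Quadratic Programming Formulation

$$\min_{\theta}\; \frac{1}{2}\theta^T H \theta \quad \text{subject to} \quad F\theta = f, \quad G\theta \le g$$

where

$$\theta := [x_0^T\; u_0^T\; x_1^T\; u_1^T\; x_2^T\; u_2^T\; \ldots\; x_{N-1}^T\; u_{N-1}^T\; x_N^T]^T \in \mathbb{R}^{N(n+m)+n},$$

$$H := \begin{bmatrix} I_N \otimes \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix} & 0 \\ 0 & Q \end{bmatrix}, \qquad F := \begin{bmatrix} -I_n \\ A & B & -I_n \\ & & A & B & -I_n \\ & & & & & \ddots \\ & & & & & A & B & -I_n \end{bmatrix}, \qquad f := \begin{bmatrix} -x \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$

$$G := \begin{bmatrix} I_N \otimes [J\;\; E] & 0 \end{bmatrix}, \qquad g := 1_N \otimes d,$$

with ⊗ the Kronecker product and $1_N$ the vector of N ones.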
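To make the stacking of θ and the equality constraints concrete, here is a minimal Python sketch (stdlib only) that assembles F and f for given A, B and initial state x, simulates the dynamics (2b) for a hypothetical input sequence, and checks that the stacked θ satisfies Fθ = f. The sign convention (−I_n blocks, first block of f equal to −x), the helper names and the numerical data are assumptions for illustration, not the authors' code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def equality_constraints(A: Matrix, B: Matrix, x: Vector, N: int):
    """Build (F, f) so that F theta = f encodes x_0 = x and x_{k+1} = A x_k + B u_k.

    theta is ordered [x_0, u_0, x_1, u_1, ..., x_{N-1}, u_{N-1}, x_N].
    """
    n, m = len(A), len(B[0])
    F = [[0.0] * (N * (n + m) + n) for _ in range((N + 1) * n)]
    f = [0.0] * ((N + 1) * n)
    for i in range(n):                      # block row 0: -I_n x_0 = -x
        F[i][i] = -1.0
        f[i] = -x[i]
    for k in range(N):                      # block row k+1: A x_k + B u_k - x_{k+1} = 0
        r, c = (k + 1) * n, k * (n + m)     # row offset, column offset of x_k
        for i in range(n):
            for j in range(n):
                F[r + i][c + j] = A[i][j]
            for j in range(m):
                F[r + i][c + n + j] = B[i][j]
            F[r + i][c + n + m + i] = -1.0  # -I_n acting on x_{k+1}
    return F, f


def stacked_trajectory(A: Matrix, B: Matrix, x: Vector, inputs: List[Vector]) -> Vector:
    """Simulate (2b) from x_0 = x and interleave states and inputs into theta."""
    theta, xk = [], list(x)
    for u in inputs:
        theta += xk + list(u)
        xk = [p + q for p, q in zip(matvec(A, xk), matvec(B, u))]
    return theta + xk


# Hypothetical data: n = 2 states, m = 1 input, horizon N = 3.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
x = [1.0, 0.0]
inputs = [[0.5], [-0.2], [0.0]]

F, f = equality_constraints(A, B, x, N=len(inputs))
theta = stacked_trajectory(A, B, x, inputs)
residual = [a - b for a, b in zip(matvec(F, theta), f)]
print(len(theta), max(abs(r) for r in residual))
```

Any input sequence gives a zero residual here, because θ is generated by the same dynamics that F encodes; the inequality constraints Gθ ≤ g are what actually restrict the inputs.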


What is an FPGA?
- Reconfigurable logic blocks
- Reconfigurable interconnect
- Other reconfigurable hard blocks: on-chip memories, embedded multipliers

Advantages for embedded real-time applications:
- Deterministic execution time
- Computational and energy efficiency
- Much lower cost than an ASIC at low production volumes

Disadvantages:
- Clock frequency below 350 MHz
- Hardware design process

Is MPC suitable for FPGA computation?
- Parallelisation opportunities: Level 2 BLAS operations
- Deep pipelining is necessary to maintain a high clock frequency


Is MPC suitable for FPGA computation?
- Cycle-accurate completion guarantee: no jitter
- Compute-bound application: $O((n+m)^3)$ compute operations versus $O(n+m)$ I/O operations
- Fixed-point computation is faster and uses fewer resources

Algorithms for Quadratic Programming
- Active-set methods: worst-case exponential complexity; varying matrix structure
- Interior-point methods: polynomial complexity; predictable matrix structure
  - S. Mehrotra: solves two systems of linear equations every iteration
  - S. Wright [1]: solves one system of linear equations every iteration

[1] S. Wright. Applying New Optimization Algorithms to Model Predictive Control. In Proc. Int. Conf. Chemical Process Control, Jan.

Why iterative linear solvers?
- Small number of division operations
- Matrix-vector multiplications: easy to parallelise
- Trade-off between computation time and accuracy
- Conserve matrix structure (no fill-in), which allows exploiting fine structure to reduce memory requirements

Examples:
- Conjugate Gradient (CG) for symmetric positive definite (SPD) matrices
- Minimum Residual (MINRES) for symmetric indefinite matrices
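As an illustration of the first three points, a plain conjugate gradient iteration for SPD systems is sketched below in Python. Each iteration costs one matrix-vector product plus dot products and vector updates, and needs only two scalar divisions; `tol` and `max_iter` expose the time/accuracy trade-off. This is the textbook algorithm, not the authors' hardware design, and the names and defaults are assumptions.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(M: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in M]


def conjugate_gradient(A: Matrix, b: Vector, tol: float = 1e-12,
                       max_iter: Optional[int] = None) -> Vector:
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # first search direction
    rr = dot(r, r)
    for _ in range(max_iter if max_iter is not None else 2 * len(b)):
        if math.sqrt(rr) <= tol:     # stop early: fewer iterations, less accuracy
            break
        Ap = matvec(A, p)            # the only matrix-vector product
        alpha = rr / dot(p, Ap)      # division 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr           # division 2
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The KKT matrices in this talk are indefinite, so CG does not apply to them directly; that is why MINRES is the solver of interest here.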

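Infeasible Primal-Dual Interior-Point algorithm

Initialization: $(\theta_0, \nu_0, \lambda_0, s_0)$ with $[\lambda_0^T\; s_0^T]^T > 0$
for $k = 0$ to $I_{IP} - 1$ do
  Linearization:
  $$A_k := \begin{bmatrix} H + G^T W_k G & F^T \\ F & 0 \end{bmatrix}, \qquad b_k := \begin{bmatrix} -(H + G^T W_k G)\theta_k - F^T \nu_k - G^T(\lambda_k - W_k g + \sigma\mu s_k^{-1}) \\ -F\theta_k + f \end{bmatrix}$$
  Solve $A_k z_k = b_k$ for $z_k =: [\Delta\theta_k^T\; \Delta\nu_k^T]^T$
  Compute:
  $$\Delta\lambda_k := W_k\big(G(\theta_k + \Delta\theta_k) - g\big) + \sigma\mu s_k^{-1}, \qquad \Delta s_k := -s_k - \big(G(\theta_k + \Delta\theta_k) - g\big)$$
  Line search:
  $$\alpha_k := \max_{\alpha \in (0,1]} \left\{ \alpha : \begin{bmatrix} \lambda_k + \alpha\Delta\lambda_k \\ s_k + \alpha\Delta s_k \end{bmatrix} > 0 \right\}$$
  Update:
  $$(\theta_{k+1}, \nu_{k+1}, \lambda_{k+1}, s_{k+1}) := (\theta_k, \nu_k, \lambda_k, s_k) + \alpha_k(\Delta\theta_k, \Delta\nu_k, \Delta\lambda_k, \Delta s_k)$$
end for

Here $W_k := \mathrm{diag}(\lambda_k)\,\mathrm{diag}(s_k)^{-1}$ and $s_k^{-1}$ denotes the elementwise inverse.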
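The recovery of the dual and slack steps and the line search can be sketched as follows (Python, stdlib only). The sketch assumes W_k = diag(λ_k) diag(s_k)^{-1}, the choice under which b_k is the Newton right-hand side for the slack form Gθ + s = g. It also uses a fraction-to-boundary factor `tau = 0.99`, a hypothetical parameter: the maximum in the line search is not attained when a component of λ or s would reach exactly zero, so the ratio-test bound is backed off slightly.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dual_and_slack_steps(G: Matrix, g: Vector, theta: Vector, dtheta: Vector,
                         lam: Vector, s: Vector, sigma: float, mu: float):
    """Recover (dlam, ds) from dtheta, assuming W = diag(lam) diag(s)^{-1}.

    dlam = W (G(theta + dtheta) - g) + sigma*mu*s^{-1}
    ds   = -s - (G(theta + dtheta) - g)
    """
    shifted = [t + d for t, d in zip(theta, dtheta)]
    r = [gt - gi for gt, gi in zip(matvec(G, shifted), g)]   # G(theta + dtheta) - g
    dlam = [(l / si) * ri + sigma * mu / si for l, si, ri in zip(lam, s, r)]
    ds = [-si - ri for si, ri in zip(s, r)]
    return dlam, ds


def step_length(lam: Vector, s: Vector, dlam: Vector, ds: Vector,
                tau: float = 0.99) -> float:
    """Largest alpha in (0, 1] keeping [lam; s] + alpha [dlam; ds] strictly positive."""
    alpha_max = math.inf
    for v, dv in zip(lam + s, dlam + ds):
        if dv < 0.0:                       # only decreasing components can block the step
            alpha_max = min(alpha_max, -v / dv)
    return min(1.0, tau * alpha_max)


# Hypothetical data: 2 decision variables, 3 inequality constraints.
G = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
g = [1.0, 1.0, 1.0]
theta, dtheta = [0.2, -0.3], [0.1, 0.05]
lam, s = [0.5, 1.0, 2.0], [0.8, 1.3, 1.2]
sigma, mu = 0.1, 0.5

dlam, ds = dual_and_slack_steps(G, g, theta, dtheta, lam, s, sigma, mu)
alpha = step_length(lam, s, dlam, ds)
print(alpha)
```

With these formulas the linearised complementarity condition $S_k\Delta\lambda_k + \Lambda_k\Delta s_k = \sigma\mu\mathbf{1} - \Lambda_k s_k$ holds exactly for any Δθ_k, which makes a convenient sanity check when the linear system is solved only approximately by an iterative method.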

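Coefficient Matrix $A_k$

After variable re-ordering:

$$A_k = \left[\begin{array}{cccccccccccc}
0 & -I \\
-I & Q_0 & S & A^T \\
 & S^T & R_0 & B^T \\
 & A & B & 0 & -I \\
 & & & -I & Q_1 & S & A^T \\
 & & & & S^T & R_1 & B^T \\
 & & & & A & B & 0 & -I \\
 & & & & & & & \ddots \\
 & & & & & & & -I & Q_{N-1} & S & A^T \\
 & & & & & & & & S^T & R_{N-1} & B^T \\
 & & & & & & & & A & B & 0 & -I \\
 & & & & & & & & & & -I & Q_N
\end{array}\right]$$

- Banded
- Size $Z := N(2n + m) + 2n$
- Symmetric
- Half-band $M := 2n + m$
- Indefinite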
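The size and half-band formulas can be checked by building the nonzero pattern of the reordered matrix. The Python sketch below assumes the variable ordering [ν_0, x_0, u_0, ν_1, x_1, u_1, …, ν_N, x_N] implied by the block layout, treats every Q_k, S, R_k, A and B block as dense and every I as diagonal, and measures the half-band with the main diagonal included. Function names are hypothetical.

```python
from typing import Set, Tuple

Pattern = Set[Tuple[int, int]]


def reordered_kkt_pattern(n: int, m: int, N: int) -> Tuple[int, Pattern]:
    """Return (Z, nonzero positions) of the reordered coefficient matrix A_k."""
    stage = 2 * n + m                    # nu_k, x_k and u_k for one stage
    Z = N * stage + 2 * n                # plus the final nu_N and x_N

    def nu(k: int) -> int:
        return k * stage

    def x(k: int) -> int:
        return k * stage + n

    def u(k: int) -> int:
        return k * stage + 2 * n

    nz: Pattern = set()

    def dense(r0: int, rows: int, c0: int, cols: int) -> None:
        for i in range(rows):            # dense block and its symmetric mirror
            for j in range(cols):
                nz.update({(r0 + i, c0 + j), (c0 + j, r0 + i)})

    def identity(r0: int, c0: int, size: int) -> None:
        for i in range(size):            # -I block and its symmetric mirror
            nz.update({(r0 + i, c0 + i), (c0 + i, r0 + i)})

    for k in range(N):
        identity(nu(k), x(k), n)         # -I coupling nu_k and x_k
        dense(x(k), n, x(k), n)          # Q_k
        dense(x(k), n, u(k), m)          # S
        dense(u(k), m, u(k), m)          # R_k
        dense(nu(k + 1), n, x(k), n)     # A
        dense(nu(k + 1), n, u(k), m)     # B
    identity(nu(N), x(N), n)             # -I coupling nu_N and x_N
    dense(x(N), n, x(N), n)              # Q_N
    return Z, nz


def half_bandwidth(nz: Pattern) -> int:
    """Number of diagonals on one side of the matrix, main diagonal included."""
    return 1 + max(abs(i - j) for i, j in nz)


n, m, N = 3, 2, 5
Z, nz = reordered_kkt_pattern(n, m, N)
print(Z, N * (2 * n + m) + 2 * n, half_bandwidth(nz), 2 * n + m)
```

The widest coupling comes from the A block, which links the last component of ν_{k+1} to the first component of x_k, exactly 2n + m − 1 positions apart.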


Matrix storage
- The columns of the matrix in symmetric compressed diagonal storage (CDS) format are stored in separate on-chip memories
- In-band zeros and ones do not need to be stored
- Constant columns consist of repeated blocks and are the same for all problems being solved simultaneously
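A minimal sketch of symmetric CDS storage and the matching matrix-vector product follows, assuming only the main diagonal and the M − 1 superdiagonals are stored. Column d of `val` holds diagonal d, which is the unit the slide maps to one on-chip memory. Each stored off-diagonal entry is used twice, once for itself and once for its mirror below the diagonal. Names are hypothetical, and the sketch does not implement the further savings from skipping in-band zeros and ones or from sharing constant columns.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def to_symmetric_cds(A: Matrix, M: int) -> Matrix:
    """Keep the main diagonal and M-1 superdiagonals: val[i][d] = A[i][i+d].

    Entries that would fall past the end of the matrix are padded with zeros.
    """
    Z = len(A)
    return [[A[i][i + d] if i + d < Z else 0.0 for d in range(M)] for i in range(Z)]


def cds_symmetric_matvec(val: Matrix, x: Vector) -> Vector:
    """y = A x for symmetric banded A, reading only the stored upper half of the band."""
    Z, M = len(val), len(val[0])
    y = [0.0] * Z
    for i in range(Z):
        for d in range(M):
            j = i + d
            if j >= Z:
                break
            y[i] += val[i][d] * x[j]          # upper-triangle entry A[i][j]
            if d > 0:
                y[j] += val[i][d] * x[i]      # mirrored lower-triangle entry A[j][i]
    return y


A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, 2.0],
     [0.0, 0.0, 2.0, 5.0]]
val = to_symmetric_cds(A, M=2)                # Z*M = 8 stored values instead of Z*Z = 16
y = cds_symmetric_matvec(val, [1.0, 2.0, 3.0, 4.0])
print(y)
```

For the KKT matrix above, storage drops from Z² to Z·M values, and M = 2n + m does not grow with the horizon N.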


[Figure: Reduction in storage requirements]

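MINRES implementation

[Figure: hardware architecture for computing $Aq_i$: on-chip memories holding matrix columns 1 to M (RAM column 1, ..., RAM column M), delay lines $z^{-(M-1)}, z^{-(M-2)}, \ldots$ on the input vector, parallel multipliers and an adder tree of depth $\lceil\log_2(2M-1)\rceil$]

$$\text{latency} = 2Z + M + k_1\lceil\log_2(2M-1)\rceil + k_2, \qquad \text{throughput} = Z,$$
$$\#\text{problems} = \left\lceil \frac{2Z + M + k_1\lceil\log_2(2M-1)\rceil + k_2}{Z} \right\rceil,$$

which equals 3 whenever $M + k_1\lceil\log_2(2M-1)\rceil + k_2 \le Z$.

$q_1 = b$, $\beta_1 = \|q_1\|_2$
for $i = 1$ to $I_{MR}$ do
  $q_i = q_i / \beta_i$
  $z = A q_i$
  $\alpha = q_i^T z$
  $q_{i+1} = z - \alpha q_i - \beta_i q_{i-1}$
  $\beta_{i+1} = \|q_{i+1}\|_2$
  ⋮
  $\gamma_{i+1} = \delta / \rho_1$
  $\sigma_{i+1} = \beta_{i+1} / \rho_1$
  $w_i = (q_i - \rho_3 w_{i-2} - \rho_2 w_{i-1}) / \rho_1$
  $x_i = x_{i-1} + \gamma_{i+1} \eta w_i$
  $\eta = -\sigma_{i+1} \eta$
end for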
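For reference, the full textbook MINRES recurrence (a Lanczos step followed by Givens rotations that produce δ, ρ_1, ρ_2 and ρ_3, which the slide does not spell out) is sketched below in Python. Variable names follow the slide: γ and σ are the rotation cosines and sines, η the rotated right-hand side. The initialisation γ_0 = γ_1 = 1, σ_0 = σ_1 = 0, q_0 = w_{−1} = w_0 = 0, x_0 = 0 and the stopping test are the standard choices, assumed here rather than taken from the slides.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(M: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in M]


def minres(A: Matrix, b: Vector, max_iter: int, tol: float = 1e-12) -> Vector:
    """Textbook MINRES (Lanczos + Givens rotations) for symmetric A, from x_0 = 0."""
    n = len(b)
    x = [0.0] * n
    beta1 = math.sqrt(dot(b, b))                   # beta_1 = ||q_1||_2 with q_1 = b
    if beta1 == 0.0:
        return x
    beta = beta1
    q_prev, q = [0.0] * n, [bi / beta for bi in b]  # q_0 = 0 and normalised q_1
    w_prev2, w_prev = [0.0] * n, [0.0] * n          # w_{i-2}, w_{i-1}
    gamma_prev, gamma = 1.0, 1.0                    # gamma_{i-1}, gamma_i (cosines)
    sigma_prev, sigma = 0.0, 0.0                    # sigma_{i-1}, sigma_i (sines)
    eta = beta1
    for _ in range(max_iter):
        # Lanczos step: the only matrix-vector product per iteration.
        z = matvec(A, q)
        alpha = dot(q, z)
        q_next = [zi - alpha * qi - beta * pi for zi, qi, pi in zip(z, q, q_prev)]
        beta_next = math.sqrt(dot(q_next, q_next))
        # Apply the two previous rotations to the new tridiagonal column,
        # then form the rotation that eliminates beta_next.
        delta = gamma * alpha - gamma_prev * sigma * beta
        rho1 = math.hypot(delta, beta_next)
        rho2 = sigma * alpha + gamma_prev * gamma * beta
        rho3 = sigma_prev * beta
        gamma_next, sigma_next = delta / rho1, beta_next / rho1
        # Update the direction vector and the iterate.
        w = [(qi - rho3 * a - rho2 * c) / rho1 for qi, a, c in zip(q, w_prev2, w_prev)]
        x = [xi + gamma_next * eta * wi for xi, wi in zip(x, w)]
        eta = -sigma_next * eta                     # |eta| = ||b - A x|| in exact arithmetic
        if beta_next == 0.0 or abs(eta) <= tol * beta1:
            break
        q_prev, q = q, [v / beta_next for v in q_next]
        beta = beta_next
        w_prev2, w_prev = w_prev, w
        gamma_prev, gamma = gamma, gamma_next
        sigma_prev, sigma = sigma, sigma_next
    return x


# Small symmetric indefinite example (hypothetical data).
A = [[2.0, 1.0, 0.0], [1.0, -3.0, 1.0], [0.0, 1.0, 1.0]]
b = [1.0, 2.0, 3.0]
print(minres(A, b, max_iter=10))
```

Apart from the matrix-vector product, each iteration needs two dot products, a square root and a handful of scalar divisions. This matches the slide's emphasis on keeping the expensive work in the $Aq_i$ datapath.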

QP solver design overview
- Maximise throughput: $\text{latency}_{IP} = 2\,\text{latency}_{Stage2}$ (solves $2 \times \#\text{problems}$ problems). For large problems, a sequential implementation of Stage 1 is sufficient to ensure $\text{latency}_{Stage1} < \text{latency}_{Stage2}$.
- Minimise latency: $\text{latency}_{IP} = \text{latency}_{Stage1} + \text{latency}_{Stage2}$ (solves 1 problem).
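The pipelining formulas from the MINRES and design-overview slides can be evaluated directly. The Python sketch below reads "throughput = Z" as one new matrix-vector product every Z clock cycles, so the number of problems needed to fill the pipeline is the latency divided by Z, rounded up. It also takes the adder-tree term as ⌈log_2(2M − 1)⌉. Both readings are assumptions. The pipeline constants k1 and k2, the stage latencies, and the problem size (n = 10, m = 3, N = 20) are hypothetical values chosen only to exercise the formulas.

```python
import math
from typing import Tuple


def kkt_dimensions(n: int, m: int, N: int) -> Tuple[int, int]:
    """Size Z and half-band M of the reordered coefficient matrix."""
    return N * (2 * n + m) + 2 * n, 2 * n + m


def minres_pipeline(Z: int, M: int, k1: int, k2: int) -> Tuple[int, int]:
    """Latency in cycles and number of interleaved problems for the A q_i datapath."""
    latency = 2 * Z + M + k1 * math.ceil(math.log2(2 * M - 1)) + k2
    n_problems = (latency + Z - 1) // Z          # ceil(latency / Z) in integer arithmetic
    return latency, n_problems


def ip_iteration(mode: str, latency_stage1: int, latency_stage2: int,
                 n_problems: int) -> Tuple[int, int]:
    """Latency per interior-point iteration and number of problems solved."""
    if mode == "throughput":
        return 2 * latency_stage2, 2 * n_problems
    if mode == "latency":
        return latency_stage1 + latency_stage2, 1
    raise ValueError(f"unknown mode: {mode!r}")


Z, M = kkt_dimensions(n=10, m=3, N=20)
latency, n_problems = minres_pipeline(Z, M, k1=12, k2=30)   # k1, k2: hypothetical
print(Z, M, latency, n_problems)
```

Because the 2Z term dominates whenever M is small relative to Z, the count settles at 3 independent problems for any sufficiently long horizon. The throughput-oriented design then processes twice that number per interior-point iteration.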

Number of free parallel channels

[Figure: number of parallel channels versus number of states (n) and number of inputs (m)]

[1] An FPGA Implementation of a Sparse Quadratic Programming Solver for Constrained Predictive Control. In Proc. ACM/SIGDA Symposium on Field Programmable Gate Arrays, Mar.

Performance
- Hardware: Xilinx Virtex 6 SX, 250 MHz (40 nm)
- Software: Intel Core2, 2.5 GHz, 3 GB RAM, 4 MB L2 cache (45 nm)
- Test problem: 3 inputs, 3 outputs, 20 steps, state and input constraints

[Figure: time per interior-point iteration (seconds) versus number of states n, for CPU (measured), FPGA latency (2 × #problems), FPGA throughput (2 × #problems) and FPGA latency (1 problem)]

For small problems there is no performance improvement. For the largest problem, the improvement is 14x (red curve), 36x (black curve) and 85x (blue curve).

Filling the pipeline
- Parallel Multiplexed MPC [1][2]: each thread optimizes over a subset of the m inputs, assuming fixed values for the rest. Effect on the size of the problem: $m \to m / (2\,\#\text{problems})$.
- Parallel Move Blocking MPC [3]: the horizon N is split into blocks, and each independent thread solves a problem with a different splitting pattern to guarantee recursive feasibility. Effect on the size of the problem: $N \to N / (2\,\#\text{problems})$.

[1] MPC for Deeply Pipelined FPGA Implementation: Algorithms and Circuitry. In IET Control Theory and Applications.
[2] Parallel MPC for Real-time FPGA-based Implementation. In Proc. IFAC World Congress, Aug.
[3] Parallel Move Blocking Model Predictive Control. Submitted to Conference on Decision and Control, Dec.

Filling the pipeline
Other possible strategies:
- Distributed algorithms
- Sampling faster than the computational delay
- Moving horizon estimation

Questions


More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

LINEAR AND NONLINEAR PROGRAMMING

LINEAR AND NONLINEAR PROGRAMMING LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes Wen Wang 1, Jakub Szefer 1, and Ruben Niederhagen 2 1. Yale University, USA 2. Fraunhofer Institute SIT, Germany April 9, 2018 PQCrypto 2018

More information

Fast Model Predictive Control with Soft Constraints

Fast Model Predictive Control with Soft Constraints European Control Conference (ECC) July 7-9,, Zürich, Switzerland. Fast Model Predictive Control with Soft Constraints Arthur Richards Department of Aerospace Engineering, University of Bristol Queens Building,

More information

csci 210: Data Structures Program Analysis

csci 210: Data Structures Program Analysis csci 210: Data Structures Program Analysis 1 Summary Summary analysis of algorithms asymptotic analysis big-o big-omega big-theta asymptotic notation commonly used functions discrete math refresher READING:

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

A FPGA Implementation of Large Restricted Boltzmann Machines. Charles Lo. Supervisor: Paul Chow April 2010

A FPGA Implementation of Large Restricted Boltzmann Machines. Charles Lo. Supervisor: Paul Chow April 2010 A FPGA Implementation of Large Restricted Boltzmann Machines by Charles Lo Supervisor: Paul Chow April 2010 Abstract A FPGA Implementation of Large Restricted Boltzmann Machines Charles Lo Engineering

More information

Efficient Polynomial Evaluation Algorithm and Implementation on FPGA

Efficient Polynomial Evaluation Algorithm and Implementation on FPGA Efficient Polynomial Evaluation Algorithm and Implementation on FPGA by Simin Xu School of Computer Engineering A thesis submitted to Nanyang Technological University in partial fullfillment of the requirements

More information

Fast model predictive control based on linear input/output models and bounded-variable least squares

Fast model predictive control based on linear input/output models and bounded-variable least squares 7 IEEE 56th Annual Conference on Decision and Control (CDC) December -5, 7, Melbourne, Australia Fast model predictive control based on linear input/output models and bounded-variable least squares Nilay

More information

Construction of a reconfigurable dynamic logic cell

Construction of a reconfigurable dynamic logic cell PRAMANA c Indian Academy of Sciences Vol. 64, No. 3 journal of March 2005 physics pp. 433 441 Construction of a reconfigurable dynamic logic cell K MURALI 1, SUDESHNA SINHA 2 and WILLIAM L DITTO 3 1 Department

More information

Pipelining and Parallel Processing

Pipelining and Parallel Processing Pipelining and Parallel Processing Pipelining ---reduction in the critical path increase the clock speed, or reduce power consumption at same speed Parallel Processing ---multiple outputs are computed

More information

Janus: FPGA Based System for Scientific Computing Filippo Mantovani

Janus: FPGA Based System for Scientific Computing Filippo Mantovani Janus: FPGA Based System for Scientific Computing Filippo Mantovani Physics Department Università degli Studi di Ferrara Ferrara, 28/09/2009 Overview: 1. The physical problem: - Ising model and Spin Glass

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

Controlling the level of sparsity in MPC

Controlling the level of sparsity in MPC Controlling the level of sparsity in MPC Daniel Axehill Linköping University Post Print N.B.: When citing this work, cite the original article. Original Publication: Daniel Axehill. Controlling the level

More information

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Ilya B. Labutin A.A. Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, 3, acad. Koptyug Ave., Novosibirsk

More information

IHS 3: Test of Digital Systems R.Ubar, A. Jutman, H-D. Wuttke

IHS 3: Test of Digital Systems R.Ubar, A. Jutman, H-D. Wuttke IHS 3: Test of Digital Systems R.Ubar, A. Jutman, H-D. Wuttke Integrierte Hard- und Softwaresysteme RT-Level Design data path and control path on RT-level RT level simulation Functional units (F1,..,F4)

More information

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m ) A Digit-Serial Systolic Multiplier for Finite Fields GF( m ) Chang Hoon Kim, Sang Duk Han, and Chun Pyo Hong Department of Computer and Information Engineering Taegu University 5 Naeri, Jinryang, Kyungsan,

More information

ECC for NAND Flash. Osso Vahabzadeh. TexasLDPC Inc. Flash Memory Summit 2017 Santa Clara, CA 1

ECC for NAND Flash. Osso Vahabzadeh. TexasLDPC Inc. Flash Memory Summit 2017 Santa Clara, CA 1 ECC for NAND Flash Osso Vahabzadeh TexasLDPC Inc. 1 Overview Why Is Error Correction Needed in Flash Memories? Error Correction Codes Fundamentals Low-Density Parity-Check (LDPC) Codes LDPC Encoding and

More information

EECS 579: Logic and Fault Simulation. Simulation

EECS 579: Logic and Fault Simulation. Simulation EECS 579: Logic and Fault Simulation Simulation: Use of computer software models to verify correctness Fault Simulation: Use of simulation for fault analysis and ATPG Circuit description Input data for

More information

Hilbert Transformator IP Cores

Hilbert Transformator IP Cores Introduction Hilbert Transformator IP Cores Martin Kumm December 27, 28 The Hilbert Transform is an important component in communication systems, e.g. for single sideband modulation/demodulation, amplitude

More information

Classification of Hand-Written Digits Using Scattering Convolutional Network

Classification of Hand-Written Digits Using Scattering Convolutional Network Mid-year Progress Report Classification of Hand-Written Digits Using Scattering Convolutional Network Dongmian Zou Advisor: Professor Radu Balan Co-Advisor: Dr. Maneesh Singh (SRI) Background Overview

More information

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance

More information

WITH rapid growth of traditional FPGA industry, heterogeneous

WITH rapid growth of traditional FPGA industry, heterogeneous INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2012, VOL. 58, NO. 1, PP. 15 20 Manuscript received December 31, 2011; revised March 2012. DOI: 10.2478/v10177-012-0002-x Input Variable Partitioning

More information

L15: Custom and ASIC VLSI Integration

L15: Custom and ASIC VLSI Integration L15: Custom and ASIC VLSI Integration Average Cost of one transistor 10 1 0.1 0.01 0.001 0.0001 0.00001 $ 0.000001 Gordon Moore, Keynote Presentation at ISSCC 2003 0.0000001 '68 '70 '72 '74 '76 '78 '80

More information

Design Exploration of an FPGA-Based Multivariate Gaussian Random Number Generator

Design Exploration of an FPGA-Based Multivariate Gaussian Random Number Generator Design Exploration of an FPGA-Based Multivariate Gaussian Random Number Generator Chalermpol Saiprasert A thesis submitted for the degree of Doctor of Philosophy in Electrical and Electronic Engineering

More information

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations!

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations! Parallel Numerics Scope: Revise standard numerical methods considering parallel computations! Required knowledge: Numerics Parallel Programming Graphs Literature: Dongarra, Du, Sorensen, van der Vorst:

More information

csci 210: Data Structures Program Analysis

csci 210: Data Structures Program Analysis csci 210: Data Structures Program Analysis Summary Topics commonly used functions analysis of algorithms experimental asymptotic notation asymptotic analysis big-o big-omega big-theta READING: GT textbook

More information

Runtime Model Predictive Verification on Embedded Platforms 1

Runtime Model Predictive Verification on Embedded Platforms 1 Runtime Model Predictive Verification on Embedded Platforms 1 Pei Zhang, Jianwen Li, Joseph Zambreno, Phillip H. Jones, Kristin Yvonne Rozier Presenter: Pei Zhang Iowa State University peizhang@iastate.edu

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information