A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series


1 A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series
V. S. Dimitrov (1, 2), V. Ariyarathna (3), D. F. G. Coelho (1), L. Rakai (1), A. Madanayake (3), R. J. Cintra (4)
(1) ECE Department, University of Calgary, Canada
(2) Computer Modelling Group, Canada
(3) ECE Department, University of Akron, USA
(4) Statistics Department, Universidade Federal de Pernambuco, Brazil
Coelho and Dimitrov (UofC), July 20, 2017

2 Introduction
Problems in many areas require the solution of systems of linear, constant-coefficient differential equations of the form
$$\dot{x}(t) = A x(t) \implies x(t) = \exp(tA)\, x_0.$$
When multiple inputs are used for the same system, it can be advantageous to compute the matrix exponential.
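As a minimal sketch of this idea, the snippet below approximates $\exp(tA)$ once with a plain truncated Taylor series and then reuses it for several initial conditions $x_0$. The helper names (`mat_mul`, `expm_taylor`, `mat_vec`), the test matrix, and the choice of 20 terms are illustrative assumptions and do not come from the presentation.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, t, terms=20):
    """Approximate exp(tA) by the first `terms` terms of its Taylor series."""
    n = len(A)
    tA = [[t * a for a in row] for row in A]
    # term holds (tA)^k / k!, starting with k = 0 (the identity).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, tA)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def mat_vec(M, x):
    """Matrix-vector product."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# Assumed example system: A generates a plane rotation, so exp(tA) is
# [[cos t, sin t], [-sin t, cos t]].
A = [[0.0, 1.0], [-1.0, 0.0]]
E = expm_taylor(A, t=0.5)

# The exponential is computed once and applied to every initial condition.
states = [mat_vec(E, x0) for x0 in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```

Computing `E` once costs a fixed number of matrix multiplications; each additional initial condition then costs only one matrix-vector product.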

3 Methods for Matrix Exponential
Series expansion: Taylor; Padé; scaling and squaring.
Newton interpolation; Cayley-Hamilton method; eigenvector decomposition.

4 Matrix Exponential Series Expansion
The problem can be treated as the evaluation of a polynomial.
Existing methods: Horner rule; Estrin method; binary tree.

5 Conventions
Let $A$ be a square matrix of size $n \times n$.
Let $p_N(\cdot)$ be a polynomial of degree $N-1$ over the real numbers.
Let also $g_N(\cdot)$ be a geometric series with $N$ terms.

6 Definition
Definition (Critical Path for Matrix Polynomial): The critical path associated with the computation of a matrix polynomial $p_N(A)$ is the longest chain of dependent matrix multiplications (MM) required to evaluate $p_N(A)$.
Horner rule: $N-1$ MM;
Estrin method: $2\log_2(N-1)$ MM;
Binary tree: $2\log_2(N-1)$ MM.
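The Horner count can be illustrated with a short sketch. It evaluates the truncated Taylor polynomial $p_N(A) = \sum_{k=0}^{N-1} A^k/k!$ and counts the matrix multiplications, each of which depends on the previous one. The function names are hypothetical, and the count follows the convention above (the first product, by a scaled identity, is counted even though it is trivial).

```python
from math import factorial

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def horner_taylor(A, N):
    """Evaluate p_N(A) = sum_{k<N} A^k / k! by Horner's rule.

    Returns (result, count), where count is the number of sequential
    matrix multiplications, i.e. the length of the critical path.
    """
    n = len(A)
    c = [1.0 / factorial(k) for k in range(N)]
    # Start from the leading coefficient times the identity.
    R = [[c[N - 1] if i == j else 0.0 for j in range(n)] for i in range(n)]
    count = 0
    for k in range(N - 2, -1, -1):
        R = mat_mul(R, A)          # depends on the previous R: fully serial
        count += 1
        for i in range(n):
            R[i][i] += c[k]        # add c_k * I
    return R, count

A = [[0.1, 0.2], [0.3, -0.1]]
P, mm_count = horner_taylor(A, 9)
```

For `N = 9` the loop performs 8 dependent multiplications, which is the $N-1$ chain that the parallel method aims to shorten.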

7 Geometric Series
Geometric series of matrix arguments can be computed efficiently with the use of different polynomial factorizations.
$$g_N(A) = \begin{cases}
(I + A)\, g_{N/2}(A^2), & \text{if } N \equiv 0 \pmod 2, \\
I + (A + A^2)\, g_{(N-1)/2}(A^2), & \text{if } N \equiv 1 \pmod 2.
\end{cases}$$
$$g_N(A) = \begin{cases}
(I + A + A^2)\, g_{N/3}(A^3), & \text{if } N \equiv 0 \pmod 3, \\
I + (A + A^2 + A^3)\, g_{(N-1)/3}(A^3), & \text{if } N \equiv 1 \pmod 3, \\
I + A + (A^2 + A^3 + A^4)\, g_{(N-2)/3}(A^3), & \text{if } N \equiv 2 \pmod 3.
\end{cases}$$
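The base-2 factorization can be checked numerically with a small recursive sketch. The function names are hypothetical; `geom_naive` is only a reference implementation that sums the powers directly, and $g_1(A) = I$ is used as the base case.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def geom_naive(A, N):
    """Reference: g_N(A) = I + A + ... + A^(N-1), summed term by term."""
    total = identity(len(A))
    power = identity(len(A))
    for _ in range(1, N):
        power = mat_mul(power, A)
        total = mat_add(total, power)
    return total

def geom_base2(A, N):
    """g_N(A) via the base-2 factorization (requires N >= 1)."""
    I = identity(len(A))
    if N == 1:
        return I
    A2 = mat_mul(A, A)
    if N % 2 == 0:
        # (I + A) g_{N/2}(A^2)
        return mat_mul(mat_add(I, A), geom_base2(A2, N // 2))
    # I + (A + A^2) g_{(N-1)/2}(A^2)
    return mat_add(I, mat_mul(mat_add(A, A2), geom_base2(A2, (N - 1) // 2)))

A = [[0.2, -0.1], [0.05, 0.3]]
G8 = geom_base2(A, 8)   # (I + A)(I + A^2)(I + A^4)
```

The recursion halves `N` at every level, so the number of levels grows with $\log_2 N$ rather than with $N$.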

8 Geometric Series
In general, the use of basis $P$ demands […] $\log_2(N) - 2 < 2\log_2(N) - 2$ matrix multiplications.
Examples:
Basis 2: $2\log_2(N) - 2$;
Basis 3: […] $\log_2(N) - 2$;
Basis 5: […] $\log_2(N) - 2$;
Basis 6: […] $\log_2(N) - 2$;
Basis 26: […] $\log_2(N) - 2$.

9 The Matrix Exponential as Several Neumann Series
We write the truncated series expansion of the matrix exponential, $p_N(A)$, as a linear combination of different geometric series on $\alpha_k A$, $k = 0, 1, \ldots, N-1$:
$$p_N(A) = \sum_{n=0}^{N-1} g_{n+1}(\alpha_n A)
= \sum_{n=0}^{N-1} \left( \sum_{k=0}^{n} \alpha_n^k A^k \right)
= \sum_{n=0}^{N-1} \left( \sum_{k=n}^{N-1} \alpha_k^n \right) A^n.$$

10 The Matrix Exponential as Several Neumann Series
If the coefficients of $p_N(\cdot)$ are $p_0, p_1, \ldots, p_{N-1}$, we have the system
$$\begin{aligned}
\alpha_0 + \alpha_1 + \alpha_2 + \cdots + \alpha_{N-1} &= p_0 \\
\alpha_1^2 + \alpha_2^2 + \cdots + \alpha_{N-1}^2 &= p_1 \\
\alpha_2^3 + \cdots + \alpha_{N-1}^3 &= p_2 \\
&\;\;\vdots \\
\alpha_{N-1}^N &= p_{N-1}.
\end{aligned}$$
This system has several complex solutions that can be found by back substitution.

11 A Numerical Example
Small-degree polynomials do not require complex solutions. Considering $N = 4$, we have
$\alpha_0 =$ […], $\alpha_1 =$ […], $\alpha_2 =$ […], $\alpha_3 =$ […].

12 Another Numerical Example
Table: Calculated coefficients for N = 9.
Coefficient | Value
α_3 | […]
α_4 | […]
α_5 | […]
α_6 | […]
α_7 | […]
α_8 | […]
β_0 | […]
β_1 | […]

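13 A Different Approach
If we modify the formulation to
$$p_N(A) = \sum_{n=0}^{N-1} g_N(\alpha_n A) = \sum_{n=0}^{N-1} \left( \sum_{k=0}^{N-1} \alpha_k^n \right) A^n,$$
we obtain
$$\begin{aligned}
\alpha_0 + \alpha_1 + \alpha_2 + \cdots + \alpha_{N-1} &= 1 \\
\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \cdots + \alpha_{N-1}^2 &= \frac{1}{2} \\
&\;\;\vdots \\
\alpha_0^{N-1} + \alpha_1^{N-1} + \alpha_2^{N-1} + \cdots + \alpha_{N-1}^{N-1} &= \frac{1}{(N-1)!}.
\end{aligned}$$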

14 Algorithmic Example for N = 9
Pre-computation: $B = A^2$, $C = B^2$; broadcast $A$, $B$, $C$, $\beta_0$, and $\beta_1$.
Processor 0 computes $H_9(A) \leftarrow \beta_0 A + \beta_1 B - (N-4) I$
Processor 1 computes $g_4(\alpha_3 A) \leftarrow (I + \alpha_3 A)(I + \alpha_3^2 B)$
Processor 2 computes $g_5(\alpha_4 A) \leftarrow I + (\alpha_4 A + \alpha_4^2 B)(I + \alpha_4^2 B)$
Processor 3 computes $g_6(\alpha_5 A) \leftarrow (I + \alpha_5 A)(I + \alpha_5^2 B + \alpha_5^4 C)$
Processor 4 computes $g_7(\alpha_6 A) \leftarrow I + (\alpha_6 A + \alpha_6^2 B)(I + \alpha_6^2 B + \alpha_6^4 C)$
Processor 5 computes $g_8(\alpha_7 A) \leftarrow (I + \alpha_7 A)(I + \alpha_7^2 B)(I + \alpha_7^4 C)$
Processor 6 computes $g_9(\alpha_8 A) \leftarrow I + (\alpha_8 A + \alpha_8^2 B)(I + \alpha_8^2 B)(I + \alpha_8^4 C)$
Return $E_9(A) = \sum_{n=3}^{8} g_{n+1}(\alpha_n A) + H_9(A)$
Figure: Fragment of the algorithm for computing $E_9(A)$.
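The fragment above can be sketched serially to check that the seven partial results add up to the degree-8 Taylor polynomial. This is not the authors' implementation: the coefficients here are an assumption, obtained by matching the coefficient of $A^n$ in the sum of the processors' products to $1/n!$ (back substitution from $n = 8$ down to $n = 3$, taking the principal root at each step), with $\beta_0$ and $\beta_1$ correcting the $A$ and $A^2$ terms. The names `lin`, `solve_alphas`, and `e9` are hypothetical, and each "processor" is simply evaluated in turn.

```python
from math import factorial

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(n, *pairs):
    """Linear combination sum(c * M) over (c, M) pairs of n x n matrices."""
    return [[sum(c * M[i][j] for c, M in pairs) for j in range(n)]
            for i in range(n)]

def solve_alphas(N=9, first=3):
    """Assumed system: sum_{k>=n} alpha_k**n = 1/n! for n = N-1, ..., first."""
    alpha = {}
    for n in range(N - 1, first - 1, -1):
        rhs = 1.0 / factorial(n) - sum(alpha[k] ** n for k in range(n + 1, N))
        alpha[n] = complex(rhs) ** (1.0 / n)   # principal n-th root (a choice)
    return alpha

def e9(A):
    n = len(A)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    B = mat_mul(A, A)            # pre-computation: B = A^2
    C = mat_mul(B, B)            # pre-computation: C = B^2 = A^4
    a = solve_alphas()
    beta0 = 1.0 - sum(a.values())                  # fixes the A coefficient
    beta1 = 0.5 - sum(v ** 2 for v in a.values())  # fixes the A^2 coefficient
    f1 = lambda x: lin(n, (1, I), (x, A))                         # I + xA
    f2 = lambda x: lin(n, (1, I), (x ** 2, B))                    # I + x^2 B
    f4 = lambda x: lin(n, (1, I), (x ** 4, C))                    # I + x^4 C
    f24 = lambda x: lin(n, (1, I), (x ** 2, B), (x ** 4, C))      # I + x^2 B + x^4 C
    s = lambda x: lin(n, (x, A), (x ** 2, B))                     # xA + x^2 B
    H = lin(n, (beta0, A), (beta1, B), (-(9 - 4), I))             # processor 0
    g4 = mat_mul(f1(a[3]), f2(a[3]))                              # processor 1
    g5 = lin(n, (1, I), (1, mat_mul(s(a[4]), f2(a[4]))))          # processor 2
    g6 = mat_mul(f1(a[5]), f24(a[5]))                             # processor 3
    g7 = lin(n, (1, I), (1, mat_mul(s(a[6]), f24(a[6]))))         # processor 4
    g8 = mat_mul(mat_mul(f1(a[7]), f2(a[7])), f4(a[7]))           # processor 5
    g9 = lin(n, (1, I),
             (1, mat_mul(mat_mul(s(a[8]), f2(a[8])), f4(a[8]))))  # processor 6
    return lin(n, *[(1, M) for M in (H, g4, g5, g6, g7, g8, g9)])

A = [[0.1, 0.2], [0.3, -0.1]]
E = e9(A)   # entries are Python complex numbers
```

After the two pre-computation products, no processor chains more than two further matrix multiplications, compared with the eight dependent products of Horner's rule for the same polynomial.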

15 Computing Time Trade-Off in Software
Figure: Illustration of the accuracy versus computing time trade-off for different values of N and m. (Plot of error and time (s) versus N for several values of m, with the expm time shown for reference.)

16 Hardware Realization
Figure: Top level view of the implementation of the proposed algorithm. (Block diagram with S/P and P/S converters, blocks H9(A) and G4(α3A) to G9(α8A), and an addition block that outputs E9(A).)

17 Hardware Realization
Figure: Multiplication block for 2 × 2 matrices.

18 Hardware Realization Results: FPGA
Table: Timing and resource consumption comparison for the Xilinx xc6vlx240t-ff1156 FPGA
Figure of merit | Horner's Rule | New Algorithm
Latency (clock cycles) | 16 | 6
Critical path delay (ns) | […] | […]
Slice LUTs used | […] | […]
No. of adders | […] | […]
No. of multipliers | […] | […]

19 Hardware Realization Results: ASIC
Table: ASIC synthesis results
Figure of merit | Horner's Rule | New Algorithm | Percentage Change
T (ns) | […] | […] | […] %
Occupied area (A, mm^2) | […] | […] | […] %
Dynamic power (mW/GHz) | […] | […] | […] %
AT (mm^2 ns) | […] | […] | […] %
AT^2 (mm^2 ns^2) | […] | […] | […] %
Max frequency (GHz) | […] | […] | […] %
Latency (clock cycles) | […] | […] | […] %
Total gate count | […] | […] | […] %

20 Final Comments
Advantages: the proposed method reduces the critical path.
Disadvantages: requires more processors and memory (software); requires more hardware resources such as LUTs and gates (hardware).
Future work: consider different combinations of Neumann series for different solutions (is a real solution possible?); consider more matrix functions and general polynomials; provide an accurate error analysis.

21 Questions?
