A Compiler for Linear Algebra Operations

Size: px

Start display at page:

Download "A Compiler for Linear Algebra Operations"

Todd Park
6 years ago
Views:

1 A Compiler for Linear Algebra Operations Paolo Bientinesi In collaboration with Diego Fabregat AICES, RWTH Aachen CScADS Autotuning Workshop 2012 August 13-14, 2012 Snowbird, Utah Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

2 1 Objective 2 Automation: CLAK 3 Algorithms and results 4 Conclusions & future work Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

3 int main( ) { double vec[] = {.0, 0.1, 1.2, 3.3, -4.0, -55,.66, 7}; int i; for( i=0; i<6; i++ ) printf(" array[%d]=%g\n", i, vec[i] ); } return( 0 ); Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

4 int main( ) { double vec[] = {.0, 0.1, 1.2, 3.3, -4.0, -55,.66, 7}; } int i; for( i=0; i<6; i++ ) printf(" array[%d]=%g\n", i, vec[i] ); return( 0 ); [...] L3: movl -4(%rbp), %eax cltq movsd -96(%rbp,%rax,8), %xmm0 movl -4(%rbp), %esi leaq LC10(%rip), %rdi movl $1, %eax call _printf incl -4(%rbp) L2: cmpl $9, -4(%rbp) jle L3 movl $0, %eax leave ret Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

5 M := L 1 AL T Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

6 M := L 1 AL T >> M = inv(l) * A * inv(l ); Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

7 M := L 1 AL T >> M = inv(l) * A * inv(l ); >> M == M ans = Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

8 M := L 1 AL T >> M = L \ A / L ; Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

9 M := L 1 AL T >> M = L \ A / L ; >> M == M ans = Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

10 M := L 1 AL T >> M = L \ A / L ; >> M = tril(m,-1) + tril(m,-1) + diag(real(diag(m))); Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

11 M := L 1 AL T >> M = L \ A / L ; >> M = tril(m,-1) + tril(m,-1) + diag(real(diag(m))); >> M == M ans = Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

12 Domain-specific Compiler Input: M := L^-1 * A * L^-T; Input(A, L); Output(M); Hermitian(A); LowerTriangular(L); Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

13 Domain-specific Compiler Input: M := L^-1 * A * L^-T; Input(A, L); Output(M); Hermitian(A); LowerTriangular(L); Objectives Express M in terms of available building blocks (BLAS, LAPACK,... ) Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

14 Domain-specific Compiler Input: M := L^-1 * A * L^-T; Input(A, L); Output(M); Hermitian(A); LowerTriangular(L); Objectives Express M in terms of available building blocks (BLAS, LAPACK,... ) High-performance Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

15 Domain-specific Compiler Input: M := L^-1 * A * L^-T; Input(A, L); Output(M); Hermitian(A); LowerTriangular(L); Objectives Express M in terms of available building blocks (BLAS, LAPACK,... ) High-performance Exploit problem-specific knowledge Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

16 Domain-specific Compiler Input: M := L^-1 * A * L^-T; Input(A, L); Output(M); Hermitian(A); LowerTriangular(L); Objectives Express M in terms of available building blocks (BLAS, LAPACK,... ) High-performance Exploit problem-specific knowledge One vs. many algorithms Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

17 What this is talk is NOT about {LU = A, A, L, U R n n, LowerTriUni(L), UpperTri(U)} ( ) AT L A T R Partition A A BL A BR where A T L is 0 0 While size(a T L ) < size(a) do Repartition ( ) A AT L A 00 A 01 A 02 T R A A BL A 10 A 11 A 12 BR A 20 A 21 A 22 A 01 := L 1 00 A01 A 10 := A 10U 1 00 A 11 := LU(A 11 A 10A 01) Continue ( ) A AT L A 00 A 01 A 02 T R A 10 A 11 A 12 A BL A BR A 20 A 21 A 22 endwhile Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

18 What this is talk is NOT about ( AT L A T R Partition A A BL A BR where A T L is 0 0 While size(a T L ) < size(a) do Repartition ( ) A {LU = A, AT L A 00 A 01 A 02 T R A A, L, U R n n A BL A 10 A 11 A 12 BR, A 20 A 21 A 22 LowerTriUni(L), A 01 := L 1 00 A01 UpperTri(U)} A 10 := A 10U 1 00 A 11 := LU(A 11 A 10A 01) Continue ( ) A AT L A 00 A 01 A 02 T R A 10 A 11 A 12 A BL A BR A 20 A 21 A 22 endwhile ) Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

19 Target operations b := ( X T M 1 X ) 1 X T M 1 y Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

20 Target operations b := ( X T M 1 X ) 1 X T M 1 y d (Ax = b) dv =? Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

21 Target operations b := ( X T M 1 X ) 1 X T M 1 y d (Ax = b) dv =? { b := ( X T M 1 X ) 1 X T M 1 y M := hφ + (1 h)i Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

22 Target operations b := ( X T M 1 X ) 1 X T M 1 y d (Ax = b) dv =? { b := ( X T M 1 X ) 1 X T M 1 y M := hφ + (1 h)i Ax i = y i with i = 1... n Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

23 Example Linear regression with non-independent outcomes b := ( X T M 1 X ) 1 X T M 1 y Genome-wide association study Inputs M R n n, SP D(M), n [10 3,..., 10 4 ] X R n p, p [1,..., 20] y R n Output b R p To be repeated thousands of times Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

24 Algorithm: full breakdown b := ( X T M 1 X ) 1 X T M 1 y Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

25 Algorithm: full breakdown b := ( X T M 1 X ) 1 X T M 1 y 1 LL T = M Chol Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

26 Algorithm: full breakdown b := ( X T (LL T ) 1 X ) 1 X T (LL T ) 1 y 1 LL T = M Chol Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

27 Algorithm: full breakdown b := ( X T L T L 1 X ) 1 X T L T L 1 y 1 LL T = M Chol Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

28 Algorithm: full breakdown b := ( X T L T L 1 X ) 1 X T L T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

29 Algorithm: full breakdown b := ( X T X ) 1 X T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

30 Algorithm: full breakdown b := ( X T X ) 1 X T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 y L 1 y TRSV Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

31 Algorithm: full breakdown b := ( X T X ) 1 X T y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 y L 1 y TRSV Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

32 Algorithm: full breakdown b := ( X T X ) 1 X T y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 y L 1 y TRSV 4 b (X T X) 1 X T y DGELS Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

33 Algorithm: full breakdown b := ( X T X ) 1 X T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

34 Algorithm: full breakdown b := ( X T X ) 1 X T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

35 Algorithm: full breakdown b := ( R T Q T Q R ) 1 R T Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

36 Algorithm: full breakdown b := ( R T ( Q T Q ) R ) 1 R T Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

37 Algorithm: full breakdown b := ( R T R ) 1 R T Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

38 Algorithm: full breakdown b := R 1 R T R T Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

39 Algorithm: full breakdown b := R 1 ( R T R T ) Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

40 Algorithm: full breakdown b := R 1 Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

41 Algorithm: full breakdown b := R 1 Q T L 1 y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact 4 y L 1 y TRSV Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

42 Algorithm: full breakdown b := R 1 Q T y 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact 4 y L 1 y TRSV 5 b Q T y GEMV Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

43 Algorithm: full breakdown b := R 1 b 1 LL T = M Chol Fact 2 X T X T L T TRSM 3 QR = X QR Fact 4 y L 1 y TRSV 5 b Q T y GEMV 6 b R 1 b TRSV Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

44 1 Objective 2 Automation: CLAK 3 Algorithms and results 4 Conclusions & future work Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

45 CLAK Domain-specific compiler Symbolic system (Mathematica) Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

46 CLAK Domain-specific compiler Symbolic system (Mathematica) Pattern matching & rewrite rules Constructive: no exhaustive search Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

47 CLAK Domain-specific compiler Symbolic system (Mathematica) Pattern matching & rewrite rules Constructive: no exhaustive search Knowledge: 1) encoded + 2) derived Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

48 CLAK Domain-specific compiler Symbolic system (Mathematica) Pattern matching & rewrite rules Constructive: no exhaustive search Knowledge: 1) encoded + 2) derived Matrix algebra Sequences dependencies Inference of properties Cost analysis Building blocks Code generation Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

49 1) Knowledge encoded Operand types: scalars, vectors, matrices Size: (1 m) (1 n) Properties: full rank, orthogonal, symmetric, triangular, SPD, diagonal, identity,... Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

50 1) Knowledge encoded Operand types: scalars, vectors, matrices Size: (1 m) (1 n) Properties: full rank, orthogonal, symmetric, triangular, SPD, diagonal, identity,... Linear Algebra operators: +, -,, T, 1 Linear Algebra properties: precedence, associativity, commutativity, distributivity of transposition and inversion... Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

51 1) Knowledge encoded Operand types: scalars, vectors, matrices Size: (1 m) (1 n) Properties: full rank, orthogonal, symmetric, triangular, SPD, diagonal, identity,... Linear Algebra operators: +, -,, T, 1 Linear Algebra properties: precedence, associativity, commutativity, distributivity of transposition and inversion... Interface to building blocks Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

52 1) Knowledge encoded Operand types: scalars, vectors, matrices Size: (1 m) (1 n) Properties: full rank, orthogonal, symmetric, triangular, SPD, diagonal, identity,... Linear Algebra operators: +, -,, T, 1 Linear Algebra properties: precedence, associativity, commutativity, distributivity of transposition and inversion... Interface to building blocks Guidelines, priorities Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

53 2) Knowledge inferred Operand type and size X R n p, M R n n X T M 1 X R p p Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

54 2) Knowledge inferred Operand type and size X R n p, M R n n X T M 1 X R p p Properties L is lower tri., D is diagonal, U is upper tri. LD 1 U T is lower triangular X is full rank, SP D(M) SP D(X T M 1 X) Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

55 2) Knowledge inferred Operand type and size X R n p, M R n n X T M 1 X R p p Properties L is lower tri., D is diagonal, U is upper tri. LD 1 U T is lower triangular X is full rank, SP D(M) SP D(X T M 1 X) Arithmetic (R T Q T QR) 1 R T Q T R 1 Q T Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

56 Sequences Naive approach: for i, for j,... b ij = ( X T i M 1 j X i ) 1 X T i M 1 j y j for i = 1 : m for j = 1 : t LL T = M j X T Xi T L T QR = X y L 1 y j b Q T y b ij R 1 b Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

57 Sequences Tracking the dependencies LL T = M j L j L T j = M j X T Xi T L T j Xij T Xi T L T j for i = 1 : m for j = 1 : t L j L T j = M j X T ij XT i L T j Q ij R ij = X ij y j L 1 j y j b ij Q T ij y j b ij R 1 ij b ij Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

58 Sequences Loop Transposition LL T = M j L j L T j = M j X T Xi T L T j Xij T Xi T L T j for i = 1 : m for j = 1 : t for j = 1 : t for i = 1 : m L j L T j = M j L j L T j = M j X T ij XT i L T j Q ij R ij = X ij y j L 1 j y j b ij Q T ij y j b ij R 1 ij b ij X T ij XT i L T j Q ij R ij = X ij y j L 1 j y j b ij Q T ij y j b ij R 1 ij b ij Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

59 Sequences Reordering LL T = M j L j L T j = M j X T Xi T L T j Xij T Xi T L T j for i = 1 : m for j = 1 : t for j = 1 : t for i = 1 : m L j L T j = M j L j L T j = M j X T ij XT i L T j Q ij R ij = X ij y j L 1 j y j b ij Q T ij y j b ij R 1 ij b ij X T ij XT i L T j Q ij R ij = X ij y j L 1 j y j b ij Q T ij y j b ij R 1 ij b ij Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

60 Sequences Reordering LL T = M j L j L T j = M j X T Xi T L T j Xij T Xi T L T j for i = 1 : m for j = 1 : t L j L T j = M j X T ij XT i L T j Q ij R ij = X ij y j L 1 j y j b ij Q T ij y j b ij R 1 ij b ij for j = 1 : t L j L T j = M j y j L 1 j y j for i = 1 : m X T ij XT i L T j Q ij R ij = X ij b ij Q T ij y j b ij R 1 ij b ij Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

61 1 Objective 2 Automation: CLAK 3 Algorithms and results 4 Conclusions & future work Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

62 GWAS b ij := (X T i M 1 j X i ) 1 X T i M 1 j y j Algorithm 1 Algorithm 2... Algorithm LL T = M X := L 1 X S := X T X GG T = S y := L 1 y b := X T y b := G 1 b b := G T b LL T = M X := L 1 X QR := X y := L 1 y b := Q T y b := R 1 b ZW Z T = Φ D :=(hw +(1 h)i) 1 KK T = D X := Z T X X := K T X QR := X y := L 1 y b := Q T y b := R 1 b Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

63 Cost analysis Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

64 Cost analysis Flop count Scenario Alg. 1 Alg. 2 Alg. 3 Single instance O(n 3 ) O(n 3 ) O(n 3 ) 2D sequence O(tn 3 + mtn 2 ) O(tn 3 + mtn 2 ) O(n 3 + mtn) Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

65 Cost analysis Sample-based prediction Sample the kernels not the algorithm! Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

66 Cost analysis Sample-based prediction Sample the kernels not the algorithm! Build polynomial models Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

67 Cost analysis Sample-based prediction Sample the kernels not the algorithm! Build polynomial models Create a database Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

68 Cost analysis Sample-based prediction Sample the kernels not the algorithm! Build polynomial models Create a database Algorithm execution querying Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

69 Performance b := (X T M 1 X) 1 X T M 1 y n = 5000 p = 4 m = 3000 time [s] Cholesky QR EV decomp. Measurements Prediction t Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

70 1 Objective 2 Automation: CLAK 3 Algorithms and results 4 Conclusions & future work Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

71 Conclusions & future work CLAK Linear algebra compiler: algorithm & code generator Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

72 Conclusions & future work CLAK Linear algebra compiler: algorithm & code generator Domain specific knowledge: 1) encoded + 2) inferred Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

73 Conclusions & future work CLAK Linear algebra compiler: algorithm & code generator Domain specific knowledge: 1) encoded + 2) inferred Performance: n-fold speedups Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

74 Conclusions & future work CLAK Linear algebra compiler: algorithm & code generator Domain specific knowledge: 1) encoded + 2) inferred Performance: n-fold speedups TODO list... we ve only started! Paolo Bientinesi (AICES, RWTH Aachen) A Compiler for Linear Algebra Operations August 13-14, / 20

Knowledge-Based Automatic Generation of Algorithms and Code

Knowledge-Based Automatic Generation of Algorithms and Code Diego Fabregat Traver AICES, RWTH Aachen fabregat@aices.rwth-aachen.de Doctoral Defense Aachen, December 6th, 2013 Diego Fabregat (AICES, RWTH