Communication Lower Bounds for Programs that Access Arrays


Nicholas Knight, Michael Christ, James Demmel, Thomas Scanlon, Katherine Yelick
UC Berkeley Scientific Computing and Matrix Computations Seminar, Apr. 24, 2013

We acknowledge funding from Microsoft (award #024263) and Intel (award #024894), and matching funding by UC Discovery (award #DIG ), with additional support from ParLab affiliates National Instruments, Nokia, NVIDIA, Oracle, and Samsung, and support from MathWorks. We also acknowledge the support of the US DOE (grants DE-SC , DE-SC , DE-SC , DE-SC , DE-AC02-05CH11231, DE-FC02-06ER25753, and DE-FC02-07ER25799), DARPA (award #HR ), and NSF (grant DMS ).

Communication is expensive!

Communication means moving data:
Serial communication = moving data across the memory hierarchy.
Parallel communication = moving data across the network.

Communication usually dominates runtime and energy cost. Avoid communication to save time and energy!

How much can you avoid? → Lower bound on data movement.
Attain the lower bound → Communication-optimal algorithm.

Outline

1. Avoiding Communication in Linear Algebra
   Lower bounds for matrix multiplication, attainable by blocking.
   Lower bounds for linear algebra (... attainable? see [BDHS11]).

2. Beyond Linear Algebra: Affine Array References, e.g., A(i + 2j, 3k + 4)
   Extends previous lower bounds to a larger class of programs.
   Lower bounds are computable.
   Matching upper bounds (i.e., optimal algorithms) in a special case: when array references pick a subset of the loop indices (e.g., A(i, k), B(k, j)).
   Ongoing work addresses attainability in the general case.

First Lower Bound: Matrix Multiplication ("Matmul")

A, B, C are N-by-N matrices; C := C + A·B.

for i = 1 : N, for j = 1 : N, for k = 1 : N,
  C(i, j) += A(i, k) * B(k, j)

Theorem ([HK81])
Consider computing C + A·B as above (in serial), with any order on the N^3 iterations. A processor must move

  # Words Moved = Ω(#iterations / (fast memory size)^{1/2}) = Ω(N^3 / M^{1/2})

words between slow memory (of unbounded capacity) and fast memory (of size M words).

First Lower Bound: Geometric Intuition [ITT04] (1/2)

for i = 1 : N, for j = 1 : N, for k = 1 : N,
  C(i, j) += A(i, k) * B(k, j)

Idea: Bound the volume of a set S of iterations by the areas of the shadows S casts.

[Figure: a set S = {(i, j, k)} inside the N×N×N iteration cube, with its shadows S_A, S_B, S_C on the three coordinate planes; for a box with side lengths x, y, z, the shadows have areas xz, zy, and yx.]

For a box: |S| = x·y·z = (xz · zy · yx)^{1/2} = |S_A|^{1/2} · |S_B|^{1/2} · |S_C|^{1/2}.
In general: |S| ≤ |S_A|^{1/2} · |S_B|^{1/2} · |S_C|^{1/2}, by the Loomis–Whitney inequality [LW49].

First Lower Bound: Geometric Intuition [ITT04] (2/2)

Idea: Bound the volume of a set S of iterations by the areas of the shadows S casts.
Idea: An upper bound on data reuse gives a lower bound on data movement.

Upper bound on the number of operands, M:
  max(|S_A|, |S_B|, |S_C|) ≤ |S_A| + |S_B| + |S_C| ≤ M.

Upper bound on the number of iterations doable given M operands:
  |S| ≤ |S_A|^{1/2} · |S_B|^{1/2} · |S_C|^{1/2} ≤ M^{3/2}.

Data reuse = # iterations / # operands = O(M^{1/2}). (See [BDHS11] for the precise argument.)

# words moved ≥ total # iterations / max data reuse = Ω(N^3 / M^{1/2}).
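To make the shadow bound concrete, here is a minimal numerical check (my own illustration, not from the talk) that the Loomis–Whitney inequality holds for an arbitrary finite iteration set; the random set S is hypothetical:

  # Check |S| <= |S_A|^(1/2) * |S_B|^(1/2) * |S_C|^(1/2) for a random
  # finite set S of lattice points, where S_A, S_B, S_C are the shadows
  # of S on the planes indexed by A(i,k), B(k,j), C(i,j).
  import random

  random.seed(0)
  S = {(random.randrange(8), random.randrange(8), random.randrange(8))
       for _ in range(100)}

  S_A = {(i, k) for (i, j, k) in S}  # entries of A touched
  S_B = {(k, j) for (i, j, k) in S}  # entries of B touched
  S_C = {(i, j) for (i, j, k) in S}  # entries of C touched

  bound = (len(S_A) * len(S_B) * len(S_C)) ** 0.5
  print(len(S), "<=", round(bound, 1))
  assert len(S) <= bound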

Attaining Lower Bounds: Blocking Matmul (1/3)

for i = 1 : N, for j = 1 : N, for k = 1 : N,
  C(i, j) += A(i, k) * B(k, j)

[Figure: C = A · B, with row i of A and column j of B combining into entry (i, j) of C.]

Attaining Lower Bounds: Blocking Matmul (1/3, continued)

Suppose M < N.

for i = 1 : N,
  for j = 1 : N,
    Load C(i, j)                    ... N^2 loads total
    for k = 1 : N,
      Load A(i, k) and B(k, j)      ... 2N^3 loads total
      C(i, j) += A(i, k) * B(k, j)
    Store C(i, j)                   ... N^2 stores total

# Words Moved = N^2 + 2N^3 + N^2 = O(N^3), which is suboptimal.

Attaining Lower Bounds: Blocking Matmul (2/3)

[Figure: blocked view of C = A · B — a block of C is computed from a block row of A and a block column of B.]

Attaining Lower Bounds: Blocking Matmul (3/3)

Suppose M < N, and let the block size b satisfy 3b^2 ≤ M; assume b divides N.

for i = 1 : N/b,
  for j = 1 : N/b,
    Load block C(i, j)                    ... (N/b)^2 block loads total
    for k = 1 : N/b,
      Load blocks A(i, k) and B(k, j)     ... 2(N/b)^3 block loads total
      C(i, j) += A(i, k) * B(k, j)        // block multiply-accumulate
    Store block C(i, j)                   ... (N/b)^2 block stores total

For the choice b = (M/3)^{1/2} (assume integer),

  # Words Moved = ((N/b)^2 + 2(N/b)^3 + (N/b)^2) · b^2 = O(N^3 / M^{1/2}),

which is asymptotically optimal.
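As a concrete companion to the pseudocode above, here is a short Python sketch of the same blocked algorithm (an illustration under the stated assumptions that b divides N and 3b^2 ≤ M; numpy and the name blocked_matmul are mine, not from the talk):

  import numpy as np

  def blocked_matmul(A, B, C, b):
      # C += A * B, computed block-by-block; three b-by-b blocks fit in fast memory
      N = C.shape[0]
      assert N % b == 0, "assume b divides N, as on the slide"
      for i in range(0, N, b):
          for j in range(0, N, b):
              Cij = C[i:i+b, j:j+b].copy()          # load block C(i, j)
              for k in range(0, N, b):
                  # load blocks A(i, k) and B(k, j); multiply-accumulate
                  Cij += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
              C[i:i+b, j:j+b] = Cij                 # store block C(i, j)

  N, b = 8, 4   # e.g., fast memory of M >= 3*b*b = 48 words
  A, B = np.random.rand(N, N), np.random.rand(N, N)
  C = np.zeros((N, N))
  blocked_matmul(A, B, C, b)
  assert np.allclose(C, A @ B)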

Lower Bounds for Linear Algebra

Theorem ([BDHS11])
Suppose we are given an index set Z ⊆ Z^3. Then the Matmul-like program

for (i, j, k) ∈ Z,
  C(i, j) = C(i, j) +_(ij) A(i, k) *_(ijk) B(k, j)

must move Ω(|Z| / M^{1/2}) words.

Under some technical assumptions, this yields lower bounds for:
BLAS 3, e.g., A·B, A^{-1}·B;
one-sided factorizations, e.g., LU, Cholesky, LDL^T, ILU(t);
orthogonal factorizations, e.g., Gram–Schmidt, QR, eigenvalue/singular value problems;
tensor contractions, some graph algorithms;
and sequences of these operations, interleaved arbitrarily.

Outline (revisited)

1. Avoiding Communication in Linear Algebra
   Lower bounds for matrix multiplication, attainable by blocking.
   Lower bounds for linear algebra (... attainable? see [BDHS11]).

2. Beyond Linear Algebra: Affine Array References, e.g., A(i + 2j, 3k + 4)
   Extends previous lower bounds to a larger class of programs.
   Lower bounds are computable.
   Matching upper bounds (i.e., optimal algorithms) in a special case: when array references pick a subset of the loop indices (e.g., A(i, k), B(k, j)).
   Ongoing work addresses attainability in the general case.

Generalization: Affine Array References

Given loop iterations indexed by i = (i_1, ..., i_d) ∈ Z^d, and parameters:

Param.    Description                                           Example: Matmul
d         dimension of the iteration space (the rank of Z^d)    3
Z         iteration space, a subset of Z^d                      {1, ..., N}^3
A_j, d_j  each d_j-dimensional array A_j is subscripted         {A, B, C}, {2, 2, 2}
          by Z^{d_j}
φ_j       subscripts φ_j : Z^d → Z^{d_j} (affine                {(i, k), (k, j), (i, j)}
          combinations of loop indices)
m         number of arrays (injections into memory locations)   3

for i = (i_1, ..., i_d) ∈ Z ⊆ Z^d,
  inner_loop_i(A_1(φ_1(i)), ..., A_m(φ_m(i)))
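To make the notation concrete, here is a small Python illustration (mine, not from the talk) of the subscripts φ_j as integer matrices acting on an iteration vector; for Matmul each φ_j just projects a subset of (i, j, k), while an affine reference like A(i + 2j, 3k + 4) adds a constant offset:

  import numpy as np

  # Matmul subscripts as linear maps Z^3 -> Z^2
  phi_A = np.array([[1, 0, 0],    # A(i, k): select i ...
                    [0, 0, 1]])   # ... and k
  phi_B = np.array([[0, 0, 1],    # B(k, j)
                    [0, 1, 0]])
  phi_C = np.array([[1, 0, 0],    # C(i, j)
                    [0, 1, 0]])

  it = np.array([2, 5, 7])        # one iteration (i, j, k) = (2, 5, 7)
  print(phi_A @ it, phi_B @ it, phi_C @ it)   # -> [2 7] [7 5] [2 5]

  # the affine example A(i + 2j, 3k + 4): linear part plus an offset
  phi = np.array([[1, 2, 0],
                  [0, 0, 3]])
  offset = np.array([0, 4])
  print(phi @ it + offset)                    # -> [12 25]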

Lower Bound Strategy for Affine Array References

Proof strategy:
1. For any (finite) set E ⊆ Z of loop iterations that accesses O(M) operands (i.e., a "cache block"), find σ such that |E| = O(M^σ). Matmul: σ = 3/2.
2. Since data reuse = O(M^{σ-1}), conclude that the program must move Ω(|Z| / M^{σ-1}) words. Matmul: Ω(N^3 / M^{1/2}) words.

Upper Bounds via Hölder–Brascamp–Lieb (HBL) Theory

Theorem (Extension of [BCCT10, Theorem 2.4])
For j ∈ {1, ..., m}, let φ_j : Z^d → Z^{d_j} be a group homomorphism and s_j be a nonnegative number. Then

  rank(H) ≤ Σ_{j=1}^m s_j · rank(φ_j(H))  for all subgroups H of Z^d,

if and only if

  |E| ≤ Π_{j=1}^m |φ_j(E)|^{s_j}  for all finite subsets E of Z^d.

Lower Bounds for Affine Array References

Theorem (Communication Lower Bound)
A program of the form

for i = (i_1, ..., i_d) ∈ Z,
  inner_loop_i(A_1(φ_1(i)), ..., A_m(φ_m(i)))

must move Ω(|Z| / M^{(Σ_j s_j) - 1}) words, where s = (s_1, ..., s_m) satisfies rank(H) ≤ Σ_j s_j · rank(φ_j(H)) for all subgroups H of Z^d.

Proof sketch: Let E be any "cache block"; bound the data reuse (# iterations / # operands).
1. Operands: max_j |A_j(φ_j(E))| = max_j |φ_j(E)| ≤ M.
2. Iterations: |E| ≤ Π_j |φ_j(E)|^{s_j} ≤ (max_j |φ_j(E)|)^{Σ_j s_j}.
3. Data reuse: O(M^{(Σ_j s_j) - 1}), so # words moved: Ω(|Z| / M^{(Σ_j s_j) - 1}).
(See the paper for the precise argument.)

A Linear Program to Compute σ (1/2)

We can write the set of inequalities (for subgroups H_1, ..., H_i, ... of Z^d)

  rank(H_1) ≤ Σ_{j=1}^m s_j · rank(φ_j(H_1))
  ...
  rank(H_i) ≤ Σ_{j=1}^m s_j · rank(φ_j(H_i))
  ...

as a system of inequalities

  [ rank(φ_1(H_1)) ··· rank(φ_m(H_1)) ] [ s_1 ]   [ rank(H_1) ]
  [       ⋮                 ⋮         ] [  ⋮  ] ≥ [     ⋮     ]
  [ rank(φ_1(H_i)) ··· rank(φ_m(H_i)) ] [ s_m ]   [ rank(H_i) ]
  [       ⋮                 ⋮         ]           [     ⋮     ]

or, more succinctly, as Δ s ≥ r.

A Linear Program to Compute σ (2/2)

Observation
Any s ∈ [0, ∞)^m that satisfies Δ s ≥ r leads to a valid upper bound M^σ, with σ = σ(s) = Σ_j s_j = 1^T s. The smallest σ(s) leads to the tightest upper bound M^{σ(s)}, and thus the tightest lower bound Ω(|Z| / M^{σ(s) - 1}).

Definition (HBL LP)
  minimize σ(s) = 1^T s  subject to  Δ s ≥ r
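As an illustration of the LP (a hedged sketch: the sampled subgroups and the use of scipy are my choices, not from the talk), here is the HBL LP instantiated for Matmul with four subgroups of Z^3, the three coordinate axes and Z^3 itself:

  import numpy as np
  from scipy.optimize import linprog

  # rows: subgroups H; columns: arrays (A, B, C); entry = rank(phi_j(H))
  Delta = np.array([[1, 0, 1],   # H = <e_i>
                    [0, 1, 1],   # H = <e_j>
                    [1, 1, 0],   # H = <e_k>
                    [2, 2, 2]])  # H = Z^3
  r = np.array([1, 1, 1, 3])     # rank(H) for each row

  # HBL LP: minimize 1^T s  s.t.  Delta s >= r, s >= 0
  res = linprog(c=np.ones(3), A_ub=-Delta, b_ub=-r, bounds=(0, None))
  print(res.x, "sigma =", res.fun)   # -> s = [0.5 0.5 0.5], sigma = 1.5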

(Un-)Decidability (1/4)

How do we write down the infinitely many constraints Δ s ≥ r?

Observation
The entries of Δ and r lie in {0, 1, ..., d}, so there are at most (d + 1)^{m+1} unique constraints.

For each of the (d + 1)^{m+1} possible constraints, is it a row of (Δ, r)?

Question (Version 1/3)
Given natural numbers r_0, r_1, ..., r_m, is there a subgroup H of Z^d with rank(H) = r_0 and rank(φ_j(H)) = r_j for each j?

We can also treat each φ_j as a d-by-d integer matrix, and ask:

Question (Version 2/3)
... is there a d-by-d integer matrix H with rank(H) = r_0 and rank(φ_j · H) = r_j for each j?

(Un-)Decidability (2/4)

A matrix has rank r iff all (r + 1)-by-(r + 1) minors are zero and at least one r-by-r minor is nonzero. Each minor of H, φ_1·H, ..., φ_m·H is a polynomial whose variables are the d^2 entries of H and whose coefficients are computed from the d^2 entries of φ_j.

[Figure: a matrix of entries a_11, ..., a_76 with a square minor highlighted.]

Eliminating quantifiers, we obtain a system S of polynomial equations.

Question (Version 3/3)
Does S have an integer solution?

(Un-)Decidability (3/4)

Question (Hilbert's Tenth Problem for Z)
Given any system of integer polynomial equations, can we decide whether it has an integer solution?

No: there are undecidable systems (the MRDP Theorem).

Is our problem really this hard?

Observation
If a rational matrix H has the desired rank properties, we can obtain an integer matrix with the same properties.

(Un-)Decidability (4/4)

Question (Hilbert's Tenth Problem for Q)
Given any system of rational polynomial equations, can we decide whether it has a rational solution? A longstanding open question.

Theorem
Our question is decidable if and only if the answer to HTP(Q) is "yes."

Proof sketch: (⇐) Solve the polynomial system. (⇒) Encode any polynomial system as a set of matrices φ_1, ..., φ_m with certain properties...

An Equivalent Linear Program (1/2)

Observation
The feasible region of the HBL LP is a convex polytope P:
P is the intersection of the half-spaces defined by the rows of Δ s ≥ r.
There may be many pairs (Δ, r) that have the same feasible region, at least one of which has a finite number of rows.

An Equivalent Linear Program (2/2)

Let H_1, H_2, ... be a sequence of all subgroups of Z^d, and let P be the induced polytope. Each subsequence H_1, ..., H_N induces a polytope P_N.

Lemma: For each N, the corresponding polytope P_N ⊇ P.
Lemma: There is a finite M such that for all N ≥ M, P_N = P.
Lemma: There is a (decidable) algorithm to check whether P_N = P.

Theorem: P is computable.

An Approximate Linear Program

It is also possible to obtain an approximation of (Δ, r):

1. Compute a superpolytope of P: sample subgroups H_1, ..., H_N; the resulting σ̌ ≤ σ. The communication lower bound Ω(|Z| / M^{σ̌ - 1}) is possibly invalid (too tight), or attainable.

2. Compute a subpolytope of P: embed into R, i.e., φ_j : R^d → R^{d_j}. HTP(R) is Tarski-decidable; the resulting σ̂ ≥ σ. The communication lower bound Ω(|Z| / M^{σ̂ - 1}) is valid, but possibly unattainable.

Special Case: Subscripts are Subsets of Indices
e.g., A(i, k), B(k, j), C(i, j): each subscript is a subset of {i, j, k}.

Theorem
Suppose every φ_j projects onto a subset of the loop indices i_1, ..., i_d. Let Δ_e be the d rows of Δ corresponding to the subgroups H_i = <e_i>, for i = 1 to d. Then the linear program

  minimize 1^T s  subject to  Δ_e s ≥ 1

yields the same optimum σ(s) as the HBL LP, and furthermore the dual linear program

  maximize 1^T x  subject to  Δ_e^T x ≤ 1

gives the optimal block size M^{x_i} for each loop.

Note: In practice, we only need to solve the dual: 1^T x = 1^T s = σ(s).
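In this special case the LP data is just array-index membership, so solving the dual is mechanical. A minimal sketch with scipy (assumed tooling; the matrix layout with one row per array is mine), using Matmul's subscripts A(i, k), B(k, j), C(i, j):

  import numpy as np
  from scipy.optimize import linprog

  # Delta_e^T: one row per array, one column per loop index (i, j, k);
  # entry is 1 iff the index appears in that array's subscript
  E = np.array([[1, 0, 1],   # A(i, k)
                [0, 1, 1],   # B(k, j)
                [1, 1, 0]])  # C(i, j)

  # dual LP: maximize 1^T x  s.t.  E x <= 1, x >= 0  (linprog minimizes)
  res = linprog(c=-np.ones(3), A_ub=E, b_ub=np.ones(3), bounds=(0, None))
  print(res.x, "sigma =", -res.fun)   # -> x = [0.5 0.5 0.5], sigma = 1.5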

Special Case, Example 1/3: Matmul

Original code:
for i = 1 : N, for j = 1 : N, for k = 1 : N,
  C(i, j) += A(i, k) * B(k, j)

Now we write down and solve the linear program (rows: arrays; columns: loop indices i, j, k):

  A    1  0  1
  B    0  1  1        x = (1/2, 1/2, 1/2),  σ(s) = 3/2
  C    1  1  0

Corollary
For any execution of the code, the number of words moved is Ω(|Z| / M^{σ(s) - 1}) = Ω(N^3 / M^{1/2}), and this is attained by blocks of size M^{1/2}-by-M^{1/2}-by-M^{1/2} in the following code (b = M^{1/2}):

for i_1 = 1 : b : N, for j_1 = 1 : b : N, for k_1 = 1 : b : N,
  for i_2 = 0 : b - 1, for j_2 = 0 : b - 1, for k_2 = 0 : b - 1,
    (i, j, k) = (i_1, j_1, k_1) + (i_2, j_2, k_2)
    ... // inner loop with index (i, j, k)

The block sizes may have to be smaller by a constant factor, e.g., b_i = (M/m)^{x_i} = (M/3)^{1/2}, so the blocks fit in cache simultaneously.

Special Case, Example 2/3: N-body

Original code:
for i = 1 : N, for j = 1 : N,
  F(i) += compute_force(P(i), P(j))

Now we write down and solve the linear program (rows: array references; columns: loop indices i, j):

  F(i)    1  0
  P(i)    1  0        x = (1, 1),  σ(s) = 2
  P(j)    0  1

Corollary
For any execution of the code, the number of words moved is Ω(|Z| / M^{σ(s) - 1}) = Ω(N^2 / M), and this is attained by blocks of size M-by-M in the following code (b = M):

for i_1 = 1 : b : N, for j_1 = 1 : b : N,
  for i_2 = 0 : b - 1, for j_2 = 0 : b - 1,
    (i, j) = (i_1, j_1) + (i_2, j_2)
    ... // inner loop with index (i, j)

The block sizes may have to be smaller by a constant factor, e.g., b_i = (M/m)^{x_i} = M/3, so the blocks fit in cache simultaneously.
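A minimal Python sketch of this blocked N-body loop (my illustration; the toy compute_force kernel is a hypothetical placeholder, not the talk's):

  import numpy as np

  def compute_force(p, q):
      # hypothetical 1D pairwise force, standing in for the real kernel
      r = p - q
      return r / (abs(r)**3 + 1e-9)

  def blocked_nbody(P, b):
      # b = Theta(M): the blocks P[i1:i1+b], P[j1:j1+b], F[i1:i1+b] form the
      # O(M)-word working set, attaining the Omega(N^2 / M) bound
      N = len(P)
      F = np.zeros(N)
      for i1 in range(0, N, b):
          for j1 in range(0, N, b):
              for i in range(i1, min(i1 + b, N)):
                  for j in range(j1, min(j1 + b, N)):
                      F[i] += compute_force(P[i], P[j])
      return F

  P = np.random.rand(100)
  # blocking changes only the summation order, not the result
  assert np.allclose(blocked_nbody(P, 16), blocked_nbody(P, 100))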

Special Case, Example 3/3: Complicated Code

Original code:
for i_1 = 1 : N, for i_2 = 1 : N, ..., for i_6 = 1 : N,
  A_1(i_1, i_3, i_6) += func_1(A_2(i_1, i_2, i_4), A_3(i_2, i_3, i_5), A_4(i_3, i_4, i_6))
  A_5(i_2, i_6) += func_2(A_6(i_1, i_4, i_5), A_3(i_3, i_4, i_6))

Now we write down and solve the linear program (rows: array references; columns: loop indices i_1, ..., i_6; note that A_4(i_3, i_4, i_6) and the second reference to A_3 share a row):

  A_1         1  0  1  0  0  1
  A_2         1  1  0  1  0  0
  A_3         0  1  1  0  1  0        x = (2/7, 3/7, 1/7, 2/7, 3/7, 4/7)
  A_4, A_3    0  0  1  1  0  1        σ(s) = 15/7
  A_6         1  0  0  1  1  0
  A_5         0  1  0  0  0  1

Corollary
For any execution of the code, the number of words moved is Ω(|Z| / M^{σ(s) - 1}) = Ω(N^6 / M^{8/7}), and this is attained by blocks of size M^{2/7}-by-M^{3/7}-by-M^{1/7}-by-M^{2/7}-by-M^{3/7}-by-M^{4/7} in the following code (b_i = M^{x_i}):

for i_{1,1} = 1 : b_1 : N, ..., for i_{1,6} = 1 : b_6 : N,
  for i_{2,1} = 0 : b_1 - 1, ..., for i_{2,6} = 0 : b_6 - 1,
    (i_1, ..., i_6) = (i_{1,1}, ..., i_{1,6}) + (i_{2,1}, ..., i_{2,6})
    ... // inner loop with index (i_1, ..., i_6)

The block sizes may have to be smaller by a constant factor, e.g., b_i = (M/m)^{x_i} = (M/6)^{x_i}, so the blocks fit in cache simultaneously.
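The same dual-LP machinery verifies the exponents claimed above; a hedged sketch (the scipy usage is my assumption), with one row per distinct array reference:

  import numpy as np
  from scipy.optimize import linprog

  # columns i_1..i_6; rows: A_1, A_2, A_3 (1st ref), A_4 = A_3 (2nd ref), A_6, A_5
  E = np.array([[1, 0, 1, 0, 0, 1],
                [1, 1, 0, 1, 0, 0],
                [0, 1, 1, 0, 1, 0],
                [0, 0, 1, 1, 0, 1],
                [1, 0, 0, 1, 1, 0],
                [0, 1, 0, 0, 0, 1]])

  # dual LP: maximize 1^T x  s.t.  E x <= 1, x >= 0
  res = linprog(c=-np.ones(6), A_ub=E, b_ub=np.ones(6), bounds=(0, None))
  print(np.round(res.x * 7), "sigma =", -res.fun)
  # -> x = (2/7, 3/7, 1/7, 2/7, 3/7, 4/7), sigma = 15/7 ~ 2.143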

Extending to Other Machine Models

Our lower bounds extend to more complicated machines:
Multiple levels of memory: apply the lower bound to each pair of adjacent levels.
Homogeneous parallel processors: |Z|/P work per processor.
Hierarchical parallel processors.
Heterogeneous machines: an optimization problem to balance the |Z| work.

[Figures: a two-level slow/fast memory; a multilevel hierarchy (fast levels 1, 2, 3 above slow memory); a network of processors, each with its own local memory.]

Optimal Parallel Algorithms

Sequential blocking gives parallel working sets.
Theory suggests replicating inputs/outputs (e.g., 3D Matmul); generalize 2.5D algorithms: N-body, tensor operations, etc.
We obtain optimal algorithms in the special case, i.e., Ω((|Z|/P) / M^{σ-1}) is attainable. Ongoing work addresses the general case...

Ongoing Work: Lower Bounds

We want to compute lower bounds more efficiently.

Claim
Rather than enumerating all subgroups H_1, H_2, ... of Z^d, we only need to check a subset: the lattice L generated by the kernels of φ_1, ..., φ_m.

L may be finite, e.g., in the subsets-of-indices case, or when m ≤ 3.
We may only need to enumerate a finite semilattice L' ⊆ L, e.g., when all subscripts φ_j have rank in {0, 1, d - 1, d}, or when d ≤ 4.

Ongoing Work: Optimal Algorithms

Claim
For any s on the surface of the polytope P, there is a sequence of subsets E of Z^d, of increasing cardinality, which satisfy |E| = Θ(Π_j |φ_j(E)|^{s_j}).

Question
Are there sets which satisfy |E| = Θ((max_j |φ_j(E)|)^{min_{s ∈ P} Σ_j s_j})?

Question
How big are the hidden constants?

Ongoing Work: Data Dependencies

A partial order on Z, encoded as a DAG, means some sets E are inadmissible (cannot be blocked).

Question
Can we tighten our bound |E| ≤ Π_j |φ_j(E)|^{s_j} for admissible sets?

Question
Can we extend our parallel theory to expose tradeoffs between concurrency, efficiency, memory, communication, ...?

Ongoing Work: Cost Model

So far we have only discussed communication volume (bandwidth cost). We are extending the model to address:

# messages/synchronizations (latency cost): with message size 1 ≤ w ≤ M words, a latency lower bound is # words moved / w = Ω(|Z| / (w · M^{σ-1})) = Ω(|Z| / M^σ) messages.
Energy/power costs.
Network topology, congestion.

Conclusions

Communication is slowing you down!
Lower bounds motivate new/improved algorithms.
Previous work: Matmul [HK81, ITT04], linear algebra [BDHS11].
This work: programs with affine array references.
Goal: a compiler that generates communication-optimal code.

Tech report to appear soon at bebop.cs.berkeley.edu

Thank You

References

[BCCT10] J. Bennett, A. Carbery, M. Christ, and T. Tao. Finite bounds for Hölder–Brascamp–Lieb multilinear inequalities. Mathematical Research Letters, 17(4):647–666, 2010.

[BDHS11] G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications, 32(3):866–901, 2011.

[HK81] J.-W. Hong and H.T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, pages 326–333. ACM, 1981.

[ITT04] D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing, 64(9):1017–1026, 2004.

[LW49] L.H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55:961–962, 1949.


More information

COMBINATORIAL CURVE NEIGHBORHOODS FOR THE AFFINE FLAG MANIFOLD OF TYPE A 1 1

COMBINATORIAL CURVE NEIGHBORHOODS FOR THE AFFINE FLAG MANIFOLD OF TYPE A 1 1 COMBINATORIAL CURVE NEIGHBORHOODS FOR THE AFFINE FLAG MANIFOLD OF TYPE A 1 1 LEONARDO C. MIHALCEA AND TREVOR NORTON Abstract. Let X be the affine flag manifold of Lie type A 1 1. Its moment graph encodes

More information

6.895 PCP and Hardness of Approximation MIT, Fall Lecture 3: Coding Theory

6.895 PCP and Hardness of Approximation MIT, Fall Lecture 3: Coding Theory 6895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 3: Coding Theory Lecturer: Dana Moshkovitz Scribe: Michael Forbes and Dana Moshkovitz 1 Motivation In the course we will make heavy use of

More information

DM559 Linear and Integer Programming. Lecture 2 Systems of Linear Equations. Marco Chiarandini

DM559 Linear and Integer Programming. Lecture 2 Systems of Linear Equations. Marco Chiarandini DM559 Linear and Integer Programming Lecture Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. Outline 1. 3 A Motivating Example You are organizing

More information

Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point

Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point Gérard Cornuéjols 1 and Yanjun Li 2 1 Tepper School of Business, Carnegie Mellon

More information

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra LP Duality: outline I Motivation and definition of a dual LP I Weak duality I Separating hyperplane theorem and theorems of the alternatives I Strong duality and complementary slackness I Using duality

More information

CSC Design and Analysis of Algorithms. LP Shader Electronics Example

CSC Design and Analysis of Algorithms. LP Shader Electronics Example CSC 80- Design and Analysis of Algorithms Lecture (LP) LP Shader Electronics Example The Shader Electronics Company produces two products:.eclipse, a portable touchscreen digital player; it takes hours

More information

Algebraic Methods in Combinatorics

Algebraic Methods in Combinatorics Algebraic Methods in Combinatorics Po-Shen Loh June 2009 1 Linear independence These problems both appeared in a course of Benny Sudakov at Princeton, but the links to Olympiad problems are due to Yufei

More information

Divisible Load Scheduling

Divisible Load Scheduling Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute

More information

An algebraic perspective on integer sparse recovery

An algebraic perspective on integer sparse recovery An algebraic perspective on integer sparse recovery Lenny Fukshansky Claremont McKenna College (joint work with Deanna Needell and Benny Sudakov) Combinatorics Seminar USC October 31, 2018 From Wikipedia:

More information

Data Dependences and Parallelization. Stanford University CS243 Winter 2006 Wei Li 1

Data Dependences and Parallelization. Stanford University CS243 Winter 2006 Wei Li 1 Data Dependences and Parallelization Wei Li 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11,

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 7: More on Householder Reflectors; Least Squares Problems Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 15 Outline

More information

Semidefinite programs and combinatorial optimization

Semidefinite programs and combinatorial optimization Semidefinite programs and combinatorial optimization Lecture notes by L. Lovász Microsoft Research Redmond, WA 98052 lovasz@microsoft.com http://www.research.microsoft.com/ lovasz Contents 1 Introduction

More information

Algebraic Methods in Combinatorics

Algebraic Methods in Combinatorics Algebraic Methods in Combinatorics Po-Shen Loh 27 June 2008 1 Warm-up 1. (A result of Bourbaki on finite geometries, from Răzvan) Let X be a finite set, and let F be a family of distinct proper subsets

More information

III. Linear Programming

III. Linear Programming III. Linear Programming Thomas Sauerwald Easter 2017 Outline Introduction Standard and Slack Forms Formulating Problems as Linear Programs Simplex Algorithm Finding an Initial Solution III. Linear Programming

More information

CSCI 1951-G Optimization Methods in Finance Part 01: Linear Programming

CSCI 1951-G Optimization Methods in Finance Part 01: Linear Programming CSCI 1951-G Optimization Methods in Finance Part 01: Linear Programming January 26, 2018 1 / 38 Liability/asset cash-flow matching problem Recall the formulation of the problem: max w c 1 + p 1 e 1 = 150

More information

Chapter 2: Linear Programming Basics. (Bertsimas & Tsitsiklis, Chapter 1)

Chapter 2: Linear Programming Basics. (Bertsimas & Tsitsiklis, Chapter 1) Chapter 2: Linear Programming Basics (Bertsimas & Tsitsiklis, Chapter 1) 33 Example of a Linear Program Remarks. minimize 2x 1 x 2 + 4x 3 subject to x 1 + x 2 + x 4 2 3x 2 x 3 = 5 x 3 + x 4 3 x 1 0 x 3

More information

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ). CMPSCI611: Verifying Polynomial Identities Lecture 13 Here is a problem that has a polynomial-time randomized solution, but so far no poly-time deterministic solution. Let F be any field and let Q(x 1,...,

More information

The use of predicates to state constraints in Constraint Satisfaction is explained. If, as an

The use of predicates to state constraints in Constraint Satisfaction is explained. If, as an of Constraint Satisfaction in Integer Programming H.P. Williams Hong Yan Faculty of Mathematical Studies, University of Southampton, Southampton, UK Department of Management, The Hong Kong Polytechnic

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information