Communication Lower Bounds for Programs that Access Arrays
Nicholas Knight, Michael Christ, James Demmel, Thomas Scanlon, Katherine Yelick
UC Berkeley Scientific Computing and Matrix Computations Seminar, Apr. 24, 2013

We acknowledge funding from Microsoft (award #024263) and Intel (award #024894), with matching funding by UC Discovery, and additional support from ParLab affiliates National Instruments, Nokia, NVIDIA, Oracle, and Samsung, and from MathWorks. We also acknowledge the support of the US DOE (grants DE-AC02-05CH11231, DE-FC02-06ER25753, and DE-FC02-07ER25799, among others), DARPA, and NSF.
Communication is expensive!
Communication means moving data.
- Serial communication = moving data across the memory hierarchy
- Parallel communication = moving data across the network
Communication usually dominates runtime and energy cost. Avoid communication to save time and energy!
- How much can you avoid? A lower bound on data movement.
- Attain the lower bound? A communication-optimal algorithm.
Outline
1. Avoiding Communication in Linear Algebra
   - Lower bounds for matrix multiplication, attainable by blocking
   - Lower bounds for linear algebra (attainable? see [BDHS11])
2. Beyond Linear Algebra: Affine Array References, e.g., A(i + 2j, 3k + 4)
   - Extends the previous lower bounds to a larger class of programs.
   - The lower bounds are computable.
   - Matching upper bounds (i.e., optimal algorithms) in a special case: when array references pick a subset of the loop indices (e.g., A(i, k), B(k, j)).
   - Ongoing work addresses attainability in the general case.
First Lower Bound: Matrix Multiplication (Matmul)
A, B, C are N-by-N matrices. C := C + A*B:

for i = 1 : N, for j = 1 : N, for k = 1 : N,
    C(i, j) += A(i, k) * B(k, j)

Theorem ([HK81])
Consider computing C + A*B as above (in serial), with any order on the N^3 iterations. A processor must move

    # Words Moved = Omega( #iterations / (fast memory size)^(1/2) ) = Omega( N^3 / M^(1/2) )

words between slow memory (of unbounded capacity) and fast memory (of size M words).
First Lower Bound: Geometric Intuition [ITT04] (1/2)

for i = 1 : N, for j = 1 : N, for k = 1 : N,
    C(i, j) += A(i, k) * B(k, j)

Idea: Bound the volume |S| of a set S of iterations by the areas of the shadows S casts.

[Figure: a set S = {(i, j, k)} inside the N-by-N-by-N iteration cube, with its three shadows S_A, S_B, S_C on the (i, k), (k, j), and (i, j) coordinate planes.]

For a box with side lengths x, y, z:

    |S| = x*y*z = (xz * zy * yx)^(1/2) = |S_A|^(1/2) * |S_B|^(1/2) * |S_C|^(1/2).

In general,

    |S| <= |S_A|^(1/2) * |S_B|^(1/2) * |S_C|^(1/2)

by the Loomis-Whitney inequality [LW49].
First Lower Bound: Geometric Intuition [ITT04] (2/2)

Idea: An upper bound on data reuse gives a lower bound on data movement.

Upper bound on the number of operands available in fast memory:

    max(|S_A|, |S_B|, |S_C|) <= M,   hence   |S_A| + |S_B| + |S_C| <= 3M.

Upper bound on the number of iterations doable given M operands:

    |S| <= |S_A|^(1/2) * |S_B|^(1/2) * |S_C|^(1/2) <= M^(3/2).

Data reuse = #iterations / #operands = O(M^(1/2)). (See [BDHS11] for the precise argument.)

# words moved >= total #iterations / max data reuse = Omega(N^3 / M^(1/2)).
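The shadow argument is easy to spot-check numerically. The following sketch (not from the talk; names are illustrative) projects a finite iteration set onto the three coordinate planes touched by A, B, C and verifies the Loomis-Whitney bound, with equality for a cubical block of iterations:

```python
import itertools
import math
import random

def shadows(S):
    """Project an iteration set S of (i, j, k) triples onto the planes
    indexed by A(i, k), B(k, j), C(i, j)."""
    SA = {(i, k) for (i, j, k) in S}
    SB = {(k, j) for (i, j, k) in S}
    SC = {(i, j) for (i, j, k) in S}
    return SA, SB, SC

def loomis_whitney_bound(S):
    SA, SB, SC = shadows(S)
    return math.sqrt(len(SA) * len(SB) * len(SC))

random.seed(0)
N = 8
cube = list(itertools.product(range(N), repeat=3))
# |S| never exceeds the product of the shadow areas, to the power 1/2 each.
for trial in range(100):
    S = set(random.sample(cube, random.randint(1, N**3)))
    assert len(S) <= loomis_whitney_bound(S) + 1e-9
# A b-by-b-by-b block attains the bound with equality: b^3 = (b^2)^(3/2).
b = 4
block = set(itertools.product(range(b), repeat=3))
assert loomis_whitney_bound(block) == b**3
```

This is why blocks are the right shape for matmul: among all sets with shadows of size at most M, cubes pack the most iterations.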
Attaining Lower Bounds - Blocking Matmul (1/3)
Suppose M < N. The naive loop order:

for i = 1 : N,
  for j = 1 : N,
    Load C(i, j)                       ... N^2 loads total
    for k = 1 : N,
      Load A(i, k) and B(k, j)         ... 2N^3 loads total
      C(i, j) += A(i, k) * B(k, j)
    Store C(i, j)                      ... N^2 stores total

# Words Moved = N^2 + 2N^3 + N^2 = O(N^3),

which is suboptimal.
Attaining Lower Bounds - Blocking Matmul (2/3)
[Figure: C = A*B partitioned into b-by-b blocks; each block of C is updated by a block row of A times a block column of B.]
Attaining Lower Bounds - Blocking Matmul (3/3)
Suppose M < N, and let the block size b satisfy 3b^2 <= M; assume b divides N.

for i = 1 : N/b,
  for j = 1 : N/b,
    Load block C(i, j)                       ... (N/b)^2 block loads total
    for k = 1 : N/b,
      Load blocks A(i, k) and B(k, j)        ... 2(N/b)^3 block loads total
      C(i, j) += A(i, k) * B(k, j)           ... block multiplication
    Store block C(i, j)                      ... (N/b)^2 block stores total

Each block is b^2 words, so for the choice b = (M/3)^(1/2) (assumed an integer),

# Words Moved = ( (N/b)^2 + 2(N/b)^3 + (N/b)^2 ) * b^2 = O( N^3 / M^(1/2) ),

which is asymptotically optimal.
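As a sanity check on the operation counts above, here is a small sketch (not from the talk; the function names and the choice N = 512, M = 12288 are illustrative) that tallies the words moved by the naive and blocked variants:

```python
def naive_words_moved(N):
    # Load C(i,j) per (i,j), load A(i,k) and B(k,j) per (i,j,k), store C(i,j).
    return N * N + 2 * N**3 + N * N

def blocked_words_moved(N, M):
    b = int((M // 3) ** 0.5)   # block size: three b-by-b blocks fit in M words
    assert N % b == 0
    nb = N // b
    block_transfers = nb * nb + 2 * nb**3 + nb * nb   # block loads + stores
    return block_transfers * b * b                    # each block is b*b words

N, M = 512, 3 * 64 * 64        # so b = (M/3)^(1/2) = 64
naive = naive_words_moved(N)
blocked = blocked_words_moved(N, M)
assert naive == 268_959_744    # Theta(N^3)
assert blocked == 4_718_592    # Theta(N^3 / M^(1/2)): 57x fewer words here
```

The ratio grows like M^(1/2), matching the data-reuse bound from the previous slide.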
Lower Bounds for Linear Algebra

Theorem ([BDHS11])
Suppose we are given an index set Z, a subset of Z^3. Then the Matmul-like program

for (i, j, k) in Z,
    C(i, j) = C(i, j) +_(ij) A(i, k) *_(ijk) B(k, j)

(where the additions +_(ij) and multiplications *_(ijk) may differ from iteration to iteration) must move Omega(|Z| / M^(1/2)) words.

Under some technical assumptions, this yields lower bounds for
- BLAS 3, e.g., A*B, A^(-1)*B;
- One-sided factorizations, e.g., LU, Cholesky, LDL^T, ILU(t);
- Orthogonal factorizations, e.g., Gram-Schmidt, QR, eigenvalue/singular value problems;
- Tensor contractions, some graph algorithms; and
- Sequences of these operations, interleaved arbitrarily.
Generalization: Affine Array References
Given loop iterations indexed i = (i_1, ..., i_d) in Z^d, and parameters:

Param      Description                                          Example: Matmul
d          dimension of iteration space (= rank of Z^d)         3
Z          iteration space, a subset of Z^d                     {1, ..., N}^3
A_j, d_j   each d_j-dimensional array A_j is subscripted        {A, B, C}, {2, 2, 2}
           by Z^(d_j)
phi_j      subscripts phi_j : Z^d -> Z^(d_j) (affine            {(i, k), (k, j), (i, j)}
           combinations of the loop indices)
m          number of arrays (injections into memory locations)  3

for i = (i_1, ..., i_d) in Z, a subset of Z^d,
    inner_loop_i( A_1(phi_1(i)), ..., A_m(phi_m(i)) )
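To make the abstraction concrete, here is a small sketch (not from the talk; `make_subscript` is an illustrative name) of an affine subscript phi_j as an integer matrix plus an offset vector, instantiated for the matmul subscripts and for the outline's A(i + 2j, 3k + 4) example:

```python
def make_subscript(F, c):
    """An affine subscript phi(i) = F*i + c: F is a d_j-by-d integer
    matrix, c an integer offset vector of length d_j."""
    def phi(i):
        return tuple(sum(F[r][s] * i[s] for s in range(len(i))) + c[r]
                     for r in range(len(F)))
    return phi

# Matmul subscripts over i = (i, j, k): A(i, k), B(k, j), C(i, j).
phi_A = make_subscript([[1, 0, 0], [0, 0, 1]], [0, 0])
phi_B = make_subscript([[0, 0, 1], [0, 1, 0]], [0, 0])
phi_C = make_subscript([[1, 0, 0], [0, 1, 0]], [0, 0])
assert phi_A((2, 5, 7)) == (2, 7)   # A(i, k) at iteration (i, j, k) = (2, 5, 7)

# The outline's example A(i + 2j, 3k + 4) is also affine:
phi = make_subscript([[1, 2, 0], [0, 0, 3]], [0, 4])
assert phi((1, 1, 1)) == (3, 7)
```

The lower-bound theory below only uses the linear part of each phi_j (the offset c does not change |phi_j(E)|), which is why the maps can be treated as group homomorphisms.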
Lower Bound Strategy for Affine Array References
Proof strategy:
1. For any (finite) set E, a subset of Z, of loop iterations that accesses O(M) operands (i.e., a "cache block"), find sigma such that |E| = O(M^sigma). Matmul: sigma = 3/2.
2. Since data reuse is O(M^(sigma - 1)), conclude that the program must move Omega(|Z| / M^(sigma - 1)) words. Matmul: Omega(N^3 / M^(1/2)) words.
Upper Bounds via Holder-Brascamp-Lieb (HBL) Theory

Theorem (Extension of [BCCT10, Theorem 2.4])
For j in {1, ..., m}, let phi_j : Z^d -> Z^(d_j) be a group homomorphism and s_j a nonnegative number. Then

    rank(H) <= sum_{j=1}^{m} s_j * rank(phi_j(H))   for all subgroups H of Z^d,

if and only if,

    |E| <= prod_{j=1}^{m} |phi_j(E)|^(s_j)   for all finite subsets E of Z^d.
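The subgroup side of the theorem can be spot-checked mechanically. The sketch below (not from the talk; a brute-force illustration, not a proof) computes ranks over the rationals by Gaussian elimination and verifies rank(H) <= sum_j s_j * rank(phi_j(H)) for the matmul homomorphisms with s = (1/2, 1/2, 1/2), for subgroups H generated by a few sample generator sets:

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank over Q of a list of integer vectors, by Gaussian elimination."""
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Matmul homomorphisms on (i, j, k): phi_A = (i, k), phi_B = (k, j), phi_C = (i, j).
phis = [lambda v: (v[0], v[2]), lambda v: (v[2], v[1]), lambda v: (v[0], v[1])]
s = [Fraction(1, 2)] * 3
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Check the rank inequality for subgroups generated by subsets of the
# standard basis, plus one non-axis-aligned example.
gens_list = [list(c) for n in range(4) for c in combinations(basis, n)]
gens_list.append([(1, 1, 0), (0, 1, 1)])
for gens in gens_list:
    lhs = rank(gens)
    rhs = sum(sj * rank([phi(g) for g in gens]) for sj, phi in zip(s, phis))
    assert lhs <= rhs
```

For H = <e_i> the inequality is tight (rank 1 on both sides), which is exactly why those subgroups generate the binding LP constraints in the subsets-of-indices special case below.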
Lower Bounds for Affine Array References

Theorem (Communication Lower Bound)
A program of the form

for i = (i_1, ..., i_d) in Z,
    inner_loop_i( A_1(phi_1(i)), ..., A_m(phi_m(i)) )

must move Omega(|Z| / M^((sum_j s_j) - 1)) words, where s = (s_1, ..., s_m) satisfies rank(H) <= sum_j s_j * rank(phi_j(H)) for all subgroups H of Z^d.

Proof sketch: Let E be any "cache block"; bound the data reuse (#iterations / #operands).
1. Operands: max_j |A_j(phi_j(E))| = max_j |phi_j(E)| <= M.
2. Iterations: |E| <= prod_j |phi_j(E)|^(s_j) <= (max_j |phi_j(E)|)^(sum_j s_j).
3. Data reuse is O(M^((sum_j s_j) - 1)), so # words moved is Omega(|Z| / M^((sum_j s_j) - 1)). (See the paper for the precise argument.)
A Linear Program to Compute sigma (1/2)
We can write the set of inequalities (for subgroups H_1, ..., H_i, ... of Z^d)

    rank(H_1) <= sum_{j=1}^{m} s_j * rank(phi_j(H_1))
    ...
    rank(H_i) <= sum_{j=1}^{m} s_j * rank(phi_j(H_i))
    ...

as a system of linear inequalities

    [ rank(phi_1(H_1)) ... rank(phi_m(H_1)) ]   [ s_1 ]      [ rank(H_1) ]
    [       ...                 ...         ] * [ ... ]  >=  [    ...    ]
    [ rank(phi_1(H_i)) ... rank(phi_m(H_i)) ]   [ s_m ]      [ rank(H_i) ]
    [       ...                 ...         ]                [    ...    ]

or more succinctly, as Delta * s >= r.
A Linear Program to Compute sigma (2/2)

Observation
Any s in [0, inf)^m that satisfies Delta * s >= r leads to a valid upper bound M^sigma, with sigma = sigma(s) = sum_j s_j = 1^T s. The smallest sigma(s) leads to the tightest upper bound M^(sigma(s)), and thus the tightest lower bound Omega(|Z| / M^(sigma(s) - 1)).

Definition (HBL LP)
    minimize sigma(s) = 1^T s   subject to   Delta * s >= r
(Un-)Decidability (1/4)
How do we write down the infinitely many constraints Delta * s >= r?

Observation
The entries of Delta and r lie in {0, 1, ..., d}, so there are at most (d + 1)^(m+1) unique constraints.

For each of the (d + 1)^(m+1) possible constraints, is it a row of (Delta, r)?

Question (Version 1/3)
Given natural numbers r_0, r_1, ..., r_m, is there a subgroup H of Z^d with rank(H) = r_0 and rank(phi_j(H)) = r_j for each j?

We can also treat each phi_j as a d-by-d integer matrix, and ask

Question (Version 2/3)
... is there a d-by-d integer matrix H with rank(H) = r_0 and rank(phi_j * H) = r_j for each j?
(Un-)Decidability (2/4)
A matrix has rank r iff all (r + 1)-by-(r + 1) minors are zero and at least one r-by-r minor is nonzero. Each minor of H, phi_1 * H, ..., phi_m * H is a polynomial whose variables are the d^2 entries of H and whose coefficients are computed from the d^2 entries of phi_j.

[Figure: the entries a_11, ..., a_dd of a matrix, with a minor highlighted.]

Eliminating quantifiers, we obtain a system S of polynomial equations.

Question (Version 3/3)
Does S have an integer solution?
(Un-)Decidability (3/4)

Question (Hilbert's Tenth Problem for Z)
Given any system of integer polynomial equations, can we decide whether it has an integer solution?

No: there are undecidable systems (the MRDP theorem).

Is our problem really this hard?

Observation
If a rational matrix H has the desired rank properties, we can obtain an integer matrix with the same properties.
(Un-)Decidability (4/4)

Question (Hilbert's Tenth Problem for Q)
Given any system of rational polynomial equations, can we decide whether it has a rational solution? A longstanding open question.

Theorem
Our question is decidable if and only if the answer to HTP(Q) is "Yes".

Proof sketch: (<=) Solve the polynomial system. (=>) Encode any polynomial system as a set of matrices phi_1, ..., phi_m with certain properties...
An Equivalent Linear Program (1/2)

Observation
The feasible region of the HBL LP is a convex polytope P:
- P is the intersection of the half-spaces defined by the rows of Delta * s >= r.
- There may be many pairs (Delta, r) with the same feasible region,
- at least one of which has a finite number of rows.
An Equivalent Linear Program (2/2)
Let H_1, H_2, ... be a sequence of all subgroups of Z^d, and let P be the induced polytope. Each subsequence H_1, ..., H_N induces a polytope P_N.

Lemma: For each N, the corresponding polytope P_N contains P.
Lemma: There is a finite M such that for all N >= M, P_N = P.
Lemma: There is a (decidable) algorithm to check whether P_N = P.

Theorem: P is computable.
An Approximate Linear Program
It is also possible to obtain an approximation of (Delta, r):
1. Compute a superpolytope of P: sample subgroups H_1, ..., H_N; the resulting sigma-check <= sigma. The communication lower bound Omega(|Z| / M^(sigma-check - 1)) is possibly invalid (too tight), or attainable.
2. Compute a subpolytope of P: embed into R, i.e., phi_j : R^d -> R^(d_j). HTP(R) is Tarski-decidable; the resulting sigma-hat >= sigma. The communication lower bound Omega(|Z| / M^(sigma-hat - 1)) is valid, but possibly unattainable.
Special Case: Subscripts are Subsets of the Indices
E.g., A(i, k), B(k, j), C(i, j): the subscripts are subsets of {i, j, k}.

Theorem
Suppose every phi_j projects onto a subset of the loop indices i_1, ..., i_d. Let Delta_e be the d rows of Delta corresponding to the subgroups H_i = <e_i>, for i = 1 to d (so entry (i, j) of Delta_e is 1 if loop index i appears in subscript phi_j, and 0 otherwise). Then the linear program

    minimize 1^T s   subject to   Delta_e * s >= 1

yields the same optimum sigma(s) as the HBL LP, and furthermore, the dual linear program

    maximize 1^T x   subject to   Delta_e^T * x <= 1

gives the optimal block size M^(x_i) for each loop.

Note: In practice, we only need to solve the dual: 1^T x = 1^T s = sigma(s).
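For matmul, the optimum of both the primal and the dual lies at the vertex where every constraint is tight, so the tiny LP can be solved by exact rational linear algebra. A sketch (not from the talk; a general code would call an LP solver rather than assume all constraints are tight):

```python
from fractions import Fraction

def solve_square(A, b):
    """Solve A x = b exactly over the rationals (A square, invertible),
    by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])]
         for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                M[i] = [a - M[i][c] * w for a, w in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

# Matmul incidence: rows indexed by i, j, k; columns by A(i,k), B(k,j), C(i,j).
D = [[1, 0, 1],   # i appears in A and C
     [0, 1, 1],   # j appears in B and C
     [1, 1, 0]]   # k appears in A and B
s = solve_square(D, [1, 1, 1])                # primal vertex: Delta_e s = 1
assert s == [Fraction(1, 2)] * 3
assert sum(s) == Fraction(3, 2)               # sigma = 3/2
DT = [list(col) for col in zip(*D)]
x = solve_square(DT, [1, 1, 1])               # dual vertex: Delta_e^T x = 1
assert x == [Fraction(1, 2)] * 3              # block size M^(1/2) per loop
```

This reproduces the familiar answer: reuse exponent sigma - 1 = 1/2 and M^(1/2)-by-M^(1/2)-by-M^(1/2) blocks.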
Special Case, Example 1/3: Matmul
Original code:

for i = 1 : N, for j = 1 : N, for k = 1 : N,
    C(i, j) += A(i, k) * B(k, j)

Now we write down and solve the linear program (one row per array reference, one column per loop index; entry 1 iff the index appears in the subscript):

             i  j  k
    A(i, k)  1  0  1
    B(k, j)  0  1  1      s = (1/2, 1/2, 1/2),  x = (1/2, 1/2, 1/2),  sigma(s) = 3/2
    C(i, j)  1  1  0
Corollary
For any execution of the code, the number of words moved is Omega(|Z| / M^(sigma(s) - 1)) = Omega(N^3 / M^(1/2)), and this is attained by blocks of size M^(1/2)-by-M^(1/2)-by-M^(1/2) in the following code (b = M^(1/2)):

for i_1 = 1 : b : N, for j_1 = 1 : b : N, for k_1 = 1 : b : N,
  for i_2 = 0 : b - 1, for j_2 = 0 : b - 1, for k_2 = 0 : b - 1,
    (i, j, k) = (i_1, j_1, k_1) + (i_2, j_2, k_2)
    ... // inner loop with index (i, j, k)

The block sizes may have to be smaller by a constant factor, e.g., b_i = (M/m)^(x_i) = (M/3)^(1/2), so that all blocks fit in cache simultaneously.
Special Case, Example 2/3: N-body
Original code:

for i = 1 : N, for j = 1 : N,
    F(i) += compute_force(P(i), P(j))

Now we write down and solve the linear program (one row per array reference, one column per loop index):

          i  j
    F(i)  1  0
    P(i)  1  0      x = (1, 1),  sigma(s) = 2
    P(j)  0  1
Corollary
For any execution of the code, the number of words moved is Omega(|Z| / M^(sigma(s) - 1)) = Omega(N^2 / M), and this is attained by blocks of size M-by-M in the following code (b = M):

for i_1 = 1 : b : N, for j_1 = 1 : b : N,
  for i_2 = 0 : b - 1, for j_2 = 0 : b - 1,
    (i, j) = (i_1, j_1) + (i_2, j_2)
    ... // inner loop with index (i, j)

The block sizes may have to be smaller by a constant factor, e.g., b_i = (M/m)^(x_i) = M/3, so that all blocks fit in cache simultaneously.
Special Case, Example 3/3: Complicated Code
Original code:

for i_1 = 1 : N, for i_2 = 1 : N, ..., for i_6 = 1 : N,
    A_1(i_1, i_3, i_6) += func_1( A_2(i_1, i_2, i_4), A_3(i_2, i_3, i_5), A_4(i_3, i_4, i_6) )
    A_5(i_2, i_6)      += func_2( A_6(i_1, i_4, i_5), A_3(i_3, i_4, i_6) )

Now we write down and solve the linear program (one row per array reference, one column per loop index; entry 1 iff the index appears in the subscript):

                         i_1 i_2 i_3 i_4 i_5 i_6
    A_1(i_1, i_3, i_6)    1   0   1   0   0   1
    A_2(i_1, i_2, i_4)    1   1   0   1   0   0
    A_3(i_2, i_3, i_5)    0   1   1   0   1   0
    A_4(i_3, i_4, i_6)    0   0   1   1   0   1
    A_3(i_3, i_4, i_6)    0   0   1   1   0   1
    A_5(i_2, i_6)         0   1   0   0   0   1
    A_6(i_1, i_4, i_5)    1   0   0   1   1   0

    x = (2/7, 3/7, 1/7, 2/7, 3/7, 4/7),  sigma(s) = 15/7
Corollary
For any execution of the code, the number of words moved is Omega(|Z| / M^(sigma(s) - 1)) = Omega(N^6 / M^(8/7)), and this is attained by blocks of size M^(2/7)-by-M^(3/7)-by-M^(1/7)-by-M^(2/7)-by-M^(3/7)-by-M^(4/7) in the following code (b_i = M^(x_i)):

for i_{1,1} = 1 : b_1 : N, ..., for i_{1,6} = 1 : b_6 : N,
  for i_{2,1} = 0 : b_1 - 1, ..., for i_{2,6} = 0 : b_6 - 1,
    (i_1, ..., i_6) = (i_{1,1}, ..., i_{1,6}) + (i_{2,1}, ..., i_{2,6})
    ... // inner loop with index (i_1, ..., i_6)

The block sizes may have to be smaller by a constant factor, e.g., b_i = (M/m)^(x_i) = (M/6)^(x_i), so that all blocks fit in cache simultaneously.
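The claimed exponents can be verified directly. A sketch (not from the talk) checking that x = (2/7, 3/7, 1/7, 2/7, 3/7, 4/7) is dual-feasible, i.e., each array's block fits in O(M) words, and that the exponents sum to sigma = 15/7:

```python
from fractions import Fraction

# One row per array reference, one column per loop index i_1..i_6;
# entry 1 iff the index appears in that reference's subscript.
refs = [(1, 0, 1, 0, 0, 1),   # A_1(i_1, i_3, i_6)
        (1, 1, 0, 1, 0, 0),   # A_2(i_1, i_2, i_4)
        (0, 1, 1, 0, 1, 0),   # A_3(i_2, i_3, i_5)
        (0, 0, 1, 1, 0, 1),   # A_4(i_3, i_4, i_6)
        (0, 0, 1, 1, 0, 1),   # A_3(i_3, i_4, i_6)
        (0, 1, 0, 0, 0, 1),   # A_5(i_2, i_6)
        (1, 0, 0, 1, 1, 0)]   # A_6(i_1, i_4, i_5)
x = [Fraction(n, 7) for n in (2, 3, 1, 2, 3, 4)]

# Dual feasibility: each array's block holds M^(sum of x_i over its
# subscript indices) words, so each row must satisfy <row, x> <= 1.
for row in refs:
    assert sum(xi for xi, a in zip(x, row) if a) <= 1

# Objective: total block volume M^(sum x_i) = M^(15/7), so the data-reuse
# exponent is sigma - 1 = 8/7, matching the Omega(N^6 / M^(8/7)) bound.
assert sum(x) == Fraction(15, 7)
```

In fact every constraint is tight here (each sum equals exactly 1), so every array block uses the full Theta(M) words of fast memory.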
Extending to Other Machine Models
Our lower bounds extend to more complicated machines:
- Multiple levels of memory: apply the lower bound to each pair of adjacent levels.
- Homogeneous parallel processors: |Z|/P work per processor.
- Hierarchical parallel processors.
- Heterogeneous machines: an optimization problem to balance the |Z| work.

[Figures: a two-level slow/fast memory; a multilevel hierarchy; distributed-memory machines with per-processor local memories.]
Optimal Parallel Algorithms
- Sequential blocking gives parallel working sets.
- Theory suggests replicating inputs/outputs (e.g., "3D" Matmul); generalizes 2.5D algorithms to N-body, tensor operations, etc.
- Optimal algorithms for the special case, i.e., Omega((|Z|/P) / M^(sigma - 1)) is attainable. Ongoing work addresses the general case...
Ongoing Work: Lower Bounds
We want to compute the lower bounds more efficiently.

Claim
Rather than enumerating all subgroups H_1, H_2, ... of Z^d, we only need to check a subset: the lattice L generated by the kernels of phi_1, ..., phi_m.
- L may be finite, e.g., in the subsets-of-indices case, or when m <= 3.
- We may only need to enumerate a finite semilattice L' contained in L, e.g., when all subscripts phi_j have rank in {0, 1, d - 1, d}, or when d <= 4.
Ongoing Work: Optimal Algorithms

Claim
For any s on the surface of the polytope P, there is a sequence of subsets E of Z^d, of increasing cardinality, which satisfy |E| = Theta( prod_j |phi_j(E)|^(s_j) ).

Question
Are there sets which satisfy |E| = Theta( (max_j |phi_j(E)|)^(min_{s in P} sum_j s_j) )?

Question
How big are the hidden constants?
Ongoing Work: Data Dependencies
- A partial order on Z, encoded as a DAG.
- Some sets E are inadmissible (cannot be blocked).

Question
Can we tighten our bound |E| <= prod_j |phi_j(E)|^(s_j) for admissible sets?

Question
Can we extend our parallel theory to expose tradeoffs between concurrency, efficiency, memory, communication, ...?
Ongoing Work: Cost Model
We have only discussed communication volume (bandwidth cost). We want to extend the model to address:
- Number of messages/synchronizations (latency cost). With message size 1 <= w <= M words, a latency lower bound is
    # words moved / w = Omega(|Z| / (w * M^(sigma - 1))) = Omega(|Z| / M^sigma) messages.
- Energy/power costs.
- Network topology, congestion.
Conclusions
- Communication is slowing you down!
- Lower bounds motivate new/improved algorithms.
- Previous work: Matmul [HK81, ITT04]; linear algebra [BDHS11].
- This work: programs with affine array references.
- Goal: a compiler that generates communication-optimal code.

Tech report to appear soon at bebop.cs.berkeley.edu. Thank you!
References
[BCCT10] J. Bennett, A. Carbery, M. Christ, and T. Tao. Finite bounds for Holder-Brascamp-Lieb multilinear inequalities. Mathematical Research Letters, 17(4), 2010.
[BDHS11] G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM Journal on Matrix Analysis and Applications, 32(3), 2011.
[HK81] J.-W. Hong and H. T. Kung. I/O complexity: the red-blue pebble game. In Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing. ACM, 1981.
[ITT04] D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. Journal of Parallel and Distributed Computing, 64(9), 2004.
[LW49] L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55, 1949.