Loop parallelization using compiler analysis

Which of these loops is parallel? How can we determine this automatically using compiler analysis?

Organization of a Modern Compiler Source Program Front-end syntax analysis + type-checking + symbol table High-level Intermediate Representation loops,array references are preserved) Middle1 loop-level transformations Low-level Intermediate Representation array references converted into low level operations, loops converted to control flow) Low-level intermediate Representation Middle2 conventional optimizations Back-end Assembly Code register allocation instruction selection

"! # $ ) * * +, # *.- / 01 23 4 / 01 3 4 9 3 9 =< > 2 @? A 2 CB D@EFG IHE J KML N IOE PE J Q Q DSRT Q Q D N IOE NU VW U H W O QC XE OO U CN E N I IXIO E C E N I O Q ZHH F V U I W NU O QC Z[E I N Q I X JJ J Q D OQ Q G J Q Q D N IOE / 01 \ 3 4 ] \ \ 3 ^ _ 1 ` ] \ \ / 01 a 3 \< 4 ] a \ 3 ] a \ b ] \ \ / 01 a 3 \< 4 / 01 c 3 \< 4 a ] a c ed 3 ] a \? ] c \ f

GF Q ZH OG Q F J IOEN PE h Z g N QVR D@EFG IHE K L J Q D OQ

Š ŒŒŒ # o ), l # # $ ˆ Š ˆ. Š ˆ. Š ˆ Š ˆ. Š Œ ŒŒ Ž ˆ. oƒ + ) # J DO I = 1, N DO J = 1,M S M 1 1 N I # ) #! * #! * - ƒ!

Intuition about systems of linear inequalities Equality line 2D), plane 3D), hyperplane > 3D) Inequality half-plane 2D), half-space>2d) y 3x+4y = 12 x 3x + 4y <= 12 Region described by inequality is convex if two points are in region, all points in between them are in region)

Intuition about systems of linear inequalities Conjunction of inequalties = intersection of half-spaces => some convex region y <= 4 y x >= - 3x+4y <= 12 x 3x - 3y <= 9 Region described by inequalities is a convex polyhedron if two points are in region, all points in between them are in region)

$ + * + + * $, * - + # # + * ) - n ) #, + ) # ) - k # # # * $ + m # # -

± ² ³ ² µ ² ± µ š ž ¹ œ ž Ÿ º»¼»œ ½ ½ ¾ ¾ ¾ ž ¹ ¾ ¾ º¾»»œ ½ ½ ¾ ¾ p # À # # * Á Â Á Ã p Ä Á Â Ä Á Ã Ä Ž ÅÇÆur ts t ~ uw ut vt È É p Ê Á Â Ë Á Ã Å y v Ì vt vuu ÍÎ w { v s r xw É p #! # # * Á Ï Á Ð p Ä Á ÏÒÑ Á Ð Ä Ž Å u vt È t ~ uw t ruæ ts É p Ê Á Ð Ë Á Ï Å y v Ì vt vuu ÍÎ w { v s r xw É p # + + # # * Á Â Á Â p Ä Á Â Ñ Á Â Ä Ž ÅÇÆur rts x z u } uwv Ì uw utè É p Ê Á Â Ê Á Â Å y v Ì vt vuu Í Î w { v s r xw É f

í ä ä î è æè é ä ä ì è æè ä ä è æè Ú Û Ú Û Û Ú Û Ú Û Û Š # # Á Â Á Ã Ã ½ ½ ¾ ¾ ½ ž Ÿ ± ³ ðï ã Ü å ß æëê ã Ü å ß ã Ü å ß æëê ã Ü å ß ã Ü å ß æ ç ã Ü å ß ß â Û à Þ Þ Ü â Ú Þ Þ ß Ý Ûáà Þ Þ Ü Ý Ú Þ Þ Ù o ) # # - À # * ˆØ Â ˆ ž ¹ ¾ ¾ º¾»Õ»œ Ÿ ½ Ÿ Ö»œ Ÿ º»¼»œ Ÿ Ÿ»œ Ÿ ½ ½ ¾ ¾ ¾ š ž ¹ ž Ÿ Ô ¹ ¹ š ž ¹ œ ž ¹ ¹ ² ³ ² µ ² ² Óµ ± µ

ø ö ù óø ï òñ óô õò ó úö ù úò öúò üúû ûõò úý öúþ ò ú ö õò õÿ ø ü ÿ

! " $# ) ) * +, -.... * +, / -... 0 1 2 2 3 4 9 < ê < = >? û? ô ûõò úý >@ ô óóúò õ ü BA < < = < = < ï C

0D BA < < = < = < ûõò úý úe ù @ üüú ú ö òô úó? F @ G H I KJ õ ü F ÿÿ ü> L L L L L L L A A < A < A < < M M M M M M M N O O O O O O O P A A A Q R R R R R R R S ï T

! U " $# ) ) " $# V ) ) * + V -. *. + XW V/ -... 0 1 2 2 3 4 Z [ \ ] Z [ \ZB^_ a`i b c d egf a` b _h \ A i = i j ò kú@ óÿ úe ô û @ õ ù? ô û @ öú@ ] òô ó òô úó ú@ ú l ø õÿ ô üúóô m ôò ú l ø õÿ ô üúóô ï n

oqp r sut r vxw o p y s t y v z p r { p y } oop r p y vq~ o t r { t y vv 2 0 2 z 9 BA i = i ƒ i 9 BA i = i 0 0 ˆ

0 4 " $# ) ) " $# V * + V -. *. + XW V/ -... Z [ \ ] Z [ \ZB^_ a`i b c d egf a` b _h \ BA i = i ˆŠ

0 Œ Ž Ž š š œ š ž šÿ š Ÿœ š š š Z \ < Z \ Z \ < Z \ j ª k «[ < ª G ª ± ² F ² ª ³ ² ˆµ

ˆ ˆ Min s and max s in loop bounds mayseem weird, but actually they describe general polyhedral iteration spaces! J U1 L1 L2 U2 I For a given I, the J co-ordinate of a point in the iteration space of the loop nest satisfies maxl1i),l2i)) <= J <= minu1i),u2i))

¹ º» ¼ ½» ¾ ¹ ¼ º ½ ½»¹ ¹ ¾ ¹ ¼ À º º Á 2 À¼ ¾ Ž ÃÂ Ž ÃÂ Ä Å ª ³ ² Æ«@ ª Ç ª? È? ³ ª > ² É² ² > ² ²³ Ê ª G ÌË ÌË A Æ? ³ ³ l ³ k Å ² ª ª ²³ È F³ ª? @ ³ Å ª ³ ² @F ² Ê k Å F Ç Ç ª «F³ >? k G @ ²³ @F G ª ³ ² ª ª? @ ² È F Ç Í > ² Å³ Ê Î Î ³ ª Î Î ³ ª ³ ² Å ²³ l Å ³ ª ³ ˆÏ

Ð À Ñ» ¹ ¹¼ ¼ ¹ º ¼ ½ ¼ ¹ ¾ ¾» 2» ¼ 2» ¹»»¹ ¼ ¾ º ½» Ñ ¼ ¾» º ½ ¼ ¾ º» 0 ¾» 0 ¼ ¹ ¼» ¾ À»¹ ¼» ¾ Ñ ¾» ~ Ò ÓKÔ 4 Õ 2Ö 2Ö ¾ Ö» 0 ¹ ¾ ½ ¹ ¾¹ Ö¼ º Ö 4 Øˆ

ˆ C Presentation sequence - one equation, several variables 2 x + 3 y = - several equations, several variables 2x + 3 y + z = 3x + 4 y = 3 - equations inequalities 2x + 3 y = x <= y <= -9 Diophatine equations use integer Gaussian elimination Solve equalities first then use Fourier-Motzkin elimination

ˆ T One equation, many variables Thm The linear Diophatine equation a1 x1 + a2 x2 +...+ an xn = c has integer solutions iff gcda1,a2,...,an) divides c. Examples 1) 2x = 3 No solutions 2) 2x = One solution x = 3 3) 2x + y = 3 GCD2,1) = 1 which divides 3. Solutions x = t, y = 3-2t) 4) 2x + 3y = 3 GCD2,3) = 1 which divides 3. Let z = x + floor3/2) y = x + y Rewrite equation as 2z + y = 3 Solutions z = t => x = 3t - 3) y = 3-2t) y = 3-2t) Intuition Think of underdetermined systems of eqns over reals. Caution Integer constraint => Diophantine system may have no solns

ˆ n Thm The linear Diophatine equation a1 x1 + a2 x2 +...+ an xn = c has integer solutions iff gcda1,a2,...,an) divides c. Proof Base case WLOG, assume that all coefficients a1,a2,...an are positive. We prove only the IF case by induction, the proof in the other direction is trivial. Induction is on minsmallest coefficient, number of variables). If # of variables = 1), then equation is a1 x1 = c which has integer solutions if a1 divides c. If smallest coefficient = 1), then gcda1,a2,...,an) = 1 which divides c. Wlog, assume that a1 = 1, and observe that the equation has solutions of the form c - a2 t2 - a3 t3 -...-an tn, t2, t3,...tn). Inductive case Suppose smallest coefficient is a1, and let t = x1 + floora2/a1) x2 +...+ flooran/a1) xn In terms of this variable, the equation can be rewritten as a1) t + a2 mod a1) x2 +...+ an mod a1) xn = c 1) where we assume that all terms with zero coefficient have been deleted. Observe that 1) has integer solutions iff original equation does too. Now gcda,b) = gcda mod b, b) => gcda1,a2,...,an) = gcda1, a2 mod a1),..,an mod a1)) => gcda1, a2 mod a1),..,an mod a1)) divides c. If a1 is the smallest co-efficient in 1), we are left with 1 variable base case. Otherwise, the size of the smallest co-efficient has decreased, so we have made progress in the induction.

Ï Summary Eqn a1 x1 + a2 x2 +...+ an xn = c - Does this have integer solutions? = Does gcda1,a2,...,an) divide c?

Ï ˆ Systems of Diophatine Equations Key idea use integer Gaussian elimination Example 2x + 3y + 4z = x - y + 2z = => 2 3 4 1-1 2 x y z = It is not easy to determine if this Diophatine system has solutions. Easy special case lower triangular matrix 1 0 0-2 0 x y = => x = y = 3 z z = arbitrary integer Question Can we convert general integer matrix into equivalent lower triangular system? INTEGER GAUSSIAN ELIMINATION

Ï ã Unimodular Column Operations a) Interchange two columns 2 3 3 2 0 1 1 0 b) Negate a column Check Let x,y satisfy first eqn. Let x,y satisfy second eqn. x = y, y = x Check 2 3 1 0 0-1 2-3 - x = x, y = - y c) Add an integer multiple of one column to another 2 3 2 1 n = -1 1-1 1 0 1 Check x = x + n y y = y

Ï ä Example 2 3 4 1-1 2 x y z = 2 3 4 2 3 0 2 1 0 0 1 0 => => => => 1-1 2 1-1 0 1-2 0-2 0 1 0 0-2 0 1 0-2 1-1 0 1 0 0 0 1 0 0 1 0 0 1 0-2 1 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0-2 0 x x = = => y y = 3 z z = t => x y z = -1 3-2 1-2 0 0 0 1 3 t 4-2t = -1 t

$ Systems of Inequalities 1

Given system of inequalities of the form Ax b $ Goals determine if system has an integer solution enumerate all integer solutions 2

bounds for x 2) and 3) Upper bounds for x 1) and 4) Lower bounds for y 2) and 4) Upper bounds for y 1) and 3) Lower $ Running example 3x + 4y 1 1) 4x + y 2) 4x y 20 3) 2x 3y 9 4) 3

$ MATLAB graphs y 2x 3y= 9 4x+y= 4 3 2 1 3x+4y=1 4x y=20 0 0 1 2 3 4 9 10 x 4

$ Code for enumerating integer points in polyhedron see graph) Outer loop, Inner loop X DO =d4=3e,b4=13c DO X=dmax1=3 4y=3 9=2 + 3y=2)e,bmin + y=4 14 y=4)c... Outer loop X, Inner loop DO X=1, 9 DO =dmax4 3y=4 4x 20)=)e,bmin 4x= 2x + 9)=3)c... How do we can determine loop bounds?

elimination variable elimination technique for Fourier-Motzkin inequalities $ 3x + 4y 1 ) 4x + y ) 4x y 20 ) 2x 3y 9 ) Let us project out x. First, express all inequalities as upper or lower bounds on x. x 1=3 4y=3 9) x 14 y=4 10) x + y=4 11) x 9=2 + 3y=2 12)

any y, if there is an x that satises all inequalities, then every For bound on x must be less than or equal to every upper bound lower a new system of inequalities from each pair upper,lower) Generate bounds. $ on x. + y=4 1=3 4y=3Inequalities3 1) + y=4 9=2 + 3y=2Inequalities3 4) 14 y=4 1=3 4y=3Inequalities2 1) 14 y=4 9=2 + 3y=2Inequalities2 4)

$ Simplify y 4=3 y 3 y 104= y 4=13 => max4=3 3) y min104= 4=13) => 4=3 y 4=13 This means there are rational solutions to original system of inequalities.

$ We can now express solutions in closed form as follows 4=3 y 4=3 max1=3 4y=3 9=2 + 3y=2) x min + y=4 14 y=4) 9

elimination iterative algorithm Fourier-Motzkin step Iterative $ obtain reduced system by projecting out a variable if reduced system has a rational solution, so does the original Termination no variables left Projection along variable x Divide inequalities into three categories a 1 y + a 2 z + c 1 no x) b 1 x c 2 + b 2 y + b 3 z + upper bound) d 1 x c 3 + d 2 y + d 3 z + lower bound) New system of inequalities All inequalities that do not involve x Each pair lower,upper) bounds gives rise to one inequality b 1 [c 3 + d 2 y + d 3 z + ] d 1 [c 2 + b 2 y + b 3 z + ] 10

If y 1 z 1 ) satises the reduced system, then Theorem 1 y 1 z 1 ) satises the original system, where x 1 is a rational x $ number between min1=b 1 c 2 + b 2 y 1 + b 3 z 1 + ) ) over all upper bounds) and max1=d 1 c 3 + d 2 y 1 + d 3 z 1 + ) ) over all lower bounds) Proof trivial 11

If reduced system has no integer solutions, neither does Corollary original system. the true Reduced system has integer solutions => original system Not too. does problem Multiplying one inequality by b 1 and other by d 1 is Key guaranteed to preserve "integrality" cf. equalities) not projection If all upper bound coecients b i Exact coecients d i bound $ What can we conclude about integer solutions? original system - no integers in original polyhedron - projected system contains integers [ ] X projected system or all lower happen to be 1, then integer solution to reduced system implies integer solution to original system. 12

because there are integers between 4/3 and 4/13, we cannot Just there are integers in feasible region. assume if gap between lower and upper bounds is greater than or However, to 1 for some integer value of y, there must be an integer in equal More accurate algorithm for determining existence $ y 2x 3y= 9 4x+y= 4 3 2 1 3x+4y=1 4x y=20 0 0 1 2 3 4 9 10 x feasible region. 1

shadow region of y for which gap between upper and lower Dark of x is guaranteed to be greater than or equal to 1. bounds $ Determining dark shadow region Ordinary FM elimination x u, x l => u l Dark shadow x u, x l => u l + 1 1

$ For our example, dark shadow projection along x gives system + y=4 1=3 4y=3 + 1Inequalities3 1) + y=4 9=2 + 3y=2 + 1Inequalities3 4) 14 y=4 1=3 4y=3 + 1Inequalities2 1) 14 y=4 9=2 + 3y=2 + 1Inequalities2 4) => /13 y 1/3 There is an integer value of y in this range => integer in polyhedron. 1

More accurate estimate of dark shadow y x = 1/b1c2+b2y+b3z+...) y1 x = 1/d1c3+d2y+d3z+...) gap is some multiple of 1/b1 gap is some multiple of 1/d1 $ For integer values of y1,z1,..., there is no integer value x1 between lower and upper bounds if 1/d1c3+d2y1+d3z1+...) - 1/b1c2+b2y1+b3z1+...) +1/b1+1/d1 <= 1 This means there is an integer between upper and lower bounds if 1/d1c3+d2y1+d3z1+...) - 1/b1c2+b2y1+b3z1+...) +1/b1+1/d1 > 1 To convert this to >=, notice that smallest change of lhs value is 1/b1d1. So the inequality is => 1/d1c3+d2y1+d3z1+...) - 1/b1c2+b2y1+b3z1+...) +1/b1+1/d1 >= 1 + 1/b1d1 1/d1c3+d2y1+d3z1+...) - 1/b1c2+b2y1+b3z1+...) >= 1-1/b1)1-1/d1) 19

If b 1 = 1) or d 1 = 1), dark shadow constraint = real Note constraint shadow $ 20

$ Example 3x 1 4y 4x 20 + y Real shadow 20 + y) 3 41 4y) Dark shadow 20 + y) 3 41 4y) 12 Dark shadow improved) 20 + y) 3 41 4y) 21

may still be integer points nestled closely between an upper There lower bound. and $ What if dark shadow has no integers? dark shadow [ ] X projected system 22

One enumeration idea splintering dark shadow $ X Scan the corners with hyperplanes, looking for integer points. Generate a succession of problems in which each lower bound is replaced with a sequence of hyperplanes. How many hyperplanes are needed? Equation for lower bound x = 1/b1c2+b2y+b3z+...) Hyperplanes x = 1/b1c2+b2y+b3z+...) x = 1/b1c2+b2y+b3z+...)+ 1/b1 x = 1/b1c2+b2y+b3z+...)+ 2/b1 x = 1/b1c2+b2y+b3z+...)+ 3/b1... x = 1/b1c2+b2y+b3z+...)+ 1 in dark shadow region if this is integer, so is ) 24

Summary Two integer linear programming problems Is there an integer point within a polyhedron Ax b where A is an integer matrix and b is an integer vector? used for dependence analysis Enumerate the integer points within a polyhedron Ax b where A is an integer matrix and b is an integer vector. used for code generation after affine loop transformation