Automatic Verification of Pointer Programs Using Grammar-Based Shape Analysis

Oukseh Lee (1), Hongseok Yang (2), and Kwangkeun Yi (3)

(1) Dept. of Computer Science & Engineering, Hanyang University, Korea
(2) ERC-ACI, Seoul National University, Korea
(3) School of Computer Science & Engineering, Seoul National University, Korea

Abstract. We present a program analysis that can automatically discover the shape of complex pointer data structures. The discovered invariants are then used to verify the absence of safety errors in the program, or to check whether the program preserves the data consistency. Our analysis extends the shape analysis of Sagiv et al. with grammar annotations, which can precisely express the shape of complex data structures. We demonstrate the usefulness of our analysis with binomial heap construction and the Schorr-Waite tree traversal. For a binomial heap construction algorithm, our analysis returns a grammar that precisely describes the shape of a binomial heap; for the Schorr-Waite tree traversal, our analysis shows that at the end of the execution, the result is a tree and there are no memory leaks.

1 Introduction

We show that static program analysis can automatically verify pointer programs, such as binomial heap construction and the Schorr-Waite tree traversal. The verified properties are: for a binomial heap construction algorithm, our analysis verifies that the returned heap structure is a binomial heap; for the Schorr-Waite tree traversal, it verifies that the output tree is a binary tree, and there are no memory leaks. In both cases, the analysis took less than 0.2 second in Intel Pentium 3.0C with 1GB memory, and its result is simple and human-readable. Note that although these programs handle regular heap structures such as binomial heaps and trees, the topology of pointers (e.g., cycles) and their imperative operations (e.g., pointer swapping) are fairly challenging for fully automatic static verification without any annotation from the programmer.

The static analysis is an extension to Sagiv et al.'s shape analysis [13] by grammars. To improve accuracy, we associate grammars, which finitely summarize run-time heap structures, with the summary nodes of the shape graphs.
This enrichment of the shape graph by grammars provides an ample space for precisely capturing the imperative effects on heap structures. The grammar is unfolded to expose an exact heap structure on demand. The grammar is also folded to replace an exact heap structure by an abstract nonterminal. To ensure the termination of the analysis, the grammar merges multiple production rules into a single one, and unifies multiple nonterminals; this simplification makes the grammar size remain within a practical bound.

Acknowledgment (first-page footnote). Lee and Yi were supported by the Brain Korea 21 project in 2004, and Yang was supported by R08-2003-000-10370-0 from the Basic Research Program of the Korea Science & Engineering Foundation.

The analysis's correctness is proved via separation logic [12, 11]. The analysis is a composition of abstract operations over the grammar-based shape graphs. The semantics (concretization) of the shape graphs is defined as assertions in separation logic. Each abstract operator is proved safe by showing that the separation-logic assertion for the input graph implies that for the output graph. The input program $C$ wrapped by the input and output assertions, $\{P\}\,C\,\{Q\}$, is always a provable Hoare triple by the separation-logic proof rules.

The main limitation of our analysis is that it cannot handle DAGs and general graphs. To overcome this limitation, we need to use a more general grammar, where the nonterminals can talk about shared cells.

Related Work. We borrowed several interesting ideas from the shape analysis [14]. Our analysis represents a program invariant using a set of shape graphs where each shape graph consists of either concrete or abstract nodes. It uses the idea of refining an abstract node, often called focus or materialization, and also the idea of merging the shape graphs which have a similar structure [9, 5]. The difference is the use of grammar; it is the main reason for the improved precision of our analysis. Another difference is that our analysis separates node-summarizing criteria from the properties of the summary nodes. Normally, the shape analysis of Sagiv et al. partitions all the concrete nodes according to the instrumentation predicates that they satisfy, and groups each partition into a single summary node. Thus, two different summary nodes must satisfy different sets of instrumentation predicates. Our analysis, on the other hand, groups the concrete nodes using the most approximate grammar: each group is a maximal set of concrete nodes that can be expressed by the most approximate grammar. Then, the analysis summarizes each group by a single summary node, and annotates the summary node with a new grammar that best describes the pointer structure of the summarized concrete nodes.
As a consequence, two different summary nodes in our analysis can have identical grammar annotations.

Graph type [7, 10] and shape type [6] are also closely related to our work. Both of them express invariants of heap objects (or data structures) using grammar-based languages, which are more expressive than the grammars we used. However, they assume that all the loop invariants of a program are provided, while our work infers such invariants.

Outline. Section 2 describes the source programming language. Section 3 overviews the separation logic that we use to give the meaning of abstract values. Then, we explain the key ideas of our analysis, using a simpler version that can handle tree-like structures with no shared nodes. Sections 4 and 5 explain the abstract domain and abstract operators, and Section 6 defines the analyzer. The simpler version is extended to the full analysis in Section 7. Section 8 demonstrates the accuracy of our analysis using a binomial heap construction algorithm and the Schorr-Waite tree traversing algorithm.

2 Programming Language

We use the standard while language with additional pointer operations.

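Vars: x
Fields: f ∈ {0, 1}
Boolean Expressions: B ::= x = y | !B
Commands: C ::= x := nil | x := y | x := new | x := y->f | x->f := y | C; C | if B C C | while B C

This language assumes that every heap cell is binary, having fields 0 and 1. A heap cell is allocated by x := new, and the contents of such an allocated cell are accessed by the field-dereference operation ->. All the other constructs in the language are standard.

3 Separation Logic with Recursive Predicates

Let $\mathsf{Loc}$ and $\mathsf{Val}$ be unspecified infinite sets such that $\mathsf{nil} \notin \mathsf{Loc}$ and $\mathsf{Loc} \cup \{\mathsf{nil}\} \subseteq \mathsf{Val}$. We consider separation logic for the following semantic domains.

\[ \mathsf{Stack} = \mathsf{Vars} \rightharpoonup_{fin} \mathsf{Val} \qquad \mathsf{Heap} = \mathsf{Loc} \rightharpoonup_{fin} \mathsf{Val} \times \mathsf{Val} \qquad \mathsf{State} = \mathsf{Stack} \times \mathsf{Heap} \]

This domain implies that a state has the stack and heap components, and that the heap component of a state has finitely many binary cells. The assertion language in separation logic is given by the following grammar (see footnote 4):

\[ P ::= E = E \mid \mathsf{emp} \mid (E \mapsto E, E) \mid P * P \mid \mathsf{true} \mid P \wedge P \mid P \vee P \mid \neg P \mid \forall x.\, P \]

Separating conjunction $P * Q$ is the most important, and it expresses the splitting of heap storage; $P * Q$ means that the heap can be split into two parts, so that $P$ holds for the one and $Q$ for the other. We often use precise equality and iterated separating conjunction, both of which we define as syntactic sugar. Let $X$ be a finite set $\{x_1, \ldots, x_n\}$ where all the $x_i$ are different.

\[ E \doteq E' \;=\; (E = E') \wedge \mathsf{emp} \qquad \circledast_{x \in X} A_x \;=\; \mathsf{if}\ (X = \emptyset)\ \mathsf{then}\ \mathsf{emp}\ \mathsf{else}\ (A_{x_1} * \cdots * A_{x_n}) \]

In this paper, we use the extension of the basic assertion language with recursive predicates [15]:

\[ P ::= \ldots \mid \alpha(E, \ldots, E) \mid \mathsf{rec}\ \Gamma\ \mathsf{in}\ P \qquad \Gamma ::= \alpha(x_1, \ldots, x_n) = P \mid \Gamma, \Gamma \]

The extension allows the definition of new recursive predicates by least-fixed points in $\mathsf{rec}\ \Gamma\ \mathsf{in}\ P$, and the use of such defined recursive predicates in $\alpha(E, \ldots, E)$. To ensure the existence of the least-fixed point in $\mathsf{rec}\ \Gamma\ \mathsf{in}\ P$, we will consider only well-formed $\Gamma$ where all recursively defined predicates appear in positive positions.

A recursive predicate in this extended language means a set of heap objects. A heap object is a pair of location (or locations) and heap. Intuitively, the first component denotes the starting address of a data structure, and the second the cells in the data structure.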
For instance, when a linked list is seen as a heap object, the location of the head of the list becomes the first component, and the cells in the linked list the second component.

The precise semantics of this assertion language is given by a forcing relation $\models$. For a state $(s, h)$ and an environment $\eta$ for recursively defined predicates, we define inductively when an assertion $P$ holds for $(s, h)$ and $\eta$. We show the sample clauses below; the full definition appears in [8].

Footnote 4. The assertion language also has the adjoint of $*$. But this adjoint is not used in this paper, so we omit it here.

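\[ (s, h), \eta \models \alpha(E) \iff ([[E]]s, h) \in \eta(\alpha) \]
\[ (s, h), \eta \models \mathsf{rec}\ \alpha(x) = P\ \mathsf{in}\ Q \iff (s, h), \eta[\alpha \mapsto k] \models Q \]

where $k = \mathsf{lfix}\ \lambda k_0.\ \{(v', h') \mid (s[x \mapsto v'], h'), \eta[\alpha \mapsto k_0] \models P\}$.

4 Abstract Domain

Shape Graph. Our analysis interprets a program as a (nondeterministic) transformer of shape graphs. A shape graph is an abstraction of concrete states; this abstraction maintains the basic structure of the state, but abstracts away all the other details. For instance, consider a state $([x \mapsto 1, y \mapsto 3], [1 \mapsto \langle 2, \mathsf{nil} \rangle, 2 \mapsto \langle \mathsf{nil}, \mathsf{nil} \rangle, 3 \mapsto \langle 1, 3 \rangle])$. We obtain a shape graph from this state in two steps. First, we replace the specific addresses, such as 1 and 2, by symbolic locations; we introduce symbols $a$, $b$, $c$, and represent the state by $([x \mapsto a, y \mapsto c], [a \mapsto \langle b, \mathsf{nil} \rangle, b \mapsto \langle \mathsf{nil}, \mathsf{nil} \rangle, c \mapsto \langle a, c \rangle])$. Note that this process abstracts away the specific addresses and just keeps the relationship between the addresses. Second, we abstract heap cells $a$ and $b$ by a grammar. Thus, this step transforms the state to $([x \mapsto a, y \mapsto c], [a \mapsto \mathit{tree}, c \mapsto \langle a, c \rangle])$ where $a \mapsto \mathit{tree}$ means that $a$ is the address of the root of a tree, whose structure is summarized by grammar rules for nonterminal $\mathit{tree}$.

The formal definition of a shape graph is given as follows:

\[ \mathsf{SymLoc} = \{a, b, c, \ldots\} \qquad \mathsf{NonTerm} = \{\alpha, \beta, \gamma, \ldots\} \]
\[ \mathsf{Graph} = (\mathsf{Vars} \rightharpoonup_{fin} \mathsf{SymLoc}) \times (\mathsf{SymLoc} \rightharpoonup_{fin} \{\mathsf{nil}\} + \mathsf{SymLoc}^2 + \mathsf{NonTerm}) \]

Here the set of nonterminals is disjoint from $\mathsf{Vars}$ and $\mathsf{SymLoc}$; these nonterminals represent recursive heap structures such as tree or list. Each shape graph has two components $(s, g)$. The first component $s$ maps stack variables to symbolic locations. The other component $g$ describes heap cells reachable from each symbolic location. For each $a$, either no heap cells can be reached from $a$, i.e., $g(a) = \mathsf{nil}$; or, $a$ is a binary cell with contents $\langle b, c \rangle$; or, the cells reachable from $a$ form a heap object specified by a nonterminal $\alpha$. We also require that $g$ describes all the cells in the heap; for instance, if $g$ is the empty partial function, it means the empty heap.

The semantics (or concretization) of a shape graph $(s, g)$ is given by a translation into an assertion in separation logic:

\[ \mathsf{means}_v(a, \mathsf{nil}) = a \doteq \mathsf{nil} \qquad \mathsf{means}_v(a, \alpha) = \alpha(a) \qquad \mathsf{means}_v(a, \langle b, c \rangle) = (a \mapsto b, c) \]
\[ \mathsf{means}_s(s, g) = \exists \vec{a}.\ \Big( \circledast_{x \in dom(s)}\ x \doteq s(x) \Big) * \Big( \circledast_{a \in dom(g)}\ \mathsf{means}_v(a, g(a)) \Big) \]

The translation function $\mathsf{means}_s$ calls a subroutine $\mathsf{means}_v$ to get the translation of the value of $g(a)$, and then it existentially quantifies all the symbolic locations appearing in the translation. For instance, $\mathsf{means}_s([x \mapsto a, y \mapsto c], [a \mapsto \mathit{tree}, c \mapsto \langle a, c \rangle])$ is $\exists a c.\ (x \doteq a) * (y \doteq c) * \mathit{tree}(a) * (c \mapsto a, c)$.

When we present a shape graph, we interchangeably use the set notation and a graph picture. Each variable or symbolic location becomes a node in a graph, and $s$ and $g$ are represented by edges or annotations. For instance, we draw the shape graph $(s, g) = ([x \mapsto a], [a \mapsto \langle b, c \rangle, b \mapsto \mathsf{nil}, c \mapsto \alpha])$ as:

[Figure: picture of this shape graph; $x$ points to node $a$, whose two outgoing edges lead to $b$ (annotated nil) and $c$ (annotated $\alpha$).]

Note that the pair $g(a)$ is represented by two edges (the left one is for field 0 and the right one for field 1), and the non-pair values $g(b)$ and $g(c)$ by annotations to the nodes.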
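The translation can be tried out on small examples. The following Python sketch is only an illustration and not the authors' implementation: it assumes that a shape graph is a pair of dictionaries, that pairs are tuples and nonterminals are plain strings, and it prints the assertion in an ASCII notation chosen here for convenience (`=.` for precise equality, `|->` for points-to, `*` for separating conjunction).

```python
NIL = "nil"


def means_v(loc, value):
    """Translate one binding loc -> value of g into an assertion string."""
    if value == NIL:
        return f"({loc} =. nil)"
    if isinstance(value, tuple):          # a binary cell <left, right>
        left, right = value
        return f"({loc} |-> {left}, {right})"
    return f"{value}({loc})"              # a nonterminal such as tree


def means_s(s, g):
    """Translate a shape graph (s, g); all symbolic locations are quantified."""
    locs = set(s.values()) | set(g)
    for value in g.values():
        if isinstance(value, tuple):
            locs |= set(value)
    stack_part = [f"({x} =. {s[x]})" for x in sorted(s)]
    heap_part = [means_v(a, g[a]) for a in sorted(g)]
    body = " * ".join(stack_part + heap_part) or "emp"
    if not locs:
        return body
    return f"exists {' '.join(sorted(locs))}. {body}"


# The example from the text: ([x -> a, y -> c], [a -> tree, c -> <a, c>]).
print(means_s({"x": "a", "y": "c"}, {"a": "tree", "c": ("a", "c")}))
```

For the example graph this prints the same four conjuncts as the assertion given in the text, with both symbolic locations existentially quantified.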

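Grammar. A grammar gives the meaning of nonterminals in a shape graph. We define a grammar $R$ as a finite partial function from nonterminals (the lhs of production rules) to $\wp_{nf}(\{\mathsf{nil}\} + (\{\mathsf{nil}\} + \mathsf{NonTerm})^2)$ (the rhs of production rules), where $\wp_{nf}(X)$ is the family of all nonempty finite subsets of $X$.

\[ \mathsf{Grammar} = \mathsf{NonTerm} \rightharpoonup_{fin} \wp_{nf}(\{\mathsf{nil}\} + (\{\mathsf{nil}\} + \mathsf{NonTerm})^2) \]

The set $R(\alpha)$ contains all the possible shapes of heap objects for $\alpha$. If $\mathsf{nil} \in R(\alpha)$, $\alpha$ can be the empty heap object. If $\langle \beta, \gamma \rangle \in R(\alpha)$, then some heap object for $\alpha$ can be split into a root cell, the left heap object $\beta$, and the right heap object $\gamma$. For instance, if $R(\mathit{tree}) = \{\mathsf{nil}, \langle \mathit{tree}, \mathit{tree} \rangle\}$ (i.e., in the production rule notation, $\mathit{tree} ::= \mathsf{nil} \mid \langle \mathit{tree}, \mathit{tree} \rangle$), then $\mathit{tree}$ represents binary trees.

In our analysis, we use only well-formed grammars, where all nonterminals appearing in the range of a grammar are defined in the grammar.

The meaning $\mathsf{means}_g(R)$ of a grammar $R$ is given by a recursive predicate declaration $\Gamma$ in separation logic. $\Gamma$ is defined exactly for $dom(R)$, and satisfies the following: when $\mathsf{nil} \notin R(\alpha)$, $\Gamma(\alpha)$ is

\[ \alpha(a) = \bigvee_{\langle v, w \rangle \in R(\alpha)} \exists b c.\ (a \mapsto b, c) * \mathsf{means}_v(b, v) * \mathsf{means}_v(c, w), \]

where neither $b$ nor $c$ appears in $a$, $v$ or $w$; otherwise, $\Gamma(\alpha)$ is identical to the above except that $a \doteq \mathsf{nil}$ is added as a disjunct. For instance, $\mathsf{means}_g([\mathit{tree} \mapsto \{\mathsf{nil}, \langle \mathit{tree}, \mathit{tree} \rangle\}])$ is $\{\mathit{tree}(a) = a \doteq \mathsf{nil} \vee \exists b c.\ (a \mapsto b, c) * \mathit{tree}(b) * \mathit{tree}(c)\}$.

Abstract Domain. The abstract domain $D$ for our analysis consists of pairs of a shape graph set and a grammar: $D = \{\top\} + \wp_{nf}(\mathsf{Graph}) \times \mathsf{Grammar}$. The element $\top$ indicates that our analysis fails to produce any meaningful results for a given program because the program has safety errors, or the program uses data structures too complex for our analysis to capture. The meaning of each abstract state $(G, R)$ in $D$ is

\[ \mathsf{means}(G, R) = \mathsf{rec}\ \mathsf{means}_g(R)\ \mathsf{in}\ \bigvee_{(s, g) \in G} \mathsf{means}_s(s, g). \]

5 Normalized Abstract States and Normalization

The main strength of our analysis is to automatically discover a grammar that describes, at an intuitive level, invariants for heap data structures, and to abstract concrete states according to this discovered grammar. This inference of high-level grammars is mainly done by the normalization function from $D$ to a subdomain $D_\nabla$ of normalized abstract states.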
In this section, we explain these two notions, normalized abstract states and the normalization function.

5.1 Normalized Abstract States

An abstract state $(G, R)$ is normalized if it satisfies the following two conditions. First, all the shape graphs $(s, g)$ in $G$ are abstract enough: all the recognizable heap objects are replaced by nonterminals. Note that this condition on $(G, R)$ is about individual shape graphs in $G$. We call a shape graph normalized if it satisfies this condition. Second, an abstract state does not have redundancies: no two shape graphs are similar, and all nonterminals have non-similar definitions.

[Fig. 1. Examples of the Normalized Shape Graphs and Similar Shape Graphs. Panels: (a) not normalized; (b) normalized; (c) $(s_1, g_1) \not\sim (s_2, g_2)$ but $(s_1, g_1) \sim (s_3, g_3)$.]

Normalized Shape Graphs. A shape graph is normalized when it is maximally folded. A symbolic location $a$ is foldable in $(s, g)$ if $g(a)$ is a pair and there is no path from $a$ to a shared symbolic location, that is, one that is referred to more than once. When $dom(g)$ of a shape graph $(s, g)$ does not have any foldable locations, we say that $(s, g)$ is normalized. For instance, Figure 1.(a) is not normalized, because it contains a foldable location: a pair that does not reach any shared symbolic location. On the other hand, Figure 1.(b) is normalized, because all the pairs in the graph can reach the shared symbolic location $e$.

Similarity. We define three notions of similarity: one for shape graphs, another for two cases of the grammar definitions, and the third for the grammar definitions of two nonterminals.

Two shape graphs are similar when they have similar structures. Let $S$ be a substitution that renames symbolic locations. Two shape graphs $(s, g)$ and $(s', g')$ are similar up to $S$, denoted $(s, g) \sim^G_S (s', g')$, if and only if

1. $dom(s) = dom(s')$ and $S(dom(g)) = dom(g')$;
2. for all $x \in dom(s)$, $S(s(x)) = s'(x)$; and
3. for all $a \in dom(g)$, if $g(a)$ is a pair $\langle b, c \rangle$ for some $b$ and $c$, then $g'(S(a)) = \langle S(b), S(c) \rangle$; if $g(a)$ is not a pair, neither is $g'(S(a))$.

Intuitively, two shape graphs are $S$-similar when equating nil and all nonterminals makes the graphs identical up to renaming $S$. We say that $(s, g)$ and $(s', g')$ are similar, denoted $(s, g) \sim (s', g')$, if and only if there is a renaming substitution $S$ such that $(s, g) \sim^G_S (s', g')$. For instance, in Figure 1.(c), $(s_1, g_1)$ and $(s_2, g_2)$ are not similar because we cannot find a renaming substitution $S$ such that $S(s_1(x)) = S(s_1(y))$ (condition 2). However, $(s_1, g_1)$ and $(s_3, g_3)$ are similar because the renaming substitution $\{d/a, e/b, f/c\}$ makes $(s_1, g_1)$ identical to $(s_3, g_3)$ when nil and all nonterminals are erased from the graphs.
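The three conditions for similarity up to a given renaming can be checked mechanically. The Python sketch below is an illustration under stated assumptions: shape graphs are pairs of dictionaries as in the set notation, pairs are tuples, and the renaming `S` is a dictionary assumed to be injective (locations it does not mention are left unchanged). The function name `similar_up_to` and the two sample graphs are hypothetical; they are not the graphs of Figure 1, and the sketch does not search for a suitable `S`.

```python
def similar_up_to(S, graph1, graph2):
    """Check (s1, g1) ~ (s2, g2) up to the renaming S (conditions 1-3)."""
    (s1, g1), (s2, g2) = graph1, graph2

    def ren(loc):
        return S.get(loc, loc)

    # Condition 1: same variables, and S maps dom(g1) onto dom(g2).
    if set(s1) != set(s2) or {ren(a) for a in g1} != set(g2):
        return False
    # Condition 2: variables point to corresponding locations.
    if any(ren(s1[x]) != s2[x] for x in s1):
        return False
    # Condition 3: pairs correspond to renamed pairs; non-pairs to non-pairs.
    for a, value in g1.items():
        other = g2[ren(a)]
        if isinstance(value, tuple):
            if other != (ren(value[0]), ren(value[1])):
                return False
        elif isinstance(other, tuple):
            return False
    return True


# Hypothetical graphs: same structure, different names and annotations.
g_one = ({"x": "a"}, {"a": ("b", "c"), "b": "alpha", "c": "nil"})
g_two = ({"x": "d"}, {"d": ("e", "f"), "e": "nil", "f": "beta"})
print(similar_up_to({"a": "d", "b": "e", "c": "f"}, g_one, g_two))
```

Note that nil and nonterminal annotations are never compared, which matches the intuition that similarity equates nil and all nonterminals.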
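Cases $e_1$ and $e_2$ in the grammar definitions are similar, denoted $e_1 \sim^C e_2$, if and only if either both $e_1$ and $e_2$ are pairs, or they are both non-pair values. The similarity $E_1 \sim^D E_2$ between grammar definitions $E_1$ and $E_2$ uses this case similarity: $E_1 \sim^D E_2$ if and only if, for all cases $e$ in $E_1$, $E_2$ has a case $e'$ similar to $e$ ($e \sim^C e'$), and vice versa. For example, in the grammar

\[ \alpha ::= \langle \beta, \mathsf{nil} \rangle \qquad \beta ::= \mathsf{nil} \mid \langle \beta, \mathsf{nil} \rangle \qquad \gamma ::= \langle \gamma, \gamma \rangle \mid \langle \alpha, \mathsf{nil} \rangle \]

the definitions of $\alpha$ and $\gamma$ are similar because both $\langle \gamma, \gamma \rangle$ and $\langle \alpha, \mathsf{nil} \rangle$ are similar to $\langle \beta, \mathsf{nil} \rangle$. But the definitions of $\alpha$ and $\beta$ are not similar since $\alpha$ does not have a case similar to nil.

Definition 1 (Normalized Abstract States). An abstract state $(G, R)$ is normalized if and only if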

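1. all shape graphs in $G$ are normalized;
2. for all $(s_1, g_1), (s_2, g_2) \in G$, we have $(s_1, g_1) \sim (s_2, g_2) \Rightarrow (s_1, g_1) = (s_2, g_2)$;
3. for all $\alpha \in dom(R)$ and all cases $e_1, e_2 \in R(\alpha)$, $e_1 \sim^C e_2$ implies that $e_1 = e_2$;
4. for all $\alpha, \beta$ in $dom(R)$, $R(\alpha) \sim^D R(\beta)$ implies that $\alpha = \beta$.

We write $D_\nabla$ for the set of normalized abstract states.

k-Bounded Normalized States. Unfortunately, the normalized abstract domain $D_\nabla$ does not ensure the termination of the analysis, because it has infinite chains. For each number $k$, we say that an abstract state $(G, R)$ is $k$-bounded iff all the shape graphs in $G$ use at most $k$ symbolic locations, and we define $D^k_\nabla$ to be the set of $k$-bounded normalized abstract states. This finite domain $D^k_\nabla$ is used in our analysis.

5.2 Normalization Function

The normalization function transforms $(G, R)$ to a normalized $(G', R')$ with further abstraction (i.e., $\mathsf{means}(G, R) \Rightarrow \mathsf{means}(G', R')$) (see footnote 5). It is defined by the composition of five subroutines:

\[ \mathsf{normalize} = \mathsf{bound}_k \circ \mathsf{simplify} \circ \mathsf{unify} \circ \mathsf{fold} \circ \mathsf{rmjunk}. \]

The first subroutine rmjunk removes all the imaginary sharing and garbage due to constant symbolic locations, so that it makes the real sharing and garbage easily detectable in syntax. The subroutine rmjunk applies the following two rules until an abstract state does not change. In the definition, $\uplus$ is a disjoint union of sets, and $\cdot$ is a union of partial maps with disjoint domains.

(alias) $(G \uplus \{(s \cdot [x \mapsto a],\ g \cdot [a \mapsto \mathsf{nil}])\},\ R) \rightsquigarrow (G \uplus \{(s \cdot [x \mapsto b],\ g \cdot [a \mapsto \mathsf{nil}, b \mapsto \mathsf{nil}])\},\ R)$
where $a$ should appear in $(s, g)$ and $b$ is fresh.

(gc) $(G \uplus \{(s,\ g \cdot [a \mapsto \mathsf{nil}])\},\ R) \rightsquigarrow (G \uplus \{(s, g)\},\ R)$
where $a$ does not appear in $(s, g)$.

For instance, given a shape graph $([x \mapsto a, y \mapsto a], [a \mapsto \mathsf{nil}, c \mapsto \mathsf{nil}])$, (gc) collects the "garbage" $c$, and (alias) eliminates the "imaginary sharing" between $x$ and $y$ by renaming $a$ in $y$. So, the shape graph becomes $([x \mapsto a, y \mapsto b], [a \mapsto \mathsf{nil}, b \mapsto \mathsf{nil}])$.

The second subroutine fold converts a shape graph to a normal form, by replacing all foldable symbolic locations by nonterminals. The subroutine fold repeatedly applies the following rule until the abstract state does not change:

(fold) $(G \uplus \{(s,\ g \cdot [a \mapsto \langle b, c \rangle, b \mapsto v, c \mapsto v'])\},\ R) \rightsquigarrow (G \uplus \{(s,\ g \cdot [a \mapsto \alpha])\},\ R \cdot [\alpha \mapsto \{\langle v, v' \rangle\}])$
where neither $b$ nor $c$ appears in $(s, g)$, $\alpha$ is fresh, and $v$ and $v'$ are not pairs.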
The rule recognizes that the symbolic locations $b$ and $c$ are accessed only via $a$. Then, it represents cell $a$, plus the cells reachable from $b$ and $c$, by a nonterminal $\alpha$. Figure 2 shows how the (fold) rule folds a tree.

The third subroutine unify merges two similar shape graphs in $G$. Let $(s, g)$ and $(s', g')$ be shape graphs that are similar by the identity renaming (i.e., $(s, g) \sim^G_S (s', g')$ with $S$ the identity). Then, these two shape graphs are almost identical; the only exception is when $g(a)$ and $g'(a)$ are nonterminals or nil. unify eliminates all such differences in two shape graphs; if $g(a)$ and $g'(a)$ are nonterminals, say $\alpha$ and $\beta$, then unify changes $g$ and $g'$ so that they map $a$ to the same fresh nonterminal $\gamma$, and then it defines $\gamma$ to cover both $\alpha$ and $\beta$. The unify procedure applies the following rules to an abstract state $(G, R)$ until the abstract state does not change:

Footnote 5. The normalize function is reminiscent of the widening in [2, 3].

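[Fig. 2. Examples of (fold), (unify), and (unil). Panel (a): fold, with new grammar rules $\gamma ::= \langle \beta, \mathsf{nil} \rangle$ and $\delta ::= \langle \gamma, \alpha \rangle$. Panel (b): unify and unil, with new grammar rules $\gamma ::= R(\alpha) \cup R(\beta)$ and $\delta ::= R(\beta) \cup \{\mathsf{nil}\}$.]

(unify) $(G \uplus \{(s_1,\ g_1 \cdot [a_1 \mapsto \alpha_1]),\ (s_2,\ g_2 \cdot [a_2 \mapsto \alpha_2])\},\ R) \rightsquigarrow (G \uplus \{(S(s_1),\ S(g_1) \cdot [a_2 \mapsto \beta]),\ (s_2,\ g_2 \cdot [a_2 \mapsto \beta])\},\ R \cdot [\beta \mapsto R(\alpha_1) \cup R(\alpha_2)])$
where $(s_1,\ g_1 \cdot [a_1 \mapsto \alpha_1]) \sim^G_S (s_2,\ g_2 \cdot [a_2 \mapsto \alpha_2])$, $S(a_1) = a_2$, $\alpha_1 \neq \alpha_2$, and $\beta$ is fresh.

(unil) $(G \uplus \{(s_1,\ g_1 \cdot [a_1 \mapsto \alpha]),\ (s_2,\ g_2 \cdot [a_2 \mapsto \mathsf{nil}])\},\ R) \rightsquigarrow (G \uplus \{(S(s_1),\ S(g_1) \cdot [a_2 \mapsto \beta]),\ (s_2,\ g_2 \cdot [a_2 \mapsto \beta])\},\ R \cdot [\beta \mapsto R(\alpha) \cup \{\mathsf{nil}\}])$
where $(s_1,\ g_1 \cdot [a_1 \mapsto \alpha]) \sim^G_S (s_2,\ g_2 \cdot [a_2 \mapsto \mathsf{nil}])$, $S(a_1) = a_2$, and $\beta$ is fresh.

The (unify) rule recognizes two similar shape graphs that have different nonterminals at the same position, and replaces those nonterminals by a fresh nonterminal $\beta$ that covers the two nonterminals. The (unil) rule deals with two similar graphs that have, respectively, a nonterminal and nil at the same position. For instance, in Figure 2.(b), the left two shape graphs are unified by (unify) and (unil). We first replace the left children $\alpha$ and $\beta$ by $\gamma$ that covers both; that is, to a given grammar $R$, we add $[\gamma \mapsto R(\alpha) \cup R(\beta)]$. Then we replace the right children $\beta$ and nil by $\delta$ that covers both.

The fourth subroutine simplify reduces the complexity of a grammar by combining similar cases or similar definitions (see footnote 6). It applies three rules repeatedly:

- If the definition of a nonterminal has two similar cases $\langle \beta, v \rangle$ and $\langle \beta', v' \rangle$, and $\beta$ and $\beta'$ are different nonterminals, unify nonterminals $\beta$ and $\beta'$. Apply the same for the second field.
- If the definition of a nonterminal has two similar cases $\langle \beta, v \rangle$ and $\langle \mathsf{nil}, v' \rangle$, add the nil case to $R(\beta)$ and replace $\langle \mathsf{nil}, v' \rangle$ by $\langle \beta, v' \rangle$. Apply the same for the second field.
- If the definitions of two nonterminals are similar, unify the nonterminals.

Formally, the three rules are:

(case) $(G, R) \rightsquigarrow (G, R)\{\beta/\alpha\}$
where $\{\langle \alpha, v \rangle, \langle \beta, v' \rangle\} \subseteq R(\gamma)$ and $\alpha \neq \beta$. (same for the second field)

(nil) $(G,\ R \cdot [\alpha \mapsto E \uplus \{\langle \beta, v \rangle, \langle \mathsf{nil}, v' \rangle\}]) \rightsquigarrow (G,\ R'[\beta \mapsto R'(\beta) \cup \{\mathsf{nil}\}])$
where $R' = R \cdot [\alpha \mapsto E \cup \{\langle \beta, v \rangle, \langle \beta, v' \rangle\}]$. (same for the second field)

(def) $(G, R) \rightsquigarrow (G, R)\{\beta/\alpha\}$
where $R(\alpha) \sim^D R(\beta)$ and $\alpha \neq \beta$.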
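The three simplify rules can be sketched in Python on the grammar component alone. This is an illustrative sketch under assumptions, not the authors' implementation: a grammar is a dictionary from nonterminal names to sets of cases, a case is either the string `"nil"` or a 2-tuple, the shape graph set $G$ is ignored, and the order in which rules are tried is a choice made here. The helper `substitute(R, keep, drop)` merges the definition of `drop` into `keep` and renames `drop` everywhere.

```python
NIL = "nil"


def is_pair(case):
    return isinstance(case, tuple)


def similar_defs(e1, e2):
    """Definition similarity: each case has a same-kind case on the other side."""
    return {is_pair(c) for c in e1} == {is_pair(c) for c in e2}


def substitute(R, keep, drop):
    """Rename drop to keep everywhere and let keep cover both definitions."""
    def ren(v):
        return keep if v == drop else v

    out = {}
    for nt, cases in R.items():
        renamed = {(ren(c[0]), ren(c[1])) if is_pair(c) else c for c in cases}
        out.setdefault(ren(nt), set()).update(renamed)
    return out


def step(R):
    """Return the grammar after one rule application, or None if none applies."""
    for nt in sorted(R):
        pairs = sorted(c for c in R[nt] if is_pair(c))
        for i in (0, 1):
            for p in pairs:
                for q in pairs:
                    # (case): different nonterminals in field i of two pair cases.
                    if NIL not in (p[i], q[i]) and p[i] != q[i]:
                        return substitute(R, p[i], q[i])
                    # (nil): a nonterminal and nil in field i of two pair cases.
                    if p[i] != NIL and q[i] == NIL:
                        fixed = (p[i], q[1]) if i == 0 else (q[0], p[i])
                        new = {k: set(v) for k, v in R.items()}
                        new[nt] = (new[nt] - {q}) | {fixed}
                        new[p[i]].add(NIL)
                        return new
    names = sorted(R)
    for a in names:
        for b in names:
            # (def): two different nonterminals with similar definitions.
            if a < b and similar_defs(R[a], R[b]):
                return substitute(R, a, b)
    return None


def simplify(R):
    """Apply the rules until the grammar no longer changes."""
    R = {k: set(v) for k, v in R.items()}
    nxt = step(R)
    while nxt is not None:
        R = nxt
        nxt = step(R)
    return R


start = {
    "alpha": {NIL, ("beta", "beta"), ("gamma", "gamma")},
    "beta": {("gamma", "gamma")},
    "gamma": {(NIL, NIL)},
}
print(simplify(start))
```

On this starting grammar the sketch applies (case), then (nil) for each field, then (def), and ends with a single nonterminal whose definition is the binary-tree grammar.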
Here, $(G, R)\{\alpha/\beta\}$ substitutes $\alpha$ for $\beta$, and in addition, it removes the definition of $\beta$ from $R$ and re-defines $\alpha$ such that $\alpha$ covers both $\alpha$ and $\beta$:

Footnote 6. The simplify subroutine is similar to the widening operator in [4].

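\[ (G,\ R \cdot [\alpha \mapsto E_1, \beta \mapsto E_2])\{\alpha/\beta\} = (G\{\alpha/\beta\},\ R\{\alpha/\beta\} \cdot [\alpha \mapsto (E_1 \cup E_2)\{\alpha/\beta\}]). \]

For example, consider the following transitions:

\[ \alpha ::= \mathsf{nil} \mid \langle \beta, \beta \rangle \mid \langle \gamma, \gamma \rangle, \quad \beta ::= \langle \gamma, \gamma \rangle, \quad \gamma ::= \langle \mathsf{nil}, \mathsf{nil} \rangle \]
\[ \rightsquigarrow_{(case)}\ \alpha ::= \mathsf{nil} \mid \langle \beta, \beta \rangle, \quad \beta ::= \langle \beta, \beta \rangle \mid \langle \mathsf{nil}, \mathsf{nil} \rangle \]
\[ \rightsquigarrow_{(nil)}\ \alpha ::= \mathsf{nil} \mid \langle \beta, \beta \rangle, \quad \beta ::= \langle \beta, \beta \rangle \mid \langle \beta, \mathsf{nil} \rangle \mid \mathsf{nil} \]
\[ \rightsquigarrow_{(nil)}\ \alpha ::= \mathsf{nil} \mid \langle \beta, \beta \rangle, \quad \beta ::= \langle \beta, \beta \rangle \mid \mathsf{nil} \]
\[ \rightsquigarrow_{(def)}\ \alpha ::= \mathsf{nil} \mid \langle \alpha, \alpha \rangle \]

In the initial grammar, $\alpha$'s definition has the similar cases $\langle \beta, \beta \rangle$ and $\langle \gamma, \gamma \rangle$, so we apply $\{\beta/\gamma\}$ (case). In the second grammar, $\beta$'s definition has similar cases $\langle \beta, \beta \rangle$ and $\langle \mathsf{nil}, \mathsf{nil} \rangle$. Thus, we replace nil by $\beta$, and add the nil case to $\beta$'s definition (nil). We apply (nil) once more for the second field. In the fourth grammar, since $\alpha$ and $\beta$ have similar definitions, we apply $\{\alpha/\beta\}$ (def). As a result, we obtain the last grammar, which says that $\alpha$ describes binary trees.

The last subroutine $\mathsf{bound}_k$ checks the number of symbolic locations in each shape graph. The subroutine $\mathsf{bound}_k$ simply gives $\top$ when one of the shape graphs has more than $k$ symbolic locations, thereby ensuring the termination of the analysis (see footnote 7).

\[ \mathsf{bound}_k(G, R) = \mathsf{if}\ (\text{no } (s, g) \text{ in } G \text{ has more than } k \text{ symbolic locations})\ \mathsf{then}\ (G, R)\ \mathsf{else}\ \top \]

Lemma 1. For every abstract state $(G, R)$, $\mathsf{normalize}(G, R)$ always terminates, and its result is a $k$-bounded normalized abstract state.

6 Analysis

Our analyzer (defined in Figure 3) consists of two parts: the "forward analysis" of commands $C$, and the "backward analysis" of boolean expressions $B$. Both of these interpret $C$ and $B$ as functions on abstract states, and they accomplish the usual goals in static analysis: for an initial abstract state $(G, R)$, $[[C]](G, R)$ approximates the possible output states, and $[[B]](G, R)$ denotes the result of pruning some states in $(G, R)$ that do not satisfy $B$. One particular feature of our analysis is that it also checks the absence of memory errors, such as null-pointer dereference errors. Given a command $C$ and an abstraction $(G, R)$ for input states, the result $[[C]](G, R)$ of analyzing the command $C$ can be either some abstract state $(G', R')$ or $\top$.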
$(G', R')$ means that all the results of $C$ from $(G, R)$ are approximated by $(G', R')$, but in addition to this, it also means that no computations of $C$ from $(G, R)$ can generate memory errors. $\top$, on the other hand, expresses the possibility of memory errors, or indicates that a program uses data structures whose complexity goes beyond the current capability of the analysis.

The analyzer unfolds the grammar definition by calling the subroutine unfold. Given a shape graph $(s, g)$, a variable $x$ and a grammar $R$, the subroutine unfold first checks whether $g(s(x))$ is a nonterminal or not. If $g(s(x))$ is a nonterminal $\alpha$, unfold looks up the definition of $\alpha$ in $R$ and unrolls this definition in the shape graph $(s, g)$: for each case $e$ in $R(\alpha)$, it updates $g$ by $[s(x) \mapsto e]$. For instance, when $R(\beta) = \{\langle \beta, \gamma \rangle, \langle \delta, \delta \rangle\}$, $\mathsf{unfold}(([x \mapsto a], [a \mapsto \beta]), R, x)$ is the shape-graph set $\{([x \mapsto a], [a \mapsto \langle \beta, \gamma \rangle]),\ ([x \mapsto a], [a \mapsto \langle \delta, \delta \rangle])\}$.

Footnote 7. Limiting the number of symbolic locations to be at most $k$ ensures the termination of the analyzer in the worst case. When programs use data structures that our grammar captures well, the analysis usually terminates without using this $k$ limitation, and yields meaningful results.
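The unfolding step can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: shape graphs are pairs of dictionaries, pairs are tuples, $\top$ is modelled by the string `"TOP"`, and fresh symbolic locations are generated with the hypothetical names `l0`, `l1`, and so on. The sketch follows the more explicit form in which the two fields of the unrolled cell point to fresh locations that carry the values of the chosen case, and it returns $\top$ when the unfolded nonterminal may be nil, since dereferencing it could be a memory error.

```python
NIL = "nil"
TOP = "TOP"


def fresh(taken):
    """Return a location name of the form l<i> that is not in taken."""
    i = 0
    while f"l{i}" in taken:
        i += 1
    return f"l{i}"


def unfold(s, g, R, x):
    """Unroll the nonterminal at s(x); return a list of graphs, or TOP."""
    if x not in s:
        return TOP
    a = s[x]
    value = g.get(a)
    if isinstance(value, tuple):          # already a concrete binary cell
        return [(s, g)]
    if value in R and NIL not in R[value]:
        results = []
        for v, u in sorted(R[value]):     # one result graph per case <v, u>
            taken = set(g) | set(s.values())
            b = fresh(taken)
            c = fresh(taken | {b})
            g2 = dict(g)
            g2[a] = (b, c)
            g2[b] = v
            g2[c] = u
            results.append((s, g2))
        return results
    return TOP                            # nil, or a nonterminal that may be nil


R = {"beta": {("beta", "gamma"), ("delta", "delta")}}
for graph in unfold({"x": "a"}, {"a": "beta"}, R, "x"):
    print(graph)
```

With the grammar from the example in the text, the sketch produces two shape graphs, one per case of the unfolded nonterminal.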

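$[[C]] : D \to D$

$[[x := \mathsf{new}]](G, R) = (\{(s[x \mapsto a],\ g[a \mapsto \langle b, c \rangle, b \mapsto \mathsf{nil}, c \mapsto \mathsf{nil}]) \mid (s, g) \in G\},\ R)$ for new $a, b, c$
$[[x := \mathsf{nil}]](G, R) = (\{(s[x \mapsto a],\ g[a \mapsto \mathsf{nil}]) \mid (s, g) \in G\},\ R)$ for new $a$
$[[x := y]](G, R) =$ when $y \in dom(s)$ for all $(s, g) \in G$: $(\{(s[x \mapsto s(y)],\ g) \mid (s, g) \in G\},\ R)$
$[[x\texttt{->}0 := y]](G, R) =$ when $\mathsf{unfold}(G, R, x) = G'$ and $\forall (s, g) \in G'.\ y \in dom(s)$: $(\{(s,\ g[s(x) \mapsto \langle s(y), b \rangle]) \mid (s, g) \in G',\ g(s(x)) = \langle a, b \rangle\},\ R)$
$[[x := y\texttt{->}0]](G, R) =$ when $\mathsf{unfold}(G, R, y) = G'$: $(\{(s[x \mapsto a],\ g) \mid (s, g) \in G',\ g(s(y)) = \langle a, b \rangle\},\ R)$
$[[C_1; C_2]](G, R) = [[C_2]]([[C_1]](G, R))$
$[[\mathsf{if}\ B\ C_1\ C_2]](G, R) = [[C_1]]([[B]](G, R)) \sqcup [[C_2]]([[!B]](G, R))$
$[[\mathsf{while}\ B\ C]](G, R) = [[!B]](\mathsf{lfix}\ \lambda A : D^k_\nabla.\ \mathsf{normalize}(A \sqcup (G, R) \sqcup [[C]]([[B]]A)))$
$[[C]]A = \top$ (other cases)

$[[B]] : D \to D$

$[[x = y]](G, R) =$ when $\mathsf{split}(\mathsf{split}((G, R), x), y) = (G', R')$: $(\{(s, g) \in G' \mid s(x) = s(y) \vee g(s(x)) = g(s(y)) = \mathsf{nil}\},\ R')$
$[[!\,x = y]](G, R) =$ when $\mathsf{split}(\mathsf{split}((G, R), x), y) = (G', R')$: $(\{(s, g) \in G' \mid s(x) \neq s(y) \wedge (g(s(x)) \neq \mathsf{nil} \vee g(s(y)) \neq \mathsf{nil})\},\ R')$
$[[!(!B)]](G, R) = [[B]](G, R)$
$[[B]]A = \top$ (other cases)

Subroutine unfold unrolls the definition of a grammar:

$\mathsf{unfold}((s, g), R, x) =$
- $\{(s,\ g[a \mapsto \langle b, c \rangle, b \mapsto v, c \mapsto u]) \mid \langle v, u \rangle \in R(\alpha)\}$ if $g(s(x)) = \alpha \wedge \mathsf{nil} \notin R(\alpha)$
- $\{(s, g)\}$ if $g(s(x))$ is a pair
- $\top$ otherwise

$\mathsf{unfold}(G, R, x) = \bigcup_{(s, g) \in G} \mathsf{unfold}((s, g), R, x)$ if $\forall (s, g) \in G.\ \mathsf{unfold}((s, g), R, x) \neq \top$, and $\top$ otherwise.

Subroutine $\mathsf{split}((s, g), R, x)$ changes $(s, g)$ to $(s', g')$ such that $s'(x)$ means nil iff $g'(s'(x)) = \mathsf{nil}$.

$\mathsf{split}((s, g), R, x) =$
- if $(\exists \alpha.\ g(s(x)) = \alpha \wedge R(\alpha) \supseteq \{\mathsf{nil}\} \wedge R(\alpha) \neq \{\mathsf{nil}\})$ then $(\{(s,\ g[s(x) \mapsto \mathsf{nil}]),\ (s,\ g[s(x) \mapsto \beta])\},\ R[\beta \mapsto R(\alpha) - \{\mathsf{nil}\}])$ for fresh $\beta$
- else if $(\exists \alpha.\ g(s(x)) = \alpha \wedge R(\alpha) = \{\mathsf{nil}\})$ then $(\{(s,\ g[s(x) \mapsto \mathsf{nil}])\},\ R)$
- else $(\{(s, g)\},\ R)$

$\mathsf{split}(G, R, x) = \bigcup_{(s, g) \in G} \mathsf{split}((s, g), R, x)$ if $\forall (s, g) \in G.\ x \in dom(s)$, and $\top$ otherwise.

The algorithmic order $\sqsubseteq$ defined in [8] satisfies that if $A \sqsubseteq B$, then $\mathsf{means}(A) \Rightarrow \mathsf{means}(B)$.

Fig. 3. Analysis.

7 Full Analysis

The basic version of our analysis, which we have presented so far, cannot deal with data structures with sharing, such as doubly linked lists and binomial heaps. If a program uses such data structures, the analysis gives up and returns $\top$. The full analysis overcomes this shortcoming by using a more expressive language for a grammar, where a nonterminal is allowed to have parameters.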
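The main feature of this new parameterized grammar is that an invariant for a data structure with sharing is expressible by a grammar, as long as the sharing is "cyclic." A parameter plays the role of the "targets" of such cycles.

The overall structure of the full analysis is almost identical to the basic version in Figure 3. Only the subroutines, such as normalize, are modified. In this section, we will explain the full analysis by focusing on the new parameterized grammar, and the modified normalization function for this grammar. The full definition is in [8].

7.1 Abstract Domain

Let self and arg be two different symbolic locations. In the full analysis, the domains for shape graphs and grammars are modified as follows:

\[ \mathsf{NTermApp} = \mathsf{NonTerm} \times (\mathsf{SymLoc} + \{\bot\}) \qquad \mathsf{NTermAppR} = \mathsf{NonTerm} \times (\{\mathsf{self}, \mathsf{arg}\} + \{\bot\}) \]
\[ \mathsf{Graph} = (\mathsf{Vars} \rightharpoonup_{fin} \mathsf{SymLoc}) \times (\mathsf{SymLoc} \rightharpoonup_{fin} \{\mathsf{nil}\} + \mathsf{SymLoc}^2 + \mathsf{NTermApp}) \]
\[ \mathsf{Grammar} = \mathsf{NonTerm} \rightharpoonup_{fin} \wp_{nf}(\{\mathsf{nil}\} + (\{\mathsf{nil}\} + \{\mathsf{self}, \mathsf{arg}\} + \mathsf{NTermAppR})^2) \]

The main change in the new definitions is that all the nonterminals have parameters. All the uses of nonterminals in the old definitions are replaced by the applications of nonterminals, and the declarations of nonterminals in a grammar can use two symbolic locations self and arg, as opposed to none, which denote the implicit self parameter and the explicit parameter (see footnote 8). For instance, a doubly-linked list is defined by $\mathit{dll} ::= \mathsf{nil} \mid \langle \mathsf{arg}, \mathit{dll}(\mathsf{self}) \rangle$. This grammar maintains the invariant that arg points to the previous cell. So, the first field of a node always points to the previous cell, and the second field to the next cell.

Note that $\bot$ can be applied to a nonterminal; this means that we consider subcases of the nonterminal where the arg parameter is not used. For instance, if a grammar $R$ maps $\beta$ to $\{\mathsf{nil}, \langle \mathsf{arg}, \mathsf{arg} \rangle\}$, then $\beta(\bot)$ excludes $\langle \mathsf{arg}, \mathsf{arg} \rangle$, and means the empty heap object.

As in the basic case, the precise meaning of a shape graph and a grammar is given by a translation into separation-logic assertions. We can define a translation means by modifying only $\mathsf{means}_v$ and $\mathsf{means}_g$.

\[ \mathsf{means}_v(a, \mathsf{nil}) = a \doteq \mathsf{nil} \qquad \mathsf{means}_v(a, \alpha(b)) = \alpha(a, b) \]
\[ \mathsf{means}_v(a, b) = a \doteq b \qquad \mathsf{means}_v(a, \alpha(\bot)) = \exists b.\ \alpha(a, b) \]

In the last clause, $b$ is a different variable from $a$. The meaning of a grammar is a context defining a set of recursive predicates.

\[ \mathsf{means}_g(R) = \Big\{ \alpha(a, b) = \bigvee_{e \in R(\alpha)} \mathsf{means}_g(a, b, e) \Big\}_{\alpha \in dom(R)} \]
\[ \mathsf{means}_g(a, b, \mathsf{nil}) = \mathsf{means}_v(a, \mathsf{nil}) \]
\[ \mathsf{means}_g(a, b, \langle v_1, v_2 \rangle) = \exists a_1 a_2.\ (a \mapsto a_1, a_2) * \mathsf{means}_v(a_1, v_1\{a/\mathsf{self}, b/\mathsf{arg}\}) * \mathsf{means}_v(a_2, v_2\{a/\mathsf{self}, b/\mathsf{arg}\}) \]

In the second clause, $a_1$ and $a_2$ are variables that do not appear in $v_1$, $v_2$, $a$, $b$.

7.2 Normalization Function

To fully exploit the increased expressivity of the abstract domain, we change the normalization function in the full analysis. The most important change in the new normalization function is the addition of new rules (cut) and (bfold) into the fold procedure.

Footnote 8. We allow only one explicit parameter. So, we can use a pre-defined name arg.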

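[Fig. 4. Examples of (cut) and (bfold). Panel (a): cut, followed by two applications of (fold). Panel (b): bfold, starting from "unfold & move to the next" and followed by (cut) and (bfold); initial grammar $\alpha ::= \langle \alpha(\mathsf{arg}) \rangle \mid \langle \mathsf{arg} \rangle$, $\beta ::= \langle \beta(\bot) \rangle \mid \mathsf{nil}$; final grammar $\alpha ::= \langle \alpha(\mathsf{arg}) \rangle \mid \langle \gamma(\mathsf{arg}) \rangle$, $\beta ::= \langle \beta(\bot) \rangle \mid \mathsf{nil}$, $\gamma ::= \langle \mathsf{arg} \rangle$.]

The (cut) rule enables the conversion of a cyclic structure to grammar definitions. Recall that the (fold) rule can recognize a heap object only when the object does not have shared cells internally. The key idea is to cut a "non-critical" link to a shared cell, and represent the removed link by a parameter to a nonterminal. If enough such links are cut from a heap object, the object no longer has (explicitly) shared cells, so that the wrapping step of (fold) can be applied. The formal definition of the (cut) rule is:

(cut) $(G \uplus \{(s,\ g \cdot [a \mapsto \langle a_1, a_2 \rangle])\},\ R) \rightsquigarrow (G \uplus \{(s,\ g \cdot [a \mapsto \alpha(b)])\},\ R \cdot [\alpha \mapsto \{\langle a_1, a_2 \rangle \{\mathsf{self}/a, \mathsf{arg}/b\}\}])$
where there are paths from variables to $a_1$ and $a_2$ in $g$, $\mathsf{free}(\langle a_1, a_2 \rangle) \subseteq \{a, b\}$, and $\alpha$ is fresh. (If $\mathsf{free}(\langle a_1, a_2 \rangle) \subseteq \{a\}$, we use $\alpha(\bot)$ instead of $\alpha(b)$.)

Figure 4.(a) shows how a cyclic structure is converted to grammar definitions (see footnote 9). In the first shape graph, a cell is shared because a variable points to it and another cell also points to it, but the link from that other cell is not critical because even without this link, the shared cell is still reachable from the variable. Thus, the (cut) rule cuts this link, introduces a nonterminal with the definition $\{\langle \mathsf{arg} \rangle\}$, and annotates the node whose link was cut with an application of this nonterminal to the shared cell. Note that the resulting graph (the second shape graph in Figure 4.(a)) does not have explicit sharing. So, we can apply the (fold) rule to the remaining cells one after another, as shown in the last two shape graphs in Figure 4.(a).

The (bfold) rule wraps a cell "from the back." Recall that the (fold) rule puts a cell at the front of a heap object; it adds the cell as a root for a nonterminal. The (bfold) rule, on the other hand, puts a cell at the exit of a heap object. When a cell $b$ is used as a parameter for a nonterminal $\alpha$, the rule combines $b$ and $\alpha$. This rule can best be explained using a list-traversing algorithm. Consider a program that traverses a linked list, where variable $r$ points to the head cell of the list, and another variable to the current cell of the list.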
The usual loop invariant of such a program is expressed by the first shape graph in Figure 4.(b). However, with only the (fold) rule, which adds a cell to the front, we cannot discover this invariant; one iteration of the program moves the current-cell variable to the next cell, and thus changes the shape graph into the second shape graph in Figure 4.(b), but this new graph is not similar to the initial one. The (bfold) rule changes the new shape graph back to the one for the invariant, by merging the application of $\alpha$ with the cell that it takes as its parameter. The (cut) rule first cuts the link from this cell to the new current cell, extends the grammar with $[\gamma \mapsto \{\langle \mathsf{arg} \rangle\}]$, and annotates the cell with an application of $\gamma$ to the new current cell. Then, the (bfold) rule finds all the places where arg is used as itself in the definition of $\alpha$, and replaces arg there by $\gamma(\mathsf{arg})$. Finally, the rule changes the binding for the head cell so that the nonterminal is applied to the new current cell, and eliminates the merged cell, thus resulting in the last shape graph in Figure 4.(b) (see footnote 10).

Footnote 9. To simplify the presentation, we assume that each cell in the figure has only a single field.

The precise definition of (bfold) does what we call a linearity check, in order to ensure the soundness of replacing arg by nonterminals (see footnote 11):

(bfold) $(G \uplus \{(s,\ g \cdot [a \mapsto \alpha(b), b \mapsto \beta(w)])\},\ R) \rightsquigarrow (G \uplus \{(s,\ g \cdot [a \mapsto \alpha'(w)])\},\ R \cdot [\alpha' \mapsto E])$
where $b$ does not appear in $g$, $\alpha$ is linear (that is, arg appears exactly once in each case of $R(\alpha)$), and
\[ E = (\{\mathsf{nil}\} \cap R(\alpha)) \cup \{\langle f(v_1), f(v_2) \rangle \mid \langle v_1, v_2 \rangle \in R(\alpha)\} \]
where $f(v) = \mathsf{if}\ (v = \mathsf{arg})\ \mathsf{then}\ \beta(\mathsf{arg})\ \mathsf{else\ if}\ (v = \alpha(\mathsf{arg}))\ \mathsf{then}\ \alpha'(\mathsf{arg})\ \mathsf{else}\ v$.

7.3 Correctness

The correctness of our analysis is expressed by the following theorem:

Theorem 1. For all programs $C$ and abstract states $(G, R)$, if $[[C]](G, R)$ is a non-$\top$ abstract state $(G', R')$, then the triple $\{\mathsf{means}(G, R)\}\ C\ \{\mathsf{means}(G', R')\}$ holds in separation logic.

We proved this theorem in two steps. First, we showed a lemma that all subroutines, such as normalize and unfold, and the backward analysis are correct. Then, with this lemma, we applied induction on the structure of $C$, and showed that $\{\mathsf{means}(G, R)\}\ C\ \{\mathsf{means}(G', R')\}$ is derivable in separation logic. The validity of the triple now follows, because the separation-logic proof rules are sound. The details are in [8].

8 Experiments

We have tested our analysis with the six programs in Table 1. For each of the programs, we ran the analyzer, and obtained abstract states for a loop invariant and the result. In this section, we will explain the cases of binomial heap construction and the Schorr-Waite tree traversal. The others are explained at http://ropas.snu.ac.kr/grammar.

Binomial Heap Construction. In this experiment, we took an implementation of binomial heap construction in [1], where each cell has three pointers: one to the left-most child, another to the next sibling, and the third to the parent. We ran the analyzer with this binomial heap construction program and the empty abstract state $(\{\}, [])$.
Then, the analyzer inferred the following abstract state (G, R), the same for the result of the construction and for the loop invariant. Here we omit from forest( ).

G = {([ ], [ forest])}
R = [ forest ::= nil | ⟨stree(self), forest, nil⟩,
      stree ::= nil | ⟨stree(self), stree(arg), arg⟩ ]

The unique shape graph in G means that the heap has only a single heap object whose root is stored in, and the heap object is an instance of forest. Grammar R defines the structure of this heap object. It says that the heap object is a linked list of instances of stree, and that each instance of stree in the list is given the address of the containing list cell. These instances of stree are, indeed, precisely those trees with pointers to the left-most children and to the next sibling, and the parent pointer.

¹⁰ The grammar is slightly different from the one for the invariant. However, if we combine the two abstract states and apply unify and simplify, then the grammar for the invariant is recovered.
¹¹ Here we present the rule only for the case that the parameter of α is not passed to other, different nonterminals. With such nonterminals, we need to do a more serious linearity check on those nonterminals before modifying the grammar.

Table 1. Experimental Results

program     description                                    cost (sec)  analysis result
listrev.c   list construction followed by list reversal    0.01        the result is a list
dbinary.c   construction of a tree with parent pointers    0.01        the result is a tree with parent pointers
dll.c       doubly-linked list construction                0.01        the result is a doubly-linked list
bh.c        binomial heap construction                     0.14        the result is a binomial heap
sw.c        Schorr-Waite tree traversal                    0.05        the result is a tree
swfree.c    Schorr-Waite tree disposal                     0.02        the tree is completely disposed

For all the examples, our analyzer proves the absence of null pointer dereference errors and memory leaks.

Schorr-Waite Tree Traversal

We used the following (G0, R0) as an initial abstract state:

G0 = {([ ], [ tree])}
R0 = [ tree ::= nil | ⟨I, tree, tree⟩ ]

Here we omit from tree( ). This abstract state means that the initial heap contains only a binary tree whose cells are marked I. Given the traversing algorithm and the abstract state (G0, R0), the analyzer produced (G1, R1) for final states, and (G2, R2) for a loop invariant:

G1 = {([ ], [ treer])}
R1 = [ treer ::= nil | ⟨R, treer, treer⟩ ]
G2 = {([, y ], [ treeri, rtree])}
R2 = [ rtree ::= nil | ⟨R, treer, rtree⟩ | ⟨L, rtree, tree⟩,
       tree ::= nil | ⟨I, tree, tree⟩,
       treer ::= nil | ⟨R, treer, treer⟩,
       treeri ::= nil | ⟨I, tree, tree⟩ | ⟨R, treer, treer⟩ ]

The abstract state (G1, R1) means that the heap contains only a single heap object, and that this heap object is a binary tree containing only R-marked cells.
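To make the reading of the grammars R0, R1, and R2 concrete, the sketch below treats them as membership tests on concrete cells and runs a Schorr-Waite style traversal with I/L/R marks. It is a hedged illustration only: the encoding of a cell as a list [mark, left, right], all helper names, and the traversal code itself are assumptions and not the program the analyzer was run on. The membership test ignores sharing, so it checks grammar membership but not separation.

```python
MARK, LEFT, RIGHT = 0, 1, 2
NIL = "nil"

# Each nonterminal maps to its cases: NIL or a (mark, left, right) triple.
R0 = {"tree": [NIL, ("I", "tree", "tree")]}
R1 = {"treer": [NIL, ("R", "treer", "treer")]}
R2 = {
    "rtree": [NIL, ("R", "treer", "rtree"), ("L", "rtree", "tree")],
    "tree": [NIL, ("I", "tree", "tree")],
    "treer": [NIL, ("R", "treer", "treer")],
    "treeri": [NIL, ("I", "tree", "tree"), ("R", "treer", "treer")],
}


def cell(mark, left=None, right=None):
    """Hypothetical concrete cell: a mutable list; nil is None."""
    return [mark, left, right]


def is_instance(node, nonterminal, grammar):
    """Check that the object rooted at `node` matches some case of `nonterminal`."""
    for case in grammar[nonterminal]:
        if case == NIL:
            if node is None:
                return True
        elif node is not None:
            mark, left, right = case
            if (node[MARK] == mark
                    and is_instance(node[LEFT], left, grammar)
                    and is_instance(node[RIGHT], right, grammar)):
                return True
    return False


def schorr_waite(root):
    """Traverse the tree without a stack by reversing pointers; returns the root."""
    x, y = root, None  # x: current subtree, y: reversed path towards the root
    while True:
        # The shape described by (G2, R2), checked on this concrete run.
        assert is_instance(x, "treeri", R2) and is_instance(y, "rtree", R2)
        if x is not None and x[MARK] == "I":
            # First visit: mark L, reverse the left field, descend left.
            left = x[LEFT]
            x[MARK], x[LEFT] = "L", y
            x, y = left, x
        elif y is None:
            return x  # x is finished and has no parent
        elif y[MARK] == "L":
            # Left subtree done: restore left, reverse right, descend right.
            parent, right = y[LEFT], y[RIGHT]
            y[MARK], y[LEFT], y[RIGHT] = "R", x, parent
            x = right
        else:
            # Right subtree done: restore the right field and move up.
            parent = y[RIGHT]
            y[RIGHT] = x
            x, y = y, parent


def shape(node):
    """Pointer structure of a tree with the marks erased."""
    return None if node is None else (shape(node[LEFT]), shape(node[RIGHT]))
```

Starting from an instance of tree under R0, the value returned by schorr_waite can be passed to is_instance with "treer" and R1, which mirrors the relation between (G0, R0) and (G1, R1) on a single input; unlike the analysis, this tests one run and proves nothing about all runs.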
Note that the abstract state (G1, R1) implies the absence of memory leaks because the tree is the only thing in the heap. The loop invariant (G2, R2) means that the heap contains two disjoint heap objects x and y. Since the heap object x is an instance of treeri, the object x is an I-marked binary tree or an R-marked binary tree. The first case indicates that x is visited for the first time, and the second case that x has been visited before. The nonterminal rtree for the other heap object y implies that either the left or the right field of cell y is reversed. The second case, ⟨R, treer, rtree⟩, in the definition of rtree means that the current cell is marked R, its right field is reversed, and the left subtree is an R-marked binary tree. The third case, ⟨L, rtree, tree⟩, means that the current cell is marked L, the left field is reversed, and the right subtree is an I-marked binary tree. Note that this invariant, indeed, holds because y points to the parent of x, so the left or right field of cell y must be reversed.

References

1. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press and McGraw-Hill Book Company, 2001.
2. Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the ACM Symposium on Principles of Programming Languages, pages 238–252, January 1977.
3. Patrick Cousot and Radhia Cousot. Abstract interpretation frameworks. J. Logic and Comput., 2(4):511–547, 1992.
4. Patrick Cousot and Radhia Cousot. Formal language, grammar and set-constraint-based program analysis by abstract interpretation. In Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, pages 170–181, La Jolla, California, June 1995. ACM Press, New York, NY.
5. A. Deutsch. Interprocedural may-alias analysis for pointers: Beyond k-limiting. In Proceedings of the ACM Conference on Programming Language Design and Implementation, pages 230–241. ACM Press, 1994.
6. Pascal Fradet and Daniel Le Métayer. Shape types. In Proceedings of the ACM Symposium on Principles of Programming Languages, pages 27–39. ACM Press, January 1997.
7. Nils Klarlund and Michael I. Schwartzbach. Graph types. In Proceedings of the ACM Symposium on Principles of Programming Languages, January 1993.
8. Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. Automatic verification of pointer programs using grammar-based shape analysis. Tech. Memo. ROPAS-2005-23, Programming Research Laboratory, School of Computer Science & Engineering, Seoul National University, March 2005.
9. R. Manevich, M. Sagiv, G. Ramalingam, and J. Field. Partially disjunctive heap abstraction. In Proceedings of the International Symposium on Static Analysis, volume 3148 of Lecture Notes in Computer Science, pages 265–279, August 2004.
10. A. Møller and M. I. Schwartzbach. The pointer assertion logic engine. In Proceedings of the ACM Conference on Programming Language Design and Implementation. ACM, June 2001.
11. Peter W. O'Hearn, Hongseok Yang, and John C. Reynolds. Separation and information hiding. In Proceedings of the ACM Symposium on Principles of Programming Languages, pages 268–280. ACM Press, January 2004.
12. John C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings of the 17th IEEE Symposium on Logic in Computer Science, pages 55–74. IEEE, July 2002.
13. M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. ACM Trans. Program. Lang. Syst., 20(1):1–50, January 1998.
14. Mooly Sagiv, Thomas Reps, and Reinhard Wilhelm. Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst., 24(3):217–298, 2002.
15. Élodie-Jane Sims. Extending separation logic with fixpoints and postponed substitution. In Proceedings of the International Conference on Algebraic Methodology and Software Technology, volume 3116 of Lecture Notes in Computer Science, pages 475–490, 2004.