Rewriting for Satisfiability Modulo Theories

Dipartimento di Informatica, Università degli Studi di Verona, Verona, Italy
July 10, 2010
Joint work with Chris Lynch (Department of Mathematics and Computer Science, Clarkson University, NY, USA) and Leonardo de Moura (Microsoft Research, Redmond, WA, USA)

The inference system DPLL(Γ+T)

Problem statement
- Decide satisfiability of first-order formulæ generated by, e.g.:
  - verifying compiler: invariant checking
  - static analyzer: invariant generation
- Satisfiability w.r.t. background theories
- With quantifiers to write, e.g.:
  - invariants about loops, heaps, data structures...
  - axioms of type systems or application-specific theories without decision procedure
- Emphasis on automation: prover called by verifying compiler or static analyzer

Shape of problem
- Background theory $T$: $T = \bigcup_{i=1}^{n} T_i$, e.g., linear arithmetic
- Set of formulæ: $R \cup P$
  - $R$: set of non-ground clauses without $T$-symbols
  - $P$: large ground formula (set of ground clauses), typically with $T$-symbols
- Determine whether $R \cup P$ is satisfiable modulo $T$ (equivalently: determine whether $T \cup R \cup P$ is satisfiable)

Tools
- Davis-Putnam-Logemann-Loveland (DPLL) procedure for SAT
- $T_i$-solvers: satisfiability procedures for the $T_i$'s
- DPLL(T)-based SMT-solver: decision procedure for $T$ with Nelson-Oppen combination of the $T_i$-sat procedures
- First-order engine Γ to handle $R$ (additional theory): Resolution+Rewriting+Superposition: Superposition-based

Equality sharing method (Nelson-Oppen)
- $T_i$'s disjoint: no shared function/predicate symbols besides $\simeq$
- Mixed terms separated by introducing new constants
- $T_i$-solvers generate and propagate all entailed (disjunctions of) equalities between shared constants
- $T_i$'s stably infinite: every $T_i$-sat ground formula has a $T_i$-model with infinite cardinality (ensures existence of quantifier-free interpolants, hence that propagation suffices in the completeness proof)

Combining strengths of different tools
- DPLL: SAT-problems; large non-Horn clauses
- Theory solvers: e.g., ground equality, linear arithmetic
- DPLL(T)-based SMT-solver: efficient, scalable, integrated theory reasoning
- Superposition-based inference system Γ:
  - Horn clauses, equalities with universal quantifiers (automated instantiation)
  - Sat-procedure for several theories of data structures

Superposition-based inference system Γ
- Generic, FOL+=, axiomatized theories
- Deduce clauses from clauses (expansion)
- Remove redundant clauses (contraction)
- Semi-decision procedure: empty clause (contradiction) generated, return unsat
- No backtracking

Ordering-based inferences
- Ordering $\succ$ on terms and literals to
  - restrict expansion inferences
  - define contraction inferences
- Complete Simplification Ordering:
  - stable: if $s \succ t$ then $s\sigma \succ t\sigma$
  - monotone: if $s \succ t$ then $l[s] \succ l[t]$
  - subterm property: $l[t] \succeq t$
  - total on ground terms and literals

Inference system Γ
- State of derivation: set of clauses $F$
- Superposition: superpose maximal side of maximal equation into maximal side of maximal (in)equation
- Simplification: by well-founded rewriting
- Resolution: resolve maximal complementary literals
- Paramodulation: superpose maximal side of maximal equation into maximal literal
- Subsumption: $C\sigma \subseteq D$ (as multisets)
- Other rules: e.g., Factoring rules, Deletion of trivial clauses

DPLL and DPLL(T)
- Propositional logic, ground problems in built-in theories
- Build candidate model $M$
- Decision procedure: model found: return sat; failure: return unsat
- Backtracking

DPLL(Γ+T): integrate Γ in DPLL(T)
- Idea: literals in $M$ can be premises of Γ-inferences
- Stored as hypotheses in inferred clause
- Hypothetical clause: $(L_1 \land \dots \land L_n) \triangleright (L'_1 \lor \dots \lor L'_m)$, interpreted as $\neg L_1 \lor \dots \lor \neg L_n \lor L'_1 \lor \dots \lor L'_m$
- Inferred clauses inherit hypotheses from premises
- Predecessor: DPLL(Γ) [L. de Moura and N. Bjørner at IJCAR 2008]

DPLL(Γ+T) as a transition system
- Search mode: state of derivation $M \parallel F$
  - $M$: sequence of assigned ground literals: partial model
  - $F$: set of hypothetical clauses
- Conflict resolution mode: state of derivation $M \parallel F \parallel C$
  - $C$: ground conflict clause
- Initial state: $M$ empty, $F$ is $\{\emptyset \triangleright C \mid C \in R \cup P\}$

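DPLL(Γ+T): DPLL rules
- Decide: guess ground $L$ true, add it to $M$ (decided literal):
  $M \parallel F \Longrightarrow M\,L \parallel F$
- UnitPropagate: consequence of assignment (implied literal); if $M \models_P \neg C$ (all literals in $C$ false):
  $M \parallel F, H \triangleright (C \lor L) \Longrightarrow M\,L_{H \triangleright (C \lor L)} \parallel F, H \triangleright (C \lor L)$
- Note: literals in $H$ are immaterial here because they come from $M$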

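DPLL(Γ+T): DPLL rules
- Conflict: if $M \models_P \neg C$:
  $M \parallel F, H \triangleright C \Longrightarrow M \parallel F, H \triangleright C \parallel \neg H \lor C$
- Unsat: conflict clause is $\Box$ (nothing else to try):
  $M \parallel F \parallel \Box \Longrightarrow$ unsat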

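DPLL(Γ+T): DPLL rules
- Explain: unfold implied literal by resolution; if $L_{H \triangleright (D \lor L)} \in M$:
  $M \parallel F \parallel C \lor \neg L \Longrightarrow M \parallel F \parallel \neg H \lor D \lor C$
- Learn: learn conflict clause; if $C \notin \mathit{clauses}(F)$:
  $M \parallel F \parallel C \Longrightarrow M \parallel F, C \parallel C$
- Backjump: if $L'$ is the least recently decided literal such that $M \models_P \neg C$ and $L$ is undefined in $M$:
  $M\,L'\,M' \parallel F \parallel C \lor L \Longrightarrow M\,L_{C \lor L} \parallel F'$
  where $F'$ is $F$ minus the clauses whose hypothesis intersects $L'\,M'$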
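The Decide, UnitPropagate and Conflict rules can be illustrated on the purely propositional case. The following is a minimal sketch, not the procedure of the talk: it assumes clauses are lists of nonzero integers (a negative integer is a negated variable), it uses plain chronological backtracking in place of Explain, Learn and Backjump, and it leaves out hypotheses, Γ and the theory solvers. The function names are hypothetical.

```python
def unit_propagate(clauses, model):
    """Apply UnitPropagate to a fixpoint; return None when a Conflict is found."""
    model = dict(model)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var, wanted = abs(lit), lit > 0
                if var not in model:
                    unassigned.append(lit)
                elif model[var] == wanted:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # Conflict: every literal of the clause is false in M
            if len(unassigned) == 1:
                lit = unassigned[0]  # implied literal
                model[abs(lit)] = lit > 0
                changed = True
    return model


def dpll(clauses, model=None):
    """Return a satisfying assignment (dict var -> bool) or None if unsat."""
    model = unit_propagate(clauses, model or {})
    if model is None:
        return None
    variables = {abs(lit) for clause in clauses for lit in clause}
    free = sorted(variables - set(model))
    if not free:
        return model  # candidate model is total and conflict-free: sat
    var = free[0]
    for value in (True, False):  # Decide, then try the opposite value on failure
        result = dpll(clauses, {**model, var: value})
        if result is not None:
            return result
    return None
```

Calling `dpll([[1, 2], [-1, 2], [-2, 3]])` returns a satisfying assignment, while `dpll([[1], [-1]])` returns `None`.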

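DPLL(Γ+T): DPLL(T) rules
- T-Propagate: add ground $L$ that is a $T$-consequence of $M$; if $L_1, \dots, L_n \in M$ and $L_1, \dots, L_n \models_T L$:
  $M \parallel F \Longrightarrow M\,L_{(\neg L_1 \lor \dots \lor \neg L_n \lor L)} \parallel F$
- T-Conflict: detect that $L_1, \dots, L_n$ in $M$ are $T$-inconsistent; if $L_1, \dots, L_n \in M$ and $L_1, \dots, L_n \models_T \bot$:
  $M \parallel F \Longrightarrow M \parallel F \parallel \neg L_1 \lor \dots \lor \neg L_n$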

DPLL(Γ+T): model-based theory combination
- A variant of equality sharing:
  - Each $T_i$-solver builds a candidate $T_i$-model $M_i$
  - It is enough to generate and propagate the equalities between shared constants that are true in $M_i$
- Predecessor: [L. de Moura and N. Bjørner at SMT 2007]

DPLL(Γ+T): model-based theory combination
- PropagateEq: add to $M$ a ground $s \simeq t$ true in the $T_i$-model; if $M_i(t) = M_i(s)$:
  $M \parallel F \Longrightarrow M\; t \simeq s \parallel F$
- Less expensive than generating (disjunctions of) equalities true in all $T_i$-models consistent with $M$
- Optimistic: if $t \simeq s$ is inconsistent, retract it and fix $M_i$ by backtracking
- Ground terms, not only shared constants, to serve the next rule
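The selection of candidate equalities for PropagateEq can be sketched as follows: ground terms that receive the same value in the candidate model are grouped, and each pair inside a group becomes a candidate equality. This is a sketch under assumptions: the candidate model is given as a plain dictionary from term names to values, and the function name and the choice to emit all pairs (rather than, say, one chain per group) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def propagate_eq_candidates(model_value, terms):
    """Return the equalities s = t (as sorted pairs) that hold in the candidate model.

    model_value: dict mapping each ground term (a string) to its value in M_i.
    terms: the ground terms to consider.
    """
    classes = defaultdict(list)
    for term in terms:
        classes[model_value[term]].append(term)
    equalities = []
    for members in classes.values():
        # Every pair of terms with the same value is true in M_i.
        for s, t in combinations(sorted(members), 2):
            equalities.append((s, t))
    return sorted(equalities)
```

The equalities are guesses: they hold in one candidate model, not necessarily in all models consistent with $M$, which is why the rule is described as optimistic.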

DPLL(Γ+T): expansion inferences
- Deduce: Γ-rule $\gamma$, e.g., superposition, using non-ground clauses $\{H_1 \triangleright C_1, \dots, H_m \triangleright C_m\}$ in $F$ and $R$-literals $\{L_{m+1}, \dots, L_n\}$ in $M$:
  $M \parallel F \Longrightarrow M \parallel F, H \triangleright C$
  where $H = H_1 \cup \dots \cup H_m \cup \{L_{m+1}, \dots, L_n\}$ and $\gamma$ infers $C$ from $\{C_1, \dots, C_m, L_{m+1}, \dots, L_n\}$
- Only $R$-literals: Γ-inferences ignore $T$-literals
- Take unit clauses from $M$, as PropagateEq puts them there
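How an inferred clause records the literals of $M$ it depends on can be sketched on a ground instance. The sketch below resolves a hypothetical clause against one literal taken from $M$ and adds that literal to the hypotheses. It is an illustration only: literals are plain strings with `~` for negation, the inference is ground unit resolution rather than a full Γ-rule with unification and ordering restrictions, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypClause:
    hyps: frozenset  # H: literals of M the clause depends on
    lits: frozenset  # C: the literals of the clause itself


def neg(lit):
    """Complement of a literal written as a string, with '~' for negation."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def deduce_unit_resolution(premise, model_lit):
    """Resolve `premise` with a literal from M; the literal becomes a hypothesis."""
    if neg(model_lit) not in premise.lits:
        return None  # the rule does not apply
    return HypClause(premise.hyps | {model_lit}, premise.lits - {neg(model_lit)})


def as_plain_clause(hc):
    """Read H |> C as the ordinary clause (not L1) v ... v (not Ln) v C."""
    return frozenset(neg(h) for h in hc.hyps) | hc.lits
```

Reading the result back as an ordinary clause yields the premise again, which shows that the hypothesis keeps the inference sound even if the literal is later removed from $M$.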

DPLL(Γ+T): contraction inferences
- Single premise $H \triangleright C$: apply to $C$ (e.g., tautology deletion)
- Multiple premises (e.g., subsumption, simplification): prevent the situation where a clause is deleted, but the clauses that make it redundant are gone because of backjumping
- Scope level:
  - level(L) in $M\,L\,M'$: number of decided literals in $M\,L$
  - level(H) = $\max\{\mathrm{level}(L) \mid L \in H\}$, and 0 for $\emptyset$

DPLL(Γ+T): contraction inferences
- Say we have $H \triangleright C$, $H_2 \triangleright C_2, \dots, H_m \triangleright C_m$, and $L_{m+1}, \dots, L_n$
- $C_2, \dots, C_m, L_{m+1}, \dots, L_n$ simplify $C$ to $C'$ or subsume it
- Let $H' = H_2 \cup \dots \cup H_m \cup \{L_{m+1}, \dots, L_n\}$
- Simplification: replace $H \triangleright C$ by $(H \cup H') \triangleright C'$
- Both simplification and subsumption:
  - if level(H) $\geq$ level(H'): delete
  - if level(H) < level(H'): disable (re-enable when backjumping below level(H'))
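The scope-level bookkeeping can be sketched in a few lines. The sketch assumes $M$ is stored as a list of `(literal, is_decided)` pairs in assignment order; that representation and the function names are hypothetical.

```python
def literal_levels(trail):
    """Map each literal of M to its level: the number of decided literals
    in M up to and including that literal."""
    levels, depth = {}, 0
    for lit, is_decided in trail:
        if is_decided:
            depth += 1
        levels[lit] = depth
    return levels


def hyp_level(hyps, levels):
    """level(H): the maximum level of a literal in H, and 0 for the empty H."""
    return max((levels[lit] for lit in hyps), default=0)


def contraction_action(h, h_prime, levels):
    """Decide what to do with H |> C when clauses with hypotheses H' make it redundant."""
    if hyp_level(h, levels) >= hyp_level(h_prime, levels):
        return "delete"   # the redundant clause cannot outlive its justification
    return "disable"      # it may have to come back after backjumping
```

For the trail `[("a", True), ("b", False), ("c", True), ("d", False)]`, literals `a` and `b` have level 1 and `c` and `d` have level 2, so a clause with hypothesis `{a}` made redundant through `{d}` is only disabled.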

DPLL(Γ+T): Summary
- Use each engine for what it is best at:
  - DPLL(T) works on ground clauses
  - Γ is not involved with ground inferences and built-in theories
  - Γ works on non-ground clauses and ground unit clauses taken from $M$: inferences guided by the current partial model
  - Γ works on the $R$-sat problem

Issues about completeness
- Γ is refutationally complete
- Since Γ does not see all the clauses, DPLL(Γ+T) does not inherit refutational completeness trivially
- Equality sharing is complete for Nelson-Oppen built-in theories: how to extend to a combination with an axiomatized theory $R$?
- DPLL(T) uses depth-first search: complete for ground SMT problems, not when injecting non-ground inferences

From rewriting-based theorem proving
- $N$: set of ground clauses, $\Box \notin N$; $I_N$: candidate model
- Counterexample: $I_N \not\models C$
- Reduction property for counterexamples: for all $N$ and counterexample $C \in N$, Γ infers a counterexample $D \prec C$
- Thm: if $N$ saturated, then satisfiable

From rewriting-based T-sat procedures: Variable-inactivity
- Clause $C$: variable-inactive if no maximal literal in $C$ is an equation $t \simeq x$ where $x \notin Var(t)$
- Set of clauses: variable-inactive if all its clauses are
- Theory $R$: variable-inactive if the limit $S_\infty$ of a fair Γ-derivation from $S_0 = R \cup S$ is variable-inactive
[A. Armando, M.P. Bonacina, S. Ranise, S. Schulz, ACM TOCL, 2009]
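The clause-level condition can be checked syntactically once the maximal literals are known. The sketch below assumes the caller has already selected the maximal positive equations of the clause (computing maximality needs the term ordering, which is left out), and it encodes variables as strings and compound terms as tuples `(symbol, arg1, ...)`. The encoding and all names are hypothetical.

```python
def term_variables(term):
    """Set of variables of a term; a variable is a string, a compound term a tuple."""
    if isinstance(term, str):
        return {term}
    result = set()
    for arg in term[1:]:
        result |= term_variables(arg)
    return result


def is_active_equation(lhs, rhs):
    """True if the equation has the form t = x with x a variable not occurring in t."""
    for t, x in ((lhs, rhs), (rhs, lhs)):
        if isinstance(x, str) and x not in term_variables(t):
            return True
    return False


def clause_is_variable_inactive(maximal_positive_equations):
    """maximal_positive_equations: list of (lhs, rhs) pairs, assumed to be the
    maximal positive equational literals of the clause."""
    return not any(is_active_equation(l, r) for l, r in maximal_positive_equations)
```

A cardinality constraint such as `[("y", "x"), ("y", "z")]` is rejected, while an equation between `g(x)` and `x`, written `[(("g", "x"), "x")]`, is accepted because `x` occurs in `g(x)`.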

From rewriting-based T-sat procedures: Variable-inactivity
- Theorem (Modularity of termination): if Γ terminates on $R_i$-sat problems, it terminates also on $R$-sat problems for $R = \bigcup_{i=1}^{n} R_i$, if the $R_i$'s are disjoint and variable-inactive
- Idea: the only inferences across theories are superpositions from shared constants (they correspond to equalities between shared constants in equality sharing)
[A. Armando, M.P. Bonacina, S. Ranise, S. Schulz, ACM TOCL, 2009]

From rewriting-based T-sat procedures: Variable-inactivity
- Theorem: if $R$ is variable-inactive, then it is stably infinite
- Idea: if $S_0$ is sat, it admits no infinite model iff $S_\infty$ contains a cardinality constraint (e.g., $y \simeq x \lor y \simeq z$)
- In practice: Γ reveals lack of stable infiniteness by generating a cardinality constraint (not variable-inactive)
[M.P. Bonacina, S. Ghilardi, E. Nicolini, S. Ranise, D. Zucchelli at IJCAR 2006]

Putting it all together: T-smooth set
$R \cup P$ is $T$-smooth, for $T = \bigcup_{i=1}^{n} T_i$, if
- $T_1, \dots, T_n$ and $R$ are disjoint
- $T_1, \dots, T_n$ are stably infinite
- $R$ is variable-inactive
- $P$ is $P_1 \cup P_2$
  - $P_1$: ground $R$-clauses
  - $P_2$: ground $T$-clauses

From rewriting-based theorem proving
- Fairness: all applicable inferences applied eventually, except redundant Deduce steps
- Saturated state:
  - either $M \parallel F \parallel \Box$
  - or $M \parallel F$ such that the only applicable inferences are redundant Deduce steps
- Fair derivation yields saturated state eventually

Refutational completeness of DPLL(Γ+T)
- Theorem: if input $S = R \cup P$ is $T$-smooth, whenever DPLL(Γ+T) reaches a saturated state $M \parallel F$, $S$ is $T$-sat.
- Ingredients:
  - ground non-unit $R$-clauses: redundant by saturation w.r.t. Decide
  - $R$-part: sat by saturation and reduction property for counterexamples
  - $T$-part: sat by saturation w.r.t. T-Conflict
  - completeness of Nelson-Oppen combination by $T$-smoothness

How to ensure fairness? Let's see an example
1. $\neg p(x,y) \lor p(f(x),f(y)) \lor p(g(x),g(y))$: seen by Γ
2. $p(a,b)$
3. $g(x) \not\simeq x$: seen by Γ
4. $g(c) \simeq c \lor g(d) \simeq d$

1. Decide adds $p(a,b)$ to $M$: seen by Γ
2. Resolution generates $p(f(a),f(b)) \lor p(g(a),g(b))$
3. Decide adds $p(f(a),f(b))$ to $M$: seen by Γ
4. Resolution generates $p(f(f(a)),f(f(b))) \lor p(g(f(a)),g(f(b)))$ ...
5. ... infinite unfair derivation that does not detect unsat!

Answer: iterative deepening
- Inference depth:
  - Clause: infdepth(C) = depth of inference tree producing $C$
  - Implied literal: infdepth(L) = depth of clause that implied $L$
  - Decided literal: infdepth(L) = min inference depth of clause including $L$
- k-bounded DPLL(Γ+T): Deduce restricted to premises $C$ with infdepth(C) < k
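The effect of the bound on the fairness example can be mimicked with a toy simulation. The sketch below is not a prover: it only replays the alternation of Decide and Resolution on clause 1 of the example, building clauses as strings, and it stops as soon as the decided literal that would serve as premise has inference depth k or more. The function name and the string encoding are hypothetical.

```python
def unfold_with_bound(k):
    """Replay Decide/Resolution on clause 1, using only premises with infdepth < k.

    Returns the list of clauses generated by Resolution, as strings.
    """
    lhs, rhs = "a", "b"
    depth = 0  # infdepth of the decided literal p(lhs, rhs); p(a,b) is an input clause
    generated = []
    while depth < k:  # Deduce is restricted to premises with infdepth < k
        new_lhs, new_rhs = f"f({lhs})", f"f({rhs})"
        generated.append(f"p({new_lhs},{new_rhs}) v p(g({lhs}),g({rhs}))")
        # Decide then picks p(new_lhs, new_rhs), whose infdepth is one more.
        lhs, rhs, depth = new_lhs, new_rhs, depth + 1
    return generated
```

For any fixed k the loop produces exactly k clauses and stops, so the other clauses of the problem get their turn.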

Let's see the example again
1. $\neg p(x,y) \lor p(f(x),f(y)) \lor p(g(x),g(y))$: seen by Γ
2. $p(a,b)$
3. $g(x) \not\simeq x$: seen by Γ
4. $g(c) \simeq c \lor g(d) \simeq d$

1. The bound prevents the infinite alternation of Decide and Resolution steps
2. Decide adds $g(c) \simeq c$ to $M$: seen by Γ
3. Resolution generates $\Box$
4. Decide adds $g(d) \simeq d$ to $M$: seen by Γ
5. Resolution generates $\Box$
6. Unsat

Termination
- Theorem: k-bounded DPLL(Γ+T) terminates: DPLL(T) does + finitely many Deduce steps within $k$
- DPLL(Γ+T) stuck at $k$ if only Deduce applies and only to premises excluded by $k$
- Three outcomes: sat, unsat, stuck (don't know)
- Decision procedure: sat, unsat
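The three outcomes suggest a simple driver that raises the bound whenever the bounded procedure gets stuck. The sketch below assumes a callable `k_bounded_solve(k)` that returns one of the strings `"sat"`, `"unsat"` or `"stuck"`; that callable, the cap `max_k` and the function name are all hypothetical and stand in for a real k-bounded DPLL(Γ+T) engine.

```python
def iterative_deepening(k_bounded_solve, start_k=1, max_k=8):
    """Run the k-bounded procedure with growing k until it answers sat or unsat.

    Returns (outcome, k), where outcome is "stuck" if the cap max_k is reached.
    """
    k = start_k
    while k <= max_k:
        outcome = k_bounded_solve(k)
        if outcome != "stuck":
            return outcome, k  # a definite answer at bound k
        k += 1                 # only excluded premises remain: deepen
    return "stuck", max_k
```

Each call terminates because the k-bounded procedure does; the driver as a whole answers "stuck" only because of the artificial cap.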

Summary of contributions
- This talk: DPLL(Γ+T) + variable-inactivity: completeness and combination of both built-in and axiomatized theories
- At CADE 2009: DPLL(Γ+T) + speculative inferences: decision procedures for type systems with multiple/single inheritance used in ESC/Java and Spec#
- All in: On deciding satisfiability with speculative inferences (submitted to journal)

Current and future work
- Interpolation in first-order theorem proving
- Interpolation in DPLL(Γ+T)
- Application to invariant generation
Joint work with Moa Johansson