Bisimulation and coinduction in higher-order languages Davide Sangiorgi Focus Team, University of Bologna/INRIA ICE, Florence, June 2013
Bisimulation
Behavioural equality. One of the most important contributions of Concurrency Theory to CS (and beyond) [Milner, Park, 1980].
Bisimulation: a relation $R$ on the states of an LTS s.t. whenever $P \mathrel{R} Q$:
1. $P \xrightarrow{a} P'$ implies $Q \xrightarrow{a} Q'$ and $P' \mathrel{R} Q'$, for some $Q'$;
2. the converse.
Bisimilarity ($\sim$): the union of all bisimulations.
[in the remainder: converse clauses omitted]
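On a finite LTS the definition can be checked mechanically. The sketch below is not from the slides: it starts from the full relation and repeatedly discards pairs that violate one of the two clauses, which is one standard way of reaching the largest bisimulation. The function name `bisimilarity`, the triple encoding of transitions and the example processes are assumptions made for illustration.

```python
def bisimilarity(states, transitions):
    """Largest bisimulation on a finite LTS.

    `transitions` is a set of (source, label, target) triples.
    """
    succ = {}
    for source, label, target in transitions:
        succ.setdefault((source, label), set()).add(target)
    labels = {label for _, label, _ in transitions}

    def matches(p, q, rel):
        # Every move p -a-> p2 is answered by some q -a-> q2 with (p2, q2) in rel.
        return all(
            any((p2, q2) in rel for q2 in succ.get((q, a), ()))
            for a in labels
            for p2 in succ.get((p, a), ())
        )

    rel = {(p, q) for p in states for q in states}
    while True:
        inverse = {(q, p) for p, q in rel}
        # Keep a pair only if it satisfies clause 1 and its converse.
        refined = {
            (p, q) for p, q in rel
            if matches(p, q, rel) and matches(q, p, inverse)
        }
        if refined == rel:
            return rel
        rel = refined


# Hypothetical example processes:
#   p0 = a.(b + c)    q0 = a.b + a.c    r0 = a.b + a.b    s0 = a.b
TRANSITIONS = {
    ("p0", "a", "p1"), ("p1", "b", "p2"), ("p1", "c", "p3"),
    ("q0", "a", "q1"), ("q0", "a", "q2"), ("q1", "b", "q3"), ("q2", "c", "q4"),
    ("r0", "a", "r1"), ("r0", "a", "r2"), ("r1", "b", "r3"), ("r2", "b", "r3"),
    ("s0", "a", "s1"), ("s1", "b", "s2"),
}
STATES = {s for (s, _, _) in TRANSITIONS} | {t for (_, _, t) in TRANSITIONS}
RESULT = bisimilarity(STATES, TRANSITIONS)
```

In this example `("r0", "s0")` is in `RESULT`, while `("p0", "q0")` is not: after the `a` move, `p1` still offers both `b` and `c`, which neither `q1` nor `q2` can match.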
Important
1. The definition gives us a powerful proof method:
$$\frac{P \mathrel{R} Q \qquad R \text{ is a bisimulation}}{P \sim Q}$$
2. Coinduction and induction
Bisimulation: a coinductive notion
Congruence: the inductive dual of bisimulation (equivalence) [compatibility with the constructs of the language]
In a language we need them both: inductive syntax, coinductive semantics
Higher-order languages
Functions and/or processes can move and/or be used as data
Example of higher-order feature: P(x) where x can be a program
Functional languages, mobile code
What is bisimulation? Compatibility can be hard
The λ-calculus and contextual equivalence
The λ-calculus
The paradigmatic higher-order language:
$M, N ::= x \mid \lambda x.M \mid M N$
Values = the terms of the form $\lambda x.M$
$\Lambda$ = the closed terms
Reduction (call-by-name):
$$(\lambda x.M)N \to M\{N/x\} \qquad \frac{M \to M'}{M N \to M' N}$$
$\Rightarrow$ = the reflexive and transitive closure of $\to$
$M \Downarrow$ = $M$ terminates
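The two reduction rules translate almost directly into an interpreter. The following is a minimal sketch, not taken from the slides: the class names `Var`, `Lam`, `App`, the helper names and the `fuel` bound used to cut off divergent computations are all assumptions made for illustration. Substitution is only correct here because call-by-name evaluation of closed terms always substitutes closed arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def subst(term, name, repl):
    """term{repl/name}; repl is assumed closed, so no variable capture can occur."""
    if isinstance(term, Var):
        return repl if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:  # the binder shadows `name`
            return term
        return Lam(term.param, subst(term.body, name, repl))
    return App(subst(term.fun, name, repl), subst(term.arg, name, repl))


def step(term):
    """One call-by-name step, or None if no rule applies (e.g. term is a value)."""
    if isinstance(term, App):
        if isinstance(term.fun, Lam):  # beta rule, argument passed unevaluated
            return subst(term.fun.body, term.fun.param, term.arg)
        reduced = step(term.fun)       # otherwise reduce in function position
        return None if reduced is None else App(reduced, term.arg)
    return None                        # no reduction under a lambda


def evaluate(term, fuel=1000):
    """The value reached by `term`, or None if none is reached within `fuel` steps."""
    for _ in range(fuel + 1):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    return None


I = Lam("x", Var("x"))
DELTA = Lam("x", App(Var("x"), Var("x")))
OMEGA = App(DELTA, DELTA)  # reduces to itself forever
```

For instance, `evaluate(App(I, I))` returns `I`, `Lam("x", App(I, I))` is already a value and is returned unchanged, and `evaluate(OMEGA, fuel=50)` returns `None`.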
Behavioural equality in sequential languages
For all contexts $C$, and for all values $V$: $C[P] \Rightarrow V$ iff $C[Q] \Rightarrow V$
Too strong in higher-order languages: $I = \lambda x.x$ and $\lambda x.(II)$ are distinguished, because $I \Rightarrow I$ and $\lambda x.(II) \Rightarrow \lambda x.(II)$
The observables should be as weak as possible
Contextual equivalence [Morris, 68]
$M \simeq_C N$ ($M$ and $N$ are contextually equivalent) if, for any context $C$ such that $C[M]$ and $C[N]$ are closed, $C[M] \Downarrow$ iff $C[N] \Downarrow$
No need to check the identity of first-order values returned
Example: if $C[P] \Rightarrow 5$ and $C[Q] \Rightarrow 7$, wrap $C$ into: if $C$ = 5 then true else <diverge>
Problem: definition very hard to use (utterly useless in higher-order languages)
Proof techniques for contextual equivalence in higher-order languages
Until the 1990s: denotational techniques
Difficulties: hard mathematics; full abstraction; scalability to non-purely functional extensions (eg, state; worst: concurrency)
After the 1990s: coinduction (bisimulation) [Abramsky]
A major factor in the movement towards operationally-based techniques in PL semantics after the 1990s
Still a hot research topic
Applicative bisimulation
Bisimulation in the λ-calculus [Abramsky, 1990]
Applicative bisimulation: a relation $R \subseteq \Lambda \times \Lambda$ s.t. whenever $M \mathrel{R} N$:
1. $M \Rightarrow \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{L/x\} \mathrel{R} N'\{L/x\}$ for all $L$
Applicative bisimilarity ($\approx_A$): the union of all bisimulations
Questions:
1. $\approx_A$ vs $\simeq_C$? (contextual equivalence)
2. does the definition scale to extensions of the λ-calculus?
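Bisimilarity vs contextual equivalence
$\approx_A \subseteq \simeq_C$ easy? (cf: bisimilarity implies may testing)
Surprise: what is easy is the converse, $\simeq_C \subseteq \approx_A$
$(\lambda x.M)N \simeq_C M\{N/x\}$ $(*)$
$M \simeq_C N$ and $M \Rightarrow \lambda x.M'$ imply $N \Rightarrow \lambda x.N'$
We need: $M'\{L/x\} \simeq_C N'\{L/x\}$, for all $L$:
$$M'\{L/x\} \simeq_C ML \simeq_C NL \simeq_C N'\{L/x\}$$
(first and last steps by $(*)$, middle step by substitutivity)
Conclude from transitivity of $\simeq_C$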
Congruence?
$\approx_A \subseteq \simeq_C$ would follow from the compatibility of $\approx_A$: for all $M, N$, and contexts $C$, if $M \approx_A N$ then $C[M] \approx_A C[N]$
A proof attempt: $R = \{(C[\tilde{M}], C[\tilde{N}]) : \tilde{M} \approx_A \tilde{N}\}$
Induction on the structure of $C$. Main problematic case: $C = C_1 C_2$
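The two congruence problems
Suppose $(\lambda x.M_1)M_2 \mathrel{R} N_1 N_2$ with $\lambda x.M_1 \mathrel{R} N_1$ and $M_2 \mathrel{R} N_2$
From the inductive assumption:
$(\lambda x.M_1)M_2 \to M_1\{M_2/x\}$
$N_1 N_2 \Rightarrow (\lambda x.N_1')N_2 \to N_1'\{N_2/x\}$
with $M_1\{M_2/x\} \mathrel{R} N_1'\{N_2/x\}$ (... only if $M_2 = N_2$!)
But we need more:
(1) if $M_1\{M_2/x\} \Rightarrow \lambda x.M'$ then $N_1'\{N_2/x\} \Rightarrow \lambda x.N'$ and for all $L$ ...
(2) $M_2 \mathrel{R} N_2$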
Techniques for congruence
Abramsky: via denotational semantics
Howe's technique: define a relation that is, by definition, a congruence, and then prove that it is the same as $\approx_A$
Difficult to apply
Limitations in extensions of the λ-calculus (concurrency)
Bisimulation in the λ-calculus [Abramsky, 1990]
Applicative bisimulation: a relation $R \subseteq \Lambda \times \Lambda$ s.t. whenever $M \mathrel{R} N$:
1. $M \Rightarrow \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{L/x\} \mathrel{R} N'\{L/x\}$ for all $L$
Applicative bisimilarity ($\approx_A$): the union of all bisimulations
Questions:
1. $\approx_A$ vs $\simeq_C$? (contextual equivalence)
2. does the definition scale to extensions of the λ-calculus?
Unsoundness of applicative bisimilarity under language extensions
[example: call-by-value with generation of names]
$M = \nu n\ \mathsf{return}\ \lambda f.fn$
$N = \mathsf{return}\ \lambda f.\nu n\ fn$
$M \approx_A N$ (the argument supplied for $f$ does not know $n$)
$M \not\simeq_C N$, as $C[M] \Rightarrow$ true but $C[N] \Rightarrow$ false for C = let [ ] = g in g(λn.g(λm.m = n))
[Koutavas, Levy, Sumii, 2011]
Logical bisimulation, revisited
Simple congruence proof
Separate enhancements of the bisimulation
Basis: logical bisimulation (cf: logical relations)
[Kobayashi, Sangiorgi, Sumii, 2008 and 2010]
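First congruence problem
From the inductive assumption:
$(\lambda x.M_1)M_2 \to M_1\{M_2/x\}$
$N_1 N_2 \Rightarrow (\lambda x.N_1')N_2 \to N_1'\{N_2/x\}$
with $M_1\{M_2/x\} \mathrel{R} N_1'\{N_2/x\}$ (if $M_2 = N_2$!)
But we need more:
(1) if $M_1\{M_2/x\} \Rightarrow \lambda x.M'$ then $N_1'\{N_2/x\} \Rightarrow \lambda x.N'$ and for all $L$ ...
Remedy: introduce a clause for internal moves (cf: concurrency)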
First change
... whenever $M \mathrel{R} N$:
1. $M \to M'$ implies $N \Rightarrow N'$ and $M' \mathrel{R} N'$;
2. $M = \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{L/x\} \mathrel{R} N'\{L/x\}$ for all $L$
Problem: the new definition is heavier to use in proofs
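The second congruence problem
From the inductive assumption:
$(\lambda x.M_1)M_2 \to M_1\{M_2/x\}$
$N_1 N_2 \Rightarrow (\lambda x.N_1')N_2 \to N_1'\{N_2/x\}$
with $M_1\{M_2/x\} \mathrel{R} N_1'\{N_2/x\}$ (if $M_2 = N_2$!)
But we need more:
(2) $M_2 \mathrel{R} N_2$
First-order substitutivity: $\forall Q.\ P(x) \sim P'(x)$ implies $P(Q) \sim P'(Q)$
Higher-order substitutivity: $\forall Q, Q'.\ P(x) \sim P'(x)$ and $Q \sim Q'$ imply $P(Q) \sim P'(Q')$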
Second change
... whenever $M \mathrel{R} N$:
1. ...
2. $M = \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{P/x\} \mathrel{R} N'\{Q/x\}$ for all $P \mathrel{R} Q$
Now: problematic case ok
Problem: definition unsound. $I = \lambda x.x$ would be related to $K = \lambda x.I$, via $R = \{(I,I), (I,K)\}$
$R$ and the identity relation are bisimulations, but not their union: a non-monotone functional
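The unsoundness example can be checked by hand. The derivation below is a sketch of that check, not part of the slides. It assumes that $\Omega$ is any divergent closed term, writes $\mathit{Id}$ for the identity relation (a name introduced here for illustration), and assumes that the elided clause 1 is the clause on internal moves, which is vacuous for the values $I$ and $K$.

```latex
\begin{align*}
&R = \{(I,I),\,(I,K)\}, \qquad I = \lambda x.x, \qquad K = \lambda x.I \\[4pt]
&\text{Pair } (I,K):\ P \mathrel{R} Q \text{ forces } P = I, \text{ so }
   x\{P/x\} = I \text{ and } I\{Q/x\} = I, \text{ and } (I,I) \in R. \\
&\text{Pair } (I,I):\ x\{P/x\} = P = I \text{ and } x\{Q/x\} = Q \in \{I,K\},
   \text{ and } (I,Q) \in R. \\
&\text{So } R \text{ satisfies clause 2, although } I\,\Omega \Uparrow
   \text{ while } K\,\Omega \Downarrow. \\[4pt]
&\text{In } R \cup \mathit{Id}:\ \Omega \mathrel{\mathit{Id}} \Omega,
   \text{ so the pair } (I,K) \text{ requires }
   (x\{\Omega/x\},\, I\{\Omega/x\}) = (\Omega, I) \in R \cup \mathit{Id},
   \text{ which fails.}
\end{align*}
```

Enlarging the relation enlarges the set of argument pairs $P \mathrel{R} Q$ that clause 2 quantifies over, so a larger relation can fail a test that a smaller one passes. This is the source of the non-monotonicity.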
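Second change
... whenever $M \mathrel{R} N$:
1. ...
2. $M = \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{P/x\} \mathrel{R} N'\{Q/x\}$ for all $P \mathrel{R} Q$
Problem: definition unsound
Sound if $R$ is a congruence (or substitutive)
Then, directly from the definition, the resulting bisimilarity is included in $\simeq_C$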
Third change
A congruence (or substitutive) $R$ s.t. whenever $M \mathrel{R} N$:
1. $M \to M'$ implies $N \Rightarrow N'$ and $M' \mathrel{R} N'$;
2. $M = \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{P/x\} \mathrel{R} N'\{Q/x\}$ for all $P \mathrel{R} Q$
Logical bisimilarity: $\approx_L$
Theorem: $\approx_L\ =\ \simeq_C$
Problem: not a good proof technique from the definition
No need to kill two birds with one stone! Enhancements of the proof method, separately
cf: up-to techniques [cf: Pous/Bonchi talk]: bisimilarity results using relations smaller than a bisimulation
Example enhancement: up-to context
[$R^{\star}$ = the context closure of $R$]
A relation $R$ is a bisimulation up-to contexts if whenever $M \mathrel{R} N$:
1. $M \to M'$ implies $N \Rightarrow N'$ and $M' \mathrel{R^{\star}} N'$;
2. $M = \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{P/x\} \mathrel{R^{\star}} N'\{Q/x\}$ for all $P \mathrel{R^{\star}} Q$
Theorem: If $R$ is a bisimulation up-to contexts then $R^{\star}$ is a bisimulation.
Proof: essentially the earlier proof of congruence
Big-step up-to contexts and reductions
A relation $R$ is a big-step bisimulation up-to contexts and reductions if whenever $M \mathrel{R} N$:
1. $M \Rightarrow \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ and $M'\{P/x\} \Rightarrow R^{\star} \Leftarrow N'\{Q/x\}$ for all $P \mathrel{R^{\star}} Q$
Theorem: If $R$ is a big-step bisimulation up-to contexts and reductions then $\Rightarrow R^{\star} \Leftarrow$ is a bisimulation.
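An example proof with enhancements
$I_1 \simeq_C I_2$ for $I_1 = \lambda x.x$ and $I_2 = \lambda x.(\lambda y.y)x$
A plain bisimulation $R$: a congruence closed under the rules
$$\frac{}{I_1 \mathrel{R} I_2} \qquad \frac{M \mathrel{R} N}{M \mathrel{R} (\lambda y.y)N}$$
$S = \{(I_1, I_2)\}$ is a big-step bisimulation up-to contexts and reductions, as for $M \mathrel{S^{\star}} N$:
from $\lambda x.x$: $x\{M/x\} = M$
from $\lambda x.(\lambda y.y)x$: $((\lambda y.y)x)\{N/x\} = (\lambda y.y)N \Rightarrow N$
and $M \mathrel{S^{\star}} N$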
Fixed-points
The functional of the final definition:
non-monotone (even on congruence relations), but it has a greatest fixed point ($\simeq_C$)
non-cocontinuous, but it has the stratification approximation
A theory of coinduction for non-monotone functionals?
Another possibility: environmental bisimulations
monotone functional, robust
more complex definition
[Kobayashi, Sangiorgi, Sumii, 2010]
Extensions and variations
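The example with language extension
[call-by-value with generation of names]
$M = \nu n\ \mathsf{return}\ V$ for $V = \lambda f.fn$
$N = \mathsf{return}\ W$ for $W = \lambda f.\nu n\ fn$
Distinguished in C = let [ ] = g in g(λn.g(λm.m = n))
Now, also $M \not\approx_L N$:
$M \Rightarrow V$ and $V(\lambda n.V(\lambda m.m = n)) \Rightarrow$ true
$N \Rightarrow W$ and $W(\lambda n.W(\lambda m.m = n)) \Rightarrow$ false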
Evaluation contexts
Sometimes useful to separate evaluation contexts
[example: call-by-value λ-calculus with references]
M = if !l = 0 then l := 1 else Ω
N = l := 1
$\simeq_{EC}$ = contextual equivalence, under only evaluation contexts
$[l = 0]; M \simeq_{EC} [l = 0]; N$
$[l = 0]; M \not\simeq_C [l = 0]; N$ for C = [ ];[ ]
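Coupled logical bisimulation
$(\mathcal{E}, \mathcal{G})$ with $\mathcal{E}$ closed under contexts, $\mathcal{E} \subseteq \mathcal{G}$, and $\mathcal{G}$ closed under evaluation contexts
[call-by-value λ-calculus]
... whenever $M \mathrel{\mathcal{G}} N$:
1. $M \to M'$ implies $N \Rightarrow N'$ and $M' \mathrel{\mathcal{G}} N'$
2. $M = \lambda x.M'$ implies $N \Rightarrow \lambda x.N'$ with $\lambda x.M' \mathrel{\mathcal{E}} \lambda x.N'$, and $M'\{P/x\} \mathrel{\mathcal{G}} N'\{Q/x\}$ for all $P \mathrel{\mathcal{E}} Q$
$M \mathrel{\mathcal{E}} N$ implies $M \simeq_C N$, and $M \mathrel{\mathcal{G}} N$ implies $M \simeq_{EC} N$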
Non-determinism and probabilities
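Non-determinism
$M, N ::= \ldots \mid M \oplus N$
Now the easy proof of $\simeq_C \subseteq \approx_A / \approx_L$ breaks (as reduction is no longer included in $\simeq_C$)
Convergence: may, must
Variants of $\simeq_C$: may, must, may & must
Bisimulation: different from any of them, e.g. on the law $\lambda x.I \oplus \lambda x.\Omega = \lambda x.(I \oplus \Omega)$
cf: the CCS-like law $\mu.P + \mu.Q = \mu.(P + Q)$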
To regain coincidence, two possibilities:
1. strengthen $\simeq_C$
2. weaken $\approx_A / \approx_L$
(1) is easy: replace contextual equivalence with barbed congruence (the congruence induced by barbed bisimulation)
Barbed bisimulation: whenever $M \mathrel{R} N$:
1. $M \to M'$ implies $N \Rightarrow N'$ and $M' \mathrel{R} N'$
2. $M \Downarrow$ iff $N \Downarrow$
(2) is more delicate (cf: proof of $\simeq_C \subseteq \approx_A / \approx_L$; first congruence problem)
A case study: the probabilistic λ-calculus
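The probabilistic λ-calculus [Alberti, Dal Lago, Sangiorgi, on-going work]
$M \oplus N$ abbreviates $M \oplus_{1/2} N$
$$M_1 \oplus M_2 \to_{1/2} M_1 \qquad M_1 \oplus M_2 \to_{1/2} M_2$$
Example: $((I \oplus \Omega)I) \oplus \Omega$
[Reduction tree: $((I \oplus \Omega)I) \oplus \Omega$ goes with probability 1/2 to $(I \oplus \Omega)I$ and with probability 1/2 to $\Omega$; $(I \oplus \Omega)I$ goes with probability 1/2 to $II$, which reduces with probability 1 to $I$, and with probability 1/2 to $\Omega I$]
which shows $((I \oplus \Omega)I) \oplus \Omega \Downarrow_{1/4}$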
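Distributions
$Y$ = fixed-point operator, $N = \lambda f.(I \oplus f)$
$Y N \Rightarrow_1 I \oplus Y N$, with $I \oplus Y N \to_{1/2} I$ and $I \oplus Y N \to_{1/2} Y N$
For all $n \geq 1$, $Y N \Rightarrow_{1/2^n} I$, hence $Y N \Downarrow_1$ (ie, $Y N \Uparrow_0$)
Using (partial) distributions: $Y N \Rightarrow \Sigma_n \langle I, 1/2^n \rangle$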
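The convergence probabilities of $((I \oplus \Omega)I) \oplus \Omega$ and $Y N$ can be approximated by exploring the probabilistic reduction tree to a bounded depth. The sketch below is an illustration, not the semantics used in the talk: the class and function names, the `fuel` bound and the particular call-by-name fixed-point combinator chosen for `Y` are assumptions. It returns the probability mass of the runs that reach a value within `fuel` steps, which is a lower bound on the convergence probability.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Choice:  # M (+) N, a fair probabilistic choice
    left: object
    right: object


def subst(term, name, repl):
    """term{repl/name}; repl is assumed closed, so capture cannot occur."""
    if isinstance(term, Var):
        return repl if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:
            return term
        return Lam(term.param, subst(term.body, name, repl))
    if isinstance(term, App):
        return App(subst(term.fun, name, repl), subst(term.arg, name, repl))
    return Choice(subst(term.left, name, repl), subst(term.right, name, repl))


def step(term):
    """Successors of a closed term as (probability, term) pairs; None for a value."""
    half = Fraction(1, 2)
    if isinstance(term, Choice):
        return [(half, term.left), (half, term.right)]
    if isinstance(term, App):
        if isinstance(term.fun, Lam):  # beta rule, call-by-name
            return [(Fraction(1), subst(term.fun.body, term.fun.param, term.arg))]
        # Reduce in function position (never a value here, since terms are closed).
        return [(p, App(fun, term.arg)) for p, fun in step(term.fun)]
    return None


def converge_prob(term, fuel):
    """Probability mass of the runs that reach a value within `fuel` steps."""
    frontier = [(Fraction(1), term)]
    total = Fraction(0)
    for _ in range(fuel + 1):
        next_frontier = []
        for p, t in frontier:
            successors = step(t)
            if successors is None:
                total += p
            else:
                next_frontier.extend((p * q, u) for q, u in successors)
        frontier = next_frontier
    return total


I = Lam("x", Var("x"))
DELTA = Lam("x", App(Var("x"), Var("x")))
OMEGA = App(DELTA, DELTA)
EXAMPLE = Choice(App(Choice(I, OMEGA), I), OMEGA)

HALF_Y = Lam("x", App(Var("f"), App(Var("x"), Var("x"))))
Y = Lam("f", App(HALF_Y, HALF_Y))  # one possible fixed-point operator
N = Lam("f", Choice(I, Var("f")))
YN = App(Y, N)
```

With these definitions `converge_prob(EXAMPLE, 20)` is `Fraction(1, 4)`, and the bounds for `YN` grow as 1/2, 3/4, 7/8 for `fuel` values 4, 7 and 10, in line with the partial sums of $\Sigma_n 1/2^n$.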
Probabilistic contextual equivalence, $\simeq^P_C$
$M \simeq^P_C N$ if $C[M] \Downarrow_p$ iff $C[N] \Downarrow_p$, for all contexts $C$
No issues of may and must convergence
Probabilistic applicative bisimulation, $\approx^P_A$, following Larsen-Skou
An equivalence $R$ s.t. $M \mathrel{R} N$ implies, for all equivalence classes $E$ of $R$ and for all inputs $L$:
$$\mathrm{prob}(M \stackrel{L}{\Longrightarrow} E) = \mathrm{prob}(N \stackrel{L}{\Longrightarrow} E)$$
Theorem: $\approx^P_A$ is a congruence [Howe's technique]
$\approx^P_A\ =\ \simeq^P_C$? And how discriminating?
The effect of probabilities on pure λ-terms
$=_{LL}$ = Lévy Longo Tree equality
The finest equivalence for pure λ-terms, under call-by-name [Dezani, Giovannetti tutorial, 2001]
Theorem: In $\Lambda \times \Lambda$: $=_{LL}\ =\ \approx^P_A\ =\ \simeq^P_C$
Higher-order and probability: maximal discriminating power (on pure λ-terms)
[cf: work in concurrency: eg Deng, Hennessy 2010]
Quite different from (non-probabilistic) non-determinism:
$\lambda x.xx$ and $\lambda x.x(\lambda y.xy)$ are contextually equivalent (both may and must)
[under may (similarly for must): if $L \Downarrow$ (may) then $\lambda z.Lz = L$; otherwise $L\tilde{N} = \Omega$]
Different in $\simeq^P_C$:
$(\lambda x.xx)(I \oplus \Omega) \Rightarrow_{1/4} I$
$(\lambda x.x(\lambda y.xy))(I \oplus \Omega) \Rightarrow_{1/2} \lambda y.(I \oplus \Omega)y$
Similarly, different under bisimulation, and different LL trees
Outside pure λ-terms, usual counterexample: $\lambda x.I \oplus \lambda x.\Omega$ vs $\lambda x.(I \oplus \Omega)$
Coinductive characterisation of $\simeq^P_C$?
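The two probabilities 1/4 and 1/2 can be recovered by following the convergent branch of each term step by step. The derivation below is a worked sketch under call-by-name, using the rule that a fair choice goes to each side with probability 1/2; unlabelled arrows are steps of probability 1, and every branch not shown ends in a term headed by $\Omega$, which diverges.

```latex
\begin{align*}
(\lambda x.xx)(I \oplus \Omega)
  &\to (I \oplus \Omega)(I \oplus \Omega)
   \to_{1/2} I\,(I \oplus \Omega)
   \to I \oplus \Omega
   \to_{1/2} I
  && \text{total: } \tfrac12 \cdot \tfrac12 = \tfrac14 \\
(\lambda x.x(\lambda y.xy))(I \oplus \Omega)
  &\to (I \oplus \Omega)\,(\lambda y.(I \oplus \Omega)y)
   \to_{1/2} I\,(\lambda y.(I \oplus \Omega)y)
   \to \lambda y.(I \oplus \Omega)y
  && \text{total: } \tfrac12
\end{align*}
```

The first term evaluates its argument twice, so the choice is resolved twice; the second term wraps the second use of $x$ under a $\lambda$, so only one choice is resolved before a value is reached.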
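Lévy Longo Trees
The Lévy Longo Tree of $M \in \Lambda$ is the labelled tree $LT(M)$, defined coinductively as follows:
1. $LT(M)$ = the tree with root $\lambda x$ and single subtree $LT(N)$, if $M \Rightarrow \lambda x.N$
2. $LT(M)$ = the tree with root $x$ and subtrees $LT(M_1), \ldots, LT(M_n)$, if $M \Rightarrow x M_1 \ldots M_n$
3. $LT(M) = \bot$ otherwise (ie, $M \Uparrow$)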
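Probabilistic coupled logical bisimulation
A partial distribution: $\Sigma_i \langle M_i, p_i \rangle$
A distribution value: $\lambda x.\Sigma_i \langle M_i, p_i \rangle = \Sigma_i \langle \lambda x.M_i, p_i \rangle$
Allow distributions in redex position
Extended λ-terms ($\Lambda_D$): $E, F ::= E M \mid \Sigma_i \langle M_i, p_i \rangle \mid M_1 \oplus M_2 \mid \lambda x.M$ $\quad (M \in \Lambda)$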
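A bisimulation: $(\mathcal{E}, \mathcal{G})$ with $\mathcal{E} \subseteq \Lambda \times \Lambda$, $\mathcal{G} \subseteq \Lambda_D \times \Lambda_D$, $\mathcal{E} \subseteq \mathcal{G}$
$(\mathcal{E}, \mathcal{G})$ is a bisimulation if for each $E \mathrel{\mathcal{G}} F$ we have:
1. if $E \to E'$ then $F \Rightarrow F'$ and $E' \mathrel{\mathcal{G}} F'$;
2. if $E = \lambda x.E'$ then $F \Rightarrow \lambda x.F'$ with $\mathrm{prob}(E') = \mathrm{prob}(F')$, and $E'\{M/x\} \mathrel{\mathcal{G}} F'\{N/x\}$ for all $M \mathrel{\mathcal{E}} N$
Write $M \approx^P_{CL} N$ if $M \mathrel{\mathcal{E}} N$ for some bisimulation $(\mathcal{E}, \mathcal{G})$
Theorem: $\approx^P_{CL}\ =\ \simeq^P_C$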
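Probabilistic λ-calculus: big-step operational semantics, call-by-name [Dal Lago, Zorzi, 2012]
$$\text{LAM}\ \frac{}{\lambda x.M \Downarrow \langle \lambda x.M, 1 \rangle} \qquad \text{EMP}\ \frac{}{M \Downarrow \emptyset}$$
$$\text{APP}\ \frac{M \Downarrow \lambda x.\Sigma_i \langle M_i, p_i \rangle \qquad M_i\{N/x\} \Downarrow \Sigma_j \langle N_{i,j}, q_j \rangle}{M N \Downarrow \Sigma_{i,j} \langle N_{i,j}, p_i q_j \rangle}$$
$$\text{PLUS}\ \frac{M \Downarrow E \qquad N \Downarrow F}{M \oplus N \Downarrow E \cdot \tfrac12 + F \cdot \tfrac12}$$
Inductively: $M \Rightarrow E$ if $E = \sup\{F : M \Downarrow F\}$
Coinductively, without EMP: $M \Rightarrow E$ if $E = \inf\{F : M \Downarrow F\}$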