Introduction to lambda calculus, Part 2
Antti-Juhani Kaijanaho
2017-01-24
(Minor changes made afterward. Last changed 2017-01-25 10:15:48+02:00.)

1 Untyped lambda calculus

1.1 Syntax

The abstract syntax of the pure lambda calculus:

    x, y, z ∈ Var
    t, u ∈ Term
    t, u ::= x | t u | λx t

In this document, I will be using the following concrete phrase-structure syntax for the pure lambda calculus:

    <term>      ::= <term1> | λ <variables> . <term>
    <term1>     ::= <term2> | <term1> <term2>
    <term2>     ::= <variable> | ( <term> )
    <variables> ::= <variable> <variables> | <variable>

As to the lexical syntax, in this document variables are single letters, possibly with subscripts or primes added (so x, x₁, and x′ are distinct variables); all
other lexemes are λ, the period (which I usually write on the centerline, not at the bottom of the line), and the parentheses.

1.2 Denotational semantics

    E : Term → (Var → D) → D
    E⟦x⟧σ = σ(x)                                             (1)
    E⟦t u⟧σ = (E⟦t⟧σ)(E⟦u⟧σ)                                 (2)
    E⟦λx t⟧σ = f, where f : D → D and f(z) = E⟦t⟧(σ[x := z]) (3)

1.3 Basic conversions

The trouble with the denotational semantics of lambda calculus is that while it specifies what kind of objects lambda calculus terms denote, it does not give us any clue as to how to manipulate them and compute with them. In fact, while the lambda calculus was always intended to describe mathematical functions, the original description of it was based on mechanical calculation rules, called conversions (in Finnish muunnos), and calculating with lambdas is still based on them. Informally specified, the two most important conversions are (Barendregt 1984; Church 1985):1

α conversion: It is permissible to change the name of the function parameter in an abstraction, so long as all references to the parameter inside the abstraction are similarly changed.

β reduction: It is permissible to perform a function call, that is, to replace any term of the form (λx t) u with a version of t where each reference to the function parameter x is replaced by the argument u.

In formulas, we write the use of α conversion or β reduction as →α or →β, respectively. The reverse direction of β reduction is called β expansion (sometimes also β abstraction), and when the direction is not important, we may talk of β conversion.2

1 Note that these informal descriptions leave out important qualifications and limitations, as discussed below. Precise definitions of these conversions will be given later.
2 In Finnish, reduction is sievennys and expansion is lavennus.
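Equations (1)–(3) can be read directly as a recipe for an interpreter. The following Python sketch is my own illustration, not part of the original notes: the class names Var, App, Abs and the function name denote are assumptions, and Python functions stand in for the semantic domain D.

```python
from dataclasses import dataclass

# An AST for the abstract syntax t ::= x | t u | λx t
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Abs:
    param: str
    body: object

def denote(t, env):
    """E[[t]]sigma, with Python functions standing in for the domain D."""
    if isinstance(t, Var):
        return env[t.name]                      # (1) E[[x]]s = s(x)
    if isinstance(t, App):
        # (2) E[[t u]]s = (E[[t]]s)(E[[u]]s)
        return denote(t.fun, env)(denote(t.arg, env))
    if isinstance(t, Abs):
        # (3) E[[lx t]]s = f where f(z) = E[[t]](s[x := z])
        return lambda z: denote(t.body, {**env, t.param: z})
    raise TypeError("not a term")

# ((λx (λy x)) a) b should denote σ(a):
term = App(App(Abs("x", Abs("y", Var("x"))), Var("a")), Var("b"))
print(denote(term, {"a": 1, "b": 2}))  # 1
```

The environment update σ[x := z] becomes a dictionary copy with one key rebound, so the original σ is never mutated.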
The claim is that both rules preserve the term's denotation and may therefore be used to simplify terms so that they are more easily understood.

Example 4 Let us compute (λxy x) a b, which is the same as ((λx (λy x)) a) b:

    ((λx (λy x)) a) b →β (λy a) b →β a

To see that this is correct, let us compute the denotation of the original term:

    E⟦((λx (λy x)) a) b⟧σ
      = (E⟦(λx (λy x)) a⟧σ)(E⟦b⟧σ)                            by (2)
      = ((E⟦λx (λy x)⟧σ)(E⟦a⟧σ))(E⟦b⟧σ)                       by (2)
      = ((E⟦λx (λy x)⟧σ)(σ(a)))(σ(b))                         by (1)
      = (f(σ(a)))(σ(b)),
            where f : D → D, f(z) = E⟦λy x⟧(σ[x := z])        by (3)
      = (E⟦λy x⟧(σ[x := σ(a)]))(σ(b))                         call f
      = f′(σ(b)),
            where f′ : D → D,
                  f′(z) = E⟦x⟧((σ[x := σ(a)])[y := z])        by (3)
      = f′(σ(b)), where f′(z) = ((σ[x := σ(a)])[y := z])(x)   by (1)
      = f′(σ(b)), where f′(z) = σ(a)                          simplify
      = σ(a)                                                  call f′

Then let us compute the denotation of the resulting term:

    E⟦a⟧σ = σ(a)                                              by (1)

They are, indeed, identical.

Exercise 2 Show that λx x and λy y denote the same function.

There are subtleties in these conversion rules, however. Let us first look at an example:
Example 5 Let us compute ((λx (λy x)) y) z:

    ((λx (λy x)) y) z →β (λy y) z →β z

But is this correct? Let us compute:

    E⟦((λx (λy x)) y) z⟧σ
      = (E⟦(λx (λy x)) y⟧σ)(E⟦z⟧σ)                            by (2)
      = (E⟦(λx (λy x)) y⟧σ)(σ(z))                             by (1)
      = ((E⟦λx (λy x)⟧σ)(E⟦y⟧σ))(σ(z))                        by (2)
      = ((E⟦λx (λy x)⟧σ)(σ(y)))(σ(z))                         by (1)
      = (f(σ(y)))(σ(z)),
            where f : D → D, f(w) = E⟦λy x⟧(σ[x := w])        by (3)
      = (E⟦λy x⟧(σ[x := σ(y)]))(σ(z))                         call f
      = f′(σ(z)),
            where f′ : D → D,
                  f′(w) = E⟦x⟧(σ[x := σ(y)][y := w])          by (3)
      = f′(σ(z)), where f′(w) = σ(y)                          by (1)
      = σ(y)                                                  call f′

But the result of our conversion trivially denotes σ(z); since σ may assign different values to y and z, these are not equivalent results. Clearly the conversion is incorrect.

What is happening in the above example is called variable capture (in Finnish muuttujankaappaus). The same issue exists in ordinary programming languages; consider the following Java fragment:

    class A {
        public int a;
        public int foo() {
            return a;
        }
    }

    class B {
        public A oa;
        public int a;
        public int bar() {
            return oa.foo();
        }
    }

    public class Capture {
        public static void main(String[] args) {
            B ob = new B();
            ob.oa = new A();
            ob.a = 2;
            ob.oa.a = 4;
            System.out.println(ob.bar());
        }
    }

How is the call from B.bar to A.foo performed? A naive idea is to just copy the text of A.foo in place of the call: thus, return oa.foo(); becomes return a;, and the program would print 2. But let us try it in Java:

    ajk@kukkaistutus:~/opetus/okp-2017/matksut$ javac Capture.java
    ajk@kukkaistutus:~/opetus/okp-2017/matksut$ java Capture
    4

Java does not perform the call naively; instead it knows that the a in B.bar is a different a than the one in A.foo.3

Getting back to lambda calculus, we can avoid variable capture by applying α conversion to rename a parameter variable whenever capture would otherwise result.

Example 6 Let us compute ((λx (λy x)) y) z:

    ((λx (λy x)) y) z →α ((λx (λy′ x)) y) z →β (λy′ y) z →β y

The result is correct.

We need somehow to correct the β reduction rule so that it disallows variable capture and forces the computor4 to apply the α conversion where

3 We say that Java implements lexical scoping or static scoping (in Finnish leksikaalinen / staattinen näkyvyysalue). The naive method of just replacing the call with the text of the callee (with parameters substituted as appropriate) results in what is called dynamic scoping (in Finnish dynaaminen näkyvyysalue); it is rarely seen outside Lisp, where it has some popularity.
4 I use the word computor to refer to a human or machine who computes using formal rules. Thus, a computor may be a computer, but that is not always the case.
necessary. One key insight is that capture happens in a situation like this:

    (λx ... (λy ... x ...))(... y ...)

In other words, the parameter variable occurs inside an abstraction that names its parameter the same as some variable that occurs in the argument term. But there is a complication: there is no variable capture in

    (λx ... (λy ... x ...))(... (λy ... y ...) ...)

so long as y does not occur in the argument outside the inner abstraction.

1.4 Free variables

Somehow, therefore, there is a distinction between variable occurrences that are inside an abstraction naming that variable and other occurrences. This distinction is conventionally made by talking about variable binding (in Finnish muuttujan sidonta), bound (in Finnish sidottu) occurrences of a variable, and free (in Finnish vapaa) occurrences of a variable. Informally we may state that an abstraction binds its parameter variable; all occurrences of that variable inside that abstraction are bound, while an occurrence of a variable that is not bound is a free occurrence. Further, a variable that has at least one free occurrence in a term is called a free variable of that term.

Formally, it is customary to define a function FV : Term → P(Var) that gives for each term the set of its free variables. We may define it for the pure untyped lambda calculus as follows:

    FV : Term → P(Var)
    FV⟦x⟧ = {x}                                  (4)
    FV⟦t u⟧ = FV⟦t⟧ ∪ FV⟦u⟧                      (5)
    FV⟦λx t⟧ = FV⟦t⟧ \ {x}                       (6)

Example 7 Let us compute the set of free variables of ((λx (λy x)) y) z:

    FV⟦((λx (λy x)) y) z⟧
      = FV⟦(λx (λy x)) y⟧ ∪ FV⟦z⟧                by (5)
      = (FV⟦λx (λy x)⟧ ∪ FV⟦y⟧) ∪ FV⟦z⟧          by (5)
      = ((FV⟦λy x⟧ \ {x}) ∪ FV⟦y⟧) ∪ FV⟦z⟧       by (6)
      = (((FV⟦x⟧ \ {y}) \ {x}) ∪ FV⟦y⟧) ∪ FV⟦z⟧  by (6)
      = ((({x} \ {y}) \ {x}) ∪ {y}) ∪ {z}        by (4)
      = (({x} \ {x}) ∪ {y}) ∪ {z}                simplify
      = (∅ ∪ {y}) ∪ {z}                          simplify
      = {y} ∪ {z}                                simplify
      = {y, z}                                   simplify

Exercise 3 Extend the definition of FV to the untyped lambda calculus extended with nonnegative integer constants, additions, and multiplications.5

It is possible to prove (but we will not prove) the following theorem:

Theorem 1 The value of a variable that is not a free variable of a term has no effect on the denotation of that term. In other words, for all terms t, variables x, environments σ, and values v ∈ D:

    x ∉ FV⟦t⟧ ⟹ E⟦t⟧σ = E⟦t⟧(σ[x := v])

1.5 Variable substitution and conversions formally

To avoid variable capture, we should define precisely what it means to replace all references to a variable with some term. The key insight is that all (and only the) references to a variable bound by an abstraction are free occurrences of that variable in the body of the abstraction.6 Further, the only thing that can prevent such an occurrence from being free is that there is some binding of that variable between the abstraction and the occurrence on the path connecting them in the AST. Thus, we will replace a variable recursively until another binding for it is found.

However, that is not enough. A variable y is captured if and only if we replace another variable x with a term that contains y as a free variable inside an abstraction that binds y. Thus, we must forbid such a replacement.

Formally, we may now define variable substitution (in Finnish muuttujan korvaus). There are multiple traditional notations for this operation, but I will adopt here t[x := u] for the substitution of u for x in t.7 Although this is a meta-level operation, it is usually written without semantic brackets as if it were an object-language operation, with higher precedence than both application and abstraction.
Thus we define (supposing that x and y are different variables):

    x[x := u] = u                                (7)
    y[x := u] = y                                (8)

5 See http://users.jyu.fi/~antkaij/opetus/okp/2017/meeting5.pdf
6 We call the t of an abstraction λx t its body (in Finnish runko).
7 Other notations seen in the literature for the same substitution include t[u/x] and Sᵘₓ t.
    (t₁ t₂)[x := u] = (t₁[x := u]) (t₂[x := u])  (9)
    (λx t)[x := u] = λx t                        (10)
    (λy t)[x := u] = λy (t[x := u])   if y ∉ FV⟦u⟧  (11)

Note that variable substitution is not always defined. In practice, this just means that we need to perform α conversion in an appropriate subterm.8

Now we are in a position to define formally what α conversion and β reduction are:

    λx t →α λy (t[x := y])   if y ∉ FV⟦t⟧        (12)
    (λx t) u →β t[x := u]                        (13)

Further, the following rules apply equally to both α conversion and β reduction. Let v ∈ {α, β}:

    x →v x                                       (14)
    t u →v t′ u    if t →v t′                    (15)
    t u →v t u′    if u →v u′                    (16)
    λx t →v λx t′  if t →v t′                    (17)

In other words, both α conversion and β reduction may be employed in any subterm of a term.

Example 8 With this formal definition of variable substitution and β reduction, it is revealed why the computation in Example 5 is fallacious. To perform the β reduction (λx (λy x)) y →β λy y, one must compute (λy x)[x := y]; but this cannot be done, because the side condition of (11), that is, y ∉ FV⟦y⟧, is trivially false, and no other equation defining substitution is applicable.

It is possible to prove (but we will not prove) the following theorem:

Theorem 2 For any terms t and u and any environment σ, the following two formulas hold:

    t →α u ⟹ E⟦t⟧σ = E⟦u⟧σ                       (18)
    t →β u ⟹ E⟦t⟧σ = E⟦u⟧σ                       (19)

8 A term u is a subterm of a term t if the AST of u is a subtree of the AST of t.
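Equations (4)–(11) translate almost directly into code. The following Python sketch is my own illustration, not part of the original notes: fv and subst are assumed names, and Var, App, Abs is an assumed AST representation of the abstract syntax. The substitution is partial, exactly like the definition: where the side condition of (11) fails, the code raises an error, which is precisely the point at which an α conversion would be needed first.

```python
from dataclasses import dataclass

# An assumed AST for the abstract syntax t ::= x | t u | λx t
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Abs:
    param: str
    body: object

def fv(t):
    """FV[[t]], equations (4)-(6)."""
    if isinstance(t, Var):
        return {t.name}                          # (4)
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)             # (5)
    if isinstance(t, Abs):
        return fv(t.body) - {t.param}            # (6)
    raise TypeError("not a term")

def subst(t, x, u):
    """t[x := u], equations (7)-(11); partial, like the definition."""
    if isinstance(t, Var):
        return u if t.name == x else t           # (7), (8)
    if isinstance(t, App):
        return App(subst(t.fun, x, u), subst(t.arg, x, u))   # (9)
    if isinstance(t, Abs):
        if t.param == x:
            return t                             # (10): x is rebound, stop
        if t.param not in fv(u):                 # side condition of (11)
            return Abs(t.param, subst(t.body, x, u))
        raise ValueError("variable capture: alpha-convert first")
    raise TypeError("not a term")

# (λy x)[x := z] is fine, since z is not captured:
print(subst(Abs("y", Var("x")), "x", Var("z")))
# but (λy x)[x := y], as in Example 8, fails the side condition of (11).
```

A β reduction step, rule (13), is then just `subst(t.body, t.param, u)` applied to a redex App(t, u) where t is an Abs.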
1.6 Normal forms and evaluation orders

The goal of computing using α conversions and β reductions is to reach a (β) normal form (in Finnish (β-)normaalimuoto): a term on which no more β reductions can be performed, even after any number of α conversions. After defining a (β) redex (in Finnish (β-)redeksi) as a term of the form (λx t) u, that is, an application whose left subterm is an abstraction, it is a fairly simple thing to prove the following theorem:

Theorem 3 A term is in β normal form if and only if it contains no subterm that is a β redex.

A far more complex but also more interesting result was proven by Church and Rosser (1936). This result and its analogues are usually called Church–Rosser or CR, and a formal system possessing a similar theorem is said to be Church–Rosser. Before I can state this theorem, I must define two new notations: I will write t ↠β u (and say that t β-reduces to u) as an abbreviation for the statement that there is a sequence of α conversions and β reductions leading from t to u; and I will write t =α u (and say that t and u are α-equivalent) as an abbreviation for the statement that there is a sequence of α conversions leading from t to u. In both cases the sequence may be empty, so t ↠β t and t =α t are both trivially true.

Theorem 4 (Church–Rosser) Let t be a term and u₁ and u₂ be terms in β normal form. If t ↠β u₁ and t ↠β u₂, then u₁ =α u₂.

In other words, if a term has a β normal form (that is, there is at least one sequence of α conversions and β reductions from that term to a β normal form), then that β normal form is unique (up to α equivalence).

There are two important qualifications. First, some sequences of α conversions and β reductions can be extended indefinitely without ever reaching a β normal form. Second, there are terms that have no β normal form at all.

Example 9 Let us consider the term (λx x x)(λx x x). It is not in β normal form, since it contains a β redex.
But the only possible β reduction on it makes no change to the term, and α conversions never remove β redexes from a term. Thus, this term cannot have a β normal form.

Example 10 Let us consider the term (λx y)((λx x x)(λx x x)). This term has two β redexes: the whole term itself and the subterm that was considered
above. We can choose which redex to apply β reduction to. If we choose the former, we reach a β normal form immediately:

    (λx y)((λx x x)(λx x x)) →β y

But if we choose the latter, the term is not changed, and we have to make the choice again. If we stubbornly always choose the latter, we will never learn the β normal form.

There are two important reduction strategies, that is, systematic rules for choosing which β redex to apply β reduction to. Normal order reduction always chooses the leftmost of those β redexes that are not subterms of any other β redex. Applicative order reduction always chooses the leftmost of those β redexes that have no other β redexes as subterms. It is sometimes said that the normal order (in Finnish normaalijärjestys) chooses the leftmost outermost redex, while the applicative order (in Finnish applikatiivinen järjestys) chooses the leftmost innermost redex.

The following theorem can be proven:

Theorem 5 If a term has a β normal form, then a normal order reduction will eventually find it.

No such theorem is true of the applicative order, as Example 10 showed.

Exercise 4 Let us make the following shorthand definitions:

    IF    = (λx x)
    TRUE  = (λxy x)
    FALSE = (λxy y)

Now determine the β normal forms of the following terms.

1. IF TRUE a b
2. IF FALSE a b

Exercise 5 It is possible to encode nonnegative integers in pure untyped lambda calculus using what are known as Church encodings. In this system, we model a nonnegative integer n as a function of two parameters f and x that returns

    f(f(f ... (f x) ...))
where the number of times f is repeated is n. Thus

    0 ≡ (λfx x)
    1 ≡ (λfx f x)
    2 ≡ (λfx f(f x))

We can further define

    add ≡ (λnmfx. n f (m f x))

Using these definitions, show that one plus one is indeed two.

1.7 Important variations

If we delete rule (17) from β reduction, we end up with weak β reduction. The corresponding normal form concept is the weak normal form; a term is in weak normal form if and only if the only β redexes it contains are subterms of abstractions. In essence, we decline to reduce inside abstractions. The call semantics of mainstream programming languages like C and Java is best modeled using weak β reduction in the applicative order; this corresponds roughly to how these languages treat function and method calls.

If we delete rule (16) from β reduction, we end up with head β reduction. The corresponding normal form concept is the head normal form; a term is in head normal form if and only if the only β redexes it contains are in argument positions (that is, inside the second subterm of an application). In essence, we decline to reduce function arguments.

If we delete both rules from β reduction, we end up with weak head β reduction. The corresponding normal form concept is the weak head normal form or WHNF; a term is in weak head normal form if it is of the form t u₁ ... uₙ for some n ∈ {0, 1, 2, ...}, where t is not an abstraction if n ≥ 1 and is in any case not an application. This is a particularly important concept, as it describes the semantics of expression evaluation in Haskell (so long as one ignores evaluation efficiency): expressions are reduced to WHNF, at which point the computation is considered finished.9

All these variations can be further modified by introducing graph reduction (in Finnish verkonsievennys), invented by Wadsworth (1971). The idea

9 It is often said that Haskell uses the normal order, but that is misleading. In weak head reduction, there is never a question of which redex to reduce, as there can be only one candidate.
is to regard the original term as an AST; when, in the process of reduction, a variable occurrence is substituted by a term, instead of making a copy of the term we simply redirect all edges leading to the variable occurrence so that they lead to the root vertex of the substituted term. The result is often that the term is no longer a tree but a graph, and the graph may even contain loops. This is the essence of lazy evaluation as it occurs in Haskell; Haskell's model is essentially weak head graph reduction.10

References

Barendregt, H. P. (1984). The Lambda Calculus: Its Syntax and Semantics. Revised edition. Studies in Logic and the Foundations of Mathematics 103. Amsterdam: Elsevier.

Church, Alonzo (1985). The Calculi of Lambda Conversion. Princeton University Press.

Church, Alonzo and J. B. Rosser (1936). "Some properties of conversion". In: Transactions of the American Mathematical Society 39, pp. 472–482. doi: 10.1090/S0002-9947-1936-1501858-0.

Wadsworth, Christopher Peter (1971). Semantics and Pragmatics of the Lambda Calculus. PhD thesis. Oxford University.

10 For a more extensive tutorial on graph reduction, see http://users.jyu.fi/~antkaij/opetus/okp/2010/luennot/laziness.pdf.