The Core Method
International Center for Computational Logic, Technische Universität Dresden, Germany

Outline: The Very Idea, The Propositional CORE Method, Human Reasoning
The Very Idea

Various semantics for logic programs coincide with fixed points of associated immediate consequence operators (Apt, van Emden: Contributions to the Theory of Logic Programming. Journal of the ACM 29, 841-862: 1982).

Banach Contraction Mapping Theorem A contraction mapping f defined on a complete metric space (X, d) has a unique fixed point. The sequence y, f(y), f(f(y)), ... converges to this fixed point for any y ∈ X.

Consider programs whose immediate consequence operator is a contraction. (Fitting: Metric Methods: Three Examples and a Theorem. Journal of Logic Programming 21, 113-127: 1994.)

Every continuous function on the reals can be uniformly approximated by feedforward connectionist networks (Funahashi: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183-192: 1989).

Consider programs whose immediate consequence operator is continuous. (H., Kalinke, Störr: Approximating the Semantics of Logic Programs by Recurrent Neural Networks. Applied Intelligence 11, 45-59: 1999.)
First Ideas

H., Kalinke: Towards a New Massively Parallel Computational Model for Logic Programming. In: Proceedings of the ECAI94 Workshop on Combining Symbolic and Connectionist Processing, 68-77: 1994.
Interpretations

Let L be a propositional language and {⊤, ⊥} the set of truth values. An interpretation I is a mapping L → {⊤, ⊥}.

For a given program P, an interpretation I can be represented by the set of those atoms occurring in P which are mapped to ⊤ under I, i.e. I ⊆ R_P, where R_P is the set of atoms occurring in P. 2^{R_P} is the set of all interpretations for P.

(2^{R_P}, ⊆) is a complete lattice.

An interpretation I for P is a model for P iff I(P) = ⊤.
Immediate Consequence Operator

Immediate consequence operator T_P : 2^{R_P} → 2^{R_P} with
T_P(I) = {A | there is a clause A ← L_1 ∧ ... ∧ L_n ∈ P such that I |= L_1 ∧ ... ∧ L_n}.

I is a supported model iff T_P(I) = I.

Let lfp T_P be the least fixed point of T_P, if it exists.
The Propositional CORE Method

Let L be a propositional logic language. Given a logic program P together with its immediate consequence operator T_P. Let |R_P| = m and 2^{R_P} be the set of interpretations for P.

Find a mapping rep : 2^{R_P} → R^m.

Construct a feed-forward network computing f_P : R^m → R^m, called the core, such that the following holds:
If T_P(I) = J then f_P(rep(I)) = rep(J), where I, J ∈ 2^{R_P}.
If f_P(s) = t then T_P(rep^{-1}(s)) = rep^{-1}(t), where s, t ∈ R^m.

Connect the units in the output layer recursively to the units in the input layer.

Show that the following holds: I = lfp T_P iff the recurrent network converges to rep(I), i.e. it reaches a stable state with input and output layer representing rep(I).

Connectionist model generation using recurrent networks with feed-forward core.
3-Layer Recurrent Networks

[Figure: a network with input layer, hidden layer, and output layer; the feed-forward part forms the core, and the output layer feeds back into the input layer.]

At each point in time all units do: apply the activation function to obtain the potential, apply the output function to obtain the output.
Propositional CORE Method using Binary Threshold Units

Let L be the language of propositional logic. Let P be a propositional logic program, e.g.,
P = {p ←, r ← p ∧ ¬q, r ← ¬p ∧ q}.

T_P(I) = {A | A ← L_1 ∧ ... ∧ L_m ∈ P such that I |= L_1 ∧ ... ∧ L_m}.

T_P(∅) = {p}
T_P({p}) = {p, r}
T_P({p, r}) = {p, r} = lfp T_P
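The operator and its iteration to the least fixed point can be sketched in a few lines of Python. The clause encoding is my own, and the program below assumes the example reads P = {p ←, r ← p ∧ ¬q, r ← ¬p ∧ q}:

```python
# A clause is (head, body); a body is a list of literals (atom, positive).
def tp(program, interp):
    """Immediate consequence operator T_P on an interpretation (set of atoms)."""
    return {head for head, body in program
            if all((atom in interp) == pos for atom, pos in body)}

def lfp_tp(program):
    """Iterate T_P from the empty interpretation until a fixed point is reached."""
    interp = set()
    while True:
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt

# P = {p <-, r <- p & ~q, r <- ~p & q}
P = [("p", []),
     ("r", [("p", True), ("q", False)]),
     ("r", [("p", False), ("q", True)])]

print(tp(P, set()))   # {'p'}
print(lfp_tp(P))      # {'p', 'r'}
```

The three calls tp(P, ∅), tp(P, {p}), tp(P, {p, r}) reproduce exactly the iteration shown on the slide.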
Representing Interpretations

Let m = |R_P| and identify R_P with {1, ..., m}. Define rep : 2^{R_P} → R^m such that for all 1 ≤ j ≤ m:
rep(I)[j] = 1 if j ∈ I, and rep(I)[j] = 0 if j ∉ I.

E.g., if R_P = {p, q, r} = {1, 2, 3} and I = {p, r}, then rep(I) = (1, 0, 1).

Other encodings are possible, e.g.:
rep'(I)[j] = 1 if j ∈ I, and rep'(I)[j] = -1 if j ∉ I.

We can represent interpretations by arrays of binary or bipolar threshold units.
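Both encodings amount to one line each; a fixed atom ordering plays the role of the identification of R_P with {1, ..., m} (function names are my own):

```python
def rep_binary(interp, atoms):
    """Binary encoding: 1 for atoms in I, 0 otherwise."""
    return [1 if a in interp else 0 for a in atoms]

def rep_bipolar(interp, atoms):
    """Bipolar encoding: 1 for atoms in I, -1 otherwise."""
    return [1 if a in interp else -1 for a in atoms]

atoms = ["p", "q", "r"]
print(rep_binary({"p", "r"}, atoms))   # [1, 0, 1]
print(rep_bipolar({"p", "r"}, atoms))  # [1, -1, 1]
```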
Computing the Core

Theorem For each program P, there exists a core of logical threshold units computing T_P.

Proof Let P be a program, m = |R_P|, and ω ∈ R^+. Wlog we assume that all occurrences of ⊤ in P have been eliminated.

Translation Algorithm
1 Input and output layer: vectors of length m of binary threshold units, with threshold 0.5 in the input layer and ω/2 in the output layer.
2 For each clause of the form A ← L_1 ∧ ... ∧ L_k ∈ P, k ≥ 0, do:
2.1 Add a binary threshold unit u_h to the hidden layer.
2.2 Connect u_h to the unit representing A in the output layer with weight ω.
2.3 For each literal L_j, 1 ≤ j ≤ k, connect the unit representing L_j in the input layer to u_h; if L_j is an atom then set the weight to ω, otherwise set the weight to -ω.
2.4 Set the threshold of u_h to ω(p - 0.5), where p is the number of positive literals occurring in L_1 ∧ ... ∧ L_k.
Computing the Core (Continued)

Theorem For each program P, there exists a core of logical threshold units computing T_P.

Proof (Continued) Some observations:

u_h becomes active at time t + 1 iff L_1 ∧ ... ∧ L_k is mapped to ⊤ by the interpretation represented by the state of the input layer at time t.

The unit representing A in the output layer becomes active at time t + 2 iff there is a rule of the form A ← L_1 ∧ ... ∧ L_k ∈ P and the unit u_h in the hidden layer corresponding to this rule is active at time t + 1.

The result follows immediately from these observations. qed
Computing the Core (Example)

Consider again P = {p ←, r ← p ∧ ¬q, r ← ¬p ∧ q}. The translation algorithm yields:

[Figure: input units p, q, r; one hidden unit per clause; output units p, q, r. Connections from units representing negated atoms carry negative weight.]
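The translation algorithm can be simulated directly. This is a minimal sketch under my own choices (ω = 1, clauses encoded as (head, [(atom, positive)]) pairs), assuming the example program reads P = {p ←, r ← p ∧ ¬q, r ← ¬p ∧ q}; one call of core_step is one feed-forward pass through the core:

```python
OMEGA = 1.0  # weight parameter omega > 0

def build_core(program):
    """One hidden unit per clause: (input weights, threshold, head atom)."""
    hidden = []
    for head, body in program:
        weights = {atom: (OMEGA if pos else -OMEGA) for atom, pos in body}
        p = sum(1 for _, pos in body if pos)       # number of positive literals
        hidden.append((weights, OMEGA * (p - 0.5), head))
    return hidden

def core_step(hidden, atoms, vec):
    """Feed a 0/1 input vector through the core; returns the output vector."""
    state = dict(zip(atoms, vec))
    out = {a: 0 for a in atoms}
    for weights, theta, head in hidden:
        potential = sum(w * state[a] for a, w in weights.items())
        if potential >= theta:      # hidden unit active -> the output unit for
            out[head] = 1           # the head receives omega >= omega/2, fires
    return [out[a] for a in atoms]

atoms = ["p", "q", "r"]
P = [("p", []),
     ("r", [("p", True), ("q", False)]),
     ("r", [("p", False), ("q", True)])]
core = build_core(P)
print(core_step(core, atoms, [0, 0, 0]))  # [1, 0, 0]  = rep(T_P({}))
print(core_step(core, atoms, [1, 0, 0]))  # [1, 0, 1]  = rep(T_P({p}))
print(core_step(core, atoms, [1, 0, 1]))  # [1, 0, 1]  = rep(T_P({p, r}))
```

Each pass reproduces one application of T_P, as the theorem requires.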
Hidden Layers are Needed

The XOR can be represented by the program {r ← p ∧ ¬q, r ← ¬p ∧ q}.

Proposition 2-layer networks cannot compute T_P for definite P.

Proof Suppose there exist 2-layer networks computing T_P for definite P. Consider P = {p ← q, p ← r ∧ s, p ← t ∧ u}.

[Figure: input units 2, ..., 6 representing q, r, s, t, u, connected with weights w_{72}, ..., w_{76} to the output unit 7 representing p, which has threshold θ_7.]

Let v be the state of the input layer; v represents an interpretation.
Hidden Layers are Needed (Continued)

Proposition 2-layer networks cannot compute T_P for definite P.

Proof (Continued) Consider P = {p ← q, p ← r ∧ s, p ← t ∧ u}. We have to find θ_7 and w_{7j}, 2 ≤ j ≤ 6, such that

p ∈ T_P(v) iff w_{72} v_2 + w_{73} v_3 + w_{74} v_4 + w_{75} v_5 + w_{76} v_6 ≥ θ_7.

Because conjunction is commutative we find

p ∈ T_P(v) iff w_{72} v_2 + w_{74} v_3 + w_{73} v_4 + w_{76} v_5 + w_{75} v_6 ≥ θ_7.

Consequently,

p ∈ T_P(v) iff w_{72} v_2 + w_1 (v_3 + v_4) + w_2 (v_5 + v_6) ≥ θ_7,

where w_1 = (w_{73} + w_{74})/2 and w_2 = (w_{75} + w_{76})/2.
Hidden Layers are Needed (Continued)

Proposition 2-layer networks cannot compute T_P for definite P.

Proof (Continued) Consider P = {p ← q, p ← r ∧ s, p ← t ∧ u}. Likewise, because disjunction is commutative we find

p ∈ T_P(v) iff w · x ≥ θ_7, where w = (w_{72} + w_1 + w_2)/3 and x = Σ_{j=2}^{6} v_j.

For the network to compute T_P, the following must hold:
If x = 0 (v_2 = ... = v_6 = 0), then w · x - θ_7 < 0.
If x = 1 (v_2 = 1, v_3 = ... = v_6 = 0), then w · x - θ_7 ≥ 0.
If x = 2 (v_2 = v_4 = v_6 = 0, v_3 = v_5 = 1), then w · x - θ_7 < 0.

However, d(w · x - θ_7)/dx = w cannot change its sign: a function linear in x cannot be negative at x = 0, non-negative at x = 1, and negative again at x = 2; contradiction.

Consequently, 2-layer feed-forward networks cannot compute T_P. qed
Adding Recurrent Connections

Recall P = {p ←, r ← p ∧ ¬q, r ← ¬p ∧ q}.

[Figure: the core from the previous example with recurrent connections from each output unit back to the corresponding input unit.]
On the Existence of Least Fixed Points

Theorem For definite programs P, T_P has a least fixed point which can be obtained by iterating T_P starting with the empty interpretation. (Apt, van Emden: Contributions to the Theory of Logic Programming. Journal of the ACM 29, 841-862: 1982.)

In general, however, least fixed points do not always exist. Consider P = {p ← ¬q, q ← ¬p}.

[Figure: the corresponding recurrent network; the units for p and q inhibit each other via negative weights.]

The corresponding recurrent network does not reach a stable state if initialized with the empty interpretation. It has two stable states corresponding to the interpretations {p} and {q}.
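The oscillation and the two stable states can be checked symbolically by iterating T_P instead of running the network (a small sketch; tp re-implements the immediate consequence operator for clauses (head, [(atom, positive)])):

```python
def tp(program, interp):
    """Immediate consequence operator for clauses (head, [(atom, positive)])."""
    return frozenset(h for h, body in program
                     if all((a in interp) == pos for a, pos in body))

def run(program, start=frozenset(), limit=100):
    """Iterate T_P; return the trace up to the first repeated interpretation."""
    trace, interp = [], start
    while interp not in trace and len(trace) < limit:
        trace.append(interp)
        interp = tp(program, interp)
    return trace, interp

# P = {p <- ~q, q <- ~p}
P = [("p", [("q", False)]), ("q", [("p", False)])]
trace, repeat = run(P)
print(trace)          # [frozenset(), frozenset({'p', 'q'})]: a 2-cycle, no lfp
print(tp(P, {"p"}))   # frozenset({'p'}): a stable state, as is {q}
```

Starting from the empty interpretation the iteration alternates between ∅ and {p, q}, while {p} and {q} are fixed points, exactly as stated on the slide.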
Metrics for Logic Programs

Let P be a program and l a level mapping for P. For interpretations I, J ∈ 2^{R_P} we define

d_P(I, J) = 0 if I = J, and d_P(I, J) = 2^{-n} if n is the smallest level on which I and J differ.

Proposition 1 (2^{R_P}, d_P) is a complete metric space.

Proposition 2 If P is acceptable, then there exists a metric space such that T_P is a contraction on it.

For proofs of both propositions see Fitting: Metric Methods: Three Examples and a Theorem. Journal of Logic Programming 21, 113-127: 1994.

Corollary If P is acceptable, then there exists a 3-layer recurrent network of logical threshold units such that the computation starting with an arbitrary initial input converges and yields the unique fixed point of T_P.
Time and Space Complexity

Let n = |P| be the number of clauses and m = |R_P| be the number of propositional variables occurring in P.

The core has 2m + n units and at most 2mn connections. T_P(I) is computed in 2 steps.

The parallel computational model to compute T_P(I) is optimal. (A parallel computational model requiring p(n) processors and t(n) time to solve a problem of size n is optimal if p(n) · t(n) = O(T(n)), where T(n) is the sequential time to solve this problem; see Karp, Ramachandran: Parallel Algorithms for Shared-Memory Machines. In: Handbook of Theoretical Computer Science, Elsevier, 869-941: 1990.)

The recurrent network settles down in 3n steps in the worst case. (Dowling, Gallier: Linear-Time Algorithms for Testing the Satisfiability of Propositional Horn Formulae. Journal of Logic Programming 1, 267-284: 1984; Scutellà: A Note on Dowling and Gallier's Top-Down Algorithm for Propositional Horn Satisfiability. Journal of Logic Programming 8, 265-273: 1990.)

Exercise Give an example of a program with worst-case time behavior.
Reasoning wrt Least Fixed Points

Let P be a program and assume that T_P admits a least fixed point, denoted lfp T_P. It can be shown that lfp T_P is the least model of P.

We define P |=_lm F iff lfp T_P(F) = ⊤. Observe that |=_lm differs from classical |=.

Consider P = {p ←, q ← r}. Then lfp T_P = {p} and P |=_lm p ∧ ¬q ∧ ¬r, but P ⊭ ¬q and P ⊭ ¬r.

If we consider |=_lm, then negation is not classical negation. This is the reason for using a different negation symbol; this negation is often called negation by failure.
Extensions

The approach has been extended to many-valued logic programs (Kalinke 1994; Seda, Lane 2004), extended logic programs (d'Avila Garcez, Broda, Gabbay 2002), modal logic programs (d'Avila Garcez, Lamb, Gabbay 2002), intuitionistic logic programs (d'Avila Garcez, Lamb, Gabbay 2003), and first-order logic programs (H., Kalinke, Störr 1999; Bader, Hitzler, H., Witzel 2007).
KBANN   Knowledge Based Artificial Neural Networks

Towell, Shavlik: Extracting Refined Rules from Knowledge Based Neural Networks. Machine Learning 13, 71-101: 1993.

Can we do better than empirical learning? Consider acyclic logic programs, e.g.,
P = {a ← b ∧ c ∧ d, a ← d ∧ e, h ← f ∧ g, k ← a ∧ h}.

[Figure: the corresponding layered network with input units b, c, d, e, f, g, intermediate units for a and h, and output unit k.]
KBANN   Learning

Given hierarchical sets of propositional rules as background knowledge:

Map the rules into a multi-layer feed-forward network with sigmoidal units.
Add hidden units (optional).
Add units for known input features that are not referenced in the rules.
Fully connect the layers.
Add near-zero random numbers to all links and thresholds.
Apply backpropagation.

Empirical evaluation: the system performs better than purely empirical and purely hand-built classifiers.
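The initialization steps might look as follows. This is only a sketch under my own assumptions (a single rule layer, one rule per head, rule weight ω = 4, sigmoidal units); the actual KBANN system is considerably richer:

```python
import math
import random

OMEGA = 4.0  # assumed rule weight; large enough to near-saturate the sigmoid

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

def kbann_init(rules, inputs, noise=0.01, seed=0):
    """Map rules (head, [(atom, positive)]) to sigmoidal units, fully connect
    the layer, and perturb all weights and biases with near-zero noise."""
    rng = random.Random(seed)
    units = {}
    for head, body in rules:
        w = {a: 0.0 for a in inputs}              # fully connected layer
        for atom, pos in body:
            w[atom] = OMEGA if pos else -OMEGA
        p = sum(1 for _, pos in body if pos)
        bias = -OMEGA * (p - 0.5)                 # unit fires iff body is true
        for a in inputs:                          # near-zero random perturbation
            w[a] += rng.uniform(-noise, noise)
        units[head] = (w, bias + rng.uniform(-noise, noise))
    return units

def activate(units, x):
    """Outputs of all units on input assignment x (atom -> 0/1)."""
    return {h: sigmoid(sum(w[a] * x[a] for a in w) + b)
            for h, (w, b) in units.items()}

units = kbann_init([("a", [("b", True), ("c", True)])], ["b", "c"])
print(activate(units, {"b": 1, "c": 1})["a"] > 0.5)   # True: body satisfied
print(activate(units, {"b": 1, "c": 0})["a"] > 0.5)   # False
```

The resulting weights would then be refined by backpropagation; the noise breaks the symmetry that exact rule weights would otherwise impose on gradient descent.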
KBANN   A Problem

Towell, Shavlik 1993: This works if rules have few conditions and there are few rules with the same head.

[Figure: a unit s whose rule has many antecedents q_1, ..., and a unit r with many rules r_1, ..., sharing the head r.]

With many antecedents or many rules per head, the potentials end up close to the thresholds: with p_q = p_r = 9 and p_s = 0.9, the sigmoidal outputs (with β = 1) come to roughly v_q = v_r ≈ 0.46 and v_s ≈ 0.6. The units are then neither clearly active nor clearly passive.
Solving the Problem

d'Avila Garcez, Zaverucha, Carvalho: Logic Programming and Inductive Learning in Artificial Neural Networks. In: Knowledge Representation in Neural Networks (Herrmann, Strohmaier, eds.), Logos, Berlin, 33-46: 1997.

Can we combine the ideas of the propositional CORE method and KBANN while avoiding the above-mentioned problem?

The approach has been generalized in Bader: Neural-Symbolic Integration. PhD thesis, TU Dresden, Fakultät Informatik: 2009.
Propositional CORE Method using Squashing Units

Let u be a squashing unit with output function Ψ. Let Ψ⁻ = lim_{p→−∞} Ψ(p) and Ψ⁺ = lim_{p→∞} Ψ(p). Let a⁻, a⁰, a⁺ ∈ R such that Ψ⁻ < a⁻ < a⁰ < a⁺ < Ψ⁺.

u is active iff its output v ≥ a⁺.
u is passive iff v ≤ a⁻.
u is in the null-state iff v = a⁰.
u is undecided iff a⁻ < v < a⁺ and v ≠ a⁰.

p⁺ = Ψ⁻¹(a⁺) is called the minimal activation potential.
p⁻ = Ψ⁻¹(a⁻) is called the maximal inactivation potential.
p⁰ = Ψ⁻¹(a⁰) is called the null-state potential.
The Task

How can we guarantee that a unit is either active, passive, or in the null-state?

Suppose the input layer units output only finitely many values, and let u be a hidden layer unit. If the input layer is finite, then the potential of u may only take finitely many different values. Let P = {p_1, ..., p_n} be the set of possible values for the potential of u. Let P⁺ = {p ∈ P | p > p⁰} and P⁻ = {p ∈ P | p < p⁰}.

Let m = max(m⁻, m⁺), where
m⁺ = 0 if P⁺ = ∅, and m⁺ = p⁺ / (min(P⁺) − p⁰) otherwise;
m⁻ = 0 if P⁻ = ∅, and m⁻ = p⁻ / (max(P⁻) − p⁰) otherwise.

Observations If the weights on the connections to u and the threshold of u are multiplied by m, then u is either active, passive, or in the null-state. u produces only finitely many different output values. The transformation can be applied to output layer units as well.
Example

Consider a bipolar sigmoidal unit u with Ψ(p) = tanh(p). Let P = {−0.9, −0.5, −0.3, 0.0, 0.2, 0.4, 0.8} and a⁻ = −0.8, a⁰ = 0.0, a⁺ = 0.8. Then p⁻ = Ψ⁻¹(a⁻) ≈ −1.1, p⁰ = Ψ⁻¹(a⁰) = 0.0, p⁺ = Ψ⁻¹(a⁺) ≈ 1.1.

Hence m⁻ = −1.1 / (−0.3 − 0.0) ≈ 3.66 and m⁺ = 1.1 / (0.2 − 0.0) ≈ 5.493. Thus m = max(m⁻, m⁺) = max(3.66, 5.493) = 5.493, and we obtain:

p      tanh(p)   tanh(m · p)
−0.9   −0.716    −0.999
−0.5   −0.462    −0.992
−0.3   −0.291    −0.929
 0.0    0.000     0.000
 0.2    0.197     0.800
 0.4    0.380     0.976
 0.8    0.664     0.999

Exercise Specify a core of bipolar sigmoidal units for P = {p ←, r ← p ∧ ¬q, r ← ¬p ∧ q} with a⁻ = −0.9, a⁰ = 0.0, a⁺ = 0.9.
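The computation of m can be reproduced in a few lines (a sketch; math.atanh serves as Ψ⁻¹ for the bipolar sigmoidal unit, and the m⁺/m⁻ formulas follow my reconstruction of the previous slide):

```python
import math

def scaling_factor(potentials, psi_inv, a_minus, a_zero, a_plus):
    """Smallest blow-up factor m that pushes every reachable potential into
    the active/passive/null region of a squashing unit."""
    p_plus, p_minus = psi_inv(a_plus), psi_inv(a_minus)
    p0 = psi_inv(a_zero)
    pos = [p for p in potentials if p > p0]
    neg = [p for p in potentials if p < p0]
    m_plus = p_plus / (min(pos) - p0) if pos else 0.0
    m_minus = p_minus / (max(neg) - p0) if neg else 0.0
    return max(m_minus, m_plus)

P = [-0.9, -0.5, -0.3, 0.0, 0.2, 0.4, 0.8]
m = scaling_factor(P, math.atanh, -0.8, 0.0, 0.8)
print(round(m, 3))   # 5.493
# after scaling, every nonzero potential yields |tanh(m * p)| >= 0.8
print(all(abs(math.tanh(m * p)) >= 0.8 - 1e-9 for p in P if p != 0.0))  # True
```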
Results

The relation to logic programs is preserved:
For each program P, there exists a core of squashing units computing T_P.
If P is acceptable, then there exists a 3-layer recurrent network of squashing units such that the computation starting with an arbitrary initial input converges and yields the unique fixed point of T_P.
Likewise for consistent acceptable extended logic programs.

The core is trainable by backpropagation.

The neural-symbolic cycle: a writable embedding maps the symbolic system into the connectionist system, the connectionist system is trainable, and a readable extraction maps it back to the symbolic system.
Cores for Three-Valued Logic Programs

Consider {p ← q}. A translation algorithm translates programs into a core; recurrent connections connect the output to the input layer. (Kalinke 1995; Seda, Lane 2004.)

[Table: the behavior of the operators Φ_F (Fitting) and Φ_SvL (Stenning, van Lambalgen) on the three-valued interpretations over p and q; the original table is not fully recoverable from the source.]
A CORE Method for Human Reasoning

Consider three-layer feed-forward networks of binary threshold units. The input as well as the output layer shall represent (three-valued) interpretations.

Theorem For each program P there exists a feed-forward core computing Φ_SvL,P.

Add recurrent connections between corresponding units in the output and the input layer.

Corollary The recurrent network reaches a stable state representing lfp Φ_SvL,P if initialized with ⟨∅, ∅⟩.
The Suppression Task   Modus Ponens

If she has an essay to write, she will study late in the library. She has an essay to write.
96% of subjects conclude that she will study late in the library.

P_4 = {l ← e ∧ ¬ab, e ← ⊤, ab ← ⊥}.

[Figure: the corresponding recurrent network over the atoms e, l, ab.]

lfp Φ_SvL,P4 = lm_3Ł wcP_4 = ⟨{l, e}, {ab}⟩. From ⟨{l, e}, {ab}⟩ it follows that she will study late in the library.
The Suppression Task   Alternative Arguments

If she has an essay to write, she will study late in the library. She has an essay to write. If she has some textbooks to read, she will study late in the library.
96% of subjects conclude that she will study late in the library.

P_5 = {l ← e ∧ ¬ab_1, e ← ⊤, ab_1 ← ⊥, l ← t ∧ ¬ab_2, ab_2 ← ⊥}.

[Figure: the corresponding recurrent network over the atoms e, l, t, ab_1, ab_2.]

lfp Φ_SvL,P5 = lm_3Ł wcP_5 = ⟨{e, l}, {ab_1, ab_2}⟩. From ⟨{e, l}, {ab_1, ab_2}⟩ it follows that she will study late in the library.
The Suppression Task   Additional Argument

If she has an essay to write, she will study late in the library. She has an essay to write. If the library stays open, she will study late in the library.
38% of subjects conclude that she will study late in the library.

P_6 = {l ← e ∧ ¬ab_1, e ← ⊤, l ← o ∧ ¬ab_2, ab_1 ← ¬o, ab_2 ← ¬e}.

[Figure: the corresponding recurrent network over the atoms e, l, o, ab_1, ab_2.]

lfp Φ_SvL,P6 = lm_3Ł wcP_6 = ⟨{e}, {ab_2}⟩. From ⟨{e}, {ab_2}⟩ it follows that it is unknown whether she will study late in the library.
The Suppression Task   Denial of the Antecedent (DA)

If she has an essay to write, she will study late in the library. She does not have an essay to write.
46% of subjects conclude that she will not study late in the library.

P_7 = {l ← e ∧ ¬ab, e ← ⊥, ab ← ⊥}.

[Figure: the corresponding recurrent network over the atoms e, l, ab.]

lfp Φ_SvL,P7 = lm_3Ł wcP_7 = ⟨∅, {ab, e, l}⟩. From ⟨∅, {ab, e, l}⟩ it follows that she will not study late in the library.
The Suppression Task   Alternative Argument and DA

If she has an essay to write, she will study late in the library. She does not have an essay to write. If she has textbooks to read, she will study late in the library.
4% of subjects conclude that she will not study late in the library.

P_8 = {l ← e ∧ ¬ab_1, e ← ⊥, ab_1 ← ⊥, l ← t ∧ ¬ab_2, ab_2 ← ⊥}.

[Figure: the corresponding recurrent network over the atoms e, l, t, ab_1, ab_2.]

lfp Φ_SvL,P8 = lm_3Ł wcP_8 = ⟨∅, {ab_1, ab_2, e}⟩. From ⟨∅, {ab_1, ab_2, e}⟩ it follows that it is unknown whether she will study late in the library.
The Suppression Task   Additional Argument and DA

If she has an essay to write, she will study late in the library. She does not have an essay to write. If the library is open, she will study late in the library.
63% of subjects conclude that she will not study late in the library.

P_9 = {l ← e ∧ ¬ab_1, e ← ⊥, l ← o ∧ ¬ab_2, ab_1 ← ¬o, ab_2 ← ¬e}.

[Figure: the corresponding recurrent network over the atoms e, l, o, ab_1, ab_2.]

lfp Φ_SvL,P9 = lm_3Ł wcP_9 = ⟨{ab_2}, {e, l}⟩. From ⟨{ab_2}, {e, l}⟩ it follows that she will not study late in the library.
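The fixed points on these slides can be checked with a small three-valued interpreter. The slides do not define Φ_SvL explicitly, so the operator below is my reconstruction (a pair ⟨true atoms, false atoms⟩ as interpretation; body TOP for facts A ← ⊤ and BOT for assumptions A ← ⊥; an atom becomes false when it has clauses and all their bodies are false):

```python
TOP, BOT = "top", "bot"   # bodies of facts A <- T and assumptions A <- F

def lit(atom, pos, true, false):
    """Three-valued literal value: True, False, or None (unknown)."""
    v = True if atom in true else False if atom in false else None
    if not pos and v is not None:
        v = not v
    return v

def body_val(body, true, false):
    """Three-valued conjunction of the body literals."""
    if body == TOP: return True
    if body == BOT: return False
    vs = [lit(a, pos, true, false) for a, pos in body]
    if all(v is True for v in vs): return True
    if any(v is False for v in vs): return False
    return None

def phi(program, true, false):
    """One application of the (reconstructed) semantic operator."""
    heads = {h for h, _ in program}
    nt = {h for h, b in program if body_val(b, true, false) is True}
    nf = {h for h in heads
          if all(body_val(b, true, false) is False
                 for hh, b in program if hh == h)}
    return nt, nf

def lfp_phi(program):
    """Iterate from the empty interpretation <{}, {}> to the fixed point."""
    t, f = set(), set()
    while phi(program, t, f) != (t, f):
        t, f = phi(program, t, f)
    return t, f

P4 = [("l", [("e", True), ("ab", False)]), ("e", TOP), ("ab", BOT)]
P9 = [("l", [("e", True), ("ab1", False)]), ("e", BOT),
      ("l", [("o", True), ("ab2", False)]),
      ("ab1", [("o", False)]), ("ab2", [("e", False)])]
print(lfp_phi(P4))   # ({'l', 'e'}, {'ab'}): the Modus Ponens slide
print(lfp_phi(P9))   # ({'ab2'}, {'e', 'l'}): the Additional Argument and DA slide
```

Under this reconstruction the interpreter reproduces the least fixed points stated on the Modus Ponens and the Additional Argument and DA slides.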
Summary

Under Łukasiewicz semantics we obtain:

Byrne 1989                              Program   lm_3Ł wcP_i (l)
Modus Ponens                  l (96%)   P_4       ⊤
Alternative Arguments         l (96%)   P_5       ⊤
Additional Arguments          l (38%)   P_6       U
Denial of the Antecedent     ¬l (46%)   P_7       ⊥
Alternative Arguments and DA ¬l (4%)    P_8       U
Additional Arguments and DA  ¬l (63%)   P_9       ⊥

The approach appears to be adequate. Fitting semantics, or completion, is inadequate.