Integrating Logic Programs and Connectionist Systems

1 Integrating Logic Programs and Connectionist Systems Sebastian Bader, Steffen Hölldobler International Center for Computational Logic Technische Universität Dresden Germany Pascal Hitzler Institute of Applied Informatics and Formal Description Methods Technische Universität Karlsruhe Germany Integrating Logic Programs and Connectionist Systems 1

2 Introduction & Motivation: Overview Introduction & Motivation Propositional Logic Existing Approaches Propositional Logic Programs and the Core Method First-Order Logic Existing Approaches First-Order Logic Programs and the Core Method The Neural-Symbolic Learning Cycle Challenge Problems Introduction & Motivation

3 Introduction & Motivation: Connectionist Systems Well-suited to learn, to adapt to new environments, to degrade gracefully etc. Many successful applications. Approximate functions. Hardly any knowledge about the functions is needed. Trained using incomplete data. Declarative semantics is not available. Recursive networks are hardly understood. McCarthy 1988: We still observe a propositional fixation. Structured objects are difficult to represent. Smolensky 1987: Can we instantiate the power of symbolic computation within fully connectionist systems? Introduction & Motivation 3

4 Introduction & Motivation: Logic Systems Well-suited to represent and reason about structured objects and structure-sensitive processes. Many successful applications. Direct implementation of relations and functions. Explicit expert knowledge is required. Highly recursive structures. Well understood declarative semantics. Logic systems are brittle. Expert knowledge may not be available. Can we instantiate the power of connectionist computation within a logic system? Introduction & Motivation 4

5 Introduction & Motivation: Objective Seek the best of both paradigms! Understanding the relation between connectionist and logic systems. Contribute to the open research problems of both areas. Well developed for propositional case. Hard problem: going beyond. In this lecture: Overview on existing approaches. Logic programs and recurrent networks. Semantic operators for logic programs can be computed by connectionist systems. Introduction to the core method. Introduction & Motivation 5

6 Connectionist Networks A connectionist network consists of a set U of units and a set W ⊆ U × U of connections. Each connection is labeled by a weight w ∈ ℝ. If there is a connection from unit u_j to u_k, then w_kj is its associated weight. A unit is specified by an input vector i = (i_1, ..., i_m), i_j ∈ ℝ, 1 ≤ j ≤ m, an activation function Φ mapping i to a potential p ∈ ℝ, and an output function Ψ mapping p to an (output) value v ∈ ℝ. If there is a connection from u_j to u_k then w_kj·v_j is the input received by u_k along this connection. The potential and value of a unit are synchronously recomputed (or updated). Often a linear time t is added as a parameter to input, potential and value. The state of a network with units u_1, ..., u_n at time t is (v_1(t), ..., v_n(t)). Introduction & Motivation 6

7 Propositional Logic Existing Approaches Finite Automata and McCulloch-Pitts Networks Weighted Automata and Semiring Artificial Neural Networks Propositional Reasoning and Symmetric/Stochastic Networks Other Approaches Propositional Logic Programs and the Core Method The Very Idea Logic Programs Propositional Core Method Backpropagation Knowledge-Based Artificial Neural Networks Propositional Core Method using Sigmoidal Units Further Extensions Propositional Logic 7

8 Finite Automata and McCulloch-Pitts Networks McCulloch, Pitts 1943: Can the activities of nervous systems be modelled by a logical calculus? A McCulloch-Pitts network consists of a set U of binary threshold units and a set W ⊆ U × U of weighted connections. The set U_I of input units is defined as U_I = {u_k ∈ U | (∀u_j ∈ U) w_kj = 0}. The set U_O of output units is defined as U_O = {u_j ∈ U | (∀u_k ∈ U) w_kj = 0}. [Figure: a McCulloch-Pitts network with input units U_I and output units U_O.] Theorem McCulloch-Pitts networks are finite automata and vice versa. Propositional Logic: Existing Approaches 8

9 Binary Threshold Units u_k is a binary threshold unit if Φ(i_k) = p_k = Σ_{j=1}^m w_kj·v_j and Ψ(p_k) = v_k = 1 if p_k ≥ θ_k, 0 otherwise, where θ_k ∈ ℝ is a threshold. [Figure: three example binary threshold units, among them a unit u_3 with weights w_31 = w_32 = 1 and threshold θ_3 = 0.5 computing v_3 = v_1 ∨ v_2, and a unit u_3 with weights w_31 = w_32 = 1 and threshold θ_3 = 1.5 computing v_3 = v_1 ∧ v_2.] Propositional Logic: Existing Approaches 9
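As an illustration (not part of the original slides), a minimal Python sketch of such a unit; the weights and thresholds below reproduce the disjunction and conjunction units from the figure, and all function names are chosen here for illustration only.

```python
def threshold_unit(weights, theta):
    """Binary threshold unit: potential p = sum_j w_j * v_j, output 1 iff p >= theta."""
    def unit(inputs):
        p = sum(w * v for w, v in zip(weights, inputs))
        return 1 if p >= theta else 0
    return unit

# Disjunction: weights 1, threshold 0.5 -> fires if at least one input is 1.
or_unit = threshold_unit([1, 1], 0.5)
# Conjunction: weights 1, threshold 1.5 -> fires only if both inputs are 1.
and_unit = threshold_unit([1, 1], 1.5)

assert [or_unit([a, b]) for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 1]
assert [and_unit([a, b]) for a in (0, 1) for b in (0, 1)] == [0, 0, 0, 1]
```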

10 Weighted Automata and Semiring Artificial Neural Networks Bader, Hölldobler, Scalzitti 2004: Can the result by McCulloch and Pitts be extended to weighted automata? Let (K, ⊕, ⊗, 0_K, 1_K) be a semiring. u_k is a ⊕-unit if Φ(i_k) = p_k = ⊕_{j=1}^m w_kj ⊗ v_j and Ψ(p_k) = v_k = p_k. u_k is a ⊗-unit if Φ(i_k) = p_k = ⊗_{j=1}^m w_kj ⊗ v_j and Ψ(p_k) = v_k = p_k. A semiring artificial neural network consists of a set U of ⊕- and ⊗-units and a set W ⊆ U × U of K-weighted connections. Theorem Weighted automata are semiring artificial neural networks. Propositional Logic: Existing Approaches 10

11 Symmetric Networks Hopfield 1982: Can statistical models for magnetic materials explain the behavior of certain classes of networks? A symmetric network consists of a set U of binary threshold units and a set W ⊆ U × U of weighted connections such that w_kj = w_jk for all k, j with k ≠ j. Asynchronous update procedure: while the state v is unstable, update an arbitrary unit. This minimizes the energy function E(v) = −Σ_{k<j} w_kj·v_j·v_k + Σ_k θ_k·v_k. Propositional Logic: Existing Approaches 11
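A minimal sketch of the asynchronous update and the energy function, assuming unit values in {0, 1} and no self-connections; the function names are illustrative only.

```python
import random

def energy(W, theta, v):
    """E(v) = -sum_{k<j} w_kj v_j v_k + sum_k theta_k v_k for a symmetric weight matrix W."""
    n = len(v)
    quad = sum(W[k][j] * v[j] * v[k] for k in range(n) for j in range(k + 1, n))
    return -quad + sum(theta[k] * v[k] for k in range(n))

def asynchronous_update(W, theta, v, steps=100):
    """Pick an arbitrary unit and recompute its value; for symmetric W no step increases E."""
    v = list(v)
    for _ in range(steps):
        k = random.randrange(len(v))
        p = sum(W[k][j] * v[j] for j in range(len(v)) if j != k)
        v[k] = 1 if p >= theta[k] else 0
    return v
```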

12 Stochastic Networks or Boltzmann Machines Hinton, Sejnowski 1983: Can we escape local minima? A stochastic network is a symmetric network, but the values are computed probabilistically: P(v_k = 1) = 1/(1 + e^{(θ_k − p_k)/T}), where T is called the pseudo temperature. In equilibrium, stochastic networks are more likely to be in a state with low energy. Kirkpatrick et al. 1983: Can we compute a global minimum? Simulated annealing: decrease T gradually. Theorem (Geman, Geman 1984) A global minimum is reached if T is decreased in infinitesimally small steps. Propositional Logic: Existing Approaches 12

13 Propositional Reasoning and Energy Minimization Pinkas 1991: Is there a link between propositional logic and symmetric networks? Let D = C_1, ..., C_m be a propositional formula in clause form. We define τ(C) = 0 if C = [ ], p if C = [p], 1 − p if C = [¬p], and τ(C_1) + τ(C_2) − τ(C_1)·τ(C_2) if C = (C_1 ∨ C_2), and τ(D) = Σ_{i=1}^m (1 − τ(C_i)). Example τ([¬o, m], [¬s, ¬m], [¬c, m], [¬c, s], [¬v, ¬m]) = vm − cm − cs + sm − om + 2c + o. Propositional Logic: Existing Approaches 13
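A small sketch (with a hypothetical representation: a clause is a list of (atom, sign) pairs) that evaluates τ for a given 0/1 assignment; an assignment satisfies D exactly when τ(D) evaluates to 0.

```python
def tau_clause(clause, assignment):
    """tau(C): 0 for the empty clause, p / 1-p for unit clauses, and
    tau(C1) + tau(C2) - tau(C1)*tau(C2) for disjunctions, built up literal by literal."""
    value = 0.0
    for atom, positive in clause:          # [("o", False), ("m", True)] stands for [-o, m]
        lit = assignment[atom] if positive else 1 - assignment[atom]
        value = value + lit - value * lit  # tau(C1 or C2)
    return value

def tau_formula(clauses, assignment):
    """tau(D) = sum_i (1 - tau(C_i)); D is satisfied by the assignment iff this is 0."""
    return sum(1 - tau_clause(c, assignment) for c in clauses)

# The clause set from the slide: { [-o, m], [-s, -m], [-c, m], [-c, s], [-v, -m] }.
D = [[("o", False), ("m", True)], [("s", False), ("m", False)],
     [("c", False), ("m", True)], [("c", False), ("s", True)],
     [("v", False), ("m", False)]]
all_false = {"o": 0, "m": 0, "s": 0, "c": 0, "v": 0}
print(tau_formula(D, all_false))   # 0.0: the all-false assignment satisfies every clause
```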

14 Propositional Reasoning and Symmetric Networks Theorem v ⊨ D iff τ(D) has a global minimum at v. Compare τ(D) = vm − cm − cs + sm − om + 2c + o with E(v) = −Σ_{k<j} w_kj·v_j·v_k + Σ_k θ_k·v_k. [Figure: the corresponding symmetric network with units u_1 = o, u_2 = m, u_3 = s, u_4 = c, u_5 = v.] Propositional Logic: Existing Approaches 14

15 Propositional Non-Monotonic Reasoning Pinkas 1991a: Can the above mentioned approach be extended to non-monotonic reasoning? Consider D = (C_1, k_1), ..., (C_m, k_m), where the C_i are clauses and k_i ∈ ℝ⁺. The penalty of v for (C, k) is k if v ⊭ C and 0 otherwise. The penalty of v for D is the sum of the penalties for the (C_i, k_i). v is preferred over w wrt D if the penalty of v for D is smaller than the penalty of w for D. Modify τ to become τ(D) = Σ_{i=1}^m k_i·(1 − τ(C_i)), e.g., τ(([¬o, m], 1), ([¬s, ¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v, ¬m], 4)) = 4vm − 4cm − 4cs + 2sm − om + 8c + o. The corresponding stochastic network computes the most preferred interpretations. Propositional Logic: Existing Approaches 15

16 Propositional Logic Programs and the Core Method The Very Idea Logic Programs Propositional Core Method Backpropagation Knowledge-Based Artificial Neural Networks Propositional Core Method using Sigmoidal Units Further Extensions Propositional Logic Programs and the Core Method 16

17 The Very Idea Various semantics for logic programs coincide with fixed points of associated immediate consequence operators (e.g., Apt, van Emden 1982). Banach Contraction Mapping Theorem A contraction mapping f defined on a complete metric space (X, d) has a unique fixed point. The sequence y, f(y), f(f(y)), ... converges to this fixed point for any y ∈ X. Fitting 1994: Consider logic programs whose immediate consequence operator is a contraction. Funahashi 1989: Every continuous function on the reals can be uniformly approximated by feedforward connectionist networks. Hölldobler, Kalinke, Störr 1999: Consider logic programs whose immediate consequence operator is continuous on the reals. Propositional Logic Programs and the Core Method 17

18 Metrics A metric on a space M is a mapping d : M × M → ℝ such that d(x, y) = 0 iff x = y, d(x, y) = d(y, x), and d(x, y) ≤ d(x, z) + d(z, y). Let (M, d) be a metric space and S = (s_i | s_i ∈ M) a sequence. S converges if (∃s ∈ M)(∀ε > 0)(∃N)(∀n ≥ N) d(s_n, s) ≤ ε. S is Cauchy if (∀ε > 0)(∃N)(∀n, m ≥ N) d(s_n, s_m) ≤ ε. (M, d) is complete if every Cauchy sequence converges. A mapping f : M → M is a contraction on (M, d) if (∃ 0 < k < 1)(∀x, y ∈ M) d(f(x), f(y)) ≤ k·d(x, y). Propositional Logic Programs and the Core Method 18

19 Propositional Logic Programs A propositional logic program P over a propositional language L is a finite set of clauses A ← L_1 ∧ ... ∧ L_n, where A is an atom, the L_i are literals and n ≥ 0. P is definite if all L_i, 1 ≤ i ≤ n, are atoms. Let V be the set of all propositional variables occurring in L. An interpretation I is a mapping V → {⊤, ⊥}. I can be represented by the set of atoms which are mapped to ⊤ under I. 2^V is the set of all interpretations. Immediate consequence operator T_P : 2^V → 2^V: T_P(I) = {A | there is a clause A ← L_1 ∧ ... ∧ L_n ∈ P such that I ⊨ L_1 ∧ ... ∧ L_n}. I is a supported model iff T_P(I) = I. Propositional Logic Programs and the Core Method 19

20 The Core Method Let L be a logic language. Given a logic program P together with its immediate consequence operator T_P. Let I be the set of interpretations for P. Find a mapping R : I → ℝⁿ. Construct a feed-forward network computing f_P : ℝⁿ → ℝⁿ, called the core, such that the following holds: If T_P(I) = J then f_P(R(I)) = R(J), where I, J ∈ I. If f_P(s) = t then T_P(R⁻¹(s)) = R⁻¹(t), where s, t ∈ ℝⁿ. Connect the units in the output layer recursively to the units in the input layer. Show that the following holds: I = lfp(T_P) iff the recurrent network converges to or approximates R(I). Connectionist model generation using recurrent networks with feed-forward core. Propositional Logic Programs and the Core Method 20

21 3-Layer Recurrent Networks [Figure: a core consisting of an input layer, a hidden layer and an output layer, with the output units fed back to the input units.] At each point in time all units do: apply the activation function to obtain the potential, apply the output function to obtain the output. Propositional Logic Programs and the Core Method 21

22 Propositional Core Method using Binary Threshold Units Let L be the language of propositional logic over a set V of variables. Let P be a propositional logic program, e.g., P = {A ←, C ← A ∧ B, C ← A ∧ ¬B}. I = 2^V is the set of interpretations for P. T_P(I) = {A | A ← L_1 ∧ ... ∧ L_m ∈ P such that I ⊨ L_1 ∧ ... ∧ L_m}. T_P(∅) = {A}, T_P({A}) = {A, C}, T_P({A, C}) = {A, C} = lfp(T_P). Propositional Logic Programs and the Core Method 22
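A sketch of the immediate consequence operator and its fixed-point iteration for this example; the (head, positive body, negative body) clause representation is just one possible encoding.

```python
# Clauses as (head, positive_body, negative_body); the example program
# P = { A <- , C <- A & B, C <- A & ~B }.
P = [("A", set(), set()),
     ("C", {"A", "B"}, set()),
     ("C", {"A"}, {"B"})]

def tp(program, I):
    """Immediate consequence operator: heads of clauses whose body is true under I."""
    return {head for head, pos, neg in program if pos <= I and not (neg & I)}

I = set()
while True:                      # iterate T_P starting from the empty interpretation
    J = tp(P, I)
    if J == I:
        break
    I = J
print(I)                         # {'A', 'C'} = lfp(T_P)
```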

23 Representing Interpretations I = 2^V Let n = |V| and identify V with {1, ..., n}. Define R : I → ℝⁿ such that for all 1 ≤ j ≤ n we find: R(I)[j] = 1 if j ∈ I, 0 if j ∉ I. E.g., if V = {A, B, C} = {1, 2, 3} and I = {A, C} then R(I) = (1, 0, 1). Other encodings are possible, e.g., R(I)[j] = 1 if j ∈ I, −1 if j ∉ I. Propositional Logic Programs and the Core Method 23

24 Computing the Core Consider again P = {A ←, C ← A ∧ B, C ← A ∧ ¬B}. A translation algorithm translates P into a core of binary threshold units: [Figure: an input layer with units for A, B, C, a hidden layer with one unit per clause, and an output layer with units for A, B, C; the connections carry weights ±ω.] Propositional Logic Programs and the Core Method 24
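A sketch of one way such a translation can look for binary threshold units (one hidden unit per clause, weights ±ω, thresholds chosen so that a hidden unit fires exactly when its clause body is true); the code is an illustration consistent with the slides, not a verbatim transcription of the algorithm.

```python
OMEGA = 1.0
P = [("A", set(), set()), ("C", {"A", "B"}, set()), ("C", {"A"}, {"B"})]

def core_step(program, atoms, I):
    """One pass through the core: input layer -> hidden layer (clauses) -> output layer (atoms)."""
    x = {A: (1.0 if A in I else 0.0) for A in atoms}
    hidden = []
    for head, pos, neg in program:
        # +omega for positive body literals, -omega for negative ones;
        # threshold (|pos| - 0.5) * omega: the unit fires iff the clause body is true under I
        p = sum(OMEGA * x[A] for A in pos) - sum(OMEGA * x[A] for A in neg)
        hidden.append((head, 1.0 if p >= (len(pos) - 0.5) * OMEGA else 0.0))
    out = set()
    for A in atoms:
        # weight omega from every hidden unit belonging to a clause with head A, threshold 0.5*omega
        if sum(OMEGA * h for head, h in hidden if head == A) >= 0.5 * OMEGA:
            out.add(A)
    return out

assert core_step(P, {"A", "B", "C"}, set()) == {"A"}
assert core_step(P, {"A", "B", "C"}, {"A"}) == {"A", "C"}      # one pass computes T_P(I)
```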

25 Some Results Proposition 2-layer networks cannot compute T_P for definite P. Theorem For each program P, there exists a core computing T_P. Recall P = {A ←, C ← A ∧ B, C ← A ∧ ¬B}. [Figure: the core from the previous slide with recurrent connections added from the output layer back to the input layer.] Propositional Logic Programs and the Core Method 25

26 More Results A logic program P is said to be strongly determined if there exists a metric d on the set of all Herbrand interpretations for P such that T_P is a contraction wrt d. Corollary Let P be a strongly determined program. Then there exists a core with recurrent connections such that the computation with an arbitrary initial input converges and yields the unique fixed point of T_P. Let n be the number of clauses and m be the number of propositional variables occurring in P. The core consists of 2m + n units and at most 2mn connections. T_P(I) is computed in 2 steps. The parallel computational model to compute T_P(I) is optimal. The recurrent network settles down in 3n steps in the worst case. See Hölldobler, Kalinke 1994 or Hitzler, Hölldobler, Seda 2004 for details. Propositional Logic Programs and the Core Method 26

27 Rule Extraction (1) Proposition For each core C there exists a program P such that C computes T_P. [Figure and table: a core over the atoms A_1, A_2 with hidden units u_3, ..., u_7, together with their potentials p_3, ..., p_7 and values v_3, ..., v_7, from which the program on the next slide is read off.] Propositional Logic Programs and the Core Method 27

28 Rule Extraction (2) Extracted program: P = { A_1 ← A_1 ∧ A_2, A_1 ← A_1 ∧ ¬A_2, A_2 ← A_1 ∧ A_2, A_1 ← ¬A_1 ∧ A_2, A_2 ← A_1 ∧ ¬A_2, A_1 ← ¬A_1 ∧ ¬A_2, A_2 ← ¬A_1 ∧ A_2 }. Simplified form: P = {A_1 ←, A_2 ← A_1, A_2 ← ¬A_1 ∧ A_2}. Propositional Logic Programs and the Core Method 28

29 3-Layer Feed-Forward Networks Revisited Theorem (Funahashi 1989) Suppose that Ψ : ℝ → ℝ is non-constant, bounded, monotone increasing and continuous. Let K ⊆ ℝⁿ be compact, let f : K → ℝ be continuous, and let ε > 0. Then there exists a 3-layer feed-forward network with output function Ψ for the hidden layer and linear output function for the input and output layer whose input-output mapping f̄ : K → ℝ satisfies max_{x∈K} |f(x) − f̄(x)| < ε. Every continuous function f : K → ℝ can be uniformly approximated by input-output functions of 3-layer feed-forward networks. u_k is a sigmoidal unit if Φ(i_k) = p_k = Σ_{j=1}^m w_kj·v_j and Ψ(p_k) = v_k = 1/(1 + e^{β(θ_k − p_k)}), where θ_k ∈ ℝ is a threshold (or bias) and β > 0 a steepness parameter. Propositional Logic Programs and the Core Method 29

30 Backpropagation Bryson, Ho 1969, Werbos 1974, Parker 1985, Rumelhart et al. 1986: Can 3-layer feed-forward networks learn a particular function? Training set of input-output pairs {(i^l, o^l) | 1 ≤ l ≤ n}. Minimize E = Σ_l E^l where E^l = ½ Σ_k (o^l_k − v^l_k)². Gradient descent algorithm to learn appropriate weights. Backpropagation: 1 Initialize weights arbitrarily. 2 Present input pattern i^l at time t. 3 Compute output pattern v^l at time t + 2. 4 Change weights according to Δw^l_ij = η·δ^l_i·v^l_j, where δ^l_i = Ψ'_i(p^l_i)·(o^l_i − v^l_i) if i is an output unit, and δ^l_i = Ψ'_i(p^l_i)·Σ_k δ^l_k·w_ki if i is a hidden unit; η > 0 is called the learning rate. Propositional Logic Programs and the Core Method 30
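A compact numpy sketch of one backpropagation step following the update rule above; bias terms and the time indexing of the slide are omitted, and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W1, W2, x, o, eta=0.5):
    """One weight update for a 3-layer net with sigmoidal hidden and output units.
    delta_i = Psi'(p_i) * (o_i - v_i) for output units,
    delta_i = Psi'(p_i) * sum_k delta_k * w_ki for hidden units."""
    p_h = W1 @ x;   v_h = sigmoid(p_h)          # hidden layer
    p_o = W2 @ v_h; v_o = sigmoid(p_o)          # output layer
    delta_o = v_o * (1 - v_o) * (o - v_o)       # Psi'(p) = v * (1 - v) for the sigmoid
    delta_h = v_h * (1 - v_h) * (W2.T @ delta_o)
    W2 += eta * np.outer(delta_o, v_h)          # Delta w_ij = eta * delta_i * v_j
    W1 += eta * np.outer(delta_h, x)
    return 0.5 * np.sum((o - v_o) ** 2)         # E^l for this pattern

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, o = np.array([1.0, 0.0]), np.array([1.0])
errors = [backprop_step(W1, W2, x, o) for _ in range(200)]
print(errors[0], errors[-1])                    # the error on the training pair decreases
```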

31 Knowledge Based Artificial Neural Networks Towell, Shavlik 1994: Can we do better than empirical learning? Sets of hierarchical logic programs, e.g., P = {A ← B ∧ C ∧ D, A ← D ∧ E, H ← F ∧ G, K ← A ∧ ¬H}. [Figure: the rules mapped onto a feed-forward network with input units B, C, D, E, F, G, intermediate units A and H, and unit K on top; the connections carry weight ω and the unit thresholds are derived from the rules.] Propositional Logic Programs and the Core Method 31

32 Knowledge Based Artificial Neural Networks Learning Given hierarchical sets of propositional rules as background knowledge. Map the rules into multi-layer feed-forward networks with sigmoidal units. Add hidden units (optional). Add units for known input features that are not referenced in the rules. Fully connect layers. Add near-zero random numbers to all links and thresholds. Apply backpropagation. Empirical evaluation: the system performs better than purely empirical and purely hand-built classifiers. Propositional Logic Programs and the Core Method 32

33 Knowledge Based Artificial Neural Networks A Problem Works if rules have few conditions and there are few rules with the same head. [Figure: a unit C receiving weight-ω connections from units A and B; A and B each have ten antecedents A_1, ..., A_10 and B_1, ..., B_10 with weights ω and threshold 9.5ω.] If nine of the ten antecedents of A and of B are true, then p_A = p_B = 9ω and v_A = v_B = 1/(1 + e^{β(9.5ω − 9ω)}) ≈ 0.46 with β = 1, while p_C ≈ 0.9ω and v_C = 1/(1 + e^{β(0.5ω − 0.9ω)}) ≈ 0.6 with β = 1, i.e., C becomes active although neither A nor B does. Propositional Logic Programs and the Core Method 33

34 Propositional Core Method using Bipolar Sigmoidal Units d'Avila Garcez, Zaverucha, Carvalho 1997: Can we combine the ideas in Hölldobler, Kalinke 1994 and Towell, Shavlik 1994 while avoiding the above mentioned problem? Consider a propositional logic language. Let I be an interpretation and a ∈ [0, 1]. R(I)[j] = v ∈ [a, 1] if j ∈ I, w ∈ [−1, −a] if j ∉ I. Replace threshold and sigmoidal units by bipolar sigmoidal ones, i.e., units with Φ(i_k) = p_k = Σ_{j=1}^m w_kj·v_j and Ψ(p_k) = v_k = 2/(1 + e^{β(θ_k − p_k)}) − 1, where θ_k ∈ ℝ is a threshold (or bias) and β > 0 a steepness parameter. Propositional Logic Programs and the Core Method 34

35 The Task How should a, ω and θ_i be selected such that v_i ∈ [a, 1] or v_i ∈ [−1, −a] and the core computes the immediate consequence operator? Propositional Logic Programs and the Core Method 35

36 Hidden Layer Units Consider A ← L_1 ∧ ... ∧ L_n. Let u be the hidden layer unit for this rule. Suppose I ⊨ L_1 ∧ ... ∧ L_n. Then u receives an input of at least ωa from each unit representing an L_i, hence p_u ≥ nωa =: p⁺_u. Suppose I ⊭ L_1 ∧ ... ∧ L_n. Then u receives an input of at most −ωa from at least one unit representing an L_i, hence p_u ≤ (n − 1)ω·1 − ωa =: p⁻_u. Set θ_u = (nωa + (n − 1)ω − ωa)/2 = ((na + n − 1 − a)/2)·ω = ((n − 1)(a + 1)/2)·ω. Propositional Logic Programs and the Core Method 36

37 Output Layer Units Let µ be the number of clauses with head A. Consider A ← L_1 ∧ ... ∧ L_n. Suppose I ⊨ L_1 ∧ ... ∧ L_n. Then p_A ≥ ωa + (µ − 1)ω·(−1) = ωa − (µ − 1)ω =: p⁺_A. Suppose for all rules of the form A ← L_1 ∧ ... ∧ L_n we find I ⊭ L_1 ∧ ... ∧ L_n. Then p_A ≤ −µωa =: p⁻_A. Set θ_A = (ωa − (µ − 1)ω − µωa)/2 = ((a − µ + 1 − µa)/2)·ω = ((1 − µ)(a + 1)/2)·ω. Propositional Logic Programs and the Core Method 37

38 Computing a Value for a p⁺_u > p⁻_u: nωa > (n − 1)ω − ωa, nωa + ωa > (n − 1)ω, a(n + 1)ω > (n − 1)ω, a > (n − 1)/(n + 1). p⁺_A > p⁻_A: ωa − (µ − 1)ω > −µaω, ωa + µaω > (µ − 1)ω, a(1 + µ)ω > (µ − 1)ω, a > (µ − 1)/(µ + 1). Considering all rules yields a minimum value for a (see the sketch below). Propositional Logic Programs and the Core Method 38
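A small sketch computing the resulting lower bound for a over all clauses (body lengths n) and all heads (µ clauses per head), using the clause representation from the earlier sketches; names are illustrative.

```python
def minimal_a(program):
    """a must exceed (l-1)/(l+1) for every body length l = n and for every number
    of clauses l = mu sharing a head: a > (n-1)/(n+1) and a > (mu-1)/(mu+1)."""
    body_lengths = [len(pos) + len(neg) for _, pos, neg in program]
    heads = [head for head, _, _ in program]
    mus = [heads.count(h) for h in set(heads)]
    return max((l - 1) / (l + 1) for l in body_lengths + mus)

P = [("A", set(), set()), ("C", {"A", "B"}, set()), ("C", {"A"}, {"B"})]
print(minimal_a(P))     # 1/3: any a strictly greater than this bound will do
```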

39 Computing a Value for ω Ψ(p) = 2/(1 + e^{β(θ − p)}) − 1 ≥ a iff 2/(1 + e^{β(θ − p)}) ≥ 1 + a iff 2/(1 + a) ≥ 1 + e^{β(θ − p)} iff 2/(1 + a) − 1 = (2 − (1 + a))/(1 + a) = (1 − a)/(1 + a) ≥ e^{β(θ − p)} iff ln((1 − a)/(1 + a)) ≥ β(θ − p) iff (1/β)·ln((1 − a)/(1 + a)) ≥ θ − p. Consider a hidden layer unit: (1/β)·ln((1 − a)/(1 + a)) ≥ ((n − 1)(a + 1)/2)·ω − nωa = ((na + n − a − 1 − 2na)/2)·ω = ((n − 1 − a(n + 1))/2)·ω, hence ω ≥ 2·ln((1 − a)/(1 + a))/((n − 1 − a(n + 1))·β) because a > (n − 1)/(n + 1). Considering all hidden and output layer units as well as the case Ψ(p) ≤ −a yields a minimum value for ω. Propositional Logic Programs and the Core Method 39
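A sketch of the hidden-layer bound derived above (the full construction also takes the output-layer units and the case Ψ(p) ≤ −a into account); the function and parameter names are illustrative.

```python
from math import log

def omega_bound_hidden(n, a, beta=1.0):
    """Hidden-unit bound from the slide: with theta_u = (n-1)(a+1)w/2 and p >= n*w*a,
    Psi(p) >= a forces  w >= 2 * ln((1-a)/(1+a)) / ((n - 1 - a*(n+1)) * beta)."""
    assert a > (n - 1) / (n + 1)        # makes the denominator negative, the bound positive
    return 2 * log((1 - a) / (1 + a)) / ((n - 1 - a * (n + 1)) * beta)

print(omega_bound_hidden(2, 0.5))       # example: n = 2 body literals, a = 0.5 > 1/3
```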

40 Results The relation to logic programs is preserved. The core is trainable by backpropagation. Many interesting applications, e.g.: DNA sequence analysis. Power system fault diagnosis. Empirical evaluation: the system performs better than well-known machine learning systems. See d'Avila Garcez, Broda, Gabbay 2002 for details. Propositional Logic Programs and the Core Method 40

41 Further Extensions Many-valued logic programs Modal logic programs Answer set programming Metalevel priorities Rule extraction Propositional Logic Programs and the Core Method 41

42 Propositional Core Method Three-Valued Logic Programs Kalinke 1994: Consider the truth values ⊤, ⊥, u. Interpretations are pairs I = ⟨I⁺, I⁻⟩. Immediate consequence operator Φ_P(I) = ⟨J⁺, J⁻⟩, where J⁺ = {A | A ← L_1 ∧ ... ∧ L_m ∈ P and I(L_1 ∧ ... ∧ L_m) = ⊤}, J⁻ = {A | for all A ← L_1 ∧ ... ∧ L_m ∈ P: I(L_1 ∧ ... ∧ L_m) = ⊥}. Let n = |V| and identify V with {1, ..., n}. Define R : I → ℝ^{2n} as follows: R(I)[2j − 1] = 1 if j ∈ I⁺, 0 if j ∉ I⁺, and R(I)[2j] = 1 if j ∈ I⁻, 0 if j ∉ I⁻. Propositional Logic Programs and the Core Method 42
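A sketch of Φ_P on the pair representation ⟨I⁺, I⁻⟩, using the same hypothetical clause encoding as before; atoms without any defining clause fall into J⁻ vacuously.

```python
def phi(program, atoms, Iplus, Iminus):
    """Phi_P(<I+, I->) = <J+, J->: J+ = heads with a true body, J- = atoms all of whose bodies are false."""
    def body_value(pos, neg):
        lits = ["t" if A in Iplus else "f" if A in Iminus else "u" for A in pos] + \
               ["f" if A in Iplus else "t" if A in Iminus else "u" for A in neg]
        if all(l == "t" for l in lits):
            return "t"
        if any(l == "f" for l in lits):
            return "f"
        return "u"
    Jplus = {h for h, pos, neg in program if body_value(pos, neg) == "t"}
    Jminus = {A for A in atoms
              if all(body_value(pos, neg) == "f" for h, pos, neg in program if h == A)}
    return Jplus, Jminus

# Example: P = {C <- A & ~B} with A known true and B known false:
print(phi([("C", {"A"}, {"B"})], {"A", "B", "C"}, {"A"}, {"B"}))   # ({'C'}, {'A', 'B'})
```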

43 Propositional Core Method Multi-Valued Logic Programs For each program P, there exists a core computing Φ_P, e.g., P = {C ← A ∧ B, D ← C ∧ E, D ← ¬C}. [Figure: the corresponding core with a pair of input and output units A⁺, A⁻, ..., E⁺, E⁻ per atom and connections of weight ω.] Lane, Seda 2004: Extension to finitely determined sets of truth values. Propositional Logic Programs and the Core Method 43

44 Propositional Core Method Modal Logic Programs d'Avila Garcez, Lamb, Gabbay 2002. Let L be a propositional logic language plus the modalities □ and ◇, and a finite set of labels w_1, ..., w_k denoting worlds. Let B be an atom; then □B and ◇B are modal atoms. A modal definite logic program P is a set of clauses of the form w_i : A ← A_1 ∧ ... ∧ A_m together with a finite set of relations w_i R w_j, where the w_i, 1 ≤ i, j ≤ k, are labels and A, A_1, ..., A_m are atoms or modal atoms. P = ⋃_{i=1}^k P_i, where P_i consists of all clauses labelled with w_i. Propositional Logic Programs and the Core Method 44

45 Modal Logic Programs Semantics Example: P = {w_1 : A, w_1 : ◇C ← A} ∪ {w_2 : B} ∪ {w_3 : B} ∪ {w_4 : B} ∪ {w_1 R w_2, w_1 R w_3, w_1 R w_4, w_2 R w_4}. Kripke semantics: [Figure: the worlds w_1, ..., w_4 with their accessibility relation and the atoms over A, B, C true at each world; the choice function for ◇ satisfies f_C(w_1) = w_4.] Propositional Logic Programs and the Core Method 45

46 Modal Immediate Consequence Operator Interpretations are tuples I = ⟨I_1, ..., I_k⟩. Immediate consequence operator MT_P(I) = ⟨J_1, ..., J_k⟩, where J_i = {A | there exists A ← A_1 ∧ ... ∧ A_m ∈ P_i such that {A_1, ..., A_m} ⊆ I_i} ∪ {◇A | there exists w_i R w_j ∈ P and A ∈ I_j} ∪ {□A | for all w_i R w_j ∈ P we find A ∈ I_j} ∪ {A | there exists w_j R w_i ∈ P and □A ∈ I_j} ∪ {A | there exists w_j R w_i ∈ P, ◇A ∈ I_j and f_A(w_j) = w_i}. Propositional Logic Programs and the Core Method 46

47 Modal Logic Programs The Translation Algorithm Let n = |V| and identify V with {1, ..., n}. Let a ∈ [0, 1]. Define R : I → ℝ^{3n} such that, for each world w_i, the entries R(I_i)[3j − 2], R(I_i)[3j − 1] and R(I_i)[3j] represent the atom j and the modal atoms □j and ◇j, each with value v ∈ [a, 1] if the corresponding atom is in I_i and w ∈ [−1, −a] otherwise. Translation algorithm such that for each world the local part of MT_P is computed by a core, the cores are turned into recurrent networks, and the cores are connected with respect to the given set of relations. Propositional Logic Programs and the Core Method 47

48 The Example Network [Figure: four interconnected cores, one for each of the worlds w_1, ..., w_4, each with units for A, B and C.] Propositional Logic Programs and the Core Method 48

49 First-Order Logic Existing Approaches Reflexive Reasoning and SHRUTI Connectionist Term Representations Holographic Reduced Representations (Plate 1991) Recursive Auto-Associative Memory (Pollack 1988) Horn Logic and CHCL (Hölldobler 1990, Hölldobler, Kurfess 1992) Other Approaches First-Order Logic Programs and the Core Method Initial Approach Construction of Approximating Networks Topological Analysis and Generalisations Employing Iterated Function Systems First-Order Logic 49

50 Reflexive Reasoning Humans are capable of performing a wide variety of cognitive tasks with extreme ease and efficiency. For traditional AI systems, the same problems turn out to be intractable. Human consensus knowledge: about 10^8 rules and facts. Wanted: Reflexive decisions within sublinear time. Shastri, Ajjanagadde 1993: SHRUTI. First-Order Logic 50

51 SHRUTI Knowledge Base Finite set of constants C, finite set of variables V. Rules: (∀X_1 ... X_m) (p_1(...) ∧ ... ∧ p_n(...) → (∃Y_1 ... Y_k) p(...)). p, p_i, 1 ≤ i ≤ n, are multi-place predicate symbols. Arguments of the p_i: variables from {X_1, ..., X_m} ⊆ V. Arguments of p are from {X_1, ..., X_m} ∪ {Y_1, ..., Y_k} ∪ C. {Y_1, ..., Y_k} ⊆ V. {X_1, ..., X_m} ∩ {Y_1, ..., Y_k} = ∅. Facts and queries (goals): (∃Z_1 ... Z_l) q(...). q is a multi-place predicate symbol. Arguments of q are from {Z_1, ..., Z_l} ∪ C. {Z_1, ..., Z_l} ⊆ V. First-Order Logic 51

52 Further Restrictions Restrictions to rules, facts, and goals: No function symbols except constants. Only universally bound variables may occur as arguments in the conditions of a rule. All variables occurring in a fact or goal occur only once and are existentially bound. An existentially quantified variable is only unified with variables. A variable which occurs more than once in the conditions of a rule must occur in the conclusion of the rule and must be bound when the conclusion is unified with a goal. A rule is used only a fixed number of times. Incompleteness. First-Order Logic 52

53 SHRUTI Example Rules P = { owns(Y, Z) ← gives(X, Y, Z), owns(X, Y) ← buys(X, Y), can-sell(X, Y) ← owns(X, Y), gives(john, josephine, book), (∀X) buys(john, X), owns(josephine, ball) }. Queries: can-sell(josephine, book)? yes. (∃X) owns(josephine, X)? yes, {X ↦ book}, {X ↦ ball}. First-Order Logic 53

54 SHRUTI: The Network [Figure: the SHRUTI network for the example, with nodes for the constants book, john, ball and josephine and for the predicates can-sell, owns, gives and buys together with their argument nodes.] First-Order Logic 54

55 Solving the Variable Binding Problem [Figure: argument nodes (1st, 2nd, 3rd argument) for the predicates buys, gives, owns and can-sell and nodes for the constants josephine, ball, john and book; a binding is represented by the synchronous activation of an argument node and a constant node.] First-Order Logic 55

56 SHRUTI Remarks Answers are derived in time proportional to the depth of the search space. The number of units as well as of connections is linear in the size of the knowledge base. Extensions: compute answer substitutions, allow a fixed number of copies of rules, allow multiple literals in the body of a rule, build in a taxonomy. ROBIN (Lange, Dyer 1989): signatures instead of phases. Biological plausibility. Trading expressiveness for time and size. Logical reconstruction by Beringer, Hölldobler 1993: Reflexive reasoning is reasoning by reduction. First-Order Logic 56

57 First-Order Logic Programs and the Core Method Initial Approach Construction of Approximating Networks Topological Analysis and Generalisations Employing Iterated Function Systems First-Order Logic Programs and the Core Method 57

58 Logic Programs A logic program P over a first-order language L is a finite set of clauses A ← L_1 ∧ ... ∧ L_n, where A is an atom, the L_i are literals and n ≥ 0. B_L is the set of all ground atoms over L, called the Herbrand base. A Herbrand interpretation I is a mapping B_L → {⊤, ⊥}. 2^{B_L} is the set of all Herbrand interpretations. ground(P) is the set of all ground instances of clauses in P. Immediate consequence operator T_P : 2^{B_L} → 2^{B_L}: T_P(I) = {A | there is a clause A ← L_1 ∧ ... ∧ L_n ∈ ground(P) such that I ⊨ L_1 ∧ ... ∧ L_n}. I is a supported model iff T_P(I) = I. First-Order Logic Programs and the Core Method 58

59 The Initial Approach Hölldobler, Kalinke, Störr 1999: Can the core method be extended to first-order logic programs? Problem Given a logic program P over a first-order language L together with T_P : 2^{B_L} → 2^{B_L}. B_L is countably infinite. The method used to relate propositional logic and connectionist systems is not applicable. How can the gap between the discrete, symbolic setting of logic and the continuous, real-valued setting of connectionist networks be closed? First-Order Logic Programs and the Core Method 59

60 The Goal Find R : 2^{B_L} → ℝ and f_P : ℝ → ℝ such that the following conditions hold. T_P(I) = I′ implies f_P(R(I)) = R(I′). f_P(x) = x′ implies T_P(R⁻¹(x)) = R⁻¹(x′). f_P is a sound and complete encoding of T_P. T_P is a contraction on 2^{B_L} iff f_P is a contraction on ℝ. The contraction property and fixed points are preserved. f_P is continuous on ℝ. A connectionist network approximating f_P is known to exist. First-Order Logic Programs and the Core Method 60

61 Acyclic Logic Programs Let P be a program over a first-order language L. A level mapping for P is a function l : B_L → ℕ. We define l(¬A) = l(A). We can associate a metric d_L with L and l. Let I, J ∈ 2^{B_L}: d_L(I, J) = 0 if I = J, and 2^{−n} if n is the smallest level on which I and J differ. Proposition (Fitting 1994) (2^{B_L}, d_L) is a complete metric space. P is said to be acyclic wrt a level mapping l if for every A ← L_1 ∧ ... ∧ L_n ∈ ground(P) we find l(A) > l(L_i) for all i. Proposition Let P be an acyclic logic program wrt l and d_L the metric associated with L and l; then T_P is a contraction on (2^{B_L}, d_L). First-Order Logic Programs and the Core Method 61
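A finite sketch of the metric d_L for interpretations given as sets of ground atoms with a level mapping supplied as a dictionary; the names and the string encoding of atoms are illustrative.

```python
def d_L(I, J, level):
    """d_L(I, J) = 0 if I = J, else 2**(-n) for the smallest level n on which I and J differ."""
    if I == J:
        return 0.0
    n = min(level[A] for A in I.symmetric_difference(J))
    return 2.0 ** (-n)

level = {"q(0)": 1, "q(s(0))": 2, "q(s(s(0)))": 3}
print(d_L({"q(0)"}, {"q(0)", "q(s(s(0)))"}, level))    # they first differ at level 3 -> 0.125
```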

62 Mapping Interpretations to Real Numbers Let D = {r ∈ ℝ | r = Σ_{i=1}^∞ a_i·4^{−i}, where a_i ∈ {0, 1} for all i}. Let l be a bijective level mapping. {⊤, ⊥} can be identified with {1, 0}. The set of all mappings B_L → {⊤, ⊥} can be identified with the set of all mappings ℕ → {0, 1}. Let I_L be the set of all mappings from B_L to {0, 1}. Let R : I_L → D be defined as R(I) = Σ_{i=1}^∞ I(l^{−1}(i))·4^{−i}. Proposition R is a bijection. We have a sound and complete encoding of interpretations. First-Order Logic Programs and the Core Method 62
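A finite-precision sketch of the encoding R, cutting the infinite sum off after a fixed number of levels; inv_level plays the role of l⁻¹ and is assumed to be supplied by the caller.

```python
def R(I, inv_level, depth=30):
    """Approximates R(I) = sum_{i>=1} I(l^{-1}(i)) * 4**(-i) by its first `depth` digits."""
    return sum((1 if inv_level(i) in I else 0) * 4.0 ** (-i) for i in range(1, depth + 1))

# For P = {q(0), q(s(X)) <- q(X)} with l(q(s^n(0))) = n + 1,
# identify the atom of level i with the number i:
everything_true = set(range(1, 1000))
print(R(everything_true, lambda i: i))     # ~0.333... = 1/3 = R(B_L)
```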

63 Mapping Immediate Consequence Operators to Functions on the Reals We define f_P : D → D : r ↦ R(T_P(R⁻¹(r))). [Diagram: T_P maps I to I′ while f_P maps r = R(I) to r′ = R(I′).] We have a sound and complete encoding of T_P. Proposition Let P be an acyclic program wrt a bijective level mapping. Then f_P is a contraction on D. The contraction property and fixed points are preserved. First-Order Logic Programs and the Core Method 63

64 Approximating Continuous Functions Corollary f_P is continuous. Recall Funahashi's theorem: Every continuous function f : K → ℝ can be uniformly approximated by input-output functions of 3-layer feed-forward networks. Theorem f_P can be uniformly approximated by input-output functions of 3-layer feed-forward networks. T_P can be approximated as well by applying R⁻¹. A connectionist network approximating the immediate consequence operator exists. First-Order Logic Programs and the Core Method 64

65 An Example Consider P = {q(0) ←, q(s(X)) ← q(X)} and let l(q(s^n(0))) = n + 1. P is acyclic wrt l, l is bijective, and R(B_L) = 1/3. f_P(R(I)) = 4^{−l(q(0))} + Σ_{q(x)∈I} 4^{−l(q(s(x)))} = 4^{−l(q(0))} + Σ_{q(x)∈I} 4^{−(l(q(x))+1)} = (1 + R(I))/4. Approximation of f_P to accuracy ε yields f̄(x) ∈ [(1 + x)/4 − ε, (1 + x)/4 + ε]. Starting with some x and iterating f̄ yields in the limit a value r ∈ [(1 − 4ε)/3, (1 + 4ε)/3]. Applying R⁻¹ to r we find q(s^n(0)) ∈ R⁻¹(r) if n < −log_4(ε) − 1. First-Order Logic Programs and the Core Method 65
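A sketch of the iteration: the exact f_P(x) = (1 + x)/4 converges to 1/3, and an ε-perturbed version stays within the interval given above; all names are illustrative.

```python
def f_P(x):
    """Encoded consequence operator of P = {q(0), q(s(X)) <- q(X)}: f_P(x) = (1 + x) / 4."""
    return (1.0 + x) / 4.0

x = 0.0                                     # R(empty interpretation)
for _ in range(30):
    x = f_P(x)
print(x)                                    # ~0.3333 = 1/3 = R(lfp(T_P))

eps = 0.01                                  # worst-case over-approximation by eps per step
y = 0.0
for _ in range(200):
    y = f_P(y) + eps
print(y, (1 + 4 * eps) / 3)                 # both ~0.34667, the upper end of the interval
```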

66 Approximation of Interpretations Let P be a logic program over a first-order language L and l a level mapping. An interpretation I approximates an interpretation J to a degree n ∈ ℕ if for all atoms A ∈ B_L with l(A) < n we find I(A) = ⊤ iff J(A) = ⊤. I approximates J to a degree n iff d_L(I, J) ≤ 2^{−n}. First-Order Logic Programs and the Core Method 66

67 Approximation of Supported Models Given an acyclic logic program P with a bijective level mapping. Let T_P be the immediate consequence operator associated with P and M_P the least supported model of P. We can approximate T_P by a 3-layer feed-forward network. We can turn this network into a recurrent one. Does the recurrent network approximate the supported model of P? Theorem For an arbitrary m ∈ ℕ there exists a recurrent network with sigmoidal activation functions for the hidden layer units and linear activation functions for the input and output layer units computing a function f̄_P such that there exists an n_0 ∈ ℕ such that for all n ≥ n_0 and for all x ∈ [−1, 1] we find d_L(R⁻¹(f̄_P^n(x)), M_P) ≤ 2^{−m}. First-Order Logic Programs and the Core Method 67

68 Literature Apt, van Emden 1982: Contributions to the Theory of Logic Programming. Journal of the ACM 29. Bader, Hölldobler, Scalzitti 2004: Semiring Artificial Neural Networks and Weighted Automata and an Application to Digital Image Encoding. In: KI 2004: Advances in Artificial Intelligence, Lecture Notes in Artificial Intelligence 3238. Beringer, Hölldobler 1993: On the Adequateness of the Connection Method. In: Proceedings of the AAAI National Conference on Artificial Intelligence. d'Avila Garcez, Gabbay 2002: Neural-Symbolic Learning Systems. Springer. d'Avila Garcez, Lamb, Gabbay 2002: A Connectionist Inductive Learning System for Modal Logic Programming. In: Proceedings of the IEEE International Conference on Neural Information Processing. d'Avila Garcez, Zaverucha, Carvalho 1997: Logic Programming and Inductive Inference in Artificial Neural Networks. In: Knowledge Representation in Neural Networks, Logos, Berlin. Fitting 1994: Metric Methods: Three Examples and a Theorem. Journal of Logic Programming 21. First-Order Logic Programs and the Core Method 68

69 Funahashi 1989: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2. Geman, Geman 1984: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6. Hinton, Sejnowski 1983: Optimal Perceptual Inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hitzler, Hölldobler, Seda 2004: Logic Programs and Connectionist Networks. Journal of Applied Logic 2. Hölldobler 1990: A Structured Connectionist Unification Algorithm. In: Proceedings of the AAAI National Conference on Artificial Intelligence. Hölldobler, Kalinke 1994: Towards a Massively Parallel Computational Model for Logic Programming. In: Proceedings of the ECAI94 Workshop on Combining Symbolic and Connectionist Processing. Hölldobler, Kalinke, Störr 1999: Approximating the Semantics of Logic Programs by Recurrent Neural Networks. Applied Intelligence 11. Hölldobler, Kurfess 1992: CHCL: A Connectionist Inference System. In: Parallelization in Inference Systems, Lecture Notes in Artificial Intelligence 590. Hopfield 1982: Neural Networks and Physical Systems with Emergent Collective Computational Abilities. In: Proceedings of the National Academy of Sciences USA. First-Order Logic Programs and the Core Method 69

70 Kalinke 1994: Ein massiv paralleles Berechnungsmodell für normale logische Programme. Diplomarbeit, TU Dresden, Fakultät Informatik (in German). Kirkpatrick et al. 1983: Optimization by Simulated Annealing. Science 220. Lane, Seda 2004: Some Aspects of the Integration of Connectionist and Logic-Based Systems. University College Cork. Lange, Dyer 1989: High-Level Inferencing in a Connectionist Network. Connection Science 1. McCarthy 1988: Epistemological Challenges for Connectionism. Behavioural and Brain Sciences 11, 44. McCulloch, Pitts 1943: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5. Plate 1991: Holographic Reduced Representations. In: Proceedings of the International Joint Conference on Artificial Intelligence. Pinkas 1991: Symmetric Neural Networks and Logic Satisfiability. Neural Computation 3. Pinkas 1991a: Propositional Non-Monotonic Reasoning and Inconsistency in Symmetrical Neural Networks. In: Proceedings of the International Joint Conference on Artificial Intelligence. Pollack 1988: Recursive Auto-Associative Memory: Devising Compositional Distributed Representations. In: Proceedings of the Annual Conference of the Cognitive Science Society. First-Order Logic Programs and the Core Method 70

71 Rumelhart et al. 1986: Parallel Distributed Processing. MIT Press. Shastri, Ajjanagadde 1993: From Associations to Systematic Reasoning: A Connectionist Representation of Rules, Variables and Dynamic Bindings using Temporal Synchrony. Behavioural and Brain Sciences 16. Smolensky 1987: On Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Report No. CU-CS, Department of Computer Science & Institute of Cognitive Science, University of Colorado, Boulder. Towell, Shavlik 1994: Knowledge Based Artificial Neural Networks. Artificial Intelligence 70. First-Order Logic Programs and the Core Method 71


More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

Computational Logic and Neural Networks: a personal perspective

Computational Logic and Neural Networks: a personal perspective Computational Logic and Neural Networks: a personal perspective Ekaterina Komendantskaya School of Computing, University of Dundee 9 February 2011, Aberdeen Computational Logic and Neural Networks: a personal

More information

In the Name of God. Lecture 9: ANN Architectures

In the Name of God. Lecture 9: ANN Architectures In the Name of God Lecture 9: ANN Architectures Biological Neuron Organization of Levels in Brains Central Nervous sys Interregional circuits Local circuits Neurons Dendrite tree map into cerebral cortex,

More information

Propositional Logic Language

Propositional Logic Language Propositional Logic Language A logic consists of: an alphabet A, a language L, i.e., a set of formulas, and a binary relation = between a set of formulas and a formula. An alphabet A consists of a finite

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler Database Theory Database Theory VU 181.140, SS 2018 5. Complexity of Query Evaluation Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 17 April, 2018 Pichler

More information

SGD and Deep Learning

SGD and Deep Learning SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients

More information

Hopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296

Hopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks and Boltzmann Machines Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks A Hopfield network is a neural network with a graph G = (U,C) that satisfies

More information

Applied Logic. Lecture 1 - Propositional logic. Marcin Szczuka. Institute of Informatics, The University of Warsaw

Applied Logic. Lecture 1 - Propositional logic. Marcin Szczuka. Institute of Informatics, The University of Warsaw Applied Logic Lecture 1 - Propositional logic Marcin Szczuka Institute of Informatics, The University of Warsaw Monographic lecture, Spring semester 2017/2018 Marcin Szczuka (MIMUW) Applied Logic 2018

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Neural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications

Neural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Artificial Neural Networks. MGS Lecture 2

Artificial Neural Networks. MGS Lecture 2 Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation

More information

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.

More information

Yet Another Proof of the Strong Equivalence Between Propositional Theories and Logic Programs

Yet Another Proof of the Strong Equivalence Between Propositional Theories and Logic Programs Yet Another Proof of the Strong Equivalence Between Propositional Theories and Logic Programs Joohyung Lee and Ravi Palla School of Computing and Informatics Arizona State University, Tempe, AZ, USA {joolee,

More information

Part 8: Neural Networks

Part 8: Neural Networks METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 7. Propositional Logic Rational Thinking, Logic, Resolution Wolfram Burgard, Maren Bennewitz, and Marco Ragni Albert-Ludwigs-Universität Freiburg Contents 1 Agents

More information

6c Lecture 14: May 14, 2014

6c Lecture 14: May 14, 2014 6c Lecture 14: May 14, 2014 11 Compactness We begin with a consequence of the completeness theorem. Suppose T is a theory. Recall that T is satisfiable if there is a model M T of T. Recall that T is consistent

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

A summary of Deep Learning without Poor Local Minima

A summary of Deep Learning without Poor Local Minima A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given

More information

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs) Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 7. Propositional Logic Rational Thinking, Logic, Resolution Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität Freiburg May 17, 2016

More information

A Connectionist Cognitive Model for Temporal Synchronisation and Learning

A Connectionist Cognitive Model for Temporal Synchronisation and Learning A Connectionist Cognitive Model for Temporal Synchronisation and Learning Luís C. Lamb and Rafael V. Borges Institute of Informatics Federal University of Rio Grande do Sul orto Alegre, RS, 91501-970,

More information