Artificial Intelligence Prof. Dr. Jürgen Dix


Artificial Intelligence
Prof. Dr. Jürgen Dix
Department of Informatics, TU Clausthal
Summer 2007

Time and place: Mondays and Tuesdays in Multimediahörsaal (Tannenhöhe).
Exercises: approximately every other Tuesday (N. Bulling / W. Jamroga).
Exam: July 26th, 3 pm.
Website: visit regularly! There you will find important information about the lecture, documents, exercises, et cetera.

Lecture Overview

History: This course evolved over the last 10 years. It was first held in Koblenz (WS 95/96, WS 96/97), then in Vienna (Austria, WS 98, WS 00), Bahia Blanca (Argentina, SS 98, SS 01), and Clausthal (SS 2004, SS 2005, SS 2006). Most chapters (except 4, 6, 8 and 9) are based on the seminal book by Russell/Norvig: Artificial Intelligence. I am indebted to Tristan Behrens, who translated my slides into beamer TeX and made many improvements.

1. Introduction
2. Searching
3. Supervised Learning
4. Knowledge Engineering (1)
5. Learning in Networks
6. Knowledge Engineering (2)
7. Planning
8. More about Logic
9. Nonmonotonic Reasoning

Chapter 1. Introduction

1.1 What is AI?
1.2 From Plato to Zuse
1.3 History of AI
1.4 Intelligent Agents

1.1 What is AI?

Table 1.1: Several definitions of AI.
- "The exciting new effort to make computers think... machines with minds, in the full and literal sense." (Haugeland, 1985)
- "[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning..." (Bellman, 1978)
- "The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)
- "The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)
- "The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)
- "The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)
- "A field of study that seeks to explain and emulate intelligent behavior in terms of computational processes." (Schalkoff, 1990)
- "The branch of computer science that is concerned with the automation of intelligent behavior." (Luger and Stubblefield, 1993)

1. Cognitive science.
2. Correct inferences: Socrates is a man. All men are mortal. Therefore Socrates is mortal. (A famous syllogism by Aristotle.)
   (1) Informal description, (2) formal description, (3) problem solution.
   Step (2) is often problematic due to under-specification.
   Step (3) is deduction (correct inferences): only enumerable, but not decidable.

3. Turing Test (asaygin/tt/ttest.html): Standard Turing Test and Total Turing Test.
   Turing believed in 1950: by 2000 a computer with 10^9 memory units could be programmed such that it can chat with a human for 5 minutes and pass the Turing Test with a probability of 30%.

4. Acting rationally: In item 2, correct inferences were mentioned. Often not enough information is available in order to act in a way that makes sense (to act in a provably correct way); Chapter 9 covers non-monotonic logics. The world is in general under-specified. It is also possible to act rationally without correct inferences: reflexes.

To pass the Total Turing Test, the computer will additionally need computer vision to perceive objects and robotics to move them about.

The question "Are machines able to think?" leads to two theses:
- Weak AI thesis: machines can be built that act as if they were intelligent.
- Strong AI thesis: machines that act in an intelligent way do possess cognitive states, i.e. a mind.

The Chinese Room was used in 1980 by Searle to demonstrate that a system can pass the Turing Test and yet disprove the strong AI thesis. The room consists of:
- CPU: an English-speaking man with no experience of the Chinese language,
- Program: a book containing rules, formulated in English, which describe how to translate Chinese texts,
- Memory: sufficient pens and paper.
Papers with Chinese texts are passed to the man, who translates them using the book.
Question: Does the room understand Chinese?

1.2 From Plato to Zuse

450 BC: Plato, Socrates, Aristotle.
  Socrates: What is characteristic of piety which makes all actions pious?
  Aristotle: Which laws govern the rational part of the mind?
ca. 800: Al-Khwarizmi (Arabia): algorithm.
ca. 1300: Raymundus Lullus: Ars Magna.
ca. 1350: William of Ockham: Ockham's razor: "Entia non sunt multiplicanda praeter necessitatem." (Entities must not be multiplied beyond necessity.)
R. Descartes: mind = physical system; free will, dualism.
B. Pascal, W. Schickard: addition machines.
G. W. Leibniz: materialism; uses ideas of the Ars Magna to build a machine for simulating the human mind.
F. Bacon: empiricism.
J. Locke: empiricism. "Nihil est in intellectu quod non antefuerat in sensu." (Nothing is in the intellect that was not first in the senses.)
D. Hume: induction.
I. Kant: "Der Verstand schöpft seine Gesetze nicht aus der Natur, sondern schreibt sie dieser vor." (The understanding does not derive its laws from nature, but prescribes them to it.)

1805: Jacquard: loom.
G. Boole: formal language; logic as a mathematical discipline.
Ch. Babbage: Difference Engine (logarithm tables); Analytical Engine, with addressable memory, stored programs, and conditional jumps.

Figure 1.1: Reconstruction of Babbage's difference engine.

G. Frege: Begriffsschrift, a 2-dimensional notation for first-order predicate logic (PL1).

Figure 1.2: A formula from Frege's Begriffsschrift.

D. Hilbert: famous talk 1900 in Paris: 23 problems; the 23rd problem: the Entscheidungsproblem.
B. Russell: 1910: Principia Mathematica. Logical positivism, Vienna Circle.

A. Tarski: idea of truth in formal languages (1936).
K. Gödel: completeness theorem (1930); incompleteness theorem (1930/31); unprovability of theorems (1936).
A. Turing: Turing machine (1936); computability.
A. Church: λ-calculus; Church-Turing thesis.

1940: First computer, "Heath Robinson", built to decipher German messages (Turing).
1943: Colossus built from vacuum tubes.
1941: First operational programmable computer: the Z3 by K. Zuse (Deutsches Museum), with floating-point arithmetic. Plankalkül: the first high-level programming language.
H. Aiken: develops MARK I, II, III.
ENIAC: First general-purpose electronic computer.

Figure 1.3: Reconstruction of Zuse's Z3.

1948: First stored-program computer ("The Baby"), Tom Kilburn (Manchester). Manchester beats Cambridge by 3 months.

Figure 1.4: Reconstruction of Kilburn's Baby.

Figure 1.5: Reconstruction of the first program executed on The Baby (1948).

1952: IBM 701, the first computer to yield a profit (Rochester et al.).

1.3 History of AI

The year 1943: McCulloch and W. Pitts drew on three sources:
1. physiology and function of neurons in the brain,
2. propositional logic due to Russell/Whitehead,
3. Turing's theory of computation.
Their model of artificial, connected neurons: any computable function can be computed by some network of neurons, and all the logical connectives can be implemented by simple net structures.

The year 1951: Minsky and Edmonds build the first computer based on neural networks (Princeton).

The year 1952: A. Samuel develops programs for checkers that learn to play tournament-level checkers.

The year 1956: Two-month workshop at Dartmouth organized by McCarthy, Minsky, Shannon, and Rochester. Idea: combine knowledge about automata theory, neural nets, and the studies of intelligence (10 participants).

Newell and Simon show a reasoning program, the Logic Theorist, able to prove most of the theorems in Chapter 2 of the Principia Mathematica (for one of them even finding a shorter proof). But the Journal of Symbolic Logic rejected a paper authored by Newell, Simon, and the Logic Theorist. Newell and Simon claim to have solved the venerable mind-body problem.

The term Artificial Intelligence is proposed as the name of the new discipline. The Logic Theorist is followed by the General Problem Solver, which was designed from the start to imitate human problem-solving protocols.

The year 1958: Birth year of AI. McCarthy joins MIT and develops:
1. Lisp, the dominant AI programming language,
2. time-sharing, to optimize the use of computer time,
3. programs with common sense: the Advice Taker, a hypothetical program that can be seen as the first complete AI system. Unlike others, it embodies general knowledge of the world.

The year 1959: H. Gelernter: Geometry Theorem Prover.

The following years: McCarthy concentrates on knowledge representation and reasoning in formal logic (Robinson's resolution, Green's planner, Shakey). Minsky is more interested in getting programs to work and focuses on special worlds, the microworlds.

SAINT is able to solve integration problems typical of first-year college calculus courses. ANALOGY is able to solve geometric analogy problems that appear in IQ tests ("A is to B as C is to:").

[Figure: a geometric analogy problem of the kind solved by ANALOGY.]

The Blocksworld is the most famous microworld.

Work building on the neural networks of McCulloch and Pitts continued: perceptrons by Rosenblatt, and the convergence theorem:

Convergence theorem: An algorithm exists that can adjust the connection strengths of a perceptron to match any input data.

Summary: Great promises for the future, initial successes, but miserable further results.

The year 1966: All US funds for AI research are cancelled.
Unacceptable mistake: "The spirit is willing but the flesh is weak." was translated into "The vodka is good but the meat is rotten."

The years that followed: a dose of reality.
Simon 1958: "In 10 years a computer will be grandmaster of chess."
Simple problems are solvable due to a small search space; serious problems remain unsolvable. Hope: faster hardware and more memory will solve everything! NP-completeness, S. Cook/R. Karp (1971), P = NP?

The year 1973: The Lighthill report forms the basis for the decision by the British government to end support for AI research. Minsky's book Perceptrons proved limitations of some approaches, with fatal consequences: funding for neural net research decreased to almost nothing.

The following period: knowledge-based systems. General-purpose mechanisms are called weak methods because they use weak information about the domain; for many complex problems it turns out that their performance is also weak. Idea: use knowledge suited to making larger reasoning steps and to solving typically occurring cases in narrow areas of expertise. Example: DENDRAL (1969). This leads to expert systems like MYCIN (diagnosis of blood infections).

73: PROLOG.
74: Relational databases (Codd).
81-91: Fifth Generation project.
91: The Dynamic Analysis and Replanning Tool (DART) paid back DARPA's investment in AI of the last 30 years.
97: IBM's Deep Blue.
98: NASA's Remote Agent program.

Something to laugh about: In 1902 a German poem was translated into Japanese. The Japanese version was translated into French. At last this version was translated back into German, on the assumption that it was a Japanese poem. The result:

  Stille ist im Pavillon aus Jade,
  Krähen fliegen stumm
  Zu beschneiten Kirschbäumen im Mondlicht.
  Ich sitze und weine.

(Silence in the pavilion of jade; crows fly mutely to snow-covered cherry trees in the moonlight; I sit and weep.)

Which is the original poem?

1.4 Intelligent Agents

Definition 1.1 (Agent a): An agent a is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors.

[Figure: an agent receives percepts from the environment through its sensors and acts on the environment through its effectors.]

Definition 1.2 (Rational Agent, Omniscient Agent): A rational agent is one that does the right thing (a performance measure determines how successful an agent is). An omniscient agent knows the actual outcome of its actions and can act accordingly. Attention: a rational agent is in general not omniscient!

Question: What is the right thing, and what does it depend on?
1. The performance measure (as objective as possible).
2. The percept sequence (everything the agent has received so far).
3. The agent's knowledge about the environment.
4. How the agent can act.

Definition 1.3 (Ideal Rational Agent): For each possible percept sequence, an ideal rational agent should do whatever action is expected to maximize its performance measure (based on the evidence provided by the percepts and built-in knowledge).

Mappings from the set of percept sequences to the set of actions can be used to describe agents in a mathematical way.

Hint: Internally, an agent is
  agent = architecture + program.
AI is engaged in designing agent programs.

Table 1.2: Examples of agent types and their PEAS descriptions.
- Medical diagnosis system. Performance measure: healthy patient, minimize costs, lawsuits. Environment: patient, hospital, staff. Actuators: display questions, tests, diagnoses, treatments, referrals. Sensors: keyboard entry of symptoms, findings, patient's answers.
- Satellite image analysis system. Performance measure: correct image categorization. Environment: downlink from orbiting satellite. Actuators: display categorization of scene. Sensors: color pixel arrays.
- Part-picking robot. Performance measure: percentage of parts in correct bins. Environment: conveyor belt with parts; bins. Actuators: jointed arm and hand. Sensors: camera, joint angle sensors.
- Refinery controller. Performance measure: maximize purity, yield, safety. Environment: refinery, operators. Actuators: valves, pumps, heaters, displays. Sensors: temperature, pressure, chemical sensors.
- Interactive English tutor. Performance measure: maximize student's score on test. Environment: set of students, testing agency. Actuators: display exercises, suggestions, corrections. Sensors: keyboard entry.

A simple agent program:

  function SKELETON-AGENT(percept) returns action
    static: memory, the agent's memory of the world
    memory <- UPDATE-MEMORY(memory, percept)
    action <- CHOOSE-BEST-ACTION(memory)
    memory <- UPDATE-MEMORY(memory, action)
    return action
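The skeleton above can be rendered in Python as a closure over the agent's memory. The slide leaves UPDATE-MEMORY and CHOOSE-BEST-ACTION abstract, so the memory representation (a percept/action log) and the toy "echo" policy below are invented for illustration:

```python
# Sketch of SKELETON-AGENT. UPDATE-MEMORY is modeled by appending to a log;
# the action-choice function is a parameter, since the slide leaves it open.

def make_skeleton_agent(choose_best_action):
    memory = []  # the agent's memory of the world: here, a percept/action log

    def agent(percept):
        memory.append(("percept", percept))   # memory <- UPDATE-MEMORY(memory, percept)
        action = choose_best_action(memory)   # action <- CHOOSE-BEST-ACTION(memory)
        memory.append(("action", action))     # memory <- UPDATE-MEMORY(memory, action)
        return action

    return agent

# Toy policy: simply repeat the most recent percept as the action.
echo_agent = make_skeleton_agent(lambda mem: mem[-1][1])
```

The closure makes the `static:` declaration of the pseudocode explicit: the memory persists across calls, while each call processes one percept.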

In theory everything is trivial:

  function TABLE-DRIVEN-AGENT(percept) returns action
    static: percepts, a sequence, initially empty
            table, a table indexed by percept sequences, initially fully specified
    append percept to the end of percepts
    action <- LOOKUP(percepts, table)
    return action

An agent example, the taxi driver (Table 1.3: PEAS description of the task environment for an automated taxi):
- Performance measure: safe, fast, legal, comfortable trip, maximize profits.
- Environment: roads, other traffic, pedestrians, customers.
- Actuators: steering, accelerator, brake, signal, horn, display.
- Sensors: cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard.

Some examples:
1. Production rules: if the driver in front hits the brakes, then hit the brakes too.

  function SIMPLE-REFLEX-AGENT(percept) returns action
    static: rules, a set of condition-action rules
    state <- INTERPRET-INPUT(percept)
    rule <- RULE-MATCH(state, rules)
    action <- RULE-ACTION[rule]
    return action

A first mathematical description: At first, we want to keep everything as simple as possible.

Agents and environments: An agent is situated in an environment, can perform actions A := {a_1, ..., a_n} (the set of actions), and can change the state of the environment S := {s_1, s_2, ..., s_n} (the set of states).
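A minimal Python sketch of SIMPLE-REFLEX-AGENT, using the braking production rule from the slide. The representation of rules as (condition, action) pairs and the concrete percept fields are assumptions for illustration:

```python
# Sketch of SIMPLE-REFLEX-AGENT with condition-action rules.
# Rules are (condition, action) pairs; conditions test the interpreted state.

def interpret_input(percept):
    return percept  # trivial interpretation for this toy example

def simple_reflex_agent(percept, rules):
    state = interpret_input(percept)      # state <- INTERPRET-INPUT(percept)
    for condition, action in rules:       # rule <- RULE-MATCH(state, rules)
        if condition(state):
            return action                 # action <- RULE-ACTION[rule]
    return "noop"                         # no rule matched

# Toy rule set for the taxi example (field names are invented):
rules = [
    (lambda s: s.get("car_in_front_braking"), "brake"),
    (lambda s: s.get("light") == "red", "brake"),
]
```

Note that the agent consults only the current percept: it has no memory, which is exactly what makes it a simple reflex agent.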

How does the environment (the state s) develop when an action a is executed? We describe this with a function
  env : S × A → 2^S.
This includes non-deterministic environments.

How do we describe agents? We could take a function
  action : S → A.

[Figure: a simple reflex agent; condition-action rules map "what the world is like now" to "what action I should do now".]

Question: How can we describe an agent now?

Definition 1.4 (Purely Reactive Agent): An agent is called purely reactive if its function is given by action : S → A.

This is too weak! Take the whole history (of the environment) into account:
  s_0 -a_0-> s_1 -a_1-> ... s_n -a_n-> ...
The same should be done for env!
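The signature env : S × A → 2^S can be mirrored directly in Python by returning a set of successor states; a singleton set models a deterministic step, a larger set a non-deterministic one. The coin-flip states and the reactive policy below are invented for illustration:

```python
# env : S x A -> 2^S, returning a *set* of possible successor states.
# A set with more than one element models non-determinism.

def env(state, action):
    if action == "flip":
        return {"heads", "tails"}   # non-deterministic outcome of flipping
    return {state}                  # any other action leaves the state unchanged

# A purely reactive agent, action : S -> A (Definition 1.4):
def reactive_action(state):
    return "flip" if state == "tails" else "wait"
```

The reactive agent's weakness is visible in the code: it can only inspect the current state, never the history that led there.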

This leads to agents that take the whole sequence of states into account, i.e. action : S* → A. We also want to consider the actions performed by an agent. This requires the notion of a run (next slide).

We define the run of an agent in an environment as a sequence of interleaved states and actions:

Definition 1.5 (Run r, R = R_act ∪ R_state): A run r over A and S is a finite sequence
  r : s_0 -a_0-> s_1 -a_1-> ... s_n -a_n-> ...
Such a sequence may end with a state s_n or with an action a_n: we denote by R_act the set of runs ending with an action and by R_state the set of runs ending with a state.

Definition 1.6 (Environment, 2nd version): An environment Env is a triple (S, s_0, τ) consisting of
1. the set S of states,
2. the initial state s_0 ∈ S,
3. a function τ : R_act → 2^S, which describes how the environment changes when an action is performed (given the whole history).

Definition 1.7 (Agent a): An agent a is determined by a function action : R_state → A, describing which action the agent performs given its current history.

Important: An agent system is then a pair a = (action, Env) consisting of an agent and an environment. We denote by R(a, Env) the set of runs of agent a in environment Env.
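One concrete way to represent runs (a representation choice, not fixed by the slides) is a flat list in which states and actions alternate; the parity of the length then tells us whether a run belongs to R_act or R_state:

```python
# A run s0 -a0-> s1 -a1-> ... as a flat list: even indices hold states,
# odd indices hold actions. A run always starts with a state s0.

def ends_with_action(run):
    """True if the run is in R_act (nonempty and ends with an action)."""
    return len(run) > 0 and len(run) % 2 == 0

def ends_with_state(run):
    """True if the run is in R_state (ends with a state)."""
    return len(run) % 2 == 1
```

With this representation, τ : R_act → 2^S becomes a Python function that accepts an even-length list and returns a set of states, matching Definition 1.6.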

Definition 1.8 (Characteristic Behaviour): The characteristic behaviour of an agent a in an environment Env is the set R of all possible runs r : s_0 -a_0-> s_1 -a_1-> ... with:
1. for all n: a_n = action(s_0 -a_0-> ... -a_{n-1}-> s_n),
2. for all n > 0: s_n ∈ τ(s_0, a_0, s_1, a_1, ..., s_{n-1}, a_{n-1}).

Important: The formalization of the characteristic behaviour depends on the concrete agent type. Later we will introduce further behaviours (and corresponding agent designs). For deterministic τ, the relation ∈ can be replaced by =.

Equivalence: Two agents a, b are called behaviourally equivalent wrt. environment Env if R(a, Env) = R(b, Env). Two agents a, b are called behaviourally equivalent if they are behaviourally equivalent wrt. all possible environments Env.

So far so good, but... what is the problem with all these agents and this framework in general?

Problem: All agents have perfect information about the environment! (Of course, this can also be seen as a feature!)
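Definition 1.8 can be turned into a small enumerator that, given action and τ over list-encoded runs, generates all runs up to a bounded number of actions. The recursion mirrors the two conditions of the definition; the concrete toy action/τ are assumptions for illustration:

```python
# Enumerate the characteristic behaviour up to `depth` actions:
# condition 1: a_n = action(run ending in s_n);
# condition 2: s_{n+1} in tau(run ending in a_n).
# Runs are lists alternating states and actions, starting with s0.

def runs(action, tau, s0, depth):
    result = []

    def extend(run):
        if len(run) // 2 == depth:           # run already contains `depth` actions
            result.append(run)
            return
        a = action(run)                      # a_n = action(s_0 a_0 ... s_n)
        for s in sorted(tau(run + [a])):     # s_{n+1} in tau(... a_n)
            extend(run + [a, s])

    extend([s0])
    return result

# Toy agent and environment: one action, two possible successor states.
toy_action = lambda run: "go"
toy_tau = lambda run: {"L", "R"}
```

Because τ returns a set, one agent policy branches into several runs; with a deterministic τ (singleton sets) the enumeration collapses to a single run, matching the remark that ∈ becomes = in that case.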

We need more realistic agents!

Note: In general, agents only have incomplete/uncertain information about the environment!

We extend our framework by perceptions:

Definition 1.9 (Actions A, Percepts P, States S):
  A := {a_1, a_2, ..., a_n} is the set of actions.
  P := {p_1, p_2, ..., p_m} is the set of percepts.
  S := {s_1, s_2, ..., s_l} is the set of states.

Sensors do not need to provide perfect information!

[Figure: a simple reflex agent whose sensors deliver an incomplete picture of the environment.]

Question: How can agent programs be designed? There are four types of agent programs:
- simple reflex agents,
- agents that keep track of the world,
- goal-based agents,
- utility-based agents.

First try: We consider a purely reactive agent and just replace states by perceptions.

Definition 1.10 (Simple Reflex Agent): An agent is called a simple reflex agent if its function is given by action : P → A.

  function SIMPLE-REFLEX-AGENT(percept) returns action
    static: rules, a set of condition-action rules
    state <- INTERPRET-INPUT(percept)
    rule <- RULE-MATCH(state, rules)
    action <- RULE-ACTION[rule]
    return action

  function REFLEX-AGENT-WITH-STATE(percept) returns action
    static: state, a description of the current world state
            rules, a set of condition-action rules
    state <- UPDATE-STATE(state, percept)
    rule <- RULE-MATCH(state, rules)
    action <- RULE-ACTION[rule]
    state <- UPDATE-STATE(state, action)
    return action

As before, let us now consider sequences of percepts:

Definition 1.11 (Standard Agent a): A standard agent a is given by a function action : P* → A together with see : S → P. An agent is thus a pair (see, action).
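REFLEX-AGENT-WITH-STATE can be sketched in Python as follows. Rules are assumed to be (condition, action) pairs, and UPDATE-STATE is modeled as a dictionary update with the chosen action recorded afterwards, both toy stand-ins for what the slide leaves abstract:

```python
# Sketch of REFLEX-AGENT-WITH-STATE: the internal world state persists
# across calls and is updated from each percept and each chosen action.

def make_reflex_agent_with_state(rules):
    state = {}  # a description of the current world state

    def agent(percept):
        state.update(percept)                  # state <- UPDATE-STATE(state, percept)
        for condition, action in rules:        # rule <- RULE-MATCH(state, rules)
            if condition(state):
                state["last_action"] = action  # state <- UPDATE-STATE(state, action)
                return action                  # action <- RULE-ACTION[rule]
        state["last_action"] = "noop"
        return "noop"

    return agent

# Toy rule: brake whenever an obstacle is currently believed to be present.
agent = make_reflex_agent_with_state([(lambda s: s.get("obstacle"), "brake")])
```

Unlike the stateless version, this agent's decision depends on everything it has merged into `state` so far, not only on the latest percept.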

Definition 1.12 (Indistinguishable): Two different states s, s' are indistinguishable for an agent a if see(s) = see(s'). The relation "indistinguishable" on S × S is an equivalence relation. What does it mean if the number of equivalence classes is |S|? And what if it is 1? (In the first case the agent can distinguish every state: perfect perception. In the second case all states look alike to the agent: no perceptual power at all.)

As mentioned before, the characteristic behaviour has to match the agent design!

Definition 1.13 (Characteristic Behaviour): The characteristic behaviour of a standard agent (see, action) in an environment Env is the set of all finite sequences
  p_0 -a_0-> p_1 -a_1-> ... p_n -a_n-> ...
where p_0 = see(s_0), a_i = action(p_0, ..., p_i), and p_i = see(s_i), where s_i ∈ τ(s_0, a_0, s_1, a_1, ..., s_{i-1}, a_{i-1}).
Such a sequence, even if deterministic from the agent's viewpoint, may cover different environmental behaviours (runs):
  s_0 -a_0-> s_1 -a_1-> ... s_n -a_n-> ...

Instead of using the whole history, resp. P*, one can also use internal states I := {i_1, i_2, ..., i_n}.

Definition 1.14 (State-based Agent a_state): A state-based agent a_state is given by a function action : I → A together with see : S → P and next : I × P → I. Here next(i, p) is the successor state of i if p is observed.

[Figure: an agent that keeps track of the world: internal state, knowledge of how the world evolves and what my actions do, and condition-action rules.]
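Definition 1.14 as a runnable sketch: a toy thermostat whose internal state counts consecutive "cold" percepts. All three concrete functions (see, next, action) are invented for illustration; only their signatures come from the definition:

```python
# State-based agent: see : S -> P, next : I x P -> I, action : I -> A.
# Environment states are temperatures; percepts are coarse readings.

def see(s):                    # the agent only perceives whether it is cold
    return "cold" if s < 18 else "ok"

def next_state(i, p):          # internal state: count of consecutive "cold" percepts
    return i + 1 if p == "cold" else 0

def action(i):                 # act only after two cold percepts in a row
    return "heat" if i >= 2 else "wait"

def step(i, s):
    """One agent cycle: perceive s, update the internal state, choose an action."""
    i2 = next_state(i, see(s))
    return i2, action(i2)
```

The internal state compresses the percept history: the agent does not store the sequence of readings, only as much of it as `next` decides to keep.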

Definition 1.15 (Characteristic Behaviour): The characteristic behaviour of a state-based agent a_state in an environment Env is the set of all finite sequences
  (i_0, p_0) -a_0-> (i_1, p_1) -a_1-> ... -a_{n-1}-> (i_n, p_n) ...
with:
1. for all n: a_n = action(i_n),
2. for all n: next(i_n, p_n) = i_{n+1}.
Such a sequence covers the runs r : s_0 -a_0-> s_1 -a_1-> ... where a_j = action(i_j), s_j ∈ τ(s_0, a_0, s_1, a_1, ..., s_{j-1}, a_{j-1}), and p_j = see(s_j).

Are state-based agents more expressive than standard agents? How to measure?

Definition 1.16 (Environmental Behaviour of a_state): The environmental behaviour of an agent a_state is the set of possible runs covered by the characteristic behaviour of the agent.

Theorem 1.17 (Equivalence): Standard agents and state-based agents are equivalent with respect to their environmental behaviour. More precisely: for each state-based agent a_state and storage function next there exists a standard agent a which has the same environmental behaviour, and vice versa.

3. Goal-based agents:

[Figure: a goal-based agent combines knowledge of how the world evolves and what its actions do with explicit goals, asking "what will it be like if I do action A?".]

Goal-based agents will be discussed in Planning (Chapter 7).

Intention: We want to tell our agent what to do, without telling it how to do it! We need a utility measure.

How can we define a utility function? Take for example
  u : S → ℝ.
Question: What is the problem with this definition?

Better idea:
  u : R → ℝ,
i.e. take the set R of runs and not the set S of states. How do we calculate the utility if the agent has incomplete information?
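The advantage of u : R → ℝ over u : S → ℝ can be made concrete in a small sketch. As one possible instance (my choice, not from the slides), sum a per-state reward along the run; the vacuum-style states and reward values are invented for illustration:

```python
# u : R -> IR as a sum of per-state rewards along the run.
# Runs are lists alternating states and actions; even indices hold states.

def run_utility(run, reward):
    """Sum the rewards of all states occurring in the run."""
    return sum(reward(s) for s in run[0::2])

reward = {"dirty": 0, "clean": 1}.get  # toy per-state reward

# Two runs that end in the same state "clean":
r1 = ["dirty", "suck", "clean"]
r2 = ["clean", "wait", "clean"]
```

A state-based utility u : S → ℝ would have to value both runs by their final state alone, whereas the run-based utility can distinguish them; this is exactly the problem with the first definition.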

1. Introduction 4. Intelligent Agents

4. Agents with a utility measure:

[Figure: schema of a utility-based agent. As before, but the utility ("how happy I will be in such a state") is used to decide "what action I should do now".]

Example 1.18 (Tileworld)
[Figure: three Tileworld configurations (a)-(c): an agent H pushes tiles T around a grid.]

The utility measure can often be simulated through a set of goals.

Question: How do properties of the environment influence the design of an agent?

Some important properties:
 Fully/partially observable: If the environment is not completely observable, the agent will need internal states.
 Deterministic/stochastic: If the environment is only partially observable, then it may appear stochastic (while in fact it is deterministic).
 Episodic/nonepisodic: The agent's experience is divided into episodes; percept-action sequences of different episodes are independent of each other.

 Static/dynamic: The environment can change while an agent is deliberating. An environment is semidynamic if it does not change with the passage of time, but the agent's performance measure does.
 Discrete/continuous: If there is a limited number of percepts and actions, the environment is discrete.
 Single/multi agents: Is there just one agent, or are there several interacting with each other?

Task Environment            Observable  Deterministic  Episodic    Static   Discrete  Agents
Crossword puzzle            Fully       Deterministic  Sequential  Static   Discrete  Single
Chess with a clock          Fully       Strategic      Sequential  Semi     Discrete  Multi
Poker                       Partially   Strategic      Sequential  Static   Discrete  Multi
Backgammon                  Fully       Stochastic     Sequential  Static   Discrete  Multi
Taxi driving                Partially   Stochastic     Sequential  Dynamic  Cont.     Multi
Medical diagnosis           Partially   Stochastic     Sequential  Dynamic  Cont.     Single
Image analysis              Fully       Deterministic  Episodic    Semi     Cont.     Single
Part-picking robot          Partially   Stochastic     Episodic    Dynamic  Cont.     Single
Refinery controller         Partially   Stochastic     Sequential  Dynamic  Cont.     Single
Interactive English tutor   Partially   Stochastic     Sequential  Dynamic  Discrete  Multi

Table 1.4: Examples of task environments and their characteristics

2. Searching

Chapter 2. Searching
 2.1 Problem formulation
 2.2 Uninformed search
 2.3 Best-First search
 2.4 A* search
 2.5 Heuristics
 2.6 Limited memory
 2.7 Iterative improvements
 2.8 Online Search

Wanted: Problem-solving agents, which form a subclass of the goal-oriented agents.
Structure: Formulate, Search, Execute.

2. Searching

Formulate:
 Goal: a set of states.
 Problem: states, actions (mappings from states into states), transitions.
Search: Which sequence of actions is helpful?
Execute: The resulting sequence of actions is applied to the initial state.

function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
   inputs: percept, a percept
   static: seq, an action sequence, initially empty
           state, some description of the current world state
           goal, a goal, initially null
           problem, a problem formulation
   state <- UPDATE-STATE(state, percept)
   if seq is empty then do
      goal <- FORMULATE-GOAL(state)
      problem <- FORMULATE-PROBLEM(state, goal)
      seq <- SEARCH(problem)
   action <- FIRST(seq)
   seq <- REST(seq)
   return action

Table 2.5: A simple problem-solving agent. When executing, percepts are ignored.

2.1 Problem formulation

We distinguish four types:
 1. 1-state-problems: Actions are completely described. Complete information through sensors to determine the actual state.
 2. Multiple-state-problems: Actions are completely described, but the initial state is not certain.
 3. Contingency-problems: Sometimes the result is not a fixed sequence, so the complete tree must be considered. (Exercise: Murphy's law; see Chapter 7, Planning.)
 4. Exploration-problems: Not even the effect of each action is known. You have to search in the world instead of searching in the abstract model.

2. Searching 1. Problem formulation

[Table 2.6: The vacuum world.]

Definition 2.1 (1-state-problem)
A 1-state-problem consists of:
 - a set of states (incl. the initial state);
 - a set of n actions (operators), each of which, applied to a state, leads to another state:
      Operator_i : States -> States,  i = 1, ..., n.
   Equivalently, we use a function Successor-Fn : S -> 2^(A x S). It assigns to each state a set of pairs <a, s>: the possible actions and the states they lead to;
 - a set of goal states, or a goal test, which, applied to a state, determines whether it is a goal state or not;
 - a cost function g, which assesses every path in the state space (the set of reachable states). g is usually additive.

What about multiple-state-problems? (exercise)
How to choose actions and states? (abstraction)

Definition 2.2 (State Space)
The state space of a problem is the set of all states reachable from the initial state. It forms a directed graph with the states as nodes and the arcs the actions leading from one state to another.

[Table 2.7: The 8-puzzle - start state and goal state.]
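Definition 2.1 can be made concrete for the two-cell vacuum world of Table 2.6. The state encoding below (robot position plus two dirt flags) is an assumption made for illustration; the lecture leaves the representation open:

```python
from typing import NamedTuple

class State(NamedTuple):
    pos: str        # "L" or "R"
    dirt_l: bool    # dirt in the left cell?
    dirt_r: bool    # dirt in the right cell?

def successor_fn(s):
    """Successor-Fn : S -> 2^(A x S): the set of pairs (action, state)."""
    succs = set()
    succs.add(("Left",  State("L", s.dirt_l, s.dirt_r)))
    succs.add(("Right", State("R", s.dirt_l, s.dirt_r)))
    if s.pos == "L":
        succs.add(("Suck", State("L", False, s.dirt_r)))
    else:
        succs.add(("Suck", State("R", s.dirt_l, False)))
    return succs

def goal_test(s):
    """Goal test: both cells are clean."""
    return not s.dirt_l and not s.dirt_r

def g(path):
    """Additive cost function: every action costs 1."""
    return len(path)
```

From any state exactly the three operators Left, Right and Suck are applicable, which matches the arcs of the state-space graph in Table 2.9.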

2. Searching 1. Problem formulation

[Table 2.8: The 8-queens problem.]
[Table 2.9: State space for the vacuum world; the arcs are labelled with the actions L (left), R (right) and S (suck).]

Missionaries vs. cannibals

Real-world problems:
 - travelling-salesman problem
 - VLSI layout
 - labelling maps
 - robot navigation

[Table 2.10: Belief space for the vacuum world without sensors; each belief state is a set of possible world states.]

2. Searching 2. Uninformed search

2.2 Uninformed search

[Table 2.11: Map of Romania with road distances (Arad, Zerind, Oradea, Sibiu, Fagaras, Rimnicu Vilcea, Timisoara, Lugoj, Mehadia, Dobreta, Craiova, Pitesti, Bucharest, Giurgiu, Urziceni, Neamt, Iasi, Vaslui, Hirsova, Eforie).]

Principle: Choose, test, expand. This builds up a search tree.

function TREE-SEARCH(problem, strategy) returns a solution, or failure
   initialize the search tree using the initial state of problem
   loop do
      if there are no candidates for expansion then return failure
      choose a leaf node for expansion according to strategy
      if the node contains a goal state then return the corresponding solution
      else expand the node and add the resulting nodes to the search tree

Table 2.12: Tree Search.

[Figure: the first expansion steps from Arad on the map of Romania - (a) the initial state Arad; (b) after expanding Arad: Sibiu, Timisoara, Zerind; (c) after expanding Sibiu: Arad, Fagaras, Oradea, Rimnicu Vilcea.]

2. Searching 2. Uninformed search

Important: state space versus search tree.
 - The search tree is countably infinite, in contrast to the finite state space.
 - A node is a bookkeeping data structure with respect to the problem instance and with respect to an algorithm; a state is a snapshot of the world.

Definition 2.3 (Datatype Node)
The datatype node is defined by state (in S), parent (a node), action (also called operator) which generated this node, path-costs (the costs to reach the node) and depth (distance from the root).

Important: Note the recursive dependency between node and parent. If the depth is left out, then a special node root has to be introduced. Conversely, the root can be defined via the depth: root is its own parent with depth 0.

[Figure 2.6: Illustration of a node in the 8-puzzle: STATE, PARENT-NODE, ACTION = right, DEPTH = 6, PATH-COST = 6.]

Now we will try to instantiate the function TREE-SEARCH a bit.

Design decision: Queue
The nodes to be expanded could be described as a set, but we will use a queue instead. The fringe is the set of generated nodes that are not yet expanded. Here are a few functions operating on queues:
   Make-Queue(Elements), Empty?(Queue), First(Queue),
   Remove-First(Queue), Insert(Element, Queue), Insert-All(Elements, Queue)
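Definition 2.3 transcribes almost directly into code. The helper `child_node` mirrors what the EXPAND function does per successor; the field names follow the lecture, while the Python encoding itself (and the sample states) is an assumption:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    state: Any
    parent: Optional["Node"]   # the root is modelled with parent = None
    action: Any                # operator that generated this node
    path_cost: float           # costs to reach the node
    depth: int                 # distance from the root

def child_node(parent, action, state, step_cost):
    """Build a successor node from its parent, as EXPAND does."""
    return Node(state, parent, action,
                parent.path_cost + step_cost, parent.depth + 1)

root = Node("Arad", None, None, 0.0, 0)
child = child_node(root, "go(Sibiu)", "Sibiu", 140)
```

Note how path cost and depth are both derived from the parent, which is exactly the recursive dependency mentioned above.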

2. Searching 2. Uninformed search

function TREE-SEARCH(problem, fringe) returns a solution, or failure
   fringe <- INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
   loop do
      if EMPTY?(fringe) then return failure
      node <- REMOVE-FIRST(fringe)
      if GOAL-TEST[problem] applied to STATE[node] succeeds then return SOLUTION(node)
      fringe <- INSERT-ALL(EXPAND(node, problem), fringe)

function EXPAND(node, problem) returns a set of nodes
   successors <- the empty set
   for each <action, result> in SUCCESSOR-FN[problem](STATE[node]) do
      s <- a new node
      STATE[s] <- result
      PARENT-NODE[s] <- node
      ACTION[s] <- action
      PATH-COST[s] <- PATH-COST[node] + STEP-COST(node, action, s)
      DEPTH[s] <- DEPTH[node] + 1
      add s to successors
   return successors

Table 2.13: Tree-Search.

Question: Which are interesting requirements on search strategies?
 - completeness
 - time complexity
 - space complexity
 - optimality (w.r.t. path costs)
We distinguish: uninformed vs. informed search.

Breadth-first search: nodes with the smallest depth are expanded first.
 Make-Queue: add new nodes at the end (FIFO).
 Complete? Yes. Optimal? Yes, if all operators are equally expensive.
 Constant branching factor b: for a solution at depth d we have generated (in the worst case) b + b^2 + ... + b^d + (b^(d+1) - b) many nodes.
 Space complexity = time complexity.

[Figure 2.7: Illustration of breadth-first search on a binary tree with nodes A-G.]
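Breadth-first TREE-SEARCH can be sketched with the fringe as a FIFO queue, as described above. The tiny example graph is an illustrative assumption; any successor function of type state -> list of (action, state) pairs would do:

```python
from collections import deque

def breadth_first_search(start, successors, is_goal):
    """Return a list of actions from start to a goal, or None."""
    fringe = deque([(start, [])])        # FIFO queue of (state, path)
    while fringe:
        state, path = fringe.popleft()   # REMOVE-FIRST
        if is_goal(state):
            return path
        for action, succ in successors(state):
            fringe.append((succ, path + [action]))   # insert at the end
    return None

GRAPH = {"A": [("a1", "B"), ("a2", "C")],
         "B": [("b1", "D")], "C": [("c1", "D")], "D": []}

plan = breadth_first_search("A", lambda s: GRAPH[s], lambda s: s == "D")
```

Being pure tree search, this sketch does no loop checking, so on a cyclic graph without a reachable goal it would not terminate - which is exactly the problem Graph-Search addresses later.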

2. Searching 2. Uninformed search

Depth  Nodes    Time             Memory
0      1        1 millisecond    100 bytes
2      111      0.1 seconds      11 kilobytes
4      11,111   11 seconds       1 megabyte
6      10^6     18 minutes       111 megabytes
8      10^8     31 hours         11 gigabytes
10     10^10    128 days         1 terabyte
12     10^12    35 years         111 terabytes
14     10^14    3500 years       11,111 terabytes

Table 2.14: Time versus memory for breadth-first search.

Uniform-Cost search: nodes n with the lowest path costs g(n) are expanded first.
 Make-Queue: new nodes are compared to those in the queue according to their path costs and are inserted accordingly.
 Complete? Yes, if each operator increases the path costs by a minimum of delta > 0 (see below).
 Worst-case space/time complexity: O(b^(1 + floor(C*/delta))), where C* is the cost of the optimal solution and each action costs at least delta.

If all operators have the same costs (in particular, if g(n) = depth(n) holds): Uniform-cost = Breadth-first.

Theorem 2.4 (Optimality of uniform-cost)
If there is a delta > 0 such that g(succ(n)) >= g(n) + delta for all nodes n, then Uniform-Cost is optimal.

Depth-first search: nodes of the greatest depth are expanded first.
 Make-Queue: LIFO, new nodes are added at the beginning.
 If the branching factor b is constant and the maximum depth is m, then: space complexity O(b*m), time complexity O(b^m).
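Uniform-cost search can be sketched with a priority queue ordered by the path costs g(n). The example graph with its step costs is an illustrative assumption; note that the cheaper route to D goes through the expensive-looking first step via C:

```python
import heapq

def uniform_cost_search(start, successors, is_goal):
    """Return (cost, path) of a cheapest solution, or None."""
    fringe = [(0, start, [])]            # heap ordered by g(n)
    while fringe:
        g, state, path = heapq.heappop(fringe)
        if is_goal(state):               # goal test on expansion, not generation
            return g, path
        for action, succ, step_cost in successors(state):
            heapq.heappush(fringe, (g + step_cost, succ, path + [action]))
    return None

# Two routes to D: via B (1 + 5 = 6) and via C (4 + 1 = 5).
GRAPH = {"A": [("ab", "B", 1), ("ac", "C", 4)],
         "B": [("bd", "D", 5)], "C": [("cd", "D", 1)], "D": []}

cost, plan = uniform_cost_search("A", lambda s: GRAPH[s], lambda s: s == "D")
```

Testing the goal only when a node is removed from the queue (not when it is generated) is what makes the returned solution cheapest, as Theorem 2.4 requires.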

2. Searching 2. Uninformed search

[Table 2.15: Illustration of depth-first search on a binary tree with nodes A-O.]

Depth-limited search: search only up to a fixed depth limit.

function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
   return RECURSIVE-DLS(MAKE-NODE(INITIAL-STATE[problem]), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
   cutoff-occurred? <- false
   if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
   else if DEPTH[node] = limit then return cutoff
   else for each successor in EXPAND(node, problem) do
      result <- RECURSIVE-DLS(successor, problem, limit)
      if result = cutoff then cutoff-occurred? <- true
      else if result =/= failure then return result
   if cutoff-occurred? then return cutoff else return failure

Table 2.16: Depth-Limited-Search.

Note the two different sorts of failures: failure (there is no solution at all) and cutoff (there is no solution within the depth limit).

Iterative-deepening search: the depth limit is increased step by step. In total, up to 1 + b + b^2 + ... + b^d + (b^(d+1) - b) many nodes are generated in the last iteration. Space complexity: O(b*l), time complexity: O(b^l).
[Table 2.17: Illustration of iterative-deepening search with depth limits 0-3 on a binary tree with nodes A-O.]

2. Searching 2. Uninformed search

function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
   inputs: problem, a problem
   for depth <- 0 to infinity do
      result <- DEPTH-LIMITED-SEARCH(problem, depth)
      if result =/= cutoff then return result

Table 2.18: Iterative-Deepening-Search.

How many nodes? d*b + (d-1)*b^2 + ... + 1*b^d, i.e. approximately b^d many.
Compare with breadth-first search: b + b^2 + ... + b^d + (b^(d+1) - b) many.

Attention: Iterative-deepening search is faster than breadth-first search, because the latter also generates nodes at depth d+1 (even if the solution is at depth d). The number of revisited nodes stays within a limit.

[Figure 2.8: Bidirectional search - a forward search from the start and a backward search from the goal meet in the middle.]
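Depth-limited and iterative-deepening search can be sketched directly from the pseudocode above. The toy successor function is an illustrative assumption; `cutoff` and `failure` are modelled as sentinel strings:

```python
CUTOFF, FAILURE = "cutoff", "failure"

def depth_limited_search(state, successors, is_goal, limit):
    """Return an action list, CUTOFF, or FAILURE (RECURSIVE-DLS)."""
    if is_goal(state):
        return []                        # solution: empty action list
    if limit == 0:
        return CUTOFF
    cutoff_occurred = False
    for action, succ in successors(state):
        result = depth_limited_search(succ, successors, is_goal, limit - 1)
        if result == CUTOFF:
            cutoff_occurred = True
        elif result != FAILURE:
            return [action] + result
    return CUTOFF if cutoff_occurred else FAILURE

def iterative_deepening_search(state, successors, is_goal, max_depth=50):
    """Increase the depth limit until the result is not a cutoff."""
    for depth in range(max_depth + 1):
        result = depth_limited_search(state, successors, is_goal, depth)
        if result != CUTOFF:
            return result
    return FAILURE

GRAPH = {"A": [("a1", "B"), ("a2", "C")],
         "B": [("b1", "D")], "C": [], "D": []}
plan = iterative_deepening_search("A", lambda s: GRAPH[s], lambda s: s == "D")
```

The `max_depth` bound replaces the pseudocode's unbounded `for depth <- 0 to infinity` so the sketch always terminates.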

2. Searching 2. Uninformed search

Criterion   Breadth-First   Uniform-Cost             Depth-First   Depth-Limited   Iterative Deepening   Bidirectional
Complete    Yes^1           Yes^(1,2)                No            No              Yes^1                 Yes^(1,4)
Time        O(b^(d+1))      O(b^(1+floor(C*/eps)))   O(b^m)        O(b^l)          O(b^d)                O(b^(d/2))
Space       O(b^(d+1))      O(b^(1+floor(C*/eps)))   O(b*m)        O(b*l)          O(b*d)                O(b^(d/2))
Optimal     Yes^3           Yes                      No            No              Yes^3                 Yes^(3,4)

Table 2.19: Complexities of uninformed search procedures.
 1: if b is finite; 2: if step costs are at least eps for some positive eps; 3: if step costs are all identical; 4: if both directions use breadth-first search.

Symbols:
 b   branching factor
 d   depth of the shallowest solution
 m   maximum depth of the search tree
 l   depth limit
 C*  cost of the optimal solution

How to avoid repeated states? Can we avoid infinite trees by checking for loops? Compare the number of states with the number of paths in the search tree.

2. Searching 2. Uninformed search

Rectangular grid: How many different states are reachable within a path of length d?

[Table 2.20: State space versus search tree: exponential blow-up - a small state space (a), its search tree (b), and the repeated states (c).]

Graph-Search = Tree-Search + loop checking:

function GRAPH-SEARCH(problem, fringe) returns a solution, or failure
   closed <- an empty set
   fringe <- INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
   loop do
      if EMPTY?(fringe) then return failure
      node <- REMOVE-FIRST(fringe)
      if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
      if STATE[node] is not in closed then
         add STATE[node] to closed
         fringe <- INSERT-ALL(EXPAND(node, problem), fringe)
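The closed set of already expanded states is the only difference from tree search, but it keeps the search from blowing up on the exponentially many paths through a small state space. A sketch with a FIFO fringe (i.e. breadth-first graph search); the 4-cycle graph is an illustrative assumption:

```python
from collections import deque

def graph_search_bfs(start, successors, is_goal):
    """Breadth-first GRAPH-SEARCH: tree search plus a closed set."""
    closed = set()
    fringe = deque([(start, [])])
    while fringe:
        state, path = fringe.popleft()
        if is_goal(state):
            return path
        if state not in closed:          # expand each state at most once
            closed.add(state)
            for action, succ in successors(state):
                fringe.append((succ, path + [action]))
    return None

# A 4-cycle: plain tree search would loop forever when no goal is found.
CYCLE = {"A": [("r", "B")], "B": [("r", "C")],
         "C": [("r", "D")], "D": [("r", "A")]}

plan = graph_search_bfs("A", lambda s: CYCLE[s], lambda s: s == "C")
```

On the cycle, the search terminates even for an unsatisfiable goal test, because once all four states are in the closed set the fringe runs empty.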

2. Searching 3. Best-First search

2.3 Best-First search

Idea: Use problem-specific knowledge to improve the search. Tree search is precisely defined; the only freedom is Make-Queue. Let us assume we have an evaluation function f which assigns a value f(n) to each node n. We change Make-Queue as follows: the nodes with the smallest f are located at the beginning of the queue, i.e. the queue is sorted w.r.t. f.

function BEST-FIRST-SEARCH(problem, EVAL-FN) returns a solution sequence
   inputs: problem, a problem
           Eval-Fn, an evaluation function
   Queueing-Fn <- a function that orders nodes by EVAL-FN
   return GENERAL-SEARCH(problem, Queueing-Fn)

Table 2.22: Best-First-Search.

What about time and space complexity?

Greedy search: here the evaluation function is
   Eval_Fn(n) := expected costs of an optimal path from the state in n to a goal state.
The word "optimal" is used with respect to the given cost function g. This evaluation function is also called a heuristic function, written h(n).
We require of all heuristic functions that they assign the value 0 to goal states.
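Greedy search can be sketched on the lecture's Romania route-finding example, ordering the queue by h(n) alone, with h the straight-line distance to Bucharest. Only a fragment of the map is encoded here, and the road connections are an assumption of this sketch:

```python
import heapq

# h(n): straight-line distance to Bucharest.
SLD = {"Arad": 366, "Bucharest": 0, "Fagaras": 178, "Oradea": 380,
       "Rimnicu Vilcea": 193, "Sibiu": 253, "Timisoara": 329, "Zerind": 374}

ROADS = {"Arad": ["Sibiu", "Timisoara", "Zerind"],
         "Sibiu": ["Arad", "Fagaras", "Oradea", "Rimnicu Vilcea"],
         "Fagaras": ["Sibiu", "Bucharest"],
         "Timisoara": ["Arad"], "Zerind": ["Arad"],
         "Oradea": ["Sibiu"], "Rimnicu Vilcea": ["Sibiu"],
         "Bucharest": []}

def greedy_search(start, goal):
    """Best-first search with Eval_Fn = h: expand the node closest to the goal."""
    fringe = [(SLD[start], start, [start])]
    closed = set()
    while fringe:
        _, city, path = heapq.heappop(fringe)
        if city == goal:
            return path
        if city not in closed:
            closed.add(city)
            for succ in ROADS[city]:
                heapq.heappush(fringe, (SLD[succ], succ, path + [succ]))
    return None

route = greedy_search("Arad", "Bucharest")
```

The search rushes straight to Fagaras because of its small h-value, even though the route via Rimnicu Vilcea and Pitesti is cheaper - a concrete instance of greedy search not being optimal.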

36 2. Searching 4. A search A -search: Here the evaluation-function is the sum of an heuristic function h(n) and the real path-costs g(n): Eval_Fn(n) := h(n) + g(n) Also written f(n). So A -search is greedy + uniform-cost, because h(n z ) = 0 holds for final states n z, as f(n z ) = g(n z ). 2. Searching 4. A search On Slide 136 we required from any heuristic function that its value is 0 for goal nodes. An important generalization of this is that it never overestimates the cost to reach the goal. Definition 2.6 (Admissible heuristic function) The heuristic function h is called admissible if h(n) is always smaller or equal than the optimal costs h from n to a goal-node: h(n) h (n) Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Searching 4. A search 2. Searching 4. A search Table 2.25: Illustration of A (2). Table 2.24: Illustration of A (1). Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718

2. Searching 4. A* search

A*-search: here the evaluation function is the sum of a heuristic function h(n) and the real path costs g(n):
   Eval_Fn(n) := h(n) + g(n),
also written f(n). So A*-search combines greedy search and uniform-cost search: since h(n_z) = 0 holds for goal states n_z, we have f(n_z) = g(n_z).

Earlier (Slide 136) we required of any heuristic function that its value is 0 for goal nodes. An important generalization of this is that it never overestimates the cost to reach the goal.

Definition 2.6 (Admissible heuristic function)
The heuristic function h is called admissible if h(n) is always smaller than or equal to the optimal costs h*(n) from n to a goal node:
   h(n) <= h*(n).

[Tables 2.24 and 2.25: Illustration of A* on the Romania example.]
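A* can be sketched on the same Romania data, with the fringe ordered by f(n) = g(n) + h(n). The road lengths follow the map of the example; only the part of the map relevant for the Arad-Bucharest route is encoded, which is an assumption of this sketch:

```python
import heapq

# h(n): straight-line distance to Bucharest.
H = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Zerind": 374,
     "Fagaras": 178, "Oradea": 380, "Rimnicu Vilcea": 193,
     "Pitesti": 98, "Craiova": 160, "Bucharest": 0}

ROADS = {
    "Arad": [("Sibiu", 140), ("Timisoara", 118), ("Zerind", 75)],
    "Sibiu": [("Arad", 140), ("Fagaras", 99), ("Oradea", 151),
              ("Rimnicu Vilcea", 80)],
    "Rimnicu Vilcea": [("Sibiu", 80), ("Pitesti", 97), ("Craiova", 146)],
    "Fagaras": [("Sibiu", 99), ("Bucharest", 211)],
    "Pitesti": [("Rimnicu Vilcea", 97), ("Bucharest", 101), ("Craiova", 138)],
}

def a_star(start, goal):
    """Graph search ordered by f(n) = g(n) + h(n)."""
    fringe = [(H[start], 0, start, [start])]     # (f, g, state, path)
    closed = set()
    while fringe:
        f, g, city, path = heapq.heappop(fringe)
        if city == goal:
            return g, path
        if city not in closed:
            closed.add(city)
            for succ, dist in ROADS.get(city, []):
                heapq.heappush(fringe,
                               (g + dist + H[succ], g + dist,
                                succ, path + [succ]))
    return None

cost, route = a_star("Arad", "Bucharest")
```

Unlike greedy search, A* abandons the Fagaras branch once its f-value exceeds that of the route via Rimnicu Vilcea and Pitesti, and returns the cheapest route.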

2. Searching 4. A* search

To show completeness of A* we have to ensure:
 1. It is never the case that infinitely many nodes are expanded in one step (local finiteness).
 2. There is a delta > 0 such that in each step the path costs increase by at least delta.
These conditions must also hold in the following optimality results.

Theorem 2.7 (Completeness of A*)
A* is complete (w.r.t. Tree Search or Graph Search) if the above two properties are satisfied.

f := Eval_Fn is monotone in our example. This does not hold in general, but it is an easy task to make f monotone:

Definition 2.8 (f monotone: f_mon)
We define a modified function f_mon recursively, depending on the node depth:
   f_mon(n) := max{ f_mon(n'), f(n) },
where n' is the parent of n.

Is monotony of f needed to ensure optimality?

[Figure 2.9: Goal-directed contours of A* on the Romania map; the contour shown has f = 420.]

Theorem 2.9 (Optimality of A* wrt Tree Search)
A* is optimal using Tree Search, if the heuristic function h is admissible.

2. Searching 4. A* search

What if we use Graph Search? The proof breaks down!

Definition 2.10 (Consistent heuristic function)
The heuristic function h is called consistent if the following holds for every node n and every successor n' of n:
   h(n) <= cost(n, a, n') + h(n').

Consistency of h implies monotony of f. Is the converse also true?

Theorem 2.11 (Optimality of A* wrt Graph Search)
A* is optimal using Graph Search, if the heuristic function h is consistent.

Is the last theorem also true if we require monotony of f (instead of consistency of h)?

Question: How many nodes does A* store in memory?
Answer: Virtually always exponentially many with respect to the length of the solution.
It can be shown: unless the heuristic function is extremely exact, i.e.
   |h(n) - h*(n)| <= O(log h*(n)),
the number of nodes is always exponential in the length of the solution.

2. Searching 4. A* search

For almost every usable heuristic, a bad error estimation holds:
   |h(n) - h*(n)| is in O(h*(n)).

Important: A*'s problem is space, not time.

Important: A* is even optimally efficient: no other optimal algorithm (which expands search paths beginning with an initial node) expands fewer nodes than A*.

2.5 Heuristics

[Table 2.26: An instance of the 8-puzzle - start state and goal state.]

2. Searching 5. Heuristics

Question: Which branching factor?
Answer: Approximately 3 (more exactly 8/3).

Question: How many nodes have to be considered?
Answer: 3^g, where g is the number of moves necessary to reach a solution; g is approximately 22.

But: there are only 9! = 362,880, i.e. about 3.6 * 10^5, states! In other words: looking at cycles can be very helpful.

Question: Which heuristic functions come in handy?
 Hamming distance: h_1 is the number of tiles which are in the wrong position. E.g. h_1(start) = 8.
 Manhattan distance: calculate for every tile the distance to its correct position and sum up:
    h_2 := sum over i = 1, ..., 8 of (distance of tile i to its correct position).
 E.g. h_2(start) = 15.

Question: How to determine the quality of a heuristic function (in a single value)?

Definition 2.12 (Effective Branching Factor)
Suppose A* detects an optimal solution for an instance of a problem at depth d with N nodes expanded. Then the effective branching factor b* of A* is defined by
   N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d.
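Both 8-puzzle heuristics and Definition 2.12 can be sketched numerically: b* is the root of the polynomial equation above, found here by bisection. The board encoding (a tuple of 9 entries, blank = 0) is an assumption of this sketch:

```python
def effective_branching_factor(N, d, tol=1e-9):
    """Solve N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection."""
    def total(b):
        return sum(b ** i for i in range(d + 1))
    lo, hi = 1.0, float(N + 1)           # total() is increasing in b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < N + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def hamming(state, goal):
    """h_1: number of tiles (not counting the blank 0) out of place."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal):
    """h_2: sum of horizontal + vertical distances of the tiles (3x3 board)."""
    dist = 0
    for i, tile in enumerate(state):
        if tile != 0:
            j = goal.index(tile)
            dist += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return dist
```

Since every move changes the Manhattan distance by at most 1, h_2 never overestimates and is admissible; h_1 <= h_2 holds on every board, which is why Manhattan tends to give the smaller effective branching factor.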

2. Searching 5. Heuristics

[Table 2.27: Comparing A* (Hamming and Manhattan) with IDS: search cost and effective branching factor b* for various solution depths d.]

Attention: b* depends on h and on the particular problem instance. But for many classes of problem instances, b* is quite constant.

Question: Is Manhattan better than Hamming?

How to determine a (good) heuristic for a given problem? There is no general solution; it always depends on the problem. Often one can consider a relaxed problem and take the precise solution costs of the relaxed problem as a heuristic for the original one.

[Table 2.28: A relaxed version of the 8-puzzle - start state and goal state.]

2. Searching 5. Heuristics

Relaxed problem: try to bring tiles 1-4 into the right positions, but do not care about all the others. This heuristic is better than the Manhattan distance (for the 8-puzzle).

2.6 Limited memory

Question: A* is memory-intensive. What if the memory is limited? What to do if the queue is restricted in its length?
This leads to:
 - IDA*: A* + iterative deepening,
 - RBFS: Recursive Best-First Search,
 - SMA*: Simplified Memory-bounded A*.

IDA*: A* combined with iterative-deepening search. We perform depth-first search (small memory), but use values of f instead of the depth. So we consider contours: only nodes within the f-limit. Concerning the nodes beyond the limit, we only store the smallest f-value above the actual limit.

function IDA*(problem) returns a solution sequence
   inputs: problem, a problem
   static: f-limit, the current f-COST limit
           root, a node
   root <- MAKE-NODE(INITIAL-STATE[problem])
   f-limit <- f-COST(root)
   loop do
      solution, f-limit <- DFS-CONTOUR(root, f-limit)
      if solution is non-null then return solution
      if f-limit = infinity then return failure

function DFS-CONTOUR(node, f-limit) returns a solution sequence and a new f-COST limit
   inputs: node, a node
           f-limit, the current f-COST limit
   static: next-f, the f-COST limit for the next contour, initially infinity
   if f-COST[node] > f-limit then return null, f-COST[node]
   if GOAL-TEST[problem](STATE[node]) then return node, f-limit
   for each node s in SUCCESSORS(node) do
      solution, new-f <- DFS-CONTOUR(s, f-limit)
      if solution is non-null then return solution, f-limit
      next-f <- MIN(next-f, new-f)
   return null, next-f

Figure 2.10: IDA*.

Theorem 2.13 (Properties of IDA*)
IDA* is optimal if enough memory is available to store the longest solution path with costs f*.

Question: What about complexity?
 Space: depends on the smallest operator costs delta, the branching factor b and the optimal costs f*; b * f*/delta many nodes are expanded (worst case).
 Time: depends on the number of distinct values the heuristic function takes. With few distinct values, the last iteration of IDA* will often be like A*. With many distinct values, only one node per contour may be added: how many nodes will IDA* expand if A* expands n many? What does this say about the time complexity of A* and IDA*?

RBFS: a recursive depth-first version of best-first search using only linear memory. It memorizes the f-value of the best alternative path.
As the recursion unwinds, the f-value of each node is replaced by the best (smallest) f-value of its children.

function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure
   return RBFS(problem, MAKE-NODE(INITIAL-STATE[problem]), ∞)

function RBFS(problem, node, f-limit) returns a solution, or failure and a new f-cost limit
   if GOAL-TEST[problem](STATE[node]) then return node
   successors ← EXPAND(node, problem)
   if successors is empty then return failure, ∞
   for each s in successors do
      f[s] ← MAX(g(s) + h(s), f[node])
   repeat
      best ← the lowest f-value node in successors
      if f[best] > f-limit then return failure, f[best]
      alternative ← the second-lowest f-value among successors
      result, f[best] ← RBFS(problem, best, MIN(f-limit, alternative))
      if result ≠ failure then return result

Table 2.29: RBFS.

(Table 2.30: Illustration of RBFS on the map of Romania: (a) after expanding Arad, Sibiu, and Rimnicu Vilcea; (b) after unwinding back to Sibiu and expanding Fagaras; (c) after switching back to Rimnicu Vilcea and expanding Pitesti. Each node is labelled with its backed-up f-value.)

RBFS is optimal and complete under the same assumptions as A*. IDA* and RBFS suffer from using too little memory. We would like to use as much memory as possible. This leads to SMA*.

SMA*: an extension of A* which needs only a limited amount of memory. If there is no space left but nodes have to be expanded, nodes are removed from the queue: those with the highest f-values (forgotten nodes). Their f-costs are stored, however. Later those nodes are regenerated if all other paths turn out to lead to higher costs.
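The RBFS pseudocode of Table 2.29 can be rendered as runnable Python roughly as follows (a sketch; the graph representation and names are our own):

```python
def recursive_best_first_search(start, goal, succ, h):
    """RBFS sketch: best-first behaviour in linear space, remembering the
    f-value of the best alternative path at each level."""
    INF = float("inf")

    def rbfs(state, g, f, f_limit, path):
        if state == goal:
            return path, f
        children = []
        for child, cost in succ(state):
            if child in path:
                continue
            cg = g + cost
            cf = max(cg + h(child), f)      # inherit the parent's backed-up f
            children.append([cf, child, cg])
        if not children:
            return None, INF
        while True:
            children.sort(key=lambda e: e[0])
            best = children[0]
            if best[0] > f_limit:           # best child exceeds the limit: unwind
                return None, best[0]
            alternative = children[1][0] if len(children) > 1 else INF
            result, best[0] = rbfs(best[1], best[2], best[0],
                                   min(f_limit, alternative), path + [best[1]])
            if result is not None:
                return result, best[0]

    return rbfs(start, 0, h(start), INF, [start])[0]

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
heur = {"S": 4, "A": 4, "B": 1, "G": 0}
```

Whenever the recursion unwinds, the best child's f-value is overwritten with the backed-up value, so a forgotten subtree can be re-expanded later at exactly the right priority.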

(Figure 2.11: Illustration of SMA* on a small binary tree; each node is labelled with g+h, e.g. the root A with 0+12=12, and memory is limited to three nodes. The f-values of forgotten nodes, shown in parentheses, are backed up in their parents.)

function SMA*(problem) returns a solution sequence
   inputs: problem, a problem
   static: Queue, a queue of nodes ordered by f-cost
   Queue ← MAKE-QUEUE({MAKE-NODE(INITIAL-STATE[problem])})
   loop do
      if Queue is empty then return failure
      n ← deepest least-f-cost node in Queue
      if GOAL-TEST(n) then return success
      s ← NEXT-SUCCESSOR(n)
      if s is not a goal and is at maximum depth then f(s) ← ∞
      else f(s) ← MAX(f(n), g(s) + h(s))
      if all of n's successors have been generated then
         update n's f-cost and those of its ancestors if necessary
      if SUCCESSORS(n) all in memory then remove n from Queue
      if memory is full then
         delete shallowest, highest-f-cost node in Queue
         remove it from its parent's successor list
         insert its parent on Queue if necessary
      insert s on Queue
   end

Table 2.31: SMA*.

Theorem 2.14 (Properties of SMA*)
SMA* is complete if enough memory is available to store the shortest solution path. SMA* is optimal if there is enough memory to store the optimal solution path.

2. Searching 7. Iterative improvements

2.7 Iterative improvements

Idea: For certain problems only the current state matters, not the path leading to it. (Figure: an evaluation landscape over the state space; we move the current state uphill.) Of course such a problem can still be arbitrarily difficult!

Hill-climbing: gradient descent (or ascent). Move in a direction at random, compare the new evaluation with the old one, and move to the new point if the evaluation is better.
Problems:
   local optima (getting stuck),
   plateaux (wandering around),
   ridges (detours).

function HILL-CLIMBING(problem) returns a solution state
   inputs: problem, a problem
   static: current, a node
           next, a node
   current ← MAKE-NODE(INITIAL-STATE[problem])
   loop do
      next ← a highest-valued successor of current
      if VALUE[next] < VALUE[current] then return current
      current ← next
   end

Table 2.32: Hill Climbing.

Simulated annealing: modified hill-climbing where bad moves (to lower evaluation values) are also allowed, with the small probability e^(ΔE/T) (ΔE < 0). It depends on the temperature T, which decreases from ∞ towards 0.
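Table 2.32 as executable Python (a minimal sketch; the toy objective and names are illustrative, and this variant also stops on plateaux instead of wandering):

```python
def hill_climbing(start, successors, value):
    """Greedy local search: move to the best successor while it improves."""
    current = start
    while True:
        candidates = successors(current)
        if not candidates:
            return current
        nxt = max(candidates, key=value)
        if value(nxt) <= value(current):   # local optimum or plateau: stop
            return current
        current = nxt

# Toy landscape: maximise -(x-3)^2 over the integers, moving by +/-1.
peak = hill_climbing(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
```

Starting from 0 the search walks 1, 2, 3 and stops at the global maximum; with a less friendly landscape it would stop at the first local optimum, which is exactly the problem noted above.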

function SIMULATED-ANNEALING(problem, schedule) returns a solution state
   inputs: problem, a problem
           schedule, a mapping from time to temperature
   static: current, a node
           next, a node
           T, a temperature controlling the probability of downward steps
   current ← MAKE-NODE(INITIAL-STATE[problem])
   for t ← 1 to ∞ do
      T ← schedule[t]
      if T = 0 then return current
      next ← a randomly selected successor of current
      ΔE ← VALUE[next] − VALUE[current]
      if ΔE > 0 then current ← next
      else current ← next only with probability e^(ΔE/T)

Table 2.33: Simulated Annealing.

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
   inputs: population, a set of individuals
           FITNESS-FN, a function that measures the fitness of an individual
   repeat
      new_population ← empty set
      loop for i from 1 to SIZE(population) do
         x ← RANDOM-SELECTION(population, FITNESS-FN)
         y ← RANDOM-SELECTION(population, FITNESS-FN)
         child ← REPRODUCE(x, y)
         if (small random probability) then child ← MUTATE(child)
         add child to new_population
      population ← new_population
   until some individual is fit enough, or enough time has elapsed
   return the best individual in population, according to FITNESS-FN

function REPRODUCE(x, y) returns an individual
   inputs: x, y, parent individuals
   n ← LENGTH(x)
   c ← random number from 1 to n
   return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c+1, n))

Table 2.34: Genetic Algorithm.

(Table 2.35: Illustration of the genetic algorithm on 8-queens states: (a) initial population with fitness percentages, (b) fitness function, (c) selection, (d) crossover, (e) mutation.)

(Table 2.36: Crossover in the 8-queens problem.)
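Table 2.34 rendered as executable Python (a sketch; individuals are bit-lists and the "count the 1-bits" fitness is our illustrative choice, not from the lecture):

```python
import random

def reproduce(x, y):
    """Single-point crossover."""
    c = random.randrange(1, len(x))
    return x[:c] + y[c:]

def genetic_algorithm(population, fitness, generations=50, p_mutate=0.1):
    for _ in range(generations):
        # fitness-proportionate selection; tiny offset avoids all-zero weights
        weights = [fitness(ind) + 1e-9 for ind in population]
        new_population = []
        for _ in range(len(population)):
            x, y = random.choices(population, weights=weights, k=2)
            child = reproduce(x, y)
            if random.random() < p_mutate:
                i = random.randrange(len(child))
                child[i] = 1 - child[i]      # mutation: flip one bit
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
best = genetic_algorithm(pop, sum)           # fitness: number of 1-bits
```

Selection, crossover and mutation correspond to steps (c), (d) and (e) of Table 2.35; a fixed generation count stands in for the "fit enough or time elapsed" test.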

2. Searching 8. Online Search

2.8 Online Search

Up to now: offline search. What about interleaving actions and computation? Remember the exploration problem.

(Table 2.37: Online vs. offline search: a maze in which the agent starting at S must reach G without knowing the map in advance.)

Definition 2.15 (Competitive Ratio)
The competitive ratio of an online search problem is the cost of the path taken by the agent divided by the cost of the optimal path.

The competitive ratio can be infinite (for irreversible actions).

(Figure: a small example with states S, A and goal G in which an irreversible action can make the goal unreachable.)

The competitive ratio can be unbounded (even for reversible actions).

(Table 2.39: Unbounded competitive ratio: an adversary can always place the goal G so that the agent starting at S explores arbitrarily much of the state space first.)

(Table 2.40: Online DFS Agent.)

Why not simply take random walks?

(Table 2.41: Random walk: an environment in which a random walk from S takes exponentially many steps to reach G.)

Because they can lead to exponentially many steps. Will a random walk eventually find the goal (completeness)?
   Yes, if the state space is finite;
   Yes, on 2-dimensional grids;
   only with probability 0.34 on 3-dimensional grids.

Hill-climbing + memory = LRTA*

(Table 2.42: Illustration of LRTA* in five snapshots (a)-(e); each state is labelled with its current estimated cost to the goal.)
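The update rule illustrated in Table 2.42 can be sketched as follows (a minimal version; the line-graph example and names are ours):

```python
def lrta_star(start, goal, succ, h, max_steps=100):
    """LRTA*: act greedily on learned cost estimates H, and after each move
    back up the estimate of the state just left."""
    H = {}                                   # learned cost-to-goal estimates
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            return path
        def estimate(neighbour, step_cost):
            return step_cost + H.get(neighbour, h(neighbour))
        nxt, cost = min(succ(state), key=lambda nc: estimate(*nc))
        H[state] = estimate(nxt, cost)       # back up: best neighbour + step cost
        state = nxt
        path.append(state)
    return None                              # step budget exhausted

# Line graph 0-1-2-3 with unit costs and the trivial heuristic h = 0.
line = {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1), (3, 1)], 3: []}
route = lrta_star(0, 3, lambda s: line[s], lambda s: 0)
```

Raising H of the state just left is what lets the agent escape local minima: revisiting a state becomes steadily less attractive, exactly as in snapshots (a) to (e) above.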

LRTA*
a) The algorithm got stuck in a local minimum. The best neighbour is to the right of the current (yellow) state.
b) Therefore the expected costs of the previous state have to be updated (3 instead of 2, because the best neighbour has expected costs 2 and the cost to go there is 1). The algorithm walks back to this node (it is the best one).
c) In the same way the expected costs of the previous node have to be updated (4 instead of 3).
d) Similarly, the expected costs of the current node have to be updated (5 instead of 4).
e) Finally, the best next node is to the right (4 is better than 5).

Table 2.43: LRTA*.

3. Supervised Learning

Chapter 3. Supervised Learning

3.1 Basics
3.2 Decision trees
3.3 Ensemble Learning
3.4 PL1 formalisations
3.5 PAC Learning
3.6 Noise and overfitting

Learning in general
Hitherto: an agent's intelligence is in its program; it is hard-wired.
Now: we want a more autonomous agent, which should learn through percepts (experiences) to get to know its environment.
Important: this matters when the domain in which the agent acts cannot be described completely.

3. Supervised Learning 1. Basics

3.1 Basics

(Figure: the general model of a learning agent. Sensors deliver percepts from the environment; the critic compares them with an external performance standard and gives feedback to the learning element; the learning element sets learning goals, receives knowledge from and makes changes to the performance element; the problem generator proposes exploratory actions; the performance element chooses the actions carried out by the effectors.)

Performance element: Observes and chooses actions. This was the whole agent till now.
Critic: Observes the result of an action and assesses it with respect to an external standard. Why external? Otherwise the agent would set its own standard so low that it always holds!
Learning element: Modifies the performance element by considering the critic (and the inner architecture of the performance element). It also creates new goals (they improve the understanding of the effects of actions).
Problem generator: Proposes the execution of actions to satisfy the goals of the learning element. These do not have to be the best actions (from the performance element's point of view), but they should be informative and deliver new knowledge about the world.

Example: driving a taxi.

Question: What does learning (the design of the learning element) really depend on?
1. Which components of the performance element should be improved?
2. How are these components represented? (Page 213 (decision trees), page 237 (ensemble learning), page 246 (domains formalised in PL1).)
3. Which feedback?
   Supervised learning: The execution of an incorrect action leads to the right solution as feedback (e.g. how intensively should the brakes be used?). Think of a driving instructor.
   Reinforcement learning: Only the result is perceived. The critic tells whether it was good or bad, but not what would have been right.
   Unsupervised learning: No hints about the right actions.
4. Which a-priori information is there? (Often there is useful background knowledge.)

Components of the performance element:
1 Mappings from the conditions of the current state into the set of actions
2 Deriving relevant properties of the world from the percepts
3 Information about the development of the world
4 Information about the sequence of actions
5 Assessment function on the world states
6 Assessment function for the quality of single actions in one state
7 Description of state classes which maximise the utility function of an agent

Important: From a mathematical point of view, all these components are mappings. Learning means to represent these mappings.

Inductive learning: Given are pairs (x, y). Find f with f(x) = y.

Example 3.1 (Continue a series of numbers)
Which number is next?
3, 5, 7, ?
3, 5, 17, 257, ?

(Figure: four panels (a)-(d) showing different functions that fit the same data points; inductive learning must choose among them.)

Simple reflex agent:

global examples ← {}

function REFLEX-PERFORMANCE-ELEMENT(percept) returns an action
   if (percept, a) in examples then return a
   else
      h ← INDUCE(examples)
      return h(percept)

procedure REFLEX-LEARNING-ELEMENT(percept, action)
   inputs: percept, feedback percept
           action, feedback action
   examples ← examples ∪ {(percept, action)}

Example 3.2 (Wason's Test; Verify and Falsify)
Consider a set of cards. Each card has a letter printed on one side and a number on the other. Having taken a look at some of these cards you formulate the following hypothesis:
If there is a vowel on one side then there is an even number on the other.
Now there are the following cards on the table: A, T, 4, 7. You are allowed to turn around a maximum of two cards to check the hypothesis. Which card(s) do you flip?

3. Supervised Learning 2. Decision trees

3.2 Decision trees

Decision trees represent boolean functions.

Small example: You plan to go out for dinner and arrive at a restaurant. Should you wait for a free table or should you move on?

(Figure: a decision tree for this problem. The root tests Patrons? (None → No, Some → Yes, Full → continue); below it, WaitEstimate? (>60 → No), Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining? are tested.)

decision tree = conjunction of implications (implication = path leading to a leaf). For all restaurants r:

(Patrons(r, Full) ∧ Wait_Estimate(r, 10-30) ∧ Hungry(r)) → Will_Wait(r)

Attention: This is written in first-order logic, but a decision tree talks only about a single object (r above). So it is really propositional logic:

Patrons_Full ∧ Wait_Estimate_10-30 ∧ Hungry → Will_Wait

Question: Can every boolean function be represented by a decision tree?
Answer: Yes! Each row of the table describing the function belongs to one path in the tree.

Attention: Decision trees can be much smaller! But there are boolean functions which can only be represented by trees of exponential size, e.g. the parity function:

par(x_1, ..., x_n) := 1 if Σ_{i=1}^n x_i is even, and 0 otherwise.

A variant of decision trees is the...

Example 3.3 (Decision List)
All attributes are boolean. A decision list is

   if Test_1 then Answer_1
   else if Test_2 then Answer_2
   ...
   else if Test_e then Answer_e
   else Answer_{e+1}

with Answer_i ∈ {Yes, No} and Test_i a conjunction of (possibly negated) attributes (Exercise: compare decision trees and decision lists).

k-DL(n) is the set of boolean functions of n attributes which can be represented by decision lists with at most k checks in each test.

Example (a 2-DL for the restaurant domain):
   Patrons(x, Some) → Yes;
   else Patrons(x, Full) ∧ Fri/Sat(x) → Yes;
   else No.

Obviously: n-DL(n) = set of all boolean functions of n attributes, so card(n-DL(n)) = 2^(2^n).

Question: How should decision trees be learned?

Table of examples:

Example | Alt Bar Fri Hun Pat   Price Rain Res Type    Est   | WillWait
X1      | Yes No  No  Yes Some  $$$   No   Yes French  0-10  | Yes
X2      | Yes No  No  Yes Full  $     No   No  Thai    30-60 | No
X3      | No  Yes No  No  Some  $     No   No  Burger  0-10  | Yes
X4      | Yes No  Yes Yes Full  $     No   No  Thai    10-30 | Yes
X5      | Yes No  Yes No  Full  $$$   No   Yes French  >60   | No
X6      | No  Yes No  Yes Some  $$    Yes  Yes Italian 0-10  | Yes
X7      | No  Yes No  No  None  $     Yes  No  Burger  0-10  | No
X8      | No  No  No  Yes Some  $$    Yes  Yes Thai    0-10  | Yes
X9      | No  Yes Yes No  Full  $     Yes  No  Burger  >60   | No
X10     | Yes Yes Yes Yes Full  $$$   No   Yes Italian 10-30 | No
X11     | No  No  No  No  None  $     No   No  Thai    0-10  | No
X12     | Yes Yes Yes Yes Full  $     No   No  Burger  30-60 | Yes

The set of examples to be learned from is called the training set. Examples can be classified positively (the goal attribute holds) or negatively (the goal attribute does not hold).

Trivial solution of learning: the paths in the tree are exactly the examples. Disadvantage: new cases cannot be handled.
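A decision list can be evaluated directly; a small Python sketch of the 2-DL from Example 3.3 (the attribute encoding is ours):

```python
def eval_decision_list(dlist, default, example):
    """Walk the tests in order; each test is a conjunction of
    (attribute, required_value) pairs. The first passing test wins."""
    for test, answer in dlist:
        if all(example.get(attr) == want for attr, want in test):
            return answer
    return default

# The 2-DL from the slides:
#   Patrons(x, Some) -> Yes; Patrons(x, Full) and Fri/Sat(x) -> Yes; else No
dl = [
    ([("Patrons", "Some")], "Yes"),
    ([("Patrons", "Full"), ("Fri/Sat", True)], "Yes"),
]
```

Since each test here checks at most two attributes, this list indeed lies in 2-DL(n).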

Idea: Choose the simplest tree (or rather the most general one) which is compatible with all examples.

Ockham's razor: Entia non sunt multiplicanda praeter necessitatem. ("Entities must not be multiplied beyond necessity.")

Example 3.4 (Guess!)
A computer program encodes triples of numbers with respect to a certain rule. Find out that rule. You enter triples (x_1, x_2, x_3) of your choice (x_i ∈ N) and get as answers yes or no.
Simplification: At the beginning the program tells you that these triples are in the set: (4,6,8), (6,8,12), (20,22,40).
Your task: Make more enquiries (approx. 10) and try to find out the rule.

The idea behind the learning algorithm
Goal: a tree which is as small as possible. First test the most important attributes (in order to get a quick classification). This will be formalised later, using information theory. Then proceed recursively, i.e. with decreasing sets of examples and attributes.

We distinguish the following cases:
1 There are positive and negative examples. Choose the best attribute.
2 There are only positive or only negative examples. Done: a solution has been found.
3 There are no more examples. Then a default value has to be chosen, e.g. the majority value of the parent node's examples.
4 There are positive and negative examples, but no more attributes. Then the given set of attributes does not suffice; a decision cannot be made. Not enough information is given.

function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree
   inputs: examples, set of examples
           attributes, set of attributes
           default, default value for the goal predicate
   if examples is empty then return default
   else if all examples have the same classification then return the classification
   else if attributes is empty then return MAJORITY-VALUE(examples)
   else
      best ← CHOOSE-ATTRIBUTE(attributes, examples)
      tree ← a new decision tree with root test best
      for each value v_i of best do
         examples_i ← {elements of examples with best = v_i}
         subtree ← DECISION-TREE-LEARNING(examples_i, attributes − best, MAJORITY-VALUE(examples))
         add a branch to tree with label v_i and subtree subtree
      end
      return tree

(Figure: the tree learned from the 12 examples: Patrons? (None → No, Some → Yes, Full → Hungry?); Hungry? (No → No, Yes → Type?); Type? (French → Yes, Italian → No, Burger → Yes, Thai → Fri/Sat?); Fri/Sat? (No → No, Yes → Yes).)

The algorithm computes a tree which is as small as possible and consistent with the given examples.

Question: How good is the generated tree? How different is it from the actual tree? Is there an a-priori estimate? (→ PAC learning)

Empiric approach:
1 Choose a set of examples M_Ex.
2 Divide it into two disjoint sets: M_Ex = M_Trai ∪ M_Test.
3 Apply the learning algorithm to M_Trai and get a hypothesis H.
4 Calculate the proportion of correctly classified elements of M_Test.
5 Repeat for many splits M_Trai, M_Test with randomly generated M_Trai.

Attention: peeking!
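The DECISION-TREE-LEARNING pseudocode above, rendered as Python (a sketch: CHOOSE-ATTRIBUTE is passed in as a parameter; the placeholder here just takes the first attribute, whereas the information-gain choice is formalised later in the chapter; the tiny dataset is ours):

```python
from collections import Counter

def majority_value(examples):
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, default, choose, values):
    if not examples:
        return default
    classes = {e["class"] for e in examples}
    if len(classes) == 1:                      # all examples agree
        return classes.pop()
    if not attributes:                         # attributes exhausted
        return majority_value(examples)
    best = choose(attributes, examples)
    tree = {best: {}}
    rest = [a for a in attributes if a != best]
    for v in values[best]:
        subset = [e for e in examples if e[best] == v]
        tree[best][v] = decision_tree_learning(
            subset, rest, majority_value(examples), choose, values)
    return tree

def classify(tree, example):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

examples = [
    {"Pat": "Some", "Hun": True,  "class": "Yes"},
    {"Pat": "Full", "Hun": True,  "class": "Yes"},
    {"Pat": "Full", "Hun": False, "class": "No"},
    {"Pat": "None", "Hun": False, "class": "No"},
]
values = {"Pat": ["None", "Some", "Full"], "Hun": [True, False]}
tree = decision_tree_learning(examples, ["Pat", "Hun"], "No",
                              lambda attrs, exs: attrs[0], values)
```

The four cases of the recursion map one-to-one onto the cases distinguished on the previous slide.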

(Figure: learning curve for the restaurant data: the proportion correct on the test set grows with the training-set size.)

Information theory
Question: How do we choose the best attribute? The best attribute is the one that delivers the highest amount of information. Example: flipping a coin.

Definition 3.5 (1 bit, information)
1 bit is the information contained in the outcome of flipping a (fair) coin.
More generally: assume there is an experiment with n possible outcomes v_1, ..., v_n, where outcome v_i occurs with probability P(v_i). The information encoded in this result (the outcome of the experiment) is defined as

I(P(v_1), ..., P(v_n)) := − Σ_{i=1}^n P(v_i) · log_2 P(v_i).

Assume the coin is manipulated: with a probability of 90% heads will come out. Then I(0.1, 0.9) ≈ 0.47, much less than one bit.
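Definition 3.5 can be computed directly; the sketch below also includes the attribute gain of Definition 3.6 (stated on the next slides), checked here on the Patrons attribute of the restaurant examples (the dict encoding is ours):

```python
import math
from collections import defaultdict

def information(probs):
    """I(P(v1), ..., P(vn)) = -sum_i P(vi) * log2 P(vi), with 0*log2(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gain(examples, attribute):
    """Information gained by testing `attribute` (Definition 3.6)."""
    p = sum(1 for e in examples if e["WillWait"])
    n = len(examples) - p
    split = defaultdict(lambda: [0, 0])          # attribute value -> [p_i, n_i]
    for e in examples:
        split[e[attribute]][0 if e["WillWait"] else 1] += 1
    missing_inf = sum(
        (pi + ni) / (p + n) * information([pi / (pi + ni), ni / (pi + ni)])
        for pi, ni in split.values())
    return information([p / (p + n), n / (p + n)]) - missing_inf

# Patrons values and WillWait labels of the examples X1..X12:
data = [{"Patrons": v, "WillWait": w} for v, w in [
    ("Some", True), ("Full", False), ("Some", True), ("Full", True),
    ("Full", False), ("Some", True), ("None", False), ("Some", True),
    ("Full", False), ("Full", False), ("None", False), ("Full", True)]]
```

The fair coin gives exactly one bit, the manipulated 90/10 coin about 0.47 bits, and Gain(Patrons) ≈ 0.541 bits, which is why Patrons is chosen as the first test.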

Question: For each attribute A: if this attribute is evaluated with respect to the current training set, how much information is gained this way? The best attribute is the one with the highest gain of information!

Definition 3.6 (Gain of Information)
We gain the following information by testing the attribute A:

Gain(A) = I(p/(p+n), n/(p+n)) − Missing_Inf(A)

with

Missing_Inf(A) = Σ_{i=1}^ν (p_i + n_i)/(p + n) · I(p_i/(p_i + n_i), n_i/(p_i + n_i)),

where p and n are the numbers of positive and negative examples, and p_i, n_i count those among the examples on which A takes its i-th value.

CHOOSE-ATTRIBUTE chooses the A with maximal Gain(A).

(Figure: splitting the 12 restaurant examples (a) on Patrons? and (b) on Type?; Patrons? separates positive from negative examples much better.)

The figure implies that Gain(Patrons) is large. Exercise: calculate Gain(Type), Gain(Hungry) (with Hungry as the first attribute), and Gain(Hungry) (with predecessor Patrons).

3. Supervised Learning 3. Ensemble Learning

3.3 Ensemble Learning

So far: a single hypothesis is used to make predictions.
Idea: Let's take a whole bunch of them (an ensemble).
Motivation: Among several hypotheses, use majority voting. The misclassified examples get higher weights!

(Figure: three simple hypotheses. None of them is perfect, but together they define a new hypothesis, the majority vote, which is not constructible by the original method.)

(Figure: boosting schematically: hypotheses h_1, ..., h_4 are learned in sequence, each on a reweighted training set, and combined into the final hypothesis h.)

Weighted training set: each example gets a weight w_j ≥ 0.
Initialisation: all weights are set to 1/n.
Boosting: misclassified examples get higher weights.
Iterate: we obtain new hypotheses h_i. After we have a certain number M of them, we feed them into the boosting algorithm: it creates a weighted ensemble hypothesis.

Theorem 3.7 (Effect of boosting)
Suppose the learning algorithm has the following weak-learning property: it always returns a hypothesis whose weighted error is slightly better than random guessing. Then ADABOOST will return a hypothesis that classifies the training data perfectly, for large enough M.

(Figure: proportion correct on the test set versus training-set size, for a single decision stump and for boosted decision stumps.)

(Figure: training and test accuracy versus the number of hypotheses M; the training error drops to zero while the test error continues to change with growing M.)

3. Supervised Learning 4. PL1 formalisations

3.4 PL1 formalisations

Goal: a more general framework.
Idea: to learn means to search in the hypotheses space (→ planning).
Goal predicate: Q(x), one-dimensional (hitherto: Will_Wait). We seek a definition of Q(x), i.e. a formula C(x) with

∀x (Q(x) ↔ C(x)).

Each example X_i represents a set of conditions under which Q(X_i) holds or does not hold. We look for an explanation: a formula C(x) which uses all predicates occurring in the examples.

∀r (Will_Wait(r) ↔ Patrons(r, Some)
      ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, French))
      ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, Thai) ∧ Fri_Sat(r))
      ∨ (Patrons(r, Full) ∧ Hungry(r) ∧ Type(r, Burger)))

Definition 3.8 (Hypothesis, Candidate Function)
A formula C_i(x) with ∀x (Q(x) ↔ C_i(x)) is called a candidate function. The whole formula is called a hypothesis:

H_i : ∀x (Q(x) ↔ C_i(x)).

Definition 3.9 (Hypotheses Space H)
The hypotheses space H of a learning algorithm is the set of those hypotheses the algorithm can create.
The extension of a hypothesis H with respect to the goal predicate Q is the set of examples for which H holds.
Attention: combining hypotheses with different extensions leads to inconsistency. Hypotheses with the same extension are logically equivalent.

The situation in general: we have a set of examples {X_1, ..., X_n}. We describe each example X through a clause and the declaration Q(X) (holds) or ¬Q(X) (does not hold). (See the restaurant training set above.)

X_1 is defined through Alt(X_1) ∧ ¬Bar(X_1) ∧ ¬Fri_Sat(X_1) ∧ Hungry(X_1) ∧ ... and Will_Wait(X_1).
Note that H is the set of hypotheses as defined in Definition 3.8. While it corresponds to decision trees, it is not the same.
The training set is the set of all those clauses.

We search for a hypothesis which is consistent with the training set.
Question: Under which conditions is a hypothesis H inconsistent with an example X?
false negative: the hypothesis says no (¬Q(X)), but Q(X) does hold.
false positive: the hypothesis says yes (Q(X)), but Q(X) does not hold.

Attention: inductive learning in logic-based domains means restricting the set of possible hypotheses with every example.

For first-order logic: H is in general infinite, so automatic theorem proving is much too general.

1st approach: We keep one hypothesis and modify it if the examples are inconsistent with it.

2nd approach: We keep the whole subspace that is still consistent with the examples (version space). This is effectively represented by two sets (analogous to the representation of a range of real numbers by [a,b]).

1st approach: current-best-hypothesis search.
Begin with a simple hypothesis H.
If a new example is consistent with H: okay.
If it is a false negative: generalise H.
If it is a false positive: specialise H.

(Figure: five snapshots (a)-(e) of the hypothesis extension being generalised and specialised as new examples arrive.)

This leads to an algorithm:

function CURRENT-BEST-LEARNING(examples) returns a hypothesis
  H <- any hypothesis consistent with the first example in examples
  for each remaining example e in examples do
    if e is false positive for H then
      H <- choose a specialisation of H consistent with examples
    else if e is false negative for H then
      H <- choose a generalisation of H consistent with examples
    if no consistent specialisation/generalisation can be found then fail
  end
  return H

Question: How to generalise/specialise?

  H_1 : ∀x (Q(x) ↔ C_1(x))
  H_2 : ∀x (Q(x) ↔ C_2(x))

H_1 generalises H_2 if ∀x (C_2(x) → C_1(x));
H_1 specialises H_2 if ∀x (C_1(x) → C_2(x)).

Generalisation means: leave out ∧-elements of a conjunction, add ∨-elements to a disjunction.
Specialisation means: add ∧-elements to a conjunction, leave out ∨-elements of a disjunction.
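The algorithm can be instantiated concretely. In the sketch below (our representation, not the lecture's), a hypothesis is a set of attribute=value conjuncts; dropping a conjunct generalises, adding one specialises, exactly as described above. For simplicity, repairs are searched over all conjunctions of observed attribute/value pairs, which is only feasible for tiny examples.

```python
from itertools import combinations

def predicts(hyp, x):
    """hyp: frozenset of (attribute, value) conjuncts; Q(x) iff all hold."""
    return all(x[a] == v for a, v in hyp)

def consistent(hyp, examples):
    return all(predicts(hyp, x) == label for x, label in examples)

def current_best_learning(examples, attributes):
    # candidate hypotheses: all conjunctions over observed attribute/value pairs
    pairs = sorted({(a, x[a]) for x, _ in examples for a in attributes})
    candidates = [frozenset(c) for r in range(len(pairs) + 1)
                  for c in combinations(pairs, r)]
    hyp = next(h for h in candidates if consistent(h, examples[:1]))
    seen = []
    for ex in examples:
        seen.append(ex)
        if not consistent(hyp, [ex]):
            _, label = ex
            if label:    # false negative: generalise by dropping conjuncts
                repairs = (h for h in candidates if h < hyp)
            else:        # false positive: specialise by adding conjuncts
                repairs = (h for h in candidates if h > hyp)
            hyp = next((h for h in repairs if consistent(h, seen)), None)
            if hyp is None:
                raise ValueError("no consistent specialisation/generalisation")
    return hyp

examples = [({"Pat": "Some"}, True), ({"Pat": "None"}, False)]
print(sorted(current_best_learning(examples, ["Pat"])))   # [('Pat', 'Some')]
```

Note that the backtracking-free repair step may fail even when a consistent hypothesis exists, which is exactly the weakness the version-space approach avoids.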

2nd approach: version space.

function VERSION-SPACE-LEARNING(examples) returns a version space
  local variables: V, the version space (a set of hypotheses)
  V <- the set of all hypotheses
  for each example e in examples do
    if V is not empty then V <- VERSION-SPACE-UPDATE(V, e)
  end
  return V

function VERSION-SPACE-UPDATE(V, e) returns an updated version space
  V <- {h ∈ V : h is consistent with e}

Problem: H is a big disjunction H_1 ∨ ... ∨ H_n. How do we represent this?

Reminder: How is the set of real numbers between 0 and 1 represented? Through the range [0,1].

To solve our problem: there is a partial order on H (generalise/specialise). The borders are defined through:

G set: hypotheses consistent with all previous examples such that no more general hypothesis is consistent.
S set: hypotheses consistent with all previous examples such that no more specific hypothesis is consistent.

On considering new examples, the G- and S-sets must be appropriately modified in VERSION-SPACE-UPDATE.

(Figure: the version space between the boundary sets G_1, ..., G_m (more general) and S_1, ..., S_n (more specific); the region above G and the region below S are all inconsistent.)

Question: What can happen?

1. Only one hypothesis remains: hooray!
2. The space collapses: G = ∅ or S = ∅.
3. No examples are left, but several hypotheses remain, i.e. we end with a big disjunction.
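For a hypothesis space small enough to enumerate, VERSION-SPACE-UPDATE really is just a filter, and the G- and S-sets fall out of the subset order on conjunctions (fewer conjuncts = more general). The conjunctive hypothesis space below is an illustrative assumption, as in the earlier sketch.

```python
from itertools import combinations

def version_space_learning(examples, pairs):
    """Keep every conjunctive hypothesis consistent with all examples so far."""
    candidates = [frozenset(c) for r in range(len(pairs) + 1)
                  for c in combinations(pairs, r)]
    V = candidates
    for x, label in examples:
        # VERSION-SPACE-UPDATE: V <- {h in V : h is consistent with (x, label)}
        V = [h for h in V if all(x[a] == v for a, v in h) == label]
    return V

def boundary_sets(V):
    """Borders of the version space under the subset order on conjunctions."""
    G = [h for h in V if not any(h2 < h for h2 in V)]   # most general
    S = [h for h in V if not any(h2 > h for h2 in V)]   # most specific
    return G, S

pairs = [("Pat", "None"), ("Pat", "Some")]
examples = [({"Pat": "Some"}, True), ({"Pat": "None"}, False)]
V = version_space_learning(examples, pairs)
G, S = boundary_sets(V)
print(len(V))   # 1: the space has shrunk to a single hypothesis
```

Here outcome 1 of the list above occurs: G and S coincide on the single surviving hypothesis.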

3. Supervised Learning  5. PAC Learning

3.5 PAC Learning

Question: What is the distance between the hypothesis H calculated by the learning algorithm and the real function f?

Computational learning theory: PAC learning, for Probably Approximately Correct.

Idea: If a hypothesis is consistent with a big training set, then it cannot be that wrong.

Question: How are the training set and the test set related?

We assume: the elements of the training set and the test set are drawn from the set of all examples with the same probability.

Definition 3.10 (error(h))
Let h ∈ H be a hypothesis and f the target (i.e. to-be-learned) function. We are interested in the set

  Diff(f, h) := {x : h(x) ≠ f(x)}.

We denote by error(h) the probability of a randomly selected example being in Diff(f, h). For ε > 0, the hypothesis h is called ε-approximately correct if error(h) ≤ ε holds.
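Definition 3.10 can be illustrated numerically: error(h) is a probability under the example distribution, so it can be estimated by sampling. The uniform distribution over two boolean attributes, the target f and the hypothesis h below are all assumptions made for the illustration.

```python
import random

def estimated_error(h, f, sample, n_samples=10_000):
    """Monte-Carlo estimate of error(h) = P(x in Diff(f, h))."""
    return sum(h(x) != f(x) for x in (sample() for _ in range(n_samples))) / n_samples

random.seed(0)
sample = lambda: (random.random() < 0.5, random.random() < 0.5)  # uniform examples
f = lambda x: x[0] and x[1]      # target function
h = lambda x: x[0]               # hypothesis; disagrees exactly on (True, False)
print(estimated_error(h, f, sample))   # close to the true error 0.25
```

Since h and f disagree only on the example (True, False), which has probability 1/4 under this distribution, the estimate hovers around 0.25.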

Question: Let ε > 0 be given. How many examples must the training set contain to make sure that the hypothesis created by a learning algorithm is ε-approximately correct?

This question is wrongly stated!

Different sets of examples lead to different hypotheses and with them to different values of ε. It depends not only on how many but also on which examples are chosen. For this reason we reformulate our question more carefully.

Question (more carefully and precisely stated): Let ε > 0 and δ > 0 be given. How many examples must the training set contain to make sure that the hypothesis computed by a learning algorithm is ε-approximately correct with a probability of at least 1 − δ?

We want to abstract from specific learning algorithms and make a statement about all possible learning algorithms. So we assume that every learning algorithm computes a hypothesis that is consistent with all previous examples.

Definition 3.11 (Example Complexity)
Let δ > 0 and ε > 0 be given. The example complexity is the number m of examples an arbitrary learning algorithm needs so that the created hypothesis h is ε-approximately correct with probability at least 1 − δ.

Theorem 3.12 (Example Complexity)
The example complexity m depends on ε, δ and the hypothesis space H as follows:

  m ≥ (1/ε) (ln(1/δ) + ln |H|)

(Figure: the hypothesis space H; the bad hypotheses H_bad lie at distance more than ε from the target function f.)

Question: What does the last result mean?

Answer: The complexity depends on log |H|. What does this mean for boolean functions with n arguments?

Answer: We have |H| = 2^(2^n) and hence log2 |H| = 2^n. Thus one needs exponentially many examples, even if one is satisfied with ε-approximate correctness under a certain probability!

Proof (of the theorem): Let h_b ∈ H with error(h_b) > ε. What is the probability P_b of h_b being consistent with the chosen examples?

For one single example it is at most (1 − ε), because of the definition of error. Therefore:

  P_b ≤ (1 − ε)^m

What is the probability P that there is any hypothesis h_b with error(h_b) > ε that is consistent with m examples?

  P ≤ |H_bad| (1 − ε)^m ≤ |H| (1 − ε)^m
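The bound of Theorem 3.12 is easy to evaluate. The sketch below transcribes it directly (with a ceiling to obtain a whole number of examples) and plugs in the boolean-function case |H| = 2^(2^n); the concrete values of ε and δ are arbitrary illustrative choices.

```python
from math import ceil, log

def sample_complexity(eps, delta, h_size):
    """Smallest integer m with m >= (1/eps) * (ln(1/delta) + ln |H|)."""
    return ceil((1 / eps) * (log(1 / delta) + log(h_size)))

# All boolean functions of n arguments: |H| = 2^(2^n), so ln |H| grows as 2^n.
n = 5
print(sample_complexity(0.1, 0.05, 2 ** (2 ** n)))   # 252
```

Already for n = 5 the ln |H| term (32 ln 2 ≈ 22.2) dominates ln(1/δ) ≈ 3.0, and it doubles with every additional argument: the exponential blow-up announced above.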

Proof (continuation): We want P ≤ δ:

  |H| (1 − ε)^m ≤ δ

After some transformations:

  (1 − ε)^m ≤ δ / |H|
  m ln(1 − ε) ≤ ln δ − ln |H|
  m ≥ (1 / (−ln(1 − ε))) (ln(1/δ) + ln |H|)
  m ≥ (1/ε) (ln(1/δ) + ln |H|)

The last line suffices because ln(1 − ε) < −ε, i.e. −ln(1 − ε) > ε.

Essential result: Learning is never better than looking up in a table!

1st way out: We ask for a more specialised hypothesis instead of one that is just consistent (the complexity gets worse).

2nd way out: We give up on learning arbitrary boolean functions and concentrate on appropriate subclasses.

We consider decision lists.

Theorem 3.13 (Decision Lists can be learned)
Learning functions in k-DL(n) (decision lists whose tests are conjunctions of at most k literals) has a PAC complexity of

  m = (1/ε) (ln(1/δ) + O(n^k log2(n^k))).

Decision lists: Each algorithm which returns a consistent decision list for a set of examples can be turned into a PAC learning algorithm, which learns a k-DL(n) function after a maximum of m examples.
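A quick numerical sanity check of the chain of inequalities: with m chosen as in Theorem 3.12, the survival probability |H| (1 − ε)^m of any bad hypothesis is indeed below δ. The values of ε, δ and |H| are illustrative.

```python
from math import ceil, log

eps, delta, H = 0.1, 0.05, 10_000
m = ceil((1 / eps) * (log(1 / delta) + log(H)))   # m = 123 here
print(H * (1 - eps) ** m <= delta)                # True
```

In fact the slack from replacing −ln(1 − ε) by ε makes the actual bound |H| (1 − ε)^m ≈ 0.024 comfortably smaller than δ = 0.05.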

Important estimates (exercises):

  Conj(n, k) = Σ_{i=0}^{k} C(2n, i) = O(n^k),
  |k-DL(n)| ≤ 3^|Conj(n,k)| · |Conj(n,k)|!,
  |k-DL(n)| = 2^O(n^k log2(n^k)).

function DECISION-LIST-LEARNING(examples) returns a decision list, No, or failure
  if examples is empty then return the value No
  t <- a test that matches a nonempty subset examples_t of examples
       such that the members of examples_t are all positive or all negative
  if there is no such t then return failure
  if the examples in examples_t are positive then o <- Yes else o <- No
  return a decision list with initial test t and outcome o and remaining
         elements given by DECISION-LIST-LEARNING(examples − examples_t)

(Figure: learning curves, proportion correct on the test set against training set size, for decision trees and decision lists on the restaurant data.)

3.6 Noise and Overfitting
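DECISION-LIST-LEARNING can be sketched directly for the special case k = 1, where each test checks a single attribute/value pair. The list representation (triples of attribute, value, outcome, with a final default entry for the empty case) is our own illustrative encoding.

```python
def decision_list_learning(examples, pairs):
    """Greedy construction of a consistent decision list (k = 1 tests)."""
    if not examples:
        return [("default", None, False)]               # the value No
    for a, v in pairs:
        matched = [(x, lbl) for x, lbl in examples if x[a] == v]
        labels = {lbl for _, lbl in matched}
        if matched and len(labels) == 1:                # all positive or all negative
            outcome = labels.pop()
            rest = [(x, lbl) for x, lbl in examples if x[a] != v]
            return [(a, v, outcome)] + decision_list_learning(rest, pairs)
    raise ValueError("failure: no uniform nonempty test found")

def classify(dlist, x):
    for a, v, outcome in dlist:
        if a == "default" or x[a] == v:
            return outcome

examples = [({"Pat": "None"}, False), ({"Pat": "Some"}, True), ({"Pat": "Full"}, False)]
pairs = [("Pat", "None"), ("Pat", "Some"), ("Pat", "Full")]
dlist = decision_list_learning(examples, pairs)
print(classify(dlist, {"Pat": "Some"}))   # True
```

Each recursive call removes the matched examples, so the recursion terminates after at most |examples| steps.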

Noise:
- examples are inconsistent (Q(x) together with ¬Q(x)),
- no attributes are left to classify the remaining examples,
- this makes sense if the environment is stochastic.

Overfitting, dual to noise:
- the remaining examples can be classified using attributes which establish a pattern that is not really there (irrelevant attributes).

Example 3.14 (Tossing Dice)
Several coloured dice are tossed. Every toss is described via (day, month, time, colour). As long as there is no inconsistency, every toss is described by exactly one (totally overfitted) hypothesis.
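Example 3.14 can be simulated. In the sketch below (our toy setup), outcomes are pure coin flips, yet a hypothesis that simply memorises every (day, month, time, colour) description fits the training data perfectly while predicting at chance level on fresh tosses.

```python
import random

random.seed(1)
descriptions = [(d, m, t, c) for d in range(1, 8) for m in range(1, 13)
                for t in range(24) for c in "RGB"]          # 6048 distinct tosses
random.shuffle(descriptions)
train, test = descriptions[:300], descriptions[300:600]
train_data = [(x, random.random() < 0.5) for x in train]    # outcomes: coin flips
test_data = [(x, random.random() < 0.5) for x in test]

table = dict(train_data)    # the totally overfitted hypothesis: a lookup table
train_acc = sum(table[x] == y for x, y in train_data) / len(train_data)
test_acc = sum(table.get(x, False) == y for x, y in test_data) / len(test_data)
print(train_acc)                 # 1.0: a perfect fit to pure noise
print(0.35 < test_acc < 0.65)    # True: chance level on new tosses
```

The irrelevant attributes carry no information about the outcome, so the perfect training fit is exactly the spurious pattern the slide warns about.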

Other examples: the pyramids, astrology, "Mein magisches Fahrrad" (my magic bicycle).

4. Knowledge Engineering (1)

Chapter 4. Knowledge Engineering (1)
4.1 Sentential Logic
4.2 Sudoku
4.3 Calculi for SL
4.4 Wumpus in SL
4.5 A Puzzle
4.6 P=NP Problem

4.1 Sentential Logic

Definition 4.1 (Sentential Logic L_SL, Language L ⊆ L_SL)
The language L_SL of propositional (or sentential) logic consists of
- ⊥ and ⊤: the constants falsum and verum,
- p, q, r, x_1, x_2, ..., x_n, ...: a countable set AT of SL-constants,
- ¬, ∧, ∨, →: the sentential connectives (¬ is a unary, all others are binary operators),
- (, ): the parentheses to help readability.

In most cases we consider only a finite set of SL-constants. They define a language L ⊆ L_SL. The set of L-formulae Fml_L is defined inductively.

Definition 4.2 (Semantics, Valuation, Model)
A valuation v for a language L ⊆ L_SL is a mapping from the set of SL-constants defined by L into the set {true, false} with v(⊥) = false, v(⊤) = true. Each valuation v can be uniquely extended to a function v̄ : Fml_L → {true, false} such that:

  v̄(¬ϕ)    = true, if v̄(ϕ) = false; false, otherwise.
  v̄(ϕ ∧ γ) = true, if v̄(ϕ) = true and v̄(γ) = true; false, otherwise.
  v̄(ϕ ∨ γ) = true, if v̄(ϕ) = true or v̄(γ) = true; false, otherwise.
  v̄(ϕ → γ) = true, if v̄(ϕ) = false or (v̄(ϕ) = true and v̄(γ) = true); false, otherwise.

Thus each valuation v uniquely defines a v̄. We call v̄ an L-structure. A structure determines for each formula whether it is true or false. If a formula φ is true in structure v, we also say v is a model of φ. From now on we speak of models, structures and valuations synonymously.

Semantics: The process of mapping a set of L-formulae into {true, false} is called semantics.

Definition 4.3 (Validity of a Formula, Tautology)
1. A formula ϕ ∈ Fml_L holds under the valuation v if v̄(ϕ) = true. We also write v̄ ⊨ ϕ or simply v ⊨ ϕ; v is a model of ϕ.
2. A theory is a set of formulae: T ⊆ Fml_L. v satisfies T if v̄(ϕ) = true for all ϕ ∈ T. We write v ⊨ T.
3. An L-formula ϕ is called an L-tautology if v ⊨ ϕ holds for all possible valuations v in L.

Definition 4.4 (Consequence Set Cn(T))
A formula ϕ follows from T if for all models v of T (i.e. v ⊨ T) also v ⊨ ϕ holds. We write T ⊨ ϕ. We call

  Cn_L(T) := {ϕ ∈ Fml_L : T ⊨ ϕ},

or simply Cn(T), the semantic consequence operator. From now on we suppress the language L, because it is obvious from context. Nevertheless it needs to be carefully defined.

Lemma 4.5 (Properties of Cn(T))
The semantic consequence operator has the following properties:
1. T-expansion: T ⊆ Cn(T),
2. Monotony: T ⊆ T' implies Cn(T) ⊆ Cn(T'),
3. Closure: Cn(Cn(T)) = Cn(T).

Lemma 4.6 (ϕ ∉ Cn(T))
ϕ ∉ Cn(T) if and only if there is a model v with v ⊨ T and v̄(ϕ) = false.

Definition 4.7 (MOD(T), Cn(U))
If T ⊆ Fml_L then we denote by MOD(T) the set of all L-structures A which are models of T:

  MOD(T) := {A : A ⊨ T}.

If U is a set of models, we consider all those sentences which are valid in all models of U. We call this set Cn(U):

  Cn(U) := {ϕ ∈ Fml_L : for all v ∈ U : v̄(ϕ) = true}.

MOD is obviously dual to Cn:

  Cn(MOD(T)) = Cn(T),   MOD(Cn(T)) = MOD(T).
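Definitions 4.2 to 4.4 are directly executable for finite languages. In the sketch below (our encoding, not the lecture's), formulae are nested tuples, valuations are dicts, and T ⊨ ϕ is checked by enumerating all valuations over the given SL-constants.

```python
from itertools import product

def holds(v, phi):
    """v-bar of Definition 4.2: recursive evaluation of a formula."""
    if isinstance(phi, str):
        return {"bot": False, "top": True}.get(phi, v[phi])
    op, *args = phi
    if op == "not":
        return not holds(v, args[0])
    if op == "and":
        return holds(v, args[0]) and holds(v, args[1])
    if op == "or":
        return holds(v, args[0]) or holds(v, args[1])
    if op == "imp":
        return (not holds(v, args[0])) or holds(v, args[1])
    raise ValueError(op)

def entails(T, phi, atoms):
    """T |= phi: every valuation satisfying T also satisfies phi."""
    return all(holds(v, phi)
               for bits in product([False, True], repeat=len(atoms))
               for v in [dict(zip(atoms, bits))]
               if all(holds(v, t) for t in T))

print(entails([("imp", "p", "q"), "p"], "q", ["p", "q"]))   # True
print(entails(["p"], ("and", "p", "q"), ["p", "q"]))        # False
```

Enumerating all 2^n valuations is of course exponential in the number of constants, which is precisely why the calculi of Section 4.3 matter.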

Definition 4.8 (Completeness of a Theory T)
T is called complete if for each formula ϕ ∈ Fml: T ⊨ ϕ or T ⊨ ¬ϕ holds.

Attention: Do not mix up this last condition with the property of a valuation (model) v: each model is complete in the above sense.

Definition 4.9 (Consistency of a Theory)
T is called consistent if there is a valuation (model) v with v̄(ϕ) = true for all ϕ ∈ T.

Lemma 4.10 (Ex Falso Quodlibet)
T is consistent if and only if Cn(T) ≠ Fml_L.

4.2 Sudoku

For some time now, Sudoku puzzles have been quite popular.

(Table 4.44: A simple Sudoku, S_1.)

Can they be solved with sentential logic?

Idea: Given a Sudoku puzzle S, construct a language L_Sudoku and a theory T_S ⊆ Fml_{L_Sudoku} such that

  MOD(T_S) = solutions of the puzzle S.

Solution: In fact, we construct a theory T_Sudoku and for each (partial) instance of a 9 × 9 puzzle S a particular theory T_S such that

  MOD(T_Sudoku ∪ T_S) = {S' : S' is a solution of S}.

We introduce the following language L_Sudoku:
1. eins_{i,j},   1 ≤ i, j ≤ 9,
2. zwei_{i,j},   1 ≤ i, j ≤ 9,
3. drei_{i,j},   1 ≤ i, j ≤ 9,
4. vier_{i,j},   1 ≤ i, j ≤ 9,
5. fuenf_{i,j},  1 ≤ i, j ≤ 9,
6. sechs_{i,j},  1 ≤ i, j ≤ 9,
7. sieben_{i,j}, 1 ≤ i, j ≤ 9,
8. acht_{i,j},   1 ≤ i, j ≤ 9,
9. neun_{i,j},   1 ≤ i, j ≤ 9.

(The constants are the German number words "one" to "nine", indexed by cell.) This completes the language, the syntax. How many symbols are these?

We distinguished between the puzzle S and a solution S' of it. What is a model (or valuation) in the sense of Definition 4.2?

We have to give our symbols a meaning: the semantics!

  eins_{i,j} means: cell (i,j) contains a 1,
  zwei_{i,j} means: cell (i,j) contains a 2,
  ...
  neun_{i,j} means: cell (i,j) contains a 9.

To be precise: given a 9 × 9 square that is completely filled out, we define our valuation v as follows (for all 1 ≤ i, j ≤ 9):

  v(eins_{i,j}) = true, if 1 is at position (i,j); false, otherwise.

(Table 4.45: How to construct a model S'.)

  v(zwei_{i,j}) = true, if 2 is at position (i,j); false, otherwise.
  v(drei_{i,j}) = true, if 3 is at position (i,j); false, otherwise.
  v(vier_{i,j}) = true, if 4 is at position (i,j); false, otherwise.
  ...
  v(neun_{i,j}) = true, if 9 is at position (i,j); false, otherwise.

Therefore any 9 × 9 square can be seen as a model or valuation with respect to the language L_Sudoku.

How does T_S look? It simply lists the given entries of the puzzle:

  T_S = { eins_{1,4}, eins_{5,8}, eins_{6,6},
          zwei_{2,2}, zwei_{4,8},
          drei_{6,8}, drei_{8,3}, drei_{9,4},
          vier_{1,7}, vier_{2,5}, vier_{3,1}, vier_{4,3}, vier_{8,2}, vier_{9,8},
          ...
          neun_{3,4}, neun_{5,2}, neun_{6,9} }.

How should the theory T_Sudoku look (such that models of T_Sudoku ∪ T_S correspond to solutions of the puzzle)?

First square: T_1
1. eins_{1,1} ∨ ... ∨ eins_{3,3}
2. zwei_{1,1} ∨ ... ∨ zwei_{3,3}
3. drei_{1,1} ∨ ... ∨ drei_{3,3}
4. vier_{1,1} ∨ ... ∨ vier_{3,3}
5. fuenf_{1,1} ∨ ... ∨ fuenf_{3,3}
6. sechs_{1,1} ∨ ... ∨ sechs_{3,3}
7. sieben_{1,1} ∨ ... ∨ sieben_{3,3}
8. acht_{1,1} ∨ ... ∨ acht_{3,3}
9. neun_{1,1} ∨ ... ∨ neun_{3,3}

The formulae on the last slide say that:
1. The number 1 must appear somewhere in the first square.
2. The number 2 must appear somewhere in the first square.
3. The number 3 must appear somewhere in the first square.
4. etc.

Does that mean that each number 1, ..., 9 occurs exactly once in the first square?

No! We also have to say that each cell contains at most one number: T_1'
1. ¬(eins_{i,j} ∧ zwei_{i,j}),  1 ≤ i, j ≤ 3,
2. ¬(eins_{i,j} ∧ drei_{i,j}),  1 ≤ i, j ≤ 3,
3. ¬(eins_{i,j} ∧ vier_{i,j}),  1 ≤ i, j ≤ 3,
4. etc.
5. ¬(zwei_{i,j} ∧ drei_{i,j}),  1 ≤ i, j ≤ 3,
6. ¬(zwei_{i,j} ∧ vier_{i,j}),  1 ≤ i, j ≤ 3,
7. ¬(zwei_{i,j} ∧ fuenf_{i,j}), 1 ≤ i, j ≤ 3,
8. etc.

(Together with the disjunctions above this forces each number to occur exactly once: nine numbers must each appear among nine cells, and no cell can hold two numbers.)

How many formulae are these?

Second square: T_2
1. eins_{1,4} ∨ ... ∨ eins_{3,6}
2. zwei_{1,4} ∨ ... ∨ zwei_{3,6}
...
9. neun_{1,4} ∨ ... ∨ neun_{3,6}

And all the other formulae from the previous slides, adapted to this case: T_2'

What is still missing:
- Squares: the same has to be done for all 9 squares.
- Rows: each row should contain exactly the numbers from 1 to 9 (no number twice).
- Columns: each column should contain exactly the numbers from 1 to 9 (no number twice).
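The counting question can be answered mechanically: for each of the 9 cells of a square there is one formula per unordered pair of distinct digits, i.e. C(9,2) = 36 formulae per cell. A small generator (the string encoding of formulae is our own) makes this concrete:

```python
from itertools import combinations

digits = ["eins", "zwei", "drei", "vier", "fuenf",
          "sechs", "sieben", "acht", "neun"]

def cell_uniqueness(cells):
    """One formula -(d_{i,j} & d'_{i,j}) per cell and per pair d < d'."""
    return [f"-({d1}_{i}_{j} & {d2}_{i}_{j})"
            for (i, j) in cells
            for d1, d2 in combinations(digits, 2)]

first_square = [(i, j) for i in range(1, 4) for j in range(1, 4)]
formulas = cell_uniqueness(first_square)
print(len(formulas))   # 9 cells * C(9,2) = 9 * 36 = 324
print(formulas[0])     # -(eins_1_1 & zwei_1_1)
```

So T_1' alone contains 324 formulae, and the full theory T_Sudoku repeats this pattern over all squares, rows and columns.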

First row: T_Row1
1. eins_{1,1} ∨ eins_{1,2} ∨ ... ∨ eins_{1,9}
2. zwei_{1,1} ∨ zwei_{1,2} ∨ ... ∨ zwei_{1,9}
3. drei_{1,1} ∨ drei_{1,2} ∨ ... ∨ drei_{1,9}
4. vier_{1,1} ∨ vier_{1,2} ∨ ... ∨ vier_{1,9}
5. fuenf_{1,1} ∨ fuenf_{1,2} ∨ ... ∨ fuenf_{1,9}
6. sechs_{1,1} ∨ sechs_{1,2} ∨ ... ∨ sechs_{1,9}
7. sieben_{1,1} ∨ sieben_{1,2} ∨ ... ∨ sieben_{1,9}
8. acht_{1,1} ∨ acht_{1,2} ∨ ... ∨ acht_{1,9}
9. neun_{1,1} ∨ neun_{1,2} ∨ ... ∨ neun_{1,9}

Analogously for all other rows, e.g. the ninth row: T_Row9
1. eins_{9,1} ∨ eins_{9,2} ∨ ... ∨ eins_{9,9}
2. zwei_{9,1} ∨ zwei_{9,2} ∨ ... ∨ zwei_{9,9}
...
9. neun_{9,1} ∨ neun_{9,2} ∨ ... ∨ neun_{9,9}

Is that sufficient? What if a row contains several 1's?

First column: T_Column1
1. eins_{1,1} ∨ eins_{2,1} ∨ ... ∨ eins_{9,1}
2. zwei_{1,1} ∨ zwei_{2,1} ∨ ... ∨ zwei_{9,1}
3. drei_{1,1} ∨ drei_{2,1} ∨ ... ∨ drei_{9,1}
...
9. neun_{1,1} ∨ neun_{2,1} ∨ ... ∨ neun_{9,1}

Analogously for all other columns. Is that sufficient? What if a column contains several 1's?

All put together:

  T_Sudoku = T_1 ∪ T_1' ∪ ... ∪ T_9 ∪ T_9' ∪ T_Row1 ∪ ... ∪ T_Row9 ∪ T_Column1 ∪ ... ∪ T_Column9

Here is a more difficult one.

(Table 4.46: A difficult Sudoku, S_difficult.)

The above formulation is strictly in propositional logic. Theorem provers, even if they consider only propositional theories, often use predicates, variables etc. smodels uses a predicate logic formulation, including variables. But as there are no function symbols, such an input can be seen as a compact representation: it allows a few rules to serve as a shorthand for thousands of rules over propositional constants.

For example, smodels uses the following constructs:
1. row(0..8) is a shorthand for row(0), row(1), ..., row(8).
2. val(1..9) is a shorthand for val(1), val(2), ..., val(9).
3. The constants 1, ..., 9 are treated as numbers (so there are operations available to add, subtract or divide them).

1. Statements in smodels are written as

     head :- body

   This corresponds to an implication body → head.
2. An important feature of smodels is that all atoms that do not occur in any head are automatically false. For example, the theory

     p(X, Y, 5) :- row(X), col(Y)

   means that the whole grid is filled with 5's and only with 5's: e.g. p(X,Y,1) is false for all X, Y, as is p(X,Y,2) etc.

More constructs in smodels:
1. 1 { p(X,Y,A) : val(A) } 1 :- row(X), col(Y)
   makes sure that each entry of the grid contains exactly one number (val()).
2. 1 { p(X,Y,A) : row(X) : col(Y) : eq(div(X,3), div(R,3)) : eq(div(Y,3), div(C,3)) } 1 :- val(A), row(R), col(C)
   ensures that each of the 9 squares contains each number from 1 to 9 exactly once.
3. More detailed information is on the web page of this lecture.

4.3 Calculi for SL

A general notion of a certain sort of calculi.

Definition 4.11 (Hilbert-Type Calculi)
A Hilbert-type calculus over a language L is a pair (Ax, Inf) where
- Ax is a subset of Fml_L, the set of well-formed formulae in L: these are called axioms,
- Inf is a set of pairs written in the form

    φ_1, φ_2, ..., φ_n
    ------------------
           ψ

  where φ_1, φ_2, ..., φ_n, ψ are L-formulae: these are called inference rules.

Intuitively, one can assume all axioms to be true formulae (tautologies) and then use the inference rules to derive ever more new formulae.

We now define a particular instance of our general notion.

Definition 4.12 (Calculus for Sentential Logic SL)
We define Hilbert_SL_L = (Ax_SL_L, {MP}), the Hilbert-type calculus, as follows. The underlying language is L ⊆ L_SL with the well-formed formulae Fml_L as defined in Definition 4.1. The axioms Ax_SL_L are the following formulae:
1. ⊥ → φ,  φ → ⊤,  ¬⊥,  ⊤,
2. (φ ∧ ψ) → φ,  (φ ∧ ψ) → ψ,
3. φ → (φ ∨ ψ),  ψ → (φ ∨ ψ),
4. ¬¬φ → φ,  (φ → ψ) → ((φ → ¬ψ) → ¬φ),
5. φ → (ψ → φ),  φ → (ψ → (φ ∧ ψ)).

Here φ, ψ stand for arbitrarily complex formulae (not just constants). They represent schemata, rather than formulae in the language.

Definition (continued)
The only inference rule in SL is modus ponens:

  MP : Fml × Fml → Fml : (ϕ, ϕ → ψ) ↦ ψ,

or in short:

  (MP)  ϕ,  ϕ → ψ
        ----------
            ψ

(ϕ, ψ are arbitrarily complex formulae).

Definition 4.13 (Proof)
A proof of a formula ϕ from a theory T ⊆ Fml_L is a sequence ϕ_1, ..., ϕ_n of formulae such that ϕ_n = ϕ and for all i with 1 ≤ i ≤ n one of the following conditions holds:
- ϕ_i is a substitution instance of an axiom,
- ϕ_i ∈ T,
- there are ϕ_l and ϕ_k = (ϕ_l → ϕ_i) with l, k < i. Then ϕ_i is the result of applying modus ponens to these predecessor formulae of ϕ_i.

We write T ⊢ ϕ (ϕ can be derived from T).

We show that (exercises):
1. ⊢ ¬⊥ and ⊢ ⊤.
2. A ⊢ A ∨ B and ¬¬A ⊢ A.
3. The rule

     (R)  A → ϕ,  ¬A → ψ
          --------------
              ϕ ∨ ψ

   can be derived.
4. Our version of sentential logic does not contain a connective ↔. We define φ ↔ ψ as a macro for (φ → ψ) ∧ (ψ → φ). Show the following: if ⊢ φ ↔ ψ, then ⊢ φ if and only if ⊢ ψ.

We have now introduced two important notions:
- Syntactic derivability ⊢: the notion that certain formulae can be derived from other formulae using a certain calculus.
- Semantic validity ⊨: the notion that certain formulae follow from other formulae based on the semantic notion of a model.

Definition 4.14 (Correctness, Completeness of a Calculus)
Given an arbitrary calculus (which defines a notion ⊢) and a semantics based on certain models (which defines a relation ⊨), we say:
- Correctness: the calculus is correct with respect to the semantics if the following holds: Φ ⊢ φ implies Φ ⊨ φ.
- Completeness: the calculus is complete with respect to the semantics if the following holds: Φ ⊨ φ implies Φ ⊢ φ.

Theorem 4.15 (Correctness and Completeness of Hilbert_SL_L)
A formula follows semantically from a theory T if and only if it can be derived:

  T ⊨ ϕ if and only if T ⊢ ϕ.

Theorem 4.16 (Compactness for Hilbert_SL_L)
A formula follows from a theory T if and only if it follows from a finite subset of T:

  Cn(T) = ∪ {Cn(T') : T' ⊆ T, T' finite}.

Although the axioms from above and modus ponens suffice, it is reasonable to consider more general systems. Therefore we introduce the following notion:

Definition 4.17 (Rule System MP + D)
Let D be a set of general inference rules, i.e. mappings which assign a formula ψ to a finite set of formulae ϕ_1, ϕ_2, ..., ϕ_n. We write

  ϕ_1, ϕ_2, ..., ϕ_n
  ------------------
         ψ

MP + D is the rule system which emerges from adding the rules in D to modus ponens. For W ⊆ Fml, let Cn_D(W) in the following be the set of all formulae ϕ which can be derived from W using the inference rules of MP + D.

Bear in mind that Cn(W) is defined semantically, but Cn_D(W) is defined syntactically (using the notion of proof). Both sets are equal, according to the completeness theorem, in the special case D = ∅.

Lemma 4.18 (Properties of Cn_D)
Let D be a set of general inference rules and W ⊆ Fml. Then:
1. Cn(W) ⊆ Cn_D(W).
2. Cn_D(Cn_D(W)) = Cn_D(W).
3. Cn_D(W) is the smallest set which is closed with respect to D and contains W.

Question: What is the difference between an inference rule "from ϕ deduce ψ" and the implication ϕ → ψ?

Assume we have a set T of formulae and we choose two constants p, q ∈ L. We can either consider
(1) T together with MP and the rule p/q, or
(2) T ∪ {p → q} together with MP.

Case 1: Cn_{p/q}(T).
Case 2: Cn(T ∪ {p → q}).

If T = {¬q}, then we have ¬p ∈ Cn(T ∪ {p → q}) in case (2), but not in case (1): the rule p/q only fires once p has been derived, whereas the implication also supports contraposition.

It is well known that any formula φ can be written as a conjunction of disjunctions

  ∧_{i=1}^{n} ∨_{j=1}^{m_i} φ_{i,j},

where the φ_{i,j} are just constants or negated constants. The n disjunctions ∨_{j=1}^{m_i} φ_{i,j} are called the clauses of φ.

Normal form: Instead of working on arbitrary formulae, it is sometimes easier to work on finite sets of clauses.

84 4. Knowledge Engineering (1), 3. Calculi for SL

A resolution calculus for SL. The resolution calculus is defined over the language L_res ⊆ L_SL, where the set of well-formed formulae Fml_res consists of all disjunctions of the form A ∨ ¬B ∨ C ∨ ... ∨ E, i.e. the disjuncts are only constants or their negations. No implications or conjunctions are allowed. These formulae are also called clauses. The empty disjunction □ is also a clause.

Set notation of clauses: A disjunction A ∨ ¬B ∨ C ∨ ... ∨ E is often written as a set {A, ¬B, C, ..., E}. Thus the set-theoretic union of such sets corresponds again to a clause: {A, B} ∪ {A, C} represents A ∨ B ∨ C. Note that the empty set ∅ is identified with □.

We define the following inference rule on Fml_res:

Definition 4.19 (SL resolution) Let C1, C2 be clauses (disjunctions). Deduce the clause C1 ∨ C2 from C1 ∨ A and C2 ∨ ¬A:
(Res) from C1 ∨ A and C2 ∨ ¬A infer C1 ∨ C2.
If C1 = C2 = ∅, then C1 ∨ C2 = □.

If we use the set notation for clauses, we can formulate the inference rule as follows:

Definition 4.20 (SL resolution, set notation) Deduce the clause C1 ∪ C2 from C1 ∪ {A} and C2 ∪ {¬A}:
(Res) from C1 ∪ {A} and C2 ∪ {¬A} infer C1 ∪ C2.
Again, we identify the empty set ∅ with □.

85 4. Knowledge Engineering (1), 3. Calculi for SL

Definition 4.21 (Resolution Calculus for SL) We define the resolution calculus Robinson_SL = ⟨∅, {Res}⟩ as follows. The underlying language is L_res ⊆ L_SL defined on Slide 333, together with the well-formed formulae Fml_res. Thus there are no axioms and only one inference rule. The well-formed formulae are just clauses.

Question: Is this calculus correct and complete?

Answer: It is correct, but it is not complete! But every problem of the kind T ⊨ φ is equivalent to "T ∪ {¬φ} is unsatisfiable", or rather to T ∪ {¬φ} ⊢ □ (where ⊢ stands for the calculus introduced above).

Theorem 4.22 (Completeness of Resolution Refutation) If M is an unsatisfiable set of clauses, then the empty clause □ can be derived in Robinson_SL. We also say that resolution is refutation complete.

4.4 Wumpus in SL

[Figure: a 4 x 4 wumpus world with stench, breeze, pit and gold squares; the agent starts at the square marked START.]
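Theorem 4.22 suggests a simple (exponential) decision procedure: saturate the clause set under (Res) and watch for the empty clause. A minimal sketch, with clauses encoded as frozensets of (name, sign) literals; this encoding and the function names are my own, not the lecture's:

```python
# Resolution refutation by saturation: T ⊨ φ is checked by deriving the
# empty clause from the clause form of T ∪ {¬φ}. frozenset() plays the
# role of the empty clause.

def resolvents(c1, c2):
    """All clauses derivable from c1, c2 by one application of (Res)."""
    out = set()
    for (name, sign) in c1:
        if (name, not sign) in c2:
            out.add((c1 - {(name, sign)}) | (c2 - {(name, not sign)}))
    return out

def refutable(clauses):
    """True iff the empty clause is derivable, i.e. the set is unsatisfiable."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                new |= resolvents(c1, c2)
        if frozenset() in new | clauses:
            return True
        if new <= clauses:          # fixpoint reached without deriving the box
            return False
        clauses |= new

# T = {p, p -> q} and φ = q: clause form of T ∪ {¬φ} is {p}, {¬p, q}, {¬q}.
T_and_not_phi = {frozenset({("p", True)}),
                 frozenset({("p", False), ("q", True)}),
                 frozenset({("q", False)})}
print(refutable(T_and_not_phi))  # True, hence T ⊨ q
```

The saturation loop terminates because only finitely many clauses exist over finitely many constants.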

86 4. Knowledge Engineering (1), 4. Wumpus in SL

[Figures (a), (b): the agent's growing map of the wumpus world, annotated with A = Agent, B = Breeze, G = Glitter/Gold, OK = safe square, P = Pit, S = Stench, V = Visited, W = Wumpus.]

Language definition: S_i,j: there is a stench in (i,j); B_i,j: there is a breeze in (i,j); Pit_i,j: (i,j) is a pit; Gl_i,j: (i,j) glitters; W_i,j: (i,j) contains the wumpus.

General knowledge:
¬S_1,1 → ¬W_1,1 ∧ ¬W_1,2 ∧ ¬W_2,1
¬S_2,1 → ¬W_1,1 ∧ ¬W_2,1 ∧ ¬W_2,2 ∧ ¬W_3,1
¬S_1,2 → ¬W_1,1 ∧ ¬W_1,2 ∧ ¬W_2,2 ∧ ¬W_1,3
S_1,2 → W_1,3 ∨ W_1,2 ∨ W_2,2 ∨ W_1,1

Knowledge after the 3rd move: ¬S_1,1 ∧ ¬S_2,1 ∧ S_1,2 ∧ ¬B_1,1 ∧ B_2,1 ∧ ¬B_1,2.

Question: Can we deduce that the wumpus is located at (1,3)?
Answer: Yes. Either via resolution or using our Hilbert calculus.
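The deduction "the wumpus is at (1,3)" can also be checked semantically by brute-force model checking: W_1,3 must hold in every model of the knowledge base. A sketch encoding only the wumpus-relevant part of the KB (variable names shortened for readability):

```python
# Model-checking sketch for the wumpus deduction: does the KB entail W13?
from itertools import product

VARS = ["W11", "W12", "W21", "W22", "W31", "W13"]

def kb(m):
    # percepts: no stench in (1,1) and (2,1), stench in (1,2)
    r1 = not m["W11"] and not m["W12"] and not m["W21"]                   # from ¬S11
    r2 = not m["W11"] and not m["W21"] and not m["W22"] and not m["W31"]  # from ¬S21
    r3 = m["W13"] or m["W12"] or m["W22"] or m["W11"]                     # from S12
    return r1 and r2 and r3

models = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=len(VARS))]
entailed = all(m["W13"] for m in models if kb(m))
print(entailed)  # True: the wumpus must be at (1,3)
```

Only one of the 64 assignments satisfies the KB, and in it W13 is true.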

87 4. Knowledge Engineering (1), 4. Wumpus in SL

Problem: We want more: given a certain situation we would like to determine the best action, i.e. to ask a query which gives us back such an action. This is impossible in SL: we can only check for each action whether it is good or not and then, by comparison, try to find the best action. But we can check for each action whether it should be done or not. Therefore we need additional axioms:

A_1,1 ∧ East ∧ W_2,1 → ¬Forward
A_1,1 ∧ East ∧ Pit_2,1 → ¬Forward
A_i,j ∧ Gl_i,j → TakeGold

[Figure: the full wumpus world with stench, breeze, pit and gold squares; the agent starts at START.]

Disadvantages:
- actions can only be guessed,
- the database must be changed continuously,
- the set of rules becomes very big because there are no variables.

4.5 A Puzzle

Using an appropriate formalisation (additional axioms) we can check whether KB ⊢ action or KB ⊢ ¬action. But it can happen that neither one nor the other is deducible.

88 4. Knowledge Engineering (1), 5. A Puzzle

We now want to formalize a "Logelei" (a logic puzzle) and solve it with a theorem prover.

Logelei from Die Zeit (1): Alfred is the new correspondent in Wongowongo. He is supposed to report on the presidential elections, but does not yet know anything about the two candidates, so he goes out among the people to gather information. He interviews a group of passers-by, three of whom are supporters of the Either-Or Party and three of whom are supporters of the Consequents.

Logelei from Die Zeit (2): On his notepad he jots down the answers in keywords.
A: "Surname Songo: from the city of Rongo",
B: "Either-Or Party: the older one",
C: "First name Dongo: behind in the polls",
A: "Consequents: first name Mongo",
B: "Bongo tribe: surname Gongo",
C: "First name Dongo: the younger one",
D: "Bongo tribe: ahead in the polls",
E: "First name Mongo: behind in the polls",
F: "Consequents: Nongo tribe",
D: "City of Longo: the younger one",
E: "Nongo tribe: the younger one",
F: "Consequents: surname Gongo".

Logelei from Die Zeit (3): Now Alfred ponders. He knows that the supporters of the Either-Or Party (A, B and C) always make one true and one false statement, while the supporters of the Consequents (D, E and F) make either only true statements or only false statements. What information does Alfred have about the two candidates? (By Zweistein)

Towards a solution:
- Selection of the language (propositional logic, predicate logic, ...).
- Analysis and formalization of the problem.
- Transformation to the input format of a prover.
- Output of a solution, i.e. a model.

89 4. Knowledge Engineering (1), 5. A Puzzle

Definition of the constants (for each candidate x ∈ {a, b}):
Sur_x,Songo: x's surname is Songo. Sur_x,Gongo: x's surname is Gongo.
First_x,Dongo: x's first name is Dongo. First_x,Mongo: x's first name is Mongo.
Tribe_x,Bongo: x belongs to the Bongos. Tribe_x,Nongo: x belongs to the Nongos.
City_x,Rongo: x comes from Rongo. City_x,Longo: x comes from Longo.
A_x: x is the senior candidate. J_x: x is the junior candidate.
H_x: x's poll is worse. V_x: x's poll is better.
So we have 24 constants in total.

The correspondent Alfred noted 12 statements about the candidates (each interviewee gave 2 statements φ, φ′), which we enumerate as follows: φ_A, φ′_A, φ_B, φ′_B, ..., φ_F, φ′_F. All necessary symbols are now defined, and we can formalize the given statements.

Formalization of the statements:
φ_A: (Sur_a,Songo → City_a,Rongo) ∧ (Sur_b,Songo → City_b,Rongo)
φ′_A: First_b,Mongo
φ_B: A_a
φ′_B: (Tribe_a,Bongo → Sur_a,Gongo) ∧ (Tribe_b,Bongo → Sur_b,Gongo)

Furthermore, explicit conditions between the statements are given, e.g. (φ_A ∧ ¬φ′_A) ∨ (¬φ_A ∧ φ′_A) for the Either-Or supporters and (φ_D ∧ φ′_D) ∨ (¬φ_D ∧ ¬φ′_D) for the Consequents. Analogously for the other statements.

Is this enough information to solve the puzzle? E.g., can the following formula be satisfied? Sur_a,Songo ∧ Sur_a,Gongo
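The two kinds of explicit conditions are just "exactly one statement true" and "both or neither". As a quick sanity check, a sketch with invented helper names:

```python
# The meta-constraints on each interviewee's pair of statements, as boolean
# conditions: Either-Or supporters make exactly one true statement,
# Consequents make both true or both false.

def either_or(phi, phi2):
    return (phi and not phi2) or (not phi and phi2)    # exactly one true

def consequent(phi, phi2):
    return (phi and phi2) or (not phi and not phi2)    # all true or all false

print(either_or(True, False), consequent(False, False))  # True True
```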

90 4. Knowledge Engineering (1), 5. A Puzzle

We also need implicit conditions (axioms) which are required to solve this problem. It is necessary to state that each candidate has only one name, comes from one city, etc. We need the following background knowledge:
¬(Sur_x,Songo ∧ Sur_x,Gongo)
¬(First_x,Dongo ∧ First_x,Mongo)
...
¬(H_x ∧ V_x)

Can we abstain from these axioms by changing our representation of the puzzle?

What is still missing? Can we prove that when a's poll is worse, then b's poll is better? We need to state the relationships between these attributes (for x ≠ y):
H_x ↔ V_y
A_x ↔ J_y

Finally, we have modeled all sensible information. Does this yield a unique model? No! There are 6 models in total, but this is all right. It just means there is no unique solution.

What if a unique model is desirable? Often, there are additional assumptions hidden between the lines. Think, for example, of deductions by Sherlock Holmes (or Miss Marple, Spock etc.).

For example, it might be sensible to assume that both candidates come from different cities (for x ≠ y): City_x,Rongo ↔ City_y,Longo. Indeed, with this additional axiom there is a unique model. But be careful: this additional information may not be justified by the nature of the task!
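Claims like "there are 6 models in total" can be verified mechanically by enumerating assignments. The following sketch counts models of a much-reduced toy fragment of the puzzle; the variables and constraints are simplified stand-ins, so the count 4 refers only to this fragment, not to the full puzzle:

```python
# Counting models of a small propositional theory by brute force.
from itertools import product

VARS = ["SurA_Songo", "SurA_Gongo", "H_a", "V_a"]

def theory(m):
    one_name = not (m["SurA_Songo"] and m["SurA_Gongo"])   # at most one surname
    some_name = m["SurA_Songo"] or m["SurA_Gongo"]         # at least one surname
    polls = m["H_a"] != m["V_a"]                           # worse xor better
    return one_name and some_name and polls

models = [m for m in (dict(zip(VARS, v)) for v in product([0, 1], repeat=4))
          if theory(m)]
print(len(models))  # 4
```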

91 4. Knowledge Engineering (1), 6. P=NP Problem

4.6 P=NP Problem

Observation: Polynomial algorithms are fast, exponential ones are slow! To show the unsatisfiability of an SL formula using the truth-table method, we need to test 2^n assignments in the worst case (n attributes).

Question: Can this problem be solved in polynomial time with other methods?

We consider yes-no problems. For each instance I of a problem we can ask whether I has the respective property or not: I is a yes- or a no-instance.

Definition 4.23 (The class P) A problem is in the class P (problems which can be deterministically recognized in polynomial time) if there is an algorithm Alg and a polynomial p(n) such that for each instance I the following holds: I is a yes-instance iff Alg replies yes to the input I after at most p(length(I)) steps.

Definition 4.24 (The class NP) A problem is in the class NP (problems which can be non-deterministically recognized in polynomial time) if there is an algorithm Check and a polynomial p(n) such that for each instance I the following holds: I is a yes-instance iff there is an instance I′ of another problem (a certificate) with (1) length(I′) < p(length(I)), and (2) Check replies yes to the input (I, I′) after at most p(length(I)) steps.
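The 2^n truth-table method itself is easy to state as code; a brute-force sketch (formulae are represented as Python predicates over an assignment, an encoding chosen just for this example):

```python
# The 2^n truth-table check for satisfiability.
from itertools import product

def satisfiable(formula, varnames):
    """formula: a function from an assignment dict to bool."""
    return any(formula(dict(zip(varnames, bits)))
               for bits in product([False, True], repeat=len(varnames)))

# (p ∨ q) ∧ ¬p ∧ ¬q is unsatisfiable:
f = lambda m: (m["p"] or m["q"]) and not m["p"] and not m["q"]
print(satisfiable(f, ["p", "q"]))  # False
```

The loop visits up to 2^n assignments, which is exactly the exponential worst case the slide refers to.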

92 4. Knowledge Engineering (1), 6. P=NP Problem

Question: Which are the most difficult problems in NP?

Obvious: The satisfiability of SL formulae is in NP, and P is a subset of NP. Problems which can be polynomially reduced onto each other are obviously equivalent.

Definition 4.25 A problem Prob_1 is polynomially transformable into a problem Prob_2 if for each instance I_1 of Prob_1 one can calculate an instance I_2 of Prob_2 in polynomial time so that the following holds: I_1 is a yes-instance of Prob_1 if and only if I_2 is a yes-instance of Prob_2.

Definition 4.26 (NP-completeness) A problem is called NP-hard if all problems in NP can be reduced onto it in polynomial time. A problem is called NP-complete if it is NP-hard and it belongs to NP.

These problems are NP-complete: satisfiability, Hamiltonian circuit, maximum clique, 3-colorability, subset sum. (Nonprimeness, i.e. compositeness, is in NP but is not known to be NP-complete; in fact, primality testing is even in P.)
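The certificate view of Definition 4.24 can be illustrated with subset sum: the certificate I′ is the chosen sublist, and Check runs in polynomial time even though finding the sublist may not. A sketch, treating the instance as a plain list without multiset bookkeeping:

```python
# NP membership in the certificate view: a yes-instance I has a short
# witness I' that a polynomial-time Check accepts. Sketch for SUBSET SUM.

def check(instance, certificate):
    numbers, target = instance
    # polynomial-time verification: certificate lies in the instance and
    # sums to the target
    return all(c in numbers for c in certificate) and sum(certificate) == target

print(check(([3, 9, 8, 4], 12), [8, 4]))  # True
```

Verifying a given certificate is cheap; the hard part, captured by NP-completeness, is that no known polynomial algorithm finds one.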

93 5. Learning in networks Chapter 5. Learning in networks Learning in networks 5.1 The human brain 5.2 Neural networks 5.3 The Perceptron 5.4 Multi-layer feed-forward 5.5 Kernel Machines 5.6 Applications 5.7 Genetic Algorithms 5. Learning in networks Learning with networks is a method to build complex functions from many very simple but connected units and to learn this construction from examples, improve the understanding of the functionality of the human brain. Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 1. The human brain 5.1 The human brain 5. Learning in networks 1. The human brain Axonal arborization Axon from another cell Synapse Dendrite Axon Nucleus Synapses Cell body or Soma Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718

94 5. Learning in networks 1. The human brain A neuron consists of the soma: the body of the cell, the nucleus: the core of the cell, the dendrites, the axon: 1 cm - 1 m in length. The axon branches and connects to the dendrites of other neurons: these locations are called synapses. Each neuron shares synapses with others. 5. Learning in networks 1. The human brain Signals are propagated from neuron to neuron by a complicated electrochemical reaction. Chemical transmitter substances are released from the synapses and enter the dendrite, raising or lowering the electrical potential of the cell body. When the potential reaches a threshold an electrical pulse or action potential is sent along the axon. The pulse spreads out along the branches of the axons and releases transmitters into the bodies of other cells. Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 1. The human brain Question: How does the building process of the network of neurons look like? Question Long term changes in the strength of the connections are in response to the pattern of stimulation. 5. Learning in networks 1. The human brain Biology versus electronics: Computer Human Brain Computational units 1 CPU, 10 5 gates neurons Storage units 10 9 bits RAM, bits disk neurons, synapses Cycle time 10 ;8 sec 10 ;3 sec Bandwidth 10 9 bits/sec bits/sec Neuron updates/sec Computer: sequential processes, very fast, rebooting quite often Brain: works profoundly concurrently, quite slow, error-correcting, fault-tolerant (neurons die constantly) Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718

95 5. Learning in networks 2. Neural networks 5.2 Neural networks 5. Learning in networks 2. Neural networks Definition 5.1 (Neural Network) A neural network consists of: 1 units, 2 links between units. The links are weighted. There are three kinds of units: 1 input units, 2 hidden units, 3 output units. Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 2. Neural networks 5. Learning in networks 2. Neural networks Idea: A unit i receives an input via links to other units j. The input function in i := W j,i a j j calculates the weighted sum. Notation a i a i g g 0 Err i Err e I i I I e in i N O O i O t T T T e W j,i W i W i W Meaning Activation value of unit i (also the output of the unit) Vector of activation values for the inputs to unit i Activation function Derivative of the activation function Error (difference between output and target) for unit i Error for example e Activation of a unit i in the input layer Vector of activations of all input units Vector of inputs for example e Weighted sum of inputs to unit i Total number of units in the network Activation of the single output unit of a perceptron Activation of a unit i in the output layer Vector of activations of all units in the output layer Threshold for a step function Target (desired) output for a perceptron Target vector when there are several output units Target vector for example e Weight on the link from unit j to unit i Weight from unit i to the output in a perceptron Vector of weights leading into unit i Vector of all weights in the network Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718

96 a j 5. Learning in networks 2. Neural networks Input Links Wj,i in i Σ a i = g(in i ) g a i Output Links 5. Learning in networks 2. Neural networks The activation function g calculates the output a i (from the inputs) which will be transferred to other units via output-links: a i := g(in i ) Examples: a i a i a i Input Function Activation Function Output t in i in i in i 1 (a) Step function (b) Sign function (c) Sigmoid function Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 2. Neural networks Standardisation: We consider step 0 instead of step t. If an unit i uses the activation function step t (x) then we bring in an additional input link 0 which adds a constant value of a 0 := 1. This value is weighted as W 0,i := t. Now we can use step 0 for the activation function: step t ( n j=1 W j,i a j ) = step 0 ( t + n j=1 W j,i a j ) = step 0 ( n j=0 W j,i a j ) 5. Learning in networks 2. Neural networks Networks can be recurrent, i.e. somehow connected, feed-forward, i.e. they form an acyclic graph. Usually networks are partitioned into layers: units in one layer have only links to units of the next layer. E.g. multi-layer feed-forward networks: without internal states (no short-term memory). I 1 w 13 w 14 H 3 w 35 O 5 I 2 w 23 w 24 H 4 w 45 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718
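The input function and the activation function combine into one line of code; a sketch of a single unit (function names invented for this example):

```python
# Sketch of a single unit: weighted-sum input function followed by the
# activation function g, i.e. a_i = g(in_i) with in_i = Σ_j W_j,i · a_j.
import math

def unit_output(weights, activations, g):
    in_i = sum(w * a for w, a in zip(weights, activations))
    return g(in_i)

sigmoid = lambda x: 1 / (1 + math.exp(-x))
step0 = lambda x: 1 if x >= 0 else 0
print(unit_output([0.5, -1.0], [1.0, 0.2], step0))  # 0.5 - 0.2 = 0.3, so 1
```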

97 5. Learning in networks 2. Neural networks 5. Learning in networks 2. Neural networks Important: The output of the input-units is determined by the environment. Question 1: Which function does the figure describe? Question 2: Why can non-trivial functions be represented at all? Question 3: How many units do we need? a few: a small number of functions can be represented, many: the network learns by heart ( overfitting) Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 2. Neural networks Hopfield networks: 1 bidirectional links with symmetrical weights, 2 activation function: sign, 3 units are input- and output-units, 4 can store up to 0.14N training examples. 5. Learning in networks 2. Neural networks Learning with neural networks means the adjustment of the parameters to ensure consistency with the training-data. Question: How to find the optimal network structure? Answer: Perform a search in the space of network structures (e.g. with genetic algorithms). Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718
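The standardisation trick can be tested numerically. Note the sign convention assumed here: the extra input a_0 = 1 carries the weight W_0 = -t, so that step_t over the n inputs equals step_0 over the n+1 inputs. A sketch, using weights that realize a two-input AND with t = 1.5:

```python
# The threshold trick: fold the threshold t into an extra weighted input.

def step(threshold, x):
    return 1 if x >= threshold else 0

def unit_with_threshold(weights, inputs, t):
    return step(t, sum(w * a for w, a in zip(weights, inputs)))

def unit_with_bias(weights, inputs, t):
    # a_0 = 1 with weight -t, then threshold 0
    return step(0, -t + sum(w * a for w, a in zip(weights, inputs)))

for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    assert unit_with_threshold([1, 1], inputs, 1.5) == unit_with_bias([1, 1], inputs, 1.5)
print("both formulations agree")
```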

98 5. Learning in networks 3. The Perceptron 5.3 The Perceptron 5. Learning in networks 3. The Perceptron Definition 5.2 (Perceptron) A perceptron is a feed-forward network with one layer based on the activation function step 0. I j W j,i O i I j W j O Input Units Output Units Input Units Output Unit Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Perceptron Network Single Perceptron Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 3. The Perceptron Question: Can every boolean function be represented by a feed-forward network? Can AND, OR and NOT be represented? Is it possible to represent every boolean function by simply combining these? What about 5. Learning in networks 3. The Perceptron Solution: Every boolean function can be composed using AND, OR and NOT (or even only NAND). The combination of the respective perceptrons is not a perceptron! f (x 1,...,x n ) := { 1, if n i=1 x i > n 2 0, else. Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718

99 5. Learning in networks 3. The Perceptron Perceptron with sigmoid activation Perceptron output x x 2 5. Learning in networks 3. The Perceptron Question: What about XOR? I I I I I 2 (a) I 1 and I 2 (b) I 1 or I 2 (c) I 1 xor I 2? I 2 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 3. The Perceptron Output = step 0 ( n j=1 W j I j ) n j=1 W ji j = 0 defines a n-dimensional hyperplane. Definition 5.3 (Linear Separable) A boolean function with n attributes is called linear separable if there is a hyperplane ((n 1)-dimensional subspace) which separates the positive domain-values from the negative ones. 5. Learning in networks 3. The Perceptron I 3 I 1 I 2 W = 1 W = 1 t = 1.5 W = 1 (a) Separating plane (b) Weights and threshold Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718
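That XOR cannot be computed by a single threshold unit can be made tangible with a brute-force scan over a small grid of weights and thresholds. A finite scan is only an illustration, not a proof, but it matches the geometric argument about separating hyperplanes:

```python
# Brute-force check that no single step-perceptron on this grid computes XOR.
import itertools

def perceptron(w1, w2, t):
    return lambda a, b: 1 if w1 * a + w2 * b >= t else 0

cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
grid = [x / 2 for x in range(-8, 9)]  # weights/thresholds in -4.0 ... 4.0
found = any(all(perceptron(w1, w2, t)(a, b) == y for (a, b), y in cases)
            for w1, w2, t in itertools.product(grid, repeat=3))
print(found)  # False: XOR is not linearly separable
```

The same scan immediately finds weights for AND and OR, which are linearly separable.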

100 5. Learning in networks 3. The Perceptron Learning algorithm: Similar to current best hypothesis ( chapter on learning). hypothesis: network with the current weights (firstly randomly generated) UPDATE: make it consistent through small changes. Important: for each example UPDATE is called several times. These calls are called epochs. 5. Learning in networks 3. The Perceptron function NEURAL-NETWORK-LEARNING(examples) returns network network a network with randomly assigned weights repeat for each e in examples do O NEURAL-NETWORK-OUTPUT(network, e) T the observed output values from e update the weights in network based on e, O, and T end until all examples correctly predicted or stopping criterion is reached return network Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS / Learning in networks 3. The Perceptron 5. Learning in networks 3. The Perceptron Definition 5.4 (Perceptron Learning (step 0 ) Perceptron learning modifies the weights W j with respect to this rule: W j := W j + α I j Error with Error:= T O (i.e. the difference between the correct and the current output-value). α is the learning rate. Definition 5.5 (Perceptron Learning (sigmoid) Perceptron learning modifies the weights W j with respect to this rule: W j := W j + α g (in)i j Error with Error:= T O (i.e. the difference between the correct and the current output-value). α is the learning rate. Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718 Prof. Dr. Jürgen Dix Department of Informatics, TU Clausthal Artificial Intelligence, SS /718
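The perceptron update rule W_j := W_j + α · I_j · (T − O) is short enough to run directly; a sketch that learns OR (which is linearly separable), with the threshold folded in as weight W_0. The constants (α = 0.25, 50 epochs) are arbitrary choices for this example:

```python
# Perceptron learning with the step_0 activation.

def train(examples, alpha=0.25, epochs=50):
    w = [0.0, 0.0, 0.0]                       # W_0 (bias), W_1, W_2
    for _ in range(epochs):
        for inputs, target in examples:
            x = [1.0] + list(inputs)          # x_0 = 1 carries the threshold
            o = 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0
            err = target - o
            w = [wj + alpha * xj * err for wj, xj in zip(w, x)]
    return w

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR
w = train(examples)
predict = lambda a, b: 1 if w[0] + w[1] * a + w[2] * b >= 0 else 0
print([predict(a, b) for (a, b), _ in examples])  # [0, 1, 1, 1]
```

Because OR is representable, convergence here is exactly what Rosenblatt's theorem guarantees; on XOR the same loop would cycle forever.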

101 5. Learning in networks, 3. The Perceptron

The following algorithm uses x_j[e] for I_j.

function PERCEPTRON-LEARNING(examples, network) returns a perceptron hypothesis
  inputs: examples, a set of examples, each with input x = x_1, ..., x_n and output y
          network, a perceptron with weights W_j (j = 0, ..., n) and activation function g
  repeat
    for each e in examples do
      in ← Σ_{j=0}^{n} W_j x_j[e]
      Err ← y[e] − g(in)
      W_j ← W_j + α × Err × g′(in) × x_j[e]
  until some stopping criterion is satisfied
  return NEURAL-NET-HYPOTHESIS(network)

Theorem 5.6 (Rosenblatt's Theorem) Every function which can be represented by a perceptron is learned through the perceptron learning algorithm (Definition 5.4). More exactly: the series W_j converges to a function which represents the examples correctly.

Proof: Let ŵ be a solution; wlog we can assume (why? exercise) ŵ · I > 0 for all I ∈ I, where I := I_pos ∪ (−I_neg), with I_pos consisting of the positive and I_neg of the negative examples (and −I_neg = {−I : I ∈ I_neg}). Let m := min {ŵ · I : I ∈ I}. Let w_1, ..., w_j, ... be the sequence of weights resulting from the algorithm. We want to show that this sequence eventually becomes constant.

Proof (continued): Consider the possibility that not all w_i are different from their predecessor (this is the case if the new example is consistent with the current weights). Let k_1, k_2, ..., k_j be the indices of the changed weights (where the error is non-zero), i.e. w_{k_j} · I_{k_j} ≤ 0 and w_{k_j + 1} = w_{k_j} + α I_{k_j}, with I_{k_j} being the k_j-th tested example in I (which is not consistent, wrt the definition of k_j). Then we have w_{k_j + 1} = w_{k_1} + α I_{k_1} + α I_{k_2} + ... + α I_{k_j}.

102 5. Learning in networks, 3. The Perceptron

Proof (continued): We use this to show that j cannot become arbitrarily big. We consider the angle ω between ŵ and w_{k_j + 1} and estimate as follows (by decreasing the numerator and increasing the denominator):

cos ω = (ŵ · w_{k_j + 1}) / (‖ŵ‖ ‖w_{k_j + 1}‖) ≥ (ŵ · w_{k_1} + α m j) / (‖ŵ‖ √(‖w_{k_1}‖² + α² M j)).

The right side converges to infinity (when j increases to infinity), but the cosine can never get greater than 1. This leads to the contradiction.

Proof (continued): How do we get the estimation above? The scalar product gives
ŵ · w_{k_j + 1} = ŵ · w_{k_1} + α ŵ · (I_{k_1} + ... + I_{k_j}) ≥ ŵ · w_{k_1} + α m j.
Moreover,
‖w_{k_j + 1}‖² = ‖w_{k_j} + α I_{k_j}‖² = ‖w_{k_j}‖² + 2α I_{k_j} · w_{k_j} + α² ‖I_{k_j}‖² ≤ ‖w_{k_j}‖² + α² ‖I_{k_j}‖²,
since I_{k_j} · w_{k_j} ≤ 0; this is how we have chosen k_j. Now let M := max {‖I‖² : I ∈ I}. Then
‖w_{k_j + 1}‖² ≤ ‖w_{k_1}‖² + α² ‖I_{k_1}‖² + ... + α² ‖I_{k_j}‖² ≤ ‖w_{k_1}‖² + α² M j
holds.

103 5. Learning in networks, 3. The Perceptron

[Figure 5.12: learning curves (proportion correct on the test set vs. training set size) on a task where the perceptron beats the decision tree.] Which could be the underlying example?

[Figure 5.13: learning curves on a task where the decision tree beats the perceptron.] Which could be the underlying example?

104 5. Learning in networks, 4. Multi-layer feed-forward

5.4 Multi-layer feed-forward

Problem: What does the error function of the hidden units look like? Learning with multi-layer networks is called back propagation. Hidden units can be seen as perceptrons (Figure on page 393). The outcome can be a linear combination of such perceptrons (see the next two slides).

[Figures: the output h_W(x_1, x_2) of a sigmoid hidden unit over two inputs, and the combination of two such units.]

105 5. Learning in networks 4. Multi-layer feed-forward

Idea: We perform two different updates: one for the weights into the output units and one for the weights into the hidden units.

output units: similar to the perceptron:
W_{j,i} := W_{j,i} + α · a_j · Error_i · g′(in_i)
Instead of Error_i · g′(in_i) we write Δ_i.

hidden units: each hidden unit j is partly responsible for the error Δ_i (if j is connected with the output unit i):
W_{k,j} := W_{k,j} + α · I_k · Δ_j, with Δ_j := g′(in_j) Σ_i W_{j,i} Δ_i.

function BACK-PROP-LEARNING(examples, network) returns a neural network
  inputs: examples, a set of examples, each with input vector x and output vector y
          network, a multilayer network with L layers, weights W_{j,i}, activation function g
  repeat
    for each e in examples do
      for each node i in the input layer do a_i ← x_i
      for ℓ = 2 to L do
        in_i ← Σ_j W_{j,i} a_j
        a_i ← g(in_i)
      for each node i in the output layer do
        Δ_i ← g′(in_i) · (y_i − a_i)
      for ℓ = L − 1 to 1 do
        for each node j in layer ℓ do
          Δ_j ← g′(in_j) Σ_i W_{j,i} Δ_i
        for each node i in layer ℓ + 1 do
          W_{j,i} ← W_{j,i} + α · a_j · Δ_i
  until some stopping criterion is satisfied
  return NEURAL-NET-HYPOTHESIS(network)

(Figure: total error on the training set against the number of epochs.)
(Figure: proportion correct on the test set against training set size, for a decision tree and a multilayer network.)
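A runnable sketch of the update rules above, assuming a sigmoid activation and a toy training set (boolean OR instead of the restaurant data, and no bias weights — both our own simplifications); the only claim tested is that the total error decreases:

```python
import math
import random

random.seed(0)

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)

# Made-up training set: boolean OR, a stand-in for the restaurant data.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

H = 2                                   # number of hidden units
W_kj = [[random.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(2)]
W_ji = [random.uniform(-0.5, 0.5) for _ in range(H)]
alpha = 0.5

def predict(x):
    in_j = [sum(W_kj[k][j] * x[k] for k in range(2)) for j in range(H)]
    a_j = [g(v) for v in in_j]
    in_i = sum(W_ji[j] * a_j[j] for j in range(H))
    return in_j, a_j, in_i, g(in_i)

def total_error():
    return sum(0.5 * (y - predict(x)[3]) ** 2 for x, y in examples)

e0 = total_error()
for epoch in range(2000):
    for x, y in examples:
        in_j, a_j, in_i, O = predict(x)
        delta_i = g_prime(in_i) * (y - O)                       # output Delta
        delta_j = [g_prime(in_j[j]) * W_ji[j] * delta_i for j in range(H)]
        for j in range(H):                                      # hidden -> output update
            W_ji[j] += alpha * a_j[j] * delta_i
        for k in range(2):                                      # input -> hidden update
            for j in range(H):
                W_kj[k][j] += alpha * x[k] * delta_j[j]
e1 = total_error()
assert e1 < e0
```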

106 5. Learning in networks 4. Multi-layer feed-forward

Back propagation algorithm:
1. Calculate Δ_i for the output units based on the observed error Error_i.
2. For each layer, proceed recursively (output layer first):
   - back propagate the Δ_i to the predecessor layer,
   - modify the weights between the current layers.

The error is a function of the network's weights. This function delivers the error surface.

(Figure: error surface Err over the weight space (W_1, W_2), with local optima a and b.)

Important: Back propagation is gradient search!

General remarks:

expressibility: neural networks are suitable for continuous inputs and outputs (noise). To represent all boolean functions with n attributes, 2^n/n hidden units suffice. Often far fewer suffice: the art lies in determining the topology of the network.

efficiency: m examples, |W| weights: each epoch needs O(m · |W|) time. We know: the number of epochs can be exponential. In practice the time of convergence is very variable.

Problem: local minima on the error surface.

107 5. Learning in networks 4. Multi-layer feed-forward

transparency: a neural network is a black box. Trees and lists explain their results!

5. Learning in networks 5. Kernel Machines
5.5 Kernel Machines

(Figure: a data set in R² that is not linearly separable.)

The last figure is not linearly separable. Can it be transformed into a linearly separable set? Yes, in a higher-dimensional space!

F : R² → R³;  (x_1, x_2) ↦ (x_1², x_2², √2 · x_1 x_2)

Note that F(x_i) · F(x_j) = (x_i · x_j)².
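The identity F(x_i) · F(x_j) = (x_i · x_j)² is easy to verify numerically; a small sketch with two made-up points:

```python
import math

def F(x1, x2):
    # The map F : R^2 -> R^3 from the slide.
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xi, xj = (1.0, 2.0), (3.0, -1.0)     # made-up sample points
lhs = dot(F(*xi), F(*xj))            # inner product in feature space
rhs = dot(xi, xj) ** 2               # kernel evaluated in input space
assert abs(lhs - rhs) < 1e-9         # both equal 1.0 for these points
```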

108 5. Learning in networks 5. Kernel Machines

(Figure: the same data mapped into R³ by F, where it becomes linearly separable.)

Given n data points. Can they always be linearly separated in n − 1 dimensions? Yes; so what? Overfitting!!!

Support Vector Machines
Kernel (or support vector) machines were invented to compute the optimal linear separator.

Given pairs (x_i, y_i) (y_i ∈ {−1, +1} is the classification). Find α_i such that

Σ_i α_i − 1/2 Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)

is maximised, subject to the constraints α_i ≥ 0 and Σ_i α_i y_i = 0. The separator is then given by

h(x) = sign(Σ_i α_i y_i (x · x_i)).

Most of the α_i are 0; only those closest to the separator are nonzero: they are called support vectors.

(Figure: the optimal separator with maximal margin; the support vectors lie on the margin.)
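On a tiny made-up data set the dual problem can be solved by brute force. With y = (+1, −1) the constraint Σ_i α_i y_i = 0 forces α_1 = α_2, so the objective becomes one-dimensional and a grid search suffices (this is only an illustration, not how real SVM solvers work):

```python
# Two made-up training points: x1 classified +1, x2 classified -1.
x = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# With y = (+1, -1) the equality constraint forces alpha_1 = alpha_2 = a,
# so the dual objective is a function of a alone.
def dual(a):
    alpha = [a, a]
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(x[i], x[j])
               for i in range(2) for j in range(2))
    return sum(alpha) - 0.5 * quad

# Crude numeric maximisation over a >= 0; the analytic optimum is a = 1/2.
best_a = max((k * 0.001 for k in range(2001)), key=dual)

def h(z):
    # The separator h(x) = sign(sum_i alpha_i y_i (x . x_i)) from the slide.
    s = sum(best_a * y[i] * dot(z, x[i]) for i in range(2))
    return 1 if s >= 0 else -1

assert abs(best_a - 0.5) < 0.01
assert h(x[0]) == 1 and h(x[1]) == -1
```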

109 5. Learning in networks 6. Applications
5.6 Applications

Each of the following applications was developed within a test period of several months.

Pronunciation: (1987) NETtalk can pronounce English texts. The input consists of the letter to be spoken, its three predecessors and three successors; 80 hidden units. Output layer: high pitch, low pitch, stressed, unstressed, ... Training: 1024 words. After 50 epochs: 95% works correctly on the training set. Testing set: 78% correct.

(Figure: handwritten digit samples; some are easy to recognise, some hard.)

Handwritten character recognition: (1989) recognition of the zipcode on envelopes. Input: 16 × 16 pixel array. 3 hidden layers with 768, 192 and 30 units. 10 output units (0-9). A fully connected network would lead to far too many weights! Trick: units in the first layer observe only a 5 × 5 submatrix. 768 = 12 × 64: each of these 12 groups uses its own weights, and each group is responsible for one characteristic. Altogether 9760 weights. Training: 7300 examples. Testing set: 99% correct.

110 5. Learning in networks 6. Applications

Driving: (1993) ALVINN. Actions: steer, accelerate, brake. Driving straight. Input: color stereo video, radar. The picture is a pixel array (input units). Output layer: 30 units (representing the steering direction); the output unit with the highest activation is chosen. Demanded is a function which maps each image to a steering direction. Hidden units: one layer with 5 units. Training: a human drives for 5 minutes, the network observes. Back-propagation: 10 minutes. Problems: the human drives too well; brightness depends on road covering and weather.

5. Learning in networks 7. Genetic Algorithms
5.7 Genetic Algorithms

Remember: Darwin, Lamarck. Survival of the fittest: mutation, selection, reproduction.

Algorithm: works on a population, reproduces and selects according to a fitness function, and returns individuals.

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  inputs: population, a set of individuals
          FITNESS-FN, a function that measures the fitness of an individual
  repeat
    parents ← SELECTION(population, FITNESS-FN)
    population ← REPRODUCTION(parents)
  until some individual is fit enough
  return the best individual in population, according to FITNESS-FN

111 5. Learning in networks 7. Genetic Algorithms

Question: What does this have to do with learning?

The role of the individual:
- function of an agent (fitness is performance),
- component function of an agent (fitness is critic).

GENETIC-ALGORITHM represents concurrent search with hill-climbing.

Question: How to apply this learning algorithm?
1. fitness function: set of individuals → IR,
2. representation of an individual: a string (over a finite alphabet),
3. selection of individuals: randomly, with probability proportional to the fitness,
4. reproduction of individuals: crossover and mutation. Selected individuals are randomly mated. For each pair a crossover position n_0 ∈ IN is randomly chosen; i.e. the descendant has the first n_0 characters of the first parent and the remaining characters of the second parent. Each character mutates with a certain probability.

Example 5.7 (Restaurant revisited)
fitness function: the number of examples which are consistent with an individual (a decision list).
representation: remember: 10 attributes, some not boolean. To represent all pairs (attribute, value) we need 5 bits. For k-DL(n) we need t · (5k + 1) bits: each of the t tests consists of at most k pairs of 5 bits plus one outcome bit.

(Figure: the phases of the genetic algorithm: (a) initial population with selection probabilities in %, (b) fitness function, (c) selection, (d) crossover, (e) mutation.)
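The selection/crossover/mutation scheme just described can be sketched concretely on a made-up task (maximise the number of 1s in a bit string); alphabet, string length, mutation rate and the elitism step are our own choices for illustration:

```python
import random

random.seed(1)

ALPHABET = "01"
LENGTH = 12
POP_SIZE = 20

def fitness(ind):
    # Toy fitness function: the number of '1' characters ("ones-max").
    return ind.count("1")

def select(population):
    # Selection: random, with probability proportional to the fitness.
    weights = [fitness(ind) + 1 for ind in population]   # +1 avoids an all-zero wheel
    return random.choices(population, weights=weights, k=2)

def reproduce(p1, p2, mutation_rate=0.02):
    n0 = random.randrange(1, LENGTH)        # crossover position n_0
    child = p1[:n0] + p2[n0:]               # first n_0 chars of p1, rest of p2
    # Each character mutates with a small probability.
    return "".join(random.choice(ALPHABET) if random.random() < mutation_rate else ch
                   for ch in child)

population = ["".join(random.choice(ALPHABET) for _ in range(LENGTH))
              for _ in range(POP_SIZE)]
for generation in range(200):
    if max(fitness(ind) for ind in population) == LENGTH:
        break                               # some individual is fit enough
    elite = max(population, key=fitness)    # elitism: keep the best individual
    children = [reproduce(*select(population)) for _ in range(POP_SIZE - 1)]
    population = [elite] + children

best = max(population, key=fitness)
assert fitness(best) >= 10
```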

112 6. Knowledge Engineering (2)

Chapter 6. Knowledge Engineering (2)
6.1 First Order Logic
6.2 Herbrand
6.3 Resolution
6.4 Sit-Calculus
6.5 Problems

6. Knowledge Engineering (2) 1. First Order Logic

Definition 6.1 (First order logic L_FOL, L ⊆ L_FOL)
The language L_FOL of first order logic (Prädikatenlogik erster Stufe) consists of:
- x, y, z, x_1, x_2, ..., x_n, ...: a countable set Var of variables,
- for each k ∈ IN_0: P^k_1, P^k_2, ..., P^k_n, ...: a countable set Pred_k of k-ary predicate symbols (the 0-ary predicate symbols are the propositional logic constants from At of L_SL). We suppose that ⊤ and ⊥ are available.
- for each k ∈ IN_0: f^k_1, f^k_2, ..., f^k_n, ...: a countable set Funct_k of k-ary function symbols,
- ¬, ∧, ∨, →: the sentential connectives,
- (, ): the parentheses,
- ∀, ∃: quantifiers.

Definition (continued)
The 0-ary function symbols are called individuum constants: for them we leave out the parentheses. In general we will need, as in propositional logic, only a certain subset of the predicate and function symbols. These define a language L ⊆ L_FOL (analogously to Definition 4.1 on page 2). The used set of predicate and function symbols is also called the signature Σ.

113 6. Knowledge Engineering (2) 1. First Order Logic

Definition (continued)
The concepts of an L-term t and an L-formula ϕ are defined inductively:

Term: L-terms t are defined as follows:
1. each variable is an L-term,
2. if f^k is a k-ary function symbol from L and t_1, ..., t_k are L-terms, then f^k(t_1, ..., t_k) is an L-term.
The set of all L-terms that one can create from the set X ⊆ Var is called Term_L(X) or Term_Σ(X). Using X = ∅ we get the set of basic terms Term_L(∅), short: Term_L.

Formula: L-formulae ϕ are also defined inductively:
1. if P^k is a k-ary predicate symbol from L and t_1, ..., t_k are L-terms, then P^k(t_1, ..., t_k) is an L-formula,
2. for each L-formula ϕ, (¬ϕ) is an L-formula,
3. for all L-formulae ϕ and ψ, (ϕ ∧ ψ) and (ϕ ∨ ψ) are L-formulae,
4. if x is a variable and ϕ an L-formula, then (∀x ϕ) and (∃x ϕ) are L-formulae.

Definition (continued)
Atomic L-formulae are those which are composed according to 1.; we call them At_L(X) (X ⊆ Var). The set of all L-formulae with respect to X is called Fml_L(X). Positive formulae (Fml⁺_L(X)) are those which are composed using only 1., 3. and 4. If ϕ is an L-formula and is part of another L-formula ψ, then ϕ is called a sub-formula of ψ.

An illustrating example

Example 6.2 (From semigroups to rings)
We consider L = {0, 1, +, ·, ≤, =}, where 0, 1 are constants, +, · binary operations and ≤, = binary relations. What can be expressed in this language?
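The inductive definition of terms can be mirrored directly as a small abstract syntax; a sketch (class and function names are our own, not from the lecture):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fun:
    # f^k(t_1, ..., t_k); k = 0 gives an individuum constant
    name: str
    args: Tuple = ()

def variables(t):
    # The set X of variables a term is built from (so t is in Term_L(X)).
    if isinstance(t, Var):
        return {t}
    if t.args:
        return set().union(*(variables(a) for a in t.args))
    return set()

c = Fun("c")                              # a constant (0-ary function symbol)
x = Var("x")
ground = Fun("f", (c, Fun("g", (c,))))    # f(c, g(c)): a basic term in Term_L(emptyset)
open_t = Fun("f", (x, Fun("g", (c,))))    # f(x, g(c)): a term in Term_L({x})

assert variables(ground) == set()
assert variables(open_t) == {x}
```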
Ax 1: ∀x ∀y ∀z  x + (y + z) = (x + y) + z
Ax 2: ∀x  (x + 0 = x) ∧ (0 + x = x)
Ax 3: ∀x ∃y  (x + y = 0) ∧ (y + x = 0)
Ax 4: ∀x ∀y  x + y = y + x
Ax 5: ∀x ∀y ∀z  x · (y · z) = (x · y) · z
Ax 6: ∀x ∀y ∀z  x · (y + z) = x · y + x · z
Ax 7: ∀x ∀y ∀z  (y + z) · x = y · x + z · x

Axiom 1 describes a semigroup, the axioms 1-2 describe a monoid, the axioms 1-3 a group, and the axioms 1-7 a ring.

114 6. Knowledge Engineering (2) 1. First Order Logic

Definition 6.3 (L-structure A = (U_A, I_A))
An L-structure or an L-interpretation is a pair A =_def (U_A, I_A) with U_A being an arbitrary non-empty set, which is called the basic set (the universe or the individuum range) of A. Further, I_A is a mapping which
- assigns to each k-ary predicate symbol P^k in L a k-ary predicate over U_A,
- assigns to each k-ary function symbol f^k in L a k-ary function on U_A.
In other words: the domain of I_A is exactly the set of predicate and function symbols of L.

Definition (continued)
The range of I_A consists of the predicates and functions on U_A. We write: I_A(P) = P_A, I_A(f) = f_A.
Let ϕ be an L_1-formula and A =_def (U_A, I_A) an L-structure. A is called matching with ϕ if I_A is defined for all predicate and function symbols which appear in ϕ, i.e. if L_1 ⊆ L.

Definition 6.4 (Variable assignment ρ)
A variable assignment ρ over an L-structure A = (U_A, I_A) is a function ρ : Var → U_A; x ↦ ρ(x).

Definition 6.5 (Semantics of first order logic, Model A)
Let ϕ be a formula, A a structure matching with ϕ, and ρ a variable assignment over A. For each term t which can be built from components of ϕ, we define inductively the value of t in the structure A, which we call A(t):
1. for a variable x: A(x) =_def ρ(x),
2. if t has the form t = f^k(t_1, ..., t_k), with t_1, ..., t_k being terms and f^k a k-ary function symbol, then A(t) =_def f_A(A(t_1), ..., A(t_k)).

115 6. Knowledge Engineering (2) 1. First Order Logic

Definition (continued)
We define inductively the logical value of a formula ϕ in A:
1. if ϕ =_def P^k(t_1, ..., t_k) with terms t_1, ..., t_k and the k-ary predicate symbol P^k, then A(ϕ) =_def true if (A(t_1), ..., A(t_k)) ∈ P_A, and false otherwise,
2. if ϕ =_def ¬ψ, then A(ϕ) =_def true if A(ψ) = false, and false otherwise,
3. if ϕ =_def (ψ ∧ η), then A(ϕ) =_def true if A(ψ) = true and A(η) = true, and false otherwise,
4. if ϕ =_def (ψ ∨ η), then A(ϕ) =_def true if A(ψ) = true or A(η) = true, and false otherwise,
5. if ϕ =_def ∀x ψ, then A(ϕ) =_def true if for all d ∈ U_A: A_[x/d](ψ) = true, and false otherwise,
6. if ϕ =_def ∃x ψ, then A(ϕ) =_def true if there is a d ∈ U_A with A_[x/d](ψ) = true, and false otherwise.

In the cases 5. and 6. the notation [x/d] was used. It is defined as follows: for d ∈ U_A let A_[x/d] be the structure which is identical to A except for the definition of x_A: x_A =_def d (whether I_A is defined for x or not).

Definition (continued)
We write A ⊨ ϕ[ρ] for A(ϕ) = true: A is a model for ϕ with respect to ρ. If ϕ does not contain free variables, then A ⊨ ϕ[ρ] is independent from ρ; we then simply leave out ρ. If there is at least one model for ϕ, then ϕ is called satisfiable or consistent.

A free variable is a variable which is not in the scope of a quantifier. For instance, z is a free variable of ∀x P(x, z), but z is not free (it is bound) in ∃z ∀x P(x, z).

Definition 6.6 (Tautology)
1. A theory is a set of formulae without free variables: T ⊆ Fml_L. The structure A satisfies T if A ⊨ ϕ holds for all ϕ ∈ T. We write A ⊨ T and call A a model of T.
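Definition 6.5 can be executed literally on a finite structure. A sketch, with Z_3 under addition modulo 3 as a made-up example universe (the term and formula encodings are our own):

```python
# A concrete finite L-structure A = (U_A, I_A): universe Z_3 with addition
# modulo 3, the constant 0, and equality.
U = {0, 1, 2}
funcs = {"+": lambda a, b: (a + b) % 3, "0": lambda: 0}
preds = {"=": lambda a, b: a == b}

def value(term, rho):
    # A(t) under the variable assignment rho.
    # Terms: ("var", name) or ("fun", symbol, args).
    if term[0] == "var":
        return rho[term[1]]
    _, f, args = term
    return funcs[f](*(value(a, rho) for a in args))

def holds(phi, rho):
    # The logical value A(phi), following clauses 1-6 of Definition 6.5.
    kind = phi[0]
    if kind == "pred":
        _, p, args = phi
        return preds[p](*(value(a, rho) for a in args))
    if kind == "not":
        return not holds(phi[1], rho)
    if kind == "and":
        return holds(phi[1], rho) and holds(phi[2], rho)
    if kind == "or":
        return holds(phi[1], rho) or holds(phi[2], rho)
    if kind == "forall":                 # evaluate in A_[x/d] for every d in U_A
        return all(holds(phi[2], {**rho, phi[1]: d}) for d in U)
    if kind == "exists":
        return any(holds(phi[2], {**rho, phi[1]: d}) for d in U)
    raise ValueError(kind)

x, y = ("var", "x"), ("var", "y")
zero = ("fun", "0", ())
plus = ("fun", "+", (x, y))

# Every element of Z_3 has an additive inverse: A |= forall x exists y (x + y = 0).
inverse = ("forall", "x", ("exists", "y", ("pred", "=", (plus, zero))))
assert holds(inverse, {}) is True
# But not every element equals 0: A does not satisfy forall x (x = 0).
assert holds(("forall", "x", ("pred", "=", (x, zero))), {}) is False
```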
2. An L-formula ϕ is called an L-tautology if for all matching L-structures A the following holds: A ⊨ ϕ.

From now on we suppress the language L, because it is obvious from the context. Nevertheless it has to be defined.

116 6. Knowledge Engineering (2) 1. First Order Logic

Definition 6.7 (Consequence set Cn(T))
A formula ϕ follows semantically from T if for all structures A with A ⊨ T also A ⊨ ϕ holds. We write: T ⊨ ϕ. In other words: all models of T also satisfy ϕ.
We denote by Cn_L(T) =_def {ϕ ∈ Fml_L : T ⊨ ϕ}, or simply Cn(T), the semantic consequence operator.

Lemma 6.8 (Properties of Cn(T))
The semantic consequence operator has the following properties:
1. T-extension: T ⊆ Cn(T),
2. Monotony: T ⊆ T′ ⟹ Cn(T) ⊆ Cn(T′),
3. Closure: Cn(Cn(T)) = Cn(T).

Lemma 6.9 (ϕ ∉ Cn(T))
ϕ ∉ Cn(T) if and only if there is a structure A with A ⊨ T and A ⊭ ϕ. In other words: ϕ ∉ Cn(T) if and only if there is a counterexample: a model of T in which ϕ is not true.

Definition 6.10 (MOD(T), Cn(U))
If T ⊆ Fml_L, then we denote by MOD(T) the set of all L-structures A which are models of T:
MOD(T) =_def {A : A ⊨ T}.
If U is a set of structures, then we can consider all sentences which are true in all structures of U. We call this set also Cn(U):
Cn(U) =_def {ϕ ∈ Fml_L : for all A ∈ U : A ⊨ ϕ}.

Definition 6.11 (Completeness of a theory T)
T is called complete if for each formula ϕ ∈ Fml_L: T ⊨ ϕ or T ⊨ ¬ϕ holds.
Attention: Do not mix up this last condition with the property of a structure (or a model): each structure is complete in the above sense.

Lemma 6.12 (Ex Falso Quodlibet)
T is consistent if and only if Cn(T) ≠ Fml_L.

MOD is obviously dual to Cn: Cn(MOD(T)) = Cn(T), MOD(Cn(T)) = MOD(T).

117 6. Knowledge Engineering (2) 1. First Order Logic

An illustrating example

Example 6.13 (Natural numbers in different languages)
N_Pr = (IN_0, 0_N, +_N, =_N) ("Presburger Arithmetik"),
N_PA = (IN_0, 0_N, +_N, ·_N, =_N) ("Peano Arithmetik"),
N′_PA = (IN_0, 0_N, 1_N, +_N, ·_N, =_N) (variant of N_PA).
These structures each define the natural numbers, but in different languages.

Question: If the language is bigger, then we can express more. Is L′_PA more expressive than L_PA?

Answer: No, because one can replace the 1_N by an L_PA-formula: there is an L_PA-formula φ(x) so that for each variable assignment ρ the following holds:
N_PA ⊨ φ(x)[ρ] if and only if ρ(x) = 1_N.
Thus we can define a macro for 1. Each formula of L′_PA can be transformed into an equivalent formula of L_PA.

Question: Is L_PA perhaps more expressive than L_Pr, or can the multiplication be defined somehow? We will see later that L_PA is indeed more expressive: the set of sentences valid in N_Pr is decidable, whereas the set of sentences valid in N_PA is not even recursively enumerable.

As for sentential logic, formulae can be derived from a given theory, and they can also (semantically) follow from it.
Syntactic derivability ⊢: the notion that certain formulae can be derived from other formulae using a certain calculus.
Semantic validity ⊨: the notion that certain formulae follow from other formulae based on the semantic notion of a model.

118 6. Knowledge Engineering (2) 1. First Order Logic

Definition 6.14 (Correctness, Completeness for a calculus)
Given an arbitrary calculus (which defines a notion ⊢) and a semantics based on certain models (which defines a relation ⊨), we say that:
Correctness: The calculus is correct with respect to the semantics if the following holds: Φ ⊢ φ implies Φ ⊨ φ.
Completeness: The calculus is complete with respect to the semantics if the following holds: Φ ⊨ φ implies Φ ⊢ φ.

We have already defined a complete and correct calculus for sentential logic L_SL. Such calculi also exist for first order logic L_FOL.

Theorem 6.15 (Correctness, Completeness of FOL)
A formula follows semantically from a theory T if and only if it can be derived:
T ⊢ ϕ if and only if T ⊨ ϕ.

Theorem 6.16 (Compactness of FOL)
A formula follows from a theory T if and only if it follows from a finite subset of T:
Cn(T) = ∪ {Cn(T′) : T′ ⊆ T, T′ finite}.

6. Knowledge Engineering (2) 2. Herbrand
6.2 Herbrand

The introduced relation T ⊨ φ says that each model of T is also a model of φ. But because there are many models with very large universes, the following question arises: can we restrict ourselves to particular models?

Theorem 6.17 (Löwenheim-Skolem)
T ⊨ φ holds if and only if φ holds in all countable models of T.

119 6. Knowledge Engineering (2) 2. Herbrand

Quite often the universes of models (which we are interested in) consist exactly of the basic terms Term_L(∅). This leads to the following notion:

Definition 6.18 (Herbrand model)
A model A is called a Herbrand model with respect to a language if the universe of A consists exactly of Term_L(∅) and the function symbols f^k_i are interpreted as follows:
(f^k_i)_A : Term_L(∅) × ... × Term_L(∅) → Term_L(∅); (t_1, ..., t_k) ↦ f^k_i(t_1, ..., t_k).
We write T ⊨_Herb φ if each Herbrand model of T is also a model of φ.

Theorem 6.19 (Reduction to Herbrand models)
If T is universal and φ existential, then the following holds:
T ⊨ φ if and only if T ⊨_Herb φ.

Question: Is T ⊨_Herb φ not much easier, because we have to consider only Herbrand models? Is it perhaps decidable? (More about it in Section 574.)
No, truth in Herbrand models is highly undecidable.

The following theorem is the basic result for applying resolution. In a way it states that FOL can be somehow reduced to SL.

Theorem 6.20 (Herbrand)
Let T be universal and φ without quantifiers. Then:
T ⊨ ∃x φ(x) if and only if there are t_1, ..., t_n ∈ Term_L(∅) with T ⊨ φ(t_1) ∨ ... ∨ φ(t_n).
Or: Let M be a set of clauses of FOL (formulae of the form (¬)P_1(t_1) ∨ (¬)P_2(t_2) ∨ ... ∨ (¬)P_n(t_n) with t_i ∈ Term_L(X)). Then: M is unsatisfiable if and only if there is a finite and unsatisfiable set M_inst of basic instances of M.

In automatic theorem proving we are always interested in the question M ⊨ ∃x_1, ..., x_n ⋀_i φ_i. Then M ∪ {∀x_1, ..., x_n ¬⋀_i φ_i} is a set of clauses which is unsatisfiable if and only if the relation above holds.
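For a finite signature, Term_L(∅) can be enumerated level by level; a sketch with a made-up signature (one constant, one unary function symbol):

```python
from itertools import product

def herbrand_universe(constants, functions, depth):
    """Ground terms of Term_L(emptyset) built in at most `depth` rounds.
    `functions` maps each function symbol to its arity; terms are strings
    (constants) or tuples (f, t1, ..., tk)."""
    terms = set(constants)
    for _ in range(depth):
        new = set(terms)
        for f, arity in functions.items():
            for args in product(terms, repeat=arity):
                new.add((f,) + args)
        terms = new
    return terms

# Signature with one constant a and one unary f: the Herbrand universe is
# {a, f(a), f(f(a)), ...} - infinite, so we cut off after 3 rounds.
U3 = herbrand_universe({"a"}, {"f": 1}, 3)
assert ("f", ("f", ("f", "a"))) in U3
assert len(U3) == 4        # a, f(a), f(f(a)), f(f(f(a)))
```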

120 6. Knowledge Engineering (2) 3. Resolution
6.3 Resolution

Definition 6.21 (Most general unifier: mgu)
Given a finite set of equations between terms or between literals, there is an algorithm which calculates a most general solution substitution (i.e. a substitution of the involved variables so that the left sides of all equations become syntactically identical to the right sides) or which returns fail. In the first case the most general solution substitution is uniquely determined (up to renaming of variables): it is called the mgu, the most general unifier.

Consider the equations p(x, a) = q(y, b) and p(g(a), f(x)) = p(g(y), z). The first has no solution (the predicate symbols differ). For the second, solution substitutions are: [y/a, x/a, z/f(a)], [y/a, x/f(a), z/f(f(a))], ... The mgu is: [y/a, z/f(x)].

We want to outline the mentioned algorithm using an example. Given: f(x, g(h(y), y)) = f(x, g(z, a)). The algorithm successively calculates the following sets of equations:
{ x = x, g(h(y), y) = g(z, a) }
{ g(h(y), y) = g(z, a) }
{ h(y) = z, y = a }
{ z = h(y), y = a }
{ z = h(a), y = a }
Thus the mgu is: [x/x, y/a, z/h(a)].
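The algorithm sketched above can be written out directly; a minimal sketch (term representation and names are our own: variables are plain strings, constants and compound terms are tuples):

```python
def walk(t, subst):
    # Follow variable bindings as far as possible.
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    return not isinstance(t, str) and any(occurs(v, a, subst) for a in t[1:])

def unify(eqs):
    # Solve a finite set of term equations; returns an mgu (as a dict in
    # triangular form) or None for "fail".
    subst, eqs = {}, list(eqs)
    while eqs:
        s, t = eqs.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if isinstance(s, str):                   # s is an unbound variable
            if occurs(s, t, subst):
                return None                      # occurs check: fail
            subst[s] = t
        elif isinstance(t, str):
            eqs.append((t, s))
        elif s[0] == t[0] and len(s) == len(t):  # same symbol, same arity
            eqs.extend(zip(s[1:], t[1:]))
        else:
            return None                          # symbol clash: fail
    return subst

def apply(t, subst):
    t = walk(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply(a, subst) for a in t[1:])

# The example from the slide: f(x, g(h(y), y)) = f(x, g(z, a)).
lhs = ("f", "x", ("g", ("h", "y"), "y"))
rhs = ("f", "x", ("g", "z", ("a",)))
mgu = unify([(lhs, rhs)])
assert apply(lhs, mgu) == apply(rhs, mgu) == ("f", "x", ("g", ("h", ("a",)), ("a",)))
assert unify([(("p", "x", ("a",)), ("q", "y", ("b",)))]) is None   # different symbols
```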

121 6. Knowledge Engineering (2) 3. Resolution

A resolution calculus for FOL
The resolution calculus is defined over a language L_res ⊆ L_FOL where the set of well-formed formulae Fml_Res_L_res consists of all disjunctions of the form A ∨ ¬B ∨ C ∨ ... ∨ ¬E, i.e. the disjuncts are only atoms or their negations. No implications or conjunctions are allowed. These formulae are also called clauses. Such a clause is also written as the set {A, ¬B, C, ..., ¬E}. This means that the set-theoretic union of such sets corresponds again to a clause. Note that a clause now consists of atoms rather than constants, as was the case in the resolution calculus for SL.

Definition 6.22 (Robinson's resolution for FOL)
The resolution calculus consists of two rules:
(Res): from C_1 ∪ {A_1} and C_2 ∪ {¬A_2} derive (C_1 ∪ C_2) mgu(A_1, A_2), where C_1 ∪ {A_1} and C_2 ∪ {¬A_2} are assumed to be disjunct with respect to the variables,
and the factorization rule
(Fac): from C_1 ∪ {L_1, L_2} derive (C_1 ∪ {L_1}) mgu(L_1, L_2).

Consider for example M = {r(x) ∨ ¬p(x), p(a), s(a)} and the question M ⊨ ∃x (s(x) ∧ r(x))?

Definition 6.23 (Resolution Calculus for FOL)
We define the resolution calculus Robinson_FOL_L_res = (∅, {Res, Fac}) as follows. The underlying language is L_res ⊆ L_FOL defined on Slide 481, together with the set of well-formed formulae Fml_Res_L_res. Thus there are no axioms and only two inference rules. The well-formed formulae are just clauses.

Question: Is this calculus correct and complete?

Question: Why do we need factorization?
Answer: Consider M = {s(x_1) ∨ s(x_2), ¬s(y_1) ∨ ¬s(y_2)}. Resolving both clauses gives {s(x_1), ¬s(y_1)} or variants of it. Resolving this new clause with one in M only leads to variants of the respective clause in M.
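The rule (Res) can be replayed on the example above. The following toy implementation (our own clause representation, and no occurs check) derives the refutation for M = {r(x) ∨ ¬p(x), p(a), s(a)} together with the negated query ¬s(y) ∨ ¬r(y):

```python
def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    # mgu of two terms/atoms, or None; variables are plain strings,
    # constants/compounds are tuples. (No occurs check - a sketch only.)
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def substitute(t, s):
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0],) + tuple(substitute(x, s) for x in t[1:])

def resolve(c1, c2):
    # All resolvents of two variable-disjoint clauses (rule (Res)).
    # A clause is a frozenset of literals (sign, atom), sign in {"+", "-"}.
    out = set()
    for sign1, a1 in c1:
        for sign2, a2 in c2:
            if sign1 != sign2:
                s = unify(a1, a2, {})
                if s is not None:
                    rest = (c1 - {(sign1, a1)}) | (c2 - {(sign2, a2)})
                    out.add(frozenset((sg, substitute(a, s)) for sg, a in rest))
    return out

# M = { r(x) v ~p(x),  p(a),  s(a) }  and the negated query  ~s(y) v ~r(y).
c1 = frozenset({("+", ("r", "x")), ("-", ("p", "x"))})
p_a = frozenset({("+", ("p", ("a",)))})
s_a = frozenset({("+", ("s", ("a",)))})
query = frozenset({("-", ("s", "y")), ("-", ("r", "y"))})

r_a = frozenset({("+", ("r", ("a",)))})
assert r_a in resolve(c1, p_a)             # derive r(a)
step = frozenset({("-", ("s", ("a",)))})
assert step in resolve(query, r_a)         # derive ~s(a)
assert frozenset() in resolve(step, s_a)   # the empty clause: refutation found
```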

122 6. Knowledge Engineering (2) 3. Resolution

Answer (continued): The empty clause can never be derived (using resolution only). Factorization instantly solves the problem: we can deduce both s(x) and ¬s(y), and from there the empty clause.

Theorem 6.24 (Resolution is refutation complete)
Robinson's resolution calculus Robinson_FOL_L_res is refutation complete: given an unsatisfiable set of clauses, the empty clause can be derived using resolution and factorization.

6. Knowledge Engineering (2) 4. Sit-Calculus
6.4 Sit-Calculus

123 6. Knowledge Engineering (2) 4. Sit-Calculus

Question: How do we axiomatize the Wumpus world in FOL?

function KB-AGENT(percept) returns an action
  static: KB, a knowledge base
          t, a counter, initially 0, indicating time
  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  action ← ASK(KB, MAKE-ACTION-QUERY(t))
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

Idea: In order to describe actions and their effects consistently, we consider the world as a sequence of situations (snapshots of the world). Therefore we have to extend each predicate by an additional argument. We use the function symbol result(action, situation) as the term for the situation which emerges when the action action is executed in the situation situation.

Actions: Turn_right, Turn_left, Forward, Shoot, Grab, Release, Climb.

6. Knowledge Engineering (2) 4. Sit-Calculus

Axioms about percepts, with useful new notions:

  Percept([Stench, b, g, u, c], s) => Stench(s)
  Percept([a, Breeze, g, u, c], s) => Breeze(s)
  Percept([a, b, Glitter, u, c], s) => At_Gold(s)
  Percept([a, b, g, Bump, c], s) => At_Wall(s)
  Percept([a, b, g, u, Scream], s) => Wumpus_dead(s)

  At(Agent, l, s) ∧ Breeze(s) => Breezy(l)
  At(Agent, l, s) ∧ Stench(s) => Smelly(l)

  Adjacent(l1, l2) <=> ∃d l1 = Location_toward(l2, d)
  Smelly(l1) => ∃l2 At(Wumpus, l2, s) ∧ (l2 = l1 ∨ Adjacent(l1, l2))
  Percept([None, None, g, u, c], s) ∧ At(Agent, x, s) ∧ Adjacent(x, y) => OK(y)
  (¬At(Wumpus, x, t) ∧ ¬Pit(x)) => OK(x)
  At(Wumpus, l1, s) ∧ Adjacent(l1, l2) => Smelly(l2)
  At(Pit, l1, s) ∧ Adjacent(l1, l2) => Breezy(l2)

Axioms to describe actions:

  Holding(Gold, result(Grab, s)) <=> At_Gold(s) ∨ Holding(Gold, s)
  ¬Holding(Gold, result(Release, s))
  Holding(Gold, result(Turn_right, s)) <=> Holding(Gold, s)
  Holding(Gold, result(Turn_left, s)) <=> Holding(Gold, s)
  Holding(Gold, result(Forward, s)) <=> Holding(Gold, s)
  Holding(Gold, result(Climb, s)) <=> Holding(Gold, s)

Each effect must be described carefully.

Axioms describing preferences among actions:

  Great(a, s) => Action(a, s)
  Good(a, s) ∧ ¬∃b Great(b, s) => Action(a, s)
  Medium(a, s) ∧ ¬∃b (Great(b, s) ∨ Good(b, s)) => Action(a, s)
  Risky(a, s) ∧ ¬∃b (Great(b, s) ∨ Good(b, s) ∨ Medium(b, s)) => Action(a, s)

  At(Agent, [1,1], s) ∧ Holding(Gold, s) => Great(Climb, s)
  At_Gold(s) ∧ ¬Holding(Gold, s) => Great(Grab, s)
  At(Agent, l, s) ∧ ¬Visited(Location_ahead(Agent, s)) ∧ OK(Location_ahead(Agent, s)) => Good(Forward, s)
  Visited(l) <=> ∃s At(Agent, l, s)

The goal is not only to find the gold but also to return safely. We need additional axioms such as Holding(Gold, s) => Go_back(s), etc.
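The percept axioms above can be mirrored by a small interpretation routine: from the 5-tuple percept we derive propositions such as Stench(s) and Breezy(l) at the agent's current square. The dictionary-based "knowledge base" is an illustrative assumption chosen for brevity.

```python
# Toy version of the percept axioms: a percept is a 5-tuple of booleans
# (stench, breeze, glitter, bump, scream); derived facts go into a dict.
def interpret_percept(percept, location, kb):
    stench, breeze, glitter, bump, scream = percept
    if stench:
        kb.setdefault("Smelly", set()).add(location)   # At(Agent,l,s) & Stench(s) => Smelly(l)
    if breeze:
        kb.setdefault("Breezy", set()).add(location)   # At(Agent,l,s) & Breeze(s) => Breezy(l)
    if glitter:
        kb["At_Gold"] = True
    if bump:
        kb["At_Wall"] = True
    if scream:
        kb["Wumpus_dead"] = True
    return kb

kb = interpret_percept((True, False, False, False, False), (1, 2), {})
assert (1, 2) in kb["Smelly"]
assert "Breezy" not in kb
```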

6. Knowledge Engineering (2) 5. Problems

6.5 Problems

Important problems of knowledge representation
There are three very important representation problems concerning the axiomatization of a changing world:

Frame problem: most actions change very little; we need many axioms just to describe invariant properties. It would be ideal to axiomatize only what changes and to add a proposition like "nothing else changes".

Qualification problem: it is necessary to enumerate all conditions under which an action succeeds. For example:

  ∀x (Bird(x) ∧ ¬Penguin(x) ∧ ¬Dead(x) ∧ ¬Ostrich(x) ∧ ¬BrokenWing(x) ∧ ...) => Flies(x)

It would be ideal to store only "birds normally fly".

Ramification problem: how should we handle implicit consequences of actions? For example Grab(Gold): the gold can be contaminated; then Grab(Gold) is not optimal.

Programming versus knowledge engineering:

  programming        | knowledge engineering
  -------------------|----------------------
  choose language    | choose logic
  write program      | define knowledge base
  write compiler     | implement calculus
  run program        | derive new facts

6. Knowledge Engineering (2) 5. Problems

From Wittgenstein's Tractatus Logico-Philosophicus:
  1 The world is all that is the case.
  1.1 The world is the totality of facts, not of things.
  2 What is the case, the fact, is the existence of states of affairs.
  3 The logical picture of facts is the thought.
  4 A proposition is a truth-function of elementary propositions.
  5 The general form of a truth-function is: [p, ξ, N(ξ)].
  6 Whereof one cannot speak, thereof one must be silent.

7. Planning

Chapter 7. Planning
  7.1 Planning vs. Problem-Solving
  7.2 STRIPS
  7.3 Partial-Order Planning
  7.4 Conditional Planning
  7.5 Extensions

7. Planning 1. Planning vs. Problem-Solving

7.1 Planning vs. Problem-Solving

Motivation:
  problem-solving agent: the effects of a static sequence of actions are determined.
  knowledge-based agent: actions can be chosen.
We try to merge both into a planning agent.

7. Planning 1. Planning vs. Problem-Solving

function SIMPLE-PLANNING-AGENT(percept) returns an action
  static: KB, a knowledge base (includes action descriptions)
          p, a plan, initially NoPlan
          t, a counter, initially 0, indicating time
  local variables: G, a goal
                   current, a current state description
  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  current <- STATE-DESCRIPTION(KB, t)
  if p = NoPlan then
    G <- ASK(KB, MAKE-GOAL-QUERY(t))
    p <- IDEAL-PLANNER(current, G, KB)
  if p = NoPlan or p is empty then action <- NoOp
  else
    action <- FIRST(p)
    p <- REST(p)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t <- t + 1
  return action

Example 7.1 (Running Example)
We want to drink a freshly made banana shake and drill some holes into a wall at home. Thus an agent needs to get:
  1 a quart of milk,
  2 a bunch of bananas, and
  3 a variable-speed cordless drill.

Question: How does a problem-solving agent handle this?

[Figure: a search tree starting at Start with branches Go To Pet Store, Go To School, Go To Supermarket, Go To Sleep, Read A Book, Sit in Chair, Talk to Parrot, ...; only the Go To Supermarket branch (Buy Tuna Fish, Buy Arugula, Buy Milk, ...) leads toward Finish.]

Planning in the situation calculus:

  Initial state: At(Home, S0) ∧ ¬Have(Milk, S0) ∧ ¬Have(Bananas, S0) ∧ ¬Have(Drill, S0).

  Goal: ∃s (At(Home, s) ∧ Have(Milk, s) ∧ Have(Bananas, s) ∧ Have(Drill, s)).

  Axioms, e.g. "buy milk":
    ∀a, s Have(Milk, result(a, s)) <=>
      ((a = Buy(Milk) ∧ At(Supermarket, s)) ∨ (Have(Milk, s) ∧ a ≠ Drop(Milk)))

  Etc., etc.

7. Planning 1. Planning vs. Problem-Solving

We also need a term Result*(l, s): the situation that emerges when the sequence of actions l is executed in s.

  ∀s Result*([], s) := s
  ∀a, p, s Result*([a|p], s) := Result*(p, result(a, s))

The task is now to find a p with

  At(Home, Result*(p, S0)) ∧ Have(Milk, Result*(p, S0)) ∧ Have(Bananas, Result*(p, S0)) ∧ Have(Drill, Result*(p, S0)).

Idea: This is the solution of our problem! Now we start our theorem prover and get the answer, don't we?

Problems:
  T ⊢ φ is only semi-decidable.
  If p is a plan, then so are [Empty_Action|p] and [A, A1|p]: plans padded with irrelevant actions remain plans.

Result: We should not resort to a general prover, but to one specially designed for our case. We should also restrict the language.

We should make clear the difference between
  shipping a goal to a planner, and
  asking a query to a theorem prover.
In the first case we look for a plan such that after the execution of the plan the goal holds. In the second case we ask whether the query can be made true with respect to the KB: KB ⊨ ∃x φ(x). The dynamics is in the terms; the logic itself is static.
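The two defining equations for Result* translate directly into a recursive function: the empty plan leaves the situation unchanged, and [a|p] first applies a, then the rest of the plan. Representing situations as nested result-terms (here: tuples) is an illustrative choice.

```python
# Result*(l, s) as a recursive function over action lists.
def result(action, situation):
    return ("result", action, situation)   # situations as nested terms

def result_star(plan, situation):
    if not plan:                           # Result*([], s) = s
        return situation
    head, *tail = plan                     # Result*([a|p], s) = Result*(p, result(a, s))
    return result_star(tail, result(head, situation))

s0 = "S0"
assert result_star([], s0) == "S0"
assert result_star(["Forward", "Grab"], s0) == \
       ("result", "Grab", ("result", "Forward", "S0"))
```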

7. Planning 2. STRIPS

7.2 STRIPS

STRIPS stands for STanford Research Institute Problem Solver.

  states: conjunctions of function-free ground atoms (positive literals).
  goals: conjunctions of function-free literals.
  actions: STRIPS operators consist of three components:
    1 description: the name of the action,
    2 precondition: a conjunction of atoms,
    3 postcondition (effect): a conjunction of literals.

Example operator:

  Go(there)
    PRECOND: At(here) ∧ Path(here, there)
    EFFECT: At(there) ∧ ¬At(here)

Example 7.2 (Air Cargo Transportation)
Three actions: Load, Unload, Fly. Two predicates: In(c, p), cargo c is inside plane p; At(x, a), object x is at airport a.

Is cargo c at airport a when it is loaded in a plane p?
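A STRIPS state and operator fit naturally into Python sets: a state is a set of ground atoms, an operator a precondition set plus add and delete lists. The concrete ground instance below (cargo C1, plane P1, airport SFO) is an illustrative hand instantiation, not taken from the slides.

```python
# A STRIPS state as a set of ground atoms; atoms are plain tuples.
state = {("At", "C1", "SFO"), ("At", "P1", "SFO")}

# Ground instance of the Load operator from the cargo example.
load_c1_p1 = {
    "name":    "Load(C1,P1,SFO)",
    "precond": {("At", "C1", "SFO"), ("At", "P1", "SFO")},
    "add":     {("In", "C1", "P1")},
    "delete":  {("At", "C1", "SFO")},
}

# Applicability is subset-testing the precondition against the state.
assert load_c1_p1["precond"] <= state
```

The set representation makes precondition checks and effect application one-liners, which is exactly why the STRIPS restriction to ground literals pays off.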

7. Planning 2. STRIPS

The operators of Example 7.2 in STRIPS notation:

  Action(Load(c, p, a),
    PRECOND: At(c, a) ∧ At(p, a),
    EFFECT: ¬At(c, a) ∧ In(c, p))
  Action(Unload(c, p, a),
    PRECOND: In(c, p) ∧ At(p, a),
    EFFECT: At(c, a) ∧ ¬In(c, p))
  Action(Fly(p, from, to),
    PRECOND: At(p, from),
    EFFECT: ¬At(p, from) ∧ At(p, to))

Example 7.3 (Spare Tire)
Four actions: Remove(spare, trunk), Remove(flat, axle), PutOn, LeaveOvernight. One predicate At(x, a), meaning object x is at location a. Is the following a STRIPS description?

  Action(Remove(spare, trunk),
    PRECOND: At(spare, trunk),
    EFFECT: ¬At(spare, trunk) ∧ At(spare, ground))
  Action(Remove(flat, axle),
    PRECOND: At(flat, axle),
    EFFECT: ¬At(flat, axle) ∧ At(flat, ground))
  Action(PutOn(spare, axle),
    PRECOND: At(spare, ground) ∧ ¬At(flat, axle),
    EFFECT: ¬At(spare, ground) ∧ At(spare, axle))
  Action(LeaveOvernight,
    PRECOND: ,
    EFFECT: ¬At(spare, ground) ∧ ¬At(spare, axle) ∧ ¬At(spare, trunk)
            ∧ ¬At(flat, ground) ∧ ¬At(flat, axle))

Example 7.4 (Blocks World)
One action: Move. Move(b, x, y) moves block b from x to y if both b and y are clear. One predicate On(b, x), meaning block b is on x (x can be another block or the table). How do we formulate that a block is clear?

  b is clear: ¬∃x On(x, b). Not allowed in STRIPS.

7. Planning 2. STRIPS

Therefore we introduce another predicate Clear(y). What about Move(b, x, y) defined by
  Precondition: On(b, x) ∧ Clear(b) ∧ Clear(y)
  Effect: On(b, y) ∧ Clear(x) ∧ ¬On(b, x) ∧ ¬Clear(y)?

  Action(Move(b, x, y),
    PRECOND: On(b, x) ∧ Clear(b) ∧ Clear(y),
    EFFECT: On(b, y) ∧ Clear(x) ∧ ¬On(b, x) ∧ ¬Clear(y))
  Action(MoveToTable(b, x),
    PRECOND: On(b, x) ∧ Clear(b),
    EFFECT: On(b, Table) ∧ Clear(x) ∧ ¬On(b, x))

ADL (Action Description Language) and its many derivatives are extensions of STRIPS:
  States: both positive and negative literals may appear in states.
  OWA: open-world assumption (instead of the CWA).
  Effects: add all positive and negative literals of the effect, and delete their negations.
  Quantification: quantified variables are allowed in goals.
  Goals: disjunction and negation are also allowed.
  when P: conditional effects are allowed.

(a) Progression (forward) and (b) Regression (backward) state-space search:

[Figure: (a) progression expands the state At(P1,A) ∧ At(P2,A) via Fly(P1,A,B) or Fly(P2,A,B) into At(P1,B) ∧ At(P2,A) or At(P1,A) ∧ At(P2,B); (b) regression works backwards from the goal At(P1,B) ∧ At(P2,B) through the same actions.]
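Direction (b), regression, can be sketched as well: regressing a goal set through an operator replaces the literals the operator achieves by the operator's preconditions. The operator must achieve part of the goal and must not delete any of it; the set-based representation and the ground Fly instance are illustrative assumptions.

```python
# Goal regression sketch: regress(g, op) = (g - add) ∪ precond,
# defined only when add ∩ g ≠ ∅ and delete ∩ g = ∅.
def regress(goal, op):
    if not (op["add"] & goal) or (op["delete"] & goal):
        return None                       # op is useless or inconsistent here
    return (goal - op["add"]) | op["precond"]

fly_p1_a_b = {"precond": {("At", "P1", "A")},
              "add":     {("At", "P1", "B")},
              "delete":  {("At", "P1", "A")}}

g = {("At", "P1", "B"), ("At", "P2", "B")}
g2 = regress(g, fly_p1_a_b)
assert g2 == {("At", "P1", "A"), ("At", "P2", "B")}
```

Note how the regressed goal still demands At(P2, B): regression only rewrites the part of the goal that the chosen action is responsible for.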

7. Planning 2. STRIPS

  Go(there)
    PRECOND: At(here) ∧ Path(here, there)
    EFFECT: At(there) ∧ ¬At(here)

Definition 7.5 (Applicable Operator)
An operator Op is applicable in a state s if there is some way to instantiate the variables in Op so that every one of the preconditions of Op is true in s: Precond(Op) ⊆ s. In the resulting state, all the positive literals in Effect(Op) hold, as do all the literals that held in s, except for those that appear as negative literals in Effect(Op).

The frame problem is handled implicitly: literals not mentioned in the effects remain unchanged (persistence). Effect is sometimes split into an add list and a delete list.

Up to now we can consider this as plain problem-solving: we use STRIPS as a representation formalism and search for a solution path, where nodes in the search tree are situations and solution paths are plans.

Idea: Perform a search in the space of all plans! Begin with a partial plan and extend it successively. For this we need operators that operate on plans. We distinguish two types:
  refinement operators: constraints are attached to a plan; a partial plan then represents the set of all its completions (analogously to Cn(T) for MOD(T)).
  modification operators: all others.
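Definition 7.5 is, in the set representation, a three-line function: check the precondition, drop the delete list, add the add list. The concrete ground operator is the same hypothetical Load instance used for illustration above.

```python
# Progression per Definition 7.5: successor = (s - delete) ∪ add.
def apply_op(state, op):
    if not op["precond"] <= state:
        raise ValueError("operator not applicable")
    return (state - op["delete"]) | op["add"]

op = {"precond": {("At", "C1", "SFO"), ("At", "P1", "SFO")},
      "add":     {("In", "C1", "P1")},
      "delete":  {("At", "C1", "SFO")}}

s = {("At", "C1", "SFO"), ("At", "P1", "SFO")}
s2 = apply_op(s, op)
assert ("In", "C1", "P1") in s2
assert ("At", "C1", "SFO") not in s2
assert ("At", "P1", "SFO") in s2       # unmentioned literals persist (frame handled implicitly)
```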

7. Planning 2. STRIPS

Question: How do we represent plans?

Answer: We have to consider two things:
  instantiation of variables: instantiate only if necessary, i.e. always choose the mgu;
  partial order: refrain from fixing the exact ordering (this reduces the search space).

Definition 7.6 (Plan)
A plan is formally defined as a data structure consisting of the following four components:
  A set of plan steps. Each step is one of the operators for the problem.
  A set of step ordering constraints. Each one has the form S_i ≺ S_j: S_i has to be executed before S_j.
  A set of variable binding constraints. Each one has the form v = x: v is a variable in some step, and x is either a constant or another variable.
  A set of causal links. Each one has the form S_i --c--> S_j: S_i achieves c for S_j.

The initial plan consists of two steps, called START and FINISH.

[Figure: (a) the problem as initial state and goal; (b) the corresponding initial plan, Start ≺ Finish, with the goal LeftShoeOn ∧ RightShoeOn attached to Finish.]

The initial plan of a problem is:

  Plan(Steps:     {S1: START, S2: FINISH},
       Orderings: {S1 ≺ S2},
       Bindings:  ∅,
       Links:     ∅)
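The four components of Definition 7.6 map onto a small data structure; the encodings (pairs for orderings and bindings, triples for causal links) are illustrative choices.

```python
from dataclasses import dataclass, field

# The plan data structure of Definition 7.6.
@dataclass
class Plan:
    steps: dict = field(default_factory=dict)      # step name -> operator
    orderings: set = field(default_factory=set)    # (Si, Sj): Si before Sj
    bindings: set = field(default_factory=set)     # (v, x): variable binding constraints
    links: set = field(default_factory=set)        # (Si, c, Sj): Si achieves c for Sj

def initial_plan():
    return Plan(steps={"START": None, "FINISH": None},
                orderings={("START", "FINISH")})

p = initial_plan()
assert ("START", "FINISH") in p.orderings
assert p.links == set() and p.bindings == set()
```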

7. Planning 2. STRIPS

[Figure: a partial-order plan for putting on shoes and socks (Start; Left Sock ≺ Left Shoe; Right Sock ≺ Right Shoe; Finish with goal LeftShoeOn ∧ RightShoeOn) and its six linearizations as total-order plans.]

Question: What is a solution?

Answer: Considering only fully instantiated, linearly ordered plans, checking is easy. But our case is far more complicated:

Definition 7.7 (Solution of a Plan)
A solution is a complete and consistent plan.
  complete: each precondition of each step is achieved by some other step;
  consistent: there are no contradictions in the ordering or binding constraints.

More precisely: "S_i achieves c for S_j" means
  c ∈ Precondition(S_j), c ∈ Effect(S_i), S_i ≺ S_j, and
  there is no S_k with ¬c ∈ Effect(S_k) and S_i ≺ S_k ≺ S_j in any linearization of the plan.
"No contradictions" means neither (S_i ≺ S_j and S_j ≺ S_i) nor (v = A and v = B for different constants A, B). Note: these propositions may be derivable, because ≺ and = are transitive.

7. Planning 3. Partial-Order Planning

7.3 Partial-Order Planning
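The consistency half of Definition 7.7 amounts to checking that the ordering constraints can be linearized at all, i.e. that the "before" relation is acyclic. A minimal cycle check (depth-first search over the illustrative pair encoding of orderings) looks like this:

```python
# Consistent orderings = no cycle in the "before" relation, so a
# linearization (topological order) of the steps exists.
def consistent(steps, orderings):
    succ = {s: set() for s in steps}
    for a, b in orderings:
        succ[a].add(b)
    seen, stack = set(), set()

    def cyclic(s):
        if s in stack:
            return True            # found a back edge: the constraints contradict
        if s in seen:
            return False
        seen.add(s)
        stack.add(s)
        bad = any(cyclic(t) for t in succ[s])
        stack.discard(s)
        return bad

    return not any(cyclic(s) for s in steps)

assert consistent({"A", "B", "C"}, {("A", "B"), ("B", "C")})
assert not consistent({"A", "B"}, {("A", "B"), ("B", "A")})
```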

7. Planning 3. Partial-Order Planning

We consider the banana-milk example. The operators are Buy and Go.

[Figure: the plan is refined step by step. Start supplies At(Home), Sells(SM, Milk), Sells(SM, Bananas), Sells(HWS, Drill). The steps Go(HWS) and Go(SM) achieve the At preconditions of Buy(Drill), Buy(Milk) and Buy(Bananas), which in turn achieve the preconditions Have(Drill), Have(Milk), Have(Bananas) of Finish.]

Causal links are protected!

The last partial plan cannot be extended. How do we determine this? By determining that a causal link is threatened and that the threat cannot be resolved.

[Figure: (a) a step S_2 with effect ¬c threatens the causal link S_1 --c--> S_3; the threat is resolved by (b) promotion (ordering S_2 before S_1) or (c) demotion (ordering S_2 after S_3).]

The way to resolve a threat is to add ordering constraints (this will not always work!).

Question: We have to introduce a Go-step in order to ensure the last precondition, At(Home) of Finish. But how can we ensure the precondition of this Go-step? Now there are a lot of threats, and many of them are unresolvable. This leads to the partial plan extended with the step Go(Home).
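Threat detection can be sketched over the illustrative pair/triple encodings used in this chapter: a step S_k with effect ¬c threatens a causal link S_i --c--> S_j if the current ordering constraints still allow S_k to fall between S_i and S_j. For brevity the sketch tests only directly recorded orderings, not their transitive closure.

```python
# A step threatens a causal link (si, c, sj) if it can delete c between si and sj.
def threatens(link, step, effects, orderings):
    si, c, sj = link
    if ("not", c) not in effects.get(step, set()):
        return False                       # step does not delete c at all
    # step can still fall between si and sj unless ordered outside the link
    return (step, si) not in orderings and (sj, step) not in orderings

link = ("Go(SM)", "At(SM)", "Buy(Milk)")
effects = {"Go(Home)": {("not", "At(SM)")}}
orderings = set()
assert threatens(link, "Go(Home)", effects, orderings)

orderings.add(("Buy(Milk)", "Go(Home)"))   # demotion: order the threat after Buy(Milk)
assert not threatens(link, "Go(Home)", effects, orderings)
```

Promotion would instead add ("Go(Home)", "Go(SM)"); either ordering removes the threat, and POP backtracks if neither is consistent.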

7. Planning 3. Partial-Order Planning

[Figure: the final plan. Start ≺ Go(HWS) ≺ Buy(Drill) ≺ Go(SM) ≺ Buy(Milk), Buy(Bananas) ≺ Go(Home) ≺ Finish, with causal links for At(HWS), Sells(HWS, Drill), At(SM), Sells(SM, Milk), Sells(SM, Bananas), and At(Home).]

This leads to the following algorithm:
  In each round the plan is extended in order to achieve the precondition of some step. This is done by choosing an appropriate operator; the respective causal link is introduced.
  Threats are resolved through ordering constraints (two cases: the new step threatens existing links, or existing steps threaten the new link).
  If there is no suitable operator, or a threat cannot be resolved, then backtrack.

Theorem 7.8 (POP)
POP is complete and correct.

function POP(initial, goal, operators) returns plan
  plan <- MAKE-MINIMAL-PLAN(initial, goal)
  loop do
    if SOLUTION?(plan) then return plan
    S_need, c <- SELECT-SUBGOAL(plan)
    CHOOSE-OPERATOR(plan, operators, S_need, c)
    RESOLVE-THREATS(plan)
  end

function SELECT-SUBGOAL(plan) returns S_need, c
  pick a plan step S_need from STEPS(plan) with a precondition c that has not been achieved
  return S_need, c

procedure CHOOSE-OPERATOR(plan, operators, S_need, c)
  choose a step S_add from operators or STEPS(plan) that has c as an effect
  if there is no such step then fail
  add the causal link S_add --c--> S_need to LINKS(plan)
  add the ordering constraint S_add ≺ S_need to ORDERINGS(plan)
  if S_add is a newly added step from operators then
    add S_add to STEPS(plan)
    add Start ≺ S_add ≺ Finish to ORDERINGS(plan)

So far we did not consider variable substitutions.

Question: Suppose S_1 ensures the At(home) precondition of a step S_2, and there is a concurrent step S_3 with the postcondition ¬At(x).
Is this a threat?

procedure RESOLVE-THREATS(plan)
  for each S_threat that threatens a link S_i --c--> S_j in LINKS(plan) do
    choose either
      Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
      Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
    if not CONSISTENT(plan) then fail
  end

7. Planning 3. Partial-Order Planning

We call such a threat a possible threat, ignore it for the time being, and keep it in mind. If x is later instantiated with home, then a real threat emerges, which has to be resolved.

[Figure: S_1 achieves At(home) for S_2; the concurrent step S_3 has the effect ¬At(x).]

procedure CHOOSE-OPERATOR(plan, operators, S_need, c)
  choose a step S_add from operators or STEPS(plan) that has c_add as an effect
    such that u = UNIFY(c, c_add, BINDINGS(plan))
  if there is no such step then fail
  add u to BINDINGS(plan)
  add the causal link S_add --c--> S_need to LINKS(plan)
  add the ordering constraint S_add ≺ S_need to ORDERINGS(plan)
  if S_add is a newly added step from operators then
    add S_add to STEPS(plan)
    add Start ≺ S_add ≺ Finish to ORDERINGS(plan)

procedure RESOLVE-THREATS(plan)
  for each S_i --c--> S_j in LINKS(plan) do
    for each S_threat in STEPS(plan) do
      for each c' in EFFECT(S_threat) do
        if SUBST(BINDINGS(plan), c) = SUBST(BINDINGS(plan), ¬c') then
          choose either
            Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
            Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
          if not CONSISTENT(plan) then fail
  end

7. Planning 3. Partial-Order Planning

STRIPS was originally designed for SHAKEY, a small mobile robot. SHAKEY is described through six operators: Go(x), Push(b, x, y), Climb(b), Down(b), Turn_On(ls), Turn_Off(ls).

[Figure: SHAKEY's world: four rooms along a corridor, doors Door 1 to Door 4, light switches Ls1 to Ls4, and boxes Box1 to Box4 in Room 1.]

To turn the lights on or off, SHAKEY has to stand on a box. On(Shakey, floor) is a precondition of the Go action, so that SHAKEY does not fall off a box.
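The UNIFY call in the variable-aware CHOOSE-OPERATOR can be sketched with a deliberately tiny unifier: atoms are tuples, variables are strings starting with '?'. It omits the occurs check and does not apply bindings during recursion, which is enough to decide whether an effect like At(?x) can achieve a precondition like At(home).

```python
# Minimal unification sketch (no occurs check; sufficient for flat atoms).
def unify(a, b, subst):
    if subst is None or a == b:
        return subst
    if isinstance(a, str) and a.startswith("?"):
        return {**subst, a: b}             # bind variable a to b
    if isinstance(b, str) and b.startswith("?"):
        return {**subst, b: a}             # bind variable b to a
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)     # unify argument by argument
        return subst
    return None                            # clash: no unifier

assert unify(("At", "?x"), ("At", "home"), {}) == {"?x": "home"}
assert unify(("At", "pub"), ("At", "home"), {}) is None
```

With this in place, the possible threat of the slides is exactly a pair whose literals unify under the current bindings: harmless while ?x is unbound, real once ?x = home.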


More information

Logic in Computer Science. Frank Wolter

Logic in Computer Science. Frank Wolter Logic in Computer Science Frank Wolter Meta Information Slides, exercises, and other relevant information are available at: http://www.liv.ac.uk/~frank/teaching/comp118/comp118.html The module has 18 lectures.

More information

Theory of Computer Science. Theory of Computer Science. E1.1 Motivation. E1.2 How to Measure Runtime? E1.3 Decision Problems. E1.

Theory of Computer Science. Theory of Computer Science. E1.1 Motivation. E1.2 How to Measure Runtime? E1.3 Decision Problems. E1. Theory of Computer Science May 18, 2016 E1. Complexity Theory: Motivation and Introduction Theory of Computer Science E1. Complexity Theory: Motivation and Introduction Malte Helmert University of Basel

More information

Class 23: Gödel s Theorem

Class 23: Gödel s Theorem PS5 How are commercial databases different from what you implemented for PS5? Class 23: Gödel s Theorem UVa s Integrated Systems Project to convert all University information systems to use an Oracle database

More information

CISC681/481 AI Midterm Exam

CISC681/481 AI Midterm Exam CISC681/481 AI Midterm Exam You have from 12:30 to 1:45pm to complete the following four questions. Use the back of the page if you need more space. Good luck! 1. Definitions (3 points each) Define the

More information

Models of Computation

Models of Computation Models of Computation Analysis of Algorithms Week 1, Lecture 2 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Models of Computation (RAM) a) Random Access Machines

More information

Logic: Intro & Propositional Definite Clause Logic

Logic: Intro & Propositional Definite Clause Logic Logic: Intro & Propositional Definite Clause Logic Alan Mackworth UBC CS 322 Logic 1 February 27, 2013 P & M extbook 5.1 Lecture Overview Recap: CSP planning Intro to Logic Propositional Definite Clause

More information

CS 380: ARTIFICIAL INTELLIGENCE

CS 380: ARTIFICIAL INTELLIGENCE CS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING 11/11/2013 Santiago Ontañón santi@cs.drexel.edu https://www.cs.drexel.edu/~santi/teaching/2013/cs380/intro.html Summary so far: Rational Agents Problem

More information

Introduction. Pedro Cabalar. Department of Computer Science University of Corunna, SPAIN 2013/2014

Introduction. Pedro Cabalar. Department of Computer Science University of Corunna, SPAIN 2013/2014 Introduction Pedro Cabalar Department of Computer Science University of Corunna, SPAIN cabalar@udc.es 2013/2014 P. Cabalar ( Department Introduction of Computer Science University of Corunna, SPAIN2013/2014

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificial Intelligence Spring 2007 Lecture 8: Logical Agents - I 2/8/2007 Srini Narayanan ICSI and UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell or Andrew Moore

More information

Logical Agent & Propositional Logic

Logical Agent & Propositional Logic Logical Agent & Propositional Logic Berlin Chen 2005 References: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 7 2. S. Russell s teaching materials Introduction The representation

More information

Computer Science and Logic A Match Made in Heaven

Computer Science and Logic A Match Made in Heaven A Match Made in Heaven Luca Aceto Reykjavik University Reykjavik, 3 April 2009 Thanks to Moshe Vardi from whom I have drawn inspiration (read stolen ideas ) for this presentation. Why This Talk Today?

More information

Searching in non-deterministic, partially observable and unknown environments

Searching in non-deterministic, partially observable and unknown environments Searching in non-deterministic, partially observable and unknown environments CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2014 Soleymani Artificial Intelligence:

More information

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds

More information

Chapter 7 R&N ICS 271 Fall 2017 Kalev Kask

Chapter 7 R&N ICS 271 Fall 2017 Kalev Kask Set 6: Knowledge Representation: The Propositional Calculus Chapter 7 R&N ICS 271 Fall 2017 Kalev Kask Outline Representing knowledge using logic Agent that reason logically A knowledge based agent Representing

More information

Solutions. Dynamic Programming & Optimal Control ( ) Number of Problems: 4. Use only the provided sheets for your solutions.

Solutions. Dynamic Programming & Optimal Control ( ) Number of Problems: 4. Use only the provided sheets for your solutions. Final Exam January 26th, 2012 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D Andrea Solutions Exam Duration: 150 minutes Number of Problems: 4 Permitted aids: One A4 sheet of paper. Use

More information

INTRODUCTION TO NONMONOTONIC REASONING

INTRODUCTION TO NONMONOTONIC REASONING Faculty of Computer Science Chair of Automata Theory INTRODUCTION TO NONMONOTONIC REASONING Anni-Yasmin Turhan Dresden, WS 2017/18 About the Course Course Material Book "Nonmonotonic Reasoning" by Grigoris

More information

Lecture 1: March 7, 2018

Lecture 1: March 7, 2018 Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights

More information

Automata Theory. Definition. Computational Complexity Theory. Computability Theory

Automata Theory. Definition. Computational Complexity Theory. Computability Theory Outline THEORY OF COMPUTATION CS363, SJTU What is Theory of Computation? History of Computation Branches and Development Xiaofeng Gao Dept. of Computer Science Shanghai Jiao Tong University 2 The Essential

More information

Topics in Model-Based Reasoning

Topics in Model-Based Reasoning Towards Integration of Proving and Solving Dipartimento di Informatica Università degli Studi di Verona Verona, Italy March, 2014 Automated reasoning Artificial Intelligence Automated Reasoning Computational

More information

Course 395: Machine Learning

Course 395: Machine Learning Course 395: Machine Learning Lecturers: Maja Pantic (maja@doc.ic.ac.uk) Stavros Petridis (sp104@doc.ic.ac.uk) Goal (Lectures): To present basic theoretical concepts and key algorithms that form the core

More information

Limits of Computation

Limits of Computation The real danger is not that computers will begin to think like men, but that men will begin to think like computers Limits of Computation - Sydney J. Harris What makes you believe now that I am just talking

More information

Searching in non-deterministic, partially observable and unknown environments

Searching in non-deterministic, partially observable and unknown environments Searching in non-deterministic, partially observable and unknown environments CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2017 Soleymani Artificial Intelligence:

More information

Lecture 14 Rosser s Theorem, the length of proofs, Robinson s Arithmetic, and Church s theorem. Michael Beeson

Lecture 14 Rosser s Theorem, the length of proofs, Robinson s Arithmetic, and Church s theorem. Michael Beeson Lecture 14 Rosser s Theorem, the length of proofs, Robinson s Arithmetic, and Church s theorem Michael Beeson The hypotheses needed to prove incompleteness The question immediate arises whether the incompleteness

More information

Introduction to Artificial Intelligence. Logical Agents

Introduction to Artificial Intelligence. Logical Agents Introduction to Artificial Intelligence Logical Agents (Logic, Deduction, Knowledge Representation) Bernhard Beckert UNIVERSITÄT KOBLENZ-LANDAU Winter Term 2004/2005 B. Beckert: KI für IM p.1 Outline Knowledge-based

More information

CS187 - Science Gateway Seminar for CS and Math

CS187 - Science Gateway Seminar for CS and Math CS187 - Science Gateway Seminar for CS and Math Fall 2013 Class 3 Sep. 10, 2013 What is (not) Computer Science? Network and system administration? Playing video games? Learning to use software packages?

More information

Kecerdasan Buatan M. Ali Fauzi

Kecerdasan Buatan M. Ali Fauzi Kecerdasan Buatan M. Ali Fauzi Artificial Intelligence M. Ali Fauzi Logical Agents M. Ali Fauzi In which we design agents that can form representations of the would, use a process of inference to derive

More information

COMP219: Artificial Intelligence. Lecture 19: Logic for KR

COMP219: Artificial Intelligence. Lecture 19: Logic for KR COMP219: Artificial Intelligence Lecture 19: Logic for KR 1 Overview Last time Expert Systems and Ontologies Today Logic as a knowledge representation scheme Propositional Logic Syntax Semantics Proof

More information

IS-ZC444: ARTIFICIAL INTELLIGENCE

IS-ZC444: ARTIFICIAL INTELLIGENCE IS-ZC444: ARTIFICIAL INTELLIGENCE Lecture-07: Beyond Classical Search Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems, BITS Pilani, Pilani, Jhunjhunu-333031,

More information

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 10, 5/9/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Logical Agents Chapter 7

More information

Introduction to Arti Intelligence

Introduction to Arti Intelligence Introduction to Arti Intelligence cial Lecture 4: Constraint satisfaction problems 1 / 48 Constraint satisfaction problems: Today Exploiting the representation of a state to accelerate search. Backtracking.

More information

CS 570: Machine Learning Seminar. Fall 2016

CS 570: Machine Learning Seminar. Fall 2016 CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or

More information

Logical Agent & Propositional Logic

Logical Agent & Propositional Logic Logical Agent & Propositional Logic Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. S. Russell and P. Norvig. Artificial Intelligence:

More information

Answering. Question Answering: Result. Application of the Theorem Prover: Question. Overview. Question Answering: Example.

Answering. Question Answering: Result. Application of the Theorem Prover: Question. Overview. Question Answering: Example. ! &0/ $% &) &% &0/ $% &) &% 5 % 4! pplication of the Theorem rover: Question nswering iven a database of facts (ground instances) and axioms we can pose questions in predicate calculus and answer them

More information

A.I.: Beyond Classical Search

A.I.: Beyond Classical Search A.I.: Beyond Classical Search Random Sampling Trivial Algorithms Generate a state randomly Random Walk Randomly pick a neighbor of the current state Both algorithms asymptotically complete. Overview Previously

More information

Introduction. An Introduction to Algorithms and Data Structures

Introduction. An Introduction to Algorithms and Data Structures Introduction An Introduction to Algorithms and Data Structures Overview Aims This course is an introduction to the design, analysis and wide variety of algorithms (a topic often called Algorithmics ).

More information

7. Propositional Logic. Wolfram Burgard and Bernhard Nebel

7. Propositional Logic. Wolfram Burgard and Bernhard Nebel Foundations of AI 7. Propositional Logic Rational Thinking, Logic, Resolution Wolfram Burgard and Bernhard Nebel Contents Agents that think rationally The wumpus world Propositional logic: syntax and semantics

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 7. Propositional Logic Rational Thinking, Logic, Resolution Wolfram Burgard, Maren Bennewitz, and Marco Ragni Albert-Ludwigs-Universität Freiburg Contents 1 Agents

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 21: Introduction to NP-Completeness 12/9/2015 Daniel Bauer Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 7. Propositional Logic Rational Thinking, Logic, Resolution Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität Freiburg May 17, 2016

More information

Undecidable Problems. Z. Sawa (TU Ostrava) Introd. to Theoretical Computer Science May 12, / 65

Undecidable Problems. Z. Sawa (TU Ostrava) Introd. to Theoretical Computer Science May 12, / 65 Undecidable Problems Z. Sawa (TU Ostrava) Introd. to Theoretical Computer Science May 12, 2018 1/ 65 Algorithmically Solvable Problems Let us assume we have a problem P. If there is an algorithm solving

More information

1 Reductions and Expressiveness

1 Reductions and Expressiveness 15-451/651: Design & Analysis of Algorithms November 3, 2015 Lecture #17 last changed: October 30, 2015 In the past few lectures we have looked at increasingly more expressive problems solvable using efficient

More information

Intelligent Agents. Pınar Yolum Utrecht University

Intelligent Agents. Pınar Yolum Utrecht University Intelligent Agents Pınar Yolum p.yolum@uu.nl Utrecht University Logical Agents (Based mostly on the course slides from http://aima.cs.berkeley.edu/) Outline Knowledge-based agents Wumpus world Logic in

More information

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig First-Order Logic First-Order Theories Roopsha Samanta Partly based on slides by Aaron Bradley and Isil Dillig Roadmap Review: propositional logic Syntax and semantics of first-order logic (FOL) Semantic

More information

AUTOMATED REASONING. Agostino Dovier. Udine, October 2, Università di Udine CLPLAB

AUTOMATED REASONING. Agostino Dovier. Udine, October 2, Università di Udine CLPLAB AUTOMATED REASONING Agostino Dovier Università di Udine CLPLAB Udine, October 2, 2017 AGOSTINO DOVIER (CLPLAB) AUTOMATED REASONING UDINE, OCTOBER 2, 2017 1 / 28 COURSE PLACEMENT International Master Degree

More information

CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence. Bayes Nets CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

More information

Inf2D 06: Logical Agents: Knowledge Bases and the Wumpus World

Inf2D 06: Logical Agents: Knowledge Bases and the Wumpus World Inf2D 06: Logical Agents: Knowledge Bases and the Wumpus World School of Informatics, University of Edinburgh 26/01/18 Slide Credits: Jacques Fleuriot, Michael Rovatsos, Michael Herrmann Outline Knowledge-based

More information

Learning from Observations. Chapter 18, Sections 1 3 1

Learning from Observations. Chapter 18, Sections 1 3 1 Learning from Observations Chapter 18, Sections 1 3 Chapter 18, Sections 1 3 1 Outline Learning agents Inductive learning Decision tree learning Measuring learning performance Chapter 18, Sections 1 3

More information

Solving Problems By Searching

Solving Problems By Searching Solving Problems By Searching Instructor: Dr. Wei Ding Fall 2010 1 Problem-Solving Agent Goal-based agents: considering future actions and the desirability of their outcomes. Decide what to do by finding

More information

COMP219: Artificial Intelligence. Lecture 19: Logic for KR

COMP219: Artificial Intelligence. Lecture 19: Logic for KR COMP219: Artificial Intelligence Lecture 19: Logic for KR 1 Overview Last time Expert Systems and Ontologies Today Logic as a knowledge representation scheme Propositional Logic Syntax Semantics Proof

More information

The State Explosion Problem

The State Explosion Problem The State Explosion Problem Martin Kot August 16, 2003 1 Introduction One from main approaches to checking correctness of a concurrent system are state space methods. They are suitable for automatic analysis

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

The roots of computability theory. September 5, 2016

The roots of computability theory. September 5, 2016 The roots of computability theory September 5, 2016 Algorithms An algorithm for a task or problem is a procedure that, if followed step by step and without any ingenuity, leads to the desired result/solution.

More information

Logical Agents: Propositional Logic. Chapter 7

Logical Agents: Propositional Logic. Chapter 7 Logical Agents: Propositional Logic Chapter 7 Outline Topics: Knowledge-based agents Example domain: The Wumpus World Logic in general models and entailment Propositional (Boolean) logic Equivalence, validity,

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 rtificial Intelligence Spring 2018 Logical gents Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart Russell

More information

Algorithms and Complexity theory

Algorithms and Complexity theory Algorithms and Complexity theory Thibaut Barthelemy Some slides kindly provided by Fabien Tricoire University of Vienna WS 2014 Outline 1 Algorithms Overview How to write an algorithm 2 Complexity theory

More information

Strong AI vs. Weak AI Automated Reasoning

Strong AI vs. Weak AI Automated Reasoning Strong AI vs. Weak AI Automated Reasoning George F Luger ARTIFICIAL INTELLIGENCE 6th edition Structures and Strategies for Complex Problem Solving Artificial intelligence can be classified into two categories:

More information

Mathematical Logic. Introduction to Reasoning and Automated Reasoning. Hilbert-style Propositional Reasoning. Chiara Ghidini. FBK-IRST, Trento, Italy

Mathematical Logic. Introduction to Reasoning and Automated Reasoning. Hilbert-style Propositional Reasoning. Chiara Ghidini. FBK-IRST, Trento, Italy Introduction to Reasoning and Automated Reasoning. Hilbert-style Propositional Reasoning. FBK-IRST, Trento, Italy Deciding logical consequence Problem Is there an algorithm to determine whether a formula

More information

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. CS 188 Fall 2015 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

More information

Short Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about:

Short Course: Multiagent Systems. Multiagent Systems. Lecture 1: Basics Agents Environments. Reinforcement Learning. This course is about: Short Course: Multiagent Systems Lecture 1: Basics Agents Environments Reinforcement Learning Multiagent Systems This course is about: Agents: Sensing, reasoning, acting Multiagent Systems: Distributed

More information

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)

More information

P, NP, NP-Complete, and NPhard

P, NP, NP-Complete, and NPhard P, NP, NP-Complete, and NPhard Problems Zhenjiang Li 21/09/2011 Outline Algorithm time complicity P and NP problems NP-Complete and NP-Hard problems Algorithm time complicity Outline What is this course

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Gödel s Theorem: Limits of logic and computation

Gödel s Theorem: Limits of logic and computation Gödel s Theorem: Limits of logic and computation David Keil (dkeil@frc.mass.edu) Framingham State College Math/CS Faculty Seminar March 27, 2003 1 Overview Kurt Gödel, 1931, at age 25, in Vienna, shook

More information

Nondeterministic finite automata

Nondeterministic finite automata Lecture 3 Nondeterministic finite automata This lecture is focused on the nondeterministic finite automata (NFA) model and its relationship to the DFA model. Nondeterminism is an important concept in the

More information

Introduction to Intelligent Systems: Homework 2

Introduction to Intelligent Systems: Homework 2 Introduction to Intelligent Systems: Homework 2 lvin Lin - Section 1 ugust 2017 - December 2017 Problem 1 For each of the following, gives a PES description of the task and given solver of the tasks. There

More information

Lecture 24: Gödel s s Proof. Microsoft Foundation Classes. Story So Far. Quiz? PS6 Classes. sim-object. physical-object. place. mobile-object.

Lecture 24: Gödel s s Proof. Microsoft Foundation Classes. Story So Far. Quiz? PS6 Classes. sim-object. physical-object. place. mobile-object. Lecture 24: Gödel s s Proof CS150: Computer Science University of Virginia Computer Science David Evans http://www.cs.virginia.edu/evans PS6 Classes thing physical-object mobile-object person sim-object

More information

Try to find a good excuse!

Try to find a good excuse! Try to find a good excuse! BRA-2015 (Workshop on Belief Revision and Argumentation) Bernhard Nebel & Moritz Göbelbecker Department of Computer Science Foundations of Artificial Intelligence Finding excuses

More information

COMP3702/7702 Artificial Intelligence Week 5: Search in Continuous Space with an Application in Motion Planning " Hanna Kurniawati"

COMP3702/7702 Artificial Intelligence Week 5: Search in Continuous Space with an Application in Motion Planning  Hanna Kurniawati COMP3702/7702 Artificial Intelligence Week 5: Search in Continuous Space with an Application in Motion Planning " Hanna Kurniawati" Last week" Main components of PRM" Collision check for a configuration"

More information

Acknowledgments 2. Part 0: Overview 17

Acknowledgments 2. Part 0: Overview 17 Contents Acknowledgments 2 Preface for instructors 11 Which theory course are we talking about?.... 12 The features that might make this book appealing. 13 What s in and what s out............... 14 Possible

More information