Introduction to Artificial Intelligence. Learning from Observations

Introduction to Artificial Intelligence
Learning from Observations

Bernhard Beckert
UNIVERSITÄT KOBLENZ-LANDAU
Winter Term 2004/2005

Outline

- Learning agents
- Inductive learning
- Decision tree learning

Learning

Reasons for learning:
- Learning is essential for unknown environments, i.e., when the designer lacks omniscience.
- Learning is useful as a system construction method: expose the agent to reality rather than trying to write everything down.
- Learning modifies the agent's decision mechanisms to improve performance.

Learning Agents

Learning Element

The design of the learning element is dictated by:
- what type of performance element is used
- which functional component is to be learned
- how that functional component is represented
- what kind of feedback is available

Types of Learning

Supervised learning:
- correct answers for each example instance are known
- requires a teacher

Reinforcement learning:
- occasional rewards
- learning is harder
- requires no teacher

Inductive Learning (a.k.a. Science)

Simplest form: learn a function f from examples (tabula rasa), i.e., find a hypothesis h such that h ≈ f, given a training set of examples.
- f is the target function
- an example is a pair ⟨x, f(x)⟩

Example (for an example): a tic-tac-toe board position x together with its value, i.e., the pair ⟨board, +1⟩.

Inductive Learning Method

This is a highly simplified model of real learning. It
- ignores prior knowledge
- assumes a deterministic, observable environment
- assumes examples are given
- assumes that the agent wants to learn f (why?)

Inductive Learning Method

Idea: construct/adjust h to agree with f on the training set.
h is consistent if it agrees with f on all examples.

Example: curve fitting (the slides show a sequence of increasingly complex curves fitted to the same data points).

Ockham's razor: maximize a combination of consistency and simplicity.
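
To make the consistency-vs-simplicity trade-off concrete, here is a minimal curve-fitting sketch in Python; the data points are invented for illustration. A degree-4 polynomial is perfectly consistent with five points, but the simpler line usually predicts new points better.

```python
# Minimal sketch of Ockham's razor in curve fitting (illustrative data).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])   # roughly linear, slightly noisy

h_simple = np.polyfit(x, y, deg=1)    # simple hypothesis: a straight line
h_complex = np.polyfit(x, y, deg=4)   # 5 coefficients: fits all 5 points exactly

# Both agree well with f on the training set, but the complex fit typically
# extrapolates much worse, e.g. at x = 6:
print(np.polyval(h_simple, 6.0), np.polyval(h_complex, 6.0))
```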

Attribute-based Representations

An example description consists of:
- attribute values (Boolean, discrete, continuous, etc.)
- a target value

Attribute-based Representations

Example: situations where I will/won't wait for a table in a restaurant.

Exmpl  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1     T    F    F    T    Some  $$$    F     T    French   0-10   T
X2     T    F    F    T    Full  $      F     F    Thai     30-60  F
X3     F    T    F    F    Some  $      F     F    Burger   0-10   T
X4     T    F    T    T    Full  $      F     F    Thai     10-30  T
X5     T    F    T    F    Full  $$$    F     T    French   >60    F
X6     F    T    F    T    Some  $$     T     T    Italian  0-10   T
X7     F    T    F    F    None  $      T     F    Burger   0-10   F
X8     F    F    F    T    Some  $$     T     T    Thai     0-10   T
X9     F    T    T    F    Full  $      T     F    Burger   >60    F
X10    T    T    T    T    Full  $$$    F     T    Italian  10-30  F
X11    F    F    F    F    None  $      F     F    Thai     0-10   F
X12    T    T    T    T    Full  $      F     F    Burger   30-60  T
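
For the later sketches it helps to have this table in machine-readable form. The following encoding (attribute dict plus Boolean target) is my own choice, not part of the slides:

```python
# The twelve restaurant examples from the table above, one tuple per row.
ATTRS = ["Alt", "Bar", "Fri", "Hun", "Pat", "Price", "Rain", "Res", "Type", "Est"]

ROWS = [
    # Alt  Bar  Fri  Hun  Pat     Price  Rain Res  Type       Est      WillWait
    ("T", "F", "F", "T", "Some", "$$$", "F", "T", "French",  "0-10",  "T"),  # X1
    ("T", "F", "F", "T", "Full", "$",   "F", "F", "Thai",    "30-60", "F"),  # X2
    ("F", "T", "F", "F", "Some", "$",   "F", "F", "Burger",  "0-10",  "T"),  # X3
    ("T", "F", "T", "T", "Full", "$",   "F", "F", "Thai",    "10-30", "T"),  # X4
    ("T", "F", "T", "F", "Full", "$$$", "F", "T", "French",  ">60",   "F"),  # X5
    ("F", "T", "F", "T", "Some", "$$",  "T", "T", "Italian", "0-10",  "T"),  # X6
    ("F", "T", "F", "F", "None", "$",   "T", "F", "Burger",  "0-10",  "F"),  # X7
    ("F", "F", "F", "T", "Some", "$$",  "T", "T", "Thai",    "0-10",  "T"),  # X8
    ("F", "T", "T", "F", "Full", "$",   "T", "F", "Burger",  ">60",   "F"),  # X9
    ("T", "T", "T", "T", "Full", "$$$", "F", "T", "Italian", "10-30", "F"),  # X10
    ("F", "F", "F", "F", "None", "$",   "F", "F", "Thai",    "0-10",  "F"),  # X11
    ("T", "T", "T", "T", "Full", "$",   "F", "F", "Burger",  "30-60", "T"),  # X12
]

# examples: list of (attribute-value dict, Boolean target) pairs
examples = [(dict(zip(ATTRS, row[:-1])), row[-1] == "T") for row in ROWS]
```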

Decision Trees

A possible representation for hypotheses.

Example: the correct tree for deciding whether to wait (shown as a figure on the slide).

Decision Trees: Properties

- Decision trees can approximate any function of the input attributes (the correct decision tree may be infinite).
- Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic).
- But a tree that merely memorizes the training examples probably won't generalize to new examples.
- Compact decision trees are therefore preferable.
- A more expressive hypothesis space
  - increases the chance that the target function can be expressed,
  - but also increases the number of hypotheses consistent with the training set,
  - and so may yield worse predictions.

Decision Trees: Example

For Boolean functions: truth-table row = path to leaf in the decision tree.
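
As a small illustration (my own example, not from the slides), here is f(a, b) = a XOR b written as nested attribute tests; each of the four truth-table rows selects exactly one root-to-leaf path:

```python
# f(a, b) = a XOR b as a decision tree of nested attribute tests.
def xor_tree(a: bool, b: bool) -> bool:
    if a:                    # root: test attribute a
        if b:                # path a=T, b=T -> leaf False
            return False
        return True          # path a=T, b=F -> leaf True
    if b:                    # path a=F, b=T -> leaf True
        return True
    return False             # path a=F, b=F -> leaf False

# One truth-table row per path:
for a in (False, True):
    for b in (False, True):
        print(a, b, xor_tree(a, b))
```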

Hypothesis Spaces

How many distinct decision trees are there with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)

Example: with 6 Boolean attributes, there are 2^64 = 18,446,744,073,709,551,616 trees.
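
A quick sanity check of this count (the 20-digit number on the slide appears at n = 6):

```python
# With n Boolean attributes there are 2**n truth-table rows, and each row can
# be labeled T or F independently, giving 2**(2**n) Boolean functions/trees.
for n in range(1, 7):
    print(n, 2 ** (2 ** n))
# n=6 prints 18446744073709551616, matching the slide.
```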

Decision Tree Learning

Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the most significant attribute as the root of the (sub)tree.

Choosing an Attribute

Idea: a good attribute splits the examples into subsets that are (ideally) all positive or all negative, i.e., it gives much information about the classification.

Example: Patrons is a better choice than, say, Type: its subsets are much closer to being all positive or all negative (see the sketch below).
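
The slides quantify "gives much information" via information gain (see the summary). Here is a sketch that computes it for Patrons and Type from the (positive, negative) counts in the restaurant table; the helper names are mine:

```python
# Entropy / information-gain sketch for the restaurant data; subset counts
# (positives, negatives) are read off the table above.
from math import log2

def entropy(p, n):
    """Entropy (in bits) of a Boolean classification with p pos., n neg."""
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -(q * log2(q) + (1 - q) * log2(1 - q))

def gain(subsets):
    """Information gain of a split, given (p, n) counts per subset."""
    p = sum(sp for sp, _ in subsets)
    n = sum(sn for _, sn in subsets)
    remainder = sum((sp + sn) / (p + n) * entropy(sp, sn) for sp, sn in subsets)
    return entropy(p, n) - remainder

# Patrons: None (0+, 2-), Some (4+, 0-), Full (2+, 4-)
print(gain([(0, 2), (4, 0), (2, 4)]))            # ~0.541 bits
# Type: French (1+, 1-), Italian (1+, 1-), Thai (2+, 2-), Burger (2+, 2-)
print(gain([(1, 1), (1, 1), (2, 2), (2, 2)]))    # 0.0 bits
```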

Decision Tree Learning: Algorithm

function DTL(examples, attributes, default) returns a decision tree
    if examples is empty then return default
    else if all examples have the same classification then return the classification
    else if attributes is empty then return MAJORITY-VALUE(examples)
    else
        best ← CHOOSE-ATTRIBUTE(attributes, examples)
        tree ← a new decision tree with root test best
        m ← MAJORITY-VALUE(examples)
        for each value v_i of best do
            examples_i ← {elements of examples with best = v_i}
            subtree ← DTL(examples_i, attributes − best, m)
            add a branch to tree with label v_i and subtree subtree
        return tree
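
Below is one way to realize this pseudocode in Python, assuming examples are (attribute-dict, target) pairs as in the dataset sketch above and implementing CHOOSE-ATTRIBUTE with information gain. This is an illustrative sketch, not the slides' reference code:

```python
# Python sketch of DTL; trees are (attribute, {value: subtree}) tuples,
# leaves are target values.
from collections import Counter
from math import log2

def majority_value(examples):
    """MAJORITY-VALUE: most common target among the examples."""
    return Counter(t for _, t in examples).most_common(1)[0][0]

def entropy(examples):
    counts = Counter(t for _, t in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def choose_attribute(attributes, examples):
    """CHOOSE-ATTRIBUTE: pick the attribute with highest information gain."""
    def gain(a):
        remainder = 0.0
        for v in {x[a] for x, _ in examples}:
            subset = [(x, t) for x, t in examples if x[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(examples) - remainder
    return max(attributes, key=gain)

def dtl(examples, attributes, default):
    if not examples:
        return default
    if len({t for _, t in examples}) == 1:
        return examples[0][1]            # all have the same classification
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    m = majority_value(examples)
    branches = {}
    # For simplicity we branch only on values of `best` that actually occur
    # in the examples; unseen values would fall back to the majority value m.
    for v in {x[best] for x, _ in examples}:
        examples_v = [(x, t) for x, t in examples if x[best] == v]
        branches[v] = dtl(examples_v, [a for a in attributes if a != best], m)
    return (best, branches)
```

Run on the twelve restaurant examples, dtl(examples, ATTRS, True) picks Pat (Patrons) as the root test, consistent with the attribute-choice slide.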

Example

Decision tree learned from the 12 examples (shown as a figure on the slide):
- substantially simpler than the true tree
- a more complex hypothesis isn't justified by such a small amount of data

Performance Measurement

Hume's Problem of Induction: how do we know that h ≈ f?
- Use theorems of computational/statistical learning theory.
- Try h on a new test set of examples (drawn from the same distribution over the example space as the training set).

Performance Measurement

Learning curve: % correct on the test set as a function of training set size.
(Figure: test-set accuracy rises from about 0.4 toward 1.0 as the training set grows from 0 to 100 examples.)
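
A sketch of how such a curve is measured: train on increasingly large prefixes of shuffled data, test on the held-out rest, and average over trials. `train` and `accuracy` below are placeholders for any learner and its evaluation (e.g., dtl above plus a tree-walking classifier), not a specific library API:

```python
import random

def learning_curve(data, train, accuracy, sizes, trials=20):
    """Average test-set accuracy as a function of training-set size."""
    curve = []
    for m in sizes:
        scores = []
        for _ in range(trials):
            random.shuffle(data)                  # fresh train/test split
            h = train(data[:m])                   # fit hypothesis on m examples
            scores.append(accuracy(h, data[m:]))  # evaluate on held-out rest
        curve.append(sum(scores) / trials)
    return curve
```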

Performance Measurement (cont.)

The shape of the learning curve depends on whether the target is realizable (the hypothesis space can express the target function) or non-realizable.

Non-realizability can be due to:
- missing attributes, or a restricted hypothesis class (e.g., thresholded linear functions)
- redundant expressiveness (e.g., loads of irrelevant attributes)

Summary

- Learning is needed for unknown environments (and for lazy designers).
- Learning agent = performance element + learning element.
- The learning method depends on the type of performance element, the available feedback, and the type of component to be improved.
- For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples.
- Decision tree learning uses information gain.
- Learning performance = prediction accuracy measured on a test set.