CS 687 Jana Kosecka. Uncertainty, Bayesian Networks. Russell and Norvig, Chapter 13 and Chapter 14 (Sections 14.1-14.3).

CS 687, Jana Kosecka. Uncertainty, Bayesian Networks. Chapter 13, Russell and Norvig; Chapter 14, Sections 14.1-14.3.

Outline. Uncertainty. Probability: syntax and semantics. Inference. Independence and Bayes' rule.

Syntax. Basic element: random variable. Similar to propositional logic: possible worlds are defined by an assignment of values to random variables. Boolean random variables, e.g. Cavity (do I have a cavity?), with domain <true, false>. Discrete random variables, e.g. Weather, with domain <sunny, rainy, cloudy, snow>. Domain values must be exhaustive and mutually exclusive. An elementary proposition is constructed by assigning a value to a random variable: e.g. Weather = sunny, Cavity = false (abbreviated as ¬cavity).

Syntax. Atomic event: a complete specification of the state of the world about which the agent is uncertain. E.g., if the world consists of only two Boolean variables, Cavity and Toothache, then there are 4 distinct atomic events: Cavity = false ∧ Toothache = false; Cavity = false ∧ Toothache = true; Cavity = true ∧ Toothache = false; Cavity = true ∧ Toothache = true. Atomic events are mutually exclusive and exhaustive.

Axioms of probability. For any propositions A, B: 0 ≤ P(A) ≤ 1; P(true) = 1 and P(false) = 0; P(A ∨ B) = P(A) + P(B) − P(A ∧ B).

Prior probability. Prior or unconditional probabilities of propositions, e.g. P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72, correspond to belief prior to the arrival of any new evidence. A probability distribution gives values for all possible assignments: P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e. sums to 1). A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables. P(Weather, Cavity) is a 4 × 2 matrix of values:

Weather =      | sunny | rainy | cloudy | snow
Cavity = true  | 0.144 | 0.02  | 0.016  | 0.02
Cavity = false | 0.576 | 0.08  | 0.064  | 0.08

Joint Distribution.

Weather =      | sunny | rainy | cloudy | snow
Cavity = true  | 0.144 | 0.02  | 0.016  | 0.02
Cavity = false | 0.576 | 0.08  | 0.064  | 0.08

Every question about the domain can be answered from the joint probability distribution.
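
As an illustration (not part of the original slides), here is a minimal Python sketch that stores the Weather/Cavity joint above and answers questions by summing entries, e.g. recovering P(Cavity = true) = 0.2 and P(Weather = sunny) = 0.72.

# Joint distribution P(Weather, Cavity) from the table above.
joint = {
    # (weather, cavity): probability
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Any question about the domain is a sum over the relevant entries.
p_cavity = sum(p for (w, c), p in joint.items() if c)             # P(cavity) = 0.2
p_sunny = sum(p for (w, c), p in joint.items() if w == "sunny")   # P(sunny) = 0.72
print(p_cavity, p_sunny)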

Conditional probability. Conditional or posterior probabilities, e.g. P(cavity | toothache) = 0.8, i.e. given that toothache is all I know. Notation for conditional distributions: P(Cavity | Toothache) is a 2-element vector of 2-element vectors. If we know more, e.g. cavity is also given, then we have P(cavity | toothache, cavity) = 1. New evidence may be irrelevant, allowing simplification, e.g. P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8.

Conditional probability. Definition of conditional probability: P(a | b) = P(a ∧ b) / P(b) if P(b) > 0. The product rule gives an alternative formulation: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a). A general version holds for whole distributions, e.g. P(Weather, Cavity) = P(Weather | Cavity) P(Cavity); view this as a set of 4 × 2 equations, not matrix multiplication. This is analogous to logical reasoning, where a logical agent cannot simultaneously believe A ∧ B, ¬A, and B. Where do probabilities come from? Frequentist, objectivist, subjectivist (Bayesian).
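
A small Python check of the definition and the product rule, using the Weather/Cavity table from the earlier slide (a sketch, not part of the original slides):

# P(a | b) = P(a, b) / P(b), and the product rule P(a, b) = P(a | b) P(b).
p_sunny_and_cavity = 0.144                    # joint entry from the table
p_cavity = 0.144 + 0.02 + 0.016 + 0.02        # marginal P(cavity) = 0.2

p_sunny_given_cavity = p_sunny_and_cavity / p_cavity   # 0.72
# The product rule recovers the joint entry:
assert abs(p_sunny_given_cavity * p_cavity - p_sunny_and_cavity) < 1e-12
print(p_sunny_given_cavity)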

Inference by enumeration. Start with the joint probability distribution. For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω).

Inference by enumeration. Start with the joint probability distribution. For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω). For example, P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2.

Inference by enumeration. Start with the joint probability distribution. For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω). P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2. P(toothache ∨ cavity) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28. The process of summing out (marginalization) sums over all possible values of the other variables: P(Y) = Σ_{z ∈ Z} P(Y, z), e.g. P(Cavity) = Σ_z P(Cavity, z) with Z = {Catch, Toothache}.

Inference by enumeration. Start with the joint probability distribution. We can also compute conditional probabilities: P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4.
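
The joint used on these slides is the standard AIMA Toothache/Catch/Cavity table; the two entries not quoted in the text (0.144 and 0.576) are filled in from that example. A minimal Python sketch of inference by enumeration reproducing the three answers above:

# Full joint P(Toothache, Catch, Cavity); keys are (toothache, catch, cavity).
joint = {
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def prob(holds):
    """P(φ): sum the atomic events (possible worlds) in which the proposition holds."""
    return sum(p for world, p in joint.items() if holds(*world))

p_toothache = prob(lambda t, c, cav: t)                   # 0.2
p_toothache_or_cavity = prob(lambda t, c, cav: t or cav)  # 0.28
p_not_cavity_given_toothache = prob(lambda t, c, cav: t and not cav) / p_toothache  # 0.4
print(p_toothache, p_toothache_or_cavity, p_not_cavity_given_toothache)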

Normalization. The denominator can be viewed as a normalization constant α: P(Cavity | toothache) = α P(Cavity, toothache) = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)] = α [<0.108, 0.016> + <0.012, 0.064>] = α <0.12, 0.08> = <0.6, 0.4>. General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.

Inference by enumeration contd. Typically we are interested in the posterior joint distribution of the query variables X given specific values e for the evidence variables E. Let the hidden variables be Y = all variables − X − E. Then the required summation of joint entries is done by summing out the hidden variables: P(X | E = e) = α P(X, E = e) = α Σ_y P(X, E = e, Y = y). The terms in the summation are joint entries, because X, E, and Y together exhaust the set of random variables. Obvious problems: 1. Worst-case time complexity O(d^n), where d is the largest arity. 2. Space complexity O(d^n) to store the joint distribution. 3. How to find the numbers for O(d^n) entries?
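
A generic sketch of the same procedure (my own helper names; it assumes the joint is stored with variable-name/value assignments as keys): fix the evidence, sum out the hidden variables, and normalize with α.

from itertools import product

# Full joint as {frozenset of (variable, value) pairs: probability}.
_vals = {
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}
joint = {frozenset([("Toothache", t), ("Catch", c), ("Cavity", cav)]): p
         for (t, c, cav), p in _vals.items()}
domains = {"Toothache": [True, False], "Catch": [True, False], "Cavity": [True, False]}

def enumerate_query(query_var, evidence, joint, domains):
    """P(query_var | evidence): fix the evidence, sum out the hidden variables, normalize."""
    hidden = [v for v in domains if v != query_var and v not in evidence]
    dist = {}
    for qval in domains[query_var]:
        total = 0.0
        for combo in product(*(domains[h] for h in hidden)):
            assignment = dict(evidence, **{query_var: qval}, **dict(zip(hidden, combo)))
            total += joint[frozenset(assignment.items())]
        dist[qval] = total
    alpha = 1.0 / sum(dist.values())   # the normalization constant α from the previous slide
    return {v: alpha * p for v, p in dist.items()}

print(enumerate_query("Cavity", {"Toothache": True}, joint, domains))
# ≈ {True: 0.6, False: 0.4}, matching the normalization slide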

Independence. A and B are independent iff P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B). E.g. P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather): 32 entries reduced to 12; for n independent biased coins, O(2^n) → O(n). Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent.

Product rule and Bayes' Rule. The product rule P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) gives Bayes' rule: P(a | b) = P(b | a) P(a) / P(b), or in distribution form P(Y | X) = P(X | Y) P(Y) / P(X). Useful for assessing diagnostic probability from causal probability: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect). Note: in the meningitis example on the next slide, the posterior probability of meningitis is still very small.

Bayes' Rule. Bayes' rule: P(Y | X) = P(X | Y) P(Y) / P(X). A more general version, conditionalized on some evidence e: P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e). E.g., let M be meningitis and S be stiff neck: P(m | s) = P(s | m) P(m) / P(s) = 0.8 × 0.001 / 0.1 = 0.008. Since the normalization is the same for m and ¬m, we can also write P(Y | X) = α P(X | Y) P(Y).
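
A one-line check of the arithmetic, taking the slide's values P(s | m) = 0.8, P(m) = 0.001, P(s) = 0.1 as given:

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
p_s_given_m, p_m, p_s = 0.8, 0.001, 0.1
print(p_s_given_m * p_m / p_s)   # 0.008 -- still very small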

Bayes' Rule and combining evidence. P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity). We can assume independence in the presence of Cavity: = α P(toothache | Cavity) P(catch | Cavity) P(Cavity). Given Cavity, toothache and catch are independent. In general, conditional independence: P(X, Y | Z) = P(X | Z) P(Y | Z).

Conditional independence. P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries. If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache, cavity) = P(catch | cavity). The same independence holds if I haven't got a cavity: (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity). Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity).

Conditional independence contd. Writing out the full joint distribution using the conditional independence assumption: P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity), i.e. 2 + 2 + 1 = 5 independent numbers. In most cases the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
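
A Python sketch of the five-number representation. The conditional probabilities below are derived from the joint table used earlier (e.g. P(toothache | cavity) = 0.12 / 0.2 = 0.6); the product P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) reconstructs every joint entry.

# Five independent numbers instead of 2^3 - 1 = 7.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

def joint_entry(toothache, catch, cavity):
    """P(toothache, catch, cavity) from the factored representation."""
    pt = p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    pc = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    pcav = p_cavity if cavity else 1 - p_cavity
    return pt * pc * pcav

print(joint_entry(True, True, True))    # ≈ 0.108, matching the full joint
print(joint_entry(True, False, False))  # ≈ 0.064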

Naïve Bayes. Naïve Bayes model: the effects are independent given the cause, P(Cause, Effect_1, ..., Effect_n) = P(Cause) Π_i P(Effect_i | Cause). This simplifying assumption often works well. Example: SPAM classification.
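
A tiny illustrative sketch for the spam example: the class is the cause, the presence of individual words are the effects, assumed independent given the class. All probabilities below are hypothetical numbers invented for illustration, not values from the slides.

# Hypothetical class priors and word-given-class probabilities.
p_class = {"spam": 0.4, "ham": 0.6}
p_word_given = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}

def score(cls, words):
    """P(Cause) * prod_i P(Effect_i | Cause), up to normalization."""
    s = p_class[cls]
    for w in words:
        s *= p_word_given[cls][w]
    return s

scores = {c: score(c, ["free"]) for c in p_class}
alpha = 1.0 / sum(scores.values())
print({c: alpha * s for c, s in scores.items()})   # posterior P(class | words)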

Slides from Dan Klein.

Probabilistic Classification. MAP classification rule (MAP: Maximum A Posteriori): assign x to c* if P(C = c* | x) > P(C = c_i | x) for all c_i ≠ c*, i = 1, 2, ..., L. Generative classification with the MAP rule: apply Bayes' rule to convert the priors P(C = c_i) and likelihoods P(x | C = c_i) into posterior probabilities P(C = c_i | x) ∝ P(x | C = c_i) P(C = c_i), then apply the MAP rule.

Naïve Bayes. Bayes classification: P(C | x) ∝ P(x_1, ..., x_n | C) P(C). Difficulty: learning the joint probability P(x_1, ..., x_n | C). Naïve Bayes classification assumes that all input attributes are conditionally independent given the class, so P(x_1, x_2, ..., x_n | C) = P(x_1 | C) P(x_2 | C) ... P(x_n | C). MAP classification rule: for x = (x_1, x_2, ..., x_n), assign the label c* if [P(x_1 | c*) ... P(x_n | c*)] P(c*) > [P(x_1 | c) ... P(x_n | c)] P(c) for all c ≠ c*, c = c_1, ..., c_L.

Naïve Bayes. Naïve Bayes algorithm for discrete input attributes. Learning phase: given a training set S, for each target value c_i (i = 1, ..., L), estimate P̂(C = c_i) with the examples in S; for every attribute value a_jk of each attribute x_j (j = 1, ..., n; k = 1, ..., N_j), estimate P̂(x_j = a_jk | C = c_i) with the examples in S. Output: conditional probability (look-up) tables, with N_j × L elements for attribute x_j. Test phase: given an unknown instance x' = (a_1', ..., a_n'), look up the tables and assign the label c* to x' if [P̂(a_1' | c*) ... P̂(a_n' | c*)] P̂(c*) > [P̂(a_1' | c) ... P̂(a_n' | c)] P̂(c) for all c ≠ c*, c = c_1, ..., c_L.
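
A minimal Python sketch of the learning and test phases for discrete attributes (my own function names; plain frequency counts with no smoothing, which is addressed later under Relevant Issues):

from collections import Counter, defaultdict

def learn_naive_bayes(examples):
    """Learning phase: examples are (attribute_dict, label) pairs.
    Returns class priors and a conditional-probability lookup."""
    class_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)   # (label, attribute) -> Counter over values
    for attrs, label in examples:
        for attr, value in attrs.items():
            cond_counts[(label, attr)][value] += 1
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    def cond_prob(attr, value, label):
        return cond_counts[(label, attr)][value] / class_counts[label]
    return priors, cond_prob

def classify(instance, priors, cond_prob):
    """Test phase: MAP rule, argmax_c P(c) * prod_j P(a_j | c)."""
    def score(label):
        s = priors[label]
        for attr, value in instance.items():
            s *= cond_prob(attr, value, label)
        return s
    return max(priors, key=score)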

Example: Play Tennis.

Example. Learning phase. P(Play = Yes) = 9/14, P(Play = No) = 5/14.

Outlook     | Play = Yes | Play = No
Sunny       | 2/9        | 3/5
Overcast    | 4/9        | 0/5
Rain        | 3/9        | 2/5

Temperature | Play = Yes | Play = No
Hot         | 2/9        | 2/5
Mild        | 4/9        | 2/5
Cool        | 3/9        | 1/5

Humidity    | Play = Yes | Play = No
High        | 3/9        | 4/5
Normal      | 6/9        | 1/5

Wind        | Play = Yes | Play = No
Strong      | 3/9        | 3/5
Weak        | 6/9        | 2/5

Example. Test phase. Given a new instance x = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong). Look up the tables: P(Sunny | Yes) = 2/9, P(Cool | Yes) = 3/9, P(High | Yes) = 3/9, P(Strong | Yes) = 3/9, P(Play = Yes) = 9/14; P(Sunny | No) = 3/5, P(Cool | No) = 1/5, P(High | No) = 4/5, P(Strong | No) = 3/5, P(Play = No) = 5/14. MAP rule: P(Yes | x) ∝ P(Sunny | Yes) P(Cool | Yes) P(High | Yes) P(Strong | Yes) P(Play = Yes) = 0.0053, and P(No | x) ∝ P(Sunny | No) P(Cool | No) P(High | No) P(Strong | No) P(Play = No) = 0.0206. Since P(Yes | x) < P(No | x), we label x as No.
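
The same computation in Python, plugging in the table values above to verify the two scores:

# P(Yes) * P(Sunny|Yes) * P(Cool|Yes) * P(High|Yes) * P(Strong|Yes)
score_yes = (9/14) * (2/9) * (3/9) * (3/9) * (3/9)
# P(No) * P(Sunny|No) * P(Cool|No) * P(High|No) * P(Strong|No)
score_no = (5/14) * (3/5) * (1/5) * (4/5) * (3/5)
print(round(score_yes, 4), round(score_no, 4))   # 0.0053 0.0206 -> label x as No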

Relevant Issues. Violation of the independence assumption: for many real-world tasks the attributes are not conditionally independent, yet naïve Bayes works surprisingly well anyway. Zero conditional probability problem: if no training example of class c_i contains the attribute value x_j = a_jk, then the estimate P̂(x_j = a_jk | c_i) = 0, and the whole product P̂(a_1 | c_i) ... P̂(a_jk | c_i) ... P̂(a_n | c_i) = 0 at test time. As a remedy, conditional probabilities are estimated with Laplacian smoothing: P̂(x_j = a_jk | c_i) = (n_c + m p) / (n + m), where n is the number of training examples with C = c_i, n_c is the number of training examples with C = c_i and x_j = a_jk, p is a prior estimate (usually p = 1/t for t possible values of x_j), and m is the weight given to the prior (the number of "virtual" examples, m ≥ 1).
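
A short sketch of the smoothed estimate (the function and variable names are mine):

def laplace_estimate(n_c, n, t, m=1):
    """P_hat(x_j = a_jk | c_i) = (n_c + m*p) / (n + m), with prior p = 1/t.
    n_c: examples of class c_i with x_j = a_jk; n: examples of class c_i;
    t: number of possible values of x_j; m: weight of the prior ('virtual' examples)."""
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# An unseen attribute value (n_c = 0) no longer yields a zero probability:
print(laplace_estimate(n_c=0, n=9, t=3))   # 1/30 ≈ 0.033 instead of 0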

Relevant Issues. Continuous-valued input attributes: such an attribute has numberless (infinitely many) values, so its conditional probability is modeled with a normal distribution: P̂(x_j | c_i) = 1 / (√(2π) σ_ji) · exp(−(x_j − μ_ji)² / (2 σ_ji²)), where μ_ji is the mean and σ_ji the standard deviation of the values of attribute x_j over the training examples of class c_i. Learning phase: for X = (x_1, ..., x_n) and C = c_1, ..., c_L, the output is n × L normal distributions plus the class priors P(C = c_i), i = 1, ..., L. Test phase: for an unknown instance X' = (x_1', ..., x_n'), calculate the conditional probabilities with all the normal distributions and apply the MAP rule to make the decision.
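
A minimal sketch of the continuous case: fit a normal distribution per (attribute, class) pair and use its density in place of the table lookup. The temperature readings are hypothetical, purely for illustration.

import math

def fit_gaussian(values):
    """Learning phase for one (attribute, class) pair: mean and standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return mu, sigma

def gaussian_density(x, mu, sigma):
    """P_hat(x_j | c_i) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical temperature readings for one class (illustrative numbers only):
mu, sigma = fit_gaussian([68, 70, 72, 75, 80])
print(gaussian_density(71, mu, sigma))   # density value used in the MAP product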

Conclusions. Naïve Bayes is based on the independence assumption. Training is very easy and fast: it just requires considering each attribute in each class separately. Testing is straightforward: just look up tables or calculate conditional probabilities with the normal distributions. It is a popular generative model, with performance competitive with most state-of-the-art classifiers even when the independence assumption is violated, and it has many successful applications, e.g. spam mail filtering. It is also a good candidate as a base learner in ensemble learning. Apart from classification, naïve Bayes can do more.