Bayesian Belief Network

Outline: Uncertainty & Probability; Bayes' rule; Choosing Hypotheses: Maximum a posteriori, Maximum Likelihood; Bayes' concept learning; Maximum Likelihood of real-valued functions; Bayes optimal classifier; Joint distributions; Naive Bayes classifier.

Uncertainty
Our main tool is probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1. It provides a way of summarizing uncertainty.
Variables
Boolean random variables: cavity might be true or false.
Discrete random variables: weather might be sunny, rainy, cloudy, or snow: P(Weather=sunny), P(Weather=rainy), P(Weather=cloudy), P(Weather=snow).
Continuous random variables: the temperature takes continuous values.

Where do probabilities come from?
Frequentist: from experiments. From any finite sample, we can estimate the true fraction and also calculate how accurate our estimate is likely to be.
Subjectivist: an agent's degree of belief.
Objectivist: the true nature of the universe; e.g., that the coin comes up heads with probability 0.5 is a property of the coin itself.
Before the evidence is obtained: prior probability P(a), the prior probability that the proposition is true, e.g. P(cavity) = 0.1.
After the evidence is obtained: posterior probability P(a|b), the probability of a given that all we know is b, e.g. P(cavity|toothache) = 0.8.

Axioms of Probability (Kolmogorov's axioms, first published in German in 1933)
All probabilities are between 0 and 1: for any proposition a, 0 <= P(a) <= 1.
P(true) = 1, P(false) = 0.
The probability of a disjunction is given by P(a ∨ b) = P(a) + P(b) − P(a ∧ b).
Product rule: P(a ∧ b) = P(a|b) P(b) and P(a ∧ b) = P(b|a) P(a).
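To make the axioms concrete, here is a minimal Python sketch (not from the slides; the joint distribution below is a made-up example) that checks the disjunction and product rules numerically:

```python
# A minimal sketch (numbers are illustrative assumptions): checking the
# disjunction and product rules on a joint distribution over two Booleans.
joint = {                       # P(a, b) for all four value combinations
    (True, True): 0.10,
    (True, False): 0.20,
    (False, True): 0.30,
    (False, False): 0.40,
}

p_a = sum(p for (a, _), p in joint.items() if a)        # P(a)     = 0.30
p_b = sum(p for (_, b), p in joint.items() if b)        # P(b)     = 0.40
p_a_and_b = joint[(True, True)]                         # P(a ∧ b) = 0.10

# Disjunction rule: P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
p_a_or_b = p_a + p_b - p_a_and_b                        # 0.60

# Product rule rearranged: P(a|b) = P(a ∧ b) / P(b)
p_a_given_b = p_a_and_b / p_b                           # 0.25
assert abs(p_a_given_b * p_b - p_a_and_b) < 1e-12
print(p_a_or_b, p_a_given_b)
```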

Theorem of total probability
If events A1, ..., An are mutually exclusive and exhaustive (so that Σi P(Ai) = 1), then P(b) = Σi P(b|Ai) P(Ai).
Bayes' rule (Reverend Thomas Bayes, 1702-1761)
He set down his findings on probability in "Essay Towards Solving a Problem in the Doctrine of Chances" (1763), published posthumously in the Philosophical Transactions of the Royal Society of London.
P(b|a) = P(a|b) P(b) / P(a)

Diagnosis
What is the probability of meningitis in a patient with a stiff neck? A doctor knows that the disease meningitis causes the patient to have a stiff neck 50% of the time, so P(s|m) = 0.5.
Prior probabilities: the probability that a patient has meningitis is 1/50,000, so P(m) = 0.00002; the probability that a patient has a stiff neck is 1/20, so P(s) = 0.05.
P(m|s) = P(s|m) P(m) / P(s) = (0.5 × 0.00002) / 0.05 = 0.0002
Normalization
P(y|x) + P(¬y|x) = 1
P(Y|X) = α P(X|Y) P(Y), where the normalization constant α scales the unnormalized values so they sum to 1, e.g. α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩.
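The meningitis calculation is easy to verify; a minimal sketch using only the numbers given on the slide:

```python
# Bayes' rule for the meningitis example, using the slide's numbers.
p_s_given_m = 0.5          # P(stiff neck | meningitis)
p_m = 1 / 50_000           # prior P(meningitis) = 0.00002
p_s = 1 / 20               # prior P(stiff neck) = 0.05

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)         # 0.0002
```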

Bayes Theorem
P(h) = prior probability of hypothesis h
P(D) = prior probability of training data D
P(h|D) = probability of h given D
P(D|h) = probability of D given h
Choosing Hypotheses
Generally we want the most probable hypothesis given the training data, the maximum a posteriori hypothesis h_MAP:
h_MAP = argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h) P(h) / P(D) = argmax_{h∈H} P(D|h) P(h)

If we assume P(h_i) = P(h_j) for all h_i and h_j, then we can further simplify and choose the maximum likelihood (ML) hypothesis:
h_ML = argmax_{h∈H} P(D|h)
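A minimal sketch of choosing h_MAP and h_ML over a tiny discrete hypothesis space; the hypotheses, priors, and likelihoods below are illustrative assumptions, not from the lecture:

```python
# Hedged sketch: MAP vs. ML hypothesis selection on made-up numbers.
hypotheses = {
    # h: (P(h), P(D|h))  -- prior and likelihood of some fixed data set D
    "h1": (0.70, 0.10),
    "h2": (0.25, 0.40),
    "h3": (0.05, 0.90),
}

h_map = max(hypotheses, key=lambda h: hypotheses[h][1] * hypotheses[h][0])
h_ml  = max(hypotheses, key=lambda h: hypotheses[h][1])

print(h_map)  # 'h2': P(D|h)P(h) = 0.10, beating 0.07 (h1) and 0.045 (h3)
print(h_ml)   # 'h3': largest likelihood P(D|h) = 0.90, ignoring the prior
```

Note how the two criteria can disagree: a strong prior can outweigh a higher likelihood under MAP, while ML looks at the likelihood alone.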

Example
Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (−) in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer. Suppose a positive result (+) is returned...

Normalization
P(cancer|+) = 0.0078 / (0.0078 + 0.0298) = 0.20745
P(¬cancer|+) = 0.0298 / (0.0078 + 0.0298) = 0.79255
The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method.
Joint distribution
A joint distribution for toothache, cavity, catch (the dentist's probe catches in my tooth :-( ).
We need to know the conditional probabilities of the conjunction of toothache and cavity. What can a dentist conclude if the probe catches in the aching tooth?
P(cavity | toothache ∧ catch) = P(toothache ∧ catch | cavity) P(cavity) / P(toothache ∧ catch)
For n Boolean variables there are 2^n possible combinations.
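The same normalization in code, a sketch using only the numbers stated in the example:

```python
# Normalization for the cancer test, using the slide's numbers.
p_cancer = 0.008                 # prior P(cancer)
p_pos_given_cancer = 0.98        # P(+ | cancer)
p_pos_given_no_cancer = 0.03     # P(+ | ~cancer) = 1 - 0.97

u_cancer = p_pos_given_cancer * p_cancer               # 0.00784
u_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)   # 0.02976

alpha = 1 / (u_cancer + u_no_cancer)     # normalization constant
print(alpha * u_cancer)      # ~0.2085 (the slide rounds inputs, giving 0.20745)
print(alpha * u_no_cancer)   # ~0.7915
```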

Conditional Independence
Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of toothache:
P(catch | cavity ∧ toothache) = P(catch | cavity)
P(toothache | cavity ∧ catch) = P(toothache | cavity)
Independence between a and b:
P(a|b) = P(a), P(b|a) = P(b), P(a ∧ b) = P(a) P(b)
P(toothache, catch, cavity, Weather=cloudy) = P(Weather=cloudy) P(toothache, catch, cavity)
The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI. This can work well even when the assumption is not true!

A single cause directly influences a number of effects, all of which are conditionally independent:
P(cause, effect1, effect2, ..., effectn) = P(cause) ∏i P(effecti | cause)
Naive Bayes Classifier
Assume a target function f: X → V, where each instance x is described by attributes a1, a2, ..., an. The most probable value of f(x) is:
v_MAP = argmax_{vj∈V} P(vj | a1, a2, ..., an) = argmax_{vj∈V} P(a1, a2, ..., an | vj) P(vj)

Naive Bayes assumption: P(a1, a2, ..., an | vj) = ∏i P(ai | vj), which gives the Naive Bayes classifier:
v_NB = argmax_{vj∈V} P(vj) ∏i P(ai | vj)
Naive Bayes Algorithm
For each target value vj: estimate P(vj).
For each attribute value ai of each attribute a: estimate P(ai | vj).

Training dataset
Classes: C1: buys_computer="yes", C2: buys_computer="no".
Data sample: X = (age<=30, income=medium, student=yes, credit_rating=fair)

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

Naive Bayesian Classifier: Example
Compute P(X|Ci) for each class:
P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
P(age="<=30" | buys_computer="no") = 3/5 = 0.6
P(income="medium" | buys_computer="yes") = 4/9 = 0.444
P(income="medium" | buys_computer="no") = 2/5 = 0.4
P(student="yes" | buys_computer="yes") = 6/9 = 0.667
P(student="yes" | buys_computer="no") = 1/5 = 0.2
P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4
P(buys_computer="yes") = 9/14, P(buys_computer="no") = 5/14
For X = (age<=30, income=medium, student=yes, credit_rating=fair):
P(X | buys_computer="yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer="no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|Ci) P(Ci):
P(X | buys_computer="yes") P(buys_computer="yes") = 0.028
P(X | buys_computer="no") P(buys_computer="no") = 0.007
X belongs to class buys_computer="yes".
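The whole worked example fits in a short, self-contained sketch; the helper names (rows, posterior) are ours, but the counts come straight from the table above:

```python
# Hedged sketch: the buys_computer example as a tiny Naive Bayes classifier.
from collections import Counter, defaultdict

rows = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

class_counts = Counter(r[-1] for r in rows)        # {'yes': 9, 'no': 5}
attr_counts = defaultdict(Counter)                 # (attr_index, class) -> counts
for r in rows:
    for i, value in enumerate(r[:-1]):
        attr_counts[(i, r[-1])][value] += 1

def posterior(x, cls):
    """Unnormalized P(cls) * prod_i P(x_i | cls), estimated from counts."""
    p = class_counts[cls] / len(rows)
    for i, value in enumerate(x):
        p *= attr_counts[(i, cls)][value] / class_counts[cls]
    return p

x = ("<=30", "medium", "yes", "fair")
scores = {cls: posterior(x, cls) for cls in class_counts}
print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # 'yes', matching the slide
```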

The conditional independence assumption is often violated... but it works surprisingly well anyway. The Naive Bayes assumption of conditional independence is too restrictive, but inference is intractable without some such assumptions. Bayesian belief networks describe conditional independence among subsets of variables, allowing us to combine prior knowledge about (in)dependencies among variables with observed training data.

Bayesian networks
A simple graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (a link means "directly influences")
a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
Bayesian Networks
A Bayesian belief network allows a subset of the variables to be conditionally independent. It is a graphical model of causal relationships: it represents dependency among the variables and gives a specification of the joint probability distribution.
Nodes: random variables.
Links: dependency.
In the example graph, X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P; the graph has no loops or cycles.

Conditional Independence
Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of toothache:
P(catch | cavity ∧ toothache) = P(catch | cavity)
P(toothache | cavity ∧ catch) = P(toothache | cavity)
Independence between a and b: P(a|b) = P(a), P(b|a) = P(b)
Example
The topology of the network encodes conditional independence assertions: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity.

Bayesian Belief Network: An Example
Nodes: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea. The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents (FamilyHistory, Smoker):

      (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
 LC     0.8      0.5       0.7       0.1
~LC     0.2      0.5       0.3       0.9

Example
I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
Network topology reflects "causal" knowledge:
A burglar can set the alarm off.
An earthquake can set the alarm off.
The alarm can cause Mary to call.
The alarm can cause John to call.

Belief Networks
Priors: P(B) = 0.001 (Burglary), P(E) = 0.002 (Earthquake).
CPT for Alarm:
 B  E  P(A|B,E)
 t  t    .95
 t  f    .94
 f  t    .29
 f  f    .001
CPT for JohnCalls:
 A  P(J|A)
 t   .90
 f   .05
CPT for MaryCalls:
 A  P(M|A)
 t   .70
 f   .01

Full Joint Distribution
P(x1, ..., xn) = ∏i P(xi | parents(Xi))
P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b ∧ ¬e) P(¬b) P(¬e) = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00063
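A sketch of the burglary network's CPTs as plain dictionaries, evaluating the same joint entry via the factored formula (the dictionary layout is an implementation choice, not from the slides):

```python
# Hedged sketch: the burglary network's CPTs, used to evaluate one entry
# of the full joint distribution by the factored formula above.
P_B = {True: 0.001, False: 0.999}                 # P(Burglary)
P_E = {True: 0.002, False: 0.998}                 # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of CPT entries."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) from the slide:
print(joint(False, False, True, True, True))  # ≈ 0.000628
```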

Compactness
A CPT for a Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values. Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p). If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution. For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
Inference in Bayesian Networks
How can one infer the (probabilities of) values of one or more network variables, given observed values of others? A Bayes net contains all the information needed for this inference. If only one variable has an unknown value, it is easy to infer it. In the general case, the problem is NP-hard.

Example
In the burglary network, we might observe the event in which JohnCalls=true and MaryCalls=true, and ask for the probability that a burglary has occurred: P(Burglary | JohnCalls=true, MaryCalls=true).
Remember: joint distribution
P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache) = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
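The same table lookup in code; a sketch that uses only the toothache=true slice of the joint quoted above:

```python
# Hedged sketch: conditioning on evidence by summing entries of a joint table.
joint_toothache = {            # (cavity, catch) -> P(cavity, catch, toothache)
    (True, True): 0.108,
    (True, False): 0.012,
    (False, True): 0.016,
    (False, False): 0.064,
}

p_toothache = sum(joint_toothache.values())        # P(toothache) = 0.2
p_cavity_and_toothache = sum(
    p for (cavity, _), p in joint_toothache.items() if cavity
)

print(p_cavity_and_toothache / p_toothache)        # 0.6 = P(cavity | toothache)
print(1 - p_cavity_and_toothache / p_toothache)    # 0.4 = P(~cavity | toothache)
```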

Normalization
1 = P(y|x) + P(¬y|x)
P(Y|X) = α P(X|Y) P(Y), where the normalization constant α rescales ⟨P(y|x), P(¬y|x)⟩ to sum to 1, e.g. α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩.
P(Cavity | toothache) = α P(Cavity, toothache) = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)] = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
X is the query variable, E the evidence variables, Y the remaining unobserved variables:
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
The summation is over all possible y (all possible values of the unobserved variables Y).

P(Burglary | JohnCalls=true, MaryCalls=true)
The hidden variables of the query are Earthquake and Alarm:
P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)
For Burglary=true in the Bayesian network:
P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
To compute this we had to add four terms, each obtained by multiplying five numbers. In the worst case, where we have to sum out almost all variables, the complexity of enumeration for a network with n Boolean variables is O(2^n).

P(b) is constant and can be moved outside the summations, and the P(e) term can be moved outside the summation over a:
P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
Given JohnCalls=true and MaryCalls=true, the probability that a burglary has occurred is about 28%:
P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
(The slide shows the evaluation tree for the Burglary=true computation.)
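Putting the pieces together, a self-contained sketch of inference by enumeration for this query (the CPTs repeat those from the earlier sketch; enumerate_burglary is our own name):

```python
# Hedged sketch: inference by enumeration for
# P(Burglary | JohnCalls=true, MaryCalls=true) in the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}      # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}      # P(MaryCalls=true | Alarm)

def enumerate_burglary(j=True, m=True):
    """Return [P(b|j,m), P(~b|j,m)] by summing out Earthquake and Alarm."""
    unnormalized = []
    for b in (True, False):
        total = 0.0
        for e in (True, False):              # hidden variable Earthquake
            for a in (True, False):          # hidden variable Alarm
                p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
                p_j = P_J[a] if j else 1 - P_J[a]
                p_m = P_M[a] if m else 1 - P_M[a]
                total += P_E[e] * p_a * p_j * p_m
        unnormalized.append(P_B[b] * total)  # P(b) moved outside the sums
    alpha = 1 / sum(unnormalized)
    return [alpha * u for u in unnormalized]

print(enumerate_burglary())   # ≈ [0.284, 0.716], matching the slide
```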

Variable elimination algorithm
Eliminates repeated calculation via dynamic programming. Irrelevant variables can be pruned (with X the query variable and E the evidence variables): every variable that is not an ancestor of the query variable or an evidence variable is irrelevant to the query.

Complexity of exact inference
The burglary network belongs to the family of networks in which there is at most one undirected path between any two nodes in the network. These are called singly connected networks, or polytrees. The time and space complexity of exact inference in polytrees is linear in the size of the network, where size is defined by the number of CPT entries. If the number of parents of each node is bounded by a constant, then the complexity is also linear in the number of nodes. For multiply connected networks, variable elimination can have exponential time and space complexity.

Constructing Bayesian Networks
A Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the ordering, given its parents:
P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)
Conditional independence relations in Bayesian networks
The topological semantics is given by either of two equivalent specifications: DESCENDANTS (a node is conditionally independent of its non-descendants, given its parents) or MARKOV BLANKET (a node is conditionally independent of all other nodes, given its parents, children, and children's parents).

Local semantics
Example: JohnCalls is independent of Burglary and Earthquake given the value of Alarm.

Example
Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake.

Constructing Bayesian networks
1. Choose an ordering of variables X1, ..., Xn.
2. For i = 1 to n: add Xi to the network, selecting parents from X1, ..., Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, ..., Xi−1).
This choice of parents guarantees:
P(X1, ..., Xn) = ∏i P(Xi | X1, ..., Xi−1)  (chain rule)  = ∏i P(Xi | Parents(Xi))  (by construction)
The compactness of Bayesian networks is an example of locally structured systems: each subcomponent interacts directly with only a bounded number of other components. Constructing Bayesian networks is difficult: each variable should be directly influenced by only a few others, and the network topology should reflect these direct influences.

Example
Suppose we choose the ordering M, J, A, B, E.
P(J | M) = P(J)? No.
P(A | J, M) = P(A | J)? No. P(A | J, M) = P(A)? No.
P(B | A, J, M) = P(B | A)? Yes. P(B | A, J, M) = P(B)? No.
P(E | B, A, J, M) = P(E | A)? No. P(E | B, A, J, M) = P(E | A, B)? Yes.
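To see why the first answer is No, one can check the questions numerically against the joint distribution that the original (causal) burglary network defines; a sketch, where prob and its lambda predicates are our own scaffolding:

```python
# Hedged sketch: checking an ordering question against the burglary network's
# own joint distribution (CPTs as in the earlier sketches).
from itertools import product

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

def prob(pred):
    """Sum the joint over all worlds (b, e, a, j, m) satisfying pred."""
    return sum(joint(*w) for w in product([True, False], repeat=5) if pred(*w))

p_j_marginal = prob(lambda b, e, a, j, m: j)                 # P(J=true)
p_j_given_m = (prob(lambda b, e, a, j, m: j and m)
               / prob(lambda b, e, a, j, m: m))              # P(J=true | M=true)
print(p_j_marginal, p_j_given_m)  # the values differ, so P(J|M) != P(J): "No"
```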

Example contd.
Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!) The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed. Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm.