Bayesian Decision Theory

Bayesian Decision Theory
Berlin Chen, 2005

References:
1. E. Alpaydin, Introduction to Machine Learning, Chapter 3
2. Tom M. Mitchell, Machine Learning, Chapter 6

Review: Basic Formulas for Probabilities
- Product rule: probability of a conjunction of two events A and B:
  $P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
- Sum rule: probability of a disjunction of two events A and B:
  $P(A \vee B) = P(A) + P(B) - P(A \wedge B)$
- Theorem of total probability: if the events $A_1, \dots, A_n$ are mutually exclusive and exhaustive ($A_i \wedge A_j = \emptyset$ for $i \neq j$, and $\sum_{i=1}^{n} P(A_i) = 1$), then
  $P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$

Review: Basic Formulas for Probabilities (cont.)
- Chain rule: probability of a conjunction of many events $A_1, A_2, \dots, A_n$:
  $P(A_1 \wedge A_2 \wedge \dots \wedge A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2) \cdots P(A_n \mid A_1, \dots, A_{n-1})$

Classification, Illustrative Case 1: Credit Scoring
- Input $\mathbf{x} = [x_1, x_2]^T$, where $x_1$ is income and $x_2$ is savings; class $C_1$ = high-risk, $C_2$ = low-risk.
- Given a new application, decide:
  if $P(C_1 \mid \mathbf{x}) > 0.5$ choose $C_1$, otherwise choose $C_2$;
  or, equivalently, if $P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x})$ choose $C_1$, otherwise choose $C_2$.
- Note that $P(C_1 \mid \mathbf{x}) + P(C_2 \mid \mathbf{x}) = 1$.

Classification (cont.): Bayes Classifier
- We can use probability theory to make inferences from data:
  $P(C \mid x) = \dfrac{P(x \mid C)\,P(C)}{P(x)}$
  where $x$ is the observed data (variable), $C$ is the class hypothesis, $P(C)$ is the prior probability of $C$, $P(x)$ is the prior probability (evidence) of $x$, and $P(x \mid C)$ is the probability of $x$ given $C$.

Classification (cont.)
- Calculate the posterior probability of the class (concept) after having the observation $x$ ($C_1$ or $C_2$).
- Combine the prior and what the data tells us using Bayes' rule:
  $\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{evidence}}, \qquad P(C_i \mid x) = \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)}$
- Since $C_1$ and $C_2$ are mutually exclusive and exhaustive classes (concepts):
  $P(x) = P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)$

Classification (cont.)
- Bayes classifier, extended to $K$ mutually exclusive and exhaustive classes ($C_i \wedge C_j = \emptyset$ for $i \neq j$, $P(C_i) \ge 0$, $\sum_{i=1}^{K} P(C_i) = 1$):
  $P(C_i \mid x) = \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)} = \dfrac{P(x \mid C_i)\,P(C_i)}{\sum_{k=1}^{K} P(x \mid C_k)\,P(C_k)}$
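A minimal sketch of this K-class Bayes rule in Python; the three priors and likelihood values are made-up illustration, not taken from the slides:

```python
import numpy as np

def bayes_posteriors(priors, likelihoods):
    """Posterior P(C_i | x) from priors P(C_i) and likelihoods P(x | C_i)."""
    joint = np.asarray(priors) * np.asarray(likelihoods)  # P(x | C_i) P(C_i)
    return joint / joint.sum()                            # divide by the evidence P(x)

# Hypothetical three-class example: priors and class-conditional values for one x.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.25]
post = bayes_posteriors(priors, likelihoods)
print(post, "-> choose class", int(np.argmax(post)))
```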

Classification (cont.): Maximum Likelihood Classifier
- The likelihood that the data belongs to class $C_i$:
  $L_i(x) = P(x \mid C_i)$
- It gives the same classification result as the Bayes classifier if the prior probabilities $P(C_i)$ are assumed to be equal to each other:
  $\max_i L_i(x) = \max_i P(x \mid C_i) \;\Leftrightarrow\; \max_i P(C_i \mid x)$?  (recall $P(C_i \mid x) = \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)}$)

Classification: Illustrative Case 2
- Does a patient have cancer or not? A patient takes a lab test whose result is either positive ($x = +$) or negative ($x = -$); the classes are $C_1 = \text{cancer}$ and $C_2 = \neg\text{cancer}$.
- Suppose the result comes back positive ($x = +$).
- We also know that the test returns a correct positive result in only 98% of the cases in which the disease is actually present, $P(+ \mid \text{cancer}) = 0.98$, and a correct negative result in only 97% of the cases in which the disease is not present, $P(- \mid \neg\text{cancer}) = 0.97$ (so $P(+ \mid \neg\text{cancer}) = 0.03$).
- Furthermore, 0.008 of the entire population have this cancer: $P(\text{cancer}) = 0.008$.

Classification: Illustrative Case 2 (cont.)
- Bayes classifier:
  $P(\text{cancer} \mid +) = \dfrac{P(+ \mid \text{cancer})\,P(\text{cancer})}{P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \neg\text{cancer})\,P(\neg\text{cancer})} = \dfrac{0.98 \times 0.008}{0.98 \times 0.008 + 0.03 \times 0.992} = \dfrac{0.0078}{0.0078 + 0.0298} \approx 0.21$
  $P(\neg\text{cancer} \mid +) = \dfrac{0.03 \times 0.992}{0.98 \times 0.008 + 0.03 \times 0.992} = \dfrac{0.0298}{0.0078 + 0.0298} \approx 0.79$
- Since $P(\neg\text{cancer} \mid +) \approx 0.79 > P(\text{cancer} \mid +) \approx 0.21$, the Bayes classifier chooses $\neg\text{cancer}$.

Classification: Illustrative Case 2 (cont.)
- Maximum likelihood classifier: $P(+ \mid \text{cancer}) = 0.98 > P(+ \mid \neg\text{cancer}) = 0.03$, so it chooses cancer (the class priors are not taken into account).
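A short numerical check of this example in Python, using the values given on the slides:

```python
# Cancer test example: Bayes vs. maximum-likelihood decision.
p_cancer = 0.008
p_pos_given_cancer = 0.98        # P(+ | cancer)
p_pos_given_not = 1 - 0.97       # P(+ | ~cancer) = 0.03

evidence = p_pos_given_cancer * p_cancer + p_pos_given_not * (1 - p_cancer)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / evidence
print(round(p_cancer_given_pos, 3))          # ~0.209 -> Bayes chooses "no cancer"
print(p_pos_given_cancer > p_pos_given_not)  # True   -> ML classifier chooses "cancer"
```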

Losses and Risks
- Decisions are not always perfect.
- E.g., loan applications: the loss for a high-risk applicant erroneously accepted (false acceptance) may be different from that for an erroneously rejected low-risk applicant (false rejection).
- The asymmetry is even more critical in other cases such as medical diagnosis or earthquake prediction.

Expected Risk
- Action $\alpha_i$: hypothesize that the example $x$ belongs to class $C_i$; suppose the example actually belongs to some class $C_k$.
- Definition: the expected risk for taking action $\alpha_i$ is
  $R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\,P(C_k \mid x)$
- A zero-one loss function, $\lambda_{ik} = 0$ if $i = k$ and $\lambda_{ik} = 1$ if $i \neq k$: all correct decisions have no loss and all errors are equally costly.
- Choose the action with minimum risk:
  $\alpha^{*} = \arg\min_{\alpha_i} R(\alpha_i \mid x)$
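A minimal sketch of computing these expected risks from a loss matrix and a posterior vector; the 2x2 loss values and posteriors below are invented for illustration (an expensive false acceptance), not taken from the slides:

```python
import numpy as np

def expected_risks(loss, posteriors):
    """R(alpha_i | x) = sum_k loss[i][k] * P(C_k | x), for every action alpha_i."""
    return np.asarray(loss) @ np.asarray(posteriors)

# Hypothetical loan example: action 0 = accept, action 1 = reject the applicant;
# class 0 = low-risk, class 1 = high-risk. Accepting a high-risk applicant costs 10.
loss = [[0.0, 10.0],
        [1.0, 0.0]]
posteriors = [0.8, 0.2]
risks = expected_risks(loss, posteriors)
print(risks, "-> choose action", int(np.argmin(risks)))  # rejects despite P(low-risk)=0.8
```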

Expected Risk (cont.)
- With the zero-one loss, choosing the action with minimum risk is equivalent to choosing the class with the highest posterior probability:
  $R(\alpha_i \mid x) = \sum_{k \neq i} P(C_k \mid x) = 1 - P(C_i \mid x)$
- Choose the action $\alpha_i$ with
  $i = \arg\max_k P(C_k \mid x)$

Expected Risk: Reject Action Involved (Manual Decisions)
- Wrong decisions (misclassifications) may have a very high cost.
- Resort to a manual decision when the automatic system has low certainty about its decision.
- Define an additional action of reject (or doubt), $\alpha_{K+1}$, with the loss function
  $\lambda_{ik} = \begin{cases} 0 & \text{if } i = k, \quad i, k = 1, \dots, K \\ \lambda & \text{if } i = K+1 \\ 1 & \text{otherwise} \end{cases}$
  where $0 < \lambda < 1$ is the loss incurred for choosing the $(K+1)$st action of reject.

Expected Risk: Reject Action Involved (cont.)
- The risk for choosing the reject ($(K+1)$st) action:
  $R(\alpha_{K+1} \mid x) = \sum_{k=1}^{K} \lambda\,P(C_k \mid x) = \lambda$
- Recall that the risk for choosing action $\alpha_i$, $i = 1, \dots, K$:
  $R(\alpha_i \mid x) = \sum_{k \neq i} P(C_k \mid x) = 1 - P(C_i \mid x)$

Expected Risk: Reject Action Involved (cont.)
- The optimal decision rule is:
  choose $C_i$ if $R(\alpha_i \mid x) < R(\alpha_k \mid x)$ for all $k \neq i$ and $R(\alpha_i \mid x) < R(\alpha_{K+1} \mid x)$;
  choose reject if $R(\alpha_{K+1} \mid x) < R(\alpha_i \mid x)$ for all $i = 1, \dots, K$.
- That is, choose $C_i$ if $P(C_i \mid x) > P(C_k \mid x)$ for all $k \neq i$ and $P(C_i \mid x) > 1 - \lambda$; reject otherwise.
- When $\lambda = 0$: always reject the chosen action; when $\lambda \geq 1$: always accept the chosen action.
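A minimal sketch of this minimum-risk rule with a reject option; the posterior vectors passed in at the bottom are made-up for illustration:

```python
def decide_with_reject(posteriors, reject_loss):
    """Return the index of the chosen class, or 'reject', under zero-one loss
    with an extra reject action of cost reject_loss (0 < lambda < 1)."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    # Risk of choosing class `best` is 1 - P(C_best | x); risk of rejecting is lambda.
    if 1 - posteriors[best] < reject_loss:
        return best
    return "reject"

print(decide_with_reject([0.55, 0.30, 0.15], reject_loss=0.3))  # -> 'reject'
print(decide_with_reject([0.80, 0.15, 0.05], reject_loss=0.3))  # -> 0
```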

Discriminant Functions
- Classification can be thought of as a set of discriminant functions $g_i(x)$, one for each class $C_i$, such that we choose $C_i$ if $g_i(x) = \max_{k=1,\dots,K} g_k(x)$.
- $g_i(x)$ can be expressed using the Bayes classifier (with minimum risk and no additional reject action):
  $g_i(x) = -R(\alpha_i \mid x)$
- If the zero-one loss function is imposed, $g_i(x)$ can also be expressed as $g_i(x) = P(C_i \mid x)$.
- With the same ranking result, we can use $g_i(x) = P(x \mid C_i)\,P(C_i)$.

Discriminant Functions (cont.)
- The instance space is thus divided into $K$ decision regions $\mathcal{R}_1, \dots, \mathcal{R}_K$, where
  $\mathcal{R}_i = \{\,x \mid g_i(x) = \max_k g_k(x)\,\}$

Discriminant Functions (cont.)
- For two-class problems we can merely define a single discriminant function
  $g(x) = g_1(x) - g_2(x)$
  and choose $C_1$ if $g(x) > 0$, otherwise choose $C_2$.

Choosing Hypotheses: MAP Criterion
- In machine learning we are interested in finding the best (most probable) hypothesis (classifier) $h_c$ from some hypothesis space $H$, given the observed training data set $X = \{(x^t, r^t)\}_{t=1}^{n}$:
  $h_{MAP} = \arg\max_{h_c \in H} P(h_c \mid X) = \arg\max_{h_c \in H} \dfrac{P(X \mid h_c)\,P(h_c)}{P(X)} = \arg\max_{h_c \in H} P(X \mid h_c)\,P(h_c)$
- $h_{MAP}$ is called a Maximum a Posteriori (MAP) hypothesis.

Choosing Hypotheses: ML Criterion
- If we further assume that every hypothesis is equally probable a priori, e.g. $P(h_i) = P(h_j)$ for all $i, j$, the above equation can be simplified to
  $h_{ML} = \arg\max_{h_c \in H} P(X \mid h_c)$
- $h_{ML}$ is called a Maximum Likelihood (ML) hypothesis; $P(X \mid h_c)$ is often called the likelihood of the data set $X$ given $h_c$.

Naïve Bayes Classifier
- A simplified approach to the Bayes classifier: the attributes of an instance/example $x = (x_1, x_2, \dots, x_d)$ are assumed to be independent conditioned on a given class hypothesis.
- Naïve Bayes assumption:
  $P(x \mid C_i) = P(x_1, x_2, \dots, x_d \mid C_i) = \prod_{j=1}^{d} P(x_j \mid C_i)$
- Naïve Bayes classifier:
  $C_{MAP} = \arg\max_i P(C_i \mid x) = \arg\max_i \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)} = \arg\max_i P(x_1, \dots, x_d \mid C_i)\,P(C_i) = \arg\max_i P(C_i) \prod_{j=1}^{d} P(x_j \mid C_i)$

Naïve Bayes Classifier (cont.)
- Illustrative case: given a data set of six 3-dimensional Boolean examples $x = (x_A, x_B, x_C)$ with Boolean classification $D$ (three examples with $D = T$, three with $D = F$), train a naïve Bayes classifier to predict the classification.
- The maximum-likelihood estimates from the training set are:
  $P(D{=}T) = 1/2$, $P(D{=}F) = 1/2$
  $P(A{=}T \mid D{=}T) = 1/3$, $P(A{=}F \mid D{=}T) = 2/3$, $P(A{=}T \mid D{=}F) = 1/3$, $P(A{=}F \mid D{=}F) = 2/3$
  $P(B{=}T \mid D{=}T) = 1/3$, $P(B{=}F \mid D{=}T) = 2/3$, $P(B{=}T \mid D{=}F) = 1/3$, $P(B{=}F \mid D{=}F) = 2/3$
  $P(C{=}T \mid D{=}T) = 1/3$, $P(C{=}F \mid D{=}T) = 2/3$, $P(C{=}T \mid D{=}F) = 2/3$, $P(C{=}F \mid D{=}F) = 1/3$
- What is the predicted probability $P(D{=}T \mid A{=}T, B{=}F, C{=}T)$? What is the predicted probability $P(D{=}T \mid B{=}T)$?

Naïve Bayes Classifier (cont.)
- Illustrative case (cont.):
  $P(D{=}T \mid A{=}T, B{=}F, C{=}T) = \dfrac{P(A{=}T \mid D{=}T)\,P(B{=}F \mid D{=}T)\,P(C{=}T \mid D{=}T)\,P(D{=}T)}{\sum_{d \in \{T,F\}} P(A{=}T \mid D{=}d)\,P(B{=}F \mid D{=}d)\,P(C{=}T \mid D{=}d)\,P(D{=}d)} = \dfrac{\frac{1}{3}\cdot\frac{2}{3}\cdot\frac{1}{3}\cdot\frac{1}{2}}{\frac{1}{3}\cdot\frac{2}{3}\cdot\frac{1}{3}\cdot\frac{1}{2} + \frac{1}{3}\cdot\frac{2}{3}\cdot\frac{2}{3}\cdot\frac{1}{2}} = \dfrac{1}{3}$
  $P(D{=}T \mid B{=}T) = \dfrac{P(B{=}T \mid D{=}T)\,P(D{=}T)}{P(B{=}T \mid D{=}T)\,P(D{=}T) + P(B{=}T \mid D{=}F)\,P(D{=}F)} = \dfrac{\frac{1}{3}\cdot\frac{1}{2}}{\frac{1}{3}\cdot\frac{1}{2} + \frac{1}{3}\cdot\frac{1}{2}} = \dfrac{1}{2}$
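A quick numerical check of these two posteriors, with the probability estimates copied from the slide above:

```python
from fractions import Fraction as F

# Class prior and per-attribute conditionals P(attr=T | D) from the toy data set.
p_d = {True: F(1, 2), False: F(1, 2)}
p_a = {True: F(1, 3), False: F(1, 3)}   # P(A=T | D=T), P(A=T | D=F)
p_b = {True: F(1, 3), False: F(1, 3)}   # P(B=T | D=T), P(B=T | D=F)
p_c = {True: F(1, 3), False: F(2, 3)}   # P(C=T | D=T), P(C=T | D=F)

def cond(table, value, d):
    return table[d] if value else 1 - table[d]

# P(D=T | A=T, B=F, C=T)
joint = {d: p_d[d] * cond(p_a, True, d) * cond(p_b, False, d) * cond(p_c, True, d)
         for d in (True, False)}
print(joint[True] / (joint[True] + joint[False]))   # 1/3

# P(D=T | B=T)
joint_b = {d: p_d[d] * cond(p_b, True, d) for d in (True, False)}
print(joint_b[True] / (joint_b[True] + joint_b[False]))   # 1/2
```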

How to Train a Naïve Bayes Classifier
- Naive_Bayes_Learn(examples):
  For each target value $v_j$: $\hat{P}(v_j) \leftarrow$ maximum likelihood (ML) estimate of $P(v_j)$.
  For each value $a_i$ of each attribute $a$: $\hat{P}(a_i \mid v_j) \leftarrow$ ML estimate of $P(a_i \mid v_j)$.
- Classify_New_Instance(x):
  $v_{NB} = \arg\max_{v_j \in V} \hat{P}(v_j) \prod_{a_i \in x} \hat{P}(a_i \mid v_j)$
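A minimal sketch of this learn/classify pair for categorical attributes; the tiny four-example data set at the bottom is invented purely for illustration:

```python
from collections import Counter, defaultdict

def naive_bayes_learn(examples):
    """examples: list of (attribute_tuple, label). Returns ML estimates."""
    class_counts = Counter(label for _, label in examples)
    priors = {v: n / len(examples) for v, n in class_counts.items()}
    counts = defaultdict(Counter)                # counts[(position, label)][value]
    for attrs, label in examples:
        for pos, value in enumerate(attrs):
            counts[(pos, label)][value] += 1
    likelihood = {key: {val: n / sum(cnt.values()) for val, n in cnt.items()}
                  for key, cnt in counts.items()}
    return priors, likelihood

def classify_new_instance(x, priors, likelihood):
    def score(v):
        s = priors[v]
        for pos, value in enumerate(x):
            s *= likelihood[(pos, v)].get(value, 0.0)   # unseen value -> probability 0
        return s
    return max(priors, key=score)

# Invented usage example.
data = [(("T", "F"), "+"), (("T", "T"), "+"), (("F", "F"), "-"), (("F", "T"), "-")]
priors, likelihood = naive_bayes_learn(data)
print(classify_new_instance(("T", "F"), priors, likelihood))   # '+'
```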

Naïve Bayes: Example 2
- Consider PlayTennis again, and a new instance:
  <Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong>
- We want to compute
  $v_{NB} = \arg\max_{v_j \in \{yes,\,no\}} P(v_j)\,P(\text{Outlook}{=}\text{sunny} \mid v_j)\,P(\text{Temperature}{=}\text{cool} \mid v_j)\,P(\text{Humidity}{=}\text{high} \mid v_j)\,P(\text{Wind}{=}\text{strong} \mid v_j)$
- $P(yes)\,P(sunny \mid yes)\,P(cool \mid yes)\,P(high \mid yes)\,P(strong \mid yes) = 0.0053$
  $P(no)\,P(sunny \mid no)\,P(cool \mid no)\,P(high \mid no)\,P(strong \mid no) = 0.0206$
- Therefore $v_{NB} = no$.

Dealing with Data Sparseness
- What if none of the training instances with target value $v_j$ have attribute value $a_i$? Then $\hat{P}(a_i \mid v_j) = 0$, and therefore $\hat{P}(v_j) \prod_i \hat{P}(a_i \mid v_j) = 0$.
- The typical solution is a Bayesian (smoothed) estimate for $\hat{P}(a_i \mid v_j)$:
  $\hat{P}(a_i \mid v_j) = \dfrac{n_c + m\,p}{n + m}$
  where $n$ is the number of training examples for which $v = v_j$, $n_c$ is the number of training examples for which $v = v_j$ and $a = a_i$, $p$ is a prior estimate for $\hat{P}(a_i \mid v_j)$, and $m$ is the weight given to the prior (i.e., the number of virtual examples).
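A one-function sketch of this m-estimate; the uniform prior p = 1/k over k attribute values used in the example call is a common choice, not something the slide fixes:

```python
def m_estimate(n_c, n, m, p):
    """Smoothed estimate of P(a_i | v_j) = (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# An attribute value never seen with this class (n_c = 0) among n = 5 examples,
# with m = 1 virtual example and a uniform prior over k = 3 possible values.
print(m_estimate(n_c=0, n=5, m=1, p=1 / 3))   # ~0.056 instead of 0
```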

Example: Learning to Classify Text
- For instance: learn which news articles are of interest, or learn to classify web pages by topic.
- Naïve Bayes is among the most effective algorithms for this task.
- What attributes shall we use to represent text documents? The word occurring in each document position.

Example: Learning to Classify Text (cont.)
- Target concept Interesting?: Document → {+, −}
- 1. Represent each document by a vector of words: one attribute per word position in the document.
- 2. Learning: use the training examples to estimate $P(+)$, $P(-)$, $P(doc \mid +)$, and $P(doc \mid -)$.
- Naïve Bayes conditional independence assumption:
  $P(doc \mid v_j) = \prod_{i=1}^{\text{length}(doc)} P(a_i = w_k \mid v_j)$
  where $P(a_i = w_k \mid v_j)$ is the probability that the word in position $i$ is $w_k$, given $v_j$.
- One more (time/position invariance) assumption: $P(a_i = w_k \mid v_j) = P(a_m = w_k \mid v_j)$ for all positions $i, m$.

Example: Learning to Classify Text (cont.)
Learn_Naive_Bayes_Text(Examples, V):
1. Collect all words and other tokens that occur in Examples:
   Vocabulary ← all distinct words and other tokens in Examples.
2. Calculate the required $P(v_j)$ and $P(w_k \mid v_j)$ probability terms. For each target value $v_j$ in $V$:
   docs_j ← the subset of Examples for which the target value is $v_j$
   $P(v_j) \leftarrow |docs_j| \,/\, |Examples|$
   Text_j ← a single document created by concatenating all members of docs_j
   n ← total number of words in Text_j (counting duplicate words multiple times)
   For each word $w_k$ in Vocabulary:
     $n_k \leftarrow$ number of times word $w_k$ occurs in Text_j
     $P(w_k \mid v_j) \leftarrow \dfrac{n_k + 1}{n + |\text{Vocabulary}|}$   (smoothed unigram)

Example: Learnng to lassfy ext (cont.) lassfy_naïve_bayes_ext(doc) postons all word postons n Doc that contan toens found n Vocabulary v NB Return where v NB arg max v V ( v ) ( a v ) postons MLDM-Berln hen 32

Bayesian Networks
- Premise: the Naïve Bayes assumption of conditional independence is too restrictive, but inference is intractable without some such assumptions.
- Bayesian networks describe conditional independence among subsets of variables, which allows combining prior knowledge about (in)dependences among variables with observed training data.
- Bayesian networks are also called Bayesian belief networks, Bayes nets, belief networks, probabilistic networks, graphical models, etc.

Bayesian Networks (cont.)
- A simple graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
- Syntax:
  a set of nodes, one per variable (discrete or continuous); discrete variables can be either binary or not;
  a directed acyclic graph (a link/arrow means "directly influences");
  a conditional distribution for each node given its parents, $P(X_i \mid \text{Parents}(X_i))$.
- In the simplest case, the conditional distribution is represented as a Conditional Probability Table (CPT) giving the distribution over $X_i$ for each combination of parent values.

Bayesian Networks (cont.)
- E.g., nodes of discrete binary variables. [Figure: a small directed acyclic graph whose child node has parents S and B, with a Conditional Probability Table (CPT) listing one entry per T/F combination of S and B: 0.4, 0.1, 0.8, 0.2.]
- Each node is asserted to be conditionally independent of its nondescendants given its immediate predecessors.

Example 1: Dentist Network
- The topology of the network encodes conditional independence assertions:
  Weather is independent of the other variables;
  Toothache and Catch are conditionally independent given Cavity;
  Cavity is a direct cause of Toothache and Catch.

Conditional (In)dependence
- Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if
  $\forall x, y, z:\; P(X{=}x \mid Y{=}y, Z{=}z) = P(X{=}x \mid Z{=}z)$
- More compactly, we write $P(X \mid Y, Z) = P(X \mid Z)$.
- Conditional independence allows breaking down inference into calculations over small groups of variables.

Conditional (In)dependence (cont.)
- Example: Thunder is conditionally independent of Rain given Lightning:
  $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$
- Recall that Naïve Bayes uses conditional independence to justify
  $P(X, Y \mid Z) = P(X \mid Y, Z)\,P(Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$
  i.e., X and Y are mutually independent given Z.

Conditional (In)dependence (cont.)
- A Bayesian network can also be thought of as a causal graph that illustrates causalities between variables: following an arc from cause to effect is prediction, and reasoning from effect back to cause is diagnosis.
- E.g., with R = rain and W = wet grass, we can make the diagnostic inference $P(R \mid W)$:
  $P(R \mid W) = \dfrac{P(W \mid R)\,P(R)}{P(W)} = \dfrac{P(W \mid R)\,P(R)}{P(W \mid R)\,P(R) + P(W \mid \neg R)\,P(\neg R)} = \dfrac{0.9 \times 0.4}{0.9 \times 0.4 + 0.2 \times 0.6} = 0.75 \;\;(> P(R) = 0.4)$

Conditional (In)dependence (cont.)
- Suppose the sprinkler S is included as another cause of wet grass.
- Predictive inference:
  $P(W \mid S) = P(W \mid R, S)\,P(R \mid S) + P(W \mid \neg R, S)\,P(\neg R \mid S) = P(W \mid R, S)\,P(R) + P(W \mid \neg R, S)\,P(\neg R) = 0.95 \times 0.4 + 0.9 \times 0.6 = 0.92$
- Evidence term:
  $P(W) = P(W \mid R, S)\,P(R)\,P(S) + P(W \mid \neg R, S)\,P(\neg R)\,P(S) + P(W \mid R, \neg S)\,P(R)\,P(\neg S) + P(W \mid \neg R, \neg S)\,P(\neg R)\,P(\neg S)$
  $= 0.95 \times 0.4 \times 0.2 + 0.9 \times 0.6 \times 0.2 + 0.9 \times 0.4 \times 0.8 + 0.1 \times 0.6 \times 0.8 = 0.52$
- Diagnostic inference (I):
  $P(S \mid W) = \dfrac{P(W \mid S)\,P(S)}{P(W)} = \dfrac{0.92 \times 0.2}{0.52} \approx 0.35 \;\;(> P(S) = 0.2)$

Conditional (In)dependence (cont.)
- Diagnostic inference (II):
  $P(S \mid R, W) = \dfrac{P(W \mid R, S)\,P(S \mid R)}{P(W \mid R)} = \dfrac{P(W \mid R, S)\,P(S)}{P(W \mid R)} = \dfrac{0.95 \times 0.2}{0.91} \approx 0.21 \;\;(> P(S) = 0.2)$
  where
  $P(W \mid R) = P(W \mid R, S)\,P(S \mid R) + P(W \mid R, \neg S)\,P(\neg S \mid R) = P(W \mid R, S)\,P(S) + P(W \mid R, \neg S)\,P(\neg S) = 0.95 \times 0.2 + 0.9 \times 0.8 = 0.91$
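A quick numerical check of these rain/sprinkler/wet-grass inferences; the CPT values are those used on the slides ($P(R)=0.4$, $P(S)=0.2$, $P(W \mid R,S)=0.95$, $P(W \mid R,\neg S)=0.9$, $P(W \mid \neg R,S)=0.9$, $P(W \mid \neg R,\neg S)=0.1$):

```python
from itertools import product

p_r, p_s = 0.4, 0.2
p_w = {(True, True): 0.95, (True, False): 0.9, (False, True): 0.9, (False, False): 0.1}

def joint(r, s, w):
    """P(R=r, S=s, W=w) = P(R) P(S) P(W | R, S) for this network."""
    pw = p_w[(r, s)]
    return (p_r if r else 1 - p_r) * (p_s if s else 1 - p_s) * (pw if w else 1 - pw)

def prob(query, given=lambda r, s, w: True):
    """P(query | given), by brute-force summation over the joint."""
    num = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3)
              if given(r, s, w) and query(r, s, w))
    den = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3)
              if given(r, s, w))
    return num / den

print(prob(lambda r, s, w: w, given=lambda r, s, w: s))        # P(W | S)     = 0.92
print(prob(lambda r, s, w: w))                                 # P(W)         = 0.52
print(prob(lambda r, s, w: s, given=lambda r, s, w: w))        # P(S | W)    ~= 0.35
print(prob(lambda r, s, w: s, given=lambda r, s, w: r and w))  # P(S | R, W) ~= 0.21
```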

Example 2: Burglary Network
- You are at work; neighbor John calls to say your alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
  $P(\text{Burglary} \mid \text{JohnCalls}{=}T, \text{MaryCalls}{=}F)$?
- Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
- The network topology reflects causal knowledge:
  a burglar can set the alarm off;
  an earthquake can set the alarm off;
  the alarm can cause Mary to call;
  the alarm can cause John to call.
- But John sometimes confuses the telephone ringing with the alarm, and Mary likes rather loud music and sometimes misses the alarm.

Example 2: Burglary Network (cont.)
- Conditional Probability Tables (CPTs): each row shows the probability given one state of the parents; for Boolean variables, just the probability for "true" is shown. [Figure: the burglary network with its CPTs.]

Compactness
- A CPT for a Boolean variable $X_i$ with $k$ Boolean (true/false) parents has $2^k$ rows, one per combination of parent values; each row requires one number $p$ for $X_i = \text{true}$ (the number for $X_i = \text{false}$ is just $1 - p$).
- If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers, i.e. it grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.
- For the burglary net: $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).
- Chain rule:
  $P(B, E, A, J, M) = P(B)\,P(E \mid B)\,P(A \mid B, E)\,P(J \mid B, E, A)\,P(M \mid B, E, A, J) = P(B)\,P(E)\,P(A \mid B, E)\,P(J \mid A)\,P(M \mid A)$

Global Semantics
- Global semantics defines the full joint distribution as the product of the local conditional distributions:
  $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \text{Parents}(X_i))$
- The Bayesian network is therefore semantically both a representation of the joint distribution and an encoding of a collection of conditional independence statements.
- E.g.:
  $P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B, \neg E)\,P(\neg B)\,P(\neg E) = 0.90 \times 0.70 \times 0.001 \times 0.999 \times 0.998 \approx 0.00062$
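A small numerical check of this factorization, using the standard CPT entries for this example (which match the values on the slide):

```python
# P(J, M, A, ~B, ~E) = P(J|A) * P(M|A) * P(A|~B,~E) * P(~B) * P(~E)
p_b, p_e = 0.001, 0.002
p_a_given_not_b_not_e = 0.001
p_j_given_a, p_m_given_a = 0.90, 0.70

joint = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * (1 - p_b) * (1 - p_e)
print(round(joint, 6))   # ~0.000628
```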

Local Semantics
- Local semantics: each node is conditionally independent of its nondescendants given its parents.
- Local semantics ⇔ global semantics.

Markov Blanket
- Each node is conditionally independent of all other nodes given its Markov blanket: its parents, its children, and its children's parents.

Constructing Bayesian Networks
- We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics:
  1. Choose an ordering of the variables $X_1, \dots, X_n$.
  2. For $i = 1$ to $n$: add $X_i$ to the network and select its parents from $X_1, \dots, X_{i-1}$ such that
     $P(X_i \mid \text{Parents}(X_i)) = P(X_i \mid X_1, \dots, X_{i-1})$
- This choice of parents guarantees the global semantics:
  $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$  (chain rule)  $= \prod_{i=1}^{n} P(X_i \mid \text{Parents}(X_i))$  (by construction)

Example for Constructing a Bayesian Network
- Suppose we choose the ordering M, J, A, B, E, and add one variable at a time:
  P(J | M) = P(J)?  No.
  P(A | J, M) = P(A | J)?  No.   P(A | J, M) = P(A)?  No.
  P(B | A, J, M) = P(B | A)?  Yes.   P(B | A, J, M) = P(B)?  No.
  P(E | B, A, J, M) = P(E | A)?  No.   P(E | B, A, J, M) = P(E | B, A)?  Yes.

Example (cont.): Summary
- Deciding conditional independence is hard in noncausal directions (causal models and conditional independence seem hardwired for humans!).
- Assessing conditional probabilities is hard in noncausal directions.
- The resulting network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed.

Inference Tasks
- Simple queries: compute a posterior marginal $P(X_i \mid E = e)$, e.g. $P(\text{Burglary} \mid \text{JohnCalls} = true, \text{MaryCalls} = true)$.
- Conjunctive queries: $P(X_i, X_j \mid E = e) = P(X_i \mid E = e)\,P(X_j \mid X_i, E = e)$.
- Optimal decisions: probabilistic inference of $P(\text{Outcome} \mid \text{Action}, \text{Evidence})$.

Inference by Enumeration
- A slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.
- Simple query on the burglary network:
  $P(B \mid j, m) = \dfrac{P(B, j, m)}{P(j, m)} = \alpha\,P(B, j, m) = \alpha \sum_{e} \sum_{a} P(B, e, a, j, m)$
- Rewrite the full joint entries using products of CPT entries:
  $P(B \mid j, m) = \alpha \sum_{e} \sum_{a} P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a) = \alpha\,P(B) \sum_{e} P(e) \sum_{a} P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
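A minimal sketch of this enumeration for the burglary query; the CPT numbers are the usual textbook values for this network (P(A | B, E) = 0.95, 0.94, 0.29, 0.001; P(J | A) = 0.90 / 0.05; P(M | A) = 0.70 / 0.01), assumed here rather than read off the slide:

```python
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}     # P(JohnCalls=T | Alarm)
p_m = {True: 0.70, False: 0.01}     # P(MaryCalls=T | Alarm)

def unnormalized(b):
    """alpha-free P(B=b, j, m) = P(b) * sum_e P(e) * sum_a P(a|b,e) P(j|a) P(m|a)."""
    total = 0.0
    for e in (True, False):
        inner = 0.0
        for a in (True, False):
            pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
            inner += pa * p_j[a] * p_m[a]
        total += (p_e if e else 1 - p_e) * inner
    return (p_b if b else 1 - p_b) * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1 / sum(scores.values())
print(round(alpha * scores[True], 4))   # P(Burglary | j, m) ~= 0.284
```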

Evaluation Tree
- Enumeration is inefficient because of repeated computation: e.g., it recomputes $P(j \mid a)\,P(m \mid a)$ for each value of $e$.

HW-4: Bayesian Networks
- A new binary variable F, concerning a cat making noise on the roof, is added to the sprinkler (S) / rain (R) / wet grass (W) network.
- Predictive inferences: P(W | F)?  P(W | F, S)?

Bayesian Networks for Information Retrieval
- Networks over Documents (D), Topics (T), and Words (W): the word probability for a document is decomposed through topics,
  $P(w \mid d) = \sum_{t} P(w \mid t, d)\,P(t \mid d) = \sum_{t} P(w \mid t)\,P(t \mid d)$
  [Figures: two network variants linking documents, topics, and words.]