Machine Learning 4771

Cora Hodges
6 years ago

1 Machine Learning 4771
Instructor: Tony Jebara

2 Topic 14
Structuring Probability Functions for Storage
Structuring Probability Functions for Inference
Basic Graphical Models
Graphical Models
Parameters as Nodes

3 Structuring PDFs for Storage
Probability tables quickly grow if p has many variables: $p(x) = p(\text{flu?}, \text{headache?}, \ldots, \text{temperature?})$
For D true/false medical variables: table size = $2^D$
Exponential blow-up of storage size for the probability
Example: 8x8 binary images of digits
If multinomial with M choices, probabilities are how big?
As in Naïve Bayes or Multivariate Bernoulli, if words were independent things are much more efficient: $p(x) = p(\text{flu?})\, p(\text{headache?}) \cdots p(\text{temperature?})$
For D true/false medical variables: table size = $2 \times D$ (really even less than that)
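The storage argument on this slide can be checked with a little arithmetic. The following is a minimal sketch; the function names are illustrative and do not come from the lecture.

```python
def joint_table_entries(num_vars: int, num_choices: int = 2) -> int:
    """Entries in one full joint table over num_vars variables."""
    return num_choices ** num_vars


def independent_table_entries(num_vars: int, num_choices: int = 2) -> int:
    """Entries when every variable gets its own small table (full independence)."""
    return num_vars * num_choices


def independent_free_parameters(num_vars: int, num_choices: int = 2) -> int:
    """Each small table sums to 1, so one entry per table is redundant."""
    return num_vars * (num_choices - 1)


# 8x8 binary image: D = 64 pixels, each true/false.
D = 64
full = joint_table_entries(D)            # 2**64 entries
indep = independent_table_entries(D)     # 2 * 64 entries
free = independent_free_parameters(D)    # 64 numbers actually needed
```

The last function is one reading of the "really even less than that" remark: a binary variable needs only one number, since the other entry is one minus it.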

4 Structuring PDFs for Inference
Inference: goal is to predict some variables given others
x1: flu, x2: fever, x3: sinus infection, x4: temperature, x5: sinus swelling, x6: headache
Patient claims headache and high temperature. Does he have a flu?
Given findings variables $X_f$ and unknown variables $X_u$, predict queried variables $X_q$
Classical approach: truth tables (slow) or logic networks
Modern approach: probability tables (slow) or Bayesian networks (fast belief propagation, junction tree algorithm)

5 From Logic Nets to Bayes Nets
1980s: expert systems & logic networks became popular

x1  x2  | x1 v x2 | x1 ^ x2 | x1 -> x2
T   T   |    T    |    T    |    T
T   F   |    T    |    F    |    F
F   T   |    T    |    F    |    T
F   F   |    F    |    F    |    T

Problem: inconsistency, 2 paths can give different answers
Problem: rules are hard, instead use soft probability tables
[Tables: hard rule x3 = x1 ^ x2 and soft table p(x3 | x1, x2), each with one row per setting of x1, x2 in {0, 1} and columns x3=0, x3=1]
These directed graphs are called Bayesian Networks

6 Graphical Models & Bayes Nets
Independence assumptions make probability tables smaller
But real events in the world are not completely independent! Complete independence is unrealistic
Graphical models use a graph to describe more subtle dependencies and independencies, namely conditional independencies (like causality but not exactly)
Directed Graphical Model, also called Bayesian Network: uses a directed acyclic graph (DAG)
Neural Network = Graphical Function Representation
Bayesian Network = Graphical Probability Representation

7 Graphical Models & Bayes Nets
Node: a random variable (discrete or continuous)
Independent: no link
Dependent: link
Arrow: from parent to child (like causality, not exactly)
Child: destination of arrow, response
Parent: root of arrow, trigger
parents of child i = $\mathrm{pa}_i = \pi_i$
Graph: shows dependence/independence
Graph: shows factorization of joint; joint = products of conditionals
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
DAG: directed acyclic graph
No link between x and y: $p(x,y) = p(x)\,p(y)$
Link x -> y: $p(x,y) = p(y \mid x)\,p(x)$

8 Basic Graphical Models
Independence: all nodes are unlinked
Shading: variable is observed; condition on it, so it moves to the right of the bar in the pdf
Examples of simplest conditional independence situations
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
1) Markov chain: $p(x,y,z) = p(x)\,p(y \mid x)\,p(z \mid y)$
Example binary events: x = president says war, y = general orders attack, z = soldier shoots gun
$x \perp z \mid y$
$p(x \mid y,z) = \dfrac{p(x,y,z)}{p(y,z)} = p(x \mid y)$
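The Markov chain property p(x | y, z) = p(x | y) can be verified numerically. The sketch below builds the joint for binary x -> y -> z from three small tables and compares both conditionals; all probability values are made up for illustration and are not from the lecture.

```python
from itertools import product

# Hypothetical tables for the chain x -> y -> z (binary variables).
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.4, 1: 0.6}}  # p_z_given_y[y][z]

# Joint from the factorization p(x) p(y|x) p(z|y).
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}


def p_x_given_yz(x, y, z):
    """p(x | y, z) = p(x, y, z) / p(y, z)."""
    return joint[(x, y, z)] / sum(joint[(xx, y, z)] for xx in (0, 1))


def p_x_given_y(x, y):
    """p(x | y) = p(x, y) / p(y), with z summed out."""
    numerator = sum(joint[(x, y, zz)] for zz in (0, 1))
    denominator = sum(joint[(xx, y, zz)] for xx in (0, 1) for zz in (0, 1))
    return numerator / denominator


# Largest disagreement between the two conditionals over all settings.
max_gap = max(
    abs(p_x_given_yz(x, y, z) - p_x_given_y(x, y))
    for x, y, z in product((0, 1), repeat=3)
)
```

Up to floating-point rounding, `max_gap` is zero: once y is observed, z carries no further information about x.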

9 Basic Graphical Models
2) 1 Cause, 2 effects: y = flu, x = sore throat, z = temperature
$p(x,y,z) = p(y)\,p(x \mid y)\,p(z \mid y)$
$x \perp z \mid y$
3) 2 Causes, 1 effect: x = rain, y = wet driveway, z = car oil leak
$p(x,y,z) = p(x)\,p(z)\,p(y \mid x,z)$
Explaining away: $x \perp z$, but $x \not\perp z \mid y$
Each conditional is a mini-table (Multinomial or Bernoulli conditioned on parents)

10 Basic Graphical Models
2) 1 Cause, 2 effects: y = flu, x = sore throat, z = temperature
$p(x,y,z) = p(y)\,p(x \mid y)\,p(z \mid y)$
$x \perp z \mid y$
3) 2 Causes, 1 effect: x = dad is diabetic, y = child is diabetic, z = mom is diabetic
$p(x,y,z) = p(x)\,p(z)\,p(y \mid x,z)$
Explaining away: $x \perp z$, but $x \not\perp z \mid y$
Each conditional is a mini-table (Multinomial or Bernoulli conditioned on parents)
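Explaining away is easier to see with numbers. The sketch below uses the 2 causes, 1 effect structure p(x) p(z) p(y | x, z) with made-up probabilities (an assumption for illustration only) and computes conditionals by brute-force enumeration.

```python
from itertools import product

# Hypothetical priors on the two causes and a table for the common effect.
p_x = {0: 0.8, 1: 0.2}
p_z = {0: 0.9, 1: 0.1}
# p_y1_given_xz[(x, z)] = p(y = 1 | x, z)
p_y1_given_xz = {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99}


def joint(x, y, z):
    """p(x, y, z) = p(x) p(z) p(y | x, z)."""
    p_y1 = p_y1_given_xz[(x, z)]
    return p_x[x] * p_z[z] * (p_y1 if y == 1 else 1.0 - p_y1)


def total(fixed):
    """Sum the joint over every assignment consistent with `fixed`."""
    result = 0.0
    for x, y, z in product((0, 1), repeat=3):
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == v for name, v in fixed.items()):
            result += joint(x, y, z)
    return result


def prob(query, evidence):
    """p(query | evidence); both arguments are dicts such as {'x': 1}."""
    return total({**query, **evidence}) / total(evidence)


p_x1 = prob({"x": 1}, {})
p_x1_given_z1 = prob({"x": 1}, {"z": 1})                   # unchanged: x and z independent
p_x1_given_y1 = prob({"x": 1}, {"y": 1})                   # effect observed: x more likely
p_x1_given_y1_z1 = prob({"x": 1}, {"y": 1, "z": 1})        # other cause explains y away
```

With these numbers, observing the effect raises the probability of cause x, and additionally observing the other cause z lowers it again, even though x and z are independent when y is unobserved.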

11 Graphical Models
Example: factorization of the following system of variables
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
$p(x_1,\ldots,x_6) = p(x_1) \cdots$

12 Graphical Models
Example: factorization of the following system of variables
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
$p(x_1,\ldots,x_6) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_3)\,p(x_6 \mid x_2,x_5)$
How big are these tables (if binary variables)?
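One way to answer the table-size question is to count entries per conditional. The sketch below assumes the parent sets x1 -> x2, x1 -> x3, x2 -> x4, x3 -> x5 and (x2, x5) -> x6 for the six-node example; the helper names are illustrative.

```python
# Assumed parent sets for the six-node example (node -> list of parents).
parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}


def table_entries(node, parents, arity=2):
    """Entries in p(x_node | parents): one per joint setting of child and parents."""
    return arity ** (len(parents[node]) + 1)


def free_parameters(node, parents, arity=2):
    """Each parent setting gives a distribution summing to 1, costing one entry."""
    return (arity - 1) * arity ** len(parents[node])


entries = {node: table_entries(node, parents) for node in parents}
total_entries = sum(entries.values())
total_free = sum(free_parameters(node, parents) for node in parents)
full_joint_entries = 2 ** len(parents)
```

Under these assumptions the six small tables hold 2, 4, 4, 4, 4 and 8 entries (26 in total, 13 of them free), compared with 64 entries for a single joint table over six binary variables.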

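14 Graphical Models
Example: factorization of the following system of variables
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
$p(x_1,\ldots,x_6) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_3)\,p(x_6 \mid x_2,x_5)$
Interpretation???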

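15 Graphical Models
Example: factorization of the following system of variables
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
$p(x_1,\ldots,x_6) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_3)\,p(x_6 \mid x_2,x_5)$
Interpretation: 1: flu, 2: fever, 3: sinus infection, 4: temperature, 5: sinus swelling, 6: headache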
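With this interpretation, the earlier question (patient reports headache and high temperature; does he have a flu?) becomes a query on the factorized joint. The sketch below answers it by the slow probability-table route, summing the joint over all unobserved variables. The parent sets are the ones assumed for the six-node example, and every probability value is invented for illustration.

```python
from itertools import product

# Assumed structure: node -> tuple of parents.
parents = {1: (), 2: (1,), 3: (1,), 4: (2,), 5: (3,), 6: (2, 5)}

# Hypothetical p(x_i = 1 | parent values); these numbers are not from the lecture.
p_true = {
    1: {(): 0.05},                     # flu
    2: {(0,): 0.02, (1,): 0.7},        # fever | flu
    3: {(0,): 0.05, (1,): 0.3},        # sinus infection | flu
    4: {(0,): 0.05, (1,): 0.9},        # temperature | fever
    5: {(0,): 0.02, (1,): 0.8},        # sinus swelling | sinus infection
    6: {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9},  # headache | fever, swelling
}


def joint(assignment):
    """Product of p(x_i | parents of i) for a full assignment {node: 0 or 1}."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = p_true[node][tuple(assignment[p] for p in pa)]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


def posterior(query, evidence):
    """p(x_query = 1 | evidence) by enumerating all 2**6 assignments."""
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=6):
        assignment = dict(zip(range(1, 7), values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(assignment)
        denominator += p
        if assignment[query] == 1:
            numerator += p
    return numerator / denominator


p_flu_prior = posterior(1, {})
p_flu_given_findings = posterior(1, {4: 1, 6: 1})  # temperature and headache observed
```

Enumeration costs time exponential in the number of variables, which is why the slides point to belief propagation and the junction tree algorithm for larger networks.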

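16 Graphical Models
Normalizing probability tables. Joint distributions sum to 1. BUT, conditionals sum to 1 for each setting of parents.

$p(x)$: $\sum_x p(x) = 1$
$p(x,y)$, degrees of freedom 4-1: $\sum_{x,y} p(x,y) = 1$
$p(x \mid y)$, degrees of freedom 4-2: $\sum_x p(x \mid y=0) = 1$, $\sum_x p(x \mid y=1) = 1$
$p(x,y,z)$, degrees of freedom 8-1: $\sum_{x,y,z} p(x,y,z) = 1$
$p(x \mid y,z)$, degrees of freedom 8-4: $\sum_x p(x \mid y=0,z=0) = 1$, $\sum_x p(x \mid y=1,z=0) = 1$, $\sum_x p(x \mid y=0,z=1) = 1$, $\sum_x p(x \mid y=1,z=1) = 1$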
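The normalization rule for conditionals can be illustrated by building a table that is normalized separately for each parent setting and then counting the constraints. This is a minimal sketch with randomly generated entries; `random_cpt` and `degrees_of_freedom` are hypothetical helper names.

```python
import random
from itertools import product


def random_cpt(num_parents, arity=2, seed=0):
    """Return {parent_setting: [p(x=0 | setting), ...]}, each row summing to 1."""
    rng = random.Random(seed)
    cpt = {}
    for setting in product(range(arity), repeat=num_parents):
        weights = [rng.random() + 1e-9 for _ in range(arity)]
        norm = sum(weights)
        cpt[setting] = [w / norm for w in weights]
    return cpt


def degrees_of_freedom(cpt):
    """Total entries minus one sum-to-1 constraint per parent setting."""
    return sum(len(row) - 1 for row in cpt.values())


p_x_given_yz = random_cpt(num_parents=2)   # 8 entries, 4 constraints
row_sums = [sum(row) for row in p_x_given_yz.values()]
```

Here `degrees_of_freedom(p_x_given_yz)` gives 8 - 4 = 4 and `degrees_of_freedom(random_cpt(1))` gives 4 - 2 = 2, matching the counts on the slide.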

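17 Graphical Models
Example: factorization of the following system of variables
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
$p(x_1,\ldots,x_6) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_3)\,p(x_6 \mid x_2,x_5)$
Interpretation: 1: flu, 2: fever, 3: sinus infection, 4: temperature, 5: sinus swelling, 6: headache
Degrees of freedom: full joint table vs. factorized tables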

18 Parameters as Nodes
Consider the model variable θ ALSO as a random variable
But would need a prior distribution P(θ) (ignore for now)
Recall: Naïve Bayes, word probabilities are independent
Text: Multivariate Bernoulli
$p(x \mid \alpha) = \prod_{d=1}^{D} \alpha_d^{x_d} (1-\alpha_d)^{1-x_d}$
Text: Multinomial
$p(X \mid \alpha) = \dfrac{\left(\sum_{m=1}^{M} X_m\right)!}{\prod_{m=1}^{M} X_m!} \prod_{m=1}^{M} \alpha_m^{X_m}$
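Both text models can be evaluated directly from their formulas. The following is a minimal sketch using only the standard library; the function names are illustrative, and x is taken to be a binary word-presence vector while X is a vector of word counts.

```python
import math


def bernoulli_likelihood(x, alpha):
    """Multivariate Bernoulli: prod_d alpha_d^x_d * (1 - alpha_d)^(1 - x_d)."""
    return math.prod(a if xd == 1 else 1.0 - a for xd, a in zip(x, alpha))


def multinomial_likelihood(counts, alpha):
    """Multinomial: (sum_m X_m)! / prod_m X_m! * prod_m alpha_m^X_m."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)  # stays an exact integer at every step
    return coefficient * math.prod(a ** c for a, c in zip(alpha, counts))


# Two-word vocabulary, fair parameters.
p_doc_binary = bernoulli_likelihood([1, 0], [0.5, 0.5])
p_doc_counts = multinomial_likelihood([2, 1], [0.5, 0.5])
```

For realistic vocabulary sizes these products underflow quickly, so an implementation would normally sum logarithms instead of multiplying probabilities.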

19 Continuous Conditional Models
In previous slide, θ and α were random variables in the graph
But θ and α are continuous
Network can have both discrete & continuous nodes
Joint factorizes into conditionals that are either:
1) discrete conditional probability tables
2) continuous conditional probability distributions
Most popular continuous distribution = Gaussian

20 Graphical Models
In EM, we saw how to handle nodes that are: observed (shaded), hidden variables (E), parameters (M)
But only considered simple iid, single-parent structures
More generally, have arbitrary DAG without loops
Notation:
$G = \{X, E\} = \{\text{nodes/random vars}, \text{edges}\}$
$X = \{x_1,\ldots,x_M\}$
$E = \{(x_i, x_j) : i \neq j\}$
$X_c = \{x_1, x_3, x_4\}$ = subset
Want to do 4 things with these graphical models:
1) Learn Parameters (to fit to data)
2) Query independence/dependence
3) Perform Inference (get marginals/max a posteriori)
4) Compute Likelihood (e.g. for classification)

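21 Graphical Models
Graph factorizes probability:
$p(x_1,\ldots,x_6) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_3)\,p(x_6 \mid x_2,x_5)$
Topological graph: nodes are in order so that parents π come before children
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
Question: which is the more general graph?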
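A topological order, in which every parent precedes its children, can be computed with Python's standard `graphlib` module (Python 3.9+). The sketch below assumes the parent sets x1 -> x2, x1 -> x3, x2 -> x4, x3 -> x5 and (x2, x5) -> x6 for the six-node example. `TopologicalSorter` takes a mapping from each node to its predecessors, which is exactly a parent dictionary.

```python
from graphlib import TopologicalSorter

# Assumed parent sets for the six-node example (node -> list of parents).
parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}

# static_order() yields nodes so that all predecessors come first.
order = list(TopologicalSorter(parents).static_order())

# Check the defining property: each parent appears before its child.
position = {node: k for k, node in enumerate(order)}
parents_first = all(
    position[p] < position[child]
    for child, pa in parents.items()
    for p in pa
)
```

Several valid orders may exist for the same graph; any of them lets the joint be evaluated one conditional at a time. If the graph contained a directed cycle, `static_order()` would raise `graphlib.CycleError`, consistent with the DAG requirement.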

22 Graphical Models
Graph factorizes probability:
$p(x_1,\ldots,x_6) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_3)\,p(x_6 \mid x_2,x_5)$
Topological graph: nodes are in order so that parents π come before children
$p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
Question: which is the more general graph?
Conditional probability tables can be chosen to make the busier graph look like the simpler graph
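The last point can be illustrated on the smallest possible case: a graph with a link x -> y can represent an unlinked (independent) model if the conditional table ignores its parent. This is a sketch with made-up numbers, not an example taken from the slides.

```python
from itertools import product

p_x = {0: 0.6, 1: 0.4}
p_y = {0: 0.25, 1: 0.75}

# Busier graph x -> y, but both rows of p(y | x) are identical to p(y).
p_y_given_x = {0: dict(p_y), 1: dict(p_y)}

joint_linked = {
    (x, y): p_x[x] * p_y_given_x[x][y] for x, y in product((0, 1), repeat=2)
}
joint_unlinked = {
    (x, y): p_x[x] * p_y[y] for x, y in product((0, 1), repeat=2)
}

same_distribution = all(
    abs(joint_linked[key] - joint_unlinked[key]) < 1e-12 for key in joint_linked
)
```

The reverse does not hold: the unlinked graph cannot represent a table whose rows differ, which is why the graph with more edges is the more general one.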
