Machine Learning 4771

Machine Learning 4771. Instructor: Tony Jebara

Topic 14: Structuring Probability Functions for Storage; Structuring Probability Functions for Inference; Basic Graphical Models; Graphical Models; Parameters as Nodes

Structuring PDFs for Storage. Probability tables quickly grow if p has many variables: p(x) = p(flu?, headache?, ..., temperature?). For D true/false medical variables the joint probability table has 2^D entries, an exponential blow-up of storage. Example: 8x8 binary images of digits already give D = 64. If each variable is instead multinomial with M choices, how big are the tables? As in Naïve Bayes or the Multivariate Bernoulli model, if the variables were independent things are much more efficient: p(x) = p(flu?) p(headache?) ... p(temperature?). Each variable then needs only its own small table (e.g. [0.73, 0.27], [0.2, 0.8], [0.54, 0.46]), for 2D numbers in total (really even fewer than that, since each table sums to one).
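
A quick sanity check of the storage argument (a minimal sketch; the variable count D is made up for illustration):

```python
# Storage needed for a joint table over D binary variables
# versus D independent per-variable tables.
D = 20  # hypothetical number of true/false medical variables

joint_entries = 2 ** D          # full joint probability table
independent_entries = 2 * D     # one [p, 1-p] table per variable

print(f"joint table:       {joint_entries:,} entries")       # 1,048,576
print(f"independent model: {independent_entries} entries")   # 40
```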

Structuring PDFs for Inference. Inference: the goal is to predict some variables given others. x1: flu, x2: fever, x3: sinus infection, x4: temperature, x5: sinus swelling, x6: headache. A patient claims headache and high temperature; does he have a flu? Given finding variables X_f and unknown variables X_u, predict the queried variables X_q. Classical approach: truth tables (slow) or logic networks. Modern approach: probability tables (slow) or Bayesian networks (fast belief propagation, junction tree algorithm).

From Logic Nets to Bayes Nets. In the 1980s, expert systems & logic networks became popular, built from hard rules such as:

x1 x2 | x1 v x2 | x1 ^ x2 | x1 -> x2
 T  T |    T    |    T    |    T
 T  F |    T    |    F    |    F
 F  T |    T    |    F    |    T
 F  F |    F    |    F    |    T

Problem: inconsistency, two paths can give different answers. Problem: rules are hard; instead use soft probability tables. For the rule x3 = x1 ^ x2, the hard table has p(x3=1 | x1, x2) = 1 only when x1 = x2 = 1 (and p(x3=0 | x1, x2) = 1 otherwise); a soft table replaces this with, e.g., p(x3=1 | x1, x2) = 0.2, 0.3, 0.3, 0.9 and p(x3=0 | x1, x2) = 0.8, 0.7, 0.7, 0.1 over the four parent settings (x1, x2) = (0,0), (0,1), (1,0), (1,1). These directed graphs are called Bayesian Networks.
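
The hard rule and its soft replacement can be written down directly as arrays (a sketch; the numbers are the ones from the slide):

```python
import numpy as np

# p(x3=1 | x1, x2) indexed as table[x1, x2]
hard_cpt = np.array([[0.0, 0.0],   # x1=0: x3 is never 1
                     [0.0, 1.0]])  # x1=1: x3=1 only when x2=1 (x3 = x1 AND x2)

soft_cpt = np.array([[0.2, 0.3],   # x1=0
                     [0.3, 0.9]])  # x1=1: x3 is *probably* 1 when both parents are 1

# p(x3=0 | x1, x2) is just the complement, so the pair of entries
# [p(x3=0 | .), p(x3=1 | .)] sums to one for every parent setting.
print(1.0 - soft_cpt)  # [[0.8 0.7], [0.7 0.1]]
```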

Graphical Models & Bayes Nets. Independence assumptions make probability tables smaller, but real events in the world are not completely independent! Complete independence is unrealistic. Graphical models use a graph to describe more subtle dependencies and independencies, namely conditional independencies (like causality, but not exactly). A Directed Graphical Model, also called a Bayesian Network, uses a directed acyclic graph (DAG). Neural Network = graphical function representation; Bayesian Network = graphical probability representation.

Graphical Models & Bayes Nets. Node: a random variable (discrete or continuous). No link: independent; link: dependent. Arrow: from parent to child (like causality, but not exactly). Child: destination of the arrow, the response. Parent: root of the arrow, the trigger; parents of child i = pa_i = π_i. The graph encodes dependence/independence and shows the factorization of the joint: joint = product of conditionals, p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i), over a DAG (directed acyclic graph). Unlinked nodes: p(x, y) = p(x) p(y); linked x → y: p(x, y) = p(y | x) p(x).
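
To make "joint = product of conditionals" concrete, here is a minimal sketch for the two-node graph x → y (the table entries are made up):

```python
import numpy as np

p_x = np.array([0.7, 0.3])                 # p(x)
p_y_given_x = np.array([[0.9, 0.1],        # p(y | x=0)
                        [0.4, 0.6]])       # p(y | x=1)

# Joint from the factorization p(x, y) = p(x) p(y | x)
p_xy = p_x[:, None] * p_y_given_x
print(p_xy)            # full 2x2 joint table
print(p_xy.sum())      # 1.0 -- a joint sums to one over *all* entries
```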

Basic Graphical Models. Independence: all nodes are unlinked. Shading: a variable is observed; conditioning on it moves it to the right of the bar in the pdf. Examples of the simplest conditional independence situations, all with p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i). (1) Markov chain x → y → z: p(x, y, z) = p(x) p(y | x) p(z | y). Example binary events: x = president says war, y = general orders attack, z = soldier shoots gun. Here x is conditionally independent of z given y: p(x | y, z) = p(x, y, z) / p(y, z) = p(x | y).
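
A minimal numerical check of the Markov-chain independence x ⟂ z | y (the three conditional tables below are invented for illustration):

```python
import numpy as np

p_x = np.array([0.6, 0.4])                       # p(x)
p_y_x = np.array([[0.8, 0.2], [0.3, 0.7]])       # p(y | x), rows indexed by x
p_z_y = np.array([[0.9, 0.1], [0.2, 0.8]])       # p(z | y), rows indexed by y

# Joint p(x, y, z) = p(x) p(y | x) p(z | y)
joint = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]

# p(x | y, z) should not depend on z once y is known.
p_x_given_yz = joint / joint.sum(axis=0, keepdims=True)     # p(x | y, z)
p_xy = joint.sum(axis=2)                                    # p(x, y)
p_x_given_y = p_xy / p_xy.sum(axis=0, keepdims=True)        # p(x | y)

print(np.allclose(p_x_given_yz, p_x_given_y[:, :, None]))   # True
```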

Basic Graphical Models. (2) One cause, two effects, x ← y → z: y = flu, x = sore throat, z = temperature; p(x, y, z) = p(y) p(x | y) p(z | y). (3) Two causes, one effect, x → y ← z: x = rain, y = wet driveway, z = car oil leak; p(x, y, z) = p(x) p(z) p(y | x, z). The causes are marginally independent but become dependent once the effect is observed: explaining away. Each conditional is a mini-table (Multinomial or Bernoulli) conditioned on the parents.

Basic Graphical Models. The same two structures with another example for the two-causes, one-effect case: x = dad is diabetic, y = child is diabetic, z = mom is diabetic; p(x, y, z) = p(x) p(z) p(y | x, z). Again the parents are marginally independent, but observing that the child is diabetic makes them dependent (explaining away). Each conditional is a mini-table (Multinomial or Bernoulli) conditioned on the parents.
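
Explaining away can also be checked numerically: for the v-structure x → y ← z the causes are independent a priori but become dependent once the effect is observed. A sketch with made-up numbers for the diabetes example:

```python
import numpy as np

p_x = np.array([0.9, 0.1])   # p(dad diabetic):  p(x=0), p(x=1)
p_z = np.array([0.9, 0.1])   # p(mom diabetic):  p(z=0), p(z=1)
# p(child diabetic = 1 | dad, mom), indexed [x, z]
p_y1_xz = np.array([[0.05, 0.5],
                    [0.5,  0.9]])

# Joint over (x, z, y) from p(x) p(z) p(y | x, z)
p_y_xz = np.stack([1 - p_y1_xz, p_y1_xz], axis=-1)              # [x, z, y]
joint = p_x[:, None, None] * p_z[None, :, None] * p_y_xz

# Marginally the parents are independent: p(x=1, z=1) = p(x=1) p(z=1)
print(np.isclose(joint.sum(axis=2)[1, 1], p_x[1] * p_z[1]))     # True

# Conditioned on a diabetic child (y=1), learning the mom is diabetic
# lowers the probability that the dad is: the mom "explains away" the child.
post = joint[:, :, 1] / joint[:, :, 1].sum()                    # p(x, z | y=1)
p_dad_given_child = post.sum(axis=1)[1]                         # p(x=1 | y=1)
p_dad_given_child_mom = post[1, 1] / post[:, 1].sum()           # p(x=1 | y=1, z=1)
print(p_dad_given_child, p_dad_given_child_mom)                 # second is smaller
```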

Graphical Models. Example: factorization of the following system of variables using p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i). Start of the build-up: p(x1, ..., x6) = p(x1) · ...

Graphical Models. Example: factorization of the following system of variables, p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i). Building up one node at a time:
p(x1, ..., x6) = p(x1)
= p(x1) p(x2 | x1)
= p(x1) p(x2 | x1) p(x3 | x1)
= p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2)
= p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3)
= p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5)
How big are these tables (if the variables are binary)?

Graphical Models. Example: factorization of the following system of variables, p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i). Interpretation??? p(x1, ..., x6) = p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5)

Graphical Models. Example: factorization of the following system of variables, p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i). Interpretation: 1: flu, 2: fever, 3: sinus infection, 4: temperature, 5: sinus swelling, 6: headache. p(x1, ..., x6) = p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5)
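
Under this factorization, the inference question from earlier (headache and high temperature observed; is flu likely?) can be answered by brute-force enumeration, summing the joint over the unobserved variables. A minimal sketch; all CPT numbers are invented for illustration:

```python
import itertools
import numpy as np

# Hypothetical CPTs for the binary network
# p(x1) p(x2|x1) p(x3|x1) p(x4|x2) p(x5|x3) p(x6|x2,x5)
p1 = np.array([0.95, 0.05])                         # flu
p2 = np.array([[0.9, 0.1], [0.2, 0.8]])             # fever | flu
p3 = np.array([[0.95, 0.05], [0.7, 0.3]])           # sinus infection | flu
p4 = np.array([[0.9, 0.1], [0.15, 0.85]])           # high temperature | fever
p5 = np.array([[0.95, 0.05], [0.2, 0.8]])           # sinus swelling | sinus infection
p6 = np.array([[[0.9, 0.1], [0.3, 0.7]],            # headache | fever, swelling
               [[0.4, 0.6], [0.1, 0.9]]])           # indexed [x2, x5, x6]

def joint(x1, x2, x3, x4, x5, x6):
    return (p1[x1] * p2[x1, x2] * p3[x1, x3] *
            p4[x2, x4] * p5[x3, x5] * p6[x2, x5, x6])

# Observe headache (x6=1) and high temperature (x4=1); sum out x2, x3, x5.
def evidence_prob(x1):
    return sum(joint(x1, x2, x3, 1, x5, 1)
               for x2, x3, x5 in itertools.product([0, 1], repeat=3))

p_flu = evidence_prob(1) / (evidence_prob(0) + evidence_prob(1))
print(f"p(flu | headache, high temp) = {p_flu:.3f}")
```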

Graphical Models. Normalizing probability tables: joint distributions sum to 1, BUT conditionals sum to 1 for each setting of the parents. For binary variables:
p(x): 2 entries, 1 constraint (1 free parameter): Σ_x p(x) = 1
p(x, y): 4 entries, 1 constraint (3 free): Σ_{x,y} p(x, y) = 1
p(x | y): 4 entries, 2 constraints (2 free): Σ_x p(x | y=0) = 1 and Σ_x p(x | y=1) = 1
p(x, y, z): 8 entries, 1 constraint (7 free): Σ_{x,y,z} p(x, y, z) = 1
p(x | y, z): 8 entries, 4 constraints (4 free): Σ_x p(x | y, z) = 1 for each of the four settings (y, z) in {0,1} x {0,1}
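
The normalization rule translates directly into code: a conditional table is normalized over the child's values separately for every parent configuration, not over the whole array. A small sketch with an arbitrary unnormalized table:

```python
import numpy as np

counts = np.array([[3., 1.],     # unnormalized table indexed [parent y, child x]
                   [2., 6.]])

p_x_given_y = counts / counts.sum(axis=1, keepdims=True)  # one sum per parent row
print(p_x_given_y)              # each row sums to 1
print(p_x_given_y.sum(axis=1))  # [1. 1.]

p_xy = counts / counts.sum()    # a *joint* would instead sum to 1 overall
print(p_xy.sum())               # 1.0
```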

Graphical Models. Example: factorization of the following system of variables, p(x1, ..., xn) = ∏_i p(xi | pa_i) = ∏_i p(xi | π_i). Interpretation: 1: flu, 2: fever, 3: sinus infection, 4: temperature, 5: sinus swelling, 6: headache. p(x1, ..., x6) = p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5). Counting degrees of freedom for binary variables: the full joint needs 2^6 - 1 = 63 free parameters, while the factorized form needs 1 + 2 + 2 + 2 + 2 + 4 = 13. That is 63 vs. 13 degrees of freedom.
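
The 63-versus-13 count can be reproduced from the parent sets alone (a sketch; the parent lists are the ones in the factorization above):

```python
# parents of each binary variable in the medical example
parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}

full_joint_dof = 2 ** len(parents) - 1                                  # 63
factored_dof = sum((2 - 1) * 2 ** len(pa) for pa in parents.values())   # 13
print(full_joint_dof, factored_dof)
```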

Parameters as Nodes. Consider the model parameter θ ALSO as a random variable. We would then need a prior distribution p(θ); ignore that for now. Recall Naïve Bayes, where word probabilities are independent. Text, Multivariate Bernoulli over D = 50000 words: p(x | α) = ∏_{d=1}^{D} α_d^{x_d} (1 − α_d)^{1 − x_d}. Text, Multinomial over word counts: p(X | α) = [(Σ_{m=1}^{M} X_m)! / ∏_{m=1}^{M} X_m!] ∏_{m=1}^{M} α_m^{X_m}.
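
Both text likelihoods are straightforward to evaluate in log space. A sketch with a tiny made-up vocabulary and counts; the 50,000-word case works the same way:

```python
import numpy as np
from scipy.special import gammaln  # gammaln(n + 1) = log(n!)

# Multivariate Bernoulli: alpha_d = p(word d appears), x_d in {0, 1}
alpha_bern = np.array([0.2, 0.7, 0.4])
x = np.array([1, 1, 0])
log_p_bern = np.sum(x * np.log(alpha_bern) + (1 - x) * np.log(1 - alpha_bern))

# Multinomial: alpha is a distribution over words, X_m are word counts
alpha_mult = np.array([0.1, 0.6, 0.3])
X = np.array([2, 5, 1])
log_p_mult = (gammaln(X.sum() + 1) - gammaln(X + 1).sum()
              + np.sum(X * np.log(alpha_mult)))
print(log_p_bern, log_p_mult)
```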

Continuous Conditional Models. In the previous slide, θ and α were random variables in the graph, but θ and α are continuous. A network can have both discrete & continuous nodes. The joint then factorizes into conditionals that are either: (1) discrete conditional probability tables, or (2) continuous conditional probability distributions. The most popular continuous distribution is the Gaussian.
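
One common way to mix the two node types is to let a continuous child depend on a discrete parent through a Gaussian whose mean changes with the parent's value. A minimal sketch; all parameters are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete parent: flu in {0, 1}; continuous child: temperature in Celsius.
p_flu = np.array([0.9, 0.1])
mean_temp = {0: 36.8, 1: 39.0}   # p(temp | flu) = N(mean_temp[flu], sigma^2)
sigma = 0.5

# Ancestral sampling from the two-node network flu -> temperature
flu = rng.choice([0, 1], p=p_flu)
temp = rng.normal(mean_temp[flu], sigma)
print(flu, temp)

# Density of the continuous conditional, used in place of a CPT entry
def p_temp_given_flu(t, flu):
    return np.exp(-0.5 * ((t - mean_temp[flu]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(p_temp_given_flu(38.5, flu))
```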

Graphical Models. In EM, we saw how to handle nodes that are observed (shaded), hidden variables (E-step), or parameters (M-step). But we only considered simple iid, single-parent structures. More generally, we have an arbitrary DAG (directed, without loops). Notation: G = {X, E} = {nodes/random vars, edges}, X = {x1, ..., xM}, E = {(xi, xj) : i → j}, with subsets such as X_c = {x1, x3, x4}. We want to do four things with these graphical models: (1) learn parameters (to fit to data), (2) query independence/dependence, (3) perform inference (get marginals / max a posteriori), (4) compute likelihood (e.g. for classification).

Graphical Models. The graph factorizes the probability: p(x1, ..., x6) = p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5) = ∏_i p(xi | π_i). Topological order: the nodes are ordered so that parents π_i come before their children. Question: which is the more general graph?

Graphical Models. The graph factorizes the probability: p(x1, ..., x6) = p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5) = ∏_i p(xi | π_i). Topological order: the nodes are ordered so that parents π_i come before their children. Question: which is the more general graph? Answer: the graph with more edges. Its conditional probability tables can be chosen to make the busier graph look exactly like the simpler graph.
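
The answer can be seen directly in the tables: if we add an extra edge, say x3 → x4 on top of x2 → x4, and fill the larger table with values that ignore x3, the busier graph reproduces the simpler one exactly. A sketch; the edge choice and numbers are for illustration only:

```python
import numpy as np

# Simpler graph: p(x4 | x2) only
p_x4_given_x2 = np.array([[0.9, 0.1],
                          [0.15, 0.85]])          # rows indexed by x2

# Busier graph: p(x4 | x2, x3) -- fill the bigger table so x3 has no effect
p_x4_given_x2_x3 = np.stack([p_x4_given_x2, p_x4_given_x2], axis=1)  # [x2, x3, x4]

# Every slice over x3 matches the simpler conditional, so the busier
# (more general) graph can represent everything the simpler one can.
print(np.allclose(p_x4_given_x2_x3[:, 0, :], p_x4_given_x2))  # True
print(np.allclose(p_x4_given_x2_x3[:, 1, :], p_x4_given_x2))  # True
```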