Particle-Based Approximate Inference on Graphical Models


Particle-based Approximate Inference on Graphical Models. References: Probabilistic Graphical Models, Ch. 12, Koller & Friedman; CMU 10-708 Probabilistic Graphical Models, Fall 2009, Lectures 8-9, Eric Xing; Pattern Recognition & Machine Learning, Ch. 11, Bishop.

In terms of difficulty, there are 3 types of inference problems: 1. Inference which is easily solved with Bayes rule. 2. Inference which is tractable using some dynamic programming technique, e.g. Variable Elimination or the junction-tree (J-tree) algorithm. 3. Inference which is provably intractable and should be solved with some approximate method, e.g. approximation with optimization or a sampling technique. (Today's focus.)

Agenda: When to use Particle-based Approximate Inference? Forward Sampling & Importance Sampling. Markov Chain Monte Carlo (MCMC). Collapsed Particles.


Example 1: General Factorial HMM. After moralization we get cliques of size 5, intractable most of the time: no tractable elimination ordering exists. [Figure: factorial HMM with hidden chains V, W, Y, Z and observed X, before and after moralization.]


Some models are intractable for exact inference. Example: a grid MRF. Approximate inference is needed: generally we will have cliques of size N for an N*N grid, which is indeed intractable.

General idea of Particle-based Monte Carlo Approximation. Most of the queries we want can be formed as E_P[f] = Σ_{x_1,...,x_K} P(x_1,...,x_K) * f(x_1,...,x_K), which is intractable most of the time (the summation is intractable when K is large). Assume we can generate i.i.d. samples x^(n) from P; then we can approximate the above using f̂ = (1/N) Σ_n f(x^(n)). It is an unbiased estimator whose variance converges to 0 as N → ∞: E[f̂] = (1/N) Σ_n E[f(x^(n))] = E_P[f], and Var[f̂] = (1/N²) Σ_n Var[f(x^(n))] = Var[f] / N. Note that the variance is not related to the dimension of x, and Var → 0 as N → ∞.
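The estimator above can be checked numerically. A minimal sketch, where the distribution P over {0, 1, 2} and the function f(x) = x² are made-up toy choices:

```python
# Particle-based approximation of E_P[f] from i.i.d. samples.
import random

random.seed(0)

P = [0.2, 0.5, 0.3]          # P(X=0), P(X=1), P(X=2): a toy distribution
f = lambda x: x ** 2

exact = sum(p * f(x) for x, p in enumerate(P))       # E_P[f] = 0.5 + 1.2 = 1.7

N = 100_000
samples = random.choices([0, 1, 2], weights=P, k=N)  # i.i.d. x^(n) ~ P
f_hat = sum(f(x) for x in samples) / N               # (1/N) * sum_n f(x^(n))

assert abs(f_hat - exact) < 0.05   # error shrinks like sqrt(Var[f]/N)
```

With N = 100,000 the error is well inside the Var[f]/N prediction, independent of any "dimension" of x.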

Which problems can use Particle-based Monte Carlo Approximation? Types of queries: 1. Likelihood of evidence/assignments on variables. 2. Conditional probability of some variables given others. 3. Most probable assignment for some variables given others. Any problem which can be written in the form E_P[f] = Σ_{x_1,...,x_K} P(x_1,...,x_K) * f(x_1,...,x_K).

Marginal Distribution via Monte Carlo. To compute the marginal distribution on X_k: P(X_k = x_k) = Σ_x P(x) * 1{X_k = x_k} = E_P[1{X_k = x_k}]. Particle-based approximation: P̂(x_k) = (1/N) Σ_n 1{x_k^(n) = x_k}. Just count the proportion of samples in which X_k = x_k.

Marginal Joint Distribution via Monte Carlo. To compute the marginal distribution on X_i, X_j: P(X_i = x_i, X_j = x_j) = Σ_x P(x) * 1{X_i = x_i & X_j = x_j} = E_P[1{X_i = x_i & X_j = x_j}]. Particle-based approximation: P̂(x_i, x_j) = (1/N) Σ_n 1{x_i^(n) = x_i & x_j^(n) = x_j}. Just count the proportion of samples in which X_i = x_i & X_j = x_j.

So what's the problem? Note what we can do is: evaluate the probability/likelihood P(X_1 = x_1, ..., X_K = x_K). What we cannot do is: summation/integration in a high-dimensional space, Σ_{x_1,...,x_K}. What we want to do for approximation is: draw samples from P(X_1, ..., X_K). How to make better use of samples? How to know we've sampled enough?

How to draw samples from P? Forward Sampling: draw from ancestor to descendant in a BN. Rejection Sampling: create samples using forward sampling, and reject those inconsistent with the evidence. Importance Sampling: sample from a proposal dist. Q, but give large weights to samples with high likelihood under P. Markov Chain Monte Carlo: define a transition dist. T s.t. samples get closer and closer to P.

Agenda: When to use Particle-based Approximate Inference? Forward Sampling & Importance Sampling. Markov Chain Monte Carlo (MCMC). Collapsed Particles.

Forward Sampling. [The alarm network. CPTs: P(B=t) = 0.001; P(E=t) = 0.002; P(A=t | B,E): (t,t) → 0.95, (t,f) → 0.94, (f,t) → 0.29, (f,f) → 0.001; P(J=t | A): t → 0.90, f → 0.05; P(M=t | A): t → 0.70, f → 0.01.] Draw each variable given its parents, from ancestors to descendants: e.g. B = ~b, E = e, then A given (~b, e), then J and M given A. i.i.d. sample 1: (~b, e, a, j, ~m).

Forward Sampling. Samples (~b, e, a, j, ~m), (~b, ~e, a, ~j, ~m), ... form a particle-based representation of the joint distribution P(B, E, A, J, M), e.g. P̂(B = b, M = ~m) = (1/N) Σ_n 1{B^(n) = b, M^(n) = ~m}. What if we want samples from P(B, E, A | J = j, M = ~m)? 1. Collect all samples in which J = j, M = ~m. 2. Those samples form the particle-based representation of P(B, E, A | J = j, M = ~m). Disadvantage?
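A sketch of the forward-then-reject recipe on the alarm network (CPT values as on the slide; the evidence J = j, M = ~m and the query P(B | j, ~m) are illustrative choices). The disadvantage shows up directly: most particles fail the evidence check and are thrown away.

```python
# Forward sampling + rejection on the alarm network.
import random

random.seed(1)

def forward_sample():
    # draw each variable given its parents, ancestors first
    b = random.random() < 0.001
    e = random.random() < 0.002
    p_a = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}[(b, e)]
    a = random.random() < p_a
    j = random.random() < (0.90 if a else 0.05)
    m = random.random() < (0.70 if a else 0.01)
    return b, e, a, j, m

N = 200_000
# keep only particles consistent with the evidence J = true, M = false
kept = [s for s in (forward_sample() for _ in range(N))
        if s[3] and not s[4]]
p_b = sum(s[0] for s in kept) / len(kept)   # estimate of P(B=true | j, ~m)
```

The surviving fraction len(kept)/N is itself an estimate of P(j, ~m); with evidence of low probability, almost all work is wasted.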

Forward Sampling from P(Z | Data)? [Figure: hidden variables Z with observed children X_1 = 1, X_2 = 0, X_3 = 1, ..., X_K = 0.] 1. Forward sample N times. 2. Collect all samples Z^(n) in which X_1 = 1, X_2 = 0, X_3 = 1, ..., X_K = 0. 3. Those samples form the particle-based representation of P(Z | Data). How many such samples can we get? N * P(Data)!! Less than 1 if N is not large enough. Solutions?

Importance Sampling to the Rescue. We need not draw from P to compute E_P[f]: E_P[f] = Σ_x P(x) f(x) = Σ_x Q(x) (P(x)/Q(x)) f(x) = E_Q[(P/Q) * f]. That is, we can draw from an arbitrary distribution Q, but give larger weights to samples having higher probability under P: Ê[f] = (1/N) Σ_n (P(x^(n))/Q(x^(n))) f(x^(n)).
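A minimal sketch of this reweighting, where P, Q, and f over {0, 1, 2} are toy choices:

```python
# Importance sampling: draw from Q, weight each sample by P(x)/Q(x).
import random

random.seed(2)

P = [0.2, 0.5, 0.3]              # target distribution
Q = [0.4, 0.3, 0.3]              # proposal: easy to sample from, != P
f = lambda x: float(x)

exact = sum(p * f(x) for x, p in enumerate(P))   # E_P[f] = 0.5 + 0.6 = 1.1

N = 100_000
xs = random.choices([0, 1, 2], weights=Q, k=N)   # x^(n) ~ Q
est = sum(P[x] / Q[x] * f(x) for x in xs) / N    # weights w = P/Q

assert abs(est - exact) < 0.05
```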

Importance Sampling to the Rescue. Sometimes we can only evaluate an unnormalized distribution P̃(x) = Z * P(x). Note that E_Q[P̃/Q] = Σ_x Q(x) P̃(x)/Q(x) = Σ_x P̃(x) = Z, so we can estimate Z as Ẑ = (1/N) Σ_n P̃(x^(n))/Q(x^(n)). Then Ê[f] = (1/Ẑ)(1/N) Σ_n (P̃(x^(n))/Q(x^(n))) f(x^(n)) = Σ_n w^(n) f(x^(n)) / Σ_n w^(n), where w^(n) = P̃(x^(n))/Q(x^(n)). Note that we can compute Ẑ only if we can evaluate a normalized Q, e.g. a Q defined by a BN.

Importance Sampling from P(Z | Data)? [Figure: hidden variables Z_1, ..., Z_K with observed evidence.] 1. Sample from Q(Z) = P(Z), the normalized distribution obtained from the BN by truncating the part with evidence. 2. Give each sample Z^(n) a weight w^(n) = P̃(Z^(n))/Q(Z^(n)) = P(Z^(n), Data)/P(Z^(n)) = P(Data | Z^(n)).

3. The effective number of samples is N_eff = Σ_n w^(n), and P̂(Data) = N_eff / N. Example with N = 3 samples: (A, B, A, ..., A) with w = 0.01; (A, B, A, ..., B) with w = 0.3; (B, B, B, ..., A) with w = 1.0. N_eff = 1.31; P̂(Data) = N_eff / N = 1.31/3.

To get estimates given Data: P̂(Z_1 = B | Data) = (0.01*0 + 0.3*0 + 1.0*1) / 1.31 ≈ 0.76; P̂(Z_1 = A, Z_K = B | Data) = (0.01*0 + 0.3*1 + 1.0*0) / 1.31 ≈ 0.23. Any joint dist. can be estimated: no out-of-clique problem.
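A sketch of this likelihood-weighting scheme on a deliberately tiny model (one hidden Z with two observed children; all numbers are invented, not the slide's): sample Z from the truncated network, weight each particle by P(evidence | Z), then normalize.

```python
# Likelihood weighting: Q = prior over Z, weight = P(evidence | Z).
import random

random.seed(3)

p_z = 0.3                        # P(Z = 1)
p_x_given_z = {1: 0.9, 0: 0.2}   # P(X_i = 1 | Z), same CPD for X1 and X2
evidence = (1, 1)                # observed X1 = 1, X2 = 1

N = 100_000
w_sum = w_z1 = 0.0
for _ in range(N):
    z = int(random.random() < p_z)          # Z ~ prior (the proposal Q)
    w = 1.0
    for x in evidence:                      # w = P(evidence | z)
        px1 = p_x_given_z[z]
        w *= px1 if x == 1 else 1.0 - px1
    w_sum += w
    w_z1 += w * z

p_data_hat = w_sum / N       # estimate of P(evidence) = N_eff / N
p_z1_hat = w_z1 / w_sum      # estimate of P(Z = 1 | evidence)

# exact values: P(ev) = 0.3*0.81 + 0.7*0.04 = 0.271; P(Z=1|ev) = 0.243/0.271
assert abs(p_data_hat - 0.271) < 0.015
assert abs(p_z1_hat - 0.243 / 0.271) < 0.01
```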

Bayesian Treatment with Importance Sampling. E.g. P(Y = 1 | x, θ) = logistic(θ_1 * x + θ_0). Often the posterior on parameters θ, P(θ | Data) = P(Data | θ) P(θ) / ∫ P(Data | θ) P(θ) dθ, is intractable because many types of P(Data | θ) P(θ) cannot be integrated analytically. With importance sampling we need not evaluate the integration to estimate E[θ | Data]: approximate with Ê[θ | Data] = Σ_n w^(n) θ^(n) / Σ_n w^(n), where θ^(n) ~ Q and w^(n) = P(Data | θ^(n)) P(θ^(n)) / Q(θ^(n)).
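A hedged sketch of this idea for a 1-D logistic model, taking the prior itself as the proposal Q so the weight reduces to w = P(Data | θ). The data set and the N(0,1) prior are invented for illustration:

```python
# Self-normalized importance sampling for a Bayesian posterior mean.
import math
import random

random.seed(4)

data = [(1.0, 1), (2.0, 1), (0.5, 1), (-1.0, 0), (-2.0, 0)]  # (x, y) pairs

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def likelihood(theta):
    # P(Data | theta) for the 1-D logistic model P(y=1|x) = sigmoid(theta*x)
    l = 1.0
    for x, y in data:
        p = sigmoid(theta * x)
        l *= p if y == 1 else 1.0 - p
    return l

N = 50_000
w_sum = wt_sum = 0.0
for _ in range(N):
    theta = random.gauss(0.0, 1.0)   # theta^(n) ~ Q = prior N(0, 1)
    w = likelihood(theta)            # w = P(Data|theta)P(theta)/Q(theta)
    w_sum += w
    wt_sum += w * theta

posterior_mean = wt_sum / w_sum      # no normalizing integral ever evaluated
assert posterior_mean > 0.0          # the data clearly favors positive theta
```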

What's the problem? [Figure: P and Q over Val(X).] If P and Q are not matched properly, only a small number of samples will fall in the region with high P(x). A very large N is needed to get a good picture of P.

How do P(Z) and Q(Z) match? When the evidence is close to the root, forward sampling gives a good Q(Z), which can generate samples with high likelihood under P. [Figure: evidence near the root; Q(Z) close to P(Z | Data).]

How do P(Z) and Q(Z) match? When the evidence is on the leaves, forward sampling is a bad Q(Z): it yields samples with very low likelihood w = P(Data | Z), so Q(Z) is far from P(Z | Data) and we need a very large sample size to get a good picture of P(Z | Data). Can we improve with time, drawing from a distribution more like the desired P(Z | Data)? MCMC tries to draw from a distribution closer and closer to P(Z | Data). It applies equally well in BNs & MRFs.

Agenda: When to use Particle-based Approximate Inference? Forward Sampling & Importance Sampling. Markov Chain Monte Carlo (MCMC). Collapsed Particles.

What is a Markov Chain (MC)? A set of random variables X = (X_1, ..., X_K). The configuration of the variables changes with time, x^(t) = (x_1^(t), ..., x_K^(t)), taking transitions following P(x^(t+1) = x' | x^(t) = x) = T(x → x'). There is a stationary distribution π_T for transition T, in which π_T(X = x') = Σ_x π_T(X = x) * T(x → x'): after a transition, the distribution over all possible configurations is unchanged. E.g. the MC above has one variable X taking values {x^1, x^2, x^3}, with T(x^1 → x^1) = 0.25, T(x^1 → x^3) = 0.75, T(x^2 → x^2) = 0.7, T(x^2 → x^3) = 0.3, T(x^3 → x^1) = T(x^3 → x^2) = 0.5; there is a π_T = (0.2, 0.5, 0.3) s.t. π_T * T = π_T.
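The 3-state example can be checked numerically (transition values as recovered from the slide):

```python
# Verify that pi_T = (0.2, 0.5, 0.3) is stationary: pi_T * T = pi_T.
T = [[0.25, 0.0, 0.75],   # rows: current state x^1, x^2, x^3
     [0.0,  0.7, 0.3],
     [0.5,  0.5, 0.0]]
pi = [0.2, 0.5, 0.3]

# (pi T)_j = sum_i pi_i * T[i][j]
pi_next = [sum(pi[i] * T[i][j] for i in range(3)) for j in range(3)]

assert all(abs(a - b) < 1e-12 for a, b in zip(pi, pi_next))
```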

What is MCMC (Markov Chain Monte Carlo)? Importance sampling is efficient only if Q matches P well, and finding such a Q is difficult. Instead, MCMC tries to find a transition dist. T s.t. x tends to transit into states with high P(x), and finally follows the stationary dist. π_T = P. Setting x^(0) = any initial value, we sample x^(1), ..., x^(M) following T, and hope that x^(M) follows the stationary distribution π_T = P. If it really does, we have got a sample x^(M) from P. Why will the MC converge to the stationary distribution? There is a simple, useful sufficient condition for finite state spaces — a Regular Markov Chain: any state can reach any other state with prob. > 0 (e.g. all entries of the potentials/CPDs are > 0). Then x^(M) follows the unique π_T as M grows large enough.

Example Result. [Figure.]

How to define T? — Gibbs Sampling. Gibbs sampling is the most popular MCMC method used in graphical models. In a graphical model it is easy to draw a sample of each individual variable given the others, P(X_k | x_-k), while drawing from the joint dist. of (X_1, X_2, ..., X_K) is difficult. So we define T in Gibbs sampling as taking transitions of X_1 ~ X_K in turn, with transition distributions T_1(x → x'), T_2(x → x'), ..., T_K(x → x'), where T_k(x → x') redraws x_k from the conditional dist. given all the others, P(X_k | x_-k). In a graphical model, P(X_k = x_k | x_-k) = P(X_k = x_k | MarkovBlanket(X_k)).

Gibbs Sampling for MRF. Gibbs Sampling: 1. Initialize all variables randomly. For t = 1 ~ M: for every variable X_i, 2. draw x_i^(t) from P(X_i | x_-i). [Grid MRF with pairwise potential φ(x, y): φ(0,0) = 5, φ(0,1) = φ(1,0) = 1, φ(1,1) = 9.] t = 2: for the central node with neighbor values (0, 1, 1, 0), P(x = 1 | neighbors) = (1*9*9*1) / (1*9*9*1 + 5*1*1*5) ≈ 0.76. t = 3: for the central node with all four neighbors = 1, P(x = 1 | neighbors) = (9*9*9*9) / (9*9*9*9 + 1*1*1*1) ≈ 0.99. When M is large enough, x^(M) follows the stationary dist. P(x) = (1/Z) Π_C φ_C(x_C). Regularity: all entries in the potential are positive.
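A sketch of the sweep above on a small grid (the potential table is the slide's; the 4×4 grid size and sweep count are arbitrary choices). The helper cond_p1 reproduces the two conditional computations on the slide (≈0.76 and ≈0.99) exactly.

```python
# Gibbs sampling on a grid MRF with pairwise potential phi on each edge.
import random

random.seed(5)

phi = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 9.0}
R = C = 4
x = [[random.randint(0, 1) for _ in range(C)] for _ in range(R)]

def neighbor_values(i, j):
    vals = []
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < R and 0 <= nj < C:
            vals.append(x[ni][nj])
    return vals

def cond_p1(nbs):
    # P(x = 1 | neighbor values): product of edge potentials, normalized
    s0 = s1 = 1.0
    for nb in nbs:
        s0 *= phi[(0, nb)]
        s1 *= phi[(1, nb)]
    return s1 / (s0 + s1)

for sweep in range(100):            # M sweeps of the chain
    for i in range(R):
        for j in range(C):
            x[i][j] = int(random.random() < cond_p1(neighbor_values(i, j)))
```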

Why does Gibbs Sampling have π_T = P? To prove P is the stationary distribution, we prove P is invariant under each T_k(x → x'). Assume (x_1, ..., x_K) currently follows P(x) = P(x_k | x_-k) * P(x_-k). 1. After T_k, x_-k still follows P(x_-k), because those variables are unchanged. 2. After T_k, x_k (a new state indep. of its current value) still follows P(x_k | x_-k). So after T_1(x → x'), ..., T_K(x → x'), x = (x_1, ..., x_K) still follows P. Uniqueness & convergence are guaranteed by the regularity of the MC.

Gibbs Sampling does not Always Work. Sometimes drawing from an individual variable is not possible: we can evaluate P(x, Y) but not P(x_k | x_-k, Y). Non-linear dependency: e.g. logistic regression P(Y = 1 | x) = logistic(w_2 x² + w_1 x + w_0), or a kernel-trick model — the conditional on x then involves an intractable integration ∫ P(Y | x) P(x) dx. Large state space: e.g. learning structure, P(G | Data) = P(Data | G) P(G) / Σ_G P(Data | G) P(G) over the state space of graphs {G_1, G_2, G_3, ...} — too large a state space to do the summation. Other MCMC methods like Metropolis-Hastings are needed (see reference).

Metropolis-Hastings — MCMC. Metropolis-Hastings (M-H) is a general MCMC method to sample P(x | Y) whenever we can evaluate P̃(x) = P(x, Y) (evaluation of P(x | Y) itself is not needed). In M-H, instead of drawing from P(x | Y), we draw a proposal x' from a proposal dist. T^Q(x → x') based on the current sample x, and accept the proposal with probability accept(x → x') = min{1, [P̃(x') T^Q(x' → x)] / [P̃(x) T^Q(x → x')]}.

Example: P(x) = N(x; μ, σ²), proposal dist. T^Q(x → x') = N(x'; x, 0.2²). [Figure: red = reject, green = accept.] accept(x → x') = min{1, [P(x') T^Q(x' → x)] / [P(x) T^Q(x → x')]} = min{1, P(x') / P(x)} in this case, since the Gaussian proposal is symmetric.
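A sketch of this example (μ = 3, σ = 1 chosen arbitrarily; proposal sd 0.2 as on the slide). Note that only an unnormalized density is ever evaluated:

```python
# Metropolis-Hastings with a symmetric Gaussian random-walk proposal.
import math
import random

random.seed(6)

mu, sigma = 3.0, 1.0

def p_tilde(x):
    # unnormalized target density of N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

x = 0.0                               # arbitrary initial state
burn_in, keep = 5_000, 50_000
samples = []
for t in range(burn_in + keep):
    x_prop = random.gauss(x, 0.2)     # proposal T^Q(x -> x') = N(x, 0.2^2)
    if random.random() < min(1.0, p_tilde(x_prop) / p_tilde(x)):
        x = x_prop                    # accept; otherwise keep the current x
    if t >= burn_in:
        samples.append(x)

mean = sum(samples) / len(samples)
assert abs(mean - mu) < 0.25          # correlated samples, so a loose check
```

The samples are heavily correlated (small steps), which is exactly the ρ problem discussed later.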

Example: structure posterior P(G | Data). Proposal distribution T^Q(G → G'): add/remove a randomly chosen edge of G. accept(G → G') = min{1, [P(Data, G') T^Q(G' → G)] / [P(Data, G) T^Q(G → G')]} = min{1, P(Data, G') / P(Data, G)} in this case, since T^Q(G → G') = T^Q(G' → G).

Why does Metropolis-Hastings have π_T = P? Detailed balance (a sufficient condition): if π(x) T(x → x') = π(x') T(x' → x) for all x, x', then π is stationary under T. Given the desired π = P and a proposal dist. T^Q, we can make detailed balance hold using the acceptance prob. A(x → x'): define A(x → x') = min{1, [P(x') T^Q(x' → x)] / [P(x) T^Q(x → x')]}, so the effective transition for x' ≠ x is T(x → x') = T^Q(x → x') A(x → x'). Check: P(x) T(x → x') = min{P(x) T^Q(x → x'), P(x') T^Q(x' → x)} = P(x') T^Q(x' → x) A(x' → x) = P(x') T(x' → x), so detailed balance holds.

How to Collect Samples? Assume we want to collect N samples. 1. Run N chains of MCMC and collect their M-th samples: cost = M*N samples, and the collected samples are independent. 2. Run one chain of MCMC and collect its (M+1)-th ~ (M+N)-th samples: cost = M+N samples, but the collected samples are correlated. What's the difference?

Comparison. Unbiased estimator in both cases (no independence assumption used): E[f̂] = E[(1/N) Σ_n f(x^(n))] = (1/N) Σ_n E[f(x^(n))] = E[f]. For a simple variance analysis, take N = 2. Independent chains: Var[f̂] = Var[(f(x^(1)) + f(x^(2)))/2] = (1/4)(Var[f(x^(1))] + Var[f(x^(2))]) = Var[f]/2. Single chain: Var[f̂] = (1/4)(Var[f(x^(1))] + Var[f(x^(2))] + 2 Cov[f(x^(1)), f(x^(2))]) = (Var[f]/2)(1 + ρ(f(x^(1)), f(x^(2)))). Practically, many correlated samples (right) often outperform a few independent samples (left).

How to Check Convergence? Run K chains, each collecting N samples; their estimates should be consistent if they have converged to π_T. Check that the ratio B/W is close enough to 1: f̄ = (1/K) Σ_k f̄_k; B = variance between chains = (N/(K-1)) Σ_k (f̄_k - f̄)²; W = variance within chains = (1/K) Σ_k (1/N) Σ_n (f(x_k^(n)) - f̄_k)².
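A sketch of the B/W check, reading the slide's formulas in the usual Gelman-Rubin convention (B carries a factor of N so that B/W ≈ 1 at convergence). The "chains" here are synthetic i.i.d. draws standing in for converged MCMC output:

```python
# Between-chain vs within-chain variance over K chains of N samples each.
import random

random.seed(7)

K, N = 8, 1_000
chains = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]

chain_means = [sum(c) / N for c in chains]
grand_mean = sum(chain_means) / K

# B = (N/(K-1)) * sum_k (mean_k - grand_mean)^2
B = N / (K - 1) * sum((m - grand_mean) ** 2 for m in chain_means)
# W = average within-chain variance
W = sum(sum((v - m) ** 2 for v in c) / N
        for c, m in zip(chains, chain_means)) / K

ratio = B / W   # ~1 for converged chains; far from 1 means keep running
```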

The Critical Problem of MCMC. When ρ → 1, Var[f̂] does not decrease with N, and a very large M is needed to converge to π_T: MCMC cannot yield an acceptable result in reasonable time. [Figure: a multi-modal distribution with tiny transition probabilities (on the order of 0.0001-0.0002) between modes.]

How to Reduce the Correlation ρ among Samples? Take larger steps in the sample space: Block Gibbs Sampling; Collapsed-Particle Sampling.

Problem of Gibbs Sampling. The correlation ρ between samples is high when the correlation among variables X_1 ~ X_K is high, e.g. a pairwise potential φ(X, Y) with φ(0,0) = φ(1,1) = 99 and φ(0,1) = φ(1,0) = 1: single-variable updates rarely flip a variable, so a very large M is needed to converge to π_T.

Block Gibbs Sampling. Draw a block of variables jointly, e.g. draw (X, Y) together from their joint conditional, instead of X given Y and then Y given X. The chain converges to π_T much more quickly.

Block Gibbs Sampling. Divide X into several tractable blocks X_1, X_2, ..., X_B. Each block X_b can be drawn jointly given the variables in the other blocks: given samples of the other blocks, drawing a block jointly from P(X_b | x_-b) is tractable. [Figure: the factorial HMM; each chain V, W, Y, Z forms one block.]

Block Gibbs Sampling by VE. Drawing from a block X_b jointly may need a pass of VE. E.g. a chain A - B - C - D - E with factors f(A,B), f(B,C), f(C,D), f(D,E): first compute backward messages M(D) = Σ_E f(D,E), M(C) = Σ_D f(C,D) M(D), M(B) = Σ_C f(B,C) M(C); then draw forward: draw A from M(A) = Σ_B f(A,B) M(B); draw B from f(A=a, B) M(B); draw C from f(B=b, C) M(C); draw D from f(C=c, D) M(D); draw E from f(D=d, E). Then draw another block.
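The backward-message / forward-draw pass above, sketched for a binary chain with random illustrative factor values:

```python
# Joint sampling of a chain block: backward VE messages, then forward draws.
import random

random.seed(8)

V = 2                                  # each variable takes values {0, 1}
fac = [[[1.0 + random.random() for _ in range(V)] for _ in range(V)]
       for _ in range(4)]              # fac[i][u][v] = f(X_i=u, X_{i+1}=v)

# Backward messages: M[4] = ones; M[i][u] = sum_v fac[i][u][v] * M[i+1][v]
M = [[1.0] * V for _ in range(5)]
for i in range(3, -1, -1):
    M[i] = [sum(fac[i][u][v] * M[i + 1][v] for v in range(V)) for u in range(V)]

def draw(weights):
    # draw an index with probability proportional to the given weights
    r = random.random() * sum(weights)
    for v, w in enumerate(weights):
        r -= w
        if r <= 0:
            return v
    return len(weights) - 1

def sample_chain():
    # x0 ~ M[0]; then x_{i+1} ~ fac[i][x_i][.] * M[i+1][.]
    s = [draw(M[0])]
    for i in range(4):
        s.append(draw([fac[i][s[-1]][v] * M[i + 1][v] for v in range(V)]))
    return s

sample = sample_chain()   # one joint draw of the whole block

# sanity check: the empirical marginal of X0 matches M[0] normalized
p1_exact = M[0][1] / (M[0][0] + M[0][1])
freq = sum(sample_chain()[0] for _ in range(20_000)) / 20_000
assert abs(freq - p1_exact) < 0.02
```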

Example: the Student/KC model. [Figure: student types S, KC types, problem types T, and chain-structured knowledge states K.] 1. Intractable with exact inference. 2. Plain Gibbs sampling may converge too slowly. Divide X into 4 blocks: 1. student type, 2. KC type, 3. KState, 4. problem type. Student types: independent of each other given samples of the other blocks — easy to draw jointly. KC types: independent of each other given the other blocks — easy to draw jointly. Problem types: independent of each other given the other blocks — easy to draw jointly. KStates: chain-structured given the other blocks — draw jointly using VE.

Agenda: When to use Approximate Inference? Forward Sampling & Importance Sampling. Markov Chain Monte Carlo (MCMC). Collapsed Particles.

Collapsed Particles. Exact: E_P[f] = Σ_x P(x) f(x). Particle-based: f̂ = (1/N) Σ_n f(x^(n)). Collapsed-particle: divide X into two parts {X_p, X_d}, where we can do exact inference on X_d given x_p. Then E_P[f] = Σ_{x_p} P(x_p) Σ_{x_d} P(x_d | x_p) f(x_p, x_d), and f̂ = (1/N) Σ_n Σ_{x_d} P(x_d | x_p^(n)) f(x_p^(n), x_d). If X_p contains few variables, the variance can be much reduced!!
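A sketch of the variance reduction (the two-variable model and its numbers are invented): both estimators target E[X_d] = 0.42, but the collapsed one replaces each sampled X_d with its exact conditional expectation given x_p.

```python
# Plain vs collapsed (Rao-Blackwellized) particle estimators of E[X_d].
import random

random.seed(9)

p_xp = 0.4                      # P(X_p = 1)
p_xd_given = {0: 0.1, 1: 0.9}   # P(X_d = 1 | X_p)
exact = 0.6 * 0.1 + 0.4 * 0.9   # E[X_d] = 0.42

def run(N, collapsed):
    total = 0.0
    for _ in range(N):
        xp = int(random.random() < p_xp)
        if collapsed:
            total += p_xd_given[xp]       # exact E[X_d | x_p], no X_d sample
        else:
            total += int(random.random() < p_xd_given[xp])  # sample X_d too
    return total / N

trials = 200
plain = [run(500, collapsed=False) for _ in range(trials)]
coll = [run(500, collapsed=True) for _ in range(trials)]

def var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

assert var(coll) < var(plain)   # collapsing part of X reduces the variance
```

The inequality is guaranteed in expectation by the law of total variance: Var[E[X_d | X_p]] ≤ Var[X_d].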

[The Student/KC model again.] X_d can be inferred exactly given x_p. Divide X into {X_p, X_d}: X_p = student type, KC type, problem type; X_d = KState.

Collapsed Particle with VE. To draw one variable in X_p given all the other variables in X_p, sum out all the variables in X_d by VE. E.g. draw S given KC = kc & T = t from M(S) = Σ_{K_1,K_2} f(S, KC=kc, K_1, K_2) M(K, T=t); draw KC given S = s & T = t from M(KC) = Σ_{K_1,K_2} f(S=s, KC, K_1, K_2) M(K, T=t); draw T given S = s & KC = kc from M(T) = Σ_K M(K, S=s, KC=kc) f(K, T).

Collect Samples. For X_p = (S, KC, T_1, T_2, T_3), record the sampled values, e.g. (Intel, Quick, Hard, Easy, Hard), (Intel, Slow, Easy, Easy, Hard), ..., (Dull, Slow, Easy, Easy, Hard), and average. For X_d = (K_1, K_2, K_3), record the exact conditional distributions, e.g. {1/3, 1/3, 1/3}, {1/4, 1/4, 1/2}, {1/2, 1/2, 0}; {1/2, 1/4, 1/4}, {1/5, 4/5, 0}, {1/4, 1/4, 1/2}; ..., and average: Ê[f] = (1/N) Σ_n Σ_{x_d} P(x_d | x_p^(n)) f(x_d, x_p^(n)).