Machine Learning for Data Science (CS4786) Lecture 24

Size: px

Start display at page:

Download "Machine Learning for Data Science (CS4786) Lecture 24"

Harry Cameron
5 years ago
Views:

1 Machine Learning for Data Science (CS4786) Lecture 24 HMM Particle Filter Course Webpage :

2 Rejection Sampling

3 Rejection Sampling Sample variables from joint distribution (both latent and observed variables)

4 Rejection Sampling Sample variables from joint distribution (both latent and observed variables) Throw away all ones that don t match observation

5 Rejection Sampling Sample variables from joint distribution (both latent and observed variables) Throw away all ones that don t match observation Now compute empirical frequencies

6 Rejection Sampling Sample variables from joint distribution (both latent and observed variables) Throw away all ones that don t match observation Now compute empirical frequencies Problem: too wasteful (too many Rejections)

7 REJECTION SAMPLING Algorithm: Topologically sort variables (parents first children later) For t = 1 to n (number of samples) For i = 1 to N (number of variables in model) End For End For Sample x t i P(X i Parents(X i ) already sampled) Discard samples that do not match observations Compute empirical frequencies

8 IMPORTANCE SAMPLING We really want to draw from distribution P. But we can only draw from distribution Q easily Trick: Draw x 1,...,x n Q Re-weight each sample x t by P(X = x t )Q(X = x t )

9 Importance Sampling

10 Importance Sampling For bayesian networks:

11 Importance Sampling For bayesian networks: What we want: P (X Latent1,...,X Latentm X Observed1,...X Observedn )

12 Importance Sampling For bayesian networks: What we want: P (X Latent1,...,X Latentm X Observed1,...X Observedn ) What we sample from: my P (X Parent(X )) Latentj Latentj j=1

13 Importance Sampling For bayesian networks: What we want: P (X Latent1,...,X Latentm X Observed1,...X Observedn ) What we sample from: my P (X Parent(X )) Latentj Latentj Weight: j=1 ny P (X Parent(X )) Observedi Observedi i=1

14 IMPORTANCE SAMPLING Likelihood weighting: Topologically sort variables (parents first children later) For t = 1 to n (number of samples) Set w t = 1 For i = 1 to N (number of variables) End For Output, End For If X i is observed, Set w t w t P(X i = x i Parents(X i ) = already sampled) Set x t i = x i (the observed value) Else, sample x t i P(X i Parents(X i ) = already sampled) P(Variable = valueobservation) = n t=1 w t1{variable = value} n t=1 w t

15 HIDDEN MARKOV MODEL (HMM)

16 HIDDEN MARKOV MODEL (HMM)

17 Example: HIDDEN MARKOV MODEL (HMM)

18 HIDDEN MARKOV MODEL (HMM) Example: But you don t observe location (dark room)

19 HIDDEN MARKOV MODEL (HMM) Example: But you don t observe location (dark room) You hear how close the bot is!

20 HIDDEN MARKOV MODEL (HMM) Example: But you don t observe location (dark room) You hear how close the bot is! What you hear: + noise

21 HIDDEN MARKOV MODEL (HMM) Example: But you don t observe location (dark room) You hear how close the bot is! What you hear: + noise Can you catch the Bot? In time?

22 HIDDEN MARKOV MODEL (HMM) S 1 S 2 S 3 X 1 X 2 X 3 Xt s are what you hear (observation) St s are the unseen locations (states) Eg: for m x m grid we have, K = m 2 states Number of alphabets = # colors you can observe

23 HIDDEN MARKOV MODEL (HMM) S 1 S 2 S 3 X 1 X 2 X 3 Eg: for m x m grid we have, K = m 2 states

24 HIDDEN MARKOV MODEL (HMM) S 1 S 2 S 3 X 1 X 2 X 3 Eg: for m x m grid we have, K = m 2 states Transition matrix is K x K (too large)

25 HIDDEN MARKOV MODEL (HMM) S 1 S 2 S 3 X 1 X 2 X 3 Eg: for m x m grid we have, K = m 2 states Transition matrix is K x K (too large) Use sampling to do approximate inference Number of samples n << m 4

26 Inference Question Can we compute (efficiently and approximately) P (S t x 1,...,x t 1 ) We cant afford too much time to compute since we need to move the bot in time

27 HIDDEN MARKOV MODEL (HMM)

28 HIDDEN MARKOV MODEL (HMM)

29 HIDDEN MARKOV MODEL (HMM)

30 HIDDEN MARKOV MODEL (HMM)

31 HIDDEN MARKOV MODEL (HMM)

32 HIDDEN MARKOV MODEL (HMM)

33 HIDDEN MARKOV MODEL (HMM)

34 HIDDEN MARKOV MODEL (HMM)

35 HIDDEN MARKOV MODEL (HMM)

36 HIDDEN MARKOV MODEL (HMM)

37 HIDDEN MARKOV MODEL (HMM)

38 HIDDEN MARKOV MODEL (HMM)

39 HIDDEN MARKOV MODEL (HMM)

40 HIDDEN MARKOV MODEL (HMM)

41 HIDDEN MARKOV MODEL (HMM)

42 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

43 HIDDEN MARKOV MODEL (HMM) Eg: say observations were Rejection sampling: Reject samples that don t match observations

44 HIDDEN MARKOV MODEL (HMM) Eg: say observations were Rejection sampling: Reject samples that don t match observations We can do this sequentially!

45 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

46 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

47 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

48 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

49 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

50 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

51 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

52 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

53 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

54 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

55 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

56 HIDDEN MARKOV MODEL (HMM) Eg: say observations were Problem: Most samples rejected

57 HIDDEN MARKOV MODEL (HMM) Eg: say observations were Multiple samples simultaneously. Problem: Most samples rejected

58 HIDDEN MARKOV MODEL (HMM) Eg: say observations were

59 HIDDEN MARKOV MODEL (HMM) Eg: say observations were Importance weighting: weight samples

60 HIDDEN MARKOV MODEL (HMM) Eg: say observations were Importance weighting: weight samples

61 HIDDEN MARKOV MODEL (HMM) Eg: say observations were P( X1=13) x P( X2=8) x P( X3=9) x P( X5=24) x P( X5=19) xp( X4=14) Importance weighting: weight samples

62 HMM PARTICLE FILTER Use multiple samples and track each ones weights.

63 HMM PARTICLE FILTER Use multiple samples and track each ones weights.

64 HMM PARTICLE FILTER Use multiple samples and track each ones weights.

65 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1)

66 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1)

67 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2)

68 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2)

69 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2) P( X3)

70 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2) P( X3) P( X5) P( X5) P( X4)

71 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2) P( X3) P( X5) P( X5) P( X4) This is same as 6 separate samples

72 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2) P( X3) P( X5) P( X5) P( X4) This is same as 6 separate samples Instead of tracking each sample s weight, resample according to weights

73 HMM PARTICLE FILTER Use multiple samples and track each ones weights. P( X1) P( X2) P( X3) P( X5) P( X5) P( X4) This is same as 6 separate samples Instead of tracking each sample s weight, resample according to weights Problem: Too many samples have negligible weight!

74 HMM PARTICLE FILTER Instead of tracking each one, resample!

75 HMM PARTICLE FILTER Instead of tracking each one, resample!

76 HMM PARTICLE FILTER Instead of tracking each one, resample!

77 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1)

78 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1)

79 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1)

80 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2)

81 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2)

82 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2)

83 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2) P( X3)

84 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2) P( X3)

85 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2) P( X3) P( X5) P( X5) P( X4)

86 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2) P( X3) P( X5) P( X5) P( X4) On every round, transfer particles from previous states according to transition probability

87 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2) P( X3) P( X5) P( X5) P( X4) On every round, transfer particles from previous states according to transition probability Resample particles according to P(observation state)

88 HMM PARTICLE FILTER Instead of tracking each one, resample! P( X1) P( X2) P( X3) P( X5) P( X5) P( X4) On every round, transfer particles from previous states according to transition probability Resample particles according to P(observation state) Use new particles to proceed

89 Particle Filtering

90 Particle Filtering Without resampling, we carry many particles with very small probabilities

91 Particle Filtering Without resampling, we carry many particles with very small probabilities too many samples needed for a good estimate

92 Particle Filtering Without resampling, we carry many particles with very small probabilities too many samples needed for a good estimate By resampling, we got rid of samples with very small probabilities

93 Particle Filtering Without resampling, we carry many particles with very small probabilities too many samples needed for a good estimate By resampling, we got rid of samples with very small probabilities Hence fewer samples suffice

94 HMM PARTICLE FILTER Inference time only depends on number of samples Of course more the samples the better accuracy Often we don t need too many samples. Why?

95 Gibbs Sampling Repeat n times for, n samples, Start with arbitrary value for variables Replace each variable by new sample from P(Variable all other variables) Go over all variables multiple times Return final sample of the N variables

96 VARIATIONAL INFERENCE Basic idea: we want to infer P(UnobservedObserved) We create a new parametric distribution Q (Unobserved) where is picked based on Obervations We pick such that, Q is close to P(UnobservedObserved) Closeness measured using KL divergence Mean-field approximation, Q (X 1,...,X m ) = m Q j (X j ) j=1

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each