Inverse Problems in the Bayesian Framework


1 Inverse Problems in the Bayesian Framework Daniela Calvetti Case Western Reserve University Cleveland, Ohio Raleigh, NC, July 2016

2 Bayes Formula Stochastic model: two random variables X ∈ R^n and B ∈ R^m, where B is the observed quantity and X is the quantity of primary interest. Goal: find the posterior probability density,

π_X(x | b) = density of X, given the observation B = b, b = b_observed.

3 Bayes Formula Prior information: the prior density π_X(x) encodes our prior information or prior belief about the possible values of X. Likelihood: assuming that X = x, what would the forward model predict for the distribution of B? Encode this information in π_B(b | x). Evidence: with prior and likelihood, compute

π_B(b) = ∫_{R^n} π_{X,B}(x, b) dx = ∫_{R^n} π_B(b | x) π_X(x) dx.

Bayes formula for probability densities:

π_X(x | b) = π_B(b | x) π_X(x) / π_B(b).

4 Linear Inverse Problems Consider the problem of estimating x from

b = Ax + e, A ∈ R^{m×n}.

Stochastic extension: write B = AX + E, and assume the prior and noise models

X ~ N(0, D), E ~ N(0, C).

Usually it is assumed that X and E are independent. In particular,

E{X E^T} = E{X} E{E}^T = 0.

5 Linear Inverse Problems However, this is not necessary, and we may have E{X E^T} = R ∈ R^{n×m}. Define a new random variable

Z = [X; B] ∈ R^{n+m}.

Covariance matrix of Z:

Z Z^T = [X; B] [X^T  B^T] = [X X^T, X B^T; B X^T, B B^T].

Compute the expectation of this matrix.

6 Linear Inverse Problems Expectations:

E{X X^T} = D,

E{X B^T} = E{X (AX + E)^T} = E{X X^T A^T + X E^T} = E{X X^T} A^T + E{X E^T} = D A^T + R.

Furthermore,

E{B X^T} = E{X B^T}^T = A D + R^T.

7 Linear Inverse Problems and, finally,

E{B B^T} = E{(AX + E)(AX + E)^T} = E{A X X^T A^T + E X^T A^T + A X E^T + E E^T} = A D A^T + R^T A^T + A R + C.

Conclusion:

Cov(Z) = [D, D A^T + R; A D + R^T, A D A^T + R^T A^T + A R + C] = [Γ_11, Γ_12; Γ_21, Γ_22].
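The assembled block covariance can be sanity-checked numerically. A minimal Python/NumPy sketch (Python is used here for illustration; the slides' own code is in Matlab), taking R = 0 and arbitrary small SPD matrices D and C:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((m, n))

# SPD prior and noise covariances D (n x n) and C (m x m)
Md = rng.standard_normal((n, n)); D = Md @ Md.T + n * np.eye(n)
Mc = rng.standard_normal((m, m)); C = Mc @ Mc.T + m * np.eye(m)

# Assemble Cov(Z) block by block, with R = E{X E^T} = 0
G = np.block([[D,     D @ A.T],
              [A @ D, A @ D @ A.T + C]])

# A valid joint covariance must be symmetric positive definite
assert np.allclose(G, G.T)
assert np.all(np.linalg.eigvalsh(G) > 0)
print("Cov(Z) is symmetric positive definite")
```

Positive definiteness follows because any quadratic form in G is the variance of u^T X + v^T B with X and E independent.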

8 Schur Complements and Conditioning Given a Gaussian random variable

Z = [X; B] ∈ R^{n+m}

with covariance Γ ∈ R^{(n+m)×(n+m)}, what is the probability density of X, given B = b?

9 Schur Complements and Conditioning Assume that X ~ N(0, Γ), where Γ ∈ R^{n×n} is a given SPD matrix. Partitioning of X:

X = [X_1; X_2] ∈ R^k × R^{n−k}.

Question: assume that X_2 = x_2 is observed. What is the conditional probability density of X_1,

π_{X_1}(x_1 | x_2) = ?

10 Schur Complements and Conditioning Write π_X(x) = π_{X_1,X_2}(x_1, x_2). Bayes formula: the distribution of the unknown part x_1, provided that x_2 is known, is

π_{X_1}(x_1 | x_2) ∝ π_{X_1,X_2}(x_1, x_2), x_2 = x_{2,observed}.

In terms of the Gaussian density,

π_{X_1,X_2}(x_1, x_2) ∝ exp(−(1/2) x^T Γ^{−1} x). (1)

11 Schur Complements and Conditioning Partitioning of the covariance matrix:

Γ = [Γ_11, Γ_12; Γ_21, Γ_22] ∈ R^{n×n}, (2)

where

Γ_11 ∈ R^{k×k}, Γ_22 ∈ R^{(n−k)×(n−k)}, k < n,

and

Γ_12 = Γ_21^T ∈ R^{k×(n−k)}.

12 Schur Complements and Conditioning Precision matrix B = Γ^{−1}. Partition B:

B = [B_11, B_12; B_21, B_22] ∈ R^{n×n}. (3)

Quadratic form x^T B x appearing in the exponential:

B x = [B_11, B_12; B_21, B_22] [x_1; x_2] = [B_11 x_1 + B_12 x_2; B_21 x_1 + B_22 x_2],

13 Schur Complements and Conditioning

x^T B x = [x_1^T  x_2^T] [B_11 x_1 + B_12 x_2; B_21 x_1 + B_22 x_2]
        = x_1^T (B_11 x_1 + B_12 x_2) + x_2^T (B_21 x_1 + B_22 x_2)
        = x_1^T B_11 x_1 + 2 x_1^T B_12 x_2 + x_2^T B_22 x_2
        = (x_1 + B_11^{−1} B_12 x_2)^T B_11 (x_1 + B_11^{−1} B_12 x_2) + x_2^T (B_22 − B_21 B_11^{−1} B_12) x_2,

where the last term is independent of x_1.
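The completing-the-square identity above is easy to verify numerically for a random SPD precision matrix; a minimal Python/NumPy check (an illustration, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 2, 5
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)          # random SPD precision matrix

B11, B12 = B[:k, :k], B[:k, k:]
B21, B22 = B[k:, :k], B[k:, k:]

x = rng.standard_normal(n)
x1, x2 = x[:k], x[k:]

lhs = x @ B @ x
# (x1 + B11^{-1} B12 x2)^T B11 (x1 + B11^{-1} B12 x2)
#   + x2^T (B22 - B21 B11^{-1} B12) x2
shift = x1 + np.linalg.solve(B11, B12 @ x2)
rhs = (shift @ B11 @ shift
       + x2 @ (B22 - B21 @ np.linalg.solve(B11, B12)) @ x2)
assert np.allclose(lhs, rhs)
print("completing the square verified")
```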

14 Schur Complements and Conditioning From this key equation for conditional densities it follows that

π_{X_1}(x_1 | x_2) ∝ exp(−(1/2) (x_1 + B_11^{−1} B_12 x_2)^T B_11 (x_1 + B_11^{−1} B_12 x_2)).

Thus the conditional density is Gaussian, with mean and covariance matrix

x̄_1 = −B_11^{−1} B_12 x_2, C = B_11^{−1}.

Question: how to express these formulas in terms of Γ?

15 Schur Complements and Conditioning Consider a partitioned SPD matrix Γ ∈ R^{n×n}. For any v ∈ R^k, v ≠ 0,

v^T Γ_11 v = [v^T  0] [Γ_11, Γ_12; Γ_21, Γ_22] [v; 0] > 0,

showing the positive definiteness of Γ_11. The same holds for Γ_22. In particular, Γ_11 and Γ_22 are invertible.

16 Schur Complements and Conditioning To calculate the inverse of Γ, we solve the equation Γx = y in block form. By partitioning

x = [x_1; x_2] ∈ R^k × R^{n−k}, y = [y_1; y_2] ∈ R^k × R^{n−k},

we have

Γ_11 x_1 + Γ_12 x_2 = y_1,
Γ_21 x_1 + Γ_22 x_2 = y_2.

17 Schur Complements and Conditioning Eliminate x_2 from the second equation,

x_2 = Γ_22^{−1} (y_2 − Γ_21 x_1),

substitute back into the first equation:

Γ_11 x_1 + Γ_12 Γ_22^{−1} (y_2 − Γ_21 x_1) = y_1,

and by rearranging the terms,

(Γ_11 − Γ_12 Γ_22^{−1} Γ_21) x_1 = y_1 − Γ_12 Γ_22^{−1} y_2.

Define the Schur complement of Γ_22:

Γ̃_22 = Γ_11 − Γ_12 Γ_22^{−1} Γ_21.

18 Schur Complements and Conditioning It can be shown that Γ̃_22 must be invertible, and therefore

x_1 = Γ̃_22^{−1} y_1 − Γ̃_22^{−1} Γ_12 Γ_22^{−1} y_2.

Similarly, interchanging the roles of x_1 and x_2,

x_2 = Γ̃_11^{−1} y_2 − Γ̃_11^{−1} Γ_21 Γ_11^{−1} y_1,

where

Γ̃_11 = Γ_22 − Γ_21 Γ_11^{−1} Γ_12

is the Schur complement of Γ_11. In matrix form:

[x_1; x_2] = [Γ̃_22^{−1}, −Γ̃_22^{−1} Γ_12 Γ_22^{−1}; −Γ̃_11^{−1} Γ_21 Γ_11^{−1}, Γ̃_11^{−1}] [y_1; y_2].

19 Schur Complements and Conditioning Conclusion:

Γ^{−1} = [Γ̃_22^{−1}, −Γ̃_22^{−1} Γ_12 Γ_22^{−1}; −Γ̃_11^{−1} Γ_21 Γ_11^{−1}, Γ̃_11^{−1}] = [B_11, B_12; B_21, B_22].

Thus the conditional density π_{X_1}(x_1 | x_2) is a Gaussian,

π_{X_1}(x_1 | x_2) ~ N(x̄_1, C),

where

x̄_1 = −B_11^{−1} B_12 x_2 = Γ_12 Γ_22^{−1} x_2

and

C = B_11^{−1} = Γ̃_22.
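Both identities — the conditional mean in terms of Γ and the Schur-complement covariance — can be tested numerically. A Python/NumPy sketch with an arbitrary random SPD Γ (an illustration; the slides' code is Matlab):

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 2, 6
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)           # random SPD covariance Γ

G11, G12 = G[:k, :k], G[:k, k:]
G21, G22 = G[k:, :k], G[k:, k:]
B = np.linalg.inv(G)                  # precision matrix
B11, B12 = B[:k, :k], B[:k, k:]

x2 = rng.standard_normal(n - k)       # the observed block

# Conditional mean: -B11^{-1} B12 x2 should equal Γ12 Γ22^{-1} x2
mean_B = -np.linalg.solve(B11, B12 @ x2)
mean_G = G12 @ np.linalg.solve(G22, x2)
assert np.allclose(mean_B, mean_G)

# Conditional covariance: B11^{-1} equals Γ11 - Γ12 Γ22^{-1} Γ21
cov_B = np.linalg.inv(B11)
cov_G = G11 - G12 @ np.linalg.solve(G22, G21)
assert np.allclose(cov_B, cov_G)
print("Schur complement identities verified")
```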

20 Linear Inverse Problems For simplicity, let us assume that R = 0. The posterior density π_X(x | b) is then a Gaussian density, with mean and covariance

x̄ = Γ_12 Γ_22^{−1} b = D A^T (A D A^T + C)^{−1} b,

Φ = Γ̃_22 = Γ_11 − Γ_12 Γ_22^{−1} Γ_21 = D − D A^T (A D A^T + C)^{−1} A D.

21 Example: Numerical differentiation Estimate g = f′ from noisy values of

f(t) = ∫_0^t g(τ) dτ + noise.

Discretization:

b = Ax + e,

where A ∈ R^{n×n} is the quadrature matrix with entries a_{ij} = 1/n for j ≤ i, i.e. a lower triangular matrix of ones scaled by the mesh width 1/n.
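A possible NumPy construction of this quadrature matrix (assumed here to be the lower triangular matrix of ones scaled by 1/n, a rectangle rule consistent with the step gamma = 1/n used in the Matlab code later in the slides), with a quick accuracy check on a known integrand:

```python
import numpy as np

n = 10
A = np.tril(np.ones((n, n))) / n      # rectangle-rule quadrature matrix

# Check on a known pair: g(t) = 2t integrates to f(t) = t^2
t = np.arange(1, n + 1) / n
x = 2 * t                              # samples of g at the grid points
b = A @ x                              # approximate running integral
err = np.max(np.abs(b - t**2))
assert err < 1.1 / n                   # O(h) quadrature error, h = 1/n
print("max quadrature error:", err)
```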

22 Stochastic extension Stochastic model:

B = AX + E.

Model for noise: independent components, E_j ~ N(0, σ²), 1 ≤ j ≤ n. Probability density:

π_E(e) = (1/(2πσ²))^{n/2} exp(−‖e‖²/(2σ²)).

Likelihood:

π_B(b | x) ∝ exp(−‖b − Ax‖²/(2σ²)).

23 Autoregressive Prior Models Discrete model:

x_j = g(t_j), t_j = j/n, 0 ≤ j ≤ n.

Consider two possible prior models:
1. We know that x_0 = 0, and believe that the absolute value of the slope of g is bounded by some m_1 > 0.
2. We know that x_0 = x_n = 0, and believe that the curvature of g is bounded by some m_2 > 0.

24 Autoregressive models 1. Slope:

g′(t_j) ≈ (x_j − x_{j−1})/h, h = 1/n.

Prior information: we believe that |x_j − x_{j−1}|/h ≤ m_1, with some uncertainty.

2. Curvature:

g″(t_j) ≈ (x_{j−1} − 2x_j + x_{j+1})/h².

Prior information: we believe that |x_{j−1} − 2x_j + x_{j+1}|/h² ≤ m_2, with some uncertainty.

25 Autoregressive model In both cases, we assume that x_j is a realization of a random variable X_j. Boundary conditions:
1. X_0 = 0 with certainty; probabilistic model for X_j, 1 ≤ j ≤ n.
2. X_0 = X_n = 0 with certainty; probabilistic model for X_j, 1 ≤ j ≤ n − 1.

26 Autoregressive prior models 1. First order prior:

X_j = X_{j−1} + γ W_j, W_j ~ N(0, 1), γ = h m_1.

2. Second order prior:

X_j = (1/2)(X_{j−1} + X_{j+1}) + γ W_j, W_j ~ N(0, 1), γ = (1/2) h² m_2.

In both cases the new value is the prediction from the neighboring values plus an innovation with standard deviation γ.

27 Matrix form: first order model System of equations:

X_1 = X_1 − X_0 = γ W_1
X_2 − X_1 = γ W_2
...
X_n − X_{n−1} = γ W_n

In matrix form, L_1 X = γ W, W ~ N(0, I_n), where L_1 ∈ R^{n×n} is the bidiagonal matrix with ones on the diagonal and −1 on the subdiagonal, and

X = [X_1; X_2; ...; X_n], W = [W_1; W_2; ...; W_n].

28 Matrix form: second order model System of equations:

X_2 − 2X_1 = X_2 − 2X_1 + X_0 = γ W_1
X_3 − 2X_2 + X_1 = γ W_2
...
−2X_{n−1} + X_{n−2} = X_n − 2X_{n−1} + X_{n−2} = γ W_{n−1}

In matrix form, L_2 X = γ W, W ~ N(0, I_{n−1}), where L_2 ∈ R^{(n−1)×(n−1)} is the tridiagonal second difference matrix with stencil (1, −2, 1), and

X = [X_1; X_2; ...; X_{n−1}].

29 Prior density Given a model

L X = γ W, W ~ N(0, I),

that is,

π_W(w) ∝ exp(−(1/2) ‖w‖²),

we conclude that

π_X(x) ∝ exp(−(1/(2γ²)) ‖Lx‖²) = exp(−(1/2) x^T [(1/γ²) L^T L] x).

The inverse of the covariance matrix, i.e. the precision matrix, is

D^{−1} = (1/γ²) L^T L.
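The first-order precision matrix can be assembled directly; a Python/NumPy sketch mirroring the Matlab construction used later in the slides (Python here only for illustration):

```python
import numpy as np

n = 8
# First-order model: L1 x = gamma w, with L1 bidiagonal (1 on the
# diagonal, -1 on the subdiagonal), encoding increments x_j - x_{j-1}
L1 = np.eye(n) - np.diag(np.ones(n - 1), -1)

gamma = 1.0 / n
Dinv = (L1.T @ L1) / gamma**2         # prior precision matrix
assert np.allclose(Dinv, Dinv.T)
assert np.all(np.linalg.eigvalsh(Dinv) > 0)

# Solving L1 x = w is a cumulative sum: the prior describes a
# random walk started at x_0 = 0
assert np.allclose(np.linalg.inv(L1), np.tril(np.ones((n, n))))
print("first-order prior precision assembled")
```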

30 Testing a Prior Question: given a covariance D, how can we check if the prior corresponds to our expectations? Symmetric decomposition of the precision matrix (let γ = 1 for simplicity):

D^{−1} = L^T L.

We know that W = LX ~ N(0, I). Sampling of X:
1. Draw a realization w ~ N(0, I_n).
2. Set x = L^{−1} w.

31 Random draws from priors Generate m draws from the prior using the Matlab command randn.

n = 100;        % number of discretization intervals
t = (0:1/n:1);
m = 5;          % number of draws
% First order model. Boundary condition X_0 = 0
L1 = diag(ones(1,n),0) - diag(ones(1,n-1),-1);
gamma = 1/n;    % m_1 = 1
W = gamma*randn(n,m);
X = L1\W;

32 Plots of the random draws

33 Belief envelopes Diagonal elements of the posterior covariance:

Γ_jj = η_j² = posterior variance of X_j.

Posterior belief:

x̄_j − 2η_j < X_j < x̄_j + 2η_j

with posterior probability ≈ 95%.

34 Bayesian solution Mean solution and 2 STD belief envelope

35 Linear Inverse Problems An alternative (but equivalent) formula for the posterior. Prior:

π_X(x) ∝ exp(−(1/2) x^T D^{−1} x),

and likelihood:

π_B(b | x) ∝ exp(−(1/2) (b − Ax)^T C^{−1} (b − Ax)).

Posterior density:

π_X(x | b) ∝ π_X(x) π_B(b | x) ∝ exp(−(1/2) Q(x)).

36 Linear Inverse Problems Quadratic term in the exponential:

Q(x) = (b − Ax)^T C^{−1} (b − Ax) + x^T D^{−1} x.

Collect the terms of the same order in x together:

Q(x) = x^T (A^T C^{−1} A + D^{−1}) x − 2 x^T A^T C^{−1} b + b^T C^{−1} b,

and write M = A^T C^{−1} A + D^{−1}. Complete the square:

Q(x) = (x − M^{−1} A^T C^{−1} b)^T M (x − M^{−1} A^T C^{−1} b) + ...

37 Linear Inverse Problems Conclusion: the posterior mean and covariance have the alternative expressions

x̄ = (A^T C^{−1} A + D^{−1})^{−1} A^T C^{−1} b

and

Φ = (A^T C^{−1} A + D^{−1})^{−1}.

The formula for x̄ is also known as the Wiener filtered solution.
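The equivalence of these expressions with the earlier ones, x̄ = DA^T(ADA^T + C)^{−1}b and Φ = D − DA^T(ADA^T + C)^{−1}AD, is an instance of the Sherman-Morrison-Woodbury identity and can be checked numerically; a Python/NumPy sketch with arbitrary small matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 3
A = rng.standard_normal((m, n))
Md = rng.standard_normal((n, n)); D = Md @ Md.T + n * np.eye(n)
Mc = rng.standard_normal((m, m)); C = Mc @ Mc.T + m * np.eye(m)
b = rng.standard_normal(m)

# Formulas from the conditioning argument (slide 20)
S = A @ D @ A.T + C
x_cond = D @ A.T @ np.linalg.solve(S, b)
Phi_cond = D - D @ A.T @ np.linalg.solve(S, A @ D)

# Formulas from completing the square (slide 37)
M = A.T @ np.linalg.solve(C, A) + np.linalg.inv(D)
x_sq = np.linalg.solve(M, A.T @ np.linalg.solve(C, b))
Phi_sq = np.linalg.inv(M)

assert np.allclose(x_cond, x_sq)
assert np.allclose(Phi_cond, Phi_sq)
print("the two posterior formulas agree")
```

In practice one chooses the formula whose inner matrix (m×m versus n×n) is cheaper to factor.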

38 Tikhonov Regularization Revisited Consider the linear model

b = Ax + e, e ~ N(0, C).

Likelihood:

π_B(b | x) ∝ exp(−(1/2) (b − Ax)^T C^{−1} (b − Ax)).

Assume a Gaussian prior, X ~ N(0, D), or, in terms of densities,

π_X(x) ∝ exp(−(1/2) x^T D^{−1} x).

39 Tikhonov Regularization Revisited Bayes formula:

π_X(x | b) ∝ π_X(x) π_B(b | x) = exp(−(1/2) x^T D^{−1} x − (1/2) (b − Ax)^T C^{−1} (b − Ax)).

By writing D^{−1} = L^T L, the negative of the exponent is

H(x) = (1/2) (‖b − Ax‖²_C + ‖Lx‖²),

a Tikhonov functional; here ‖·‖_C denotes the norm weighted by C^{−1}.

40 Tikhonov Regularization Revisited Therefore,

x_MAP = argmax{π_X(x | b)} = argmin{‖b − Ax‖²_C + ‖Lx‖²}.

The Tikhonov regularization parameter is absorbed in the prior covariance matrix as well as the noise covariance matrix.
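Numerically, the MAP estimate can be computed by stacking the whitened data and prior terms into one least-squares problem; a Python/NumPy sketch, assuming for illustration white noise C = σ²I and the first-order matrix L from slide 27:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 6, 4
A = rng.standard_normal((m, n))
L = np.eye(n) - np.diag(np.ones(n - 1), -1)   # first-order prior matrix
sigma = 0.1
b = rng.standard_normal(m)

# Stacked least-squares form of H(x) = (1/2)(||b - Ax||^2/sigma^2 + ||Lx||^2)
K = np.vstack([A / sigma, L])
r = np.concatenate([b / sigma, np.zeros(n)])
x_map, *_ = np.linalg.lstsq(K, r, rcond=None)

# Normal-equations / posterior-mean form with C = sigma^2 I, D^{-1} = L^T L
Cinv = np.eye(m) / sigma**2
x_post = np.linalg.solve(A.T @ Cinv @ A + L.T @ L, A.T @ Cinv @ b)
assert np.allclose(x_map, x_post)
print("MAP estimate equals the Tikhonov solution")
```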

41 Non-Gaussian models The Gaussian models are insufficient when
- the forward model is non-linear,
- the prior is non-Gaussian,
- the noise is non-additive,
- the noise is non-Gaussian.

42 Monte Carlo Integration Assume that a probability density π_X is given in R^n. Problem: estimate numerically an integral of the type

E{f(X)} = ∫_{R^n} f(x) π_X(x) dx = ?

Expectation of X: f(x) = x. Covariance of X: f(x) = (x − x̄)(x − x̄)^T.

43 Monte Carlo Integration Difficulties with numerical integration using quadrature methods:
- We may not know the support of π_X (support: the set in which the function does not vanish). Where should we put our quadrature points?
- If n is large, an integration grid becomes huge: K points per direction means K^n grid points.

Try Monte Carlo integration!

44 Monte Carlo Integration Example: given a two-dimensional set Ω ⊂ R², estimate the area of Ω. Raindrop integration: assume that Ω ⊂ Q = [0, a] × [0, b]. Draw points from the uniform density over Q:

{x^1, x^2, ..., x^N}, x^j ~ Uniform(Q).

Estimate of the area |Ω|:

|Ω| / |Q| = |Ω| / (ab) ≈ (# of points x^j ∈ Ω) / N;

solve for |Ω|.

45 Monte Carlo Integration The approximation corresponds to the Monte Carlo integral

|Ω| / |Q| = ∫ χ_Ω(x) (1/|Q|) dx ≈ (1/N) Σ_{j=1}^N χ_Ω(x^j),

where

χ_Ω(x) = 1 if x ∈ Ω, 0 if x ∉ Ω,

1/N is the equal weight that every point x^j has, and (1/|Q|) χ_Q(x) is the uniform density over Q.
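The raindrop scheme is only a few lines of code; a Python/NumPy sketch estimating the area of the unit disk inside Q = [0, 2] × [0, 2] (the choice of Ω is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
a = b = 2.0                            # Q = [0, 2] x [0, 2], |Q| = ab = 4

# Draw uniform "raindrops" over Q and count the hits in the unit disk
pts = rng.uniform(0.0, 2.0, size=(N, 2))
hits = np.sum((pts[:, 0] - 1.0)**2 + (pts[:, 1] - 1.0)**2 <= 1.0)
area_est = a * b * hits / N            # |Omega| ≈ |Q| * (hits / N)

print("estimated area:", area_est, " exact:", np.pi)
assert abs(area_est - np.pi) < 0.05    # Monte Carlo error ~ N^{-1/2}
```

The error decays like N^{−1/2} regardless of dimension, which is the point of the method.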

46 Monte Carlo Integration Generalize: given a probability density π_X, write

∫_{R^n} f(x) π_X(x) dx ≈ (1/N) Σ_{j=1}^N f(x^j),

where {x^1, x^2, ..., x^N} is drawn independently from the probability distribution π_X. Problem: how does one draw from a probability density in R^n?

47 Sampling and Markov chains: Random walk Random walk is a process of moving around by taking random steps. The most elementary random walk:
1. Start at a point of your choice, x^0 ∈ R^n.
2. Draw a random vector w^1 ~ N(0, I) and set x^1 = x^0 + σ w^1.
3. Repeat the process: set x^{k+1} = x^k + σ w^{k+1}, w^{k+1} ~ N(0, I).

48 Sampling and Markov chains: Random walk In terms of random variables:

X_{k+1} = X_k + σ W_{k+1}, W_{k+1} ~ N(0, I_n).

The conditional density of X_{k+1}, given X_k = x_k, is

π(x_{k+1} | x_k) = (1/(2πσ²))^{n/2} exp(−‖x_{k+1} − x_k‖²/(2σ²)) = q_k(x_k, x_{k+1}).

The function q_k is called the kth transition kernel.

49 Sampling and Markov chains: Random walk Since q_0 = q_1 = q_2 = ..., i.e., the step is always equally distributed, we call the random walk time invariant (k = time). The chain {X_k, k = 0, 1, ...} of random variables is called a discrete time stochastic process. Particular feature: the probability distribution of X_k depends on the past only through the previous member X_{k−1}:

π(x_{k+1} | x_0, x_1, ..., x_k) = π(x_{k+1} | x_k).

A stochastic process having this property is called a Markov chain.

50 Sampling and Markov chains: Random walk Given an arbitrary transition kernel q and a random variable X with probability density π_X(x) = p(x), generate a new random variable Y by using the kernel q(x, y), that is, π(y | x) = q(x, y). Question: what is the probability density of this new variable? The answer is found by marginalization,

π_Y(y) = ∫ π(y | x) π_X(x) dx = ∫ q(x, y) p(x) dx.

51 Sampling and Markov chains: Random walk If the probability density of the new variable is equal to that of the old one, that is,

∫ q(x, y) p(x) dx = p(y),

p is called an invariant density of the transition kernel q. The classical problem in the theory of Markov chains is: given a transition kernel, find the corresponding invariant density.

52 Invariant density and sampling Recall the sampling problem: given a probability density p = p(x), generate a sample that is distributed according to it. If we had a transition kernel q with invariant density p, generating such a sample would be easy: start with some x^0; draw x^1 from q(x^0, x^1); in general, given x^k, draw x^{k+1} from q(x^k, x^{k+1}). Rephrasing the sampling problem: given a probability density p, find a kernel q such that p is its invariant density.

53 Metropolis Hastings algorithm Given a transition density y → K(x, y), where x ∈ R^n is the current point, consider a Markov process: if x ∈ R^n is the current point, we have two possibilities:
1. Stay at x with probability r(x), 0 ≤ r(x) < 1,
2. Move by using the transition kernel K(x, y).

54 Let x and y be realizations of random variables X and Y, with π_X(x) = p(x), and let y be generated according to the algorithm above. Question: what is the probability density of Y?

55 Let A be the event that we opt for moving from x, and A^c the event of staying put. The probability of Y ∈ B ⊂ R^n assuming a move is

P{Y ∈ B | X = x, A} = ∫_B K(x, y) dy.

The kernel K is scaled so that

P{A | X = x} = P{Y ∈ R^n | X = x, A} = ∫_{R^n} K(x, y) dy = 1 − r(x). (4)

56 On the other hand, if we stay put, Y ∈ B happens only if X ∈ B:

P{Y ∈ B | X = x, A^c} = r(x) χ_B(x) = r(x) if x ∈ B, 0 if x ∉ B,

where χ_B is the characteristic function of B.

57 Hence, the total probability of arriving from x to B is

P{Y ∈ B | X = x} = P{Y ∈ B | X = x, A} + P{Y ∈ B | X = x, A^c} = ∫_B K(x, y) dy + r(x) χ_B(x).

58 Marginalize over x and calculate the probability of Y ∈ B:

P{Y ∈ B} = ∫ P{Y ∈ B | X = x} p(x) dx
         = ∫ p(x) (∫_B K(x, y) dy) dx + ∫ χ_B(x) r(x) p(x) dx
         = ∫_B (∫ p(x) K(x, y) dx) dy + ∫_B r(x) p(x) dx
         = ∫_B (∫ p(x) K(x, y) dx + r(y) p(y)) dy.

59 Since P{Y ∈ B} = ∫_B π_Y(y) dy, we must have

π_Y(y) = ∫ p(x) K(x, y) dx + r(y) p(y).

Our goal is then to find a kernel K such that π_Y(y) = p(y), that is,

p(y) = ∫ p(x) K(x, y) dx + r(y) p(y),

or, equivalently,

(1 − r(y)) p(y) = ∫ p(x) K(x, y) dx.

60 Substituting (4) in this formula, with the roles of x and y interchanged, we obtain

∫ p(y) K(y, x) dx = ∫ p(x) K(x, y) dx.

This equation is called the balance equation. It holds, in particular, if the integrands are equal,

p(y) K(y, x) = p(x) K(x, y).

The latter equation is known as the detailed balance equation.

61 Metropolis-Hastings algorithm Start by selecting a proposal distribution, or candidate generating kernel, q(x, y). The kernel should be chosen so that generating a Markov chain with it is easy. A Gaussian kernel is a popular choice.

62 Metropolis-Hastings algorithm If q satisfies the detailed balance equation, i.e.,

p(y) q(y, x) = p(x) q(x, y),

we are done, since p is an invariant density. More likely, the equality does not hold. If

p(y) q(y, x) < p(x) q(x, y), (5)

force the detailed balance equation to hold by defining K as

K(x, y) = α(x, y) q(x, y),

where α is chosen so that

p(y) α(y, x) q(y, x) = p(x) α(x, y) q(x, y).

63 Metropolis-Hastings algorithm The kernel α need not be symmetric, so let α(y, x) = 1. Now the other factor is uniquely determined: we must have

α(x, y) = p(y) q(y, x) / (p(x) q(x, y)) < 1.

Observe that if the inequality (5) goes the other way, interchange the roles of x and y, and let α(x, y) = 1. In summary,

K(x, y) = α(x, y) q(x, y), α(x, y) = min{1, p(y) q(y, x) / (p(x) q(x, y))}.
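On a finite state space the kernel is a matrix, and both detailed balance and the invariance condition p K = p can be verified directly; a Python/NumPy sketch with an arbitrary four-state target p and a uniform (symmetric) proposal, so that α(x, y) = min{1, p(y)/p(x)}:

```python
import numpy as np

# Target density p on a small finite state space
p = np.array([0.1, 0.2, 0.3, 0.4])
n = len(p)

# Symmetric proposal q and Metropolis acceptance alpha = min(1, p(y)/p(x))
q = np.full((n, n), 1.0 / n)
K = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if y != x:
            K[x, y] = q[x, y] * min(1.0, p[y] / p[x])
    K[x, x] = 1.0 - K[x].sum()         # staying probability r(x)

# Detailed balance p(x) K(x,y) = p(y) K(y,x); hence p is invariant: p K = p
F = p[:, None] * K
assert np.allclose(F, F.T)
assert np.allclose(p @ K, p)
print("detailed balance and invariance hold")
```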

64 Metropolis-Hastings algorithm Draws in two phases:
1. Given x, draw y using the transition kernel q(x, y).
2. Calculate the acceptance ratio,

α(x, y) = p(y) q(y, x) / (p(x) q(x, y)).

3. Flip the α coin: draw t ~ Uniform([0, 1]); if α > t, accept y, otherwise stay where you are.

65 Metropolis-Hastings algorithm: Random walk proposal If q(x, y) = q(y, x), the algorithm simplifies:
1. Given x, draw y using the transition kernel q(x, y).
2. Calculate the acceptance ratio,

α(x, y) = p(y) / p(x).

3. Flip the α coin: draw t ~ Uniform([0, 1]); if α > t, accept y, otherwise stay where you are.

66 Random walk Metropolis-Hastings
1. Set the sample size N. Pick an initial point x^1. Set k = 1.
2. Propose a new point, y = x^k + δ w, w ~ N(0, I).
3. Compute the acceptance ratio,

α = π(y) / π(x^k).

4. Flip the α-coin: draw ξ ~ Uniform([0, 1]);
4.1 if α ≥ ξ, accept: x^{k+1} = y,
4.2 if α < ξ, stay put: x^{k+1} = x^k.
5. If k < N, increase k → k + 1 and continue from 2.; else stop.

67 Two steps Build the program in two steps: 1. Random walk sampling, no rejections 2. Add the rejection step

68 Sampling, no rejections

nsample = 10000;            % Sample size
x = [0;0];                  % Initial point
step = 0.1;                 % Step size of the random walk
Sample = NaN(2,nsample);    % For memory allocation
Sample(:,1) = x;
for j = 2:nsample
    y = x + step*randn(2,1);
    % Accept unconditionally
    x = y;
    Sample(:,j) = x;
end

69 Add the α-coin Write the condition in logarithmic form:

π(y)/π(x) > t  ⟺  log π(y) − log π(x) > log t.

70

nsample = 10000;            % Sample size
x = [0;0];                  % Initial point
step = 0.1;                 % Step size of the random walk
Sample = NaN(2,nsample);    % For memory allocation
Sample(:,1) = x;
logpx = ...
for j = 2:nsample
    y = x + step*randn(2,1);
    logpy = ...
    t = rand;
    if logpy - logpx > log(t)   % accept
        x = y;
        logpx = logpy;
    end
    Sample(:,j) = x;
end

71 If a move is not accepted, the previous point has to be repeated:

..., x^{k−1}, x^k, x^k, x^k, x^{k+1}, x^{k+2}, ...   (rejections)

The acceptance rate tells the relative rate of acceptances.
- Too low an acceptance rate: the chain does not move.
- Too high an acceptance rate: the chain is essentially a random walk and learns nothing of the underlying distribution (cf. raising kids).

What is a good acceptance rate? Rule of thumb: 15%-35%.
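The dependence of the acceptance rate on the step size is easy to see experimentally. A Python/NumPy sketch of the random walk Metropolis sampler above (a standard 2D Gaussian target and the step sizes are arbitrary choices for illustration), using the logarithmic form of the α-coin:

```python
import numpy as np

def rw_metropolis(logp, x0, step, nsample, rng):
    """Random walk Metropolis; returns the chain and the acceptance rate."""
    x = np.array(x0, dtype=float)
    lpx = logp(x)
    chain = np.empty((nsample, x.size))
    accepted = 0
    for j in range(nsample):
        y = x + step * rng.standard_normal(x.size)
        lpy = logp(y)
        # log form of the alpha-coin: accept if log p(y) - log p(x) > log t
        if lpy - lpx > np.log(rng.uniform()):
            x, lpx = y, lpy
            accepted += 1
        chain[j] = x
    return chain, accepted / nsample

logp = lambda x: -0.5 * np.sum(x**2)   # standard 2D Gaussian target

rng = np.random.default_rng(6)
_, acc_small = rw_metropolis(logp, [0.0, 0.0], 0.1, 5000, rng)
_, acc_large = rw_metropolis(logp, [0.0, 0.0], 5.0, 5000, rng)
print("acceptance rates:", acc_small, acc_large)
assert acc_small > acc_large            # tiny steps accept almost always
```

Tiny steps give near-certain acceptance but a slowly moving chain; huge steps are rejected most of the time.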

72 Example: Inverse problem in chemical kinetics Reversible single reaction pair,

A ⇌ B, with forward rate k_1 and backward rate k_{−1}.

Data: with known initial values, measure [A](t_j), 1 ≤ j ≤ n, for

t_min = t_1 < t_2 < ... < t_n = t_max.

The noisy observation model is

b_j = [A](t_j) + e_j, e_j = additive noise, noise level = σ.

Inverse problem: estimate k_1 and k_{−1}.

73 Forward model: Mass Balance Equations Denote

c_1(t) = [A](t), c_2(t) = [B](t).

Assuming unit volume,

dc_1/dt = −k_1 c_1 + k_{−1} c_2, c_1(0) = c_01,
dc_2/dt = k_1 c_1 − k_{−1} c_2, c_2(0) = c_02,

or

dc/dt = K c, c = [c_1; c_2],

where

K = [−k_1, k_{−1}; k_1, −k_{−1}].

74 Eigenvalues of K are

λ_1 = 0, λ_2 = −k_1 − k_{−1}.

Time constant:

τ = 1/(k_1 + k_{−1}).

Eigenvectors:

v_1 = [1; δ], v_2 = [1; −1], δ = k_1/k_{−1}.

Solution:

c = α v_1 + β v_2 e^{−t/τ}, α, β ∈ R.

75 Initial conditions imply

α = (c_01 + c_02)/(1 + δ), β = (δ c_01 − c_02)/(1 + δ).

In particular,

c_1(t) = f(t; k) = (c_01 + c_02)/(1 + δ) + ((δ c_01 − c_02)/(1 + δ)) e^{−t/τ},

where

k = [k_1; k_{−1}].

76 Observation model: the data consist of n measurements of c_1(t) corrupted by additive noise,

b_j = f(t_j; k) + e_j, 1 ≤ j ≤ n.

The observation errors e_j are mutually independent, zero mean, and normally distributed, e_j ~ N(0, σ²).

77 Likelihood density Assume that the noise is Gaussian white noise:

π_noise(e) ∝ exp(−‖e‖²/(2σ²)).

The likelihood density is

π(b | k) ∝ exp(−(1/(2σ²)) Σ_{j=1}^n (b_j − f(t_j; k))²).

78 Posterior density Flat prior over an interval: we believe that

0 < k_1 ≤ K_1, 0 < k_{−1} ≤ K_{−1},

with some reasonable upper bounds. Write

π_prior(k) ∝ χ_[0,K_1](k_1) χ_[0,K_{−1}](k_{−1}).

Posterior density by Bayes formula,

π(k | b) ∝ π_prior(k) π(b | k).

Contour plots of the posterior density?
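The forward map and the (log-)posterior are short functions; a Python sketch with assumed values c_01 = 1, c_02 = 0, and the bounds K_1 = 6, K_{−1} = 2 from the following slides (the time grid and σ are arbitrary illustrative choices):

```python
import numpy as np

def c1(t, k1, km1, c01=1.0, c02=0.0):
    """Concentration of A for the reversible pair A <-> B (slide 75)."""
    delta = k1 / km1
    tau = 1.0 / (k1 + km1)
    alpha = (c01 + c02) / (1.0 + delta)
    beta = (delta * c01 - c02) / (1.0 + delta)
    return alpha + beta * np.exp(-t / tau)

# Sanity checks: initial condition and equilibrium value
assert np.isclose(c1(0.0, 2.0, 1.0), 1.0)          # c1(0) = c01
assert np.isclose(c1(1e3, 2.0, 1.0), 1.0 / 3.0)    # alpha = equilibrium of A

def log_posterior(k, t, b, sigma, K1=6.0, Km1=2.0):
    """Flat prior on (0,K1] x (0,Km1] times the Gaussian likelihood."""
    k1, km1 = k
    if not (0.0 < k1 <= K1 and 0.0 < km1 <= Km1):
        return -np.inf
    r = b - c1(t, k1, km1)
    return -0.5 * np.sum(r**2) / sigma**2

t = np.linspace(0.1, 1.0, 5)
b = c1(t, 2.0, 1.0)                    # noiseless synthetic data
assert log_posterior((2.0, 1.0), t, b, 0.01) == 0.0
assert log_posterior((7.0, 1.0), t, b, 0.01) == -np.inf
print("forward model and posterior behave as expected")
```

This log-posterior is exactly what the random walk Metropolis-Hastings sampler of slide 66 needs as input.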

79 Data K_1 = 6, K_{−1} = 2; five uniformly distributed measurements of c_1(t) over the interval [t_min, t_max]: t_min = 0.1 τ, t_max = 4.1 τ (left); t_min = 5 τ, t_max = 9 τ (right).

80 Posterior densities K_1 = 6, K_{−1} = 2; five uniformly distributed measurements of c_1(t) over the interval [t_min, t_max]: t_min = 0.1 τ, t_max = 4.1 τ (left); t_min = 5 τ, t_max = 9 τ (right). The cross hairs indicate the value used for data generation.

81 MCMC exploration Generate the data: define

t = [t_1; t_2; ...; t_n], b = [b_1; b_2; ...; b_n],

where

b_j = (c_01 + c_02)/(1 + δ_true) + ((δ_true c_01 − c_02)/(1 + δ_true)) e^{−t_j/τ_true} + e_j,

with

δ_true = k_{1,true}/k_{−1,true}, τ_true = 1/(k_{1,true} + k_{−1,true}).

82 Random walk Metropolis-Hastings Start with the transient measurements. White noise proposal,

k_prop = k + δ w, w ~ N(0, I).

Choose first δ = 0.1, and different initial points k^0 = (1, 2) or k^0 = (5, 0.1). Acceptance rates with these values are of the order 45%.

83 Scatter plots Transient measurements, t_min = 0.1 τ, t_max = 4.1 τ

84 Sample histories: first component Initial value k_1 = 1 (left) and k_1 = 5 (right)

85 Sample histories: second component Initial value k_2 = 2 (left) and k_2 = 0.2 (right)

86 Burn-in: first component Initial value k_1 = 1 (left) and k_1 = 5 (right)

87 Burn-in: second component Initial value k_2 = 2 (left) and k_2 = 0.2 (right)

88 Scatter plots: Steady state measurement t_min = 5 τ, t_max = 9 τ. Use the same step size as before. Initial point (k_1, k_2) = (1, 2).

89 Sample histories, a.k.a. fuzzy worms Increase the step size. Initial point (k_1, k_2) = (1, 2). Acceptance remains high, about 55%.

90 Scatter plots, steady state data Increase the step size. Initial point (k_1, k_2) = (1, 2). Acceptance remains high, about 55%.

91 Fuzzy worms


More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Gibbs Sampler Componentwise sampling directly from the target density π(x), x R n. Define a transition kernel

Gibbs Sampler Componentwise sampling directly from the target density π(x), x R n. Define a transition kernel Gibbs Sampler Componentwise sampling directly from the target density π(x), x R n. Define a transition kernel K(x, y) = n π(y i y 1,..., y i 1, x i+1,..., x m ), i=1 and we set r(x) = 0. (move every time)

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 12. Gaussian Processes Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 The Normal Distribution http://www.gaussianprocess.org/gpml/chapters/

More information

Markov chain Monte Carlo Lecture 9

Markov chain Monte Carlo Lecture 9 Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

1 Introduction. Systems 2: Simulating Errors. Mobile Robot Systems. System Under. Environment

1 Introduction. Systems 2: Simulating Errors. Mobile Robot Systems. System Under. Environment Systems 2: Simulating Errors Introduction Simulating errors is a great way to test you calibration algorithms, your real-time identification algorithms, and your estimation algorithms. Conceptually, the

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

The University of Auckland Applied Mathematics Bayesian Methods for Inverse Problems : why and how Colin Fox Tiangang Cui, Mike O Sullivan (Auckland),

The University of Auckland Applied Mathematics Bayesian Methods for Inverse Problems : why and how Colin Fox Tiangang Cui, Mike O Sullivan (Auckland), The University of Auckland Applied Mathematics Bayesian Methods for Inverse Problems : why and how Colin Fox Tiangang Cui, Mike O Sullivan (Auckland), Geoff Nicholls (Statistics, Oxford) fox@math.auckland.ac.nz

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

Variational Learning : From exponential families to multilinear systems

Variational Learning : From exponential families to multilinear systems Variational Learning : From exponential families to multilinear systems Ananth Ranganathan th February 005 Abstract This note aims to give a general overview of variational inference on graphical models.

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains

More information

A review of probability theory

A review of probability theory 1 A review of probability theory In this book we will study dynamical systems driven by noise. Noise is something that changes randomly with time, and quantities that do this are called stochastic processes.

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Bayesian rules of probability as principles of logic [Cox] Notation: pr(x I) is the probability (or pdf) of x being true given information I

Bayesian rules of probability as principles of logic [Cox] Notation: pr(x I) is the probability (or pdf) of x being true given information I Bayesian rules of probability as principles of logic [Cox] Notation: pr(x I) is the probability (or pdf) of x being true given information I 1 Sum rule: If set {x i } is exhaustive and exclusive, pr(x

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Simulation - Lectures - Part III Markov chain Monte Carlo

Simulation - Lectures - Part III Markov chain Monte Carlo Simulation - Lectures - Part III Markov chain Monte Carlo Julien Berestycki Part A Simulation and Statistical Programming Hilary Term 2018 Part A Simulation. HT 2018. J. Berestycki. 1 / 50 Outline Markov

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

Results: MCMC Dancers, q=10, n=500

Results: MCMC Dancers, q=10, n=500 Motivation Sampling Methods for Bayesian Inference How to track many INTERACTING targets? A Tutorial Frank Dellaert Results: MCMC Dancers, q=10, n=500 1 Probabilistic Topological Maps Results Real-Time

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Probabilistic numerics for deep learning

Probabilistic numerics for deep learning Presenter: Shijia Wang Department of Engineering Science, University of Oxford rning (RLSS) Summer School, Montreal 2017 Outline 1 Introduction Probabilistic Numerics 2 Components Probabilistic modeling

More information

DRAGON ADVANCED TRAINING COURSE IN ATMOSPHERE REMOTE SENSING. Inversion basics. Erkki Kyrölä Finnish Meteorological Institute

DRAGON ADVANCED TRAINING COURSE IN ATMOSPHERE REMOTE SENSING. Inversion basics. Erkki Kyrölä Finnish Meteorological Institute Inversion basics y = Kx + ε x ˆ = (K T K) 1 K T y Erkki Kyrölä Finnish Meteorological Institute Day 3 Lecture 1 Retrieval techniques - Erkki Kyrölä 1 Contents 1. Introduction: Measurements, models, inversion

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Mobile Robotics II: Simultaneous localization and mapping

Mobile Robotics II: Simultaneous localization and mapping Mobile Robotics II: Simultaneous localization and mapping Introduction: probability theory, estimation Miroslav Kulich Intelligent and Mobile Robotics Group Gerstner Laboratory for Intelligent Decision

More information

ECE 636: Systems identification

ECE 636: Systems identification ECE 636: Systems identification Lectures 3 4 Random variables/signals (continued) Random/stochastic vectors Random signals and linear systems Random signals in the frequency domain υ ε x S z + y Experimental

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Quantifying Uncertainty

Quantifying Uncertainty Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham MCMC 2: Lecture 3 SIR models - more topics Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. What can be estimated? 2. Reparameterisation 3. Marginalisation

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

Numerical integration and importance sampling

Numerical integration and importance sampling 2 Numerical integration and importance sampling 2.1 Quadrature Consider the numerical evaluation of the integral I(a, b) = b a dx f(x) Rectangle rule: on small interval, construct interpolating function

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Computational Science and Engineering (Int. Master s Program)

Computational Science and Engineering (Int. Master s Program) Computational Science and Engineering (Int. Master s Program) TECHNISCHE UNIVERSITÄT MÜNCHEN Master s Thesis SPARSE GRID INTERPOLANTS AS SURROGATE MODELS IN STATISTICAL INVERSE PROBLEMS Author: Ao Mo-Hellenbrand

More information

Use of probability gradients in hybrid MCMC and a new convergence test

Use of probability gradients in hybrid MCMC and a new convergence test Use of probability gradients in hybrid MCMC and a new convergence test Ken Hanson Methods for Advanced Scientific Simulations Group This presentation available under http://www.lanl.gov/home/kmh/ June

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters D. Richard Brown III Worcester Polytechnic Institute 26-February-2009 Worcester Polytechnic Institute D. Richard Brown III 26-February-2009

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information