Dynamic models. Dependent data The AR(p) model The MA(q) model Hidden Markov models. 6 Dynamic models
|
|
- Matthew Williamson
- 5 years ago
- Views:
Transcription
1 6 Dependent data The AR(p) model The MA(q) model Hidden Markov models
2 Dependent data Dependent data Huge portion of real-life data involving dependent datapoints Example (Capture-recapture) capture histories capture sizes
3 Dependent data Eurostoxx 50 First four stock indices of of the financial index Eurostoxx ABN AMRO HOLDING AEGON t AHOLD KON t AIR LIQUIDE t t
4 Dependent data Markov chain Stochastic process (x t ) t T where distribution of x t given the past values x 0:(t 1) only depends on x t 1. Homogeneity: distribution of x t given the past constant in t T. Corresponding likelihood l(θ x 0:T ) = f 0 (x 0 θ) T f(x t x t 1,θ) t=1 [Homogeneity means f independent of t]
5 Dependent data Stationarity constraints Difference with the independent case: stationarity and causality constraints put restrictions on the parameter space
6 Dependent data Stationarity processes Definition (Stationary stochastic process) (x t ) t T is stationary if the joint distributions of (x 1,...,x k ) and (x 1+h,...,x k+h ) are the same for all h,k s. It is second-order stationary if, given the autocovariance function then γ x (r,s) = E[{x r E(x r )}{x s E(x s )}], r,s T, E(x t ) = µ and γ x (r,s) = γ x (r + t,s + t) γ x (r s) for all r,s,t T.
7 Dependent data Imposing or not imposing stationarity Bayesian inference on a non-stationary process can be [formaly] conducted Debate From a Bayesian point of view, to impose the stationarity condition is objectionable:stationarity requirement on finite datasets artificial and/or datasets themselves should indicate whether the model is stationary Reasons for imposing stationarity:asymptotics (Bayes estimators are not necessarily convergent in non-stationary settings) causality, identifiability and... common practice.
8 Dependent data Unknown stationarity constraints Practical difficulty: for complex models, stationarity constraints get quite involved to the point of being unknown in some cases
9 The AR(p) model The AR(1) model Case of linear Markovian dependence on the last value x t = µ + (x t 1 µ) + ǫ t,ǫ t i.i.d. N (0,σ 2 ) If < 1, (x t ) t Z can be written as x t = µ + j=0 and this is a stationary representation. jǫ t j
10 The AR(p) model Stationary but... If > 1, alternative stationary representation x t = µ j ǫ t+j. j=1 This stationary solution is criticized as artificial because x t is correlated with future white noises (ǫ t ) s>t, unlike the case when < 1. Non-causal representation...
11 The AR(p) model Standard constraint c Customary to restrict AR(1) processes to the case < 1 Thus use of a uniform prior on [ 1,1] for Exclusion of the case = 1 that leads to a random walk because the process is then a random walk [no stationary solution]
12 The AR(p) model The AR(p) model Conditional model x t x t 1,... N Generalisation of AR(1) ( µ + ) p i(x t i µ),σ 2 i=1 Among the most commonly used models in dynamic settings More challenging than the static models (stationarity constraints) Different models depending on the processing of the starting value x 0
13 The AR(p) model Stationarity+causality Stationarity constraints in the prior as a restriction on the values of θ. Theorem AR(p) model second-order stationary and causal iff the roots of the polynomial p P(x) = 1 ix i are all outside the unit circle i=1
14 The AR(p) model Initial conditions Unobserved initial values can be processed in various ways 1 All x i s (i > 0) set equal to µ, for computational convenience 2 Under stationarity and causality constraints, (x t ) t Z has a stationary distribution: Assume x p: 1 distributed from stationary N p (µ1 p,a) distribution Corresponding marginal likelihood Z T (! 2 ) Y σ T 1 px exp x 2σ 2 t µ i(x t i µ) t=0 i=1 f(x p: 1 µ,a)dx p: 1,
15 The AR(p) model Initial conditions (cont d) 3 Condition instead on the initial observed values x 0:(p 1) l c (µ, 1,..., p, σ x p:t,x 0:(p 1) ) ( ) T 2 p /2σ σ T exp x t µ i(x t i µ) 2. t=p i=1
16 The AR(p) model Prior selection For AR(1) model, Jeffreys prior associated with the stationary representation is π J 1 (µ,σ 2, ) 1 σ Extension to higher orders quite complicated ( part)! Natural conjugate prior for θ = (µ, 1,..., p,σ 2 ) : normal distribution on (µ, 1,..., p) and inverse gamma distribution on σ 2... and for constrained s?
17 The AR(p) model Stationarity constraints Under stationarity constraints, complex parameter space: each value of needs to be checked for roots of corresponding polynomial with modulus less than 1 E.g., for an AR(2) process with autoregressive polynomial P(u) = 1 1u 2u 2, constraint is < 1, 1 2 < 1 and 2 < 1. Skip Durbin forward
18 The AR(p) model A first useful reparameterisation Durbin Levinson recursion proposes a reparametrisation from the parameters i to the partial autocorrelations ψ i [ 1,1] which allow for a uniform prior on the hypercube. Partial autocorrelation defined as ψ i = corr (x t E[x t x t+1,...,x t+i 1 ], [see also Yule-Walker equations] x t+i E[x t+1 x t+1,...,x t+i 1 ])
19 The AR(p) model Durbin Levinson recursion Transform 1 Define ϕ ii = ψ i and ϕ ij = ϕ (i 1)j ψ i ϕ (i 1)(i j), for i > 1 and j = 1,,i 1. 2 Take i = ϕ pi for i = 1,,p.
20 The AR(p) model Stationarity & priors For AR(1) model, Jeffreys prior associated with the stationary representation is π J 1 (µ,σ 2, ) 1 σ Within the non-stationary region > 1, Jeffreys prior is π J 2 (µ,σ2, ) 1 σ T T(1 2). The dominant part of the prior is the non-stationary region!
21 The AR(p) model Alternative prior The reference prior π1 J is only defined when the stationary constraint holds. Idea Symmetrise to the region > 1 { π B (µ,σ 2, ) 1 1/ 1 2 if < 1, σ 2 1/ 2 1 if > 1,, pi x
22 The AR(p) model MCMC consequences When devising an MCMC algorithm, use the Durbin-Levinson recursion to end up with single normal simulations of the ψ i s since the j s are linear functions of the ψ i s
23 The AR(p) model Root parameterisation Skip Durbin back Lag polynomial representation ( ) p Id ib i x t = ǫ t i=1 with (inverse) roots p (Id λ i B) x t = ǫ t i=1 Closed form expression of the likelihood as a function of the (inverse) roots
24 The AR(p) model Uniform prior under stationarity Stationarity The λ i s are within the unit circle if in C [complex numbers] and within [ 1,1] if in R [real numbers] Naturally associated with a flat prior on either the unit circle or [ 1, 1] 1 1 k/ I 1 λ i <1 π I λ i <1 λ i R λ i R where k/2 + 1 number of possible cases Term k/2 + 1 is important for reversible jump applications
25 The AR(p) model MCMC consequences In a Gibbs sampler, each λ i can be simulated conditionaly on the others since p (Id λ i B) x t = y t λ i y t 1 = ǫ t i=1 where Y t = i i (Id λ i B) x t
26 The AR(p) model Metropolis-Hastings implementation 1 use the prior π itself as a proposal on the (inverse) roots of P, selecting one or several roots of P to be simulated from π; 2 acceptance ratio is likelihood ratio 3 need to watch out for real/complex dichotomy
27 The AR(p) model A [paradoxical] reversible jump implementation Define model M 2k (0 k p/2 ) as corresponding to a number 2k of complex roots o k p/2 ) Moving from model M 2k to model M 2k+2 means that two real roots have been replaced by two conjugate complex roots. Propose jump from M 2k to M 2k+2 with probability 1/2 and from M 2k to M 2k 2 with probability 1/2 [boundary exceptions] accept move from M 2k to M 2k+ or 2 with probability l c (µ, 1,..., p, σ x p:t,x 0:(p 1) ) l c (µ, 1,..., p, σ x p:t,x 0:(p 1) ) 1,
28 The AR(p) model Checking your code Try with no data and recover the prior k
29 The AR(p) model Checking your code Try with no data and recover the prior λ λ1
30 The AR(p) model Order estimation Typical setting for model choice: determine order p of AR(p) model Roots [may] change drastically from one p to the other. No difficulty from the previous perspective: recycle above reversible jump algorithm
31 The AR(p) model AR(?) reversible jump algorithm Use (purely birth-and-death) proposals based on the uniform prior k k+1 [Creation of real root] k k+2 [Creation of complex root] k k-1 [Deletion of real root] k k-2 [Deletion of complex root]
32 The AR(p) model Reversible jump output xt t p αi p=3 p=4 p= αi αi αi p=3 means mean Im(λi) α AR(3) simulated dataset of 530 points (upper left) with true parameters α i ( 0.1, 0.3, 0.4) and σ = 1. First histogram associated with p, following histograms with the α i s, for different values of p, and of σ 2. Final graph: scatterplot of the complex roots. One before last: evolution of α 1, α 2, α 3. σ 2 Iterations Re(λi)
33 The MA(q) model The MA(q) model Alternative type of time series q x t = µ + ǫ t ϑ j ǫ t j, ǫ t N(0,σ 2 ) j=1 Stationary but, for identifiability considerations, the polynomial q Q(x) = 1 ϑ j x j j=1 must have all its roots outside the unit circle.
34 The MA(q) model Identifiability Example For the MA(1) model, x t = µ + ǫ t ϑ 1 ǫ t 1, var(x t ) = (1 + ϑ 2 1 )σ2 can also be written x t = µ + ǫ t 1 1 t, ǫ N(0,ϑ ϑ 1 ǫ 2 1 σ2 ), Both pairs (ϑ 1,σ) & (1/ϑ 1,ϑ 1 σ) lead to alternative representations of the same model.
35 The MA(q) model Properties of MA models Non-Markovian model (but special case of hidden Markov) Autocovariance γ x (s) is null for s > q
36 The MA(q) model Representations x 1:T is a normal random variable with constant mean µ and covariance matrix σ 2 γ 1 γ 2... γ q γ 1 σ 2 γ 1... γ q 1 γ q Σ =..., γ 1 σ 2 with ( s q) q s γ s = σ 2 i=0 Not manageable in practice [large T s] ϑ i ϑ i+ s
37 The MA(q) model Representations (contd.) Conditional on past (ǫ 0,...,ǫ q+1 ), where (t > 0) L(µ, ϑ 1,..., ϑ q, σ x 1:T, ǫ 0,..., ǫ q+1 ) T q σ T exp x t µ + ϑ jˆǫ t j t=1 ˆǫ t = x t µ + j=1 2 /2σ 2, q ϑ jˆǫ t j, ˆǫ 0 = ǫ 0,..., ˆǫ 1 q = ǫ 1 q j=1 Recursive definition of the likelihood, still costly O(T q)
38 The MA(q) model Recycling the AR algorithm Same algorithm as in the AR(p) case when modifying the likelihood Simulation of the past noises ǫ i (i = 1,...,q) done via a Metropolis-Hastings step with target 0 f(ǫ 0,...,ǫ q+1 x 1:T,µ,σ,ϑ) i= q+1 e ǫ2 i /2σ2 T t=1 e bǫ2 t /2σ2,
39 The MA(q) model Representations (contd.) Encompassing approach for general time series models State-space representation x t = Gy t + ε t, (1) y t+1 = Fy t + ξ t, (2) (1) is the observation equation and (2) is the state equation Note As seen below, this is a special case of hidden Markov model
40 The MA(q) model MA(q) state-space representation For the MA(q) model, take y t = (ǫ t q,...,ǫ t 1,ǫ t ) and then y t+1 = y t + ǫ t x t = µ ( ϑ q ϑ q 1... ϑ 1 1 ) y t.
41 The MA(q) model MA(q) state-space representation (cont d) Example For the MA(1) model, observation equation with x t = (1 0)y t y t = (y 1t y 2t ) directed by the state equation ( ( ) 0 1 y t+1 = )y 0 0 t + ǫ t+1. 1ϑ1
42 The MA(q) model ARMA extension ARMA(p,q) model x t p ix t 1 = µ + ǫ t i=1 q ϑ j ǫ t j, ǫ t N(0,σ 2 ) Identical stationarity and identifiability conditions for both groups ( 1,..., p) and (ϑ 1,...,ϑ q ) j=1
43 The MA(q) model Reparameterisation Identical root representations p (Id λ i B)x t = i=1 State-space representation and q (Id η i B)ǫ t i=1 x t = x t = µ ( ϑ r 1 ϑ r 2... ϑ 1 1 ) y t y t+1 = B A r r 1 r yt + ǫt B., 0A 1 under the convention that m = 0 if m > p and ϑ m = 0 if m > q.
44 The MA(q) model Bayesian approximation Quasi-identical MCMC implementation: 1 Simulate ( 1,..., p) conditional on (ϑ 1,...,ϑ q ) and µ 2 Simulate (ϑ 1,...,ϑ q ) conditional on ( 1,..., p) and µ 3 Simulate (µ,σ) conditional on ( 1,..., p) and (ϑ 1,...,ϑ q ) c Code can be recycled almost as is!
45 Hidden Markov models Hidden Markov models Generalisation both of a mixture and of a state space model. Example Extension of a mixture model with Markov dependence x t z,x j j t N(µ zt,σz 2 t ), P(z t = u z j,j < t) = p zt 1 u, (u = 1,...,k) Label switching also strikes in this model!
46 Hidden Markov models Generic dependence graph x t x t+1 y t y t+1 (x t,y t ) x 0:(t 1),y 0:(t 1) f(y t y t 1 )f(x t y t )
47 Hidden Markov models Definition Observable series {x t } t 1 associated with a second process {y t } t 1, with a finite set of N possible values such that 1. indicators Y t have an homogeneous Markov dynamic p(y t y 1:t 1 ) = p(y t y t 1 ) = P yt 1 y t where y 1:t 1 denotes the sequence {y 1,y 2,...,y t 1 }. 2. Observables x t are independent conditionally on the indicators y t T p(x 1:T y 1:T ) = p(x t y t ) t=1
48 Hidden Markov models Dnadataset DNA sequence [made of A, C, G, and T s] corresponding to a complete HIV genome where A, C, G, and T have been recoded as 1,...,4. Possible modeling by a two-state hidden Markov model with Y = {1,2} and X = {1,2,3,4}
49 Hidden Markov models Parameterization For the Markov bit, transition matrix P = [p ij ] and initial distribution for the observables, where = P N p ij = 1 j=1 f i (x t ) = p(x t y t = i) = f(x t θ i ) usually within the same parametrized class of distributions.
50 Hidden Markov models Finite case When both hidden and observed chains are finite, with Y = {1,...,κ} and X = {1,...,k}, parameter θ made up of p probability vectors q 1 = (q 1 1,...,q1 k ),...,qκ = (q κ 1,...,qκ k ) Joint distribution of (x t,y t ) 0 t T y0 q y 0 x 0 T t=1 p yt 1 y t q yt x t,
51 Hidden Markov models Bayesian inference in the finite case Posterior of (θ, P) given (x t,y t ) t factorizes as π(θ, P) y0 κ i=1 j=1 k (qj i )n ij κ p i=1 j=1 p m ij ij, where n ij # of visits to state j by the x t s when the corresponding y t s are equal to i and m ij # of transitions from state i to state j on the hidden chain (y t ) t N Under a flat prior on p ij s and q i j s, posterior distributions are [almost] Dirichlet [initial distribution side effect]
52 Hidden Markov models MCMC implementation Finite State HMM Gibbs Sampler Initialization: 1 Generate random values of the p ij s and of the q i j s 2 Generate the hidden Markov chain (y t ) 0 t T by (i = 1, 2) P(y t = i) { p ii qx i 0 if t = 0, p yt 1i qx i t if t > 0, and compute the corresponding sufficient statistics
53 Hidden Markov models MCMC implementation (cont d) Finite State HMM Gibbs Sampler Iteration m (m 1): 1 Generate (p i1,..., p iκ ) D(1 + n i1,...,1 + n iκ ) (q i 1,...,qi k ) D(1 + m i1,...,1 + m ik ) and correct for missing initial probability by a MH step with acceptance probability y 0 / y0 2 Generate successively each y t (0 t T) by { p ii qx i P(y t = i x t, y t 1, y t+1 ) 1 p iy1 if t = 0, p yt 1i qx i t p iyt+1 if t > 0, and compute corresponding sufficient statistics
54 Hidden Markov models Dnadataset p p q q q q q q 3 2
55 Hidden Markov models Forward-Backward formulae Existence of a (magical) recurrence relation that provides the observed likelihood function in manageable computing time Called forward-backward or Baum Welch formulas
56 Hidden Markov models Observed likelihood computation Likelihood of the complete model simple: T l c (θ x,y) = p yt 1 y t f(x t θ yt ) t=2 but likelihood of the observed model is not: l(θ x) = l c (θ x,y) y {1,...,κ} T c O(κ T ) complexity
57 Hidden Markov models Forward-Backward paradox It is possible to express the (observed) likelihood L O (θ x) in O(T 2 κ) computations, based on the Markov property of the pair (x t,y t ). Direct to backward smoothing
58 Hidden Markov models Conditional distributions We have and p(y 1:t x 1:t ) = f(x t y t )p(y 1:t x 1:(t 1) ) p(x t x 1:(t 1) ) p(y 1:t x 1:(t 1) ) = k(y t y t 1 )p(y 1:(t 1) x 1:(t 1) ) [Smoothing/Bayes] [Prediction] where k(y t y t 1 ) = p yt 1 y t associated with the matrix P and f(x t y t ) = f(x t θ yt )
59 Hidden Markov models Update of predictive Therefore p(y 1:t x 1:t ) = p(y t x 1:(t 1) )f(x t y t ) p(x t x 1:(t 1) ) = f(x t y t )k(y t y t 1 ) p(y p(x t x 1:(t 1) ) 1:(t 1) x 1:(t 1) ) with the same order of complexity for p(y 1:t x 1:t ) as for p(x t x 1:(t 1) )
60 Hidden Markov models Propagation and actualization equations p(y t x 1:(t 1) ) = y 1:(t 1) p(y 1:(t 1) x 1:(t 1) )k(y t y t 1 ) and p(y t x 1:t ) = p(y t x 1:(t 1) )f(x t y t ) p(x t x 1:(t 1) ). [Propagation] [Actualization]
61 Hidden Markov models Forward backward equations (1) Evaluation of p(y t x 1:T ) t T by forward-backward algorithm Denote t T γ t (i) = P(y t = i x 1:T ) α t (i) = p(x 1:t,y t = i) β t (i) = p(x t+1:t y t = i)
62 Hidden Markov models Recurrence relations Then and α 1 (i) = f(x 1 y t = i) i κ α t+1 (j) = f(x t+1 y t+1 = j) α t (i)p ij j=1 i=1 β T (i) = 1 κ β t (i) = p ij f(x t+1 y t+1 = j)β t+1 (j) γ t (i) = α t(i)β t (i) κ α t (j)β t (j) j=1 [Forward] [Backward]
63 Hidden Markov models Extension of the recurrence relations For ξ t (i,j) = P(y t = i,y t+1 = j x 1:T ) i,j = 1,...,κ, we also have ξ t (i,j) = α t (i)p ij f(x t+1 y t = j)β t+1 (j) κ κ α t (i)p ij f(x t+1 y t+1 = j)β t+1 (j) i=1 j=1
64 Hidden Markov models Overflows and underflows On-line scalings of the α t (i) s and β T (i) s for each t by / κ / κ c t = 1 α t (i) and d t = 1 β t (i) i=1 i=1 avoid overflows or/and underflows for large datasets
65 Hidden Markov models Backward smoothing Recursive derivation of conditionals We have p(y s y s 1,x 1:t ) = p(y s y s 1,x s:t ) Therefore (s = T,T 1,...,1) [Markov property!] p(y s y s 1,x 1:T ) k(y s y s 1 )f(x s y s ) y s+1 p(y s+1 y s,x 1:T ) with p(y T y T 1,x 1:T ) k(y T y T 1 )f(x T y t ). [Backward equation]
66 Hidden Markov models End of the backward smoothing The first term is p(y 1 x 1:t ) π(y 1 )f(x 1 y 1 ) y 2 p(y 2 y 1,x 1:t ), with π stationary distribution of P The conditional for y s needs to be defined for each of the κ values of y s 1 c O(t κ 2 ) operations
67 Hidden Markov models Details Need to introduce unnormalized version of the conditionals p(y t y t 1,x 0:T ) such that p T(y T y T 1,x 0:T ) = p yt 1 y T f(x T y T ) κ p t (y t y t 1,x 1:T ) = p yt 1 y t f(x t y t ) p t+1 (i y t,x 1:T ) p 0(y 0 x 0:T ) = y0 f(x 0 y 0 ) i=1 κ p 1(i y 0,x 0:t ) i=1
68 Hidden Markov models Likelihood computation Bayes formula p(x 1:T ) = p(x 1:T y 1:T )p(y 1:T ) p(y 1:T x 1:T ) gives a representation of the likelihood based on the forward backward formulae and an arbitrary sequence x o 1:T (since the l.h.s. does not depend on x 1:T ). Obtained through the p t s as p(x 0:T ) = κ p 1(i x 0:T ) i=1
69 Hidden Markov models Prediction filter If Forward equations ϕ t (i) = p(y t = i x 1:t 1 ) ϕ 1 (j) = p(y 1 = j) ϕ t+1 (j) = 1 κ f(x t y t = i)ϕ t (i)p ij (t 1) c t i=1 where κ c t = f(x t y t = k)ϕ t (k), k=1
70 Hidden Markov models Likelihood computation (2) Follows the same principle as the backward equations The (log-)likelihood is thus [ t κ ] log p(x 1:t ) = log p(x t,y t = i x 1:(r 1) ) = r=1 i=1 [ t κ ] log f(x t y t = i)ϕ t (i) r=1 i=1
Basic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationMixture models. Mixture models MCMC approaches Label switching MCMC for variable dimension models. 5 Mixture models
5 MCMC approaches Label switching MCMC for variable dimension models 291/459 Missing variable models Complexity of a model may originate from the fact that some piece of information is missing Example
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationDiscrete time processes
Discrete time processes Predictions are difficult. Especially about the future Mark Twain. Florian Herzog 2013 Modeling observed data When we model observed (realized) data, we encounter usually the following
More informationInference and estimation in probabilistic time series models
1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationInfering the Number of State Clusters in Hidden Markov Model and its Extension
Infering the Number of State Clusters in Hidden Markov Model and its Extension Xugang Ye Department of Applied Mathematics and Statistics, Johns Hopkins University Elements of a Hidden Markov Model (HMM)
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationHmms with variable dimension structures and extensions
Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating
More informationMCMC and Gibbs Sampling. Kayhan Batmanghelich
MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationSteven L. Scott. Presented by Ahmet Engin Ural
Steven L. Scott Presented by Ahmet Engin Ural Overview of HMM Evaluating likelihoods The Likelihood Recursion The Forward-Backward Recursion Sampling HMM DG and FB samplers Autocovariance of samplers Some
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationTime Series 2. Robert Almgren. Sept. 21, 2009
Time Series 2 Robert Almgren Sept. 21, 2009 This week we will talk about linear time series models: AR, MA, ARMA, ARIMA, etc. First we will talk about theory and after we will talk about fitting the models
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationState Space and Hidden Markov Models
State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationChapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang
Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check
More informationCh 4. Models For Stationary Time Series. Time Series Analysis
This chapter discusses the basic concept of a broad class of stationary parametric time series models the autoregressive moving average (ARMA) models. Let {Y t } denote the observed time series, and {e
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationIdentifiability, Invertibility
Identifiability, Invertibility Defn: If {ǫ t } is a white noise series and µ and b 0,..., b p are constants then X t = µ + b 0 ǫ t + b ǫ t + + b p ǫ t p is a moving average of order p; write MA(p). Q:
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationCovariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 )
Covariance Stationary Time Series Stochastic Process: sequence of rv s ordered by time {Y t } {...,Y 1,Y 0,Y 1,...} Defn: {Y t } is covariance stationary if E[Y t ]μ for all t cov(y t,y t j )E[(Y t μ)(y
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationSC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationChapter 3 - Temporal processes
STK4150 - Intro 1 Chapter 3 - Temporal processes Odd Kolbjørnsen and Geir Storvik January 23 2017 STK4150 - Intro 2 Temporal processes Data collected over time Past, present, future, change Temporal aspect
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationOnline appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US
Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus
More informationMarkov Chains and Hidden Markov Models
Chapter 1 Markov Chains and Hidden Markov Models In this chapter, we will introduce the concept of Markov chains, and show how Markov chains can be used to model signals using structures such as hidden
More informationHidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010
Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More information1 Linear Difference Equations
ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 18: Time Series Jan-Willem van de Meent (credit: Aggarwal Chapter 14.3) Time Series Data http://www.capitalhubs.com/2012/08/the-correlation-between-apple-product.html
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationLecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications
Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Moving average processes Autoregressive
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More information1 Hidden Markov Models
1 Hidden Markov Models Hidden Markov models are used to model phenomena in areas as diverse as speech recognition, financial risk management, the gating of ion channels or gene-finding. They are in fact
More informationHidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing
Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationModeling conditional distributions with mixture models: Theory and Inference
Modeling conditional distributions with mixture models: Theory and Inference John Geweke University of Iowa, USA Journal of Applied Econometrics Invited Lecture Università di Venezia Italia June 2, 2005
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationTowards a Bayesian model for Cyber Security
Towards a Bayesian model for Cyber Security Mark Briers (mbriers@turing.ac.uk) Joint work with Henry Clausen and Prof. Niall Adams (Imperial College London) 27 September 2017 The Alan Turing Institute
More informationNon-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice
Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice Roberta Paroli 1, Silvia Pistollato, Maria Rosa, and Luigi Spezia 3 1 Istituto di Statistica
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationLecture 4: Dynamic models
linear s Lecture 4: s Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationClassic Time Series Analysis
Classic Time Series Analysis Concepts and Definitions Let Y be a random number with PDF f Y t ~f,t Define t =E[Y t ] m(t) is known as the trend Define the autocovariance t, s =COV [Y t,y s ] =E[ Y t t
More informationTime Series: Theory and Methods
Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationEconometría 2: Análisis de series de Tiempo
Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 III. Stationary models 1 Purely random process 2 Random walk (non-stationary)
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationCoupled Hidden Markov Models: Computational Challenges
.. Coupled Hidden Markov Models: Computational Challenges Louis J. M. Aslett and Chris C. Holmes i-like Research Group University of Oxford Warwick Algorithms Seminar 7 th March 2014 ... Hidden Markov
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationHidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208
Hidden Markov Model Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/19 Outline Example: Hidden Coin Tossing Hidden
More informationStatistical Inference and Methods
Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and
More informationParameter estimation: ACVF of AR processes
Parameter estimation: ACVF of AR processes Yule-Walker s for AR processes: a method of moments, i.e. µ = x and choose parameters so that γ(h) = ˆγ(h) (for h small ). 12 novembre 2013 1 / 8 Parameter estimation:
More informationFitting Narrow Emission Lines in X-ray Spectra
Outline Fitting Narrow Emission Lines in X-ray Spectra Taeyoung Park Department of Statistics, University of Pittsburgh October 11, 2007 Outline of Presentation Outline This talk has three components:
More informationStochastic Processes: I. consider bowl of worms model for oscilloscope experiment:
Stochastic Processes: I consider bowl of worms model for oscilloscope experiment: SAPAscope 2.0 / 0 1 RESET SAPA2e 22, 23 II 1 stochastic process is: Stochastic Processes: II informally: bowl + drawing
More informationIntroduction to ARMA and GARCH processes
Introduction to ARMA and GARCH processes Fulvio Corsi SNS Pisa 3 March 2010 Fulvio Corsi Introduction to ARMA () and GARCH processes SNS Pisa 3 March 2010 1 / 24 Stationarity Strict stationarity: (X 1,
More informationRecursive Deviance Information Criterion for the Hidden Markov Model
International Journal of Statistics and Probability; Vol. 5, No. 1; 2016 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Recursive Deviance Information Criterion for
More informationMachine Learning 4771
Machine Learning 4771 Instructor: ony Jebara Kalman Filtering Linear Dynamical Systems and Kalman Filtering Structure from Motion Linear Dynamical Systems Audio: x=pitch y=acoustic waveform Vision: x=object
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationBayesian Classification and Regression Trees
Bayesian Classification and Regression Trees James Cussens York Centre for Complex Systems Analysis & Dept of Computer Science University of York, UK 1 Outline Problems for Lessons from Bayesian phylogeny
More informationARMA Models: I VIII 1
ARMA Models: I autoregressive moving-average (ARMA) processes play a key role in time series analysis for any positive integer p & any purely nondeterministic process {X t } with ACVF { X (h)}, there is
More informationLecture 4: State Estimation in Hidden Markov Models (cont.)
EE378A Statistical Signal Processing Lecture 4-04/13/2017 Lecture 4: State Estimation in Hidden Markov Models (cont.) Lecturer: Tsachy Weissman Scribe: David Wugofski In this lecture we build on previous
More informationThe Ising model and Markov chain Monte Carlo
The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte
More informationBayesian Methods with Monte Carlo Markov Chains II
Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3
More informationSTAT STOCHASTIC PROCESSES. Contents
STAT 3911 - STOCHASTIC PROCESSES ANDREW TULLOCH Contents 1. Stochastic Processes 2 2. Classification of states 2 3. Limit theorems for Markov chains 4 4. First step analysis 5 5. Branching processes 5
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationCS6220 Data Mining Techniques Hidden Markov Models, Exponential Families, and the Forward-backward Algorithm
CS6220 Data Mining Techniques Hidden Markov Models, Exponential Families, and the Forward-backward Algorithm Jan-Willem van de Meent, 19 November 2016 1 Hidden Markov Models A hidden Markov model (HMM)
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo (and Bayesian Mixture Models) David M. Blei Columbia University October 14, 2014 We have discussed probabilistic modeling, and have seen how the posterior distribution is the critical
More informationTime Series I Time Domain Methods
Astrostatistics Summer School Penn State University University Park, PA 16802 May 21, 2007 Overview Filtering and the Likelihood Function Time series is the study of data consisting of a sequence of DEPENDENT
More informationCh 6. Model Specification. Time Series Analysis
We start to build ARIMA(p,d,q) models. The subjects include: 1 how to determine p, d, q for a given series (Chapter 6); 2 how to estimate the parameters (φ s and θ s) of a specific ARIMA(p,d,q) model (Chapter
More informationBayesian Estimation of Input Output Tables for Russia
Bayesian Estimation of Input Output Tables for Russia Oleg Lugovoy (EDF, RANE) Andrey Polbin (RANE) Vladimir Potashnikov (RANE) WIOD Conference April 24, 2012 Groningen Outline Motivation Objectives Bayesian
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationSAMPLING ALGORITHMS. In general. Inference in Bayesian models
SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationLecture 2: Univariate Time Series
Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More information3 Theory of stationary random processes
3 Theory of stationary random processes 3.1 Linear filters and the General linear process A filter is a transformation of one random sequence {U t } into another, {Y t }. A linear filter is a transformation
More informationSwitching Regime Estimation
Switching Regime Estimation Series de Tiempo BIrkbeck March 2013 Martin Sola (FE) Markov Switching models 01/13 1 / 52 The economy (the time series) often behaves very different in periods such as booms
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationMarkov Chain Monte Carlo Methods for Stochastic Optimization
Markov Chain Monte Carlo Methods for Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U of Toronto, MIE,
More information