HIDDEN MARKOV MODELS IN INDEX RETURNS
JUSVIN DHILLON

Abstract. In this paper, we consider discrete Markov chains and hidden Markov models. After examining a common algorithm for estimating matrices of probability, we conclude by constructing a hidden Markov model of S&P 500 index returns.

Contents
1. Markov Chains: Basic Principles
2. Krylov-Bogoliubov Argument
3. Hidden Markov Models
4. Baum-Welch Inference Algorithm
5. Kullback-Leibler Divergence Derivation
6. Hidden Markov Chains in Index Returns
Acknowledgments
References

1. Markov Chains: Basic Principles

Definition 1.1. Let (E, ℰ) be a measurable space and Ω be a sample space with σ-algebra F and probability measure P : F → [0, 1]. A random variable is an (F, ℰ)-measurable function X : Ω → E. We consider in this paper a useful and intuitive collection of these functions.

Definition 1.2. A discrete-time Markov chain on a state space Ω is an Ω-valued sequence of random variables X_1, X_2, X_3, ... such that for all times k in N and states x_1, x_2, x_3, ...

    P(X_k = x_k | X_{k-1} = x_{k-1}, X_{k-2} = x_{k-2}, ..., X_1 = x_1) = P(X_k = x_k | X_{k-1} = x_{k-1})

Thus the chain's next move depends only on the most recent state.

Example 1.3. The simple random walk on the integer lattice Z^n with standard vectors e_1, e_2, ..., e_n is a Markov chain satisfying

    P(X_k = α ± e_i | X_{k-1} = α) = 1/(2n)    for all α in Z^n

Definition 1.4. If Ω is finite with |Ω| = n, the chain can be represented by an n × n transition matrix P where p(i, j) = P(X_k = j | X_{k-1} = i).

Date: August 30, 2010.
        | p(1,1)  p(1,2)  ...  p(1,n) |
    P = | p(2,1)  p(2,2)  ...  p(2,n) |
        |  ...     ...    ...   ...   |
        | p(n,1)  p(n,2)  ...  p(n,n) |

If the chain starts at state X_1 = j and e_j denotes the standard unit vector, then e_j P naturally gives the probability distribution of X_2 given that X_1 = j. In general, if |Ω| = n and µ ∈ R^n with Σ_{i=1}^n µ_i = 1 and µ_i ≥ 0 for all i, then µP^k is the probability distribution of the process after k transitions starting at µ. We denote P(X_{i+k} = y | X_i = x) by p^k(x, y), the (x, y)th entry of P^k.

Definition 1.5. A probability distribution π is stationary if π = πP.

A stationary distribution thus represents an equilibrium for the random process. We prove the existence of this equilibrium in the next section.

2. Krylov-Bogoliubov Argument

Definition 2.1. A chain P is irreducible if for any two states x, y ∈ Ω there exists α ∈ Z+ such that p^α(x, y) > 0.

Thus states in an irreducible chain are always accessible from any given state. In our Z^n random walk, a point in the lattice can be reached from any other, so the walk is irreducible.

Definition 2.2. The period of state x is gcd {n ∈ N : p^n(x, x) > 0}. If every state in a chain has period 1, the chain is aperiodic. Otherwise, it is periodic.

Proposition 2.3. If chain P is irreducible, then for all x, y ∈ Ω

    gcd {n ∈ N : p^n(x, x) > 0} = gcd {n ∈ N : p^n(y, y) > 0}

Proof. Given states x and y, there exist i, j ∈ Z+ such that p^i(x, y) > 0 and p^j(y, x) > 0. For each state α let G(α) = {n ∈ N : p^n(α, α) > 0}, and let k = i + j. Then k ∈ G(x) ∩ G(y). Moreover, for any n ∈ G(y) we have p^{i+n+j}(x, x) ≥ p^i(x, y) p^n(y, y) p^j(y, x) > 0, so n + k ∈ G(x). Since gcd G(x) divides both k and n + k, it divides every element of G(y), and hence gcd G(x) divides gcd G(y). By symmetry, gcd G(y) divides gcd G(x), so the two are equal.

All probability vectors µ axiomatically satisfy Σ_{i=1}^n µ_i = 1 and µ_i ≥ 0 for all i. Thus the set P of probability distributions on Ω with |Ω| = n is the standard (n−1)-simplex in R^n. Containing P in the (n−1)-sphere of radius 2 and observing that convexity connects any two points by a closed line segment, we see that P is bounded and closed, and hence compact by the Heine-Borel Theorem.
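Definitions 1.4 and 1.5 can be illustrated numerically. The following is a minimal sketch in Python (rather than the R used later in the paper); the two-state transition matrix and its values are invented for the example, and the distribution µP^k is seen to settle toward a stationary π:

```python
# A small two-state chain: mu P^k approaches the stationary distribution.
# The transition matrix values here are invented for illustration.

def step(mu, P):
    """One transition: return the row vector mu P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.9, 0.1],
     [0.5, 0.5]]          # rows sum to 1

mu = [1.0, 0.0]           # start surely in state 1
for _ in range(50):
    mu = step(mu, P)      # mu P^k after k iterations

print(mu)                 # close to the stationary pi = (5/6, 1/6)
```

For this chain, solving π = πP by hand gives π = (5/6, 1/6), and `step([5/6, 1/6], P)` returns the same vector up to rounding, checking Definition 1.5 directly.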
In the following proof, we will use the Bolzano-Weierstrass property of compact spaces to extract our convergent subsequence.
Theorem 2.4 (Krylov-Bogoliubov Argument). If Markov chain P is finite-state and discrete, then there exists π in P with π = πP.

Proof. Fix ϕ in P and consider the Cesàro averages (1/n) Σ_{i=1}^n ϕP^i. As each Cesàro average is an arithmetic average of probability vectors, its components remain nonnegative and sum to 1. Thus the Cesàro averages remain in P. By compactness, the sequence {(1/n) Σ_{i=1}^n ϕP^i}_{n=1}^∞ has a convergent subsequence, indexed by n_k; denote, albeit prophetically, the limit of this subsequence by π. Observing the continuity of the linear map P,

    πP = lim_{k→∞} ( (1/n_k) Σ_{i=1}^{n_k} ϕP^i ) P
       = lim_{k→∞} (1/n_k) Σ_{i=2}^{n_k + 1} ϕP^i
       = lim_{k→∞} (1/n_k) ( Σ_{i=1}^{n_k} ϕP^i + ϕP^{n_k + 1} − ϕP )
       = lim_{k→∞} (1/n_k) Σ_{i=1}^{n_k} ϕP^i + lim_{k→∞} (1/n_k) ( ϕP^{n_k + 1} − ϕP )
       = π + 0 = π

3. Hidden Markov Models

Given sufficient observations of a Markov chain, we can estimate the transition matrix P by counting or bootstrapping the sample probabilities. A more interesting problem arises when the states are unobservable but act as random variables on an observable output set {o_1, o_2, o_3, ...}.

Definition 3.1. A hidden Markov model (P, Q, µ) is a 3-tuple of transition matrix P, matrix of observation probabilities Q, where q(i, j) = P(O_k = o_i | X_k = x_j) for all times k in N, and initial probability vector µ.

Note that the model is hidden irrespective of whether P is explicitly known. Rather, the chain states are difficult to observe or are defined by their output.

Example 3.2. Consider a speech recognition machine which gathers observable sound waves {o_1, o_2, o_3, ...} as a proxy for spoken words {x_1, x_2, x_3, ...}. Given statistical noise and speech variation, o_i may indicate more than one x_j, but since the machine knows P it weights its estimation in favor of likelier word orderings.
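To make Definition 3.1 concrete, here is a minimal generative sketch of a hidden Markov model (P, Q, µ): the hidden chain moves according to P while an observer sees only the outputs emitted according to Q. Python is used for illustration and all probability values are invented.

```python
import random

# A hidden Markov model (P, q, mu): the chain X_k moves by P, while only
# the outputs O_k, emitted according to q, are visible.
# q[i][j] = P(O = o_i | X = x_j), following the paper's convention.

P  = [[0.8, 0.2],
      [0.3, 0.7]]             # transition matrix
q  = [[0.9, 0.1],
      [0.1, 0.9]]             # observation probabilities
mu = [0.5, 0.5]               # initial distribution

def draw(weights):
    """Sample an index from a probability vector."""
    r, acc = random.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

def sample(P, q, mu, K):
    """Return K hidden states and the K observations they emit."""
    states, obs = [], []
    x = draw(mu)
    for _ in range(K):
        states.append(x)
        obs.append(draw([q[o][x] for o in range(len(q))]))
        x = draw(P[x])
    return states, obs

random.seed(1)
states, obs = sample(P, q, mu, 10)
print(states)
print(obs)    # an observer sees only this sequence
```

The inference problem of the next section runs in the opposite direction: given only `obs`, estimate (P, Q, µ).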
The above diagrams summarize hidden Markov models. In figure A, dashed arrows represent possible state changes, while each state can emit any of the three possible observable outputs. In figure B, known as a trellis diagram, arrows represent conditional dependence and we consider a possible sequence of states and their outputs. Continuing our example above, the sentence {x_{t_1}, x_{t_2}, x_{t_3}, ...} would be occurring while the speech recognition machine sees only the sound waves {o_{t_1}, o_{t_2}, o_{t_3}, ...}.

4. Baum-Welch Inference Algorithm

Suppose we have K observations O_1 = o_{t_1}, O_2 = o_{t_2}, ..., O_K = o_{t_K}, where each o_{t_k} is an element of {o_1, o_2, o_3, ...}. Denoting H = (P, Q, µ), we will consider two simple ways to find P(o_{t_1}, o_{t_2}, ..., o_{t_K} | H).

Definition 4.1. For each time k and state x_i, the forward variable α is defined by

    (4.2)  α_k(x_i) = P(X_k = x_i, o_{t_1}, o_{t_2}, ..., o_{t_k} | H)

This is the probability of the kth state being x_i and seeing the observations through time k, given our model. From the definitions of Q and µ, we see that

    α_1(x_i) = P(X_1 = x_i, O_1 = o_{t_1} | H) = q(t_1, i) µ_i

and

    α_2(x_i) = P(X_2 = x_i, O_1 = o_{t_1}, O_2 = o_{t_2} | H) = q(t_2, i) Σ_{j=1}^n α_1(x_j) p_{ji}

so that recursively

    α_{k+1}(x_i) = q(t_{k+1}, i) Σ_{j=1}^n α_k(x_j) p_{ji}.

In particular, we can find α_K(x_i). Since this is the joint probability with the Kth state being x_i, summing over states gives

    Σ_{i=1}^n α_K(x_i) = P(o_{t_1}, o_{t_2}, ..., o_{t_K} | H)

Definition 4.3. For each time k and state x_i, the backward variable β is defined by

    (4.4)  β_k(x_i) = P(o_{t_{k+1}}, o_{t_{k+2}}, ..., o_{t_K} | X_k = x_i, H)
This is the probability of seeing the observations after time k, given our model and that the kth state is x_i. Given o_{t_K} and H,

    β_K(x_i) = 1

and

    β_{K−1}(x_i) = P(O_K = o_{t_K} | X_{K−1} = x_i, H) = Σ_{j=1}^n p_{ij} q(t_K, j)

so recursively we have

    β_k(x_i) = Σ_{j=1}^n β_{k+1}(x_j) p_{ij} q(t_{k+1}, j).

From the definitions of α and β we can write

    α_k(x_i) β_k(x_i) = P(o_{t_1}, o_{t_2}, ..., o_{t_K}, X_k = x_i | H)

Since this is the joint probability with the kth state being x_i, summing over states gives

    Σ_{i=1}^n α_k(x_i) β_k(x_i) = P(o_{t_1}, o_{t_2}, ..., o_{t_K} | H)

Thus we have two ways to judge the fit of hidden Markov model H. Many useful probabilities can be expressed as a combination of forward and backward probabilities. For example, the probability of two states at two adjacent times given our observations and model is

    ξ_k(i, j) := P(X_k = x_i, X_{k+1} = x_j | o_{t_1}, ..., o_{t_K}, H)
               = P(X_k = x_i, X_{k+1} = x_j, o_{t_1}, ..., o_{t_K} | H) / P(o_{t_1}, ..., o_{t_K} | H)
               = α_k(x_i) p_{ij} q(t_{k+1}, j) β_{k+1}(x_j) / Σ_{i=1}^n α_k(x_i) β_k(x_i)

The probability of state x_i at time k given our observations and model is

    γ_k(x_i) := P(X_k = x_i | o_{t_1}, ..., o_{t_K}, H) = α_k(x_i) β_k(x_i) / Σ_{i=1}^n α_k(x_i) β_k(x_i)

Observe that since β_k(x_i) = Σ_{j=1}^n β_{k+1}(x_j) p_{ij} q(t_{k+1}, j),

    γ_k(x_i) = Σ_{j=1}^n ξ_k(i, j)
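The α and β recursions above, and the identity Σ_i α_k(x_i) β_k(x_i) = P(o_{t_1}, ..., o_{t_K} | H) at every time k, can be checked numerically. The following sketch uses an invented two-state model (Python here, rather than the paper's R):

```python
# Forward-backward recursions for a hidden Markov model H = (P, q, mu).
# q[o][x] = P(O = o | X = x), as in the paper; all numbers are invented.

P  = [[0.7, 0.3], [0.4, 0.6]]
q  = [[0.9, 0.2], [0.1, 0.8]]
mu = [0.5, 0.5]
obs = [0, 1, 1, 0]
n, K = len(mu), len(obs)

# alpha_k(i) = P(X_k = x_i, first k observations | H)
alpha = [[q[obs[0]][i] * mu[i] for i in range(n)]]
for o in obs[1:]:
    alpha.append([q[o][i] * sum(alpha[-1][j] * P[j][i] for j in range(n))
                  for i in range(n)])

# beta_k(i) = P(observations after time k | X_k = x_i, H), with beta_K = 1
beta = [[1.0] * n]
for o in reversed(obs[1:]):
    beta.append([sum(beta[-1][j] * P[i][j] * q[o][j] for j in range(n))
                 for i in range(n)])
beta.reverse()

# The same likelihood P(obs | H) recovered at every time k:
likelihoods = [sum(alpha[k][i] * beta[k][i] for i in range(n))
               for k in range(K)]
print(likelihoods)

# gamma_k(i), the state posterior, sums to 1 at each time:
like = likelihoods[0]
gamma = [[alpha[k][i] * beta[k][i] / like for i in range(n)]
         for k in range(K)]
print(gamma[0])
```

The `likelihoods` list has identical entries, which is exactly the "two ways to judge the fit of H" above: the forward pass alone (k = K, where β = 1) and any mixed αβ product give the same P(o_{t_1}, ..., o_{t_K} | H).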
Definition 4.5. The Baum-Welch inference algorithm reestimates P, Q, and µ as

    (4.6)  µ̄_i = γ_1(x_i)

    (4.7)  p̄_{ij} = Σ_{k=1}^{K−1} ξ_k(i, j) / Σ_{k=1}^{K−1} γ_k(x_i)

    (4.8)  q̄(i, j) = ( Σ_{k : o_{t_k} = o_i} γ_k(x_j) ) / ( Σ_{k=1}^{K} γ_k(x_j) )

The origin of these values will be explained in the next section.

5. Kullback-Leibler Divergence Derivation

Definition 5.1. For probability distributions P and P̄ on a discrete random variable X, their Kullback-Leibler divergence is

    D_KL(P, P̄) = Σ_x P(x) log( P(x) / P̄(x) )

wherever log( P(x) / P̄(x) ) is defined. As this is a (weighted) average of the log-differences between distributions P and P̄, it is often used to judge the fit of an estimated distribution P̄ to a theoretical distribution P. We consider a basic attribute of K-L divergence.

Lemma 5.2 (Jensen's inequality). If X is a discrete random variable and ϕ a convex function,

    ϕ(E[X]) ≤ E[ϕ(X)]

Proof. Since E[X] = Σ_i λ_i x_i with Σ_i λ_i = 1, by convexity

    ϕ(λ_1 x_1 + λ_2 x_2) ≤ λ_1 ϕ(x_1) + λ_2 ϕ(x_2)

Suppose as an inductive hypothesis that for all (k−1)-valued X with weights summing to 1,

    ϕ( Σ_{i=1}^{k−1} λ_i x_i ) ≤ Σ_{i=1}^{k−1} λ_i ϕ(x_i)

Then

    ϕ( Σ_{i=1}^{k} λ_i x_i ) = ϕ( λ_1 x_1 + Σ_{i=2}^{k} λ_i x_i )
But since Σ_{i=2}^{k} λ_i = 1 − λ_1, the weights λ_i / (1 − λ_1) sum to 1, so convexity and our hypothesis give

    ϕ(E[X]) = ϕ( Σ_{i=1}^{k} λ_i x_i ) ≤ λ_1 ϕ(x_1) + (1 − λ_1) ϕ( Σ_{i=2}^{k} (λ_i / (1 − λ_1)) x_i ) ≤ Σ_{i=1}^{k} λ_i ϕ(x_i) = E[ϕ(X)]

The continuous case is similar (but not needed for our purposes), and it is clear from the above that Jensen's inequality holds for any weighted sum, not just discrete expectations.

Proposition 5.3 (Gibbs' inequality). D_KL(P, P̄) ≥ 0

Proof. Use Jensen's inequality with X = P̄/P and ϕ = −log. Then

    D_KL(P, P̄) = Σ_x P(x) ( −log( P̄(x) / P(x) ) ) = E[ϕ(X)] ≥ ϕ(E[X]) = −log( E[P̄/P] )

where

    −log( E[P̄/P] ) = −log( Σ_i P(x_i) · P̄(x_i) / P(x_i) ) = −log( Σ_i P̄(x_i) ) = −log 1 = 0

Definition 5.4. For observations o_{t_1}, ..., o_{t_K}, fixed model H = (P, Q, µ), and candidate model H̄ = (P̄, Q̄, µ̄), the proxy probability Q is defined by

    Q(H̄) = Σ_{x_{t_1}, ..., x_{t_K}} P(x_{t_1}, ..., x_{t_K}, o_{t_1}, ..., o_{t_K} | H) log P(x_{t_1}, ..., x_{t_K}, o_{t_1}, ..., o_{t_K} | H̄)

where the sum ranges over all sequences of hidden states. This quantity mimics the probability of the observations given a model, as shown below.

Proposition 5.5. If Q(H) ≤ Q(H̄), then P(o_{t_1}, ..., o_{t_K} | H) ≤ P(o_{t_1}, ..., o_{t_K} | H̄).

Proof. Let O = (o_{t_1}, ..., o_{t_K}), the given observations. Using Gibbs' inequality with the posterior distributions

    P(·) = P(x_{t_1}, ..., x_{t_K} | O, H) = P(x_{t_1}, ..., x_{t_K}, O | H) / P(O | H)

    P̄(·) = P(x_{t_1}, ..., x_{t_K} | O, H̄) = P(x_{t_1}, ..., x_{t_K}, O | H̄) / P(O | H̄)
we have

    0 ≤ Σ_{x_{t_1}, ..., x_{t_K}} ( P(x_{t_1}, ..., x_{t_K}, O | H) / P(O | H) ) log( [ P(x_{t_1}, ..., x_{t_K}, O | H) P(O | H̄) ] / [ P(x_{t_1}, ..., x_{t_K}, O | H̄) P(O | H) ] )
      = (1 / P(O | H)) Σ_{x_{t_1}, ..., x_{t_K}} P(x_{t_1}, ..., x_{t_K}, O | H) log( P(x_{t_1}, ..., x_{t_K}, O | H) / P(x_{t_1}, ..., x_{t_K}, O | H̄) ) + log( P(O | H̄) / P(O | H) )
      = ( Q(H) − Q(H̄) ) / P(O | H) + log( P(O | H̄) / P(O | H) )

Rearranging gives

    ( Q(H̄) − Q(H) ) / P(O | H) ≤ log( P(O | H̄) / P(O | H) )

Since Q(H) ≤ Q(H̄), we conclude

    0 ≤ log( P(O | H̄) / P(O | H) )

so that P(o_{t_1}, ..., o_{t_K} | H) ≤ P(o_{t_1}, ..., o_{t_K} | H̄).

Thus the proxy probability Q allows us to compare the fit of different models H̄ to the observed data. Why examine Q rather than P(o_{t_1}, ..., o_{t_K} | H) directly? Maximizing the real-valued Q(H̄) produces a critical point H̄ equal precisely to the reestimate in the Baum-Welch algorithm. As space prohibits, the reader may check this lengthy exercise in taking derivatives (Baum, 1972). Since H̄ = (P̄, Q̄, µ̄) therefore maximizes the proxy log-likelihood, we have

    P(o_{t_1}, o_{t_2}, ..., o_{t_K} | H) ≤ P(o_{t_1}, o_{t_2}, ..., o_{t_K} | H̄)

We can iterate the reestimation to obtain a sequence of models whose probabilities given {o_{t_1}, o_{t_2}, ..., o_{t_K}} converge, being nondecreasing and bounded above by 1, by the monotone convergence theorem. Repeated reestimation is the algorithmic component of Baum-Welch inference.

6. Hidden Markov Chains in Index Returns

We conclude with an example that applies and optimizes a hidden Markov model using Baum-Welch inference. Recall that hidden Markov models allow for phenomena where states of the world {x_{t_1}, x_{t_2}, ..., x_{t_K}} are difficult to observe, but rather can be understood through the observable output {o_{t_1}, o_{t_2}, ..., o_{t_K}} they produce.

Following the recent global liquidity and financial crisis, general interest in market performance has reached new heights, accompanied as always by speculation over arrivals of Bull and Bear markets. Despite a vague sense that Bull equates with Good while Bear embodies the reverse (for the majority of equity investors), when to apply these two terms is not well-defined. As they are characterized by the difficult-to-observe (and often conflicting) phenomenon of investor sentiment, determination of the states {x_BullMarket, x_BearMarket}
lends itself agreeably to hidden Markov structure.

We will use as observable output monthly log-returns of the S&P 500, a capitalization-weighted index of 500 large companies based in the United States. To simplify matters and produce accessible probability matrices, we divide returns into three classes {o_negative, o_flat, o_positive}, where returns become negative or positive by exceeding −1% or +1%, respectively. No sound methodology was used to determine the cutoff, though annual returns exceeding ±(10 to 20)% are generally accepted markers of bullish or bearish markets. We examine monthly returns from July 1970 through July 2010.

Operatively, to model bull and bear markets as a hidden Markov chain we input hypothesized probabilities of each return given each state, and of each state given a previous state. The probabilities are then optimized from the observations using Baum-Welch. Programming and iterating for this paper were done in R and Excel. Below are results after 10 iterations of the Baum-Welch inference algorithm for three different starting parameters.

[Table: Initial I — hypothesized values of P(o_negative | x), P(o_flat | x), P(o_positive | x), P(x_Bear | x), and P(x_Bull | x) for x in {Bear, Bull}]

[Table: Baum-Welch Optimization I — the same probabilities after 10 iterations of reestimation]

[Table: Initial II — a second set of hypothesized starting probabilities]

[Table: Baum-Welch Optimization II — the corresponding reestimated probabilities]
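The three-class bucketing of returns described above can be sketched as follows (Python rather than the R/Excel used for the paper; the price series is invented, and the ±1% cutoff is the one stated above):

```python
from math import log

# Bucket monthly log-returns into {negative, flat, positive} with a
# +/-1% cutoff; the price levels below are invented for illustration.

def classify(prices, cutoff=0.01):
    """Monthly closing levels -> 'n'/'f'/'p' observation symbols."""
    symbols = []
    for prev, cur in zip(prices, prices[1:]):
        r = log(cur / prev)          # monthly log-return
        if r < -cutoff:
            symbols.append('n')      # o_negative
        elif r > cutoff:
            symbols.append('p')      # o_positive
        else:
            symbols.append('f')      # o_flat
    return symbols

print(classify([100.0, 103.0, 102.5, 102.6, 99.0]))  # -> ['p', 'f', 'f', 'n']
```

The resulting symbol sequence plays the role of the observations o_{t_1}, ..., o_{t_K} fed to Baum-Welch.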
[Table: Initial III — a third set of hypothesized starting probabilities]

[Table: Baum-Welch Optimization III — the corresponding reestimated probabilities]

The Baum-Welch inference algorithm adjusts probabilities away from the starting conditions, which, as proved above, increases the likelihood of our model given the observations. Here, the observation probabilities appear most consistently altered. Although the procedure remains sensitive to initial conditions, some relative patterns emerged which were confirmed in further trials. For instance,

    P(x_BullMarket | x_BullMarket) > P(x_BearMarket | x_BearMarket)

Thus the model adapts to account for more frequent (and longer) Bull Markets. Somewhat more surprisingly,

    P(o_negative | x_BullMarket) > P(o_flat | x_BullMarket)

After some research into whether this was possible, I read of market corrections, which are short-term drops in asset prices that realign them with the asset's economic value. During bull markets, positive investor sentiment and speculation lead to continually inflating stock prices, until checked by these short drops. Thus the model optimized to account for market features unknown (to me) a priori.

Acknowledgments. It is a pleasure to thank my mentor, Zachary Madden, for keeping this project on track and buoyed with helpful ideas. Thanks are of course due to Peter May and the whole REU staff for allowing us this opportunity to learn and research.

References

[1] Baum, Petrie, Soules, and Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist., vol. 41, no. 1, pp. 164-171, 1970.
[2] Levin, Peres, and Wilmer. Markov Chains and Mixing Times. dlevin/markov/markovmixing.pdf
[3] Lloyd Welch. Hidden Markov Models and the Baum-Welch Algorithm. lototsky/math508/baum-welch.pdf
[4] Narada Warakagoda. The Evaluation Problem and the Forward Algorithm. gerjanos/hmm/node7.html
More informationA model for competitive markets
A model for competitive markets Manuela Girotti MATH 345 Differential Equations Contents 1 Competitive markets 1 1.1 Competitive market with only one good......................... 2 1.2 Competitive market
More informationAn Evolutionary Programming Based Algorithm for HMM training
An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,
More informationMATH 301 INTRO TO ANALYSIS FALL 2016
MATH 301 INTRO TO ANALYSIS FALL 016 Homework 04 Professional Problem Consider the recursive sequence defined by x 1 = 3 and +1 = 1 4 for n 1. (a) Prove that ( ) converges. (Hint: show that ( ) is decreasing
More informationHands-On Learning Theory Fall 2016, Lecture 3
Hands-On Learning Theory Fall 016, Lecture 3 Jean Honorio jhonorio@purdue.edu 1 Information Theory First, we provide some information theory background. Definition 3.1 (Entropy). The entropy of a discrete
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationEntropy Rate of Stochastic Processes
Entropy Rate of Stochastic Processes Timo Mulder tmamulder@gmail.com Jorn Peters jornpeters@gmail.com February 8, 205 The entropy rate of independent and identically distributed events can on average be
More informationµ n 1 (v )z n P (v, )
Plan More Examples (Countable-state case). Questions 1. Extended Examples 2. Ideas and Results Next Time: General-state Markov Chains Homework 4 typo Unless otherwise noted, let X be an irreducible, aperiodic
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More informationNotes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009
Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009 These notes are based on Chapter 12 of David Blackwell and M. A.Girshick, Theory of Games and Statistical Decisions, John Wiley
More informationThe Transition Probability Function P ij (t)
The Transition Probability Function P ij (t) Consider a continuous time Markov chain {X(t), t 0}. We are interested in the probability that in t time units the process will be in state j, given that it
More informationMini-Course on Limits and Sequences. Peter Kwadwo Asante. B.S., Kwame Nkrumah University of Science and Technology, Ghana, 2014 A REPORT
Mini-Course on Limits and Sequences by Peter Kwadwo Asante B.S., Kwame Nkrumah University of Science and Technology, Ghana, 204 A REPORT submitted in partial fulfillment of the requirements for the degree
More informationSequences. Chapter 3. n + 1 3n + 2 sin n n. 3. lim (ln(n + 1) ln n) 1. lim. 2. lim. 4. lim (1 + n)1/n. Answers: 1. 1/3; 2. 0; 3. 0; 4. 1.
Chapter 3 Sequences Both the main elements of calculus (differentiation and integration) require the notion of a limit. Sequences will play a central role when we work with limits. Definition 3.. A Sequence
More informationAn application of hidden Markov models to asset allocation problems
Finance Stochast. 1, 229 238 (1997) c Springer-Verlag 1997 An application of hidden Markov models to asset allocation problems Robert J. Elliott 1, John van der Hoek 2 1 Department of Mathematical Sciences,
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationMultiscale Systems Engineering Research Group
Hidden Markov Model Prof. Yan Wang Woodruff School of Mechanical Engineering Georgia Institute of echnology Atlanta, GA 30332, U.S.A. yan.wang@me.gatech.edu Learning Objectives o familiarize the hidden
More informationSummary: A Random Walks View of Spectral Segmentation, by Marina Meila (University of Washington) and Jianbo Shi (Carnegie Mellon University)
Summary: A Random Walks View of Spectral Segmentation, by Marina Meila (University of Washington) and Jianbo Shi (Carnegie Mellon University) The authors explain how the NCut algorithm for graph bisection
More informationA Recursive Learning Algorithm for Model Reduction of Hidden Markov Models
A Recursive Learning Algorithm for Model Reduction of Hidden Markov Models Kun Deng, Prashant G. Mehta, Sean P. Meyn, and Mathukumalli Vidyasagar Abstract This paper is concerned with a recursive learning
More informationHidden Markov Models NIKOLAY YAKOVETS
Hidden Markov Models NIKOLAY YAKOVETS A Markov System N states s 1,..,s N S 2 S 1 S 3 A Markov System N states s 1,..,s N S 2 S 1 S 3 modeling weather A Markov System state changes over time.. S 1 S 2
More informationMS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10. x n+1 = f(x n ),
MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 4: Steady-State Theory Contents 4.1 The Concept of Stochastic Equilibrium.......................... 1 4.2
More informationSequence Labeling: HMMs & Structured Perceptron
Sequence Labeling: HMMs & Structured Perceptron CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu HMM: Formal Specification Q: a finite set of N states Q = {q 0, q 1, q 2, q 3, } N N Transition
More informationExample: physical systems. If the state space. Example: speech recognition. Context can be. Example: epidemics. Suppose each infected
4. Markov Chains A discrete time process {X n,n = 0,1,2,...} with discrete state space X n {0,1,2,...} is a Markov chain if it has the Markov property: P[X n+1 =j X n =i,x n 1 =i n 1,...,X 0 =i 0 ] = P[X
More informationHidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391
Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n
More informationThe knee-jerk mapping
arxiv:math/0606068v [math.pr] 2 Jun 2006 The knee-jerk mapping Peter G. Doyle Jim Reeds Version dated 5 October 998 GNU FDL Abstract We claim to give the definitive theory of what we call the kneejerk
More informationMaster 2 Informatique Probabilistic Learning and Data Analysis
Master 2 Informatique Probabilistic Learning and Data Analysis Faicel Chamroukhi Maître de Conférences USTV, LSIS UMR CNRS 7296 email: chamroukhi@univ-tln.fr web: chamroukhi.univ-tln.fr 2013/2014 Faicel
More informationStatistical Processing of Natural Language
Statistical Processing of Natural Language and DMKM - Universitat Politècnica de Catalunya and 1 2 and 3 1. Observation Probability 2. Best State Sequence 3. Parameter Estimation 4 Graphical and Generative
More informationA Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 4, MAY 2001 411 A Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces Paul M. Baggenstoss, Member, IEEE
More informationHidden Markov Modelling
Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models
More informationA UNIVERSAL SEQUENCE OF CONTINUOUS FUNCTIONS
A UNIVERSAL SEQUENCE OF CONTINUOUS FUNCTIONS STEVO TODORCEVIC Abstract. We show that for each positive integer k there is a sequence F n : R k R of continuous functions which represents via point-wise
More informationErgodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.
Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions
More informationAN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES
AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim
More informationEECS E6870: Lecture 4: Hidden Markov Models
EECS E6870: Lecture 4: Hidden Markov Models Stanley F. Chen, Michael A. Picheny and Bhuvana Ramabhadran IBM T. J. Watson Research Center Yorktown Heights, NY 10549 stanchen@us.ibm.com, picheny@us.ibm.com,
More informationIntegration on Measure Spaces
Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More information