Markov Chains and Markov Models


Markov Chains and Markov Models
Huizhen Yu
janey.yu@cs.helsinki.fi
Dept. Computer Science, Univ. of Helsinki
Probabilistic Models, Spring 2010
Huizhen Yu (U.H.) Markov Chains and Markov Models Jan. 2, 32 slides

Outline
Markov Chains: definitions and examples; some properties of Markov chains; Markov models.

Markov Chains
Some facts:
- Named after the Russian mathematician A. A. Markov
- The earliest and simplest model for dependent variables
- Has many fascinating aspects and a wide range of applications
Markov chains are often used in studying temporal and sequence data, for modeling short-range dependences (e.g., in biological sequence analysis), as well as for analyzing the long-term behavior of systems (e.g., in queueing systems). Markov chains are also the basis of sampling-based computation methods called Markov Chain Monte Carlo.
We introduce Markov chains and study a small part of their properties, most of which relate to modeling short-range dependences.

A Motivating Example
Are DNA sequences purely random? A sequence segment of bases A, C, G, T from the human preproglucagon gene:

    GTATTAAATCCGTAGTCTGAACTAACTA

Words such as CTGAC are suspected to serve some biological function if they seem to occur often in a segment of the sequence. So it is of interest to measure the "ofteness" of the occurrences of a word. A popular way is to model the sequence as the background.
Observed frequencies/proportions of pairs of consecutive bases from a sequence of 57 bases:

    1st \ 2nd     A      C      G      T
    A           0.359  0.143  0.167  0.331
    C           0.384  0.156  0.023  0.437
    G           0.305  0.199  0.150  0.345
    T           0.284  0.182  0.177  0.357
    overall     0.328  0.167  0.144  0.360

Recall of Independence
Recall: for discrete random variables X, Y, Z with joint distribution P, by definition,
- X, Y, Z are mutually independent if P(X = x, Y = y, Z = z) = P(X = x) P(Y = y) P(Z = z) for all x, y, z;
- X, Y are conditionally independent given Z, i.e., X ⊥ Y | Z, if P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all x, y, z.
(Our convention: the equalities hold for all x, y, z such that the quantities involved are well-defined under P.)
Let X_1, X_2, ... be an indexed sequence of discrete random variables with joint probability distribution P. If X_1, X_2, ... are mutually independent, then by definition, for all n,
P(X_1, X_2, ..., X_n) = P(X_1) P(X_2) ... P(X_n),  and  P(X_{n+1} | X_1, ..., X_n) = P(X_{n+1}).

Definition of a Markov Chain
The sequence {X_n} is called a (discrete-time) Markov chain if it satisfies the Markov property: for all n and (x_1, ..., x_{n+1}),
P(X_{n+1} = x_{n+1} | X_1 = x_1, ..., X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n),   (1)
i.e., X_{n+1} ⊥ X_1, X_2, ..., X_{n-1} | X_n.
Recall: the joint PMF of X_1, X_2, ..., X_n can be expressed as
p(x_1, x_2, ..., x_n) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ... p(x_n | x_1, x_2, ..., x_{n-1}).
The Markov property p(x_i | x_1, ..., x_{i-1}) = p(x_i | x_{i-1}), for all i, then implies that p factorizes as
p(x_1, x_2, ..., x_n) = p(x_1) p(x_2 | x_1) ... p(x_n | x_{n-1}),  for all n.   (2)
In turn, this is equivalent to: for all m > n,
p(x_{n+1}, x_{n+2}, ..., x_m | x_1, x_2, ..., x_n) = p(x_{n+1}, x_{n+2}, ..., x_m | x_n).
Informally, the future is independent of the past given the present.
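The equivalence between the Markov property (1) and the factorization (2) can be checked numerically on a toy chain. In the sketch below, the two-state initial distribution and transition probabilities are hypothetical, chosen only for illustration: a joint PMF over (X_1, X_2, X_3) is built from the factorization (2), and the conditional p(x_3 | x_1, x_2) is verified to depend on x_2 only.

```python
from itertools import product

# Hypothetical two-state chain (states 0 and 1); all numbers are illustrative.
init = {0: 0.6, 1: 0.4}                         # distribution of X_1
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # transition probabilities p_ij

# Joint PMF of (X_1, X_2, X_3) built from the factorization (2)
joint = {(a, b, c): init[a] * P[a][b] * P[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def cond(x3, given):
    """p(X_3 = x3 | conditions), where `given` maps a 0-based index to a value."""
    num = sum(p for x, p in joint.items()
              if x[2] == x3 and all(x[i] == v for i, v in given.items()))
    den = sum(p for x, p in joint.items()
              if all(x[i] == v for i, v in given.items()))
    return num / den

# Markov property (1): conditioning on (x1, x2) equals conditioning on x2 alone
for a, b in product((0, 1), repeat=2):
    assert abs(cond(1, {0: a, 1: b}) - cond(1, {1: b})) < 1e-12
print(round(cond(1, {1: 0}), 6))  # equals p_01 = 0.1
```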

Terminology
- X_n: the state of the chain at time n
- S = {possible values of x_n, for all n}: the state space (we assume S is finite)
- P(X_{n+1} = j | X_n = i), i, j in S: the transition probabilities
The chain is said to be homogeneous if for all n and i, j in S, P(X_{n+1} = j | X_n = i) = p_ij independently of n, and inhomogeneous otherwise.
For a homogeneous chain, the |S| x |S| matrix

    [ p_11  p_12  ...  p_1m ]
    [ p_21  p_22  ...  p_2m ]      where m = |S|,
    [  ...   ...  ...   ... ]
    [ p_m1  p_m2  ...  p_mm ]

is called the transition probability matrix of the Markov chain (or transition matrix for short).

Graphical Model of a Markov Chain
The joint PMF of X_1, X_2, ..., X_n factorizes as p(x_1, x_2, ..., x_n) = p(x_1) p(x_2 | x_1) ... p(x_n | x_{n-1}). A pictorial representation of this factorization form:
- a directed graph G = (V, E) with vertex set V = {1, ..., n} and edge set E = {(i, i+1), 1 <= i < n};
- vertex i is associated with X_i:

    1 -> 2 -> 3 -> ... -> n-1 -> n

This is a graphical model for Markov chains of length n. Each vertex i with its incoming edge represents a term p(x_i | x_{i-1}) in the factorized expression of the PMF.

Parameters of a Markov Chain: Transition Probabilities
Transition probabilities determine the behavior of the chain entirely:
p(x_1, x_2, ..., x_n) = p(x_1) p(x_2 | x_1) ... p(x_n | x_{n-1}) = p(x_1) prod_{j=1}^{n-1} p_{x_j x_{j+1}}   (for a homogeneous chain).
For a homogeneous chain, P is determined by {p_ij, i, j in S} and the initial distribution of X_1. (Thus the number of free parameters is of order |S|^2.)
Transition probability graph: another pictorial representation of a homogeneous Markov chain; it shows the structure of the chain in the state space. Nodes represent possible states and arcs possible transitions, labeled with their probabilities. (Not to be confused with the graphical model; the slide shows a small example with states s_1, ..., s_4.) Using this graph one may classify states as being recurrent, absorbing, or transient, notions important for understanding the long-term behavior of the chain.
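As a sketch of how these two parameter sets (the initial distribution and the transition matrix) determine everything about a homogeneous chain, the following assumes a hypothetical three-state chain and computes a path probability and a sample path from them; all names and numbers below are illustrative.

```python
import random

# Hypothetical three-state homogeneous chain; the initial distribution and the
# transition matrix are its only parameters.
states = ["s1", "s2", "s3"]
p_init = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
p_trans = {"s1": {"s1": 0.1, "s2": 0.6, "s3": 0.3},
           "s2": {"s1": 0.4, "s2": 0.4, "s3": 0.2},
           "s3": {"s1": 0.5, "s2": 0.25, "s3": 0.25}}

def path_prob(path):
    """p(x_1, ..., x_n) = p(x_1) * product of p_{x_j x_{j+1}} along the path."""
    prob = p_init[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= p_trans[a][b]
    return prob

def simulate(n, seed=0):
    """Draw a length-n sample path from the same two ingredients."""
    rng = random.Random(seed)
    path = [rng.choices(states, weights=[p_init[s] for s in states])[0]]
    for _ in range(n - 1):
        cur = path[-1]
        path.append(rng.choices(states, weights=[p_trans[cur][s] for s in states])[0])
    return path

print(round(path_prob(["s1", "s2", "s2"]), 6))  # 0.5 * 0.6 * 0.4 = 0.12
```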

Example of DNA Sequence Modeling
A Markov chain model for the DNA sequence shown earlier:
- State space S = {A, C, G, T}
- Transition probabilities (taken to be the observed frequencies):

            A      C      G      T
    A     0.359  0.143  0.167  0.331
    C     0.384  0.156  0.023  0.437
    G     0.305  0.199  0.150  0.345
    T     0.284  0.182  0.177  0.357

The probability of CTGAC given the first base being C:
P(CTGAC | X_1 = C) = p_CT p_TG p_GA p_AC = 0.437 x 0.177 x 0.305 x 0.143 ≈ 0.00337.
(The probabilities soon become too small to handle as the sequence length grows. In practice we work with ln P and the log transition probabilities instead:
ln P(CTGAC | X_1 = C) = ln p_CT + ln p_TG + ln p_GA + ln p_AC.)

Example of DNA Sequence Modeling (continued)
Second-order Markov chain: joint PMF and graphical model
p(x_1, x_2, ..., x_n) = p(x_1, x_2) p(x_3 | x_1, x_2) ... p(x_n | x_{n-1}, x_{n-2}),
with a graphical model 1, 2, 3, 4, ..., n in which each vertex i >= 3 has incoming edges from both i-1 and i-2.
Correspondingly, {Y_n} is a Markov chain, where Y_n = (Y_{n,1}, Y_{n,2}) = (X_{n-1}, X_n) and P(Y_{n+1,1} = Y_{n,2} | Y_n) = 1.
Second-order model for the DNA sequence example:
- State space S = {A, C, G, T} x {A, C, G, T}
- Size of the transition probability matrix: 16 x 16
The number of parameters grows exponentially with the order. (The slide shows the transition probability graph partially, with states AA, CA, AC, CC, CT, ....) Higher-order Markov chains are defined analogously.

Example of Language Modeling
A trivial example: the transition probability graph of a Markov chain that can generate sentences such as "a cat slept here", with a start state B and an end state E; the word nodes include a/the, cat/dog, slept/ate, here/there.
Sequences of English generated by two Markov models:
- Third-order letter model: THE GENERATED JOB PROVIDUAL BETTER TRAND THE DISPLAYED CODE, ABOVERY UPONDULTS WELL THE CODERST IN...
- First-order word model: THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHODS FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD...
Applications in the biology and language modeling contexts include:
- analyzing potentially important words
- sequence classification
- coding/compression
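The log-probability computation for words such as CTGAC can be sketched directly from the pair-frequency table above; the dictionary P and the helper name log_prob_given_first below are illustrative.

```python
import math

# Transition probabilities from the table above (row: current base, column: next base)
P = {"A": {"A": 0.359, "C": 0.143, "G": 0.167, "T": 0.331},
     "C": {"A": 0.384, "C": 0.156, "G": 0.023, "T": 0.437},
     "G": {"A": 0.305, "C": 0.199, "G": 0.150, "T": 0.345},
     "T": {"A": 0.284, "C": 0.182, "G": 0.177, "T": 0.357}}

def log_prob_given_first(word):
    """ln P(word | first base) as a sum of log transition probabilities."""
    return sum(math.log(P[a][b]) for a, b in zip(word, word[1:]))

lp = log_prob_given_first("CTGAC")
print(round(math.exp(lp), 5))  # ~0.00337, matching the slide's direct product
```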

Two Extreme Cases
- If X_1, X_2, ... are mutually independent, {X_n} is certainly a Markov chain (of zeroth order).
- If {X_n} is an arbitrary random sequence, then {Y_n}, defined by Y_n = (X_1, X_2, ..., X_n), is a Markov chain, because Y_{n+1} = (Y_n, X_{n+1}) and in Y_n the entire past is kept.
Notes: Consider a Markov chain {X_n}. At the current state x_n = s, the past is not necessarily forgotten. It may even happen that for some m < n, there is only one possible path (s_m, s_{m+1}, ..., s_{n-1}) for X_m, X_{m+1}, ..., X_{n-1} such that X_n = s. But each time the chain visits s, its behavior starting from s is probabilistically the same, regardless of the path it took to reach s in the past.

Geometrically Distributed Duration of Stay
The geometric distribution with parameter q in [0, 1] has the PMF
p(m) = (1 - q)^{m-1} q,  m = 1, 2, ...
For independent trials each with success probability q, this is the distribution of the number M of trials until (and including) the first success. The mean and variance of M are E[M] = 1/q, var(M) = (1 - q)/q^2.
Consider a homogeneous Markov chain. Let s be some state with self-transition probability p_ss > 0. Let L_s be the duration of stay at s after entering s at some arbitrary time n + 1:
L_s = m if X_n ≠ s, X_{n+1} = ... = X_{n+m} = s, X_{n+m+1} ≠ s.
Then
P(L_s = m | X_n ≠ s, X_{n+1} = s) = P(L_s = m | X_{n+1} = s) = p_ss^{m-1} (1 - p_ss).
So the duration has the geometric distribution with mean 1/(1 - p_ss).
The geometric distribution has a memoryless property:
P(time of the first success = r + m | failed the first r trials) = p(m).
(Figure: geometric PMFs p(m) with q = 1 - p_ss, plotted for p_ss = 0.1, 0.3, 0.5, 0.9 and m = 1, ..., 20.)
The shape of such distributions may not match that found in data. When the duration distribution reflects an important aspect of the nature of the data, a general approach is to model the duration distribution explicitly, by including as part of the state variable the time already spent at s after entering s.

Subsequences of a Markov Chain
Let {n_k} be an increasing sequence of integers, n_1 < n_2 < ..., and let Y_k = X_{n_k}, k >= 1. Is {Y_k} a Markov chain?
We check the form of the joint PMF p(y_1, ..., y_{k+1}). We have
p(y_1, ..., y_{k+1}) = sum over {x_i : i < n_{k+1}, i not in {n_1, ..., n_{k+1}}} of p(x_1, x_2, ..., x_{n_{k+1}}),
and by the Markov property of {X_n},
p(x_1, x_2, ..., x_{n_{k+1}}) = p(x_1, x_2, ..., x_{n_k}) p(x_{n_k + 1}, x_{n_k + 2}, ..., x_{n_{k+1}} | x_{n_k}).
So p(y_1, ..., y_{k+1}) equals
( sum over {x_i : i < n_k, i not in {n_1, ..., n_k}} of p(x_1, x_2, ..., x_{n_k}) ) x ( sum over {x_i : n_k < i < n_{k+1}} of p(x_{n_k + 1}, x_{n_k + 2}, ..., x_{n_{k+1}} | x_{n_k}) )
= p(x_{n_1}, x_{n_2}, ..., x_{n_k}) p(x_{n_{k+1}} | x_{n_k}) = p(y_1, y_2, ..., y_k) p(y_{k+1} | y_k).
This shows Y_{k+1} ⊥ Y_1, Y_2, ..., Y_{k-1} | Y_k, so {Y_k} is a Markov chain.
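The geometric duration of stay is easy to confirm by simulation; stay_lengths below is an illustrative helper that repeatedly mimics the self-transition step (stay with probability p_ss, leave otherwise) and compares the empirical mean with 1/(1 - p_ss).

```python
import random

def stay_lengths(p_ss, n_visits, seed=1):
    """Simulate n_visits entries into state s; at each step the chain stays with
    probability p_ss, so the recorded lengths should be geometric."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_visits):
        m = 1                      # the entering step itself counts
        while rng.random() < p_ss: # stay at s with probability p_ss
            m += 1
        lengths.append(m)
    return lengths

p_ss = 0.5
L = stay_lengths(p_ss, 100_000)
print(round(sum(L) / len(L), 2))   # close to the theoretical mean 1/(1 - p_ss) = 2
```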

Some Proerties of Some Proerties of Subsequences of a Markov Chain The indeendence relation nk+ n, n2,..., nk nk read off from the grahical model: can actually be m-ste Transition Probabilities of a Markov chain The m-ste transition robabilities (m) satisfy the recursive formula 2 n n k- k nk+ nk searates nk+ from n, n2,..., nk in the grah. (The exact meaning of this will be exlained together with more general results in the future.) A secial choice of the index sequence {n k } is n k = (k ) m +, for some fixed integer m >. The corresonding Markov chain is, m+, 2m+,.... The transition robabilities (m) of this chain (homogeneous case) are the m-ste transition robabilities of the original chain: (m) = P( m+ = j = i), i, j S. n r m : (m) = P( m+ = j = i) = l S P( r+ = l = i) P( m+ = j r+ = l) = l S (r) il (m r) lj, i, j S. (3) This is called the Chaman-Kolmogorov equation. (Once it was conjectured to be an equivalent definition for Markov chains, but this turned out to be false.) In matrix form, Eq. (3) can be exressed as bp (m) = b P m = b P r bp m r, r m. where P b denotes the transition robability matrix of { n}, and P b(m) denotes the m-ste transition robability matrix with P b(m) = (m), i, j S. We examine next the relation between (m) and, i, j S. Huizhen Yu (U.H.) and Markov Models Jan. 2 2 / 32 Huizhen Yu (U.H.) and Markov Models Jan. 2 22 / 32 Some Proerties of Some Proerties of Let, 2,..., n be a Markov chain. Reversing the Chain Let Y k = n+ k, k n. Is {Y k } a Markov chain? 2 j, k: j+k = n + j j+ n- Y Y n Yn- Y k k- Y2 Y The joint PMF (x j, x j+,..., x n) can be exressed as so (x j, x j+,..., x n) = (x j, x j+) (x j+2,..., x n x j+) n = (x j x j+) (x j+) (x j+2,..., x n x j+) = (x j x j+) (x j+, x j+2,..., x n), (x j x j+,..., x n) = (x j x j+). Thihows the reversed sequence {Y k } is a Markov chain. Another Conditional Indeendence Relation Let, 2,..., n be a Markov chain. Denote j = (,..., j, j+,..., n), all variables but j. 
Then, for all j n, P( j = x j j = x j) = P( j = x j j = x j, j+ = x j+). (4) Visualize the ositions of the variables in the grah: 2 j- j j+ n Derivation of Eq. (4): the joint PMF (x j, x j ) can be written as (x j, x j ) = (x,..., x j ) (x j x j )(x j+ x j ) (x j+2,..., x n x j+ ), so it has the form (x j, x j ) = h(x,..., x j ) g(x j, x j, x j+ ) ω(x j+,..., x n) for some functions h, g, ω. Thihows (one of the exercises) that j,..., j 2, j+2,..., n j, j+. Huizhen Yu (U.H.) and Markov Models Jan. 2 23 / 32 Huizhen Yu (U.H.) and Markov Models Jan. 2 24 / 32
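The matrix form of the Chapman-Kolmogorov equation can be verified numerically; the 2-state transition matrix below is hypothetical, and plain nested lists are used so the sketch needs no external library.

```python
# Verify P^m = P^r P^{m-r} for a small hypothetical transition matrix.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, m):
    R = A
    for _ in range(m - 1):
        R = matmul(R, A)
    return R

P = [[0.9, 0.1],
     [0.2, 0.8]]

m, r = 5, 2
lhs = matpow(P, m)                            # m-step transition matrix P^m
rhs = matmul(matpow(P, r), matpow(P, m - r))  # P^r P^{m-r}
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(round(lhs[0][1], 6))  # 5-step probability of moving from the first state to the second
```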

Hidden Markov Models (HMM)
Hidden Markov Models refer loosely to a broad class of models in which a subset of the random variables, denoted by X = {X_1, X_2, ...}, is modeled as a Markov chain, and their values are not observed in practice; the rest of the random variables, denoted by Y, are observable in practice, and P(Y | X) usually has a simple form.
A common case: Y = {Y_1, Y_2, ...}, and conditional on X, the Y_i are independent with P(Y_i | X) = P(Y_i | X_{i-1}, X_i). (Treat X_0 as a dummy variable.) The joint PMF of (X, Y) then factorizes as
p(x, y) = prod_i p(x_i | x_{i-1}) p(y_i | x_{i-1}, x_i).
p(y_i | x_{i-1}, x_i): often called observation (or emission) probabilities.
A graphical model indicating the factorization form of p: a chain X_1 -> X_2 -> ... -> X_n, with each Y_i receiving edges from X_{i-1} and X_i.
The sequence of (X_i, Y_i) jointly is a Markov chain.

HMM and Data-Generating Processes
In some applications, the above structure of an HMM corresponds intuitively to the data-generating process: for instance,
- Speech recognition: X speech, Y acoustic signals
- Tracking a moving target: X positions/velocities of the target, Y signals detected
- Robot navigation: X positions of a robot, Y observations of landmarks
In other applications, the model can have nothing to do with the underlying data-generating process, and is introduced purely for the questions at hand. Examples include sequence alignment and sequence labeling applications.

HMM for Parts-of-Speech Tagging
Parts-of-speech tags for a sentence:

    pron  v      adv   final punct.
    I     drove  home  .

Possible tags: I: n, pron; drove: v, n; home: n, adj, adv, v.
Represent a sentence as a sequence of words W = (W_1, W_2, ..., W_n), and let the associated tags be T = (T_1, T_2, ..., T_n). A Markov chain model for (W_i, T_i), 1 <= i <= n, used in practice is:
p(w, t) = prod_{i=1}^{n} p(w_i | t_i) p(t_i | t_{i-1}, t_{i-2}).
Question: why is W treated as generated by T, and not the other way around?
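P(Y = y) for an HMM can be computed by the forward recursion, one instance of the efficient inference algorithms mentioned later in this lecture. The sketch below uses hypothetical parameters and the common simplification p(y_i | x_i) in place of p(y_i | x_{i-1}, x_i); the names states, p_init, p_trans, and p_obs are illustrative.

```python
# Forward algorithm: P(Y = y) by dynamic programming over the hidden states.
states = ["+", "-"]
p_init = {"+": 0.5, "-": 0.5}
p_trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
p_obs = {"+": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
         "-": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def forward(y):
    # alpha[s] = P(Y_1..Y_i = y_1..y_i, X_i = s), updated left to right
    alpha = {s: p_init[s] * p_obs[s][y[0]] for s in states}
    for obs in y[1:]:
        alpha = {s: sum(alpha[t] * p_trans[t][s] for t in states) * p_obs[s][obs]
                 for s in states}
    return sum(alpha.values())   # P(Y = y), summing out the last hidden state

print(forward("ACGT"))
```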

Another Artificial Example of Sequence Labeling
Suppose two different types of regions, type + and type -, can occur in the same DNA sequence, and each type has its own characteristic pattern of transitions among the bases A, C, G, T. Given a sequence un-annotated with region types, we want to find where changes between the two types may occur and whether they occur sharply.
An HMM for this problem: Let Y = {Y_i} correspond to a DNA sequence. Introduce variables Z_i with z_i in {+, -} to indicate which type is in force at position i. Let X_i = (Y_i, Z_i), and model X = {X_i} as a Markov chain on S = {A, C, G, T} x {+, -}. Then, for an un-annotated sequence y, we solve
arg max_x P(X = x | Y = y),  equivalently,  arg max_z P(Z = z | Y = y).
Any z in the latter set gives one of the most probable configurations of boundaries between type + and type - regions for the sequence y.

Inference and Learning with HMM
Quantities of interest in applications usually include:
- P(Y = y), and P(X = x | Y = y) for a single x
- arg max_x P(X = x | Y = y), the most probable path given Y = y
- P(X_i | Y = y), the marginal distribution of X_i given Y = y, and arg max_{x_i} P(X_i = x_i | Y = y)
Efficient inference algorithms are available: utilizing the structure/factorization form of p, the computation can be streamlined.
To specify an HMM, we need to specify its topology (the space of X_i, the relation between {X_i} and {Y_i}), and then its parameters. Parameters of the HMMs in the earlier examples: transition probabilities and observation probabilities.
Parameter estimation:
- Complete data case: sequences of states {x_i} are given for estimation. This case reduces to the case of Markov chains.
- Incomplete data case: sequences of states {x_i} are unknown. In this case estimation is based on the observed sequences {y_i}, and is typically done with the so-called expectation-maximization (EM) algorithm.
We will study these topics in some future lectures.

HMM vs. High Order Markov Models
In the application contexts shown earlier, the main interest is in the hidden/latent variables {X_i}. Consider now the case where our interest is solely in {Y_i} (for instance, to predict Y_{n+1} given Y_1, ..., Y_n). We compare two choices of models for {Y_i}:
- a Markov model for {Y_i}, possibly of high order;
- an HMM for {Y_i}, in which we introduce auxiliary, latent random variables {X_i}.
In an HMM for {Y_i},
p(y_1, y_2, ..., y_n) = sum_{x_1, ..., x_n} prod_{i=1}^{n} p(x_i | x_{i-1}) p(y_i | x_{i-1}, x_i),
so generally, Y_1, ..., Y_n are fully dependent as modeled. In a Markov or high-order Markov model, certain conditional independence among {Y_i} is assumed. This shows that with an HMM we approximate the true distribution of {Y_i} by relatively simple distributions of another kind than those in a Markov or high-order Markov model.

Further Readings
On Markov chains: A. C. Davison, Statistical Models, Cambridge Univ. Press, 2003, Chap. 6.1. (You may skip the materials on pp. 229-232 about classification of states if not interested.)
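For the region-labeling example above, the most probable path arg max_x P(X = x | Y = y) can be computed with the Viterbi recursion (dynamic programming in log space). The sketch below uses hypothetical two-type parameters and, for brevity, emissions of the form p(y_i | z_i); all numbers are illustrative, not estimates from data.

```python
import math

states = ["+", "-"]
p_init = {"+": 0.5, "-": 0.5}
p_trans = {"+": {"+": 0.95, "-": 0.05}, "-": {"+": 0.05, "-": 0.95}}
p_obs = {"+": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
         "-": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def viterbi(y):
    # delta[s]: log-probability of the best state path ending in s at the current position
    delta = {s: math.log(p_init[s]) + math.log(p_obs[s][y[0]]) for s in states}
    back = []                                  # backpointers, one dict per step
    for obs in y[1:]:
        prev = {s: max(states, key=lambda t: delta[t] + math.log(p_trans[t][s]))
                for s in states}
        delta = {s: delta[prev[s]] + math.log(p_trans[prev[s]][s]) + math.log(p_obs[s][obs])
                 for s in states}
        back.append(prev)
    # backtrack from the best final state
    s = max(states, key=delta.get)
    path = [s]
    for prev in reversed(back):
        s = prev[s]
        path.append(s)
    return "".join(reversed(path))

print(viterbi("AATACGCG"))  # '++++----': one sharp type change under these parameters
```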