Large deviation bounds for functionals of Viterbi paths


Arka P. Ghosh, Elizabeth Kleiman, Alexander Roitershtein

February 6, 2009; Revised December 8, 2009

Abstract

In a number of applications, the underlying stochastic process is modeled as a finite-state discrete-time Markov chain that cannot be observed directly and is represented by an auxiliary process. The maximum a posteriori (MAP) estimator is widely used to estimate states of this hidden Markov model through available observations. The MAP path estimator based on a finite number of observations is calculated by the Viterbi algorithm and is often referred to as the Viterbi path. It was recently shown in [2, 3] and [16, 17] (see also [12, 15]) that under mild conditions the sequence of estimators of a given state converges almost surely to a limiting regenerative process as the number of observations approaches infinity. This in particular implies a law of large numbers for some functionals of hidden states and finite Viterbi paths. The aim of this paper is to provide the corresponding large deviation estimates.

MSC2000: primary 60J20, 60F10; secondary 62B15, 94A15.

Keywords: hidden Markov models, maximum a posteriori path estimator, Viterbi algorithm, large deviations, regenerative processes.

1 Introduction and statement of results

Let (X_n)_{n≥0} be a discrete-time irreducible and aperiodic Markov chain in a finite state space D = {1, ..., d}, d ∈ N. We interpret X = (X_0, X_1, ...) as an underlying state process that cannot be observed directly, and consider a sequence of accessible observations Y = (Y_0, Y_1, ...) that serves to produce an estimator for X. For instance, Y_n can represent the measurement of a signal X_n disturbed by noise. Given a realization of X, the sequence Y is formed by independent random variables Y_n valued in a measurable space (Y, S), with Y_n dependent only on the single element X_n of the sequence X.
Department of Statistics and Department of Mathematics, Iowa State University, Ames, IA 50011, USA; apghosh@iastate.edu
Department of Computer Science and Department of Mathematics, Iowa State University, Ames, IA 50011, USA; ellerne@iastate.edu
Department of Mathematics, Iowa State University, Ames, IA 50011, USA; roiterst@iastate.edu

Formally, let N_0 := N ∪ {0} and assume the existence of a probability space (Ω, F, P) that carries both the Markov chain X as well as an i.i.d. sequence of random variables ω_n, n ∈ N_0, taking values in a measurable space (Ω̃, F̃) and independent of X, such that

Y_n = h(X_n, ω_n)  (1)

for some measurable function h : D × Ω̃ → Y. We assume that X is stationary under P. Let (p_ij), i, j ∈ D, be the transition matrix of the Markov chain X. Then the sequence of pairs (X_n, Y_n)_{n∈N_0} forms, under the law P, a stationary Markov chain with transition kernel defined for n ∈ N_0, i, j ∈ D, y ∈ Y, A ∈ S by

K(i, y; j, A) := P(X_{n+1} = j, Y_{n+1} ∈ A | X_n = i, Y_n = y) = p_ij P(h(j, ω_0) ∈ A).  (2)

Notice that K(i, y; j, A) is in fact independent of the value of y. The sequence (X_n, Y_n)_{n≥0} constructed above is called a hidden Markov model (HMM). We refer to the monographs [4, 6, 7, 9, 10, 13, 19] for a general account of HMM. Hidden Markov models have numerous applications in various areas such as communications engineering, bio-informatics, finance, applied statistics, and more. An extensive list of applications, for instance to coding theory, speech recognition, pattern recognition, satellite communications, and bio-informatics, can also be found in [2, 3, 16, 17].

One of the basic problems associated with HMM is to determine the most likely sequence of hidden states (X_n)_{n=0}^M that could have generated a given output (Y_n)_{n=0}^M. In other words, given a vector (y_n)_{n=0}^M ∈ Y^{M+1}, M ∈ N_0, the problem is to find a feasible sequence of states (U_0^M, U_1^M, ..., U_M^M) such that

P(X_0 = U_0^M, X_1 = U_1^M, ..., X_M = U_M^M | Y_0 = y_0, Y_1 = y_1, ..., Y_M = y_M)
= max_{(x_0,...,x_M) ∈ D^{M+1}} P(X_0 = x_0, X_1 = x_1, ..., X_M = x_M | Y_0 = y_0, Y_1 = y_1, ..., Y_M = y_M),

which is equivalent to

P(X_n = U_n^M, Y_n = y_n : 0 ≤ n ≤ M) = max_{(x_0,...,x_M) ∈ D^{M+1}} P(X_n = x_n, Y_n = y_n : 0 ≤ n ≤ M).  (3)

A vector (U_n^M)_{n=0}^M that satisfies (3) is called the maximum a posteriori (MAP) path estimator. It can be efficiently calculated by the Viterbi algorithm [8, 19].
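For concreteness, the dynamic programming recursion behind the Viterbi algorithm can be sketched as follows. This is a minimal illustrative implementation, not the construction used in the paper: the transition matrix, initial distribution, and discrete emission matrix below are hypothetical toy parameters, and ties are broken toward the smallest state index (one possible selection rule).

```python
import numpy as np

def viterbi(p, pi, emit, y):
    """Compute a MAP state path for observations y by dynamic programming.

    p[i, j]   : transition probability i -> j
    pi[i]     : initial distribution of X_0
    emit[i, k]: probability of observing symbol k in state i
    Ties in the argmax are resolved toward the smallest state index.
    """
    d, M = p.shape[0], len(y)
    log_delta = np.log(pi) + np.log(emit[:, y[0]])  # best log-prob of a path ending in each state
    back = np.zeros((M, d), dtype=int)              # back-pointers to the best predecessor
    for n in range(1, M):
        scores = log_delta[:, None] + np.log(p)     # scores[i, j]: come from state i, land in j
        back[n] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(emit[:, y[n]])
    # Trace the optimal path backwards through the stored pointers.
    path = [int(log_delta.argmax())]
    for n in range(M - 1, 0, -1):
        path.append(int(back[n, path[-1]]))
    return path[::-1]
```

For a sticky two-state chain with near-faithful emissions, the decoded path follows the observations, as one would expect.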
In general, the estimator (U_n^M)_{n=0}^M is not uniquely defined even for a fixed outcome of observations (y_n)_{n=0}^M. Therefore, we will assume throughout that an additional selection rule (for example, according to the lexicographic order) is applied to produce (U_n^M)_{n=0}^M. The MAP path estimator is used, for instance, in cryptanalysis, speech recognition, machine translation, and statistics; see [7, 8, 9, 19] and also the references in [2, 3, 16, 17].

In principle, for a fixed k ∈ N, the first k entries of (U_i^M)_{i=0}^M might vary significantly as M increases and approaches infinity. Unfortunately, it seems that very little is known about the asymptotic properties of the MAP path estimator for general HMM. However, recently it was shown in [2, 3] that under certain mild conditions on the transition kernel of an HMM, which were further amended in [16, 17], there exists a strictly increasing sequence of integer-valued non-negative random variables (T_n)_{n≥1} such that the following properties

hold. (In fact, we use here a slightly strengthened version of the corresponding results in [16, 17]. The proof that the statement holds in this form under the assumptions introduced in [16, 17] is given in Lemma 2.2 below.)

Condition RT.

(i) (T_n)_{n≥1} is a delayed sequence of positive integer renewal times; that is, the increments τ_n := T_{n+1} − T_n, n ≥ 1, form an i.i.d. sequence which is furthermore independent of T_1.

(ii) Both T_1 and τ_1 have finite moments of every order. In fact, there exist positive constants a > 0 and b > 0 such that

P(T_1 > t) ≤ a e^{−bt} and P(τ_1 > t) ≤ a e^{−bt}, for all t > 0.  (4)

(iii) There exists a non-negative integer r such that for any fixed i ≥ 1,

U_m^k = U_m^{T_i + r} for all k ≥ T_i + r and m ≤ T_i.  (5)

In particular, for any n ≥ 0, the following limit exists and the equality holds with probability one:

U_n := lim_{k→∞} U_n^k = U_n^{T_{k_n} + r},  (6)

where

k_n = min{i ∈ N : T_i + r ≥ n},  (7)

that is, (k_n) is the unique sequence of integers such that T_{k_n − 1} < n − r ≤ T_{k_n} for all n ≥ 0.

The interpretation of the above property is that at a random time T_i + r, the block of estimators (U_{T_{i−1}+1}^k, ..., U_{T_i}^k) (here and henceforth we make the convention that T_0 = 0) is fixed for all k ≥ T_i + r, regardless of the future observations (Y_j)_{j > T_i + r}.

(iv) For n ≥ 0, let

Z_n := (X_n, U_n).  (8)

Then the sequence Z := (Z_n)_{n≥0} forms a regenerative process with respect to the embedded renewal structure (T_n)_{n≥0}. That is, the random blocks

W_n = (Z_{T_n}, Z_{T_n + 1}, ..., Z_{T_{n+1} − 1}), n ≥ 0,  (9)

are independent and, moreover, W_1, W_2, ... (but possibly not W_0) are identically distributed.

Following [2, 3, 16, 17], we refer to the sequence U = (U_n)_{n≥0} as the infinite Viterbi path. Condition RT yields a nice asymptotic behavior of the sequence (U_n); see Proposition 1 in [2] for a summary and, for instance, [11] for a general account of regenerative processes. In particular, it implies the law of large numbers that we describe below. Let ξ : D^2 → R be a real-valued function, and for all n ≥ 0, i = 0, 1, ..., n, define

ξ_i^n = ξ(X_i, U_i^n) and ξ_n = ξ(X_n, U_n),  (10)

and

Ŝ_n = Σ_{i=0}^{n−1} ξ_i^n and S_n = Σ_{i=0}^{n−1} ξ_i.  (11)

An important practical example, which we borrow from [2], is

ξ(x, y) = 1_{{x ≠ y}} = 1 if x ≠ y, and 0 if x = y.

In this case S_n and Ŝ_n count the number of places where the realization of the state X_i differs from the estimators U_i and U_i^n, respectively.

If Condition RT holds, then (see [2, 16]) there is a unique probability measure Q on the infinite product space (D^2)^{N_0}, where the sequence (Z_n)_{n≥0} is defined, which makes the sequence (W_n)_{n≥0} introduced in (9) into an i.i.d. sequence. Furthermore, under Condition RT, the following limits exist and the equalities hold:

μ := lim_{n→∞} S_n/n = lim_{n→∞} Ŝ_n/n = E_Q[Ŝ_{T_2} − Ŝ_{T_1}] / E_Q[T_2 − T_1], P-a.s. and Q-a.s.,  (12)

where E_Q denotes the expectation with respect to the measure Q. In fact, Q is the conditional measure given by

Q = P( · | T_1 = 0),  (13)

where it is assumed that (Z_n)_{n≥0} is extended into a stationary sequence (Z_n)_{n∈Z}.

The goal of this paper is to provide the following large deviation estimates, complementary to the above law of large numbers.

Theorem 1.1. Let Assumption 1.3 hold. Then either Q(Ŝ_{T_2} = μT_2) = 1, or there exist a constant γ ∈ (0, μ) and a function I(x) : (μ − γ, μ + γ) → [0, +∞) such that

(i) I(x) is lower semi-continuous and convex, I(μ) = 0, and I(x) > 0 for x ≠ μ.
(ii) lim_{n→∞} (1/n) log P(Ŝ_n ≥ nx) = −I(x) for all x ∈ (μ, μ + γ).
(iii) lim_{n→∞} (1/n) log P(Ŝ_n ≤ nx) = −I(x) for all x ∈ (μ − γ, μ).

Assumption 1.3, stated below, is a set of specific conditions on the HMM that ensures the existence of a sequence (T_n)_{n≥1} satisfying Condition RT. The rate function I(x) is specified in (18) below. The proof of Theorem 1.1 is included in Section 2. First we show that Assumption 1.3 ensures that Condition RT is satisfied for an appropriate sequence of random times (T_n)_{n≥1}. This is done in Lemma 2.2 below. Then we show that as long as the non-degeneracy condition Q(Ŝ_{T_2} = μT_2) < 1 is satisfied, the existence of the regeneration structure described by Condition RT implies the large deviation result stated in Theorem 1.1.
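The ratio formula defining μ above is the classical renewal-reward law of large numbers: the long-run average equals the mean block reward divided by the mean block length. A minimal Monte Carlo sketch with a toy i.i.d. block structure (hypothetical block law chosen for illustration, not the actual Viterbi regeneration structure) shows the convergence:

```python
import random

def renewal_reward_average(num_blocks=20000, seed=0):
    """Simulate i.i.d. regeneration blocks and return the long-run average reward.

    Each block has length tau uniform on {1, 2, 3, 4}; every step inside a block
    contributes a Bernoulli(0.2) reward (think: one mismatch indicator per step).
    By the renewal-reward theorem the average converges to
    E[block reward] / E[block length] = 0.2 * E[tau] / E[tau] = 0.2.
    """
    rng = random.Random(seed)
    total_reward = 0
    total_length = 0
    for _ in range(num_blocks):
        tau = rng.randint(1, 4)  # block length
        reward = sum(rng.random() < 0.2 for _ in range(tau))
        total_reward += reward
        total_length += tau
    return total_reward / total_length
```

With 20000 blocks (about 50000 steps) the simulated average is within a couple of percentage points of the theoretical ratio 0.2.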
Related results for non-delayed regenerative processes (in which case W_0 defined in (9) would be distributed the same as the rest of the random blocks W_n, n ≥ 1) can be found, for instance, in [1, 14, 18]. We cannot apply general large deviation results for regenerative processes directly to our setting

for the following two reasons. First, in our case the first block W_0 is in general distributed differently from the rest of the blocks W_n, n ≥ 1, defined in (9). Secondly, the estimators U_i^n for i > T_{k_n} + r differ in general from the limiting random variables U_i. In principle, the contribution of these two factors to the tail asymptotics on the large deviation scale could be significant, because both W_0 and Σ_{i=T_{k_n}+r+1}^{n−1} ξ_i^n (where the last sum is assumed to be empty if T_{k_n} + r + 1 > n − 1) have exponential tails. However, it turns out that this is not the case, and the large deviation result for our model holds with the same classical rate function I(x) as it does for the non-delayed, purely regenerative processes in [1, 14, 18]. In fact, only a small modification is required in order to adapt the proof of a large deviation result for regenerative processes in [1] to our model.

We next state our assumptions on the underlying HMM. The set of assumptions that we use is taken from [16, 17]. We assume throughout that the distribution of Y_n, conditioned on X_n = i, has a density with respect to a reference measure ν for all i ∈ D. That is, there exist measurable functions f_i : Y → R_+, i ∈ D, such that

P(Y_0 ∈ A | X_0 = i) = ∫_A f_i(y) ν(dy), A ∈ S, i ∈ D.  (14)

For each i ∈ D, let G_i = {y ∈ Y : f_i(y) > 0}. That is, the closure of G_i is the support of the conditional distribution P(Y_0 ∈ · | X_0 = i).

Definition 1.2. We call a subset C ⊆ D a cluster if

min_{j∈C} P(Y_0 ∈ ∩_{i∈C} G_i | X_0 = j) > 0 and max_{j∉C} P(Y_0 ∈ ∩_{i∈C} G_i | X_0 = j) = 0.

In other words, a cluster is a maximal subset of states C such that ν(G_C) > 0, where G_C := ∩_{i∈C} G_i.

Assumption 1.3. [16, 17] For k ∈ D, let H_k = max_{j∈D} p_jk. Assume that

P( f_k(Y_0) H_k > max_{i≠k} f_i(Y_0) H_i | X_0 = k ) > 0 for every k ∈ D.  (15)

Moreover, there exist a cluster C ⊆ D and m ∈ N such that the m-th power of the sub-stochastic matrix H_C = (p_ij)_{i,j∈C} is strictly positive.

Assumption 1.3 is taken from the conditions of Lemma 3.1 in [16] and [17], and it is slightly weaker than the one used in [2, 3].
This assumption is satisfied if C = D and

P( f_k(Y_0) > a · max_{i≠k} f_i(Y_0) | X_0 = k ) > 0 for every k ∈ D and a > 0.  (16)

The latter condition holds, for instance, if f_i is the density of a Gaussian random variable centered at i with variance σ^2 > 0. For a further discussion of Assumption 1.3 and examples of HMM that satisfy this assumption, see Section III in [17] and also the beginning of Section 2 below, where some results of [17] are summarized and their relevance to Condition RT is explained.
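The Gaussian example above can be checked numerically: with f_i the density of N(i, σ²), the event {f_k(Y_0) > a · max_{i≠k} f_i(Y_0)} has positive probability given X_0 = k. The following Monte Carlo sketch (with hypothetical parameters d = 3, σ = 1, a = 2, chosen only for illustration) estimates this probability:

```python
import math
import random

def gauss_density(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def dominance_probability(k, states, sigma=1.0, a=2.0, trials=100_000, seed=1):
    """Monte Carlo estimate of P(f_k(Y_0) > a * max_{i != k} f_i(Y_0) | X_0 = k)
    for Gaussian emission densities f_i centered at the states i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.gauss(k, sigma)  # draw Y_0 given X_0 = k
        if gauss_density(y, k, sigma) > a * max(gauss_density(y, i, sigma) for i in states if i != k):
            hits += 1
    return hits / trials
```

For states {1, 2, 3} and k = 1 the estimate is roughly 0.42 (analytically, the event reduces to Y_0 < (3 − 2 log 2)/2), confirming that the probability in the displayed condition is strictly positive.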

6 2 Proof of Theorem. Recall the random variables ξ n i defined in 0. For the rest of the paper we will assume, actually without loss of generality, that ξ n i > 0 for all n 0 and integers i [0, n]. Indeed, since D is a finite set, the range of the function ξ is bounded, and hence if needed we can replace ξ by ξ = M + ξ with some M > 0 large enough such that ξ n i + M > 0 for all n 0, and i [0, n]. We start with the definition of the sequence of random times T n n that satisfies Condition RT. A similar sequence was first introduced in [2] in a slightly less general setting. In this paper, we use a version of the sequence which is defined in Section 4 of [6]. The authors of [6] do not explicitly show that Condition RT is fulfilled for this sequence. However, only a small modification of technical lemmas from [6] or [7] is required to demonstrate this result. Roughly speaking, for some integers M N and r [0, M ], the random times θ n := T n + r M + are defined, with the additional requirement θ n+ θ n > M, as successive hitting time of a set in the form q A, q D M, A Y M, for the Markov chain formed by the vectors of pairs R n = X i, Y i n+m i=n. More precisely, the following is shown in [6, 7]. See Lemmas 3. and 3.2 in either [6] or[7] for i and ii, and Section 4 [p. 202] of [6] for iii. We notice that a similar regeneration structure was first introduced in [2] under a more stringent condition than Assumption.3. Lemma 2.. [6, 7] Let Assumption.3 hold. Then there exist integer constants M N, r [0, M ], a state l D, a vector of states q D M, and a measurable set A Y M such that the following hold: i P X n M n=0 = q, Y n M n=0 A > 0, ii If for some k 0, Y i k+m i=k and m k + M. Furthermore, A, then U m j = U k+m j = l for any j k +M r iii Let θ = inf{k 0 : X i k+m i=k = q and Y i k+m i=k A}, and for n, θ n+ = inf{k θ n + M : X i k+m i=k = q and Y i k+m i=k A}. Define T n = θ n + M r. 7 For n 0, let L n := Y n, U n. 
Then the sequence (L_n)_{n≥0} forms a regenerative process with respect to the embedded renewal structure (T_n)_{n≥0}. That is, the random blocks

J_n = (L_{T_n}, L_{T_n + 1}, ..., L_{T_{n+1} − 1}), n ≥ 0,

are independent and, moreover, J_1, J_2, ... (but possibly not J_0) are identically distributed. Furthermore, there is a deterministic function g : ∪_{n∈N} Y^n → ∪_{n∈N} D^n such that

(U_{T_n}, U_{T_n + 1}, ..., U_{T_{n+1} − 1}) = g(Y_{T_n}, Y_{T_n + 1}, ..., Y_{T_{n+1} − 1}).

In fact, using Assumption 1.3, the set A is designed in [16, 17] in such a way that once an M-tuple of observable variables (Y_i)_{i=n}^{n+M−1} that belongs to A occurs, the value of the estimator U_{n+M−1−r}^m is set to l when m = n + M − 1 and, moreover, remains unchanged for all m ≥ n + M − 1, regardless of the values of the future observations (Y_i)_{i > n+M−1}. The elements of the set A are called barriers in [16, 17].

Using basic properties of the Viterbi algorithm, it is not hard to check that the existence of the barriers implies claims (ii) and (iii) of the above lemma. In fact, the Viterbi algorithm is a dynamic programming procedure which is based on the fact that for any integer k ≥ 2 and states i, j ∈ D there exists a deterministic function g_{k,i,j} : Y^{k−1} → D^{k−1} such that

(U_{n+1}^m, U_{n+2}^m, ..., U_{n+k−1}^m) = g_{k,i,j}(Y_{n+1}, Y_{n+2}, ..., Y_{n+k−1})

for all n ≥ 0 and m ≥ n + k, once it is decided that U_n^m = i and U_{n+k}^m = j. See [16, 17] for details. We have:

Lemma 2.2. Let Assumption 1.3 hold. Then Condition RT is satisfied for the random times T_n defined in (17).

Proof. We have to show that properties (i)-(iv) listed in the statement of Condition RT hold for (T_n)_{n∈N}.

(i)-(ii) Notice that (θ_n)_{n≥1} defined in (iii) of Lemma 2.1 is (apart from the additional requirement that θ_{n+1} − θ_n > M) essentially the sequence of successive hitting times of the set {q} × A for the Markov chain formed by the pairs of M-vectors (X_i, Y_i)_{i=n}^{n+M−1}, n ≥ 0. Therefore, (i) and (ii) follow from Lemma 2.1-(i) and the fact that the M-vectors (X_i)_{i=n}^{n+M−1}, n ≥ 0, form an irreducible Markov chain on the finite space

D_M = {x ∈ D^M : P((X_i)_{i=0}^{M−1} = x) > 0},

while the distribution of the observation Y_i depends on the value of X_i only.

(iii) Follows directly from (ii) of Lemma 2.1.

(iv) The definition of the HMM together with (17) imply that (X_n, Y_n)_{n≥0} is a regenerative process with respect to the renewal sequence (T_n)_{n≥1}.
The desired result follows from this property combined with the fact that (U_{T_n}, U_{T_n+1}, ..., U_{T_{n+1}−1}) is a deterministic function of (Y_{T_n}, Y_{T_n+1}, ..., Y_{T_{n+1}−1}), according to part (iii) of Lemma 2.1. The proof of the lemma is completed.

We next define the rate function I(x) that appears in the statement of Theorem 1.1. The rate function is standard in the large deviation theory of regenerative processes (see for instance [1, 14, 18]). Recall from (13) the conditional measure Q which makes W = (W_n)_{n≥0} defined in (9) into a stationary sequence of random blocks. For n ≥ 1 let τ_n = T_{n+1} − T_n, and let (S, τ) be a random pair distributed under the measure Q identically to any of the pairs (Σ_{k=T_n}^{T_{n+1}−1} ξ_k, τ_n), n ≥ 1. For any constants α ∈ R, Λ ∈ R, and x ≥ 0, set

Γ(α, Λ) = log E_Q[exp(αS − Λτ)],

and define

Λ(α) = inf{Λ ∈ R : Γ(α, Λ) ≤ 0} and I(x) = sup_{α∈R} {αx − Λ(α)},  (18)

using the usual convention that the infimum over an empty set is +∞. We summarize some properties of the above defined quantities in the following lemma. Recall the definition of μ from (12).

Lemma 2.3. The following hold:

(i) Λ(α) and I(x) are both lower semi-continuous convex functions on R. Moreover, I(μ) = 0 and I(x) > 0 for x ≠ μ.

(ii) If there is no c > 0 such that Q(Ŝ_{T_2} = cT_2) = 1, then:
(a) Λ(α) is strictly convex in a neighborhood of 0.
(b) I(x) is finite in some neighborhood of μ, and for each x in that neighborhood we have I(x) = α_x x − Λ(α_x) for some α_x such that (x − μ)α_x > 0 for x ≠ μ and lim_{x→μ} α_x = 0.

Proof. (i) The fact that Λ(α) and I(x) are both lower semi-continuous convex functions is well known; see for instance [1]. Furthermore, (4) implies that Γ(α, Λ(α)) = 0 for α in a neighborhood of 0. Moreover, the dominated convergence theorem implies that Γ has continuous partial derivatives in a neighborhood of (0, 0). Therefore, by the implicit function theorem, Λ(α) is differentiable in this neighborhood. In particular, Λ(0) = 0, and

Λ'(0) = − (∂Γ/∂α)(0, 0) / (∂Γ/∂Λ)(0, 0) = μ.

Since Λ is a convex function, this yields the second part of (i), namely establishes that I(μ) = 0 and I(x) > 0 for x ≠ μ.

(ii) If there are no constants c, b ∈ R such that Q(Ŝ_{T_2} − cT_2 = b) = 1, we can use the results of [1] for both (ii)-(a) and (ii)-(b); see Lemma 3.1 and the preceding paragraph in [1]. It remains to consider the case when Q(Ŝ_{T_2} − cT_2 = b) = 1 with b ≠ 0, but there is no c > 0 such that Q(Ŝ_{T_2} − cT_2 = 0) = 1. In this case, using the definition (18) and the estimate (4), we have for α in a neighborhood of 0,

1 = E_Q[exp(αŜ_{T_2} − Λ(α)T_2)] = E_Q[exp(α(b + cT_2) − Λ(α)T_2)] = e^{αb} E_Q[exp(T_2(αc − Λ(α)))].

That is, for α in a neighborhood of 0,

E_Q[exp(T_2(αc − Λ(α)))] = e^{−αb}.  (19)

Suppose Λ(α) is not strictly convex in any neighborhood of 0, that is, Λ(α) = kα for some k ∈ R and all α in a one-sided (say positive) neighborhood of zero. Let α_1 > 0 and α_2 > α_1

be two values of α in the aforementioned neighborhood of zero for which (19) is true. Using Jensen's inequality E_Q[X^{α_1/α_2}] ≤ (E_Q[X])^{α_1/α_2} with X = exp(T_2(α_2 c − Λ(α_2))) = exp(T_2 α_2 (c − k)), one gets that the identity (19) can hold only if Q(T_2 = c_0) = 1 for some c_0 > 0. But under this condition and the assumption Q(Ŝ_{T_2} − cT_2 = b) = 1, we would have Q(Ŝ_{T_2} − T_2(c + b/c_0) = 0) = 1, in contradiction to what we assumed at the beginning of the paragraph. This completes the proof of part (a), namely shows that Λ(α) is strictly convex in a neighborhood of 0 provided that there is no c > 0 such that Q(Ŝ_{T_2} − cT_2 = 0) = 1.

To complete the proof of the lemma, it remains to show that part (b) of (ii) holds. Although the argument is standard, we give it here for the sake of completeness. We remark that, in contrast to the usual assumptions, we consider properties of Λ(α) and I(x) only in some neighborhoods of 0 and μ, respectively, and not on the whole domains where these functions are finite. Recall that Λ(α) is an analytic and strictly convex function of α in a neighborhood of zero (see for instance [1] or [18]). In particular, Λ'(α) exists and is strictly increasing in this neighborhood. Since Λ'(0) = μ, it follows that for all x in a neighborhood of μ there is a unique α_x such that x = Λ'(α_x). Since Λ'(α) is a continuous increasing function, we have lim_{x→μ} α_x = 0 and α_x(x − μ) > 0. Furthermore, since Λ(α) is convex, we obtain Λ(α) − Λ(α_x) ≥ x(α − α_x), and hence xα_x − Λ(α_x) ≥ xα − Λ(α) for all α ∈ R. This yields I(x) = xα_x − Λ(α_x), completing the proof of the lemma.

Remark 2.4. Part (ii)-(b) of the above lemma shows that the rate function I(x) is finite in a neighborhood of μ as long as there is no c > 0 such that Q(Ŝ_{T_2} − cT_2 = 0) = 1. On the other hand, if Q(Ŝ_{T_2} = μT_2) = 1, then the definition of Λ(α) implies Λ(α) = μα for all α ∈ R, and hence I(x) = +∞ for x ≠ μ.

Part of claim (i) of Theorem 1.1 is contained in part (i) of the above lemma. We therefore now turn to the proof of parts (ii) and (iii) of Theorem 1.1.
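The definitions of Λ(α) and I(x) above can be made concrete numerically: given the joint law of a block pair (S, τ), one can solve Γ(α, Λ(α)) = 0 for Λ(α) by bisection (Γ is strictly decreasing in Λ since τ > 0) and then take the Legendre transform over a grid of α. The sketch below uses a hypothetical toy block law (τ ≡ 1 and S Bernoulli(1/2), so that μ = 1/2 and I reduces to the classical Cramér rate function):

```python
import math

def Gamma(alpha, lam, pairs):
    """Γ(α, Λ) = log E[exp(α S − Λ τ)] for a finite block law.
    pairs: list of ((s, tau), prob)."""
    return math.log(sum(p * math.exp(alpha * s - lam * t) for (s, t), p in pairs))

def Lambda(alpha, pairs, lo=-50.0, hi=50.0):
    """Λ(α) = inf{Λ : Γ(α, Λ) <= 0}, found by bisection on the root of Γ(α, ·) = 0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Gamma(alpha, mid, pairs) > 0:
            lo = mid
        else:
            hi = mid
    return hi

def rate(x, pairs, grid=None):
    """I(x) = sup_α {α x − Λ(α)}, approximated over a finite grid of α."""
    if grid is None:
        grid = [i * 0.01 for i in range(-500, 501)]
    return max(a * x - Lambda(a, pairs) for a in grid)
```

With pairs = [((0.0, 1.0), 0.5), ((1.0, 1.0), 0.5)] one gets rate(0.5, pairs) ≈ 0 at the mean μ = 0.5, and a strictly positive value away from it, matching I(μ) = 0 and I(x) > 0 for x ≠ μ in Lemma 2.3.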
We start with the observation that the large deviation asymptotics for the lower tail P(Ŝ_n ≤ nx) with x < μ can be deduced from the corresponding results for the upper tail P(Ŝ_n ≥ nx), x > μ. Indeed, let ξ̄(i, j) = μ − ξ(i, j) and S̄_n = Σ_{i=0}^{n−1} ξ̄(X_i, U_i^n). Then

P(Ŝ_n ≤ nx) = P(S̄_n ≥ n(μ − x)).  (20)

Furthermore, for the function ξ̄ we have

Γ̄(α, Λ) = log E[exp(α(μτ − S) − Λτ)] = log E[exp(−αS − (Λ − αμ)τ)],

and hence

Λ̄(α) := inf{Λ ∈ R : Γ̄(α, Λ) ≤ 0} = Λ(−α) + μα,

which in turn implies

Ī(x) := sup_{α∈R} {αx − Λ̄(α)} = sup_{α∈R} {(−α)(μ − x) − Λ(−α)} = I(μ − x).

Therefore, part (iii) of Theorem 1.1 can be derived from part (ii) applied to the auxiliary function ξ̄. It remains to prove part (ii) of the theorem. For the upper bound we adapt the proof of Lemma 3.2 in [1].

Proposition 2.5. Let Assumption 1.3 hold and suppose, in addition, that ξ(i, j) > 0 for all i, j ∈ D and Q(Ŝ_{T_2} = μT_2) < 1. Let I(x) be as defined in (18). Then there exists a constant γ > 0 such that

limsup_{n→∞} (1/n) log P(Ŝ_n ≥ nx) ≤ −I(x) for all x ∈ (μ, μ + γ).

Proof. Recall k_n from (7). For n ∈ N and integers i ∈ [0, k_n], let

S̃_i = Σ_{j=0}^{T_i − 1} ξ(X_j, U_j^n),

with the convention that S̃_1 = 0 if T_1 = 0. Further, set

R_n = Ŝ_n − S̃_{k_n} = Σ_{i=T_{k_n}}^{n−1} ξ(X_i, U_i^n).

We use the following series of estimates, where α > 0 and Λ > 0 are arbitrary positive parameters small enough to ensure that all the expectations below exist, due to the tail estimates assumed in (4):

P(Ŝ_n ≥ nx) = P(S̃_{k_n} + R_n ≥ nx)
= Σ_{j=0}^{n−1} P(S̃_j + R_n ≥ nx; T_j < n − r ≤ T_{j+1})
≤ Σ_{j=0}^{n−1} P(S̃_j + R_n ≥ nx; T_j < n)
≤ Σ_{j=0}^{n−1} P(α(S̃_j + R_n − nx) + Λ(n − T_j) ≥ 0)
≤ Σ_{j=0}^{n−1} E[exp(α(S̃_j + R_n − xn) + Λ(n − T_j))]
= E[exp(α(S̃_1 + R_n))] exp(−(αx − Λ)n) Σ_{j=0}^{n−1} exp(jΓ(α, Λ)).

By the Cauchy-Schwarz inequality,

E[exp(α(S̃_1 + R_n))] ≤ (E[exp(2αS̃_1)])^{1/2} (E[exp(2αR_n)])^{1/2}.

Therefore, using any M > 0 such that ξ(i, j) < M for all i, j ∈ D,

limsup_{n→∞} (1/n) log E[exp(α(S̃_1 + R_n))] = limsup_{n→∞} (1/(2n)) log E[exp(2αR_n)] ≤ lim_{n→∞} (1/(2n)) log E[exp(2αM(n − T_{k_n}))] = 0,

where the first equality is due to (4), while in the last step we used the renewal theorem, which shows that the law of n − T_{k_n} converges, as n goes to infinity, to a proper limiting distribution. Therefore, if α > 0 and Λ > 0 are small enough and Γ(α, Λ) ≤ 0, then

limsup_{n→∞} (1/n) log P(Ŝ_n ≥ nx) ≤ −(αx − Λ).

It then follows from the definition of Λ(α) given in (18) that

limsup_{n→∞} (1/n) log P(Ŝ_n ≥ nx) ≤ −(αx − Λ(α)).  (21)

The rest of the proof is standard. Let

A := sup{α > 0 : E[exp(αS̃_1)] < ∞ and E_Q[exp(αŜ_{T_2})] < ∞}.

It follows from (21) that

limsup_{n→∞} (1/n) log P(Ŝ_n ≥ nx) ≤ − sup_{0<α<A} (αx − Λ(α)),

and therefore limsup_{n→∞} (1/n) log P(Ŝ_n ≥ nx) ≤ −I(x), provided that α_x ∈ (0, A), where α_x is defined in the statement of Lemma 2.3. Since, by part (ii)-(b) of Lemma 2.3, we have lim_{x→μ} α_x = 0, this completes the proof of Proposition 2.5.

For the lower bound in part (ii) of Theorem 1.1 we have:

Proposition 2.6. Let Assumption 1.3 hold and suppose, in addition, that ξ(i, j) > 0 for all i, j ∈ D. Let I(x) be as defined in (18). Then

liminf_{n→∞} (1/n) log P(Ŝ_n ≥ nx) ≥ −I(x) for all x > μ.

Using the assumption that ξ(i, j) > 0, the proof of this proposition can be carried out by relying on the arguments used in the proof of Lemma 3.4 in [1] nearly verbatim. For the sake of completeness we sketch the proof here. First, notice that the proposition is trivially true in the degenerate case Q(Ŝ_{T_2} = μT_2) = 1, where I(x) = +∞ (see Remark 2.4). Assume now that Q(Ŝ_{T_2} = μT_2) < 1. Then, as in Lemma 3.1 of [1], we have

inf_{0<γ<1} γ Γ*(x/γ, 1/γ) = I(x),  (22)

where Γ*(u, v) := sup_{α,Λ∈R} {αu − Λv − Γ(α, Λ)}. Notice that since the infimum in (22) is taken over strictly positive values of γ, the proof of this identity given in [1] works verbatim in our case. This is because considering only strictly positive values of γ makes the first renewal block, which is special and difficult to control in our case, irrelevant to the proof of the identity (22). Since the ξ_i^n are assumed to be positive, we have for any ε > 0, γ ∈ (0, 1), and x > μ:

P(Ŝ_n ≥ xn) ≥ P(Ŝ_{T_{[γn]}} − Ŝ_{T_1} ≥ xn; T_{[γn]} < n) ≥ P(Ŝ_{T_{[γn]}} − Ŝ_{T_1} ≥ ((x + ε)/γ)[γn]; T_{[γn]} < (1/γ)[γn]),  (23)

where, as usual, [x] denotes the integer part of x ∈ R, that is, [x] = max{z ∈ Z : z ≤ x}. Applying Cramér's large deviation theorem [5] to the sequence of i.i.d. two-dimensional vectors ((Ŝ_{T_{n+1}} − Ŝ_{T_n}, τ_n))_{n≥1}, we obtain

lim_{n→∞} (1/n) log P(Ŝ_{T_{[γn]}} − Ŝ_{T_1} ≥ ((x + ε)/γ)[γn]; T_{[γn]} < (1/γ)[γn]) = −γ Γ*((x + ε)/γ, 1/γ).

Letting ε go to zero and using (23), we obtain

liminf_{n→∞} (1/n) log P(Ŝ_n ≥ xn) ≥ − inf_{0<γ<1} γ Γ*(x/γ, 1/γ),

which completes the proof of Proposition 2.6 in virtue of (22).

Propositions 2.5 and 2.6 combined yield the claim of part (ii) of Theorem 1.1, completing the proof of the theorem.

References

[1] J. Bucklew, The blind simulation problem and regenerative processes, IEEE Trans. Inform. Theory.

[2] A. Caliebe, Properties of the maximum a posteriori path estimator in hidden Markov models, IEEE Trans. Inform. Theory 52 (2006), 41-51.

[3] A. Caliebe and U. Rösler, Convergence of the maximum a posteriori path estimator in hidden Markov models, IEEE Trans. Inform. Theory 48 (2002).

[4] O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, Springer, 2005.

[5] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd edition, Applications of Mathematics 38, Springer-Verlag, New York, 1998.

[6] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge Univ. Press, 1998.

[7] Y. Ephraim and N. Merhav, Hidden Markov processes, IEEE Trans. Inform. Theory 48 (2002), 1518-1569.

[8] G. D. Forney, The Viterbi algorithm, Proceedings of the IEEE 61 (1973), 268-278.

[9] X. Huang, Y. Ariki, and M. Jack, Hidden Markov Models for Speech Recognition, Edinburgh Univ. Press, 1990.

[10] F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 2001.

[11] V. Kalashnikov, Topics on Regenerative Processes, CRC Press, Boca Raton, FL, 1994.

[12] A. Koloydenko, M. Käärik, and J. Lember, On adjusted Viterbi training, Acta Appl. Math. 96 (2007).

[13] A. Krogh, Computational Methods in Molecular Biology, North-Holland, Amsterdam, 1998.

[14] T. Kuczek and K. N. Crank, A large-deviation result for regenerative processes, J. Theor. Probab. 4 (1991).

[15] J. Lember and A. Koloydenko, Adjusted Viterbi training, Probab. Engrg. Inform. Sci. 21 (2007).

[16] J. Lember and A. Koloydenko, The adjusted Viterbi training for hidden Markov models, Bernoulli 14 (2008), 180-206.

[17] J. Lember and A. Koloydenko, A constructive proof of the existence of Viterbi processes, preprint.

[18] P. Ney and E. Nummelin, Markov additive processes II. Large deviations, Ann. Probab. 15 (1987), 593-609.

[19] L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (1989), 257-286.


More information

Poisson Processes. Stochastic Processes. Feb UC3M

Poisson Processes. Stochastic Processes. Feb UC3M Poisson Processes Stochastic Processes UC3M Feb. 2012 Exponential random variables A random variable T has exponential distribution with rate λ > 0 if its probability density function can been written

More information

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i := 2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

µ X (A) = P ( X 1 (A) )

µ X (A) = P ( X 1 (A) ) 1 STOCHASTIC PROCESSES This appendix provides a very basic introduction to the language of probability theory and stochastic processes. We assume the reader is familiar with the general measure and integration

More information

On the concentration of eigenvalues of random symmetric matrices

On the concentration of eigenvalues of random symmetric matrices On the concentration of eigenvalues of random symmetric matrices Noga Alon Michael Krivelevich Van H. Vu April 23, 2012 Abstract It is shown that for every 1 s n, the probability that the s-th largest

More information

Solving a linear equation in a set of integers II

Solving a linear equation in a set of integers II ACTA ARITHMETICA LXXII.4 (1995) Solving a linear equation in a set of integers II by Imre Z. Ruzsa (Budapest) 1. Introduction. We continue the study of linear equations started in Part I of this paper.

More information

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Noèlia Viles Cuadros BCAM- Basque Center of Applied Mathematics with Prof. Enrico

More information

v( x) u( y) dy for any r > 0, B r ( x) Ω, or equivalently u( w) ds for any r > 0, B r ( x) Ω, or ( not really) equivalently if v exists, v 0.

v( x) u( y) dy for any r > 0, B r ( x) Ω, or equivalently u( w) ds for any r > 0, B r ( x) Ω, or ( not really) equivalently if v exists, v 0. Sep. 26 The Perron Method In this lecture we show that one can show existence of solutions using maximum principle alone.. The Perron method. Recall in the last lecture we have shown the existence of solutions

More information

A BOREL SOLUTION TO THE HORN-TARSKI PROBLEM. MSC 2000: 03E05, 03E20, 06A10 Keywords: Chain Conditions, Boolean Algebras.

A BOREL SOLUTION TO THE HORN-TARSKI PROBLEM. MSC 2000: 03E05, 03E20, 06A10 Keywords: Chain Conditions, Boolean Algebras. A BOREL SOLUTION TO THE HORN-TARSKI PROBLEM STEVO TODORCEVIC Abstract. We describe a Borel poset satisfying the σ-finite chain condition but failing to satisfy the σ-bounded chain condition. MSC 2000:

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Zeros of lacunary random polynomials

Zeros of lacunary random polynomials Zeros of lacunary random polynomials Igor E. Pritsker Dedicated to Norm Levenberg on his 60th birthday Abstract We study the asymptotic distribution of zeros for the lacunary random polynomials. It is

More information

Logarithmic scaling of planar random walk s local times

Logarithmic scaling of planar random walk s local times Logarithmic scaling of planar random walk s local times Péter Nándori * and Zeyu Shen ** * Department of Mathematics, University of Maryland ** Courant Institute, New York University October 9, 2015 Abstract

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

ON THE ZERO-ONE LAW AND THE LAW OF LARGE NUMBERS FOR RANDOM WALK IN MIXING RAN- DOM ENVIRONMENT

ON THE ZERO-ONE LAW AND THE LAW OF LARGE NUMBERS FOR RANDOM WALK IN MIXING RAN- DOM ENVIRONMENT Elect. Comm. in Probab. 10 (2005), 36 44 ELECTRONIC COMMUNICATIONS in PROBABILITY ON THE ZERO-ONE LAW AND THE LAW OF LARGE NUMBERS FOR RANDOM WALK IN MIXING RAN- DOM ENVIRONMENT FIRAS RASSOUL AGHA Department

More information

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS*

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS* LARGE EVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILE EPENENT RANOM VECTORS* Adam Jakubowski Alexander V. Nagaev Alexander Zaigraev Nicholas Copernicus University Faculty of Mathematics and Computer Science

More information

Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes

Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes Front. Math. China 215, 1(4): 933 947 DOI 1.17/s11464-15-488-5 Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes Yuanyuan LIU 1, Yuhui ZHANG 2

More information

Independence of some multiple Poisson stochastic integrals with variable-sign kernels

Independence of some multiple Poisson stochastic integrals with variable-sign kernels Independence of some multiple Poisson stochastic integrals with variable-sign kernels Nicolas Privault Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang Technological

More information

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal

More information

Mi-Hwa Ko. t=1 Z t is true. j=0

Mi-Hwa Ko. t=1 Z t is true. j=0 Commun. Korean Math. Soc. 21 (2006), No. 4, pp. 779 786 FUNCTIONAL CENTRAL LIMIT THEOREMS FOR MULTIVARIATE LINEAR PROCESSES GENERATED BY DEPENDENT RANDOM VECTORS Mi-Hwa Ko Abstract. Let X t be an m-dimensional

More information

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that

More information

GARCH processes probabilistic properties (Part 1)

GARCH processes probabilistic properties (Part 1) GARCH processes probabilistic properties (Part 1) Alexander Lindner Centre of Mathematical Sciences Technical University of Munich D 85747 Garching Germany lindner@ma.tum.de http://www-m1.ma.tum.de/m4/pers/lindner/

More information

25.1 Ergodicity and Metric Transitivity

25.1 Ergodicity and Metric Transitivity Chapter 25 Ergodicity This lecture explains what it means for a process to be ergodic or metrically transitive, gives a few characterizes of these properties (especially for AMS processes), and deduces

More information

Rectangular Young tableaux and the Jacobi ensemble

Rectangular Young tableaux and the Jacobi ensemble Rectangular Young tableaux and the Jacobi ensemble Philippe Marchal October 20, 2015 Abstract It has been shown by Pittel and Romik that the random surface associated with a large rectangular Young tableau

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Bounds on Convergence of Entropy Rate Approximations in Hidden Markov Processes

Bounds on Convergence of Entropy Rate Approximations in Hidden Markov Processes Bounds on Convergence of Entropy Rate Approximations in Hidden Markov Processes By Nicholas F. Travers B.S. Haverford College 2005 DISSERTATION Submitted in partial satisfaction of the requirements for

More information

arxiv: v2 [math.nt] 4 Nov 2012

arxiv: v2 [math.nt] 4 Nov 2012 INVERSE ERDŐS-FUCHS THEOREM FOR k-fold SUMSETS arxiv:12093233v2 [mathnt] 4 Nov 2012 LI-XIA DAI AND HAO PAN Abstract We generalize a result of Ruzsa on the inverse Erdős-Fuchs theorem for k-fold sumsets

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs

More information

Necessary Conditions for the Existence of Utility Maximizing Strategies under Transaction Costs

Necessary Conditions for the Existence of Utility Maximizing Strategies under Transaction Costs Necessary Conditions for the Existence of Utility Maximizing Strategies under Transaction Costs Paolo Guasoni Boston University and University of Pisa Walter Schachermayer Vienna University of Technology

More information

Markov Chains. Arnoldo Frigessi Bernd Heidergott November 4, 2015

Markov Chains. Arnoldo Frigessi Bernd Heidergott November 4, 2015 Markov Chains Arnoldo Frigessi Bernd Heidergott November 4, 2015 1 Introduction Markov chains are stochastic models which play an important role in many applications in areas as diverse as biology, finance,

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. 2π) n

GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. 2π) n GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. A. ZVAVITCH Abstract. In this paper we give a solution for the Gaussian version of the Busemann-Petty problem with additional

More information

Regularity of the density for the stochastic heat equation

Regularity of the density for the stochastic heat equation Regularity of the density for the stochastic heat equation Carl Mueller 1 Department of Mathematics University of Rochester Rochester, NY 15627 USA email: cmlr@math.rochester.edu David Nualart 2 Department

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

Key words and phrases. Hausdorff measure, self-similar set, Sierpinski triangle The research of Móra was supported by OTKA Foundation #TS49835

Key words and phrases. Hausdorff measure, self-similar set, Sierpinski triangle The research of Móra was supported by OTKA Foundation #TS49835 Key words and phrases. Hausdorff measure, self-similar set, Sierpinski triangle The research of Móra was supported by OTKA Foundation #TS49835 Department of Stochastics, Institute of Mathematics, Budapest

More information

Random convolution of O-exponential distributions

Random convolution of O-exponential distributions Nonlinear Analysis: Modelling and Control, Vol. 20, No. 3, 447 454 ISSN 1392-5113 http://dx.doi.org/10.15388/na.2015.3.9 Random convolution of O-exponential distributions Svetlana Danilenko a, Jonas Šiaulys

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

Resistance Growth of Branching Random Networks

Resistance Growth of Branching Random Networks Peking University Oct.25, 2018, Chengdu Joint work with Yueyun Hu (U. Paris 13) and Shen Lin (U. Paris 6), supported by NSFC Grant No. 11528101 (2016-2017) for Research Cooperation with Oversea Investigators

More information

UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES

UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES Applied Probability Trust 7 May 22 UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES HAMED AMINI, AND MARC LELARGE, ENS-INRIA Abstract Upper deviation results are obtained for the split time of a

More information

Conditional independence, conditional mixing and conditional association

Conditional independence, conditional mixing and conditional association Ann Inst Stat Math (2009) 61:441 460 DOI 10.1007/s10463-007-0152-2 Conditional independence, conditional mixing and conditional association B. L. S. Prakasa Rao Received: 25 July 2006 / Revised: 14 May

More information

Chapter 8. P-adic numbers. 8.1 Absolute values

Chapter 8. P-adic numbers. 8.1 Absolute values Chapter 8 P-adic numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics 58, Springer Verlag 1984, corrected 2nd printing 1996, Chap.

More information

Alternative Characterization of Ergodicity for Doubly Stochastic Chains

Alternative Characterization of Ergodicity for Doubly Stochastic Chains Alternative Characterization of Ergodicity for Doubly Stochastic Chains Behrouz Touri and Angelia Nedić Abstract In this paper we discuss the ergodicity of stochastic and doubly stochastic chains. We define

More information

Monte Carlo methods for sampling-based Stochastic Optimization

Monte Carlo methods for sampling-based Stochastic Optimization Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from

More information

Other properties of M M 1

Other properties of M M 1 Other properties of M M 1 Přemysl Bejda premyslbejda@gmail.com 2012 Contents 1 Reflected Lévy Process 2 Time dependent properties of M M 1 3 Waiting times and queue disciplines in M M 1 Contents 1 Reflected

More information

Discontinuous order preserving circle maps versus circle homeomorphisms

Discontinuous order preserving circle maps versus circle homeomorphisms Discontinuous order preserving circle maps versus circle homeomorphisms V. S. Kozyakin Institute for Information Transmission Problems Russian Academy of Sciences Bolshoj Karetny lane, 19, 101447 Moscow,

More information

Finding a Needle in a Haystack: Conditions for Reliable. Detection in the Presence of Clutter

Finding a Needle in a Haystack: Conditions for Reliable. Detection in the Presence of Clutter Finding a eedle in a Haystack: Conditions for Reliable Detection in the Presence of Clutter Bruno Jedynak and Damianos Karakos October 23, 2006 Abstract We study conditions for the detection of an -length

More information

arxiv:math.pr/ v1 17 May 2004

arxiv:math.pr/ v1 17 May 2004 Probabilistic Analysis for Randomized Game Tree Evaluation Tämur Ali Khan and Ralph Neininger arxiv:math.pr/0405322 v1 17 May 2004 ABSTRACT: We give a probabilistic analysis for the randomized game tree

More information

Reliability of Coherent Systems with Dependent Component Lifetimes

Reliability of Coherent Systems with Dependent Component Lifetimes Reliability of Coherent Systems with Dependent Component Lifetimes M. Burkschat Abstract In reliability theory, coherent systems represent a classical framework for describing the structure of technical

More information

A Note On The Erlang(λ, n) Risk Process

A Note On The Erlang(λ, n) Risk Process A Note On The Erlangλ, n) Risk Process Michael Bamberger and Mario V. Wüthrich Version from February 4, 2009 Abstract We consider the Erlangλ, n) risk process with i.i.d. exponentially distributed claims

More information

Krzysztof Burdzy University of Washington. = X(Y (t)), t 0}

Krzysztof Burdzy University of Washington. = X(Y (t)), t 0} VARIATION OF ITERATED BROWNIAN MOTION Krzysztof Burdzy University of Washington 1. Introduction and main results. Suppose that X 1, X 2 and Y are independent standard Brownian motions starting from 0 and

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Stochastic Comparisons of Order Statistics from Generalized Normal Distributions

Stochastic Comparisons of Order Statistics from Generalized Normal Distributions A^VÇÚO 1 33 ò 1 6 Ï 2017 c 12 Chinese Journal of Applied Probability and Statistics Dec. 2017 Vol. 33 No. 6 pp. 591-607 doi: 10.3969/j.issn.1001-4268.2017.06.004 Stochastic Comparisons of Order Statistics

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

General Glivenko-Cantelli theorems

General Glivenko-Cantelli theorems The ISI s Journal for the Rapid Dissemination of Statistics Research (wileyonlinelibrary.com) DOI: 10.100X/sta.0000......................................................................................................

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Iowa State University. Instructor: Alex Roitershtein Summer Exam #1. Solutions. x u = 2 x v

Iowa State University. Instructor: Alex Roitershtein Summer Exam #1. Solutions. x u = 2 x v Math 501 Iowa State University Introduction to Real Analysis Department of Mathematics Instructor: Alex Roitershtein Summer 015 Exam #1 Solutions This is a take-home examination. The exam includes 8 questions.

More information

1 Sequences of events and their limits

1 Sequences of events and their limits O.H. Probability II (MATH 2647 M15 1 Sequences of events and their limits 1.1 Monotone sequences of events Sequences of events arise naturally when a probabilistic experiment is repeated many times. For

More information

ON THE MOMENTS OF ITERATED TAIL

ON THE MOMENTS OF ITERATED TAIL ON THE MOMENTS OF ITERATED TAIL RADU PĂLTĂNEA and GHEORGHIŢĂ ZBĂGANU The classical distribution in ruins theory has the property that the sequence of the first moment of the iterated tails is convergent

More information

Strong Duality and Dual Pricing Properties in Semi-infinite Linear Programming A Non-Fourier-Motzkin Elimination Approach

Strong Duality and Dual Pricing Properties in Semi-infinite Linear Programming A Non-Fourier-Motzkin Elimination Approach Strong Duality and Dual Pricing Properties in Semi-infinite Linear Programming A Non-Fourier-Motzkin Elimination Approach Qinghong Zhang Department of Mathematics and Computer Science Northern Michigan

More information

An almost sure invariance principle for additive functionals of Markov chains

An almost sure invariance principle for additive functionals of Markov chains Statistics and Probability Letters 78 2008 854 860 www.elsevier.com/locate/stapro An almost sure invariance principle for additive functionals of Markov chains F. Rassoul-Agha a, T. Seppäläinen b, a Department

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

Prime numbers and Gaussian random walks

Prime numbers and Gaussian random walks Prime numbers and Gaussian random walks K. Bruce Erickson Department of Mathematics University of Washington Seattle, WA 9895-4350 March 24, 205 Introduction Consider a symmetric aperiodic random walk

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

Excludability and Bounded Computational Capacity Strategies

Excludability and Bounded Computational Capacity Strategies Excludability and Bounded Computational Capacity Strategies Ehud Lehrer and Eilon Solan June 11, 2003 Abstract We study the notion of excludability in repeated games with vector payoffs, when one of the

More information

Numerical Sequences and Series

Numerical Sequences and Series Numerical Sequences and Series Written by Men-Gen Tsai email: b89902089@ntu.edu.tw. Prove that the convergence of {s n } implies convergence of { s n }. Is the converse true? Solution: Since {s n } is

More information

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall.

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall. .1 Limits of Sequences. CHAPTER.1.0. a) True. If converges, then there is an M > 0 such that M. Choose by Archimedes an N N such that N > M/ε. Then n N implies /n M/n M/N < ε. b) False. = n does not converge,

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

arxiv: v1 [math.pr] 6 Jan 2014

arxiv: v1 [math.pr] 6 Jan 2014 Recurrence for vertex-reinforced random walks on Z with weak reinforcements. Arvind Singh arxiv:40.034v [math.pr] 6 Jan 04 Abstract We prove that any vertex-reinforced random walk on the integer lattice

More information

Proof. We indicate by α, β (finite or not) the end-points of I and call

Proof. We indicate by α, β (finite or not) the end-points of I and call C.6 Continuous functions Pag. 111 Proof of Corollary 4.25 Corollary 4.25 Let f be continuous on the interval I and suppose it admits non-zero its (finite or infinite) that are different in sign for x tending

More information

Entropy and Ergodic Theory Lecture 15: A first look at concentration

Entropy and Ergodic Theory Lecture 15: A first look at concentration Entropy and Ergodic Theory Lecture 15: A first look at concentration 1 Introduction to concentration Let X 1, X 2,... be i.i.d. R-valued RVs with common distribution µ, and suppose for simplicity that

More information

When are Sums Closed?

When are Sums Closed? Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analysis and Economic Theory Fall 2018 Winter 2019 Topic 20: When are Sums Closed? 20.1 Is a sum of closed sets closed? Example 0.2.2

More information

STRONG CONVERGENCE OF AN IMPLICIT ITERATION PROCESS FOR ASYMPTOTICALLY NONEXPANSIVE IN THE INTERMEDIATE SENSE MAPPINGS IN BANACH SPACES

STRONG CONVERGENCE OF AN IMPLICIT ITERATION PROCESS FOR ASYMPTOTICALLY NONEXPANSIVE IN THE INTERMEDIATE SENSE MAPPINGS IN BANACH SPACES Kragujevac Journal of Mathematics Volume 36 Number 2 (2012), Pages 237 249. STRONG CONVERGENCE OF AN IMPLICIT ITERATION PROCESS FOR ASYMPTOTICALLY NONEXPANSIVE IN THE INTERMEDIATE SENSE MAPPINGS IN BANACH

More information

Exponential stability of families of linear delay systems

Exponential stability of families of linear delay systems Exponential stability of families of linear delay systems F. Wirth Zentrum für Technomathematik Universität Bremen 28334 Bremen, Germany fabian@math.uni-bremen.de Keywords: Abstract Stability, delay systems,

More information

2. The Concept of Convergence: Ultrafilters and Nets

2. The Concept of Convergence: Ultrafilters and Nets 2. The Concept of Convergence: Ultrafilters and Nets NOTE: AS OF 2008, SOME OF THIS STUFF IS A BIT OUT- DATED AND HAS A FEW TYPOS. I WILL REVISE THIS MATE- RIAL SOMETIME. In this lecture we discuss two

More information

arxiv: v2 [math.pr] 4 Sep 2017

arxiv: v2 [math.pr] 4 Sep 2017 arxiv:1708.08576v2 [math.pr] 4 Sep 2017 On the Speed of an Excited Asymmetric Random Walk Mike Cinkoske, Joe Jackson, Claire Plunkett September 5, 2017 Abstract An excited random walk is a non-markovian

More information

Convergence Rates for Renewal Sequences

Convergence Rates for Renewal Sequences Convergence Rates for Renewal Sequences M. C. Spruill School of Mathematics Georgia Institute of Technology Atlanta, Ga. USA January 2002 ABSTRACT The precise rate of geometric convergence of nonhomogeneous

More information