Asymptotic properties of the maximum likelihood estimator for a ballistic random walk in a random environment

Catherine Matias
Joint works with F. Comets, M. Falconnet, D. & O. Loukianov
Currently: Laboratoire Statistique & Génome, Évry, FRANCE
Soon: Lab. Probabilités & Modèles Aléatoires, Paris, FRANCE
http://cmatias.perso.math.cnrs.fr/

Outline
- Biophysical context
- Nearest-neighbour one-dimensional random walk in random environment
- MLE construction and properties
- RWRE and Branching process with immigration in random environment (BPIRE)
- Three examples
- Simulations

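DNA unzipping
RWRE was introduced by [Chernov (67)] to model DNA replication. By the end of the 1990s, various DNA unzipping experiments had appeared.
[Figure: schematic of DNA unzipping, with a force f applied to each of the two strands (G A C A C T C T A C C T G A and A G A T G G A C T G T G T C) and base positions labelled 1, 2, 3, 4, 5, ..., M.]
Goals
- DNA sequencing (exploratory),
- Study of the structural properties of the molecule.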

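Model description I
Random environment on $\mathbb{Z}$
- $\omega = \{\omega_x\}_{x \in \mathbb{Z}}$ i.i.d. with $\omega_x \in (0, 1)$ and $\omega_x \sim \nu_\theta$,
- $\theta \in \Theta$ unknown parameter, $\Theta \subset \mathbb{R}^d$ compact set,
- $\mathbb{P}^\theta = \nu_\theta^{\otimes \mathbb{Z}}$ law on $(0, 1)^{\mathbb{Z}}$ of $\omega$ and $\mathbb{E}^\theta$ the corresponding expectation.
Markov process conditional on the environment
For fixed $\omega$, let $X = \{X_t\}_{t \in \mathbb{N}}$ be the Markov chain on $\mathbb{Z}$ starting at $X_0 = 0$ and with transitions
$$P_\omega(X_{t+1} = y \mid X_t = x) = \begin{cases} \omega_x & \text{if } y = x + 1, \\ 1 - \omega_x & \text{if } y = x - 1, \\ 0 & \text{otherwise.} \end{cases}$$
$P_\omega$ is the measure on the path space of $X$ given $\omega$ (quenched law).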
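The quenched dynamics can be illustrated with a minimal Python sketch. It is not the authors' code: the names `simulate_rwre` and `two_point_sampler` are hypothetical, the walk is stopped the first time it reaches a site `n`, and the environment is drawn lazily at the first visit of each site, which gives the same law as drawing it in advance because the $\omega_x$ are i.i.d. The parameter values in the usage lines are only an illustration.

```python
import random


def simulate_rwre(n, sample_omega, rng):
    """Simulate X from X_0 = 0 until it first reaches site n.

    sample_omega(rng) draws one omega_x from nu_theta (hypothetical helper).
    The environment is drawn site by site on first visit, then kept fixed.
    """
    omega = {}  # site -> omega_x, filled in on first visit
    path = [0]
    x = 0
    while x != n:
        if x not in omega:
            omega[x] = sample_omega(rng)
        # Quenched transition: right with probability omega_x, else left.
        x = x + 1 if rng.random() < omega[x] else x - 1
        path.append(x)
    return path, omega


def two_point_sampler(p, a1, a2):
    """Sampler for nu = p * delta_{a1} + (1 - p) * delta_{a2}."""
    return lambda rng: a1 if rng.random() < p else a2


rng = random.Random(0)
path, omega = simulate_rwre(200, two_point_sampler(0.3, 0.4, 0.7), rng)
print(len(path) - 1)  # number of steps needed to reach site 200
```

The loop only terminates quickly when the walk drifts to the right; for a recurrent or left-transient environment it may run for a very long time or forever.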

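Model description II
Random walk in random environment (RWRE)
The (unconditional) law of $X$ is the annealed law
$$\mathbf{P}^\theta(\cdot) = \int P_\omega(\cdot) \, d\mathbb{P}^\theta(\omega),$$
where $\mathbb{P}^\theta = \nu_\theta^{\otimes \mathbb{Z}}$ is the law of the environment.
Note that $X$ is not a Markov process under the annealed law.
[Figure: from site $x$, the walk jumps to $x + 1$ with probability $\omega_x$ and to $x - 1$ with probability $1 - \omega_x$.]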

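Limiting behaviour of X
Let $\rho_x = \frac{1 - \omega_x}{\omega_x}$, $x \in \mathbb{Z}$. [Solomon (75)] proved the classification:
(a) Recurrent case: if $\mathbb{E}^\theta(\log \rho_0) = 0$, then
$$-\infty = \liminf_{t \to \infty} X_t < \limsup_{t \to \infty} X_t = +\infty, \quad \mathbf{P}^\theta\text{-almost surely.}$$
(b) Transient case: if $\mathbb{E}^\theta(\log \rho_0) < 0$, then
$$\lim_{t \to \infty} X_t = +\infty, \quad \mathbf{P}^\theta\text{-almost surely.}$$
If we moreover let $T_n = \inf\{t \in \mathbb{N} : X_t = n\}$, then, when $n$ tends to infinity,
(b1) Ballistic case: if $\mathbb{E}^\theta(\rho_0) < 1$, then $T_n / n \to c$ for a finite constant $c$, $\mathbf{P}^\theta$-almost surely.
(b2) Sub-ballistic case: if $\mathbb{E}^\theta(\rho_0) \ge 1$, then $T_n / n \to +\infty$, $\mathbf{P}^\theta$-almost surely.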
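As an illustration of the classification, here is a small Python sketch for the special case of a two-point environment $\nu = p\delta_{a_1} + (1-p)\delta_{a_2}$, where both expectations are finite sums. The function name `classify_two_point` and the tolerance `tol` are assumptions made for this example. The case $\mathbb{E}^\theta(\log \rho_0) > 0$ is not listed in the classification above; the sketch reports it separately as the mirror-image case.

```python
import math


def classify_two_point(p, a1, a2, tol=1e-12):
    """Regime of the walk when nu = p*delta_{a1} + (1-p)*delta_{a2}."""
    rho1, rho2 = (1 - a1) / a1, (1 - a2) / a2
    mean_log_rho = p * math.log(rho1) + (1 - p) * math.log(rho2)
    mean_rho = p * rho1 + (1 - p) * rho2
    if abs(mean_log_rho) <= tol:
        return "recurrent"
    if mean_log_rho > 0:
        # Mirror image of case (b); not part of the classification above.
        return "transient to the left"
    return "ballistic" if mean_rho < 1 else "sub-ballistic"


print(classify_two_point(0.3, 0.4, 0.7))  # E(rho_0) = 0.75 < 1
```

The exact test `mean_log_rho == 0` is replaced by a tolerance because floating-point logarithms rarely cancel exactly.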

Goal and context
Goal: estimate the parameter value $\theta$ from the observation of $X_{[0,T_n]}$, the path of the walk up to time $T_n$.
- In a much more general setting, [Adelman & Enriquez (04)] provide a link between the RWRE and the environment, leading to moment estimators for the distribution $\nu_\theta$.
- Drawback: one must first estimate some moments and then invert a function to recover the parameter $\theta$, which may induce a loss of efficiency.
- We focus on maximum likelihood estimation (MLE).
- We assume a transient ballistic random walk.

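MLE construction I
We let
$$L_x^n := \sum_{s=0}^{T_n - 1} \mathbf{1}\{(X_s, X_{s+1}) = (x, x - 1)\}$$
be the number of left steps from site $x$ and $R_x^n$ the number of right steps (defined similarly). We have
$$P_\omega(X_{[0,T_n]}) = \prod_{x \in \mathbb{Z}} \omega_x^{R_x^n} (1 - \omega_x)^{L_x^n}$$
and (i.i.d. env.)
$$\mathbf{P}^\theta(X_{[0,T_n]}) = \prod_{x \in \mathbb{Z}} \int_0^1 a^{R_x^n} (1 - a)^{L_x^n} \, d\nu_\theta(a).$$
Note that
- Only the visited sites contribute to this product.
- The number of visited sites $x < 0$ is bounded.
- For $x = 1, \dots, n - 1$, $R_x^n = L_{x+1}^n + 1$.
[Figure: sites $0$, $x$, $x + 1$ and $n$ on a line, showing the $R_x^n$ right steps from $x$ and the $L_{x+1}^n$ left steps from $x + 1$.]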

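MLE construction II
Let $\phi_\theta$ be the function from $\mathbb{N}^2$ to $\mathbb{R}$ given by
$$\phi_\theta(x, y) = \log \int_0^1 a^{x+1} (1 - a)^y \, d\nu_\theta(a). \qquad (1)$$
The criterion function $\theta \mapsto \ell_n(\theta)$ is defined as
$$\ell_n(\theta) = \sum_{x=0}^{n-1} \phi_\theta(L_{x+1}^n, L_x^n),$$
and our estimator is
$$\hat\theta_n \in \operatorname*{Argmax}_{\theta \in \Theta} \ell_n(\theta).$$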
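To make the criterion concrete, here is a hedged Python sketch for a two-point environment $\nu_p = p\delta_{a_1} + (1-p)\delta_{a_2}$ with known $a_1, a_2$, where the integral in (1) is a two-term sum. All function names are hypothetical, the input `path` is assumed to be a nearest-neighbour path from 0 stopped at its first visit to $n$, and the grid search is only a simple stand-in for the Argmax, not the optimiser used by the authors.

```python
import math


def left_counts(path, n):
    """Return [L_0^n, ..., L_n^n]: left steps from each site 0..n along path."""
    counts = [0] * (n + 1)
    for here, nxt in zip(path, path[1:]):
        if nxt == here - 1 and 0 <= here <= n:
            counts[here] += 1
    return counts


def phi_two_point(p, a1, a2, x, y):
    """log( p a1^(x+1) (1-a1)^y + (1-p) a2^(x+1) (1-a2)^y )."""
    t1 = math.log(p) + (x + 1) * math.log(a1) + y * math.log(1 - a1)
    t2 = math.log(1 - p) + (x + 1) * math.log(a2) + y * math.log(1 - a2)
    m = max(t1, t2)  # log-sum-exp, to avoid underflow for large counts
    return m + math.log(math.exp(t1 - m) + math.exp(t2 - m))


def criterion(p, a1, a2, counts):
    """l_n(p) = sum over x = 0..n-1 of phi(L_{x+1}^n, L_x^n)."""
    n = len(counts) - 1
    return sum(phi_two_point(p, a1, a2, counts[x + 1], counts[x])
               for x in range(n))


def mle_grid(a1, a2, counts, grid_size=999):
    """Maximise the criterion over a regular grid of p values in (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda p: criterion(p, a1, a2, counts))


# Toy path reaching n = 3, far too short for a meaningful estimate.
toy_path = [0, 1, 0, 1, 2, 1, 2, 3]
toy_counts = left_counts(toy_path, 3)
print(toy_counts, criterion(0.3, 0.4, 0.7, toy_counts))
```

Only the left-step counts at sites $0, \dots, n$ enter the criterion, so the path itself does not need to be stored once the counts are known.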

Results: consistency, asymptotic normality and efficiency
Under appropriate (and classical) assumptions, in the transient ballistic case, we establish that the MLE satisfies the following, where $\theta^\star$ denotes the true parameter value and $\mathbf{P}^\star$ the annealed law under $\theta^\star$:
- Consistency: $\lim_{n \to +\infty} \hat\theta_n = \theta^\star$, in $\mathbf{P}^\star$-probability,
- Asymptotic normality: $\sqrt{n}\,(\hat\theta_n - \theta^\star) \to \mathcal{N}(0, \Sigma_{\theta^\star}^{-1})$ in $\mathbf{P}^\star$-distribution,
- Efficiency: $\Sigma_{\theta^\star}$ is the Fisher information matrix.
Francis Comets, Mikael Falconnet, Oleg Loukianov, Dasha Loukianova & Catherine Matias. Maximum likelihood estimator consistency for ballistic random walk in a parametric random environment. Stochastic Processes & Applications, 124(1):268-288, 2014.
Mikael Falconnet, Dasha Loukianova & Catherine Matias. Asymptotic normality and efficiency of the maximum likelihood estimator for the parameter of a ballistic random walk in a random environment. Mathematical Methods of Statistics, 23(1):1-19, 2014.
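For a scalar parameter, the asymptotic normality statement translates into the usual Wald-type interval $\hat\theta_n \pm z_{1-\gamma/2} / \sqrt{n \Sigma}$. The Python sketch below is only an illustration under that reading: `wald_interval` is a hypothetical name, and `fisher_info` stands for an estimate of $\Sigma$ supplied by the caller (how it is estimated is not shown here).

```python
import math
from statistics import NormalDist


def wald_interval(theta_hat, fisher_info, n, gamma=0.05):
    """Asymptotic (1 - gamma) confidence interval for a scalar parameter.

    Uses sqrt(n) * (theta_hat - theta) ~ N(0, 1 / fisher_info) approximately.
    """
    z = NormalDist().inv_cdf(1 - gamma / 2)
    half_width = z / math.sqrt(n * fisher_info)
    return theta_hat - half_width, theta_hat + half_width


low, high = wald_interval(theta_hat=0.3, fisher_info=4.0, n=1000)
print(low, high)
```

The numbers passed in the usage line are arbitrary and do not come from the experiments reported in the talk.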

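Underlying BPIRE I
Main property (Kesten, Kozlov, Spitzer, 75)
$$(L_n^n, L_{n-1}^n, \dots, L_0^n) \sim (Z_0, \dots, Z_n) \quad \text{(equality in distribution under the annealed law)},$$
where $Z_0 = 0$ and, for $k = 0, \dots, n - 1$,
$$Z_{k+1} = \sum_{i=0}^{Z_k} \xi_{k+1,i},$$
with $\{\xi_{k,i}\}_{k \in \mathbb{N};\, i \in \mathbb{N}}$ independent and, for all $m \in \mathbb{N}$,
$$P_\omega(\xi_{k,i} = m) = (1 - \omega_k)^m \omega_k.$$
Under the annealed law $\mathbf{P}^\theta$, $\{Z_n\}_{n \in \mathbb{N}}$ is an irreducible positive recurrent homogeneous Markov chain with transition kernel $Q_\theta$.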
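The recursion for $Z$ can be sketched in Python as follows. The names `sample_offspring` and `simulate_bpire` are hypothetical, the list `omegas` is assumed to hold $\omega_1, \dots, \omega_n$, and the geometric draws use inversion of the tail probability $P(\xi \ge m) = (1 - \omega)^m$.

```python
import math
import random


def sample_offspring(omega, rng):
    """Draw xi with P(xi = m) = (1 - omega)^m * omega, for m = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return int(math.log(u) / math.log(1.0 - omega))


def simulate_bpire(omegas, rng):
    """Return [Z_0, ..., Z_n], where omegas = [omega_1, ..., omega_n]."""
    z = [0]
    for omega in omegas:
        # The sum runs over i = 0..Z_k: Z_k individuals plus one immigrant.
        z.append(sum(sample_offspring(omega, rng) for _ in range(z[-1] + 1)))
    return z


rng = random.Random(0)
environment = [0.4 if rng.random() < 0.3 else 0.7 for _ in range(10)]
print(simulate_bpire(environment, rng))
```

The extra term $i = 0$ in the sum is what makes this a branching process with immigration: the population can restart after hitting zero.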

Underlying BPIRE II
Consequence
We have an equality in distribution under the annealed law,
$$\ell_n(\theta) \overset{\text{dist.}}{=} \sum_{k=0}^{n-1} \phi_\theta(Z_k, Z_{k+1}),$$
and the right-hand side is (up to a constant) the likelihood of a positive recurrent Markov process.
About ballistic assumption
- The stationary measure of $(Z_n)$ has a finite first order moment only in the ballistic case. In this case, $\ell_n / n$ converges to a finite limit $\ell$.
- The sub-ballistic case is studied in: Mikael Falconnet, Arnaud Gloter & Dasha Loukianova. Maximum likelihood estimation in the context of a sub-ballistic random walk in a parametric random environment. arXiv 1405.2880.

Examples of environment distributions I
Example 1: Finite and known support
Fix $a_1 < a_2 \in (0, 1)$ and let $\nu_p = p\delta_{a_1} + (1 - p)\delta_{a_2}$, where $\delta_a$ is the Dirac mass located at value $a$.
- Unknown parameter $p \in \Theta \subset (0, 1)$ (namely $\theta = p$).
- Assume that $a_1$, $a_2$ and $\Theta$ are such that the process is transient and ballistic.
- Then, the assumptions are satisfied and one can estimate $p$ consistently and efficiently.
- May be generalised to $K > 2$ fixed and known support points and $\theta = (p_1, \dots, p_{K-1})$.

Examples of environment distributions II
Example 2: Two unknown support points
$\nu_\theta = p\delta_{a_1} + (1 - p)\delta_{a_2}$ with unknown parameter $\theta = (p, a_1, a_2) \in \Theta$, where $\Theta$ is a compact subset of
$$(0, 1) \times \{(a_1, a_2) \in (0, 1)^2 : a_1 < a_2\}$$
such that the process is transient and ballistic.
- Then, the assumptions are satisfied and one can estimate $\theta$ consistently.
- Moreover, if $\mathbb{E}^\theta(\rho_0^3) < 1$, the MLE is asymptotically normal and efficient.

Examples of environment distributions III
Example 3: Beta distribution
$$d\nu(a) = \frac{1}{B(\alpha, \beta)} a^{\alpha - 1} (1 - a)^{\beta - 1} \, da.$$
- Unknown parameter $\theta = (\alpha, \beta) \in \Theta$, where $\Theta$ is a compact subset of $\{(\alpha, \beta) \in (0, +\infty)^2 : \alpha > \beta + 1\}$.
- As $\mathbb{E}^\theta(\rho_0) = \beta / (\alpha - 1)$, the constraint $\alpha > \beta + 1$ ensures that the process is transient and ballistic.
- Then, the assumptions are satisfied and one can estimate $\theta$ consistently and efficiently.
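In the Beta case the integral defining $\phi_\theta$ has a closed form, since $\int_0^1 a^{x+1}(1-a)^y \, d\nu(a) = B(\alpha + x + 1, \beta + y) / B(\alpha, \beta)$. The Python sketch below evaluates it through log-gamma functions; the names `log_beta`, `phi_beta` and `mean_rho_beta` are chosen for this illustration only.

```python
import math


def log_beta(a, b):
    """log B(a, b), computed from log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def phi_beta(alpha, beta, x, y):
    """phi_theta(x, y) = log B(alpha + x + 1, beta + y) - log B(alpha, beta)."""
    return log_beta(alpha + x + 1, beta + y) - log_beta(alpha, beta)


def mean_rho_beta(alpha, beta):
    """E(rho_0) = beta / (alpha - 1), valid for alpha > 1."""
    return beta / (alpha - 1)


print(mean_rho_beta(5, 1))      # 0.25, below 1, so (5, 1) is ballistic
print(phi_beta(5, 1, 2, 3))     # one term of the criterion
```

Working on the log scale keeps the computation stable when the counts $x$ and $y$ are large.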

Simulations protocol
Three models corresponding to the previous 3 examples, with $\theta$ as in Table 1. In each model, 1,000 repeats of the following procedure:
- Generate a random environment according to distribution $\nu_\theta$ on the set of sites $\{-10^4, \dots, 10^4\}$.
- Run a random walk in this environment and stop it successively at the hitting times $T_n$, with $n \in \{10^3 k;\ 1 \le k \le 10\}$.
- For each value of $n$:
  - Estimate $\theta$ with the MLE and with [Adelman & Enriquez (04)]'s procedure,
  - Estimate the Fisher information matrix $\Sigma_\theta$ and compute a confidence interval for $\theta$.

Simulation   Fixed parameter            Estimated parameter
Example 1    (a_1, a_2) = (0.4, 0.7)    p = 0.3
Example 2    -                          (a_1, a_2, p) = (0.4, 0.7, 0.3)
Example 3    -                          (alpha, beta) = (5, 1)
Table 1: Parameter values for each experiment.
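One repeat of the data-generating part of this protocol might look like the following Python sketch. It is an assumption-laden illustration rather than the authors' code: `run_once`, `sample_omega`, `half_width` and `checkpoints` are hypothetical names, the estimation steps are left out, and the usage lines use a much smaller window than $\{-10^4, \dots, 10^4\}$ to keep the run short.

```python
import random


def run_once(sample_omega, half_width, checkpoints, rng):
    """Fixed environment on {-half_width..half_width}, one walk from 0.

    Returns {n: (T_n, [L_0^n, ..., L_n^n])} for each n in checkpoints,
    which must be positive integers not exceeding half_width.
    """
    omega = [sample_omega(rng) for _ in range(2 * half_width + 1)]
    left = [0] * (half_width + 1)  # left-step counts at sites 0..half_width
    out, x, t = {}, 0, 0
    for n in sorted(checkpoints):
        while x != n:
            if x <= -half_width:
                raise RuntimeError("walk reached the left edge of the window")
            if rng.random() < omega[x + half_width]:
                x += 1
            else:
                if x >= 0:
                    left[x] += 1
                x -= 1
            t += 1
        out[n] = (t, left[: n + 1])  # the slice copies the counts at time T_n
    return out


rng = random.Random(0)
sampler = lambda r: 0.4 if r.random() < 0.3 else 0.7  # Example 1 values
result = run_once(sampler, 500, [100, 200, 300], rng)
for n, (hitting_time, counts) in result.items():
    print(n, hitting_time, sum(counts))
```

Because the walk is stopped successively, the counts at a later checkpoint extend those at an earlier one, so a single trajectory serves all values of $n$.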

Boxplots of MLE (white) and [Adelman & Enriquez (04)]'s estimate (grey) - Ex. 1 ($\hat p$) and 3 ($\hat\alpha$, $\hat\beta$)
[Figure: three boxplot panels, one per estimated parameter, each with horizontal axis ticks 1 to 10.]

Boxplots of MLE - Ex. 2 ($\hat p$, $\hat a_1$, $\hat a_2$)
[Figure: three boxplot panels, one per estimated parameter, each with horizontal axis ticks 1 to 10.]

Empirical coverages of confidence regions

          Ex. 1                  Ex. 2                  Ex. 3
n         0.01   0.05   0.1      0.01   0.05   0.1      0.01   0.05   0.1
1000      0.994  0.952  0.899    0.992  0.953  0.909    0.977  0.942  0.901
2000      0.989  0.952  0.903    0.994  0.953  0.910    0.978  0.928  0.884
3000      0.988  0.942  0.901    0.990  0.938  0.886    0.981  0.940  0.889
4000      0.991  0.944  0.896    0.991  0.951  0.894    0.988  0.945  0.900
5000      0.990  0.942  0.896    0.993  0.942  0.891    0.986  0.941  0.883
6000      0.983  0.948  0.901    0.987  0.951  0.888    0.988  0.937  0.897
7000      0.986  0.950  0.900    0.992  0.951  0.900    0.986  0.942  0.898
8000      0.987  0.956  0.898    0.988  0.950  0.903    0.981  0.946  0.903
9000      0.990  0.959  0.913    0.990  0.949  0.893    0.985  0.939  0.901
10000     0.987  0.954  0.908    0.990  0.949  0.899    0.983  0.944  0.892
Table 2: Empirical coverages of $(1 - \gamma)$ asymptotic level confidence regions, for $\gamma \in \{0.01, 0.05, 0.1\}$ (column headers) and relying on 1000 iterations.

Conclusions
Good performance of $\hat\theta_n$ on simulated data:
- Unbiased estimator (like [Adelman & Enriquez (04)]'s one),
- Less spread out than [Adelman & Enriquez (04)]'s one (in fact efficient),
- Easier to compute (in Ex. 2, [Adelman & Enriquez (04)]'s estimate is out of reach),
- Confidence regions built from $\hat\theta_n$ have accurate empirical coverage.
Questions?

References
O. Adelman and N. Enriquez. Random walks in random environment: what a single trajectory tells. Israel J. Math., 142:205-220, 2004.
A.A. Chernov. Replication of a multicomponent chain by the lightning mechanism. Biofizika, 12:297-301, 1967.
F. Solomon. Random walks in a random environment. Ann. Probability, 3:1-31, 1975.