
NONLINEAR LEAST-SQUARES ESTIMATION

DAVID POLLARD AND PETER RADCHENKO

ABSTRACT. The paper uses empirical process techniques to study the asymptotics of the least-squares estimator for the fitting of a nonlinear regression function. By combining and extending ideas of Wu and Van de Geer, it establishes new consistency and central limit theorems that hold under only second moment assumptions on the errors. An application to a delicate example of Wu's illustrates the use of the new theorems, leading to a normal approximation to the least-squares estimator with unusual logarithmic rescalings.

1. INTRODUCTION

Consider the model where we observe $y_i$ for $i = 1, \dots, n$ with

(1)  $y_i = f_i(\theta) + u_i$  where $\theta \in \Theta$.

The unobserved $f_i$ can be random or deterministic functions. The unobserved errors $u_i$ are random with zero means and finite variances. The index set $\Theta$ might be infinite dimensional. Later in the paper it will prove convenient to also consider triangular arrays of observations.

Think of $f(\theta) = (f_1(\theta), \dots, f_n(\theta))$ and $u = (u_1, \dots, u_n)$ as points in $\mathbb{R}^n$. The model specifies a surface $\mathcal{M}_\Theta = \{f(\theta) : \theta \in \Theta\}$ in $\mathbb{R}^n$.

Date: Original version April 1995; revised March 2002; revised 2 December.
Mathematics Subject Classification. Primary 62E20. Secondary: 60F05, 62G08, 62G20.
Key words and phrases. Nonlinear least squares, empirical processes, subgaussian, consistency, central limit theorem.

The vector of observations $y = (y_1, \dots, y_n)$ is a random point in $\mathbb{R}^n$. The least squares estimator (LSE) $\hat\theta_n$ is defined to minimize the distance of $y$ to $\mathcal{M}_\Theta$,

$\hat\theta_n = \operatorname{argmin}_{\theta \in \Theta} \|y - f(\theta)\|^2,$

where $\|\cdot\|$ denotes the usual Euclidean norm on $\mathbb{R}^n$.

Many authors have considered the behavior of $\hat\theta_n$ as $n \to \infty$ when the $y_i$ are generated by the model for a fixed $\theta_0$ in $\Theta$. When the $f_i$ are deterministic, it is natural to express assertions about convergence of $\hat\theta_n$ in terms of the $n$-dimensional Euclidean distance $\kappa_n(\theta_1, \theta_2) := \|f(\theta_1) - f(\theta_2)\|$. For example, Jennrich (1969) took $\Theta$ to be a compact subset of $\mathbb{R}^p$, the errors $\{u_i\}$ to be iid with zero mean and finite variance, and the $f_i$ to be continuous functions in $\theta$. He proved strong consistency of the least squares estimator under the assumption that $n^{-1}\kappa_n(\theta_1, \theta_2)^2$ converges uniformly to a continuous function that is zero if and only if $\theta_1 = \theta_2$. He also gave conditions for asymptotic normality.

Under similar assumptions Wu (1981, Theorem 1) proved that existence of a consistent estimator for $\theta_0$ implies that

(2)  $\kappa_n(\theta) := \kappa_n(\theta, \theta_0) \to \infty$  at each $\theta \neq \theta_0$.

If $\Theta$ is finite, the divergence (2) is also a sufficient condition for the existence of a consistent estimator (Wu 1981, Theorem 2). His main consistency result (his Theorem 3) may be reexpressed as a general convergence assertion.

Theorem 1. Suppose the $\{f_i\}$ are deterministic functions indexed by a subset $\Theta$ of $\mathbb{R}^p$. Suppose also that $\sup_i \operatorname{var}(u_i) < \infty$ and $\kappa_n(\theta) \to \infty$ at each $\theta \neq \theta_0$. Let $S$ be a bounded subset of $\Theta \setminus \{\theta_0\}$ and let $R_n := \inf_{\theta \in S} \kappa_n(\theta)$. Suppose there exist constants $\{L_i\}$ such that
(i) $\sup_{\theta \in S} |f_i(\theta) - f_i(\theta_0)| \le L_i$ for each $i$;
(ii) $|f_i(\theta_1) - f_i(\theta_2)| \le L_i |\theta_1 - \theta_2|$ for all $\theta_1, \theta_2 \in S$;
(iii) $\sum_{i \le n} L_i^2 = O(R_n^\alpha)$ for some $\alpha < 4$.
Then $P\{\hat\theta_n \notin S \text{ eventually}\} = 1$.

Remark. Assumption (i) implies $\sum_{i \le n} L_i^2 \ge \kappa_n(\theta)^2$ for each $\theta$ in $S$, which forces $R_n \to \infty$. If $\Theta$ is compact and if for each $\theta \neq \theta_0$ there is a neighborhood $S = S_\theta$ satisfying the conditions of the theorem, then $\hat\theta_n \to \theta_0$ almost surely.

Wu's paper was the starting point for several authors. For example, both Lai (1994) and Skouras (2000) generalized Wu's consistency results by taking the functions $f_i(\theta) = f_i(\theta, \omega)$ as random processes indexed by $\theta$. They took the $\{u_i\}$ as a martingale difference sequence, with $\{f_i\}$ a predictable sequence of functions with respect to a filtration $\{\mathcal{F}_i\}$.

Another line of development is typified by the work of Van de Geer (1990) and Van de Geer and Wegkamp (1996). They took $f_i(\theta) = f(x_i, \theta)$, where $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ is a set of deterministic functions and the $(x_i, u_i)$ are iid pairs. In fact they identified $\Theta$ with the index set $\mathcal{F}$. Under a stronger assumption about the errors, they established rates of convergence of $\kappa_n(\hat\theta_n)$ in terms of $L_2$ entropy conditions on $\mathcal{F}$, using empirical process methods that were developed after Wu's work. The stronger assumption was that the errors are uniformly subgaussian. In general, we say that a random variable $W$ has a subgaussian distribution if there exists some finite $\tau$ such that $P\exp(tW) \le \exp(\tfrac12 \tau^2 t^2)$ for all $t \in \mathbb{R}$. We write $\tau(W)$ for the smallest such $\tau$. Van de Geer assumed that $\sup_i \tau(u_i) < \infty$.

Remark. Notice that we must have $PW = 0$ when $W$ is subgaussian, because the linear term in the expansion of $P\exp(tW)$ must vanish. When $PW = 0$, subgaussianity is equivalent to existence of a finite constant $\beta$ for which $P\{|W| \ge x\} \le 2\exp(-x^2/\beta^2)$ for all $x \ge 0$.

In our paper we try to bring together the two lines of development. Our main motivation for working on nonlinear least squares was an example presented by Wu (1981, page 507). He noted that his consistency theorem has difficulties with a simple model,

(3)  $f_i(\theta) = \lambda i^{-\mu}$  for $\theta = (\lambda, \mu) \in \Theta$, a compact subset of $\mathbb{R} \times \mathbb{R}_+$.
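Before examining where Wu's conditions run into trouble for this model, a small numerical sketch may help fix ideas. The code below is an illustration, not part of the paper: the sample size, the error distribution, the starting value, and the use of scipy.optimize.least_squares are all assumptions made only for the example.

```python
# Illustrative sketch: simulate Wu's model (3), f_i(theta) = lambda * i**(-mu),
# and compute the least-squares estimator numerically.  All concrete choices
# (n, error law, starting value) are arbitrary and not taken from the paper.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n = 5000
i = np.arange(1, n + 1)
lam0, mu0 = 1.0, 0.5                                  # the delicate case theta_0 = (1, 1/2)
y = lam0 * i**(-mu0) + rng.normal(scale=0.5, size=n)  # errors need only two moments

def residuals(theta):
    lam, mu = theta
    return y - lam * i**(-mu)      # the LSE minimizes the squared norm of y - f(theta)

fit = least_squares(residuals, x0=[0.5, 0.3], bounds=([-np.inf, 0.0], [np.inf, 2.0]))
print("lambda_hat, mu_hat =", fit.x)
```

Because the approximation (7) below rescales the $\mu$-component only by $l_n^{3/2} = (\log n)^{3/2}$, the estimator of $\mu$ converges extremely slowly; the sketch is meant only to make the estimation problem concrete, not to exhibit the asymptotics.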

For example, condition (2) does not hold for $\theta_0 = (0, 0)$ at any $\theta$ with $\mu > 1/2$. When $\theta_0 = (\lambda_0, 1/2)$, Wu's method fails in a more subtle way, but Van de Geer's (1990) method would work if the errors satisfied the subgaussian assumption. In Section 4, under only second moment assumptions on the errors, we establish weak consistency and a central limit theorem.

The main idea behind all the proofs, ours as well as those of Wu and Van de Geer, is quite simple. The LSE also minimizes the random function

(4)  $G_n(\theta) := \|y - f(\theta)\|^2 - \|u\|^2 = \kappa_n(\theta)^2 - 2 Z_n(\theta)$  where $Z_n(\theta) := u \cdot f(\theta) - u \cdot f(\theta_0)$.

In particular, $G_n(\hat\theta_n) \le G_n(\theta_0) = 0$, that is, $\tfrac12 \kappa_n(\hat\theta_n)^2 \le Z_n(\hat\theta_n)$. For every subset $S$ of $\Theta$,

(5)  $P\{\hat\theta_n \in S\} \le P\{\exists\, \theta \in S : Z_n(\theta) \ge \tfrac12 \kappa_n(\theta)^2\} \le 4\, P\sup_{\theta \in S} Z_n(\theta)^2 \big/ \inf_{\theta \in S} \kappa_n(\theta)^4.$

The final bound calls for a maximal inequality for $Z_n$. Our methods for controlling $Z_n$ are similar in spirit to those of Van de Geer. Under her subgaussian assumption, for every class of real functions $\{g_i(\theta) : \theta \in \Theta\}$, the process

(6)  $X(\theta) = \sum_{i \le n} u_i g_i(\theta)$

has subgaussian increments. Indeed, if $\tau(u_i) \le \tau$ for all $i$ then

$\tau\bigl(X(\theta_1) - X(\theta_2)\bigr)^2 \le \sum_{i \le n} \tau(u_i)^2 \bigl(g_i(\theta_1) - g_i(\theta_2)\bigr)^2 \le \tau^2 \|g(\theta_1) - g(\theta_2)\|^2.$

That is, the tails of $X(\theta_1) - X(\theta_2)$ are controlled by the $n$-dimensional Euclidean distance between the vectors $g(\theta_1)$ and $g(\theta_2)$. This property allowed her to invoke a chaining bound similar to our Theorem 2 for the tail probabilities of $\sup_{\theta \in S} Z_n(\theta)$ for various annuli $S = \{\theta : R \le \kappa_n(\theta) < 2R\}$.

Under the weaker second moment assumption on the errors, we apply symmetrization arguments to transform to a problem involving a new process $Z_n^\circ(\theta)$ with conditionally subgaussian increments. We avoid Van de Geer's subgaussianity assumption at the cost of extra Lipschitz conditions on the $f_i(\theta)$, analogous to Assumption (ii) of Theorem 1, which lets us invoke chaining bounds for conditional second moments of $\sup_{\theta \in S} Z_n^\circ(\theta)$ for various $S$.

In Section 3 we prove a new consistency theorem (Theorem 3) and a new central limit theorem (Theorem 4), generalizing Wu's Theorem 5 for nonlinear LSEs. More precisely, our consistency theorem corresponds to an explicit bound for $P\{\kappa_n(\hat\theta_n) \ge R\}$, but we state the result in a form that makes comparison with Theorem 1 easier. Our theorem does not imply almost sure convergence, but our techniques could easily be adapted to that task. We regard the consistency as a preliminary to the next level of asymptotics and not as an end in itself.

We describe the local asymptotic behavior with another approximation result, Theorem 4, which can easily be transformed into a central limit theorem under a variety of mild assumptions on the $\{u_i\}$ errors. For example, in Section 4 we apply the theorem to the model (3), to sharpen the consistency result at $\theta_0 = (1, 1/2)$ into the approximation

(7)  $\bigl(l_n^{1/2}(\hat\lambda_n - 1),\ l_n^{3/2}(1 - 2\hat\mu_n)\bigr) = \sum_{i \le n} u_i \zeta_{i,n} + o_p(1),$

where $l_n := \log n$ and $\zeta_{i,n} := i^{-1/2} l_n^{-1/2}\bigl(1,\ l_i/(2 l_n)\bigr)V^{-1}$, for the fixed $2 \times 2$ matrix $V$ given in Section 4. The sum on the right-hand side of (7) is of order $O_p(1)$ when $\sup_i \operatorname{var}(u_i) < \infty$. If the $\{u_i\}$ are also identically distributed, the sum has a limiting multivariate normal distribution.

2. MAXIMAL INEQUALITIES

Assumption (ii) of Theorem 1 ensures that the increments $Z_n(\theta_1) - Z_n(\theta_2)$ are controlled by the ordinary Euclidean distance in $\Theta$; we allow for control by more general metrics. Wu invoked a maximal inequality for sums of random continuous processes, a result derived from a bound on the covering numbers for $\mathcal{M}_\Theta$ as a subset of $\mathbb{R}^n$ under the usual Euclidean distance; we work with covering numbers for other metrics.

Definition 1. Let $(T, d)$ be a metric space. The covering number $N(\delta, S, d)$ is defined as the size of the smallest $\delta$-net for $S$, that is, the smallest $N$ for which there are points $t_1, \dots, t_N$ in $T$ with $\min_i d(s, t_i) \le \delta$ for every $s$ in $S$.

Remark. The definition is the same for a pseudometric space, that is, a space where $d(\theta_1, \theta_2) = 0$ need not imply $\theta_1 = \theta_2$. In fact, all results in our paper that refer to metric spaces also apply to pseudometric spaces. The slight increase in generality is sometimes convenient when dealing with metrics defined by $L_p$ norms on functions.

Standard chaining arguments (see, for example, Pollard 1989) give maximal inequalities for processes with subgaussian increments controlled by a metric on the index set.

Theorem 2. Let $\{W_t : t \in T\}$ be a stochastic process, indexed by a metric space $(T, d)$, with subgaussian increments. Let $T_\delta$ be a $\delta$-net for $T$. Suppose:
(i) there is a constant $K$ such that $\tau(W_s - W_t) \le K d(s, t)$ for all $s, t \in T$;
(ii) $J_\delta := \int_0^\delta \rho N(y, T, d)\, dy < \infty$, where $\rho N := \sqrt{1 + \log N}$.
Then there is a universal constant $c_1$ such that

$P \sup_{t \in T} W_t^2 \le c_1^2 \bigl(K J_\delta + \rho N(\delta, T, d) \max_{s \in T_\delta} \tau(W_s)\bigr)^2.$

Remark. We should perhaps work with outer expectations because, in general, there is no guarantee that a supremum of uncountably many random variables is measurable. For concrete examples, such as the one discussed in Section 4, measurability can usually be established by routine separability arguments. Accordingly, we will ignore the issue in this paper.

Under the assumption that $\operatorname{var}(u_i) \le \sigma^2$, the $X$ process from (6) need not have subgaussian increments. However, it can be bounded in a stochastic sense by a symmetrized process $X^\circ(\theta) := \sum_{i \le n} \epsilon_i u_i g_i(\theta)$, where the $2n$ random variables $\epsilon_1, \dots, \epsilon_n, u_1, \dots, u_n$ are mutually independent with $P\{\epsilon_i = +1\} = 1/2 = P\{\epsilon_i = -1\}$. In fact, for each subset $S$ of the index set $\Theta$,

(8)  $P\sup_{\theta \in S} X(\theta)^2 \le 4\, P\sup_{\theta \in S} X^\circ(\theta)^2.$

For a proof see, for example, van der Vaart and Wellner (1996). The process $X^\circ$ has conditionally subgaussian increments, with

(9)  $\tau_u\bigl(X^\circ(\theta_1) - X^\circ(\theta_2)\bigr)^2 \le \sum_{i \le n} u_i^2 \bigl(g_i(\theta_1) - g_i(\theta_2)\bigr)^2.$

The subscript $u$ indicates the conditioning on $u$.

Corollary 1. Let $S_\delta$ be a $\delta$-net for $S$ and let $X$ be as in (6). Suppose
(i) $Pu_i = 0$ and $\operatorname{var}(u_i) \le \sigma^2$ for $i = 1, \dots, n$;
(ii) there is a metric $d$ for which $J_\delta := \int_0^\delta \rho N(y, S, d)\, dy < \infty$;
(iii) there are constants $L_1, \dots, L_n$ for which $|g_i(\theta_1) - g_i(\theta_2)| \le L_i d(\theta_1, \theta_2)$ for all $i$ and all $\theta_1, \theta_2 \in S$;
(iv) there are constants $b_1, \dots, b_n$ for which $|g_i(\theta)| \le b_i$ for all $i$ and all $\theta$ in $S$.
Then there is a universal constant $c_2$ such that

$P\sup_{\theta \in S} X(\theta)^2 \le c_2^2 \sigma^2 \bigl(L J_\delta + B \rho N(\delta, S, d)\bigr)^2,$

where $L^2 := \sum_i L_i^2$ and $B^2 := \sum_i b_i^2$.

Proof. From (9),

$\tau_u\bigl(X^\circ(\theta_1) - X^\circ(\theta_2)\bigr) \le L_u\, d(\theta_1, \theta_2)$  where $L_u^2 := \sum_{i \le n} L_i^2 u_i^2$,  and  $\tau_u\bigl(X^\circ(\theta)\bigr) \le B_u$  where $B_u^2 := \sum_{i \le n} b_i^2 u_i^2.$

Apply Theorem 2 conditionally to the process $X^\circ$ to bound $P_u \sup_{\theta \in S} X^\circ(\theta)^2$. Then invoke inequality (8), using the fact that $P L_u^2 \le \sigma^2 L^2$ and $P B_u^2 \le \sigma^2 B^2$.
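The covering numbers of Definition 1 and the entropy integral $J_\delta$ that drives Theorem 2 and Corollary 1 are easy to approximate numerically for simple index sets. The sketch below is an illustration only, not the paper's construction: the rectangle, the constant K, and the grid covering are assumptions chosen so that the metric has the same weighted form as the pseudometric $d_R$ used later in Section 4.1.

```python
# Illustrative sketch: grid upper bounds on covering numbers N(y, S, d) for a rectangle
# under the weighted metric d(t1, t2) = |a1 - a2|/K + |b1 - b2|, and a Riemann-sum
# approximation to the entropy integral J_delta = int_0^delta sqrt(1 + log N(y, S, d)) dy.
# The set, the constant K, and delta are arbitrary choices for the example.
import math

def covering_number(alpha_len, beta_len, K, y):
    """Number of boxes of d-diameter <= y needed to cover an alpha_len x beta_len rectangle."""
    # A box with alpha-side y*K/2 and beta-side y/2 has d-diameter y/2 + y/2 = y.
    n_alpha = max(1, math.ceil(alpha_len / (y * K / 2)))
    n_beta = max(1, math.ceil(beta_len / (y / 2)))
    return n_alpha * n_beta

def entropy_integral(alpha_len, beta_len, K, delta, steps=2000):
    """Riemann-sum approximation to J_delta."""
    total = 0.0
    for k in range(1, steps + 1):
        y = delta * k / steps
        total += math.sqrt(1.0 + math.log(covering_number(alpha_len, beta_len, K, y)))
    return total * delta / steps

K, l_n = 10.0, math.log(5000)
print(entropy_integral(2 * K, 2 * l_n, K, delta=1.0))   # grows only like sqrt(log l_n)
```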

3. LIMIT THEOREMS

Inequality (5) and Corollary 1, with $g_i(\theta) = f_i(\theta) - f_i(\theta_0)$, give us some probabilistic control over $\hat\theta_n$.

Theorem 3. Let $S$ be a subset of $\Theta$ equipped with a pseudometric $d$. Let $\{L_i : i = 1, \dots, n\}$, $\{b_i : i = 1, \dots, n\}$, and $\delta$ be positive constants such that
(i) $|f_i(\theta_1) - f_i(\theta_2)| \le L_i d(\theta_1, \theta_2)$ for all $\theta_1, \theta_2 \in S$;
(ii) $|f_i(\theta) - f_i(\theta_0)| \le b_i$ for all $\theta \in S$;
(iii) $J_\delta := \int_0^\delta \rho N(y, S, d)\, dy < \infty$.
Then

$P\{\hat\theta_n \in S\} \le 4 c_2^2 \sigma^2 \bigl(B \rho N(\delta, S, d) + L J_\delta\bigr)^2 \big/ R^4,$

where $R := \inf\{\kappa_n(\theta) : \theta \in S\}$, $L^2 := \sum_i L_i^2$, and $B^2 := \sum_i b_i^2$.

The theorem becomes more versatile in its application if we partition $S$ into a countable union of subsets $S_k$, each equipped with its own pseudometric and Lipschitz constants. We then have $P\{\hat\theta_n \in \cup_k S_k\}$ smaller than a sum over $k$ of bounds analogous to those in the theorem. As shown in Section 4, this method works well for the Wu example if we take $S_k = \{\theta : R_k \le \kappa_n(\theta) < R_{k+1}\}$, for an $\{R_k\}$ sequence increasing geometrically.

A similar appeal to Corollary 1, with the $g_i(\theta)$ as partial derivatives of the $f_i(\theta)$ functions, gives us enough local control over $Z_n$ to go beyond consistency. To accommodate the application in Section 4, we change notation slightly by working with a triangular array: for each $n$,

$y_{in} = f_{in}(\theta_0) + u_{in}$,  for $i = 1, 2, \dots, n$,

where the $\{u_{in} : i = 1, \dots, n\}$ are unobserved independent random variables with mean zero and variance bounded by $\sigma^2$.

Theorem 4. Suppose $\hat\theta_n \to \theta_0$ in probability, with $\theta_0$ an interior point of $\Theta$, a subset of $\mathbb{R}^p$. Suppose also:

(i) Each $f_{in}$ is continuously differentiable in a neighborhood $\mathcal{N}$ of $\theta_0$, with derivatives $D_{in}(\theta) = \partial f_{in}(\theta)/\partial\theta$.
(ii) $\gamma_n^2 := \sum_{i \le n} |D_{in}(\theta_0)|^2 \to \infty$ as $n \to \infty$.
(iii) There are constants $\{M_{in}\}$ with $\sum_{i \le n} M_{in}^2 = O(\gamma_n^2)$ and a metric $d$ on $\mathcal{N}$ for which $|D_{in}(\theta_1) - D_{in}(\theta_2)| \le M_{in} d(\theta_1, \theta_2)$ for $\theta_1, \theta_2 \in \mathcal{N}$.
(iv) The smallest eigenvalue of the matrix $V_n := \gamma_n^{-2} \sum_{i \le n} D_{in}(\theta_0) D_{in}(\theta_0)'$ is bounded away from zero for $n$ large enough.
(v) $\int_0^1 \rho N(y, \mathcal{N}, d)\, dy < \infty$.
(vi) $d(\theta, \theta_0) \to 0$ as $\theta \to \theta_0$.
Then $\kappa_n(\hat\theta_n) = O_p(1)$ and

$\gamma_n(\hat\theta_n - \theta_0) = \sum_{i \le n} \xi_{i,n} u_{in} + o_p(1) = O_p(1),$

where $\xi_{i,n} := \gamma_n^{-1} V_n^{-1} D_{in}(\theta_0)$.

Proof. Let $D$ be the $p \times n$ matrix with $i$th column $D_{in}(\theta_0)$, so that $\gamma_n^2 = \operatorname{trace}(DD')$ and $V_n = \gamma_n^{-2} DD'$. The main idea of the proof is to replace $f(\theta)$ by $f(\theta_0) + D'(\theta - \theta_0)$, thereby approximating $\hat\theta_n$ by the least-squares solution

$\tilde\theta_n := \theta_0 + (DD')^{-1} D u = \operatorname{argmin}_{\theta \in \mathbb{R}^p} \|y - f(\theta_0) - D'(\theta - \theta_0)\|^2.$

To simplify notation, assume with no loss of generality that $f(\theta_0) = 0$ and $\theta_0 = 0$. Also, drop extra $n$ subscripts when the meaning is clear. The assertion of the theorem is that $\hat\theta_n = \tilde\theta_n + o_p(\gamma_n^{-1})$.

Without loss of generality, suppose the smallest eigenvalue of $V_n$ is larger than a fixed constant $c_0^2 > 0$. Then

$\gamma_n^2 = \operatorname{trace}(DD') \ge \sup_{|t|=1} |D't|^2 \ge \inf_{|t|=1} |D't|^2 \ge c_0^2 \gamma_n^2,$

from which it follows that

(10)  $c_0 |t| \le |D't|/\gamma_n \le |t|$  for all $t \in \mathbb{R}^p$.

Similarly, $P|Du|^2 = \operatorname{trace}\bigl(D\, P(uu')\, D'\bigr) \le \sigma^2 \gamma_n^2$, implying that $|Du| = O_p(\gamma_n)$ and $\tilde\theta_n = \gamma_n^{-2} V_n^{-1} D u = O_p(\gamma_n^{-1})$. In particular, $P\{\tilde\theta_n \in \mathcal{N}\} \to 1$, because $\theta_0$ is an interior point of $\Theta$. Note also that $P\bigl|\sum_{i \le n} \xi_i u_i\bigr|^2 \le \sigma^2 \operatorname{trace}\bigl(\sum_{i \le n} \xi_i \xi_i'\bigr) = \sigma^2 \operatorname{trace}(V_n^{-1}) < \infty$. Consequently $\sum_{i \le n} \xi_i u_i = O_p(1)$.

From the assumed consistency, we know that there is a sequence of balls $\mathcal{N}_n \subseteq \mathcal{N}$ that shrink to $\{0\}$ for which $P\{\hat\theta_n \in \mathcal{N}_n\} \to 1$. From (vi) and (v), it follows that both

(11)  $r_n := \sup\{d(\theta, 0) : \theta \in \mathcal{N}_n\}$  and  $J_{r_n} := \int_0^{r_n} \rho N(y, \mathcal{N}, d)\, dy$

converge to zero as $n \to \infty$.

The $n \times 1$ remainder vector $R(\theta) := f(\theta) - D'\theta$ has $i$th component

$R_i(\theta) = f_i(\theta) - D_i(0)'\theta = \int_0^1 \bigl(D_i(t\theta) - D_i(0)\bigr)'\theta\, dt.$

Uniformly in the neighborhood $\mathcal{N}_n$ we have

$|R(\theta)| \le |\theta| \Bigl(\sum_{i \le n} M_{in}^2\Bigr)^{1/2} r_n = o\bigl(|\theta|\, \gamma_n\bigr),$

which, together with the upper bound from inequality (10), implies

(12)  $|f(\theta)|^2 = |D'\theta|^2 + o(\gamma_n^2 |\theta|^2) = O(\gamma_n^2 |\theta|^2)$  as $\theta \to 0$.

In the neighborhood $\mathcal{N}_n$, via (11), we also have

$|u' R(\theta)| \le |\theta| \sup_{s \in \mathcal{N}_n} \Bigl|\sum_{i \le n} u_i \bigl(D_i(s) - D_i(0)\bigr)\Bigr|.$

From Corollary 1 with $g_i(\theta) = D_i(\theta) - D_i(0)$, deduce that

$P\sup_{s \in \mathcal{N}_n} \Bigl|\sum_{i \le n} u_i \bigl(D_i(s) - D_i(0)\bigr)\Bigr|^2 \le c_2^2 \sigma^2 J_{r_n}^2 \sum_{i \le n} M_{in}^2 = o(\gamma_n^2),$

which implies

(13)  $u' R(\theta) = o_p(\gamma_n |\theta|)$  uniformly for $\theta \in \mathcal{N}_n$.

Approximations (12) and (13) give us uniform approximations for the criterion functions in the shrinking neighborhoods $\mathcal{N}_n$:

(14)  $G_n(\theta) = \|u - f(\theta)\|^2 - \|u\|^2 = -2 u'f(\theta) + \|f(\theta)\|^2 = -2 u'D'\theta + \|D'\theta\|^2 + o_p(\gamma_n|\theta|) + o_p(\gamma_n^2|\theta|^2) = \|u - D'\tilde\theta_n\|^2 - \|u\|^2 + \|D'(\theta - \tilde\theta_n)\|^2 + o_p(\gamma_n|\theta|) + o_p(\gamma_n^2|\theta|^2).$

The uniform smallness of the remainder terms lets us approximate $G_n$ at random points that are known to lie in $\mathcal{N}_n$. The rest of the argument is similar to that of Chernoff (1954). When $\hat\theta_n \in \mathcal{N}_n$ we have $G_n(\hat\theta_n) \le G_n(0)$, implying

$\|D'(\hat\theta_n - \tilde\theta_n)\|^2 + o_p(\gamma_n|\hat\theta_n|) + o_p(\gamma_n^2|\hat\theta_n|^2) \le \|D'\tilde\theta_n\|^2.$

Invoke (10) again, simplifying the last approximation to

$c_0^2\bigl(\gamma_n|\hat\theta_n| - \gamma_n|\tilde\theta_n|\bigr)^2 \le O_p(1) + o_p\bigl(\gamma_n|\hat\theta_n| + \gamma_n^2|\hat\theta_n|^2\bigr).$

It follows that $|\hat\theta_n| = O_p(\gamma_n^{-1})$ and, via (12), $\kappa_n(\hat\theta_n) = |f(\hat\theta_n)| = O_p(1)$. We may also assume that $\mathcal{N}_n$ shrinks slowly enough to ensure that $P\{\tilde\theta_n \in \mathcal{N}_n\} \to 1$. When both $\hat\theta_n$ and $\tilde\theta_n$ lie in $\mathcal{N}_n$, the inequality $G_n(\hat\theta_n) \le G_n(\tilde\theta_n)$ and approximation (14) give

$\|D'(\hat\theta_n - \tilde\theta_n)\|^2 + o_p(1) \le o_p(1).$

It follows that $\hat\theta_n = \tilde\theta_n + o_p(\gamma_n^{-1})$.

Remark. If the errors are iid and $\max_i |\xi_{i,n}| = o(1)$ then the distribution of $\sum_{i \le n} \xi_{i,n} u_{in}$ is asymptotically $N(0, \sigma^2 V_n^{-1})$.
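The linearization step at the heart of the proof, replacing $f(\theta)$ by $f(\theta_0) + D'(\theta - \theta_0)$ and comparing the LSE with the linear least-squares solution $\tilde\theta_n = \theta_0 + (DD')^{-1}Du$, is easy to examine numerically for a smooth toy model. The sketch below is purely illustrative: the exponential-decay model, the design, the error law, and the use of scipy are assumptions made for the example and are not taken from the paper.

```python
# Illustrative check of the linearization in the proof of Theorem 4: compare the
# nonlinear LSE theta_hat with the linear least-squares solution
# theta_tilde = theta_0 + (D D')^{-1} D u.  Model and constants are arbitrary choices.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
n = 2000
x = np.linspace(0.0, 3.0, n)
theta0 = np.array([2.0, 1.5])

def f(theta):
    return theta[0] * np.exp(-theta[1] * x)

u = 0.3 * rng.standard_normal(n)
y = f(theta0) + u

theta_hat = least_squares(lambda th: y - f(th), x0=[1.0, 1.0]).x   # nonlinear LSE

# D is the p x n matrix whose i-th column is the derivative D_i(theta_0).
D = np.vstack([np.exp(-theta0[1] * x),                    # df_i/dtheta_1 at theta_0
               -theta0[0] * x * np.exp(-theta0[1] * x)])  # df_i/dtheta_2 at theta_0
theta_tilde = theta0 + np.linalg.solve(D @ D.T, D @ u)

print("theta_hat   =", theta_hat)
print("theta_tilde =", theta_tilde)   # should agree closely, reflecting the o_p(1/gamma_n) error
```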

4. ANALYSIS OF MODEL (3): WU'S EXAMPLE

The results in this section illustrate the work of our limit theorems in a particular case where Wu's method fails. We prove both consistency and a central limit theorem for the model (3) with $\theta_0 = (\lambda_0, 1/2)$. In fact, without loss of generality, $\lambda_0 = 1$. As before, let $l_n = \log n$. Remember $\theta = (\lambda, \mu)$ with $\lambda \in \mathbb{R}$ and $0 \le \mu \le C_\mu$ for a finite constant $C_\mu$ greater than $1/2$, which ensures that $\theta_0 = (1, 1/2)$ is an interior point of the parameter space. Taking $C_\mu = 1/2$ would complicate the central limit theorem only slightly.

The behavior of $\hat\theta_n$ is determined by the behavior of the function

$G_n(\gamma) := \sum_{i \le n} i^{-1+\gamma}$  for $\gamma \le 1$,

or its standardized version

$g_n(\beta) := G_n(\beta/l_n)/G_n(0) = \sum_{i \le n} \bigl(i^{-1}/G_n(0)\bigr)\exp(\beta l_i/l_n),$

which is the moment generating function of the probability distribution that puts mass $i^{-1}/G_n(0)$ at $l_i/l_n$, for $i = 1, \dots, n$. For large $n$, the function $g_n$ is well approximated by the increasing, nonnegative function

$g(\beta) = (e^\beta - 1)/\beta$ for $\beta \neq 0$, with $g(0) = 1$,

the moment generating function of the uniform distribution on $(0, 1)$. More precisely, comparison of the sum with the integral $\int_1^n x^{-1+\gamma}\, dx$ gives

(15)  $G_n(\gamma) = l_n\, g(\gamma l_n) + r_n(\gamma)$  with $0 \le r_n(\gamma) \le 1$ for $\gamma \le 1$.

The distributions corresponding to both $g_n$ and $g$ are concentrated on $[0, 1]$. Both functions have the properties described in the following lemma.

Lemma 1. Suppose $h(\gamma) = P\exp(\gamma x)$, the moment generating function of a probability distribution concentrated on $[0, 1]$. Then
(i) $\log h$ is convex;
(ii) $h(\gamma)^2/h(2\gamma)$ is unimodal: increasing for $\gamma < 0$, decreasing for $\gamma > 0$, achieving its maximum value of $1$ at $\gamma = 0$;
(iii) $0 \le h'(\gamma) \le h(\gamma)$.

Proof. Assertion (i) is just the well known fact that the logarithm of a moment generating function is convex. Thus $h'/h$, the derivative of $\log h$, is an increasing function, which implies (ii) because

$\frac{d}{d\gamma} \log\frac{h(\gamma)^2}{h(2\gamma)} = 2\Bigl(\frac{h'(\gamma)}{h(\gamma)} - \frac{h'(2\gamma)}{h(2\gamma)}\Bigr).$

Property (iii) comes from the representation $h'(\gamma) = P\bigl(x e^{\gamma x}\bigr)$.

Remark. Direct calculation shows that $g(\gamma)^2/g(2\gamma)$ is a symmetric function of $\gamma$.

Reparametrize by putting $\beta = (1 - 2\mu) l_n$, with $(1 - 2C_\mu) l_n \le \beta \le l_n$, and $\alpha = \lambda\sqrt{G_n(\beta/l_n)}$. Notice that $|f(\theta)| = |\alpha|$ and that $\theta_0$ corresponds to $\alpha_0 = \sqrt{G_n(0)} \approx \sqrt{l_n}$ and $\beta_0 = 0$. Also

$f_i(\theta) = \alpha\, \nu_i(\beta/l_n)$  where $\nu_i(\gamma) := i^{-1/2}\exp(\gamma l_i/2)\big/\sqrt{G_n(\gamma)},$

and

(16)  $\kappa_n(\theta)^2 = G_n(0)\bigl(\lambda^2 g_n(\beta) - 2\lambda g_n(\beta/2) + 1\bigr).$

We define $\bar\nu_i := \sup_{\gamma \le 1} \nu_i(\gamma)$.

Lemma 2. For all $(\alpha, \beta)$ corresponding to $\theta = (\lambda, \mu) \in \mathbb{R} \times [0, C_\mu]$:
(i) $\kappa_n(\theta) - \sqrt{G_n(0)} \le |\alpha| \le \kappa_n(\theta) + \sqrt{G_n(0)}$;
(ii) $\sum_{i \le n} \bar\nu_i^2 = O(\log\log n)$;
(iii) $|d\nu_i(\beta/l_n)/d\beta| \le \tfrac12 \nu_i(\beta/l_n)$;
(iv) $|f_i(\alpha_1, \beta_1) - f_i(\alpha_2, \beta_2)| \le \bigl(|\alpha_1 - \alpha_2| + \tfrac12 |\alpha_2|\,|\beta_1 - \beta_2|\bigr)\bar\nu_i$;
(v) $|f_i(\theta) - f_i(\theta_0)| \le i^{-1/2} + |\alpha|\,\bar\nu_i$.

Proof. Inequalities (i) and (v) follow from the triangle inequality.

For inequality (ii), first note that $\bar\nu_1 \le 1$. For $i \ge 2$, separate out contributions from three ranges:

$\bar\nu_i^2 = \max\Bigl(\sup_{1/l_n \le \gamma \le 1} \nu_i(\gamma)^2,\ \sup_{|\gamma| < 1/l_n} \nu_i(\gamma)^2,\ \sup_{\gamma \le -1/l_n} \nu_i(\gamma)^2\Bigr).$

For $\gamma \ge 1/l_n$, invoke (15) to get a tractable upper bound:

$\nu_i(\gamma)^2 = \frac{i^{-1}\exp(\gamma l_i)}{G_n(\gamma)} \le \frac{i^{-1}\exp(\gamma l_i)}{l_n\, g(\gamma l_n)} = \frac{i^{-1}\gamma \exp(\gamma l_i)}{\exp(\gamma l_n) - 1} \le \frac{i^{-1}\exp\bigl(\log\gamma + \gamma\log(i/n)\bigr)}{1 - e^{-1}}.$

The last expression achieves its maximum over $[1/l_n, 1]$ at

$\gamma_0 := 1/\log(n/i)$ if $1 \le i \le n/e$, and $\gamma_0 := 1$ if $n/e \le i \le n$,

which gives

(17)  $\sup_{1/l_n \le \gamma \le 1} \nu_i(\gamma)^2 \le \frac{e^{-1}}{1 - e^{-1}}\, n^{-1} H\Bigl(\frac{i \wedge (n/e)}{n}\Bigr),$  where $H(x) := \frac{1}{x\log(1/x)}$.

Similarly, if $-1 < \gamma l_n < 1$,

$\nu_i(\gamma)^2 \le \frac{\exp(\gamma l_i)}{i\, l_n\, g(\gamma l_n)} \le \frac{\exp(l_i/l_n)}{i\, l_n\, g(-1)} \le \frac{e}{g(-1)\, i\, l_n}.$

The last term is smaller than a constant multiple of the bound from (17). Finally, if $\gamma = -\delta \le -1/l_n$ and $i \ge 2$ then

$\nu_i(\gamma)^2 \le \frac{i^{-1}\exp(-\delta l_i)\,\delta}{1 - \exp(-\delta l_n)} \le \frac{i^{-1}\exp(\log\delta - \delta l_i)}{1 - e^{-1}} \le \frac{e^{-1}}{(1 - e^{-1})\, i\, l_i}.$

In summary, for some universal constant $C$,

$\bar\nu_i^2 \le C\, \max\Bigl(n^{-1} H\Bigl(\frac{i \wedge (n/e)}{n}\Bigr),\ \frac{1}{i\, l_i}\Bigr)$  for $2 \le i \le n$.

Bounding sums by integrals we thus have

$C^{-1}\sum_{i=2}^n \bar\nu_i^2 \le \int_{1/n}^{1/e} H(x)\, dx + H(1/e) + \int_2^n \frac{dx}{x\log x} = O(\log\log n).$

For (iii) note that

$2\frac{d}{d\beta}\nu_i(\beta/l_n) = 2\frac{d}{d\beta}\Bigl[\frac{i^{-1/2}\exp\bigl(\tfrac12\beta l_i/l_n\bigr)}{\sqrt{G_n(0)\, g_n(\beta)}}\Bigr] = \Bigl(\frac{l_i}{l_n} - \frac{g_n'(\beta)}{g_n(\beta)}\Bigr)\nu_i(\beta/l_n),$

which is bounded in absolute value by $\nu_i(\beta/l_n)$ because $0 \le g_n'(\beta) \le g_n(\beta)$ and $0 \le l_i/l_n \le 1$.

For (iv),

$|f_i(\alpha_1, \beta_1) - f_i(\alpha_2, \beta_2)| \le |\alpha_1 - \alpha_2|\,\nu_i(\beta_1/l_n) + |\alpha_2|\,\bigl|\nu_i(\beta_1/l_n) - \nu_i(\beta_2/l_n)\bigr| \le |\alpha_1 - \alpha_2|\,\bar\nu_i + \tfrac12|\alpha_2|\,|\beta_1 - \beta_2|\,\bar\nu_i,$

the bound for the second term coming from the mean-value theorem and (iii).

Lemma 3. For $\epsilon > 0$, let $N_\epsilon = \{\theta : \max(|\lambda - 1|, |\beta|) \le \epsilon\}$. If $\epsilon$ is small enough, there exists a constant $C_\epsilon > 0$ such that $\inf\{\kappa_n(\theta) : \theta \notin N_\epsilon\} \ge C_\epsilon \sqrt{l_n}$ when $n$ is large enough.

Proof. Suppose $|\beta| \ge \epsilon$. Remember that $G_n(0) \approx l_n$. Minimize over $\lambda$ the lower bound (16) for $\kappa_n(\theta)^2$ by choosing $\lambda = g_n(\beta/2)/g_n(\beta)$, then invoke Lemma 1(ii):

$\frac{\kappa_n(\theta)^2}{G_n(0)} \ge 1 - \frac{g_n(\beta/2)^2}{g_n(\beta)} \ge 1 - \max\Bigl(\frac{g_n(\epsilon/2)^2}{g_n(\epsilon)},\ \frac{g_n(-\epsilon/2)^2}{g_n(-\epsilon)}\Bigr) \to 1 - \frac{g(\epsilon/2)^2}{g(\epsilon)} > 0.$

If $|\beta| \le \epsilon$ and $\epsilon$ is small enough to make $(1 - \epsilon)e^{\epsilon/2} < 1 < (1 + \epsilon)e^{-\epsilon/2}$, use

$\kappa_n(\theta)^2 = \sum_{i \le n} i^{-1}\bigl(\lambda\exp(\beta l_i/(2 l_n)) - 1\bigr)^2.$

If $\lambda \ge 1 + \epsilon$, bound each summand from below by $i^{-1}\bigl((1 + \epsilon)e^{-\epsilon/2} - 1\bigr)^2$. If $\lambda \le 1 - \epsilon$, bound each summand from below by $i^{-1}\bigl(1 - (1 - \epsilon)e^{\epsilon/2}\bigr)^2$. In either case $\kappa_n(\theta)^2$ is at least a constant multiple of $G_n(0) \approx l_n$.
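Before the consistency argument, it may be useful to see approximation (15) and the closeness of $g_n$ to $g$ numerically. The sketch below is only an illustration; the sample size is an arbitrary choice.

```python
# Illustrative numerical check of approximation (15):
#   G_n(gamma) = l_n * g(gamma * l_n) + r_n(gamma)   with 0 <= r_n(gamma) <= 1 for gamma <= 1,
# and of the closeness of g_n(beta) = G_n(beta/l_n)/G_n(0) to g(beta) = (e^beta - 1)/beta.
import numpy as np

n = 10_000                      # arbitrary illustrative sample size
i = np.arange(1, n + 1)
l_n = np.log(n)

def G_n(gamma):
    return np.sum(i ** (-1.0 + gamma))

def g(beta):
    return 1.0 if beta == 0 else (np.exp(beta) - 1.0) / beta

for gamma in (-0.8, -0.2, 0.0, 0.3, 0.9):
    r = G_n(gamma) - l_n * g(gamma * l_n)
    print(f"gamma = {gamma:+.1f}   remainder r_n(gamma) = {r:.4f}")   # stays in [0, 1]

for beta in (-2.0, -0.5, 0.5, 2.0):
    print(f"beta = {beta:+.1f}   g_n(beta) = {G_n(beta / l_n) / G_n(0.0):.4f}   g(beta) = {g(beta):.4f}")
```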

16 NONLINEAR LEAST-SQUARES ESTIMATION 16 The rectangle { α K R, β cl n } can be partitioned into Oy 1 l n /y subrectangles of d R -diameter at most y. Thus Ny, S R,d R C 0 l n /y 2 for a constant C 0 that depends only on C µ, which gives 1 0 ρ Ny, S R,d R dy = O Ln. Apply Theorem 3 with δ =1to conclude that P{ θ n S R } C 1 K 2 RL 2 n/r 4 C 2 R 2 + l n L 2 n/r 4. Put R = C 3 2 k l n L 2 n 1/4 then sum over k to deduce that P{κ n θ n C 3 l n L 2 n 1/4 } ɛ eventually if the constant C 3 is large enough. That is κ n θ n =O p l n L 2 n 1/4 and, via Lemma 3, λ n 1 = o p 1 and 2l n µ n µ 0 = β = o p Central limit theorem. This time work with the λ, β reparametrization, with D i λ, β = fi λ,β λ f i λ, β, f iλ,β β = λi 1/2+β/2ln = 1/λ, l i /2l n f i λ, β and θ 0 =λ 0,β 0 =1, 0. Take d as the usual two-dimensional Euclidean distance in the λ, β space. For simplicity of notation, we omit some n subscripts, even though the relationship between θ and λ, β changes with n. We have just shown that the LSE λ n, β n is consistent. Comparison of sums with analogous integrals gives the approximations 18 i n i 1 l p 1 i = l p n/p + r p with r p 1 for p =0, 1, 2,... In consequence, γ 2 n = i n D iλ 0,β 0 2 = i n i 1 1+l 2 i /4l 2 n = l n + O1

Moreover,

$V_n = \gamma_n^{-2}\sum_{i \le n} i^{-1}\begin{pmatrix}1 & l_i/(2 l_n)\\ l_i/(2 l_n) & l_i^2/(4 l_n^2)\end{pmatrix},$

which by (18) converges, with error of order $O(1/l_n)$, to a positive multiple of the matrix

$V := \begin{pmatrix}1 & 1/4\\ 1/4 & 1/12\end{pmatrix}.$

The smaller eigenvalue of $V_n$ therefore converges to a strictly positive limit, because $V$ is positive definite.

Within the neighborhood $N_\epsilon := \{\max(|\lambda - 1|, |\beta|) \le \epsilon\}$, for a fixed $\epsilon \le 1/2$, both $|f_i(\lambda, \beta)|$ and $|D_i(\lambda, \beta)|$ are bounded by a multiple of $i^{-1/2}$. Thus

$|D_i(\theta_1) - D_i(\theta_2)| \le |\lambda_1^{-1} - \lambda_2^{-1}|\,|f_i(\theta_1)| + 3\,|f_i(\theta_1) - f_i(\theta_2)| \le C_\epsilon\, i^{-1/2}\, d(\theta_1, \theta_2).$

That is, we may take $M_i$ as a multiple of $i^{-1/2}$, which gives $\sum_{i \le n} M_i^2 = O(l_n)$. All the conditions of Theorem 4 are satisfied. We have

$\sqrt{l_n}\bigl(\hat\lambda_n - 1,\ \hat\beta_n\bigr) = \sum_{i \le n} u_i\, i^{-1/2} l_n^{-1/2}\bigl(1,\ l_i/(2 l_n)\bigr)V^{-1} + o_p(1).$

REFERENCES

Chernoff, H. (1954). On the distribution of the likelihood ratio. Annals of Mathematical Statistics 25.
Jennrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics 40.
Lai, T. L. (1994). Asymptotic properties of nonlinear least-squares estimates in stochastic regression models. Annals of Statistics 22.
Pollard, D. (1989). Asymptotics via empirical processes (with discussion). Statistical Science 4.
Skouras, K. (2000). Strong consistency in nonlinear stochastic regression models. Annals of Statistics 28.
Van de Geer, S. (1990). Estimating a regression function. Annals of Statistics 18.

Van de Geer, S. and M. Wegkamp (1996). Consistency for the least squares estimator in nonparametric regression. Annals of Statistics 24.
van der Vaart, A. W. and J. A. Wellner (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag.
Wu, C.-F. (1981). Asymptotic theory of nonlinear least squares estimation. Annals of Statistics 9.

STATISTICS DEPARTMENT, YALE UNIVERSITY, NEW HAVEN, CT
