ASYMPTOTIC PROPERTIES OF PREDICTIVE RECURSION: ROBUSTNESS AND RATE OF CONVERGENCE


By Ryan Martin and Surya T. Tokdar
Indiana University-Purdue University Indianapolis and Duke University

Abstract. Here we explore general asymptotic properties of Predictive Recursion (PR) for nonparametric estimation of mixing distributions. Under fairly mild conditions, we prove that, when the mixture model is mis-specified, the estimated mixture converges almost surely in total variation to the mixture that minimizes the Kullback-Leibler divergence, and a bound on the (Hellinger contrast) rate of convergence is obtained. Moreover, when the model is identifiable, almost sure weak convergence of the mixing distribution estimate follows. PR assumes that the support of the mixing distribution is known. To remove this requirement, we propose a generalization that incorporates a sequence of supports, increasing with the sample size, that combines the efficiency of PR with the flexibility of mixture sieves. Under mild conditions, we obtain a bound on the rate of convergence of these new estimates.

AMS 2000 subject classifications: Primary 62G20; secondary 62G05, 62G07, 62G35.
Keywords and phrases: density estimation, Hellinger contrast, Kullback-Leibler projection, mixture models, recursive estimation.

1. Introduction. Despite a well-developed theory and numerous applications of mixture models, estimation of a mixing distribution remains a challenging statistical problem. However, some recent progress has been made through a computationally efficient nonparametric estimate due to Newton [12]; see also Newton, et al. [13] and Newton and Zhang [14]. A mixture model views data $(X_1, \ldots, X_n) \in \mathcal{X}^n$ as independent observations from a density $m(x)$ of the form

(1.1)  $m_f(x) = \int_\Theta p(x \mid \theta)\, f(\theta)\, d\mu(\theta), \quad f \in \mathcal{F},$

where $\mathcal{F} = \mathcal{F}(\Theta, \mu)$ is the class of densities dominated by a $\sigma$-finite measure $\mu$ on a parameter space $\Theta$, and $\mathcal{P} = \{p(\cdot \mid \theta) : \theta \in \Theta\}$ is a parametric family of densities on $\mathcal{X}$, dominated by a $\sigma$-finite measure $\nu$. More succinctly, a mixture model assumes $m \in \mathcal{M}_\Theta := \{m_f : f \in \mathcal{F}\}$, the convex hull of $\mathcal{P}$. To estimate $f$ from the data $X_1, \ldots, X_n$, Newton [12] proposed the following n-step recursive algorithm, called Predictive Recursion (PR):

Algorithm PR. Choose an initial density $f_0$ on $\Theta$ and a sequence of weights $\{w_i : i \ge 1\} \subset (0, 1)$. For $i = 1, \ldots, n$, compute

(1.2)  $f_i(\theta) = (1 - w_i)\, f_{i-1}(\theta) + w_i\, \dfrac{p(X_i \mid \theta)\, f_{i-1}(\theta)}{\int_\Theta p(X_i \mid \theta')\, f_{i-1}(\theta')\, d\mu(\theta')}, \quad \theta \in \Theta,$

and produce $f_n$ as the final estimate of $f$.

Until recently, very little was known about the large-sample behavior of the estimates $f_n$ and $m_n$. Ghosh and Tokdar [5] used a novel martingale argument to prove, when $m = m_f \in \mathcal{M}_\Theta$ for finite $\Theta$, that $f_n \to f$ almost surely. Martin and Ghosh [11] proved a slightly stronger consistency theorem via a stochastic approximation representation of $f_n$. Most recently, Tokdar, Martin and Ghosh [18] (henceforth, TMG) handled the case of a more general parameter space $\Theta$ by extending the martingale argument to the $\mathcal{X}$-space, proving that the mixture density estimate

(1.3)  $m_n(x) := m_{f_n}(x) = \int_\Theta p(x \mid \theta)\, f_n(\theta)\, d\mu(\theta), \quad x \in \mathcal{X},$

converges almost surely to $m_f(x)$ in the $L_1$ topology. From this, consistency of $f_n$ in the weak topology on $\Theta$ is obtained.

In this paper, we further extend the martingale-based arguments of TMG, establishing two important new results. First, in the more general context, where the mixture model $\mathcal{M}_\Theta$ need not contain the true density $m$, we show that the estimated mixture $m_n$ in (1.3) is asymptotically robust in the sense that it converges almost surely in the total variation (or Hellinger) topology to the mixture $m_{f^*} \in \mathcal{M}_\Theta$ that minimizes the Kullback-Leibler (KL) divergence $K(m, m_\varphi) = \int \log(m/m_\varphi)\, m\, d\nu$. When the mixing density is identifiable, we also obtain weak convergence of $f_n$.

Our second result is a bound on the rate of convergence for $m_n$. For a unified treatment of the well- and mis-specified cases, we consider the Hellinger contrast

(1.4)  $\rho(m_n, m_{f^*}) = \Bigl\{ \tfrac{1}{2} \int \bigl( \sqrt{m_n / m_{f^*}} - 1 \bigr)^2 m\, d\nu \Bigr\}^{1/2},$

where $m_{f^*}$ is the mixture that minimizes $K(m, m_\varphi)$. The Hellinger contrast has been used previously to study asymptotics of maximum likelihood and Bayes estimates under model mis-specification; see, for example, Patilea [15] and Kleijn and van der Vaart [6]. In Section 3, we show that if $\Theta$ is compact, then $\rho(m_n, m_{f^*}) \to 0$ almost surely at a rate faster than $a_n^{-1/2}$, where $a_n = \sum_{i=1}^n w_i$. This establishes a direct connection between the choice of the weights $w_i$ and the performance of the resulting PR estimate.
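Before turning to the theory, it may help to see the recursion concretely. The following is a minimal numerical sketch of Algorithm PR, not the authors' implementation, under illustrative assumptions of our own: Gaussian location mixands $p(x \mid \theta) = \mathrm{N}(x; \theta, 1)$, a compact $\Theta$ discretized on an equally spaced grid (so the integral in (1.2) becomes a Riemann sum), and weights $w_i \propto i^{-\gamma}$ with $\gamma$ in the range used in Section 3. All function and variable names here are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def predictive_recursion(x, theta_grid, f0=None, gamma=0.75):
    """Minimal sketch of Algorithm PR, eq. (1.2), for Gaussian location mixands
    p(x | theta) = N(x; theta, 1), with the mixing density represented on an
    equally spaced grid over a compact Theta (so d(mu) is a Riemann sum)."""
    dtheta = theta_grid[1] - theta_grid[0]
    if f0 is None:
        f = np.full(theta_grid.shape, 1.0 / (theta_grid[-1] - theta_grid[0]))  # uniform f_0
    else:
        f = f0.astype(float).copy()
    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma)                      # weights w_i ~ i^{-gamma}, gamma in (2/3, 1]
        lik = norm.pdf(xi, loc=theta_grid, scale=1.0)  # p(X_i | theta) on the grid
        denom = np.sum(lik * f) * dtheta               # denominator of (1.2): m_{i-1}(X_i)
        f = (1.0 - w) * f + w * lik * f / denom        # PR update (1.2)
    return f

def mixture_density(xs, f, theta_grid):
    """Plug-in mixture estimate m_n(x) of eq. (1.3)."""
    dtheta = theta_grid[1] - theta_grid[0]
    return np.array([np.sum(norm.pdf(x, loc=theta_grid, scale=1.0) * f) * dtheta for x in xs])

# Toy usage: data from a two-component normal mixture, Theta = [-5, 5].
rng = np.random.default_rng(0)
data = np.where(rng.random(500) < 0.3, rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500))
grid = np.linspace(-5.0, 5.0, 201)
f_n = predictive_recursion(data, grid)
m_n = mixture_density(np.linspace(-6.0, 6.0, 61), f_n, grid)
```

Each observation is processed exactly once, so the cost is linear in $n$; this computational efficiency is the main practical appeal of PR. With this picture in mind, we return to the convergence rate just described.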

Moreover, this bound is derived without using any structural knowledge about the mixands $p(x \mid \theta)$, so it applies to a wide range of such densities, including the Normal, Gamma and Poisson. The conditions on $w_i$ required by this result are satisfied by $w_i \asymp i^{-\gamma}$ for $\gamma \in (2/3, 1]$, leading to $a_n \asymp n^{1-\gamma}$. Consequently, the Hellinger contrast convergence rate of $m_n$ is faster than $n^{-(1-\gamma)/2}$. How this relates to the rate of convergence for $f_n$ remains an open question.

Our nearly $n^{-1/6}$ bound on the convergence rate for $m_n$ matches the results in Genovese and Wasserman [3], derived for the special case of finite Gaussian mixtures, but it falls short of the nearly parametric rates obtained in Li and Barron [8] and Ghosal and van der Vaart [4]. However, Li and Barron's bound involves constants which can be infinite, and Ghosal and van der Vaart's proof seems to rely heavily on the smoothness of the Gaussian mixands. It is quite possible that the PR estimates converge faster than our bounds suggest, but at this moment we are unsure how sharp our bounds are.

A shortcoming of PR is that one needs to specify the compact set $\Theta$ a priori. To remove this requirement, we propose, in Section 4, a generalized PR (GPR) with an increasing sequence of supports $\Theta_i \subseteq \Theta_{i+1}$. We obtain a bound on the rate of convergence for GPR in the special case where $m = m_f \in \mathcal{M}_\Theta$ and $f$ has an unknown compact support.

2. KL projection of m onto $\mathcal{M}_\Theta$. It is quite natural to use the KL divergence to study the large-sample properties of PR. Indeed, Martin and Ghosh [11] remark that, for $\Theta$ finite, $\ell(\varphi) = K(m, m_\varphi)$ is a Lyapunov function for the differential equation that characterizes the asymptotics of PR: roughly, the KL divergence controls the dynamics of PR, driving the estimates toward a stable equilibrium. But when $m \notin \mathcal{M}_\Theta$, this equilibrium cannot be $m$. In such cases, we consider the mixture $m_{f^*}$ that is closest to $m$ in a KL sense; that is,

(2.1)  $K(m, m_{f^*}) = K(m, \mathcal{M}_\Theta) := \inf\{K(m, m_\varphi) : \varphi \in \mathcal{F}\}.$

We call $m_{f^*}$ the KL projection of $m$ onto $\mathcal{M}_\Theta$. Leroux [7], Shyamalkumar [17], Patilea [15], and Kleijn and van der Vaart [6] each use similar ideas. Existence of the KL projection is an important issue, and various results are available; see, for example, Liese and Vajda [9, Chap. 8]. Here we prove a simple result which gives sufficient conditions for the existence of a KL projection in our special case of mixtures. Assume the following:

A1. $\mathcal{F}$ is compact with respect to the weak topology.
A2. $\theta \mapsto p(x \mid \theta)$ is bounded and continuous for $\nu$-almost all $x$.

Lemma 2.1. Under A1-A2, there exists a mixing density $f^* \in \mathcal{F}$ such that $K(m, m_{f^*}) = K(m, \mathcal{M}_\Theta)$.

Proof. Choose any $\varphi \in \mathcal{F}$ and any $\{\varphi_s\} \subset \mathcal{F}$ such that $\varphi_s \to \varphi$ weakly. Then A2 and Scheffé's theorem imply $m_{\varphi_s} \to m_\varphi$ in the $L_1(\nu)$ and, hence, the weak topology. Further,

$\kappa(\varphi) := K(m, m_\varphi) \le \liminf_s K(m, m_{\varphi_s}) = \liminf_s \kappa(\varphi_s),$

where the inequality follows from weak lower semi-continuity of $K(m, \cdot)$; see Liese and Vajda [9]. Consequently, $\kappa(\cdot)$ is weakly lower semi-continuous and, therefore, must attain its infimum on the compact $\mathcal{F}$.

A similar proof may be given based on Lemma 4 of Brown, et al. [2]. Convexity of $\mathcal{M}_\Theta$ and of $K(m, \cdot)$ together imply that the KL projection $m_{f^*}$ in Lemma 2.1 is unique. However, in general, there could be many mixing densities $f^* \in \mathcal{F}$ that generate the KL projection $m_{f^*}$. Identifiability is needed to guarantee uniqueness of the mixing density.

3. Robustness and rate of convergence. We begin with our assumptions and some preliminary lemmas; proofs can be found in the Appendix. Let $\{w_i : i \ge 1\}$ be the user-specified weight sequence in the PR algorithm, and define the sequence of partial sums $a_n = \sum_{i=1}^n w_i$ (with $a_0 \equiv 0$). In addition to A1-A2 in Section 2, assume the following:

A3. $w_n > 0$, $w_n \to 0$, $\sum_n w_n = \infty$, $\sum_n w_n^2 < \infty$, and $\sum_n a_{n-1} w_n^2 < \infty$.
A4. There exists $B < \infty$ such that

$\sup_{\theta_1, \theta_2 \in \Theta} \int \Bigl[ \frac{p(x \mid \theta_1)}{p(x \mid \theta_2)} \Bigr]^2 m(x)\, d\nu(x) < B.$

Condition A3 is satisfied if $w_n \asymp n^{-\gamma}$, for $\gamma \in (2/3, 1]$. The square-integrability condition A4 is the strongest assumption, but it does hold for many common mixands, such as the Normal or Poisson, provided that $\Theta$ is compact and $m$ admits a moment-generating function on $\Theta$. If one is willing to assume that $m \in \mathcal{M}_\Theta$, then A4 can be replaced by a less restrictive condition, depending only on $p(x \mid \theta)$; cf. assumption A5 in TMG.

Our new results are partially based on calculations in TMG, and we begin by recording a few of these for future reference. First, let $R(x)$ be the remainder term of a first-order Taylor approximation of $\log(1+x)$ at $x = 0$; that is, $\log(1 + x) = x - x^2 R(x)$, $x > -1$, where $R(x)$ satisfies

(3.1)  $0 \le R(x) \le \max\{1, (1 + x)^{-2}\}, \quad x > -1.$
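As a side note, the claim above that polynomial weights satisfy A3 can be checked directly; the following worked verification is ours, not part of the paper's argument.

```latex
% Verification (ours) that A3 holds for w_n = n^{-\gamma}, 2/3 < \gamma \le 1:
\[
  \sum_n w_n = \sum_n n^{-\gamma} = \infty, \qquad
  \sum_n w_n^2 = \sum_n n^{-2\gamma} < \infty \quad (\text{since } 2\gamma > 4/3 > 1),
\]
\[
  a_n = \sum_{i=1}^n i^{-\gamma} \asymp
  \begin{cases} n^{1-\gamma}/(1-\gamma), & 2/3 < \gamma < 1, \\ \log n, & \gamma = 1, \end{cases}
  \qquad
  a_{n-1}\, w_n^2 \asymp n^{1-3\gamma} \ (\text{times a } \log n \text{ factor when } \gamma = 1),
\]
\[
  \sum_n n^{1-3\gamma} < \infty \iff 3\gamma - 1 > 1 \iff \gamma > 2/3,
  \qquad \text{and } \sum_n n^{-2}\log n < \infty \text{ when } \gamma = 1.
\]
```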

Next, write the PR estimate $m_n \in \mathcal{M}_\Theta$ in (1.3) as

$m_n(x) = (1 - w_n)\, m_{n-1}(x) + w_n\, h_{n, X_n}(x),$

where

$h_{n,y}(x) = \int_\Theta \frac{p(x \mid \theta)\, p(y \mid \theta)\, f_{n-1}(\theta)}{m_{n-1}(y)}\, d\mu(\theta), \quad x, y \in \mathcal{X}.$

For notational convenience, also define the function

$H_{n,y}(x) = \frac{h_{n,y}(x)}{m_{n-1}(x)} - 1, \quad x, y \in \mathcal{X}.$

Then the KL divergence $K_n = K(m, m_n)$ satisfies

$K_n - K_{n-1} = \int m \log(m_{n-1}/m_n)\, d\nu = -\int m \log(1 + w_n H_{n,X_n})\, d\nu = -w_n \int m H_{n,X_n}\, d\nu + w_n^2 \int m H_{n,X_n}^2\, R(w_n H_{n,X_n})\, d\nu,$

where $R(x)$ satisfies (3.1). Let $\mathscr{A}_{n-1}$ be the $\sigma$-algebra generated by the data sequence $X_1, \ldots, X_{n-1}$. Since $K_{n-1}$ is $\mathscr{A}_{n-1}$-measurable, upon taking conditional expectation with respect to $\mathscr{A}_{n-1}$ we get

(3.2)  $E(K_n \mid \mathscr{A}_{n-1}) - K_{n-1} = -w_n T(f_{n-1}) + w_n^2\, E(Z_n \mid \mathscr{A}_{n-1}),$

where $T(\cdot)$ and $Z_n$ are defined as

(3.3)  $T(\varphi) = \int_\Theta \Bigl\{ \int_{\mathcal{X}} \frac{m(x)\, p(x \mid \theta)}{m_\varphi(x)}\, d\nu(x) - 1 \Bigr\}^2 \varphi(\theta)\, d\mu(\theta), \quad \varphi \in \mathcal{F},$

(3.4)  $Z_n = \int m H_{n,X_n}^2\, R(w_n H_{n,X_n})\, d\nu.$

Note that $T(f_{n-1})$ is exactly $M_n$ defined in TMG. The following properties of $T(\cdot)$ will be critical in the proof of our main result.

Lemma 3.1. (a) $T(\varphi) \ge 0$, with equality iff $K(m, m_\varphi) = K(m, \mathcal{M}_\Theta)$. (b) Under A1, A2 and A4, $T$ is continuous in the weak topology on $\mathcal{F}$.
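For readers following along numerically, the functional $T(\varphi)$ in (3.3) is simple to approximate. The sketch below is ours, not the authors', and reuses the Gaussian-mixand and grid conventions of the earlier PR sketch, with a sample from $m$ (for instance, the observed data) standing in for the $d\nu$ integral; it can serve as a rough diagnostic for how close an iterate is to the KL projection, in the spirit of Lemma 3.1(a).

```python
import numpy as np
from scipy.stats import norm

def T_statistic(phi, theta_grid, x_sample):
    """Grid / Monte Carlo approximation of T(phi) in (3.3), assuming the same
    Gaussian location mixands and grid representation as in the PR sketch above.
    x_sample stands in for draws from the true density m (e.g., the observed data)."""
    dtheta = theta_grid[1] - theta_grid[0]
    lik = norm.pdf(x_sample[:, None], loc=theta_grid[None, :], scale=1.0)  # p(x_j | theta_k)
    m_phi = lik @ phi * dtheta                                             # m_phi(x_j)
    # D(theta; phi) = E_m[ p(X | theta) / m_phi(X) ] - 1, estimated by a sample mean over x_j
    D = (lik / m_phi[:, None]).mean(axis=0) - 1.0
    return np.sum(D ** 2 * phi) * dtheta                                   # T(phi) = integral of D^2 phi d(mu)
```

Evaluated along the PR iterates, for example `T_statistic(f_n, grid, data)`, this quantity should drift toward zero as the recursion approaches the KL projection.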

For some further insight into the relationship between $T(\varphi)$ and the KL divergence $K(m, m_\varphi)$, define

$D(\theta; \varphi) = \int \frac{m(x)\, p(x \mid \theta)}{m_\varphi(x)}\, d\nu(x) - 1$

and notice that $T(\varphi) = \int D^2(\theta; \varphi)\, \varphi(\theta)\, d\mu(\theta)$. Some analysis shows that $D(\theta; \varphi)$ is the Gâteaux derivative of $K(m, \eta)$ at $\eta = m_\varphi$ in the direction of $p(\cdot \mid \theta)$. Now, if $T(\varphi) = 0$, then $D(\theta; \varphi) = 0$ for $\mu$-almost all $\theta$ and, hence, $D(\psi; \varphi) = \int D(\theta; \varphi)\, \psi(\theta)\, d\mu(\theta)$, the Gâteaux derivative of $K(m, \eta)$ at $\eta = m_\varphi$ in the direction of $m_\psi$, is zero for all $\psi \in \mathcal{F}$. The fact that the Gâteaux derivative vanishes in all directions suggests that $m_\varphi$ is a point at which the infimum $K(m, \mathcal{M}_\Theta)$ is attained, and this is exactly the conclusion of Lemma 3.1(a). A similar characterization of the NPMLE is given in Lindsay [10].

According to (3.2) and Lemma 3.1(a), $K_n$ would be a non-negative supermartingale were it not for the term involving $Z_n$. Fortunately, since $Z_n$ is almost surely bounded, its presence causes no major difficulties.

Lemma 3.2. Under A4, $Z_n \le 1 + B$ a.s. for all $n$.

Our last preliminary result of this section gets us closer to a convergence rate for $K(m, m_n)$. Define $K_n^* = K(m, m_n) - K(m, \mathcal{M}_\Theta) \ge 0$.

Lemma 3.3. Under A3-A4, $\sum_n w_n K_n^* < \infty$ a.s.

We are ready to state and prove our main result. The convergence properties advertised in Section 1 are corollaries to Theorem 3.4, which follows.

Theorem 3.4. Under A1-A4, $a_n K_n^* \to 0$ a.s.

Proof. Define two sequences of random variables,

$\delta_n = \sum_{i=n+1}^\infty a_{i-1} w_i^2\, E(Z_i \mid \mathscr{A}_n) \quad \text{and} \quad \varepsilon_n = \sum_{i=n+1}^\infty w_i\, E(K_i^* \mid \mathscr{A}_n),$

and define $\tilde{K}_n = a_n K_n^* + \delta_n + \varepsilon_n \ge 0$. Using (3.2) we find that

$E(\tilde{K}_n \mid \mathscr{A}_{n-1}) - \tilde{K}_{n-1} = E(a_n K_n^* + \delta_n + \varepsilon_n \mid \mathscr{A}_{n-1}) - a_{n-1} K_{n-1}^* - \delta_{n-1} - \varepsilon_{n-1}$
$\quad = -w_n K(m, \mathcal{M}_\Theta) + a_n E(K_n \mid \mathscr{A}_{n-1}) + \sum_{i=n+1}^\infty a_{i-1} w_i^2\, E\bigl[E(Z_i \mid \mathscr{A}_n) \mid \mathscr{A}_{n-1}\bigr] + \sum_{i=n+1}^\infty w_i\, E\bigl[E(K_i^* \mid \mathscr{A}_n) \mid \mathscr{A}_{n-1}\bigr]$
$\qquad - a_{n-1} K_{n-1} - \sum_{i=n}^\infty a_{i-1} w_i^2\, E(Z_i \mid \mathscr{A}_{n-1}) - \sum_{i=n}^\infty w_i\, E(K_i^* \mid \mathscr{A}_{n-1}).$

After some simplification, using the fact that $\mathscr{A}_{n-1} \subset \mathscr{A}_n$, we have

$E(\tilde{K}_n \mid \mathscr{A}_{n-1}) - \tilde{K}_{n-1} \le a_n E(K_n \mid \mathscr{A}_{n-1}) - a_{n-1} K_{n-1} - a_{n-1} w_n^2\, E(Z_n \mid \mathscr{A}_{n-1}) - w_n E(K_n \mid \mathscr{A}_{n-1})$
$\quad = a_{n-1} \bigl[ E(K_n \mid \mathscr{A}_{n-1}) - K_{n-1} - w_n^2\, E(Z_n \mid \mathscr{A}_{n-1}) \bigr]$
$\quad = -a_{n-1} w_n T(f_{n-1}) \le 0.$

Thus, $\tilde{K}_n$ forms a non-negative supermartingale so, by the martingale convergence theorem, there is a limit $\tilde{K}_\infty \ge 0$ with finite expectation. But assumption A3, Lemma 3.2, and the dominated convergence theorem ensure that $\delta_n$ and $\varepsilon_n$ both converge a.s. to zero; therefore, $a_n K_n^* \to \tilde{K}_\infty$ a.s.

It remains to show that $\tilde{K}_\infty = 0$ a.s. Suppose, on the other hand, that $\tilde{K}_\infty > 0$ with positive probability. From the previous display we have

(3.5)  $\tilde{K}_{i-1} - E(\tilde{K}_i \mid \mathscr{A}_{i-1}) \ge a_{i-1} w_i T(f_{i-1}), \quad i \ge 2.$

Fix any two integers $N$ and $n$. Taking conditional expectation with respect to $\mathscr{A}_{N-1}$ in (3.5) and summing gives

(3.6)  $\sum_{i=N}^{N+n} a_{i-1} w_i\, E[T(f_{i-1}) \mid \mathscr{A}_{N-1}] \le \sum_{i=N}^{N+n} E\bigl[\tilde{K}_{i-1} - E(\tilde{K}_i \mid \mathscr{A}_{i-1}) \mid \mathscr{A}_{N-1}\bigr]$
$\quad = \sum_{i=N}^{N+n} \bigl\{ E(\tilde{K}_{i-1} \mid \mathscr{A}_{N-1}) - E\bigl[E(\tilde{K}_i \mid \mathscr{A}_{i-1}) \mid \mathscr{A}_{N-1}\bigr] \bigr\}$
$\quad = \sum_{i=N}^{N+n} E(\tilde{K}_{i-1} - \tilde{K}_i \mid \mathscr{A}_{N-1}) = \tilde{K}_{N-1} - E(\tilde{K}_{N+n} \mid \mathscr{A}_{N-1}) \le \tilde{K}_{N-1}.$

We are assuming $\tilde{K}_\infty > 0$, so there exists $\varepsilon > 0$ such that $K(m, m_n) > K(m, \mathcal{M}_\Theta) + \varepsilon$ for all but perhaps finitely many $n$. Recall the proof of Lemma 2.1, which shows that the mapping $\kappa(\varphi) = K(m, m_\varphi)$ is lower semi-continuous with respect to the weak topology on $\mathcal{F}$. Consequently,

$G_\varepsilon := \{\varphi \in \mathcal{F} : \kappa(\varphi) > K(m, \mathcal{M}_\Theta) + \varepsilon\} \subset \mathcal{F}$

is a weakly open set, and its closure $\bar{G}_\varepsilon$ is compact by A1. Since $f_n \in G_\varepsilon$ for all but finitely many $n$, Lemma 3.1 implies that $T(f_n)$ is bounded away from zero. This and A3 imply that the left-most sum in (3.6) goes to $\infty$ with positive probability as $n \to \infty$. But the upper bound $\tilde{K}_{N-1}$ is almost surely finite, a contradiction. Therefore, $\tilde{K}_\infty = 0$ almost surely.

Next we show that $K_n^* \to 0$ implies $\|m_n - m_{f^*}\| \to 0$, where $\|\cdot\|$ denotes the $L_1(\nu)$ norm.

Corollary 3.5. Under A1-A4, $m_n$ converges to $m_{f^*}$ in $L_1(\nu)$.

Proof. Suppose this is not true; that is, there is an $\varepsilon > 0$ and a subsequence $\{n_s\}$ such that $\|m_{n_s} - m_{f^*}\| > \varepsilon$ for all $s$. Then, by A1, this sequence has a further subsequence $\{n_{s(t)}\}$ such that $f_{n_{s(t)}}$ converges weakly to some $f_\infty$. By A2, $m_{n_{s(t)}}$ converges to $m_\infty := m_{f_\infty}$ pointwise, and in $L_1(\nu)$ by Scheffé's theorem; therefore, $m_\infty \ne m_{f^*}$. Define $u_t = m_{n_{s(t)}}/m_\infty - 1$. Then $u_t \to 0$ pointwise and $\{u_t\}$ is uniformly integrable with respect to $m\, d\nu$ by A4 and Jensen's inequality; indeed,

$\sup_t \int u_t^2\, m\, d\nu = \sup_t \int \Bigl( \frac{m_{n_{s(t)}}}{m_\infty} - 1 \Bigr)^2 m\, d\nu < 1 + B.$

Then Theorem 3.4, together with Vitali's theorem, implies that

$0 \le K(m, m_\infty) - K(m, m_{f^*}) = K(m, m_\infty) - \lim_t K(m, m_{n_{s(t)}}) = \lim_t \int m \log(m_{n_{s(t)}}/m_\infty)\, d\nu \le \lim_t \int u_t\, m\, d\nu = 0.$

Therefore, $K(m, m_\infty) = K(m, m_{f^*})$, and uniqueness of the KL projection implies $m_\infty = m_{f^*}$, which contradicts our supposition.

Theorem 3.4 and Corollary 3.5 seem to indicate that the estimates $f_n$ converge to some $f^* \in \mathcal{F}$ at which the infimum $K(m, \mathcal{M}_\Theta)$ is attained. However, to conclude weak convergence of $f_n$ from $L_1$ convergence of $m_n$, we need two additional conditions:

A5. Identifiability: $m_\varphi = m_\psi$ $\nu$-a.e. implies $\varphi = \psi$ $\mu$-a.e.
A6. For any $\varepsilon > 0$ and any compact $A \subset \mathcal{X}$, there exists a compact $B \subset \Theta$ such that $\int_A p(x \mid \theta)\, d\nu(x) < \varepsilon$ for all $\theta \notin B$.

With conditions A5-A6 and Theorem 3 of TMG, the next result follows immediately from Corollary 3.5.

Corollary 3.6. Under A1-A6, $f_n \to f^*$ a.s. in the weak topology, where $f^* \in \mathcal{F}$ is the unique mixing density that satisfies $K(m, m_{f^*}) = K(m, \mathcal{M}_\Theta)$. In particular, if $m \in \mathcal{M}_\Theta$, then $f_n$ is a consistent estimate of the true mixing density, in the weak topology.

In the mis-specified case, even though $K_n^* \to 0$ implies $\|m_n - m_{f^*}\| \to 0$, the $L_1(\nu)$ rate does not easily follow without extra assumptions, such as boundedness of $m_{f^*}/m$. But a Hellinger contrast rate is a direct consequence of Theorem 3.4. In the well-specified case, when $m_{f^*} = m$, the Hellinger contrast reduces to the usual Hellinger distance, so our convergence rate results are comparable to those of, say, Genovese and Wasserman [3].

Corollary 3.7. Choose $w_n \asymp n^{-\gamma}$, $\gamma \in (2/3, 1]$, and assume A1-A4.
(a) $\rho(m_n, m_{f^*}) = o(n^{-(1-\gamma)/2})$ a.s.
(b) If $m_{f^*}/m \in L_\infty(m\, d\nu)$, then $\|m_n - m_{f^*}\| = o(n^{-(1-\gamma)/2})$ a.s.

Proof. Let $\Gamma_n = \int (m_n/m_{f^*})\, m\, d\nu$. An important and well-known property of KL projections that will be used in the proof is $\Gamma_n \le 1$ (see, for example, Theorem 3 of Barron [1] or Lemma 2.1 of Patilea [15]).

Part (a) follows from Lemma 2.4 of Patilea [15]. Indeed,

$2\rho^2(m_n, m_{f^*}) = \int \Bigl( \sqrt{\frac{m_n}{m_{f^*}}} - 1 \Bigr)^2 m\, d\nu = (\Gamma_n - 1) + 2 \int \Bigl( 1 - \sqrt{\frac{m_n}{m_{f^*}}} \Bigr) m\, d\nu \le (\Gamma_n - 1) + \int \log\frac{m_{f^*}}{m_n}\, m\, d\nu = (\Gamma_n - 1) + K_n^* \le K_n^*,$

where the first inequality uses $1 - \sqrt{u} \le -\tfrac{1}{2}\log u$ and the second uses $\Gamma_n \le 1$; since $a_n K_n^* \to 0$ a.s. by Theorem 3.4 and $a_n \asymp n^{1-\gamma}$, part (a) follows.

For part (b), let $q_n = m\, m_n / (m_{f^*} \Gamma_n)$, and notice that $K_n^* \ge K(m, q_n)$. Then Pinsker's inequality gives

$\sqrt{2 K_n^*} \ge \|m - q_n\| = \Bigl\| 1 - \frac{m_n}{\Gamma_n\, m_{f^*}} \Bigr\|_{L_1(m\, d\nu)} = \frac{1}{\Gamma_n} \Bigl\| \Gamma_n - \frac{m_n}{m_{f^*}} \Bigr\|_{L_1(m\, d\nu)} \ge \Bigl\| \Gamma_n - \frac{m_n}{m_{f^*}} \Bigr\|_{L_1(m\, d\nu)}.$

The triangle inequality for $L_1(m\, d\nu)$ implies

$\Bigl\| \frac{m_n}{m_{f^*}} - 1 \Bigr\|_{L_1(m\, d\nu)} \le \Bigl\| \frac{m_n}{m_{f^*}} - \Gamma_n \Bigr\|_{L_1(m\, d\nu)} + (1 - \Gamma_n),$

and hence

$\Bigl\| \frac{m_n}{m_{f^*}} - 1 \Bigr\|_{L_1(m\, d\nu)} \le \sqrt{2 K_n^*} + (1 - \Gamma_n).$

The right-hand side is related to the $L_1(\nu)$ error via Hölder's inequality:

$\|m_n - m_{f^*}\| = \int |m_n - m_{f^*}|\, d\nu = \int \frac{m_{f^*}}{m} \Bigl| \frac{m_n}{m_{f^*}} - 1 \Bigr| m\, d\nu \le C\, \Bigl\| \frac{m_n}{m_{f^*}} - 1 \Bigr\|_{L_1(m\, d\nu)},$

where $C := \|m_{f^*}/m\|_{L_\infty(m\, d\nu)}$ is finite by assumption. Therefore,

$\|m_n - m_{f^*}\| \le C \bigl\{ \sqrt{2 K_n^*} + (1 - \Gamma_n) \bigr\},$

and part (b) follows from the fact that $(1 - \Gamma_n) \le K_n^*$.

4. A generalized PR algorithm. To satisfy the conditions of Theorem 3.4, one typically would need the mixing parameter space $\Theta$ to be compact. Also, in order to calculate the estimate in practice, this $\Theta$ must be known, since integration over $\Theta$ is required in (1.2). These requirements are similar to those in Ghosal and van der Vaart [4] and can be somewhat restrictive, particularly when there is no natural choice of $\Theta$. A potential solution is to use a mixture sieve, which allows the support of the estimated mixing distribution to grow with the sample size. A motivation for this dynamic choice of support is that, eventually, the support will be large enough so that the class of all mixtures over that support will be sufficiently rich. Borrowing on this idea, we propose a sieve-like extension of the PR algorithm which, instead of requiring $\Theta$ to be fixed and known, incorporates a sequence of compact mixing parameter spaces that increases with $n$.

Let $\Theta$ denote the mixing parameter space, which may or may not be compact. For example, if the model is a Gaussian location-scale mixture, then $\Theta = \mathbb{R} \times \mathbb{R}^+$. The generalized PR algorithm is as follows.

Algorithm GPR. Choose an increasing sequence of compact sets $\Theta_n$ such that $\Theta_n \subseteq \Theta$, a bounded, strictly positive, $\mu$-measurable function $g(\theta)$, and a sequence $c_n \downarrow 0$ that satisfies $\sum_n \log(1 + c_n) < \infty$. Define $g_n(\theta) = g(\theta)\, I_{\Theta_n \setminus \Theta_{n-1}}(\theta) / d_n$, where $d_n = \int_{\Theta_n \setminus \Theta_{n-1}} g\, d\mu$ is the normalizing constant. Start with an initial estimate $f_0$ on $\Theta_0$ and, for $n \ge 1$, define

$\tilde{f}_n(\theta) = (1 - w_n)\, f_{n-1}(\theta) + w_n\, \frac{p(X_n \mid \theta)\, f_{n-1}(\theta)}{m_{n-1}(X_n)}, \quad \theta \in \Theta_{n-1},$

and then

(4.1)  $f_n(\theta) = \begin{cases} \dfrac{1}{1+c_n}\, \tilde{f}_n(\theta), & \theta \in \Theta_{n-1}, \\ \dfrac{c_n}{1+c_n}\, g_n(\theta), & \theta \in \Theta_n \setminus \Theta_{n-1}, \\ 0, & \theta \in \Theta_n^c. \end{cases}$

As in (1.3), define $m_n := m_{f_n}$ as the final estimate of $m$.

The motivation for using $g_n$ is that $\Theta_n$ is small compared to $\Theta$, so the estimate should be padded near the boundary of $\Theta_n$ to compensate for the possibly heavy tails of $f$, which might assign considerable mass to $\Theta_n^c$. Asymptotically, any function $g$ would suffice, so we recommend the default choice $g(\theta) \equiv 1$, making $g_n(\theta)$ a Uniform density on $\Theta_n \setminus \Theta_{n-1}$.

Next we derive convergence properties of the GPR estimate $m_n$. The primary obstacle in extending the results in Section 3 is that now the support of the mixing densities is changing with $n$; this makes the comparisons of the mixing densities in the proof of Lemma 3.3 more difficult. Here we will consider only the case where $m = m_f \in \mathcal{M}_\Theta$, but the mixing density $f$ and its support $\Theta_f \subseteq \Theta$ are both unknown. Theorem 4.1 below establishes a bound on the rate of convergence in the case where $\Theta_f$ is a compact subset of $\Theta$. It is important to point out that, while we are restricting ourselves to the compact case, we need not assume $\Theta_f$ to be known.

Recall condition A4 in Section 3, requiring that the likelihood ratio be square integrable uniformly over $\Theta$. In this case, we have a sequence $\Theta_n$, and we require a bound similar to that of A4 for each $\Theta_n$. To this end, define the sequence

(4.2)  $B_n = \sup_{\theta_1, \theta_2, \theta_3 \in \Theta_n} \int \Bigl[ \frac{p(x \mid \theta_1)}{p(x \mid \theta_2)} \Bigr]^2 p(x \mid \theta_3)\, d\nu(x), \quad n \ge 0.$

Since $\Theta_n \subseteq \Theta_{n+1}$, the sequence $B_n$ is clearly increasing; if we are to push the proof of Theorem 3.4 through in this more general situation, we will need to control how fast $B_n$ increases.

Theorem 4.1. Assume that $\Theta_f \subset \Theta$ is compact and that conditions A2-A3 hold. Furthermore, assume that $\sum_n a_{n-1} w_n^2 B_n < \infty$. Then the GPR estimates $m_n$ satisfy $a_n K(m_f, m_n) \to 0$ a.s.

See the Appendix for the proof. In general, the conditions of the theorem are satisfied if

(4.3)  $w_n \asymp (n^\alpha \log n)^{-1}, \quad \alpha \in [2/3, 1], \quad \text{and} \quad B_n = O(\log n).$
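As with Algorithm PR, a small grid-based sketch may make the bookkeeping of the growing supports in (4.1) concrete. The code below is ours, again assuming Gaussian location mixands, symmetric nested supports $\Theta_i = [-r_i, r_i]$ carved out of one fine master grid, and the default $g \equiv 1$; the names and defaults are hypothetical, and on steps where the discretized shell contains no new grid points the padding step is skipped so the grid density stays normalized, a minor departure from (4.1).

```python
import numpy as np
from scipy.stats import norm

def generalized_pr(x, master_grid, radius, c_seq=lambda i: i ** -2.0, gamma=0.75):
    """Minimal sketch of Algorithm GPR with the padded update (4.1), for Gaussian
    location mixands on nested supports Theta_i = [-radius(i), radius(i)].
    With c_i = i^{-2}, sum_i log(1 + c_i) is finite, as the algorithm requires."""
    dtheta = master_grid[1] - master_grid[0]
    active = np.abs(master_grid) <= radius(0)              # indicator of Theta_0
    f = active.astype(float)
    f /= f.sum() * dtheta                                  # uniform initial estimate f_0 on Theta_0
    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma) / max(np.log(i + 1.0), 1.0)  # w_i ~ (i^gamma log i)^{-1}, cf. (4.3)
        lik = norm.pdf(xi, loc=master_grid, scale=1.0)
        denom = np.sum(lik * f) * dtheta                   # m_{i-1}(X_i), support Theta_{i-1}
        f_tilde = (1.0 - w) * f + w * lik * f / denom      # PR step on the current support
        new_active = np.abs(master_grid) <= radius(i)      # Theta_i
        shell = new_active & ~active                       # Theta_i \ Theta_{i-1}
        if shell.any():
            ci = c_seq(i)
            f = f_tilde / (1.0 + ci)                       # first branch of (4.1)
            f[shell] = (ci / (1.0 + ci)) / (shell.sum() * dtheta)  # uniform g_n on the shell
        else:
            f = f_tilde                                    # no new grid points: skip the padding
        active = new_active
    return f

# Toy usage: supports growing logarithmically inside a master grid on [-10, 10].
rng = np.random.default_rng(1)
data = rng.normal(2.5, 1.0, size=300)
grid = np.linspace(-10.0, 10.0, 401)
f_n = generalized_pr(data, grid, radius=lambda i: 1.0 + np.log(1.0 + i))
```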

Example 4.2. Suppose that $p(x \mid \theta) = e^{-\theta} \theta^x / x!$ is a Poisson density (with respect to counting measure). Then

$\sum_x \Bigl[ \frac{p(x \mid \theta_1)}{p(x \mid \theta_2)} \Bigr]^2 p(x \mid \theta_3) = \exp\{2(\theta_2 - \theta_1) - \theta_3 + \theta_1^2 \theta_3 / \theta_2^2\}.$

Take $\Theta_n = [\alpha_n, \beta_n]$, where $\alpha_n$ and $\beta_n$ are to be determined. Then

$B_n = \sup_{\theta_1, \theta_2, \theta_3 \in \Theta_n} \exp\{2(\theta_2 - \theta_1) - \theta_3 + \theta_1^2 \theta_3 / \theta_2^2\} \le \exp\{\beta_n^3 / \alpha_n^2\}.$

If $\beta_n \asymp (c + \log\log n)^{1/5}$, for some constant $c > 0$ such that $\Theta_0$ is suitably large, and $\alpha_n = \beta_n^{-1}$, then $\beta_n^3/\alpha_n^2 = \beta_n^5 \asymp c + \log\log n$, so $B_n = O(\log n)$. Therefore, the conditions of Theorem 4.1 are satisfied with $w_n$ as in (4.3).

Acknowledgments. The authors thank Professor J. K. Ghosh for his support and guidance throughout this and other projects.

APPENDIX: PROOFS

Proof of Lemma 3.1. For part (a), treat $\theta$ as a random element in $\Theta$ whose distribution has density $\varphi \in \mathcal{F}$, and define the random variable

(A.4)  $g_\varphi(\theta) = \int \frac{m(x)\, p(x \mid \theta)}{m_\varphi(x)}\, d\nu(x).$

Then $E_\varphi\{g_\varphi(\theta)\} = \int g_\varphi\, \varphi\, d\mu = 1$ and $T(\varphi) = V_\varphi\{g_\varphi(\theta)\} \ge 0$, with equality if and only if $g_\varphi = 1$ $\mu$-a.e. Next define

$\Lambda(\varphi) = \log\Bigl\{ \int_\Theta g_\varphi(\theta)\, f^*(\theta)\, d\mu(\theta) \Bigr\},$

where $f^* \in \mathcal{F}$ is such that $K(m, m_{f^*}) = K(m, \mathcal{M}_\Theta)$. Note that $T(\varphi) = 0$ implies $\Lambda(\varphi) = 0$. By Jensen's inequality,

$\Lambda(\varphi) = \log\Bigl\{ \int_{\mathcal{X}} \frac{m_{f^*}(x)}{m_\varphi(x)}\, m(x)\, d\nu(x) \Bigr\} \ge \int_{\mathcal{X}} \log\Bigl( \frac{m_{f^*}(x)}{m_\varphi(x)} \Bigr) m(x)\, d\nu(x) = K(m, m_\varphi) - K(m, m_{f^*}) \ge 0,$

so that $\Lambda(\varphi) = 0$ implies $K(m, m_\varphi) = K(m, m_{f^*})$.

For part (b), take a sequence $\{\varphi_s\} \subset \mathcal{F}$ such that $\varphi_s$ converges weakly to some $\varphi \in \mathcal{F}$, and let $g_{\varphi_s}$ and $g_\varphi$ be as in (A.4). Let $r_s(x, \theta) = p(x \mid \theta)/m_{\varphi_s}(x)$, so that $g_{\varphi_s}(\theta) = \int r_s(x, \theta)\, m(x)\, d\nu(x)$. By A4 and Jensen's inequality, $\sup_s \int r_s^2(x, \theta)\, m(x)\, d\nu(x) < B$, which implies $\{r_s(\cdot, \theta)\}$ is uniformly integrable with respect to $m\, d\nu$; therefore, $g_{\varphi_s} \to g_\varphi$ $\mu$-a.e. Since the weak topology on $\mathcal{F}$ is metrizable, to prove continuity of $T$ it suffices to show that $T(\varphi_s) \to T(\varphi)$. We have

(A.5)  $T(\varphi_s) - T(\varphi) = \int g_{\varphi_s}^2\, \varphi_s\, d\mu - \int g_\varphi^2\, \varphi\, d\mu = \int (g_{\varphi_s}^2 - g_\varphi^2)\, \varphi_s\, d\mu + \int g_\varphi^2\, (\varphi_s - \varphi)\, d\mu.$

The second term on the right-hand side of (A.5) goes to zero by definition of weak convergence, since $g_\varphi^2 \le B$ (by A4) and $g_\varphi^2$ is continuous (by A2). Next, we know the following: $|g_{\varphi_s}^2 - g_\varphi^2|\, \varphi_s \to 0$ $\mu$-a.e.; $|g_{\varphi_s}^2 - g_\varphi^2|\, \varphi_s \le 2B\varphi_s$; and $\varphi_s \to \varphi$ pointwise $\mu$-a.e. with $\lim_s \int \varphi_s\, d\mu = 1 = \int \varphi\, d\mu$. Now, for the first term on the right-hand side of (A.5),

$\Bigl| \int (g_{\varphi_s}^2 - g_\varphi^2)\, \varphi_s\, d\mu \Bigr| \le \int |g_{\varphi_s}^2 - g_\varphi^2|\, \varphi_s\, d\mu \to 0,$

where the convergence follows from the above properties and Theorem 1 of Pratt [16]. Therefore, $T(\varphi_s) \to T(\varphi)$ and, since the sequence $\{\varphi_s\}$ was arbitrary, $T$ must be continuous.

Proof of Lemma 3.2. Note that for $a > 0$ and $b \in (0, 1)$, we have

(A.6)  $(a - 1)^2 \max\{1, (1 + b(a - 1))^{-2}\} \le \max\{(a - 1)^2, (1/a - 1)^2\}.$

Combining inequalities (3.1) and (A.6), we see that

$H_{n,X_n}^2\, R(w_n H_{n,X_n}) \le \max\Bigl\{ \Bigl( \frac{h_{n,X_n}}{m_{n-1}} - 1 \Bigr)^2, \Bigl( \frac{m_{n-1}}{h_{n,X_n}} - 1 \Bigr)^2 \Bigr\}$

and, since both $h_{n,X_n}$ and $m_{n-1}$ belong to $\mathcal{M}_\Theta$ for each $n$, we conclude from A4 and Jensen's inequality that $Z_n \le 1 + B$ a.s.

Proof of Lemma 3.3. This was proved in TMG for the case $m \in \mathcal{M}_\Theta$. Here we only sketch the details for the more general case. Let $f^* \in \mathcal{F}$ be such that $K(m, m_{f^*}) = K(m, \mathcal{M}_\Theta)$. TMG's Taylor approximation and telescoping sum argument for the KL divergence between mixing densities reveals

$K(f^*, f_n) - K(f^*, f_0) = -\sum_{i=1}^n \int_\Theta f^* \log(f_i / f_{i-1})\, d\mu = \sum_{i=1}^n w_i V_i - \sum_{i=1}^n w_i M_i + \sum_{i=1}^n w_i^2 E_i,$

where, with a slight difference in notation compared to TMG,

$M_i = \int \Bigl( \frac{m_{f^*}}{m_{i-1}} - 1 \Bigr) m\, d\nu, \qquad V_i = 1 - \frac{m_{f^*}(X_i)}{m_{i-1}(X_i)} + M_i,$

and the $E_i$'s are uniformly bounded (by A4). From $\log(x) \le x - 1$ and the proof of Lemma 3.1(a), we get

$M_i \ge \int \log(m_{f^*}/m_{i-1})\, m\, d\nu = K_{i-1}^* \ge 0.$

To prove the claim, it suffices to show that $\sum_{i=1}^n w_i M_i$ converges almost surely to a finite random variable. The same arguments in TMG show that both $V := \sum_{i=1}^\infty w_i V_i$ and $E := \sum_{i=1}^\infty w_i^2 E_i$ are almost surely finite, and non-negativity of the KL divergence implies

$\sum_{i=1}^n w_i M_i \le K(f^*, f_0) + V + E < \infty \quad \text{for all } n.$

Then the left-hand side converges because the $M_i$'s are non-negative.

Proof of Theorem 4.1. The proof is essentially the same as that of Theorem 3.4; we simply need to check that Lemmas 3.2 and 3.3 continue to hold in this new context. Since $B_n$ is increasing, the condition $\sum_n a_{n-1} w_n^2 B_n < \infty$ implies that the conclusion of Lemma 3.2 holds (with $B_n$ in place of $B$), and the remainder term $Z_n$ in (3.4) satisfies $\sum_n w_n^2\, E(Z_n) < \infty$. The key observation is that, since $\Theta_f$ is compact, there is a number $N = N(f)$ such that $\Theta_f \subseteq \Theta_n$ for all $n \ge N$. Consequently, there are only $N$ iterates $f_0, \ldots, f_{N-1}$ such that $K(f, f_i) = \infty$ so, after appropriately shifting the time scale, we can apply the telescoping sum argument in

the proof of Lemma 3.3 to get

$K(f, f_{N+n}) - K(f, f_N) = -\sum_{i=N+1}^{N+n} \int f \log(f_i / f_{i-1})\, d\mu = \sum_{i=N+1}^{N+n} w_i V_i - \sum_{i=N+1}^{N+n} w_i M_i + \sum_{i=N+1}^{N+n} w_i^2 E_i + \sum_{i=N+1}^{N+n} \log(1 + c_i).$

The right-most sum in the display converges by assumption, and the others can be handled as in the proof of Lemma 3.3. Therefore, we can conclude that $\sum_{n=N}^\infty w_n K(m_f, m_n) < \infty$ a.s. To finish the proof, simply throw away the first $N$ iterations and apply the argument used to prove Theorem 3.4.

REFERENCES

[1] Barron, A. R. (2000). Limits of information, Markov chains, and projection. In IEEE International Symposium on Information Theory, 25.
[2] Brown, L. D., George, E. I. and Xu, X. (2008). Admissible predictive density estimation. Ann. Statist. 36, no. 3.
[3] Genovese, C. R. and Wasserman, L. (2000). Rates of convergence for the Gaussian mixture sieve. Ann. Statist. 28, no. 4.
[4] Ghosal, S. and van der Vaart, A. W. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29, no. 5.
[5] Ghosh, J. K. and Tokdar, S. T. (2006). Convergence and consistency of Newton's algorithm for estimating mixing distribution. In Frontiers in Statistics. Imp. Coll. Press, London.
[6] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34, no. 2.
[7] Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Ann. Statist. 20, no. 3.
[8] Li, J. Q. and Barron, A. R. (2000). Mixture density estimation. In Advances in Neural Information Processing Systems (S. Solla, T. Leen and K.-R. Mueller, eds.). MIT Press, Cambridge, Massachusetts.
[9] Liese, F. and Vajda, I. (1987). Convex Statistical Distances. Teubner, Leipzig.
[10] Lindsay, B. G. (1983). The geometry of mixture likelihoods: a general theory. Ann. Statist. 11, no. 1.
[11] Martin, R. and Ghosh, J. K. (2008). Stochastic approximation and Newton's estimate of a mixing distribution. Statist. Sci. 23, no. 3.
[12] Newton, M. A. (2002). On a nonparametric recursive estimator of the mixing distribution. Sankhyā Ser. A 64, no. 2.
[13] Newton, M. A., Quintana, F. A. and Zhang, Y. (1998). Nonparametric Bayes methods using predictive updating. In Practical Nonparametric and Semiparametric Bayesian Statistics. Springer, New York.
[14] Newton, M. A. and Zhang, Y. (1999). A recursive algorithm for nonparametric analysis with missing data. Biometrika 86, no. 1.

[15] Patilea, V. (2001). Convex models, MLE and misspecification. Ann. Statist. 29, no. 1.
[16] Pratt, J. W. (1960). On interchanging limits and integrals. Ann. Math. Statist. 31.
[17] Shyamalkumar, N. (1996). Cyclic I_0 projections and its applications in statistics. Tech. Rep., Department of Statistics, Purdue University, West Lafayette, IN.
[18] Tokdar, S. T., Martin, R. and Ghosh, J. K. (2009). Consistency of a recursive estimate of mixing distributions. Ann. Statist. 37, no. 5A.

Department of Mathematical Sciences
Indiana University-Purdue University Indianapolis
402 North Blackford Street
Indianapolis, Indiana, USA
rgmartin@math.iupui.edu

Department of Statistical Science
Duke University
214 Old Chemistry Building
Durham, North Carolina, USA
st118@stat.duke.edu


More information

Functional Analysis HW #1

Functional Analysis HW #1 Functional Analysis HW #1 Sangchul Lee October 9, 2015 1 Solutions Solution of #1.1. Suppose that X

More information

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19 Introductory Analysis I Fall 204 Homework #9 Due: Wednesday, November 9 Here is an easy one, to serve as warmup Assume M is a compact metric space and N is a metric space Assume that f n : M N for each

More information

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall.

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall. .1 Limits of Sequences. CHAPTER.1.0. a) True. If converges, then there is an M > 0 such that M. Choose by Archimedes an N N such that N > M/ε. Then n N implies /n M/n M/N < ε. b) False. = n does not converge,

More information

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

BERNOULLI ACTIONS AND INFINITE ENTROPY

BERNOULLI ACTIONS AND INFINITE ENTROPY BERNOULLI ACTIONS AND INFINITE ENTROPY DAVID KERR AND HANFENG LI Abstract. We show that, for countable sofic groups, a Bernoulli action with infinite entropy base has infinite entropy with respect to every

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

THE ALTERNATIVE DUNFORD-PETTIS PROPERTY FOR SUBSPACES OF THE COMPACT OPERATORS

THE ALTERNATIVE DUNFORD-PETTIS PROPERTY FOR SUBSPACES OF THE COMPACT OPERATORS THE ALTERNATIVE DUNFORD-PETTIS PROPERTY FOR SUBSPACES OF THE COMPACT OPERATORS MARÍA D. ACOSTA AND ANTONIO M. PERALTA Abstract. A Banach space X has the alternative Dunford-Pettis property if for every

More information

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974 LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the

More information