On The Asymptotics of Minimum Disparity Estimation


Noname manuscript No. (will be inserted by the editor)

On The Asymptotics of Minimum Disparity Estimation

Arun Kumar Kuchibhotla · Ayanendranath Basu

Received: date / Accepted: date

Abstract Inference procedures based on the minimization of divergences are popular statistical tools. Beran (1977) proved consistency and asymptotic normality of the minimum Hellinger distance (MHD) estimator. This method was later extended to the large class of disparities in discrete models by Lindsay (1994), who proved the existence of a sequence of roots of the estimating equation which is consistent and asymptotically normal. However, the current literature does not provide a general asymptotic result about the minimizer of a generic disparity. In this paper we prove, under very general conditions, an asymptotic representation of the minimum disparity estimator itself (and not just of a root of the estimating equation), thus generalizing the results of Beran (1977) and Lindsay (1994). This leads to a general framework for minimum disparity estimation encompassing both discrete and continuous models.

Keywords Disparity · Quadratic Approximation · Non-parametric Density Estimation

1 Introduction

Different types of divergence measures have been used in the literature to measure the dissimilarity between two distributions. A prominent subclass of density-based divergences is the family of disparities, which will be described in detail in Section 2. Given a density g and a family of parametric densities, a natural way of obtaining a best fitting parameter is to minimize a disparity measure between g and a density from the (parametric) family over the parameter space. When dealing with point estimation in parametric models, maximum likelihood is the most popular method of estimation, but other alternatives like the method of moments and M-estimators are also available.

University of Pennsylvania and Indian Statistical Institute
arunku@wharton.upenn.edu, ayanbasu@isical.ac.in

Considering the efficiency of the estimator to be the criterion for comparison, the maximum likelihood estimator is one of the best under some regularity conditions. Rao (1961), Robertson (1972) and Fryer and Robertson (1972) have noted that there is a class of estimators containing the maximum likelihood estimator such that each estimator in the class is asymptotically efficient, or asymptotically equivalent to the maximum likelihood estimator (up to order $n^{-1/2}$). Many authors have followed this up by considering various other criteria, like higher order efficiency, in order to single out the maximum likelihood estimator as the best. But in the current era of big data, some errors in the generation, recording and transmission of data are not unexpected. Thus it appears justifiable that one should consider the asymptotic robustness of the estimators together with their asymptotic efficiency when comparing estimators. Note, however, that while there is a well established concept of asymptotic efficiency of an estimator, there is no universal way of proving asymptotic robustness of an estimator or claiming that some estimator is the best robust estimator.

Beran (1977) considered the minimum Hellinger distance estimator in continuous models. He appears to be the first to prove that there are estimators which are asymptotically fully efficient while enjoying strong robustness properties. Beran's (1977) approach required a non-parametric estimator of the data density. The Hellinger distance was then replaced by a general disparity by Lindsay (1994), who considered discrete models and used sample proportions as estimates of the actual density. A focal point of his work was the study of the properties of zeros of an estimating function obtained as the derivative of a disparity. The main result of Lindsay (1994) states that there exists a sequence of roots which is consistent and asymptotically normal, with asymptotic variance coinciding with the inverse of the Fisher information when the true density is an element of the parametric family. Later the results of Lindsay (1994) were extended by Basu and Lindsay (1994), Park and Basu (2004) and Kuchibhotla and Basu (2015) to continuous models under different conditions on the model, the kernel density estimate and the disparity generating function. However, these authors also consider the roots of an estimating equation rather than the minimum disparity estimator itself. As noted by Ferguson (1982), proving the asymptotic results for some sequence of roots of the disparity based estimating equation need not prove the same for the minimum disparity estimator. Also, the results of the previous authors only state that there exists a good sequence of roots and do not prescribe how to obtain such a sequence when there are multiple roots of the estimating equation. In light of this discussion, we feel that one should derive the asymptotic results for the minimum disparity estimator itself. Also, an approach which parallels the framework of Lindsay (1994) in the case of continuous models in terms of the conditions on the disparity does not exist in the literature. Although Kuchibhotla and Basu (2015) considered a set up where the disparity conditions are milder than those of Lindsay (1994), they have stronger conditions on the density estimator.

In this paper, we first prove a grand consistency theorem for the minimum disparity estimator under minimal conditions. We then develop an asymptotic representation of the minimum disparity estimator in a general framework. Our results are applicable whenever the densities exist with respect to a σ-finite base measure, rather than being specific to the case of the Lebesgue measure. Also, the conditions on the disparity are exactly the same as those in Lindsay (1994). The specific achievements of this paper may be listed as follows.

1. Consistency is proved under minimal conditions for a suitable subclass of disparities; even the differentiability of the probability density function with respect to the parameter or the smoothness of the disparity generating function is not required.
2. All the results proved in this paper relate to the minimizer of the disparity itself, and not just a suitable sequence of roots of the estimating equation. This is unlike most of the previous work done in this area; Beran (1977) is an exception.
3. The grand consistency theorem and the asymptotic representation of the disparity do not require the observations to be independent; neither is it necessary for the density estimator to be a kernel density estimator.
4. Theorem 4.1, together with Remark 10, establishes a general framework for minimum disparity estimation encompassing both discrete and continuous models. The results of Lindsay (1994) emerge as a special case.
5. The development described in the previous items establishes the legitimacy of the disparity based analogue of the likelihood ratio test considered in the theorems of Section 5, which depends explicitly on the minimizer of the disparity. This also avoids the possibility of having a negative statistic due to the use of a root which is not a global minimizer.

We now outline the remaining sections of the paper. In Section 2, we present the grand consistency theorem of the minimum disparity estimator. In Section 3, we prove the quadratic approximation of the disparity which leads to an asymptotic representation of the minimum disparity estimator. In Section 4, we prove asymptotic normality of the estimating function which, combined with the asymptotic representation of the estimator, leads to the asymptotic normality of the minimum disparity estimator. In Section 5, we consider testing of hypotheses using disparities. Finally, we conclude with some remarks in Section 6. We try to present our results step-by-step so that the assumptions required for each step become transparent and the generalization of the results currently available only for kernel density estimators becomes easier. In this paper we deal with the asymptotic efficiency results of the minimum disparity estimator, and do not re-emphasize the well known robustness properties of these estimators. However, see Remark 5 and Theorem 5.3. Although we primarily follow the approach of Lindsay (1994) in defining the disparities, the class of disparities also coincides with the class of φ-divergences of Csiszár (1963) and Ali and Silvey (1966). Other authors have worked with the φ-divergence formulation and independently determined the properties of the

corresponding minimum distance procedures, primarily in discrete models. See, for example, Morales et al. (1995) and Pardo (2006). However, the literature is deficient in general results based on φ-divergences in continuous models, where the results are usually scattered, corresponding to specific divergences, as in Beran (1977) or Basu et al. (1997).

2 Consistency

Let $\mathcal{G}$ represent the class of all probability distributions having densities with respect to some σ-finite base measure µ on some measurable space $(\Omega, \Lambda, \mu)$, with Λ representing a σ-field on Ω. We assume that the true distribution G and the model $\mathcal{F}_\Theta = \{F_\theta : \theta \in \Theta\}$ belong to $\mathcal{G}$. Let $g$ and $f_\theta$ be the corresponding densities (with respect to µ). Let $X_1, X_2, \ldots, X_n$ be a random sample from G which is modelled by $\mathcal{F}_\Theta$. We do not necessarily assume that the observations are independent, although we require them to be identically distributed. Our aim is to estimate the parameter θ by choosing the model density which gives the closest fit to the data.

Let C be a real valued strictly convex function with $C(0) = 0$. Consider the divergence given by the form
$$\rho_C(g, f_\theta) = \int C\left(\frac{g(x)}{f_\theta(x)} - 1\right) f_\theta(x)\,d\mu(x).$$
This form describes the class of all disparities (Lindsay, 1994) between the densities $g$ and $f_\theta$. For $g(x) = 0$ or $f_\theta(x) = 0$, we use the following convention:
$$0\,C\left(\frac{0}{0} - 1\right) = 0, \qquad 0\,C\left(\frac{a}{0} - 1\right) = a \lim_{d \to \infty} \frac{C(d)}{d}.$$
The function C in the disparity $\rho_C(g, f_\theta)$ is called the disparity generating function. An application of Jensen's inequality shows that $\rho_C(g, f_\theta) \ge 0$ with equality if and only if $g = f_\theta$ identically. If the base measure is the counting measure on the set $\{a_1, a_2, \ldots\}$, then the disparity can be written as
$$\rho_C(g, f_\theta) = \sum_{i=1}^{\infty} C\left(\frac{g(a_i)}{f_\theta(a_i)} - 1\right) f_\theta(a_i).$$
If the base measure is the Lebesgue measure (λ) on the Euclidean space $\mathbb{R}^m$, then the disparity can be written as
$$\rho_C(g, f_\theta) = \int_{\mathbb{R}^m} C\left(\frac{g}{f_\theta} - 1\right) f_\theta\,d\lambda.$$
The residual $\delta(x) = (g(x)/f_\theta(x)) - 1$ has been called the Pearson residual in Lindsay (1994) and we follow this nomenclature here; the range of the Pearson residual is $[-1, \infty)$.
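The definition above is easy to evaluate numerically. The following minimal sketch (ours, not part of the original paper) computes $\rho_C(g, f_\theta)$ by quadrature for two fixed normal densities, under two standard generating functions: the likelihood disparity $C(\delta) = (\delta + 1)\log(\delta + 1) - \delta$, for which $\rho_C$ reduces to the Kullback-Leibler divergence, and the (twice-squared) Hellinger distance $C(\delta) = 2(\sqrt{\delta + 1} - 1)^2$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: norm.pdf(x, loc=0.5)          # "true" density g
f = lambda x: norm.pdf(x, loc=0.0)          # model density f_theta

def rho(C, lo=-10.0, hi=10.0):
    # integrand C(delta) * f_theta with Pearson residual delta = g/f - 1;
    # both densities are positive here, so the 0/0 conventions never trigger
    return quad(lambda x: C(g(x) / f(x) - 1.0) * f(x), lo, hi)[0]

LD = lambda d: (d + 1.0) * np.log1p(d) - d            # likelihood disparity
HD = lambda d: 2.0 * (np.sqrt(d + 1.0) - 1.0) ** 2    # Hellinger distance

print(rho(LD))   # Kullback-Leibler divergence; here 0.5**2 / 2 = 0.125
print(rho(HD))
```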

Remark 1 The disparity generating function can be changed to $C_1(\delta) = C(\delta) - t\delta$ (for any $t \in \mathbb{R}$) without changing the disparity $\rho_C$, since $\int \delta f_\theta\,d\mu = 0$. If C is differentiable, we can take $t = C'(0)$, so that $C_1'(0) = 0$. Since the redefined disparity generating function $C_1$ is also strictly convex, any local minimum of $C_1$ represents a unique global minimum. Thus the condition $C_1'(0) = 0$ actually makes the disparity generating function $C_1$ itself non-negative, and therefore so is the integrand of the disparity. (Non-negativity of the disparity generating function, however, is not necessary for the disparity to be non-negative.) Even when the disparity generating function C is not differentiable, one can still redefine it as $C_1(\delta) = C(\delta) - t\delta$ for any $t$ that is a sub-gradient of C at 0, so that $C_1$ is non-negative. See Proposition 8.5 of Vajda (1989) for more details. In our subsequent proofs, we will choose the disparity generating function to be non-negative.

We denote by $\theta_g = T(G)$ the best fitting parameter, which minimizes $\rho_C(g, f_\theta)$ over all $\theta \in \Theta$. To get an estimate of $T(G)$, we consider minimizing an estimate of the disparity $\rho_C(g, f_\theta)$ based on the random sample $X_1, X_2, \ldots, X_n$, which are identically distributed with density $g$. A natural estimate of the disparity can be obtained by replacing $g$ by a suitable density estimator $g_n$. Thus, we consider the minimum disparity estimator $\hat{\theta}_n$ of $\theta_g$ defined by
$$\hat{\theta}_n := \arg\min_{\theta \in \Theta} \rho_C(g_n, f_\theta).$$
The minimizer $\hat{\theta}_n$ may not be unique, and in such a case $\hat{\theta}_n$ represents any one of the minimizers. We prove that any minimizer is strongly consistent under some conditions. The main component in the proof of the grand consistency theorem which follows is the uniform convergence of the objective function $\rho_C(g_n, f_\theta)$ to $\rho_C(g, f_\theta)$ over all $\theta \in \Theta$; we then use van der Vaart (1998, Theorem 5.7) to get the strong consistency of any minimizer $\hat{\theta}_n$.

Theorem 2.1 Suppose that the following assumptions hold:

(C1) The parameter space Θ is compact;
(C2) $C(-1) + C'(\infty) < \infty$, where $C'(\infty) = \lim_{u \to \infty} C(u)/u$;
(C3) For each $\theta \in \Theta$ and any sequence $\theta_n \to \theta$, $\lim_{n \to \infty} f_{\theta_n}(x) = f_\theta(x)$ for all $x$, except possibly on a set (which might depend on θ but not on the sequence $\{\theta_n\}$) of µ-measure zero. Also, $\theta_g$ is the unique minimizer of the disparity $\rho_C(g, f_\theta)$;
(C4) $g_n$ is strongly consistent for $g$, i.e., $g_n(x)$ converges almost surely to $g(x)$ for µ-almost all $x$.

Then the minimum disparity estimator $\hat{\theta}_n$ is strongly consistent for $\theta_g$.

Proof First, we prove that continuity of $f_\theta(x)$ in θ for almost all $x$ implies that $\rho_C(g, f_\theta)$ is also continuous in θ for every density $g$. This proves that under assumptions (C1) and (C3) the minimizers $\hat{\theta}_n$ and $\theta_g$ exist, since a continuous function on a compact set attains its minimum. Suppose $\theta_n \to \theta \in \Theta$ as $n \to \infty$. By Lemma 11.1 of Basu et al. (2011) and Remark 1, we have
$$0 \le C\left(\frac{g}{f_{\theta_n}} - 1\right) f_{\theta_n} \le \frac{C(-1) + C'(\infty)}{2}\,|g - f_{\theta_n}|. \quad (2.1)$$
Clearly, the upper bound is integrable, and
$$\int |g - f_{\theta_n}|\,d\mu = 2 - 2\int \min\{g, f_{\theta_n}\}\,d\mu.$$
Since $\min\{g, f_{\theta_n}\} \to \min\{g, f_\theta\}$ as $n \to \infty$ and $\min\{g, f_{\theta_n}\} \le g$, which is integrable, we have by the dominated convergence theorem that $\int \min\{g, f_{\theta_n}\}\,d\mu \to \int \min\{g, f_\theta\}\,d\mu$, and hence
$$\int |g - f_{\theta_n}|\,d\mu \to \int |g - f_\theta|\,d\mu. \quad (2.2)$$
Now using Pratt's lemma (Theorem 5.5, Gut (2013)) with Equations (2.1) and (2.2), we get, as $n \to \infty$,
$$\rho_C(g, f_{\theta_n}) \to \rho_C(g, f_\theta),$$
implying continuity of $\rho_C(g, f_\theta)$ with respect to θ.

Next we prove that the estimate $\rho_C(g_n, f_\theta)$ of the disparity $\rho_C(g, f_\theta)$ is uniformly strongly consistent over $\theta \in \Theta$, in the sense that
$$\sup_{\theta \in \Theta} |\rho_C(g_n, f_\theta) - \rho_C(g, f_\theta)| \xrightarrow{a.s.} 0,$$
as $n \to \infty$. Since Θ is assumed to be compact and $\rho_C(g, f_\theta)$ is continuous in θ, proving continuous convergence suffices to get uniform convergence (see Theorem 1 of Iséki (1957)). Take a sequence $\{\theta_n\} \subset \Theta$ converging to $\theta \in \Theta$ as $n \to \infty$. Then we need to prove that
$$\rho_C(g_n, f_{\theta_n}) - \rho_C(g, f_{\theta_n}) \xrightarrow{a.s.} 0,$$
as $n \to \infty$. We will prove, using Pratt's lemma, that $\rho_C(g_n, f_{\theta_n})$ and $\rho_C(g, f_{\theta_n})$ converge (a.s.) to $\rho_C(g, f_\theta)$ as $n \to \infty$. By Lemma 11.1 of Basu et al. (2011) with $g_n$ replacing $g$, we have
$$0 \le C\left(\frac{g_n}{f_{\theta_n}} - 1\right) f_{\theta_n} \le \frac{C(-1) + C'(\infty)}{2}\,|g_n - f_{\theta_n}|.$$
By the triangle inequality, we get that
$$\int |g - f_\theta|\,d\mu \le \int |g_n - g|\,d\mu + \int |g_n - f_{\theta_n}|\,d\mu + \int |f_\theta - f_{\theta_n}|\,d\mu, \quad (2.3)$$
$$\int |g_n - f_{\theta_n}|\,d\mu \le \int |g_n - g|\,d\mu + \int |g - f_\theta|\,d\mu + \int |f_{\theta_n} - f_\theta|\,d\mu. \quad (2.4)$$

We know that $g_n(x)$ converges almost surely to $g(x)$ for almost all $x$. Thus, by Glick's theorem (Devroye and Györfi, 1985, page 10), outside of a set B of measure 0,
$$\int |g_n - g|\,d\mu \to 0, \quad \text{as } n \to \infty.$$
Fix an $\omega \in B^c$. Taking the lim inf and lim sup on both sides of inequalities (2.3) and (2.4) respectively, and using Glick's theorem, we get
$$\int |g - f_\theta|\,d\mu \le \liminf_{n} \int |g_n - f_{\theta_n}|\,d\mu, \quad \text{and} \quad \limsup_{n} \int |g_n - f_{\theta_n}|\,d\mu \le \int |g - f_\theta|\,d\mu.$$
Hence we get $\int |g_n - f_{\theta_n}|\,d\mu \xrightarrow{a.s.} \int |g - f_\theta|\,d\mu$. Now applying Pratt's lemma for each $\omega \in B^c$ and using the above relations, we get, as $n \to \infty$,
$$\rho_C(g_n, f_{\theta_n}) \xrightarrow{a.s.} \rho_C(g, f_\theta).$$
Similarly, one gets that $\rho_C(g, f_{\theta_n})$ converges almost surely to $\rho_C(g, f_\theta)$ as $n \to \infty$. Therefore, by Theorem 1 of Iséki (1957), we have, as $n \to \infty$,
$$\sup_{\theta \in \Theta} |\rho_C(g_n, f_\theta) - \rho_C(g, f_\theta)| \xrightarrow{a.s.} 0.$$
By Theorem 5.7 of van der Vaart (1998), we get that any minimizer $\hat{\theta}_n$ is strongly consistent for $\theta_g$.

Remark 2 Under the assumption that $g = f_{\theta_0}$ and that the model is identifiable (i.e., $f_{\theta_1} = f_{\theta_2}$ if and only if $\theta_1 = \theta_2$), the minimizer $\theta_g$ is unique and is given by $\theta_0$. In general, it is easy to modify the proof of the above theorem to show that an approximate minimizer $\tilde{\theta}_n$ satisfying
$$\rho_C(g_n, f_{\tilde{\theta}_n}) < \varepsilon_n + \inf_{\theta \in \Theta} \rho_C(g_n, f_\theta), \quad \varepsilon_n \xrightarrow{a.s.} 0,$$
is also strongly consistent for $\theta_g$.

Remark 3 The above theorem is based on minimal assumptions. It does not even require the disparity generating function to be differentiable or the non-parametric density estimator to be a kernel density estimator. It only requires the density estimator to satisfy pointwise strong consistency. An elaborate discussion of different density estimators can be found in Prakasa Rao (1983). Various consistent estimators of density are available in the literature in many settings, including i.i.d., censored and different types of dependent data. Some of these methods and references are mentioned in Table 1 (presented in the Supplementary Material); an examination of these settings demonstrates the generality of the grand consistency theorem, which is applicable in all these cases. It is applicable even when Wald's consistency theorem for the maximum likelihood estimator, which requires $\log f_\theta(x) \le K(x)$ for some integrable function K, is not applicable. See Chapter 17 of Ferguson (1996).
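To make the preceding theory concrete, the following is a minimal computational sketch (ours; the paper contains no code) of the minimum disparity estimator $\hat{\theta}_n$ of Section 2. The Hellinger disparity, the Gaussian kernel density estimate for $g_n$ and the $N(\theta, 1)$ model are all illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=200)
x[:10] = 15.0                                 # plant a few gross outliers
g_n = gaussian_kde(x)                         # pointwise consistent estimate of g

def rho(theta):
    # rho_C(g_n, f_theta) for the Hellinger disparity, by quadrature
    def integrand(t):
        f = norm.pdf(t, loc=theta)
        d = g_n(t)[0] / f - 1.0               # Pearson residual delta_n
        return 2.0 * (np.sqrt(d + 1.0) - 1.0) ** 2 * f
    return quad(integrand, x.min() - 5.0, x.max() + 5.0, limit=200)[0]

theta_hat = minimize_scalar(rho, bounds=(-10.0, 20.0), method="bounded").x
print(theta_hat, x.mean())  # theta_hat stays near 2; the mean is dragged upward
```

In keeping with the robustness properties alluded to above, $\hat{\theta}_n$ here is far less sensitive than the sample mean to the planted outliers.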

Remark 4 Theorem 2.1 only requires the parameter space to be a metric space (not necessarily a Euclidean space). Also, we do not require the densities to exist with respect to the Lebesgue measure, and so the theorem is generally applicable for any class of densities with respect to any (a priori known) σ-finite measure. This theorem also treats the case of dependent observations, provided a strongly consistent density estimator is available. In this respect, we observe that the existence of a (weakly) consistent density estimator implies that the minimum disparity estimator is (weakly) consistent. This theorem also does not require the observations to be real or $\mathbb{R}^d$ valued.

Remark 5 The only limitation (if at all) of the conditions above is the requirement that the parameter space be compact, which can be relaxed to locally compact spaces if the actual disparity is a metric in the mathematical sense of the term. In this case, using the triangle inequality one gets the uniform convergence of the objective function, proving the result. See Cheng and Vidyashankar (2006) for details in the case of the minimum Hellinger distance estimator. Also see the remarks following Theorem 3.1 and Corollary 3.3 of Park and Basu (2004) for a discussion on accommodating some non-compact parameter spaces. Condition (C2) on the disparity generating function leads to an integrable upper bound on the integrand of the disparity. This condition is the only requirement on the disparity for Theorem 4.1 of Park and Basu (2004), which proves that the minimum disparity estimator under (C2) has an asymptotic breakdown point of at least 1/2. Thus all the minimum disparity estimators under assumption (C2) are strongly consistent and have asymptotic breakdown point of at least 1/2.

Remark 6 Another condition under which consistency of the minimum disparity estimator can be proved was presented in Park and Basu (2004). These authors require the boundedness of the derivative of C. One can, instead, also use the boundedness of the sub-gradient of C and allow non-differentiability of the disparity generating function. This is because of the inequalities
$$C(\delta_1) - C(\delta_2) \ge \dot{C}(\delta_2)(\delta_1 - \delta_2), \qquad C(\delta_2) - C(\delta_1) \ge \dot{C}(\delta_1)(\delta_2 - \delta_1),$$
where $\dot{C}$ represents any element of the sub-differential set. Thus, by strong consistency of the non-parametric density estimator and Glick's theorem, we get uniform convergence of the objective function, implying the strong consistency of the minimum disparity estimator. Thus the differentiability condition of Park and Basu (2004) is avoidable, and the boundedness requirement on the derivative can be replaced by the boundedness of a sub-gradient.

The disparity between the distributions $G, F \in \mathcal{G}$ can also be calculated directly, without requiring the dominating measure, by using the alternative expression
$$\varphi_C(G, F) = \sup_{\mathcal{D}} \sum_{A \in \mathcal{D}} C\left(\frac{G(A)}{F(A)} - 1\right) F(A), \quad (2.5)$$

where $\mathcal{A}$ is a countably generated σ-algebra and the supremum extends over all finite partitions $\mathcal{D} \subset \mathcal{A}$ of the observation space. Here we have used $G, F$ to represent the corresponding probability measures. See Liese and Vajda (1987) for more details. Note that $\varphi_C(G, F)$ reduces to $\rho_C(g, f)$ when the densities $g, f$ exist with respect to a common dominating measure.

Remark 7 If we consider an increasing sequence of measurable decompositions $\mathcal{D}_m$ of $\mathbb{R}^d$, then Equation (2.5) can be written as
$$\rho_C(G, F) = \lim_{m \to \infty} \sum_{A \in \mathcal{D}_m} C\left(\frac{G(A)}{F(A)} - 1\right) F(A).$$
This leads to an approximation of the disparity which can be used to avoid the non-parametric density estimation. Consider, for example, the decomposition sequence $\mathcal{D}^{(n)}$ of $\mathbb{R}$ defined by the intervals $[X_i - h_n, X_i + h_n)$, $i = 1, 2, \ldots, n$, where $h_n = \min_{i \ne j} |X_i - X_j|/2$. In this case,
$$\sum_{A \in \mathcal{D}^{(n)}} C\left(\frac{G_n(A)}{F(A)} - 1\right) F(A) = \frac{1}{n} \sum_{i=1}^{n} C\left(\frac{1}{T_{i,n}} - 1\right) T_{i,n} = \frac{1}{n} \sum_{i=1}^{n} \xi(T_{i,n}),$$
where $\xi(x) = xC(\{1/x\} - 1)$, which is also a convex function, and $T_{i,n} = nF[X_i - h_n, X_i + h_n)$. One can minimize this quantity instead of the integral form of the disparity. This method coincides exactly with the generalized spacings based estimation methodology of Ghosh and Jammalamadaka (2001). This remark is similar in spirit to Remark 1 of Györfi et al. (1994). See Remark 13 for some further comments in this connection. Also see Ekström (2001).

Under differentiability of the model, thrice differentiability of the function C and the assumption that $\hat{\theta}_n$ lies in the interior of Θ, $\hat{\theta}_n$ can be obtained as a root of the equation
$$\int A(\delta_n(x)) \nabla f_\theta(x)\,d\mu(x) = 0, \quad (2.6)$$
where $\nabla$ represents the gradient with respect to θ and $A(\delta) = C'(\delta)(\delta + 1) - C(\delta)$. Let $\nabla_2$ denote the second derivative with respect to θ. The function $A(\cdot)$ is called the residual adjustment function (RAF) of the disparity, and $\delta_n = (g_n/f_\theta) - 1$ is the Pearson residual. An RAF is regular if $A'(\delta)$ and $A''(\delta)(\delta + 1)$ are bounded for $\delta \in [-1, \infty)$. The RAF plays a vital role in the robustness properties of the estimator. The minimum disparity estimator need not always satisfy the estimating equation, for example, when the minimum is attained on the boundary of the parameter space or when C is not differentiable.
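As a concrete instance of the map from C to A (our illustration, not from the original text), take the twice-squared Hellinger distance, for which $C(\delta) = 2(\sqrt{\delta + 1} - 1)^2$ and $\rho_C(g, f_\theta) = 2\int (\sqrt{g} - \sqrt{f_\theta})^2\,d\mu$. Then $C'(\delta) = 2(1 - (\delta + 1)^{-1/2})$, so that
$$A(\delta) = C'(\delta)(\delta + 1) - C(\delta) = \left[2(\delta + 1) - 2\sqrt{\delta + 1}\right] - 2\left[\delta + 2 - 2\sqrt{\delta + 1}\right] = 2\left(\sqrt{\delta + 1} - 1\right),$$
which is the well known residual adjustment function of the Hellinger distance (Lindsay, 1994).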

3 Quadratic Approximation

A popular and powerful technique for proving asymptotic normality (or, more generally, for finding the asymptotic distribution) of maximizers or minimizers is to prove a uniform quadratic approximation of the objective function. In the case of maximum likelihood estimation, local asymptotic normality (LAN) of the parametric model leads to a quadratic approximation of the log-likelihood and thus gives asymptotic normality of the maximum likelihood estimator. We will now follow the same approach in proving the asymptotic normality of the minimum disparity estimator. In this section, we prove an asymptotic representation of the estimator under the condition that $\theta_g$ is an interior point of Θ. In case $\theta_g \in \Theta^0$, the best fitting parameter $\theta_g$ satisfies, under differentiability of $f_\theta$ with respect to θ and differentiability of C, the equation
$$\int A\left(\frac{g(x)}{f_{\theta_g}(x)} - 1\right) \nabla f_{\theta_g}(x)\,d\mu(x) = 0. \quad (3.1)$$
To establish the uniform quadratic approximation of the disparity objective function, assume that $\Theta \subseteq \mathbb{R}^p$ and define
$$\theta_n = \theta_g + n^{-1/2} w \in \Theta, \quad \text{and} \quad \Lambda_n(w) = n\rho_C(g_n, f_{\theta_n}) - n\rho_C(g_n, f_{\theta_g}).$$
Let $u_\theta(x) = \nabla \log f_\theta(x)$ denote the usual likelihood score function. Consider the following assumptions.

(A1) The residual adjustment function A is regular, i.e., the functions $A'(\cdot)$ and $A''(\cdot)(\cdot + 1)$ are both bounded, say, by M;
(A2) The parameter space $\Theta \subseteq \mathbb{R}^p$ (for some $p \ge 1$) is bounded and $\theta_g \in \Theta^0$;
(A3) The densities in the family $\mathcal{F}_\Theta$ are all twice continuously differentiable with respect to θ. Also, there exists a compact neighbourhood $\Theta_g$ of $\theta_g$ and a function $M_0$ such that the functions $\|u_\theta\|$, $\|\nabla u_\theta\|$ and $\|u_\theta u_\theta^\top\|$ are all bounded by $M_0$ uniformly in $\theta \in \Theta_g$, and $M_0(X)$ has finite expectation. The matrix $B(\theta)$ is positive definite and finite, where
$$B(\theta) := \int A'(\delta)(\delta + 1) u_\theta u_\theta^\top f_\theta\,d\mu - \int A(\delta) \nabla_2 f_\theta\,d\mu;$$
(A4) The density estimator sequence $\{g_n\}$ also satisfies, almost surely as $n \to \infty$,
$$\int g_n(x) M_0(x)\,d\mu(x) \to \int g(x) M_0(x)\,d\mu(x). \quad (3.2)$$

These assumptions are not very restrictive. Assumption (A1) is the very condition under which Lindsay (1994) presented his disparity framework in discrete models. Assumption (A2) is needed for applying a Taylor series expansion around $\theta_g$. Assumption (A3) is required to prove uniform convergence of a quantity $B_n(\theta)$ defined in the proof below. This assumption is one of the Cramér-Rao conditions required to prove asymptotic normality of the maximum likelihood estimator and is satisfied by the exponential family of densities. One could use separate bounds in (A3) for the three functions involved, but this provides no theoretical advantage. Assumption (A4) can be seen as a generalization of the strong law of large numbers, and holds by Kolmogorov's strong law of large numbers if the base measure is the counting measure and $g_n$ is replaced by the usual proportions based estimator of a discrete density. Some simpler conditions under which assumption (A4) holds were presented in van der Vaart and Wellner (1996) and in Zapała (2008). The right hand side of Equation (3.2) is finite by assumption (A3). In particular cases, like the kernel density estimator, an orthogonal series based estimator or a delta sequence based estimator, assumption (A4) is not difficult to verify.
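Condition (A1) can be checked directly for a given C. As an illustration (ours), the sketch below verifies it numerically on a grid for the negative exponential disparity of Basu et al. (1997), for which $C(\delta) = e^{-\delta} - 1 + \delta$, so that $A(\delta) = 2 - (\delta + 2)e^{-\delta}$, $A'(\delta) = (\delta + 1)e^{-\delta}$ and $A''(\delta) = -\delta e^{-\delta}$.

```python
import numpy as np

# Grid over the range [-1, infinity) of the Pearson residual; both functions
# decay to zero in the right tail, so a long finite grid suffices here.
d = np.linspace(-1.0, 50.0, 200001)
A1 = (d + 1.0) * np.exp(-d)                  # A'(d) for the NED
A2 = -d * np.exp(-d)                         # A''(d) for the NED

print(np.abs(A1).max())                      # = 1.0, attained at d = 0
print(np.abs(A2 * (d + 1.0)).max())          # ~0.84, so (A1) holds
```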

Theorem 3.1 Under the assumptions (A1)-(A4),
$$\Lambda_n(w) = -n^{1/2} w^\top \int A(\delta_n) \nabla f_{\theta_g}\,d\mu + \frac{1}{2} w^\top B(\theta_g) w + o_p(1)$$
holds uniformly in $w \in \{w : \|w\| \le K\}$ for every finite K. Here $\|\cdot\|$ represents the Euclidean norm in $\mathbb{R}^p$.

Proof First note that
$$\nabla\left[C\left(\frac{g_n}{f_\theta} - 1\right) f_\theta\right] = C\left(\frac{g_n}{f_\theta} - 1\right) \nabla f_\theta - C'\left(\frac{g_n}{f_\theta} - 1\right) \frac{g_n}{f_\theta} \nabla f_\theta = -A(\delta_n) \nabla f_\theta,$$
and
$$-\nabla\left[A(\delta_n) \nabla f_\theta\right] = A'(\delta_n)(\delta_n + 1) u_\theta u_\theta^\top f_\theta - A(\delta_n) \nabla_2 f_\theta.$$
Define
$$B_n(\theta) = \int A'(\delta_n)(\delta_n + 1) u_\theta u_\theta^\top f_\theta\,d\mu - \int A(\delta_n) \nabla_2 f_\theta\,d\mu =: S_2 - S_1.$$
We will first prove that $B_n(\theta)$ converges in probability to $B(\theta)$ uniformly in $\theta \in \Theta_g$, which will prove that
$$\sup_{w : \|w\| \le K} \left\|B_n(\theta_g + n^{-1/2} w) - B_n(\theta_g)\right\| \le 2 \sup_{\theta \in \Theta_g} \left\|B_n(\theta) - B(\theta)\right\| + o(1) = o_p(1).$$
Here we use the fact that for any finite K, there exists a large enough $n$ so that $\theta_g + n^{-1/2} w \in \Theta_g$ for all $\|w\| \le K$. To deal with $S_1$, by assumption (A1) and Lemma 25 of Lindsay (1994),
$$\left\|\int [A(\delta_n) - A(\delta)] \nabla_2 f_\theta\,d\mu\right\| \le \int \left|A(\delta_n) - A(\delta) - (\delta_n - \delta) A'(\delta)\right| \left\|\nabla_2 f_\theta\right\| d\mu + \int |A'(\delta)|\,\frac{|g - g_n|}{f_\theta}\,\left\|\nabla_2 f_\theta\right\| d\mu$$
$$\le B \int \left(g_n^{1/2} - g^{1/2}\right)^2 \frac{\|\nabla_2 f_\theta\|}{f_\theta}\,d\mu + M \int |g_n - g|\,\frac{\|\nabla_2 f_\theta\|}{f_\theta}\,d\mu \le (B + M) \int |g_n - g|\,\frac{\|\nabla_2 f_\theta\|}{f_\theta}\,d\mu, \quad (3.3)$$

because for any $a, b > 0$, $(\sqrt{a} - \sqrt{b})^2 \le |a - b|$. Since
$$|g_n(x) - g(x)|\,\frac{\|\nabla_2 f_\theta\|}{f_\theta} \le (g_n(x) + g(x)) M_0(x), \quad \text{and} \quad \int (g_n(x) + g(x)) M_0(x)\,d\mu(x) \xrightarrow{a.s.} 2E[M_0(X)],$$
the quantity in inequality (3.3) converges almost surely to zero uniformly as $n \to \infty$, by the generalized Pratt's lemma for random functions, Theorem 1 of Iséki (1957), and assumptions (A3) and (A4). Now consider $S_2$. Observe that
$$\left\|\int \{A'(\delta_n)(\delta_n + 1) - A'(\delta)(\delta + 1)\} u_\theta u_\theta^\top f_\theta\,d\mu\right\| \le \int \left\|[A'(\delta_n) - A'(\delta)](\delta_n + 1) u_\theta u_\theta^\top\right\| f_\theta\,d\mu + \int \left\|A'(\delta)(\delta_n - \delta) u_\theta u_\theta^\top\right\| f_\theta\,d\mu. \quad (3.4)$$
Note that, using assumption (A1), we have
$$\left\|[A'(\delta_n) - A'(\delta)](\delta_n + 1) u_\theta u_\theta^\top\right\| f_\theta \le 2M g_n(x) M_0(x), \qquad \left\|A'(\delta)(g_n - g) u_\theta u_\theta^\top\right\| \le M (g_n(x) + g(x)) M_0(x).$$
By (A1), Theorem 1 of Iséki (1957) and the generalized Pratt's lemma, along with assumptions (A3) and (A4), the terms on the right hand side of inequality (3.4) converge almost surely to zero uniformly on $\Theta_g$ as $n \to \infty$.

Getting back to $\Lambda_n(w)$, we have, by a Taylor series expansion,
$$\Lambda_n(w) = n \left\{\int C\left(\frac{g_n}{f_{\theta_n}} - 1\right) f_{\theta_n}\,d\mu - \int C\left(\frac{g_n}{f_{\theta_g}} - 1\right) f_{\theta_g}\,d\mu\right\} = -n^{1/2} w^\top \int A(\delta_n) \nabla f_{\theta_g}\,d\mu + \frac{1}{2} w^\top B_n(\theta^*) w,$$
where $\theta^*$ lies on the line joining $\theta_g$ and $\theta_g + n^{-1/2} w$, and so converges to $\theta_g$ as $n \to \infty$. By the uniform convergence of $B_n(\theta)$ to $B(\theta)$, we get
$$\Lambda_n(w) = -n^{1/2} w^\top \int A(\delta_n) \nabla f_{\theta_g}\,d\mu + \frac{1}{2} w^\top B(\theta_g) w + o_p(1), \quad (3.5)$$
uniformly on $\{w \in \mathbb{R}^p : \|w\| \le K\}$.

This quadratic approximation can be used to prove $n^{1/2}$-consistency of the minimizer $\hat{\theta}_n$, leading to an asymptotic representation.

Theorem 3.2 If $\int A(\delta_n(x)) \nabla f_{\theta_g}(x)\,d\mu(x) = O_p(n^{-1/2})$ and $\hat{\theta}_n \xrightarrow{P} \theta_g$, then the minimizer $\hat{\theta}_n$ satisfies the equation
$$n^{1/2}(\hat{\theta}_n - \theta_g) = B^{-1}(\theta_g) \left[n^{1/2} \int A(\delta_n(x)) \nabla f_{\theta_g}(x)\,d\mu(x)\right] + o_p(1). \quad (3.6)$$

Proof Note that the minimizer of $\Lambda_n(w)$ is $n^{1/2}(\hat{\theta}_n - \theta_g)$. Equation (3.5) proves that $\Lambda_n(w)$ and the quadratic term on its right hand side are close as stochastic processes indexed by $w \in \mathbb{R}^p$. Intuitively, this implies Equation (3.6). A rigorous argument follows the approach of Lemma 5.4 of Ichimura (1993). A close examination of the proof of Theorem 3.1 shows that
$$\rho_C(g_n, f_{\hat{\theta}_n}) - \rho_C(g_n, f_{\theta_g}) = -\left[\int A(\delta_n(x)) \nabla f_{\theta_g}(x)\,d\mu(x)\right]^\top (\hat{\theta}_n - \theta_g) + \frac{1}{2} (\hat{\theta}_n - \theta_g)^\top B(\theta_g)(\hat{\theta}_n - \theta_g) + o_p(\|\hat{\theta}_n - \theta_g\|^2).$$
Here $\|\cdot\|$ represents the Euclidean norm, and by the convergence in probability of $\hat{\theta}_n$, there exists an N such that $\hat{\theta}_n \in \Theta_g$ for all $n \ge N$. Now, following the proof of Lemma 5.4 of Ichimura (1993), we get that $\hat{\theta}_n - \theta_g = O_p(n^{-1/2})$ and Equation (3.6).

Remark 8 We do not require densities with respect to the Lebesgue measure. Nor do we require the observations to be independent.

Remark 9 While we believe that our exposition adds substantially to the literature on disparities and minimum disparity estimators, it is important to recognize what has been left out. Our conditions (C2) and (A1) are not satisfied by the members of the Cressie-Read (Cressie and Read, 1984) family of disparities. We trust that this is compensated by the large number of disparities, including many that generate extremely robust estimators and tests, which do satisfy our conditions. We provide a more detailed discussion of this issue, including a partial list of subclasses of disparities which satisfy assumptions (C2) and (A1), in the Supplementary Material.

4 Asymptotic Normality

In this section, we present a general theorem proving asymptotic normality of the estimating function, $\int A(\delta_n(x)) \nabla f_\theta(x)\,d\mu(x)$, at $\theta = \theta_g$, under some rate assumptions on the density estimator and without assuming any particular form of the density estimator. Consider the following assumption.

(B1) The density estimator, in conjunction with the parametric model $\mathcal{F}_\Theta$, satisfies
$$n^{1/2} \int A'(\delta) \{g_n(x) - g(x)\}\,u_{\theta_g}(x)\,d\mu(x) \xrightarrow{L} N(0, V(\theta_g)),$$
for some positive definite matrix $V(\theta_g)$, and
$$n^{1/2} \int \{g_n^{1/2}(x) - g^{1/2}(x)\}^2\,\|u_{\theta_g}(x)\|\,d\mu(x) = o_p(1).$$

In the case of i.i.d. data, the first part of assumption (B1) readily holds if the base measure is the counting measure and the density estimator $g_n$ is the usual proportion based estimator; the conditions under which the second part holds were given in Lindsay (1994). If the base measure is the Lebesgue measure and the density estimator is the kernel density estimator, then assumption (B1) was verified through the proofs presented in Park and Basu (2004).

Theorem 4.1 Under the assumptions (A1) and (B1),
$$n^{1/2} \left(\int A(\delta_n) \nabla f_{\theta_g}\,d\mu - \int A(\delta) \nabla f_{\theta_g}\,d\mu\right) \xrightarrow{L} N(0, V(\theta_g)).$$

Proof The centering on the left hand side is actually zero by (3.1). Define
$$S_n(x) = A(\delta_n(x)) - A(\delta(x)) - (\delta_n(x) - \delta(x)) A'(\delta(x)).$$
Observe that
$$\int A(\delta_n) \nabla f_\theta\,d\mu - \int A(\delta) \nabla f_\theta\,d\mu = \int A'(\delta)(\delta_n - \delta) \nabla f_\theta\,d\mu + \int S_n(x) \nabla f_\theta(x)\,d\mu.$$
By Lemma 25 of Lindsay (1994), we have that
$$\left\|\int S_n(x) \nabla f_{\theta_g}(x)\,d\mu(x)\right\| \le \int |S_n(x)|\,\|\nabla f_{\theta_g}(x)\|\,d\mu(x) \le B \int \left(g_n^{1/2} - g^{1/2}\right)^2 \|u_{\theta_g}\|\,d\mu = o_p(n^{-1/2}),$$
by assumption (B1). Another application of (B1) yields
$$n^{1/2} \left(\int A(\delta_n) \nabla f_{\theta_g}\,d\mu - \int A(\delta) \nabla f_{\theta_g}\,d\mu\right) \xrightarrow{L} N(0, V(\theta_g)).$$

Remark 10 This theorem, combined with Equation (3.1), Theorem 3.2 and the conditions implied therein, implies that
$$n^{1/2}(\hat{\theta}_n - \theta_g) \xrightarrow{L} N(0, B^{-1}(\theta_g) V(\theta_g) B^{-1}(\theta_g)). \quad (4.1)$$
Thus our approach essentially subsumes that of Lindsay (1994) and provides a unified framework of minimum disparity estimation in discrete and continuous models. This approach also includes the regression case solved recently by Hooker (2016), with relaxed assumptions, in a more compact manner.

Remark 11 When the base measure is the Lebesgue measure, the non-parametric density estimator involved is a kernel density estimator with bandwidth sequence $h_n$, given by
$$g_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h_n}\right),$$
and the random sample is an i.i.d. sample from a univariate distribution, we get, using the assumptions of Theorem 3.4 of Park and Basu (2004), that
$$V(\theta_g) = \mathrm{Var}\left(A'(\delta(X))\,u_{\theta_g}(X)\right).$$
See also Cheng and Vidyashankar (2006) and Remark 13. Note that if $g = f_{\theta_0}$, then $\theta_g = \theta_0$ and $V(\theta_0) = B(\theta_0) = I(\theta_0)$, which is the Fisher information matrix. This proves that
$$n^{1/2}(\hat{\theta}_n - \theta_0) \xrightarrow{L} N(0, I^{-1}(\theta_0)),$$
which implies first order efficiency of the minimum disparity estimator. Actually, if the non-parametric density estimator is the kernel density estimator, one can show more, namely that under $g = f_{\theta_0}$,
$$n^{1/2}(\hat{\theta}_n - \theta_0) = n^{-1/2} I^{-1}(\theta_0) \sum_{i=1}^{n} u_{\theta_0}(X_i) + o_p(1), \quad (4.2)$$
which in turn shows that the maximum likelihood estimator and the minimum disparity estimator are asymptotically equivalent at the parametric model.

Remark 12 The limit theorem (4.1) has an implicit advantage over the previous asymptotic normality results of Lindsay (1994), Park and Basu (2004) and Kuchibhotla and Basu (2015), in the sense that these authors prove that there exists a sequence of roots of the estimating equations which is strongly consistent and asymptotically normal, but do not specify which roots. Our theorem proves that the minimizer of the disparity, which is also a root of the estimating equation, is strongly consistent and asymptotically normal.

Remark 13 We thus obtain the first order efficiency of the minimum disparity estimator obtained from the integral form of the disparity. Note that the approximation method mentioned in Remark 7 leads to an inefficient estimator in general. See Theorem 3.2 of Ghosh and Jammalamadaka (2001).
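The identity $V(\theta_0) = B(\theta_0) = I(\theta_0)$ at the model can also be verified numerically. The sketch below (ours; the disparity, model and quadrature limits are illustrative assumptions) does so for the negative exponential disparity in the $N(\theta, 1)$ model, where $\delta(x) = 0$ for all $x$, so that $A(0) = 0$ and $A'(0) = 1$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

theta0 = 0.0
f = lambda x: norm.pdf(x, loc=theta0)        # model density f_theta0
u = lambda x: x - theta0                     # score u_theta(x) for N(theta, 1)
A  = lambda d: 2.0 - (d + 2.0) * np.exp(-d)  # RAF: A(d) = C'(d)(d+1) - C(d)
A1 = lambda d: (d + 1.0) * np.exp(-d)        # A'(d)

# At the model, delta(x) = g(x)/f(x) - 1 = 0 everywhere, so A(delta) = 0 and
# the second term of B(theta) vanishes; also A'(0) = 1.
B, _ = quad(lambda x: A1(0.0) * (0.0 + 1.0) * u(x) ** 2 * f(x), -10, 10)
m, _ = quad(lambda x: A1(0.0) * u(x) * f(x), -10, 10)
s, _ = quad(lambda x: (A1(0.0) * u(x)) ** 2 * f(x), -10, 10)
V = s - m ** 2                               # V = Var(A'(delta(X)) u(X)) under g = f

print(B, V)  # both approximately 1.0 = I(theta0) for the N(theta, 1) model
```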

5 Tests of Hypothesis

A popular and widely used statistical tool for the hypothesis testing problem is the likelihood ratio test. The likelihood ratio test statistic can be viewed as the difference between the minimum of the likelihood disparity under the null and that without any constraint. Under certain regularity conditions, the likelihood ratio test enjoys some asymptotic optimality properties. However, as in the case of the maximum likelihood estimator, the likelihood ratio test exhibits poor robustness properties in many cases. As an alternative to the likelihood ratio test, Simpson (1989) introduced the Hellinger deviance test, which was later generalized to disparity difference tests in a unified way; see, e.g., Lindsay (1994) and Basu et al. (2011). However, the disparity difference test statistic can become potentially negative when using an arbitrary root of the estimating equation rather than the global minimizer.

The set up under which we deal with the problem of hypothesis testing is as follows. We assume the parametric set up of Section 2 and let identically distributed random variables $X_1, X_2, \ldots, X_n$ be available from the true distribution G. We assume the equivalence presented in Equation (4.2), which is satisfied, for example, if the non-parametric density estimator is the kernel density estimator. The hypothesis testing problem under consideration is
$$H_0 : \theta \in \Theta_0 \quad \text{versus} \quad H_1 : \theta \in \Theta \setminus \Theta_0,$$
for a proper subset $\Theta_0$ of Θ. As an analogue of the likelihood ratio test, define the test statistic
$$W_C(g_n) := 2n \left[\rho_C(g_n, f_{\hat{\theta}_0}) - \rho_C(g_n, f_{\hat{\theta}})\right], \quad (5.1)$$
where $\hat{\theta}$ and $\hat{\theta}_0$ denote the unrestricted minimizer of $\rho_C(g_n, f_\theta)$ and the minimizer under the constraint $\theta \in \Theta_0$, respectively, and $g_n$ is the kernel density estimate. We will now present the main theorem of this section, which establishes the asymptotic distribution of $W_C$.

Theorem 5.1 Under the model $f_{\theta_0}$, $\theta_0 \in \Theta_0$, and assumptions (A1)-(A4) and (B1), the limiting null distribution of the test statistic $W_C(g_n)$ is $\chi^2_r$, where $r$ is the number of restrictions imposed by the null hypothesis $H_0$.

Proof A Taylor series expansion of $\rho_C(g_n, f_{\hat{\theta}_0})$ in θ around $\hat{\theta}$ gives
$$W_C(g_n) = 2n \left[\rho_C(g_n, f_{\hat{\theta}_0}) - \rho_C(g_n, f_{\hat{\theta}})\right] = 2n \left[(\hat{\theta}_0 - \hat{\theta})^\top \nabla \rho_C(g_n, f_{\hat{\theta}})\right] + 2n \left[\frac{1}{2} (\hat{\theta}_0 - \hat{\theta})^\top \nabla_2 \rho_C(g_n, f_{\theta^*})(\hat{\theta}_0 - \hat{\theta})\right],$$

where $\theta^*$ belongs to the line joining $\hat{\theta}_0$ and $\hat{\theta}$. Note that the first term in the last expression is zero, as $\hat{\theta}$ is the minimizer of $\rho_C$ over Θ. So we only need to deal with the second term in the expansion. Now
$$W_C(g_n) = n \left[(\hat{\theta}_0 - \hat{\theta})^\top I(\theta_0)(\hat{\theta}_0 - \hat{\theta})\right] + n \left[(\hat{\theta}_0 - \hat{\theta})^\top \{\nabla_2 \rho_C(g_n, f_{\theta^*}) - I(\theta_0)\}(\hat{\theta}_0 - \hat{\theta})\right]. \quad (5.2)$$
Under the model $f_{\theta_0}$, $n^{1/2}(\hat{\theta}_0 - \theta_0)$ and $n^{1/2}(\hat{\theta} - \theta_0)$ are both $O_p(1)$ provided the null is true. Thus, $n^{1/2}(\hat{\theta}_0 - \hat{\theta}) = O_p(1)$ in this case. As in Theorem 3.1, $\nabla_2 \rho_C(g_n, f_\theta) = B_n(\theta)$ converges to $B(\theta)$ uniformly in $\theta \in \Theta_g$. Note that $B(\theta_0) = I(\theta_0)$ under $g = f_{\theta_0}$. Since $\hat{\theta}_0 - \theta_0 = o_p(1)$ and $\hat{\theta} - \theta_0 = o_p(1)$, we have $\theta^* \in \Theta_g$ for large enough $n$, and so
$$\left\|\nabla_2 \rho_C(g_n, f_{\theta^*}) - I(\theta_0)\right\| \le \left\|\nabla_2 \rho_C(g_n, f_{\theta^*}) - B(\theta^*)\right\| + \left\|B(\theta^*) - I(\theta_0)\right\| \le \sup_{\theta \in \Theta_g} \left\|\nabla_2 \rho_C(g_n, f_\theta) - B(\theta)\right\| + \left\|B(\theta^*) - I(\theta_0)\right\| \xrightarrow{P} 0.$$
Hence, by the arguments above, the second term on the right hand side of Equation (5.2) converges in probability to zero. By Equation (4.2), we have
$$n^{1/2}(\hat{\theta} - \hat{\theta}_0) = n^{1/2}(\hat{\theta}_{ML} - \hat{\theta}_{0,ML}) + o_p(1),$$
where $\hat{\theta}_{ML}$ and $\hat{\theta}_{0,ML}$ are the unrestricted and constrained maximum likelihood estimators. Hence, $W_C(g_n)$ is equivalent to the likelihood ratio test statistic under the model $f_{\theta_0}$, in the sense that
$$\left|W_C(g_n) - n (\hat{\theta}_{0,ML} - \hat{\theta}_{ML})^\top I(\theta_0)(\hat{\theta}_{0,ML} - \hat{\theta}_{ML})\right| = o_p(1). \quad (5.3)$$
From the theory of the likelihood ratio test, we conclude that $W_C$ converges in distribution to a $\chi^2_r$ as $n \to \infty$, as stated. See Serfling (1980, Section 4.4.4) for a complete discussion of the likelihood ratio test.

Theorem 5.2 The conditions of Theorem 5.1, together with the additional assumption that the parametric family $\mathcal{F}_\Theta$ satisfies the local asymptotic normality (LAN) condition, imply that under $f_{\theta_n}$,
$$W_C(g_n) - 2 \sum_{i=1}^{n} \left[\log f_{\hat{\theta}_{ML}}(X_i) - \log f_{\hat{\theta}_{0,ML}}(X_i)\right] \xrightarrow{P} 0,$$
as $n \to \infty$, where $\theta_n = \theta_0 + \tau n^{-1/2}$.

Proof Under the assumptions of Theorem 5.1, Equation (5.3) implies the stated claim under $f_{\theta_0}$, since the Wald test statistic is equivalent to the likelihood ratio test statistic under the null. See Serfling (1980) for details. By the LAN condition, we have that $f_{\theta_n}$ is contiguous to $f_{\theta_0}$, and so convergence in probability under $f_{\theta_0}$ implies convergence in probability under $f_{\theta_n}$. Thus the test given by the statistic $W_C(g_n)$ has the same asymptotic power under contiguous alternatives as the likelihood ratio test under the conditions of this theorem.
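For a concrete illustration of the statistic (5.1) (ours, not from the paper), the sketch below computes $W_C(g_n)$ for the simple null $H_0 : \theta = 0$ in the $N(\theta, 1)$ model, with the Hellinger disparity and a Gaussian kernel density estimate, and refers it to the $\chi^2_1$ distribution; in finite samples the bandwidth choice matters, so the $\chi^2$ calibration is only approximate here.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm, gaussian_kde, chi2

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=500)     # data; true theta = 0.3
g_n = gaussian_kde(x)                            # kernel density estimate of g

def rho(theta):
    # rho_C(g_n, f_theta) with the Hellinger C(d) = 2(sqrt(d+1) - 1)^2
    def integrand(t):
        f = norm.pdf(t, loc=theta)
        d = g_n(t)[0] / f - 1.0
        return 2.0 * (np.sqrt(d + 1.0) - 1.0) ** 2 * f
    return quad(integrand, x.min() - 5.0, x.max() + 5.0, limit=200)[0]

theta_hat = minimize_scalar(rho, bounds=(-5.0, 5.0), method="bounded").x
W = 2.0 * len(x) * (rho(0.0) - rho(theta_hat))   # simple null, so r = 1
print(f"theta_hat = {theta_hat:.3f}, W_C = {W:.2f}, p = {chi2.sf(W, df=1):.4f}")
```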

The following theorem explores the stability of the limiting distribution of the test statistic $W_C(g_n)$ under contamination. For this theorem, the null hypothesis under consideration is $H_0 : \theta_g = \theta_0$, where the unknown true distribution G may or may not be in the model.

Theorem 5.3 Under assumptions (A1)-(A4) and (B1), under the null hypothesis, the representation
$$W_C(g_n) - Y_{2n} = Y_1 + o_p(1)$$
holds, where $Y_1 \sim \chi^2_p$ and $\lim_{g \to f_{\theta_0}} Y_{2n} = 0$ for any C. Here, by $g \to f_{\theta_0}$, we mean convergence in the $L_1$ sense. The rate at which $Y_{2n}$ converges to 0 depends on the form of C.

Proof The proof closely follows the proof of Theorem 5.1. As in Theorem 5.1, we get, by a Taylor series expansion of the test statistic around $\hat{\theta}_n$,
$$W_C(g_n) = 2n \left[\rho_C(g_n, f_{\theta_0}) - \rho_C(g_n, f_{\hat{\theta}_n})\right] = n(\theta_0 - \hat{\theta}_n)^\top \nabla_2 \rho_C(g_n, f_{\theta^*})(\theta_0 - \hat{\theta}_n),$$
where $\theta^*$ belongs to the line joining $\hat{\theta}_n$ and $\theta_0$. As in Theorem 3.1, $\nabla_2 \rho_C(g_n, f_{\theta^*})$ converges in probability to $B(\theta_0)$ under the null hypothesis. Hence, we have
$$W_C(g_n) = n(\theta_0 - \hat{\theta}_n)^\top B(\theta_0)(\theta_0 - \hat{\theta}_n) + o_p(1).$$
Note that
$$B(\theta_0) = B(\theta_0) V^{-1}(\theta_0) B(\theta_0) - B(\theta_0) \left[V^{-1}(\theta_0) - B^{-1}(\theta_0)\right] B(\theta_0).$$
By Remark 10, we get
$$n(\hat{\theta}_n - \theta_0)^\top B(\theta_0) V^{-1}(\theta_0) B(\theta_0)(\hat{\theta}_n - \theta_0) = Y_1 + o_p(1),$$
where $Y_1 \sim \chi^2_p$. The remaining term, given by
$$Y_{2n} = -n(\hat{\theta}_n - \theta_0)^\top B(\theta_0) \left[V^{-1}(\theta_0) - B^{-1}(\theta_0)\right] B(\theta_0)(\hat{\theta}_n - \theta_0),$$
becomes zero if $g = f_{\theta_0}$ and stays close to zero as $g \to f_{\theta_0}$.

Remark 14 This result extends Theorem 6 of Lindsay (1994), which considered the case of a scalar parameter. In our case, if $p = 1$, both $B = B(\theta_0)$ and $V = V(\theta_0)$ are scalars, so that
$$W_C(g_n) = \frac{V}{B} X_n + o_p(1),$$
where $X_n \xrightarrow{L} \chi^2_1$ under $H_0$. Thus $V/B$, as a function of the true density $g$ and the disparity generating function $C(\cdot)$, represents the inflation in the $\chi^2$ distribution, and can be legitimately called the $\chi^2$ inflation factor. This is exactly the same as the inflation factor described in Theorem 6, part (ii), of Lindsay (1994). When $g = f_{\theta_0}$ is the true distribution, $V = B$, so that there is

no inflation. However, when the true distribution is a point mass mixture contamination, Lindsay (1994) demonstrated, using the binomial model, that the inflation factor for the likelihood ratio test rises sharply with the contamination proportion, whereas for the Hellinger deviance test this rise is significantly dampened in comparison. Our calculations in the normal mean model exhibit improvements of similar order between the likelihood ratio test and other robust tests, although we do not present the actual numbers here. In the multidimensional case, however, the relation is not so simple, as it now requires a comparison between the matrices $B(\theta_0) V^{-1}(\theta_0) B(\theta_0)$ and $B(\theta_0)$. It could be of interest to develop a single quantitative measure of inflation for the multidimensional case in the future.

6 Conclusions

We have proved, under different sets of regularity conditions, strong consistency, an asymptotic representation of the minimum disparity estimator, asymptotic normality of the estimating function and asymptotic normality of the estimator. All these results except the grand consistency theorem require a thrice differentiable C. It is possible to prove an asymptotic representation for non-differentiable C. Using the equality
$$|x - y| - |x| = -y \left[\mathbb{1}_{\{x > 0\}} - \mathbb{1}_{\{x < 0\}}\right] + 2 \int_0^y \left[\mathbb{1}_{\{x \le s\}} - \mathbb{1}_{\{x \le 0\}}\right] ds \quad \text{for } x \ne 0,$$
it is not difficult to prove that
$$n \int |g_n(x) - f_{\theta_n}(x)|\,d\mu(x) - n \int |g_n(x) - f_\theta(x)|\,d\mu(x) = -w^\top \left[n^{1/2} \int \nabla f_\theta(x) \left[\mathbb{1}_{\{g_n(x) - f_\theta(x) > 0\}} - \mathbb{1}_{\{g_n(x) - f_\theta(x) < 0\}}\right] d\mu(x)\right] - \frac{1}{2} w^\top \int \nabla_2 f_\theta(x) \left[\mathbb{1}_{\{g_n(x) - f_\theta(x) > 0\}} - \mathbb{1}_{\{g_n(x) - f_\theta(x) < 0\}}\right] d\mu(x)\, w + o_p(1).$$
Here $\theta_n = \theta + n^{-1/2} w$ and the $o_p(1)$ is uniform in $\{w : \|w\| < K\}$ for every finite K. Thus, proving asymptotic normality of the quantity in brackets proves asymptotic normality of the $L_1$-based parameter estimate. Another method of proving asymptotic normality of an estimator based on a non-differentiable objective function is to approximate it by a smooth objective function, as in Amemiya (1982). Further exploration of these methods may lead to the development of a set up which accommodates non-differentiable C functions within the fold of minimum disparity estimation. Another possible generalization is to use quadratic mean differentiability of the densities $f_\theta$ and include non-regular families into the framework. However, the commonly used kernel density estimators may not be good estimators in this case. Also, one can consider various dependent data settings; see Table 1 in

the Supplementary Material. Using these estimators and the asymptotic representation derived in this paper, one might derive the asymptotic distribution of the minimum disparity estimator. Finally, we mention that the above results can be proved in a similar manner if one uses a Bayesian non-parametric density estimator, following the techniques of Wu and Hooker (2013).

Acknowledgements. The authors dedicate this work to the memory of Professor Bruce G. Lindsay. The authors also thank four anonymous reviewers whose comments led to an improved version of the manuscript.

References

Ali, S. M. and Silvey, S. D. (1966). A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. Ser. B, 28.
Amemiya, T. (1982). Two stage least absolute deviations estimators. Econometrica, 50(3).
Basu, A. and Lindsay, B. G. (1994). Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann. Inst. Statist. Math., 46(4).
Basu, A., Sarkar, S., and Vidyashankar, A. N. (1997). Minimum negative exponential disparity estimation in parametric models. J. Statist. Plann. Inference, 58(2).
Basu, A., Shioya, H., and Park, C. (2011). Statistical inference: the minimum distance approach. CRC Press, Boca Raton, FL.
Beran, R. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist., 5(3).
Cheng, A.-L. and Vidyashankar, A. N. (2006). Minimum Hellinger distance estimation for randomized play the winner design. J. Statist. Plann. Inference, 136(6).
Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B, 46(3).
Csiszár, I. (1963). Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8.
Devroye, L. and Györfi, L. (1985). Nonparametric density estimation: the L1 view. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York.
Ekström, M. (2001). Consistency of generalized maximum spacing estimates. Scand. J. Statist., 28(2).
Ferguson, T. S. (1982). An inconsistent maximum likelihood estimate. J. Amer. Statist. Assoc., 77(380).
Ferguson, T. S. (1996). A course in large sample theory. Texts in Statistical Science Series. Chapman & Hall, London.
Fryer, J. G. and Robertson, C. A. (1972). A comparison of some methods for estimating mixed normal distributions. Biometrika, 59(3).

Ghosh, K. and Jammalamadaka, S. R. (2001). A general estimation method using spacings. J. Statist. Plann. Inference, 93(1-2).
Gut, A. (2013). Probability: a graduate course. Springer Texts in Statistics. Springer, New York, second edition.
Györfi, L., Vajda, I., and van der Meulen, E. (1994). Minimum Hellinger distance point estimates consistent under weak family regularity. Math. Methods Statist., 3(1).
Hooker, G. (2016). Consistency, efficiency and robustness of conditional disparity methods. Bernoulli, 22(2).
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics, 58(1-2).
Iséki, K. (1957). A theorem on continuous convergence. Proc. Japan Acad., 33.
Kuchibhotla, A. K. and Basu, A. (2015). A general set up for minimum disparity estimation. Statist. Probab. Lett., 96.
Liese, F. and Vajda, I. (1987). Convex statistical distances, volume 95 of Teubner Texts in Mathematics. BSB B. G. Teubner Verlagsgesellschaft.
Lindsay, B. G. (1994). Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann. Statist., 22(2).
Morales, D., Pardo, L., and Vajda, I. (1995). Asymptotic divergence of estimates of discrete distributions. J. Statist. Plann. Inference, 48(3).
Pardo, L. (2006). Statistical inference based on divergence measures, volume 185 of Statistics: Textbooks and Monographs. Chapman & Hall/CRC.
Park, C. and Basu, A. (2004). Minimum disparity estimation: asymptotic normality and breakdown point results. Bull. Inform. Cybernet., 36.
Prakasa Rao, B. L. S. (1983). Nonparametric functional estimation. Probability and Mathematical Statistics. Academic Press, Inc., New York.
Rao, C. R. (1961). Asymptotic efficiency and limiting information. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I. Univ. California Press, Berkeley, Calif.
Robertson, C. A. (1972). On minimum discrepancy estimators. Sankhyā: The Indian Journal of Statistics, Series A, 34(2).
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. John Wiley & Sons, Inc., New York.
Simpson, D. G. (1989). Hellinger deviance tests: efficiency, breakdown points, and examples. J. Amer. Statist. Assoc., 84(405).
Vajda, I. (1989). Theory of statistical inference and information. Theory and Decision Library: Mathematical and Statistical Methods.
van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge University Press, Cambridge.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York.
Wu, Y. and Hooker, G. (2013). Hellinger distance and Bayesian nonparametrics: hierarchical models for robust and efficient Bayesian inference. ArXiv e-prints.

Zapała, A. M. (2008). Unbounded mappings and weak convergence of measures. Statist. Probab. Lett., 78(6).


More information

2.1.3 The Testing Problem and Neave s Step Method

2.1.3 The Testing Problem and Neave s Step Method we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical

More information

Computation of an efficient and robust estimator in a semiparametric mixture model

Computation of an efficient and robust estimator in a semiparametric mixture model Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Computation of an efficient and robust estimator in

More information

THEOREM AND METRIZABILITY

THEOREM AND METRIZABILITY f-divergences - REPRESENTATION THEOREM AND METRIZABILITY Ferdinand Österreicher Institute of Mathematics, University of Salzburg, Austria Abstract In this talk we are first going to state the so-called

More information

A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis

A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis Jon A. Wellner and Vladimir Koltchinskii Abstract. Proofs are given of the limiting null distributions of the

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky EMPIRICAL ENVELOPE MLE AND LR TESTS Mai Zhou University of Kentucky Summary We study in this paper some nonparametric inference problems where the nonparametric maximum likelihood estimator (NPMLE) are

More information

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Some Uniform Strong Laws of Large Numbers Suppose that: A. X, X 1,...,X n are i.i.d. P on the measurable

More information

Verifying Regularity Conditions for Logit-Normal GLMM

Verifying Regularity Conditions for Logit-Normal GLMM Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal

More information

RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1

RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2493 2511 RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1 By Theodoros Nicoleris and Yannis

More information

A Statistical Distance Approach to Dissimilarities in Ecological Data

A Statistical Distance Approach to Dissimilarities in Ecological Data Clemson University TigerPrints All Dissertations Dissertations 5-2015 A Statistical Distance Approach to Dissimilarities in Ecological Data Dominique Jerrod Morgan Clemson University Follow this and additional

More information

arxiv: v4 [math.st] 16 Nov 2015

arxiv: v4 [math.st] 16 Nov 2015 Influence Analysis of Robust Wald-type Tests Abhik Ghosh 1, Abhijit Mandal 2, Nirian Martín 3, Leandro Pardo 3 1 Indian Statistical Institute, Kolkata, India 2 Univesity of Minnesota, Minneapolis, USA

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Gaussian Estimation under Attack Uncertainty

Gaussian Estimation under Attack Uncertainty Gaussian Estimation under Attack Uncertainty Tara Javidi Yonatan Kaspi Himanshu Tyagi Abstract We consider the estimation of a standard Gaussian random variable under an observation attack where an adversary

More information

Graduate Econometrics I: Maximum Likelihood I

Graduate Econometrics I: Maximum Likelihood I Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Econometrica Supplementary Material

Econometrica Supplementary Material Econometrica Supplementary Material SUPPLEMENT TO USING INSTRUMENTAL VARIABLES FOR INFERENCE ABOUT POLICY RELEVANT TREATMENT PARAMETERS Econometrica, Vol. 86, No. 5, September 2018, 1589 1619 MAGNE MOGSTAD

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals Acta Applicandae Mathematicae 78: 145 154, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 145 Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals M.

More information

Czechoslovak Mathematical Journal

Czechoslovak Mathematical Journal Czechoslovak Mathematical Journal Oktay Duman; Cihan Orhan µ-statistically convergent function sequences Czechoslovak Mathematical Journal, Vol. 54 (2004), No. 2, 413 422 Persistent URL: http://dml.cz/dmlcz/127899

More information

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability

More information

Comparison of Estimators in GLM with Binary Data

Comparison of Estimators in GLM with Binary Data Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 10 11-2014 Comparison of Estimators in GLM with Binary Data D. M. Sakate Shivaji University, Kolhapur, India, dms.stats@gmail.com

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators

A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators Robert M. de Jong Department of Economics Michigan State University 215 Marshall Hall East

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007 Hypothesis Testing Daniel Schmierer Econ 312 March 30, 2007 Basics Parameter of interest: θ Θ Structure of the test: H 0 : θ Θ 0 H 1 : θ Θ 1 for some sets Θ 0, Θ 1 Θ where Θ 0 Θ 1 = (often Θ 1 = Θ Θ 0

More information

ECE 275B Homework # 1 Solutions Version Winter 2015

ECE 275B Homework # 1 Solutions Version Winter 2015 ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

ON BOUNDEDNESS OF MAXIMAL FUNCTIONS IN SOBOLEV SPACES

ON BOUNDEDNESS OF MAXIMAL FUNCTIONS IN SOBOLEV SPACES Annales Academiæ Scientiarum Fennicæ Mathematica Volumen 29, 2004, 167 176 ON BOUNDEDNESS OF MAXIMAL FUNCTIONS IN SOBOLEV SPACES Piotr Haj lasz and Jani Onninen Warsaw University, Institute of Mathematics

More information

The Hilbert Transform and Fine Continuity

The Hilbert Transform and Fine Continuity Irish Math. Soc. Bulletin 58 (2006), 8 9 8 The Hilbert Transform and Fine Continuity J. B. TWOMEY Abstract. It is shown that the Hilbert transform of a function having bounded variation in a finite interval

More information

ECE 275B Homework # 1 Solutions Winter 2018

ECE 275B Homework # 1 Solutions Winter 2018 ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,

More information

An inverse of Sanov s theorem

An inverse of Sanov s theorem An inverse of Sanov s theorem Ayalvadi Ganesh and Neil O Connell BRIMS, Hewlett-Packard Labs, Bristol Abstract Let X k be a sequence of iid random variables taking values in a finite set, and consider

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

ENTROPY-BASED GOODNESS OF FIT TEST FOR A COMPOSITE HYPOTHESIS

ENTROPY-BASED GOODNESS OF FIT TEST FOR A COMPOSITE HYPOTHESIS Bull. Korean Math. Soc. 53 (2016), No. 2, pp. 351 363 http://dx.doi.org/10.4134/bkms.2016.53.2.351 ENTROPY-BASED GOODNESS OF FIT TEST FOR A COMPOSITE HYPOTHESIS Sangyeol Lee Abstract. In this paper, we

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

Institut für Mathematik

Institut für Mathematik U n i v e r s i t ä t A u g s b u r g Institut für Mathematik Christoph Gietl, Fabian P. Reffel Continuity of f-projections and Applications to the Iterative Proportional Fitting Procedure Preprint Nr.

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Minimum distance tests and estimates based on ranks

Minimum distance tests and estimates based on ranks Minimum distance tests and estimates based on ranks Authors: Radim Navrátil Department of Mathematics and Statistics, Masaryk University Brno, Czech Republic (navratil@math.muni.cz) Abstract: It is well

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

Optimal Sequential Procedures with Bayes Decision Rules

Optimal Sequential Procedures with Bayes Decision Rules International Mathematical Forum, 5, 2010, no. 43, 2137-2147 Optimal Sequential Procedures with Bayes Decision Rules Andrey Novikov Department of Mathematics Autonomous Metropolitan University - Iztapalapa

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

ON A MAXIMAL OPERATOR IN REARRANGEMENT INVARIANT BANACH FUNCTION SPACES ON METRIC SPACES

ON A MAXIMAL OPERATOR IN REARRANGEMENT INVARIANT BANACH FUNCTION SPACES ON METRIC SPACES Vasile Alecsandri University of Bacău Faculty of Sciences Scientific Studies and Research Series Mathematics and Informatics Vol. 27207), No., 49-60 ON A MAXIMAL OPRATOR IN RARRANGMNT INVARIANT BANACH

More information

Citation Osaka Journal of Mathematics. 41(4)

Citation Osaka Journal of Mathematics. 41(4) TitleA non quasi-invariance of the Brown Authors Sadasue, Gaku Citation Osaka Journal of Mathematics. 414 Issue 4-1 Date Text Version publisher URL http://hdl.handle.net/1194/1174 DOI Rights Osaka University

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

University of Pavia. M Estimators. Eduardo Rossi

University of Pavia. M Estimators. Eduardo Rossi University of Pavia M Estimators Eduardo Rossi Criterion Function A basic unifying notion is that most econometric estimators are defined as the minimizers of certain functions constructed from the sample

More information

A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM

A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM J. Appl. Prob. 49, 876 882 (2012 Printed in England Applied Probability Trust 2012 A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM BRIAN FRALIX and COLIN GALLAGHER, Clemson University Abstract

More information

A New Quantum f-divergence for Trace Class Operators in Hilbert Spaces

A New Quantum f-divergence for Trace Class Operators in Hilbert Spaces Entropy 04, 6, 5853-5875; doi:0.3390/e65853 OPEN ACCESS entropy ISSN 099-4300 www.mdpi.com/journal/entropy Article A New Quantum f-divergence for Trace Class Operators in Hilbert Spaces Silvestru Sever

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

Understanding Ding s Apparent Paradox

Understanding Ding s Apparent Paradox Submitted to Statistical Science Understanding Ding s Apparent Paradox Peter M. Aronow and Molly R. Offer-Westort Yale University 1. INTRODUCTION We are grateful for the opportunity to comment on A Paradox

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Asymptotic efficiency of simple decisions for the compound decision problem

Asymptotic efficiency of simple decisions for the compound decision problem Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com

More information

MINIMUM HELLINGER DISTANCE ESTIMATION IN A SEMIPARAMETRIC MIXTURE MODEL SIJIA XIANG. B.S., Zhejiang Normal University, China, 2010 A REPORT

MINIMUM HELLINGER DISTANCE ESTIMATION IN A SEMIPARAMETRIC MIXTURE MODEL SIJIA XIANG. B.S., Zhejiang Normal University, China, 2010 A REPORT MINIMUM HELLINGER DISTANCE ESTIMATION IN A SEMIPARAMETRIC MIXTURE MODEL by SIJIA XIANG B.S., Zhejiang Normal University, China, 2010 A REPORT submitted in partial fulfillment of the requirements for the

More information

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH

More information

WE start with a general discussion. Suppose we have

WE start with a general discussion. Suppose we have 646 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 2, MARCH 1997 Minimax Redundancy for the Class of Memoryless Sources Qun Xie and Andrew R. Barron, Member, IEEE Abstract Let X n = (X 1 ; 111;Xn)be

More information