Maximum Entropy Approximations for Asymptotic Distributions of Smooth Functions of Sample Means
Scandinavian Journal of Statistics, Vol. 38. Published by Blackwell Publishing Ltd.

XIMING WU, Department of Agricultural Economics, Texas A&M University

SUOJIN WANG, Department of Statistics, Texas A&M University

ABSTRACT. We propose an information-theoretic approach to approximate asymptotic distributions of statistics using maximum entropy (ME) densities. Conventional ME densities are typically defined on a bounded support. For distributions defined on unbounded supports, we use an asymptotically negligible dampening function for the ME approximation such that it is well defined on the real line. We establish order n^{-1} asymptotic equivalence between the proposed method and the classical Edgeworth approximation for general statistics that are smooth functions of sample means. Numerical examples are provided to demonstrate the efficacy of the proposed method.

Key words: asymptotic approximation, Edgeworth approximation, maximum entropy density

1. Introduction

Asymptotic approximations to the distributions of estimators or test statistics are commonly used in statistical inference, for example to obtain approximate confidence limits for parameters of interest or p-values for hypothesis tests. According to the central limit theorem, a statistic often has a limiting normal distribution under mild regularity conditions. The normal approximation, however, can be inadequate, especially when the sample size is small. In this study we are concerned with higher-order methods for approximating the distributions of statistics with limiting normal distributions. The most commonly used higher-order approximation is that of Edgeworth, which takes a polynomial form weighted by a normal distribution or density function. The Edgeworth approximation is known to be quite accurate near the mean of a distribution.
As a result of its polynomial form, however, it can sometimes produce spurious oscillations and give poor fits to the exact distributions in parts of the tails. Moreover, density and tail approximations can sometimes produce negative values. Suppose that we are interested in the exact density f_n of a statistic S_n. In this study we consider an alternative higher-order approximation of the form

f̂_n(x) = exp{ Σ_{j=1}^K γ_{nj} x^j − ψ(γ_n) },   (1)

where γ_n = {γ_{nj}}_{j=1}^K are the coefficients of the canonical exponential density and ψ(γ_n) is a normalization term such that f̂_n integrates to unity. The proposed approximation has an appealing information-theoretic interpretation as the maximum entropy (ME) density, which is obtained by maximizing the information entropy subject to some moment conditions. The ME approximation is strictly positive and well defined on a bounded support. When
the underlying distribution has an unbounded support, the ME approximation in its conventional form is not necessarily integrable. Instead of assuming a bounded support, we propose a modified ME approximation with a dampening function that ensures the integrability of the approximation and, at the same time, is asymptotically negligible. Most existing studies on the ME density focus on its use as a density estimator. This study is conceptually different in that it uses the ME density to approximate asymptotic distributions of statistics. For a given number of polynomial terms, we investigate the performance of the approximation as n → ∞. We focus on approximations up to order n^{-1}, as approximations beyond order n^{-1} are rarely used in practice; higher-order results can be obtained in a similar manner. We establish, in the general framework of smooth functions of sample means, that the ME approximations based on the first four moments of f_n are asymptotically equivalent to the two-term Edgeworth approximation up to order n^{-1}. We provide some numerical examples whose results suggest that the proposed method is a useful complement to the Edgeworth approximation, especially when the sample size is small. The rest of the article is organized as follows. Section 2 reviews the literature on the Edgeworth approximation and the ME density. Section 3 proposes a higher-order approximation based on modified ME densities and establishes its asymptotic properties. Section 4 provides some numerical examples. Some concluding remarks are given in section 5. Proofs of the theorems and some technical details are collected in the appendices.

2. Background and motivation

In this section we first review the literature on the Edgeworth approximation and some relevant variants.
We then introduce the ME density, based on which we construct our alternative higher-order approximations.

2.1. Edgeworth approximation

In his seminal work, Edgeworth (1905) applies the Gram–Charlier expansion to the distribution of a sample mean and collects terms in powers of n^{-1/2}. The resulting asymptotic approximation has become known as the Edgeworth approximation. It has been extensively studied in the literature; see, for example, Barndorff-Nielsen & Cox (1989), Hall (1992) and references therein. Suppose that {Y_i}_{i=1}^n is an i.i.d. random sample with mean zero and variance one. Let X_n = n^{-1/2} Σ_{i=1}^n Y_i and let f_n be the density of X_n. The two-term Edgeworth approximation to f_n takes the form

g_n(x) = φ(x) { 1 + κ₃H₃(x)/(6√n) + κ₄H₄(x)/(24n) + κ₃²H₆(x)/(72n) },

where φ(x) is the standard normal density, κ_j is the jth cumulant of Y₁ and H_j is the jth Hermite polynomial. The Edgeworth approximation applies not only to sample means but also to more general statistics. In this study we focus on smooth functions of sample means, as does Hall (1992). Additional notation is required for statistics more general than sample means. We use bold face to denote a d-dimensional vector v, and v^{(j)} denotes its jth element. Let S_n be an asymptotically normal statistic with distribution F_n and density f_n. Define X̄ = n^{-1} Σ X_i, where X₁, ..., X_n are i.i.d. random column d-vectors with mean μ. Suppose that we have a smooth function B: R^d → R satisfying B(μ) = 0 and S_n = n^{1/2} B(X̄). For example, B(x) = {τ(x) − τ(μ)}/δ(μ), where
τ(μ) is estimated by τ(X̄) and δ(μ)² is the asymptotic variance of n^{1/2} τ(X̄); or B(x) = {τ(x) − τ(μ)}/δ(x), where τ(·) and δ(·) are smooth functions and δ(X̄) is an estimate of δ(μ). For j = 1, 2, ..., denote the jth moment and cumulant of S_n by μ_{nj} and κ_{nj}, respectively. In general, as the exact moments and cumulants of S_n are not available, we construct instead corresponding approximate moments and cumulants accurate to O(n^{-1}), denoted by μ̃_{nj} and κ̃_{nj}, respectively, in the following. Define Z = n^{1/2}(X̄ − μ) and b_{i₁...i_j} = ∂^j B(x)/∂x^{(i₁)} ··· ∂x^{(i_j)} evaluated at x = μ. By a Taylor expansion and since B(μ) = 0,

S_n = n^{1/2} B(X̄)
    = Σ_{i₁=1}^d b_{i₁} Z^{(i₁)} + n^{-1/2} (1/2) Σ_{i₁=1}^d Σ_{i₂=1}^d b_{i₁i₂} Z^{(i₁)} Z^{(i₂)}
      + n^{-1} (1/6) Σ_{i₁=1}^d Σ_{i₂=1}^d Σ_{i₃=1}^d b_{i₁i₂i₃} Z^{(i₁)} Z^{(i₂)} Z^{(i₃)} + O_p(n^{-3/2}).

The (approximate) moments of S_n can then be computed from this expansion. In principle, one can compute its moments to an arbitrary degree of accuracy by taking a sufficiently large number of terms. Next we give general formulae that allow approximations accurate to order n^{-1}. In particular, we have

μ̃_{n1} = n^{-1/2} s₁₂,  μ̃_{n2} = s₂₁ + n^{-1} s₂₂,  μ̃_{n3} = n^{-1/2} s₃₁,  μ̃_{n4} = s₄₁ + n^{-1} s₄₂,   (2)

where s₁₂, ..., s₄₂ are defined in appendix 1. It can be seen that E[S_n^j] = μ̃_{nj} + o(n^{-1}), j = 1, ..., 4. Define

κ̃_{n1} = n^{-1/2} k₁₂,  κ̃_{n2} = 1 + n^{-1} k₂₂,  κ̃_{n3} = n^{-1/2} k₃₁,  κ̃_{n4} = n^{-1} k₄₁,   (3)

where k₁₂, ..., k₄₁ are given in appendix 1. One can show that κ̃_{nj} = κ_{nj} + o(n^{-1}), j = 1, ..., 4. The Edgeworth approximation of the distribution of S_n can be constructed using its approximate cumulants; see, for example, Bhattacharya & Ghosh (1978) and Hall (1992). In particular, the two-term Edgeworth approximation to f_n takes the form

g_n(x) = φ(x) { 1 + Q₁(x)/√n + Q₂(x)/n },

Q₁(x) = k₁₂ H₁(x) + k₃₁ H₃(x)/6,

Q₂(x) = (k₂₂ + k₁₂²) H₂(x)/2 + (k₄₁ + 4k₁₂k₃₁) H₄(x)/24 + k₃₁² H₆(x)/72.

Note that g_n(x) involves a polynomial of degree 6, which can cause the so-called tail difficulties discussed in Wallace (1958). The tail of the Edgeworth approximation exhibits increasing oscillations as |x| increases, owing to the high-degree polynomial. Moreover, the Edgeworth approximation is not necessarily a bona fide density, because it can produce negative density approximations in the tails of distributions. Aiming to restrain the tail difficulties caused by polynomial oscillations, Niki & Konishi (1986) propose a transformation method for the Edgeworth approximation. Let a general statistic be S_n = √n(T_n − μ_T)/σ_T, where μ_T and σ_T² are the mean and variance of T_n. Niki &
Konishi (1986) note that the tail difficulties can be mitigated through a transformation φ(T_n) such that the third cumulant of the standardized φ(T_n) is zero, in which case the Edgeworth approximation reduces to a degree-4 polynomial. The transformed statistic takes the form

z_n = √n {φ(T_n) − φ(μ_T)} / {σ_T φ′(μ_T)},

where φ′ is the first derivative of φ. For example, let T_n be the sample average of n i.i.d. chi-squared variates with k degrees of freedom; then φ(x) = x^{1/3}. It is, however, generally rather difficult to derive the proposed transformation except in some special cases. Moreover, the Edgeworth approximation for the transformed statistic can still be negative. Being a higher-order expansion around the mean, the Edgeworth approximation is known to work well near the mean of a distribution, but its performance sometimes deteriorates at the tails. Hampel (1973) introduces the so-called small sample asymptotic method, which is essentially a local Edgeworth approximation. Although generally more accurate than the Edgeworth approximation, Hampel's method requires knowledge of f_n, which limits its applicability in practice. Field & Hampel (1982) briefly discuss an approximation to Hampel's small sample approximation that requires only the first few cumulants of S_n (see also Field & Ronchetti, 1990). Alternatively, this approximation can be obtained by putting the Edgeworth approximation back into the exponent and collecting terms such that its first-order Taylor approximation is equivalent to the Edgeworth approximation. Therefore, we call this approximation the exponential Edgeworth (EE) approximation. Denote the EE approximation by φ(x) exp(K(x)). It can be constructed such that the first-order Taylor approximation of φ(x) exp(K(x)) at K(x) = 0 matches g_n(x) up to order n^{-1}.
We then have

h_n(x) = φ(x) exp{ Q̃₁(x)/√n + (Q̃_{2a}(x) + Q̃_{2b}(x))/n },

Q̃₁(x) = k₁₂ H₁(x) + k₃₁ H₃(x)/6,

Q̃_{2a}(x) = k₂₂ H₂(x)/2 + k₄₁ H₄(x)/24,

Q̃_{2b}(x) = k₁₂² {H₂(x) − H₁²(x)}/2 + k₁₂k₃₁ {H₄(x) − H₁(x)H₃(x)}/6 + k₃₁² {H₆(x) − H₃²(x)}/72.   (4)

Clearly, the EE approximation is strictly positive. In addition, the exponent of h_n(x) is a polynomial of degree 4, as H₆(x) − H₃²(x) is a degree-4 polynomial. Thus, the EE approximation is more likely to restrain polynomial oscillations. For a given x ∈ (−∞, ∞), h_n(x) is equivalent to the Edgeworth approximation g_n(x) up to order n^{-1}. One problem with the EE approximation is that it is generally not integrable on the real line. To see this, note that we can rewrite h_n(x) in the canonical exponential form exp(Σ_{j=0}^4 γ_{nj} x^j), where the γ_{nj}, j = 0, ..., 4, are functions of k₁₂, ..., k₄₁. Obviously, if γ_{n4} > 0, h_n(x) goes to infinity as |x| → ∞. A second problem concerns the preservation of information. Both the two-term Edgeworth approximation g_n and the EE approximation h_n are based solely on {κ̃_{nj}}_{j=1}^4. The g_n preserves information in the sense that the first four cumulants of g_n are identical to {κ̃_{nj}}_{j=1}^4. However, the first four cumulants of h_n, if they exist, are not exactly the same as {κ̃_{nj}}_{j=1}^4. In this study we propose an alternative approximation that has the same functional form exp(Σ_{j=0}^4 γ_{nj} x^j) and at the same time preserves information, in that its first four moments match {μ̃_{nj}}_{j=1}^4. As is shown next, it has an appealing information-theoretic interpretation.
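For a standardized sample mean (so that k₁₂ = k₂₂ = 0, k₃₁ = κ₃, k₄₁ = κ₄), both g_n and the EE form h_n in (4) are simple to evaluate. The sketch below is our own illustration, not code from the paper, using the probabilists' Hermite polynomials:

```python
import numpy as np

SQRT2PI = np.sqrt(2.0 * np.pi)

def hermites(x):
    """Probabilists' Hermite polynomials H1, H2, H3, H4, H6."""
    return (x, x**2 - 1.0, x**3 - 3.0*x, x**4 - 6.0*x**2 + 3.0,
            x**6 - 15.0*x**4 + 45.0*x**2 - 15.0)

def edgeworth_density(x, k3, k4, n):
    """Two-term Edgeworth approximation g_n for a standardised mean;
    k3 and k4 are the third and fourth cumulants of Y_1."""
    x = np.asarray(x, dtype=float)
    _, _, H3, H4, H6 = hermites(x)
    phi = np.exp(-0.5 * x**2) / SQRT2PI
    return phi * (1.0 + k3*H3/(6.0*np.sqrt(n))
                  + k4*H4/(24.0*n) + k3**2*H6/(72.0*n))

def ee_density(x, k3, k4, n):
    """Exponential Edgeworth h_n of (4) with k12 = k22 = 0; note the
    degree-4 exponent term H6 - H3**2 and the strict positivity."""
    x = np.asarray(x, dtype=float)
    _, _, H3, H4, H6 = hermites(x)
    phi = np.exp(-0.5 * x**2) / SQRT2PI
    expo = k3*H3/(6.0*np.sqrt(n)) + (k4*H4/24.0 + k3**2*(H6 - H3**2)/72.0)/n
    return phi * np.exp(expo)
```

With κ₃ = κ₄ = 0 both reduce to φ(x); near the mean the two agree closely, while in the tails h_n stays positive but, as discussed above, need not be integrable.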
2.2. ME density

The exponential polynomial form of density approximation based on a set of moments arises naturally from the principle of ME, or minimum relative entropy. Given a continuous random variable X with density f, the differential entropy is defined as

W(f) = − ∫ f(x) log f(x) dx.   (5)

The principle of ME provides a method of constructing a density from known moment conditions. Suppose all that is known about f is its first K moments. There might exist an infinite number of densities satisfying the given moment conditions. Jaynes' (1957) principle of maximum entropy suggests selecting, among all candidates satisfying the moment conditions, the density that maximizes the entropy. The resulting ME density is uniquely determined as the one that is maximally noncommittal with regard to missing information: it agrees with what is known but expresses maximum uncertainty with respect to all other matters. Formally, the ME density is defined as

f* = arg max_f W(f)

subject to the moment constraints

∫ f(x) dx = 1,  ∫ x^j f(x) dx = μ_j,  j = 1, ..., K,   (6)

where μ_j = E[X^j]. The problem can be solved using the calculus of variations. Denote the Lagrangian by

L = − ∫ f(x) log f(x) dx − γ₀ { ∫ f(x) dx − 1 } + Σ_{j=1}^K γ_j { ∫ x^j f(x) dx − μ_j }.

The necessary condition for a stationary value is given by the Euler–Lagrange equation

− log f(x) − 1 − γ₀ + Σ_{j=1}^K γ_j x^j = 0,   (7)

plus the constraints in (6). Thus, from (7), the ME density takes the general form

f*(x) = exp{ Σ_{j=1}^K γ_j x^j − (1 + γ₀) } = exp{ Σ_{j=1}^K γ_j x^j − ψ(γ) },   (8)

where γ = {γ_j}_{j=1}^K and ψ(γ) = log{ ∫ exp(Σ_{j=1}^K γ_j x^j) dx } is the normalizing factor, under the assumption that ∫ exp(Σ_{j=1}^K γ_j x^j) dx < ∞. The Lagrange multipliers γ can be solved from the K moment conditions in (6) by plugging f = f* into (6). For distributions defined on R, the necessary conditions for the integrability of f* are that K is even and γ_K < 0. A closely related concept is the relative entropy, also called cross entropy or Kullback–Leibler distance.
Consider two densities f and f₀ defined on a common support. The relative entropy between f and f₀ is defined as

D(f ‖ f₀) = ∫ f(x) log{ f(x)/f₀(x) } dx.   (9)

It is known that D(f ‖ f₀) ≥ 0, and D(f ‖ f₀) = 0 if and only if f(x) = f₀(x) almost everywhere. Minimizing (9) subject to the same moment constraints (6) yields the minimum relative entropy density:
f*(x; f₀) = f₀(x) exp{ Σ_{j=1}^K γ_{j,f₀} x^j − ψ(γ_{f₀}) },   (10)

where γ_{f₀} = {γ_{j,f₀}}_{j=1}^K and ψ(γ_{f₀}) = log{ ∫ f₀(x) exp(Σ_{j=1}^K γ_{j,f₀} x^j) dx } is the normalizing factor, under the assumption that ∫ f₀(x) exp(Σ_{j=1}^K γ_{j,f₀} x^j) dx < ∞. Among all densities satisfying the given moment conditions, f*(x; f₀) is the unique one that is closest to the reference density f₀ in terms of the relative entropy. Shore & Johnson (1981) provide an axiomatic justification of this approach. The ME density is a special case of the minimum relative entropy density wherein the reference density f₀ is set to be constant; see, for example, Zellner & Highfield (1988) and Wu (2003). Existing studies on ME densities mostly focus on their use as density estimators. The ME approximation is equivalent to approximating the logarithm of a density by polynomials, which has long been studied. Earlier studies on the approximation of log-densities using polynomials include Neyman (1937) and Good (1963). When sample moments rather than population moments are used, the maximum likelihood method gives efficient estimates of this canonical exponential family. Crain (1974, 1977) establishes existence and consistency of the maximum likelihood estimator of ME densities. Gilula & Haberman (2000) use a maximin argument based on information theory to derive the ME density based on summary statistics. Barron & Sheu (1991) allow the number of moments to increase with the sample size and formally establish the ME approximation as a nonparametric density estimator. Their results are extended to multivariate densities in Wu (forthcoming). This study is conceptually different: the ME densities are used to approximate distributions of statistics that are asymptotically normal. Instead of letting K, the degree of the exponential polynomial, increase with the sample size, we study the accuracy of the approximation as the sample size n → ∞ for some given K.
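In practice the multipliers γ in (8) are found numerically. A standard route, sketched below on a bounded grid where the ME density always exists, minimizes the convex dual ψ(γ) − Σ_j γ_j μ_j, whose gradient is the vector of fitted-minus-target moments. This is our own sketch of the technique, not code from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def fit_me_density(moments, grid):
    """Fit the ME density exp{sum_j g_j x^j - psi(g)} of (8) on the bounded
    support covered by `grid` (a uniform mesh), matching the first K power
    moments, by minimising the convex dual psi(g) - g.mu."""
    mu = np.asarray(moments, dtype=float)
    K = mu.size
    dx = grid[1] - grid[0]
    powers = np.vstack([grid**(j + 1) for j in range(K)])   # (K, m)

    def density(g):
        expo = g @ powers
        w = np.exp(expo - expo.max())      # stabilised, unnormalised
        return w / (w.sum() * dx)          # normalised on the grid

    def dual(g):
        expo = g @ powers
        m = expo.max()
        psi = m + np.log(np.exp(expo - m).sum() * dx)
        return psi - g @ mu

    def dual_grad(g):                      # fitted moments minus targets
        return (powers * density(g)).sum(axis=1) * dx - mu

    res = minimize(dual, np.zeros(K), jac=dual_grad, method="BFGS")
    return density(res.x), res.x

# sketch: recover N(0,1) from its first four moments (0, 1, 0, 3)
grid = np.linspace(-6.0, 6.0, 2001)
dens, gamma = fit_me_density([0.0, 1.0, 0.0, 3.0], grid)
```

On the real line the same program need not have a solution, which is exactly the integrability issue addressed in section 3; on a bounded grid it always does.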
Related work can be found in Harremoës (2005a,b) and references therein. The former establishes lower bounds in terms of the relative entropy to determine the rate of convergence in the central limit theorem; the latter shows that the Edgeworth approximation can be considered a linear extrapolation of the ME approximation.

3. ME approximation

In this section we introduce an ME approximation to distributions of statistics that are asymptotically normal. As discussed above, the ME density is not necessarily well defined for distributions on an unbounded support. We therefore propose a modified ME density that overcomes this constraint. This is particularly useful for asymptotic approximations to distributions of statistics that are defined on unbounded supports. We then establish the asymptotic properties of higher-order approximations based on modified ME densities.

3.1. Approximation for distributions on an unbounded support

Whether an ME approximation is integrable on an unbounded support depends on the moment conditions. For example, consider the case where the first four moments, with μ₁ = 0 and μ₂ = 1, are used to construct an ME density. Rockinger & Jondeau (2002) compute numerically a boundary in the (μ₃, μ₄) space beyond which an ME approximation is not obtainable. Under the same conditions, Tagliani (2003) shows that an ME approximation is not integrable if μ₃ = 0 and μ₄ > 3.
We propose to use a dampening function in the exponent of an ME density to ensure its integrability. A similar approach is used in Wang (1992) to improve the numerical stability of general saddlepoint approximations. Define the dampening function

θ_n(x) = c exp(|x|)/n²,   (11)

where c is a positive constant. The modified ME density based on {μ̃_{nj}}_{j=1}^4 takes the form

f̂_{nθ_n}(x) = exp{ Σ_{j=1}^4 γ_{njθ_n} x^j − θ_n(x) − ψ(γ_{nθ_n}) },   (12)

where the normalization term is defined as

ψ(γ_{nθ_n}) = log ∫ exp{ Σ_{j=1}^4 γ_{njθ_n} x^j − θ_n(x) } dx.

This ME density can be obtained by solving

min_f ∫ f(x) log[ f(x)/exp{−θ_n(x)} ] dx  such that  ∫ f(x) dx = 1,  ∫ x^j f(x) dx = μ̃_{nj},  j = 1, ..., 4.

Thus, the modified ME approximation is essentially a minimum cross entropy density based on the given moment conditions and with a non-constant reference density proportional to exp{−θ_n(x)}. The dampening function ensures that the exponent of the ME density is bounded above and goes to −∞ as |x| → ∞, so that the ME density is integrable on R. One might be tempted to use a term like θ_n(x) = cx^r/n² as the dampening function, where r is an even integer larger than 4. Integrability of the ME approximation, however, is not warranted with this choice. As is shown below, the coefficient of x⁴ in the exponent, γ_{n4θ_n}, is O(n^{-1}). Suppose that r = 6. For x = O(n^{1/3}), it follows that γ_{n4θ_n} x⁴ = O(n^{1/3}) and θ_n(x) = O(1). Thus, if γ_{n4θ_n} > 0, f̂_{nθ_n}(x) goes to infinity as |x| → ∞ and is not integrable on R. Similarly, we use the same dampening function to ensure the integrability of the EE approximation on R, resulting in

h_{nθ_n}(x) = φ(x) exp{ Q̃₁(x)/√n + (Q̃_{2a}(x) + Q̃_{2b}(x))/n − θ_n(x) },   (13)

where Q̃₁(x), Q̃_{2a}(x) and Q̃_{2b}(x) are defined in (4) and θ_n(x) is given by (11). We note that the ME approximation adapts to the dampening function in the sense that its coefficients are determined through an optimization process that incorporates the dampening function as a reference density.
In contrast, the EE approximation does not have this appealing property, because its coefficients are invariant to the introduction of a dampening function. This noteworthy difference is explored numerically in section 4. Although it does not affect the asymptotic result of order n^{-1}, the dampening function may influence the small sample performance of the ME and EE approximations. In practice we use the following rule: the constant c in the dampening function is chosen as the smallest positive number that ensures that both tails of the approximation decrease
monotonically. More precisely, let f̂_{nθ_n}(·; c) be the ME approximation with dampening function θ_n(·; c). We select c according to

c* = inf{ a > 0 : (d/dx) f̂_{nθ_n}(x; a) > 0 for x < −2 and (d/dx) f̂_{nθ_n}(x; a) < 0 for x > 2 }.   (14)

In other words, c* is chosen such that the density approximation decays monotonically for |x| > 2 as |x| → ∞, which corresponds to approximately the 5 per cent tail regions of the normal approximation. We use the lower bound c = 10^{-6} when c* is arbitrarily close to zero. The same rule is used to select the dampening function for the EE approximation (13).

3.2. Asymptotic properties

In this section we derive the asymptotic properties of the ME approximation. As noted by Wallace (1958), the asymptotic orders of several alternative higher-order approximations can be established through their relationships with the Edgeworth approximation. The same approach is employed here. More specifically, we use the EE approximation to facilitate our analysis, noting that an appropriately constructed EE approximation is asymptotically equivalent to the Edgeworth approximation and, at the same time, shares a common canonical exponential form with the ME approximation. The asymptotic properties of the Edgeworth approximation have been extensively studied. Recall that we are interested in approximating the distribution of S_n, a smooth function of sample means as defined in section 2.1. We start with the following result for the Edgeworth approximation g_n to f_n.

Lemma 1 (Hall, 1992, p. 78). Assume that the function B has four continuous derivatives in a neighbourhood of μ and satisfies

sup_{−∞ < x < ∞} n^{-1} ∫_{S(n,x)} (1 + ‖y‖)⁴ dy < ∞,

where S(n, x) = {y ∈ R^d : n^{1/2} B(μ + n^{-1/2} y)/σ = x}. Furthermore, assume that E‖X‖⁴ < ∞ and that the characteristic function φ of X satisfies ∫_{R^d} |φ(t)|^λ dt < ∞ for some λ ≥ 1. Then, as n → ∞,

f_n(x) − g_n(x) = o(n^{-1}) uniformly in x.
We next show that the EE approximation (13) is equivalent to the two-term Edgeworth approximation up to order n^{-1}.

Theorem 1. Let h_{nθ_n} be the (modified) EE approximation (13) to f_n. Under the conditions of lemma 1,

f_n(x) − h_{nθ_n}(x) = o(n^{-1}) uniformly in x as n → ∞.

A proof of theorem 1 and all other proofs are given in appendix 2. Next, to make the ME approximation f̂_{nθ_n} comparable with the Edgeworth and EE approximations, we factor out φ(x). Reparameterizing γ_{n2θ_n} + 1/2 in (12) and denoting it by γ_{n2θ_n} without confusion, we obtain the following slightly different form:
f̂_{nθ_n}(x) = φ(x) exp{ Σ_{j=1}^4 γ_{njθ_n} x^j − θ_n(x) − ψ(γ_{nθ_n}) },   (15)

where

ψ(γ_{nθ_n}) = log ∫ φ(x) exp{ Σ_{j=1}^4 γ_{njθ_n} x^j − θ_n(x) } dx.

The asymptotic properties of f̂_{nθ_n} are established with the aid of the EE approximation (13). Heuristically, using theorem 1, one can show that μ_j(h_{nθ_n}) = μ̃_{nj} + o(n^{-1}) for j = 1, ..., 4. Let μ(f) = (μ_j(f))_{j=1}^4 and let ‖·‖ be the Euclidean norm. As μ(f_n) = μ(f̂_{nθ_n}) + o(n^{-1}), it follows that ‖μ(h_{nθ_n}) − μ(f̂_{nθ_n})‖ = o(n^{-1}). Under this condition we are then able to establish order n^{-1} equivalence between h_{nθ_n} and f̂_{nθ_n}, which share a common exponential polynomial form. By lemma 1 and theorem 1, we demonstrate formally the following result.

Theorem 2. Let f̂_{nθ_n} be the ME approximation (15) to f_n. Under the conditions of lemma 1,

f_n(x) − f̂_{nθ_n}(x) = o(n^{-1}) uniformly in x as n → ∞.

The asymptotic properties of the approximations to distribution functions follow readily from the aforementioned results. Let F_n(x) = P(S_n ≤ x). Denote by G_n(x) the two-term Edgeworth approximation of F_n. It is known that F_n(x) − G_n(x) = o(n^{-1}); see, for example, Feller (1971). Define H_{nθ_n}(x) = ∫_{−∞}^x h_{nθ_n}(t) dt and F̂_{nθ_n}(x) = ∫_{−∞}^x f̂_{nθ_n}(t) dt, respectively. Next we show that the same asymptotic order applies to H_{nθ_n} and F̂_{nθ_n}.

Theorem 3. Under the conditions of lemma 1, (a) the EE approximation to the distribution function satisfies F_n(x) − H_{nθ_n}(x) = o(n^{-1}) uniformly in x as n → ∞; (b) the ME approximation to the distribution function satisfies F_n(x) − F̂_{nθ_n}(x) = o(n^{-1}) uniformly in x as n → ∞.

4. Numerical examples

In this section we provide some numerical examples to illustrate the finite sample performance of the proposed method. Our experiments compare the Edgeworth, the EE and the ME approximations. We also include the results of the normal approximation, which is nested in all three higher-order approximations.
Alternative higher-order asymptotic methods, such as the bootstrap and the saddlepoint approximation, are not considered. We focus our comparison on the ME and Edgeworth approximations because they share the same information requirement: only the first four moments or cumulants are required for approximations up to order n^{-1}. The other methods are conceptually different: the bootstrap uses re-sampling, and the saddlepoint approximation requires knowledge of the moment-generating function. We leave the comparison with these alternative methods for future study. As by construction both the Edgeworth and ME approximations integrate to unity, we normalize the EE approximation so that it is directly comparable with the other two. We are interested in F_n, the distribution of S_n, a smooth function of sample means. Define s_α by Pr(S_n ≤ s_α) = α. Let Ψ_n be some approximation to F_n. In each experiment we report the one- and two-sided tail probabilities defined by

α₁(Ψ_n) = ∫_{−∞}^{s_α} dΨ_n(s),   α₂(Ψ_n) = 1 − ∫_{s_{α/2}}^{s_{1−α/2}} dΨ_n(s).
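These tail probabilities can be computed for any density approximation by numerical integration. A small helper of our own, with a sanity check against the normal distribution:

```python
import numpy as np
from scipy.stats import norm

def tail_probs(pdf, s_alpha, s_lo, s_hi, lo=-15.0, hi=15.0, m=300001):
    """One-sided mass of `pdf` below s_alpha (alpha_1) and two-sided mass
    outside (s_lo, s_hi) (alpha_2), via Riemann sums on a fine grid."""
    x = np.linspace(lo, hi, m)
    dx = x[1] - x[0]
    f = pdf(x)
    alpha1 = f[x <= s_alpha].sum() * dx
    alpha2 = 1.0 - f[(x > s_lo) & (x < s_hi)].sum() * dx
    return alpha1, alpha2

# with the exact normal critical values, both should recover alpha = 0.05
a1, a2 = tail_probs(norm.pdf, norm.ppf(0.05), norm.ppf(0.025), norm.ppf(0.975))
```

Substituting an Edgeworth, EE or ME density for `norm.pdf`, and exact or simulated critical values for the normal quantiles, reproduces the quantities reported in the tables below.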
We also report the relative error, defined as |α_i(Ψ_n) − α|/α, i = 1, 2. In the first example, suppose that we are interested in the distribution of the standardized mean S_n = Σ_{i=1}^n (Y_i − 1)/√(2n), where {Y_i} follows a chi-squared distribution with one degree of freedom. The exact density of S_n is given by

f_n(s) = √(2n) (√(2n) s + n)^{(n−2)/2} e^{−(√(2n)s + n)/2} / {Γ(n/2) 2^{n/2}},  s > −√(n/2).

We set n = 10. The constant c in the dampening function is chosen to be 10^{-6} for both the ME and EE approximations. The results are reported in Table 1. The accuracy of all approximations generally improves as α increases. The three higher-order approximations provide comparable performance and dominate the normal approximation. Although the magnitudes of the relative errors are similar, it is noted that both the one- and two-sided tail probabilities of the Edgeworth approximation are negative at extreme tails. We next examine a case where f_n is symmetric and fat-tailed. Suppose that S_n is the standardized mean of random variables from the t-distribution with five degrees of freedom, where n = 10. Although the exact distribution of S_n is unknown, we can calculate its moments analytically (μ_{n3} = 0 and μ_{n4} = 3.6). As discussed in the previous section, for symmetric fat-tailed distributions both the ME and EE approximations, without a proper dampening function, are not integrable on the real line. In this experiment the constant c in the dampening function is chosen to be 0.56 and 4.05 for the ME and EE approximations, respectively, using the automatic selection rule (14). We used 1 million simulations to obtain the critical values of the distribution of S_n. The approximation results are reported in Table 2. The overall performance of the ME approximation is considerably better than that of the Edgeworth approximation, which in turn outperforms the normal approximation.
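The first example is easy to reproduce because the exact distribution is available in closed form: Σ_i Y_i is chi-squared with n degrees of freedom, so P(S_n ≤ s) = P(χ²_n ≤ √(2n)s + n). A sketch of ours, using scipy, of the normal approximation's lower-tail error at α = 0.05:

```python
import numpy as np
from scipy.stats import chi2, norm

n = 10

def exact_cdf(s):
    """Exact CDF of S_n = (sum of n chi-squared(1) variables - n)/sqrt(2n)."""
    return chi2.cdf(np.sqrt(2.0 * n) * s + n, df=n)

s05 = norm.ppf(0.05)            # the normal approximation's 5% critical value
alpha1_normal = exact_cdf(s05)  # true lower-tail mass at that point
```

With n = 10 the true mass below the normal 5 per cent point is only about 0.011, illustrating how badly the skewness of the chi-squared distribution defeats the plain normal approximation in this example.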
Unlike in the first example, the performance of the EE approximation is essentially dominated by that of the other three. Note that a non-trivial c is chosen for both the ME and EE approximations in this case. A relatively heavy dampening function is chosen for the EE approximation to ensure its integrability, and its performance is clearly affected by the dampening function. In contrast, the dampening function required for the ME approximation is considerably weaker.

Table 1. Tail probabilities α₁(Ψ_n) = ∫_{−∞}^{s_α} dΨ_n(s) and α₂(Ψ_n) = 1 − ∫_{s_{α/2}}^{s_{1−α/2}} dΨ_n(s) of approximated distributions of the standardized mean of n = 10 χ²₁ variables (NORM, normal; ED, Edgeworth; EE, exponential Edgeworth; ME, maximum entropy; RE, relative error).
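Rule (14) can be imitated numerically. The sketch below is our own illustration: it applies the rule to the dampened EE exponent of (13), specialised to a symmetric standardized mean (k₁₂ = k₂₂ = k₃₁ = 0), and scans a grid of candidate values of c:

```python
import numpy as np

def dampened_ee_logdensity(x, k31, k41, n, c):
    """Log of the dampened EE density (13), up to its normalising constant,
    specialised to k12 = k22 = 0 (a standardised mean)."""
    x = np.asarray(x, dtype=float)
    H3 = x**3 - 3.0*x
    H4 = x**4 - 6.0*x**2 + 3.0
    H6 = x**6 - 15.0*x**4 + 45.0*x**2 - 15.0
    Q1 = k31*H3/6.0
    Q2 = k41*H4/24.0 + k31**2*(H6 - H3**2)/72.0
    theta = c*np.exp(np.abs(x))/n**2            # dampening function (11)
    return -0.5*x**2 + Q1/np.sqrt(n) + Q2/n - theta

def select_c(k31, k41, n, c_step=0.01, c_max=20.0):
    """Smallest c on a grid such that the dampened EE density decays
    monotonically for |x| > 2: a discretised stand-in for rule (14)."""
    x = np.linspace(2.0, 10.0, 4001)
    for c in np.arange(c_step, c_max, c_step):
        right = dampened_ee_logdensity(x, k31, k41, n, c)
        left = dampened_ee_logdensity(-x, k31, k41, n, c)
        if (np.diff(right) < 0).all() and (np.diff(left) < 0).all():
            return float(c)
    return c_max
```

For the t₅ example (k₃₁ = 0, k₄₁ = 6, n = 10, since the fourth cumulant of a t₅ variate is 6) this returns approximately 4.05, in line with the automatically selected value reported above.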
Table 2. Tail probabilities of approximated distributions of the standardized mean of n = 10 t₅ variables (NORM, normal; ED, Edgeworth; EE, exponential Edgeworth; ME, maximum entropy; RE, relative error).

Generally, the selection of the constant in the dampening function is expected to have little effect on the ME approximation, because the latter adapts to the dampening function by adjusting its coefficients through an optimization process. However, it has a non-negligible effect on the EE approximation, whose coefficients are invariant to the introduction of the dampening function. To investigate how sensitive the reported results are to the specification of the dampening function, we conducted some further experiments. We used a grid search to locate the optimal c for the ME and EE approximations, where the optimal c is defined as the minimizer of the sum of the eight relative errors reported in Table 2. The optimal c for the ME approximation is 0.6, obtained through a grid search on the interval [0.1, 5] with an increment of 0.1. The optimal c for the EE approximation, which is 5, is obtained in a similar fashion through a grid search on [1, 20] with an increment of 1. Thus, the values of c selected by the automatic rule, 0.56 and 4.05 for the ME and EE approximations, respectively, are quite close to their optimal values, which are generally infeasible in practice. In the third experiment we used sample moments rather than population moments in the approximation of the distribution of a sample mean. Suppose that the variance is unknown. We then have the studentized statistic

S_n = √n (Ȳ − EY₁) / √{ n^{-1} Σ_{t=1}^n (Y_t − Ȳ)² },  Ȳ = n^{-1} Σ_{t=1}^n Y_t,

which is a smooth function of sample means. Without loss of generality, suppose that {Y_t} follows a distribution with mean zero and variance one.
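The studentized statistic can be formed directly from data; a minimal sketch of ours, using the n-divisor variance estimator as in the definition above:

```python
import numpy as np

def studentized_mean(y, mean0=0.0):
    """S_n = sqrt(n) (Ybar - E Y_1) / sigma_hat, where
    sigma_hat**2 = n^{-1} * sum (Y_t - Ybar)**2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    sigma_hat = np.sqrt(np.mean((y - y.mean())**2))
    return np.sqrt(n) * (y.mean() - mean0) / sigma_hat
```

For example, for the two-point sample (0, 2) the statistic equals √2 · (1 − 0)/1 = √2.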
Using the general formulae given in appendix 1, we obtain the following approximate moments for S_n:

μ̃_{n1} = −μ₃/(2√n),  μ̃_{n2} = 1 + (3 + 2μ₃²)/n,

μ̃_{n3} = −7μ₃/(2√n),  μ̃_{n4} = 3 + (30 + 28μ₃² − 2μ₄)/n,

where μ_j = E(Y₁^j), j = 3, 4. The distribution of S_n was obtained by 1 million Monte Carlo simulations. In our experiment we set n = 20 and generated {Y_t} from the standard normal distribution. The population moments μ₃ and μ₄ are replaced by sample moments in the approximations. The dampening function is selected automatically according to the rule in (14). The experiment is repeated 500 times. Table 3 reports the average tail probabilities and corresponding
standard errors. The results are qualitatively similar to those in Table 2. The ME and Edgeworth approximations dominate the EE and normal approximations. The overall performance of the ME approximation is better than that of the Edgeworth approximation, especially for the smallest tail probabilities, and the standard errors of the ME approximation are smaller than those of the Edgeworth approximation in most cases.

Table 3. Empirical tail probabilities of approximated distributions of the studentized mean of n = 20 standard normal random variables (NORM, normal; ED, Edgeworth; EE, exponential Edgeworth; ME, maximum entropy; SE, standard error).

5. Conclusion

In this article we have proposed an ME approach for approximating the asymptotic distributions of smooth functions of sample means. Conventional ME densities are typically defined on a bounded support. The proposed ME approximation, equipped with an asymptotically negligible dampening function, is well defined on the real line. We have established order n^{-1} asymptotic equivalence between the proposed method and the classical Edgeworth approximation. Our numerical experiments show that the proposed method is comparable with, and sometimes outperforms, the Edgeworth approximation. Finally, we note that the proposed method may be generalized to approximate bootstrap distributions of statistics and distributions of high dimensional statistics.

References

Barndorff-Nielsen, O. E. & Cox, D. R. (1989). Asymptotic techniques for use in statistics. Chapman and Hall, London.

Barron, A. R.
& Sheu, C. (1991). Approximation of density functions by sequences of exponential families. Ann. Statist.
Bhattacharya, R. N. & Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist.
Crain, B. R. (1974). Estimation of distributions using orthogonal expansions. Ann. Statist.
Crain, B. R. (1977). An information theoretic approach to approximating a probability distribution. SIAM J. Appl. Math.
Edgeworth, F. Y. (1905). The law of error. Trans. Camb. Philos. Soc.
Feller, W. (1971). An introduction to probability theory and its applications, vol. II. Wiley, New York.
Field, C. A. & Hampel, F. R. (1982). Small-sample asymptotic distributions of M-estimators of location. Biometrika.
Field, C. A. & Ronchetti, E. (1990). Small sample asymptotics. Institute of Mathematical Statistics, Hayward, CA.
Good, I. J. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. Ann. Math. Statist.
Gilula, Z. & Haberman, S. J. (2000). Density approximation by summary statistics: an information theoretic approach. Scand. J. Statist.
Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer-Verlag, New York.
Hampel, F. R. (1973). Some small-sample asymptotics. In Proceedings of the Prague Symposium on Asymptotic Statistics (ed. J. Hajek), Charles University, Prague.
Harremoës, P. (2005a). Lower bounds for divergence in central limit theorem. Electron. Notes Discrete Math.
Harremoës, P. (2005b). Maximum entropy and the Edgeworth approximation. In Proceedings of IEEE ISOC ITW 2005 on Coding and Complexity.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev.
Neyman, J. (1937). Smooth test for goodness of fit. Skand. Aktuar.
Niki, N. & Konishi, S. (1986). Effects of transformations in higher order asymptotic expansions. Ann. Inst. Statist. Math.
Rockinger, M. & Jondeau, E. (2002). Entropy densities with an application to autoregressive conditional skewness and kurtosis. J. Econometrics.
Shore, J. E. & Johnson, R. W. (1981). Properties of cross-entropy minimization. IEEE Trans. Inform. Theory.
Tagliani, A. (2003). A note on proximity of distributions in terms of coinciding moments. Appl. Math. Comput.
Wallace, D. (1958). Asymptotic approximations to distributions. Ann. Math. Statist.
Wang, S. (1992). General saddlepoint approximation in the bootstrap. Statist. Probab. Lett.
Wu, X. (2003). Calculation of maximum entropy densities with application to income distribution. J. Econometrics.
Wu, X. (forthcoming). Multivariate exponential series density estimator. J. Econometrics.
Zellner, A. & Highfield, R. A. (1988). Calculation of maximum entropy distribution and approximation of marginal posterior distributions. J. Econometrics.

Received December 2008, in final form January 2010

Ximing Wu, Department of Agricultural Economics, Texas A&M University, College Station, TX, USA.
E-mail: xwu@tamu.edu

Appendix 1: Approximate moments and cumulants

We first define μ_{i₁...i_j} = E[(X − μ)^{(i₁)} ⋯ (X − μ)^{(i_j)}], j ≥ 1, and

ν₁ = μ_{i₁},  ν₂ = μ_{i₁i₂},  ν₃ = μ_{i₁i₂i₃},  ν₄ = μ_{i₁i₂i₃i₄},

ν₂₂ = μ_{i₁i₂}μ_{i₃i₄} + μ_{i₁i₃}μ_{i₂i₄} + μ_{i₁i₄}μ_{i₂i₃},

ν₂₃ = μ_{i₁i₂}μ_{i₃i₄i₅} + μ_{i₁i₃}μ_{i₂i₄i₅} + μ_{i₁i₄}μ_{i₂i₃i₅} + μ_{i₁i₅}μ_{i₂i₃i₄} + μ_{i₂i₃}μ_{i₁i₄i₅} + μ_{i₂i₄}μ_{i₁i₃i₅} + μ_{i₂i₅}μ_{i₁i₃i₄} + μ_{i₃i₄}μ_{i₁i₂i₅} + μ_{i₃i₅}μ_{i₁i₂i₄} + μ_{i₄i₅}μ_{i₁i₂i₃},

ν₂₂₂ = μ_{i₁i₂}μ_{i₃i₄}μ_{i₅i₆} + μ_{i₁i₂}μ_{i₃i₅}μ_{i₄i₆} + μ_{i₁i₂}μ_{i₃i₆}μ_{i₄i₅} + μ_{i₁i₃}μ_{i₂i₄}μ_{i₅i₆} + μ_{i₁i₃}μ_{i₂i₅}μ_{i₄i₆} + μ_{i₁i₃}μ_{i₂i₆}μ_{i₄i₅} + μ_{i₁i₄}μ_{i₂i₃}μ_{i₅i₆} + μ_{i₁i₄}μ_{i₂i₅}μ_{i₃i₆} + μ_{i₁i₄}μ_{i₂i₆}μ_{i₃i₅} + μ_{i₁i₅}μ_{i₂i₃}μ_{i₄i₆} + μ_{i₁i₅}μ_{i₂i₄}μ_{i₃i₆} + μ_{i₁i₅}μ_{i₂i₆}μ_{i₃i₄} + μ_{i₁i₆}μ_{i₂i₃}μ_{i₄i₅} + μ_{i₁i₆}μ_{i₂i₄}μ_{i₃i₅} + μ_{i₁i₆}μ_{i₂i₅}μ_{i₃i₄},

where the dependence of the ν's on i₁,...,i₆ is suppressed for simplicity. One can then show that

E[Z^{(i₁)}] = ν₁ = 0,
E[Z^{(i₁)}Z^{(i₂)}] = ν₂,
E[Z^{(i₁)}Z^{(i₂)}Z^{(i₃)}] = n^{−1/2}ν₃,
E[Z^{(i₁)} ⋯ Z^{(i₄)}] = ν₂₂ + n^{−1}(ν₄ − ν₂₂) + o(n^{−1}),
E[Z^{(i₁)} ⋯ Z^{(i₅)}] = n^{−1/2}ν₂₃ + o(n^{−1}),
E[Z^{(i₁)} ⋯ Z^{(i₆)}] = ν₂₂₂ + O(n^{−1}).
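In the scalar case these relations reduce to familiar identities; for instance, with Z = √n(X̄ − μ) the third relation says E[Z³] = n^{−1/2}μ₃ exactly. The following quick simulation illustrates that scaling (the identity itself is standard; the simulation setup and function name are ours, chosen for illustration only):

```python
import random

def z_third_moment(n, draws, rng):
    """Monte Carlo estimate of E[Z^3] for Z = sqrt(n) * (Xbar - mu),
    with X exponential(1) shifted to mean zero, so that mu3 = 2."""
    total = 0.0
    for _ in range(draws):
        z = sum(rng.expovariate(1.0) - 1.0 for _ in range(n)) / n ** 0.5
        total += z ** 3
    return total / draws

rng = random.Random(7)
# Theory (the n^{-1/2} nu_3 relation above): E[Z^3] = mu3 / sqrt(n),
# here 2 / sqrt(25) = 0.4, up to Monte Carlo error.
print(z_third_moment(25, 150_000, rng))
```

The estimate shrinks like n^{−1/2}, which is precisely why the third-moment contribution enters the approximate moments of S_n at order n^{−1/2}.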
Expanding E[S_n^j], j = 1,...,4, and collecting terms by order of n^{−1/2} then yields the first four approximate moments of S_n. Denoting the j-fold summation Σ_{i₁=1}^d ⋯ Σ_{i_j=1}^d by Σ_{i₁...i_j} to ease notation, we obtain the approximate moments of S_n as follows:

μ_{n1} = (1/√n)(1/2)Σ_{i₁i₂} b_{i₁i₂}ν₂ = (1/√n)s₁₂,

μ_{n2} = Σ_{i₁i₂} b_{i₁}b_{i₂}ν₂ + (1/n)(Σ_{i₁i₂i₃} b_{i₁}b_{i₂i₃}ν₃ + (1/3)Σ_{i₁...i₄} b_{i₁}b_{i₂i₃i₄}ν₂₂ + (1/4)Σ_{i₁...i₄} b_{i₁i₂}b_{i₃i₄}ν₂₂) = s₂₁ + (1/n)s₂₂,

μ_{n3} = (1/√n)(Σ_{i₁i₂i₃} b_{i₁}b_{i₂}b_{i₃}ν₃ + (3/2)Σ_{i₁...i₄} b_{i₁}b_{i₂}b_{i₃i₄}ν₂₂) = (1/√n)s₃₁,

μ_{n4} = Σ_{i₁...i₄} b_{i₁}b_{i₂}b_{i₃}b_{i₄}ν₂₂ + (1/n)(Σ_{i₁...i₄} b_{i₁}b_{i₂}b_{i₃}b_{i₄}(ν₄ − ν₂₂) + 2Σ_{i₁...i₅} b_{i₁}b_{i₂}b_{i₃}b_{i₄i₅}ν₂₃ + (2/3)Σ_{i₁...i₆} b_{i₁}b_{i₂}b_{i₃}b_{i₄i₅i₆}ν₂₂₂ + (3/2)Σ_{i₁...i₆} b_{i₁}b_{i₂}b_{i₃i₄}b_{i₅i₆}ν₂₂₂) = s₄₁ + (1/n)s₄₂,

where the definitions of s₁₂,...,s₄₂ are self-evident. As usually only a few of the b_{i₁...i_j} terms are non-zero, the actual formulae for the approximate moments of S_n are often simpler than they appear above.

To construct approximate cumulants, we first write the cumulants as a power series in n^{−1/2}:

κ_{n1} = E[S_n] = n^{−1/2}s₁₂ + o(n^{−1}) = n^{−1/2}k₁₂ + o(n^{−1}),

κ_{n2} = E[S_n²] − E[S_n]² = s₂₁ + n^{−1}(s₂₂ − s₁₂²) + o(n^{−1}) = k₂₁ + n^{−1}k₂₂ + o(n^{−1}),

κ_{n3} = E[S_n³] − 3E[S_n]E[S_n²] + 2E[S_n]³ = n^{−1/2}(s₃₁ − 3s₁₂s₂₁) + o(n^{−1}) = n^{−1/2}k₃₁ + o(n^{−1}),

κ_{n4} = E[S_n⁴] − 4E[S_n]E[S_n³] − 3E[S_n²]² + 12E[S_n]²E[S_n²] − 6E[S_n]⁴ = n^{−1}(12s₁₂²s₂₁ − 6s₂₂s₂₁ − 4s₃₁s₁₂ + s₄₂) + o(n^{−1}) = n^{−1}k₄₁ + o(n^{−1}),

where the definitions of k₁₂,...,k₄₁ are self-evident. The second-to-last equality holds because s₄₁ − 3s₂₁² = 0. When S_n is a sample mean, its cumulants can be easily calculated by κ_{nj} = κ_j/n^{j/2−1}. For general statistics, however, this simple identity does not hold and the cumulants have to be constructed from the moments given above.

Appendix 2: Proofs of theorems

Proof of theorem 1.
Denote the exponent of (13) by

K(x) = P₃(x)/(6√n) + P₄(x)/(24n) − c·exp(|x|)/n²,

where P₃(x) and P₄(x) are polynomials of degrees 3 and 4, respectively, whose coefficients are constants that do not depend on n.
Without loss of generality, assume that K(x) > 0. By Taylor's theorem, there exists a z such that 0 < K(z) < K(x) and

h_{n,θ_n}(x) = φ(x){1 + K(x) + K²(x)/2! + (K³(x)/3!)exp(K(z))}
 = φ(x){1 + P₃(x)/(6√n) + P₄(x)/(24n) + P₃²(x)/(72n)} + φ(x)Q(x) + φ(x)(K³(x)/3!)exp(K(z)),   (16)

where

Q(x) = P₄²(x)/(2(24n)²) + c²exp(2|x|)/(2n⁴) + P₃(x)P₄(x)/(144n^{3/2}) − P₃(x)c·exp(|x|)/(6n^{5/2}) − P₄(x)c·exp(|x|)/(24n³) − c·exp(|x|)/n².   (17)

Note that the first term of (16) equals the Edgeworth approximation g_n(x). As sup_{x∈R} φ(x)|x|^r < ∞ for all non-negative integers r, we have sup_{x∈R} φ(x)|P_r(x)| < ∞. Similarly, one can show that sup_{x∈R} |P_r(x)|φ(x)exp(|x|) < ∞. It follows that φ(x)Q(x) = O(n^{−3/2}). Note that

K³(x) = {P₃(x)/(6√n) + P₄(x)/(24n)}³ − 3{P₃(x)/(6√n) + P₄(x)/(24n)}²{c·exp(|x|)/n²} + 3{P₃(x)/(6√n) + P₄(x)/(24n)}{c·exp(|x|)/n²}² − {c·exp(|x|)/n²}³.

Using the fact that

sup_{x∈R} φ(x)|P_r(x)|(exp(|x|))^j < ∞,  j, r = 0, 1,...,

we have φ(x)K³(x)/3! = O(n^{−3/2}). It follows that

h_{n,θ_n} = g_n + O(n^{−3/2}) + O(n^{−3/2})exp(K(z)).   (18)

It now suffices to show that exp(K(z)) < ∞, so that h_{n,θ_n} = g_n + o(n^{−1}). Note that when z = o(n^{1/6}) we have P₃(z)/√n = o(1) and P₄(z)/n = o(1); thus K(z) is bounded above because its third term is negative. When z = O(n^{1/6}), P₃(z)/√n = O(1) and P₄(z)/n = o(1). But exp(|z|)/n² is increasing at a faster rate than any polynomial of z; thus the third term of K(z) dominates and K(z) is bounded above. Lastly, for z = O(n^{1/6+ε}), ε > 0, the third term of K(z) dominates by the same argument. It follows that

sup_{z∈R} exp(P₃(z)/(6√n) + P₄(z)/(24n) − c·exp(|z|)/n²) < ∞.

Thus exp(K(z)) < ∞, and from (18) we have |g_n(x) − h_{n,θ_n}(x)| = o(n^{−1}). By lemma 1 it then follows that

|f_n(x) − h_{n,θ_n}(x)| ≤ |f_n(x) − g_n(x)| + |g_n(x) − h_{n,θ_n}(x)| = o(n^{−1}).

As this result is valid for all x, it holds uniformly in x.
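The role of the dampening term −c·exp(|x|)/n² can also be seen numerically: it drives the exponent to −∞ in the tails, so the dampened exponential density integrates to a finite constant even though the polynomial part of the exponent alone need not. A toy check, where the specific choices P₃(x) = x³, P₄(x) = x⁴, c = 1, and the integration grid are ours and purely illustrative:

```python
import math

def dampened_exponent(x, n, c=1.0):
    """K(x) = P3(x)/(6 sqrt(n)) + P4(x)/(24 n) - c*exp(|x|)/n**2, with the
    illustrative (hypothetical) polynomials P3(x) = x**3, P4(x) = x**4."""
    p3, p4 = x ** 3, x ** 4
    return (p3 / (6 * math.sqrt(n)) + p4 / (24 * n)
            - c * math.exp(abs(x)) / n ** 2)

def normalizing_integral(n, lo=-25.0, hi=25.0, steps=60_000):
    """Midpoint Riemann-sum approximation of int phi(x) exp(K(x)) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += phi * math.exp(dampened_exponent(x, n)) * h
    return total

# Finite and close to 1 for moderate n: even though the quartic term
# pushes the exponent up, exp(|x|)/n**2 eventually dominates any
# polynomial, so the tails are integrable.
print(normalizing_integral(20))
```

Without the dampening term, φ(x)exp(x⁴/(24n) + ...) would diverge; with it, the same Riemann sum stabilizes, which is the computational content of the boundedness argument above.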
Proof of theorem 2. Using an argument similar to the proof of theorem 1, we have, for an arbitrary non-negative integer r,

|μ_r(g_n) − μ_r(h_{n,θ_n})| ≤ ∫ |x|^r |g_n(x) − h_{n,θ_n}(x)| dx = ∫ |x|^r φ(x)|Q(x) + (K³(x)/3!)exp(K(z))| dx = o(n^{−1}),   (19)

where Q(x) is given by (17). The last equality uses the fact that ∫ |x|^r φ(x) dx < ∞ for all non-negative integers r, together with the boundedness of exp(K(z)) established in the proof of theorem 1. Because μ_r(g_n) = μ_r(f_{n,θ_n}) for r = 0, 1,..., 4, it follows that μ_r(f_{n,θ_n}) − μ_r(h_{n,θ_n}) = o(n^{−1}). We then have ‖μ(f_{n,θ_n}) − μ(h_{n,θ_n})‖ = o(n^{−1}), where ‖v‖ = (Σ_{j=1}^d v_j²)^{1/2} is the Euclidean norm of a d-dimensional vector v.

Let f(x) = φ(x)exp(Σ_{j=0}^4 γ_j x^j − c·exp(|x|)/n²) for arbitrary {γ_j}_{j=0}^4. Suppose that the moments μ(f) = {μ_j(f)}_{j=1}^4 exist. There is a unique correspondence between μ(f) and {γ_j}_{j=0}^4. Denote the coefficients {γ_j}_{j=0}^4 of f by γ(μ(f)). According to lemma 5 of Barron & Sheu (1991), ‖γ(μ(f)) − γ(μ(g))‖ and ‖μ(f) − μ(g)‖ are of the same asymptotic order for μ(f) and μ(g) sufficiently close. Denote the coefficients of the ME and EE approximations by γ_{n,θ_n} and γ_n, respectively. (Recall that the coefficients of the EE approximation are invariant to the dampening function θ_n; thus no subscript θ_n is required for its coefficients.) As ‖μ(f_{n,θ_n}) − μ(h_{n,θ_n})‖ = o(n^{−1}), we have ‖γ_{n,θ_n} − γ_n‖ = o(n^{−1}).

Define K̃(x) = Σ_{j=0}^4 γ_{nj,θ_n} x^j − c·exp(|x|)/n², and define K̄(x) similarly with γ_{nj} in place of γ_{nj,θ_n}. Without loss of generality, assume that K̃(x) > K̄(x). By the mean value theorem, there exists a K(z) between K̄(x) and K̃(x) such that

|f_{n,θ_n}(x) − h_{n,θ_n}(x)| = φ(x)|Σ_{j=0}^4 (γ_{nj,θ_n} − γ_{nj})x^j| exp(K(z)) ≤ exp(K(z)) φ(x) Σ_{j=0}^4 |x|^j |γ_{nj,θ_n} − γ_{nj}|.   (20)

From the proof of theorem 1, we have exp(K(z)) < ∞ and sup_{x∈R} φ(x)|x|^j < ∞ for all non-negative integers j. It follows that |f_{n,θ_n}(x) − h_{n,θ_n}(x)| = o(n^{−1}). Lastly, we have

|f_n(x) − f_{n,θ_n}(x)| ≤ |f_n(x) − h_{n,θ_n}(x)| + |h_{n,θ_n}(x) − f_{n,θ_n}(x)| = o(n^{−1}).

As the result is valid for all x, it holds uniformly in x.
Proof of theorem 3. Note that

|F_n(x) − H_{n,θ_n}(x)| = |∫_{−∞}^x {f_n(t) − h_{n,θ_n}(t)} dt| ≤ ∫ |f_n(t) − h_{n,θ_n}(t)| dt ≤ ∫ |f_n(t) − g_n(t)| dt + ∫ |g_n(t) − h_{n,θ_n}(t)| dt.
According to Hall (1992), the first integral is of order o(n^{−1}). The second integral is of order o(n^{−1}) according to (19) in the previous proof. It follows that |F_n(x) − H_{n,θ_n}(x)| = o(n^{−1}) uniformly in x. It then follows that

|F_n(x) − F_{n,θ_n}(x)| ≤ |F_n(x) − H_{n,θ_n}(x)| + |H_{n,θ_n}(x) − F_{n,θ_n}(x)| ≤ o(n^{−1}) + ∫ |h_{n,θ_n}(t) − f_{n,θ_n}(t)| dt.

Integrating the right-hand side of inequality (20) yields ∫ |h_{n,θ_n}(t) − f_{n,θ_n}(t)| dt = o(n^{−1}). Note that the presence of exp(K(z)) in (20) ensures that the n^{−1} order is preserved in the integration. Therefore |F_n(x) − F_{n,θ_n}(x)| = o(n^{−1}) uniformly in x.
More information4 Sums of Independent Random Variables
4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables
More informationThe Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series
The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationEstimation of Dynamic Regression Models
University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.
More informationTHE information capacity is one of the most important
256 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 1, JANUARY 1998 Capacity of Two-Layer Feedforward Neural Networks with Binary Weights Chuanyi Ji, Member, IEEE, Demetri Psaltis, Senior Member,
More informationA Bootstrap Test for Conditional Symmetry
ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School
More informationBickel Rosenblatt test
University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance
More information1 Lyapunov theory of stability
M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability
More informationZeros of lacunary random polynomials
Zeros of lacunary random polynomials Igor E. Pritsker Dedicated to Norm Levenberg on his 60th birthday Abstract We study the asymptotic distribution of zeros for the lacunary random polynomials. It is
More informationAsymptotic inference for a nonstationary double ar(1) model
Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More informationPrediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate
www.scichina.com info.scichina.com www.springerlin.com Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate WEI Chen & CHEN ZongJi School of Automation
More informationOptimal global rates of convergence for interpolation problems with random design
Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289
More informationGeometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Paul Marriott 1, Radka Sabolova 2, Germain Van Bever 2, and Frank Critchley 2 1 University of Waterloo, Waterloo, Ontario,
More informationTesting Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy
This article was downloaded by: [Ferdowsi University] On: 16 April 212, At: 4:53 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer
More informationTHEOREM OF OSELEDETS. We recall some basic facts and terminology relative to linear cocycles and the multiplicative ergodic theorem of Oseledets [1].
THEOREM OF OSELEDETS We recall some basic facts and terminology relative to linear cocycles and the multiplicative ergodic theorem of Oseledets []. 0.. Cocycles over maps. Let µ be a probability measure
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationOn large deviations of sums of independent random variables
On large deviations of sums of independent random variables Zhishui Hu 12, Valentin V. Petrov 23 and John Robinson 2 1 Department of Statistics and Finance, University of Science and Technology of China,
More information