Maximum Entropy Approximations for Asymptotic Distributions of Smooth Functions of Sample Means


Scandinavian Journal of Statistics, Vol. 38. Published by Blackwell Publishing Ltd.

XIMING WU, Department of Agricultural Economics, Texas A&M University
SUOJIN WANG, Department of Statistics, Texas A&M University

ABSTRACT. We propose an information-theoretic approach to approximate asymptotic distributions of statistics using maximum entropy (ME) densities. Conventional ME densities are typically defined on a bounded support. For distributions defined on unbounded supports, we use an asymptotically negligible dampening function for the ME approximation so that it is well defined on the real line. We establish order n⁻¹ asymptotic equivalence between the proposed method and the classical Edgeworth approximation for general statistics that are smooth functions of sample means. Numerical examples are provided to demonstrate the efficacy of the proposed method.

Key words: asymptotic approximation, Edgeworth approximation, maximum entropy density

1. Introduction

Asymptotic approximations to the distributions of estimators or test statistics are commonly used in statistical inference, for example to obtain approximate confidence limits for parameters of interest or p-values for hypothesis tests. According to the central limit theorem, a statistic often has a limiting normal distribution under mild regularity conditions. The normal approximation, however, can be inadequate, especially when the sample size is small. In this study we are concerned with higher-order methods for approximating the distributions of statistics with limiting normal distributions. The most commonly used higher-order approximation is that of Edgeworth, which takes a polynomial form weighted by a normal distribution or density function. The Edgeworth approximation is known to be quite accurate near the mean of a distribution.
As a result of its polynomial form, however, it can sometimes produce spurious oscillations and give poor fits to the exact distribution in parts of the tails. Moreover, density and tail approximations can sometimes be negative. Suppose that we are interested in the exact density f_n of a statistic S_n. In this study we consider an alternative higher-order approximation of the form

f̃_n(x) = exp{ Σ_{j=1}^K γ_{nj} x^j − ψ(γ_n) },   (1)

where γ_n = {γ_{nj}}_{j=1}^K are the coefficients of the canonical exponential density and ψ(γ_n) is a normalization term such that f̃_n integrates to unity. The proposed approximation has an appealing information-theoretic interpretation as the maximum entropy (ME) density, which is obtained by maximizing the information entropy subject to some moment conditions. The ME approximation is strictly positive and well defined on a bounded support. When

the underlying distribution has an unbounded support, the ME approximation in its conventional form is not necessarily integrable. Instead of assuming a bounded support, we propose a modified ME approximation with a dampening function that ensures the integrability of the approximation and at the same time is asymptotically negligible.

Most existing studies of the ME density focus on its use as a density estimator. This study is conceptually different in that it uses the ME density to approximate asymptotic distributions of statistics. For a given number of polynomial terms, we investigate the performance of the approximation as n → ∞. We focus on approximations up to order n⁻¹, as approximations beyond order n⁻¹ are rarely used in practice; higher-order results can be obtained in a similar manner. We establish, in the general framework of smooth functions of sample means, that the ME approximations based on the first four moments of f_n are asymptotically equivalent to the two-term Edgeworth approximation up to order n⁻¹. We provide some numerical examples whose results suggest that the proposed method is a useful complement to the Edgeworth approximation, especially when the sample size is small.

The rest of the article is organized as follows. Section 2 reviews the literature on the Edgeworth approximation and the ME density. Section 3 proposes a higher-order approximation based on modified ME densities and establishes its asymptotic properties. Section 4 provides some numerical examples. Some concluding remarks are given in section 5. Proofs of the theorems and some technical details are collected in the appendices.

2. Background and motivation

In this section we first review the literature on the Edgeworth approximation and some relevant variants.
We then introduce the ME density, based on which we construct our alternative higher-order approximations.

2.1. Edgeworth approximation

In his seminal work, Edgeworth (1905) applies the Gram–Charlier approximation to the distribution of a sample mean and collects terms in powers of n^{−1/2}. The resulting asymptotic approximation has become known as the Edgeworth approximation. It has been extensively studied in the literature; see, for example, Barndorff-Nielsen & Cox (1989), Hall (1992) and references therein.

Suppose that {Y_i}_{i=1}^n is an i.i.d. random sample with mean zero and variance one. Let X_n = n^{−1/2} Σ_{i=1}^n Y_i and let f_n be the density of X_n. The two-term Edgeworth approximation to f_n takes the form

g_n(x) = φ(x){ 1 + κ_3 H_3(x)/(6√n) + κ_4 H_4(x)/(24n) + κ_3² H_6(x)/(72n) },

where φ(x) is the standard normal density, κ_j is the jth cumulant of Y_1 and H_j is the jth Hermite polynomial.

The Edgeworth approximation applies not only to sample means but also to more general statistics. In this study we focus on smooth functions of sample means, as does Hall (1992). Additional notation is required for statistics more general than sample means. We use bold face to denote a d-dimensional vector v, with v^{(j)} its jth element. Let S_n be an asymptotically normal statistic with distribution F_n and density f_n. Define X̄ = n^{−1} Σ X_i, where X_1, …, X_n are i.i.d. random column d-vectors with mean μ. Suppose that we have a smooth function B: R^d → R satisfying B(μ) = 0 and S_n = n^{1/2} B(X̄). For example, B(x) = {τ(x) − τ(μ)}/δ(μ), where

τ(μ) is estimated by τ(X̄) and δ(μ)² is the asymptotic variance of n^{1/2} τ(X̄); or B(x) = {τ(x) − τ(μ)}/δ(x), where τ(·) and δ(·) are smooth functions and δ(X̄) is an estimate of δ(μ). For j = 1, 2, …, denote the jth moment and cumulant of S_n by μ_{nj} and κ_{nj}, respectively. In general, as the exact moments and cumulants of S_n are not available, we construct instead corresponding approximate moments and cumulants accurate to O(n⁻¹), denoted by μ̃_{nj} and κ̃_{nj} in the following. Define Z = n^{1/2}(X̄ − μ) and

b_{i₁⋯i_j} = ∂^j B(x)/∂x^{(i₁)} ⋯ ∂x^{(i_j)} |_{x=μ}.

By a Taylor approximation, and since B(μ) = 0,

S_n = n^{1/2} B(X̄) = Σ_{i₁=1}^d b_{i₁} Z^{(i₁)} + (n^{−1/2}/2) Σ_{i₁=1}^d Σ_{i₂=1}^d b_{i₁i₂} Z^{(i₁)} Z^{(i₂)} + (n^{−1}/6) Σ_{i₁=1}^d Σ_{i₂=1}^d Σ_{i₃=1}^d b_{i₁i₂i₃} Z^{(i₁)} Z^{(i₂)} Z^{(i₃)} + O_p(n^{−3/2}).

The (approximate) moments of S_n can then be computed from this expansion. In principle, one can compute the moments to an arbitrary degree of accuracy by taking sufficiently many terms. Next we give general formulae that allow approximations accurate to order n⁻¹. In particular, we have

μ̃_{n1} = n^{−1/2} s₁₂, μ̃_{n2} = s₂₁ + n^{−1} s₂₂, μ̃_{n3} = n^{−1/2} s₃₁, μ̃_{n4} = s₄₁ + n^{−1} s₄₂,   (2)

where s₁₂, …, s₄₂ are defined in appendix 1. It can be seen that E[S_n^j] = μ̃_{nj} + o(n⁻¹), j = 1, …, 4. Define

κ̃_{n1} = n^{−1/2} k₁₂, κ̃_{n2} = k₂₁ + n^{−1} k₂₂, κ̃_{n3} = n^{−1/2} k₃₁, κ̃_{n4} = n^{−1} k₄₁,   (3)

where k₁₂, …, k₄₁ are given in appendix 1. One can show that κ̃_{nj} = κ_{nj} + o(n⁻¹), j = 1, …, 4. The Edgeworth approximation of the distribution of S_n can be constructed using its approximate cumulants; see, for example, Bhattacharya & Ghosh (1978) and Hall (1992). In particular, the two-term Edgeworth approximation to f_n takes the form

g_n(x) = φ(x){ 1 + Q₁(x)/√n + Q₂(x)/n },
Q₁(x) = k₁₂ H₁(x) + k₃₁ H₃(x)/6,
Q₂(x) = (k₂₂ + k₁₂²) H₂(x)/2 + (k₄₁ + 4k₁₂k₃₁) H₄(x)/24 + k₃₁² H₆(x)/72.

Note that g_n(x) involves a polynomial of degree 6, which may cause the so-called tail difficulties discussed in Wallace (1958): the tail of the Edgeworth approximation exhibits increasing oscillations as |x| increases, owing to the high-degree polynomial. Moreover, the Edgeworth approximation is not necessarily a bona fide density, because it sometimes produces negative density approximations in the tails of distributions.

Aiming to restrain the tail difficulties caused by polynomial oscillations, Niki & Konishi (1986) propose a transformation method for the Edgeworth approximation. Let a general statistic be S_n = √n(T_n − μ_T)/σ_T, where μ_T and σ_T² are the mean and variance of T_n. Niki &

Konishi (1986) note that the tail difficulties can be mitigated through a transformation φ(T_n) such that the third cumulant of the standardized φ(T_n) is zero, in which case the Edgeworth approximation reduces to a degree 4 polynomial. The transformed statistic takes the form

Z_n = √n{φ(T_n) − φ(μ_T)} / {σ_T φ′(μ_T)},

where φ′ is the first derivative of φ. For example, let T_n be the sample average of n i.i.d. chi-squared variates with k degrees of freedom; then φ(x) = x^{1/3}. It is, however, generally rather difficult to derive the proposed transformation except in some special cases. Moreover, the Edgeworth approximation for the transformed statistic can still be negative.

Being a higher-order approximation around the mean, the Edgeworth approximation is known to work well near the mean of a distribution, but its performance sometimes deteriorates in the tails. Hampel (1973) introduces the so-called small sample asymptotic method, which is essentially a local Edgeworth approximation. Although generally more accurate than the Edgeworth approximation, Hampel's method requires knowledge of f_n, which limits its applicability in practice. Field & Hampel (1982) briefly discuss an approximation to Hampel's small sample approximation that requires only the first few cumulants of S_n (see also Field & Ronchetti, 1990). Alternatively, this approximation can be obtained by putting the Edgeworth approximation back into the exponent and collecting terms such that its first-order Taylor approximation is equivalent to the Edgeworth approximation. We therefore call this approximation the exponential Edgeworth (EE) approximation. Denote the EE approximation by φ(x) exp{K(x)}. It can be constructed such that the first-order Taylor approximation of φ(x) exp{K(x)} at K(x) = 0 matches g_n(x) up to order n⁻¹.
We then have

h_n(x) = φ(x) exp{ Q₁(x)/√n + (Q₂ₐ(x) + Q₂ᵦ(x))/n },
Q₁(x) = k₁₂ H₁(x) + k₃₁ H₃(x)/6,
Q₂ₐ(x) = k₂₂ H₂(x)/2 + k₄₁ H₄(x)/24,
Q₂ᵦ(x) = k₁₂²{H₂(x) − H₁²(x)}/2 + k₁₂k₃₁{H₄(x) − H₁(x)H₃(x)}/6 + k₃₁²{H₆(x) − H₃²(x)}/72.   (4)

Clearly, the EE approximation is strictly positive. In addition, the exponent of h_n(x) is a polynomial of degree 4, as H₆(x) − H₃²(x) is a degree 4 polynomial. Thus the EE approximation is more likely to restrain polynomial oscillations. For a given x ∈ (−∞, ∞), h_n(x) is equivalent to the Edgeworth approximation g_n(x) up to order n⁻¹.

One problem with the EE approximation is that it is generally not integrable on the real line. To see this, note that we can rewrite h_n(x) in the canonical exponential form exp(Σ_{j=0}^4 γ_{nj} x^j), where the γ_{nj}, j = 0, …, 4, are functions of k₁₂, …, k₄₁. Obviously, if γ_{n4} > 0, h_n(x) goes to infinity as |x| → ∞. A second problem concerns preservation of information. Both the two-term Edgeworth approximation g_n and the EE approximation h_n are based solely on {κ̃_{nj}}_{j=1}^4. The approximation g_n preserves information in the sense that its first four cumulants are identical to {κ̃_{nj}}_{j=1}^4. However, the first four cumulants of h_n, if they exist, are not exactly the same as {κ̃_{nj}}_{j=1}^4. In this study we propose an alternative approximation that has the same functional form exp(Σ_{j=0}^4 γ_{nj} x^j) and at the same time preserves information in that its first four moments match {μ̃_{nj}}_{j=1}^4. As is shown next, it has an appealing information-theoretic interpretation.
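The contrast between the polynomial form g_n and the exponential form h_n is easy to see numerically. The sketch below (assuming NumPy is available; the chi-squared(1) cumulants √8 and 12 are used for g_n, and the same values are plugged into the k-coefficients of h_n purely for illustration) shows that g_n can dip below zero in the tail while h_n is strictly positive by construction:

```python
import numpy as np

def hermite_e(j, x):
    """Probabilists' Hermite polynomial He_j(x)."""
    c = np.zeros(j + 1); c[j] = 1.0
    return np.polynomial.hermite_e.hermeval(x, c)

def phi(x):
    """Standard normal density."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def edgeworth(x, k3, k4, n):
    """Two-term Edgeworth density for a standardized mean (cumulants of Y_1)."""
    return phi(x) * (1 + k3 * hermite_e(3, x) / (6 * np.sqrt(n))
                       + k4 * hermite_e(4, x) / (24 * n)
                       + k3**2 * hermite_e(6, x) / (72 * n))

def ee(x, k12, k22, k31, k41, n):
    """Exponential Edgeworth h_n(x) = phi(x) exp{Q1/sqrt(n) + (Q2a+Q2b)/n}, eq. (4)."""
    H = {j: hermite_e(j, x) for j in (1, 2, 3, 4, 6)}
    Q1 = k12 * H[1] + k31 * H[3] / 6
    Q2a = k22 * H[2] / 2 + k41 * H[4] / 24
    Q2b = (k12**2 * (H[2] - H[1]**2) / 2
           + k12 * k31 * (H[4] - H[1] * H[3]) / 6
           + k31**2 * (H[6] - H[3]**2) / 72)
    return phi(x) * np.exp(Q1 / np.sqrt(n) + (Q2a + Q2b) / n)

x = np.linspace(-4, 4, 81)
g = edgeworth(x, k3=np.sqrt(8.0), k4=12.0, n=10)               # chi2(1) summands
h = ee(x, k12=0.0, k22=0.0, k31=np.sqrt(8.0), k41=12.0, n=10)  # hypothetical k's
print(g.min() < 0, h.min() > 0)   # True True
```

The exponential form cannot go negative, whatever the coefficients, which is the positivity property claimed above; the price, as noted next, is that h_n need not be integrable on the real line.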

2.2. ME density

The exponential polynomial form of a density approximation based on a set of moments arises naturally from the principle of ME, or minimum relative entropy. Given a continuous random variable X with density f, the differential entropy is defined as

W(f) = −∫ f(x) log f(x) dx.   (5)

The principle of ME provides a method of constructing a density from known moment conditions. Suppose all that is known about f is its first K moments. There may exist an infinite number of densities satisfying the given moment conditions. Jaynes' (1957) principle of maximum entropy suggests selecting, among all candidates satisfying the moment conditions, the density that maximizes the entropy. The resulting ME density is uniquely determined as the one that is maximally noncommittal with regard to missing information: it agrees with what is known but expresses maximum uncertainty with respect to all other matters. Formally, the ME density is defined as

f* = arg max_f W(f)

subject to the moment constraints

∫ f(x) dx = 1,  ∫ x^j f(x) dx = μ_j, j = 1, …, K,   (6)

where μ_j = E X^j. The problem can be solved using the calculus of variations. Denote the Lagrangian by

L = −∫ f(x) log f(x) dx + γ₀ {∫ f(x) dx − 1} + Σ_{j=1}^K γ_j {∫ x^j f(x) dx − μ_j}.

The necessary condition for a stationary value is given by the Euler–Lagrange equation

log f(x) + 1 − γ₀ − Σ_{j=1}^K γ_j x^j = 0,   (7)

plus the constraints given in (6). Thus, from (7), the ME density takes the general form

f*(x) = exp( Σ_{j=1}^K γ_j x^j − 1 + γ₀ ) = exp( Σ_{j=1}^K γ_j x^j − ψ(γ) ),   (8)

where γ = {γ_j}_{j=1}^K and ψ(γ) = log{∫ exp(Σ_{j=1}^K γ_j x^j) dx} is the normalizing factor, under the assumption that ∫ exp(Σ_{j=1}^K γ_j x^j) dx < ∞. The Lagrange multipliers γ can be solved from the K moment conditions in (6) by plugging f = f* into (6). For distributions defined on R, the necessary conditions for the integrability of f* are that K is even and γ_K < 0.

A closely related concept is the relative entropy, also called cross entropy or Kullback–Leibler distance.
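The ME problem (6) can be solved numerically. One standard route, sketched below under the assumption that SciPy is available (the moment values and the bounded support are illustrative choices, not values from the article), is to minimize the convex dual ψ(γ) − Σ_j γ_j μ_j, whose first-order conditions are exactly the moment constraints:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def fit_me_density(moments, support=(-5.0, 5.0)):
    """Find gamma so that exp(sum_j gamma_j x^j - psi(gamma)) matches `moments`
    on `support`, by minimizing the convex dual psi(gamma) - gamma'mu."""
    K = len(moments)
    def psi(g):
        # log normalizing factor; np.r_[g[::-1], 0] orders coefficients for polyval
        val, _ = quad(lambda x: np.exp(np.polyval(np.r_[g[::-1], 0.0], x)), *support)
        return np.log(val)
    def dual(g):
        return psi(g) - np.dot(g, moments)
    res = minimize(dual, np.zeros(K), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 5000})
    return res.x

# Moments mu_1 = 0, mu_2 = 1: the solution is essentially the standard normal,
# i.e. gamma approximately (0, -1/2) up to truncation of the support.
gam = fit_me_density([0.0, 1.0])
print(np.round(gam, 2))
```

Matching the first two moments recovers the Gaussian exponent, which is the usual sanity check for this construction; higher-order moment sets are handled the same way, subject to the integrability caveats discussed below.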
Consider two densities f and f₀ defined on a common support. The relative entropy between f and f₀ is defined as

D(f ‖ f₀) = ∫ f(x) log{f(x)/f₀(x)} dx.   (9)

It is known that D(f ‖ f₀) ≥ 0, and D(f ‖ f₀) = 0 if and only if f(x) = f₀(x) almost everywhere. Minimizing (9) subject to the same moment constraints (6) yields the minimum relative entropy density:

f*(x; f₀) = f₀(x) exp( Σ_{j=1}^K γ_{jf₀} x^j − ψ(γ_{f₀}) ),   (10)

where γ_{f₀} = {γ_{jf₀}}_{j=1}^K and ψ(γ_{f₀}) = log{∫ f₀(x) exp(Σ_{j=1}^K γ_{jf₀} x^j) dx} is the normalizing factor, under the assumption that ∫ f₀(x) exp(Σ_{j=1}^K γ_{jf₀} x^j) dx < ∞. Among all densities satisfying the given moment conditions, f*(x; f₀) is the unique one that is closest to the reference density f₀ in terms of relative entropy. Shore & Johnson (1981) provide an axiomatic justification of this approach. The ME density is a special case of the minimum relative entropy density wherein the reference density f₀ is set to be constant; see, for example, Zellner & Highfield (1988) and Wu (2003).

Existing studies of ME densities mostly focus on their use as density estimators. The ME approximation is equivalent to approximating the logarithm of a density by polynomials, which has long been studied. Earlier studies on the approximation of log-densities using polynomials include Neyman (1937) and Good (1963). When sample moments rather than population moments are used, the maximum likelihood method gives efficient estimates of this canonical exponential family. Crain (1974, 1977) establishes existence and consistency of the maximum likelihood estimator of ME densities. Gilula & Haberman (2000) use a maximin argument based on information theory to derive the ME density based on summary statistics. Barron & Sheu (1991) allow the number of moments to increase with sample size and establish formally the ME approximation as a nonparametric density estimator. Their results are further extended to multivariate densities in Wu (forthcoming). This study is conceptually different: the ME densities are used to approximate distributions of statistics that are asymptotically normal. Instead of allowing K, the degree of the exponential polynomial, to increase with the sample size, we study the accuracy of the approximation as n → ∞ for a given K.
Related work can be found in Harremoës (2005a,b) and references therein. The former establishes lower bounds in terms of the relative entropy to determine the rate of convergence in the central limit theorem; the latter shows that the Edgeworth approximation can be considered a linear extrapolation of the ME approximation.

3. ME approximation

In this section we introduce an ME approximation to distributions of statistics that are asymptotically normal. As discussed earlier, the ME density is not necessarily well defined for distributions on an unbounded support. We therefore propose a modified ME density that overcomes this constraint. This is particularly useful for asymptotic approximations to distributions of statistics that are defined on unbounded supports. We then establish the asymptotic properties of higher-order approximations based on modified ME densities.

3.1. Approximation for distributions on an unbounded support

Whether an ME approximation is integrable on an unbounded support depends on the moment conditions. For example, consider the case where the first four moments, with μ₁ = 0 and μ₂ = 1, are used to construct an ME density. Rockinger & Jondeau (2002) compute numerically a boundary in the (μ₃, μ₄) space beyond which an ME approximation is not obtainable. Under the same conditions, Tagliani (2003) shows that an ME approximation is not integrable if μ₃ = 0 and μ₄ > 3.
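This constraint can be illustrated numerically: integrability on R forces the x⁴ coefficient to be negative, and a symmetric quartic exponential with a negative x⁴ coefficient never exceeds Gaussian kurtosis, so a symmetric moment set with μ₄ above 3 cannot be matched. A sketch, assuming SciPy and scanning a hypothetical grid of coefficients:

```python
import numpy as np
from scipy.integrate import quad

def kurtosis(g2, g4):
    """Kurtosis of the symmetric quartic exponential exp(g2*x^2 + g4*x^4 - psi)."""
    k = lambda x: np.exp(g2 * x**2 + g4 * x**4)
    m0 = quad(k, -np.inf, np.inf)[0]                              # normalizer
    m2 = quad(lambda x: x**2 * k(x), -np.inf, np.inf)[0] / m0
    m4 = quad(lambda x: x**4 * k(x), -np.inf, np.inf)[0] / m0
    return m4 / m2**2

# g4 < 0 is required for integrability on R; the grid of (g2, g4) is hypothetical.
kurts = [kurtosis(g2, g4) for g2 in (-2.0, -0.5, 0.0, 0.5)
                          for g4 in (-0.05, -0.5, -2.0)]
print(max(kurts) < 3)   # True
```

The kurtosis approaches 3 only in the Gaussian limit g4 → 0⁻, which is the boundary behaviour behind the infeasible region computed by Rockinger & Jondeau (2002).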

We propose to use a dampening function in the exponent of an ME density to ensure its integrability. A similar approach is used in Wang (1992) to improve the numerical stability of general saddlepoint approximations. Define the dampening function

θ_n(x) = (c/n²) exp(|x|),   (11)

where c is a positive constant. The modified ME density based on {μ̃_{nj}}_{j=1}^4 takes the form

f̃_{nθₙ}(x) = exp( Σ_{j=1}^4 γ_{njθₙ} x^j − θ_n(x) − ψ(γ_{nθₙ}) ),   (12)

where the normalization term is defined as

ψ(γ_{nθₙ}) = log ∫ exp( Σ_{j=1}^4 γ_{njθₙ} x^j − θ_n(x) ) dx.

This ME density can be obtained by solving

min_f ∫ f(x) log{ f(x)/exp(−θ_n(x)) } dx such that ∫ f(x) dx = 1, ∫ x^j f(x) dx = μ̃_{nj}, j = 1, …, 4.

Thus the modified ME approximation is essentially a minimum cross entropy density based on given moment conditions, with a non-constant reference density proportional to exp(−θ_n(x)). The dampening function ensures that the exponent of the ME density is bounded above and goes to −∞ as |x| → ∞, so that the ME density is integrable on R.

One might be tempted to use a term like θ_n(x) = c x^r/n², with r an even integer larger than 4, as the dampening function. Integrability of the ME approximation, however, is not warranted by this choice. As is shown next, the coefficient of x⁴ in the exponent, γ_{n4θₙ}, is O(n⁻¹). Suppose that r = 6. For x = O(n^{1/3}), it follows that γ_{n4θₙ} x⁴ = O(n^{1/3}) while θ_n(x) = O(1); thus, if γ_{n4θₙ} > 0, the exponent is not controlled by the dampening term in this range and the integrability of f̃_{nθₙ} on R is not guaranteed.

Similarly, we use the same dampening function to ensure the integrability of the EE approximation on R, resulting in

h_{nθₙ}(x) = φ(x) exp{ Q₁(x)/√n + (Q₂ₐ(x) + Q₂ᵦ(x))/n − θ_n(x) },   (13)

where Q₁(x), Q₂ₐ(x) and Q₂ᵦ(x) are defined in (4) and θ_n(x) is given by (11). We note that the ME approximation adapts to the dampening function in the sense that its coefficients are determined through an optimization process that incorporates the dampening function as a reference density.
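The role of the dampening function can be checked directly: an exponential term of the form (11) eventually dominates any polynomial exponent, so the kernel integrates over the whole real line even when the x⁴ coefficient is positive. A small sketch, assuming SciPy, with hypothetical coefficient values:

```python
import numpy as np
from scipy.integrate import quad

def damp(x, c, n):
    """Dampening function theta_n(x) = (c/n^2) exp(|x|) from (11)."""
    return c / n**2 * np.exp(np.abs(x))

# A positive x^4 coefficient makes the plain exponential-polynomial kernel
# diverge, but subtracting theta_n restores integrability on R.
gam4, n, c = 0.01, 10, 1.0          # hypothetical values, for illustration only
kernel = lambda x: np.exp(gam4 * x**4 - x**2 / 2 - damp(x, c, n))
mass = quad(kernel, -np.inf, np.inf)[0]
print(np.isfinite(mass) and mass > 0)   # True

# For any fixed x, theta_n(x) -> 0 as n grows, so the modification is
# asymptotically negligible, as claimed.
```

The exp(|x|) growth is the key design choice: a polynomial dampener of fixed degree r can be outrun by the polynomial part of the exponent in the range x = O(n^{1/3}), as argued above, whereas the exponential dampener cannot.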
In contrast, the EE approximation does not have this appealing property, because its coefficients are invariant to the introduction of a dampening function. This noteworthy difference is explored numerically in section 4. Although it does not affect the asymptotic result of order n⁻¹, the dampening function may influence the small sample performance of the ME and EE approximations. In practice we use the following rule: the constant c in the dampening function is chosen as the smallest positive number that ensures that both tails of the approximation decrease

monotonically. More precisely, let f̃_{nθₙ}(·; c) be the ME approximation with dampening function θ_n(·; c). We select c according to

c* = inf{ a > 0: (d/dx) f̃_{nθₙ}(x; a) > 0 for x < −2 and (d/dx) f̃_{nθₙ}(x; a) < 0 for x > 2 }.   (14)

In other words, c* is chosen such that the density approximation decays monotonically for |x| > 2 as |x| → ∞, which corresponds to approximately the 5 per cent tail regions of the normal approximation. We use the lower bound c* = 10⁻⁶ for c* arbitrarily close to zero. The same rule is used to select the dampening function for the EE approximation (13).

3.2. Asymptotic properties

In this section we derive the asymptotic properties of the ME approximation. As noted by Wallace (1958), the asymptotic orders of several alternative higher-order approximations are established through their relationships with the Edgeworth approximation. The same approach is employed here. More specifically, we use the EE approximation to facilitate our analysis, noting that an appropriately constructed EE approximation is asymptotically equivalent to the Edgeworth approximation and at the same time shares a common canonical exponential form with the ME approximation.

The asymptotic properties of the Edgeworth approximation have been extensively studied. Recall that we are interested in approximating the distribution of S_n, a smooth function of sample means as defined in section 2.1. We start with the following result for the Edgeworth approximation g_n to f_n.

Lemma 1 (Hall, 1992, p. 78). Assume that the function B has four continuous derivatives in a neighbourhood of μ and satisfies

sup_{−∞ < x < ∞} ∫_{S(n,x)} (1 + ‖y‖)⁴ dy < ∞,

where S(n, x) = {y ∈ R^d : n^{1/2} B(μ + n^{−1/2} y)/σ = x}. Furthermore, assume that E‖X‖⁴ < ∞ and that the characteristic function φ of X satisfies ∫_{R^d} |φ(t)|^λ dt < ∞ for some λ ≥ 1. Then, as n → ∞, f_n(x) − g_n(x) = o(n⁻¹) uniformly in x.
We next show that the EE approximation (13) is equivalent to the two-term Edgeworth approximation at order O(n⁻¹).

Theorem 1. Let h_{nθₙ} be the (modified) EE approximation (13) to f_n. Under the conditions of lemma 1, f_n(x) − h_{nθₙ}(x) = o(n⁻¹) uniformly in x as n → ∞.

A proof of theorem 1 and all other proofs are given in appendix 2. Next, to make the ME approximation f̃_{nθₙ} comparable with the Edgeworth and EE approximations, we factor out φ(x). Reparameterizing γ_{n2θₙ} + 1/2 in (12) and denoting it again by γ_{n2θₙ} without confusion, we obtain the following slightly different form:

f̃_{nθₙ}(x) = φ(x) exp( Σ_{j=1}^4 γ_{njθₙ} x^j − θ_n(x) − ψ(γ_{nθₙ}) ),   (15)

where

ψ(γ_{nθₙ}) = log ∫ φ(x) exp( Σ_{j=1}^4 γ_{njθₙ} x^j − θ_n(x) ) dx.

The asymptotic properties of f̃_{nθₙ} are established with the aid of the EE approximation (13). Heuristically, using theorem 1 one can show that μ_j(h_{nθₙ}) = μ̃_{nj} + o(n⁻¹) for j = 1, …, 4. Let μ(f) = (μ_j(f))_{j=1}^4 and let ‖·‖ be the Euclidean norm. As μ(f_n) = μ(f̃_{nθₙ}) + o(n⁻¹), it follows that ‖μ(h_{nθₙ}) − μ(f̃_{nθₙ})‖ = o(n⁻¹). Under this condition we are then able to establish order n⁻¹ equivalence between h_{nθₙ} and f̃_{nθₙ}, which share a common exponential polynomial form. By lemma 1 and theorem 1, we demonstrate formally the following result.

Theorem 2. Let f̃_{nθₙ} be the ME approximation (15) to f_n. Under the conditions of lemma 1, f_n(x) − f̃_{nθₙ}(x) = o(n⁻¹) uniformly in x as n → ∞.

The asymptotic properties of the approximations to distribution functions follow readily from the preceding results. Let F_n(x) = P(S_n ≤ x). Denote by G_n(x) the two-term Edgeworth approximation of F_n. It is known that F_n(x) − G_n(x) = o(n⁻¹); see, for example, Feller (1971). Define H_{nθₙ}(x) = ∫_{−∞}^x h_{nθₙ}(t) dt and F̃_{nθₙ}(x) = ∫_{−∞}^x f̃_{nθₙ}(t) dt, respectively. Next we show that the same asymptotic order applies to H_{nθₙ} and F̃_{nθₙ}.

Theorem 3. Under the conditions of lemma 1, (a) the EE approximation to the distribution function satisfies F_n(x) − H_{nθₙ}(x) = o(n⁻¹) uniformly in x as n → ∞; (b) the ME approximation to the distribution function satisfies F_n(x) − F̃_{nθₙ}(x) = o(n⁻¹) uniformly in x as n → ∞.

4. Numerical examples

In this section we provide some numerical examples to illustrate the finite sample performance of the proposed method. Our experiments compare the Edgeworth, EE and ME approximations. We also include the results of the normal approximation, which is nested in all three higher-order approximations.
Alternative higher-order asymptotic methods, such as the bootstrap and the saddlepoint approximation, are not considered. We focus our comparison on the ME and Edgeworth approximations because they share the same information requirement: only the first four moments or cumulants are required for approximations up to order n⁻¹. The other methods are conceptually different. The bootstrap uses resampling, and the saddlepoint approximation requires knowledge of the moment-generating function. We leave comparisons with these alternative methods for future study. As both the Edgeworth and ME approximations integrate to unity by construction, we normalize the EE approximation so that it is directly comparable with the other two.

We are interested in F_n, the distribution of S_n, a smooth function of sample means. Define s_α by Pr(S_n ≤ s_α) = α. Let Ψ_n be some approximation to F_n. In each experiment we report the one- and two-sided tail probabilities defined by

α₁(Ψ_n) = ∫_{−∞}^{s_α} dΨ_n(s),  α₂(Ψ_n) = 1 − ∫_{s_{α/2}}^{s_{1−α/2}} dΨ_n(s).

We also report the relative error, defined as |α_i(Ψ_n) − α|/α, i = 1, 2.

In the first example, suppose that we are interested in the distribution of the standardized mean S_n = Σ_{i=1}^n (Y_i − 1)/√(2n), where {Y_i} follows a chi-squared distribution with one degree of freedom. The exact density of S_n is given by

f_n(s) = √(2n) (√(2n)s + n)^{(n−2)/2} e^{−(√(2n)s + n)/2} / {Γ(n/2) 2^{n/2}},  s > −√(n/2).

We set n = 10. The constant c in the dampening function is chosen to be 10⁻⁶ for both the ME and EE approximations. The results are reported in Table 1. The accuracy of all approximations generally improves as α increases. The three higher-order approximations provide comparable performance and dominate the normal approximation. Although the magnitudes of the relative errors are similar, both the one- and two-sided tail probabilities of the Edgeworth approximation are negative at the extreme tails.

We next examine a case where f_n is symmetric and fat-tailed. Suppose that S_n is the standardized mean of random variables from the t-distribution with five degrees of freedom, where n = 10. Although the exact distribution of S_n is unknown, we can calculate its moments analytically (μ̃_{n3} = 0 and μ̃_{n4} = 3.6). As discussed in the previous section, for symmetric fat-tailed distributions both the ME and EE approximations, without a proper dampening function, are not integrable on the real line. In this experiment the constant c in the dampening function is chosen to be 0.56 and 4.05 for the ME and EE approximations, respectively, using the automatic selection rule (14). We used 1 million simulations to obtain the critical values of the distribution of S_n. The approximation results are reported in Table 2. The overall performance of the ME approximation is considerably better than that of the Edgeworth approximation, which in turn outperforms the normal approximation.
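The exact benchmark in the first example above is available in closed form, since the sum of n chi-squared(1) variates is chi-squared(n), so S_n is a linear transform of a chi-squared(n) variate. A sketch, assuming SciPy, that evaluates an exact tail probability and the corresponding normal-approximation error:

```python
import numpy as np
from scipy import stats

# First example: S_n = (sum Y_i - n)/sqrt(2n) with Y_i ~ chi2(1).
n = 10
exact_cdf = lambda s: stats.chi2.cdf(np.sqrt(2 * n) * s + n, df=n)

alpha = 0.05
s_alpha = (stats.chi2.ppf(alpha, df=n) - n) / np.sqrt(2 * n)  # exact alpha-quantile
print(round(exact_cdf(s_alpha), 6))       # 0.05 by construction
print(stats.norm.cdf(s_alpha) > 0.08)     # True: the normal cdf overshoots this tail
```

Plugging the higher-order approximations into the same quantiles gives the α₁ entries of Table 1; the normal approximation's sizeable one-sided error at α = 0.05 illustrates why the higher-order corrections matter for this skewed case.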
Unlike in the first example, the EE approximation is essentially dominated by the other three approximations. Note that a non-trivial c is chosen for both the ME and EE approximations in this case. A relatively heavy dampening function is chosen for the EE approximation to ensure its integrability, and its performance is clearly affected by the dampening function. In contrast, the dampening function required for the ME approximation is considerably weaker.

Table 1. Tail probabilities of approximated distributions of the standardized mean of n = 10 χ²₁ variables (NORM, normal; ED, Edgeworth; EE, exponential Edgeworth; ME, maximum entropy; RE, relative errors)

Table 2. Tail probabilities of the approximated distribution of the standardized mean of n = 10 t₅ variables (NORM, normal; ED, Edgeworth; EE, exponential Edgeworth; ME, maximum entropy; RE, relative errors)

Generally, the selection of the constant in the dampening function is expected to have little effect on the ME approximation, because it adapts to the dampening function by adjusting its coefficients through an optimization process. However, it has a non-negligible effect on the EE approximation, whose coefficients are invariant to the introduction of the dampening function. To investigate how sensitive the reported results are to the specification of the dampening function, we conducted some further experiments. We used a grid search to locate the optimal c for the ME and EE approximations, defined as the minimizer of the sum of the eight relative errors reported in Table 2. The optimal c for the ME approximation is 0.6, obtained through a grid search on the interval [0.1, 5] with an increment of 0.1. The optimal c for the EE approximation, which is 5, is obtained in a similar fashion through a grid search on [1, 20] with an increment of 1. Thus the values of c selected by the automatic rule, 0.56 and 4.05 for the ME and EE approximations respectively, are quite close to their optimal values, which are generally infeasible in practice.

In the third experiment we used sample moments rather than population moments in the approximation of the distribution of a sample mean, supposing that the variance is unknown. We then have

S_n = √n(Ȳ − EY₁) / √{ n⁻¹ Σ_{t=1}^n (Y_t − Ȳ)² },  Ȳ = n⁻¹ Σ_{t=1}^n Y_t.

This studentized statistic is a smooth function of sample means. Without loss of generality, suppose that {Y_t} follows a distribution with mean zero and variance one.
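The studentized statistic of the third experiment can be simulated directly. The sketch below (assuming NumPy; the sample size matches the experiment but the seed and replication count are arbitrary) checks the simulation against an exact normal-theory benchmark, E S_n² = n/(n − 3), which holds for normal data when the divisor-n variance estimate is used:

```python
import numpy as np

# Third experiment's statistic: studentized mean with divisor-n variance.
rng = np.random.default_rng(1)
n, reps = 20, 200_000
Y = rng.standard_normal((reps, n))
Ybar = Y.mean(axis=1)
sig = np.sqrt(((Y - Ybar[:, None]) ** 2).mean(axis=1))
S = np.sqrt(n) * Ybar / sig

# For normal data, E S_n^2 = n/(n-3) exactly (chi-squared theory), here 20/17.
print(round(S.mean(), 3), round((S ** 2).mean(), 3))
```

Simulations of this kind provide the Monte Carlo critical values used in the experiment; the second moment exceeding one reflects the extra variability from estimating the variance, which is exactly what the order-n⁻¹ corrections below capture.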
Using the general formulae given in appendix 1, we obtain the following approximate moments for S_n:

μ̃_{n1} = −μ₃/(2√n),  μ̃_{n2} = 1 + (2μ₃² + 3)/n,  μ̃_{n3} = −7μ₃/(2√n),  μ̃_{n4} = 3 + (28μ₃² − 2μ₄ + 30)/n,

where μ_j = E(Y₁^j), j = 3, 4. The distribution of S_n was obtained by 1 million Monte Carlo simulations. In our experiment we set n = 20 and generated {Y_t} from the standard normal distribution. The population moments μ₃ and μ₄ are replaced by sample moments in the approximations. The dampening function is selected automatically according to the rule in (14). The experiment is repeated 500 times. Table 3 reports the average tail probabilities and corresponding

12 Scand J Statist 38 Maximum entropy asymptotic approximations 141 Table 3. Empirical tail probabilities of approximated distributions of studentized mean of n = 20 standard normal random variables (NORM normal; ED Edgeworth; EE exponential Edgeworth; ME maximum entropy; SE standard error) α NORM SE ED SE EE SE ME SE α 1 ( ˆΨ n ) = s α d ˆΨ n (s) (0.0045) (0.0013) (0.0105) (0.0013) (0.0100) (0.0059) (0.0163) (0.0042) (0.0135) (0.0093) (0.0195) (0.0065) (0.0240) (0.0208) (0.0296) (0.0156) α 2 ( ˆΨ n ) = 1 s 1 α/2 s d ˆΨ n (s) α/ (0.0074) (0.0012) (0.0097) (0.0013) (0.0148) (0.0064) (0.0145) (0.0036) (0.0197) (0.0105) (0.0164) (0.0056) (0.0351) (0.0177) (0.0181) (0.0131) standard errors. The results are qualitatively similar to those in Table 2. The ME and Edgeworth approximations dominate the EE and normal approximations. The overall performance of the ME approximation is better than that of the Edgeworth approximation especially for tail probabilities at 0.01 and In addition the standard errors of the ME approximation are smaller than those of the Edgeworth approximation in most cases. 5. Conclusion In this article we have proposed an ME approach for approximating the asymptotic distributions of smooth functions of sample means. Conventional ME densities are typically defined on a bounded support. The proposed ME approximation equipped with an asymptotically negligible dampening function is well defined on the real line. We have established order n 1 asymptotic equivalence between the proposed method and the classical Edgeworth approximation. Our numerical experiments show that the proposed method is comparable with and sometimes outperforms the Edgeworth approximation. Finally we note that the proposed method may be generalized to approximate bootstrap distributions of statistics and those of high dimensional statistics. References Barndorff-Nielsen O. E. & Cox D. R. (1989). Asymptotic techniques for use in statistics. Chapman and Hall London. Barron A. R. 
& Sheu, C. (1991). Approximation of density functions by sequences of exponential families. Ann. Statist.
Bhattacharya, R. N. & Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist.
Crain, B. R. (1974). Estimation of distributions using orthogonal expansions. Ann. Statist.
Crain, B. R. (1977). An information theoretic approach to approximating a probability distribution. SIAM J. Appl. Math.
Edgeworth, F. Y. (1905). The law of error. Trans. Camb. Philos. Soc.
Feller, W. (1971). An introduction to probability theory and its applications, Vol. II. Wiley, New York.
Field, C. A. & Hampel, F. R. (1982). Small-sample asymptotic distributions of M-estimators of location. Biometrika.
Field, C. A. & Ronchetti, E. (1990). Small sample asymptotics. Institute of Mathematical Statistics, Hayward, CA.
Good, I. J. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. Ann. Math. Statist.

Gilula, Z. & Haberman, S. J. (2000). Density approximation by summary statistics: an information theoretic approach. Scand. J. Statist.
Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer-Verlag, New York.
Hampel, F. R. (1973). Some small-sample asymptotics. In Proceedings of the Prague Symposium on Asymptotic Statistics (ed. J. Hajek), Charles University, Prague.
Harremoës, P. (2005a). Lower bounds for divergence in central limit theorem. Electron. Notes Discrete Math.
Harremoës, P. (2005b). Maximum entropy and the Edgeworth expansion. In Proceedings of IEEE ISOC ITW 2005 on Coding and Complexity.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev.
Neyman, J. (1937). 'Smooth' test for goodness of fit. Skand. Aktuar.
Niki, N. & Konishi, S. (1986). Effects of transformations in higher order asymptotic expansions. Ann. Inst. Statist. Math.
Rockinger, M. & Jondeau, E. (2002). Entropy densities with an application to autoregressive conditional skewness and kurtosis. J. Econometrics.
Shore, J. E. & Johnson, R. W. (1981). Properties of cross-entropy minimization. IEEE Trans. Inform. Theory.
Tagliani, A. (2003). A note on proximity of distributions in terms of coinciding moments. Appl. Math. Comput.
Wallace, D. (1958). Asymptotic approximations to distributions. Ann. Math. Statist.
Wang, S. (1992). General saddlepoint approximations in the bootstrap. Statist. Probab. Lett.
Wu, X. (2003). Calculation of maximum entropy densities with application to income distribution. J. Econometrics.
Wu, X. (forthcoming). Multivariate exponential series density estimator. J. Econometrics.
Zellner, A. & Highfield, R. A. (1988). Calculation of maximum entropy distribution and approximation of marginal posterior distributions. J. Econometrics.

Received December 2008, in final form January 2010

Ximing Wu, Department of Agricultural Economics, Texas A&M University, College Station, TX, USA.
xwu@tamu.edu

Appendix 1: Approximate moments and cumulants

We first define μ_{i1...ij} = E[(X − μ)^{(i1)} ··· (X − μ)^{(ij)}], j ≥ 1, and

ν_1 = μ_{i1},  ν_2 = μ_{i1 i2},  ν_3 = μ_{i1 i2 i3},  ν_4 = μ_{i1 i2 i3 i4},

ν_22 = μ_{i1 i2} μ_{i3 i4} + μ_{i1 i3} μ_{i2 i4} + μ_{i1 i4} μ_{i2 i3},

ν_23 = μ_{i1 i2} μ_{i3 i4 i5} + μ_{i1 i3} μ_{i2 i4 i5} + μ_{i1 i4} μ_{i2 i3 i5} + μ_{i1 i5} μ_{i2 i3 i4} + μ_{i2 i3} μ_{i1 i4 i5} + μ_{i2 i4} μ_{i1 i3 i5} + μ_{i2 i5} μ_{i1 i3 i4} + μ_{i3 i4} μ_{i1 i2 i5} + μ_{i3 i5} μ_{i1 i2 i4} + μ_{i4 i5} μ_{i1 i2 i3},

ν_222 = μ_{i1 i2} μ_{i3 i4} μ_{i5 i6} + μ_{i1 i2} μ_{i3 i5} μ_{i4 i6} + μ_{i1 i2} μ_{i3 i6} μ_{i4 i5} + μ_{i1 i3} μ_{i2 i4} μ_{i5 i6} + μ_{i1 i3} μ_{i2 i5} μ_{i4 i6} + μ_{i1 i3} μ_{i2 i6} μ_{i4 i5} + μ_{i1 i4} μ_{i2 i3} μ_{i5 i6} + μ_{i1 i4} μ_{i2 i5} μ_{i3 i6} + μ_{i1 i4} μ_{i2 i6} μ_{i3 i5} + μ_{i1 i5} μ_{i2 i3} μ_{i4 i6} + μ_{i1 i5} μ_{i2 i4} μ_{i3 i6} + μ_{i1 i5} μ_{i2 i6} μ_{i3 i4} + μ_{i1 i6} μ_{i2 i3} μ_{i4 i5} + μ_{i1 i6} μ_{i2 i4} μ_{i3 i5} + μ_{i1 i6} μ_{i2 i5} μ_{i3 i4},

where the dependence of the ν's on i_1, ..., i_6 is suppressed for simplicity. One can then show that

E[Z^{(i1)}] = ν_1 = 0,
E[Z^{(i1)} Z^{(i2)}] = ν_2,
E[Z^{(i1)} Z^{(i2)} Z^{(i3)}] = n^{−1/2} ν_3,
E[Z^{(i1)} ··· Z^{(i4)}] = ν_22 + n^{−1}(ν_4 − ν_22) + o(n^{−1}),
E[Z^{(i1)} ··· Z^{(i5)}] = n^{−1/2} ν_23 + o(n^{−1}),
E[Z^{(i1)} ··· Z^{(i6)}] = ν_222 + O(n^{−1}).
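The combinatorial structure of ν_23 and ν_222 is simply an enumeration of groupings of indices: ν_23 runs over all C(5,2) = 10 ways of splitting five indices into a pair and a triple, and ν_222 over all 5·3·1 = 15 perfect pairings of six indices. As an illustrative check (not part of the paper), these groupings can be generated mechanically:

```python
from itertools import combinations

def pair_triple_partitions(idx):
    # All ways to split the 5 indices of nu_23 into a pair and a triple: C(5,2) = 10.
    return [(p, tuple(i for i in idx if i not in p)) for p in combinations(idx, 2)]

def three_pair_partitions(idx):
    # All perfect pairings of the 6 indices of nu_222: 5 * 3 * 1 = 15.
    if not idx:
        return [()]
    first, rest = idx[0], idx[1:]
    out = []
    for j in rest:
        remaining = tuple(k for k in rest if k != j)
        for tail in three_pair_partitions(remaining):
            out.append(((first, j),) + tail)
    return out

nu23_terms = pair_triple_partitions(tuple(range(1, 6)))
nu222_terms = three_pair_partitions(tuple(range(1, 7)))
print(len(nu23_terms), len(nu222_terms))  # 10 15
```

Each generated grouping corresponds to one product of μ's in the displays above, which confirms the 10 and 15 terms listed for ν_23 and ν_222.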

Expanding E[S_n^j], j = 1, ..., 4, and collecting terms by order of n^{−1/2} then yields the first four approximate moments of S_n. Denoting the j-fold summation Σ_{i1=1}^d ··· Σ_{ij=1}^d by Σ_{i1...ij} to ease notation, we obtain the approximate moments of S_n as follows:

μ_{n1} = (1/√n) (1/2) Σ_{i1 i2} b_{i1 i2} ν_2 = n^{−1/2} s_12,

μ_{n2} = Σ_{i1 i2} b_{i1} b_{i2} ν_2 + (1/n) ( Σ_{i1 i2 i3} b_{i1} b_{i2 i3} ν_3 + (1/3) Σ_{i1...i4} b_{i1} b_{i2 i3 i4} ν_22 + (1/4) Σ_{i1...i4} b_{i1 i2} b_{i3 i4} ν_22 ) = s_21 + n^{−1} s_22,

μ_{n3} = (1/√n) ( Σ_{i1 i2 i3} b_{i1} b_{i2} b_{i3} ν_3 + (3/2) Σ_{i1...i4} b_{i1} b_{i2} b_{i3 i4} ν_22 ) = n^{−1/2} s_31,

μ_{n4} = Σ_{i1...i4} b_{i1} b_{i2} b_{i3} b_{i4} ν_22 + (1/n) ( Σ_{i1...i4} b_{i1} b_{i2} b_{i3} b_{i4} (ν_4 − ν_22) + 2 Σ_{i1...i5} b_{i1} b_{i2} b_{i3} b_{i4 i5} ν_23 + (2/3) Σ_{i1...i6} b_{i1} b_{i2} b_{i3} b_{i4 i5 i6} ν_222 + (3/2) Σ_{i1...i6} b_{i1} b_{i2} b_{i3 i4} b_{i5 i6} ν_222 ) = s_41 + n^{−1} s_42,

where the definitions of s_12, ..., s_42 are self-evident. As usually only a few b_{i1...ij} terms are non-zero, the actual formulae for the approximate moments of S_n are often simpler than they appear above. To construct approximate cumulants, we first write the cumulants as power series in n^{−1/2}, such that

κ_{n1} = E[S_n] = n^{−1/2} s_12 + o(n^{−1}) = n^{−1/2} k_12 + o(n^{−1}),

κ_{n2} = E[S_n²] − E[S_n]² = s_21 + n^{−1}(s_22 − s_12²) + o(n^{−1}) = k_21 + n^{−1} k_22 + o(n^{−1}),

κ_{n3} = E[S_n³] − 3E[S_n]E[S_n²] + 2E[S_n]³ = n^{−1/2}(s_31 − 3 s_12 s_21) + o(n^{−1}) = n^{−1/2} k_31 + o(n^{−1}),

κ_{n4} = E[S_n⁴] − 4E[S_n]E[S_n³] − 3E[S_n²]² + 12E[S_n]²E[S_n²] − 6E[S_n]⁴ = n^{−1}(12 s_12² s_21 − 6 s_22 s_21 − 4 s_31 s_12 + s_42) + o(n^{−1}) = n^{−1} k_41 + o(n^{−1}),

where the definitions of k_12, ..., k_41 are self-evident. The second to last equality holds because s_41 − 3 s_21² = 0. When S_n is a sample mean, its cumulants can be easily calculated by κ_{nj} = κ_j / n^{j/2−1}. For general statistics, however, this simple identity does not hold and the cumulants have to be constructed from the approximate moments above.

Appendix 2: Proof of theorems

Proof of theorem 1.
Denote the exponent of (13) by

K(x) = P_3(x)/(6√n) + P_4(x)/(24n) − (c/n²) exp(|x|),

where P_3(x) and P_4(x) are polynomials of degrees 3 and 4, respectively, whose coefficients are constants that do not depend on n.

Without loss of generality, assume that K(x) > 0. By Taylor's theorem, there exists a z such that 0 < K(z) < K(x) and

h_{nθn}(x) = φ(x) { 1 + K(x) + K²(x)/2! + (K³(x)/3!) exp(K(z)) }
= φ(x) { 1 + P_3(x)/(6√n) + P_4(x)/(24n) + P_3²(x)/(72n) } + φ(x) Q(x) + φ(x) (K³(x)/3!) exp(K(z)),   (16)

where

Q(x) = P_4²(x)/(2(24n)²) + c² exp(2|x|)/(2n⁴) + P_3(x)P_4(x)/(144 n^{3/2}) − P_3(x) c exp(|x|)/(6 n^{5/2}) − P_4(x) c exp(|x|)/(24 n³) − c exp(|x|)/n².   (17)

Note that the first term of (16) equals the Edgeworth approximation g_n(x). As sup_{x∈R} φ(x)|x|^r < ∞ for all non-negative integers r, we have sup_{x∈R} |φ(x)P_r(x)| < ∞. Similarly, one can show that sup_{x∈R} |P_r(x)| φ(x) exp(|x|) < ∞. It follows that φ(x)Q(x) = O(n^{−3/2}). Note that

K³(x) = { P_3(x)/(6√n) + P_4(x)/(24n) }³ − 3 { P_3(x)/(6√n) + P_4(x)/(24n) }² (c/n²) exp(|x|) + 3 { P_3(x)/(6√n) + P_4(x)/(24n) } { (c/n²) exp(|x|) }² − { (c/n²) exp(|x|) }³.

Using the fact that

sup_{x∈R} φ(x) |P_r(x)| ( exp(|x|) )^j < ∞ for all non-negative integers j and r,

we have φ(x) K³(x)/3! = O(n^{−3/2}). It follows that

h_{nθn} = g_n + O(n^{−3/2}) + O(n^{−3/2}) exp(K(z)).   (18)

It now suffices to show that exp(K(z)) < ∞, such that h_{nθn} = g_n + o(n^{−1}). Note that when z = o(n^{1/6}) we have P_3(z)/√n = o(1) and P_4(z)/n = o(1); thus K(z) is bounded above because its third term is negative. When z = O(n^{1/6}), P_3(z)/√n = O(1) and P_4(z)/n = o(1). But exp(|z|)/n² is increasing at a faster rate than any polynomial of z. Thus the third term of K(z) dominates and K(z) is bounded above. Lastly, for z = O(n^{1/6 + ε}), ε > 0, the third term of K(z) dominates by the same argument. It follows that

sup_{z∈R} exp ( P_3(z)/(6√n) + P_4(z)/(24n) − (c/n²) exp(|z|) ) < ∞.

Thus exp(K(z)) < ∞, and from (18) we have |g_n(x) − h_{nθn}(x)| = o(n^{−1}). By lemma 1 it then follows that

|f_n(x) − h_{nθn}(x)| ≤ |f_n(x) − g_n(x)| + |g_n(x) − h_{nθn}(x)| = o(n^{−1}).

As this result is valid for all x, it holds uniformly in x.
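The boundedness of exp(K(z)) can also be seen numerically. The sketch below is purely illustrative: it uses Hermite polynomials as stand-ins for the paper's P_3 and P_4, and c = 1, n = 20 (all assumed values), and checks that the dampening term −c exp(|z|)/n² eventually dominates any polynomial growth:

```python
import math

def K(x, n, c=1.0):
    # Hypothetical degree-3 and degree-4 polynomials standing in for P3, P4
    # (Hermite polynomials; the paper's actual coefficients are not reproduced here).
    p3 = x**3 - 3 * x
    p4 = x**4 - 6 * x**2 + 3
    # Exponent of the dampened ME density: P3/(6 sqrt(n)) + P4/(24 n) - c exp(|x|)/n^2.
    return p3 / (6 * math.sqrt(n)) + p4 / (24 * n) - c * math.exp(abs(x)) / n**2

n = 20
grid = [i / 100 for i in range(-3000, 3001)]  # z in [-30, 30]
sup_exp_K = max(math.exp(K(z, n)) for z in grid)
print(math.isfinite(sup_exp_K), K(30.0, n) < 0)
```

For this choice of coefficients the supremum of exp(K(z)) over the grid is finite, and K(z) is hugely negative in the tails, which is exactly the mechanism the proof exploits.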

Proof of theorem 2. Using an argument similar to the proof of theorem 1, we have, for an arbitrary non-negative integer r,

|μ_r(g_n) − μ_r(h_{nθn})| ≤ ∫ |x|^r |g_n(x) − h_{nθn}(x)| dx = ∫ |x|^r φ(x) |Q(x) + (K³(x)/3!) exp(K(z))| dx = o(n^{−1}),   (19)

where Q(x) is given by (17). The last equality uses the fact that ∫ |x|^r φ(x) dx < ∞ for all non-negative integers r and the boundedness of exp(K(z)) established in the proof of theorem 1. Because μ_r(g_n) = μ_r(f̂_{nθn}) for r = 1, ..., 4, it follows that μ_r(f̂_{nθn}) − μ_r(h_{nθn}) = o(n^{−1}). We then have ‖μ(f̂_{nθn}) − μ(h_{nθn})‖ = o(n^{−1}), where ‖v‖ = (Σ_{j=1}^d v_j²)^{1/2} is the Euclidean norm of a d-dimensional vector v.

Let f(x) = φ(x) exp ( Σ_{j=0}^4 γ_j x^j − (c/n²) exp(|x|) ) for arbitrary {γ_j}_{j=0}^4. Suppose that the moments μ(f) = {μ_j(f)}_{j=1}^4 exist. There is a unique correspondence between μ(f) and {γ_j}_{j=0}^4. Denote the coefficients {γ_j}_{j=0}^4 of f by γ(μ(f)). According to lemma 5 of Barron & Sheu (1991), ‖γ(μ(f)) − γ(μ(g))‖ and ‖μ(f) − μ(g)‖ are of the same asymptotic order for μ(f) and μ(g) sufficiently close. Denote the coefficients of the ME and EE approximations by γ̂_{nθn} and γ̃_n, respectively. (Recall that the coefficients of the EE approximation are invariant to the dampening function θ_n; thus no subscript θ_n is required for its coefficients.) As ‖μ(f̂_{nθn}) − μ(h_{nθn})‖ = o(n^{−1}), we have ‖γ̂_{nθn} − γ̃_n‖ = o(n^{−1}).

Define K̂(x) = Σ_{j=0}^4 γ̂_{njθn} x^j − (c/n²) exp(|x|) and K̃(x) similarly. Without loss of generality, assume that K̂(x) > 0. There exists a z such that 0 < K̂(z) < K̂(x) and

f̂_{nθn}(x) − h_{nθn}(x) = φ(x) { Σ_{j=0}^4 (γ̂_{njθn} − γ̃_{nj}) x^j } exp(K̂(z))
≤ exp(K̂(z)) φ(x) Σ_{j=0}^4 |x|^j |γ̂_{njθn} − γ̃_{nj}|.   (20)

From the proof of theorem 1 we have exp(K̂(z)) < ∞ and sup_{x∈R} φ(x)|x|^j < ∞ for all non-negative integers j. It follows that f̂_{nθn}(x) − h_{nθn}(x) = o(n^{−1}). Lastly, we have

|f_n(x) − f̂_{nθn}(x)| ≤ |f_n(x) − h_{nθn}(x)| + |h_{nθn}(x) − f̂_{nθn}(x)| = o(n^{−1}).

As the result is valid for all x, it holds uniformly in x.
Proof of theorem 3. Note that

|F_n(x) − H_{nθn}(x)| = | ∫_{−∞}^x { f_n(t) − h_{nθn}(t) } dt | ≤ ∫ |f_n(t) − h_{nθn}(t)| dt ≤ ∫ |f_n(t) − g_n(t)| dt + ∫ |g_n(t) − h_{nθn}(t)| dt.

According to Hall (1992), the first integral is of order o(n^{−1}). The second integral is of order o(n^{−1}) according to (19) in the previous proof. It follows that F_n(x) − H_{nθn}(x) = o(n^{−1}) uniformly in x. It follows that

|F_n(x) − F̂_{nθn}(x)| ≤ |F_n(x) − H_{nθn}(x)| + |H_{nθn}(x) − F̂_{nθn}(x)| ≤ o(n^{−1}) + ∫ |h_{nθn}(t) − f̂_{nθn}(t)| dt.

Integrating the right-hand side of inequality (20) yields ∫ |h_{nθn}(t) − f̂_{nθn}(t)| dt = o(n^{−1}). Note that the presence of exp(K̂(z)) in (20) ensures that the o(n^{−1}) order is preserved in the integration. Therefore F_n(x) − F̂_{nθn}(x) = o(n^{−1}) uniformly in x.
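The Monte Carlo design behind Table 3 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the seed, the replication count, the upper 5% normal cut-off and the divisor-n variance estimate are all assumptions made here for concreteness.

```python
import math
import random

random.seed(0)
n, reps = 20, 20000
cutoff = 1.6449  # upper 5% point of the standard normal (assumed nominal level)
exceed = 0
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(y) / n
    # Studentized mean with the divisor-n variance estimate (an assumption).
    s = math.sqrt(sum((v - m) ** 2 for v in y) / n)
    t_stat = math.sqrt(n) * m / s
    exceed += t_stat > cutoff
tail = exceed / reps
print(tail)
```

At n = 20 the estimated tail probability comes out noticeably above the nominal 0.05, which is precisely the heavier-tailed behaviour that the higher-order approximations in the paper are designed to capture.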


Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

A Weighted Generalized Maximum Entropy Estimator with a Data-driven Weight

A Weighted Generalized Maximum Entropy Estimator with a Data-driven Weight Entropy 2009, 11, 1-x; doi:10.3390/e Article OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy A Weighted Generalized Maximum Entropy Estimator with a Data-driven Weight Ximing Wu Department

More information

Monotonicity and Aging Properties of Random Sums

Monotonicity and Aging Properties of Random Sums Monotonicity and Aging Properties of Random Sums Jun Cai and Gordon E. Willmot Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario Canada N2L 3G1 E-mail: jcai@uwaterloo.ca,

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Change detection problems in branching processes

Change detection problems in branching processes Change detection problems in branching processes Outline of Ph.D. thesis by Tamás T. Szabó Thesis advisor: Professor Gyula Pap Doctoral School of Mathematics and Computer Science Bolyai Institute, University

More information

Sums of exponentials of random walks

Sums of exponentials of random walks Sums of exponentials of random walks Robert de Jong Ohio State University August 27, 2009 Abstract This paper shows that the sum of the exponential of an oscillating random walk converges in distribution,

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Integrating Correlated Bayesian Networks Using Maximum Entropy

Integrating Correlated Bayesian Networks Using Maximum Entropy Applied Mathematical Sciences, Vol. 5, 2011, no. 48, 2361-2371 Integrating Correlated Bayesian Networks Using Maximum Entropy Kenneth D. Jarman Pacific Northwest National Laboratory PO Box 999, MSIN K7-90

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

HIGHER ORDER CUMULANTS OF RANDOM VECTORS, DIFFERENTIAL OPERATORS, AND APPLICATIONS TO STATISTICAL INFERENCE AND TIME SERIES

HIGHER ORDER CUMULANTS OF RANDOM VECTORS, DIFFERENTIAL OPERATORS, AND APPLICATIONS TO STATISTICAL INFERENCE AND TIME SERIES HIGHER ORDER CUMULANTS OF RANDOM VECTORS DIFFERENTIAL OPERATORS AND APPLICATIONS TO STATISTICAL INFERENCE AND TIME SERIES S RAO JAMMALAMADAKA T SUBBA RAO AND GYÖRGY TERDIK Abstract This paper provides

More information

Packing-Dimension Profiles and Fractional Brownian Motion

Packing-Dimension Profiles and Fractional Brownian Motion Under consideration for publication in Math. Proc. Camb. Phil. Soc. 1 Packing-Dimension Profiles and Fractional Brownian Motion By DAVAR KHOSHNEVISAN Department of Mathematics, 155 S. 1400 E., JWB 233,

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

THE information capacity is one of the most important

THE information capacity is one of the most important 256 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 1, JANUARY 1998 Capacity of Two-Layer Feedforward Neural Networks with Binary Weights Chuanyi Ji, Member, IEEE, Demetri Psaltis, Senior Member,

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Zeros of lacunary random polynomials

Zeros of lacunary random polynomials Zeros of lacunary random polynomials Igor E. Pritsker Dedicated to Norm Levenberg on his 60th birthday Abstract We study the asymptotic distribution of zeros for the lacunary random polynomials. It is

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate

Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate www.scichina.com info.scichina.com www.springerlin.com Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate WEI Chen & CHEN ZongJi School of Automation

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Paul Marriott 1, Radka Sabolova 2, Germain Van Bever 2, and Frank Critchley 2 1 University of Waterloo, Waterloo, Ontario,

More information

Testing Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy

Testing Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy This article was downloaded by: [Ferdowsi University] On: 16 April 212, At: 4:53 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

THEOREM OF OSELEDETS. We recall some basic facts and terminology relative to linear cocycles and the multiplicative ergodic theorem of Oseledets [1].

THEOREM OF OSELEDETS. We recall some basic facts and terminology relative to linear cocycles and the multiplicative ergodic theorem of Oseledets [1]. THEOREM OF OSELEDETS We recall some basic facts and terminology relative to linear cocycles and the multiplicative ergodic theorem of Oseledets []. 0.. Cocycles over maps. Let µ be a probability measure

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

On large deviations of sums of independent random variables

On large deviations of sums of independent random variables On large deviations of sums of independent random variables Zhishui Hu 12, Valentin V. Petrov 23 and John Robinson 2 1 Department of Statistics and Finance, University of Science and Technology of China,

More information