Submitted to the Brazilian Journal of Probability and Statistics
Multivariate normal approximation of the maximum likelihood estimator via the delta method

Andreas Anastasiou^a and Robert E. Gaunt^b

^a The London School of Economics and Political Science
^b The University of Manchester

Abstract. We use the delta method and Stein's method to derive, under regularity conditions, explicit upper bounds for the distributional distance between the distribution of the maximum likelihood estimator (MLE) of a $d$-dimensional parameter and its asymptotic multivariate normal distribution. Our bounds apply in situations in which the MLE can be written as a function of a sum of i.i.d. $t$-dimensional random vectors. We apply our general bound to establish a bound for the multivariate normal approximation of the MLE of the normal distribution with unknown mean and variance.

1 Introduction

The asymptotic normality of maximum likelihood estimators (MLEs), under regularity conditions, is one of the most well-known and fundamental results in mathematical statistics. In a very recent paper, Anastasiou (2016) obtained explicit upper bounds on the distributional distance between the distribution of the MLE of a $d$-dimensional parameter and its asymptotic multivariate normal distribution. In the present paper, we use a different approach to derive such bounds in the special situation where the MLE can be written as a function of a sum of i.i.d. $t$-dimensional random vectors. In this setting, our bounds improve on (or are at least as good as) those of Anastasiou (2016) in terms of sharpness, simplicity and strength of distributional distance. The process used to derive our bounds consists of a combination of Stein's method and the multivariate delta method. It is a natural generalisation of the approach used by Anastasiou and Ley (2017), which, in the single-parameter case and when the MLE can be expressed as a function of a sum of i.i.d. random variables, yields improved bounds over those obtained by Anastasiou and Reinert (2017) for the asymptotic normality of the MLE.

MSC 2010 subject classifications: Primary 60F05; secondary 62E17, 62F12.
Keywords and phrases. Multi-parameter maximum likelihood estimation, multivariate normal distribution, Stein's method.

Let us now introduce the notation and setting of this paper. Let $\theta = (\theta_1, \ldots, \theta_d)^\intercal$ denote the unknown parameter of a statistical model. Let $\theta_0 = (\theta_{01}, \ldots, \theta_{0d})^\intercal$ be the true (still unknown) value and let $\Theta \subset \mathbb{R}^d$ denote the parameter space. Suppose $X = (X_1, \ldots, X_n)$ is a random sample of $n$ i.i.d. $t$-dimensional random vectors with joint probability density (or mass) function $f(x|\theta)$, where $x = (x_1, x_2, \ldots, x_n)$. The likelihood function is $L(\theta; x) = f(x|\theta)$, and its natural logarithm, called the log-likelihood function, is denoted by $\ell(\theta; x)$. We shall assume that the MLE $\hat{\theta}_n(X)$ of the parameter of interest $\theta$ exists and is unique. It should, however, be noted that existence and uniqueness of the MLE cannot be taken for granted; see, for example, Billingsley (1961) for an example of non-uniqueness. Assumptions that ensure existence and uniqueness of the MLE are given in Mäkeläinen et al. (1981). In addition to assuming the existence and uniqueness of the MLE, we shall assume that the MLE takes the following form. Let $q : \Theta \to \mathbb{R}^d$ be such that all the entries of $q$ have continuous partial derivatives with respect to $\theta$ and that

$$q\big(\hat{\theta}_n(x)\big) = \frac{1}{n}\sum_{i=1}^n g(x_i) = \frac{1}{n}\sum_{i=1}^n \big(g_1(x_i), \ldots, g_d(x_i)\big)^\intercal \tag{1.1}$$

for some $g : \mathbb{R}^t \to \mathbb{R}^d$. This setting is a natural generalisation of that of Anastasiou and Ley (2017) to multi-parameter MLEs. The representation (1.1) for the MLE allows us to approximate the MLE using a different approach to that given in Anastasiou (2016), which results in a bound that improves on the one obtained there. As we shall see in Section 3, a simple and important example of an MLE of the form (1.1) is given by the normal distribution with unknown mean and variance:

$$f(x|\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\}, \quad x \in \mathbb{R}.$$

In this paper, we derive general bounds for the distributional distance between an MLE of the form (1.1) and its limiting multivariate normal distribution. The rest of the paper is organised as follows. In Section 2, we derive a general upper bound on the distributional distance between the distribution of the MLE of a $d$-dimensional parameter and the limiting multivariate normal distribution for situations in which (1.1) is satisfied. In Section 3, we apply our general bound to obtain a bound for the multivariate normal approximation of the MLE of the normal distribution with unknown mean and variance.

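To make the special structure (1.1) concrete, the following minimal sketch (our own illustration in Python; not part of the original development) checks the representation numerically for the normal example above, using the functions $q$ and $g$ that will be derived in Section 3.

```python
import numpy as np

# Check of representation (1.1) for the N(mu, sigma^2) model:
# with q(t1, t2) = (t1, t2 + (t1 - mu)^2) and g(x) = (x, (x - mu)^2),
# q evaluated at the MLE equals the sample mean of the g(x_i) (see Section 3).
rng = np.random.default_rng(42)
mu = 1.5
x = rng.normal(mu, 2.0, size=1000)

mle = np.array([x.mean(), x.var()])                   # (Xbar, (1/n) sum (x_i - Xbar)^2)
q_of_mle = np.array([mle[0], mle[1] + (mle[0] - mu) ** 2])
g_bar = np.array([x.mean(), np.mean((x - mu) ** 2)])  # (1/n) sum g(x_i)
print(np.allclose(q_of_mle, g_bar))                   # True
```

The check succeeds exactly, since $\frac{1}{n}\sum_{i}(x_i - \bar{x})^2 + (\bar{x} - \mu)^2 = \frac{1}{n}\sum_{i}(x_i - \mu)^2$ is an algebraic identity.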
2 The general bound

In this section, we give quantitative results to complement the qualitative result for the multivariate normal approximation of the MLE expressed below in Theorem 2.1. In the statement of this theorem, and in the rest of the paper, $I(\theta_0)$ denotes the expected Fisher information matrix for one random vector, while $[I(\theta_0)]^{1/2}$ denotes the principal square root of $I(\theta_0)$. Also, $I_{d\times d}$ denotes the $d \times d$ identity matrix. To avoid confusion with the expected Fisher information matrix, the subscript will always be included in the notation for the $d \times d$ identity matrix. Following Davison (2008), the following assumptions are made in order for the asymptotic normality of the vector MLE to hold:

(R.C.1) The densities defined by any two different values of $\theta$ are distinct.

(R.C.2) $\ell(\theta; x)$ is three times differentiable with respect to the unknown vector parameter $\theta$, and the third partial derivatives are continuous in $\theta$.

(R.C.3) For any $\theta_0 \in \Theta$ and for $\mathcal{X}$ denoting the support of the data, there exist $\epsilon_0 > 0$ and functions $M_{rst}(x)$ (they can depend on $\theta_0$), such that for $\theta = (\theta_1, \theta_2, \ldots, \theta_d)^\intercal$ and $r, s, t, j = 1, 2, \ldots, d$,

$$\Big|\frac{\partial^3}{\partial\theta_r\partial\theta_s\partial\theta_t}\ell(\theta; x)\Big| \le M_{rst}(x), \quad x \in \mathcal{X}, \ |\theta_j - \theta_{0j}| < \epsilon_0,$$

with $\mathrm{E}[M_{rst}(X)] < \infty$.

(R.C.4) For all $\theta \in \Theta$, $\mathrm{E}_\theta\big[\nabla\ell(\theta; X_i)\big] = 0$, where $\nabla$ denotes the gradient with respect to $\theta$.

(R.C.5) The expected Fisher information matrix for one random vector, $I(\theta)$, is finite, symmetric and positive definite. For $r, s = 1, 2, \ldots, d$, its elements satisfy

$$n[I(\theta)]_{rs} = \mathrm{E}\Big[\frac{\partial}{\partial\theta_r}\ell(\theta; X)\,\frac{\partial}{\partial\theta_s}\ell(\theta; X)\Big] = -\mathrm{E}\Big[\frac{\partial^2}{\partial\theta_r\partial\theta_s}\ell(\theta; X)\Big].$$

This implies that $nI(\theta)$ is the covariance matrix of $\nabla\ell(\theta; X)$.

These regularity conditions in the multi-parameter case resemble those in Anastasiou and Reinert (2017), where the parameter is scalar.

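As a small Monte Carlo illustration of (R.C.4) and (R.C.5), the following sketch (ours, using the $N(\mu, \sigma^2)$ model of Section 3 with arbitrarily chosen parameter values) checks that the score vector has mean zero and covariance matrix $nI(\theta)$.

```python
import numpy as np

# Monte Carlo illustration of (R.C.4) and (R.C.5) for the N(mu, sigma^2)
# model (an assumed example): the score has mean zero, and its covariance
# is n I(theta), with I(theta) = diag(1/sigma^2, 1/(2 sigma^4)).
rng = np.random.default_rng(3)
mu, sigma2, n, reps = 0.5, 2.0, 50, 10**5

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
score_mu = np.sum(X - mu, axis=1) / sigma2
score_s2 = np.sum((X - mu)**2 - sigma2, axis=1) / (2 * sigma2**2)
S = np.stack([score_mu, score_s2], axis=1)
print(S.mean(axis=0))                                  # approx (0, 0)
print(np.cov(S.T))                                     # approx n * I(theta)
print(n * np.array([[1 / sigma2, 0], [0, 1 / (2 * sigma2**2)]]))
```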
The following theorem gives the qualitative result for the asymptotic normality of a vector MLE (see Davison (2008) for a proof).

Theorem 2.1. Let $X_1, X_2, \ldots, X_n$ be i.i.d. random vectors with probability density (mass) functions $f(x_i|\theta)$, $1 \le i \le n$, where $\theta \in \Theta \subset \mathbb{R}^d$. Also let $Z \sim N_d(0, I_{d\times d})$, where $0$ is the $d \times 1$ zero vector and $I_{d\times d}$ is the $d \times d$ identity matrix. Then, under (R.C.1)-(R.C.5),

$$\sqrt{n}\,[I(\theta_0)]^{1/2}\big(\hat{\theta}_n(X) - \theta_0\big) \xrightarrow[n\to\infty]{d} N_d(0, I_{d\times d}).$$

Let us now introduce some notation. We let $C_b^n(\mathbb{R}^d)$ denote the space of bounded functions on $\mathbb{R}^d$ with bounded $k$-th order derivatives for $k \le n$. We abbreviate $|h|_1 := \sup_i \big\|\frac{\partial h}{\partial x_i}\big\|$ and $|h|_2 := \sup_{i,j} \big\|\frac{\partial^2 h}{\partial x_i \partial x_j}\big\|$, where $\|\cdot\|$ is the supremum norm, and we write $\|h\|$ for the supremum norm of $h$ itself. For ease of presentation, we let the subscript $(m)$ denote an index for which $|\hat{\theta}_n(x)_{(m)} - \theta_{0(m)}|$ is the largest among the $d$ components:

$$(m) \in \{1, 2, \ldots, d\} : \ |\hat{\theta}_n(x)_{(m)} - \theta_{0(m)}| \ge |\hat{\theta}_n(x)_j - \theta_{0j}|, \quad \forall j \in \{1, 2, \ldots, d\},$$

and also

$$Q_{(m)} = Q_{(m)}(X, \theta_0) := \big|\hat{\theta}_n(X)_{(m)} - \theta_{0(m)}\big|. \tag{2.1}$$

The following theorem gives the main result of this paper.

Theorem 2.2. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\mathbb{R}^t$-valued random vectors with probability density (or mass) function $f(x_i|\theta)$, for which the parameter space $\Theta$ is an open subset of $\mathbb{R}^d$. Let $Z \sim N_d(0, I_{d\times d})$. Assume that the MLE $\hat{\theta}_n(X)$ exists and is unique and that (R.C.1)-(R.C.5) are satisfied. Let $q : \Theta \to \mathbb{R}^d$ be twice differentiable and $g : \mathbb{R}^t \to \mathbb{R}^d$ be such that the special structure (1.1) is satisfied. Let

$$\xi_{ij} = \sum_{l=1}^d \big[K^{-1/2}\big]_{jl}\big(g_l(X_i) - q_l(\theta_0)\big), \quad \text{where} \quad K = \nabla q(\theta_0)\,[I(\theta_0)]^{-1}\,[\nabla q(\theta_0)]^\intercal \tag{2.2}$$

and $\nabla q$ denotes the Jacobian matrix of $q$. Suppose further that, for $\epsilon > 0$, there exist functions $M_{klm}(X)$ such that $\big|\frac{\partial^2}{\partial\theta_m\partial\theta_l}q_k(\theta)\big| \le M_{klm}(X)$ for all $\theta \in \Theta$ with $|\theta_j - \theta_{0j}| < \epsilon$, $j = 1, 2, \ldots, d$.

Also, let $Q_{(m)}$ be as in (2.1) and let $\epsilon > 0$ be as above. Then, for all $h \in C_b^2(\mathbb{R}^d)$,

$$\Big|\mathrm{E}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - \mathrm{E}[h(Z)]\Big| \le \frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}} \sum_{i=1}^n\sum_{j,k,l=1}^d \Big\{\mathrm{E}\big|\xi_{ij}\xi_{ik}\xi_{il}\big| + \big|\mathrm{E}[\xi_{ij}\xi_{ik}]\big|\,\mathrm{E}|\xi_{il}|\Big\} + \frac{2\|h\|}{\epsilon^2}\sum_{j=1}^d \mathrm{E}\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 + |h|_1 \sum_{j=1}^d \Big\{\mathrm{E}\big[\big|A_j^{(1)}(\theta_0, X)\big| \,\big|\, Q_{(m)} < \epsilon\big] + \mathrm{E}\big[\big|A_j^{(2)}(\theta_0, X)\big| \,\big|\, Q_{(m)} < \epsilon\big]\Big\}, \tag{2.3}$$

where

$$A_j^{(1)}(\theta_0, X) = \sqrt{n}\sum_{k=1}^d \Big([I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2}\Big)_{jk}\Big(q_k\big(\hat{\theta}_n(X)\big) - q_k(\theta_0)\Big),$$

$$A_j^{(2)}(\theta_0, X) = \frac{\sqrt{n}}{2}\sum_{k=1}^d \Big([I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1}\Big)_{jk}\sum_{m=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(X)_m - \theta_{0m}\big)\big(\hat{\theta}_n(X)_l - \theta_{0l}\big)M_{klm}(X).$$

Remark 2.3. 1. For fixed $d$, it is clear that the first term is $O(n^{-1/2})$. For the second term, using Cox and Snell (1968) we know that the mean squared error (MSE) of the MLE is of order $n^{-1}$. This yields, for fixed $d$, $\mathrm{E}\big[\sum_{j=1}^d (\hat{\theta}_n(X)_j - \theta_{0j})^2\big] = O(n^{-1})$. To establish that (2.3) is of order $n^{-1/2}$, it therefore suffices to show that, for the third term, the quantities $\mathrm{E}\big[|A_j^{(1)}(\theta_0, X)| \mid Q_{(m)} < \epsilon\big]$ and $\mathrm{E}\big[|A_j^{(2)}(\theta_0, X)| \mid Q_{(m)} < \epsilon\big]$ are $O(n^{-1/2})$. For $A_j^{(2)}(\theta_0, X)$, we use the Cauchy-Schwarz inequality to get that

$$\mathrm{E}\Big[\big|\big(\hat{\theta}_n(X)_m - \theta_{0m}\big)\big(\hat{\theta}_n(X)_l - \theta_{0l}\big)\big|\,M_{klm}(X) \,\Big|\, Q_{(m)} < \epsilon\Big] \le \Big[\mathrm{E}\big(\hat{\theta}_n(X)_m - \theta_{0m}\big)^2\big(\hat{\theta}_n(X)_l - \theta_{0l}\big)^2\Big]^{1/2}\Big[\mathrm{E}\big[(M_{klm}(X))^2 \,\big|\, Q_{(m)} < \epsilon\big]\Big]^{1/2}. \tag{2.4}$$

Since from Cox and Snell (1968) the MSE is of order $n^{-1}$, we apply (2.4) to the expression for $A_j^{(2)}(\theta_0, X)$ to get that $\mathrm{E}\big[|A_j^{(2)}(\theta_0, X)| \mid Q_{(m)} < \epsilon\big] = O(n^{-1/2})$.

Whilst we suspect that the quantities $\mathrm{E}\big[|A_j^{(1)}(\theta_0, X)| \mid Q_{(m)} < \epsilon\big]$ are generally $O(n^{-1/2})$, we have been unable to establish this fact. In the univariate case $d = 1$, this term vanishes (see Anastasiou and Ley (2017)), and this is also the case in our application in Section 3 (in which the matrices $I(\theta_0)$ and $\nabla q(\theta_0)$ are diagonal).

2. An upper bound on the quantity $|\mathrm{E}[h(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0))] - \mathrm{E}[h(Z)]|$ was given in Anastasiou (2016). A brief comparison of that bound and our bound follows. The bound in Anastasiou (2016) is more general than our bound: it covers the case of independent but not necessarily identically distributed random vectors. In addition, the bound in Anastasiou (2016) is more general in the sense that it is applicable whatever the form of the MLE is, whereas our bound can be used only when (1.1) is satisfied. Furthermore, Anastasiou (2016) gives bounds even for cases where the MLE cannot be expressed analytically. On the other hand, our bound is preferable when the MLE has a representation of the form (1.1). Our bound in (2.3) applies to a wider class of test functions $h$ (our class is $C_b^2(\mathbb{R}^d)$, as opposed to $C_b^3(\mathbb{R}^d)$) and has a better dependence on the dimension $d$: for fixed $n$, our bound is $O(d^4)$, whereas the bound of Anastasiou (2016) is $O(d^6)$. Moreover, in situations where the MLE is already a sum of independent terms, our bound is easier to use, since applying Theorem 2.2 with $q(x) = x$ in (1.1) yields $R_2 = 0$, where $R_2$ is defined in (2.12) in the proof below. This means that our bound (2.3) simplifies to

$$\Big|\mathrm{E}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - \mathrm{E}[h(Z)]\Big| \le \frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}} \sum_{i=1}^n\sum_{j,k,l=1}^d \Big\{\mathrm{E}\big|\xi_{ij}\xi_{ik}\xi_{il}\big| + \big|\mathrm{E}[\xi_{ij}\xi_{ik}]\big|\,\mathrm{E}|\xi_{il}|\Big\}.$$

Another simplification of the general bound is given in Section 3, where $X_1, X_2, \ldots, X_n$ are taken to be i.i.d. normal random variables with unknown mean and variance, a case which can also be treated by Anastasiou (2016). The simpler terms in our bound lead to a much improved bound for this example (see Remark 3.2 for more details).

3. If, for all $k, m, l \in \{1, 2, \ldots, d\}$, $\frac{\partial^2}{\partial\theta_m\partial\theta_l}q_k(\theta)$ is uniformly bounded in $\Theta$, then we do not need to use $\epsilon$ or conditional expectations in deriving an upper bound (we essentially apply Theorem 2.2 with $\epsilon \to \infty$). This leads to the following simpler bound:

$$\Big|\mathrm{E}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - \mathrm{E}[h(Z)]\Big| \le \frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}} \sum_{i=1}^n\sum_{j,k,l=1}^d \Big\{\mathrm{E}\big|\xi_{ij}\xi_{ik}\xi_{il}\big| + \big|\mathrm{E}[\xi_{ij}\xi_{ik}]\big|\,\mathrm{E}|\xi_{il}|\Big\} + |h|_1 \sum_{j=1}^d \Big\{\mathrm{E}\big|A_j^{(1)}(\theta_0, X)\big| + \mathrm{E}\big|A_j^{(2)}(\theta_0, X)\big|\Big\}. \tag{2.5}$$

To prove Theorem 2.2, we employ Stein's method and the multivariate delta method. Our strategy consists of benefiting from the special form of $q(\hat{\theta}_n(X))$, which is a sum of random vectors. It is here that the multivariate delta method comes into play: instead of comparing $\hat{\theta}_n(X)$ to $Z \sim N_d(0, I_{d\times d})$, we compare $q(\hat{\theta}_n(X))$ to $Z$, and then find upper bounds on the distributional distance between the distributions of $\hat{\theta}_n(X)$ and $q(\hat{\theta}_n(X))$. As well as recalling the multivariate delta method, we state and prove two lemmas that will be used in the proof of Theorem 2.2.

Theorem 2.4 (Multivariate delta method). Let the parameter $\theta \in \Theta$, where $\Theta \subset \mathbb{R}^d$. Furthermore, let $Y_n$ be a sequence of random vectors in $\mathbb{R}^d$ which, for $Z \sim N_d(0, I_{d\times d})$, satisfies

$$\sqrt{n}\,(Y_n - \theta) \xrightarrow[n\to\infty]{d} \Sigma^{1/2} Z,$$

where $\Sigma$ is a symmetric, positive definite covariance matrix. Then, for any function $b : \mathbb{R}^d \to \mathbb{R}^d$ whose Jacobian matrix $\nabla b(\theta) \in \mathbb{R}^{d\times d}$ exists and is invertible, we have that, for $\Lambda := \nabla b(\theta)\,\Sigma\,[\nabla b(\theta)]^\intercal$,

$$\sqrt{n}\,\big(b(Y_n) - b(\theta)\big) \xrightarrow[n\to\infty]{d} \Lambda^{1/2} Z.$$

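As a quick empirical illustration of Theorem 2.4 (our own sketch, with assumed parameter values), consider the map $b(t_1, t_2) = (t_1, t_2 + (t_1 - \mu)^2)$ that will be used in Section 3; its Jacobian at $\theta_0 = (\mu, \sigma^2)^\intercal$ is the identity, so $\Lambda = \Sigma$ there.

```python
import numpy as np

# Empirical illustration of the multivariate delta method (Theorem 2.4).
# Y_n = (Xbar, (1/n) sum (X_i - Xbar)^2) satisfies
# sqrt(n)(Y_n - theta_0) -> N_2(0, Sigma), Sigma = diag(sigma^2, 2 sigma^4),
# so for b(t1, t2) = (t1, t2 + (t1 - mu)^2), whose Jacobian at theta_0 is the
# identity, sqrt(n)(b(Y_n) - b(theta_0)) has the same limit Lambda = Sigma.
rng = np.random.default_rng(0)
mu, sigma2, n, reps = 0.0, 1.0, 2000, 5000

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
Yn = np.stack([X.mean(axis=1), X.var(axis=1)], axis=1)
b_Yn = np.stack([Yn[:, 0], Yn[:, 1] + (Yn[:, 0] - mu) ** 2], axis=1)
b_theta = np.array([mu, sigma2])      # b(theta_0) = theta_0 for this map
S = np.sqrt(n) * (b_Yn - b_theta)
print(np.cov(S.T))                    # close to Lambda = diag(sigma^2, 2 sigma^4)
```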
Lemma 2.5. Let

$$W = \frac{1}{\sqrt{n}}\sum_{i=1}^n K^{-1/2}\big[g(X_i) - q(\theta_0)\big], \tag{2.6}$$

where $K$ is as in (2.2). Then $\mathrm{E}W = 0$ and $\mathrm{Cov}(W) = I_{d\times d}$.

Proof. By the multivariate central limit theorem,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \big(g(X_i) - \mathrm{E}g(X_1)\big) \xrightarrow[n\to\infty]{d} N_d\big(0, \mathrm{Cov}(g(X_1))\big), \tag{2.7}$$

and, by Theorem 2.4,

$$\sqrt{n}\,\big(q(\hat{\theta}_n(X)) - q(\theta_0)\big) \xrightarrow[n\to\infty]{d} N_d(0, K). \tag{2.8}$$

Since $q(\hat{\theta}_n(X)) = \frac{1}{n}\sum_{i=1}^n g(X_i)$ and $X_1, X_2, \ldots, X_n$ are i.i.d. random vectors, it follows that $q(\theta_0) = \mathrm{E}g(X_1)$. Furthermore, $K = \mathrm{Cov}(g(X_1))$, because the limiting multivariate normal distributions in (2.7) and (2.8) must have the same mean and covariance. It is now immediate that $\mathrm{E}W = 0$, and a simple calculation using $K = \mathrm{Cov}(g(X_1))$ yields that $\mathrm{Cov}(W) = I_{d\times d}$. $\square$

Lemma 2.6. Suppose that $X_{1,j}, \ldots, X_{n,j}$ are independent for each fixed $j$, but that the random variables $X_{i,1}, \ldots, X_{i,d}$ may be dependent for any fixed $i$. For $j = 1, \ldots, d$, let $W_j = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_{ij}$ and denote $W = (W_1, \ldots, W_d)^\intercal$. Suppose that $\mathrm{E}[W] = 0$ and $\mathrm{Cov}(W) = I_{d\times d}$. Then, for all $h \in C_b^2(\mathbb{R}^d)$,

$$\big|\mathrm{E}[h(W)] - \mathrm{E}[h(Z)]\big| \le \frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}}\sum_{i=1}^n\sum_{j,k,l=1}^d \Big\{\mathrm{E}\big|X_{ij}X_{ik}X_{il}\big| + \big|\mathrm{E}[X_{ij}X_{ik}]\big|\,\mathrm{E}|X_{il}|\Big\}. \tag{2.9}$$

Proof. The following bound follows immediately from Lemma 2.1 of Gaunt and Reinert (2016):

$$\big|\mathrm{E}[h(W)] - \mathrm{E}[h(Z)]\big| \le \frac{1}{n^{3/2}}\sum_{i=1}^n\sum_{j,k,l=1}^d \Big\|\frac{\partial^3 f}{\partial x_j \partial x_k \partial x_l}\Big\| \Big\{\mathrm{E}\big|X_{ij}X_{ik}X_{il}\big| + \big|\mathrm{E}[X_{ij}X_{ik}]\big|\,\mathrm{E}|X_{il}|\Big\}, \tag{2.10}$$

where $f$ solves the so-called Stein equation

$$\Delta f(w) - w \cdot \nabla f(w) = h(w) - \mathrm{E}[h(Z)].$$

From Proposition 2.1 of Gaunt (2016), we have the bound

$$\Big\|\frac{\partial^3 f}{\partial x_j \partial x_k \partial x_l}\Big\| \le \frac{\sqrt{2\pi}}{4}\,|h|_2,$$

which when applied to (2.10) yields (2.9). $\square$

Proof of Theorem 2.2. The triangle inequality yields

$$\Big|\mathrm{E}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - \mathrm{E}[h(Z)]\Big| \le R_1 + R_2, \tag{2.11}$$

where

$$R_1 := \Big|\mathrm{E}\big[h\big(\sqrt{n}\,K^{-1/2}\big(q(\hat{\theta}_n(X)) - q(\theta_0)\big)\big)\big] - \mathrm{E}[h(Z)]\Big|,$$

$$R_2 := \mathrm{E}\Big|h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big) - h\big(\sqrt{n}\,K^{-1/2}\big(q(\hat{\theta}_n(X)) - q(\theta_0)\big)\big)\Big|. \tag{2.12}$$

The rest of the proof focuses on bounding the terms $R_1$ and $R_2$.

Step 1: Upper bound for $R_1$. For ease of presentation, we write

$$W = \sqrt{n}\,K^{-1/2}\big(q(\hat{\theta}_n(X)) - q(\theta_0)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^n K^{-1/2}\big[g(X_i) - q(\theta_0)\big],$$

as in (2.6). Thus, $W = (W_1, W_2, \ldots, W_d)^\intercal$ with

$$W_j = \frac{1}{\sqrt{n}}\sum_{i=1}^n\sum_{l=1}^d \big[K^{-1/2}\big]_{jl}\big(g_l(X_i) - q_l(\theta_0)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \xi_{ij}, \quad j \in \{1, \ldots, d\},$$

where, for each fixed $j$, the $\xi_{ij} = \sum_{l=1}^d [K^{-1/2}]_{jl}(g_l(X_i) - q_l(\theta_0))$ are independent random variables over $i$, since the $X_i$ are assumed to be independent. Therefore, each $W_j$ can be expressed as a sum of independent terms. This, together with Lemma 2.5, ensures that the assumptions of Lemma 2.6 are satisfied. Therefore, (2.9) yields

$$R_1 = \big|\mathrm{E}[h(W)] - \mathrm{E}[h(Z)]\big| \le \frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}}\sum_{i=1}^n\sum_{j,k,l=1}^d \Big\{\mathrm{E}\big|\xi_{ij}\xi_{ik}\xi_{il}\big| + \big|\mathrm{E}[\xi_{ij}\xi_{ik}]\big|\,\mathrm{E}|\xi_{il}|\Big\}.$$

Step 2: Upper bound for $R_2$. For ease of presentation, we let

$$C_1 := C_1(h, q, X, \theta_0) := h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big) - h\big(\sqrt{n}\,K^{-1/2}\big(q(\hat{\theta}_n(X)) - q(\theta_0)\big)\big),$$

and our aim is to find an upper bound for $\mathrm{E}|C_1|$. Let us denote by $[A]_{[j]}$ the $j$-th row of a matrix $A$. We begin by using a first order Taylor expansion to obtain

$$h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(x) - \theta_0)\big) = h\big(\sqrt{n}\,K^{-1/2}\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big)\big) + \sum_{j=1}^d \Big\{\sqrt{n}\,\big[[I(\theta_0)]^{1/2}\big]_{[j]}\big(\hat{\theta}_n(x) - \theta_0\big) - \sqrt{n}\,\big[K^{-1/2}\big]_{[j]}\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big)\Big\}\frac{\partial}{\partial x_j}h(t(x)),$$

where $t(x)$ lies between $\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(x) - \theta_0)$ and $\sqrt{n}\,K^{-1/2}(q(\hat{\theta}_n(x)) - q(\theta_0))$. Rearranging the terms gives

$$|C_1| \le |h|_1\,\sqrt{n}\sum_{j=1}^d \Big|\big[[I(\theta_0)]^{1/2}\big]_{[j]}\big(\hat{\theta}_n(x) - \theta_0\big) - \big[K^{-1/2}\big]_{[j]}\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big)\Big|. \tag{2.13}$$

For $\theta_0^*$ between $\hat{\theta}_n(x)$ and $\theta_0$, a second order Taylor expansion of $q_j(\hat{\theta}_n(x))$ about $\theta_0$ gives that, for all $j \in \{1, 2, \ldots, d\}$,

$$q_j\big(\hat{\theta}_n(x)\big) = q_j(\theta_0) + \sum_{k=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\frac{\partial}{\partial\theta_k}q_j(\theta_0) + \frac{1}{2}\sum_{k=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big)\frac{\partial^2}{\partial\theta_k\partial\theta_l}q_j(\theta_0^*),$$

and after a small rearrangement of the terms we get that

$$[\nabla q(\theta_0)]_{[j]}\big(\hat{\theta}_n(x) - \theta_0\big) = q_j\big(\hat{\theta}_n(x)\big) - q_j(\theta_0) - \frac{1}{2}\sum_{k=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big)\frac{\partial^2}{\partial\theta_k\partial\theta_l}q_j(\theta_0^*).$$

Since this holds for all $j \in \{1, 2, \ldots, d\}$, we have that

$$\nabla q(\theta_0)\big(\hat{\theta}_n(x) - \theta_0\big) = q\big(\hat{\theta}_n(x)\big) - q(\theta_0) - \frac{1}{2}\sum_{k=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big)\frac{\partial^2}{\partial\theta_k\partial\theta_l}q(\theta_0^*),$$

which then leads to

$$\sqrt{n}\,[I(\theta_0)]^{1/2}\big(\hat{\theta}_n(x) - \theta_0\big) = \sqrt{n}\,[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1}\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big) - \frac{\sqrt{n}}{2}\,[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1}\sum_{k=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big)\frac{\partial^2}{\partial\theta_k\partial\theta_l}q(\theta_0^*). \tag{2.14}$$

Subtracting the quantity $\sqrt{n}\,K^{-1/2}(q(\hat{\theta}_n(x)) - q(\theta_0))$ from both sides of (2.14) now gives

$$\sqrt{n}\,[I(\theta_0)]^{1/2}\big(\hat{\theta}_n(x) - \theta_0\big) - \sqrt{n}\,K^{-1/2}\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big) = \sqrt{n}\,\Big([I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2}\Big)\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big) - \frac{\sqrt{n}}{2}\,[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1}\sum_{k=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(x)_k - \theta_{0k}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big)\frac{\partial^2}{\partial\theta_k\partial\theta_l}q(\theta_0^*).$$

Componentwise, we have that

$$\sqrt{n}\,\big[[I(\theta_0)]^{1/2}\big]_{[j]}\big(\hat{\theta}_n(x) - \theta_0\big) - \sqrt{n}\,\big[K^{-1/2}\big]_{[j]}\big(q(\hat{\theta}_n(x)) - q(\theta_0)\big) = \sqrt{n}\sum_{k=1}^d \Big([I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2}\Big)_{jk}\Big(q_k\big(\hat{\theta}_n(x)\big) - q_k(\theta_0)\Big) - \frac{\sqrt{n}}{2}\sum_{k=1}^d \Big([I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1}\Big)_{jk}\sum_{m=1}^d\sum_{l=1}^d \big(\hat{\theta}_n(x)_m - \theta_{0m}\big)\big(\hat{\theta}_n(x)_l - \theta_{0l}\big)\frac{\partial^2}{\partial\theta_m\partial\theta_l}q_k(\theta_0^*). \tag{2.15}$$

We need to take into account that $\frac{\partial^2}{\partial\theta_k\partial\theta_l}q_j(\theta)$ is not necessarily uniformly bounded in $\theta$.

However, as in the assumptions of Theorem 2.2, for any $\theta_0 \in \Theta$ there exist $\epsilon = \epsilon(\theta_0) > 0$ and functions $M_{jkl}(X)$, $j, k, l \in \{1, 2, \ldots, d\}$, such that $\big|\frac{\partial^2}{\partial\theta_k\partial\theta_l}q_j(\theta)\big| \le M_{jkl}(X)$ for all $\theta \in \Theta$ such that $|\theta_j - \theta_{0j}| < \epsilon$, $j \in \{1, 2, \ldots, d\}$. Therefore, with $\epsilon > 0$ and $Q_{(m)}$ as in (2.1), the law of total expectation yields

$$R_2 = \mathrm{E}|C_1| = \mathrm{E}\big[|C_1| \,\big|\, Q_{(m)} \ge \epsilon\big]\,\mathrm{P}\big(Q_{(m)} \ge \epsilon\big) + \mathrm{E}\big[|C_1| \,\big|\, Q_{(m)} < \epsilon\big]\,\mathrm{P}\big(Q_{(m)} < \epsilon\big). \tag{2.16}$$

The Markov inequality applied to $\mathrm{P}(Q_{(m)} \ge \epsilon)$ yields

$$\mathrm{P}\big(Q_{(m)} \ge \epsilon\big) = \mathrm{P}\big(|\hat{\theta}_n(X)_{(m)} - \theta_{0(m)}| \ge \epsilon\big) \le \mathrm{P}\Big(\sum_{j=1}^d \big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 \ge \epsilon^2\Big) \le \frac{1}{\epsilon^2}\sum_{j=1}^d \mathrm{E}\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2. \tag{2.17}$$

In addition, $\mathrm{P}(Q_{(m)} < \epsilon) \le 1$, which is a reasonable bound because the consistency of the MLE, a consequence of (R.C.1)-(R.C.5), ensures that $|\hat{\theta}_n(x)_{(m)} - \theta_{0(m)}|$ should be small, and therefore $\mathrm{P}(Q_{(m)} < \epsilon) = \mathrm{P}(|\hat{\theta}_n(x)_{(m)} - \theta_{0(m)}| < \epsilon)$ is close to one. This result, along with (2.17) applied to (2.16) and the fact that $|C_1| \le 2\|h\|$, gives

$$R_2 \le \frac{2\|h\|}{\epsilon^2}\sum_{j=1}^d \mathrm{E}\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 + \mathrm{E}\big[|C_1| \,\big|\, Q_{(m)} < \epsilon\big]. \tag{2.18}$$

For an upper bound on $\mathrm{E}[|C_1| \mid Q_{(m)} < \epsilon]$, using (2.13) and (2.15) yields

$$R_2 \le \frac{2\|h\|}{\epsilon^2}\sum_{j=1}^d \mathrm{E}\big(\hat{\theta}_n(X)_j - \theta_{0j}\big)^2 + |h|_1\sum_{j=1}^d \Big\{\mathrm{E}\big[\big|A_j^{(1)}(\theta_0, X)\big| \,\big|\, Q_{(m)} < \epsilon\big] + \mathrm{E}\big[\big|A_j^{(2)}(\theta_0, X)\big| \,\big|\, Q_{(m)} < \epsilon\big]\Big\},$$

with $A_j^{(1)}(\theta_0, X)$ and $A_j^{(2)}(\theta_0, X)$ given as in the statement of the theorem. Summing our bounds for $R_1$ and $R_2$ gives the assertion of the theorem. $\square$

3 Normally distributed random variables

In this section, we illustrate the application of the general bound of Theorem 2.2 by considering the important case that $X_1, X_2, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ random variables with $\theta_0 = (\mu, \sigma^2)^\intercal$. Here the density function is

$$f(x|\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\}, \quad x \in \mathbb{R}. \tag{3.1}$$

It is well known that in this case the MLE exists, is unique, and is equal to $\hat{\theta}_n(X) = \big(\bar{X}, \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\big)^\intercal$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$; see, for example, Davison (2008). As we shall see in the proof of the following corollary, the MLE has a representation of the form (1.1). Thus, Theorem 2.2 can be applied to derive the following bound. Empirical results are given after the proof.

Corollary 3.1. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables and let $Z \sim N_2(0, I_{2\times 2})$. Then, for all $h \in C_b^2(\mathbb{R}^2)$,

$$\Big|\mathrm{E}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - \mathrm{E}[h(Z)]\Big| \le \frac{11.7\,|h|_2}{\sqrt{n}} + \frac{|h|_1}{\sqrt{2n}}. \tag{3.2}$$

Remark 3.2. An upper bound for the distributional distance between the MLE of the normal distribution with unknown mean and variance, treated under canonical parametrisation, and its limiting normal distribution has also been obtained in Anastasiou (2016). That bound is also of order $O(n^{-1/2})$, but our numerical constants are two orders of magnitude smaller. Furthermore, our test functions $h$ are less restrictive, coming from the class $C_b^2(\mathbb{R}^2)$ rather than the class $C_b^3(\mathbb{R}^2)$. Finally, the bound in (3.2) does not depend on the parameter $\theta_0 = (\mu, \sigma^2)^\intercal$, while the bound given in Anastasiou (2016) depends on the natural parameters and blows up when the variance tends to zero or infinity.

Proof. From the general approach in the previous section, we want to have $q : \Theta \to \mathbb{R}^2$ such that

$$q\big(\hat{\theta}_n(x)\big) = \frac{1}{n}\sum_{i=1}^n \big(g_1(x_i), g_2(x_i)\big)^\intercal.$$

We take

$$q(\theta) = q(\theta_1, \theta_2) = \big(\theta_1,\ \theta_2 + (\theta_1 - \mu)^2\big)^\intercal, \tag{3.3}$$

which then yields

$$q\big(\hat{\theta}_n(X)\big) = q\Big(\bar{X}, \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\Big) = \Big(\bar{X},\ \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2 + \big(\bar{X} - \mu\big)^2\Big)^\intercal = \Big(\frac{1}{n}\sum_{i=1}^n X_i,\ \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2\Big)^\intercal, \tag{3.4}$$

and therefore

$$g_1(x_i) = x_i, \qquad g_2(x_i) = (x_i - \mu)^2, \tag{3.5}$$

and thus the MLE has a representation of the form (1.1). We now note that the Jacobian matrix of $q(\theta)$ evaluated at $\theta_0 = (\mu, \sigma^2)^\intercal$ is

$$\nabla q(\theta_0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{3.6}$$

Furthermore, it is easily deduced from (3.3) that for all $k, m, l \in \{1, 2\}$ we have $\big|\frac{\partial^2}{\partial\theta_m\partial\theta_l}q_k(\theta)\big| \le 2$. Hence, using Remark 2.3, we conclude that the general bound is as in (2.5) and no positive constant $\epsilon$ is necessary. Now, for $\ell(\theta_0; X)$ denoting the log-likelihood function, we obtain from (3.1) that the expected Fisher information matrix is

$$I(\theta_0) = -\frac{1}{n}\mathrm{E}\big[\nabla^2\ell(\theta_0; X)\big] = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}. \tag{3.7}$$

The results in (3.6) and (3.7) yield

$$K = \nabla q(\theta_0)\,[I(\theta_0)]^{-1}\,[\nabla q(\theta_0)]^\intercal = [I(\theta_0)]^{-1} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}. \tag{3.8}$$

Using (2.6), (3.4), (3.5) and (3.8), we get that

$$W = \frac{1}{\sqrt{n}}\sum_{i=1}^n K^{-1/2}\big[g(X_i) - q(\theta_0)\big] = \frac{1}{\sqrt{n}}\sum_{i=1}^n \begin{pmatrix} \frac{1}{\sigma} & 0 \\ 0 & \frac{1}{\sqrt{2}\sigma^2} \end{pmatrix}\begin{pmatrix} X_i - \mu \\ (X_i - \mu)^2 - \sigma^2 \end{pmatrix},$$

and therefore, for $i \in \{1, \ldots, n\}$,

$$\xi_{i1} = \frac{X_i - \mu}{\sigma}, \qquad \xi_{i2} = \frac{(X_i - \mu)^2 - \sigma^2}{\sqrt{2}\sigma^2}.$$

For the first term of the general upper bound in (2.5), we are required to bound certain expectations involving the $\xi_{ij}$. We first note that $\frac{X_i - \mu}{\sigma} \stackrel{d}{=} Z \sim N(0, 1)$. Using the standard formulas for the moments and absolute moments of the standard normal distribution (see Winkelbauer (2012)), we have, for all $i = 1, \ldots, n$,

$$\mathrm{E}|\xi_{i1}| = \mathrm{E}|Z| = \sqrt{\frac{2}{\pi}}, \qquad \mathrm{E}\xi_{i1}^2 = \mathrm{E}Z^2 = 1, \qquad \mathrm{E}|\xi_{i1}|^3 = \mathrm{E}|Z|^3 = 2\sqrt{\frac{2}{\pi}},$$

and, by Hölder's inequality,

$$\mathrm{E}|\xi_{i2}| = \frac{1}{\sqrt{2}}\mathrm{E}|Z^2 - 1| \le \frac{1}{\sqrt{2}}\big(\mathrm{E}(Z^2 - 1)^2\big)^{1/2} = 1, \qquad \mathrm{E}\xi_{i2}^2 = \frac{1}{2}\mathrm{E}(Z^2 - 1)^2 = 1,$$

$$\mathrm{E}|\xi_{i2}|^3 = \frac{1}{2^{3/2}}\mathrm{E}|Z^2 - 1|^3 \le \frac{1}{2^{3/2}}\big(\mathrm{E}(Z^2 - 1)^4\big)^{3/4} = \frac{1}{2^{3/2}}\big(\mathrm{E}Z^8 - 4\mathrm{E}Z^6 + 6\mathrm{E}Z^4 - 4\mathrm{E}Z^2 + 1\big)^{3/4} = \frac{60^{3/4}}{2^{3/2}} = 15^{3/4}.$$

Therefore,

$$\sum_{i=1}^n\sum_{j,k,l=1}^2 \mathrm{E}\big|\xi_{ij}\xi_{ik}\xi_{il}\big| \le \sum_{i=1}^n \Big(\mathrm{E}|\xi_{i1}|^3 + 3\,\mathrm{E}\xi_{i1}^2\,\mathrm{E}|\xi_{i2}| + 3\,\mathrm{E}|\xi_{i1}|\,\mathrm{E}\xi_{i2}^2 + \mathrm{E}|\xi_{i2}|^3\Big) \le n\Big(2\sqrt{\frac{2}{\pi}} + 3 + 3\sqrt{\frac{2}{\pi}} + 15^{3/4}\Big) < 14.62n \tag{3.9}$$

and

$$\sum_{i=1}^n\sum_{j,k,l=1}^2 \big|\mathrm{E}[\xi_{ij}\xi_{ik}]\big|\,\mathrm{E}|\xi_{il}| = \sum_{i=1}^n\sum_{j,l=1}^2 \mathrm{E}\xi_{ij}^2\,\mathrm{E}|\xi_{il}| \le 2n\Big(1 + \sqrt{\frac{2}{\pi}}\Big), \tag{3.10}$$

where in (3.10) we have used that $\mathrm{E}[\xi_{i1}\xi_{i2}] = \frac{1}{\sqrt{2}}\mathrm{E}[Z(Z^2 - 1)] = 0$. Applying the results of (3.9) and (3.10) to the first term of the general bound in (2.5) yields

$$\frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}}\sum_{i=1}^n\sum_{j,k,l=1}^2 \Big\{\mathrm{E}\big|\xi_{ij}\xi_{ik}\xi_{il}\big| + \big|\mathrm{E}[\xi_{ij}\xi_{ik}]\big|\,\mathrm{E}|\xi_{il}|\Big\} \le \frac{\sqrt{2\pi}\,|h|_2}{4n^{3/2}}\Big(14.62n + 2n\Big(1 + \sqrt{\frac{2}{\pi}}\Big)\Big) \le \frac{11.7\,|h|_2}{\sqrt{n}}. \tag{3.11}$$

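The moment values just used can be checked quickly by Monte Carlo; the following sketch (ours, not part of the original proof) confirms the stated values for $\xi_{i1}$ and the Hölder-type bounds for $\xi_{i2}$.

```python
import numpy as np

# Sanity check of the moments of xi_1 = Z and xi_2 = (Z^2 - 1)/sqrt(2) used
# in (3.9) and (3.10), against the stated values and Hoelder-type bounds.
rng = np.random.default_rng(7)
z = rng.standard_normal(10**7)
xi2 = (z**2 - 1) / np.sqrt(2)

print(np.mean(np.abs(z)), np.sqrt(2 / np.pi))         # E|Z| = sqrt(2/pi)
print(np.mean(np.abs(z)**3), 2 * np.sqrt(2 / np.pi))  # E|Z|^3 = 2 sqrt(2/pi)
print(np.mean(xi2**2))                                # E xi_2^2 = 1
print(np.mean(np.abs(xi2)**3), 15**0.75)              # E|xi_2|^3 <= 15^(3/4)
print(np.mean(z * xi2))                               # E[xi_1 xi_2] = 0
```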
For the second term, we need to calculate $A_j^{(1)}(\theta_0, X)$ and $A_j^{(2)}(\theta_0, X)$ as in Theorem 2.2. The results of (3.6) and (3.8) yield

$$[I(\theta_0)]^{1/2}[\nabla q(\theta_0)]^{-1} - K^{-1/2} = 0_{2\times 2},$$

where $0_{2\times 2}$ denotes the $2 \times 2$ zero matrix. Therefore $A_j^{(1)}(\theta_0, X) = 0$ for $j = 1, 2$. In order to calculate $A_j^{(2)}(\theta_0, X)$, we note that

$$\frac{\partial^2}{\partial\theta_m\partial\theta_l}q_1(\theta) = 0, \ m, l \in \{1, 2\}, \qquad \frac{\partial^2}{\partial\theta_2^2}q_2(\theta) = \frac{\partial^2}{\partial\theta_1\partial\theta_2}q_2(\theta) = \frac{\partial^2}{\partial\theta_2\partial\theta_1}q_2(\theta) = 0, \qquad \frac{\partial^2}{\partial\theta_1^2}q_2(\theta) = 2.$$

Using this result as well as (3.6) and (3.7), we get that

$$A_1^{(2)}(\theta_0, X) = 0, \qquad A_2^{(2)}(\theta_0, X) = \frac{\sqrt{n}}{\sqrt{2}\sigma^2}\big(\bar{X} - \mu\big)^2 = \frac{\sqrt{n}}{\sqrt{2}}\Big(\frac{\bar{X} - \mu}{\sigma}\Big)^2.$$

Therefore the last term of the general upper bound in (2.5) becomes

$$|h|_1\sum_{j=1}^2 \Big\{\mathrm{E}\big|A_j^{(1)}(\theta_0, X)\big| + \mathrm{E}\big|A_j^{(2)}(\theta_0, X)\big|\Big\} = |h|_1\,\mathrm{E}\big|A_2^{(2)}(\theta_0, X)\big| = \frac{|h|_1\sqrt{n}}{\sqrt{2}\sigma^2}\,\mathrm{E}\big(\bar{X} - \mu\big)^2 = \frac{|h|_1}{\sqrt{2n}}. \tag{3.12}$$

Finally, the bounds in (3.11) and (3.12) are applied to (2.5) to obtain the assertion of the corollary. $\square$

Here, we carry out a large-scale simulation study to investigate the accuracy of the bound in (3.2). For $n = 10^j$, $j = 3, 4, 5, 6$, we start by generating $10^4$ trials of $n$ independent random observations, $X$, following the $N(\mu, \sigma^2)$ distribution, with $\mu$ and $\sigma^2$ held fixed across the trials. Then $\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)$ is evaluated in each trial, which in turn gives a vector of $10^4$ values. The function $h(x, y) = (1 + x^2 + y^2)^{-1}$, which belongs to $C_b^2(\mathbb{R}^2)$, is then applied to these values in order to get the sample mean, which we denote by $\hat{\mathrm{E}}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big]$. For this function $h$, it is straightforward to see that $|h|_1 = \frac{3\sqrt{3}}{8}$ and $|h|_2 = 2$. We use these values to calculate the bound in (3.2). We define

$$Q_h(\theta_0) := \Big|\hat{\mathrm{E}}\big[h\big(\sqrt{n}\,[I(\theta_0)]^{1/2}(\hat{\theta}_n(X) - \theta_0)\big)\big] - \tilde{\mathrm{E}}[h(Z)]\Big|,$$

where $\tilde{\mathrm{E}}[h(Z)] = 0.461$ is the approximation of $\mathrm{E}[h(Z)]$ up to three decimal places. We compare $Q_h(\theta_0)$ with the bound in (3.2), using the difference between their values as a measure of the error.

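The following minimal sketch reproduces the structure of this simulation study in Python. We stress that the values $\mu = 1$ and $\sigma^2 = 1$ are our own assumption, purely for illustration (by Remark 3.2, the bound (3.2) does not depend on them), and we use smaller sample sizes than in the study above to keep the run cheap.

```python
import numpy as np

# Sketch of the simulation study: estimate Q_h(theta_0) and compare it with
# the upper bound in (3.2). h(x, y) = 1/(1 + x^2 + y^2), |h|_1 = 3 sqrt(3)/8,
# |h|_2 = 2, and E[h(Z)] is approximately 0.461 for Z ~ N_2(0, I).
rng = np.random.default_rng(1)
mu, sigma2 = 1.0, 1.0       # assumed values; the bound does not depend on them
h1, h2, Eh_Z = 3 * np.sqrt(3) / 8, 2.0, 0.461
trials = 10**4

for n in [10**2, 10**3]:
    X = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
    mle_mu, mle_var = X.mean(axis=1), X.var(axis=1)
    # sqrt(n) [I(theta_0)]^{1/2} (theta_hat - theta_0) componentwise, with
    # [I(theta_0)]^{1/2} = diag(1/sigma, 1/(sqrt(2) sigma^2)) by (3.7)
    s1 = np.sqrt(n) * (mle_mu - mu) / np.sqrt(sigma2)
    s2 = np.sqrt(n) * (mle_var - sigma2) / (np.sqrt(2) * sigma2)
    Q_h = abs(np.mean(1 / (1 + s1**2 + s2**2)) - Eh_Z)
    bound = 11.7 * h2 / np.sqrt(n) + h1 / np.sqrt(2 * n)
    print(n, Q_h, bound, bound - Q_h)
```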
The results from the simulations are shown in Table 1 below.

Table 1
Simulation results for the $N(\mu, \sigma^2)$ distribution

$n$ | $Q_h(\theta_0)$ | Upper bound | Error

We see that the bound and the error decrease as the sample size gets larger. To be more precise, each time we increase the sample size by a factor of ten, the value of the upper bound drops by a factor close to $\sqrt{10}$, which is expected since the order of the bound is $O(n^{-1/2})$, as can be seen from (3.2).

Acknowledgements

This research occurred whilst AA was studying for a DPhil at the University of Oxford, supported by a Teaching Assistantship Bursary from the Department of Statistics, University of Oxford, and the EPSRC grant EP/K503113/1. RG acknowledges support from EPSRC grant EP/K032402/1 and is currently supported by a Dame Kathleen Ollerenshaw Research Fellowship. We would like to thank the referee for their helpful comments and suggestions.

References

Anastasiou, A. (2016). Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data. arXiv preprint.

Anastasiou, A. and Ley, C. (2017). Bounds for the asymptotic normality of the maximum likelihood estimator using the Delta method. ALEA: Latin American Journal of Probability and Mathematical Statistics 14.

Anastasiou, A. and Reinert, G. (2017). Bounds for the normal approximation of the maximum likelihood estimator. Bernoulli 23, 191-218.

Billingsley, P. (1961). Statistical Methods in Markov Chains. The Annals of Mathematical Statistics 32, No. 1, 12-40.

Cox, D. R. and Snell, E. J. (1968). A General Definition of Residuals. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 30.

Davison, A. C. (2008). Statistical Models (First ed.). Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.

Gaunt, R. E. (2016). Rates of Convergence in Normal Approximation Under Moment Conditions Via New Bounds on Solutions of the Stein Equation. Journal of Theoretical Probability 29.

Gaunt, R. E. and Reinert, G. (2016). The rate of convergence of some asymptotically chi-square distributed statistics by Stein's method. arXiv preprint.

Mäkeläinen, T., Schmidt, K. and Styan, G. P. H. (1981). On the existence and uniqueness of the maximum likelihood estimate of a vector-valued parameter in fixed-size samples. The Annals of Statistics 9, No. 4, 758-767.

Winkelbauer, A. (2012). Moments and absolute moments of the normal distribution. arXiv preprint arXiv:1209.4340.
