STA 732: Inference. Notes 4. General Classical Methods & Efficiency of Tests B&D 2, 4, 5

1 Going beyond ML with ML-type methods

1.1 When ML fails

Example. Is LSAT correlated with GPA? Suppose we want to answer this question based on data $(X_i, Y_i)$, $i = 1, \dots, n$, on LSAT and GPA scores of $n$ students. As in HW 1, we could model $(X_i, Y_i) \sim \mathrm{BVN}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ and formalize the test in terms of the correlation coefficient parameter $\rho$. However, there is no compelling reason to believe that the two scores are jointly normally distributed. A more general model and test formalization, with almost no assumptions on the shape of the bivariate pdf, is given by

  $(X_i, Y_i) \sim g$, $g \in \mathcal{G}_2 = \{\text{all densities on } \mathbb{R}^2 \text{ with finite second moments}\}$,
  test $H_0: \rho(g) = 0$ vs. $H_1: \rho(g) \neq 0$,

where $\rho(g) = \mathrm{Corr}(X, Y \mid g)$.¹ Unfortunately, the ML approach does not work here because $\hat g_{\mathrm{MLE}}$ does not exist! Indeed, for any observed data $(x_i, y_i)$, $i = 1, \dots, n$, we have

  $\sup_{g \in \mathcal{G}_2} \ell_{x,y}(g) = \infty.$   (1)

As a quick check of (1), define for every $k \ge 1$, $g_k := \frac{1}{n}\sum_{i=1}^n N_2(z_i, k^{-2} I_2) \in \mathcal{G}_2$, where $z_i = (x_i, y_i)^T$, and notice that $g_k(z_i) \ge \frac{1}{n}\cdot\frac{k^2}{2\pi}$ (keeping only the $i$-th mixture component), so that $\ell_{x,y}(g_k) \ge n \log\frac{k^2}{2\pi n} \to \infty$ as $k \to \infty$.

1.2 Methods based on M-estimators

Consider the situation where data $X_1, \dots, X_n$ are modeled as $X_i \sim g$, $g \in \mathcal{G}$, and we are interested in the parameter $\theta = \theta(g) \in \Theta \subset \mathbb{R}^d$. Here $\theta$ provides only a partial indexing of the model space $\mathcal{G}$; we could write $\mathcal{G} = \bigcup_{a \in \Theta}\{g \in \mathcal{G}: \theta(g) = a\}$. An ML-type approach to inference on $\theta$ could be developed by starting with some criterion function $m(X_i, \theta)$ with the property that large values of $m(X_i, \theta)$ indicate a closer agreement² between $\theta$ and $X_i$. This could be made more precise in the following way: if the

¹ More precisely, $\rho(g) = \sigma_{12}(g)/\{\sigma_1(g)\sigma_2(g)\}$ with $\sigma_{12}(g) = \int (x - \mu_1(g))(y - \mu_2(g))\,g(x,y)\,dx\,dy$, $\sigma_1^2(g) = \int (x - \mu_1(g))^2 g(x,y)\,dx\,dy$, $\sigma_2^2(g) = \int (y - \mu_2(g))^2 g(x,y)\,dx\,dy$, $\mu_1(g) = \int x\,g(x,y)\,dx\,dy$, $\mu_2(g) = \int y\,g(x,y)\,dx\,dy$.
² The text uses "discrepancy measure" to mean the negative of a criterion function.

true pdf is $g = g_0 \in \mathcal{G}$, and so the true $\theta = \theta_0 := \theta(g_0)$, then the map $\theta \mapsto E\{m(X_i, \theta) \mid g_0\}$ is maximized at $\theta_0$ and is potentially a decreasing function of $\|\theta - \theta_0\|$, at least in a small neighborhood of $\theta_0$. Testing $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \notin \Theta_0$ could be carried out by the test that rejects $H_0$ for large values of the statistic

  $M_n = \max_{\theta \in \Theta} m_n(\theta) - \max_{\theta \in \Theta_0} m_n(\theta)$, where $m_n(\theta) = \sum_{i=1}^n m(X_i, \theta)$.

Under reasonable regularity conditions we might expect $m_n(\theta)$ to be approximately quadratic in a neighborhood of its maximizer $\hat\theta_n = \arg\max_{\theta \in \Theta} m_n(\theta)$, with curvature $\{-\ddot m_n(\hat\theta_n)\}$. Hence the above test will be approximately equivalent to: reject $H_0$ whenever

  $\{\theta: (\theta - \hat\theta_n)^T\{-\ddot m_n(\hat\theta_n)\}(\theta - \hat\theta_n) \le c^2\} \cap \Theta_0 = \emptyset$

for some $c$. The maximizer $\hat\theta_n$, which can be taken as an estimator of $\theta$, is commonly referred to as an M-estimator (M for maximization).

In the case of a parametric model where $\theta$ provides a complete indexing, i.e., we could rewrite the model as $X_i \sim g(x_i \mid \theta)$, $\theta \in \Theta$, an attractive criterion function is $m(X_i, \theta) = \log g(X_i \mid \theta)$, with $E\{m(X_i, \theta) \mid \theta_0\} = \mathrm{const} - K(g(\cdot \mid \theta_0), g(\cdot \mid \theta))$, taking us back to the ML approach. However, other criterion functions could be used in this situation, and will have to be used in other situations where $\theta$ provides only a partial indexing. Here are some examples.

Example (Least squares, linear and non-linear). Consider data $X_i = (Z_i, Y_i) \in \mathbb{R}^p \times \mathbb{R}$, $i = 1, \dots, n$, modeled as

  $Y_i = h(Z_i, \beta) + \sigma\epsilon_i$, $(Z_i, \epsilon_i) \sim g(z_i, \epsilon_i)$,

with model parameters $(\beta, \sigma, g) \in \mathbb{R}^p \times \mathbb{R}_+ \times \mathcal{G}$, where $\mathcal{G}$ is the space of all pdfs $g(z, \epsilon)$ on $\mathbb{R}^p \times \mathbb{R}$ such that $E[\epsilon \mid Z, g] = 0$, $\mathrm{Var}[\epsilon \mid Z, g] = 1$. Here $h$ is a known function, potentially non-linear, satisfying the identifiability condition: $h(z, \beta_1) = h(z, \beta_2)$ for all $z$ implies $\beta_1 = \beta_2$. The least squares criterion function is $m(X_1, \beta) = -(Y_1 - h(Z_1, \beta))^2$.
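As a small illustration of computing the least squares M-estimator with a quasi-Newton algorithm, here is a sketch on simulated data. The exponential mean function $h(z, \beta) = e^{\beta_1 + \beta_2 z}$, the design, and the noise level are arbitrary choices made for the demonstration, not part of the notes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 1000
beta_true = np.array([0.5, -1.0])
Z = rng.uniform(0, 2, size=n)
h = lambda z, b: np.exp(b[0] + b[1] * z)        # assumed non-linear mean function
Y = h(Z, beta_true) + 0.1 * rng.normal(size=n)  # sigma = 0.1

# M-estimation: maximize sum_i m(X_i, beta) = -sum_i (Y_i - h(Z_i, beta))^2,
# i.e. minimize the residual sum of squares, via quasi-Newton (BFGS).
rss = lambda b: np.sum((Y - h(Z, b)) ** 2)
fit = minimize(rss, x0=np.zeros(2), method="BFGS")
print(fit.x)   # close to beta_true
```

BFGS here approximates the gradient numerically; for a real application one would typically supply the analytic Jacobian of $h$.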
If $(\beta_0, \sigma_0, g_0)$ characterizes the true distribution of the data, and $E_0$ denotes expectation under this distribution, then

  $E_0\{m(X_i, \beta)\} = -\sigma_0^2 - E_0\{h(Z_1, \beta_0) - h(Z_1, \beta)\}^2,$

which is uniquely maximized at $\beta = \beta_0$. The M-estimator $\hat\beta_n$ is often computed by using quasi-Newton optimization algorithms.

Example (Quantile regression). Consider data $X_i = (Z_i, Y_i) \in \mathbb{R}^p \times \mathbb{R}$, $i = 1, \dots, n$, and suppose we want to model the $\tau$-th conditional quantile of $Y_i$ given $Z_i$ by $Z_i^T\beta$. More formally,

  $Y_i = Z_i^T\beta + \epsilon_i$, $Z_i \overset{\mathrm{IND}}{\sim} g_z(z_i)$, $\epsilon_i \mid Z_i \sim g_\epsilon(\varepsilon_i \mid Z_i)$,

where $(\beta, g_z, g_\epsilon)$ are unknown but $g_\epsilon$ is constrained to satisfy $\int_{-\infty}^0 g_\epsilon(\varepsilon \mid z_i)\,d\varepsilon = \tau$, so that under the model $P(Y_i \le z_i^T\beta \mid Z_i = z_i) = \tau$ for all $z_i$. Koenker and Bassett introduced the criterion function

  $m(X_i, \beta) = -(Y_i - Z_i^T\beta)\{\tau - I(Y_i - Z_i^T\beta \le 0)\},$

and it can be proved that $E_0\{m(X_i, \beta)\}$ is uniquely maximized at $\beta = \beta_0$. The M-estimator $\hat\beta_n$ can be computed efficiently by linear programming.

1.3 Z-estimators

If the criterion function is differentiable in $\theta$, then the M-estimator can potentially be computed by solving the equation

  $\psi_n(\theta) := \sum_{i=1}^n \psi(X_i, \theta) = 0$   (2)

where $\psi(X_i, \theta) = \frac{\partial}{\partial\theta} m(X_i, \theta)$. This suggests another way to get to a good estimator of $\theta$, which can then be used to construct test procedures for hypotheses about $\theta$. Let $\dim(\theta) = d$. We will call a $d$-dimensional map $\theta \mapsto \psi(X_i, \theta)$ a score function if $E_0\,\psi(X_i, \theta)$ has a unique zero at $\theta = \theta_0$. A root (or zero) $\hat\theta_n$ of $\psi_n(\theta)$ will be called a Z-estimator. Bear in mind that an M-estimator need not be a Z-estimator (MLE for the model $X_i \sim \mathrm{Uniform}(0, \theta)$, $\theta > 0$) and a Z-estimator need not be an M-estimator (moment estimators, next).

Example (Method of Moments). Consider the model $X_1, \dots, X_n \sim g(x_i \mid \theta)$, $\theta \in \Theta \subset \mathbb{R}^d$. Suppose the $X_j$'s are univariate. Let $\mu_j(\theta) = E\{X_1^j \mid \theta\}$, $j = 1, \dots, d$. Suppose $\theta \mapsto (\mu_1(\theta), \mu_2(\theta), \dots, \mu_d(\theta))$ is one-to-one. Then one could estimate $\theta$ by solving the equations

  $\mu_j(\theta) = \frac{1}{n}\sum_{i=1}^n X_i^j$, $j = 1, 2, \dots, d.$

The solution $\hat\theta$ is called the method of moments (MoM) estimator. Similar methods could be devised for multivariate $X_j$'s.

1.4 Asymptotic Normality

Under regularity conditions, asymptotic normality properties can be established for both M- and Z-estimators, leading to asymptotic size and p-value calculations for tests based on these estimators. The classical conditions are precisely the Cramér conditions we saw for asymptotic normality of the MLE. In fact we can reproduce Theorem 7 from Handout 3 almost verbatim.
Here is a precise statement for the Z-estimator (and so it also applies to M-estimators that can be seen as Z-estimators).

Theorem 1. Suppose $X_1, X_2, \dots$ are modeled as $X_i \sim g$, $g \in \mathcal{G}$, and we are interested in $\theta = \theta(g) \in \Theta \subset \mathbb{R}^d$. Assume $g_0 \in \mathcal{G}$ is the true distribution and $\theta_0 = \theta(g_0)$ is the true value of $\theta$. Let $E_0$, $\mathrm{Var}_0$ denote expectation and variance under $g_0$. Suppose $\theta \mapsto \psi(x, \theta)$ is twice continuously differentiable in a neighborhood $U_0$ of $\theta_0$ for every $x$. Suppose $E_0\,\psi(X_i, \theta_0) = 0$, $\mathrm{Var}_0\,\psi(X_i, \theta_0)$ is finite, and the matrix $E_0\,\dot\psi(X_i, \theta_0)$ exists and is nonsingular. Assume that the second order partial derivatives of $\psi$ are dominated within $U_0$ by a fixed function $\bar\psi(x)$ with $E_0\,\bar\psi(X_1) < \infty$. Then,

1. The probability that the equation $\psi_n(\theta) = 0$ has at least one root tends to 1 as $n \to \infty$, and there exists a sequence of roots $\theta_n^*$ such that $\theta_n^* \overset{P}{\to} \theta_0$.

2. For any consistent estimator sequence $\hat\theta_n$ satisfying $\psi_n(\hat\theta_n) = 0$,

  $\sqrt{n}(\hat\theta_n - \theta_0) = -\{E_0\,\dot\psi(X_1, \theta_0)^T\}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(X_i, \theta_0) + R_n$

with $R_n \overset{P}{\to} 0$, and hence $\sqrt{n}(\hat\theta_n - \theta_0) \overset{L}{\to} N_d(0, \Sigma(g_0))$, where

  $\Sigma(g_0) = \{E_0\,\dot\psi(X_1, \theta_0)^T\}^{-1}\{\mathrm{Var}_0\,\psi(X_1, \theta_0)\}\{E_0\,\dot\psi(X_1, \theta_0)\}^{-1}.$

3. With probability tending to 1, $\frac{1}{n}\dot\psi_n(\hat\theta_n)$ is nonsingular and

  $\hat\Sigma_n := \Big\{\frac{1}{n}\dot\psi_n(\hat\theta_n)^T\Big\}^{-1}\Big\{\frac{1}{n}\sum_{i=1}^n \psi(X_i, \hat\theta_n)\psi(X_i, \hat\theta_n)^T\Big\}\Big\{\frac{1}{n}\dot\psi_n(\hat\theta_n)\Big\}^{-1} \overset{P}{\to} \Sigma(g_0),$

and hence $\hat\theta_n \sim \mathrm{AN}_d(\theta_0, \frac{1}{n}\hat\Sigma_n)$.

Proof. Repeat the proof of Theorem 7 of Handout 3 by replacing $\dot\ell_n(\theta)$ with $\psi_n(\theta)$.

Example (Error-in-variables regression). Let observations $X_i = (Z_i, Y_i) \in \mathbb{R} \times \mathbb{R}$, $i = 1, 2, \dots$, denote paired univariate measurements modeled as

  $Z_i = U_i + \phi_i$, $Y_i = \alpha + \beta U_i + \epsilon_i$, $(U_i, \phi_i, \epsilon_i) \sim g_U \times N(0, \sigma^2) \times N(0, \sigma^2).$

This model is referred to as error-in-variables regression or measurement-error regression, since the actual predictor $U_i$ is unobserved; we only get to record a contaminated version $Z_i = U_i + \phi_i$. The distribution $g_U$ of the latent predictor is unknown and of no interest. Primary interest concerns the regression coefficients³ $\theta = (\alpha, \beta, \sigma^2)$. The simple least squares estimate of $\theta$ provides poor inference; the estimator is not even consistent.
But a consistent, asymptotically normal Z-estimator is obtained by using (HW 2):

  $\psi(X_i, \theta) = \begin{pmatrix} Y_i - \alpha - \beta Z_i \\ Z_i(Y_i - \alpha - \beta Z_i) + \beta\sigma^2 \\ (Y_i - \alpha - \beta Z_i)^2 - \sigma^2(1 + \beta^2) \end{pmatrix}.$

³ Here we have assumed $\mathrm{Var}(\phi_i) = \mathrm{Var}(\epsilon_i) = \sigma^2$. In a more general setting we could let $\mathrm{Var}(\phi_i) = \rho\sigma^2$. When $\rho$ is known, we could get back to the equal variance setting by rescaling $U_i$ and $Z_i$. Unknown $\rho$ is difficult to deal with, and requires prior knowledge or additional variables.
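A quick numerical sketch of this Z-estimator on simulated data (the exponential distribution for the latent $U_i$ is an arbitrary choice of $g_U$ made here for illustration): we solve $\bar\psi_n(\theta) = 0$ with a generic root-finder and contrast the result with the inconsistent least squares slope.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(2)
n = 20000
alpha0, beta0, sig2 = 1.0, 2.0, 0.25        # true (alpha, beta, sigma^2)
U = rng.exponential(scale=1.0, size=n)      # latent predictor (arbitrary non-Gaussian g_U)
Z = U + np.sqrt(sig2) * rng.normal(size=n)  # contaminated predictor
Y = alpha0 + beta0 * U + np.sqrt(sig2) * rng.normal(size=n)

def psi_bar(theta):
    """Empirical average of psi(X_i, theta), theta = (alpha, beta, sigma^2)."""
    a, b, s2 = theta
    r = Y - a - b * Z
    return np.array([r.mean(),
                     (Z * r).mean() + b * s2,
                     (r ** 2).mean() - s2 * (1 + b ** 2)])

sol = root(psi_bar, x0=np.array([0.0, 1.6, 0.5]))   # warm start near the OLS fit
theta_hat = sol.x

# For contrast: plain least squares is inconsistent; the slope is attenuated
# towards beta * Var(U) / (Var(U) + sigma^2) = 2 / 1.25 = 1.6 here.
ols_slope = np.cov(Z, Y)[0, 1] / np.var(Z)
print(theta_hat, ols_slope)
```

The Z-estimator recovers $(\alpha, \beta, \sigma^2)$ while the least squares slope stays stuck near the attenuated value, matching the inconsistency noted above.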

2 Other approaches

The M- and Z-estimator approaches provide a broad platform to construct statistical procedures with sound frequentist properties. They cover a large number of widely used statistical methods. However, there is an even larger number of procedures that do not stem from a particular platform of method generation, but are constructed more or less on heuristics that apply to a particular context. A testing procedure built upon heuristics turns into a rigorous testing procedure the moment we are able to calculate its size and establish good power on the alternatives. These calculations can be challenging, but asymptotic theory and/or simulation techniques help out. Here are two specific examples.

2.1 Rank based test for location shift

Consider two samples of observations $W_1, \dots, W_m$ and $V_1, \dots, V_k$ modeled as

  $W_i \sim g(w_i - \theta)$, $V_j \sim g(v_j)$, all $W_i$'s and $V_j$'s independent,

where $\theta \in \mathbb{R}$ and $g \in \{\text{all densities on } \mathbb{R}\}$ are unknown. We are interested in testing $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$. A popular test statistic is the Mann-Whitney statistic⁴

  $U = \frac{1}{mk}\sum_{i=1}^m\sum_{j=1}^k I(V_j \le W_i).$

Clearly, $\mu(\theta) := E(U \mid \theta) = P(V_1 \le W_1 \mid \theta)$, which equals $1/2$ if $\theta = 0$. If $\theta > 0$, then $U$ is expected to be larger (because then the $W_i$'s get shifted to the right by $\theta$ and should be relatively larger than the $V_j$'s), and similarly, for $\theta < 0$, $U$ is expected to be smaller. Hence a test statistic against $H_0$ is given by $|U - 1/2|$. It can be shown (from U-statistic theory, beyond the scope of this course) that with $n = m + k$ and $G(w)$ denoting the CDF of $g(w)$,

  $\sqrt{n}\,(U - \mu(\theta)) = \frac{\sqrt{n}}{m}\sum_{i=1}^m [G(W_i) - E\{G(W_i) \mid \theta\}] - \frac{\sqrt{n}}{k}\sum_{j=1}^k [G(V_j - \theta) - E\{G(V_j - \theta) \mid \theta\}] + R_n(\theta)$

with $R_n(\theta) \overset{P}{\to} 0$ under $\theta$ as $n \to \infty$. If $m/n \to \lambda \in (0, 1)$ as $n \to \infty$, then we can conclude from the above (by the standard CLT) that

  $\sqrt{n}\,(U - \mu(\theta)) \overset{L}{\to} N_1(0, \sigma^2(\theta))$   (3)

where $\sigma^2(\theta) = \frac{1}{\lambda}\mathrm{Var}\{G(W_1) \mid \theta\} + \frac{1}{1-\lambda}\mathrm{Var}\{G(V_1 - \theta) \mid \theta\}$.
⁴ A related statistic is the Wilcoxon rank sum statistic $R = \sum_{i=1}^m \mathrm{rank}(W_i) = mkU + m(m+1)/2$, where $\mathrm{rank}(W_i)$ gives the rank of $W_i$ in the pooled sample $\{W_1, \dots, W_m, V_1, \dots, V_k\}$ arranged from smallest to largest.
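The statistic $U$ and the rank-sum identity in the footnote can be checked directly on simulated data. The Gaussian choice of $g$ and the shift $\theta = 0.5$ below are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 40, 60
theta = 0.5                                # location shift under the alternative
W = rng.normal(size=m) + theta             # W_i ~ g(. - theta), here g = N(0,1)
V = rng.normal(size=k)                     # V_j ~ g

U = (V[None, :] <= W[:, None]).mean()      # Mann-Whitney statistic

# Wilcoxon rank sum of the W's in the pooled sample (no ties, almost surely)
ranks = np.concatenate([W, V]).argsort().argsort() + 1
R = ranks[:m].sum()

# identity from the footnote: R = mkU + m(m+1)/2
print(U, R, m * k * U + m * (m + 1) / 2)
```

The identity holds exactly whenever the pooled sample has no ties, which happens with probability one for continuous $g$.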

These calculations simplify considerably when $\theta = 0$, which is what we need to calculate the size of the test based on $U$. Note that when $\theta = 0$, $W_i, V_j \sim g$ and hence $G(W_i)$ and $G(V_j)$ are all $\mathrm{Uniform}(0, 1)$ variables with variance $1/12$. Hence,

  $\sigma^2(0) = \frac{1}{12\lambda(1-\lambda)}.$

A simple consistent estimator of $\sigma^2(0)$ is $n^2/\{12mk\}$, and hence an approximately size-$\alpha$ Mann-Whitney test of $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$ is given by

  reject $H_0$ if $\sqrt{12mk/n}\;|U - 1/2| \ge \Phi^{-1}(1 - \alpha/2).$

Of course, if $g$ were known to be Gaussian, then the ML test for the above hypotheses would be given by the standard t-test (with equal variances assumed). We will later see that even in this case, a size-$\alpha$ Mann-Whitney test has power comparable to a size-$\alpha$ t-test. Much more interestingly, if $g$ is non-Gaussian and possesses tails that are thicker than those of a Gaussian pdf, then the Mann-Whitney test can be much more powerful than the t-test!

2.2 Testing complete spatial randomness (CSR)

Point pattern data are routinely studied in ecology and related fields. Let $X_1, \dots, X_n$ denote observed spatial locations of $n$ events (such as occurrences of some shrub species) in $S \subset \mathbb{R}^2$. Can we detect whether the point pattern of these locations is aggregated (i.e., points are clustered together), regular (i.e., grid-like; the points repel each other), or completely spatially random? A heuristic approach to this inference problem goes like this. The K-statistic at distance $r$ is defined as

  $\hat K(r) = \frac{A}{n^2}\sum_{i \neq j} w_{ij}\, I(\|X_i - X_j\| \le r)$

where $A = \mathrm{area}(S)$ and $w_{ij} = 1/\mathrm{area}(S \cap B(X_i, r))$ is an edge-correction weight, with $B(c, r) := \{x \in \mathbb{R}^2: \|x - c\| \le r\}$ denoting the ball/disc in $\mathbb{R}^2$ with center $c$ and radius $r$. The hypothesis of CSR implies $K(r) \approx \pi r^2$. So we will take our test statistic as

  $\hat L(r) = \sqrt{\hat K(r)/\pi} - r.$

The exact distribution of $\hat L(r)$ is difficult to assess (it depends on the shape of $S$ in an intractable way). But it is easy to simulate CSR patterns of $n$ points in $S$, and we could use the simulations to create a Monte Carlo approximation to the distribution of $\hat L(r)$.
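A minimal sketch of this Monte Carlo recipe on the unit square. For simplicity the edge-correction weights $w_{ij}$ are dropped (a deliberate simplification: the observed and simulated patterns then share the same edge bias, so the comparison remains fair); the sample size, range $r$, and number of simulations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, nsim, alpha = 50, 0.1, 199, 0.05

def L_hat(pts, r):
    """L-hat(r) = sqrt(K-hat(r)/pi) - r for points in the unit square S = [0,1]^2.
    Simplification: A = 1 and the edge-correction weights w_ij are set to 1."""
    npts = len(pts)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    off_diag = ~np.eye(npts, dtype=bool)
    K = (d2[off_diag] <= r ** 2).sum() / npts ** 2
    return np.sqrt(K / np.pi) - r

obs = L_hat(rng.uniform(size=(n, 2)), r)    # pattern under test (here itself CSR)
sims = np.array([L_hat(rng.uniform(size=(n, 2)), r) for _ in range(nsim)])
lo, hi = np.quantile(sims, [alpha / 2, 1 - alpha / 2])
print(obs, (lo, hi), not (lo <= obs <= hi))   # last entry: reject CSR?
```

Since the "observed" pattern is itself CSR here, the test should reject only about $\alpha$ of the time across repeated runs.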
If $L_{\alpha/2,r}$ and $L_{1-\alpha/2,r}$ denote the (MC approximate) $\alpha/2$-th and $(1-\alpha/2)$-th quantiles of $\hat L(r)$ under CSR, then we could reject $H_0$: CSR at range $r$ if $\hat L(r) \notin [L_{\alpha/2,r}, L_{1-\alpha/2,r}]$. See [reference omitted] for more details and for an application to shrub patterns.

3 Relative Efficiency of Tests

Now that we have seen multiple ways of constructing test procedures, it is natural to ask if we could rank them based on their relative performances. While such comparisons are hard to make for a fixed sample size, asymptotic calculations often help.

To facilitate our discussion, let's get a clear formalization of the asymptotic context. We will consider an infinite sequence of statistical models $X^{(n)} \sim p_{n,\theta}$, $\theta \in \Theta$, with a common parameter. For data $X_1, X_2, \dots$ modeled as $X_i \sim g(x_i \mid \theta)$, $\theta \in \Theta$, such a sequence of models is given by

  $X^{(n)} = (X_1, \dots, X_n)$, $p_{n,\theta}(x^{(n)}) = g(x_1 \mid \theta)\cdots g(x_n \mid \theta).$   (4)

For testing $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_0^c$, we will consider sequences of test procedures given by test statistics and critical values, $\{(T_n, c_n): n = 1, 2, \dots\}$, with $T_n$ based on data $X^{(n)}$, typically given by the same test generation scheme applied to all $n$, such as ML, or Z-estimation associated with a certain score function $\psi$, etc. Such a test sequence will be called asymptotically size $\alpha$ if

  $\lim_{n\to\infty} \sup_{\theta \in \Theta_0} P(T_n \ge c_n \mid \theta) = \alpha.$

Now consider two sequences of test procedures $(T_{n,1}, c_{n,1})$ and $(T_{n,2}, c_{n,2})$, $n = 1, 2, \dots$, and denote the corresponding power function sequences by

  $\pi_{n,1}(\theta) = P(T_{n,1} \ge c_{n,1} \mid \theta)$, $\pi_{n,2}(\theta) = P(T_{n,2} \ge c_{n,2} \mid \theta).$

If both test sequences are asymptotically size $\alpha$, then we would declare the first sequence asymptotically better than the second if

  $\lim_n \pi_{n,1}(\theta) \ge \lim_n \pi_{n,2}(\theta)$ for every $\theta \in \Theta_0^c$.

However, such comparisons are mostly useless because for almost all reasonable size-$\alpha$ test sequences, the limiting power equals 1 at any $\theta \in \Theta_0^c$. For example, for the sequence of models based on $X_i \sim N(\mu, 1)$, an asymptotically size-5% test of $H_0: \mu \le 0$ vs. $H_1: \mu > 0$ is given by: reject $H_0$ if $\sqrt{n}\,\bar X^{(n)} \ge 1.64$. At any $\mu > 0$, $\pi_n(\mu) = 1 - \Phi(1.64 - \sqrt{n}\mu) \to 1$ as $n \to \infty$.

3.1 The Pitman Relative Efficiency

The situation can be salvaged by comparing $\lim_n \pi_n(\theta_n)$ over a sequence of increasingly difficult alternatives $\theta_n \to \partial\Theta_0$ [the boundary of $\Theta_0$]. To make headway along this line, we make a number of simplifying assumptions. First, we assume $\Theta_0 = \{\theta_0\}$, a point null at $\theta_0$. We also focus only on test statistic sequences $(T_n: n = 1, 2, \dots)$
that are regular at $\theta_0$, i.e., there exist functions $\mu(\theta)$ and $\sigma(\theta)$ such that for any $h$,

  $\frac{\sqrt{n}\{T_n - \mu(\theta_0 + h/\sqrt{n})\}}{\sigma(\theta_0 + h/\sqrt{n})} \overset{L}{\to} N(0, 1)$ under $p_{n,\theta_0 + h/\sqrt{n}}$.   (5)

The above condition is to be understood as follows: if $Z_n$ denotes the normalized statistic on the left of (5), then $\lim_n P(Z_n \in A \mid \theta_0 + h/\sqrt{n}) = P(Z \in A)$ for every $A \subset \mathbb{R}$, where $Z \sim N(0,1)$. Bear in mind that regularity is a strong condition and need not hold even if

we could show $\sqrt{n}(T_n - \mu(\theta))/\sigma(\theta) \overset{L}{\to} N(0, 1)$ at every fixed $\theta \in \Theta$. We will later see tools to establish regularity.

As a sequence of increasingly difficult alternatives, consider $\theta_n = \theta_0 + h/\sqrt{n}$, $n \ge 1$, for some fixed $h \neq 0$, i.e., we consider the limiting power along a sequence of alternatives collapsing onto $\theta_0$ at the $\sqrt{n}$ rate along a ray $h$. If $\{(T_n, c_n), n \ge 1\}$ is asymptotically size $\alpha$ with $T_n$ regular, then we can write $c_n = \mu(\theta_0) + \sigma(\theta_0)(z_\alpha + r_n)/\sqrt{n}$, where $z_\alpha = \Phi^{-1}(1-\alpha)$ and $r_n \to 0$. Notice that

  $\pi_n(\theta_n) = P\Big(\frac{\sqrt{n}\{T_n - \mu(\theta_n)\}}{\sigma(\theta_n)} \ge \frac{\sqrt{n}\{\mu(\theta_0) + \sigma(\theta_0)(z_\alpha + r_n)/\sqrt{n} - \mu(\theta_n)\}}{\sigma(\theta_n)}\Big)$

and

  $\lim_n \frac{\sqrt{n}\{\mu(\theta_0) + \sigma(\theta_0)(z_\alpha + r_n)/\sqrt{n} - \mu(\theta_n)\}}{\sigma(\theta_n)} = z_\alpha - \frac{h^T\dot\mu(\theta_0)}{\sigma(\theta_0)},$

and hence, by Slutsky's theorem,

  $\lim_n \pi_n(\theta_n) = 1 - \Phi\Big(z_\alpha - \frac{h^T\dot\mu(\theta_0)}{\sigma(\theta_0)}\Big).$

The quantity $\dot\mu(\theta_0)/\sigma(\theta_0)$ is known as the slope of the test statistic sequence $T_n$ at $\theta_0$. Clearly we should prefer test statistics with higher slopes!

The above calculations may seem too dependent on the arbitrary choice of the collapsing alternatives. Pitman established that these calculations are indeed robust. To formalize this, I state below the definition of the Pitman relative efficiency and a theorem that connects this concept to the ratio of slopes of two test sequences.

Definition 1. Let $\{T_{n,1}: n \ge 1\}$ and $\{T_{n,2}: n \ge 1\}$ be two test statistic sequences. For any given $\alpha, \gamma \in (0, 1)$, let $n_1(\alpha, \gamma, \theta)$ denote the minimum $n$ needed so that a level-$\alpha$ test based on $T_{n,1}$ has power at least $\gamma$ at $\theta$. Let $n_2(\alpha, \gamma, \theta)$ denote the same for the other sequence. The Pitman relative efficiency of the first sequence against the second, at level $\alpha$ and power $\gamma$, is defined as

  $\lim_{\theta_\nu \to \theta_0} \frac{n_2(\alpha, \gamma, \theta_\nu)}{n_1(\alpha, \gamma, \theta_\nu)}$   (6)

provided the limit exists. In principle, the limiting sample size ratio may depend on $\alpha$, $\gamma$ and the particular collapsing sequence of alternatives $\theta_\nu$.
But often it does not, especially if $\theta_\nu = \theta_0 + \nu h$ where $h$ is a fixed unit vector and $\nu \downarrow 0$, i.e., the collapsing alternatives $\theta_\nu$ approach $\theta_0$ along a ray and from only one direction. Below is a precise statement and proof for the case $\dim(\theta) = 1$ with $\Theta = [0, \infty)$, $\theta_0 = 0$.

Theorem 2. Consider statistical models $(p_{n,\theta}: \theta \ge 0)$ such that $\|p_{n,\theta} - p_{n,0}\| \to 0$ as $\theta \to 0$ for every fixed $n$. Let $T_{n,1}$ and $T_{n,2}$ be two sequences of test statistics that satisfy the regularity condition (5) with $\dot\mu_i(0) > 0$ and $\sigma_i(0) > 0$, $i = 1, 2$. Then the Pitman relative efficiency of the first sequence against the second equals

  $\Big(\frac{\dot\mu_1(0)/\sigma_1(0)}{\dot\mu_2(0)/\sigma_2(0)}\Big)^2$

for every sequence $\theta_\nu \downarrow 0$, independently of $\alpha$ and $\gamma$.

Proof. A somewhat tedious but straightforward argument shows that each of $n_i(\alpha, \gamma, \theta_\nu)$, $i = 1, 2$, must diverge to $\infty$. Also, to ensure asymptotic size $\alpha$, the critical value may be taken as $\mu_i(0) + \sigma_i(0)(z_\alpha + r_{n,i})/\sqrt{n}$, where $r_{n,i} \to 0$. And hence,

  $\pi_{n_i(\alpha,\gamma,\theta_\nu),\,i}(\theta_\nu) = 1 - \Phi\Big(z_\alpha + r_n' - \frac{\dot\mu_i(0)}{\sigma_i(0)}\sqrt{n_i(\alpha,\gamma,\theta_\nu)}\;\theta_\nu\,(1 + r_n'') + r_n'''\Big)$

where all of $r_n', r_n'', r_n''' \to 0$. These power values converge to $\gamma$ only when the argument of $\Phi(\cdot)$ above converges to $z_\gamma$, and hence

  $\lim_\nu \frac{n_2(\alpha,\gamma,\theta_\nu)}{n_1(\alpha,\gamma,\theta_\nu)} = \lim_\nu \frac{n_2(\alpha,\gamma,\theta_\nu)\,\theta_\nu^2}{n_1(\alpha,\gamma,\theta_\nu)\,\theta_\nu^2} = \frac{(z_\alpha - z_\gamma)^2/\{\dot\mu_2(0)/\sigma_2(0)\}^2}{(z_\alpha - z_\gamma)^2/\{\dot\mu_1(0)/\sigma_1(0)\}^2} = \Big(\frac{\dot\mu_1(0)/\sigma_1(0)}{\dot\mu_2(0)/\sigma_2(0)}\Big)^2.$

3.2 Le Cam's Third Lemma & Local Asymptotic Normality

The regularity property (5) requires establishing asymptotic normality of $T_n$ under a sequence of probability distributions $q_n = p_{n,\theta_0 + h/\sqrt{n}}$ associated with a sequence of parameter values $\theta_n = \theta_0 + h/\sqrt{n}$. We have seen that it is often possible to establish asymptotic normality under the sequence $p_n = p_{n,\theta_0}$ associated with the fixed parameter value $\theta_0$. Le Cam developed a brilliant theory of contiguity and local asymptotic normality to show that asymptotic normality under $p_n$ can be enough to establish asymptotic normality under $q_n$. We will not see his theory in full glory, but here is the much celebrated "Le Cam's third lemma":

Theorem 3 (Le Cam's third lemma). For every $n = 1, 2, \dots$, let $p_n, q_n$ be two pdfs defined on a common space $\mathcal{X}_n$ and let $T_n: \mathcal{X}_n \to \mathbb{R}^k$ be a map such that

  $\begin{pmatrix} T_n(X^{(n)}) \\ \log\frac{q_n}{p_n}(X^{(n)}) \end{pmatrix} \overset{L}{\to} N_{k+1}\Big(\begin{pmatrix} \mu \\ -\frac{1}{2}\sigma^2 \end{pmatrix}, \begin{pmatrix} \Sigma & \tau \\ \tau^T & \sigma^2 \end{pmatrix}\Big)$ under $X^{(n)} \sim p_n$.

Then $T_n(X^{(n)}) \overset{L}{\to} N_k(\mu + \tau, \Sigma)$ under $X^{(n)} \sim q_n$.

The result might look a bit peculiar at first because of the requirement that $\log\frac{q_n}{p_n}(X^{(n)}) \overset{L}{\to} N(-\sigma^2/2, \sigma^2)$ for some $\sigma^2$. There is nothing unusual about this, because it holds for all regular models. In fact it holds under a broad characterization known as Local Asymptotic Normality (LAN).
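Theorem 3 can be checked by simulation in the simplest Gaussian location model, where the joint limit is in fact exact at every $n$: with $X_i$ iid $N(\theta, 1)$, $p_n$ at $\theta = 0$ and $q_n$ at $\theta = h/\sqrt{n}$, the statistic $T_n = \sqrt{n}\,\bar X$ satisfies $\log(q_n/p_n) = hT_n - h^2/2$, so $\sigma^2 = h^2$, $\tau = h$, and under $q_n$ the lemma predicts $T_n \sim N(h, 1)$. The sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
h, n, nrep = 1.5, 200, 10000

# X_i iid N(theta, 1); p_n at theta = 0, q_n at theta = h/sqrt(n).
X0 = rng.normal(0.0, 1.0, size=(nrep, n))             # replicates under p_n
T0 = np.sqrt(n) * X0.mean(axis=1)                     # T_n = sqrt(n) * Xbar
llr = h * T0 - h ** 2 / 2                             # log(q_n/p_n), exact here

Xq = rng.normal(h / np.sqrt(n), 1.0, size=(nrep, n))  # replicates under q_n
Tq = np.sqrt(n) * Xq.mean(axis=1)

tau = np.cov(T0, llr)[0, 1]                           # limiting covariance, ~ h
print(llr.mean(), llr.var(), tau, Tq.mean())
```

The four printed numbers approximate $-h^2/2$, $h^2$, $h$ and $h$ respectively: the first three verify the joint-normality hypothesis of the lemma under $p_n$, and the last verifies its conclusion, the mean shift by $\tau$ under $q_n$.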
Here is the definition:

Definition 2 (Local Asymptotic Normality). A sequence of statistical models $(p_{n,\theta}: \theta \in \Theta)$ is called locally asymptotically normal (LAN) at $\theta = \theta_0$ if there exist matrices $r_n$, $I_{\theta_0}$ and random vectors $\Delta_{n,\theta_0}$ such that $\Delta_{n,\theta_0} \overset{L}{\to} N(0, I_{\theta_0})$ under $p_{n,\theta_0}$, and for every converging sequence $h_n \to h$,

  $\log\frac{p_{n,\theta_0 + r_n^{-1}h_n}}{p_{n,\theta_0}}(X^{(n)}) = h^T\Delta_{n,\theta_0} - \frac{1}{2}h^T I_{\theta_0} h + R_n,$

with $R_n \to 0$ in probability under $p_{n,\theta_0}$.

Note that if the family is LAN at $\theta_0$, then

  $\log\frac{p_{n,\theta_0 + r_n^{-1}h_n}}{p_{n,\theta_0}}(X^{(n)}) \overset{L}{\to} N\big(-\tfrac{1}{2}\sigma^2, \sigma^2\big)$ under $p_{n,\theta_0}$, with $\sigma^2 = h^T I_{\theta_0} h$.

Therefore, for a model satisfying LAN, we can invoke Le Cam's third lemma for any statistic $T_n = T_n(X^{(n)})$ as soon as we can establish a bivariate CLT between $T_n$ and $h^T\Delta_{n,\theta_0}$ and calculate the limiting covariance vector $\tau$. Before we work out a specific example, here are some results on when LAN happens.

Example (IID models). Suppose the $p_{n,\theta}$'s are associated with an observation model $X_1, X_2, \dots \sim g(x_i \mid \theta)$, $\theta \in \Theta$, as in (4). If $\{g(\cdot \mid \theta): \theta \in \Theta\}$ satisfies the Cramér conditions at a $\theta_0$, then $(p_{n,\theta}: \theta \in \Theta)$ is LAN at $\theta_0$ with $I_{\theta_0} = I_F(\theta_0)$, the Fisher information at $\theta_0$, $r_n = \sqrt{n}\,I_d$ ($I_d$ is the $d \times d$ identity matrix), and

  $\Delta_{n,\theta_0} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial}{\partial\theta}\log g(X_i \mid \theta_0) = \frac{1}{\sqrt{n}}\dot\ell_n(\theta_0).$

The same assertion holds under conditions weaker than the Cramér conditions. The family $\{g(\cdot \mid \theta): \theta \in \Theta\}$ is said to be differentiable in quadratic mean (DQM) at $\theta_0$ if there is a function $s_{\theta_0}(x)$ such that

  $\lim_{h \to 0} \frac{1}{\|h\|^2}\int\Big[\sqrt{g(x \mid \theta_0 + h)} - \sqrt{g(x \mid \theta_0)} - \frac{1}{2}h^T s_{\theta_0}(x)\sqrt{g(x \mid \theta_0)}\Big]^2 dx = 0.$

When $\theta \mapsto \log g(x \mid \theta)$ is differentiable at $\theta_0$ for almost every $x$, one typically has $s_{\theta_0}(x) = \nabla_\theta \log g(x \mid \theta_0)$, the score function. If $\{g(\cdot \mid \theta): \theta \in \Theta\}$ is DQM at $\theta_0$, then the associated sequence of models $(p_{n,\theta}: \theta \in \Theta)$ is LAN at $\theta_0$ with $I_{\theta_0} = E\{s_{\theta_0}(X)s_{\theta_0}(X)^T \mid \theta_0\}$, $r_n = \sqrt{n}\,I_d$ and $\Delta_{n,\theta_0} = \sum_{i=1}^n s_{\theta_0}(X_i)/\sqrt{n}$.

3.3 Slope calculation for Mann-Whitney test

Under the assumed model on $X^{(n)} = (W_1, \dots, W_m, V_1, \dots, V_k)$, we have

  $p_{n,\theta}(w_1, \dots, w_m, v_1, \dots, v_k) = \prod_{i=1}^m g(w_i - \theta)\prod_{j=1}^k g(v_j) = p^1_{m,\theta}(w^{(m)})\,p^2_k(v^{(k)}).$

Assume $g$ is differentiable everywhere with derivative $\dot g$. Then LAN holds at $\theta = 0$ for the sequence of models $p^1_{m,\theta}$ on $W^{(m)} = (W_1, \dots, W_m)$, with $\Delta_{m,0} = -\frac{1}{\sqrt{m}}\sum_{i=1}^m \dot g(W_i)/g(W_i)$ and $r_m = \sqrt{m}$; and the same is inherited by the original model sequence $p_{n,\theta}$ with $r_n = \sqrt{m}$ and

  $\Delta_{n,0} = -\frac{1}{\sqrt{m}}\sum_{i=1}^m \frac{\dot g(W_i)}{g(W_i)}.$
We have earlier seen that the centered and scaled Mann-Whitney statistic $T_n = \sqrt{n}\,(U - \frac{1}{2})$ admits the asymptotic expansion (3) with $\theta = 0$. And hence if $m/n \to \lambda \in (0, 1)$, then the

assumption of Le Cam's third lemma holds for $p_n = p_{n,0}$ and $q_n = p_{n,h/\sqrt{n}} = p_{n,h_n/r_n}$, where $h_n = h\sqrt{m/n} \to h\sqrt{\lambda}$, with $\mu = 0$, $\Sigma = 1/\{12\lambda(1-\lambda)\}$ and

  $\tau = -h\,E\Big\{\frac{\dot g(W_1)}{g(W_1)}G(W_1) \,\Big|\, \theta = 0\Big\} = -h\int G(w)\,\dot g(w)\,dw = h\int g(w)^2\,dw$

by the integration by parts identity. And so $\sqrt{n}(U - 1/2) \overset{L}{\to} N(\tau, 1/\{12\lambda(1-\lambda)\})$ under $q_n$. Consequently,

  $\sqrt{n}\{U - \mu(h/\sqrt{n})\} = \sqrt{n}(U - 1/2) - \sqrt{n}\{\mu(h/\sqrt{n}) - \mu(0)\} \overset{L}{\to} N\Big(\tau - h\dot\mu(0),\; \frac{1}{12\lambda(1-\lambda)}\Big),$

but

  $\dot\mu(0) = \frac{\partial}{\partial\theta}\mu(\theta)\Big|_{\theta=0} = \frac{\partial}{\partial\theta}\Big[\int\{1 - G(w - \theta)\}\,g(w)\,dw\Big]_{\theta=0} = \int g(w)^2\,dw$

by interchanging differentiation and integration. Hence $\tau - h\dot\mu(0) = 0$ and $\sqrt{n}(U - \mu(h/\sqrt{n})) \overset{L}{\to} N(0, \frac{1}{12\lambda(1-\lambda)})$ under $q_n$. So the regularity condition (5) holds with $\mu(\theta) = P(V_1 \le W_1 \mid \theta)$ and $\sigma^2(\theta) = 1/\{12\lambda(1-\lambda)\}$. So the slope is

  $\dot\mu(0)/\sigma(0) = \sqrt{12\lambda(1-\lambda)}\int g(w)^2\,dw.$

3.4 Relative Efficiency of Mann-Whitney test against T-test

It is rather straightforward to show that the slope of the two sample t-test⁵ is $\sqrt{\lambda(1-\lambda)}/\sigma_g$, where $\sigma_g^2 = \mathrm{Var}(W_1 \mid g) = \int w^2 g(w)\,dw - \{\int w\,g(w)\,dw\}^2$. So the Pitman relative efficiency of the Mann-Whitney test against the two sample t-test is

  $12\,\sigma_g^2\,\Big\{\int g(w)^2\,dw\Big\}^2.$

Table 1 lists this efficiency for various choices of $g$. It can be shown that this number is never smaller than $108/125 \approx 86\%$, but it can be made arbitrarily large by considering a thick-tailed $g$.

  g(w)                              Relative efficiency
  --------------------------------  --------------------------------------
  Logistic                          $\pi^2/9 \approx 110\%$
  Normal                            $3/\pi \approx 95\%$
  Uniform                           $100\%$
  t(3)                              $\approx 190\%$
  t(5)                              $\approx 124\%$
  $\frac{3}{4}(1 - w^2)\,I(|w|<1)$  $108/125 \approx 86\%$ (lower bound)

  Table 1: Relative efficiency of the Mann-Whitney test against the two sample t-test for various choices of $g$.

4 Parametric Models, Efficiency Bounds and Optimality of ML tests

It turns out that under LAN conditions (and hence under the Cramér conditions, which imply LAN), there is a non-trivial bound on $\lim_n \pi_n(\theta_0 + h/r_n)$, and there are sequences of test procedures that attain these bounds. Here is a precise statement.

Theorem 4. Let $\Theta \subset \mathbb{R}^d$ and suppose the sequence of parametric models $(p_{n,\theta}: \theta \in \Theta)$, $n = 1, 2, \dots$, is LAN at $\theta = \theta_0$ with associated $r_n I_d$, $I_{\theta_0}$ and $\Delta_{n,\theta_0}$, such that $r_n \to \infty$. Suppose

⁵ Holds for either version: equal or unequal variance. See HW 2.

$\eta: \Theta \to \mathbb{R}$ is differentiable at $\theta_0$ with non-zero gradient $\nabla\eta(\theta_0)$ and $\eta(\theta_0) = 0$. Then the power function sequence $\pi_n(\theta)$ of any sequence of level-$\alpha$ tests for testing $H_0: \eta(\theta) \le 0$ vs. $H_1: \eta(\theta) > 0$ satisfies, for every $h$ such that $\nabla\eta(\theta_0)^T h > 0$,

  $\limsup_n \pi_n\Big(\theta_0 + \frac{h}{r_n}\Big) \le 1 - \Phi\Big(z_\alpha - \frac{\nabla\eta(\theta_0)^T h}{\sqrt{\nabla\eta(\theta_0)^T I_{\theta_0}^{-1}\nabla\eta(\theta_0)}}\Big)$

and the bound is attained by the tests: reject $H_0$ if $T_n \ge z_\alpha$, with

  $T_n = \frac{\nabla\eta(\theta_0)^T I_{\theta_0}^{-1}\Delta_{n,\theta_0}}{\sqrt{\nabla\eta(\theta_0)^T I_{\theta_0}^{-1}\nabla\eta(\theta_0)}}.$

A sketch of a proof. It can be established that (perhaps on some subsequence) $\pi_n(\theta_0 + h/r_n)$ converges to some $\pi(h)$. The LAN property then leads to the fact that⁶ $\pi(h)$ is the power function of some size-$\alpha$ procedure for testing $H_0: \nabla\eta(\theta_0)^T h \le 0$ vs. $H_1: \nabla\eta(\theta_0)^T h > 0$ in the model $(N_d(h, I_{\theta_0}^{-1}): h \in \mathbb{R}^d)$. But there is a size-$\alpha$ UMP test for this $H_0$ in this model, whose power function is precisely the bound displayed in the statement of the theorem. That the test "reject $H_0$ if $T_n \ge z_\alpha$" attains the bound can be shown by establishing

  $T_n \overset{L}{\to} N\big(\nabla\eta(\theta_0)^T h/\{\nabla\eta(\theta_0)^T I_{\theta_0}^{-1}\nabla\eta(\theta_0)\}^{1/2}, 1\big)$ under $\theta_0 + h/r_n$

by applications of Le Cam's third lemma and Slutsky's theorem.

⁶ This requires a powerful representation theorem, which we omit.
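In the simplest special case, $X_i$ iid $N(\theta, 1)$ with $\theta_0 = 0$ and $\eta(\theta) = \theta$, we have $I_{\theta_0} = 1$, $r_n = \sqrt{n}$ and $\Delta_{n,0} = \sqrt{n}\,\bar X$, so the optimal test of Theorem 4 rejects when $\sqrt{n}\,\bar X \ge z_\alpha$ and the bound equals $1 - \Phi(z_\alpha - h)$. Both the bound and its attainment can be checked by simulation (the sample sizes below are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
h, alpha, n, nrep = 2.0, 0.05, 400, 10000
z_a = norm.ppf(1 - alpha)

# Draw replicates under the local alternative theta = h / sqrt(n) and
# apply the test "reject if T_n = sqrt(n) * Xbar >= z_alpha".
X = rng.normal(loc=h / np.sqrt(n), scale=1.0, size=(nrep, n))
power_mc = (np.sqrt(n) * X.mean(axis=1) >= z_a).mean()
bound = 1 - norm.cdf(z_a - h)    # Theorem 4's upper bound, attained here

print(power_mc, bound)
```

In this Gaussian model the attainment is exact at every finite $n$ (not just in the limit), since $\sqrt{n}\,\bar X \sim N(h, 1)$ under $\theta = h/\sqrt{n}$, so the Monte Carlo power should match the bound up to simulation error.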


More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Relative efficiency. Patrick Breheny. October 9. Theoretical framework Application to the two-group problem

Relative efficiency. Patrick Breheny. October 9. Theoretical framework Application to the two-group problem Relative efficiency Patrick Breheny October 9 Patrick Breheny STA 621: Nonparametric Statistics 1/15 Relative efficiency Suppose test 1 requires n 1 observations to obtain a certain power β, and that test

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

δ -method and M-estimation

δ -method and M-estimation Econ 2110, fall 2016, Part IVb Asymptotic Theory: δ -method and M-estimation Maximilian Kasy Department of Economics, Harvard University 1 / 40 Example Suppose we estimate the average effect of class size

More information

Asymmetric least squares estimation and testing

Asymmetric least squares estimation and testing Asymmetric least squares estimation and testing Whitney Newey and James Powell Princeton University and University of Wisconsin-Madison January 27, 2012 Outline ALS estimators Large sample properties Asymptotic

More information

parameter space Θ, depending only on X, such that Note: it is not θ that is random, but the set C(X).

parameter space Θ, depending only on X, such that Note: it is not θ that is random, but the set C(X). 4. Interval estimation The goal for interval estimation is to specify the accurary of an estimate. A 1 α confidence set for a parameter θ is a set C(X) in the parameter space Θ, depending only on X, such

More information

Graduate Econometrics I: Maximum Likelihood I

Graduate Econometrics I: Maximum Likelihood I Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Introduction to Statistical Inference Ping Yu Department of Economics University of Hong Kong Ping Yu (HKU) Statistics 1 / 30 1 Point Estimation 2 Hypothesis Testing Ping Yu (HKU) Statistics 2 / 30 The

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools. Joan Llull. Microeconometrics IDEA PhD Program

Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools. Joan Llull. Microeconometrics IDEA PhD Program Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools Joan Llull Microeconometrics IDEA PhD Program Maximum Likelihood Chapter 1. A Brief Review of Maximum Likelihood, GMM, and Numerical

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Chapter 4: Asymptotic Properties of the MLE (Part 2)

Chapter 4: Asymptotic Properties of the MLE (Part 2) Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Theoretical Statistics. Lecture 23.

Theoretical Statistics. Lecture 23. Theoretical Statistics. Lecture 23. Peter Bartlett 1. Recall: QMD and local asymptotic normality. [vdv7] 2. Convergence of experiments, maximum likelihood. 3. Relative efficiency of tests. [vdv14] 1 Local

More information

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 STA 73: Inference Notes. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 1 Testing as a rule Fisher s quantification of extremeness of observed evidence clearly lacked rigorous mathematical interpretation.

More information

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants

Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009 there were participants 18.650 Statistics for Applications Chapter 5: Parametric hypothesis testing 1/37 Cherry Blossom run (1) The credit union Cherry Blossom Run is a 10 mile race that takes place every year in D.C. In 2009

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Method of Conditional Moments Based on Incomplete Data

Method of Conditional Moments Based on Incomplete Data , ISSN 0974-570X (Online, ISSN 0974-5718 (Print, Vol. 20; Issue No. 3; Year 2013, Copyright 2013 by CESER Publications Method of Conditional Moments Based on Incomplete Data Yan Lu 1 and Naisheng Wang

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

The Delta Method and Applications

The Delta Method and Applications Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order

More information

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak.

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak. Large Sample Theory Large Sample Theory is a name given to the search for approximations to the behaviour of statistical procedures which are derived by computing limits as the sample size, n, tends to

More information

Lecture 26: Likelihood ratio tests

Lecture 26: Likelihood ratio tests Lecture 26: Likelihood ratio tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f θ0 (X) > c 0 for

More information

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary Bimal Sinha Department of Mathematics & Statistics University of Maryland, Baltimore County,

More information

STAT 135 Lab 8 Hypothesis Testing Review, Mann-Whitney Test by Normal Approximation, and Wilcoxon Signed Rank Test.

STAT 135 Lab 8 Hypothesis Testing Review, Mann-Whitney Test by Normal Approximation, and Wilcoxon Signed Rank Test. STAT 135 Lab 8 Hypothesis Testing Review, Mann-Whitney Test by Normal Approximation, and Wilcoxon Signed Rank Test. Rebecca Barter March 30, 2015 Mann-Whitney Test Mann-Whitney Test Recall that the Mann-Whitney

More information

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics Mathematics Qualifying Examination January 2015 STAT 52800 - Mathematical Statistics NOTE: Answer all questions completely and justify your derivations and steps. A calculator and statistical tables (normal,

More information

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I 1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

More information

5.2 Fisher information and the Cramer-Rao bound

5.2 Fisher information and the Cramer-Rao bound Stat 200: Introduction to Statistical Inference Autumn 208/9 Lecture 5: Maximum likelihood theory Lecturer: Art B. Owen October 9 Disclaimer: These notes have not been subjected to the usual scrutiny reserved

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Problem 1 (20) Log-normal. f(x) Cauchy

Problem 1 (20) Log-normal. f(x) Cauchy ORF 245. Rigollet Date: 11/21/2008 Problem 1 (20) f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.0 0.2 0.4 0.6 0.8 4 2 0 2 4 Normal (with mean -1) 4 2 0 2 4 Negative-exponential x x f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.5

More information

STAT 135 Lab 3 Asymptotic MLE and the Method of Moments

STAT 135 Lab 3 Asymptotic MLE and the Method of Moments STAT 135 Lab 3 Asymptotic MLE and the Method of Moments Rebecca Barter February 9, 2015 Maximum likelihood estimation (a reminder) Maximum likelihood estimation Suppose that we have a sample, X 1, X 2,...,

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

Theoretical Statistics. Lecture 22.

Theoretical Statistics. Lecture 22. Theoretical Statistics. Lecture 22. Peter Bartlett 1. Recall: Asymptotic testing. 2. Quadratic mean differentiability. 3. Local asymptotic normality. [vdv7] 1 Recall: Asymptotic testing Consider the asymptotics

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Chapter 10. U-statistics Statistical Functionals and V-Statistics

Chapter 10. U-statistics Statistical Functionals and V-Statistics Chapter 10 U-statistics When one is willing to assume the existence of a simple random sample X 1,..., X n, U- statistics generalize common notions of unbiased estimation such as the sample mean and the

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Probability. Table of contents

Probability. Table of contents Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous

More information

Local Asymptotic Normality

Local Asymptotic Normality Chapter 8 Local Asymptotic Normality 8.1 LAN and Gaussian shift families N::efficiency.LAN LAN.defn In Chapter 3, pointwise Taylor series expansion gave quadratic approximations to to criterion functions

More information

Generalized nonparametric tests for one-sample location problem based on sub-samples

Generalized nonparametric tests for one-sample location problem based on sub-samples ProbStat Forum, Volume 5, October 212, Pages 112 123 ISSN 974-3235 ProbStat Forum is an e-journal. For details please visit www.probstat.org.in Generalized nonparametric tests for one-sample location problem

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Stat 5101 Notes: Algorithms

Stat 5101 Notes: Algorithms Stat 5101 Notes: Algorithms Charles J. Geyer January 22, 2016 Contents 1 Calculating an Expectation or a Probability 3 1.1 From a PMF........................... 3 1.2 From a PDF...........................

More information

Limiting Distributions

Limiting Distributions Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the

More information

Classical Inference for Gaussian Linear Models

Classical Inference for Gaussian Linear Models Classical Inference for Gaussian Linear Models Surya Tokdar Gaussian linear model Data (z i, Y i ), i = 1,, n, dim(z i ) = p Model Y i = z T i Parameters: β R p, σ 2 > 0 Inference needed on η = a T β IID

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information