Asymptotic Tests and Likelihood Ratio Tests

Dennis D. Cox
Department of Statistics
Rice University
P. O. Box 1892
Houston, Texas
November 21,

1 Chapter 6, Section 5: Asymptotic Tests

Here, we consider asymptotic tests and, in particular, Wald tests, likelihood ratio tests, and score tests.

1.1 General Considerations; Wald Tests.

Definition 1.1 Let X_n denote the observation vector at stage n and consider a test of H_0: θ ∈ Θ_0 vs. H_1: θ ∈ Θ_1. A sequence of tests φ_n(X_n) is called an asymptotic (sequence of) level α test(s) of H_0 vs. H_1 if

   ∀ θ ∈ Θ_0, limsup_{n→∞} E_θ[φ_n(X_n)] ≤ α,

that is,

   ∀ θ ∈ Θ_0, ∀ ǫ > 0, ∃ N such that ∀ n ≥ N, E_θ[φ_n(X_n)] < α + ǫ.    (1)

Example 1.1 Let X_1, X_2, ... be i.i.d. with finite second moments, and let µ = E[X] and σ² = Var[X]. Suppose we want to test H_0: µ = µ_0 vs. H_1: µ ≠ µ_0, where µ_0 is given. Letting X̄_n and S_n be the sample mean and sample standard deviation based on a sample of size n, we know from previous results that

   (X̄_n − µ) / (S_n/√n) →D N(0, 1).

If H_0 is true, then, letting

   T_n = (X̄_n − µ_0) / (S_n/√n),

we have

   P[ T_n < −z_{α/2} or T_n > z_{α/2} ] → α,

so rejecting H_0 when T_n < −z_{α/2} or T_n > z_{α/2} gives an asymptotic level α test.
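To make the recipe concrete, here is a minimal R sketch of this test (the function asymp.mean.test and its inputs are hypothetical names for illustration):

asymp.mean.test = function(x, mu0, alpha = 0.05) {
   # asymptotic level-alpha test of H0: mu = mu0 based on
   # T_n = (Xbar_n - mu0)/(S_n/sqrt(n)) and normal critical values
   n = length(x)
   Tn = (mean(x) - mu0)/(sd(x)/sqrt(n))
   zcrit = qnorm(1 - alpha/2)
   list(statistic = Tn, reject = (Tn < -zcrit | Tn > zcrit),
      p.value = 2*pnorm(-abs(Tn)))
}

For example, asymp.mean.test(rexp(100), mu0 = 1) simulates a case where H_0 holds, and for large n the test should reject roughly 5% of the time.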

It is important to recognize that an asymptotic level α test may be nowhere near level α for any sample size. In fact, for the test in the previous example, it follows from a theorem due to Bahadur (ref???) that the size of the test is 1 for all n. The main thing to remember is that the N in (1) may depend on θ ∈ Θ_0.

The test statistic of the previous example is in fact a special instance of a more general method. Suppose we want to test H_0: g(θ) = γ_0, where g(θ) is some estimand. Let γ̂_n be a consistent and asymptotically normal estimator with

   √n (γ̂_n − g(θ)) →D N(0, σ²(θ)),

if θ is the true value of the parameter. Let S_n be a consistent estimator of σ(θ). Then, by Slutsky's theorem, under H_0,

   T_n = (γ̂_n − γ_0) / (S_n/√n) →D N(0, 1).

Thus, rejecting H_0 when T_n < −z_{α/2} or T_n > z_{α/2} gives an asymptotic level α test. In general, if we use the MLE for γ̂_n, i.e. g(θ̂_n) where θ̂_n is the MLE of θ, and the consistent estimator of the variance ∇g(θ)^t I^{−1}(θ) ∇g(θ) of the asymptotic normal distribution is obtained by plugging in θ̂_n, then the test is referred to as a Wald test.
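As a one-parameter illustration of this recipe (a sketch with a hypothetical function name): for X_1, ..., X_n i.i.d. Expo(θ) with rate θ, the MLE is θ̂ = 1/X̄, the Fisher information is I_n(θ) = n/θ², and plugging θ̂ into the asymptotic variance gives the standard error θ̂/√n.

wald.expo.test = function(x, theta0, alpha = 0.05) {
   # Wald test of H0: theta = theta0 for i.i.d. Expo(rate = theta) data;
   # MLE: thetahat = 1/mean(x); plug-in standard error: thetahat/sqrt(n)
   n = length(x)
   thetahat = 1/mean(x)
   z = (thetahat - theta0)/(thetahat/sqrt(n))
   list(statistic = z, reject = abs(z) > qnorm(1 - alpha/2))
}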

Example 1.2 Let X_i ~ B(n_i, p_i), i = 1, 2, be independent, and suppose we want to test H_0: p_1 = p_2 vs. H_1: p_1 ≠ p_2. Note that the null hypothesis can be restated as g(p_1, p_2) = p_1 − p_2 = 0. The MLE of p_i is

   p̂_i = X_i / n_i,    (2)

so the MLE of g(p_1, p_2) is γ̂ = p̂_1 − p̂_2. As this is a two sample problem, the asymptotics require a little consideration. It is not covered by our previous theory, since there we assumed we had a single i.i.d. sample. In particular, we will need to consider whether it is sufficient for both sample sizes n_1 and n_2 to go to ∞, or whether they have to be tied together somehow. We have

   √n_i (p̂_i − p_i) →D N(0, p_i(1 − p_i)),

by the CLT. We will rewrite this as

   p̂_i =D p_i + Z_i √(p_i(1 − p_i)/n_i) + o_P(n_i^{−1/2}), i = 1, 2, Z_i i.i.d. N(0, 1).

This follows from Skorohod's theorem: for each i we can find Y_in =D √n_i (p̂_i − p_i) such that Y_in →P √(p_i(1 − p_i)) Z_i, and we may take (Z_i, Y_{i1}, Y_{i2}, ...), i = 1, 2, independent, so that in fact

   (Y_{1n}, Y_{2n}) =D (√n_1 [p̂_1 − p_1], √n_2 [p̂_2 − p_2]).

Thus,

   γ̂ = p̂_1 − p̂_2
     =D p_1 − p_2 + Z_1 √(p_1(1 − p_1)/n_1) − Z_2 √(p_2(1 − p_2)/n_2) + o_P(min{n_1, n_2}^{−1/2})
     =D p_1 − p_2 + Z √(p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2) + o_P(min{n_1, n_2}^{−1/2}),

where Z ~ N(0, 1). Note that

   √(p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2) ≥ √(min{p_1(1 − p_1), p_2(1 − p_2)}) min{n_1, n_2}^{−1/2},    (3)

so the o_P(min{n_1, n_2}^{−1/2}) term is negligible relative to the leading standard deviation. Hence, we have

   (γ̂ − (p_1 − p_2)) / √(p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2) →D N(0, 1), as min{n_1, n_2} → ∞.    (4)

Now (p̂_1, p̂_2) →P (p_1, p_2), so by the Continuous Mapping Principle,

   (p̂_1(1 − p̂_1), p̂_2(1 − p̂_2)) →P (p_1(1 − p_1), p_2(1 − p_2)) = (v_1, v_2).

Thus,

   [p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2] / [p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2]
     = 1 + { [p̂_1(1 − p̂_1) − p_1(1 − p_1)]/n_1 + [p̂_2(1 − p̂_2) − p_2(1 − p_2)]/n_2 } / [v_1/n_1 + v_2/n_2]
     = 1 + [o_P(1)/n_1 + o_P(1)/n_2] / [v_1/n_1 + v_2/n_2].

At this point, if we don't recall our original objective, we will be obliged to introduce max{n_1, n_2} to estimate the denominator. However, recall that we only need to establish the convergence result under H_0: p_1 = p_2. So, letting the common values be denoted p = p_1 = p_2 and v = v_1 = v_2 = p(1 − p), and continuing with the computation from above, we have

   [o_P(1)/n_1 + o_P(1)/n_2] / [v_1/n_1 + v_2/n_2]
     = [o_P(1)/n_1 + o_P(1)/n_2] / [v (1/n_1 + 1/n_2)]
     ≤ [o_P(1)/n_1 + o_P(1)/n_2] / [v / min{n_1, n_2}]   (since dropping a term makes the denominator smaller)
     ≤ [o_P(1) / min{n_1, n_2}] / [v / min{n_1, n_2}]
     = o_P(1)/v = o_P(1).

Thus, as long as

   min{n_1, n_2} → ∞,    (5)

then

   [p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2] / [p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2] →P 1.    (6)

With this result, the Continuous Mapping Principle (applied to the function φ(x) = x^{1/2}, x > 0), Slutsky's Theorem, and (4), we have

   (γ̂ − (p_1 − p_2)) / √(p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2)
     = √( [p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2] / [p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2] )
       × (γ̂ − (p_1 − p_2)) / √(p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2)
     →D 1 · N(0, 1) = N(0, 1).    (7)

Letting

   T = (p̂_1 − p̂_2) / √(p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2),    (8)

we obtain an asymptotic level α test by rejecting H_0 if

   T < −z_{α/2} or T > z_{α/2}.    (9)

In practice, as long as both sample sizes are reasonably large (say, min{n_1, n_2} ≥ 30) and the true p_i's are not too close to 0 or 1, the test should be reasonably accurate. An assessment of the accuracy for given n_1, n_2, and p = p_1 = p_2 is presented in Figure 1. To describe the figure, we did two sets of 1,000 Monte Carlo trials. In the first set, we used p_1 = p_2 = 0.2 and n_1 = n_2 = 20. The results of this simulation are shown in the top two plots in the figure. To produce these plots, we took the values of the Wald statistic for each run and converted them to approximate p-values using the asymptotic normal distribution. If the approximation is accurate, then the sorted p-values should look like the order statistics of a Unif(0, 1) distribution. We have plotted all of the sorted p-values vs. the expected values of the order statistics of a Unif(0, 1) distribution. If U_(i) is the i-th order statistic of N i.i.d. Unif(0, 1) random variables, then one can show E[U_(i)] = i/(N + 1). We have also put in the reference line y = x. If the p-values are accurate, they should fall on this line, except for random variation.

In the upper left panel, we have plotted all of the sorted p-values. We see that the observed p-values tend to fall a little above the line, generally, except at the end. Note that a vertical line of p-values indicates a point where many values were tied. Since the original data were binomial, the test statistic, and hence the p-value, is discrete, and where there is a discrete lump of probability mass with a large probability, we expect to see a lot of tied values. The sort function will just list all the tied values in a big block, which gives a vertical line in the plot. The upper right panel is more useful: it shows a blow-up for the observed (simulated) p-values which are ≤ 0.10, which is generally the range where we might consider a p-value interesting or significant. Because α = 0.05 is the usual level of significance, we have shown it here as a vertical line. Thus, we would reject H_0 for p-values ≤ 0.05.

[Figure 1: Plots of empirical quantiles of p-values computed for the Wald test of H_0: p_1 = p_2 in the two sample Binomial setting. Top row: full-range and blow-up plots for n_1 = n_2 = 20, p_1 = p_2 = 0.2; bottom row: the same for n_1 = n_2 = 50, p_1 = p_2 = 0.2. Axes: Theoretical Quantiles vs. sorted p-values.]

Looking at the p-values ≤ 0.05, we see that a little over 0.08 of them are ≤ 0.05, and since we simulated under one of the parameter values in the null hypothesis (p_1 = p_2 = 0.2), we would falsely reject H_0 about 8% of the time. Thus, for a nominal level of 0.05, the size of the test is at least 0.08. The bottom two plots in the figure show the same results for 1,000 simulations with n_1 = n_2 = 50 and p_1 = p_2 = 0.2. Here, we see that the Unif(0, 1) Q-Q (quantile-quantile) plot is much better. In particular, we see that with a nominal α = 0.05, the actual type I error probability for this null value is only slightly above 0.05.
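The Q-Q construction used in these figures is easy to reproduce. A minimal sketch (pvals is a hypothetical vector of Monte Carlo p-values computed under H_0; the complete simulation and plotting code appears in the appendix):

qq.pvals = function(pvals) {
   # uniform Q-Q plot of simulated p-values, as in Figure 1
   N = length(pvals)
   theo = (1:N)/(N + 1)   # E[U_(i)] = i/(N+1) for Unif(0,1) order statistics
   plot(theo, sort(pvals), xlab="Theoretical Quantiles", ylab="sorted p-values")
   abline(0,1)            # reference line y = x
}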

Interestingly, if we compare the results from the two different sample sizes, it looks like for p-values < 0.01, the smaller sample size is actually more accurate than the larger sample size. We can speculate that this is due to the discreteness of the observations.

In general, if you have a practical problem, and you want to know how accurate the p-value is for your problem, a reasonable way of assessing this is to simulate from the Maximum Likelihood Estimate under H_0, using the same design (sample sizes, in this case), convert the simulated test statistics to p-values, and look in the range of p-values you have gotten. If you see that the theoretical quantiles of the simulated p-values are somewhat above their nominal level, then it suggests that a correction is in order.

Another thought that comes to mind is to replace our standard error estimator in the denominator of T by an estimator under H_0, i.e. to use

   T = (p̂_1 − p̂_2) / √( p̂(1 − p̂)[1/n_1 + 1/n_2] ), where p̂ = (X_1 + X_2)/(n_1 + n_2)    (10)

is the so-called pooled estimate of p, since it comes from pooling the samples of Bernoulli trials. If H_0 is true, we should have a better estimate of the standard error by pooling, so it intuitively makes some sense to do this. We shall return to this test later.

One issue with Wald tests is that they are not unique: there is usually more than one estimand g(θ) which gives the same null hypothesis. For example, to test H_0: p_1 = p_2 in the two sample Binomial model, we could use g(p_1, p_2) = p_1/p_2 and test the null hypothesis g(p_1, p_2) = 1.

1.2 Likelihood Ratio Tests.

The Likelihood Ratio Test (LRT) for a testing problem is easy to define. We shall try to motivate it with some intuition, which suggests that the intuitive definition is probably

wrong, but in the cases where the standard theory applies, it is nearly certainly OK. Consider a general hypothesis testing problem H_0: θ ∈ Θ_0 vs. H_1: θ ∈ Θ_1, where {Θ_0, Θ_1} is a partition of the parameter space Θ (i.e., Θ_0 ∪ Θ_1 = Θ and Θ_0 ∩ Θ_1 = ∅). If both Θ_0 and Θ_1 were singleton sets, then we would apply Neyman-Pearson and have an optimal level α test. When one or both are composite, life is not so simple. However, suppose we could reduce each of the Θ_i's to singleton sets by selecting our best guess under the assumption that θ ∈ Θ_i, and then apply Neyman-Pearson to these best guesses. If our method of selecting the best guess were a good one, then it makes sense that our test might not be optimal, but it should be close. Unfortunately, our best guess will depend on the data, so we may have some problem with getting a single distribution to use under H_0 to select a critical value.

There are then two problems with this idea: (1) what is our best guess; and (2) how do we get a critical value to achieve the desired level of significance. The first problem is relatively easy: use a good point estimate of the parameter under the assumption θ ∈ Θ_i, i = 0, 1. The most general (frequentist) method for getting good point estimates is maximum likelihood. Its justification is primarily asymptotic, and the best we can hope for in solving problem (2) is that we can get an asymptotic level α test. In a remarkable tour-de-force of statistical asymptotics, Wilks (ref???) showed that this choice of point estimate ("best guess") in fact leads to a single asymptotic distribution, under any θ ∈ Θ_0, provided Θ_0 satisfies regularity conditions. [Type in some more motivation and examples already given in lecture. In particular, the difference between the motivated test stat and the usual form???]

The intuition above leads to the statistic

   Y = sup_{θ∈Θ_1} f_θ(X) / sup_{θ∈Θ_0} f_θ(X),    (11)

and we would reject for large values of Y. The usual form of the LRT statistic is

   λ = −2 log [ sup_{θ∈Θ_0} f_θ(X) / sup_{θ∈Θ} f_θ(X) ].    (12)

Again, we reject for large values of λ. Note that λ ≥ 0 always. In general, we will let

   L_n(θ) = log f_θ(X_n),    (13)
   θ̂_n = argmax_{θ∈Θ} L_n(θ),    (14)
   θ̂_{0n} = argmax_{θ∈Θ_0} L_n(θ),    (15)
   θ̂_{1n} = argmax_{θ∈Θ_1} L_n(θ),    (16)

denote the log likelihood, the full MLE, the MLE under H_0, and the MLE under H_1, all for the n-th entry in the sequence of experiments (e.g., for sample size n under i.i.d. sampling). If the sample size is not important, we will drop the n. We shall also use the same subscripting notation if the parameter (or a component thereof) is denoted by some other symbol than θ. [Needs more discussion???]

Example 1.3 Let X_1, X_2, ..., X_n be i.i.d. N(µ, σ²) where both µ and σ² are unknown. We first consider testing

   H_0: µ = µ_0 vs. H_1: µ ≠ µ_0.    (18)

Letting θ = (µ, σ²), the various MLEs are

   µ̂ = X̄ = (1/n) Σ_{i=1}^n X_i,   σ̂² = (1/n) Σ_{i=1}^n (X_i − X̄)²,
   µ̂_0 = µ_0,   σ̂_0² = (1/n) Σ_{i=1}^n (X_i − µ_0)²,
   µ̂_1 = X̄ a.s.,   σ̂_1² = (1/n) Σ_{i=1}^n (X_i − X̄)² a.s.

Note that in this case, the two forms of the LRT statistic are essentially equivalent since the MLEs are essentially the same under H_1 and unrestricted. This is typically the case under the regularity conditions for which the asymptotic χ² distribution of λ holds, at least with probability approaching 1. Now, the maximized likelihood is

   L(µ̂, σ̂²) = (2π)^{−n/2} (σ̂²)^{−n/2} exp[ −(1/(2σ̂²)) Σ_{i=1}^n (X_i − µ̂)² ]
             = (2π)^{−n/2} (σ̂²)^{−n/2} exp[−n/2].    (19)

This is the typical situation for maximized normal likelihoods (something the student should commit to memory). Under the restriction of H_0:

   L(µ̂_0, σ̂_0²) = (2π)^{−n/2} (σ̂_0²)^{−n/2} exp[−n/2].

Thus, the two forms of the LRT statistic are

   Y = (σ̂_0²/σ̂²)^{n/2},   λ = n log(σ̂_0²/σ̂²).

Rejecting H_0 for large values of either of these test statistics is equivalent to rejecting for large values of

   σ̂_0²/σ̂² = Σ_i (X_i − µ_0)² / Σ_i (X_i − X̄)²
           = Σ_i (X_i − X̄ + X̄ − µ_0)² / Σ_i (X_i − X̄)²
           = [ Σ_i (X_i − X̄)² + n(X̄ − µ_0)² ] / Σ_i (X_i − X̄)²
           = 1 + T²/(n − 1),

where

   T = (X̄ − µ_0) / (S/√n),   S² = (1/(n − 1)) Σ_i (X_i − X̄)²,

is Student's t-statistic.
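This reduction is easy to check numerically; a quick sketch with simulated (hypothetical) data:

# check that sigmahat0^2/sigmahat^2 = 1 + T^2/(n-1)
set.seed(1)
n = 25; mu0 = 0
x = rnorm(n, mean = 0.3, sd = 2)
sig2hat = mean((x - mean(x))^2)            # unrestricted MLE of sigma^2
sig2hat0 = mean((x - mu0)^2)               # MLE of sigma^2 under H0: mu = mu0
Tstat = (mean(x) - mu0)/(sd(x)/sqrt(n))    # Student's t statistic
sig2hat0/sig2hat                           # these two numbers agree,
1 + Tstat^2/(n - 1)                        # so lambda = n*log(1 + T^2/(n-1))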

Now, we can reject for large values of T², or |T|, or simply reject for both large and small values, i.e.,

   Reject H_0 if T < −t_{(n−1),α/2} or T > t_{(n−1),α/2}.

Thus, we see that the LRT is equivalent to the classical two sided t-test, which is in fact a UMP unbiased test.

Now suppose we change to a one-sided test, e.g.

   H_0: µ ≤ µ_0 vs. H_1: µ > µ_0.

A difference in the two forms of the LRT statistic will emerge in this case. The full MLE remains the same, but

   µ̂_0 = µ_0 if X̄ ≥ µ_0, µ̂_0 = X̄ if X̄ < µ_0,   σ̂_0² = (1/n) Σ_{i=1}^n (X_i − µ̂_0)²,
   µ̂_1 = X̄ if X̄ > µ_0, µ̂_1 = µ_0 if X̄ ≤ µ_0,   σ̂_1² = (1/n) Σ_{i=1}^n (X_i − µ̂_1)².

(Technically speaking, µ̂_1 = µ_0 is not allowed since Θ_1 = {(µ, σ²) : µ > µ_0}, but this is where the likelihood over the closure of Θ_1 is maximized, so plugging it in gives us sup_{θ∈Θ_1} L(θ).)

Notice that the full MLE satisfies µ̂ = µ̂_0 whenever X̄ ≤ µ_0. If this happens, we will have the λ LRT statistic equal to 0, and this happens with probability 1/2 if the true µ = µ_0 (with greater probability for µ < µ_0). It is only a problem if we want to use a level of significance α > 1/2, which in practice is never an issue. Thus, ignoring this little difficulty for large significance levels (i.e., assuming α ≤ 1/2), we will reject H_0 in either case if X̄ is larger than µ_0 and the square of Student's t statistic T² is too large. Equivalently, we may specify our rejection region as

   reject H_0 if T > t_{(n−1),α},

where T has the same definition as in the two sided testing problem.
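In R, this one-sided LRT is just the classical one-sided t-test; a usage sketch with hypothetical data and null value:

x = rnorm(30, mean = 0.5)                    # hypothetical data
t.test(x, mu = 0, alternative = "greater")   # H0: mu <= 0 vs. H1: mu > 0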

The above example illustrates some facts about the LRT principle for deriving a test statistic. Firstly, although we are primarily considering it as a statistic for an asymptotic test, in many situations its exact distribution can be derived, and of course then we should use that rather than an asymptotic approximation. Secondly, when there is a good or even best test, the LRT will typically give the same test.

The asymptotic distribution result we will state holds for a special class of null hypotheses. Suppose Θ ⊂ IR^p is open. We will consider a test of

   H_0: g(θ) = γ_0 vs. H_1: g(θ) ≠ γ_0,    (20)

where

   g: Θ ⊂ IR^p → IR^k, k ≤ p.    (21)

Note that this null hypothesis amounts to saying that θ satisfies k (possibly nonlinear) constraints. For a null hypothesis as in (20), if g satisfies certain regularity conditions, then

   Θ_0 = {θ : g(θ) = γ_0}

is a smooth (p − k)-dimensional manifold. If p = 2 and k = 1, then Θ_0 is a curve; if p = 3 and k = 1, then Θ_0 is a surface; if p = 3 and k = 2, then Θ_0 is a curve. To say the manifold is smooth means that if one magnifies a small piece of the manifold, it looks like a linear manifold. More discussion is given in Subsection 1.4.

Example 1.4 We return to the two sample Binomial model of the previous example and derive the LRT. For convenience, let

   b(x; n, p) = (n choose x) p^x (1 − p)^{n−x}, x = 0, 1, ..., n,

denote the B(n, p) p.m.f. Then

   λ = −2 log [ b(X_1; n_1, p̂) b(X_2; n_2, p̂) / ( b(X_1; n_1, p̂_1) b(X_2; n_2, p̂_2) ) ],

where p̂ and the p̂_i are given in (10) and (2). While some simplification in the formula for λ is possible, it does not reduce to a particularly useful expression.

[Figure 2: Plots of empirical quantiles of p-values computed for the LRT of H_0: p_1 = p_2 in the two sample Binomial setting. Top row: full-range and blow-up plots for n_1 = n_2 = 20, p_1 = p_2 = 0.2; bottom row: the same for n_1 = n_2 = 50, p_1 = p_2 = 0.2. Axes: Theoretical Quantiles vs. sorted p-values.]

1.3 The Score Test

We now introduce yet another test, whose justification is primarily asymptotic. The score test is harder to justify from an intuitive point of view than the LRT; nonetheless, it is asymptotically equivalent to the LRT, and often gives a test statistic with a simpler formula, often similar to a Wald test. The test statistic for the score test is based on the derivative of the (full) log likelihood evaluated at the MLE under H_0. If θ_0 denotes the true value of the parameter, then

   E[∇L_n(θ_0)] = 0,    (22)
   Cov[∇L_n(θ_0)] = I_n(θ_0).    (23)

Since θ̂_{0n} is a consistent estimator of θ_0 under H_0, we have under the regularity conditions for asymptotic normality of the MLE that

   I_n(θ_0)^{−1/2} ∇L_n(θ_0) →D N(0, I).

If we evaluate this at the full MLE θ̂_n, then the same result will hold, but when evaluated at the MLE θ̂_{0n} under H_0, the resulting asymptotic normal distribution will clearly be singular (since θ̂_{0n} is constrained to lie on the manifold Θ_0, which is locally approximately a (p − k)-dimensional linear manifold). However, one can show that under the same regularity conditions that give the asymptotic χ²_k distribution for λ_n, the score statistic

   S_n = ∇L_n(θ̂_{0n})^t I_n(θ̂_{0n})^{−1} ∇L_n(θ̂_{0n}),    (24)

is asymptotically equivalent to λ_n, the LRT statistic. Hence,

   S_n →D χ²_k, under H_0.    (25)

The main reason for preferring the score statistic over the LRT is that in many cases it is easier to compute, and often yields simpler formulae. To compute the score statistic S, one need only find θ̂_{0n}, whereas computation of λ requires the full MLE θ̂_n as well as the null MLE.
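As a simple illustration of (24) with a simple null hypothesis (so that θ̂_{0n} = θ_0 requires no optimization), take X_1, ..., X_n i.i.d. Poisson(λ) and H_0: λ = λ_0: the score is L′_n(λ_0) = Σ X_i/λ_0 − n and I_n(λ_0) = n/λ_0, so (24) reduces to S_n = n(X̄ − λ_0)²/λ_0. A minimal R sketch (hypothetical function name):

score.pois.test = function(x, lambda0, alpha = 0.05) {
   # score test of H0: lambda = lambda0 for i.i.d. Poisson data;
   # S = (score)^2 / I_n(lambda0) = n*(xbar - lambda0)^2/lambda0,
   # which is asymptotically chi-square with 1 degree of freedom
   n = length(x)
   S = n*(mean(x) - lambda0)^2/lambda0
   list(statistic = S, reject = S > qchisq(1 - alpha, 1),
      p.value = 1 - pchisq(S, 1))
}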

Example 1.5 Let us return to the two sample binomial problem (X_i ~ B(n_i, p_i) independent, i = 1, 2), and consider the test of H_0: p_1 = p_2. The log likelihood (except for unimportant constants that don't depend on (p_1, p_2)) is given by

   L(p_1, p_2) = x_1 log p_1 + (n_1 − x_1) log(1 − p_1) + x_2 log p_2 + (n_2 − x_2) log(1 − p_2),

and differentiating, we obtain

   ∇L(p_1, p_2) = ( x_1/p_1 − (n_1 − x_1)/(1 − p_1), x_2/p_2 − (n_2 − x_2)/(1 − p_2) )^t,

   D²L(p_1, p_2) = −diag( x_1/p_1² + (n_1 − x_1)/(1 − p_1)², x_2/p_2² + (n_2 − x_2)/(1 − p_2)² ),

   I(p_1, p_2) = diag( n_1/(p_1(1 − p_1)), n_2/(p_2(1 − p_2)) ).

Plugging into the general formula for the score statistic, with the null MLE (p̂, p̂) and p̂ as in (10), we obtain

   S = (x_1/n_1 − p̂)² / ( p̂(1 − p̂)/n_1 ) + (x_2/n_2 − p̂)² / ( p̂(1 − p̂)/n_2 ).

One can show with a little algebra that this reduces to

   S = (p̂_1 − p̂_2)² / ( p̂(1 − p̂)[1/n_1 + 1/n_2] ).    (26)

Note that the constraint is k = 1 dimensional, so we reject H_0 if S > χ²_{1,α}.

In Figure 3 we show the results of the level study for the score statistic under conditions identical to the previous two simulation studies. One sees that the results are indistinguishable from the LRT, even in the smaller sample size case. As the score test in this setting is easier to motivate (use the Wald test with a pooled estimate for the standard error), it is more widely used, especially by nonstatisticians. Note that one can use the signed square root of S, namely

   T = (p̂_1 − p̂_2) / √( p̂(1 − p̂)[1/n_1 + 1/n_2] ),

for this problem as well.
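Both the reduction in (26) and its agreement with standard software are easy to check numerically; a sketch with hypothetical counts (base R's prop.test, without continuity correction, reports exactly this chi-squared statistic):

x1 = 12; n1 = 50; x2 = 21; n2 = 60       # hypothetical counts
p1hat = x1/n1; p2hat = x2/n2
phat = (x1 + x2)/(n1 + n2)
S.direct = (n1*(p1hat - phat)^2 + n2*(p2hat - phat)^2)/(phat*(1 - phat))
S.reduced = (p1hat - p2hat)^2/(phat*(1 - phat)*(1/n1 + 1/n2))
c(S.direct, S.reduced)                                        # the two forms agree
prop.test(c(x1, x2), c(n1, n2), correct = FALSE)$statistic    # same value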

[Figure 3: Plots of empirical quantiles of p-values computed for the Score Test of H_0: p_1 = p_2 in the two sample Binomial setting. Top row: full-range and blow-up plots for n_1 = n_2 = 20, p_1 = p_2 = 0.2; bottom row: the same for n_1 = n_2 = 50, p_1 = p_2 = 0.2. Axes: Theoretical Quantiles vs. sorted p-values.]

The asymptotic null distribution of T is N(0, 1). We reject the (two sided) null hypothesis H_0: p_1 = p_2 if either T > z_{α/2} or T < −z_{α/2}. This test statistic can also be used to test one sided hypotheses, such as H_0: p_1 ≤ p_2, with a corresponding one sided rejection region. In general, this test statistic gives a better approximation to the level α constraint than the corresponding Wald test with the statistic in (8).

1.4 Proofs of Asymptotic Distribution Results.

We begin by stating an important theorem from advanced calculus. Unfortunately, it takes a little further argument to show that the full rank condition on the derivative of the estimand is enough to give that the null hypothesis set is locally a smooth manifold.

Theorem 1.1 (Implicit Function Theorem) Consider a continuously differentiable map g: A → IR^k, where A ⊂ IR^p is open and k ≤ p. Let (a, b) ∈ A with a ∈ IR^k and b ∈ IR^{p−k}, and let g(a, b) = y. Assume further that det(D) ≠ 0, where D is the k × k matrix

   D_ij = ∂g_i/∂x_j (a, b), 1 ≤ i, j ≤ k.

Then there exist a neighborhood W of b in IR^{p−k} and a unique continuously differentiable function h: W → IR^k such that h(b) = a and g(h(t), t) = y for all t ∈ W.

Essentially, the theorem says that if the matrix of partials of a k-dimensional-valued function with respect to its first k variables is nonsingular, then, given the k-dimensional value y of the function, we can locally solve for the first k variables as a continuously differentiable (hence locally almost linear) function of the remaining p − k variables.

Now, if the function g: A → IR^k, where A ⊂ IR^p is open, simply has rank(Dg(x)) = k for all x ∈ A, then at any point x_0 there are exactly k linearly independent columns in the k × p matrix Dg(x_0), and we can rearrange the variables so that the first k columns are linearly independent.

Returning to our null hypothesis H_0: g(θ) = γ_0, the set of points satisfying this constraint is

   Θ_0 = {θ ∈ Θ : g(θ) = γ_0}.

Assume that there is an open set A with Θ_0 ⊂ A such that Dg(θ) has rank k for all θ ∈ A. Then for any θ_0 ∈ Θ_0 we can find a neighborhood of θ_0 and k components of θ such that, on Θ_0, those k components are continuously differentiable functions of the remaining p − k components. Hence, within this neighborhood, Θ_0 is the graph of a smooth function of the p − k free components, and this graph is almost linear in a small enough neighborhood.
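For instance, in the two sample Binomial problem with g(p_1, p_2) = p_1 − p_2 and γ_0 = 0 (so the dimension is p = 2 with k = 1), we have Dg(p_1, p_2) = (1, −1), which has rank 1 everywhere; solving g = 0 for the first coordinate gives the continuously differentiable function p_1 = h(p_2) = p_2, so Θ_0 = {(p, p) : 0 < p < 1} is a smooth 1-dimensional (that is, (p − k)-dimensional) manifold, here exactly linear.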

Corollary 1.2 Let Θ_0 = {θ ∈ Θ : g(θ) = γ_0} where g: Θ ⊂ IR^p → IR^k. Assume that there is an open set A with Θ_0 ⊂ A such that Dg(θ) has rank k for all θ ∈ A. Let θ_0 ∈ Θ_0 be given.

We begin by outlining the main steps in the proof in a nonrigorous manner, leaving the details to be filled in later.

1.5 Appendix: Listing of R Code.

We wrote 3 individual functions to compute the test statistics. Another function calls these test statistic functions in a for loop to generate the matrix of p-values. Finally, a script file calls the last function and generates the plots. The files are separated by comment lines. Continuation lines are indented 3 spaces.

Bin2wald <- function(x1,n1,x2,n2) {
   # computes Wald stat for testing equality of 2 binomial p's
   p1hat = x1/n1
   p2hat = x2/n2
   se = sqrt( p1hat*(1-p1hat)/n1 + p2hat*(1-p2hat)/n2 )
   z = (p1hat - p2hat)/se
   return(z)
}

Bin2lrt <- function(x1,n1,x2,n2) {
   # computes LRT stat for testing equality of 2 binomial p's

   p1hat = x1/n1
   p2hat = x2/n2
   phat = (x1+x2)/(n1+n2)
   num = dbinom(x1,n1,phat)*dbinom(x2,n2,phat)
   denom = dbinom(x1,n1,p1hat)*dbinom(x2,n2,p2hat)
   lambda = -2 * log( num/denom )
   return(lambda)
}

Bin2score <- function(x1,n1,x2,n2) {
   # computes score stat (signed square root) for testing equality of 2 binomial p's
   p1hat = x1/n1
   p2hat = x2/n2
   phat = (x1+x2)/(n1+n2)
   se = sqrt( phat*(1-phat)*(1/n1 + 1/n2) )
   z = (p1hat - p2hat)/se
   return(z)
}

Runsim = function(n1,p1,n2,p2,Nmc) {
   # simulates Nmc trials of 2 sample binomial x1 ~ B(n1,p1), x2 ~ B(n2,p2)
   # computes Wald, LRT, and score p-values for H0: p1 = p2 on each trial
   x1 = rbinom(Nmc,n1,p1)
   x2 = rbinom(Nmc,n2,p2)

   pvals = matrix(NA,nrow=Nmc,ncol=3)
   dimnames(pvals) = list(NULL,c("wald","lrt","score"))
   for(i in 1:Nmc) {
      zwald = Bin2wald(x1[i],n1,x2[i],n2)
      pwald = 2*pnorm(-abs(zwald))       # two-sided normal p-value
      lambda = Bin2lrt(x1[i],n1,x2[i],n2)
      plrt = 1-pchisq(lambda,1)          # chi-square(1) p-value
      zscore = Bin2score(x1[i],n1,x2[i],n2)
      pscore = 2*pnorm(-abs(zscore))     # two-sided normal p-value
      pvals[i,] = c(pwald,plrt,pscore)
   }
   return(pvals)
}

# script file to run simulations of the 3 tests for p1=p2
# under H0 and plot the results
p1 = .2
p2 = p1
n1a = 20
n2a = 20
nmc = 1000
pvalsa = Runsim(n1a,p1,n2a,p2,nmc)
for(j in 1:3) pvalsa[,j] = sort(pvalsa[,j])
#### 2nd simulation, same p, different n's
n1b = 50
n2b = 50

pvalsb = Runsim(n1b,p1,n2b,p2,nmc)
for(j in 1:3) pvalsb[,j] = sort(pvalsb[,j])
##############################################################################
# making plots
xpoints = (1:nmc)/(nmc+1)
m = round(nmc/10)
subtitlea = paste("n1 =",as.character(n1a)," n2 =",
   as.character(n2a)," p1=p2 =",as.character(p1))
subtitleb = paste("n1 =",as.character(n1b)," n2 =",
   as.character(n2b)," p1=p2 =",as.character(p1))
postscript("fig01.ps")
par(mfrow=c(2,2))
plot(xpoints,pvalsa[,1],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="P-values for Wald Test", sub=subtitlea)
abline(0,1)
plot(xpoints[1:m],pvalsa[1:m,1],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="Blow-up For Wald Test", sub=subtitlea)
abline(0,1)
plot(xpoints,pvalsb[,1],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="P-values for Wald Test", sub=subtitleb)
abline(0,1)
plot(xpoints[1:m],pvalsb[1:m,1],

   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="Blow-up For Wald Test", sub=subtitleb)
abline(0,1)
graphics.off()
postscript("fig02.ps")
par(mfrow=c(2,2))
plot(xpoints,pvalsa[,2],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="P-values for LRT", sub=subtitlea)
abline(0,1)
plot(xpoints[1:m],pvalsa[1:m,2],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="Blow-up For LRT", sub=subtitlea)
abline(0,1)
plot(xpoints,pvalsb[,2],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="P-values for LRT", sub=subtitleb)
abline(0,1)
plot(xpoints[1:m],pvalsb[1:m,2],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="Blow-up For LRT", sub=subtitleb)
abline(0,1)

graphics.off()
postscript("fig03.ps")
par(mfrow=c(2,2))
plot(xpoints,pvalsa[,3],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="P-values for Score Test", sub=subtitlea)
abline(0,1)
plot(xpoints[1:m],pvalsa[1:m,3],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="Blow-up For Score Test", sub=subtitlea)
abline(0,1)
plot(xpoints,pvalsb[,3],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="P-values for Score Test", sub=subtitleb)
abline(0,1)
plot(xpoints[1:m],pvalsb[1:m,3],
   xlab="Theoretical Quantiles",ylab="sorted p-values",
   main="Blow-up For Score Test", sub=subtitleb)
abline(0,1)
graphics.off()

2 Exercises for Section 1

2.1 Show that for a one parameter exponential family, the LRT is equivalent to the UMP test of H_0: θ ≤ θ_0 vs. H_1: θ > θ_0, provided we either use the Y statistic in (11) or use reasonable values of α (asymptotically, α < 1/2).

2.2 Verify the steps leading up to (26).

2.3 Consider the two sample exponential model: X_ij ~ Expo(µ_i), 1 ≤ j ≤ n_i, all mutually independent. We want to test H_0: µ_1 = µ_2 vs. H_1: µ_1 ≠ µ_2.
(a) Derive a Wald test, the LRT, and the score test for this problem. You should give explicit formulae for each test statistic and for the critical region. Where possible, use a two sided region based on the N(0, 1) distribution rather than a χ² distribution.
(b) Perform a level study similar to the one given in the text for the two sample binomial setting to compare how well the test statistics achieve the level α constraint for 0 < α < 1.
(c) Verify directly that the LRT and score tests are asymptotically equivalent.
