Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018


Sampling A trait is measured on each member of a population. f(y) = propn of individuals in the popn with measurement y; P = the probability distn which assigns probability f(y) to y. The trait value of an individual randomly selected from the popn is a random variable Y with distn P. The mean and variance of the trait in the population are m = Σ y f(y) and σ² = Σ (y − m)² f(y). These are also the mean and variance of the random variable Y.

Summary statistics A random sample of size n drawn from the popn generates a sequence of observations Y₁, ..., Yₙ. If the size of the sample is much less than the size of the popn, these can be treated as independent random variables, each with probability distn P. The observations are random variables, so we can calculate the sampling distn of any summary statistic (sample mean, median, variance, range, etc.).

A sampling distribution The sample mean Ȳ might be used to estimate the population mean m. Sampling distn of Ȳ: E(Ȳ) = m and var(Ȳ) = σ²/n, where n is the sample size. It can be shown that if the distn P of the trait in the popn is normal, the distn of Ȳ is normal, i.e. Ȳ ~ N(m, σ²/n). When n is large, this will be approximately true even when the distn in the popn is not normal (central limit theorem).
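As a quick R sketch (assuming hypothetical values m = 50, σ = 10 and n = 25), the properties E(Ȳ) = m and var(Ȳ) = σ²/n can be checked by simulation:

set.seed(1)
m <- 50; sigma <- 10; n <- 25
# 10000 simulated sample means, each from a sample of size n
ybar <- replicate(10000, mean(rnorm(n, mean = m, sd = sigma)))
mean(ybar)   # close to m = 50
var(ybar)    # close to sigma^2/n = 4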

Another sampling distn A binary trait takes one of two possible values, e.g. eyes are blue, eyes are not blue. Let Yᵢ = 1 if the i-th member of the sample has blue eyes, otherwise Yᵢ = 0. Then Ȳ is the proportion of the sample with blue eyes, and the sampling distn of Ȳ is a scaled binomial: Pr(Ȳ = y/n) = (n choose y) m^y (1 − m)^(n−y), for y = 0, 1, ..., n, where m is the population mean, i.e. the proportion of the population with blue eyes.
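For a small sketch of this sampling distn in R (assuming a hypothetical popn proportion m = 0.3 and sample size n = 10), dbinom( ) gives the probabilities directly:

m <- 0.3; n <- 10
y <- 0:n
prob <- dbinom(y, size = n, prob = m)   # Pr(n * Ybar = y)
names(prob) <- y / n                    # the possible values of Ybar
round(prob, 3)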

Inference from sample to population A sample provides information about the popn from which it is drawn. For example, the sample mean Ȳ tells us something about the popn mean m. 1) Different samples give rise to different estimates. The value of the estimate cannot be predicted in advance, and we regard it as a random variable with a probability distribution (the sampling distribution of the estimator). 2) The sample estimate will differ from the population parameter, but if the sample is large enough, the estimate will be close to the true value with high probability.

Inference from sample to population Inference usually takes the form of a probability statement based on the sampling distn of the appropriate summary statistic. If the distn of Y in the popn is normal, the sampling distn of Ȳ is N(m, σ²/n). When σ² is known, a simple form of inference takes the form of a statement that the event |Ȳ − m| < kE occurs with high probability, where E = √(σ²/n) is the standard error of Ȳ and k is a suitable quantile of the standard normal distn.

Mean square error, bias, variance Here θ represents some feature of the popn, and T is a sample statistic. If T is used as an estimator of θ, the estimation error is T − θ. The mean squared error is E(T − θ)², which we would like to be as small as possible. Let m_T = E(T). The MSE can be split into two components: E(T − θ)² = E(T − m_T)² + (m_T − θ)², i.e. MSE = variance + bias².
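The decomposition can be checked empirically. As an illustrative R sketch (with made-up choices), take the estimator to be the variance estimator with divisor n, which is biased for θ = σ² = 1, applied to samples of size n = 10 from N(0, 1):

set.seed(1)
n <- 10; theta <- 1
# values of the estimator T over 20000 simulated samples (divisor n, not n - 1)
Tstat <- replicate(20000, { y <- rnorm(n); sum((y - mean(y))^2) / n })
mT <- mean(Tstat)
mean((Tstat - theta)^2)                 # empirical MSE
mean((Tstat - mT)^2) + (mT - theta)^2   # empirical variance + bias^2 (agrees with the MSE)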

The sample variance The sample variance S² = Σᵢ (Yᵢ − Ȳ)²/(n − 1) estimates the population variance σ². An unbiased estimator is obtained by using the divisor n − 1 (rather than n). The standard deviation (square root of the variance) of the sampling distribution of an estimator is called the standard error of the estimator. For example, the standard error of Ȳ is √(σ²/n). Usually the value of σ² is unknown, in which case the estimated standard error is calculated by replacing σ² in the formula by an estimate (e.g. the sample variance): estimated se(Ȳ) = √(S²/n).
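In R, S² and the estimated standard error can be computed directly. A small sketch with made-up data:

y  <- c(12, 15, 9, 14, 11, 13)         # hypothetical measurements
n  <- length(y)
S2 <- sum((y - mean(y))^2) / (n - 1)   # the same value as var(y)
sqrt(S2 / n)                           # estimated se(Ybar)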

The t distribution The sample mean Ȳ is distributed N(m, σ²/n). The standardised value √n (Ȳ − m)/σ has an N(0,1) distribution. Replacing σ by S (the square root of the sample variance S²) changes the distn: √n (Ȳ − m)/S has a t distn with n − 1 d.f. More generally, if Z is N(0, σ²), and S² is an estimate of σ² with f d.f., then Z/S has a t distn with f degrees of freedom. The t distn with few d.f. has thicker tails than the normal distn.
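The thicker tails can be seen by comparing quantiles in R:

qnorm(0.975)                       # upper 2.5% point of N(0,1): 1.960
qt(0.975, df = c(2, 5, 10, 30))    # the t quantiles shrink towards 1.960 as d.f. increase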

Studentization The form of inference described earlier requires that we know the value of σ². If this value is unknown, the procedure must be modified: 1) Replace the unknown value of σ² by an estimate, for example S² (the sample variance). 2) Take the quantile k from tables of the t distn (instead of the normal distn). The d.f. for t are those associated with the estimate of σ² (or the sum of squares on which it is based).

Hypothesis tests Test of a null hypothesis H₀ (about a parameter θ): 1) Choose a summary statistic T (typically, an estimator of θ). 2) Reject H₀ if T lies in C, where C is a subset of the values of T. C is the rejection region, chosen so that P(T in C) when H₀ is true is a small number α, called the significance level, or size, of the test. α is the probability of rejecting H₀ when it is true (type I error). The smaller the value of α, the more stringent the test. Failing to reject a false hypothesis is the type II error. The probability of rejecting a false hypothesis is called the power of the test.

Example of a hypothesis test Here we have a random sample from N(m, σ²), and use the sample mean Ȳ as a test statistic for hypotheses about m. The test rejects H₀: m = m₀ when |Ȳ − m₀|/E > k, where E = √(σ²/n) is the standard error of Ȳ and k is a suitable quantile of the standard normal distn (when σ² is known) or the t distn (when σ² is estimated). This test is sometimes called the z test (σ² known), or the one-sample t test (σ² estimated). It is a two-sided (two-tail) test: we reject H₀ if either Ȳ > m₀ + kE or Ȳ < m₀ − kE.
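As an R sketch of the z test with made-up numbers (H₀: m = 100, σ = 15, n = 20, observed Ȳ = 108):

m0 <- 100; sigma <- 15; n <- 20; ybar <- 108
E <- sqrt(sigma^2 / n)       # standard error of Ybar
z <- (ybar - m0) / E         # test statistic
2 * pnorm(-abs(z))           # two-sided p value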

Level of significance By convention, certain values are used as guidelines: 0.05, 0.01, 0.001, representing increasing strength of evidence against H₀. The smaller the significance level, the stronger the evidence. The following descriptions of a significant result are suggested, although there is no general agreement on these:

Significance level   Conclusion when H₀ is rejected
0.05                 Evidence against H₀
0.01                 Strong evidence against H₀
0.001                Very strong evidence against H₀

The p value Given the observed value of the test statistic, the p value is the smallest α at which the test is significant. Equivalently, it is the probability, computed assuming H₀ is true, of obtaining a value of the test statistic at least as extreme as the one observed. The p value can be regarded as a measure of the strength of evidence against the hypothesis: the smaller the p value, the stronger the evidence, and the less we are inclined to believe that the hypothesis is true. The p value should not be interpreted as the probability that the hypothesis is true.
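For example, if the observed test statistic is t = 2.58 on 8 d.f. (a hypothetical value, which reappears in the matched-pairs example below), the two-sided p value in R is:

2 * pt(-abs(2.58), df = 8)   # approximately 0.03, so significant at the 0.05 level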

Confidence intervals A confidence interval tells us which values of the parameter are consistent with the data. In the case of inference about a normal mean, this is just a matter of rearranging one inequality (on the left) into another (on the right): m − kE < Ȳ < m + kE is equivalent to Ȳ − kE < m < Ȳ + kE. The first statement says that the random variable Ȳ lies between given limits. The second statement says that the random interval (Ȳ − kE, Ȳ + kE) includes the unknown value of m. The value of k is chosen so that the statement is true with a given probability (e.g. 0.95).
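As an R sketch (using the summary figures from the matched-pairs example later in these notes: Ȳ = 11.0, E = 4.262, 8 d.f.), the 95% interval is:

ybar <- 11.0; E <- 4.262; df <- 8
k <- qt(0.975, df)           # 2.306
ybar + c(-1, 1) * k * E      # roughly 1.2 to 20.8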

One-sample t-test Ȳ and S² are the sample mean and variance of a random sample of size n from N(m, σ²). The variance σ² is unknown, and it is required to test H₀: m = 0. The test statistic is T = Ȳ/E, where E = √(S²/n) is the estimated standard error of Ȳ. The null distn of the test statistic is the t distn with n − 1 d.f. H₀ is rejected if |T| > k, where k is a suitable quantile taken from tables of the t distn with n − 1 d.f. There are many other versions of the t test. This is the simplest.

Matched pairs experiment The one-sample t test is used to compare two treatments when observations consist of matched pairs. Nine twin pairs are chosen for the experiment. For each pair, one twin (chosen at random) is given the standard diet (control); the other is given the standard diet plus a food additive.

Pair        1   2   3   4   5   6   7   8   9
Difference  10  2   22  23  6   31  3   7   15

The one-sample t test is applied to the differences (treated minus control). These are regarded as a single sample from a normal distn with mean m. The null hypothesis is H₀: m = 0 (no treatment effect).

One-sample t test The nine differences for the matched-pairs experiment are listed above. The sum is 99, the mean is 11.0, and the uncorrected sum of squares is 10² + 2² + ... + 15² = 2397. The corrected sum of squares is 2397 − 99²/9 = 1308. The estimate of σ² is S² = 1308/8 = 163.5. The estimated s.e. of Ȳ is E = √(S²/9) = 4.262, and the t statistic is Ȳ/E = 2.58. The upper 0.025 point of the t distn with 8 d.f. is 2.306. The two-tail test is significant at the 0.05 level. There is some evidence that the food additive improves growth rate. A 95% confidence interval for the effect of the additive is 11.0 ± 2.306E (between 1.2 and 20.8 g/d).
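The same calculation can be reproduced in R from the summary statistics quoted above (with the raw differences, a single call t.test(differences) would give the same output):

n     <- 9
ybar  <- 11.0            # mean difference, treated minus control
S2    <- 1308 / 8        # corrected SSQ divided by n - 1
E     <- sqrt(S2 / n)    # estimated s.e. of the mean difference
tstat <- ybar / E        # 2.58
c(t = tstat, p = 2 * pt(-abs(tstat), df = n - 1))
ybar + c(-1, 1) * qt(0.975, n - 1) * E   # 95% confidence interval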

A neat way to set out the calculations Write down the ANOVA table:

Source     DF   SSQ    MSQ    F
Mean        1   1089   1089   6.661
Residual    8   1308   163.5
Total       9   2397

The value 1089 in the first row is the correction term from the previous slide. In each row, MSQ (mean square) is SSQ (sum of squares) divided by DF. In the last column, F is the ratio of the two MSQ. The t statistic is the square root of F, with the DF of the residual row.

Algebra of the ANOVA table

Source     DF      SSQ                MSQ    F
Mean       1       C.F. = nȲ²         nȲ²    nȲ²/S²
Residual   n − 1   Corrected SSQ      S²
Total      n       Uncorrected SSQ

The square root of F is Ȳ/√(S²/n).
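A short R sketch (with made-up data) showing that the square root of F from this table is the one-sample t statistic:

y  <- c(12, 15, 9, 14, 11, 13)            # hypothetical sample
n  <- length(y)
CF <- n * mean(y)^2                       # correction term = SSQ for the Mean row
S2 <- (sum(y^2) - CF) / (n - 1)           # residual mean square
F  <- CF / S2
c(sqrt.F = sqrt(F), t = mean(y) / sqrt(S2 / n))   # identical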

An experiment with two unmatched samples A random sample of n₁ = 9 lambs receives the standard diet plus a food additive. An independent random sample of n₂ = 8 lambs receives the standard diet alone. Growth rates are measured on all 17 lambs.

Treated    108  110  105  131  104   96  115  118  121
Controls    99   95  120  112   80  106   98  102

Assumptions: all measurements are independently normally distributed with variance σ². Population means are m₁ (treated) and m₂ (control). Null hypothesis: m₁ = m₂.

The two-sample t test The test is based on Ȳ₁ − Ȳ₂, which has variance σ²(1/n₁ + 1/n₂). The test statistic is T = (Ȳ₁ − Ȳ₂)/E, where E = √(S²(1/n₁ + 1/n₂)) and S² is an estimate of σ². The null distn of T is the t distn with n₁ + n₂ − 2 d.f.

Calculating the estimate of σ²

            n    sum    uncorrected SSQ
Treated     9    1008   113772
Controls    8    812    83414

Calculate the corrected sum of squares separately for each sample, then pool the sums of squares and degrees of freedom.

            Treated          Controls         Pooled
            DF   SSQ         DF   SSQ         DF   SSQ      MSQ
Mean         1   112896       1   82418
Residual     8   876          7   996         15   1872     124.8
Total        9   113772       8   83414       17   197186

The estimate of σ² is S² = 124.8 with 15 d.f., and the estimated s.e. of Ȳ₁ − Ȳ₂ is √(124.8 (1/9 + 1/8)) = 5.428.

Two-sample t test

Ȳ₁      Ȳ₂      S²      E       T
112.0   101.5   124.8   5.428   1.93

T = 10.5/5.43 = 1.93 with 8 + 7 = 15 d.f. The upper 2.5% point of the t distn with 15 d.f. is 2.131. The two-sided test is not quite significant at the 0.05 level: the data are consistent with the null hypothesis that the additive has no effect. A 95% confidence interval for the benefit of the food additive is 10.5 ± 2.131 × 5.43 (between −1.1 and 22.1 g/d).
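The whole analysis can be reproduced with t.test( ) using the pooled-variance option:

treated  <- c(108, 110, 105, 131, 104, 96, 115, 118, 121)
controls <- c(99, 95, 120, 112, 80, 106, 98, 102)
t.test(treated, controls, var.equal = TRUE)   # t = 1.93 on 15 d.f., 95% CI roughly (-1.1, 22.1)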

Chi-squared goodness-of-fit test Frequencies n₁, ..., n_k (with Σ nᵢ = n) are multinomially distributed with probabilities p₁, ..., p_k. The probabilities are specified by the null hypothesis H₀. The chi-squared test statistic is X² = Σ (nᵢ − npᵢ)²/(npᵢ), the sum running over i = 1, ..., k, often written Σ (O − E)²/E, where O is the observed frequency nᵢ and E is the expected frequency npᵢ. An alternative formula is X² = Σ O²/E − n. X² is a measure of discrepancy between observed and expected frequencies. A large value indicates departure from H₀ (therefore a one-sided test, with large values significant).

The chi-squared distribution The distn of the sum of squares of ν independent N(0,1) r.v.s is called the chi-squared distn with ν d.f. (ν = 1, 2, 3, ...). For example, the corrected sum of squares for a sample of size n from a normal distn has a scaled chi-squared distn with ν = n − 1 d.f. The distn also arises as the null distn of the X² test statistic. The mean of the distribution is ν and the variance is 2ν.

        Upper tail probability (%)
d.f.    10      5       2.5     1       0.5     0.1
1       2.706   3.841   5.024   6.635   7.88    10.83
2       4.605   5.991   7.378   9.210   10.60   13.82
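The rows of this table can be reproduced in R with qchisq( ), e.g. for 2 d.f.:

qchisq(c(0.10, 0.05, 0.025, 0.01, 0.005, 0.001), df = 2, lower.tail = FALSE)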

Example 1 A roulette wheel with three compartments is spun 99 times, with the following results. Is the wheel fair?

Side        1    2    3    Total
Frequency   42   27   30   99

The null hypothesis is p₁ = p₂ = p₃ = 1/3. Each expected frequency is equal to 33, and X² = [(42 − 33)² + (27 − 33)² + (30 − 33)²]/33 = 3.818, with 2 d.f. The upper 5% point of the chi-squared distn with 2 d.f. is 5.991, so the result is not significant. There is no evidence of bias: the data are consistent with the wheel being fair.
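In R the same test is one call to chisq.test( ):

obs <- c(42, 27, 30)
chisq.test(obs, p = rep(1/3, 3))   # X-squared = 3.818, df = 2, p-value about 0.15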

Example 2 Sometimes the specification of the probabilities by the null hypothesis is incomplete, leaving s parameters to be estimated. In this case the null distn of X² is chi-squared with k − 1 − s d.f. Are the blood group frequencies in the table below consistent with Hardy-Weinberg equilibrium?

           MM    MN    NN    Total
Observed   233   385   129   747

The H-W hypothesis specifies probabilities p², 2pq and q², where p and q are the M and N allele frequencies, which must be estimated. The estimates are p = 0.5696, q = 0.4304, and the expected frequencies are

           MM       MN       NN       Total
Expected   242.36   366.26   138.38   747

X² = 1.96 with 3 − 1 − 1 = 1 d.f. (not significant).
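A sketch of this calculation in R, using the correct d.f. (chisq.test( ) applied to these counts would use 2 d.f., which is wrong when a parameter has been estimated; see the Using R slide later):

obs <- c(MM = 233, MN = 385, NN = 129)
n   <- sum(obs)
p   <- unname((2 * obs["MM"] + obs["MN"]) / (2 * n))   # estimated M allele frequency, 0.5696
expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
X2  <- sum((obs - expected)^2 / expected)               # 1.96
pchisq(X2, df = 3 - 1 - 1, lower.tail = FALSE)          # p value on 1 d.f.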

Chi-squared association test Attributes A and B each take one of two possible values. Both are recorded for a sample of N individuals. Is there association between the attributes? (In the table below, a, b, c and d are frequencies.)

        B₁      B₂      Total
A₁      a       b       a+b
A₂      c       d       c+d
Total   a+c     b+d     N

The null hypothesis is independence of the row and column events, i.e. that Pr(A₁ and B₁) = Pr(A₁) Pr(B₁), for example. The test statistic is X² = Σ (O − E)²/E. The null distn is the chi-squared distn with 4 − 1 − 2 = 1 d.f. (the two marginal probabilities are estimated from the data). The expected frequency for the top-left cell is (a + b)(a + c)/N, etc.

Example Relationship between nasal carrier rate for Streptococcus pyogenes and size of tonsils among 1398 children.

               Not enlarged   Enlarged      Total
Carriers       19 (26.6)      53 (45.4)     72
Non-carriers   497 (489.4)    829 (836.6)   1326
Total          516            882           1398

Expected frequencies are shown in brackets. X² = (19 − 26.6)²/26.6 + 3 more terms = 3.61. (Or use the alternative formula: 19²/26.6 + 3 more terms − 1398.) The test is not significant at the 0.05 level.
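The same X² can be obtained from chisq.test( ), provided the default continuity correction is switched off (the hand calculation above does not use it):

tonsils <- matrix(c(19, 497, 53, 829), nrow = 2,
                  dimnames = list(c("Carriers", "Non-carriers"),
                                  c("Not enlarged", "Enlarged")))
chisq.test(tonsils, correct = FALSE)   # X-squared = 3.61, df = 1, p-value about 0.057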

Larger tables Calculate the expected frequency for each cell as E = (row total) × (column total)/(grand total), then sum (O − E)²/E over all cells of the table. For a table with r rows and c columns, X² has (r − 1)(c − 1) degrees of freedom. Example: a more detailed breakdown of the tonsils data gives X² = 7.88 with 2 d.f. (P = 0.02).

               None          Mild          Severe
Carriers       19 (26.6)     29 (30.3)     24 (15.1)
Non-carriers   497 (489.4)   560 (558.7)   269 (277.9)
Total          516           589           293
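In R (no continuity correction is applied for tables larger than 2 × 2):

tonsils3 <- matrix(c(19, 497, 29, 560, 24, 269), nrow = 2,
                   dimnames = list(c("Carriers", "Non-carriers"),
                                   c("None", "Mild", "Severe")))
chisq.test(tonsils3)   # X-squared = 7.88, df = 2, p-value about 0.02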

Using R pchisq( ) and qchisq( ) calculate probabilities and quantiles of the chi-squared distn; pt( ) and qt( ) do the same for the t distn. chisq.test( ) deals with the goodness-of-fit and association tests. Note: in the case where parameters are estimated, it reports the wrong d.f. t.test( ) can be used for the one-sample or two-sample version of the t test. For the two-sample version, set var.equal = TRUE to use the pooled estimate of variance. binom.test( ) can be used with binomial data. This test is exact, based on binomial probabilities, and usually gives a result similar to the chi-squared test.

Simulation Deriving sampling distns is a job for the mathematical statistician, but an approximate answer can usually be obtained by simulation. The R function replicate( ) is useful here. Example: the code below repeatedly draws samples of size 100 from a normal distn with unit variance, and compares the histogram of the sample means with the theoretical distn (normal with variance 1/100).

# theoretical sampling distn of Ybar: normal with sd = 1/sqrt(100) = 0.1
curve(dnorm(x, sd = 0.1), -0.3, 0.3, col = "red", ann = FALSE, las = 1)
# histogram of 1000 simulated sample means, overlaid on the curve
hist(replicate(1000, mean(rnorm(100))), freq = FALSE, add = TRUE)