UCLA STAT 251. Statistical Methods for the Life and Health Sciences. Hypothesis Testing. Instructor: Ivo Dinov,

Similar documents
Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

STAT Chapter 8: Hypothesis Tests

Section 10.1 (Part 2 of 2) Significance Tests: Power of a Test

Statistics for IT Managers

Hypothesis testing. Data to decisions

POLI 443 Applied Political Research

ME3620. Theory of Engineering Experimentation. Spring Chapter IV. Decision Making for a Single Sample. Chapter IV

Inferential statistics

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Visual interpretation with normal approximation

Lecture 28 Chi-Square Analysis

280 CHAPTER 9 TESTS OF HYPOTHESES FOR A SINGLE SAMPLE Tests of Statistical Hypotheses

Review of Statistics 101

Outline. PubH 5450 Biostatistics I Prof. Carlin. Confidence Interval for the Mean. Part I. Reviews

ECON Introductory Econometrics. Lecture 2: Review of Statistics

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Introductory Econometrics

Introductory Econometrics. Review of statistics (Part II: Inference)

ECO220Y Simple Regression: Testing the Slope

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

Two-Sample Inferential Statistics

E509A: Principle of Biostatistics. GY Zou

Purposes of Data Analysis. Variables and Samples. Parameters and Statistics. Part 1: Probability Distributions

F79SM STATISTICAL METHODS

Chapter 5 Confidence Intervals

Chapter 22. Comparing Two Proportions. Bin Zou STAT 141 University of Alberta Winter / 15

9 One-Way Analysis of Variance

First we look at some terms to be used in this section.

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Mathematical statistics

T test for two Independent Samples. Raja, BSc.N, DCHN, RN Nursing Instructor Acknowledgement: Ms. Saima Hirani June 07, 2016

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

Part III: Unstructured Data

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

Lecture on Null Hypothesis Testing & Temporal Correlation

Statistics for Managers Using Microsoft Excel

Medical statistics part I, autumn 2010: One sample test of hypothesis

Bio 183 Statistics in Research. B. Cleaning up your data: getting rid of problems

Topic 10: Hypothesis Testing

Chapter 5: HYPOTHESIS TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING

CHAPTER 9: HYPOTHESIS TESTING

Single Sample Means. SOCY601 Alan Neustadtl

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

hypothesis a claim about the value of some parameter (like p)

Applied Statistics for the Behavioral Sciences

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Chapter 8 of Devore , H 1 :

Business Statistics. Lecture 10: Course Review

Statistical Inference

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

10.2: The Chi Square Test for Goodness of Fit

Sampling Distributions: Central Limit Theorem

Hypothesis Testing hypothesis testing approach formulation of the test statistic

1; (f) H 0 : = 55 db, H 1 : < 55.

Lecture Testing Hypotheses: The Neyman-Pearson Paradigm

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

Topic 10: Hypothesis Testing

POLI 443 Applied Political Research

Tests about a population mean

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals

UCLA STAT 233 Statistical Methods in Biomedical Imaging

Beyond p values and significance. "Accepting the null hypothesis" Power Utility of a result. Cohen Empirical Methods CS650

Hypothesis Testing The basic ingredients of a hypothesis test are

Power of a hypothesis test

Summary: the confidence interval for the mean (σ 2 known) with gaussian assumption

Difference between means - t-test /25

(1) Introduction to Bayesian statistics

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

Ordinary Least Squares Regression Explained: Vartanian

Fin285a:Computer Simulations and Risk Assessment Section 2.3.2:Hypothesis testing, and Confidence Intervals

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

Hypothesis tests for two means

Chapter 7: Hypothesis Testing

Chapter 9 Inferences from Two Samples

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Two Sample Hypothesis Tests

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( )

BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Note on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Chapter 8. Inferences Based on a Two Samples Confidence Intervals and Tests of Hypothesis

Introduction to Econometrics. Review of Probability & Statistics

Inferences About Two Proportions

5.2 Tests of Significance

Probability and Statistics Notes

Math 101: Elementary Statistics Tests of Hypothesis

Preliminary Statistics Lecture 5: Hypothesis Testing (Outline)

Samples and Populations Confidence Intervals Hypotheses One-sided vs. two-sided Statistical Significance Error Types. Statistiek I.

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Relating Graph to Matlab

Probability and Statistics

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

Outline for Today. Review of In-class Exercise Bivariate Hypothesis Test 2: Difference of Means Bivariate Hypothesis Testing 3: Correla

Transcription:

UCLA STAT 251 Statistical Methods for the Life and Health Sciences Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology University of California, Los Angeles, Winter 22 http://www.stat.ucla.edu/~dinov/ Testing What do we test? Types of hypotheses Measuring the evidence against the null testing as decision making Why tests should be supplemented by intervals? Slide 1 Slide 2 Measuring the distance between the true-value and the estimate in terms of the Intuitive criterion: Estimate is credible if it s not far away from its hypothesized true-value! But how far is far-away? Compute the distance in standard-terms: Estimator TrueParameterValue T = Reason is that the distribution of T is known in some cases (Student s t, or N(,1)). The estimator (obs-value) is typical/atypical if it is close to the center/tail of the distribution. Slide 9 Comparing CI s and significance tests These are different methods for coping with the uncertainty about the true value of a parameter caused by the sampling variation in estimates. Confidence interval: A fixed level of confidence is chosen. We determine a range of possible values for the parameter that are consistent with the data (at the chosen confidence level). Significance test: Only one possible value for the parameter, called the hypothesized value, is tested. We determine the strength of the evidence (confidence) provided by the data against the proposition that the hypothesized value is the true value. Slide 1 Hypotheses What do t -values tell us? (Our estimate is typical/atypical, consistent or inconsistent with our hypothesis.) What is the essential difference between the information provided by a confidence interval (CI) and by a significance test (ST)? (Both are uncertainty quantifiers. CI s use a fixed level of confidence to determine possible range of values. ST s one possible value is fixed and level of confidence is determined.) Guiding principles We cannot rule in a hypothesized value for a parameter, we can only determine whether there is evidence to rule out a hypothesized value. The null hypothesis tested is typically a skeptical reaction to a research hypothesis Slide 11 Slide 12 1

Comments Why can't we (rule-in) prove that a hypothesized value of a parameter is exactly true? (Because when constructing estimates based on data, there s always sampling and may be non-sampling errors, which are normal, and will effect the resulting estimate. Even if we do 6, ESP tests, as we saw earlier, repeatedly we are likely to get estimates like.2 and.21, and.199999, etc. non of which may be exactly the theoretically correct,.2.) Why use the rule-out principle? (Since, we can t use the rule-in method, we try to find compelling evidence against the observed/dataconstructed estimate to reject it.) Why is the null hypothesis & significance testing typically used? (H o : skeptical reaction to a research hypothesis; ST is used to check if differences or effects seen in the data can be explained simply in terms of sampling variation!) Comments How can researchers try to demonstrate that effects or differences seen in their data are real? (Reject the hypothesis that there are no effects) How does the alternative hypothesis typically relate to a belief, hunch, or research hypothesis that initiates a study? (H 1 =H a : specifies the type of departure from the nullhypothesis, H (skeptical reaction), which we are expecting (research hypothesis itself). In the Cavendish s mean Earth density data, null hypothesis was H : µ =5.517. We suspected bias, but not bias in any specific direction, hence H a :µ!=5.517. Slide 13 Slide 14 Comments The t-test In the ESP Pratt & Woodruff data, (skeptical reaction) null hypothesis was H : µ =.2 (pureguessing). We suspected bias, toward success rate being higher than that, hence the (research hypothesis) H a :µ>.2. Other commonly encountered situations are: H : µ 1 µ 2 = H a : µ 1 µ 2 > H : µ rest µ activation = H a : µ rest µ activation!= Using ˆ θ to test H : θ = θ versus some alternative H 1. STEP 1 Calculate the test statistic, STEP 2 STEP 3 ˆ θ θ t = se( ˆ θ ) = estimate - hypothesized value standard error [This tells us how many standard errors the estimate is above the hypothesized value (t pos itive) or belo w the hypothes ized value (t negative).] Calculate the P -value using the following table. Interpret the P -value in the context of the data. Slide 15 Slide 16 Alternative H : ersion > 1 of Table 9.1.1) too much bigger than -scale Pictorial representation of the T-test H : θ= θ H 1 : θ > θ t = t-scale (# of std errors) se( ) Alternative H : < 1 Figure 9.3.1: Testing H : = too much smaller than Pictorial representation of the T-test H 1 : θ < θ Slide 17 t Slide 18 t 2

Alternative H : 1 (2-sided) Figure 9.3.1: Testing H : = too far from (either direction) Slide 19 STAT 251, UCLA, Ivo Dinov Pictorial representation of the T-test H 1 : θ!= θ t t = Shaded Area t t The t-test Alternative H : θ > θ hypothesis provided by P -value H 1 : θ > θ ˆ θ too much bigger than θ P = pr(t t ) (i.e., ˆ θ - θ too large) H 1 : θ < ˆ θ θ too much smaller than θ P = pr(t t ) (i.e., ˆ θ - θ too negative) H 1 : θ ˆ θ θ too far from θ P = 2 pr(t t ) (i.e., ˆ θ - θ too large) where T ~ Student(df ) Slide 2 Interpretation of the p-value Alternative Figure 9.3.1: Testing H : = (Pictorial version of Table 9.1.1) -scale t = t-scale (# of std errors) se( ) TABLE 9.3.2 Interpreting the Size of a P -Value Approximate size of P -Value Translation >.12 (12%) No evidence against H.1 (1%) Weak evidence against H H : > 1 H : < 1 too much bigger than too much smaller than t t.5 (5%) Some evidence against H.1 (1%) Strong evidence against H.1 (.1%) Very Strong evidence against H H : 1 (2-sided) too far from (either direction) t t = Shaded Area t t Slide 21 Slide 22 s from t-tests The is the probability that, if the hypothesis was true, sampling variation would produce an estimate that is further away from the hypothesized value than our data-estimate. The measures the strength of the evidence against H. The smaller the, the stronger the evidence against H. (The second and third points are true for significance tests generally, and not just for t-tests.) What does the t-statistic tell us? The T-statistics, t = ˆ θ θ tells us (in std. units) if se( ˆ θ ) the observed value/estimate is typical/consistent and can be explained by the variation in the sampling distribution. When do we use a 2-tailed rather than a 1-tailed test? We use two-sided/two-tailed test, unless there is a prior (knowledge available before data was collected) or a strong reason to believe that the result should go in one particular direction ( µ ). Slide 23 Slide 24 3

What were the 3 types of alternative hypothesis involving the parameter θ and the hypothesized value θ? Write them down! Let s go through and construct our own t-test Table. For each alternative, think through what would constitute evidence against the hypothesis and in favor of the alternative. t gative) Then write down the corresponding s in terms of t and represent these s on hand-drawn curves [ P=Pr(T>=t ), P=Pr(T<=t ), P=2Pr(T>= t ).] Slide 25 t t t What does the measure? (If H was true, sampling variation alone would produce an estimate farther then the hypothesized value.) What do very small s tell us? What do large s tell us? (strength of evidence against H.) Pair the phrases: the... the, the... the evidence for/against... the null hypothesis. Do large values of t correspond to large or small s? Why? What is the relationship between the Student (df) distribution and Normal(,1) distribution? (identical as ) Slide 26 n Is a second child gender influenced by the gender of the first child, in families with >1 kid? TABLE 9.3.4 First and Second Births by S ex Analysis of the birth-gender data data summary Second Child Male Female Total First Child Male 3,22 2,776 5,978 Female 2,62 2,792 5,412 Total 5,822 5,568 11,39 Research hypothesis needs to be formulated first before collecting/looking/interpreting the data that will be used to address it. Mothers whose 1 st child is a girl are more likely to have a girl, as a second child, compared to mothers with boys as 1 st children. Data: 2 yrs of birth records of 1 Hospital in Auckland, NZ. Slide 28 Second Child Group Number of births Number of girls 1 (Previous child was girl) 5412 2792 (approx. 51.6%) 2 (Previous child was boy) 5978 2776 (approx. 46.4%) Let p 1 =true proportion of girls in mothers with girl as first child, p 2 =true proportion of girls in mothers with boy as first child. Parameter of interest is p 1 -p 2. H : p 1 -p 2 = (skeptical reaction). H a : p 1 -p 2 > (research hypothesis) Slide 29 testing as decision making TABLE 9.4.1 Decision Making Decision made H is true H is false Accept H as true OK Type II error Reject H as false Type I error OK Slide 3 Actual situation Sample sizes: n 1 =5412, n 2 =5978, Sample proportions (estimates) = 2792/ 5412.5159, = 2776/ 5978.4644, 1 2 H : p 1 -p 2 = (skeptical reaction). H a : p 1 -p 2 > (research hypothesis) Analysis of the birth-gender data Samples are large enough to use Normal-approx. Since the two proportions come from totally diff. mothers they are independent use formula 8.5.5.a Estimate - HypothesizedValue t = = 5.49986 = = = ˆ (1 ˆ ) ˆ (1 ˆ ) ˆ ˆ p p p p p p 1 2 + n n P value = Pr( T t ) = 1.9 1 8 Slide 31 4

Analysis of the birth-gender data Samples are large enough to use Normal-approx. Since the two proportions come from totally diff. mothers they are independent use formula Second Child Male Female Total First Child Male 3,22 2,776 5,978 Female 2,62 2,792 5,412 Total 5,822 5,568 11,39 p 1 - p 2 Estimate - HypothesizedValue t = = 5.49986 = P value = Pr( T t 1.9 1 8 ) = Slide 32 Analysis of the birth-gender data We have strong evidence to reject the H, and hence conclude mothers with first child a girl a more likely to have a girl as a second child. How much more likely? A 95% CI: CI (p 1 -p 2 ) =[.33;.7]. And computed by: estimate ± z = ± 1.96 = (1 ) (1 ) ± 1.96 1 1 + 2 2 = n n.515 ± 1.96.93677 = [3% ;7%] Slide 33 Tests and confidence intervals If 12 researchers each independently investigated a it true/ hypothesis, how many researchers would you expect to obtain a result that was significant at the 5% level (just by chance)? (Type I, false-positive; 12*5%=6) What was the other type of error described? What was it called? When is the idea useful? (Type II, falsenegative) Power of statistical test = 1-β, where β =P(Type II error) = P(Accepting Ho as true, when its truly false) Slide 36 A two-sided test of H : θ = θ is significant at the 5% level if and only if θ lies outside a 95% confidence interval for θ. A two-sided test of H : θ = θ gives a result that is significant at the 5% level if the =2Pr(T >= t ) <.5. Where t =(estimate-hypoth dvalue)/(θ ) t =(θ θ )/(θ ). Let t be a threshold chosen so that Pr(T>= t ) =.25. Now t tells us now many s θ and θ are apart (without direction in their diff.) If t > t, then θ is more than t s away from θ and hence lies outside the 95% CI for θ. Slide 38 Significance Significance cont. Statistical significance relates to the strength of the evidence of existence of an effect. The practical significance of an effect depends on its size how large is the effect. A small provides evidence that the effect exists but says nothing at all about the size of the effect. To estimate the size of an effect (its practical significance), compute a confidence interval. Slide 39 A non-significant test does not imply that the null hypothesis is true (or that we accept H ). It simply means we do not have (this data does not provide) the evidence to reject the skeptical reaction, H. To prevent people from misinterpreting your report: Never quote a about the existence of an effect without also providing a confidence interval estimating the size of the effect. Slide 4 5

General ideas of test statistic and p-value A test statistic is a measure of discrepancy between what we see in data and what we would expect to see if H was true. The is the probability, calculated assuming that the null hypothesis is true, that sampling variation alone would produce data which is more discrepant than our data set. Slide 44 6