Statistical Inference Part 2: t-tests — Review of Statistical Inference (Part 2), PSYCH 710, Week 2
1 Statistical Inference Part 2 — PSYCH 710, Review of Statistical Inference (Part 2), Week 2. Prof. Patrick Bennett.

Topics: t-tests, effect size, equivalence tests, consequences of low power.

What are t-tests useful for?
- comparing a sample mean to some expected value
- comparing the means of two samples

t-test for a single mean. The t statistic is

    t = (Ȳ − µ) / (s/√n)

- the sampling distribution of t is the t distribution
- unimodal, symmetric about its mean
- shape governed by one parameter, the degrees of freedom (df)
- when the null hypothesis is true, the mean of t will be zero

R code for simulating the one-sample t test when the null hypothesis is true (t.test evaluates the null hypothesis that mu = 0):

    set.seed(55540)
    mu <- 0        # population mean
    n <- 20        # sample size
    stdev <- 1     # population sd
    n.sim <- 5000  # number of simulated samples
    t.val <- rep(0, n.sim)
    p.val <- rep(0, n.sim)
    for(kk in 1:n.sim){
      the.sample <- rnorm(n, mu, stdev)
      t.results <- t.test(the.sample)
      t.val[kk] <- t.results$statistic
      p.val[kk] <- t.results$p.value
    }

[Figures: distributions of the simulated t values and p values when the null hypothesis is true.]
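The one-sample simulation above can also be reproduced outside R. A minimal Python sketch (numpy/scipy rather than the lecture's R; sample size and simulation count are illustrative choices) checks the slide's claim: when H0 is true, the mean of t is near zero and p < .05 occurs about 5% of the time.

```python
# Python sketch of the one-sample t simulation under a true null hypothesis.
# Parameter values (n, stdev, n_sim) are illustrative, not the lecture's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(55540)
mu, n, stdev, n_sim = 0.0, 20, 1.0, 5000   # H0 (mu = 0) is true

t_vals = np.empty(n_sim)
p_vals = np.empty(n_sim)
for k in range(n_sim):
    sample = rng.normal(mu, stdev, size=n)
    t_vals[k], p_vals[k] = stats.ttest_1samp(sample, popmean=0.0)

print(abs(t_vals.mean()) < 0.1)                    # mean of t near zero: True
print(abs((p_vals < 0.05).mean() - 0.05) < 0.02)   # Type I rate near alpha: True
```

Both checks should print True: under a true null, the simulated t distribution is centred on zero and the p values are uniform, so the rejection rate matches alpha.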
2 ... is true. If the null hypothesis is true, then z will be between ±1.96 95% of the time and between ±2.56 99% of the time. Therefore, using the criteria ±1.96 to reject the null hypothesis will yield a Type I error rate of 0.05, whereas the criteria ±2.56 correspond to a Type I error rate of 0.01. Our observed z score falls outside both sets of criteria, and so the null hypothesis is rejected. It used to be standard practice to indicate which level was used by writing that the null hypothesis was rejected (p < .05) or (p < .01). Nowadays, scientists are encouraged to publish the exact p value for their observed statistic. In our case, the probability of drawing a z that falls outside the range ±3.3 is about .001, and so we would report the result by writing that the sample mean differed significantly from 100 (z = 3.3, p = .001), and so the null hypothesis µ = 100 was rejected.

In the previous example we knew the population standard deviation σ. But in most cases we do not know σ, and therefore it must be estimated from the data:

    σ̂ = s = √( Σᵢ (Yᵢ − Ȳ)² / (n − 1) )

The estimate s can be used to calculate t, which is similar to a z score:

    t = (Ȳ − µ) / (s/√n)    (6)

Comparing two independent samples. So far we have considered how to decide whether a single sample mean was drawn from a population with a mean µ. More commonly, we want to compare two sample means to each other, or to decide whether or not two populations differ. If the means of the populations from which the two samples were drawn are µ1 and µ2, and the variances are σ1² and σ2², then the two-sided null and alternative hypotheses are:

    H0: µ1 = µ2
    H1: µ1 ≠ µ2
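The tail probabilities behind the ±1.96 and ±2.56 criteria, and the exact p value for an observed z of 3.3, can be checked numerically. A sketch using scipy (not part of the lecture, which works in R):

```python
# Two-sided tail probabilities of the standard normal distribution.
from scipy import stats

p_196 = 2 * stats.norm.sf(1.96)  # P(|z| > 1.96) when H0 is true
p_256 = 2 * stats.norm.sf(2.56)  # P(|z| > 2.56)
p_33 = 2 * stats.norm.sf(3.3)    # exact p for an observed z of 3.3

print(round(p_196, 3))  # 0.05
print(round(p_256, 3))  # 0.01
print(round(p_33, 4))   # 0.001
```

The exact p for z = 3.3 is about .001, which is the value one would report rather than just "p < .05".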
These hypotheses can be re-written as:

    H0: µ1 − µ2 = 0
    H1: µ1 − µ2 ≠ 0

R code for the one-sample simulation when the null hypothesis is false (the population mean is 0.5, but t.test still evaluates the null hypothesis that mu = 0):

    set.seed(55540)
    mu <- 0.5      # population mean
    n <- 20        # sample size
    stdev <- 1     # population sd
    n.sim <- 5000
    t.val <- rep(0, n.sim)
    p.val <- rep(0, n.sim)
    for(kk in 1:n.sim){
      the.sample <- rnorm(n, mu, stdev)
      t.results <- t.test(the.sample)
      t.val[kk] <- t.results$statistic
      p.val[kk] <- t.results$p.value
    }

[Figures: when the null hypothesis is false, the distribution of t values shifts away from zero and small p values become more common.]

Distribution of the difference between sample means. Each sample mean is a random variable drawn from the sampling distribution of the mean. The means of the two sampling distributions will be µ1 and µ2, and the variances will be σ1²/n1 and σ2²/n2. (We will take advantage of the Central Limit Theorem and assume that the sampling distributions are normal.) Now, let's create a new variable that is the difference between the sample means, d = Ȳ1 − Ȳ2. How is d distributed?

It can be shown that the sum (or difference) of two or more Normal variables is also a Normal variable, so d is distributed Normally. Furthermore, the mean of a sum (or difference) of two variables is the sum (or difference) of the two means. Therefore, the mean of the difference distribution equals the difference between the means of the two sampling distributions: µd = µ1 − µ2. If the two population means are equal, then µd = 0. Finally, the variance of a sum (or difference) of two independent random variables is the sum of the two variances: σd² = σ²(Ȳ1) + σ²(Ȳ2). (N.B. The variance of a difference between independent variables is the sum, not the difference, of the two variances.) Therefore, if we know the forms of the two populations, statistical theory allows us to predict the form of the distribution of differences between the means of samples drawn from those populations (see Figure 5).

In general, we will not know σ1 or σ2, and therefore the standard error of the difference will have to be estimated from the data with the formula

    σ̂(Ȳ1−Ȳ2) = √( [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) × ( 1/n1 + 1/n2 ) )    (7)

where n1 and n2 are the sample sizes of the two groups. The two-group t statistic is

    t = [ (Ȳ1 − Ȳ2) − (µ1 − µ2) ] / σ̂(Ȳ1−Ȳ2)    (8)

with df = n1 + n2 − 2. A close comparison of Equations 6 and 8 will reveal an underlying similarity: in both cases, t equals the difference between an observed statistic and a population parameter, divided by the estimated standard error of the statistic:

    t = (statistic − parameter) / (estimated standard error of statistic)

This general formula applies to all t tests. Because we estimate the population variance, our test statistic follows a t distribution rather than a normal distribution.

In R, a two-sample t test is done with the t.test() command. Notice that the parameter var.equal is set to TRUE: this setting tells t.test to assume that the two samples have the same variance.

    > sample.1 <- c(95, 92, 93, 96, 98, 99)
    > sample.2 <- c(102, 99, 106, 100, 98, 101)
    > t.test(x=sample.1, y=sample.2, alternative="two.sided", var.equal=TRUE)

The above statement assumes that the data are distributed normally with equal variance.

R code for simulating the two-sample t test when the null hypothesis is true (both population means equal 100):

    set.seed(65540)
    mu.1 <- 100    # population 1 mean
    mu.2 <- 100    # population 2 mean
    n <- 25        # sample size per group
    stdev <- 10    # population sd
    n.sim <- 5000
    t.val <- rep(0, n.sim)
    p.val <- rep(0, n.sim)
    for(kk in 1:n.sim){
      the.sample.1 <- rnorm(n, mu.1, stdev)
      the.sample.2 <- rnorm(n, mu.2, stdev)
      t.results <- t.test(the.sample.1, the.sample.2, var.equal=TRUE)
      t.val[kk] <- t.results$statistic
      p.val[kk] <- t.results$p.value
    }

[Figure: the distribution of the two-sample t statistic is centred on zero when H0 is true.]
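Equations 7 and 8 can be verified by computing the pooled standard error and t by hand and comparing them against a library's equal-variance test. A Python sketch with illustrative data (the lecture does this in R with t.test):

```python
# Pooled SE (Eq. 7) and the two-sample t (Eq. 8, with mu1 - mu2 = 0 under H0),
# checked against scipy's equal-variance two-sample t test. Data are illustrative.
import math
from scipy import stats

a = [95, 92, 93, 96, 98, 99]
b = [102, 99, 106, 100, 98, 101]
na, nb = len(a), len(b)
ma, mb = sum(a) / na, sum(b) / nb
sa2 = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of a
sb2 = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of b

pooled_var = ((na - 1) * sa2 + (nb - 1) * sb2) / (na + nb - 2)
se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))  # Eq. 7
t_hand = (ma - mb) / se_diff                         # Eq. 8 under H0

t_lib, p_lib = stats.ttest_ind(a, b, equal_var=True)
print(abs(t_hand - t_lib) < 1e-9)  # True: hand calculation matches the library
```

The agreement confirms that `equal_var=True` implements exactly the pooled-variance formula above, with df = na + nb − 2.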
3 Simulation: two-sample t test (equal variances) when H0 is false.

    set.seed(65540)
    mu.1 <- 100    # population 1 mean
    mu.2 <- 105    # population 2 mean
    n <- 25        # sample size per group
    stdev <- 10    # population sd (both groups)
    n.sim <- 5000
    t.val <- rep(0, n.sim)
    p.val <- rep(0, n.sim)
    for(kk in 1:n.sim){
      the.sample.1 <- rnorm(n, mu.1, stdev)
      the.sample.2 <- rnorm(n, mu.2, stdev)
      t.results <- t.test(the.sample.1, the.sample.2, var.equal=TRUE)
      t.val[kk] <- t.results$statistic
      p.val[kk] <- t.results$p.value
    }

Expected value of t:

    stdev_diff <- sqrt( ( ( (25-1)*100 + (25-1)*100 ) / (25+25-2) ) * ( (1/25)+(1/25) ) )  # = 2.83
    t = (100-105)/2.83 = -1.77   (actual mean of the simulated t values ≈ -1.8)

Simulation: two-sample t test (unequal variances) when H0 is true. The same code, but with unequal population standard deviations:

    set.seed(65540)
    mu.1 <- 100    # population 1 mean
    mu.2 <- 100    # population 2 mean
    n <- 25        # sample size per group
    stdev.1 <- 10  # population 1 sd
    stdev.2 <- 3   # population 2 sd
    n.sim <- 5000
    t.val <- rep(0, n.sim)
    p.val <- rep(0, n.sim)
    for(kk in 1:n.sim){
      the.sample.1 <- rnorm(n, mu.1, stdev.1)
      the.sample.2 <- rnorm(n, mu.2, stdev.2)
      t.results <- t.test(the.sample.1, the.sample.2, var.equal=TRUE)
      t.val[kk] <- t.results$statistic
      p.val[kk] <- t.results$p.value
    }

Effect Size: Cohen's ds. What makes a p-value significant or non-significant? Alpha, sample size (power), and effect size. There are two classes of effect size measures:
- d: standardized differences (distances) between means
- r: measures of association (variance accounted for)

Cohen's ds is used for between-subjects designs:

    ds = (X̄1 − X̄2) / √( [ (n1 − 1)SD1² + (n2 − 1)SD2² ] / (n1 + n2 − 2) )

    ds = t √( 1/n1 + 1/n2 ), which for equal group sizes equals 2t/√N
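The identity ds = t √(1/n1 + 1/n2) can be checked directly against the pooled-SD definition of ds. A Python sketch with illustrative data (the lecture's examples are in R):

```python
# Checks that d_s recovered from the t statistic equals the pooled-SD
# definition of Cohen's d_s; the data are illustrative.
import math
from scipy import stats

a = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
b = [5.1, 5.9, 4.9, 5.6, 6.3, 5.0]
n1, n2 = len(a), len(b)
m1, m2 = sum(a) / n1, sum(b) / n2
v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
sd_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))

ds_direct = (m1 - m2) / sd_pooled               # definition of d_s
t, _ = stats.ttest_ind(a, b, equal_var=True)
ds_from_t = t * math.sqrt(1 / n1 + 1 / n2)      # d_s recovered from t
print(abs(ds_direct - ds_from_t) < 1e-9)        # True
```

This is why ds can always be computed from a reported t and the group sizes, even when the raw data are unavailable.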
4 Example: estimating ds from a two-sample t test in R.

    > m1 <- 0
    > m2 <- 2
    > n <- 20
    > stdev <- 4
    > s1 <- rnorm(n, m1, stdev)
    > s2 <- rnorm(n, m2, stdev)
    > (t.test.1 <- t.test(s1, s2, var.equal=TRUE))

        Two Sample t-test
    data: s1 and s2
    t = …, df = 38, p-value = …
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval: …
    sample estimates: mean of x = …, mean of y = …

    > t1 <- t.test.1$statistic
    > ( ds <- t1*sqrt( (1/n)+(1/n) ) )
    …

Repeating the example with n <- 100 per group gives df = 198 and p-value = 1.764e-05; with the larger samples, the ds estimate is more precise.

Hedges' gs is an unbiased estimate of the population ds:

    gs = ds × ( 1 − 3 / ( 4(n1 + n2) − 9 ) )

Converting ds to r:

    r = ds / √( ds² + (N² − 2N)/(n1 n2) )

Cohen's dz is used for within-subjects designs (paired scores):
- Md = mean of the difference scores
- Di = difference score i
- N = number of difference scores

    dz = Md / √( Σ(Di − Md)² / (N − 1) )

    dz = t/√N
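The Hedges correction and the ds-to-r conversion above are easy to apply by hand. A Python sketch with an illustrative effect size (ds = 0.5, n1 = n2 = 20):

```python
# Hedges' g_s small-sample correction and the d_s-to-r conversion from the
# formulas above, applied to an illustrative effect size.
import math

n1 = n2 = 20
N = n1 + n2
ds = 0.5  # illustrative Cohen's d_s

gs = ds * (1 - 3 / (4 * (n1 + n2) - 9))  # unbiased estimate of population d_s
r = ds / math.sqrt(ds ** 2 + (N ** 2 - 2 * N) / (n1 * n2))

print(round(gs, 3))  # 0.49 (slightly smaller than d_s)
print(round(r, 3))   # 0.248
```

Note that the correction shrinks ds only slightly even at these modest sample sizes; it matters most when n1 + n2 is very small.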
5 Measures of association. Eta-squared measures the association between the independent and dependent variables — the proportion of variation in Y accounted for by group membership:

    η² = SS_effect / SS_total

Omega-squared is an unbiased estimate of the association in the population:

    ω² = df_effect (MS_effect − MS_error) / (SS_total + MS_error)

For factorial experiments (more than one independent variable), the partial forms are used:

    partial η² = SS_effect / (SS_effect + SS_error)

    partial ω² = df_effect (MS_effect − MS_error) / ( df_effect MS_effect + (N − df_effect) MS_error )

    partial η² = F df_effect / (F df_effect + df_error)

Why report effect size? Why not just report p-values as an index of effect?
- p-values depend on the number of subjects/events/measures
- increased power leads to lower p values
We would like a measure that does not depend on the experiment's N, or on particular aspects of the experimental design. Effect size measures try to do this, and also give information about the magnitude of the effect.

Sample size affects the precision of ds estimation. Simulation parameters:

    n.sim <- 5000  # 5K samples
    m1 <- 0        # mean of population 1
    m2 <- 2        # mean of population 2
    stdev <- 4     # sd of both populations
    # simulation 1: n <- 20 per group
    # simulation 2: n <- 70 per group

[Figures: distributions of t (2-sample test) and Cohen's ds for n1 = n2 = 20 and for n1 = n2 = 70.] Effect size measures vary across samples, and the variance of the sampling distribution is larger for smaller samples.
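The last identity above — partial eta-squared recovered from F and its degrees of freedom — can be verified against the sums-of-squares form. A Python sketch with illustrative ANOVA numbers:

```python
# Checks that partial eta-squared computed from sums of squares equals the
# F-based identity; the ANOVA quantities below are illustrative.
ss_effect, ss_error = 30.0, 120.0
df_effect, df_error = 2, 27

eta2_p = ss_effect / (ss_effect + ss_error)           # SS form
F = (ss_effect / df_effect) / (ss_error / df_error)   # F = MS_effect / MS_error
eta2_from_F = F * df_effect / (F * df_effect + df_error)

print(round(eta2_p, 6), round(eta2_from_F, 6))  # 0.2 0.2
```

The F-based form is handy when reading published papers: partial η² can be recovered from a reported F and its dfs without access to the ANOVA table.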
6 Effect sizes in published reports (the winner's curse). Effect size inflation: apply a statistical threshold (e.g., p < .05) to the simulated effect sizes and ask, what is the median significant ds? The answer depends on sample size:
- for small samples, only big values of ds are significant
- with large samples, both large and small values of ds are more likely to be significant
Because published studies usually report significant results, small-n studies lead to overestimation of the true effect size.

[Figures: distributions of t and of significant Cohen's ds (p < .05) for n1 = n2 = 20 and n1 = n2 = 70; the median significant ds is larger for the smaller samples.]

In short: effect size measures vary across samples; the variance of the sampling distribution is larger for smaller samples; and significant values of ds are more extreme with small samples.

What does a significant p-value mean? A significant p-value indicates that the result is unusual, assuming that the null hypothesis is true and that the assumptions of the test are correct. That is ALL it means:
- it is not equal to the probability that H0 is true
- it is not equal to the probability of replicating the result

[Figure: null and observed-effect sampling distributions for n = 9. When the observed effect falls exactly at the critical value, t(8) = 2.31 × SEM, we expect 50% of replications to yield a smaller effect — so p = .05 does not imply a high probability of replication.]
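The winner's-curse effect described above can be demonstrated directly. A Python sketch (numpy/scipy rather than the lecture's R) using the simulation parameters given earlier — population means 0 and 2, sd 4, so a true d of 0.5 — with an illustrative simulation count and seed:

```python
# Winner's-curse sketch: keep only "significant" results and look at the
# median d_s. With a true d of 0.5, the median significant d_s is inflated,
# and more so for small n. Simulation count and seed are illustrative.
import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sd, diff, n_sim = 4.0, 2.0, 4000

def median_significant_ds(n):
    sig = []
    for _ in range(n_sim):
        a = rng.normal(0.0, sd, n)
        b = rng.normal(diff, sd, n)
        t, p = stats.ttest_ind(b, a, equal_var=True)
        if p < 0.05:                             # keep only "publishable" results
            sig.append(t * math.sqrt(2.0 / n))   # d_s from t, equal group sizes
    return float(np.median(sig))

small_n = median_significant_ds(20)
large_n = median_significant_ds(70)
print(small_n > large_n > 0.5)  # True: inflation is worse for small samples
```

Both medians exceed the true value of 0.5, but the small-sample median is inflated far more, because with n = 20 per group only estimates well above 0.6 can reach significance at all.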
7 What does a non-significant p-value mean? A non-significant p-value indicates that the result is NOT unusual, assuming the null hypothesis is true. It does NOT mean that the null hypothesis is true:
- e.g., a non-significant 2-sample t-test does NOT mean that the population means are equal
- you do not accept the null hypothesis, you simply fail to reject it
To accept the null hypothesis, perform an equivalence test.

Equivalence Tests. Reverse H0 & H1:
- H0: there is an effect
- H1: there is no effect
and try to reject H0 in favour of H1. Define the Smallest Effect Sizes Of Interest (SESOI; lower bound L and upper bound U) and evaluate the hypothesis with two 1-tailed t-tests:
- H0-u: is the observed effect unusually smaller than the upper bound U?
- H0-l: is the observed effect unusually larger than the lower bound L?
If both 1-tailed tests are significant, then we say the observed effect is smaller than the SESOI — e.g., the two groups are equivalent.

Null Hypothesis vs Equivalence Tests. There are four possible outcomes when evaluating the difference between two groups, obtained by crossing the outcome of a null hypothesis significance test with the outcome of an equivalence test:

                                Equivalent to Zero    Not Equivalent to Zero
    Not Different from Zero             +                       -
    Different from Zero                 -                       +
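The two one-sided tests (TOST) logic above can be sketched for a one-sample case. This Python sketch (not the lecture's code) uses an illustrative SESOI of ±0.5 raw units:

```python
# Sketch of the two one-sided tests (TOST) equivalence procedure for a
# one-sample case; the SESOI bounds and data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=200)  # data whose true mean really is 0
low, high = -0.5, 0.5               # SESOI bounds (L, U)

# H0-l: mean <= L, rejected if the observed mean is unusually larger than L
_, p_lower = stats.ttest_1samp(x, popmean=low, alternative='greater')
# H0-u: mean >= U, rejected if the observed mean is unusually smaller than U
_, p_upper = stats.ttest_1samp(x, popmean=high, alternative='less')

equivalent = max(p_lower, p_upper) < 0.05  # both one-sided tests significant
print(equivalent)  # True: the effect lies inside the equivalence region
```

Rejecting both one-sided hypotheses licenses the conclusion the ordinary t-test cannot give: the effect, if any, is smaller than the smallest effect we care about.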
8 NHST vs Equivalence Tests (4 outcomes). [Figure: differences between means plotted against an equivalence region running from L through 0 to U, illustrating the four outcomes: statistically equivalent & different; statistically equivalent & not different; not equivalent & different; not equivalent & not different.]

Reference: Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8(4), 355-362.

Spurious findings? Over time:
- there has been a dramatic increase in the ease with which we can analyze a single data set in multiple ways
- the effects/phenomena of interest typically become smaller and more subtle as a research area matures
- but sample sizes tend not to change very much
- nor have the unwritten rules about deciding whether a paper is publishable
These trends make it likely that the number of papers reporting spurious statistically significant findings has increased.

Power (Type I & Type II Errors). Table 1: Possible outcomes of hypothesis testing.

    decision             H0 is True               H0 is False
    reject H0            Type I error (p = α)     Correct (p = 1−β = power)
    do not reject H0     Correct (p = 1−α)        Type II error (p = β)

Type I Error: rejecting H0 when it is true (alpha). Type II Error: failing to reject H0 when it is false (beta). Power: the probability of rejecting a false H0 (1−β).

Positive Predictive Value (PPV). Given that we have rejected H0 in favour of H1, what is the probability that our conclusion is correct? In terms of the table above,

    PPV = p(correct rejection) / ( p(correct rejection) + p(Type I error) )
9 Ioannidis (2005). PPV depends on the a priori probability that H0 is false.

Table: Research Findings and True Relationships (DOI: 10.1371/journal.pmed.0020124.t001).

    Research Finding    True Relationship: Yes    True Relationship: No    Total
    Yes                 c(1−β)R/(R+1)             cα/(R+1)                 c(R+α−βR)/(R+1)
    No                  cβR/(R+1)                 c(1−α)/(R+1)             c(1−α+βR)/(R+1)
    Total               cR/(R+1)                  c/(R+1)                  c

where:
- α is the Type I error rate, and (1−β) is power (1 minus the Type II error rate)
- c is the number of effects being examined
- R is the ratio of the number of true effects to the number of null effects

The Positive Predictive Value (PPV) is the probability that a statistically significant finding is true:

    PPV = (1−β)R / (R − βR + α)

PPV is reduced by:
- low power (small studies & small effects)
- low R (testing more false hypotheses)
- increasing bias (greater flexibility in design & analysis; selective reporting of positive results)

N.B. higher beta means lower power. A statistically significant finding has a greater than 50% chance of being true/correct if (1−β)R > α.

[Figures: PPV as a function of R for beta = 0.2 with alpha = 0.05 vs alpha = 0.01, and with bias = 0 vs bias = 0.2 (alpha = 0.01). Bias is the proportion of null findings that are reported as true effects.]
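The PPV formula above is simple enough to explore directly. A Python sketch (the parameter values are illustrative):

```python
# Ioannidis's PPV as given above: PPV = (1 - beta) * R / (R - beta*R + alpha).
def ppv(R, alpha=0.05, beta=0.2):
    """Probability that a statistically significant finding is true,
    given prior odds R of a true effect."""
    return (1 - beta) * R / (R - beta * R + alpha)

print(round(ppv(R=1.0), 3))     # 0.941: even prior odds, 80% power
print(round(ppv(R=0.05), 3))    # 0.444: long-shot hypotheses fall below 50%
print((1 - 0.2) * 0.05 > 0.05)  # False, matching PPV < 0.5 when R = 0.05
```

With 80% power and alpha = .05, a significant finding is trustworthy when the hypothesis had even prior odds, but a field testing mostly long-shot hypotheses (R = 0.05) produces significant findings that are more likely false than true — exactly the (1−β)R > α rule.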
10 Ioannidis (2005): the effect of bias on PPV. Bias is the proportion of null findings that are reported as true effects. [Figures: PPV as a function of R for beta = 0.2, alpha = 0.05, with bias = 0 and bias = 0.2.]

Factors that lower PPV:
- small studies & small effects (low power)
- testing more hypotheses (lowering R)
- greater flexibility in design and analysis (increasing bias)
- more independent research teams studying a phenomenon (due to selective reporting of positive results)
More informationAMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015
AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking
More informationTwo Sample Problems. Two sample problems
Two Sample Problems Two sample problems The goal of inference is to compare the responses in two groups. Each group is a sample from a different population. The responses in each group are independent
More informationSampling Distributions: Central Limit Theorem
Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)
More informationData analysis and Geostatistics - lecture VI
Data analysis and Geostatistics - lecture VI Statistical testing with population distributions Statistical testing - the steps 1. Define a hypothesis to test in statistics only a hypothesis rejection is
More informationDifference in two or more average scores in different groups
ANOVAs Analysis of Variance (ANOVA) Difference in two or more average scores in different groups Each participant tested once Same outcome tested in each group Simplest is one-way ANOVA (one variable as
More informationOHSU OGI Class ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Brainerd Basic Statistics Sample size?
ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Basic Statistics Sample size? Sample size determination: text section 2-4-2 Page 41 section 3-7 Page 107 Website::http://www.stat.uiowa.edu/~rlenth/Power/
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationSampling Distributions
Sampling Distributions Sampling Distribution of the Mean & Hypothesis Testing Remember sampling? Sampling Part 1 of definition Selecting a subset of the population to create a sample Generally random sampling
More informationGov Univariate Inference II: Interval Estimation and Testing
Gov 2000-5. Univariate Inference II: Interval Estimation and Testing Matthew Blackwell October 13, 2015 1 / 68 Large Sample Confidence Intervals Confidence Intervals Example Hypothesis Tests Hypothesis
More informationLECTURE 5. Introduction to Econometrics. Hypothesis testing
LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will
More informationExam 2 (KEY) July 20, 2009
STAT 2300 Business Statistics/Summer 2009, Section 002 Exam 2 (KEY) July 20, 2009 Name: USU A#: Score: /225 Directions: This exam consists of six (6) questions, assessing material learned within Modules
More informationpsyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests
psyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests last lecture: introduction to factorial designs next lecture: factorial between-ps ANOVA II: (effect sizes and follow-up tests) 1 general
More informationCENTRAL LIMIT THEOREM (CLT)
CENTRAL LIMIT THEOREM (CLT) A sampling distribution is the probability distribution of the sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic
More informationCOMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.
COMPLETELY RANDOM DESIGN (CRD) Description of the Design -Simplest design to use. -Design can be used when experimental units are essentially homogeneous. -Because of the homogeneity requirement, it may
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationHuman-Centered Fidelity Metrics for Virtual Environment Simulations
Human-Centered Fidelity Metrics for Virtual Environment Simulations Three Numbers from Standard Experimental Design and Analysis: α, power, effect magnitude VR 2005 Tutorial J. Edward Swan II, Mississippi
More informationChapter 7 Comparison of two independent samples
Chapter 7 Comparison of two independent samples 7.1 Introduction Population 1 µ σ 1 1 N 1 Sample 1 y s 1 1 n 1 Population µ σ N Sample y s n 1, : population means 1, : population standard deviations N
More informationConfidence Interval Estimation
Department of Psychology and Human Development Vanderbilt University 1 Introduction 2 3 4 5 Relationship to the 2-Tailed Hypothesis Test Relationship to the 1-Tailed Hypothesis Test 6 7 Introduction In
More informationt-test for 2 matched/related/ dependent samples
HUMBEHV 3HB3 two-sample t-tests & statistical power Week 9 Prof. Patrick Bennett Concepts from previous lectures t distribution standard error of the mean degrees-of-freedom Null and alternative/research
More informationData analysis and Geostatistics - lecture VII
Data analysis and Geostatistics - lecture VII t-tests, ANOVA and goodness-of-fit Statistical testing - significance of r Testing the significance of the correlation coefficient: t = r n - 2 1 - r 2 with
More informationStat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS
Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS 1a. Under the null hypothesis X has the binomial (100,.5) distribution with E(X) = 50 and SE(X) = 5. So P ( X 50 > 10) is (approximately) two tails
More information4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures
Non-parametric Test Stephen Opiyo Overview Distinguish Parametric and Nonparametric Test Procedures Explain commonly used Nonparametric Test Procedures Perform Hypothesis Tests Using Nonparametric Procedures
More informationCan you tell the relationship between students SAT scores and their college grades?
Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower
More informationStatistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data
Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data 999 Prentice-Hall, Inc. Chap. 9 - Chapter Topics Comparing Two Independent Samples: Z Test for the Difference
More information22s:152 Applied Linear Regression. 1-way ANOVA visual:
22s:152 Applied Linear Regression 1-way ANOVA visual: Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Y We now consider an analysis
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationSMAM 314 Exam 3 Name. F A. A null hypothesis that is rejected at α =.05 will always be rejected at α =.01.
SMAM 314 Exam 3 Name 1. Indicate whether the following statements are true (T) or false (F) (6 points) F A. A null hypothesis that is rejected at α =.05 will always be rejected at α =.01. T B. A course
More informationREED TUTORIALS (Pty) LTD ECS3706 EXAM PACK
REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK 1 ECONOMETRICS STUDY PACK MAY/JUNE 2016 Question 1 (a) (i) Describing economic reality (ii) Testing hypothesis about economic theory (iii) Forecasting future
More informationMaking Inferences About Parameters
Making Inferences About Parameters Parametric statistical inference may take the form of: 1. Estimation: on the basis of sample data we estimate the value of some parameter of the population from which
More informationStatistics: revision
NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers
More informationPreview from Notesale.co.uk Page 3 of 63
Stem-and-leaf diagram - vertical numbers on far left represent the 10s, numbers right of the line represent the 1s The mean should not be used if there are extreme scores, or for ranks and categories Unbiased
More informationRelating Graph to Matlab
There are two related course documents on the web Probability and Statistics Review -should be read by people without statistics background and it is helpful as a review for those with prior statistics
More informationParameter Estimation, Sampling Distributions & Hypothesis Testing
Parameter Estimation, Sampling Distributions & Hypothesis Testing Parameter Estimation & Hypothesis Testing In doing research, we are usually interested in some feature of a population distribution (which
More information2 and F Distributions. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006
and F Distributions Lecture 9 Distribution The distribution is used to: construct confidence intervals for a variance compare a set of actual frequencies with expected frequencies test for association
More informationStudio 8: NHST: t-tests and Rejection Regions Spring 2014
Studio 8: NHST: t-tests and Rejection Regions 18.05 Spring 2014 You should have downloaded studio8.zip and unzipped it into your 18.05 working directory. January 2, 2017 2 / 10 Left-side vs Right-side
More informationHypothesis Testing. Mean (SDM)
Confidence Intervals and Hypothesis Testing Readings: Howell, Ch. 4, 7 The Sampling Distribution of the Mean (SDM) Derivation - See Thorne & Giesen (T&G), pp. 169-171 or online Chapter Overview for Ch.
More informationHYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă
HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and
More informationChap The McGraw-Hill Companies, Inc. All rights reserved.
11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More informationN J SS W /df W N - 1
One-Way ANOVA Source Table ANOVA MODEL: ij = µ* + α j + ε ij H 0 : µ = µ =... = µ j or H 0 : Σα j = 0 Source Sum of Squares df Mean Squares F J Between Groups nj( j * ) J - SS B /(J ) MS B /MS W = ( N
More informationPower Analysis. Ben Kite KU CRMDA 2015 Summer Methodology Institute
Power Analysis Ben Kite KU CRMDA 2015 Summer Methodology Institute Created by Terrence D. Jorgensen, 2014 Recall Hypothesis Testing? Null Hypothesis Significance Testing (NHST) is the most common application
More information9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.
Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences
More informationQuestions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.
Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized
More informationInferential statistics
Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,
More informationProbability and Statistics
Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT
More informationPower. January 12, 2019
Power January 12, 2019 Contents Definition of power Z-test example If H 0 is true If H 0 is false The 2x2 matrix of All Things That Can Happen Pr(Type I error) = α Pr(Type II error) = β power = 1 β Things
More informationPSY 216. Assignment 9 Answers. Under what circumstances is a t statistic used instead of a z-score for a hypothesis test
PSY 216 Assignment 9 Answers 1. Problem 1 from the text Under what circumstances is a t statistic used instead of a z-score for a hypothesis test The t statistic should be used when the population standard
More informationDistribution-Free Procedures (Devore Chapter Fifteen)
Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal
More informationCHAPTER 9, 10. Similar to a courtroom trial. In trying a person for a crime, the jury needs to decide between one of two possibilities:
CHAPTER 9, 10 Hypothesis Testing Similar to a courtroom trial. In trying a person for a crime, the jury needs to decide between one of two possibilities: The person is guilty. The person is innocent. To
More informationSampling Distributions
Sampling Error As you may remember from the first lecture, samples provide incomplete information about the population In particular, a statistic (e.g., M, s) computed on any particular sample drawn from
More information