Methodological workshop How to get it right: why you should think twice before planning your next study. Part 1

1 Methodological workshop How to get it right: why you should think twice before planning your next study Luigi Lombardi Dept. of Psychology and Cognitive Science, University of Trento Part 1

2 1 The power algebra

3 1 The power algebra The Neyman-Pearson paradigm (N-H)

4 1 The power algebra

5 1 The power algebra power The N-H table

6 1 The power algebra Probabilistic interpretation

7 1 The power algebra Graphical interpretation

8 1 The power algebra Decision rule in the N-H approach

9 1 The power algebra Power analysis is based on four different parameters: power (population level), hypothetical sample size, Type I error (population level), and effect size (population level). Given any three of them, the fourth can be computed.

10 1 The power algebra Effect size (population level): the parameter defining HA; it represents the degree of deviation from H0 in the underlying population.

11 1 The power algebra A priori power analysis

12 1 The power algebra A priori power analysis: an example using the pwr package (one-sample t-test)
R syntax:
    library(pwr)
    # solve for the sample size: leave n = NULL
    pwr.t.test(d = 0.2, power = 0.85, sig.level = 0.05, n = NULL,
               type = "one.sample", alternative = "greater")
R output:
    One-sample t test power calculation
              n =
              d = 0.2
      sig.level = 0.05
          power = 0.85
    alternative = greater

13 1 The power algebra Post hoc power analysis

14 1 The power algebra Post hoc power analysis: an example using the pwr package (one-sample t-test)
R syntax:
    library(pwr)
    # solve for the power: leave power = NULL
    pwr.t.test(d = 0.2, n = 60, sig.level = 0.05, power = NULL,
               type = "one.sample", alternative = "greater")
R output:
    One-sample t test power calculation
              n = 60
              d = 0.2
      sig.level = 0.05
          power =
    alternative = greater
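The post hoc calculation above can also be reproduced by hand. The following is a minimal sketch, not taken from the slides: it assumes the usual noncentral t formulation for a one-sided, one-sample t-test and cross-checks the result against `power.t.test()` from base R. The variable names are illustrative.

```r
# Post hoc power for a one-sided, one-sample t-test computed "by hand".
n     <- 60     # sample size (as in the slide)
d     <- 0.2    # hypothesised population effect size
alpha <- 0.05   # Type I error

df     <- n - 1
t_crit <- qt(1 - alpha, df)   # critical value under H0
ncp    <- d * sqrt(n)         # noncentrality parameter under HA

# Power = probability that the noncentral t statistic exceeds the critical value
power_manual <- pt(t_crit, df, ncp = ncp, lower.tail = FALSE)

# Cross-check with base R (delta / sd plays the role of d)
power_base <- power.t.test(n = n, delta = d, sd = 1, sig.level = alpha,
                           type = "one.sample",
                           alternative = "one.sided")$power

c(manual = power_manual, base = power_base)
```

The two numbers should coincide, and `pwr.t.test()` with `alternative = "greater"` is expected to give essentially the same value.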

15 1 The power algebra Sensitivity analysis.

16 1 The power algebra Sensitivity analysis: an example using the pwr package (one-sample t-test)
R syntax:
    library(pwr)
    # solve for the detectable effect size: leave d = NULL
    pwr.t.test(n = 50, power = 0.9, sig.level = 0.05, d = NULL,
               type = "one.sample", alternative = "greater")
R output:
    One-sample t test power calculation
              n = 50
              d =
      sig.level = 0.05
          power = 0.9
    alternative = greater

17 1 The power algebra Criterion analysis.

18 1 The power algebra Criterion analysis: an example using the pwr package (one-sample t-test)
R syntax:
    library(pwr)
    # solve for the Type I error: leave sig.level = NULL
    pwr.t.test(n = 100, d = 0.3, power = 0.9, sig.level = NULL,
               type = "one.sample", alternative = "greater")
R output:
    One-sample t test power calculation
              n = 100
              d = 0.3
      sig.level =
          power = 0.9
    alternative = greater

19 1 The power algebra: the power fallacy.

20 1 The power algebra: the power fallacy Observed power analysis The basic idea of observed power analysis is that there is evidence for the null hypothesis being true if p > α and the computed power is high at the observed effect size d. The effect size (at population level) is replaced with the observed effect size d (at the sample level).

23 1 The power algebra: the power fallacy Observed power analysis The effect size (at population level) is replaced with the observed effect size d (at the sample level). Note: d is not a theoretical (hypothetical) value. It is estimated from the sample according to the theoretical model for the null hypothesis. It is biased!

24 1 The power algebra: the power fallacy Observed power analysis: hypothetical derivations
Basic power analysis claim: (p > α) AND (power is high) entails «evidence for H0 is high»
Some derivations: NOT[(p > α) AND (power is high)] iff NOT(p > α) OR NOT(power is high)
1. NOT(p > α) AND (power is high) entails ??
2. (p > α) AND NOT(power is high) entails ??
3. NOT(p > α) AND NOT(power is high) entails ??

25 1 The power algebra: the power fallacy Observed power analysis: hypothetical derivations Some interpretations: (p > α) AND NOT(power is high) entails «evidence for H0 is weak». The underlying idea is: if we increase the sample size, then we raise the power, and we can probably reject H0! However, some of these interpretations lead us to a paradox!

26 1 The power algebra: the power fallacy There is a negative monotonic relationship between observed power and p-value!

28 1 The power algebra: the power fallacy There is a negative monotonic relationship between observed power and p-value! That is to say, because of the one-to-one relationship between p-values and observed power, nonsignificant p-values always correspond to low observed powers. Hence, we will never observe nonsignificant p-values corresponding to high observed powers. The main claim is nonsense!
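The one-to-one relationship can be made explicit in a simplified setting. The sketch below is an illustration, not part of the slides: it assumes a two-sided z-test (normal approximation) rather than the t-test, so that observed power can be written as a function of the p-value alone. The function name `observed_power` is hypothetical.

```r
# Observed power of a two-sided z-test as a function of the p-value only.
observed_power <- function(p, alpha = 0.05) {
  z_obs  <- qnorm(1 - p / 2)       # |z| statistic that yields the p-value p
  z_crit <- qnorm(1 - alpha / 2)   # two-sided critical value
  # Power when the "true" standardised effect is set equal to the observed one
  pnorm(z_obs - z_crit) + pnorm(-z_obs - z_crit)
}

p_grid <- c(0.001, 0.01, 0.05, 0.2, 0.5, 0.9)
data.frame(p = p_grid, observed_power = round(observed_power(p_grid), 3))
```

Under this approximation the observed power is strictly decreasing in p and is about 0.5 when p equals α, so a nonsignificant p-value cannot be paired with an observed power much above one half.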

29 1 The power algebra: the power fallacy relationship between observed power and p-value simulation study

30 1 The power algebra: the power fallacy One-sample t-test: H0: μ0 = 0 (simulation study)
R syntax:
    library(pwr)
    n <- 50
    mu0 <- 0
    sd <- 1
    B <- 2000
    simpv <- rep(0, B)   # simulated p-values
    simpw <- rep(0, B)   # simulated observed powers
    for (b in 1:B) {
      X <- rnorm(n, mu0, sd)
      # observed effect size (the denominator simplifies to sd)
      dobs <- (mean(X)) / sqrt(((n - 1) * sd^2) / (n - 1))
      simpv[b] <- t.test(X)$p.value
      simpw[b] <- pwr.t.test(d = dobs, n = n, sig.level = 0.05, power = NULL,
                             type = "one.sample", alternative = "two.sided")$power
    }
    plot(simpv, simpw, ylab = "observed power", xlab = "p-value")

31 2 Computing observed effect sizes

32 2 Computing observed effect sizes Observed effect sizes quantify the magnitude of an effect of interest. They can be understood as estimates of the differences between groups or the strength of associations between variables. Widely used examples of observed effect sizes are:
Differences between groups: different typologies of d measures (Cohen, 1988; Hedges, 1981; Rosenthal, 1994; Dunlap et al., 1996)
Association between quantitative variables: association measures such as, for example, the correlation r

33 2 Computing observed effect sizes Observed effect size for comparing two independent groups

34 2 Computing observed effect sizes Observed effect size for comparing two independent groups with t values Note this is a Transformation index

35 2 Computing observed effect sizes Observed effect size for comparing two dependent groups with t values

36 2 Computing observed effect sizes Conversion formulae Note, however, that conversions may introduce some bias

37 2 Computing observed effect sizes Observed effect size derived from regression models In general, it is always possible to obtain t values from a regression model for each continuous predictor variable and also for each group (level) of a categorical predictor variable (specifically, for each of its recoded dummy variables):
Categorical predictor: $d = \frac{t\,(n_1 + n_2)}{\sqrt{n_1 n_2}\,\sqrt{df}}$
Continuous predictor: $r = \frac{t}{\sqrt{t^2 + df}}$
where n1 and n2 are the sample sizes for the two groups and df denotes the degrees of freedom used for the associated t value in a linear model
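The two conversions can be wrapped in small helper functions. This is a minimal sketch: the function names `d_from_t` and `r_from_t` are hypothetical, and the `mtcars` data set bundled with R is used purely for illustration.

```r
# Effect sizes derived from the t values of a linear model.
d_from_t <- function(t, n1, n2, df) t * (n1 + n2) / (sqrt(n1 * n2) * sqrt(df))
r_from_t <- function(t, df) t / sqrt(t^2 + df)

# A simple check of the continuous-predictor formula: t = 3, df = 16
r_from_t(3, 16)   # 3 / sqrt(9 + 16) = 0.6

# Illustration: one quantitative predictor (wt) and one dummy-coded factor (am)
fit <- lm(mpg ~ wt + factor(am), data = mtcars)
tv  <- summary(fit)$coefficients[, "t value"]
n1  <- sum(mtcars$am == 0)   # size of the reference group
n2  <- sum(mtcars$am == 1)   # size of the second group

d_am <- d_from_t(tv[["factor(am)1"]], n1, n2, fit$df.residual)
r_wt <- r_from_t(tv[["wt"]], fit$df.residual)
c(d_am = d_am, r_wt = r_wt)
```

Using `fit$df.residual` keeps the degrees of freedom tied to the fitted model rather than hard-coding them.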

38 2 Computing observed effect sizes Deriving approximate confidence intervals (CI) for effect sizes In general, computing approximate CIs for effect sizes is not an easy task, as the equations usually vary according to the selected effect size index and also the way it has been derived from the specific statistical analysis. A general equation is the following:
95% CI for ES: $ES \pm 1.96 \times se$
The main problem concerns the way we compute the asymptotic standard error (se). A better way may be to use a parametric bootstrap approach to derive empirical CIs for effect sizes.

39 2 Computing observed effect sizes Multiple regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
R syntax:
    beta0 <- 0
    beta1 <- 0.5
    beta2 <- -2.   # decimals truncated in the source after "-2."
    n <- 100
    x1 <- rnorm(n, 10, 5)
    a <- c(rep("a1", n/2), rep("a2", n/2))
    x2 <- c(rep(0, n/2), rep(1, n/2))
    y <- beta0 + beta1*x1 + beta2*x2 + rnorm(n, 0, 4)
    plot(x1, y)
    plot(x2, y)
    boxplot(y ~ a)
    MR <- lm(y ~ x1 + a)
    summary(MR)
    # effect size categorical variable a - second level (a2)
    d <- (summary(MR)$coefficients[3,3]*(n))/(sqrt((n/2)^2)*sqrt(MR$df.residual))
    # effect size for the quantitative variable x1
    r <- summary(MR)$coefficients[2,3]/sqrt(summary(MR)$coefficients[2,3]^2 + MR$df.residual)
    d
    r

40 Multiple regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
R syntax (end):
    # Parametric bootstrap for approximate 95% CIs for effect sizes
    # number of simulations: B
    B <- 500
    dsim <- rep(0, B)
    rsim <- rep(0, B)
    for (b in 1:B) {
      YS <- simulate(MR, 1)[,1]   # new response simulated from the fitted model
      MS <- lm(YS ~ x1 + a)
      # absolute effect size
      dsim[b] <- abs(summary(MS)$coefficients[3,3]*(n))/(sqrt((n/2)^2)*sqrt(MS$df.residual))
      rsim[b] <- summary(MS)$coefficients[2,3]/sqrt(summary(MS)$coefficients[2,3]^2 + MS$df.residual)
    }
    par(mfrow = c(1,2))
    plot(density(dsim), main = "distribution for simulated d")
    hist(dsim, freq = FALSE, add = TRUE)
    plot(density(rsim), main = "distribution for simulated r")
    hist(rsim, freq = FALSE, add = TRUE)
    quantile(dsim, probs = c(0.025, 0.975))
    quantile(rsim, probs = c(0.025, 0.975))

41 1 Multiple regression model: 1 quant. predictor + 1 categ. predictor (simulation study) 95% CI for d [0.508, 1.357] 95% CI for r [0.368, 0.654]

42 2 Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
R syntax:
    beta0 <- 0
    beta1 <- 0.5
    beta2 <- -2.   # decimals truncated in the source after "-2."
    n <- 100
    x1 <- rnorm(n, 10, 5)
    a <- c(rep("a1", n/2), rep("a2", n/2))
    x2 <- c(rep(0, n/2), rep(1, n/2))
    mul <- beta0 + beta1*x1 + beta2*x2   # linear predictor
    pis <- exp(mul)/(1 + exp(mul))       # inverse logit transformation of mul
    y <- rbinom(n, 40, pis)              # generate binomial counts, upper bound = 40
    plot(x1[a == "a1"], y[a == "a1"], xlab = "x1", ylab = "y")
    points(x1[a == "a2"], y[a == "a2"], pch = 3)
    MR <- glm(cbind(y, 40 - y) ~ x1 + a, family = 'binomial')
    summary(MR)
    df <- 97   # as if t-tests were used
    # effect size categorical variable a - second level (a2)
    d <- (summary(MR)$coefficients[3,3]*(n))/(sqrt((n/2)^2)*sqrt(df))
    # effect size for the quantitative variable x1
    r <- summary(MR)$coefficients[2,3]/sqrt(summary(MR)$coefficients[2,3]^2 + df)
    d
    r

43 2 Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
R syntax (end):
    # Parametric bootstrap for approximate 95% CIs for effect sizes
    B <- 500
    dsim <- rep(0, B)
    rsim <- rep(0, B)
    for (b in 1:B) {
      YS <- simulate(MR, 1)[,1]   # simulated (successes, failures) from the fitted model
      MS <- glm(YS ~ x1 + a, family = 'binomial')
      # absolute effect size
      dsim[b] <- abs(summary(MS)$coefficients[3,3]*(n))/(sqrt((n/2)^2)*sqrt(df))
      rsim[b] <- summary(MS)$coefficients[2,3]/sqrt(summary(MS)$coefficients[2,3]^2 + df)
    }
    par(mfrow = c(1,2))
    plot(density(dsim), main = "distribution for simulated d")
    hist(dsim, freq = FALSE, add = TRUE)
    plot(density(rsim), main = "distribution for simulated r")
    hist(rsim, freq = FALSE, add = TRUE)
    quantile(dsim, probs = c(0.025, 0.975))
    quantile(rsim, probs = c(0.025, 0.975))

44 2 Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study) 95% CI for d [2.318, 2.767] 95% CI for r [0.908, 0.917]

45 2 Multiple logistic regression model: 1 quant. predictor + 1 categ. predictor (simulation study)
Categorical predictor: $d = \frac{z\,(n_1 + n_2)}{\sqrt{n_1 n_2}\,\sqrt{df}}$
Continuous predictor: $r = \frac{z}{\sqrt{z^2 + df}}$
For glm (generalized linear models) the t values must be replaced with z values. However, the degrees of freedom should be computed as if t-tests were used.
Cautionary note: when using glm models to derive ESs, the amount of bias that may be incurred by using the above modified equations is uncertain.

46 3 Beyond power calculations

47 3 Beyond power calculations One of the main problems of standard power analysis is that it puts a narrow emphasis on statistical significance which is the primary focus of many study designs. However, in noisy, small-sample settings, statistically significant results can often be misleading. This is particularly true when observed power analysis is used to evaluate the statistical results.

48 3 Beyond power calculations A better approach would be Design Analysis (DA): a set of statistical calculations about what could happen under hypothetical replications of a study (that focuses on estimates and uncertainties rather than on statistical significance)

49 3 Beyond power calculations Somehow this work represents a kind of conceptual «bridge» linking the frequentist approach with a more Bayesian oriented perspective

50 3 Beyond power calculations DA main tokens: the observed effect; the true population effect (D); the standard error (s) of the observed effect; the Type I error; a hypothetical normally distributed random variable with parameters D and s (note this constitutes a conceptual leap)

51 3 Beyond power calculations DA main tokens The main goals are to compute the power, the Type S error rate, and the exaggeration ratio, with Φ and dc being the cumulative standard normal distribution and the critical value for the effect size, respectively

52 3 Beyond power calculations DA main tokens The main goals are to compute:

53 3 Beyond power calculations DA main tokens The main goals are to compute:

54 3 Beyond power calculations Gelman & Carlin (2014), p. 644

55 3 Beyond power calculations R function: Gelman & Carlin (2014), p. 644
    retrodesign <- function(A, s, alpha=.05, df=Inf, n.sims=10000){
      z <- qt(1-alpha/2, df)            # critical value
      p.hi <- 1 - pt(z-A/s, df)         # P(significant and positive)
      p.lo <- pt(-z-A/s, df)            # P(significant and negative)
      power <- p.hi + p.lo
      typeS <- p.lo/power               # P(wrong sign | significant)
      estimate <- A + s*rt(n.sims,df)   # hypothetical replications
      significant <- abs(estimate) > s*z
      exaggeration <- mean(abs(estimate)[significant])/A
      return(list(power=power, typeS=typeS, exaggeration=exaggeration))
    }

56 3 Beyond power calculations A simple example: linear regression

57 3 Beyond power calculations Simple regression with lm()
R output:
    Call:
    lm(formula = y ~ x)

    Residuals:
         Min       1Q   Median       3Q      Max
    -15.1642  -4.7063  -0.9168   5.5848  15.6263

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    x                                          e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error:  on 38 degrees of freedom
    Multiple R-squared:  , Adjusted R-squared:
    F-statistic:  on 1 and 38 DF,  p-value: 7.955e-07

58 3 Beyond power calculations Design Analysis, D = 1 (true population effect)
R syntax:
    > retrodesign(1, 0.3697, df=38)
    $power
    [1]
    $typeS
    [1] e-05
    $exaggeration
    [1]

59 3 Beyond power calculations D = 1

60 3 Beyond power calculations Design Analysis, D = 0.5 (true population effect)
R syntax:
    > retrodesign(0.5, 0.3697, df=38)
    $power
    [1]
    $typeS
    [1]
    $exaggeration
    [1]

61 3 Beyond power calculations D = 0.5

62 3 Beyond power calculations 5000 simulated samples with 20 observations each from a normal distribution with parameters = 0.5; s = 0.9 % of significant results ( 0) : 39.7 % of sample means > D(= ) : 32.3

63 3 Beyond power calculations Type S error as a function of Power Gelman & Carlin (2014), p. 644

64 3 Beyond power calculations Exaggeration ratio as a function of Power Gelman & Carlin (2014), p. 644
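The two curves (Type S error and exaggeration ratio as functions of power) can be traced without simulation. The sketch below is not the Gelman & Carlin function: it assumes a normal sampling distribution (the df = Inf case) and a positive true effect D, and it replaces the Monte Carlo step with the closed-form mean of a truncated normal. The function name `design_normal` and the grid of standard errors are hypothetical.

```r
# Closed-form design quantities under a normal approximation (df = Inf, D > 0).
design_normal <- function(D, s, alpha = 0.05) {
  z     <- qnorm(1 - alpha / 2)
  p_hi  <- pnorm(D / s - z)      # P(estimate >  z * s)
  p_lo  <- pnorm(-D / s - z)     # P(estimate < -z * s)
  power <- p_hi + p_lo
  typeS <- p_lo / power          # P(wrong sign | significant)
  # E[|estimate| ; significant], from the means of the two truncated tails
  m_abs <- D * (p_hi - p_lo) + s * (dnorm(z - D / s) + dnorm(z + D / s))
  exaggeration <- m_abs / power / D
  c(power = power, typeS = typeS, exaggeration = exaggeration)
}

# Smaller standard errors (e.g. larger samples) for a fixed true effect D = 0.5
s_grid <- c(2, 1, 0.5, 0.25)
res <- t(sapply(s_grid, function(s) design_normal(D = 0.5, s = s)))
cbind(s = s_grid, round(res, 3))
```

As the standard error shrinks, power rises while both the Type S error rate and the exaggeration ratio fall, which is the pattern the two figures summarise.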

65 3 Beyond power calculations Practical implications: Design Analysis strongly suggests larger sample sizes than those commonly used in psychology. In particular, if the sample size is too small in relation to the true effect size, then what appears to be a win (statistical significance) may really be a loss (in the form of a claim that does not replicate). For a more formal presentation of the DA approach see Gelman A. & Tuerlinckx F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics, 15,

66 4 Fake data analysis

67 4 Fake data analysis: the SGR approach SGR = Sample Generation by Replacement (Lombardi & Pastore, 2012; Pastore & Lombardi, 2014; Lombardi & Pastore, 2014; Lombardi et al., 2015) SGR is a data simulation procedure that generates artificial samples of fake discrete/ordinal data. SGR can be used to quantify uncertainty in inferences based on possible fake data as well as to evaluate the implications of fake data for statistical results. For example, how sensitive are the results to possible fake data? Are the conclusions still valid under one or more scenarios of faking manipulations?

68 4 Fake data analysis: the SGR approach Some «examples»

69 4 Fake data analysis: the SGR approach Some «examples»

70 4 Fake data analysis: the SGR approach The SGR logic

71 4 Fake data analysis: the SGR approach The SGR logic This is usually not directly observable This is observable Information (data)

72 Original value d 4 Fake data analysis: the SGR approach The replacement distribution Replaced value f

73 4 Fake data analysis: the SGR approach Other examples of replacement distribution

74 4 Fake data analysis: the SGR approach Other examples of replacement distribution

75 4 Fake data analysis: the SGR approach The sgr package (The R Journal, 6(1), )

76 4 Fake data analysis: the SGR approach The sgr package (The R Journal, 6(1), ) sgr package is available on the CRAN repository

77 Spearman correlation 4 Fake data analysis: the SGR approach Effect of faking on two items that are originally not correlated (n=50) Proportion of subjects with fake responses (faking-good type)

78 Spearman correlation 4 Fake data analysis: the SGR approach Effect of faking on two items that are originally not correlated (n=100) Proportion of subjects with fake responses (faking-good type)
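The replacement idea behind these plots can be sketched in base R. This is only an illustration under simplifying assumptions and does not use the sgr package or its replacement distributions: items are on a hypothetical 5-point scale, and a faking-good response is drawn uniformly from the values greater than or equal to the original one. All function names are hypothetical.

```r
set.seed(1)
n <- 100
K <- 5                                    # hypothetical 5-point rating scale
item1 <- sample(1:K, n, replace = TRUE)   # two independent items,
item2 <- sample(1:K, n, replace = TRUE)   # hence originally uncorrelated

# Faking-good replacement: draw uniformly from {original value, ..., K}
fake_good <- function(x, K) {
  vapply(x, function(v) {
    cand <- v:K
    cand[sample.int(length(cand), 1)]     # avoids the sample(5, 1) pitfall
  }, integer(1))
}

# Replace the responses of a proportion `prop` of subjects on both items
apply_faking <- function(item1, item2, prop, K) {
  fakers <- sample.int(length(item1), round(prop * length(item1)))
  item1[fakers] <- fake_good(item1[fakers], K)
  item2[fakers] <- fake_good(item2[fakers], K)
  list(item1 = item1, item2 = item2, fakers = fakers)
}

props <- c(0, 0.25, 0.5, 0.75, 1)
rho <- sapply(props, function(p) {
  f <- apply_faking(item1, item2, p, K)
  cor(f$item1, f$item2, method = "spearman")
})
data.frame(prop_fakers = props, spearman = round(rho, 3))
```

Repeating this over many generated samples, rather than a single one, would give the kind of distribution of Spearman correlations per faking proportion that the slides display.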

79 4 Fake data analysis: the SGR approach SGR makes it possible to test and compare different fake data models Fake data hypotheses

80 4 Fake data analysis: the SGR approach SGR makes it possible to test and compare different fake data models

81 4 Fake data analysis: the SGR approach SGR makes it possible to test and compare different fake data models, including more complex factorial models

82 4 Fake data analysis: the SGR approach SGR makes it possible to test and compare different fake data models and to evaluate the effect on goodness-of-fit (g.o.f.) statistics

83 Thank you for your attention! visit the WS website at
