# BIOS 312: Precision of Statistical Inference

## Transcription

Chris Slaughter

Department of Biostatistics, Vanderbilt University School of Medicine

January 3, 2013

### Outline

- Overview
- Precision and Power/Sample Size
- Precision and Standard Errors

### Bias and Precision

The goal of statistical inference is to estimate parameters accurately (unbiased) and with high precision.

Measures of precision:

- Standard error (not standard deviation)
- Width of confidence intervals
- Power (equivalently, type II error rate)

### Summary measures

- Scientific hypotheses are typically refined into statistical hypotheses by identifying some parameter, $\theta$, measuring differences in the distribution of the response variable.
- Often we are interested in whether $\theta$ differs across levels of categorical (e.g. treatment/control) or continuous (e.g. age) predictor variables.
- $\theta$ could be any summary measure, such as:
  - Difference/ratio of means
  - Difference/ratio of medians
  - Ratio of geometric means
  - Difference/ratio of proportions
  - Odds ratio, relative risk, risk difference
  - Hazard ratio

### Choosing summary measure

How to select $\theta$? In order of importance:

1. Scientific (clinical) importance. May be based on the current state of knowledge.
2. Is $\theta$ likely to vary across the predictor of interest? This impacts the ability to detect a difference, if it exists.
3. Statistical precision. Only relevant if all other factors are equal.

### Statistical inference

- Statistics is concerned with making inference about population parameters ($\theta$) based on a sample of data.
- Frequentist estimation includes both point estimates ($\hat\theta$) and interval estimates (confidence intervals).
- Bayesian analysis estimates the posterior distribution of $\theta$ given the sampled data, $p(\theta \mid \text{data})$. The posterior distribution can then be summarized by quantities like the posterior mean and 95% credible interval.
- Likelihood analysis focuses on using the likelihood function to obtain maximum likelihood estimates. The likelihood function can be used directly to obtain upper and lower confidence-type intervals for estimates.

### Precision example

Consider the following results from 5 clinical trials of three drugs (A, B, C) designed to lower cholesterol compared to baseline. Assume a 10 unit drop in cholesterol (relative to baseline) is clinically meaningful.

| Trial | Drug | Pts | Mean diff | Std dev | Std error | 95% CI for diff | p-value |
|---|---|---|---|---|---|---|---|
| 1 | A | | | | | [-129, 69] | |
| 2 | A | | | | | [-49.6, -10.4] | |
| 3 | B | | | | | [-85, 45] | |
| 4 | B | | | | | [-8.5, 4.5] | |
| 5 | C | | | | | [-9.9, -2.1] | 0.002 |

- Which drug is effective at reducing cholesterol?
- Why is study 4 more informative than study 3 (even though the p-values are similar)?

Moral: Hypothesis tests and p-values can often be insufficient to make proper decisions. The confidence interval provides more useful information.
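Because each interval in the example is a symmetric 95% confidence interval, an approximate point estimate and standard error can be backed out of it under a Normal approximation. The sketch below does this for the five trials and checks which intervals exclude 0 (no effect) and -10 (the clinically meaningful drop). The helper name `ci_to_est` is hypothetical, and the recovered values are approximations derived from the intervals, not the figures from the original table.

```r
# Hypothetical helper: recover the estimate and standard error from a
# symmetric Wald-type confidence interval (Normal approximation assumed).
ci_to_est <- function(lower, upper, level = 0.95) {
  crit <- qnorm(1 - (1 - level) / 2)      # about 1.96 for a 95% CI
  est  <- (lower + upper) / 2             # midpoint of the interval
  se   <- (upper - lower) / (2 * crit)    # half-width / critical value
  data.frame(est = est, se = se)
}

# 95% CIs for the five cholesterol trials, as listed in the example
trials <- data.frame(
  trial = 1:5,
  drug  = c("A", "A", "B", "B", "C"),
  lower = c(-129, -49.6, -85, -8.5, -9.9),
  upper = c(69, -10.4, 45, 4.5, -2.1)
)
res <- cbind(trials, ci_to_est(trials$lower, trials$upper))

# Does each interval exclude 0 and/or the clinically meaningful value -10?
res$excludes_null    <- res$lower > 0 | res$upper < 0
res$excludes_minus10 <- res$lower > -10 | res$upper < -10
print(res)
```

An interval that excludes at least one of the two values discriminates between "no effect" and "clinically meaningful effect"; an interval that contains both does not.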

### Sampling distribution defined

- The sampling distribution is the probability distribution of a statistic, e.g. the sampling distribution of the sample mean is $N(\mu, \sigma^2/n)$.
- Most often we choose estimators that are asymptotically Normally distributed. For large $n$, $\hat\theta \sim N(\theta, V/n)$.
  - $\hat\theta$ is our estimate of $\theta$. The hat indicates that it is an estimate.
  - Mean: $\theta$
  - Variance: $V/n$, where $V$ is related to the average amount of statistical information available from each observation. Often $V$ depends on $\theta$.
- How large $n$ must be depends on the distribution of the underlying data. If $n$ is large enough, approximate Normality of $\hat\theta$ will hold.

### Confidence intervals when n is large

Calculating $100(1-\alpha)\%$ confidence intervals $(\theta_L, \theta_U)$ with approximate Normality:

$$\theta_L = \hat\theta - Z_{1-\alpha/2}\sqrt{V/n}, \qquad \theta_U = \hat\theta + Z_{1-\alpha/2}\sqrt{V/n}$$

That is, (estimate) ± (crit val) × (std err of estimate).

Can similarly calculate approximate two-sided p-values:

$$Z = \frac{(\text{estimate}) - (\text{hyp value})}{(\text{std err of estimate})}$$

- p-value in Stata: `2 * norm(-abs(Z))`
- p-value in R: use the `pnorm()` function
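As a minimal sketch of the large-sample interval and p-value calculation in R, the function below takes an estimate and its standard error and returns the Wald-type interval, the Z statistic, and the two-sided p-value. The function name `wald_inference` and the numbers in the usage line are illustrative assumptions, not values from the slides.

```r
# Wald-type confidence interval and two-sided p-value from an estimate
# and its standard error (large-sample Normal approximation).
wald_inference <- function(estimate, se, hyp_value = 0, alpha = 0.05) {
  crit <- qnorm(1 - alpha / 2)                 # Z_{1 - alpha/2}
  z    <- (estimate - hyp_value) / se          # test statistic
  c(lower   = estimate - crit * se,
    upper   = estimate + crit * se,
    z       = z,
    p_value = 2 * pnorm(-abs(z)))              # two-sided p-value
}

# Illustrative numbers only: estimate -30 with standard error 10
out <- wald_inference(estimate = -30, se = 10)
print(round(out, 4))
```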

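### Comparing independent estimates

If estimates are independent and Normally distributed,

$$\hat\theta_1 \sim N(\theta_1, se_1^2) \quad \text{and} \quad \hat\theta_2 \sim N(\theta_2, se_2^2)$$

then, approximately for the ratio,

$$\hat\theta_1 - \hat\theta_2 \sim N\left(\theta_1 - \theta_2,\; se_1^2 + se_2^2\right)$$

$$\hat\theta_1 + \hat\theta_2 \sim N\left(\theta_1 + \theta_2,\; se_1^2 + se_2^2\right)$$

$$\frac{\hat\theta_1}{\hat\theta_2} \sim N\left(\frac{\theta_1}{\theta_2},\; \frac{se_1^2}{\theta_2^2} + \frac{\theta_1^2\, se_2^2}{\theta_2^4}\right)$$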

### Comparing correlated estimates

If estimates are correlated and Normally distributed,

$$\hat\theta_1 \sim N(\theta_1, se_1^2), \quad \hat\theta_2 \sim N(\theta_2, se_2^2), \quad \rho = \text{corr}(\hat\theta_1, \hat\theta_2)$$

then

$$\hat\theta_1 - \hat\theta_2 \sim N\left(\theta_1 - \theta_2,\; se_1^2 + se_2^2 - 2\rho\, se_1 se_2\right)$$

$$\hat\theta_1 + \hat\theta_2 \sim N\left(\theta_1 + \theta_2,\; se_1^2 + se_2^2 + 2\rho\, se_1 se_2\right)$$

Example: comparing results from the same study.

- The paper may not give the interesting results (from your point of view).
- Comparison can be difficult because the correlation is usually not reported.
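The difference formulas for independent and correlated estimates can be sketched in a single R function, with `rho = 0` giving the independent case. The function name `compare_estimates` and the input values are hypothetical, chosen only to show how a positive correlation shrinks the standard error of a difference.

```r
# Difference of two Normally distributed estimates, allowing for a
# correlation rho between them (rho = 0 gives the independent case).
compare_estimates <- function(est1, se1, est2, se2, rho = 0) {
  diff    <- est1 - est2
  se_diff <- sqrt(se1^2 + se2^2 - 2 * rho * se1 * se2)
  z       <- diff / se_diff
  c(diff = diff, se = se_diff, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Hypothetical estimates: -30 (SE 10) versus -20 (SE 10)
indep <- compare_estimates(-30, 10, -20, 10)             # independent
corr  <- compare_estimates(-30, 10, -20, 10, rho = 0.5)  # positively correlated
print(rbind(independent = indep, correlated = corr))
```

When the correlation is not reported, running the comparison over a range of plausible `rho` values is one way to see how sensitive the conclusion is.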

### Classical hypothesis testing

- Classical hypothesis testing is stated in terms of the null hypothesis ($H_0$). The alternative hypothesis ($H_1$) is the complement of $H_0$.
  - Two sided: $H_0: \theta = \theta_0$ vs $H_1: \theta \ne \theta_0$
  - One sided: $H_0: \theta \ge \theta_0$ vs $H_1: \theta < \theta_0$
  - One sided: $H_0: \theta \le \theta_0$ vs $H_1: \theta > \theta_0$
- Inference is based on either rejecting or failing to reject the null hypothesis.
- Typically, the null hypothesis is stated in some form so as to indicate no association.

### Classical hypothesis testing thought process

- Assumes $H_0$ is true.
- Conceives of data as one of many datasets that might have happened.
- See if data are consistent with $H_0$.
- Are data extreme or unlikely if $H_0$ is really true?
- Proof by contradiction: if assuming $H_0$ is true leads to results that are bizarre or unlikely to have been observed, this casts doubt on the premise.
- Evidence is summarized through a single statistic capturing a tendency of the data, e.g., $\bar{x}$.
- Look at the probability of getting a statistic as or more extreme than the calculated one (results as or more impressive than ours) if $H_0$ is true (the p-value).

### Classical hypothesis testing thought process cont.

- If the statistic has a low probability of being observed to be this extreme, we say that if $H_0$ is true we have acquired data that are very improbable, i.e., have witnessed a low probability event.
- Then evidence mounts against $H_0$ and we might reject it.
- A failure to reject does not imply that we have gathered evidence in favor of $H_0$. There are many reasons for studies to not be impressive, including small sample size ($n$).

Key limitation: classical hypothesis testing ignores clinical significance. An approach that allows us to make informed decisions is preferable.

### Decision theoretic approach

- Stated in terms of the null hypothesis and a suitably chosen design alternative.
- Summarize the design alternative through $\theta_1$ ($\theta_1 > 0$):
  - Two sided: $H_0: \theta = \theta_0$ vs $H_1: \theta \le -\theta_1$ or $\theta \ge \theta_1$
  - One sided: $H_0: \theta \ge \theta_0$ vs $H_1: \theta \le -\theta_1$
  - One sided: $H_0: \theta \le \theta_0$ vs $H_1: \theta \ge \theta_1$
- Using the decision theoretic approach, we can conclude:
  - Reject the null hypothesis: the data are atypical of what we would expect if the null hypothesis is true.
  - Reject the alternative hypothesis: the data are atypical of what we would expect if the alternative hypothesis is true.

### Decision theoretic approach cont.

Key difference from the classical approach: the design alternative ($\theta_1$) is ideally chosen to be the minimal important difference to detect based on scientific or clinical criteria.

- Clinical significance: in the cholesterol example, the important difference was assumed to be 10 mg/dl.
- Economic impact: a new drug is not marketable unless it has a large effect.
- Feasibility of study: limited availability of subjects may limit investigators to searching for interventions with large impact.

Remember the cholesterol example. Studies 2, 4, and 5 follow the decision theoretic approach because they allow us to discriminate between scientifically meaningful hypotheses.

### Measures of high precision

What are the measures of (high) precision?

- Estimators are less variable across studies, which is often measured by decreased standard error.
- Narrower confidence intervals. Estimators are consistent with fewer hypotheses if the CIs are narrow.
- Able to reject false hypotheses. The Z statistic is higher when the alternative hypothesis is true.

Translation into sample size:

- Based on the width of the confidence interval
  - Choose a sample size such that a 95% CI will not contain both the null and the design alternative.
  - If both $\theta_0$ and $\theta_1$ cannot be in the CI, we have discriminated between those hypotheses.
- Based on statistical power
  - When the alternative is true, have a high probability of rejecting the null.
  - In other words, minimize the type II error rate.

### Statistical power: quick review

- Power is the probability of rejecting the null hypothesis when the alternative is true: $\Pr(\text{reject } H_0 \mid \theta = \theta_1)$.
- Most often $\hat\theta \sim N(\theta, V/n)$, so that the test statistic $Z = \dfrac{\hat\theta - \theta_0}{\sqrt{V/n}}$ will follow a Normal distribution.
  - Under $H_0$, $Z \sim N(0, 1)$, so we reject $H_0$ if $|Z| > Z_{1-\alpha/2}$.
  - Under $H_1$, $Z \sim N\left(\dfrac{\theta_1 - \theta_0}{\sqrt{V/n}},\, 1\right)$.

Power curves:

- The power function (power curve) is a function of the true value of $\theta$.
- We can compute power for every value of $\theta$.
- As $\theta$ moves away from $\theta_0$, power increases (for two-sided alternatives).
- For any choice of desired power, there is always some $\theta$ such that the study has that power.
- $\text{Pwr}(\theta_0) = \alpha$, the type I error rate.
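The power function described above can be sketched directly from the Normal approximation: under a true value $\theta$, $Z$ is shifted by $(\theta - \theta_0)/\sqrt{V/n}$, and power is the probability that $|Z|$ exceeds the critical value. The function name `normal_power` and the default values of `V` and `n` are assumptions made for illustration.

```r
# Two-sided power under the Normal approximation:
# Z ~ N((theta - theta0) / sqrt(V / n), 1), reject when |Z| > z_{1 - alpha/2}.
normal_power <- function(theta, theta0 = 0, V = 1, n = 100, alpha = 0.05) {
  crit  <- qnorm(1 - alpha / 2)
  shift <- (theta - theta0) / sqrt(V / n)
  # P(Z > crit) + P(Z < -crit) when Z has mean `shift` and variance 1
  pnorm(shift - crit) + pnorm(-shift - crit)
}

thetas <- seq(-0.5, 0.5, by = 0.1)
power_curve <- normal_power(thetas)
print(round(cbind(theta = thetas, power = power_curve), 3))

# At theta = theta0 the power equals the type I error rate alpha
normal_power(0)
```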

### Power curves for a two-sample, equal variance, t-test; n=100

[Figure: power (y-axis) against the true difference in means, theta (x-axis), with one curve for σ = 1 and one for σ = 1.2.]

### Code for generating example power curve

```r
# Grid of true differences in means (theta)
mydiffs <- seq(-0.8, 0.8, 0.05)
mypower <- vector("numeric", length(mydiffs))   # power when sigma = 1
mypower2 <- vector("numeric", length(mydiffs))  # power when sigma = 1.2

# Power of a two-sample t-test with n = 100 per group at each difference
for (i in seq_along(mydiffs)) {
  mypower[i] <- power.t.test(n = 100, sd = 1, delta = mydiffs[i])$power
  mypower2[i] <- power.t.test(n = 100, sd = 1.2, delta = mydiffs[i])$power
}

plot(mydiffs, mypower, xlab = "True difference in means (theta)",
     ylab = "Power", type = "l", main = "")
lines(mydiffs, mypower2, lty = 2)
legend("top", c(expression(sigma == 1), expression(sigma == 1.2)),
       lty = 1:2, inset = 0.05)
```

### Precision and standard errors

- Standard errors are the key to precision.
- Greater precision is achieved with smaller standard errors.
- Standard errors are decreased by either decreasing $V$ or increasing $n$.
  - Typically: $se(\hat\theta) = \sqrt{V/n}$
  - Width of CI: $2 \times (\text{crit value}) \times se(\hat\theta)$
  - Test statistic: $Z = \dfrac{\hat\theta - \theta_0}{se(\hat\theta)}$

### Example: One sample mean

Observations are independent and identically distributed (iid):

$$Y_i \overset{iid}{\sim} (\mu, \sigma^2), \quad i = 1, \ldots, n$$

$$\theta = \mu, \qquad \hat\theta = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$$

$$V = \sigma^2, \qquad se(\hat\theta) = \sqrt{\frac{\sigma^2}{n}}$$

- Note that we are not assuming a specific distribution for $Y_i$, just that the distribution has a mean and variance.
- We are assuming that $n$ is large so asymptotic results are applicable.
- Then the distribution of $Y_i$ could be binary, Poisson, exponential, normal, etc. and the results will hold.

There are ways to decrease $V$, including:

- Restrict the sample by age, gender, etc.
- Take repeated measures on each subject, summarize, and perform the test on summary measures.
- Better ideas (this course): adjust for age and gender; use all data while modeling correlation.

### Example: Two sample mean

Difference of independent means. Observations are no longer identically distributed, just independent. Group 1 has a different mean and variance than group 2.

$$Y_{ij} \overset{ind}{\sim} (\mu_j, \sigma_j^2), \quad j = 1, 2; \; i = 1, \ldots, n_j$$

$$n = n_1 + n_2; \qquad r = n_1 / n_2$$

$$\theta = \mu_1 - \mu_2, \qquad \hat\theta = \bar{Y}_1 - \bar{Y}_2$$

$$V = (r + 1)\left(\frac{\sigma_1^2}{r} + \sigma_2^2\right), \qquad se(\hat\theta) = \sqrt{\frac{V}{n}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

### Comments on the optimal ratio of sample sizes (r)

- If we are constrained by the maximal sample size $n = n_1 + n_2$:
  - Smallest $V$ when $r = \dfrac{n_1}{n_2} = \dfrac{\sigma_1}{\sigma_2}$.
  - In other words, smaller $V$ if we sample more subjects from the more variable group.
- If we are unconstrained by the maximal sample size, there is a point of diminishing returns.
  - Example: case-control study where finding cases is difficult/expensive but finding controls is easy/cheap.
  - Often quoted: $r = 5$.
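The claim that $V$ is smallest at $r = \sigma_1/\sigma_2$ can be checked numerically. The sketch below minimizes $V(r) = (r + 1)(\sigma_1^2/r + \sigma_2^2)$ over $r$ with R's `optimize()`. The function name `V_two_sample` and the standard deviations 3 and 1 are assumptions chosen for illustration.

```r
# V for the difference of two independent means as a function of r = n1/n2
V_two_sample <- function(r, s1, s2) (r + 1) * (s1^2 / r + s2^2)

# Hypothetical standard deviations: group 1 is three times as variable
s1 <- 3
s2 <- 1

# Numerically minimize V over r; extra arguments are passed on to V_two_sample
opt <- optimize(V_two_sample, interval = c(0.01, 100), s1 = s1, s2 = s2)

opt$minimum     # numerical minimizer, compare with s1 / s2
opt$objective   # minimized V, compare with (s1 + s2)^2
```

Expanding $V(r) = \sigma_1^2 + \sigma_2^2 + \sigma_1^2/r + \sigma_2^2 r$ and setting the derivative to zero gives the same answer analytically, with minimum value $(\sigma_1 + \sigma_2)^2$.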

### Optimal sample size ratio for fixed sample size

[Figure: "Optimal r for Fixed (n1 + n2): r = s1 / s2". Standard error (y-axis) against sample size ratio r = n1/n2 (x-axis), with curves for s1 = s2, s1 = 2*s2, and s1 = 3*s2 and their minima marked at r = 1, r = 2, and r = 3.]

### Diminishing returns for increase sample size ratio, r

[Figure: "Diminishing returns for r > 5". Standard error (y-axis) against sample size ratio r = n1/n2 (x-axis), with curves for s1 = s2, s1 = 2*s2, and s1 = 3*s2.]

### Code for optimal sample size ratio for fixed sample size

```r
# V for a two-sample difference in means as a function of r = n1/n2
var.fn <- function(r, s1, s2) {
  (r + 1) * (s1^2/r + s2^2)
}

n <- 100   # fixed total sample size n1 + n2
s2 <- 10   # standard deviation in group 2

# Standard error sqrt(V/n) versus r for s1 = s2, 2*s2, and 3*s2
plot(function(r) sqrt(var.fn(r, s1 = s2, s2 = s2)/n), 0, 20,
     ylim = c(1, 6), xlim = c(0, 25), ylab = "Standard Error",
     xlab = "Sample Size Ratio r = n1/n2",
     main = "Optimal r for Fixed (n1 + n2): r = s1 / s2")
plot(function(r) sqrt(var.fn(r, s1 = 2 * s2, s2 = s2)/n), 0, 20,
     add = TRUE, lty = 2)
plot(function(r) sqrt(var.fn(r, s1 = 3 * s2, s2 = s2)/n), 0, 20,
     add = TRUE, lty = 3)

# Label the three curves
text(20, 4.7, "s1 = s2", pos = 4)
text(20, 5.1, "s1 = 2*s2", pos = 4)
text(20, 5.5, "s1 = 3*s2", pos = 4)

# Mark the minimum of each curve at r = s1/s2
points(c(1, 2, 3),
       sqrt(var.fn(c(1, 2, 3), s1 = c(1, 2, 3) * s2, s2 = s2)/n), pch = 2)
text(1, 1.8, "r = 1")
text(2, 2.8, "r = 2")
text(3, 3.8, "r = 3")
```

### Code for diminishing returns for increase sample size ratio

```r
# Reuses var.fn() and s2 from the previous code block.
# Here n1 is the size of the group held fixed (the one with sd s2);
# the other group has r * n1 subjects, so the total is n1 + r * n1.
n1 <- 200

plot(function(r) sqrt(var.fn(r, s1 = s2, s2 = s2)/(n1 + r * n1)), 0, 20,
     ylim = c(0.5, 3), xlim = c(0, 25), ylab = "Standard Error",
     xlab = "Sample Size Ratio r = n1/n2",
     main = "Diminishing returns for r > 5")
plot(function(r) sqrt(var.fn(r, s1 = 2 * s2, s2 = s2)/(n1 + r * n1)), 0, 20,
     add = TRUE, lty = 2)
plot(function(r) sqrt(var.fn(r, s1 = 3 * s2, s2 = s2)/(n1 + r * n1)), 0, 20,
     add = TRUE, lty = 3)

# Label the three curves
text(20, 0.7, "s1 = s2", pos = 4)
text(20, 0.8, "s1 = 2*s2", pos = 4)
text(20, 0.9, "s1 = 3*s2", pos = 4)
```

### Example: Paired means

Difference of paired means. No longer iid. Group 1 has a different mean and variance than group 2, and observations are paired (correlated).

$$Y_{ij} \sim (\mu_j, \sigma_j^2), \quad j = 1, 2; \; i = 1, \ldots, n$$

$$\text{corr}(Y_{i1}, Y_{i2}) = \rho; \qquad \text{corr}(Y_{ij}, Y_{mk}) = 0 \text{ if } i \ne m$$

$$\theta = \mu_1 - \mu_2, \qquad \hat\theta = \bar{Y}_1 - \bar{Y}_2$$

$$V = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2, \qquad se(\hat\theta) = \sqrt{\frac{V}{n}}$$

- Precision gains are made when matched observations are positively correlated ($\rho > 0$).
- This is usually the case, but there are possible exceptions:
  - Sleep on successive nights
  - Intrauterine growth of litter-mates

### Example: Clustered data

Clustered data: experiments where treatments/interventions are assigned on the basis of households, schools, clinics, cities, etc.

Mean of clustered data:

$$Y_{ij} \sim (\mu, \sigma^2), \quad i = 1, \ldots, n; \; j = 1, \ldots, m$$

- Up to $n$ clusters, each of which has $m$ subjects
- $\text{corr}(Y_{ij}, Y_{ik}) = \rho$ if $j \ne k$
- $\text{corr}(Y_{ij}, Y_{mk}) = 0$ if $i \ne m$

$$\theta = \mu, \qquad \hat\theta = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} Y_{ij} = \bar{Y}$$

$$V = \sigma^2\left(\frac{1 + (m - 1)\rho}{m}\right), \qquad se(\hat\theta) = \sqrt{\frac{V}{n}}$$

What is $V$ if...

- $\rho = 0$ (independent)
- $m = 1$
- $m$ is large (e.g. $m = 1000$) and $\rho$ is 0, 1, or 0.01
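The "What is V if..." cases can be worked through by evaluating $V = \sigma^2 (1 + (m - 1)\rho)/m$ directly. This is a small sketch with $\sigma^2 = 1$ assumed for convenience; the function name `V_cluster` is hypothetical.

```r
# Per-cluster variance term V for the mean of clustered data
V_cluster <- function(sigma2, m, rho) sigma2 * (1 + (m - 1) * rho) / m

# rho = 0 (independent): V = sigma^2 / m
V_cluster(sigma2 = 1, m = 10, rho = 0)

# m = 1: V = sigma^2, whatever the value of rho
V_cluster(sigma2 = 1, m = 1, rho = 0.5)

# m large (m = 1000) with rho = 0, 0.01, and 1
big_m <- V_cluster(sigma2 = 1, m = 1000, rho = c(0, 0.01, 1))
print(big_m)
```

With $\rho = 1$ every subject in a cluster carries the same information, so $V = \sigma^2$ regardless of $m$. With $\rho = 0.01$ and $m = 1000$, $V$ is roughly eleven times its value under independence.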

### Clustered data cont.

With clustered data, even small correlations can be very important to consider.

Equal precision achieved with:

| Clusters (n) | m | ρ | Total N |
|---|---|---|---|
| | | | |

Always consider practical issues. Is it easier/cheaper to collect 1 observation on 1000 different subjects, or 100 observations on 20 different subjects?
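To see how designs with different cluster sizes can reach equal precision, set $V/n$ for the clustered design equal to $\sigma^2/N_0$ for $N_0$ independent observations and solve for the number of clusters: $n = N_0 (1 + (m - 1)\rho)/m$. The sketch below applies this; the function name `clusters_needed` and the value $\rho = 0.01$ are assumptions for illustration, not entries from the slide's table.

```r
# Number of clusters of size m needed to match the precision of
# N0 independent observations, given within-cluster correlation rho.
clusters_needed <- function(N0, m, rho) N0 * (1 + (m - 1) * rho) / m

# Target: the precision of 1000 independent observations, with rho = 0.01
designs <- data.frame(m = c(1, 10, 100, 1000))
designs$clusters <- clusters_needed(N0 = 1000, m = designs$m, rho = 0.01)
designs$total_N  <- designs$clusters * designs$m
print(designs)
```

Under these assumptions, about 20 clusters of 100 subjects (roughly 2000 observations in total) give the same precision as 1000 independent subjects, which is the trade-off the practical question above is asking about.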

### Example: Independent odds ratios

Binary outcomes:

$$Y_{ij} \overset{ind}{\sim} B(1, p_j), \quad i = 1, \ldots, n_j; \; j = 1, 2$$

$$n = n_1 + n_2; \qquad r = n_1 / n_2$$

$$\theta = \log\left(\frac{p_1/(1 - p_1)}{p_2/(1 - p_2)}\right); \qquad \hat\theta = \log\left(\frac{\hat p_1/(1 - \hat p_1)}{\hat p_2/(1 - \hat p_2)}\right)$$

$$\sigma_j^2 = \frac{1}{p_j(1 - p_j)} = \frac{1}{p_j q_j}$$

$$V = (r + 1)\left(\frac{\sigma_1^2}{r} + \sigma_2^2\right), \qquad se(\hat\theta) = \sqrt{\frac{V}{n}} = \sqrt{\frac{1}{n_1 p_1 q_1} + \frac{1}{n_2 p_2 q_2}}$$

Notes on maximum precision:

- Max precision is achieved when the underlying odds are near 1 (proportions near 0.5).
- If we were considering differences in proportions, the max precision is achieved when the underlying proportions are near 0 or 1.
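The standard error of the log odds ratio can be evaluated for a few proportions to see the "maximum precision near 0.5" note in action. This is a sketch using the large-sample formula above; the function name `se_log_or` and the sample sizes are assumptions.

```r
# Large-sample standard error of the log odds ratio for two
# independent binomial samples with true proportions p1 and p2.
se_log_or <- function(p1, p2, n1, n2) {
  sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2)))
}

# Hypothetical design: 100 subjects per group, equal proportions in both groups
p <- c(0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95)
se_by_p <- se_log_or(p1 = p, p2 = p, n1 = 100, n2 = 100)
print(round(cbind(p = p, se = se_by_p), 3))

# The standard error is smallest at p = 0.5
p[which.min(se_by_p)]
```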

### Example: Hazard ratios

Independent censored time to event outcomes:

$$(T_{ij}, \delta_{ij}), \quad i = 1, \ldots, n_j; \; j = 1, 2$$

$$n = n_1 + n_2; \qquad r = n_1 / n_2$$

$$\theta = \log(HR); \qquad \hat\theta = \hat\beta \text{ from proportional hazards (PH) regression}$$

$$V = \frac{(r + 1)(1/r + 1)}{\Pr(\delta_{ij} = 1)}, \qquad se(\hat\theta) = \sqrt{\frac{V}{n}} = \sqrt{\frac{(r + 1)(1/r + 1)}{d}}$$

- In the PH model, statistical information is roughly proportional to $d$, the number of observed events.
- Papers always report the number of events.
- Study design must consider how long it will take to observe events (e.g. deaths) starting from randomization.

### Example: Linear regression

Independent continuous outcomes associated with covariates:

$$Y_i \mid X_i \overset{ind}{\sim} (\beta_0 + \beta_1 X_i,\; \sigma^2_{Y|X}), \quad i = 1, \ldots, n$$

$$\theta = \beta_1, \qquad \hat\theta = \hat\beta_1 \text{ from LS regression}$$

$$V = \frac{\sigma^2_{Y|X}}{\text{Var}(X)}, \qquad se(\hat\theta) = \sqrt{\frac{\hat\sigma^2_{Y|X}}{n\,\widehat{\text{Var}}(X)}}$$

- Precision tends to increase as the predictor ($X$) is measured over a wider range.
- Precision is also related to the within group variance $\sigma^2_{Y|X}$.
- What happens to the formulas when $X$ is a binary variable? See the two sample mean example.
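The regression standard error formula can be compared with what `lm()` reports. The sketch below simulates data (all parameter values are arbitrary assumptions), computes $\sqrt{\hat\sigma^2_{Y|X} / (n\,\widehat{\text{Var}}(X))}$ with $\widehat{\text{Var}}(X)$ using divisor $n$, and checks that it matches the slope's standard error from the fitted model.

```r
set.seed(312)

# Simulated data; the intercept, slope, and error SD are arbitrary choices
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 2)

fit <- lm(y ~ x)

sigma2_hat <- summary(fit)$sigma^2        # estimate of the residual variance
var_x_hat  <- mean((x - mean(x))^2)       # Var(X) estimated with divisor n
se_formula <- sqrt(sigma2_hat / (n * var_x_hat))

# Standard error of the slope as reported by lm()
se_lm <- summary(fit)$coefficients["x", "Std. Error"]

c(formula = se_formula, lm = se_lm)
```

Rerunning with `x` drawn from a wider range (for example `runif(n, 0, 20)`) increases `var_x_hat` and so tends to shrink the standard error, in line with the first bullet above.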

### Summary

Options for increasing precision:

- Increase sample size
- Decrease $V$
- (Decrease confidence level)

Criteria for precision:

- Standard error
- Width of confidence intervals
- Statistical power
  - Select a suitable design alternative
  - Select desired power

### Summary

Sample size calculation: the number of sampling units needed to obtain the desired precision.

- Level of significance $\alpha$ when $\theta = \theta_0$
- Power $\beta$ when $\theta = \theta_1$
- Variability $V$ within one sampling unit

$$n = \frac{(z_{1-\alpha/2} + z_\beta)^2\, V}{(\theta_1 - \theta_0)^2}$$

When sample size is constrained (the usual case), either:

- Compute power to detect a specified alternative:

  $$\beta = \phi\left(\frac{\theta_1 - \theta_0}{\sqrt{V/n}} - z_{1-\alpha/2}\right)$$

  where $\phi$ is the standard Normal cdf function. In Stata, use `normprob` for the $\phi$ function.
- Compute the alternative that can be detected with high power:

  $$\theta_1 = \theta_0 + (z_{1-\alpha/2} + z_\beta)\sqrt{V/n}$$
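The three summary formulas are rearrangements of one another, which a short R sketch can confirm. The function names and the example inputs ($V = 4$, corresponding to a two-sample comparison with equal allocation and $\sigma = 1$, and a design alternative of 0.5) are assumptions for illustration. As in the slides, `power` here plays the role of $\beta$.

```r
# Sample size for a given design alternative, significance level, and power
sample_size <- function(theta1, theta0 = 0, V, alpha = 0.05, power = 0.9) {
  (qnorm(1 - alpha / 2) + qnorm(power))^2 * V / (theta1 - theta0)^2
}

# Power to detect theta1 with a given n (ignores the far rejection tail)
power_at <- function(n, theta1, theta0 = 0, V, alpha = 0.05) {
  pnorm((theta1 - theta0) / sqrt(V / n) - qnorm(1 - alpha / 2))
}

# Alternative that can be detected with the stated power for a given n
detectable_alt <- function(n, theta0 = 0, V, alpha = 0.05, power = 0.9) {
  theta0 + (qnorm(1 - alpha / 2) + qnorm(power)) * sqrt(V / n)
}

# Illustrative inputs: V = 4, design alternative 0.5, 90% power
n_needed <- sample_size(theta1 = 0.5, V = 4)
ceiling(n_needed)                           # round up to whole sampling units
power_at(n_needed, theta1 = 0.5, V = 4)     # recovers the requested power
detectable_alt(n_needed, V = 4)             # recovers the design alternative
```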

### General comments

- The sample size required behaves like the inverse square of the width of the CI. To cut the width of the CI in half, we need to quadruple the sample size.
- Positively correlated observations within the same group provide less precision than the same number of independent observations.
- Positively correlated observations across groups provide more precision.
- What power do you use?
  - Most popular is 80% (too low) or 90%.
  - The key is to be able to discriminate between scientifically meaningful hypotheses.

### Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Presented August 8-10, 2012 Daniel L. Gillen Department of Statistics University of California, Irvine

### Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs Christopher Jennison Department of Mathematical Sciences, University of Bath http://people.bath.ac.uk/mascj

### Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

### Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Reports of the Institute of Biostatistics

Reports of the Institute of Biostatistics No 02 / 2008 Leibniz University of Hannover Natural Sciences Faculty Title: Properties of confidence intervals for the comparison of small binomial proportions

### Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

### Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

### Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

### Hypothesis Testing. ECE 3530 Spring Antonio Paiva

Hypothesis Testing ECE 3530 Spring 2010 Antonio Paiva What is hypothesis testing? A statistical hypothesis is an assertion or conjecture concerning one or more populations. To prove that a hypothesis is

### Statistical Simulation An Introduction

James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 Simulation Through Bootstrapping Introduction 1 Introduction When We Don t Need Simulation

### Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Sample Size and Power I: Binary Outcomes James Ware, PhD Harvard School of Public Health Boston, MA Sample Size and Power Principles: Sample size calculations are an essential part of study design Consider

### Section Comparing Two Proportions

Section 8.2 - Comparing Two Proportions Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Comparing Two Proportions Two-sample problems Want to compare the responses in two groups or treatments

### Accounting for Baseline Observations in Randomized Clinical Trials

Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA October 6, 0 Abstract In clinical

### Confidence Distribution

Confidence Distribution Xie and Singh (2013): Confidence distribution, the frequentist distribution estimator of a parameter: A Review Céline Cunen, 15/09/2014 Outline of Article Introduction The concept

### CS540 Machine learning L9 Bayesian statistics

CS540 Machine learning L9 Bayesian statistics 1 Last time Naïve Bayes Beta-Bernoulli 2 Outline Bayesian concept learning Beta-Bernoulli model (review) Dirichlet-multinomial model Credible intervals 3 Bayesian

### MTMS Mathematical Statistics

MTMS.01.099 Mathematical Statistics Lecture 12. Hypothesis testing. Power function. Approximation of Normal distribution and application to Binomial distribution Tõnu Kollo Fall 2016 Hypothesis Testing

### (4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics

Estimating a proportion using the beta/binomial model A fundamental task in statistics is to estimate a proportion using a series of trials: What is the success probability of a new cancer treatment? What

### DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence interval to compare two proportions.

Section 0. Comparing Two Proportions Learning Objectives After this section, you should be able to DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET a confidence

### Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

Outline CHL 5225H Advanced Statistical Methods for Clinical Trials: Survival Analysis Prof. Kevin E. Thorpe Defining Survival Data Mathematical Definitions Non-parametric Estimates of Survival Comparing

### Inverse Sampling for McNemar's Test

International Journal of Statistics and Probability; Vol. 6, No. 1; January 2017 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Inverse Sampling for McNemar's Test

### Correlation and regression

1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

### Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

### Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

### Power and Sample Size Bios 662

Power and Sample Size Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-10-31 14:06 BIOS 662 1 Power and Sample Size Outline Introduction One sample: continuous

### Statistics for the LHC Lecture 1: Introduction

Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University

### Inference for Binomial Parameters

Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

### Two examples of the use of fuzzy set theory in statistics. Glen Meeden University of Minnesota.

Two examples of the use of fuzzy set theory in statistics Glen Meeden University of Minnesota http://www.stat.umn.edu/~glen/talks 1 Fuzzy set theory Fuzzy set theory was introduced by Zadeh in (1965) as

### Chapter 1 Statistical Inference

Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

### Model Estimation Example

Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

### Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Sean M. O'Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

### Introduction to bivariate analysis

Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

### Dose-response modeling with bivariate binary data under model uncertainty

Dose-response modeling with bivariate binary data under model uncertainty Bernhard Klingenberg 1 1 Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267 and Institute of Statistics,

### ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

### Lecture 3: Inference in SLR

Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

### Two Sample Problems. Two sample problems

Two Sample Problems Two sample problems The goal of inference is to compare the responses in two groups. Each group is a sample from a different population. The responses in each group are independent

### Confidence Intervals with σ unknown

STAT 141 Confidence Intervals and Hypothesis Testing 10/26/04 Today (Chapter 7): CI with σ unknown, t-distribution CI for proportions Two sample CI with σ known or unknown Hypothesis Testing, z-test Confidence

### Tutorial 2: Power and Sample Size for the Paired Sample t-test

Tutorial 2: Power and Sample Size for the Paired Sample t-test Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function of sample size, variability,

### Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

### BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

### 10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, it's one of the most popular methods for clustering gene

### Generalized Linear Modeling - Logistic Regression

1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

### 18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution

### Model comparison and selection

BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models $M_1 = \{f(x; \theta), \theta \in \Theta_1\}$ and $M_2 = \{f(x; \theta), \theta \in \Theta_2\}$ for a sample $(X = x)$

### Chapter 3 ANALYSIS OF RESPONSE PROFILES

Chapter 3 ANALYSIS OF RESPONSE PROFILES 78 31 Introduction In this chapter we present a method for analysing longitudinal data that imposes minimal structure or restrictions on the mean responses over

### Solution E[sum of all eleven dice] = E[sum of ten d20] + E[one d6] = 10 * E[one d20] + E[one d6]

Name: SOLUTIONS Midterm (take home version) To help you budget your time, questions are marked with *s. One * indicates a straightforward question testing foundational knowledge. Two ** indicate a more

### Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

### The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

### Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameters Consider the odds ratio in the 2x2 table,

### HYPOTHESIS TESTING: FREQUENTIST APPROACH.

HYPOTHESIS TESTING: FREQUENTIST APPROACH. These notes summarize the lectures on (the frequentist approach to) hypothesis testing. You should be familiar with the standard hypothesis testing from previous

### PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

PubH 5450 Biostatistics I Prof. Carlin Lecture 13 Outline Outline Sample Size Counts, Rates and Proportions Part I Sample Size Type I Error and Power Type I error rate: probability of rejecting the null

### 10.1. Comparing Two Proportions. Section 10.1

/6/04 10.1 Comparing Two Proportions Section 10.1 Comparing Two Proportions After this section, you should be able to DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET

### Accounting for Baseline Observations in Randomized Clinical Trials

Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 98195, USA August 5, 0 Abstract In clinical

### Contents 1. Contents

Contents 1 One-Sample Methods 1.1 Parametric Methods 1.1.1 One-sample Z-test (see Chapter 0.3.1) 1.1.2 One-sample t-test 1.1.3 Large sample

### ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12 Winter 2012 Lecture 13 (Winter 2011) Estimation Lecture 13 1 / 33 Review of Main Concepts Sampling Distribution of Sample Mean

### Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier

### Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be cast as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis

### STAT331. Cox's Proportional Hazards Model

STAT331 Cox's Proportional Hazards Model In this unit we introduce Cox's proportional hazards (Cox's PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

### Lecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 8 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University October 22, 2007 1 2 3 4 5 6 1 Define convergent series 2 Define the Law of Large Numbers

### Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two short-answer type tests

### MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

### Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

### 1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

### ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

(9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

### 2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling

2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling Jon Wakefield Departments of Statistics and Biostatistics, University of Washington 2015-07-24 Case control example We analyze

### Parametric Techniques

Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

### University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation

University of Oxford Statistical Methods Autocorrelation Identification and Estimation Dr. Órlaith Burke Michaelmas Term, 2011 Department of Statistics, 1 South Parks Road, Oxford OX1 3TG Contents 1 Model

### LECTURE 5. Introduction to Econometrics. Hypothesis testing

LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY'S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will

### Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive

### Bayesian Inference for Normal Mean

Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of $y \mid \mu$ is Normal with mean $\mu$ and variance $\sigma^2$, which is known. Its density

### Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test $\chi^2$-test Confidence Interval Sample size and power Relative effect

### Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

### ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

### Describing Contingency tables

Today's topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

### General Linear Model: Statistical Inference

Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

### STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14. Your Name:

STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14 Your Name: Please make sure to specify all of your notations in each problem GOOD LUCK! 1 Problem# 1. Consider the following model, y i =

### STA6938-Logistic Regression Model

Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines: 1. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

### Answers to Problem Set #4

Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

### Association studies and regression

Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

### Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

### 1; (f) H 0 : = 55 db, H 1 : < 55.

Reference: Chapter 8 of J. L. Devore's 8th Edition By S. Maghsoodloo TESTING a STATISTICAL HYPOTHESIS A statistical hypothesis is an assumption about the frequency function(s) (i.e., pmf or pdf) of one

### Master's Written Examination - Solution

Master's Written Examination - Solution Spring 204 Problem Stat 40 Suppose $X_1$ and $X_2$ have the joint pdf $f_{X_1,X_2}(x_1, x_2) = 2e^{-(x_1 + x_2)}$, $0 < x_1 < x_2$

### Passing-Bablok Regression for Method Comparison

Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional

### Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School

### ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

### SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000) AMITA K. MANATUNGA THE ROLLINS SCHOOL OF PUBLIC HEALTH OF EMORY UNIVERSITY SHANDE

### PASS Sample Size Software. Poisson Regression

Chapter 870 Introduction Poisson regression is used when the dependent variable is a count. Following the results of Signorini (1991), this procedure calculates power and sample size for testing the hypothesis

### Linear Regression With Special Variables

Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

### Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

### Bayesian Inference: Posterior Intervals

Bayesian Inference: Posterior Intervals Simple values like the posterior mean $E[\theta \mid X]$ and posterior variance $\mathrm{var}[\theta \mid X]$ can be useful in learning about $\theta$. Quantiles of $\pi(\theta \mid X)$ (especially the posterior median)

### Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 12 Review So far... We have discussed the role of phase III clinical trials in drug development

### 1 Comparing two binomials

BST 140.652 Review notes 1 Comparing two binomials 1. Let $X \sim \mathrm{Binomial}(n_1, p_1)$ and $\hat{p}_1 = X/n_1$ 2. Let $Y \sim \mathrm{Binomial}(n_2, p_2)$ and $\hat{p}_2 = Y/n_2$ 3. We also use the following notation: $n_{11} = X$, $n_{12} = n_1 - X$

### Survival Regression Models

Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

### A brief introduction to mixed models

A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

### Package bpp. December 13, 2016

Type Package Package bpp December 13, 2016 Title Computations Around Bayesian Predictive Power Version 1.0.0 Date 2016-12-13 Author Kaspar Rufibach, Paul Jordan, Markus Abt Maintainer Kaspar Rufibach Depends

### Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

### Examples and Limits of the GLM

Examples and Limits of the GLM Chapter 1 1.1 Motivation 1 1.2 A Review of Basic Statistical Ideas 2 1.3 GLM Definition 4 1.4 GLM Examples 4 1.5 Student Goals 5 1.6 Homework Exercises 5 1.1 Motivation In

### The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen

The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen January 23-24, 2012 Page 1 Part I The Single Level Logit Model: A Review Motivating Example Imagine we are interested in voting

### Applied Regression Analysis

Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of