BIOS 312: Precision of Statistical Inference

Chris Slaughter, Department of Biostatistics, Vanderbilt University School of Medicine. January 3, 2013

Outline
  - Overview
  - Precision and Power/Sample Size
  - Precision and Standard Errors

Bias and Precision
  - The goal of statistical inference is to estimate parameters accurately (without bias) and with high precision
  - Measures of precision
    - Standard error (not standard deviation)
    - Width of confidence intervals
    - Power (equivalently, type II error rate)

Summary measures
  - Scientific hypotheses are typically refined into statistical hypotheses by identifying some parameter, θ, that measures differences in the distribution of the response variable
  - Often we are interested in whether θ differs across levels of a categorical (e.g. treatment/control) or continuous (e.g. age) predictor variable
  - θ could be any summary measure, such as
    - Difference/ratio of means
    - Difference/ratio of medians
    - Ratio of geometric means
    - Difference/ratio of proportions
    - Odds ratio, relative risk, risk difference
    - Hazard ratio

Choosing summary measure
  How to select θ? In order of importance:
  1. Scientific (clinical) importance. May be based on the current state of knowledge.
  2. Is θ likely to vary across the predictor of interest? This affects the ability to detect a difference, if one exists.
  3. Statistical precision. Only relevant if all other factors are equal.

Statistical inference
  - Statistics is concerned with making inference about population parameters (θ) based on a sample of data
  - Frequentist estimation includes both point estimates ($\hat\theta$) and interval estimates (confidence intervals)
  - Bayesian analysis estimates the posterior distribution of θ given the sampled data, $p(\theta \mid \text{data})$. The posterior distribution can then be summarized by quantities like the posterior mean and 95% credible interval.
  - Likelihood analysis focuses on using the likelihood function to obtain maximum likelihood estimates. The likelihood function can be used directly to obtain upper and lower confidence-type intervals for estimates.

Precision example
  Consider the following results from 5 clinical trials of three drugs (A, B, C) designed to lower cholesterol compared to baseline. Assume a 10 unit drop in cholesterol (relative to baseline) is clinically meaningful.

  Trial  Drug  Pts  Mean diff  Std dev  Std error  95% CI for diff  p-value
  1      A                                         [-129, 69]
  2      A                                         [-49.6, -10.4]
  3      B                                         [-85, 45]
  4      B                                         [-8.5, 4.5]
  5      C                                         [-9.9, -2.1]     0.002

  - Which drug is effective at reducing cholesterol?
  - Why is study 4 more informative than study 3 (even though the p-values are similar)?
  - Moral: Hypothesis tests and p-values are often insufficient for making proper decisions. The confidence interval provides more useful information.

Sampling distribution defined
  - The sampling distribution is the probability distribution of a statistic
    - e.g. the sampling distribution of the sample mean is $N(\mu, \sigma^2/n)$
  - Most often we choose estimators that are asymptotically Normally distributed
    - For large n, $\hat\theta \sim N(\theta, V/n)$
    - $\hat\theta$ is our estimate of θ. The hat indicates that it is an estimate.
    - Mean: θ
    - Variance: $V/n$, where V is related to the average amount of statistical information available from each observation
    - Often V depends on θ
  - How large n must be depends on the distribution of the underlying data. If n is large enough, approximate Normality of $\hat\theta$ will hold.

Confidence intervals when n is large
  - Calculating $100(1-\alpha)\%$ confidence intervals $(\theta_L, \theta_U)$ with approximate Normality
    - $\theta_L = \hat\theta - Z_{1-\alpha/2}\sqrt{V/n}$
    - $\theta_U = \hat\theta + Z_{1-\alpha/2}\sqrt{V/n}$
    - (estimate) ± (crit val) × (std err of estimate)
  - Can similarly calculate approximate two-sided p-values
    - $Z = \dfrac{\text{(estimate)} - \text{(hyp value)}}{\text{(std err of estimate)}}$
    - p-value in Stata: 2 × norm(-abs(Z)), i.e. 2 × norm(-abs(((estimate) - (hyp value)) / (std err of estimate)))
    - p-value in R: use the pnorm() function
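The interval and p-value formulas above can be sketched in R as follows. This is a minimal illustration under the large-sample Normal approximation; the function name `wald_summary`, its argument names, and the numeric inputs are assumptions made for the example and do not come from the slides.

```r
# Wald-type interval and two-sided p-value under approximate Normality.
# `wald_summary` and its argument names are illustrative only.
wald_summary <- function(estimate, se, hyp_value = 0, conf_level = 0.95) {
  alpha <- 1 - conf_level
  crit <- qnorm(1 - alpha / 2)          # critical value Z_{1 - alpha/2}
  z <- (estimate - hyp_value) / se      # (estimate - hyp value) / std err
  list(
    lower = estimate - crit * se,       # theta_L
    upper = estimate + crit * se,       # theta_U
    z = z,
    p_value = 2 * pnorm(-abs(z))        # two-sided p-value
  )
}

# Hypothetical inputs: an estimate of -6 with standard error 2
res <- wald_summary(estimate = -6, se = 2)
print(res)
```

Passing the standard error directly, rather than V and n, keeps the sketch usable for any of the estimators discussed later.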

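Comparing independent estimates
  - If estimates are independent and Normally distributed
    - $\hat\theta_1 \sim N(\theta_1, se_1^2)$ and $\hat\theta_2 \sim N(\theta_2, se_2^2)$
  - Then, approximately,
    - $\hat\theta_1 - \hat\theta_2 \sim N(\theta_1 - \theta_2,\; se_1^2 + se_2^2)$
    - $\hat\theta_1 + \hat\theta_2 \sim N(\theta_1 + \theta_2,\; se_1^2 + se_2^2)$
    - $\hat\theta_1 / \hat\theta_2 \sim N\!\left(\theta_1/\theta_2,\; \dfrac{se_1^2}{\theta_2^2} + \dfrac{\theta_1^2\, se_2^2}{\theta_2^4}\right)$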

Comparing correlated estimates
  - If estimates are correlated and Normally distributed
    - $\hat\theta_1 \sim N(\theta_1, se_1^2)$ and $\hat\theta_2 \sim N(\theta_2, se_2^2)$
    - $\rho = \mathrm{corr}(\hat\theta_1, \hat\theta_2)$
  - Then,
    - $\hat\theta_1 - \hat\theta_2 \sim N(\theta_1 - \theta_2,\; se_1^2 + se_2^2 - 2\rho\, se_1 se_2)$
    - $\hat\theta_1 + \hat\theta_2 \sim N(\theta_1 + \theta_2,\; se_1^2 + se_2^2 + 2\rho\, se_1 se_2)$
  - Example: Comparing results from the same study
    - A paper may not give the results that are interesting from your point of view
    - Comparison can be difficult because the correlation is usually not reported
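As a small numerical illustration of the sum and difference formulas, the sketch below computes the standard error for a chosen correlation; setting `rho = 0` gives the independent case. The helper names `se_diff` and `se_sum` and the input values are hypothetical.

```r
# Standard error of a difference or sum of two Normally distributed estimates.
# rho is the correlation between the estimates; rho = 0 means independent.
# `se_diff` and `se_sum` are illustrative names, not from the slides.
se_diff <- function(se1, se2, rho = 0) {
  sqrt(se1^2 + se2^2 - 2 * rho * se1 * se2)
}
se_sum <- function(se1, se2, rho = 0) {
  sqrt(se1^2 + se2^2 + 2 * rho * se1 * se2)
}

# Hypothetical standard errors of 3 and 4
se_diff(3, 4)             # independent estimates
se_diff(3, 4, rho = 0.5)  # positive correlation shrinks the SE of the difference
se_sum(3, 4, rho = 0.5)   # and inflates the SE of the sum
```

When a paper does not report the correlation, evaluating `se_diff` over a plausible range of `rho` shows how sensitive a comparison is to that unknown.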

Classical hypothesis testing
  - Classical hypothesis testing is stated in terms of the null hypothesis ($H_0$). The alternative hypothesis ($H_1$) is the complement of $H_0$.
    - Two sided: $H_0: \theta = \theta_0$ vs $H_1: \theta \ne \theta_0$
    - One sided: $H_0: \theta \ge \theta_0$ vs $H_1: \theta < \theta_0$
    - One sided: $H_0: \theta \le \theta_0$ vs $H_1: \theta > \theta_0$
  - Inference is based on either rejecting or failing to reject the null hypothesis
  - Typically, the null hypothesis is stated in a form that indicates no association

Classical hypothesis testing thought process
  - Assumes $H_0$ is true
  - Conceives of data as one of many datasets that might have happened
  - See if data are consistent with $H_0$
  - Are data extreme or unlikely if $H_0$ is really true?
  - Proof by contradiction: if assuming $H_0$ is true leads to results that are bizarre or unlikely to have been observed, this casts doubt on the premise
  - Evidence is summarized through a single statistic capturing a tendency of the data, e.g., $\bar{x}$
  - Look at the probability of getting a statistic as or more extreme than the calculated one (results as or more impressive than ours) if $H_0$ is true (the P-value)

Classical hypothesis testing thought process cont.
  - If the statistic has a low probability of being observed to be this extreme, we say that if $H_0$ is true we have acquired data that are very improbable, i.e., we have witnessed a low probability event
  - Then evidence mounts against $H_0$ and we might reject it
  - A failure to reject does not imply that we have gathered evidence in favor of $H_0$: there are many reasons for studies to not be impressive, including small sample size (n)
  - Key limitation: classical hypothesis testing ignores clinical significance. An approach that allows us to make informed decisions is preferable.

Decision theoretic approach
  - Stated in terms of the null hypothesis and a suitably chosen design alternative
  - Summarize the design alternative through $\theta_1$ ($\theta_1 > 0$)
    - Two sided: $H_0: \theta = \theta_0$ vs $H_1: \theta \le -\theta_1$ or $\theta \ge \theta_1$
    - One sided: $H_0: \theta \ge \theta_0$ vs $H_1: \theta \le -\theta_1$
    - One sided: $H_0: \theta \le \theta_0$ vs $H_1: \theta \ge \theta_1$
  - Using the decision theoretic approach, we can conclude
    - Reject the null hypothesis: the data are atypical of what we would expect if the null hypothesis is true
    - Reject the alternative hypothesis: the data are atypical of what we would expect if the alternative hypothesis is true

Decision theoretic approach cont.
  - Key difference from the classical approach: the design alternative ($\theta_1$) is ideally chosen to be the minimal important difference to detect based on scientific or clinical criteria
    - Clinical significance: In the cholesterol example, the important difference was assumed to be 10 mg/dl
    - Economic impact: A new drug is not marketable unless it has a large effect
    - Feasibility of study: Limited availability of subjects may limit investigators to searching for interventions with large impact
  - Remember the cholesterol example. Studies 2, 4, and 5 follow the decision theoretic approach because they allow us to discriminate between scientifically meaningful hypotheses.

Measures of high precision
  - What are the measures of (high) precision?
    - Estimators are less variable across studies, which is often measured by decreased standard error
    - Narrower confidence intervals. Estimators are consistent with fewer hypotheses if the CIs are narrow.
    - Able to reject false hypotheses. The Z statistic is higher when the alternative hypothesis is true.
  - Translation into sample size
    - Based on the width of the confidence interval
      - Choose a sample size such that a 95% CI will not contain both the null and the design alternative
      - If $\theta_0$ and $\theta_1$ cannot both be in the CI, we have discriminated between those hypotheses
    - Based on statistical power
      - When the alternative is true, have a high probability of rejecting the null
      - In other words, minimize the type II error rate

Statistical power: quick review
  - Power is the probability of rejecting the null hypothesis when the alternative is true
    - $\Pr(\text{reject } H_0 \mid \theta = \theta_1)$
    - Most often $\hat\theta \sim N(\theta, V/n)$, so that the test statistic $Z = \dfrac{\hat\theta - \theta_0}{\sqrt{V/n}}$ will follow a Normal distribution
    - Under $H_0$, $Z \sim N(0, 1)$, so we reject $H_0$ if $|Z| > Z_{1-\alpha/2}$
    - Under $H_1$, $Z \sim N\!\left(\dfrac{\theta_1 - \theta_0}{\sqrt{V/n}},\; 1\right)$
  - Power curves
    - The power function (power curve) is a function of the true value of θ
    - We can compute power for every value of θ
    - As θ moves away from $\theta_0$, power increases (for two-sided alternatives)
    - For any choice of desired power, there is always some θ such that the study has that power
    - $\mathrm{Pwr}(\theta_0) = \alpha$, the type I error rate
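The power function described above can be written directly from the Normal approximation. The sketch below is an illustration only: `power_z` and its arguments are hypothetical names, and the two-group setup (100 per group, standard deviation 1, hence V = 4 with total n = 200) is an assumed example rather than something taken from the slides.

```r
# Two-sided power under the Normal approximation:
# Z ~ N((theta - theta0) / sqrt(V / n), 1), and we reject when |Z| > Z_{1 - alpha/2}.
# `power_z` and its arguments are illustrative names.
power_z <- function(theta, theta0, V, n, alpha = 0.05) {
  shift <- (theta - theta0) / sqrt(V / n)      # mean of Z at the true theta
  crit <- qnorm(1 - alpha / 2)
  pnorm(-crit - shift) + pnorm(shift - crit)   # P(Z < -crit) + P(Z > crit)
}

# Assumed setup: two groups of 100 with sd = 1 each, so V = 4 and n = 200
thetas <- seq(-0.8, 0.8, by = 0.2)
curve_vals <- power_z(thetas, theta0 = 0, V = 4, n = 200)
print(round(cbind(theta = thetas, power = curve_vals), 3))

# At theta = theta0 the power equals the type I error rate alpha
power_z(0, theta0 = 0, V = 4, n = 200)
```

Because both rejection tails are included, the curve is symmetric about $\theta_0$ and bottoms out at exactly α there.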

[Figure: Power curves for a two-sample, equal variance t-test; n = 100. x-axis: True difference in means (theta); y-axis: Power; curves for σ = 1 and σ = 1.2.]

Code for generating example power curve

    mydiffs <- seq(-0.8, 0.8, 0.05)
    mypower <- numeric(length(mydiffs))
    mypower2 <- numeric(length(mydiffs))
    for (i in seq_along(mydiffs)) {
      # power.t.test() defaults to a two-sample, two-sided test; n is per group
      mypower[i] <- power.t.test(n = 100, sd = 1, delta = mydiffs[i])$power
      mypower2[i] <- power.t.test(n = 100, sd = 1.2, delta = mydiffs[i])$power
    }
    plot(mydiffs, mypower, xlab = "True difference in means (theta)",
         ylab = "Power", type = "l", main = "")
    lines(mydiffs, mypower2, lty = 2)
    legend("top", c(expression(sigma == 1), expression(sigma == 1.2)),
           lty = 1:2, inset = 0.05)

Precision and standard errors
  - Standard errors are the key to precision
  - Greater precision is achieved with smaller standard errors
  - Standard errors are decreased by either decreasing V or increasing n
    - Typically: $se(\hat\theta) = \sqrt{V/n}$
    - Width of CI: $2 \times \text{(crit value)} \times se(\hat\theta)$
    - Test statistic: $Z = \dfrac{\hat\theta - \theta_0}{se(\hat\theta)}$

Example: One sample mean
  - Observations are independent and identically distributed (iid)
    - $Y_i \overset{iid}{\sim} (\mu, \sigma^2),\; i = 1, \ldots, n$
    - $\theta = \mu, \quad \hat\theta = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$
    - $V = \sigma^2, \quad se(\hat\theta) = \sqrt{\sigma^2/n}$
  - Note that we are not assuming a specific distribution for $Y_i$, just that the distribution has a mean and variance
  - We are assuming that n is large so asymptotic results are applicable
  - Then the distribution of $Y_i$ could be binary, Poisson, exponential, normal, etc. and the results will hold
  - There are ways to decrease V, including
    - Restrict the sample by age, gender, etc.
    - Take repeated measures on each subject, summarize, and perform the test on the summary measures
    - Better ideas (this course): Adjust for age and gender; use all data while modeling correlation

Example: Two sample mean
  - Difference of independent means
  - Observations are no longer identically distributed, just independent. Group 1 has a different mean and variance than group 2.
    - $Y_{ij} \overset{ind}{\sim} (\mu_j, \sigma_j^2),\; j = 1, 2;\; i = 1, \ldots, n_j$
    - $n = n_1 + n_2; \quad r = n_1/n_2$
    - $\theta = \mu_1 - \mu_2, \quad \hat\theta = \bar{Y}_1 - \bar{Y}_2$
    - $V = (r + 1)\left(\dfrac{\sigma_1^2}{r} + \sigma_2^2\right)$
    - $se(\hat\theta) = \sqrt{\dfrac{V}{n}} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Comments on the optimal ratio of sample sizes (r)
  - If we are constrained by the maximal sample size $n = n_1 + n_2$
    - Smallest V when $r = \dfrac{n_1}{n_2} = \dfrac{\sigma_1}{\sigma_2}$
    - In other words, V is smaller if we sample more subjects from the more variable group
  - If we are unconstrained by the maximal sample size, there is a point of diminishing returns
    - Example: Case-control study where finding cases is difficult/expensive but finding controls is easy/cheap
    - Often quoted: r = 5
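The claim that V is smallest at $r = \sigma_1/\sigma_2$ can be checked numerically. The sketch below minimizes $V(r) = (r+1)(\sigma_1^2/r + \sigma_2^2)$ with R's `optimize()`; the function name `var_fn`, the search interval, and the standard deviations (20 and 10) are assumptions chosen for illustration.

```r
# V as a function of the allocation ratio r = n1 / n2 for a difference in means.
# `var_fn` and the example standard deviations are illustrative choices.
var_fn <- function(r, s1, s2) {
  (r + 1) * (s1^2 / r + s2^2)
}

# Hypothetical standard deviations: group 1 is twice as variable as group 2
s1 <- 20
s2 <- 10
opt <- optimize(var_fn, interval = c(0.01, 20), s1 = s1, s2 = s2)

opt$minimum   # numerically close to s1 / s2
s1 / s2       # the closed-form optimum, from setting dV/dr = -s1^2/r^2 + s2^2 = 0
```

The closed form follows from expanding $V(r) = \sigma_1^2 + \sigma_1^2/r + r\sigma_2^2 + \sigma_2^2$ and differentiating with respect to r.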

[Figure: Optimal sample size ratio for fixed sample size. Title: Optimal r for Fixed (n1 + n2): r = s1 / s2. x-axis: Sample Size Ratio r = n1/n2; y-axis: Standard Error; curves for s1 = s2, s1 = 2*s2, and s1 = 3*s2, with minima marked at r = 1, r = 2, and r = 3.]

[Figure: Diminishing returns for increasing sample size ratio, r. Title: Diminishing returns for r > 5. x-axis: Sample Size Ratio r = n1/n2; y-axis: Standard Error; curves for s1 = s2, s1 = 2*s2, and s1 = 3*s2.]

Code for optimal sample size ratio for fixed sample size

    # V as a function of the allocation ratio r = n1/n2
    var.fn <- function(r, s1, s2) {
      (r + 1) * (s1^2/r + s2^2)
    }
    n <- 100   # fixed total sample size n1 + n2
    s2 <- 10   # standard deviation in group 2
    plot(function(r) sqrt(var.fn(r, s1 = s2, s2 = s2)/n), 0, 20,
         ylim = c(1, 6), xlim = c(0, 25), ylab = "Standard Error",
         xlab = "Sample Size Ratio r = n1/n2",
         main = "Optimal r for Fixed (n1 + n2): r = s1 / s2")
    plot(function(r) sqrt(var.fn(r, s1 = 2 * s2, s2 = s2)/n), 0, 20,
         add = TRUE, lty = 2)
    plot(function(r) sqrt(var.fn(r, s1 = 3 * s2, s2 = s2)/n), 0, 20,
         add = TRUE, lty = 3)
    text(20, 4.7, "s1 = s2", pos = 4)
    text(20, 5.1, "s1 = 2*s2", pos = 4)
    text(20, 5.5, "s1 = 3*s2", pos = 4)
    # Mark the minimum of each curve at r = s1/s2
    points(c(1, 2, 3),
           sqrt(var.fn(c(1, 2, 3), s1 = c(1, 2, 3) * s2, s2 = s2)/n), pch = 2)
    text(1, 1.8, "r = 1")
    text(2, 2.8, "r = 2")
    text(3, 3.8, "r = 3")

Code for diminishing returns for increasing sample size ratio

    # Uses var.fn() and s2 from the code for the optimal sample size ratio.
    # Group 2 is held fixed at n2 subjects while group 1 has r * n2 subjects,
    # so r = n1/n2 and the total sample size is n2 + r * n2.
    n2 <- 200
    plot(function(r) sqrt(var.fn(r, s1 = s2, s2 = s2)/(n2 + r * n2)), 0, 20,
         ylim = c(0.5, 3), xlim = c(0, 25), ylab = "Standard Error",
         xlab = "Sample Size Ratio r = n1/n2",
         main = "Diminishing returns for r > 5")
    plot(function(r) sqrt(var.fn(r, s1 = 2 * s2, s2 = s2)/(n2 + r * n2)), 0, 20,
         add = TRUE, lty = 2)
    plot(function(r) sqrt(var.fn(r, s1 = 3 * s2, s2 = s2)/(n2 + r * n2)), 0, 20,
         add = TRUE, lty = 3)
    text(20, 0.7, "s1 = s2", pos = 4)
    text(20, 0.8, "s1 = 2*s2", pos = 4)
    text(20, 0.9, "s1 = 3*s2", pos = 4)

Example: Paired means
  - Difference of paired means
  - No longer iid. Group 1 has a different mean and variance than group 2, and observations are paired (correlated).
    - $Y_{ij} \sim (\mu_j, \sigma_j^2),\; j = 1, 2;\; i = 1, \ldots, n$
    - $\mathrm{corr}(Y_{i1}, Y_{i2}) = \rho; \quad \mathrm{corr}(Y_{ij}, Y_{mk}) = 0$ if $i \ne m$
    - $\theta = \mu_1 - \mu_2, \quad \hat\theta = \bar{Y}_1 - \bar{Y}_2$
    - $V = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$
    - $se(\hat\theta) = \sqrt{V/n}$
  - Precision gains are made when matched observations are positively correlated ($\rho > 0$)
    - Usually the case, but there are possible exceptions
      - Sleep on successive nights
      - Intrauterine growth of litter-mates

Example: Clustered data
  - Clustered data: Experiment where treatments/interventions are assigned on the basis of households, schools, clinics, cities, etc.
  - Mean of clustered data
    - $Y_{ij} \sim (\mu, \sigma^2),\; i = 1, \ldots, n;\; j = 1, \ldots, m$
    - Up to n clusters, each of which has m subjects
    - $\mathrm{corr}(Y_{ij}, Y_{ik}) = \rho$ if $j \ne k$
    - $\mathrm{corr}(Y_{ij}, Y_{mk}) = 0$ if $i \ne m$
    - $\theta = \mu, \quad \hat\theta = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} Y_{ij} = \bar{Y}$
    - $V = \sigma^2\left(\dfrac{1 + (m - 1)\rho}{m}\right)$
    - $se(\hat\theta) = \sqrt{V/n}$
  - What is V if...
    - ρ = 0 (independent)
    - m = 1
    - m is large (e.g. m = 1000) and ρ is 0, 1, or 0.01

Clustered data cont.
  - With clustered data, even small correlations can be very important to consider
  - Equal precision achieved with (table columns: Clusters (n), m, ρ, Total N)
  - Always consider practical issues. Is it easier/cheaper to collect 1 observation on 1000 different subjects, or 100 observations on 20 different subjects?
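To see how quickly within-cluster correlation erodes precision, the cluster formula $V = \sigma^2 (1 + (m-1)\rho)/m$ can be evaluated directly. The sketch below uses hypothetical helper names (`cluster_V`, `cluster_se`) and sets $\sigma^2 = 1$ for convenience; the value ρ = 1/99 in the last comparison is an assumption picked so that the two designs in the practical question above come out equally precise.

```r
# V and standard error for the mean of clustered data with n clusters of size m.
# `cluster_V` and `cluster_se` are illustrative names; sigma2 = 1 is assumed.
cluster_V <- function(sigma2, m, rho) {
  sigma2 * (1 + (m - 1) * rho) / m
}
cluster_se <- function(sigma2, n, m, rho) {
  sqrt(cluster_V(sigma2, m, rho) / n)
}

# V for one cluster of m = 1000 subjects at three correlations
sapply(c(0, 0.01, 1), function(rho) cluster_V(1, m = 1000, rho = rho))

# m = 1 reduces to the one-sample case, V = sigma^2
cluster_V(1, m = 1, rho = 0.5)

# 1000 subjects measured once vs 20 subjects measured 100 times,
# with an assumed within-subject correlation of 1/99
cluster_se(1, n = 1000, m = 1, rho = 0)
cluster_se(1, n = 20, m = 100, rho = 1 / 99)
```

With ρ = 0 the 1000 observations in a cluster behave like 1000 independent ones, while with ρ = 1 they carry no more information than a single observation.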

Example: Independent odds ratios
  - Binary outcomes
    - $Y_{ij} \overset{ind}{\sim} B(1, p_j),\; i = 1, \ldots, n_j;\; j = 1, 2$
    - $n = n_1 + n_2; \quad r = n_1/n_2$
    - $\theta = \log\left(\dfrac{p_1/(1 - p_1)}{p_2/(1 - p_2)}\right); \quad \hat\theta = \log\left(\dfrac{\hat p_1/(1 - \hat p_1)}{\hat p_2/(1 - \hat p_2)}\right)$
    - $\sigma_j^2 = \dfrac{1}{p_j(1 - p_j)} = \dfrac{1}{p_j q_j}$
    - $V = (r + 1)\left(\dfrac{\sigma_1^2}{r} + \sigma_2^2\right)$
    - $se(\hat\theta) = \sqrt{\dfrac{V}{n}} = \sqrt{\dfrac{1}{n_1 p_1 q_1} + \dfrac{1}{n_2 p_2 q_2}}$
  - Notes on maximum precision
    - Max precision is achieved when the underlying odds are near 1 (proportions near 0.5)
    - If we were considering differences in proportions, the max precision is achieved when the underlying proportions are near 0 or 1
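The two notes on maximum precision can be illustrated by evaluating the standard errors at different underlying proportions. In the sketch below, `se_log_or` implements the log odds ratio formula above and `se_risk_diff` uses the usual binomial variance $p_j q_j / n_j$ for a difference in proportions; both names, the group sizes, and the proportions are hypothetical.

```r
# Standard error of the log odds ratio for two independent binomial samples.
se_log_or <- function(p1, p2, n1, n2) {
  sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2)))
}
# Standard error of a difference in proportions, for comparison.
se_risk_diff <- function(p1, p2, n1, n2) {
  sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
}

# Hypothetical design: 100 subjects per group, equal proportions in both groups
p <- c(0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95)
round(cbind(p = p,
            se_log_or = se_log_or(p, p, 100, 100),
            se_risk_diff = se_risk_diff(p, p, 100, 100)), 3)
```

The log odds ratio column is smallest at p = 0.5, while the risk difference column is largest there and shrinks toward 0 and 1.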

Example: Hazard ratios
  - Independent censored time to event outcomes
    - $(T_{ij}, \delta_{ij}),\; i = 1, \ldots, n_j;\; j = 1, 2$
    - $n = n_1 + n_2; \quad r = n_1/n_2$
    - $\theta = \log(HR); \quad \hat\theta = \hat\beta$ from proportional hazards (PH) regression
    - $V = \dfrac{(r + 1)(1/r + 1)}{\Pr(\delta_{ij} = 1)}$
    - $se(\hat\theta) = \sqrt{\dfrac{V}{n}} = \sqrt{\dfrac{(r + 1)(1/r + 1)}{d}}$
  - In the PH model, statistical information is roughly proportional to d, the number of observed events
  - Papers always report the number of events
  - Study design must consider how long it will take to observe events (e.g. deaths) starting from randomization
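Because the standard error depends on the data only through the number of events d, a design can be stated as a target number of events. The sketch below combines the standard error above with the usual Normal-approximation requirement that $|\theta_1 - \theta_0| / se$ equal $z_{1-\alpha/2} + z_{\text{power}}$; the function names, the hazard ratio of 0.75, and the 80% power are assumptions for illustration.

```r
# Standard error of the log hazard ratio given d events and allocation ratio r.
se_log_hr <- function(d, r = 1) {
  sqrt((r + 1) * (1 / r + 1) / d)
}

# Events needed so that a two-sided level-alpha test has the stated power
# at the design hazard ratio (null hazard ratio assumed to be 1).
events_needed <- function(hr, r = 1, alpha = 0.05, power = 0.8) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  z^2 * (r + 1) * (1 / r + 1) / log(hr)^2
}

# Hypothetical design alternative: hazard ratio 0.75, equal allocation
d_req <- events_needed(hr = 0.75)
ceiling(d_req)
se_log_hr(ceiling(d_req))

# Unequal allocation (r = 2) requires more events for the same power
events_needed(hr = 0.75, r = 2)
```

With equal allocation the factor $(r+1)(1/r+1)$ equals 4, so the standard error simplifies to $2/\sqrt{d}$.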

Example: Linear regression
  - Independent continuous outcomes associated with covariates
    - $Y_i \mid X_i \overset{ind}{\sim} (\beta_0 + \beta_1 X_i,\; \sigma^2_{Y|X}),\; i = 1, \ldots, n$
    - $\theta = \beta_1, \quad \hat\theta = \hat\beta_1$ from LS regression
    - $V = \dfrac{\sigma^2_{Y|X}}{\mathrm{Var}(X)}$
    - $se(\hat\theta) = \sqrt{\dfrac{\hat\sigma^2_{Y|X}}{n\,\widehat{\mathrm{Var}}(X)}}$
  - Precision tends to increase as the predictor (X) is measured over a wider range
  - Precision is also related to the within group variance $\sigma^2_{Y|X}$
  - What happens to the formulas when X is a binary variable? See the two sample mean example.

Summary
  - Options for increasing precision
    - Increase sample size
    - Decrease V
    - (Decrease confidence level)
  - Criteria for precision
    - Standard error
    - Width of confidence intervals
    - Statistical power
      - Select a suitable design alternative
      - Select desired power

Summary
  - Sample size calculation: The number of sampling units needed to obtain the desired precision
    - Level of significance α when $\theta = \theta_0$
    - Power β when $\theta = \theta_1$
    - Variability V within one sampling unit
    - $n = \dfrac{(z_{1-\alpha/2} + z_\beta)^2\, V}{(\theta_1 - \theta_0)^2}$
  - When sample size is constrained (the usual case), either
    - Compute power to detect a specified alternative
      - $\beta = \Phi\!\left(\dfrac{\theta_1 - \theta_0}{\sqrt{V/n}} - z_{1-\alpha/2}\right)$
      - Φ is the standard Normal cdf
      - In Stata, use normprob for the Φ function
    - Compute the alternative that can be detected with high power
      - $\theta_1 = \theta_0 + (z_{1-\alpha/2} + z_\beta)\sqrt{V/n}$
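The three calculations in this summary are rearrangements of one relationship, which the following R sketch makes concrete. All function names are hypothetical, the power formula assumes $\theta_1 > \theta_0$ as written above, and the design values (standard deviation 30 per group, equal allocation so that V = 4 × 30², a 10-unit design alternative, 90% power) are assumptions chosen only to have numbers to work with.

```r
# z_{1 - alpha/2} + z_power, shared by all three calculations
z_sum <- function(alpha, power) {
  qnorm(1 - alpha / 2) + qnorm(power)
}

# Total number of sampling units for the desired power at theta1
sample_size <- function(theta1, theta0, V, alpha = 0.05, power = 0.9) {
  z_sum(alpha, power)^2 * V / (theta1 - theta0)^2
}

# Power at theta1 for a fixed n (assumes theta1 > theta0, as in the formula above)
power_for_n <- function(theta1, theta0, V, n, alpha = 0.05) {
  pnorm((theta1 - theta0) / sqrt(V / n) - qnorm(1 - alpha / 2))
}

# Alternative detectable with the desired power for a fixed n
detectable_theta <- function(theta0, V, n, alpha = 0.05, power = 0.9) {
  theta0 + z_sum(alpha, power) * sqrt(V / n)
}

# Hypothetical two-sample design with equal allocation: V = (r + 1)(s^2/r + s^2) = 4 s^2
V <- 4 * 30^2
n_needed <- sample_size(theta1 = 10, theta0 = 0, V = V)
ceiling(n_needed)

# Round trip: the power at n_needed and the detectable alternative recover the inputs
power_for_n(theta1 = 10, theta0 = 0, V = V, n = n_needed)
detectable_theta(theta0 = 0, V = V, n = n_needed)
```

The round trip at the end is a quick consistency check that the three formulas agree with each other.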

General comments
  - The sample size required behaves like the inverse square of the width of the CI
    - To cut the width of the CI in half, we need to quadruple the sample size
  - Positively correlated observations within the same group provide less precision than the same number of independent observations
  - Positively correlated observations across groups provide more precision
  - What power do you use?
    - Most popular is 80% (too low) or 90%
    - The key is to be able to discriminate between scientifically meaningful hypotheses


More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Power and Sample Size Bios 662

Power and Sample Size Bios 662 Power and Sample Size Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-10-31 14:06 BIOS 662 1 Power and Sample Size Outline Introduction One sample: continuous

More information

Statistics for the LHC Lecture 1: Introduction

Statistics for the LHC Lecture 1: Introduction Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

Two examples of the use of fuzzy set theory in statistics. Glen Meeden University of Minnesota.

Two examples of the use of fuzzy set theory in statistics. Glen Meeden University of Minnesota. Two examples of the use of fuzzy set theory in statistics Glen Meeden University of Minnesota http://www.stat.umn.edu/~glen/talks 1 Fuzzy set theory Fuzzy set theory was introduced by Zadeh in (1965) as

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Dose-response modeling with bivariate binary data under model uncertainty

Dose-response modeling with bivariate binary data under model uncertainty Dose-response modeling with bivariate binary data under model uncertainty Bernhard Klingenberg 1 1 Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267 and Institute of Statistics,

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Two Sample Problems. Two sample problems

Two Sample Problems. Two sample problems Two Sample Problems Two sample problems The goal of inference is to compare the responses in two groups. Each group is a sample from a different population. The responses in each group are independent

More information

Confidence Intervals with σ unknown

Confidence Intervals with σ unknown STAT 141 Confidence Intervals and Hypothesis Testing 10/26/04 Today (Chapter 7): CI with σ unknown, t-distribution CI for proportions Two sample CI with σ known or unknown Hypothesis Testing, z-test Confidence

More information

Tutorial 2: Power and Sample Size for the Paired Sample t-test

Tutorial 2: Power and Sample Size for the Paired Sample t-test Tutorial 2: Power and Sample Size for the Paired Sample t-test Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function of sample size, variability,

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification 10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene

More information

Generalized Linear Modeling - Logistic Regression

Generalized Linear Modeling - Logistic Regression 1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

More information

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Chapter 3 ANALYSIS OF RESPONSE PROFILES

Chapter 3 ANALYSIS OF RESPONSE PROFILES Chapter 3 ANALYSIS OF RESPONSE PROFILES 78 31 Introduction In this chapter we present a method for analysing longitudinal data that imposes minimal structure or restrictions on the mean responses over

More information

Solution E[sum of all eleven dice] = E[sum of ten d20] + E[one d6] = 10 * E[one d20] + E[one d6]

Solution E[sum of all eleven dice] = E[sum of ten d20] + E[one d6] = 10 * E[one d20] + E[one d6] Name: SOLUTIONS Midterm (take home version) To help you budget your time, questions are marked with *s. One * indicates a straight forward question testing foundational knowledge. Two ** indicate a more

More information

Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

HYPOTHESIS TESTING: FREQUENTIST APPROACH.

HYPOTHESIS TESTING: FREQUENTIST APPROACH. HYPOTHESIS TESTING: FREQUENTIST APPROACH. These notes summarize the lectures on (the frequentist approach to) hypothesis testing. You should be familiar with the standard hypothesis testing from previous

More information

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13 PubH 5450 Biostatistics I Prof. Carlin Lecture 13 Outline Outline Sample Size Counts, Rates and Proportions Part I Sample Size Type I Error and Power Type I error rate: probability of rejecting the null

More information

10.1. Comparing Two Proportions. Section 10.1

10.1. Comparing Two Proportions. Section 10.1 /6/04 0. Comparing Two Proportions Sectio0. Comparing Two Proportions After this section, you should be able to DETERMINE whether the conditions for performing inference are met. CONSTRUCT and INTERPRET

More information

Accounting for Baseline Observations in Randomized Clinical Trials

Accounting for Baseline Observations in Randomized Clinical Trials Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA August 5, 0 Abstract In clinical

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample

More information

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12 ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12 Winter 2012 Lecture 13 (Winter 2011) Estimation Lecture 13 1 / 33 Review of Main Concepts Sampling Distribution of Sample Mean

More information

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier

More information

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be casted as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Lecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 8. October 22, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University. Lecture 8 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University October 22, 2007 1 2 3 4 5 6 1 Define convergent series 2 Define the Law of Large Numbers

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling

2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling 2015 SISG Bayesian Statistics for Genetics R Notes: Generalized Linear Modeling Jon Wakefield Departments of Statistics and Biostatistics, University of Washington 2015-07-24 Case control example We analyze

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation University of Oxford Statistical Methods Autocorrelation Identification and Estimation Dr. Órlaith Burke Michaelmas Term, 2011 Department of Statistics, 1 South Parks Road, Oxford OX1 3TG Contents 1 Model

More information

LECTURE 5. Introduction to Econometrics. Hypothesis testing

LECTURE 5. Introduction to Econometrics. Hypothesis testing LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will

More information

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!!  (preferred!)!! Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive

More information

Bayesian Inference for Normal Mean

Bayesian Inference for Normal Mean Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density

More information

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14. Your Name:

STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14. Your Name: STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14 Your Name: Please make sure to specify all of your notations in each problem GOOD LUCK! 1 Problem# 1. Consider the following model, y i =

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Answers to Problem Set #4

Answers to Problem Set #4 Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

1; (f) H 0 : = 55 db, H 1 : < 55.

1; (f) H 0 : = 55 db, H 1 : < 55. Reference: Chapter 8 of J. L. Devore s 8 th Edition By S. Maghsoodloo TESTING a STATISTICAL HYPOTHESIS A statistical hypothesis is an assumption about the frequency function(s) (i.e., pmf or pdf) of one

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Passing-Bablok Regression for Method Comparison

Passing-Bablok Regression for Method Comparison Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional

More information

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000) SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000) AMITA K. MANATUNGA THE ROLLINS SCHOOL OF PUBLIC HEALTH OF EMORY UNIVERSITY SHANDE

More information

PASS Sample Size Software. Poisson Regression

PASS Sample Size Software. Poisson Regression Chapter 870 Introduction Poisson regression is used when the dependent variable is a count. Following the results of Signorini (99), this procedure calculates power and sample size for testing the hypothesis

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Bayesian Inference: Posterior Intervals

Bayesian Inference: Posterior Intervals Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)

More information

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 12 Review So far... We have discussed the role of phase III clinical trials in drug development

More information

1 Comparing two binomials

1 Comparing two binomials BST 140.652 Review notes 1 Comparing two binomials 1. Let X Binomial(n 1,p 1 ) and ˆp 1 = X/n 1 2. Let Y Binomial(n 2,p 2 ) and ˆp 2 = Y/n 2 3. We also use the following notation: n 11 = X n 12 = n 1 X

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

Package bpp. December 13, 2016

Package bpp. December 13, 2016 Type Package Package bpp December 13, 2016 Title Computations Around Bayesian Predictive Power Version 1.0.0 Date 2016-12-13 Author Kaspar Rufibach, Paul Jordan, Markus Abt Maintainer Kaspar Rufibach Depends

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Examples and Limits of the GLM

Examples and Limits of the GLM Examples and Limits of the GLM Chapter 1 1.1 Motivation 1 1.2 A Review of Basic Statistical Ideas 2 1.3 GLM Definition 4 1.4 GLM Examples 4 1.5 Student Goals 5 1.6 Homework Exercises 5 1.1 Motivation In

More information

The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen

The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen January 23-24, 2012 Page 1 Part I The Single Level Logit Model: A Review Motivating Example Imagine we are interested in voting

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information