Accounting for Baseline Observations in Randomized Clinical Trials
Scott S Emerson, MD, PhD
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
October 6, 0

Abstract

In clinical trials investigating the treatment of chronic, progressive diseases, a common design is to make some physiologic measurement of the disease state at the start of the trial, randomize patients to either the experimental treatment or some control treatment (placebo), follow the patients over some period of time, and then make the same physiologic measurement of the disease state once again. The question then arises: How should the baseline measurements made prior to randomization be used? In this manuscript, I consider several common strategies that might be used in this setting, including analyzing only the follow-up measurement, analyzing the change in measurements, and the analysis of covariance (ANCOVA) approach of adjusting for the baseline measurement in a linear regression model. I illustrate several different ways to motivate the use of ANCOVA, and I examine the settings in which not measuring baseline might be preferred.

1 Introduction

In clinical trials investigating the treatment of chronic, progressive diseases, a common design is to make some physiologic measurement of the disease state at the start of the trial, randomize patients to either the experimental treatment or some control treatment (placebo), follow the patients over some period of time, and then make the same physiologic measurement of the disease state once again. Take for instance a double-blind, placebo-controlled, randomized clinical trial (RCT) of a new drug in the setting of hypertension. Upon recruiting a patient into the RCT, we would typically measure the patient's systolic blood pressure (SBP) for the purposes of determining eligibility for the trial. We might further decide to stratify randomization of patients on SBP in order to ensure balance across treatment arms with respect to severity of hypertension. Then, after treating the patients with either the experimental treatment or placebo for a period of, say, a year, we again measure the SBP of all patients, and analyze the data to decide whether the experimental treatment is associated with a scientifically important and statistically significant lower SBP when compared to placebo.

The question is: How do we use the baseline (pre-randomization) measurement of SBP in the data analysis? Owing to the randomized nature of the study, there are several choices that will each estimate the causal effect of the treatment in an unbiased fashion:

1. Ignore baseline and analyze the final measurements. By virtue of randomization, the distribution of SBP at the start of the study is equal in each treatment arm. Hence, any differences in the distribution of SBP at the end of the trial are directly attributable to the effect of the experimental treatment.
2. Analyze the change in measurements over the course of the study. For each patient, we compute the difference between the final SBP and their baseline SBP, and then compare the treatment arms with respect to the change in measurements.
3. Adjust for the baseline measurement as a covariate in a linear regression model of the final measurement for each patient by treatment arm.
4. Adjust for the baseline measurement as a covariate in a linear regression model of the change in measurements for each patient by treatment arm.

In my experience, the second of these methods is the one most commonly advocated by clinical investigators. It seems only natural to them: we are interested in an intervention that would either change the pathophysiologic process for the better or (at least) lessen the change in the pathophysiologic process for the worse.

In this manuscript, I explore the settings in which one of the approaches might be preferable to the others. In this exploration, I illustrate the seemingly paradoxical result that in an RCT the second method (analyzing the change in measurements) is never an optimal choice, while there are situations in which any of the other three is optimal. It should be noted that the results presented herein are specific to RCTs, because we make extensive use of the fact that the distribution of baseline measurements is known to be the same in both treatment groups.

2 Notation in the Homoscedastic Setting

Let $Y_{itj}$ be the SBP measurement in the $i$th treatment group ($i = 1$ for experimental treatment, $i = 0$ for placebo) at time $t$ ($t = 0$ for the baseline (pre-randomization) measurement, $t = 1$ for the end of treatment measurement) in the $j$th patient ($j = 1, \ldots, n_i$). Suppose $Y_{itj} \sim (\mu_{it}, \sigma^2)$, meaning that for a population receiving treatment $i$, the average measurement pre-randomization would be $\mu_{i0}$ with a standard deviation of $\sigma$, and the average measurement post-treatment would be $\mu_{i1}$ with a standard deviation of $\sigma$. We presume that individuals are independent, and that repeat measurements made on the same subject in the $i$th group have correlation $\rho$. Thus $\mathrm{corr}(Y_{i1j}, Y_{i0j}) = \rho$, $\mathrm{corr}(Y_{itj}, Y_{it'j'}) = 0$ for $j \neq j'$, and $\mathrm{corr}(Y_{itj}, Y_{i't'j'}) = 0$ for $i \neq i'$. (This homoscedastic model presumes not only that the treatment does not affect the variability of the measurements, but also that the treatment does not affect the correlation of the baseline and follow-up measurements.)

We presume a randomized study, so we have $\mu_{10} = \mu_{00}$. We are ultimately interested in estimating

$$\theta = \mu_{11} - \mu_{01}.$$

In order to derive general results, we consider the distribution of linear combinations of the form

$$\Delta_{ij} = Y_{i1j} - a_i Y_{i0j}.$$

Using simple properties of expectation and covariances, we find

$$E[\Delta_{ij}] = E[Y_{i1j} - a_i Y_{i0j}] = \mu_{i1} - a_i \mu_{i0}$$

$$Var(\Delta_{ij}) = Var(Y_{i1j} - a_i Y_{i0j}) = \sigma^2 + a_i^2 \sigma^2 - 2 a_i \rho \sigma^2 = (1 - 2 a_i \rho + a_i^2)\, \sigma^2,$$

so letting

$$\bar{\Delta}_{i} = \frac{1}{n_i} \sum_{j=1}^{n_i} \Delta_{ij},$$

we can derive moments

$$E[\bar{\Delta}_{i}] = \mu_{i1} - a_i \mu_{i0}, \qquad Var(\bar{\Delta}_{i}) = \frac{(1 - 2 a_i \rho + a_i^2)\, \sigma^2}{n_i}.$$
Now, for specified values of $a_1$ and $a_0$, we define $\hat{\theta}^{(a)} = \bar{\Delta}_{1} - \bar{\Delta}_{0}$, and find

$$E[\hat{\theta}^{(a)}] = (\mu_{11} - a_1 \mu_{10}) - (\mu_{01} - a_0 \mu_{00}) = \theta - (a_1 \mu_{10} - a_0 \mu_{00}) = \theta - (a_1 - a_0)\, \mu_{00},$$

so $\hat{\theta}^{(a)}$ is unbiased for $\theta$ for arbitrary $\mu_{00}$ if and only if $a_1 = a_0 = a$. With a common value $a$, the variance is

$$Var(\hat{\theta}^{(a)}) = (1 - 2 a \rho + a^2)\, \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$

Special cases of interest are:

1. Ignore baseline and analyze the final measurements. This corresponds to $a = 0$, in which case
$$Var(\hat{\theta}^{(0)}) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$
2. Analyze the change in measurements over the course of the study. This corresponds to $a = 1$, in which case
$$Var(\hat{\theta}^{(1)}) = 2 \sigma^2 (1 - \rho) \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$

By differentiating the general variance expression with respect to $a$, we find that the form of $\hat{\theta}^{(a)}$ with minimal variability has $a = \rho$, in which case

$$Var(\hat{\theta}^{(\rho)}) = \sigma^2 (1 - \rho^2) \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$

Note that:

1. For $\rho < 0.5$, $Var(\hat{\theta}^{(0)}) < Var(\hat{\theta}^{(1)})$, and throwing away the baseline value is more efficient than comparing the change in measurements across treatment arms.
2. For $\rho = 0.5$, $Var(\hat{\theta}^{(0)}) = Var(\hat{\theta}^{(1)})$, and throwing away the baseline value is just as efficient as comparing the change in measurements across treatment arms. (This is the only point of equality between these two estimators.)
3. For $\rho > 0.5$, $Var(\hat{\theta}^{(0)}) > Var(\hat{\theta}^{(1)})$, and throwing away the baseline value is less efficient than comparing the change in measurements across treatment arms.
4. For all $\rho$, $Var(\hat{\theta}^{(\rho)}) \le Var(\hat{\theta}^{(0)})$ and $Var(\hat{\theta}^{(\rho)}) \le Var(\hat{\theta}^{(1)})$.
5. For $\rho = 0$, $Var(\hat{\theta}^{(0)}) = Var(\hat{\theta}^{(\rho)})$. (This is the only point of equality between these two estimators.)
6. For $\rho = 1$, $Var(\hat{\theta}^{(1)}) = Var(\hat{\theta}^{(\rho)})$. (This is the only point of equality between these two estimators.)

Note that the fourth observation above suggests that when $\rho$ is known, $\hat{\theta}^{(\rho)}$ is the best estimator within this class of linear combinations, as it is always at least as good as either of the other two approaches based on $a = 0$ or $a = 1$.
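The variance comparisons listed above can be checked numerically. The following is a minimal sketch, not part of the original manuscript: the function names `variance_factor` and `var_theta_hat` are hypothetical, and the values of $\sigma$, $n_1$, and $n_0$ are illustrative assumptions only.

```python
def variance_factor(a, rho):
    """Multiplier (1 - 2*a*rho + a^2) of sigma^2 * (1/n1 + 1/n0) in Var(theta_hat^(a))."""
    return 1.0 - 2.0 * a * rho + a * a


def var_theta_hat(a, rho, sigma, n1, n0):
    """Variance of the estimator that subtracts a times the baseline value (homoscedastic case)."""
    return variance_factor(a, rho) * sigma ** 2 * (1.0 / n1 + 1.0 / n0)


# Illustrative (assumed) design values: sigma = 15, 100 patients per arm.
sigma, n1, n0 = 15.0, 100, 100
for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    final_only = var_theta_hat(0.0, rho, sigma, n1, n0)  # a = 0: ignore baseline
    change = var_theta_hat(1.0, rho, sigma, n1, n0)      # a = 1: analyze change
    optimal = var_theta_hat(rho, rho, sigma, n1, n0)     # a = rho: minimal variance
    print(f"rho={rho:.2f}  a=0: {final_only:.3f}  a=1: {change:.3f}  a=rho: {optimal:.3f}")
```

For any correlation in the loop, the `a = rho` column should never exceed the other two, and the `a = 0` and `a = 1` columns should cross at `rho = 0.5`.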
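3 An Alternative Derivation When Measurements are Normally Distributed

Consider the same homoscedastic setting as in Section 2 with $\sigma$ and $\rho$ known constants, but now suppose that it is known that the vectors $(Y_{i0j}, Y_{i1j})$ are bivariate normal. We can then write the density for our data as a function of $\vec{\theta} = (\mu_{00}, \mu_{01}, \mu_{11})$ as

$$f(\vec{Y}; \vec{\theta}) = \prod_{i=0}^{1} \prod_{j=1}^{n_i} \frac{1}{2 \pi \sigma^2 \sqrt{1 - \rho^2}} \exp\left\{ - \frac{(Y_{i0j} - \mu_{00})^2 - 2 \rho (Y_{i0j} - \mu_{00})(Y_{i1j} - \mu_{i1}) + (Y_{i1j} - \mu_{i1})^2}{2 \sigma^2 (1 - \rho^2)} \right\} = g(\vec{\theta})\, h(\vec{Y}) \exp\left\{ \sum_{k=1}^{3} c_k(\vec{\theta})\, T_k(\vec{Y}) \right\},$$

where the familiar form of an exponential family density is found by expanding the product and grouping terms, with

$$g(\vec{\theta}) = \left( \frac{1}{2 \pi \sigma^2 \sqrt{1 - \rho^2}} \right)^{n_0 + n_1} \exp\left\{ - \frac{(n_0 + n_1)\, \mu_{00}^2 - 2 \rho (n_0 \mu_{01} + n_1 \mu_{11})\, \mu_{00} + n_1 \mu_{11}^2 + n_0 \mu_{01}^2}{2 \sigma^2 (1 - \rho^2)} \right\}$$

$$h(\vec{Y}) = \exp\left\{ - \frac{\sum_{i=0}^{1} \sum_{j=1}^{n_i} \left( Y_{i0j}^2 - 2 \rho Y_{i0j} Y_{i1j} + Y_{i1j}^2 \right)}{2 \sigma^2 (1 - \rho^2)} \right\}$$

$$c_1(\vec{\theta}) = \frac{\mu_{00}}{\sigma^2 (1 - \rho^2)}, \qquad c_2(\vec{\theta}) = \frac{\mu_{01}}{\sigma^2 (1 - \rho^2)}, \qquad c_3(\vec{\theta}) = \frac{\mu_{11}}{\sigma^2 (1 - \rho^2)}$$

$$T_1(\vec{Y}) = \sum_{i=0}^{1} \sum_{j=1}^{n_i} (Y_{i0j} - \rho Y_{i1j}), \qquad T_2(\vec{Y}) = \sum_{j=1}^{n_0} (Y_{01j} - \rho Y_{00j}), \qquad T_3(\vec{Y}) = \sum_{j=1}^{n_1} (Y_{11j} - \rho Y_{10j}).$$

Note that in this exponential family density, $\vec{\theta}$ is a three dimensional vector, and the range of $\vec{\theta}$ contains an open rectangle in 3 dimensions. Hence, we know that $\vec{T} = (T_1, T_2, T_3)$ is a complete sufficient statistic. Furthermore, because

$$\hat{\theta}^{(\rho)} = T_3(\vec{Y}) / n_1 - T_2(\vec{Y}) / n_0$$

is unbiased for $\theta$ and a function of the complete sufficient statistic, we know that $\hat{\theta}^{(\rho)}$ is the uniform minimum variance unbiased estimator of $\theta$.

4 Settings Where Not Measuring Baseline is Optimal

Even when $\rho$ is known, there is a setting in which a better approach would be to use only the final measurement.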
Suppose the measurement of $Y_{itj}$ is extremely expensive. Then the limiting factor in our experimental design might be the number of measurements we have to make. In order to compute $\hat{\theta}^{(\rho)}$, we must make $2(n_0 + n_1)$ measurements: one for each subject at baseline and one for each subject at the end of treatment. As noted above, the variance of $\hat{\theta}^{(\rho)}$ is given by

$$Var(\hat{\theta}^{(\rho)}) = \sigma^2 (1 - \rho^2) \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$

But suppose that we doubled the number of subjects to $n_0^* = 2 n_0$ and $n_1^* = 2 n_1$ and we did not measure baseline values at all. In this case, we would use $\hat{\theta}^{(0)}$, which has variance

$$Var(\hat{\theta}^{(0)}) = \sigma^2 \left( \frac{1}{n_1^*} + \frac{1}{n_0^*} \right) = \frac{\sigma^2}{2} \left( \frac{1}{n_1} + \frac{1}{n_0} \right).$$

We can then characterize the setting in which $Var(\hat{\theta}^{(0)}) < Var(\hat{\theta}^{(\rho)})$:

$$\frac{\sigma^2}{2} \left( \frac{1}{n_1} + \frac{1}{n_0} \right) < \sigma^2 (1 - \rho^2) \left( \frac{1}{n_1} + \frac{1}{n_0} \right) \iff \frac{1}{2} < 1 - \rho^2 \iff \rho < \frac{1}{\sqrt{2}} = 0.707.$$

Hence, if there is no other need to measure baseline values, and if it is relatively easier and cheaper to accrue patients than to make measurements on them, then unless the correlation within subjects is extremely high ($\rho > 0.707$), the most efficient design is to measure and use only the follow-up values at the end of treatment. Of course, it is not unusual for patient eligibility to be based on the pre-randomization value of the measurement, in which case the advantage of ignoring the baseline measurement is gone.

5 Settings Where ρ is Unknown: ANCOVA

The above derivations assumed that both $\rho$ and $\sigma$ were known in a homoscedastic setting. It is most often the case, however, that neither $\rho$ nor $\sigma$ is known, and we must use estimates. One approach to such estimation is to use a linear model in which we adjust for the baseline value as a covariate. To further explore this approach, it is useful to modify our notation. Define outcome vector $Z$, treatment vector $X$, and baseline vector $W$ for $k = 1, \ldots, n = n_0 + n_1$ by

$$Z_k = \begin{cases} Y_{01k} & k \le n_0 \\ Y_{11j} & k = n_0 + j \end{cases} \qquad X_k = \begin{cases} 0 & k \le n_0 \\ 1 & k > n_0 \end{cases} \qquad W_k = \begin{cases} Y_{00k} & k \le n_0 \\ Y_{10j} & k = n_0 + j \end{cases}$$

We then fit the ordinary least squares (OLS) model

$$E[Z_k \mid X_k, W_k] = \beta_0 + X_k \beta_1 + W_k \beta_2,$$

obtaining OLS estimates

$$\hat{\beta}_1 = \frac{1}{1 - r_{XW}^2} \left( \frac{V_{XZ}}{V_{XX}} - \frac{V_{XW} V_{WZ}}{V_{XX} V_{WW}} \right), \qquad \hat{\beta}_2 = \frac{1}{1 - r_{XW}^2} \left( \frac{V_{WZ}}{V_{WW}} - \frac{V_{XW} V_{XZ}}{V_{XX} V_{WW}} \right),$$

where for sample means $\bar{Z}$, $\bar{X}$, and $\bar{W}$

$$r_{XW} = \frac{S_{XW}}{\sqrt{S_{XX} S_{WW}}}, \qquad S_{XX} = \sum_{k=1}^{n} (X_k - \bar{X})^2, \qquad V_{XX} = S_{XX} / n,$$

$$S_{XW} = \sum_{k=1}^{n} (X_k - \bar{X})(W_k - \bar{W}), \qquad V_{XW} = S_{XW} / n,$$

with similar definitions for $V_{WW}$, $V_{XZ}$, and $V_{WZ}$.

Now by randomization, we know that $W_k$ and $X_k$ are independent. Thus under the assumption that the randomization ratio approaches some constant (i.e., $n_1 / n \to \lambda \in (0, 1)$ as $n \to \infty$), the consistency of sample moments for population moments and the properties of convergence in probability then provide that

$$V_{XW} \xrightarrow{p} 0, \qquad r_{XW} \xrightarrow{p} 0, \qquad V_{XX} \xrightarrow{p} \lambda (1 - \lambda), \qquad V_{WW} \xrightarrow{p} \sigma^2,$$

$$V_{ZZ} \xrightarrow{p} \sigma^2 + \lambda (1 - \lambda)(\mu_{11} - \mu_{01})^2, \qquad V_{WZ} \xrightarrow{p} \rho \sigma^2, \qquad V_{XZ} \xrightarrow{p} \lambda (1 - \lambda)(\mu_{11} - \mu_{01}) = \lambda (1 - \lambda)\, \theta,$$

$$\hat{\beta}_1 \xrightarrow{p} \theta, \qquad \hat{\beta}_2 \xrightarrow{p} \rho.$$

Thus, this analysis of covariance (ANCOVA) model is essentially using a consistent estimate for $\rho$ as the coefficient for the baseline value.

Further statements can be made if we know that our statistical model is an accurate representation of the data generation mechanism. Because the treatment assignment is binary, the only issue is whether some linear relationship exists between the baseline and follow-up measurement within each treatment group. But if this does hold, then in the homoscedastic setting (irrespective of the distribution of the errors, including irrespective of whether the errors are identically distributed) we know by the Gauss-Markov Theorem that $\hat{\beta}_1$ is the best linear unbiased estimator (BLUE) for $\theta$.

Sometimes an investigator is extremely insistent on analyzing the change in measurements. They then fit a model in which they define

$$Z_k = \begin{cases} Y_{01k} - Y_{00k} & k \le n_0 \\ Y_{11j} - Y_{10j} & k = n_0 + j \end{cases} \qquad X_k = \begin{cases} 0 & k \le n_0 \\ 1 & k > n_0 \end{cases} \qquad W_k = \begin{cases} Y_{00k} & k \le n_0 \\ Y_{10j} & k = n_0 + j \end{cases}$$

and fit the ordinary least squares (OLS) model

$$E[Z_k \mid X_k, W_k] = \alpha_0 + X_k \alpha_1 + W_k \alpha_2.$$

This approach provides the exact same inference about the treatment effect, because subtracting $W_k$ from the outcome changes only the coefficient of $W_k$: $\alpha_0 = \beta_0$, $\alpha_1 = \beta_1$, and $\alpha_2 = \beta_2 - 1$, with OLS estimates $\hat{\alpha}_0 = \hat{\beta}_0$, $\hat{\alpha}_1 = \hat{\beta}_1$, and $\hat{\alpha}_2 = \hat{\beta}_2 - 1$.

6 Notation in the General Heteroscedastic Setting

Let $Y_{itj}$ be the SBP measurement in the $i$th treatment group ($i = 1$ for experimental treatment, $i = 0$ for placebo) at time $t$ ($t = 0$ for the baseline (pre-randomization) measurement, $t = 1$ for the end of treatment measurement) in the $j$th patient ($j = 1, \ldots, n_i$). Suppose $Y_{itj} \sim (\mu_{it}, \sigma_{it}^2)$, meaning that for a population receiving treatment $i$, the average measurement pre-randomization would be $\mu_{i0}$ with a standard deviation of $\sigma_{i0}$, and the average measurement post-treatment would be $\mu_{i1}$ with a standard deviation of $\sigma_{i1}$. We presume that individuals are independent, and that repeat measurements made on the same subject in the $i$th group have correlation $\rho_i$. Thus $\mathrm{corr}(Y_{i1j}, Y_{i0j}) = \rho_i$, $\mathrm{corr}(Y_{itj}, Y_{it'j'}) = 0$ for $j \neq j'$, and $\mathrm{corr}(Y_{itj}, Y_{i't'j'}) = 0$ for $i \neq i'$. (In this heteroscedastic setting, we allow the treatment to affect both the variability of the follow-up measurements and the correlation between the baseline and follow-up measurements.)

We presume a randomized study, so we have $\mu_{10} = \mu_{00}$ and $\sigma_{10} = \sigma_{00}$. We are ultimately interested in estimating

$$\theta = \mu_{11} - \mu_{01}.$$

In order to derive general results, we consider the distribution of linear combinations of the form

$$\Delta_{ij} = Y_{i1j} - a_i Y_{i0j}.$$

Using simple properties of expectation and covariances, we find

$$E[\Delta_{ij}] = E[Y_{i1j} - a_i Y_{i0j}] = \mu_{i1} - a_i \mu_{i0}$$

$$Var(\Delta_{ij}) = Var(Y_{i1j} - a_i Y_{i0j}) = \sigma_{i1}^2 + a_i^2 \sigma_{i0}^2 - 2 a_i \rho_i \sigma_{i1} \sigma_{i0},$$

so letting

$$\bar{\Delta}_{i} = \frac{1}{n_i} \sum_{j=1}^{n_i} \Delta_{ij},$$

we can derive moments

$$E[\bar{\Delta}_{i}] = \mu_{i1} - a_i \mu_{i0}, \qquad Var(\bar{\Delta}_{i}) = \frac{\sigma_{i1}^2 + a_i^2 \sigma_{i0}^2 - 2 a_i \rho_i \sigma_{i1} \sigma_{i0}}{n_i}.$$

Now, we define $\hat{\theta} = \bar{\Delta}_{1} - \bar{\Delta}_{0}$, and find

$$E[\hat{\theta}] = (\mu_{11} - a_1 \mu_{10}) - (\mu_{01} - a_0 \mu_{00}) = \theta - (a_1 \mu_{10} - a_0 \mu_{00}) = \theta - (a_1 - a_0)\, \mu_{00},$$

so $\hat{\theta}$ is unbiased for $\theta$ for arbitrary $\mu_{00}$ if and only if $a_1 = a_0 = a$. With a common value $a$, the variance is

$$Var(\hat{\theta}) = \frac{\sigma_{11}^2 + a^2 \sigma_{10}^2 - 2 a \rho_1 \sigma_{11} \sigma_{10}}{n_1} + \frac{\sigma_{01}^2 + a^2 \sigma_{00}^2 - 2 a \rho_0 \sigma_{01} \sigma_{00}}{n_0} = \frac{\sigma_{11}^2}{n_1} + \frac{\sigma_{01}^2}{n_0} + a^2 \sigma_{00}^2 \left( \frac{1}{n_1} + \frac{1}{n_0} \right) - 2 a \sigma_{00} \left( \frac{\rho_1 \sigma_{11}}{n_1} + \frac{\rho_0 \sigma_{01}}{n_0} \right).$$

By differentiating the above expression with respect to $a$, we find that the form of $\hat{\theta}$ with minimal variability has

$$a = \left( \frac{n_0}{n_0 + n_1} \right) \rho_1 \frac{\sigma_{11}}{\sigma_{10}} + \left( \frac{n_1}{n_0 + n_1} \right) \rho_0 \frac{\sigma_{01}}{\sigma_{00}} = (1 - \lambda)\, \rho_1 \frac{\sigma_{11}}{\sigma_{00}} + \lambda\, \rho_0 \frac{\sigma_{01}}{\sigma_{00}},$$

where $\lambda = n_1 / (n_0 + n_1)$ reflects the randomization ratio. Note that this is a weighted average of the slope parameter from a regression of $Y_{11j}$ on $Y_{10j}$ and the slope parameter from a regression of $Y_{01j}$ on $Y_{00j}$.

In the realistic setting in which we do not know $\rho_i$ or $\sigma_{it}$, we would again consider the ANCOVA model

$$E[Z_k \mid X_k, W_k] = \beta_0 + X_k \beta_1 + W_k \beta_2$$

with OLS estimates

$$\hat{\beta}_1 = \frac{1}{1 - r_{XW}^2} \left( \frac{V_{XZ}}{V_{XX}} - \frac{V_{XW} V_{WZ}}{V_{XX} V_{WW}} \right), \qquad \hat{\beta}_2 = \frac{1}{1 - r_{XW}^2} \left( \frac{V_{WZ}}{V_{WW}} - \frac{V_{XW} V_{XZ}}{V_{XX} V_{WW}} \right).$$

In the heteroscedastic setting in which the randomization ratio is kept constant (so again assuming $n_1 / n \to \lambda \in (0, 1)$ as $n \to \infty$), we have

$$V_{XW} \xrightarrow{p} 0, \qquad r_{XW} \xrightarrow{p} 0, \qquad V_{XX} \xrightarrow{p} \lambda (1 - \lambda), \qquad V_{WW} \xrightarrow{p} \sigma_{00}^2,$$

$$V_{ZZ} \xrightarrow{p} (1 - \lambda)\, \sigma_{01}^2 + \lambda \sigma_{11}^2 + \lambda (1 - \lambda)(\mu_{11} - \mu_{01})^2, \qquad V_{WZ} \xrightarrow{p} (1 - \lambda)\, \rho_0 \sigma_{00} \sigma_{01} + \lambda \rho_1 \sigma_{00} \sigma_{11},$$

$$V_{XZ} \xrightarrow{p} \lambda (1 - \lambda)(\mu_{11} - \mu_{01}) = \lambda (1 - \lambda)\, \theta,$$

$$\hat{\beta}_1 \xrightarrow{p} \theta, \qquad \hat{\beta}_2 \xrightarrow{p} \lambda \rho_1 \frac{\sigma_{11}}{\sigma_{10}} + (1 - \lambda)\, \rho_0 \frac{\sigma_{01}}{\sigma_{00}}.$$

Note that in this case, the OLS coefficient for the baseline measurement is not consistent for the optimal value of $a$, unless the randomization ratio is 1:1 (i.e., $\lambda = 0.5$) or $\rho_1 \sigma_{11} = \rho_0 \sigma_{01}$ (something that is rather hard to ascertain).

If we know that our statistical model is an accurate representation of the data generation mechanism, we know by the Gauss-Markov Theorem that the OLS estimator $\hat{\beta}_1$ is not the best linear unbiased estimator (BLUE) for $\theta$ unless $Var(Y_{i1j} \mid Y_{i0j})$ is a constant independent of $i$ and $j$. Otherwise, the BLUE would be the weighted least squares estimate with each observation weighted by the inverse variance of the follow-up measurement conditional on its treatment group and its baseline measurement.

As noted above, the optimal linear combination of the baseline and follow-up observations based on $a$ is of a form that would suggest performing linear regressions of the follow-up observations on the baseline observations within each group, and then using the estimated coefficients for the baseline values to adjust the follow-up measurements. That is, we could fit linear regression models

$$E[Y_{i1j} \mid Y_{i0j}] = \gamma_{0i} + Y_{i0j} \gamma_{1i},$$

then define variables

$$Z_k^* = Z_k - \left( \left( \frac{n_0}{n_0 + n_1} \right) \hat{\gamma}_{11} + \left( \frac{n_1}{n_0 + n_1} \right) \hat{\gamma}_{10} \right) W_k,$$

and then regress $Z_k^*$ on $X_k$. It should be noted, however, that the $Z_k^*$ are now correlated within treatment groups, and thus the inference obtained from classical OLS on that simple linear regression model would not be correct.
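The gap between the variance-minimizing coefficient $a$ and the large-sample limit of the ANCOVA coefficient $\hat{\beta}_2$ in the heteroscedastic setting can be illustrated with the closed-form expressions from Section 6. This is a minimal sketch under stated assumptions: the function names (`optimal_a`, `ols_baseline_limit`, `var_theta_hat_het`) are hypothetical, the numeric inputs are made up for illustration, and $\sigma_{10} = \sigma_{00}$ is imposed as implied by randomization.

```python
def optimal_a(n1, n0, rho1, sigma11, rho0, sigma01, sigma00):
    """Variance-minimizing a: (1 - lam)*rho1*sigma11/sigma00 + lam*rho0*sigma01/sigma00."""
    lam = n1 / (n0 + n1)  # randomization ratio
    return (1.0 - lam) * rho1 * sigma11 / sigma00 + lam * rho0 * sigma01 / sigma00


def ols_baseline_limit(n1, n0, rho1, sigma11, rho0, sigma01, sigma00):
    """Large-sample limit of the OLS baseline coefficient: weights lam and (1 - lam) are swapped."""
    lam = n1 / (n0 + n1)
    return lam * rho1 * sigma11 / sigma00 + (1.0 - lam) * rho0 * sigma01 / sigma00


def var_theta_hat_het(a, n1, n0, rho1, sigma11, rho0, sigma01, sigma00):
    """Var(theta_hat) for a common coefficient a, using sigma10 = sigma00 (randomization)."""
    arm1 = (sigma11 ** 2 + a ** 2 * sigma00 ** 2 - 2.0 * a * rho1 * sigma11 * sigma00) / n1
    arm0 = (sigma01 ** 2 + a ** 2 * sigma00 ** 2 - 2.0 * a * rho0 * sigma01 * sigma00) / n0
    return arm1 + arm0


# Illustrative (assumed) scenario: 2:1 randomization with different slopes in each arm.
params = dict(rho1=0.8, sigma11=10.0, rho0=0.4, sigma01=12.0, sigma00=10.0)
a_opt = optimal_a(200, 100, **params)
a_ols = ols_baseline_limit(200, 100, **params)
print("optimal a:", a_opt, " variance:", var_theta_hat_het(a_opt, 200, 100, **params))
print("OLS limit:", a_ols, " variance:", var_theta_hat_het(a_ols, 200, 100, **params))
```

Under 1:1 randomization (for example `n1 = n0 = 100`) the two coefficients coincide, which matches the condition $\lambda = 0.5$ stated above; with unequal allocation and $\rho_1 \sigma_{11} \neq \rho_0 \sigma_{01}$ they differ, and the variance at the OLS limit is larger.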
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationEconometrics II - EXAM Answer each question in separate sheets in three hours
Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following
More informationCorrelation analysis. Contents
Correlation analysis Contents 1 Correlation analysis 2 1.1 Distribution function and independence of random variables.......... 2 1.2 Measures of statistical links between two random variables...........
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationSTAT5044: Regression and Anova. Inyoung Kim
STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:
More informationBIOS 2083: Linear Models
BIOS 2083: Linear Models Abdus S Wahed September 2, 2009 Chapter 0 2 Chapter 1 Introduction to linear models 1.1 Linear Models: Definition and Examples Example 1.1.1. Estimating the mean of a N(μ, σ 2
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationLinear Regression. Junhui Qian. October 27, 2014
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationPractical Considerations Surrounding Normality
Practical Considerations Surrounding Normality Prof. Kevin E. Thorpe Dalla Lana School of Public Health University of Toronto KE Thorpe (U of T) Normality 1 / 16 Objectives Objectives 1. Understand the
More informationECON 3150/4150, Spring term Lecture 6
ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture
More informationComparing the effects of two treatments on two ordinal outcome variables
Working Papers in Statistics No 2015:16 Department of Statistics School of Economics and Management Lund University Comparing the effects of two treatments on two ordinal outcome variables VIBEKE HORSTMANN,