Accounting for Baseline Observations in Randomized Clinical Trials

Size: px
Start display at page:

Download "Accounting for Baseline Observations in Randomized Clinical Trials"

Transcription

1 Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA August 5, 0 Abstract In clinical trials investigating the treatment of chronic, progressive diseases, a common design is to make some physiologic measurement of the disease state at the start of the trial, randomize patients to either the experimental treatment or some control treatment (placebo, follow the patients over some period of time, and then measure once again the physiologic measurement of the disease state The question then arises: How should the baseline measurements made prior to randomization be used? In this manuscript, I consider several common strategies that might be used in this setting, including analyzing only the follow-up measurement, analyzing the change in measurements, and the analysis of covariance (ANCOVA approach of adjusting for the baseline measurement in a linear regression model I illustrate several different ways to motivate the use of ANCOVA, as well as examining the settings in which not measuring baseline might be preferred I conclude with a discussion of how these results might change when using a randomized crossover design Introduction In clinical trials investigating the treatment of chronic, progressive diseases, a common design is to make some physiologic measurement of the disease state at the start of the trial, randomize patients to either the experimental treatment or some control treatment (placebo, follow the patients over some period of time, and then measure once again the physiologic measurement of the disease state Take for instance a doubleblind, placebo controlled, randomized clinical trial (RCT of a new drug in the setting of hypertension Upon recruiting a patient into the RCT, we would typically measure the patient s systolic blood pressure (SBP for the purposes of determining eligibility for the trial We might further decide to stratify randomization of patients on SBP in order to ensure balance across treatment arms with respect to severity of hypertension Then after treating the patients with either the experimental treatment or placebo for the period of, say, a year, we again measure the SBP of all patients, and analyze the data to decide whether the experimental treatment is associated with a scientifically important and statistically significant lower SBP when compared to placebo The question is: How do we use the baseline (pre-randomization measurement of SBP in the data analysis? Owing to the randomized nature of the study, there are several choices that will each estimate the causal effect of the treatment in an unbiased fashion: Ignore baseline and analyze the final measurements By virtue of randomization, the distribution of SBP at the start of the study is equal in each treatment arm Hence, any differences in the distribution of SBP at the end of the trial is directly attributable to the effect of the experimental treatment Analyze the change in measurements over the course of the study For each patient, we compute the difference between the final SBP and their baseline SBP, and then compare the treatment arms with respect to the change in measurements

2 Accounting for Baseline Measurements in RCT Emerson, Page 3 Adjust for the baseline measurement as a covariate in a linear regression model of final measurement for each patient by treatment arm 4 Adjust for the baseline measurement as a covariate in a linear regression model of the change in measures for each patient by treatment arm In my experience, the second of these methods is the one most commonly advocated by clinical investigators It seems only natural to them: we are interested in an intervention that would either change the pathophysiologic process for the better or (at least lessen the change in the pathophysiologic process for the worse In this manuscript, I explore the settings in which one of the approaches might be preferable to the others In this exploration, I illustrate the seemingly paradoxical result that suggests that in RCT that second method (analyzing the change in measurements is never an optimal choice, while there are situations that any of the other three is optimal It should be noted that the results presented herein are specific to RCT, because we make extensive use of the fact that the distribution of baseline measurements is known to be the same in both treatment groups Notation in the Homoscedastic, Independent Sample Setting Let Y ktj be the SBP measurement in the kth treatment group (k = for experimental treatment, k = 0 for placebo at time t (t = 0 for the baseline (pre-randomization measurement, t = for the end of treatment measurement in the jth patient (j =,, n i We assume all subjects are independent and that the subjects in the experimental treatment group are different that those in the placebo group Suppose Y ktj (µ kt, σ, meaning that for a population receiving treatment k, the average measurement pre-randomization would be µ k0 with a standard deviation of σ, and the average measurement post-treatment would be µ k with a standard deviation of σ We presume that individuals are independent, and that repeat measurements made on the same subject in the kth group have correlation ρ Thus corr(y kj, Y k0j = ρ, corr(y ktj, Y kt j = 0 for j j, and corr(y ktj, Y k t j = 0 for k k (This homoscedastic model presumes not only that the treatment does not affect the variability of the measurements, but also that the treatment does not affect the correlation of the baseline and follow-up measurements We presume a randomized study, so we have µ 0 = µ 00 θ = µ µ 0 We are ultimately interested in estimating In order to derive general results, we consider the distribution of linear combinations of the form kj = Y kj a k Y k0j Using simple properties of expectation and covariances, we find E kj = E Y kj a i Y k0j = µ k a k µ k0 V ar ( kj = V ar (Y kj a k Y k0j = σ + a kσ a k ρσ = ( a k ρ + a k σ so letting we can derive moments k = n k n k kj j= E k = µk a k µ k0 V ar ( ak ρ + a k = k σ n k

3 Accounting for Baseline Measurements in RCT Emerson, Page 3 Now, for a specified value of a, we define ˆθ (a = 0, and find distribution moments E ˆθ(a = (µ a µ 0 (µ 0 a 0 µ 00 = θ (a µ 0 a 0 µ 00 = θ (a a 0 µ 00 and ˆθ is unbiased for θ for arbitrary µ 00 if and only if a = a 0 = a Now we can find distribution variance ˆθ(a V ar = ( aρ + a ( σ + n n 0 Special cases of interest are Ignore baseline and analyze the final measurements This corresponds to a = 0, in which case V ar(ˆθ (0 = σ ( n + n 0 Analyze the change in measurements over the course of the study This corresponds to a =, in which case ( V ar(ˆθ ( = σ ( ρ + n n 0 By differentiating the above expression with respect to a, we can find the form of ˆθ with minimal variability as having a = ρ, in which case V ar(ˆθ (ρ = σ ( ρ ( + n n 0 Note that For ρ < 05, V ar(ˆθ (0 < V ar(ˆθ (, and throwing away the baseline value is more efficient than comparing the change in measurements across treatment arms For ρ = 05, V ar(ˆθ (0 = V ar(ˆθ (, and throwing away the baseline value is just as efficient as comparing the change in measurements across treatment arms (This is the only point of equality between these two estimators 3 For ρ > 05, V ar(ˆθ (0 > V ar(ˆθ (, and throwing away the baseline value is less efficient than comparing the change in measurements across treatment arms 4 For all ρ, V ar(ˆθ (ρ V ar(ˆθ (0 and V ar(ˆθ (ρ V ar(ˆθ ( 5 For ρ = 0, V ar(ˆθ (0 = V ar(ˆθ (ρ (This is the only point of equality between these two estimators 6 For ρ =, V ar(ˆθ ( = V ar(ˆθ (ρ (This is the only point of equality between these two estimators Note that the third observation above suggests that when ρ is known, the use of ˆθ (ρ is the best linear estimator, as it is always at least as good as either of the other two approaches based on a = 0 or a =

4 Accounting for Baseline Measurements in RCT Emerson, Page 4 3 An Alternative Derivation When Measurements are Normally Distributed Consider the same homoscedastic setting as in section with σ and ρ known constants, but now suppose that it is known that vector (Y k0j, Y kj are bivariate normal We can then write the density for our data as a function of θ = (µ 00, µ 0, µ as (Y k0j µ 00 ρ(y k0j µ 00 (Y kj µ k + (Y kj µ k n k f( Y ; θ = πσ ( ρ k=0 j= exp = g(θh( Y 3 exp c l (θt l ( Y, l= σ ( ρ where the familiar form of an exponential family density is found by expanding the product and grouping terms, with n0+n g( θ = πσ exp (n 0 + n µ 00 ρ(n 0 µ 0 + n µ µ 00 + n µ + n 0 µ 0 ( ρ σ ( ρ ( nk h( Y k=0 j= Yk0j = exp ρy k0jy kj + Ykj σ ( ρ µ 00 c ( θ = σ ( ρ µ 0 c ( θ = σ ( ρ µ c 3 ( θ = σ ( ρ T ( Y n k = (Y k0j ρy kj k=0 j= n 0 T ( Y = (Y 0j ρy 00j j= T 3 ( Y n = (Y j ρy 0j j= Note that in that exponential family density, θ is a three dimensional vector, and the range of θ contains an open rectangle in 3 dimensions Hence, we know that T is a complete sufficient statistic Furthermore, because ˆθ (ρ = T 3 ( Y /n T ( Y /n 0 is unbiased for θ and a function of the complete sufficient statistic, we know that ˆθ (ρ is the uniform minimum variance unbiased estimator of θ 4 Settings Where Not Measuring Baseline is Optimal Even when ρ is known, there is a setting in which a better approach would be to use only the final measurement

5 Accounting for Baseline Measurements in RCT Emerson, Page 5 Suppose the measurement of Y ktj is extremely expensive Then the limiting factor in our experimental design might be the number of measurements we have to make In order to estimate ˆθ (ρ, we must make (n 0 + n measurements one for each subject at baseline and one for each subject at the end of treatment As noted above, the variance of ˆθ (ρ is given by V ar(ˆθ (ρ = σ ( ρ ( + n n 0 But suppose that we doubled the number of subjects to n 0 = n 0 and n = n and we did not measure baseline values at all In this case, we would use ˆθ (0 which has variance ( V ar(ˆθ (0 = σ n + ( n = σ + 0 n n 0 We can then consider the setting in which V ar(ˆθ (0 < V ar(ˆθ (ρ as σ V ar(ˆθ (0 < V ar(ˆθ (ρ ( + < σ ( ρ ( + n n 0 n n 0 < ( ρ ρ < = 0707 Hence, if there is no other need to measure baseline values, and if it is relatively easier and cheaper to accrue patients than to make measurements on them, then the most efficient design is to measure and use only the follow-up values at the end of treatment, unless the correlation within subjects is extremely high (ρ > 0707 Of course, it is not unusual for patient eligibility to be based on the pre-randomization value of the measurement, in which case, the advantage of ignoring the baseline measurement is gone 5 Settings Where ρ is Unknown: ANCOVA The above derivations assumed that both ρ and σ were known in a homoscedastic setting It is most often the case, however, that neither ρ nor σ are known, and we must use estimates One approach to such estimation is to use a linear model in which we adjust for the baseline value as a covariate To further explore this approach, it is useful to modify our notation Define outcome vector Z, treatment vector X, and baseline vector W for i =,, n = n 0 + n by Y 0i i n 0 0 i n 0 Y 00i i n 0 Z i = X i = W i = Y j i = n 0 + j i > n 0 Y 0j i = n 0 + j We then fit ordinary least squares (OLS model EZ i X i, W i = β 0 + X i β + W i β

6 Accounting for Baseline Measurements in RCT Emerson, Page 6 obtaining OLS estimates ˆβ = rxw ˆβ = rxw VXZ V XW V W Z V XX V XX V W W VW Z V XW V XZ V W W V XX V W W, where for sample means Z, X, and W S XW r W X = SXX S W W n S XX = (X i X, i= V XX = S XX /n, n S XW = (X i X (W i W, i= V XW = S XW /n, with similar definitions for V W W, V XZ, and V W Z Now by randomization, we know that W k and X k are independent Thus under the assumption that the randomization ratio approaches some constant (ie, n /n λ (0, as n, the consistency of sample moments for population moments and the properties of convergence in probability then provide that V XW p 0 r XW p 0 V XX p λ( λ V W W p σ V ZZ p σ + λ( λ(µ µ 0 V W Z p ρσ V XZ p λ( λ(µ µ 0 = λ( λθ ˆβ p θ ˆβ p ρ Thus, this analysis of covariance (ANCOVA model is essentially using a consistent estimate for ρ as the coefficient for the baseline value Further statements can be made if we know that our statistical model is an accurate representation of the data generation mechanism Because the treatment assignment is binary, the only issue is whether some linear relationship exists between the baseline and follow-up measurement within each dose group But if this does hold, then in the homoscedastic setting (irrespective of the distribution of the errors including irrespective of whether the errors are identically distributed we know by the Gauss-Markov Theorem that ˆβ is the best linear unbiased estimator (BLUE for θ Sometimes an investigator is extremely insistent on analyzing the change in measurements They then

7 Accounting for Baseline Measurements in RCT Emerson, Page 7 fit a model in which they define Y 0i Y 00i i n 0 Z i = Y j Y 0j i = n 0 + j X i = 0 i n 0 Y 00i i n 0 W i = i > n 0 Y 0j i = n 0 + j and fit ordinary least squares (OLS model EZ k X i, W i = α 0 + X i α + W i α This approach provides the exact same inference about the treatment effect, because α 0 = β 0, α = β, and α = β and OLS estimates ˆα 0 = ˆβ 0, ˆα = ˆβ, and ˆα = ˆβ 6 ANCOVA in the General Heteroscedastic Setting 6 Notation Let Y ktj be the SBP measurement in the kth treatment group (k = for experimental treatment, k = 0 for placebo at time t (t = 0 for the baseline (pre-randomization measurement, t = for the end of treatment measurement in the jth patient (j =,, n i Suppose Y ktj (µ kt, σkt, meaning that for a population receiving treatment k, the average measurement pre-randomization would be µ k0 with a standard deviation of σ k0, and the average measurement post-treatment would be µ k with a standard deviation of σ k We presume that individuals are independent, and that repeat measurements made on the same subject in the kth group have correlation ρ k Thus corr(y kj, Y k0j = ρ k, corr(y ktj, Y kt j = 0 for j j, and corr(y ktj, Y k t j = 0 for k k (In this heteroscedastic setting, we consider that the treatment can affect both the variability of the follow-up measurements, as well as the correlation between the baseline and follow-up measurements 6 Optimal Linear Combination of Measurements We presume a randomized study, so we have µ 0 = µ 00 and σ 0 = We are ultimately interested in estimating θ = µ µ 0 In order to derive general results, we consider the distribution of linear combinations of the form kj = Y kj a k Y k0j Using simple properties of expectation and covariances, we find E kj = E Y kj a k Y k0j = µ k a k µ i0 V ar ( kj = V ar (Y kj a k Y k0j = σ k + a kσ k0 a k ρ k σ k σ k0 so letting we can derive moments k = n k n k kj j= E k = µk a k µ k0 V ar k = σ k + a k σ k0 a kρ k σ k σ k0 n k

8 Accounting for Baseline Measurements in RCT Emerson, Page 8 Now, we define ˆθ = 0, and find distribution moments E ˆθ = (µ a µ 0 (µ 0 a 0 µ 00 = θ (a µ 0 a 0 µ 00 = θ (a a 0 µ 00 and ˆθ is unbiased for θ for arbitrary µ 00 if and only if a = a 0 = a Now we can find distribution variance V ar ˆθ = σ + a σ0 aρ σ σ 0 + σ 0 + a σ00 aρ 0 σ 0 n n ( 0 = σ + σ 0 + a σ00 + ( ρ σ a + ρ 0σ 0 n n 0 n n 0 n n 0 By differentiating the above expression with respect to a, we can find the form of ˆθ with minimal variability as having ( ( n0 σ n σ 0 σ σ 0 a = ρ + ρ 0 = ( λ ρ + λρ 0, n 0 + n σ 0 n 0 + n where λ = n /(n 0 + n reflects the randomization ratio Note that this is a weighted average of the slope parameter from a regression of Y j on Y 0j and the slope parameter from a regression of Y 0j on Y 00j 63 An Alternative Derivation When Measurements are Normally Distributed Consider the same heteroscedastic setting as in section 6 with σ kt and ρ k known constants, but now suppose that it is known that vector (Y k0j, Y kj are bivariate normal We can then write the density for our data as a function of µ = (µ 00, µ 0, µ as f( Y n k ; µ = πσ k=0 j= 00 σ k ( ρ k exp (Y k0j µ 00 σ00 ( ρ k + ρ k(y k0j µ 00 (Y kj µ k σ k ( ρ k (Y kj µ k σk ( ρ k = g(µh( Y 3 exp c l ( µt l ( Y, l=

9 Accounting for Baseline Measurements in RCT Emerson, Page 9 where the familiar form of an exponential family density is found by expanding the product and grouping terms, with ( nk n g( µ = k µ 00 exp πσ k=0 00 σ k ( ρ k σ00 ( ρ k + n kρ k µ k µ 00 σ k ( ρ k n k µ } k σk ( ρ k h( Y n k = exp Yk0j σ00 ( ρ k ρ ky k0j Y kj σ k ( ρ k + Ykj σk ( ρ k c ( θ = µ 00 σ 00 µ 0 k=0 j= c ( θ = σ0 ( ρ 0 µ c 3 ( θ = σ ( ρ T ( Y = n k ρ k k=0 n 0 j= j= ( Y k0j ρ k Y kj σ k = ( ρ 0 Y 00 n 0ρ 0 ( ρ 0 Y 0 + n σ 0 ( ρ Y 0 n ρ ( ρ Y σ T ( Y n 0 ( σ 0 σ 0 = Y 0j ρ 0 Y 00j = n 0 Y 0 n 0 ρ 0 Y 00 σ j= 00 T 3 ( Y n ( σ σ = Y j ρ Y 0j = n Y n ρ Y 0 Note that in that exponential family density, µ is a three dimensional vector, and the range of µ contains an open rectangle in 3 dimensions Hence, we know that T is a complete sufficient statistic, and any unbiased estimator of a function of µ that is only a function of the complete sufficient statistic is the uniform minimum variance unbiased estimator Thus for unbiased estimator µ = (ˆµ 00 ˆµ 0 ˆµ T with ˆµ 00 = T ( Y n 0 + n + ρ 0 ρ T ( Y 0 σ + ρ 0 ρ T 3 ( Y σ = n 0 n 0 + n Y 00 + n n 0 + n Y 0 ˆµ 0 = n 0 T ( Y + ρ 0 σ 0 ˆµ 00 = Y 0 ρ 0 n n 0 + n σ 0 Y 00 + ρ 0 n n 0 + n σ 0 Y 0 ˆµ = n T 3 ( Y + ρ σ σ 0 ˆµ 00 = Y + ρ n 0 n 0 + n σ Y 00 ρ n 0 n 0 + n σ Y 0 it is easily seen that this estimator is UMVUE for µ It similarly follows that ˆθ = ˆµ ˆµ 0 is UMVUE for

10 Accounting for Baseline Measurements in RCT Emerson, Page 0 θ = µ µ 0, with ( ( n σ 0 ˆθ = Y ρ 0 + n 0 σ ρ n 0 + n n 0 + n Y 0 ( ( n σ 0 Y 0 ρ 0 + n 0 σ ρ Y 00 n 0 + n n 0 + n We can thus find that the UMVUE is exactly equal to the optimal choice of a when reducing data on each subject to a single measurement 64 Use of the OLS ANCOVA model In the realistic setting in which we do not know ρ k or σkt, we could again consider the ANCOVA model EZ i X i, W i = β 0 + X i β + W i β with OLS estimates ˆβ = rxw ˆβ = rxw VXZ V XW V W Z V XX V XX V W W VW Z V XW V XZ V W W V XX V W W In the heteroscedastic setting in which the randomization ratio is kept constant (so again assuming n /n λ (0, as n, we have V XW p 0, r XW p 0, V XX p λ( λ, V W W p σ 00, V ZZ p ( λσ 0 + λσ + λ( λ(µ µ 0, V W Z ( λρ 0 σ 0 + λρ σ, V XZ p λ( λ(µ µ 0 = λ( λθ, ˆβ p θ, ˆβ p λρ σ σ 0 + ( λ ρ 0 σ 0 Note that in this case, the OLS coefficient for the baseline measurement is consistent for the optimal value of a, when the randomization ratio is : (ie, λ = 05 a very common setting or ρ σ = ρ 0 σ 0 (something that is rather hard to ascertain Otherwise, the OLS coefficient is not consistent for the optimal value of a If we know that our statistical model is an accurate representation of the data generation mechanism, we know by the Gauss-Markov Theorem that the OLS estimator ˆβ is not the best linear unbiased estimator (BLUE for θ unless V ar(y kj Y k0j is a constant independent of k and j Otherwise, the BLUE would be the weighted least squares estimate with each observation weighted by the inverse variance of the follow-up measurement conditional on its treatment group and its baseline measurement As noted above, the optimal linear combination of the baseline and follow-up observations based on a is of a form that would suggest performing linear regressions of the follow-up observations on the baseline

11 Accounting for Baseline Measurements in RCT Emerson, Page observations within each group, and then using the estimated coefficients for the baseline values to adjust the follow-up measurements That is, we could fit linear regression models E Y kj Y k0j = γ k0 + Y k0j γ k, and then define variables Z i (( ( n0 n = Z i ˆγ + ˆγ 0 W i, n 0 + n n 0 + n and then regress Z i on X i It should be noted, however, that the Z i are now (very slightly correlated within treatment groups, and thus the inference obtained from classical OLS on that simple linear regression model would not be entirely correct

Accounting for Baseline Observations in Randomized Clinical Trials

Accounting for Baseline Observations in Randomized Clinical Trials Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA October 6, 0 Abstract In clinical

More information

BIOS 6649: Handout Exercise Solution

BIOS 6649: Handout Exercise Solution BIOS 6649: Handout Exercise Solution NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks. This assignment is based on weight-loss

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Bios 6649: Clinical Trials - Statistical Design and Monitoring Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver

More information

Bios 6648: Design & conduct of clinical research

Bios 6648: Design & conduct of clinical research Bios 6648: Design & conduct of clinical research Section 2 - Formulating the scientific and statistical design designs 2.5(b) Binary 2.5(c) Skewed baseline (a) Time-to-event (revisited) (b) Binary (revisited)

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Sample Size Determination

Sample Size Determination Sample Size Determination 018 The number of subjects in a clinical study should always be large enough to provide a reliable answer to the question(s addressed. The sample size is usually determined by

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation University of Oxford Statistical Methods Autocorrelation Identification and Estimation Dr. Órlaith Burke Michaelmas Term, 2011 Department of Statistics, 1 South Parks Road, Oxford OX1 3TG Contents 1 Model

More information

Regression #4: Properties of OLS Estimator (Part 2)

Regression #4: Properties of OLS Estimator (Part 2) Regression #4: Properties of OLS Estimator (Part 2) Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #4 1 / 24 Introduction In this lecture, we continue investigating properties associated

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analsis of Longitudinal Data Patrick J. Heagert PhD Department of Biostatistics Universit of Washington 1 Auckland 2008 Session Three Outline Role of correlation Impact proper standard errors Used to weight

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Harmonic Regression in the Biological Setting. Michael Gaffney, Ph.D., Pfizer Inc

Harmonic Regression in the Biological Setting. Michael Gaffney, Ph.D., Pfizer Inc Harmonic Regression in the Biological Setting Michael Gaffney, Ph.D., Pfizer Inc Two primary aims of harmonic regression 1. To describe the timing (phase) or degree of the diurnal variation (amplitude)

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Comprehensive Examination Quantitative Methods Spring, 2018

Comprehensive Examination Quantitative Methods Spring, 2018 Comprehensive Examination Quantitative Methods Spring, 2018 Instruction: This exam consists of three parts. You are required to answer all the questions in all the parts. 1 Grading policy: 1. Each part

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS Ivy Liu and Dong Q. Wang School of Mathematics, Statistics and Computer Science Victoria University of Wellington New Zealand Corresponding

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

BIOSTATISTICAL METHODS

BIOSTATISTICAL METHODS BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH Cross-over Designs #: DESIGNING CLINICAL RESEARCH The subtraction of measurements from the same subject will mostly cancel or minimize effects

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial

On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial Biostatistics Advance Access published January 30, 2012 Biostatistics (2012), xx, xx, pp. 1 18 doi:10.1093/biostatistics/kxr050 On the covariate-adjusted estimation for an overall treatment difference

More information

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Stephen Senn (c) Stephen Senn 1 Acknowledgements This work is partly supported by the European Union s 7th Framework

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007) Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random

More information

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables. Random vectors Recall that a random vector X = X X 2 is made up of, say, k random variables X k A random vector has a joint distribution, eg a density f(x), that gives probabilities P(X A) = f(x)dx Just

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017 Summary of Part II Key Concepts & Formulas Christopher Ting November 11, 2017 christopherting@smu.edu.sg http://www.mysmu.edu/faculty/christophert/ Christopher Ting 1 of 16 Why Regression Analysis? Understand

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

USING TRAJECTORIES FROM A BIVARIATE GROWTH CURVE OF COVARIATES IN A COX MODEL ANALYSIS

USING TRAJECTORIES FROM A BIVARIATE GROWTH CURVE OF COVARIATES IN A COX MODEL ANALYSIS USING TRAJECTORIES FROM A BIVARIATE GROWTH CURVE OF COVARIATES IN A COX MODEL ANALYSIS by Qianyu Dang B.S., Fudan University, 1995 M.S., Kansas State University, 2000 Submitted to the Graduate Faculty

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

Robustifying Trial-Derived Treatment Rules to a Target Population

Robustifying Trial-Derived Treatment Rules to a Target Population 1/ 39 Robustifying Trial-Derived Treatment Rules to a Target Population Yingqi Zhao Public Health Sciences Division Fred Hutchinson Cancer Research Center Workshop on Perspectives and Analysis for Personalized

More information

Econometrics II - EXAM Answer each question in separate sheets in three hours

Econometrics II - EXAM Answer each question in separate sheets in three hours Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following

More information

Correlation analysis. Contents

Correlation analysis. Contents Correlation analysis Contents 1 Correlation analysis 2 1.1 Distribution function and independence of random variables.......... 2 1.2 Measures of statistical links between two random variables...........

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Example: An experiment can either result in success or failure with probability θ and (1 θ) respectively. The experiment is performed independently

Example: An experiment can either result in success or failure with probability θ and (1 θ) respectively. The experiment is performed independently Chapter 3 Sufficient statistics and variance reduction Let X 1,X 2,...,X n be a random sample from a certain distribution with p.m/d.f fx θ. A function T X 1,X 2,...,X n = T X of these observations is

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

Practical Considerations Surrounding Normality

Practical Considerations Surrounding Normality Practical Considerations Surrounding Normality Prof. Kevin E. Thorpe Dalla Lana School of Public Health University of Toronto KE Thorpe (U of T) Normality 1 / 16 Objectives Objectives 1. Understand the

More information

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace

More information

Statistics Ph.D. Qualifying Exam: Part II November 3, 2001

Statistics Ph.D. Qualifying Exam: Part II November 3, 2001 Statistics Ph.D. Qualifying Exam: Part II November 3, 2001 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,

More information

Mixed Models II - Behind the Scenes Report Revised June 11, 2002 by G. Monette

Mixed Models II - Behind the Scenes Report Revised June 11, 2002 by G. Monette Mixed Models II - Behind the Scenes Report Revised June 11, 2002 by G. Monette What is a mixed model "really" estimating? Paradox lost - paradox regained - paradox lost again. "Simple example": 4 patients

More information

MANY BILLS OF CONCERN TO PUBLIC

MANY BILLS OF CONCERN TO PUBLIC - 6 8 9-6 8 9 6 9 XXX 4 > -? - 8 9 x 4 z ) - -! x - x - - X - - - - - x 00 - - - - - x z - - - x x - x - - - - - ) x - - - - - - 0 > - 000-90 - - 4 0 x 00 - -? z 8 & x - - 8? > 9 - - - - 64 49 9 x - -

More information

DATA-ADAPTIVE VARIABLE SELECTION FOR

DATA-ADAPTIVE VARIABLE SELECTION FOR DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department

More information

BIOS 2083: Linear Models

BIOS 2083: Linear Models BIOS 2083: Linear Models Abdus S Wahed September 2, 2009 Chapter 0 2 Chapter 1 Introduction to linear models 1.1 Linear Models: Definition and Examples Example 1.1.1. Estimating the mean of a N(μ, σ 2

More information

Graphical Model Selection

Graphical Model Selection May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

18 Bivariate normal distribution I

18 Bivariate normal distribution I 8 Bivariate normal distribution I 8 Example Imagine firing arrows at a target Hopefully they will fall close to the target centre As we fire more arrows we find a high density near the centre and fewer

More information

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial William R. Gillespie Pharsight Corporation Cary, North Carolina, USA PAGE 2003 Verona,

More information

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,

More information

Distribution-free ROC Analysis Using Binary Regression Techniques

Distribution-free ROC Analysis Using Binary Regression Techniques Distribution-free Analysis Using Binary Techniques Todd A. Alonzo and Margaret S. Pepe As interpreted by: Andrew J. Spieker University of Washington Dept. of Biostatistics Introductory Talk No, not that!

More information

Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies

Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies William D. Johnson, Ph.D. Pennington Biomedical Research Center 1 of 9 Consider a study that enrolls children

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

VIII. ANCOVA. A. Introduction

VIII. ANCOVA. A. Introduction VIII. ANCOVA A. Introduction In most experiments and observational studies, additional information on each experimental unit is available, information besides the factors under direct control or of interest.

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

Causal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables

Causal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables Causal inference in biomedical sciences: causal models involving genotypes Causal models for observational data Instrumental variables estimation and Mendelian randomization Krista Fischer Estonian Genome

More information

The BLP Method of Demand Curve Estimation in Industrial Organization

The BLP Method of Demand Curve Estimation in Industrial Organization The BLP Method of Demand Curve Estimation in Industrial Organization 9 March 2006 Eric Rasmusen 1 IDEAS USED 1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the

More information

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key Introduction to Econometrics Midterm Examination Fall 2005 Answer Key Please answer all of the questions and show your work Clearly indicate your final answer to each question If you think a question is

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

Estimating Optimal Dynamic Treatment Regimes from Clustered Data

Estimating Optimal Dynamic Treatment Regimes from Clustered Data Estimating Optimal Dynamic Treatment Regimes from Clustered Data Bibhas Chakraborty Department of Biostatistics, Columbia University bc2425@columbia.edu Society for Clinical Trials Annual Meetings Boston,

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

A Significance Test for the Lasso

A Significance Test for the Lasso A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical

More information

Questions and Answers on Heteroskedasticity, Autocorrelation and Generalized Least Squares

Questions and Answers on Heteroskedasticity, Autocorrelation and Generalized Least Squares Questions and Answers on Heteroskedasticity, Autocorrelation and Generalized Least Squares L Magee Fall, 2008 1 Consider a regression model y = Xβ +ɛ, where it is assumed that E(ɛ X) = 0 and E(ɛɛ X) =

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Two-Variable Regression Model: The Problem of Estimation

Two-Variable Regression Model: The Problem of Estimation Two-Variable Regression Model: The Problem of Estimation Introducing the Ordinary Least Squares Estimator Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Two-Variable

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

Comparing the effects of two treatments on two ordinal outcome variables

Comparing the effects of two treatments on two ordinal outcome variables Working Papers in Statistics No 2015:16 Department of Statistics School of Economics and Management Lund University Comparing the effects of two treatments on two ordinal outcome variables VIBEKE HORSTMANN,

More information

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as ST 51, Summer, Dr. Jason A. Osborne Homework assignment # - Solutions 1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Accepted Manuscript. Comparing different ways of calculating sample size for two independent means: A worked example

Accepted Manuscript. Comparing different ways of calculating sample size for two independent means: A worked example Accepted Manuscript Comparing different ways of calculating sample size for two independent means: A worked example Lei Clifton, Jacqueline Birks, David A. Clifton PII: S2451-8654(18)30128-5 DOI: https://doi.org/10.1016/j.conctc.2018.100309

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information