BIOS 6649: Handout Exercise Solution


NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks.

This assignment is based on weight-loss data in the file WeightLoss.csv. The data set has 11 columns comprising a subject id, treatment group, and weights at times (0, 6, 12, 18, 24, 30, 36, 42, 48) months. Read the data into R. Note: the file WeightLoss.R contains relevant R code. In that file I have standardized the contrasts that are discussed in the problems so that the results are on the same scale. You may submit either the standardized or the non-standardized contrasts for your answers.

1. Use the observed weight-loss data to answer the following:

(a) Plot the data:

i. Plot the data for each subject in the control group on a single graph (one line per subject).

ii. Plot the data for each subject in the intervention group on a single graph (one line per subject).

Answer: [Figure: spaghetti plots for Group 1 and Group 0, weight on the vertical axis versus months post randomization on the horizontal axis, one line per subject.]
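The plots in part (a) can be sketched in base R as follows. This is a hedged illustration, not the code in WeightLoss.R: the column names (id, group, w0, ..., w48) are assumptions, and a small simulated data set stands in for WeightLoss.csv.

```r
## Sketch of the spaghetti plots in 1(a). The column names and the
## simulated data are stand-ins for the real WeightLoss.csv.
set.seed(1)
months <- seq(0, 48, by = 6)
wl <- data.frame(id = 1:20, group = rep(0:1, each = 10))
for (m in months) wl[[paste0("w", m)]] <- rnorm(20, mean = 90 - m / 10, sd = 5)

## One graph per treatment group, one line per subject:
for (g in 0:1) {
  Y <- as.matrix(wl[wl$group == g, paste0("w", months)])
  matplot(months, t(Y), type = "l", lty = 1, col = "grey40",
          xlab = "Months post randomization", ylab = "Weight",
          main = paste("Group", g))
}
```

With the real data, only the read-in step changes (e.g., wl <- read.csv("WeightLoss.csv")); the plotting code is the same once the weight columns are identified.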

(b) Calculate the contrast corresponding to the average change from baseline and its variance for each treatment group (separately). Specifically (using R):

- Get the average in each group.
- Get the covariance matrix in each group.
- Create the contrast vector: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8).
- Use the above elements to answer the question (see source file).

(c) What is the between-group difference in the above contrast? Specifically:

i. What is the value of the between-group difference in the average change contrast?

ii. What is the standard error of the between-group difference in the average change contrast?

iii. What is the 95% confidence interval for the between-group difference in the average change contrast?

(d) Repeat part (c) using the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1). Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

(e) Repeat part (c) using the 48-month linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4). Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

Answers to problem 1, with standardized contrast [numeric entries missing from the source are left blank]:

                   Control     Intervention
                   Avg (sd)    Avg (sd)      Difference  SE   95% CI         Diff/SE*
  Average change     (3.633)     (5.117)                      (-5.746,    )
  48-mo change       (3.975)     (3.930)                      (-2.571,    )
  Linear trend       (0.1651)    (0.1521)                     (       ,    )

  * Between-group difference divided by the standard error (i.e., the t-statistic).

Answers to problem 1, without standardization of contrast [numeric entries missing from the source are left blank]:

                   Control     Intervention
                   Avg (sd)    Avg (sd)      Difference  SE   95% CI         Diff/SE*
  Average change     (4.087)     (5.757)                      (-6.464,    )
  48-mo change       (7.950)     (7.860)                      (-5.143,    )
  Linear trend       (     )     (     )                      (       ,    )
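The calculations in (b)-(c) can be sketched as below. The function contrast_summary() is an illustration (not the code in WeightLoss.R), the toy matrices Y0 and Y1 stand in for the nine weight columns of the two groups, and the baseline coefficient of the average-change contrast is -1 so that the contrast equals the follow-up average minus baseline.

```r
## Sketch of 1(b)-(c): contrast of the group means and its variance.
## contrast_summary() is illustrative, not the code in WeightLoss.R.
contrast_summary <- function(Y, w) {
  ybar <- colMeans(Y)                          # group mean at each time
  S    <- cov(Y)                               # 9 x 9 sample covariance matrix
  est  <- sum(w * ybar)                        # w' ybar
  se   <- sqrt(c(t(w) %*% S %*% w) / nrow(Y))  # sqrt(w' S w / n)
  c(estimate = est, se = se)
}

w  <- c(-1, rep(1/8, 8))                   # average-change contrast
set.seed(2)
Y0 <- matrix(rnorm(9 * 25, 90, 4), 25, 9)  # toy control-group data
Y1 <- matrix(rnorm(9 * 25, 86, 4), 25, 9)  # toy intervention-group data
s0 <- contrast_summary(Y0, w)
s1 <- contrast_summary(Y1, w)

diff <- unname(s1["estimate"] - s0["estimate"])        # between-group difference
se   <- sqrt(unname(s0["se"])^2 + unname(s1["se"])^2)  # its standard error
ci   <- diff + c(-1, 1) * 1.96 * se                    # 95% confidence interval
```

The 48-month change and slope contrasts reuse the same function with a different w.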

Suppose that you are considering a new weight-intervention study. It will enroll 500 subjects per group. Problems 2-4 illustrate how you might evaluate the effect size for various time-trajectory contrasts over different values for the true mean weight-loss trajectory. Use the observed covariance matrix from the data in problem 1 to answer these questions.

2. Suppose that the mean weight change at the measurement times is µ0 = (0, 0, 0, 0, 0, 0, 0, 0, 0) in the control group and µ1 = (0, -8, -8, -8, -8, -8, -8, -8, -8) in the treatment group.

(a) What is θ if the time trajectory is summarized by the average change contrast: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

(b) What is θ if the time trajectory is summarized by the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

(c) What is θ if the time trajectory is summarized by the linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

Answers to problem 2 [numeric entries missing from the source are left blank]:

                   Control     Intervention
                   Avg (sd)    Avg (sd)      Difference  SE   95% CI      Diff/SE*
  Average change     (3.633)     (5.117)                      (    ,   )
  48-mo change       (3.975)     (3.930)                      (    ,   )
  Linear trend       (0.1651)    (0.1521)                     (    ,   )

  * Between-group difference divided by the standard error (i.e., the t-statistic).
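The anticipated effect size in problems 2-4 can be sketched as follows. Sigma here is a toy stand-in for the observed covariance matrix from problem 1, mu1 encodes the problem 2 trajectory with weight loss written as negative change, and the SE formula sqrt(w' Sigma w * (1/n + 1/n)) is the two-sample version with n = 500 per group.

```r
## Sketch of the problem 2 calculation: theta = w'(mu1 - mu0) and its
## anticipated SE. Sigma is a stand-in for the observed covariance matrix.
n     <- 500
Sigma <- diag(16, 9)          # toy 9 x 9 covariance matrix (assumption)
mu0   <- rep(0, 9)            # control trajectory
mu1   <- c(0, rep(-8, 8))     # problem 2 trajectory: lose 8 and keep it off
w     <- c(-1, rep(1/8, 8))   # average-change contrast

theta <- sum(w * (mu1 - mu0))                    # anticipated effect
se    <- sqrt(c(t(w) %*% Sigma %*% w) * (2 / n)) # anticipated SE of theta-hat
ratio <- theta / se                              # effect divided by its SE
```

Problems 3 and 4 repeat the same three lines with their own mu1, and parts (b)-(c) swap in the other two contrast vectors.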

3. Suppose that the mean weight change at the measurement times is µ0 = (0, 0, 0, 0, 0, 0, 0, 0, 0) in the control group and µ1 = (0, -8, -7, -6, -5, -4, -3, -2, -1) in the treatment group.

(a) What is θ if the time trajectory is summarized by the average change contrast: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

(b) What is θ if the time trajectory is summarized by the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

(c) What is θ if the time trajectory is summarized by the linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

Answers to problem 3 [numeric entries missing from the source are left blank]:

                   Control     Intervention
                   Avg (sd)    Avg (sd)      Difference  SE   95% CI       Diff/SE*
  Average change     (3.633)     (5.117)                      (     ,   )
  48-mo change       (3.975)     (3.930)                      (     ,   )
  Linear trend       (0.1651)    (0.1521)                     (0.0470,  )

  * Between-group difference divided by the standard error (i.e., the t-statistic).

4. Suppose that the mean weight change at the measurement times is µ0 = (0, 0, 0, 0, 0, 0, 0, 0, 0) in the control group and µ1 = (0, -1, -2, -3, -4, -5, -6, -7, -8) in the treatment group.

(a) What is θ if the time trajectory is summarized by the average change contrast: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

(b) What is θ if the time trajectory is summarized by the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

(c) What is θ if the time trajectory is summarized by the linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

Answers to problem 4 [numeric entries missing from the source are left blank]:

                   Control     Intervention
                   Avg (sd)    Avg (sd)      Difference  SE   95% CI     Diff/SE*
  Average change     (3.633)     (5.117)                      (   ,   )
  48-mo change       (3.975)     (3.930)                      (   ,   )
  Linear trend       (     )     (     )                      (   ,   )

  * Between-group difference divided by the standard error (i.e., the t-statistic).

5. Use your answers to the above problems in answering the following:

(a) Which of the above hypothetical mean weight-loss trajectories (µ1) is most likely to be affiliated with beneficial health outcomes? Which trajectory is not very likely to result in good outcomes? Give a reason for your answer.

Answer: In general we think that losing weight and keeping it off has the most benefit (the trajectory in problem 2). Weight rebound is not good, although the trajectory in problem 3 shows long-term weight regain rather than short-term loss with immediate regain. The trajectory in problem 4 is probably of intermediate health benefit; certainly slow, steady weight loss is better than slow, steady weight gain.

(b) Based on the calculations in problems 1-4, recommend a contrast w for use in the new trial. Give a reason for your choice.

Answer: As we have discussed in class, our choice of summary measure (contrast) induces an ordering on the longitudinal outcome space. The orderings will differ according to the summary measure that is selected. This is true regardless of whether you use summary measures (as above) or a mixed-model repeated-measures ANOVA summary (as demonstrated in class). The contrast should be selected to be sensitive (large) for true mean trajectories that are felt to convey health benefits, and insensitive (small) for trajectories that are not likely to have long-term health benefits. As argued above, it is probably best to lose weight and keep it off (problem 2), it is probably bad to lose weight and regain it (problem 3), and of intermediate benefit to lose weight slowly over time (problem 4). The following conclusions follow from the last column of the results tables for problems 2-4:

- The slope contrast is very sensitive to trends. In problem 3 it would find that the intervention group was significantly worse than control (positive value in the last column). It would find that the intervention was highly beneficial if weight loss was linearly decreasing. It is not as sensitive as the other contrasts to long-term maintenance of weight loss (problem 2).

- The average change contrast is sensitive to all of these weight-loss trajectories (particularly the long-term weight loss of problem 2). It might not be desirable that it is sensitive to the weight-rebound example (problem 3).

- The 48-month change contrast seems to be most sensitive to the changes that are likely to have the greatest health benefits and least sensitive to the trajectories that are less likely to be beneficial.

I am inclined toward the 48-month change contrast. I would work with the study investigators to make sure I understood the potential health benefits affiliated with each of the trajectories, and I would discuss the above evaluation with the investigator team to make sure that we were all in agreement as to the best contrast.

6. Suppose that you have a randomized clinical trial with 2 groups and n participants per group. Suppose that you have a baseline measurement and a follow-up measurement on every participant. Denote the data by Y_0ik and Y_1ik for the baseline and follow-up measurements on the ith participant (i = 1, ..., n) of the kth treatment group (k = 0, 1). Suppose that

    (Y_0ik, Y_1ik)' ~ N( (µ_0k, µ_1k)', Σ )    (1)

where

    Σ = [ σ²   ρσ² ]
        [ ρσ²  σ²  ]

Writing Ȳ_jk for the group-k mean of measurement j (j = 0 baseline, j = 1 follow-up):

(a) Find the variance of the between-group mean difference at the last measurement time; i.e., var(Ȳ_11 - Ȳ_10).

(b) Find the variance of the between-group mean difference in the change from baseline to follow-up; i.e., var[(Ȳ_11 - Ȳ_01) - (Ȳ_10 - Ȳ_00)].

(c) Find the conditions under which the variance in 6a is smaller than the variance in 6b.

(d) What are the implications of this result for study design (or data analysis)?

Answer:

(a) var[Ȳ_11 - Ȳ_10] = 2σ²/n

(b) Since the two treatment groups are independent,

    var[(Ȳ_11 - Ȳ_01) - (Ȳ_10 - Ȳ_00)] = var[Ȳ_11 - Ȳ_01] + var[Ȳ_10 - Ȳ_00]
                                        = (2σ²/n - 2ρσ²/n) + (2σ²/n - 2ρσ²/n)
                                        = 4σ²(1 - ρ)/n

(c) The change-score measure in 6b is more efficient when its variance is smaller than that of the follow-up measure in 6a:

    var[(Ȳ_11 - Ȳ_01) - (Ȳ_10 - Ȳ_00)] < var[Ȳ_11 - Ȳ_10]
    4σ²(1 - ρ)/n < 2σ²/n
    (1 - ρ) < 0.5
    ρ > 0.5

Equivalently, the variance in 6a is smaller whenever ρ < 0.5.

(d) Thus, the correlation must be fairly strong before it is more efficient to measure the difference between treatment groups by the difference in the average change as opposed to the difference in the average outcome at follow-up.
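The two variance formulas in (a)-(b) can be checked by simulation. The parameter values below (σ = 1, ρ = 0.7, n = 50) are arbitrary illustrations, and the bivariate normal pairs are generated with the standard construction Y1 = ρY0 + sqrt(1 - ρ²)Z.

```r
## Monte Carlo check of var(follow-up difference) = 2*sigma^2/n and
## var(change-score difference) = 4*sigma^2*(1 - rho)/n.
set.seed(3)
sigma <- 1; rho <- 0.7; n <- 50; B <- 20000

one_trial <- function() {
  d <- sapply(0:1, function(k) {    # groups k = 0, 1
    y0 <- rnorm(n, 0, sigma)        # baseline
    y1 <- rho * y0 + sqrt(1 - rho^2) * rnorm(n, 0, sigma)  # follow-up
    c(follow = mean(y1), change = mean(y1) - mean(y0))
  })
  d[, 2] - d[, 1]                   # between-group differences
}
sims <- replicate(B, one_trial())

## Empirical versus analytic variances (should agree closely):
c(var(sims["follow", ]), 2 * sigma^2 / n)
c(var(sims["change", ]), 4 * sigma^2 * (1 - rho) / n)
```

With ρ = 0.7 the change-score analysis has the smaller variance, consistent with the ρ > 0.5 condition in (c).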

7. Use the distribution from equation (1) when answering the following:

(a) What is the interpretation of β2 in the following regression model?

    E(Y_1ik) = β0 + β1 Y_0ik + β2 1[k=1]

where 1[k=1] is the indicator function for treatment group 1.

(b) What is the variance of β̂2 (estimated using linear regression)? [Hint: see the last page for a summary of the variance of regression coefficients.]

(c) Find the conditions under which the variance of β̂2 is smaller than either the variance in 6a or 6b.

(d) What are the implications of this result for study design (or data analysis)?

Answer:

(a) β2 is the expected difference between treatment groups in two populations that have the same mean baseline level of the outcome variable.

(b) To apply the identities to this problem you need to recognize that X in the identity on the last page represents the treatment indicator (1[k=1]) and Z represents the baseline measure (Y_0ik). You then need to recognize that ρ is the correlation between the baseline and follow-up measures and that it is the same in both treatment groups; thus ρ_{YZ|X} = ρ. Furthermore, within each treatment group the variance of the outcome measure is the same; thus σ²_{Y|X} = σ². Finally, you need to recognize that the sample size in the identities is the total number of participants in the trial; thus N = 2n, where n is the number per group. Now, from the identities:

    var(β̂2) = σ²_{Y|XZ} / (N σ²_{X|Z})
             = σ²_{Y|X} (1 - ρ²_{YZ|X}) / (N σ²_{X|Z})
             = σ² (1 - ρ²) / (2n σ²_{X|Z})

Now note that σ²_{X|Z} represents the variance of the treatment-group indicator given the baseline level of the outcome. Since treatment groups were randomly assigned, the indicator takes the value 1 with probability 0.5 and the value 0 with probability 0.5 regardless of the baseline value; thus E(X|Z) = 0.5 and

    σ²_{X|Z} = E[(X - E(X))²] = (1 - 0.5)²(0.5) + (0 - 0.5)²(0.5) = 0.25

It follows that

    var(β̂2) = 2σ²(1 - ρ²)/n
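The result var(β̂2) = 2σ²(1 - ρ²)/n can likewise be checked by simulating from the distribution in equation (1) and fitting the regression with lm(). The parameter values (σ = 1, ρ = 0.6, n = 40, true treatment effect -2) are illustrative assumptions.

```r
## Monte Carlo check of var(beta2-hat) = 2*sigma^2*(1 - rho^2)/n for the
## baseline-adjusted regression. Parameter values are illustrative only.
set.seed(4)
sigma <- 1; rho <- 0.6; n <- 40; B <- 5000

beta2 <- replicate(B, {
  x  <- rep(0:1, each = n)          # treatment indicator 1[k=1]
  y0 <- rnorm(2 * n, 0, sigma)      # baseline measurements
  y1 <- rho * y0 + sqrt(1 - rho^2) * rnorm(2 * n, 0, sigma) - 2 * x
  coef(lm(y1 ~ y0 + x))["x"]        # beta2-hat from the fitted model
})

## Empirical versus analytic variance (close for moderately large n):
c(var(beta2), 2 * sigma^2 * (1 - rho^2) / n)
```

The empirical variance runs slightly above the analytic value at small n, which is the degrees-of-freedom caveat noted in part (d) below.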

(c) The variance from the regression analysis is smaller than that from the analysis of the difference in change scores whenever:

    var(β̂2) < var[(Ȳ_11 - Ȳ_01) - (Ȳ_10 - Ȳ_00)]
    2σ²(1 - ρ²)/n < 4σ²(1 - ρ)/n
    (1 - ρ)(1 + ρ) < 2(1 - ρ)
    1 + ρ < 2
    ρ < 1

which is always true; therefore the regression analysis is always more efficient. The variance from the regression analysis is smaller than that from the analysis of follow-up measures whenever:

    var(β̂2) < var[Ȳ_11 - Ȳ_10]
    2σ²(1 - ρ²)/n < 2σ²/n
    (1 - ρ²) < 1

which holds for any ρ ≠ 0 (since -1 ≤ ρ ≤ 1, with equality only when ρ = 0); therefore the regression analysis is never less efficient.

(d) The above proof shows that it is always more efficient to condition on the baseline value in a regression analysis when analyzing pre-post (before-after) outcomes. There are two caveats that you should consider before applying this uniformly:

- The above result assumes that n is large, because I have ignored degrees of freedom. If you include degrees of freedom, then the regression method is more efficient as long as there are more than about 15 subjects per group.

- I recommend using the regression method in a randomized trial, where the distribution of baseline values is the same in both treatment groups. In an observational study, where the distribution of the baseline value may differ between the two exposure categories, it is possible to estimate a different quantity than you are estimating with an analysis of change (i.e., one of the analyses is biased relative to the other).
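The comparisons in (c) can be summarized by the variance of each analysis divided by 2σ²/n, giving the factors 1 (follow-up only), 2(1 - ρ) (change score), and 1 - ρ² (baseline-adjusted regression). A short check over a grid of ρ values confirms the ordering:

```r
## Relative variance factors of the three analyses (each divided by
## 2*sigma^2/n), evaluated over a grid of correlations.
rho     <- seq(-0.99, 0.99, by = 0.01)
follow  <- rep(1, length(rho))   # follow-up only
change  <- 2 * (1 - rho)         # difference in change scores
regress <- 1 - rho^2             # baseline-adjusted regression

stopifnot(all(regress <= follow))           # regression never worse than 6a
stopifnot(all(regress <= change + 1e-12))   # regression never worse than 6b
## the change score beats follow-up only when rho > 0.5:
stopifnot(all((change < follow) == (rho > 0.5)))
```

This makes the design message concrete: the change-score analysis only pays off for ρ > 0.5, while adjusting for baseline helps at every nonzero ρ.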

8. In problem 7, what are the probability model, the functional, and the contrast that define the statistical model for scientific inference?

Answer: It is not necessary to assume that the data are normally distributed; the estimated regression coefficients will be normally distributed by a form of the central limit theorem. Similarly, it is not necessary to assume that there is a linear relationship between the baseline and follow-up measurements; the coefficient β1 is the first-order approximation to the (possibly nonlinear) relationship between the baseline and follow-up measures.

- Probability model: non-parametric (assuming only that the regression coefficients are normally distributed).
- Functional: mean level of the outcome, conditional on the baseline level.
- Contrast: difference in mean outcome levels (conditional on baseline).

Variance of a conditional coefficient in multiple linear regression

Consider the linear regression model:

    E(Y) = β0 + β1 Z + β2 X

Given data (Y_i, X_i, Z_i) on i = 1, ..., N subjects, it is possible to show:

(a) Variance of β̂2:

    var(β̂2) = σ²_{Y|XZ} / (N σ²_{X|Z})

where σ²_{Y|XZ} denotes the variance of Y given X and Z, σ²_{X|Z} denotes the variance of X given Z, and N denotes the total sample size.

(b) Conditional variance of Y given X and Z:

    σ²_{Y|XZ} = σ²_{Y|X} (1 - ρ²_{YZ|X})

where ρ_{YZ|X} denotes the correlation of Y and Z given X.


More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

Growth Mixture Model

Growth Mixture Model Growth Mixture Model Latent Variable Modeling and Measurement Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 28, 2016 Slides contributed

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Accepted Manuscript. Comparing different ways of calculating sample size for two independent means: A worked example

Accepted Manuscript. Comparing different ways of calculating sample size for two independent means: A worked example Accepted Manuscript Comparing different ways of calculating sample size for two independent means: A worked example Lei Clifton, Jacqueline Birks, David A. Clifton PII: S2451-8654(18)30128-5 DOI: https://doi.org/10.1016/j.conctc.2018.100309

More information

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance Covariance Lecture 0: Covariance / Correlation & General Bivariate Normal Sta30 / Mth 30 We have previously discussed Covariance in relation to the variance of the sum of two random variables Review Lecture

More information

A Significance Test for the Lasso

A Significance Test for the Lasso A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical

More information

Supplemental Materials. In the main text, we recommend graphing physiological values for individual dyad

Supplemental Materials. In the main text, we recommend graphing physiological values for individual dyad 1 Supplemental Materials Graphing Values for Individual Dyad Members over Time In the main text, we recommend graphing physiological values for individual dyad members over time to aid in the decision

More information

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as ST 51, Summer, Dr. Jason A. Osborne Homework assignment # - Solutions 1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Tutorial 5: Power and Sample Size for One-way Analysis of Variance (ANOVA) with Equal Variances Across Groups. Acknowledgements:

Tutorial 5: Power and Sample Size for One-way Analysis of Variance (ANOVA) with Equal Variances Across Groups. Acknowledgements: Tutorial 5: Power and Sample Size for One-way Analysis of Variance (ANOVA) with Equal Variances Across Groups Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements:

More information

Rule of Thumb Think beyond simple ANOVA when a factor is time or dose think ANCOVA.

Rule of Thumb Think beyond simple ANOVA when a factor is time or dose think ANCOVA. May 003: Think beyond simple ANOVA when a factor is time or dose think ANCOVA. Case B: Factorial ANOVA (New Rule, 6.3). A few corrections have been inserted in blue. [At times I encounter information that

More information

Modeling the Covariance

Modeling the Covariance Modeling the Covariance Jamie Monogan University of Georgia February 3, 2016 Jamie Monogan (UGA) Modeling the Covariance February 3, 2016 1 / 16 Objectives By the end of this meeting, participants should

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv). Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Growth Curve Modeling Approach to Moderated Mediation for Longitudinal Data

Growth Curve Modeling Approach to Moderated Mediation for Longitudinal Data Growth Curve Modeling Approach to Moderated Mediation for Longitudinal Data JeeWon Cheong Department of Health Education & Behavior University of Florida This research was supported in part by NIH grants

More information

Chapter 14 Simple Linear Regression (A)

Chapter 14 Simple Linear Regression (A) Chapter 14 Simple Linear Regression (A) 1. Characteristics Managerial decisions often are based on the relationship between two or more variables. can be used to develop an equation showing how the variables

More information

Bios 6648: Design & conduct of clinical research

Bios 6648: Design & conduct of clinical research Bios 6648: Design & conduct of clinical research Section 2 - Formulating the scientific and statistical design designs 2.5(b) Binary (a) Time-to-event (revisited) (b) Binary (revisited) (c) Skewed (d)

More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

A re-appraisal of fixed effect(s) meta-analysis

A re-appraisal of fixed effect(s) meta-analysis A re-appraisal of fixed effect(s) meta-analysis Ken Rice, Julian Higgins & Thomas Lumley Universities of Washington, Bristol & Auckland tl;dr Fixed-effectS meta-analysis answers a sensible question regardless

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Bivariate distributions

Bivariate distributions Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient

More information

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 12: Detailed Analyses of Main Effects and Simple Effects

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 12: Detailed Analyses of Main Effects and Simple Effects Keppel, G. & Wickens, T. D. Design and Analysis Chapter 1: Detailed Analyses of Main Effects and Simple Effects If the interaction is significant, then less attention is paid to the two main effects, and

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

VIII. ANCOVA. A. Introduction

VIII. ANCOVA. A. Introduction VIII. ANCOVA A. Introduction In most experiments and observational studies, additional information on each experimental unit is available, information besides the factors under direct control or of interest.

More information

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018 Econometrics I KS Module 1: Bivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: March 12, 2018 Alexander Ahammer (JKU) Module 1: Bivariate

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Multiple Regression Analysis: The Problem of Inference

Multiple Regression Analysis: The Problem of Inference Multiple Regression Analysis: The Problem of Inference Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Multiple Regression Analysis: Inference POLS 7014 1 / 10

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs

The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs Chad S. Briggs, Kathie Lorentz & Eric Davis Education & Outreach University Housing Southern Illinois

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Topic 16 Interval Estimation

Topic 16 Interval Estimation Topic 16 Interval Estimation Additional Topics 1 / 9 Outline Linear Regression Interpretation of the Confidence Interval 2 / 9 Linear Regression For ordinary linear regression, we have given least squares

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

Regression Analysis. Ordinary Least Squares. The Linear Model

Regression Analysis. Ordinary Least Squares. The Linear Model Regression Analysis Linear regression is one of the most widely used tools in statistics. Suppose we were jobless college students interested in finding out how big (or small) our salaries would be 20

More information

Linear Regression Measurement & Evaluation of HCC Systems

Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Today s goal: Evaluate the effect of multiple variables on an outcome variable (regression) Outline: - Basic theory - Simple

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies

Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies William D. Johnson, Ph.D. Pennington Biomedical Research Center 1 of 9 Consider a study that enrolls children

More information