BIOS 6649: Handout Exercise Solution
NOTE: I encourage you to work together, but the work you submit must be your own. Any plagiarism will result in loss of all marks.

This assignment is based on weight-loss data in the file WeightLoss.csv. The data set has 11 columns: a subject id, a treatment group, and weights at times (0, 6, 12, 18, 24, 30, 36, 42, 48) months. Read the data into R.

Note: The file WeightLoss.R contains relevant R code. In that file I have standardized the contrasts discussed in the problems so that the results are on the same scale. You may submit either the standardized or non-standardized contrasts for your answers.

1. Use the observed weight-loss data to answer the following:

   (a) Plot the data:

       i. Plot the data for each subject in the control group on a single graph (one line per subject).
       ii. Plot the data for each subject in the intervention group on a single graph (one line per subject).

   Answer: [Figure: weight versus months post randomization, one panel per treatment group (Group 0 and Group 1), one line per subject.]
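The handout's WeightLoss.R is not reproduced here, but the spaghetti plots in (a) can be sketched as follows. The column names (id, group, and w0 through w48) and the simulated stand-in data are assumptions made only to keep the sketch self-contained; in the assignment you would read the real file.

```r
set.seed(1)
months <- seq(0, 48, by = 6)

# In the assignment you would read the real data:
#   wl <- read.csv("WeightLoss.csv")
# Here a small simulated stand-in with the assumed layout (id, group,
# then one weight column per visit) keeps the sketch self-contained.
wl <- data.frame(id = 1:20,
                 group = rep(0:1, each = 10),
                 matrix(90 + rnorm(20 * 9, sd = 4), nrow = 20,
                        dimnames = list(NULL, paste0("w", months))))

# One spaghetti plot per treatment group: one line per subject.
for (g in 0:1) {
  wg <- wl[wl$group == g, paste0("w", months)]
  matplot(months, t(wg), type = "l", lty = 1, col = "grey40",
          xlab = "Months post randomization", ylab = "Weight",
          main = paste("Group", g))
}
```

matplot() plots each column of t(wg), i.e., each subject's trajectory, as its own line, which is exactly the "one line per subject" display the problem asks for.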
   (b) Calculate the contrast corresponding to the average change from baseline and its variance for each treatment group (separately). Specifically (using R):

       - Get the average in each group.
       - Get the covariance matrix in each group.
       - Create the contrast vector: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8).
       - Use the above elements to answer the question (see source file).

   (c) What is the between-group difference in the above contrast? Specifically:

       i. What is the value of the between-group difference in the average-change contrast?
       ii. What is the standard error of the between-group difference in the average-change contrast?
       iii. What is the 95% confidence interval for the between-group difference in the average-change contrast?

   (d) Repeat part (c) using the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1). Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

   (e) Repeat part (c) using the 48-month linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4). Submit the value for the contrast, its standard error, and the ratio between the contrast value and the standard error.

   Answers to problem 1 (with standardized contrast):

                     Control       Intervention
                     Avg (sd)      Avg (sd)      Difference  SE   95% CI          Diff/SE*
   Average change    ... (3.633)   ... (5.117)   ...         ...  (-5.746, ...)   ...
   48-mo change      ... (3.975)   ... (3.930)   ...         ...  (-2.571, ...)   ...
   Linear trend      ... (0.1651)  ... (0.1521)  ...         ...  (..., ...)      ...

   * Between-group difference divided by the standard error (i.e., the t-statistic).

   Answers to problem 1 (without standardization of contrast):

                     Control       Intervention
                     Avg (sd)      Avg (sd)      Difference  SE   95% CI          Diff/SE*
   Average change    ... (4.087)   ... (5.757)   ...         ...  (-6.464, ...)   ...
   48-mo change      ... (7.950)   ... (7.860)   ...         ...  (-5.143, ...)   ...
   Linear trend      ... (...)     ... (...)     ...         ...  (..., ...)      ...

   * Between-group difference divided by the standard error (i.e., the t-statistic).
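The steps in (b) and (c) can be sketched in R. This is not the handout's WeightLoss.R; the function names and the assumption that each group's weights sit in an n x 9 matrix are illustrative only.

```r
# Contrast summary within one group, then the between-group comparison.
# W is an (assumed) n x 9 matrix of weights at months 0, 6, ..., 48.
contrast_summary <- function(W, w) {
  theta_i <- as.vector(as.matrix(W) %*% w)   # contrast applied per subject
  c(est = mean(theta_i),                     # group mean of the contrast
    se  = sd(theta_i) / sqrt(length(theta_i)))
}

between_groups <- function(W0, W1, w) {
  s0 <- contrast_summary(W0, w)
  s1 <- contrast_summary(W1, w)
  diff <- unname(s1["est"] - s0["est"])
  se   <- unname(sqrt(s0["se"]^2 + s1["se"]^2))
  c(diff = diff, se = se,
    lo = diff - 1.96 * se, hi = diff + 1.96 * se)  # 95% CI
}

# Non-standardized contrast vectors from the problem:
w_avg   <- c(-1, rep(1/8, 8))                 # average change from baseline
w_48    <- c(-1, rep(0, 7), 1)                # 48-month change
w_slope <- c(-4, -3, -2, -1, 0, 1, 2, 3, 4)   # linear trend (slope)
```

Note that var(theta_i) equals w' S w for the sample covariance matrix S of the group, so computing per-subject contrasts and computing w' S w / n are two routes to the same standard error.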
Suppose that you are considering a new weight-intervention study. It will enroll 500 subjects per group. Problems 2-4 illustrate how you might evaluate the effect size for various time-trajectory contrasts over different values for the true mean weight-loss trajectory. Use the observed covariance matrix from the data in problem 1 to answer these questions.

2. Suppose that the mean weight change at the measurement times is µ0 = (0, 0, 0, 0, 0, 0, 0, 0, 0) in the control group and µ1 = (0, -8, -8, -8, -8, -8, -8, -8, -8) in the treatment group.

   (a) What is θ if the time trajectory is summarized by the average-change contrast: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   (b) What is θ if the time trajectory is summarized by the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   (c) What is θ if the time trajectory is summarized by the linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   Answers to problem 2:

                     Control       Intervention
                     Avg (sd)      Avg (sd)      Difference  SE   95% CI      Diff/SE*
   Average change    ... (3.633)   ... (5.117)   ...         ...  (..., ...)  ...
   48-mo change      ... (3.975)   ... (3.930)   ...         ...  (..., ...)  ...
   Linear trend      ... (0.1651)  ... (0.1521)  ...         ...  (..., ...)  ...

   * Between-group difference divided by the standard error (i.e., the t-statistic).
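The calculations for problems 2-4 all have the same form: θ = w'(µ1 - µ0), with anticipated variance 2 w'Σw / n for n = 500 per group. A sketch follows; the function name and the compound-symmetry stand-in for Σ are assumptions (the assignment uses the observed covariance matrix from problem 1).

```r
# Anticipated effect theta = w' (mu1 - mu0), its standard error under a
# working covariance matrix Sigma, and the effect/SE ratio, with (by
# default) 500 subjects per group.
anticipated <- function(w, mu0, mu1, Sigma, n = 500) {
  theta <- sum(w * (mu1 - mu0))
  se    <- sqrt(drop(t(w) %*% Sigma %*% w) * 2 / n)  # var(theta-hat) = 2 w'Sigma w / n
  c(theta = theta, se = se, ratio = theta / se)
}

mu0   <- rep(0, 9)
mu1   <- c(0, rep(-8, 8))          # problem 2 trajectory (sustained loss)
w_avg <- c(-1, rep(1/8, 8))

# Compound-symmetry stand-in for the observed covariance (assumed numbers):
Sigma <- matrix(0.7 * 36, 9, 9); diag(Sigma) <- 36
anticipated(w_avg, mu0, mu1, Sigma)
```

Swapping in the µ1 vectors from problems 3 and 4, or the 48-month change and slope contrasts, reuses the same function unchanged.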
3. Suppose that the mean weight change at the measurement times is µ0 = (0, 0, 0, 0, 0, 0, 0, 0, 0) in the control group and µ1 = (0, -8, -7, -6, -5, -4, -3, -2, -1) in the treatment group.

   (a) What is θ if the time trajectory is summarized by the average-change contrast: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   (b) What is θ if the time trajectory is summarized by the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   (c) What is θ if the time trajectory is summarized by the linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   Answers to problem 3:

                     Control       Intervention
                     Avg (sd)      Avg (sd)      Difference  SE   95% CI         Diff/SE*
   Average change    ... (3.633)   ... (5.117)   ...         ...  (..., ...)     ...
   48-mo change      ... (3.975)   ... (3.930)   ...         ...  (..., ...)     ...
   Linear trend      ... (0.1651)  ... (0.1521)  ...         ...  (0.0470, ...)  ...

   * Between-group difference divided by the standard error (i.e., the t-statistic).
4. Suppose that the mean weight change at the measurement times is µ0 = (0, 0, 0, 0, 0, 0, 0, 0, 0) in the control group and µ1 = (0, -1, -2, -3, -4, -5, -6, -7, -8) in the treatment group.

   (a) What is θ if the time trajectory is summarized by the average-change contrast: w = (-1, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   (b) What is θ if the time trajectory is summarized by the 48-month change contrast: w = (-1, 0, 0, 0, 0, 0, 0, 0, 1)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   (c) What is θ if the time trajectory is summarized by the linear-trend (slope) contrast: w = (-4, -3, -2, -1, 0, 1, 2, 3, 4)? What is the anticipated standard error of θ̂? What is the ratio between the anticipated effect and its standard error?

   Answers to problem 4:

                     Control       Intervention
                     Avg (sd)      Avg (sd)      Difference  SE   95% CI      Diff/SE*
   Average change    ... (3.633)   ... (5.117)   ...         ...  (..., ...)  ...
   48-mo change      ... (3.975)   ... (3.930)   ...         ...  (..., ...)  ...
   Linear trend      ... (...)     ... (...)     ...         ...  (..., ...)  ...

   * Between-group difference divided by the standard error (i.e., the t-statistic).
5. Use your answers to the above problems in answering the following:

   (a) Which of the above hypothetical mean weight-loss trajectories (µ1) is most likely to be affiliated with beneficial health outcomes? Which trajectory is not very likely to result in good outcomes? Give a reason for your answer.

   Answer: In general we think that losing weight and keeping it off has the most benefit (the trajectory in problem 2). Weight rebound is not good, although the trajectory in problem 3 shows long-term weight regain rather than short-term loss with immediate regain. The trajectory in problem 4 is probably of intermediate health benefit; certainly slow, steady weight loss is better than slow, steady weight gain.

   (b) Based on the calculations in problems 1-4, recommend a contrast w for use in the new trial. Give a reason for your choice.

   Answer: As we have discussed in class, our choice of summary measure (contrast) induces an order on the longitudinal outcome space. The orders will differ according to the summary measure that is selected. This is true regardless of whether you use summary measures (as above) or a mixed-model repeated-measures ANOVA summary (as demonstrated in class). The contrast should be selected to be sensitive (large) to true mean trajectories that are felt to convey health benefits, and insensitive (small) to trajectories that are not likely to have long-term health benefits. As argued above, it is probably best to lose weight and keep it off (problem 2), probably bad to lose weight and regain it (problem 3), and of intermediate benefit to lose weight slowly over time (problem 4). The following conclusions follow from the last column of the results tables for problems 2-4:

   - The slope contrast is very sensitive to trends. In problem 3 it would find that the intervention group was significantly worse than control (positive value in the last column). It would find that intervention was highly beneficial if weight loss was linearly decreasing. It is not as sensitive as the other contrasts to long-term maintenance of weight loss (problem 2).
   - The average-change contrast is sensitive to all of these weight-loss trajectories (particularly the long-term weight loss of problem 2). It might not be desirable that it is also sensitive to the weight-rebound example (problem 3).
   - The 48-month change contrast seems to be most sensitive to the changes that are likely to have the greatest health benefits and least sensitive to trajectories that are less likely to be beneficial.

   I am inclined toward the 48-month change contrast. I would work with the study investigators to make sure I understood the potential health benefits affiliated with each of the trajectories, and I would discuss the above evaluation with the investigator team to make sure that we were all in agreement as to the best contrast.
6. Suppose that you have a randomized clinical trial with 2 groups and n participants per group. Suppose that you have a baseline measurement and a follow-up measurement on every participant. Denote the data by Y_{0ik} and Y_{1ik} for the baseline and follow-up measurements on the ith participant (i = 1, ..., n) of the kth treatment group (k = 0, 1). Suppose that

       (Y_{0ik}, Y_{1ik})' ~ N( (µ_{0k}, µ_{1k})', Σ )    (1)

   where

       Σ = [ σ²    ρσ² ]
           [ ρσ²   σ²  ]

   (a) Find the variance of the between-group mean difference at the last measurement time; i.e., var(Ȳ_{11} - Ȳ_{10}).

   (b) Find the variance of the between-group mean difference in the change from baseline to follow-up; i.e., var[(Ȳ_{11} - Ȳ_{01}) - (Ȳ_{10} - Ȳ_{00})].

   (c) Find the conditions under which the variance in 6a is smaller than the variance in 6b.

   (d) What are the implications of this result for study design (or data analysis)?

   Answer:

   (a) var(Ȳ_{11} - Ȳ_{10}) = 2σ²/n

   (b) var[(Ȳ_{11} - Ȳ_{01}) - (Ȳ_{10} - Ȳ_{00})] = var(Ȳ_{11} - Ȳ_{01}) + var(Ȳ_{10} - Ȳ_{00})
       = (2σ²/n - 2ρσ²/n) + (2σ²/n - 2ρσ²/n)
       = 4σ²(1 - ρ)/n

   (c) The change-score comparison is more efficient when its variance is smaller than that of the follow-up comparison:

       var[(Ȳ_{11} - Ȳ_{01}) - (Ȳ_{10} - Ȳ_{00})] < var(Ȳ_{11} - Ȳ_{10})
       4σ²(1 - ρ)/n < 2σ²/n
       (1 - ρ) < 0.5
       ρ > 0.5

   (d) Thus, the correlation must be fairly strong before it is more efficient to measure the difference between treatment groups by the difference in the average change rather than by the difference in the average outcome at follow-up.
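The algebra in (a)-(c) is easy to check numerically. The sketch below (with assumed illustration values σ² = 1 and n = 100) tabulates the two variance formulas derived above across values of ρ:

```r
# Numeric check of problem 6: with n participants per group, the two
# variances derived above are
#   follow-up only:  2 * sigma^2 / n
#   change score:    4 * sigma^2 * (1 - rho) / n
var_followup <- function(sigma2, n) 2 * sigma2 / n
var_change   <- function(sigma2, rho, n) 4 * sigma2 * (1 - rho) / n

rho <- seq(0, 0.9, by = 0.1)
round(cbind(rho,
            followup = var_followup(1, 100),
            change   = var_change(1, rho, 100)), 4)
# The change-score variance falls below the follow-up variance exactly
# when rho > 0.5, matching part (c).
```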
7. Use the distribution from equation (1) when answering the following:

   (a) What is the interpretation of β2 in the following regression model?

       E(Y_{1ik}) = β0 + β1 Y_{0ik} + β2 1_[k=1]

       where 1_[k=1] is the indicator function for treatment group 1.

   (b) What is the variance of β̂2 (estimated using linear regression)? [Hint: See the last page for a summary of the variance of regression coefficients.]

   (c) Find the conditions under which the variance of β̂2 is smaller than either the variance in 6a or 6b.

   (d) What are the implications of this result for study design (or data analysis)?

   Answer:

   (a) β2 is the expected difference between treatment groups in two populations that have the same mean baseline level of the outcome variable.

   (b) To apply the identities in this problem you need to recognize that X in the identity on the last page represents the treatment indicator (1_[k=1]) and Z represents the baseline measure (Y_{0ik}). You then need to recognize that ρ is the correlation between baseline and follow-up measures and that it is the same in both treatment groups; thus ρ_{YZ|X} = ρ. Furthermore, within each treatment group the variance of the outcome measure is the same; thus σ²_{Y|X} = σ². Finally, you need to recognize that the sample size in the identities is the total number of participants in the trial; thus N = 2n, where n is the number per group. Now from the identities:

       var(β̂2) = σ²_{Y|XZ} / (N σ²_{X|Z}) = σ²_{Y|X}(1 - ρ²_{YZ|X}) / (N σ²_{X|Z}) = σ²(1 - ρ²) / (2n σ²_{X|Z})

       Now note that σ²_{X|Z} represents the variance of the treatment-group indicator given the baseline level of the outcome. Since treatment groups were randomly assigned, the treatment-group indicator takes the value 1 with probability 0.5 and the value 0 with probability 0.5 regardless of the baseline value; thus E(X|Z) = 0.5 and

       σ²_{X|Z} = E[X - E(X)]² = 0.5(1 - 0.5)² + 0.5(0 - 0.5)² = 0.25

       It follows that var(β̂2) = 2σ²(1 - ρ²)/n.
   (c) The variance from the regression analysis is smaller than that from the analysis of change scores whenever:

       var(β̂2) < var[(Ȳ_{11} - Ȳ_{01}) - (Ȳ_{10} - Ȳ_{00})]
       2σ²(1 - ρ²)/n < 4σ²(1 - ρ)/n
       (1 - ρ)(1 + ρ) < 2(1 - ρ)
       1 + ρ < 2
       ρ < 1

       which is always true; therefore the regression analysis is always more efficient than the change-score analysis.

       The variance from the regression analysis is smaller than that from the analysis of follow-up measures whenever:

       var(β̂2) < var(Ȳ_{11} - Ȳ_{10})
       2σ²(1 - ρ²)/n < 2σ²/n
       (1 - ρ²) < 1

       which holds whenever ρ ≠ 0 (the two variances are equal at ρ = 0, since -1 ≤ ρ ≤ 1); therefore the regression analysis is never less efficient than the follow-up analysis.

   (d) The above proof shows that it is always more efficient to condition on the baseline value in a regression analysis when analyzing pre-post (before-after) outcomes. There are two caveats that you should understand before applying this uniformly:

       - The above result assumes that n is large, because I have ignored degrees of freedom. If you include degrees of freedom, then the regression method is more efficient as long as there are more than about 15 subjects per group.
       - I recommend using the regression method in a randomized trial, where the distribution of baseline values is the same in both treatment groups. In an observational study, where the distribution of the baseline value may differ between the two exposure categories, it is possible to estimate a different quantity than you are estimating with an analysis of change (i.e., one of the analyses is biased relative to the other).
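The three competing variances from problems 6-7 can be tabulated side by side. This sketch (assumed values: σ² = 1, n = 100 per group) makes the ordering in (c) concrete:

```r
# The three analysis variances derived in problems 6-7 (sigma^2 = 1):
v_followup <- function(rho, n) 2 / n
v_change   <- function(rho, n) 4 * (1 - rho) / n
v_ancova   <- function(rho, n) 2 * (1 - rho^2) / n

rho <- c(-0.5, 0, 0.3, 0.5, 0.8)
round(rbind(followup = v_followup(rho, 100),
            change   = v_change(rho, 100),
            ancova   = v_ancova(rho, 100)), 4)
# The regression (ANCOVA-style) variance is never larger than either
# alternative for rho in [-1, 1]; it equals the follow-up variance at rho = 0.
```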
8. In problem 7, what is the probability model, the functional, and the contrast that define the statistical model for scientific inference?

   Answer: It is not necessary to assume that the data are normally distributed; the estimated regression coefficients will be normally distributed by a form of the central limit theorem. Similarly, it is not necessary to assume that there is a linear relationship between the baseline and follow-up measurements; the coefficient β1 is the first-order approximation to the nonlinear relationship between baseline and follow-up measures.

   - Probability model: Non-parametric (assuming only that the regression coefficients are normally distributed).
   - Functional: Mean level of the outcome, conditional on the baseline level.
   - Contrast: Difference in mean outcome levels (conditional on baseline).
Variance of a conditional coefficient in multiple linear regression

Consider the linear regression model:

    E(Y) = β0 + β1 Z + β2 X

Given data (Y_i, X_i, Z_i) on i = 1, ..., N subjects, it is possible to show:

(a) Variance of β̂2:

    var(β̂2) = σ²_{Y|XZ} / (N σ²_{X|Z})

    where σ²_{Y|XZ} denotes the variance of Y given X and Z, σ²_{X|Z} denotes the variance of X given Z, and N denotes the total sample size.

(b) Conditional variance of Y given X and Z:

    σ²_{Y|XZ} = σ²_{Y|X}(1 - ρ²_{YZ|X})

    where ρ_{YZ|X} denotes the correlation of Y and Z given X.
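The identity in (a), combined with the reductions in problem 7(b), can be checked by simulation. All numeric values below (n, σ, ρ, the treatment effect δ, and the number of replicates) are assumed illustration values:

```r
# Simulation check of the result from problem 7(b):
# var(beta2-hat) should approach 2 * sigma^2 * (1 - rho^2) / n.
set.seed(6649)
n <- 200; sigma <- 4; rho <- 0.6; delta <- -2   # assumed illustration values
b2 <- replicate(2000, {
  x  <- rep(0:1, each = n)                      # randomized treatment indicator
  y0 <- rnorm(2 * n, 0, sigma)                  # baseline measurement
  y1 <- rho * y0 + delta * x +                  # follow-up: corr(y0, y1) = rho
        rnorm(2 * n, 0, sigma * sqrt(1 - rho^2))
  coef(lm(y1 ~ y0 + x))["x"]                    # beta2-hat from the ANCOVA fit
})
c(empirical = var(b2), formula = 2 * sigma^2 * (1 - rho^2) / n)
```

The empirical variance of the 2000 simulated coefficients should sit close to the closed-form value 2σ²(1 - ρ²)/n = 0.1024 for these settings.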
More informationModeling the Covariance
Modeling the Covariance Jamie Monogan University of Georgia February 3, 2016 Jamie Monogan (UGA) Modeling the Covariance February 3, 2016 1 / 16 Objectives By the end of this meeting, participants should
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationRegression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).
Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationGrowth Curve Modeling Approach to Moderated Mediation for Longitudinal Data
Growth Curve Modeling Approach to Moderated Mediation for Longitudinal Data JeeWon Cheong Department of Health Education & Behavior University of Florida This research was supported in part by NIH grants
More informationChapter 14 Simple Linear Regression (A)
Chapter 14 Simple Linear Regression (A) 1. Characteristics Managerial decisions often are based on the relationship between two or more variables. can be used to develop an equation showing how the variables
More informationBios 6648: Design & conduct of clinical research
Bios 6648: Design & conduct of clinical research Section 2 - Formulating the scientific and statistical design designs 2.5(b) Binary (a) Time-to-event (revisited) (b) Binary (revisited) (c) Skewed (d)
More information2.1 Linear regression with matrices
21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and
More informationA re-appraisal of fixed effect(s) meta-analysis
A re-appraisal of fixed effect(s) meta-analysis Ken Rice, Julian Higgins & Thomas Lumley Universities of Washington, Bristol & Auckland tl;dr Fixed-effectS meta-analysis answers a sensible question regardless
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationCovariance and Correlation
Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationBivariate distributions
Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient
More informationKeppel, G. & Wickens, T. D. Design and Analysis Chapter 12: Detailed Analyses of Main Effects and Simple Effects
Keppel, G. & Wickens, T. D. Design and Analysis Chapter 1: Detailed Analyses of Main Effects and Simple Effects If the interaction is significant, then less attention is paid to the two main effects, and
More informationSpatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions
Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64
More informationVIII. ANCOVA. A. Introduction
VIII. ANCOVA A. Introduction In most experiments and observational studies, additional information on each experimental unit is available, information besides the factors under direct control or of interest.
More informationEconometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018
Econometrics I KS Module 1: Bivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: March 12, 2018 Alexander Ahammer (JKU) Module 1: Bivariate
More informationSpecification Errors, Measurement Errors, Confounding
Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model
More informationCausal Mechanisms Short Course Part II:
Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal
More informationLecture 5: ANOVA and Correlation
Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions
More informationRestricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model
Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives
More informationMultiple Regression Analysis: The Problem of Inference
Multiple Regression Analysis: The Problem of Inference Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Multiple Regression Analysis: Inference POLS 7014 1 / 10
More informationSelection on Observables: Propensity Score Matching.
Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationStat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)
Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,
More informationThe Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs
The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs Chad S. Briggs, Kathie Lorentz & Eric Davis Education & Outreach University Housing Southern Illinois
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationGroup Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology
Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationTopic 16 Interval Estimation
Topic 16 Interval Estimation Additional Topics 1 / 9 Outline Linear Regression Interpretation of the Confidence Interval 2 / 9 Linear Regression For ordinary linear regression, we have given least squares
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More informationRegression Analysis. Ordinary Least Squares. The Linear Model
Regression Analysis Linear regression is one of the most widely used tools in statistics. Suppose we were jobless college students interested in finding out how big (or small) our salaries would be 20
More informationLinear Regression Measurement & Evaluation of HCC Systems
Linear Regression Measurement & Evaluation of HCC Systems Linear Regression Today s goal: Evaluate the effect of multiple variables on an outcome variable (regression) Outline: - Basic theory - Simple
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationAccounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies
Accounting for Regression to the Mean and Natural Growth in Uncontrolled Weight Loss Studies William D. Johnson, Ph.D. Pennington Biomedical Research Center 1 of 9 Consider a study that enrolls children
More information