Approximate analysis of covariance in trials in rare diseases, in particular rare cancers

Similar documents
UNIVERSITY OF TORONTO Faculty of Arts and Science

Practical Considerations Surrounding Normality

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

LECTURE NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2013 PART I A. STRANDLIE GJØVIK UNIVERSITY COLLEGE AND UNIVERSITY OF OSLO

ECNS 561 Multiple Regression Analysis

Swarthmore Honors Exam 2012: Statistics

Bayesian Linear Regression

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

Introduction to Within-Person Analysis and RM ANOVA

The regression model with one stochastic regressor.

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Stat 5101 Lecture Notes

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Rerandomization to Balance Covariates

BIOS 6649: Handout Exercise Solution

Next is material on matrix rank. Please see the handout

PSI Journal Club March 10 th, 2016

Homoskedasticity. Var (u X) = σ 2. (23)

Regression Analysis Tutorial 34 LECTURE / DISCUSSION. Statistical Properties of OLS

Lecture 24: Weighted and Generalized Least Squares

ECON The Simple Regression Model

A Re-Introduction to General Linear Models (GLM)

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Applied Econometrics (QEM)

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

STA441: Spring Multiple Regression. More than one explanatory variable at the same time

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

Question. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?

Multiple Linear Regression

A Re-Introduction to General Linear Models

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

The General Linear Model. How we re approaching the GLM. What you ll get out of this 8/11/16

Review of Statistics

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

1 The Classic Bivariate Least Squares Model

Fundamental Probability and Statistics

Multiple Linear Regression CIVL 7012/8012

Multivariate Regression Analysis

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Simple Linear Regression: The Model

Confidence Intervals, Testing and ANOVA Summary

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Introductory Econometrics

8/04/2011. last lecture: correlation and regression next lecture: standard MR & hierarchical MR (MR = multiple regression)

The Simple Regression Model. Simple Regression Model 1

Poisson regression: Further topics

36-463/663: Multilevel & Hierarchical Models

Review of Econometrics

Linear Models: Comparing Variables. Stony Brook University CSE545, Fall 2017

Interpreting the coefficients

MATH ASSIGNMENT 2: SOLUTIONS

Group Sequential Designs: Theory, Computation and Optimisation

Principles of Bayesian Inference

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

An Introduction to Mplus and Path Analysis

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Lectures 5 & 6: Hypothesis Testing

Exam Applied Statistical Regression. Good Luck!

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

MS&E 226: Small Data

Experimental Design and Data Analysis for Biologists

Remarks on Improper Ignorance Priors

Data Analysis and Uncertainty Part 1: Random Variables

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Modeling the Mean: Response Profiles v. Parametric Curves

Regression Models - Introduction

Part 8: GLMs and Hierarchical LMs and GLMs

Lecture 9: Linear Regression

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Or How to select variables Using Bayesian LASSO

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

An Introduction to Path Analysis

Lecture 9 SLR in Matrix Form

Model Mis-specification

Variable Selection and Model Building

Principles of Bayesian Inference

Varieties of Count Data

Statistical Data Analysis Stat 3: p-values, parameter estimation

Categorical Predictor Variables

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Brief Suggested Solutions

R-squared for Bayesian regression models

The Linear Regression Model

ASA Section on Survey Research Methods

Model Estimation Example

WU Weiterbildung. Linear Mixed Models

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

coefficients n 2 are the residuals obtained when we estimate the regression on y equals the (simple regression) estimated effect of the part of x 1

Least Squares Estimation-Finite-Sample Properties

Transcription:

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Stephen Senn (c) Stephen Senn 1

Acknowledgements This work is partly supported by the European Union s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552. IDEAL An apology This is a work in progress (c) Stephen Senn 2 2

Outline Preliminaries Covariate adjustment for the linear model Less relevant for cancer but helps to raise some issues Effects on efficiency Possible approaches External regression Intermediate regression Augmented regression Bayesian approaches? Covariate adjustment for non-linear models (e.g. proportional hazards, logistic regression) Greater relevance for cancer What changes? Conclusions (c) Stephen Senn 3

Preliminary: three perspectives on estimation Perspective Issue 1. Experimental 2. Multivariate 3. Regression Population All possible randomisations of patients studied Target population? Mechanism Randomisation Sampling (which we shall pretend is random) Who cares? We re modelling Who cares? We re modelling Stochasticity Randomisation induced Multivariate Normal of predictors and outcomes Purpose Causal Did treatment have an effect Causal/ predictive Did/will treatment have an effect There s an error term thing there because we know the models don t really fit Causal/predictive Did/will treatment have an effect (c) Stephen Senn 4

The framework I shall choose I shall follow time-honoured tradition of mixing around these three approaches while pretending to know what I am talking about Basically, I regard the framework of 1 as most secure but the framework of 3 as being easiest to handle but I am going to have to treat regressors as stochastic and that means I am going to have to shimmy into 2 from time to time. (c) Stephen Senn 5

A fundamental formula and a paradox Var Y = E Var Y X + Var E Y X E Var Y X = Var Y Var E Y X Var Y Thus the conditional variance is less than the marginal variance. Analysis of Covariance is means of conditioning on covariates in the linear model It would thus seem logical that the variance of the treatment estimate having fitted using ANCOVA should be less than that when you don t condition Modelling is good! However, things are not that simple. (c) Stephen Senn 6

Basic model Predictors Disturbance term Y = Wθ + ε Treatment effect Outcome Parameters W = Z X θ = τ β Treatment assignment indicator Covariates Covariate coefficients (c) Stephen Senn 7

The consequences of adding covariates 1 & 2 First order efficiency It can be shown that the variance of the estimate of the treatment effect under ANCOVA is var 1 n n ˆ X ˆ ˆ nn 1 D n n n n X X D n 2 2 X X X 1 2 1 1 2 1 2 Where D is the vector of the mean differences in X between treatment groups X X Now suppose we have two covariate matrices so that the second is just the first with some columns corresponding to some additional covariates added, then the following is the case 2 2, E ˆ E ˆ X + X X X (c) Stephen Senn 8

The consequences of adding covariates 3 Second order efficiency One degree of freedom is lost for every covariate fitted Perhaps best understood in terms of its effect on the variance of the t distribution var t N 2 k v 2 N 4 k Where N is the number of patients and k is the number of covariates Joint effect of 1 st and 2 nd order efficiency is controversial (Gilmour & Trinca 2012) (c) Stephen Senn 9

Fisher to Nelder It must be the peculiarities of your teaching at Cambridge which led you to think that some other method is more authentic.. Probably, however, you were not taught to regard the fiducial distribution of µ as a frequency distribution at all 3 December 1956, Bennett 1990, p283 (c) Stephen Senn 10

To sum up Adding predictive covariates to a model makes the residual error smaller But it makes the design matrix somewhat less well-conditioned Second order efficiency is also affected Fewer degrees of freedom for estimating the error variance Less favourable t-distribution for confidence intervals Eventually as we add covariates we lose Problem in small trials FP7 HEALTH 2013-602552 11

The Gauss-Markov theorem This shows that Ordinary Least Squares is optimal if either a) you have fixed regressors or b) you have stochastic regressors but require conditionally unbiased estimates However, if you have stochastics regressors and are happy with marginal unbiasedness you may be able to do better Common example is recovering inter-block information The block effects are treated as random We no longer condition on them So here are some suggestions FP7 HEALTH 2013-602552 12

Auxiliary adjustment Our regression model is as follows Y = Xβ + Zτ + ε..(1) Outcome Covariates Coefficients Indicator Treatment Effect Disturbance We replace it with Y Xβ = Zτ + ε (2) Where β is a guesstimate for the vector β Since this is a randomised design if we apply ordinary least squares to (2) we have an unconditionally unbiased estimate of τ (it is not conditionally unbiased). It is true that E Var ε Var ε * but Thus, nevertheless, it is possible, due to the reduction in dimensions, that FP7 HEALTH 2013-602552 13 ˆ * var ˆ E var t E t

Variations on this 1. Pure adjustment. We use a predicted value using local covariates but slope estimates based on a previous data set. Example using table of predicted forced expiratory volume in one second (FEV 1 ) in asthma based on population surveys and using age, height and sex of the subject 2. Partial adjustment. We use two or more covariates to form a predicted value but the use this as a single covariate in ANCOVA Example using predicted FEV 1 based on age, height and sex as a covariate rather than using age, height and sex 3. Augmented regression The expected value of 1 nn n 1 1 1 2 D X X D Increases as we increase the number of columns but reduces if we increase the number of rows. We create an augmented data matrix consisting of other treatments but the same covariates (c) Stephen Senn 14

Toy example Cross-over trial in asthma comparing 7 treatments in 5 periods & 158 patients. (Senn et al, 1997) I have constructed a parallel group trial by dropping all periods except the first. I have chosen two of the treatments (placebo plus another*) to form a new trial but retained the other 5 to allow me to estimate an external outcome on baseline slope I have randomly chosen six (three for each treatment) patients and compared ANCOVA to external adjustment I have repeated this 100 times *Formoterol ISF 6µg (c) Stephen Senn 15

Simulation 1000 samples of size 6 Summary statistics for Estimate external adjustment Mean = 0.151 Median = 0.150 Variance = 0.0104 Summary statistics for Estimate ANCOVA Mean = 0.155 Median = 0.150 Variance = 0.0125 (c) Stephen Senn 16

Simulation 1000 samples of size 6 Summary statistics for Variance external adjustment Mean = 0.0121 Median = 0.00716 Minimum = 0.000181 Maximum = 0.0746 NB Variances based on 4 DF Summary statistics for Variance ANCOVA Mean = 0.0137 Median = 0.00749 Minimum = 0.0000353 Maximum = 0.290 NB Variances based on 3 DF (c) Stephen Senn 17

Simulation 1000 samples of size 8 Summary statistics for Estimate external adjustment Mean = 0.154 Median = 0.154 Variance = 0.00696 Summary statistics for Estimate ANCOVA Mean = 0.155 Median = 0.155 Variance = 0.00673 (c) Stephen Senn 18

Simulation 1000 samples of size 8 Summary statistics for Variance external adjustment Mean = 0.00878 Median = 0.00596 Minimum = 0.000283 Maximum = 0.0375 NB Variances based on 6 DF Summary statistics for Variance ANCOVA Mean = 0.00848 Median = 0.00567 Minimum = 0.000231 Maximum = 0.161 NB Variances based on 5 DF (c) Stephen Senn 19

A Bayesian Perspective? A model should be as big as an elephant. Jimmy Savage A Bayesian view of frequentist models is the following Any term in a frequentist model has an uninformative prior as to its effect Any term not in a frequentist model has an informative prior that its effect is zero The compromise Bayesian approach would be to have partially informative priors Perhaps a way can be found to use informative priors (ridge regression analogy) that avoids the paradox of information Conventional Bayesian approach (certainly not) Lindley & Smith, 1972 (not quite) Dawid & Fang, 1992 (perhaps) Work in progress (c) Stephen Senn 20

The world of non-linear models The Normal distribution is a two-parameter distribution and this has implications for the linear model Unexplained variation can be swept up in the variance The same is not true of certain other distributions/models* Poisson Binomial Proportional hazards Variances for certain models always increase if covariates are fitted Gail et al, Biometrika 1984, Robinson & Jewel Biometrics 1991, Ford et al Stats in Med 1995 (c) Stephen Senn 21

However Fact that variances increase does not mean power reduces Estimates are biased towards null if predictive covariates omitted Furthermore, in predictions space, marginal and conditional models can be more compatible (Lee and Nelder, 2004) In other words, the values of adjusting and the potential problems of doing so may be not so dissimilar after all This is just speculation on my part Not so much work in progress as work that has yet to progress (c) Stephen Senn 22

Conclusions Where diseases are rare, patients are few and large simple trials are not possible Further information cannot be obtained by studying more patients but may be obtained by measuring more things To the extent that they are predictive covariates can be useful But naïve use faces the paradox of conditioning More information appears to be worse than less We have to find ways of getting round this to progress Nevertheless, statistical modelling is not magic and there are limits (c) Stephen Senn 23