G-ESTIMATION OF STRUCTURAL NESTED MODELS (CHAPTER 14) BIOS G-Estimation
- Morgan Todd
- 5 years ago
2 G-Estimation of Structural Nested Models (Chapter 14): Outline
- 14.1 The causal question revisited
- 14.2 Exchangeability revisited
- 14.3 Structural nested mean models
- 14.4 Rank preservation
- 14.5 G-estimation
- 14.6 Structural nested models with two or more parameters
3 14.2 Exchangeability revisited
- Recall that conditional exchangeability is defined as $Y^a \perp\!\!\!\perp A \mid L$ for $a = 0, 1$
- For a binary treatment $A$ this is equivalent to $\Pr[A = 1 \mid Y^a, L] = \Pr[A = 1 \mid L]$
- Consider the following parametric logistic regression model: $\operatorname{logit} \Pr[A = 1 \mid Y^{a=0}, L] = \alpha_0 + \alpha_1 Y^{a=0} + \alpha_2 L$
- Fitting such a model to a real data set is not possible because $Y^{a=0}$ is not observed for all individuals
- Thought experiment: suppose $Y^{a=0}$ were observed for all individuals, so that we could fit this model. If conditional exchangeability holds and the model is correctly specified, what would you expect $\hat{\alpha}_1$ to equal?
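The thought experiment can be tried on simulated data, where $Y^{a=0}$ is known by construction. The sketch below is only an illustration: the sample size, coefficients, and variable names are made up and have nothing to do with NHEFS. Treatment is generated from $L$ alone, so conditional exchangeability holds, and the fitted coefficient on $Y^{a=0}$ should be close to zero relative to its standard error.

```r
# Hypothetical simulated data (not NHEFS); all numeric values are assumptions.
set.seed(14)
n  <- 5000
L  <- rnorm(n)                          # measured confounder
Y0 <- 2 + 1.5 * L + rnorm(n)            # counterfactual outcome under a = 0
A  <- rbinom(n, 1, plogis(-0.5 + L))    # treatment depends on L only

# Fit logit Pr[A = 1 | Y^{a=0}, L] = alpha0 + alpha1 * Y0 + alpha2 * L
fit <- glm(A ~ Y0 + L, family = binomial())
est <- summary(fit)$coefficients["Y0", c("Estimate", "Std. Error")]
print(round(est, 3))
```

If treatment were instead generated from both $L$ and $Y^{a=0}$, the same fit would be expected to return a coefficient clearly away from zero.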
4 14.3 Structural nested mean models
- Consider the model $E[Y^a - Y^{a=0} \mid A = a, L] = \beta_1 a + \beta_2 a L$, such that $\beta_1 + \beta_2 l$ equals the average causal effect (RD) within stratum $L = l$
- Below we discuss using g-estimation to draw inference about $\beta_1$ and $\beta_2$
- Note that this model is semiparametric in the sense that we do not specify a model for $E[Y^{a=0} \mid L]$, i.e., there is no intercept $\beta_0$ or term $\beta_3 L$ in the model
- This is in contrast to the parametric g-formula from Chapter 13. Thus we expect g-estimation to be more robust to model misspecification than the parametric g-formula.
5 14.4 Rank Preservation
- Suppose, contrary to fact, that for the NHEFS data we knew $Y^{a=1}$ and $Y^{a=0}$ for all participants, i.e., each individual's potential weight gain if they quit smoking and if they did not quit smoking
- Imagine we sorted individuals according to $Y^{a=1}$ from largest value to smallest value
- Imagine we sorted individuals according to $Y^{a=0}$ from largest value to smallest value
- Suppose individuals end up in the same order in both cases: this is rank preservation
6 14.4 Rank Preservation
- When the effect of treatment $A$ on the outcome $Y$ is exactly the same, on the additive scale, for all individuals in the study population, we say that additive rank preservation holds
- For example, if smoking cessation increases each individual's body weight by exactly 3 kg, then the ranking of individuals according to $Y^{a=0}$ would be equal to the ranking according to $Y^{a=1}$
- A particular case of additive rank preservation occurs when the sharp null hypothesis is true (Chapter 1), i.e., treatment has no effect on the outcomes of any individual
- For the purposes of structural nested mean models, we care about additive rank preservation within levels of $L$. This conditional additive rank preservation holds if the effect of treatment $A$ on the outcome $Y$ is exactly the same for all individuals with the same values of $L$
7 14.4 Rank Preservation
- An example of an (additive conditional) rank-preserving structural model is $Y_i^a - Y_i^{a=0} = \psi_1 a + \psi_2 a L_i$ for all subjects $i$, where $\psi_1 + \psi_2 l$ is the constant causal effect for all individuals with covariate values $L = l$
- For every individual $i$ with $L_i = l$: $Y_i^{a=1} = Y_i^{a=0} + \psi_1 + \psi_2 l$
- The potential outcome under no treatment, $Y_i^{a=0}$, is shifted by $\psi_1 + \psi_2 l$ to obtain the potential outcome under treatment, $Y_i^{a=1}$
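A small simulation can make the shift concrete. This is a sketch under assumed values: the parameter values 3 and 1.5, the binary covariate, and the outcome distribution are all hypothetical. It builds $Y^{a=1}$ from $Y^{a=0}$ with the rank-preserving model and checks that the two orderings agree within each level of $L$.

```r
# Hypothetical illustration of conditional additive rank preservation.
set.seed(144)
n    <- 200
L    <- rbinom(n, 1, 0.5)                 # binary covariate (assumed)
Y0   <- rnorm(n, mean = 2 + L, sd = 8)    # outcome under no treatment
psi1 <- 3                                 # assumed parameter values
psi2 <- 1.5
Y1   <- Y0 + psi1 + psi2 * L              # shift by psi1 + psi2 * l

# Within each stratum of L, compare the ranking under a = 0 and a = 1
same.rank <- sapply(split(seq_len(n), L), function(idx) {
  identical(rank(Y0[idx]), rank(Y1[idx]))
})
print(same.rank)
```

Ranks need not agree across the two strata combined, because the shift differs between $L = 0$ and $L = 1$; the model only guarantees preservation conditional on $L$.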
8 14.4 Rank Preservation
- Figs 14.1 and 14.2 show examples of additive rank preservation within two strata $L = l$ and $L = l'$
- [Figures 14.1 and 14.2]
- For most treatments and outcomes, the individual causal effect is not expected to be constant across individuals with the same covariate values, and thus (additive conditional) rank preservation is scientifically implausible
- E.g., we do not expect that smoking cessation affects equally the body weight of all individuals with the same values of $L$
9 14.4 Rank Preservation
- Reality is probably closer to Fig 14.3
- [Figure 14.3]
- Here not only are the shifts from $Y^{a=0}$ to $Y^{a=1}$ different between individuals, but the ranks are also not preserved
- A structural nested mean model is well defined in the absence of rank preservation. For example, one could propose a structural nested mean model for the setting depicted in Figure 14.3 to estimate the average causal effect within strata of $L$. Such an average causal effect will generally differ from the individual-level causal effects.
- Because rank preservation is implausible, causal methods that rely on it are not recommended
- It is used in Section 14.5 to introduce g-estimation because g-estimation is easier to understand for rank-preserving models, and because the g-estimation procedure is actually the same for rank-preserving and non-rank-preserving models
10 14.5 G-Estimation
- Suppose the goal is to estimate the parameters of the structural nested mean model $E[Y^a - Y^{a=0} \mid A = a, L] = \beta_1 a$
- For simplicity we consider only the one-parameter model, effectively assuming that the average causal effect is constant across strata of $L$
- Assume the additive rank-preserving model $Y_i^a - Y_i^{a=0} = \psi_1 a$, such that $\psi_1 = \beta_1$
- Equivalently, $Y_i^{a=0} = Y_i^a - \psi_1 a$, or, by causal consistency, $Y_i^{a=0} = Y_i - \psi_1 A_i$
11 14.5 G-Estimation
- If the model is correct and we knew $\psi_1$, then we could calculate $Y_i^{a=0}$ for all individuals
- We do not know $\psi_1$. Moreover, drawing inference about $\psi_1$ is our goal.
- Thought experiment: your friend (an oracle) knows the value of $\psi_1$. She tells you it equals one of the following three values: $\psi^{\dagger} = -20$, $\psi^{\dagger} = 0$ or $\psi^{\dagger} = 10$. She then challenges you to determine the true value based on the observed data. You accept the challenge.
- For each individual compute $H(\psi^{\dagger}) = Y - \psi^{\dagger} A$ for each of the three possible values of $\psi^{\dagger}$
- The three newly created random variables $H(-20)$, $H(0)$ and $H(10)$ are candidate potential outcomes. Only one of the three is the correct potential outcome $Y^{a=0}$. How do you choose which one?
12 14.5 G-Estimation
- Remember from Section 14.2 that the assumption of conditional exchangeability can be expressed as a logistic model for treatment given the counterfactual outcome and the covariates $L$. When conditional exchangeability holds, the coefficient for the counterfactual outcome should be zero.
- This suggests we fit three separate logistic regression models: $\operatorname{logit} \Pr[A = 1 \mid H(\psi^{\dagger}), L] = \alpha_0 + \alpha_1 H(\psi^{\dagger}) + \alpha_2 L$
- The candidate $H(\psi^{\dagger})$ with $\hat{\alpha}_1 \approx 0$ is the counterfactual $Y^{a=0}$, and the corresponding $\psi^{\dagger}$ is the estimate of the true $\psi_1$
- E.g., suppose for $H(\psi^{\dagger} = 10)$ that $\hat{\alpha}_1 \approx 0$. Then $\hat{\psi}_1 = 10$. This is g-estimation.
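The oracle challenge can be sketched in a few lines of base R. Everything here is an assumption for illustration: the data are simulated rather than NHEFS, the true effect is set to 10 so that one of the three candidates is correct, and the object names are invented. For each candidate the code builds $H(\psi^{\dagger})$, fits the logistic model for treatment, and keeps the candidate whose coefficient on $H(\psi^{\dagger})$ is closest to zero.

```r
# Hypothetical simulated data with a constant (rank-preserving) effect.
set.seed(145)
n  <- 5000
L  <- rnorm(n)
Y0 <- 2 + 1.5 * L + rnorm(n, sd = 8)     # outcome under no treatment
A  <- rbinom(n, 1, plogis(-0.5 + L))     # treatment depends on L only
psi.true <- 10                           # assumed true effect
Y  <- Y0 + psi.true * A                  # observed outcome (consistency)

candidates <- c(-20, 0, 10)
alpha1 <- sapply(candidates, function(psi) {
  H <- Y - psi * A                       # candidate counterfactual H(psi)
  unname(coef(glm(A ~ H + L, family = binomial()))["H"])
})
names(alpha1) <- candidates
print(round(alpha1, 3))

# The candidate with the coefficient nearest zero is the g-estimate
best <- candidates[which.min(abs(alpha1))]
print(best)
```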
13 14.5 G-Estimation
- Important note: g-estimation does not test whether conditional exchangeability holds; it assumes it holds in order to draw inference about the causal effect of interest
- In reality we do not have an oracle friend supplying a short list of possible values of $\psi_1$
- Therefore we need to search over all possible values of $\psi_1$ until we find one where the corresponding $\hat{\alpha}_1 = 0$
- Operationally this is done by a search over a fine grid (e.g., -20 to 20 by 0.01)
- NHEFS example: consider 31 possible candidates $H(2.0), H(2.1), H(2.2), \ldots, H(4.9), H(5.0)$. Fit 31 separate logistic regression models of the probability of smoking cessation $A = 1$ just as in Chapter 12 (with the same $L$), but include $H(\psi^{\dagger})$ as an additional covariate
14 14.5 G-Estimation
- The coefficient estimate $\hat{\alpha}_1$ for $H(\psi^{\dagger})$ was closest to zero for $H(3.4)$ and $H(3.5)$
- A finer search reveals $\hat{\alpha}_1$ essentially zero for $\psi^{\dagger} =$ [value missing in the transcript]
- Thus the g-estimate of the average causal effect of smoking cessation on weight gain is 3.4 kg
- The Wald test of $H_0: \alpha_1 = 0$ at that value of $\psi^{\dagger}$ yields p-value $p \approx 1$
- To find a 95% confidence interval for $\psi_1$, find the subset of $\psi^{\dagger}$ where $p > 0.05$ (this is the standard approach of constructing a CI by inverting a hypothesis test)
- For the NHEFS data and the 31 logistic models, this yields the 95% CI [2.5, 4.5] (essentially the same as IP weighting and the parametric g-formula)
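The grid search and the test-inversion interval can be sketched together. This is a simplified illustration on simulated data with an assumed true effect of 3.5 and no censoring, using plain `glm` with its model-based Wald test rather than the weighted GEE fit used for NHEFS; the grid limits and names are hypothetical.

```r
# Hypothetical simulated data; no censoring, so no weights are needed here.
set.seed(2024)
n  <- 2000
L  <- rnorm(n)
Y0 <- 2 + 1.5 * L + rnorm(n, sd = 8)
A  <- rbinom(n, 1, plogis(-1 + 0.5 * L))
Y  <- Y0 + 3.5 * A                       # assumed true effect of 3.5

grid <- seq(0, 7, by = 0.05)
res <- t(sapply(grid, function(psi) {
  H   <- Y - psi * A
  tab <- summary(glm(A ~ H + L, family = binomial()))$coefficients
  c(psi = psi, alpha1 = tab["H", "Estimate"], p = tab["H", "Pr(>|z|)"])
}))
res <- as.data.frame(res)

# Point estimate: grid value with the coefficient nearest zero
psi.hat <- res$psi[which.min(abs(res$alpha1))]
# 95% CI: all grid values whose Wald test of alpha1 = 0 is not rejected
ci <- range(res$psi[res$p > 0.05])
print(c(estimate = psi.hat, lower = ci[1], upper = ci[2]))
```

The interval is only as precise as the grid spacing, so a finer grid around the endpoints would be needed for more decimal places.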
15 G-Estimation: chapter14.r
##################################################################
# G-estimation: Checking multiple possible values of psi
##################################################################
require(geepack)
data <- nhefs.g.est
grid <- seq(from = 2, to = 5, by = 0.01)  # use a smaller `by` for a finer estimate
j <- 0
store.Hpsi.coefs <- double(length(grid))
for (i in grid) {
  psi <- i
  j <- j + 1
  # candidate counterfactual H(psi) = Y - psi * A
  data$Hpsi <- data$wt82_71 - psi * data$qsmk
  # weighted logistic model for treatment with Hpsi as an extra covariate
  gee.obj <- geeglm(qsmk ~ as.factor(sex) + as.factor(race) + age + I(age^2) +
                      as.factor(education) + smokeintensity + I(smokeintensity^2) +
                      smokeyrs + I(smokeyrs^2) + as.factor(exercise) +
                      as.factor(active) + wt71 + I(wt71^2) + Hpsi,
                    data = data, weights = w.cens, id = id,
                    corstr = "independence", family = binomial(logit))
  store.Hpsi.coefs[j] <- coef(gee.obj)["Hpsi"]
  cat("iteration", j, "completed\n")
}
store.results <- as.data.frame(cbind(grid, abs(store.Hpsi.coefs)))
names(store.results) <- c("grid", "Hpsi.est")
# grid value(s) where the absolute coefficient of Hpsi is smallest
store.results[store.results$Hpsi.est == min(store.results$Hpsi.est), ]
16 G-Estimation: chapter14.r
[Plot: $\hat{\alpha}$ against $\psi$]
17 G-Estimation: chapter14.r
[Plot: P value against $\psi$]
18 14.5 G-Estimation: Comments
- Other tests of $H_0: \alpha_1 = 0$ aside from the Wald test, such as the score test or likelihood ratio test, could be used instead
- If we assume $Y^a \perp\!\!\!\perp \{A, C\} \mid L$, there is no need to adjust for censoring
- Otherwise, if we make the weaker assumption $Y \perp\!\!\!\perp C \mid \{A, L\}$, we need to construct inverse probability of censoring weights $W^C = 1/\Pr[C = 0 \mid A = a, L]$ as in Chapter 12
- With IP censoring weights and standard software, one can (conservatively) use the robust variance estimate to construct Wald tests of $H_0: \alpha_1 = 0$; expect 95% CIs to be wider than if a non-conservative variance estimate or the bootstrap were used instead
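As a rough sketch of how such censoring weights might be built, the code below fits a logistic model for censoring given treatment and covariates and inverts the estimated probability of remaining uncensored. The data are simulated and the censoring model `C ~ A + L` is an assumption made for the example; it is not the specification used for NHEFS.

```r
# Hypothetical simulated data; C = 1 means the outcome is censored.
set.seed(18)
n <- 3000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(L))
C <- rbinom(n, 1, plogis(-2 + 0.5 * A + 0.5 * L))

# Model Pr[C = 1 | A, L], then take Pr[C = 0 | A, L] = 1 - fitted value
cens.fit <- glm(C ~ A + L, family = binomial())
p.uncens <- 1 - fitted(cens.fit)

# Uncensored individuals get weight 1 / Pr[C = 0 | A, L]; censored get 0
w.cens <- ifelse(C == 0, 1 / p.uncens, 0)
print(summary(w.cens[C == 0]))
```

A quick sanity check is that the weights of the uncensored individuals sum to roughly the full sample size, since each uncensored person stands in for similar people who were censored.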
19 14.5 G-Estimation: Comments
- Back to non-rank-preserving models
- The g-estimation estimator $\hat{\psi}_1$ is consistent for the parameter $\beta_1$ of the structural nested mean model, assuming the mean model is correctly specified (i.e., if the average treatment effect is equal in all levels of $L$)
- This is true regardless of whether the individual treatment effect is constant
- I.e., it is not necessary that $H(\beta_1) = Y^{a=0}$ for all subjects. Rather, it is sufficient for $H(\beta_1)$ and $Y^{a=0}$ to have the same conditional mean given $L$
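This claim can be checked informally by simulation. In the sketch below the individual effects vary from person to person around an assumed average of 3, so rank preservation fails, yet the estimate still lands near 3. For brevity it uses the closed-form estimator from Tech Pt 14.2 without censoring weights; the data-generating values are hypothetical.

```r
# Hypothetical simulated data in which rank preservation does NOT hold.
set.seed(19)
n   <- 20000
L   <- rnorm(n)
A   <- rbinom(n, 1, plogis(-0.5 + L))
Y0  <- 2 + 1.5 * L + rnorm(n, sd = 4)
eff <- rnorm(n, mean = 3, sd = 2)     # individual effects vary; average is 3
Y   <- Y0 + eff * A

# Estimated E[A | L] from a logistic model for treatment
p.hat <- fitted(glm(A ~ L, family = binomial()))

# Closed-form g-estimate for the one-parameter model
psi.hat <- sum(Y * (A - p.hat)) / sum(A * (A - p.hat))
print(round(psi.hat, 2))
```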
20 14.6 SNM with 2 or more parameters
- The one-parameter structural nested model $E[Y^a - Y^{a=0} \mid L] = \beta_1 a$ assumes the same average treatment effect in all levels of $L$
- If this model is misspecified, i.e., there is effect modification by some components $V$ of $L$, inferences will be wrong
- We expect effect modification to be the case in general
- [Note, in contrast, that effect modification does not invalidate the MSM methods described in Chapter 12]
- Relax this assumption by considering instead the two-parameter SNM $E[Y^a - Y^{a=0} \mid L] = \beta_1 a + \beta_2 a V$
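21 14.6 SNM with 2 or more parameters
- For g-estimation, the corresponding rank-preserving model is $Y_i^a - Y_i^{a=0} = \psi_1 a + \psi_2 a V_i$, and now let $H(\psi^{\dagger}) = Y - \psi_1^{\dagger} A - \psi_2^{\dagger} A V$
- To estimate $\psi_1$ and $\psi_2$, fit the logistic model $\operatorname{logit} \Pr[A = 1 \mid H(\psi^{\dagger}), L] = \alpha_0 + \alpha_1 H(\psi^{\dagger}) + \alpha_2 H(\psi^{\dagger}) V + \alpha_3 L$
- Find the combination of $\psi_1^{\dagger}$ and $\psi_2^{\dagger}$ where $H(\psi^{\dagger}) \perp\!\!\!\perp A \mid L$
- I.e., search for the combination $(\psi_1^{\dagger}, \psi_2^{\dagger})$ that yields $\hat{\alpha}_1 = \hat{\alpha}_2 = 0$
- In general, the solution does not have a closed form, and therefore numerical search algorithms (e.g., Nelder-Mead simplex) must be used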
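One way the numerical search might look in base R is sketched below with `optim` and its Nelder-Mead method. The objective, the sum of the two squared coefficients, is a choice made for this example and is not taken from the slides; the data are simulated with assumed values $\psi_1 = 3$ and $\psi_2 = 2$, and $V$ is a hypothetical binary component of the covariates.

```r
# Hypothetical simulated data with effect modification by V.
set.seed(146)
n  <- 4000
L  <- rnorm(n)
V  <- rbinom(n, 1, 0.5)                   # assumed binary effect modifier
Y0 <- 1 + L + 0.5 * V + rnorm(n, sd = 4)
A  <- rbinom(n, 1, plogis(-0.5 + 0.8 * L + 0.3 * V))
Y  <- Y0 + 3 * A + 2 * A * V              # assumed psi1 = 3, psi2 = 2

# Objective: squared coefficients of H and H:V in the treatment model
objective <- function(psi) {
  H   <- Y - psi[1] * A - psi[2] * A * V
  fit <- glm(A ~ H + H:V + L + V, family = binomial())
  sum(coef(fit)[c("H", "H:V")]^2)
}

# Nelder-Mead search started at (0, 0)
opt <- optim(c(0, 0), objective, method = "Nelder-Mead")
print(round(opt$par, 2))
```

Minimizing a sum of squares is only one way to drive both coefficients to zero; a grid search over both parameters would serve the same purpose at higher computational cost.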
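22 14.6 NHEFS Data Revisited
- Consider the two-parameter SNM $E[Y^a - Y^{a=0} \mid L] = \beta_1 a + \beta_2 a V$, where $Y$ is the change in weight between follow-up and baseline, and $V$ is baseline smoking intensity
- Numerical 2-d grid search: fit the logistic model $\operatorname{logit} \Pr[A = 1 \mid H(\psi^{\dagger}), L] = \alpha_0 + \alpha_1 H(\psi^{\dagger}) + \alpha_2 H(\psi^{\dagger}) V + \alpha_3 L$ for $\psi_1^{\dagger} \in \{2, 2.05, \ldots, 5\}$ and $\psi_2^{\dagger} \in \{-1, -0.95, \ldots, 1\}$
- Find the values of $\psi_1^{\dagger}$ and $\psi_2^{\dagger}$ where $\hat{\alpha}_1 \approx \hat{\alpha}_2 \approx 0$
- Yields $\hat{\beta}_1 = \hat{\psi}_1$ and $\hat{\beta}_2 = \hat{\psi}_2$ [numeric values missing in the transcript]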
23 [Contour plot of $\alpha_1 + \alpha_2$ over $\psi_1$ and $\psi_2$]
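24 Tech Pt 14.2
- In certain settings, the g-estimator has a closed form
- E.g., consider the one-parameter SNM $E[Y^a - Y^{a=0} \mid L] = \beta_1 a$
- Suppose g-estimation is based on the score test of $H_0: \alpha_1 = 0$ in $\operatorname{logit} \Pr[A = 1 \mid H(\psi^{\dagger}), L] = \alpha_0 + \alpha_1 H(\psi^{\dagger}) + \alpha_2 L$
- Then this is equivalent to finding the parameter value $\psi^{\dagger}$ that solves the estimating equation $\sum_i H_i(\psi^{\dagger}) (A_i - \hat{E}[A_i \mid L_i]) = 0$
- Using the fact that $H_i(\psi^{\dagger}) = Y_i - \psi^{\dagger} A_i$, the closed-form solution is $\hat{\psi}_1 = \frac{\sum_i Y_i (A_i - \hat{E}[A_i \mid L_i])}{\sum_i A_i (A_i - \hat{E}[A_i \mid L_i])}$
- What if there is censoring, or if we fit a two-parameter SNM? See Tech Pt 14.2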
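The step from the estimating equation to the closed form is short enough to write out. The lines below substitute $H_i(\psi^{\dagger}) = Y_i - \psi^{\dagger} A_i$ and solve for $\psi^{\dagger}$, using only the symbols defined on this slide.

```latex
\begin{align*}
0 &= \sum_i \bigl(Y_i - \psi^{\dagger} A_i\bigr)\bigl(A_i - \hat{E}[A_i \mid L_i]\bigr) \\
  &= \sum_i Y_i \bigl(A_i - \hat{E}[A_i \mid L_i]\bigr)
     - \psi^{\dagger} \sum_i A_i \bigl(A_i - \hat{E}[A_i \mid L_i]\bigr) \\
\Longrightarrow \quad
\hat{\psi}_1 &= \frac{\sum_i Y_i \bigl(A_i - \hat{E}[A_i \mid L_i]\bigr)}
                     {\sum_i A_i \bigl(A_i - \hat{E}[A_i \mid L_i]\bigr)}
\end{align*}
```

The equation is linear in $\psi^{\dagger}$, which is why no grid search is needed in this one-parameter case.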
25 chapter14.r
##################################################################
# G-estimation: Closed form estimator linear mean models
##################################################################
# weighted logistic model for treatment given covariates L
logit.est <- glm(as.factor(qsmk) ~ as.factor(sex) + as.factor(race) + age + I(age^2) +
                   as.factor(education) + smokeintensity + I(smokeintensity^2) +
                   smokeyrs + I(smokeyrs^2) + as.factor(exercise) +
                   as.factor(active) + wt71 + I(wt71^2),
                 data = nhefs0, weights = w.cens, family = binomial("logit"))
# predicted probability of treatment, i.e. the estimate of E[qsmk | L]
nhefs0$qsmk.pred <- predict(logit.est, nhefs0, type = "response")
# solve sum(w_c * H(psi) * (qsmk - E[qsmk | L])) = 0
# for a single psi and H(psi) = wt82_71 - psi * qsmk
# this can be solved as
# psi = sum(w_c * wt82_71 * (qsmk - qsmk.pred)) / sum(w_c * qsmk * (qsmk - qsmk.pred))
with(nhefs0, sum(w.cens * wt82_71 * (qsmk - qsmk.pred)) /
               sum(w.cens * qsmk * (qsmk - qsmk.pred)))
# [1]
26 Recap
- Chapter 12: Fitting an MSM via IPW requires a correct model of $\Pr[A = a \mid L]$
- Chapter 13: The parametric g-formula requires a correct model of $E[Y \mid A, L]$
- Doubly robust (DR) estimators require (i) a correct model of $\Pr[A = a \mid L]$ or (ii) a correct model of $E[Y \mid A, L]$, but not necessarily both
- Chapter 14: G-estimation requires a correct model of $\Pr[A = a \mid L]$ and a correct (semiparametric) structural mean model $E[Y^a - Y^{a=0} \mid V] = \beta_1 a + \beta_2 a V$
- See Tech Pt 14.2 regarding DR g-estimators
- G-estimation is less popular because it is computationally demanding and lacks off-the-shelf software, but it has advantages over other approaches (Vansteelandt and Joffe, Stat Sci 2014)
7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 2, Issue 1 2006 Article 2 Statistical Inference for Variable Importance Mark J. van der Laan, Division of Biostatistics, School of Public Health, University
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationBehavioral Data Mining. Lecture 19 Regression and Causal Effects
Behavioral Data Mining Lecture 19 Regression and Causal Effects Outline Counterfactuals and Potential Outcomes Regression Models Causal Effects from Matching and Regression Weighted regression Counterfactuals
More informationChapter 12 - Lecture 2 Inferences about regression coefficient
Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationAFT Models and Empirical Likelihood
AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t
More informationHarvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen
Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 175 A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome Eric Tchetgen Tchetgen
More informationβ j = coefficient of x j in the model; β = ( β1, β2,
Regression Modeling of Survival Time Data Why regression models? Groups similar except for the treatment under study use the nonparametric methods discussed earlier. Groups differ in variables (covariates)
More informationA Sampling of IMPACT Research:
A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationCovariate Balancing Propensity Score for General Treatment Regimes
Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian
More informationThe propensity score with continuous treatments
7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.
More informationMethods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures
Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures Ruth Keogh, Stijn Vansteelandt, Rhian Daniel Department of Medical Statistics London
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationA Practitioner s Guide to Cluster-Robust Inference
A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode
More informationPropensity Score Methods for Estimating Causal Effects from Complex Survey Data
Propensity Score Methods for Estimating Causal Effects from Complex Survey Data Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School
More informationFlexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.
FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula. Modern Modeling Methods (M 3 ) Conference Beatrijs Moerkerke
More informationDouble Robustness. Bang and Robins (2005) Kang and Schafer (2007)
Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random
More informationEstimating the Marginal Odds Ratio in Observational Studies
Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios
More information1/15. Over or under dispersion Problem
1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationGov 2000: 6. Hypothesis Testing
Gov 2000: 6. Hypothesis Testing Matthew Blackwell October 11, 2016 1 / 55 1. Hypothesis Testing Examples 2. Hypothesis Test Nomenclature 3. Conducting Hypothesis Tests 4. p-values 5. Power Analyses 6.
More informationQuantitative Empirical Methods Exam
Quantitative Empirical Methods Exam Yale Department of Political Science, August 2016 You have seven hours to complete the exam. This exam consists of three parts. Back up your assertions with mathematics
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationSection 3: Simple Linear Regression
Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationGov 2002: 13. Dynamic Causal Inference
Gov 2002: 13. Dynamic Causal Inference Matthew Blackwell December 19, 2015 1 / 33 1. Time-varying treatments 2. Marginal structural models 2 / 33 1/ Time-varying treatments 3 / 33 Time-varying treatments
More informationHypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)
Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect
More informationSummer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.
Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall
More informationStructural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall
1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work
More informationSTAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS
STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in
More informationPsychology 282 Lecture #4 Outline Inferences in SLR
Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations
More informationA NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL
Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More information