A tool to demystify regression modelling behaviour
Thomas Alexander Gerds
Appetizer
Every child knows how regression analysis works. The essentials of a regression modelling strategy, however, such as which variables to include and in which form, are typically based on family tradition rather than mathematical theory. Every child knows, too, that tradition is pretty difficult to change.
"I have the best data in the world." "Here are the guidelines!"
What is the question? Who asks the question? Parameter? Identifiability?
A classical misunderstanding
A statistics teacher states
"When the sample size is large, the test statistic is often approximately normally distributed. When the data are normally distributed, the t-test is optimal."
Two applied researchers remember together
"If the sample size is large, I should apply the t-test; if the sample size is small, the non-parametric Wilcoxon rank-sum test is preferred."
"Yes, for a t-test you need at least 30 samples."
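The teacher's statements say nothing about a minimum of 30 samples. If the data really are normal, Student's t-test keeps its nominal type I error even with tiny samples. Below is a minimal base-R simulation sketch. The 5 observations per group and the 4000 replicates are arbitrary illustrative choices.

```r
# Sketch: empirical type I error of Student's two-sample t-test
# with normally distributed data and only 5 observations per group.
set.seed(1)
n_per_group <- 5      # deliberately far below the "30 samples" folklore
n_rep <- 4000         # number of simulated data sets
p_values <- replicate(n_rep, {
  a <- rnorm(n_per_group)          # both groups from N(0, 1), so H0 is true
  b <- rnorm(n_per_group)
  t.test(a, b, var.equal = TRUE)$p.value
})
type1_error <- mean(p_values < 0.05)
print(type1_error)  # should be close to the nominal 0.05
```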
Family traditions
Target audience [1]
"Our coverage is not intended for highly skilled practitioners; rather, we target teachers, students and working epidemiologists who would like to do better with data analysis, but who lack resources such as R programming skills or a bona fide modelling expert committed to their project."
[1] Greenland, Daniel, Pearce. Int J Epidemiol. Apr; 45(2).
Still Greenland et al.
"Throughout, we assume that we are applying a conventional risk or rate regression model (e.g. logistic, Cox or Poisson regression) to estimate the effects of an exposure variable X on the distribution of a disease variable Y while controlling for other variables. The other variables include forced variables, such as age and sex, which we may always want to control, and may also include unforced variables about which we are unsure whether to control."
Statistical modelling: the generalized linear model
Here is a (favourite) multiple regression model [2]:
$g(Y) = \alpha + \beta X + \gamma Z$
Some spoilsport thinks that the model gets better when Z is removed:
$g(Y) = \tilde\alpha + \tilde\beta X$
[2] g is a non-linear link function, such as the logit, applied to the conditional mean of Y.
Tradition: X
Step 1. Use the data to test the crude null hypothesis $H_0: \tilde\beta = 0$.
Step 2. Use the same data to test the adjusted null hypothesis $H_0: \beta = 0$.
Step 3. Depends on the results of Steps 1 and 2:
Case crude p < 0.05 and adjusted p < 0.05: X is an independent predictor.
Case crude p < 0.05 and adjusted p > 0.05: X reflects other variables.
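To make the recipe concrete, it can be written out in a few lines of base R. This is a sketch on hypothetical simulated data. The effect sizes and the names `x`, `y` and `z` are assumptions. The third branch is added only because the slide lists just two of the possible cases. Note that the two p-values test the coefficient of X in two different models.

```r
# Sketch of the traditional crude-versus-adjusted recipe for X,
# written out only to make the procedure explicit (not a recommendation).
set.seed(2)
n <- 500
z <- rnorm(n)                                        # hypothetical covariate Z
x <- rbinom(n, 1, plogis(0.5 * z))                   # exposure X, associated with Z
y <- rbinom(n, 1, plogis(-0.5 + 0.4 * x + 0.8 * z))  # binary outcome Y

crude    <- glm(y ~ x,     family = binomial)  # model without Z
adjusted <- glm(y ~ x + z, family = binomial)  # model with Z

p_crude    <- summary(crude)$coefficients["x", "Pr(>|z|)"]
p_adjusted <- summary(adjusted)$coefficients["x", "Pr(>|z|)"]

# Step 3 of the tradition: a verbal label driven by two p-values
label <- if (p_crude < 0.05 && p_adjusted < 0.05) {
  "X is an independent predictor"
} else if (p_crude < 0.05 && p_adjusted >= 0.05) {
  "X reflects other variables"
} else {
  "case not covered by the tradition"
}
print(c(p_crude = p_crude, p_adjusted = p_adjusted))
print(label)
```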
How would an applied statistician with a strong mathematical foundation react to this?
What is the question?
Tradition: Z
Step 1. Use the data to test the null hypothesis $H_0: \gamma = 0$.
Step 2. Depends on the result of Step 1:
Case p < 0.05: use the same data to fit this model: $g(Y) = \alpha + \beta X + \gamma Z$
Case p > 0.05: use the same data to fit that model: $g(Y) = \tilde\alpha + \tilde\beta X$
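The following repeated-sampling sketch applies this pre-test recipe under assumed values: randomized X, a weak effect of Z, and n = 300. Depending on the pre-test, the reported coefficient of X comes from one model or the other. Across samples, the reported coefficient therefore mixes two different parameters.

```r
# Sketch: the "test gamma first" tradition applied to repeated samples.
set.seed(3)
one_run <- function(n = 300) {
  z <- rnorm(n)
  x <- rbinom(n, 1, 0.5)                               # randomized exposure
  y <- rbinom(n, 1, plogis(log(2) * x + 0.2 * z))      # weak effect of Z
  full <- glm(y ~ x + z, family = binomial)
  p_gamma <- summary(full)$coefficients["z", "Pr(>|z|)"]
  # keep Z only if its p-value is below 0.05, otherwise refit without Z
  chosen <- if (p_gamma < 0.05) full else glm(y ~ x, family = binomial)
  c(kept_z = as.numeric(p_gamma < 0.05), beta_x = unname(coef(chosen)["x"]))
}
res <- t(replicate(500, one_run()))
mean(res[, "kept_z"])                           # share of samples in which Z is kept
tapply(res[, "beta_x"], res[, "kept_z"], mean)  # mean reported log odds ratio per branch
```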
How would an applied statistician with a strong mathematical foundation react to this?
What is the parameter of interest: $\beta$ or $\tilde\beta$?
Frank E Harrell Jr [3] about stepwise variable selection:
1. Stepwise selection yields R² values that are biased high.
2. The ordinary F and χ² test statistics do not have …
It yields P-values that are too small … and the proper correction for them is a very difficult problem.
5. It provides regression coefficients that are biased high in absolute value and need shrinkage.
…
It allows us to not think about the problem.
[3] Regression Modeling Strategies (Springer, pages 56-57)
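Item 5 can be illustrated with a one-variable sketch of selection on significance. This is not a full stepwise algorithm, and the true effect (0.1) and sample size (50) are arbitrary assumptions. On average, estimates that survive a p < 0.05 filter are far larger in absolute value than the true coefficient.

```r
# Sketch: selecting on statistical significance inflates coefficients.
# One true predictor with a small effect; an estimate is "kept" only when p < 0.05.
set.seed(4)
true_beta <- 0.1
n <- 50
estimates <- replicate(2000, {
  x <- rnorm(n)
  y <- true_beta * x + rnorm(n)
  coefs <- summary(lm(y ~ x))$coefficients
  c(beta = coefs["x", "Estimate"], p = coefs["x", "Pr(>|t|)"])
})
selected <- estimates["beta", estimates["p", ] < 0.05]
mean(estimates["beta", ])   # all estimates: close to the true 0.1
mean(abs(selected))         # estimates that survived selection: much larger
```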
Hauck et al. [4] about the two rival models
$g(Y) = \tilde\alpha + \tilde\beta X \quad (1)$
$g(Y) = \alpha + \beta X + \gamma Z \quad (2)$
"What makes model (2) the correct one? Why not model (1)? The issue is not one of bias! Rather, models (1) and (2) are estimating different measures of treatment effect. Thus, there is no reason for them to yield the same estimates."
[4] Controlled Clinical Trials 19 (1998)
Tradition: X and Z
Step 1. Use expert knowledge, a DAG and/or the literature (but not the data, phew!) to find the known predictors Z and whether they are real confounders: Z → X?
Step 2. Depends on the result of Step 1:
Case Z → X: fit this model: $g(Y) = \alpha + \beta X + \gamma Z$
Case Z ↛ X: fit that model: $g(Y) = \tilde\alpha + \tilde\beta X$
How would an applied statistician with a strong mathematical foundation react to this?
Well, this depends on g.
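This sketch shows why the answer depends on g. It uses simulated data in which X is independent of Z, as in a randomized trial. With the identity link, dropping Z leaves the coefficient of X essentially unchanged. With the logit link it does not, because the odds ratio is not collapsible. The effect sizes are assumptions, chosen to resemble the lava demonstration of the attenuation effect.

```r
# Sketch: omitting a covariate Z that is independent of X leaves the
# coefficient of X unchanged with the identity link, but not with the logit link.
set.seed(5)
n <- 20000
x <- rbinom(n, 1, 0.5)     # randomized exposure, independent of Z
z <- rnorm(n)

# Identity link (linear regression) with a continuous outcome
y_lin <- 1 * x + 1 * z + rnorm(n)
b_lin_crude <- unname(coef(lm(y_lin ~ x))["x"])
b_lin_adj   <- unname(coef(lm(y_lin ~ x + z))["x"])

# Logit link with a binary outcome and the same structure
y_bin <- rbinom(n, 1, plogis(log(3) * x + log(3) * z))
b_log_crude <- unname(coef(glm(y_bin ~ x, family = binomial))["x"])
b_log_adj   <- unname(coef(glm(y_bin ~ x + z, family = binomial))["x"])

round(c(identity_crude = b_lin_crude, identity_adjusted = b_lin_adj,
        logit_crude = b_log_crude, logit_adjusted = b_log_adj), 3)
```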
Principal criteria
Classical criterion: A covariate is a confounder if it is associated with the exposure and, causally, with the outcome.
Operational criterion: A covariate is a confounder if the estimate of the exposure effect is changed by inclusion of the covariate.
Mavericks [5]
A maverick is a covariate that satisfies the operational but not the classical criterion.
(Figure: causal diagram with nodes Y, X and Z.)
[5] Hauck, Neuhaus, Kalbfleisch, Anderson. J Clin Epidemiol Vol. 44, No. 1.
Hauck, Neuhaus, Kalbfleisch, Anderson
Rule 1: The effect of omitting a maverick is to bias the odds ratio [6] towards no effect.
Rule 2: The magnitude of the bias caused by omitting a maverick increases with the variance of the omitted maverick and with the magnitude of the effect of the maverick on the outcome.
…
Rule 5: Tests of the hypothesis of no exposure-outcome association remain valid when a maverick is omitted.
[6] The same holds for the hazard ratio in Cox regression.
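Rules 1 and 2 can be checked numerically without simulation. The sketch assumes a normally distributed maverick Z that is independent of X. It computes the crude odds ratio of X implied by the conditional logistic model by integrating the risk over Z with base R's `integrate`. The parameter values are illustrative.

```r
# Sketch: crude (marginal) odds ratio of X implied by the conditional model
# logit P(Y = 1 | X, Z) = alpha + beta_x * X + gamma_z * Z,
# with Z ~ N(0, sd_z^2) independent of X (so Z is a maverick).
marginal_or <- function(beta_x, gamma_z, sd_z, alpha = 0) {
  risk <- function(x) {
    if (sd_z == 0) return(plogis(alpha + beta_x * x))
    integrate(function(z) plogis(alpha + beta_x * x + gamma_z * z) * dnorm(z, sd = sd_z),
              lower = -Inf, upper = Inf, rel.tol = 1e-8)$value
  }
  p1 <- risk(1)   # P(Y = 1 | X = 1), averaged over Z
  p0 <- risk(0)   # P(Y = 1 | X = 0), averaged over Z
  (p1 / (1 - p1)) / (p0 / (1 - p0))
}

beta_x <- log(3)   # conditional odds ratio 3
# Rule 2: attenuation grows with the variance of the omitted maverick ...
or_by_sd <- sapply(c(0, 0.5, 1, 2),
                   function(s) marginal_or(beta_x, gamma_z = log(3), sd_z = s))
# ... and with the size of its effect on the outcome
or_by_gamma <- sapply(c(0, log(2), log(3), log(6)),
                      function(g) marginal_or(beta_x, gamma_z = g, sd_z = 1))
round(or_by_sd, 2)     # starts at 3, then decreases towards (but stays above) 1
round(or_by_gamma, 2)  # starts at 3, then decreases
```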
Demonstration of the attenuation effect
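library(lava)
m <- lvm()                                          # empty latent variable model
distribution(m, ~ Y + X) <- binomial.lvm("logit")   # Y and X binary (logit link); Z stays Gaussian
regression(m, Y ~ X + Z) <- c(log(3), log(3))       # true conditional odds ratios: 3 for X, 3 for Z
d <- sim(m, n = 50000)
model1 <- glm(Y ~ X, data = d, family = binomial)       # crude model
model2 <- glm(Y ~ X + Z, data = d, family = binomial)   # adjusted model

Variable   Model 1: OddsRatio   CI.95         Model 2: OddsRatio   CI.95
X          2.40                 [2.31;2.49]   3.03                 [2.91;3.16]
Z                                             3.04                 [2.96;3.12]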
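library(lava)
m <- lvm()                                          # empty latent variable model
distribution(m, ~ Y + X) <- binomial.lvm("logit")   # Y and X binary (logit link); Z stays Gaussian
regression(m, Y ~ X + Z) <- c(log(2), log(.5))      # true conditional odds ratios: 2 for X, 0.5 for Z
d <- sim(m, n = 50000)
model1 <- glm(Y ~ X, data = d, family = binomial)       # crude model
model2 <- glm(Y ~ X + Z, data = d, family = binomial)   # adjusted model

Variable   Model 1: OddsRatio   CI.95         Model 2: OddsRatio   CI.95
X          1.89                 [1.82;1.96]   2.04                 [1.96;2.12]
Z                                             0.50                 [0.49;0.51]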
A new recipe
(Breaking with flawed) regression modelling traditions
Van der Laan & Rose: "It should not be overlooked that the process of looking at the data, examining coefficient p-values, and trying multiple statistical models is not only incredibly prevalent but is taught to students learning statistics."
The superlearner recipe
Step 1. Ask the research question.
Step 2. Define the parameter of interest without specifying the rest of the model.
Step 3. Estimate the parameter; use cross-validation to distinguish between alternative estimators of the same parameter.
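Step 3 can be sketched in base R by comparing candidate prediction models for the same outcome with 10-fold cross-validation. Each candidate is scored by the Brier score, using the same folds for every candidate. The two candidate formulas and the simulated data are assumptions. An actual super learner typically uses a much larger library of learners and may combine them rather than pick between two.

```r
# Sketch: K-fold cross-validation comparing two candidate logistic models
# for a binary outcome, scored by the Brier score (mean squared prediction error).
set.seed(7)
n <- 2000
z <- rnorm(n)
x <- rbinom(n, 1, plogis(0.4 * z))
y <- rbinom(n, 1, plogis(-0.5 + log(2) * x + 0.8 * z - 0.4 * z^2))
dat <- data.frame(y, x, z)

# Two hypothetical candidate prediction models
candidates <- list(
  linear_z    = y ~ x + z,
  quadratic_z = y ~ x + z + I(z^2)
)

# The same random fold labels for every candidate, so the comparison is paired
k <- 10
folds <- sample(rep(seq_len(k), length.out = n))

cv_brier <- function(formula, data, folds) {
  sq_err <- numeric(nrow(data))
  for (f in unique(folds)) {
    test <- folds == f
    fit <- glm(formula, data = data[!test, ], family = binomial)
    pred <- predict(fit, newdata = data[test, ], type = "response")
    sq_err[test] <- (data$y[test] - pred)^2
  }
  mean(sq_err)
}

scores <- sapply(candidates, cv_brier, data = dat, folds = folds)
round(scores, 4)          # lower Brier score is better
names(which.min(scores))  # the candidate the data would pick
```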
Example
Research question: Is there a difference in the 30-day survival chances of cardiac arrest patients who received bystander CPR compared to cardiac arrest patients who did not receive bystander CPR?
Target parameter: $\theta = P(Y_{30} = 1 \mid X = 1) - P(Y_{30} = 1 \mid X = 0)$
Note: it is not specified whether or how Z should be included.
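Two alternative estimators [7]
Estimator 1 (model $Y_{30} \sim X$):
$\hat\theta_1 = \frac{1}{n_1}\sum_{i: X_i = 1} \operatorname{expit}(\hat\alpha + \hat\beta) - \frac{1}{n_0}\sum_{i: X_i = 0} \operatorname{expit}(\hat\alpha)$
Estimator 2 (model $Y_{30} \sim X + Z$):
$\hat\theta_2 = \frac{1}{n_1}\sum_{i: X_i = 1} \operatorname{expit}(\hat\alpha + \hat\beta + \hat\gamma Z_i) - \frac{1}{n_0}\sum_{i: X_i = 0} \operatorname{expit}(\hat\alpha + \hat\gamma Z_i)$
BUT: possibly comparing different distributions of Z.
[7] $Z_i$ are demographics, comorbidities and other risk factors; $n_1$ and $n_0$ are the numbers of patients with $X_i = 1$ and $X_i = 0$.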
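Here is a base-R sketch of both estimators on hypothetical simulated data. With the canonical logit link and X in the model, the maximum likelihood score equations force the fitted probabilities to average to the observed proportion within each exposure group. Both estimators therefore reproduce the crude risk difference, up to numerical tolerance. Adding Z changes the model, but each group is still averaged over its own distribution of Z.

```r
# Sketch: the two estimators of theta, on simulated data.
set.seed(8)
n <- 5000
z <- rnorm(n)                                    # hypothetical risk-factor score Z
x <- rbinom(n, 1, plogis(0.7 * z))               # exposure depends on Z
y <- rbinom(n, 1, plogis(-1 + 0.5 * x + 1 * z))  # hypothetical 30-day survival indicator

fit1 <- glm(y ~ x,     family = binomial)
fit2 <- glm(y ~ x + z, family = binomial)

# Average the fitted probabilities within the observed exposure groups
theta1 <- mean(fitted(fit1)[x == 1]) - mean(fitted(fit1)[x == 0])
theta2 <- mean(fitted(fit2)[x == 1]) - mean(fitted(fit2)[x == 0])
crude  <- mean(y[x == 1]) - mean(y[x == 0])

c(theta1 = theta1, theta2 = theta2, crude = crude)  # agree up to numerical tolerance
```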
Example (continued)
Research question: Is there a difference in the 30-day survival chances between a cardiac arrest patient who received bystander CPR and a cardiac arrest patient who did not receive bystander CPR, when both patients have the same demographics, comorbidities and other risk factors?
Target parameter: $\theta_z = P(Y_{30} = 1 \mid X = 1, Z = z) - P(Y_{30} = 1 \mid X = 0, Z = z)$
Note: the result depends on z.
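The sketch below evaluates $\theta_z$ at a few values of z. It uses a logistic model without an X-Z interaction, fitted to hypothetical simulated data. Because the link is non-linear, the risk difference still changes with z.

```r
# Sketch: conditional risk difference theta_z from a main-effects logistic model.
set.seed(9)
n <- 5000
z <- rnorm(n)
x <- rbinom(n, 1, plogis(0.7 * z))
y <- rbinom(n, 1, plogis(-1 + 0.5 * x + 1 * z))
fit <- glm(y ~ x + z, family = binomial)

z_values <- c(-2, 0, 2)
# predicted risk with X = 1 minus predicted risk with X = 0, at each z
theta_z <- predict(fit, newdata = data.frame(x = 1, z = z_values), type = "response") -
           predict(fit, newdata = data.frame(x = 0, z = z_values), type = "response")
names(theta_z) <- paste0("z=", z_values)
round(theta_z, 3)
```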
Example (continued)
Research question: What is the average causal effect of bystander CPR on the 30-day survival chances in cardiac arrest patients?
Target parameter: $\theta = E_Z\{P(Y_{30} = 1 \mid do(X = 1), Z)\} - E_Z\{P(Y_{30} = 1 \mid do(X = 0), Z)\}$
Note: the result is an average across Z and does not depend on z.
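Three alternative (G-formula) estimators [8]
Estimator 1: $\frac{1}{n}\sum_{i=1}^{n} \{\operatorname{expit}(\hat\alpha + \hat\beta) - \operatorname{expit}(\hat\alpha)\}$
Estimator 2: $\frac{1}{n}\sum_{i=1}^{n} \{\operatorname{expit}(\hat\alpha + \hat\beta + \hat\gamma Z_i) - \operatorname{expit}(\hat\alpha + \hat\gamma Z_i)\}$
Estimator 3: $\frac{1}{n}\sum_{i=1}^{n} \{\operatorname{expit}(\hat\alpha + \hat\beta + \hat\gamma_1 Z_i + \hat\gamma_2 Z_i^2) - \operatorname{expit}(\hat\alpha + \hat\gamma_1 Z_i + \hat\gamma_2 Z_i^2)\}$
[8] Note: the estimates rely on different prediction models.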
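The three estimators can also be sketched in base R. The idea is to predict every patient's risk under X = 1 and under X = 0 and then average the differences. This gives point estimates only; lava's `estimate` additionally provides standard errors. The simulated data are an assumption that loosely mimics the lava example, with a smaller n.

```r
# Sketch: G-formula (standardization) in base R. Each fitted model predicts
# every patient's risk under X = 1 and under X = 0; the estimate is the
# mean difference over all n patients.
set.seed(10)
n <- 5000
z <- rnorm(n)
x <- rbinom(n, 1, plogis(log(1.5) * z))
y <- rbinom(n, 1, plogis(log(3) * x + log(3) * z))
dat <- data.frame(y, x, z)

g_formula <- function(fit, data) {
  d1 <- data; d1$x <- 1      # everybody exposed
  d0 <- data; d0$x <- 0      # nobody exposed
  mean(predict(fit, newdata = d1, type = "response") -
       predict(fit, newdata = d0, type = "response"))
}

fits <- list(
  estimator1 = glm(y ~ x,              data = dat, family = binomial),
  estimator2 = glm(y ~ x + z,          data = dat, family = binomial),
  estimator3 = glm(y ~ x + z + I(z^2), data = dat, family = binomial)
)
estimates <- sapply(fits, g_formula, data = dat)
round(estimates, 3)
```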
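Lava's g-formula (model1)
library(lava)
m <- lvm()
distribution(m, ~ Y + X) <- binomial.lvm("logit")
regression(m, Y ~ X + Z) <- c(log(3), log(3))
regression(m, X ~ Z) <- log(1.5)   # Z also affects the exposure X (confounding)
set.seed(18)
d <- sim(m, n = 50000)
model1 <- glm(Y ~ X, data = d, family = binomial())
estimate(model1, function(p, data) {
  a <- p["(Intercept)"]
  b <- p["X"]
  R.X1 <- expit(a + b)   # predicted risk if exposed
  R.X0 <- expit(a)       # predicted risk if unexposed
  list(riskdiff = R.X1 - R.X0)
}, average = TRUE)       # average the risk difference over the data
         Estimate Std.Err 2.5% 97.5% P-value
riskdiff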
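Lava's g-formula (model2)
library(lava)
m <- lvm()
distribution(m, ~ Y + X) <- binomial.lvm("logit")
regression(m, Y ~ X + Z) <- c(log(3), log(3))
regression(m, X ~ Z) <- log(1.5)   # Z also affects the exposure X (confounding)
set.seed(18)
d <- sim(m, n = 50000)
model2 <- glm(Y ~ X + Z, data = d, family = binomial())
estimate(model2, function(p, data) {
  a <- p["(Intercept)"]
  b <- p["X"]
  g <- p["Z"]
  R.X1 <- expit(a + b + g * data[, "Z"])   # each patient's risk if exposed
  R.X0 <- expit(a + g * data[, "Z"])       # each patient's risk if unexposed
  list(riskdiff = R.X1 - R.X0)
}, average = TRUE)                         # average over the distribution of Z
         Estimate Std.Err 2.5% 97.5% P-value
riskdiff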
Lava's g-formula (model3)
library(lava)
m <- lvm()
distribution(m, ~ Y + X) <- binomial.lvm("logit")
regression(m, Y ~ X + Z) <- c(log(3), log(3))
regression(m, X ~ Z) <- log(1.5)   # Z also affects the exposure X (confounding)
set.seed(18)
d <- sim(m, n = 50000)
d$Q <- d$Z^2                       # quadratic term for Z
model3 <- glm(Y ~ X + Z + Q, data = d, family = binomial())
estimate(model3, function(p, data) {
  a <- p["(Intercept)"]
  b <- p["X"]
  g1 <- p["Z"]
  g2 <- p["Q"]
  R.X1 <- expit(a + b + g1 * data[, "Z"] + g2 * data[, "Q"])   # risk if exposed
  R.X0 <- expit(a + g1 * data[, "Z"] + g2 * data[, "Q"])       # risk if unexposed
  list(riskdiff = R.X1 - R.X0)
}, average = TRUE)
         Estimate Std.Err 2.5% 97.5% P-value
riskdiff
Conclusions
The current tradition: There is hardly any mathematical theory to back up some of the very prevalent strategies (traditions). We should STOP teaching backward elimination.
Attenuation effect: The scale on which we measure effects matters. Randomization does not prevent bias in logistic regression and Cox regression on the odds ratio and hazard ratio scales. But the direction of the bias is known, and the variance is reduced when all mavericks are omitted.
Lava: simulate data that resemble the real study data, and study the performance (bias and variance) of a modelling strategy under controlled conditions.
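Here is a minimal base-R sketch of such a simulation study. It repeats the data generation, applies two modelling strategies, and summarizes bias and variability. The set-up (randomized X, a maverick Z, n = 400, 400 replicates) is an assumption. The bias row is computed against the conditional log odds ratio log(3), so the crude column shows the attenuation. The empirical SD row lets one check the variance claim.

```r
# Sketch: a small simulation study of two modelling strategies under
# controlled conditions (randomized X, maverick Z, binary Y).
set.seed(11)
one_study <- function(n = 400) {
  x <- rbinom(n, 1, 0.5)
  z <- rnorm(n)
  y <- rbinom(n, 1, plogis(log(3) * x + log(3) * z))
  c(crude    = unname(coef(glm(y ~ x,     family = binomial))["x"]),
    adjusted = unname(coef(glm(y ~ x + z, family = binomial))["x"]))
}
est <- t(replicate(400, one_study()))   # one row per simulated study
summary_table <- rbind(
  mean_log_or         = colMeans(est),
  bias_vs_conditional = colMeans(est) - log(3),
  empirical_sd        = apply(est, 2, sd)
)
round(summary_table, 3)
```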
Discussion of the new recipe
The subject matter perspective matters when we define what to estimate (average effect versus prediction).
Put in several models/methods, including machine learning and experts' opinion, and let the data decide.
The superlearner recipe has a built-in validation mechanism.
Many possible confounders: the more covariates $Z_1, \ldots, Z_p$, the more possibilities to model them.
Many exposure variables (AKA: What are the predictors?): You should know about the Table 2 fallacy [9] and possibly treat each of $X_1, \ldots, X_p$ separately using the new recipe, or make a prediction model.
[9] Westreich & Greenland. Am J Epidemiol. 2013;177(4).
Antiparsimony principle [10]
Models should be rich enough to reflect the complexity of the relations under study.
Countervailing principle: you cannot estimate anything if you try to estimate everything.
[10] Greenland (2000), citing L. J. Savage
More informationTECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
More informationAsymptotic equivalence of paired Hotelling test and conditional logistic regression
Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationAssignment 2: K-Nearest Neighbors and Logistic Regression
Assignment 2: K-Nearest Neighbors and Logistic Regression SDS293 - Machine Learning Due: 4 Oct 2017 by 11:59pm Conceptual Exercises 4.4 parts a-d (p. 168-169 ISLR) When the number of features p is large,
More informationINTERVAL ESTIMATION AND HYPOTHESES TESTING
INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,
More informationChapter 11. Correlation and Regression
Chapter 11. Correlation and Regression The word correlation is used in everyday life to denote some form of association. We might say that we have noticed a correlation between foggy days and attacks of
More informationAdvanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1
Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()
More informationSTA6938-Logistic Regression Model
Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of
More informationMachine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on
Machine Learning Module 3-4: Regression and Survival Analysis Day 2, 9.00 16.00 Asst. Prof. Dr. Santitham Prom-on Department of Computer Engineering, Faculty of Engineering King Mongkut s University of
More informationPropensity Score Methods for Causal Inference
John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationSubgroup analysis using regression modeling multiple regression. Aeilko H Zwinderman
Subgroup analysis using regression modeling multiple regression Aeilko H Zwinderman who has unusual large response? Is such occurrence associated with subgroups of patients? such question is hypothesis-generating:
More informationBias Variance Trade-off
Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationTABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1
TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1 1.1 The Probability Model...1 1.2 Finite Discrete Models with Equally Likely Outcomes...5 1.2.1 Tree Diagrams...6 1.2.2 The Multiplication Principle...8
More informationLecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017
Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction
More informationCAUSAL INFERENCE IN THE EMPIRICAL SCIENCES. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea)
CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea) OUTLINE Inference: Statistical vs. Causal distinctions and mental barriers Formal semantics
More informationAnswers to Problem Set #4
Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationSigmaplot di Systat Software
Sigmaplot di Systat Software SigmaPlot Has Extensive Statistical Analysis Features SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationAdvanced Statistics I : Gaussian Linear Model (and beyond)
Advanced Statistics I : Gaussian Linear Model (and beyond) Aurélien Garivier CNRS / Telecom ParisTech Centrale Outline One and Two-Sample Statistics Linear Gaussian Model Model Reduction and model Selection
More informationDimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationClinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,
More information