Biost 536 / Epi 536 Categorical Data Analysis in Epidemiology

1 Lecture 9: Inference with Complex Modeling of POI
November 4, 2014
Biost 536 / Epi 536 Categorical Data Analysis in Epidemiology
Scott S. Emerson, M.D., Ph.D.
Professor of Biostatistics, University of Washington
Categorical Data Analysis, AUT

2 Lecture Outline
- Modeling scientific question
- Complex Modeling of POI
- Effect Modification

3 Modeling Predictor of Interest

4 Uses of Regression
- Modeling questions about associations
  - Defining contrasts for associations between response and POI
  - Defining contrasts for detecting effect modification
- Enabling comparisons across POI groups that are otherwise similar with respect to other variables
  - Adjusting for confounding
  - Adjusting to gain precision

5 Borrowing Information
- Use other groups to make estimates in groups with sparse data
  - Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds
  - Assuming a straight line relationship in modeled covariates tells us how to adjust data from other (even more distant) age groups
  - If we do not know about the exact functional relationship, we might want to borrow information only close to each group

6 Defining Contrasts
- Define a comparison across groups to use when answering scientific question
  - If straight line relationship in parameter, slope for POI compares parameter between groups differing by 1 unit in X when all other covariates in model are equal
  - If nonlinear relationship in parameter, slope is average comparison of parameter between groups differing by 1 unit in X holding covariates constant
  - Statistical jargon: a contrast across the groups
- If multiple regression predictors model the POI, interpretation of the contrast is more difficult

7 Statistical Questions: Classification
1. Clustering of observations
   Perhaps into groups that might be different diseases
2. Clustering of variables
   Perhaps into groups representing biochemical pathways
3. Quantification of distributions
   Perhaps reporting mean life expectancy after diagnosis
4. Comparing distributions
   Perhaps investigating associations between variables
5. Prediction of individual observations
   Perhaps diagnosing disease or estimating kidney function

8 Investigating Associations
Our scientific questions can be at many different levels of detail
1. Is there an association?
2. What is the general (first order) trend in Y with higher X?
3. Is there a nonlinear trend in the association?
4. Is the general trend a particular shape?
   Increasing exponentially? Increasing to a threshold? Constant then decreasing? U-shaped? S-shaped?
5. What is the association at particular levels of X?
   E.g., What is the difference in odds of mortality between subjects with LDL of 160 and 161 mg/dl?
Any questions can be about associations independent of other mechanisms (i.e., adjusted for potential confounding)

9 Stratification vs Regression
- Generally, any stratified analysis could be performed as a regression model
  - Stratification adjusts for covariates and all interactions among those covariates
    E.g., sex, race, and the sex-race interaction
  - Any covariates modeled in each stratum's analysis would have to be modeled as interactions
    E.g., sex stratified analyses of response adjusted for age could be modeled in an unstratified analysis with sex-age interaction
- Our habit in regression is to just adjust for the covariates (the "main effect"), and consider interactions less often

10 Complex Modeling Predictor of Interest
Univariate Transformations

12 Transformations of Predictors
- We transform predictors to provide more flexible description of complex associations between the response and some scientific measure
  - Threshold effects
  - Exponentially increasing effects
  - U-shaped functions
  - S-shaped functions
  - etc.
- Sometimes we transform to gain precision about specific questions
  - Dummy variables
  - Departures from linearity

13 Common Choices
- Nominal variables
  - Dichotomized
  - Dummy variables
- Ordered categorical (possibly categorized continuous)
  - Dichotomized
  - Dummy variables
  - Grouped linear (possibly with nonlinear scores)
- Continuous
  - Linear
  - Univariate transformations (e.g., log, square, ...)
  - Multivariate transformations
    - Polynomial
    - Linear splines (or cubic splines, ...)

14 Examples: Patterns of Predicted Values
- It is useful to consider how the different choices model the response
- We consider the predicted values that might be obtained from regression models of 4 year CVD mortality on age or systolic BP
- It is most interpretable to consider what the models say about the probability of an event
  - But RR and OR have nonlinear link functions
  - This makes the form of some predicted relationships differ from what a naïve user might consider
  - E.g., linear models on the log odds are not linear on the probability
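The last point can be checked numerically. The sketch below is only an illustration, not the lecture's fitted model: the intercept and slope are hypothetical values, chosen to show that equal steps in age on the log odds scale give unequal steps on the probability scale.

```python
import math

def expit(log_odds):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients (assumptions for illustration, not estimates from the lecture data)
b0, b1 = -9.0, 0.1          # log odds = b0 + b1 * age

ages = [65, 75, 85, 95]
probs = [expit(b0 + b1 * a) for a in ages]

# Successive 10-year steps: constant on the log odds scale, not on the probability scale
logodds_steps = [b1 * (ages[i + 1] - ages[i]) for i in range(3)]
prob_steps = [probs[i + 1] - probs[i] for i in range(3)]

for a, p in zip(ages, probs):
    print(f"age {a}: Pr(event) = {p:.3f}")
print("log odds steps:", logodds_steps)
print("probability steps:", [round(s, 3) for s in prob_steps])
```

The same linear predictor therefore traces an S-shaped curve in probability, which is why the OR curves in the plots that follow need not look like straight lines.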

15 Example: Linear Age
Continuous
E.g.: logistic deadin4 age
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: age; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

16 Example: Linear SBP
Continuous
E.g.: logistic deadin4 sbp
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: sbp; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

17 Example: Squared Age
Continuous
E.g.: logistic deadin4 agesqr
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: age; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

18 Example: Squared SBP
Continuous
E.g.: logistic deadin4 sbpsqr
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: sbp; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

19 Example: Log Age
Continuous
E.g.: logistic deadin4 logage
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: age; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

20 Example: Log SBP
Continuous
E.g.: logistic deadin4 logsbp
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: sbp; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

21 Comments: Univariate Transformations
- A straight line can approximate many curvilinear functions over a small interval of their domains
- A converse of sorts is also true in statistics:
  If the truth is a straight line, a statistical procedure will estimate the parameters in a way that makes it cover a small interval of the domain

22 Example: Mortality vs AAI (Table)
From Homework #3, we found suggestion of a nonlinear effect

23 Example: Mortality vs AAI (Plots)
- From Homework #3, we found suggestion of a nonlinear effect
- I can use linear splines to mimic a smooth to the data
  (Linear splines can handle the link functions in RR and OR)
[Figure: Fitted Values: Step Function (heptiles); x-axis: aai; curves: RD (linear), RR (Poisson), OR (logistic)]

24 Example: Mortality vs AAI (Linear)
- Linear continuous models might have seemed unusual fits
  - Need to consider preponderance of the data
  - Need to consider impact of link function on possible curves
  - Need to consider influential points relative to link function
[Figure: Fitted Values: Linear Splines, Linear; x-axis: aai; curves: RDspl (linear), RRspl (Poisson), ORspl (logistic), RDlin (linear), RRlin (Poisson), ORlin (logistic)]

25 Example: Mortality vs AAI (Log)
- Linear continuous models might have seemed unusual fits
  - Need to consider preponderance of the data
  - Need to consider impact of link function on possible curves
  - Need to consider influential points relative to link function
[Figure: Fitted Values: Linear Splines, Linear; x-axis: aai; curves: RDspl (linear), RRspl (Poisson), ORspl (logistic), RDlog (linear), RRlog (Poisson), ORlog (logistic)]

26 Comments: Univariate Transformations
- Many univariate transformations are in some sense defined relative to 0 or 1
- If you center the covariates you might get a different picture
  (And because a quadratic function includes the linear term, that in some sense centers the curve)

27 Example: Log (SBP-75)
Continuous
E.g.: logistic deadin4 logsbp75
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: sbp; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

28 Example: Quadratic Age
Continuous
E.g.: logistic deadin4 age agesqr
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: age; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

29 Example: Quadratic SBP
Continuous
E.g.: logistic deadin4 sbp sbpsqr
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: sbp; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

30 Example: Grouped Linear Age
E.g.: logistic deadin4 age5yr
[Figure: predicted values from RD (black), RR (green), OR (blue) regression; x-axis: age; axis labels: Fitted values, Pr(deadin4), Predicted number of events]

31 Interpretation: Identity Link
- With categorical data, we use an identity link with RD regression
  Linear regression on a binary response variable
- When modeling a scientific factor (POI) with a single predictor, interpretation of parameters is straightforward when predictor is
  - Binary: Difference in means between two groups
  - Untransformed continuous: (Average) difference in means between two groups that differ in the predictor by one unit
  - Log transformed continuous: (Average) difference in means between two groups that differ by an e-fold difference in their predictor values
- Interpretation of parameters with other transformations is difficult

32 Identity Link: Log Transformed Predictors
- Suppose we use the natural log transformation
  - An e-fold (e ≈ 2.718) difference in ldl is associated with a .1477 lower probability of death
  - (Could look at log(2) times the regression parameter for doubling)
. regress deadin5 logldl, robust
Linear regression, Number of obs = 725, F( 1, 723) = 8.14
[Output table: robust Coef., Std. Err., t, P>|t|, 95% Conf. Interval for logldl and _cons; Prob > F, R-squared, Root MSE and the table values are not preserved]
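The rescaling mentioned in parentheses is simple arithmetic. A minimal sketch, taking the slide's .1477 lower probability per e-fold difference in ldl as a coefficient of -0.1477 (the sign is inferred from "lower"; the variable names are illustrative):

```python
import math

# Coefficient on logldl implied by the slide: .1477 lower probability per e-fold difference in ldl
beta_per_efold = -0.1477

# Difference in probability of death associated with a doubling of ldl
beta_per_doubling = math.log(2) * beta_per_efold

# The same idea for a 10% higher ldl
beta_per_10pct = math.log(1.1) * beta_per_efold

print(f"per doubling of ldl: {beta_per_doubling:.4f}")
print(f"per 10% higher ldl:  {beta_per_10pct:.4f}")
```

The model is unchanged by this; only the size of the comparison being reported changes.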

33 Interpretation: log Link
- With categorical data, we use a logarithmic link (natural log) with RR (Poisson) and OR (logistic) regression
- When modeling a scientific factor (POI) with a single predictor, interpretation of exponentiated parameters is straightforward when predictor is
  - Binary: RR or OR between two groups
  - Untransformed continuous: (Average) RR or OR between two groups that differ in the predictor by one unit
  - Log transformed continuous: (Average) RR or OR between two groups that differ by an e-fold difference in their predictor values
- Interpretation of parameters with other transformations is difficult

34 Log Link: Log Transformed Predictors
- Suppose we use the natural log transformation
  - An e-fold (e ≈ 2.718) difference in ldl is associated with odds of death only .3705 times as high
  - (Could look at the odds ratio raised to the power log(2) for doubling)
. logistic deadin5 logldl
Logistic regression, Number of obs = 725, LR chi2(1) = 9.26
[Output table: Odds Ratio, Std. Err., z, P>|z|, 95% Conf. Interval for logldl; Prob > chi2, Log likelihood, Pseudo R2 and the table values are not preserved]
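On the log link the same rescaling becomes a power. A short sketch using the odds ratio of .3705 quoted on the slide (variable names are illustrative):

```python
import math

# Odds ratio per e-fold difference in ldl, as reported on the slide
or_per_efold = 0.3705

# Odds ratio for a doubling of ldl: raise the OR to the power log(2) ...
or_per_doubling = or_per_efold ** math.log(2)

# ... which is the same as exponentiating log(2) times the log odds ratio
same_thing = math.exp(math.log(2) * math.log(or_per_efold))

print(f"OR per doubling of ldl: {or_per_doubling:.3f}")
```

Under this model, a doubling of ldl would be associated with odds of death roughly half as high.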

35 Interpretable Estimates
- Sometimes 1 unit is outside the variability of our data
  - E.g., age measured in centuries
  - E.g., log height using natural log (an e-fold difference)
- Other times 1 unit is so small as to seem inconsequential
  - E.g., adult ages measured in days
- The easiest way to deal with this is to transform your predictor to units that are more interesting
  - Use log base 2 or log base 1.1
    E.g., loght = log(height) / log(1.1) (see Supplementary Materials)
  - Rescale to better units
    E.g. replace aai = aai / 0.1
    E.g. replace age = age / 10 (decades)
- In any case, an OR or RR of [...] should usually be regarded as 1 significant digit
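Both rescalings can be sketched in a few lines. The heights and the odds ratio of 1.05 per year below are hypothetical values for illustration only; `log_base` is a helper name introduced here.

```python
import math

def log_base(x, base):
    """Logarithm of x in an arbitrary base, e.g. base 2 or base 1.1."""
    return math.log(x) / math.log(base)

# Two hypothetical heights (cm) that differ by exactly 10%
h1, h2 = 160.0, 176.0
diff_units = log_base(h2, 1.1) - log_base(h1, 1.1)   # one unit = 10% higher height

# Rescaling a continuous predictor: an assumed OR of 1.05 per year of age
or_per_year = 1.05                      # hypothetical value for illustration
or_per_decade = or_per_year ** 10       # what a model using age / 10 would report

print(f"difference in log base 1.1 units: {diff_units:.3f}")
print(f"OR per decade: {or_per_decade:.3f}")
```

With log base 1.1, a one-unit difference in the predictor corresponds to a 10% difference in height, which is usually easier to discuss than an e-fold difference.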

36 Dummy Variables
- Indicator variables for all but one group
- This is the only appropriate way to model nominal (unordered) variables
  - E.g., for marital status create indicator variables for
    - married (married = 1, everything else = 0)
    - widowed (widowed = 1, everything else = 0)
    - divorced (divorced = 1, everything else = 0)
    - (single would then be the intercept)
- Often used for other settings as well
  - Descriptive statistics as in Homework #3
  - But dummy variables ignore order of categories
- Equivalent to Analysis of Variance (ANOVA)
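The marital status coding can be written out explicitly. This is a minimal sketch with made-up observations; `dummy_code` is a hypothetical helper, not something from the course materials.

```python
# Marital status categories from the slide; "single" is the reference group
LEVELS = ["married", "widowed", "divorced"]   # one indicator per non-reference level

def dummy_code(status):
    """Return indicator variables for all but the reference group (single)."""
    return {level: int(status == level) for level in LEVELS}

# Hypothetical observations for illustration
sample = ["single", "married", "divorced", "widowed", "married"]
coded = [dummy_code(s) for s in sample]

for s, row in zip(sample, coded):
    print(f"{s:9s}", row)
```

A single subject has all three indicators equal to 0, so the intercept describes the single group and each coefficient compares one other group with it.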

37 Ex: Interpretation of Slopes
- Based on coding used
  - Intercept corresponds to reference group
  - Slope for other terms compare each group to the reference
- Tests for association must test all predictors together
- Need to be very careful in overinterpreting statistical significance among groups

38 Stata: Dummy Variables
- Stata has a facility to automatically create dummy variables
  - Old way: Prefix regression commands with xi:
  - Now just prefix variables to be modeled as dummy variables with i. (i.varname)
- Example: 4 year mortality rate by ankle:arm ratio of SBP (AAI)
    xtile aai5 = aai, nq(5)
    logistic deadin4 i.aai5
  - Stata will drop the lowest category by default
  - You can choose the category dropped by specifying, for instance, the third interval as the baseline:
    logistic deadin4 ib3.aai5
  - This does not change the predicted values, only the interpretation of individual regression parameters

39 Example: Lowest Interval as Baseline
. logit deadin4 i.aaiq5
Logistic regression, Number of obs = 4879
[Output table: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for the aaiq5 indicators and _cons; LR chi2(4) and the table values are not preserved]

40 Example: Second Interval as Baseline
. logit deadin4 ib2.aaiq5
Logistic regression, Number of obs = 4879
[Output table: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for the aaiq5 indicators and _cons; LR chi2(4) and the table values are not preserved]

41 Example: Third Interval as Baseline
. logit deadin4 ib3.aaiq5
Logistic regression, Number of obs = 4879
[Output table: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for the aaiq5 indicators and _cons; LR chi2(4) and the table values are not preserved]

42 Example: Highest Interval as Baseline
. logit deadin4 ib5.aaiq5
Logistic regression, Number of obs = 4879
[Output table: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for the aaiq5 indicators and _cons; LR chi2(4) and the table values are not preserved]

43 Caveats
- The parameterization does not affect the fitted values
- The parameterization does affect interpretation of regression coefficients
  - E.g., coefficient for second level compares to baseline group
    - Group 2 vs group 1
    - Group 2 vs group 3
    - Group 2 vs group 5
- Interpretation of the p values is very problematic
  - Not adjusted for multiple comparisons
  - And nonsignificant p values do not prove equivalence
  - Were we to believe the validity of the p values
    - We cannot discriminate group 2 from groups 3 or 5 (but 4 is different)
    - We cannot discriminate group 3 from groups 2, 4, or 5
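The first two caveats follow from simple arithmetic on the log odds. In the sketch below the five group log odds are hypothetical numbers, not the lecture's estimates; changing the baseline changes every coefficient but reproduces the same fitted value for each group.

```python
# Hypothetical fitted log odds for five groups (illustrative values, not the lecture's estimates)
log_odds = {1: -1.0, 2: -1.6, 3: -1.9, 4: -2.3, 5: -2.0}

def coefficients(baseline):
    """Intercept and dummy-variable coefficients when `baseline` is the reference group."""
    intercept = log_odds[baseline]
    slopes = {g: lo - intercept for g, lo in log_odds.items() if g != baseline}
    return intercept, slopes

def fitted(baseline):
    """Fitted log odds for every group, rebuilt from one parameterization."""
    intercept, slopes = coefficients(baseline)
    return {g: intercept + slopes.get(g, 0.0) for g in log_odds}

for base in (1, 3, 5):
    _, slopes = coefficients(base)
    print(f"baseline {base}: coefficient for group 2 = {slopes[2]:+.2f}")
```

The coefficient for group 2 changes sign and size with the baseline because it answers a different question each time (group 2 vs group 1, vs group 3, vs group 5).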

44 Multivariate Modeling of POI: Inference
- If multiple modeled covariates involve our POI, we must test for an association by simultaneously testing all such parameters
  - If there are no other covariates in the model, the overall F test or chi square test provides this inference
  - If there are other covariates, we use Stata's test post-estimation command

45 Question
When is it more powerful to model the POI multivariately rather than univariately?

46 Compare Linear vs Dummy Variables
- Suppose the truth is a straight line relationship:
  We can consider an example in logistic regression
[Table: power of the Linear Continuous model vs power of the Dummy Variables model; values not preserved]
- The major loss of power is from the dummy variables ignoring the order of the groups
  Had we used grouped linear, the power was .172, .576, [...]

47 Linear Continuous Models
- Borrows information across groups
  - Accurate, efficient if model is correct
  - If model incorrect, mixes random and systematic error
- Can gain power from ordering of groups in order to detect a trend
- But, no matter how low the standard error is, if there is no trend in the mean, there is no statistical significance

48 Hypothetical Settings
[Figure: four panels of Response vs Dose]
- Linear: Highest Power; ANOVA: High Power
- Linear: Moderate Power; ANOVA: Low Power
- Linear: No Power; ANOVA: High Power
- Linear: No Power; ANOVA: Low Power

49 Perils of Data-Driven Model Selection
Question: Can we use the data to help us decide which form is used to model the association between the POI and response?

50 Perils of Data-Driven Model Selection
Question: Can we use the data to help us decide which form is used to model the association between the POI and response?
Answer: Consider again lecture 8

51 Application to Evaluation of Risk
- We consider a population of candidate hypotheses
- We use scientific studies to diagnose truly beneficial hypotheses
- Use both frequentist and Bayesian optimality criteria
  - Innovator (sponsor):
    - High probability of adopting a true hypothesis (frequentist power)
  - Regulatory:
    - Low probability of adopting false hypotheses (freq type 1 error)
    - High probability that adopted hypotheses are true (posterior probability)
  - Public Health (frequentist sample space, Bayes criteria)
    - Maximize the number of true hypotheses adopted
    - Minimize the number of false hypotheses adopted

52 Frequentist vs Bayesian
- Frequentist and Bayesian inference truly complementary
  - Frequentist: Design so the same data not likely from null / alt
  - Bayesian: Explore updated beliefs based on a range of priors
- Bayes rule tells us that we can parameterize the positive predictive value by the type I error and prevalence
  - Maximize new information by maximizing Bayes factor
  - With simple hypotheses:
\[ \mathrm{PPV} = \frac{\text{power} \times \text{prevalence}}{\text{power} \times \text{prevalence} + \text{type I err} \times (1 - \text{prevalence})} \]
\[ \frac{\mathrm{PPV}}{1 - \mathrm{PPV}} = \frac{\text{power}}{\text{type I err}} \times \frac{\text{prevalence}}{1 - \text{prevalence}} \]
\[ \text{posterior odds} = \text{Bayes Factor} \times \text{prior odds} \]
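These relationships are easy to verify numerically. The sketch below uses hypothetical design values (80% power, type I error 0.05, 10% prevalence of true hypotheses); the function names are introduced here for illustration.

```python
def ppv(power, type1_error, prevalence):
    """Positive predictive value: Pr(hypothesis true | study adopts it)."""
    true_pos = power * prevalence
    false_pos = type1_error * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def posterior_odds(power, type1_error, prevalence):
    """Posterior odds = Bayes factor (power / type I error) x prior odds."""
    bayes_factor = power / type1_error
    prior_odds = prevalence / (1.0 - prevalence)
    return bayes_factor * prior_odds

# Hypothetical design values for illustration
power, alpha, prev = 0.80, 0.05, 0.10
p = ppv(power, alpha, prev)
odds = posterior_odds(power, alpha, prev)
print(f"PPV = {p:.3f}, posterior odds = {odds:.3f}")
```

With these inputs the PPV is 0.64, so even a well-powered study at the 0.05 level would leave about one adopted hypothesis in three false when only 10% of candidate hypotheses are true.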

53 Consider Alternative Models
- We can test each with a type I error of .05
  - Linear continuous
  - Quadratic
  - Dummy variables
- What if we take the lowest of p values from multiple models
  - Linear, quadratic: 0.05 for each goes to [...]
  - Linear, dummy: 0.05 for each goes to [...]
  - All three: 0.05 for each goes to [...]
- Example simulations: With a true linear effect, power increases by less than [...] or [...] or [...]
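The actual inflation depends on how strongly the candidate tests are correlated, which is why it has to be found by simulation. Two simple reference points can still be computed: the rate if the tests were independent, and the Bonferroni upper bound. The sketch below gives only these generic reference values, not the simulation results from the slide.

```python
def bonferroni_bound(alpha, k):
    """Upper bound on Pr(at least one of k tests rejects) under the null."""
    return min(1.0, k * alpha)

def independent_rate(alpha, k):
    """Pr(at least one of k tests rejects) if the k tests were independent."""
    return 1.0 - (1.0 - alpha) ** k

alpha = 0.05
for k in (2, 3):
    print(f"{k} models: independent = {independent_rate(alpha, k):.4f}, "
          f"Bonferroni bound = {bonferroni_bound(alpha, k):.2f}")
```

Tests of alternative models for the same POI on the same data tend to be positively correlated, so the true type I error would typically lie above 0.05 but below the independence figure.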

54 Linear Splines
- Draw straight lines between pre-specified knots
- Model intercept and m+1 variables when using m knots
  - Suppose knots are k_1, ..., k_m for variable X
  - Define variables Spline0, ..., SplineM
  - Spline0 equals
\[ \text{Spline0} = \begin{cases} X & \text{for } X < k_1 \\ k_1 & \text{for } k_1 < X \end{cases} \]
  - Then, for J = 1, ..., m, SplineJ equals (define k_0 = 0, k_{m+1} = \infty)
\[ \text{SplineJ} = \begin{cases} 0 & \text{for } X < k_J \\ X - k_J & \text{for } k_J < X < k_{J+1} \\ k_{J+1} - k_J & \text{for } k_{J+1} < X \end{cases} \]
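The definition translates directly into code. This is a minimal sketch of the construction, not the course's own implementation; `linear_spline_terms` is a name introduced here, and the knots 0.9 and 1.10 are the two-knot example used later in the lecture.

```python
def linear_spline_terms(x, knots):
    """Return [Spline0, ..., SplineM] for value x and sorted knots k_1 < ... < k_m."""
    edges = list(knots) + [float("inf")]           # k_1, ..., k_m, k_{m+1} = infinity
    terms = [min(x, edges[0])]                     # Spline0: X below k_1, else k_1
    for k_lo, k_hi in zip(edges[:-1], edges[1:]):  # SplineJ for J = 1..m
        # 0 below k_J, X - k_J inside the interval, full width k_{J+1} - k_J above it
        terms.append(min(max(x - k_lo, 0.0), k_hi - k_lo))
    return terms

knots = [0.9, 1.1]
for x in (0.5, 1.0, 1.3):
    print(x, [round(t, 3) for t in linear_spline_terms(x, knots)])
```

Each variable records how far X has travelled through its own interval, so each regression coefficient is the slope within that interval.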

55 Linear splines in 7 intervals
4 Year Mortality vs AAI
Logistic regression, Number of obs = 4879
[Output table: OR, Std. Err., z, P>|z|, 95% Conf. Interval for the seven saai spline terms; LR chi2(7) and the table values are not preserved]

57 Linear Splines: Parameter Interpretation
- With identity link
  - Intercept β_0: θ_{Y|X} when X = 0
  - Slope parameters β_j: Estimated difference in θ_{Y|X} between two groups both between the same knots but differing by 1 unit in X
    (May want to make sure that interval contains 1 unit)
- With log link
  - Exponentiated intercept exp(β_0): θ_{Y|X} when X = 0
  - Exponentiated slope parameters exp(β_j): Estimated ratio of θ_{Y|X} between two groups both between the same knots but differing by 1 unit in X
- Tests for association must test all predictors together

60 Testing Nonlinearity
- After answering questions about an association, we may move on to more detailed questions
  - In an independent sample, or
  - Protected by gate-keeping procedures in order to avoid inflating the type 1 error
    Only perform the more detailed test if the primary question was statistically significant
- Generally we can test for nonlinearity by
  - Fitting a model that includes a linear term and other term(s)
  - Then testing whether the other terms are (jointly) statistically significant
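As one concrete instance, with the log odds as the modeled parameter and a squared term as the only added term, the model and null hypothesis could be written as follows. The symbols β and γ are notation assumed here for illustration.

```latex
\[
\operatorname{logit} \Pr(Y = 1 \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2,
\qquad H_0 : \beta_2 = 0 .
\]
% With p added terms g_1(X), ..., g_p(X) (e.g., dummy variables), the test is joint:
\[
\operatorname{logit} \Pr(Y = 1 \mid X) = \beta_0 + \beta_1 X + \sum_{j=1}^{p} \gamma_j \, g_j(X),
\qquad H_0 : \gamma_1 = \cdots = \gamma_p = 0 .
\]
```

Under the null hypothesis the model reduces to the straight line in X, so rejecting it is evidence of a departure from linearity on the scale of the link function.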

61 Example: From Homework #3 - RD
- Using a quadratic model
  We test the squared term
. regress deadin4 aai aaisqr, robust
Linear regression, Number of obs = 4879
[Output table: robust Coef., Std. Err., t, P>|t|, 95% Conf. Interval for aai, aaisqr, _cons; F( 2, 4876) and the table values are not preserved]

62 Example: From Homework #3 - OR
- Using a quadratic model
  We test the squared term
. logistic deadin4 aai aaisqr
Logistic regression, Number of obs = 4879
[Output table: OR, Std. Err., z, P>|z|, 95% Conf. Interval for aai, aaisqr; LR chi2(2) and the table values are not preserved]

63 Example: From Homework #3 - RR
- Using a quadratic model
  We test the squared term
. poisson deadin4 aai aaisqr, robust irr
Poisson regression, Number of obs = 4879
[Output table: robust IRR, Std. Err., z, P>|z|, 95% Conf. Interval for aai, aaisqr; Wald chi2(2) and the table values are not preserved]

64 Fitted Values: Linear Splines vs Linear
[Figure: Fitted Values: Linear Splines, Linear; x-axis: aai; curves: RDspl (linear), RRspl (Poisson), ORspl (logistic), RDlin (linear), RRlin (Poisson), ORlin (logistic)]

65 Fitted Values: Linear Splines vs Quadratic
[Figure: Fitted Values: Linear Splines, Quadratic; x-axis: aai; curves: RDspl (linear), RRspl (Poisson), ORspl (logistic), RDquad (linear), RRquad (Poisson), ORquad (logistic)]

66 Fitted Values: Linear vs Quadratic
[Figure: Fitted Values: Linear, Quadratic; x-axis: aai; curves: RDlin (linear), RRlin (Poisson), ORlin (logistic), RDquad (linear), RRquad (Poisson), ORquad (logistic)]

67 Example: From Homework #3 - OR
- Using a dummy variable model
  We add dummy variable terms to a linear model
. logistic deadin4 i.aaictg aai
Logistic regression, Number of obs = 4879
[Output table: OR, Std. Err., z, P>|z|, 95% Conf. Interval for the aaictg indicators and aai; LR chi2(7) and the table values are not preserved]

68 Example: From Homework #3 - OR
- Using a dummy variable model
  - We add dummy variable terms to a linear model
  - Then we jointly test all dummy variables (Wald test in this case, though LR also possible with classical logistic regression)
. testparm i.aaictg
 ( 1) [deadin4]55.aaictg = 0
 ( 2) [deadin4]75.aaictg = 0
 ( 3) [deadin4]95.aaictg = 0
 ( 4) [deadin4]115.aaictg = 0
 ( 5) [deadin4]135.aaictg = 0
 ( 6) [deadin4]155.aaictg = 0
[chi2( 6) and Prob > chi2: values not preserved]

69 Example: From Homework #3 - OR
- Using linear splines model
  We do not have to add a variable: linear is a special case
. logistic deadin4 saai*
Logistic regression, Number of obs = 4879
[Output table: OR, Std. Err., z, P>|z|, 95% Conf. Interval for the seven saai spline terms; LR chi2(7) and the table values are not preserved]

70 Example: From Homework #3 - OR
- Using linear splines model
  - We do not have to add a variable: linear is a special case
  - We test for equality of slopes
. test saai025 = saai055 = saai075 = saai095 = saai115 = saai135 = saai155
 ( 1) [deadin4]saai025 - [deadin4]saai055 = 0
 ( 2) [deadin4]saai025 - [deadin4]saai075 = 0
 ( 3) [deadin4]saai025 - [deadin4]saai095 = 0
 ( 4) [deadin4]saai025 - [deadin4]saai115 = 0
 ( 5) [deadin4]saai025 - [deadin4]saai135 = 0
 ( 6) [deadin4]saai025 - [deadin4]saai155 = 0
[chi2( 6) and Prob > chi2: values not preserved]
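Why equal slopes correspond to linearity can be seen from the spline variables themselves: they always add up to the original predictor. The sketch below assumes knots at 0.55, 0.75, ..., 1.55 (a guess from the variable names saai055 to saai155) and a hypothetical common slope; `linear_spline_terms` is an illustrative helper, not a course function.

```python
def linear_spline_terms(x, knots):
    """Linear spline variables for value x and sorted knots (Spline0 first)."""
    edges = list(knots) + [float("inf")]
    terms = [min(x, edges[0])]
    for k_lo, k_hi in zip(edges[:-1], edges[1:]):
        terms.append(min(max(x - k_lo, 0.0), k_hi - k_lo))
    return terms

# Assumed knot locations, inferred from the variable names (not stated on the slide)
knots = [0.55, 0.75, 0.95, 1.15, 1.35, 1.55]
common_slope = -2.0     # hypothetical common slope on the log odds scale

for aai in (0.4, 0.8, 1.0, 1.6):
    terms = linear_spline_terms(aai, knots)
    # The spline terms sum to aai, so equal slopes give slope * aai: a straight line
    assert abs(sum(terms) - aai) < 1e-9
    print(aai, round(common_slope * sum(terms), 3), round(common_slope * aai, 3))
```

The same identity means that a model containing aai together with every spline term carries redundant information, so one term has to be dropped.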

71 Example: From Homework #3 - OR
- Using linear splines model
  What if we do add a linear term?
. logistic deadin4 aai saai*
note: saai155 omitted because of collinearity
Logistic regression, Number of obs = 4879
[Output table: OR, Std. Err., z, P>|z|, 95% Conf. Interval for aai and the saai spline terms, with saai155 (omitted); LR chi2(7) and the table values are not preserved]

72 Example: From Homework #3 - OR
- Using linear splines model
  If we add the linear term, then we test that all the linear spline terms are 0
. testparm saai*
 ( 1) [deadin4]saai025 = 0
 ( 2) [deadin4]saai055 = 0
 ( 3) [deadin4]saai075 = 0
 ( 4) [deadin4]saai095 = 0
 ( 5) [deadin4]saai115 = 0
 ( 6) [deadin4]saai135 = 0
[chi2( 6) and Prob > chi2: values not preserved]

74 Lecture 9: Inference with omplex Modeling of POI November 4, 2014 More etailed Questions Testing particular questions about a shape is more difficult We have to distinguish between testing in a model that presumes a particular shape and testing in a model that includes a particular shape E.g.: Quadratic model presumes U-shaped, but linear splines could include a U-shaped function as well as many others Have to isolate the properties of your question E.g.: U-shaped functions will have different slopes at the extremes of the predictor values We will rarely have good power nd we need to avoid using too flexible a design, because then we will have very little information We also have to worry about influential points 74 ategorical ata nalysis, UT

75 Example: From Homework #3 - OR
U-shaped functions using linear splines model
We would need to test that simultaneously
  saai025 slope is negative (respectively, positive)
  saai155 slope is positive (respectively, negative)
. logistic deadin4 saai*
Logistic regression    Number of obs = 4879
[Stata output table: OR, Std Err, z, P>|z|, 95% Conf Interval for the seven saai* spline terms]

76 Example: Mortality vs AAI (Plots)
From Homework #3, we found suggestion of a nonlinear effect
I can use linear splines to mimic a smooth to the data
(Linear splines can handle the link functions in RR and OR)
[Figure: Fitted Values: Step Function (heptiles); x-axis aai; curves for RD (linear), RR (Poisson), OR (logistic)]

77 Example: From Homework #3 - OR
U-shaped functions using linear splines model
We would need to test that simultaneously
  s2aai025 slope is negative (respectively, positive)
  s2aai110 slope is positive (respectively, negative)
. logistic deadin4 s2aai*
Logistic regression    Number of obs = 4879
[Stata output table: OR, Std Err, z, P>|z|, 95% Conf Interval for the three s2aai* spline terms]

78 Example: Mortality vs AAI (Plots)
From Homework #3, we found suggestion of a nonlinear effect
Linear splines with 2 knots
Estimates suggest U-shaped trend, but not statistically significant
[Figure: Fitted Values: Linear Splines at 0.9 and 1.10; y-axis Pr(deadin4); x-axis aai]

79 Effect Modification

80 Effect Modifier
The association between Response and POI differs in strata defined by effect modifier
Statistical term: Interaction
Depends on the measurement of effect
  Summary measure: mean, geometric mean, median, proportion, odds, hazard, etc.
  Comparison across groups: difference, ratio

81 Analysis of Effect Modification
When the scientific question involves effect modification, analyses must be within each stratum separately
If we want to estimate degree of effect modification or test for its existence, a regression model will typically include
  Predictor of interest
  Effect modifier
  A covariate modeling the interaction (usually product)

82 Model for Effect Modification
Typical model for effect modification
Include main effects (can be bad not to)
  X (or predictors that involve only X)
  W (or predictors that involve only W)
Include interactions
  Predictor(s) derived from both X and W
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$

83 Interpretation of Parameters
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
Usual approach a bit more difficult
We can try using the idea of comparison across groups differing by 1 unit in corresponding predictor but agreeing in other modeled predictors
However, terms involving two scientific variables make this approach difficult

84 Intercept
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
Interpretation of intercept straightforward
  $\beta_0$ corresponds to X = 0, W = 0
  May not be scientifically meaningful

85 Slopes for Main Effects
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
Interpretation of main effects
$\beta_X$ corresponds to 1 unit difference in X holding W and (X × W) constant
  So 1 unit difference in X when W = 0
  May not be scientifically meaningful
$\beta_W$ corresponds to 1 unit difference in W holding X and (X × W) constant
  So 1 unit difference in W when X = 0
  May not be scientifically meaningful

86 Slope for Interaction
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
Interpretation of interaction difficult
$\beta_{XW}$ corresponds to 1 unit difference in (X × W) holding X and W constant
  Impossible, so we need another way to interpret this slope parameter

87 Consider Scientific Predictors
In stratum with $W_i = w$:
$g[\theta \mid X_i, W_i = w] = \beta_0 + \beta_X X_i + \beta_W w + \beta_{XW} X_i w = (\beta_0 + \beta_W w) + (\beta_X + \beta_{XW} w) X_i$
Intercept: $\beta_0 + \beta_W w$
Slope: $\beta_X + \beta_{XW} w$
$\beta_X$ compares groups differing by 1 unit in X; corresponds to $W_i = 0$
$\beta_{XW}$ is difference in X slope per 1 unit difference in W

88 Consider Scientific Predictors
In stratum with $X_i = x$:
$g[\theta \mid X_i = x, W_i] = \beta_0 + \beta_X x + \beta_W W_i + \beta_{XW} x W_i = (\beta_0 + \beta_X x) + (\beta_W + \beta_{XW} x) W_i$
Intercept: $\beta_0 + \beta_X x$
Slope: $\beta_W + \beta_{XW} x$
$\beta_W$ compares groups differing by 1 unit in W; corresponds to $X_i = 0$
$\beta_{XW}$ is difference in W slope per 1 unit difference in X
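A small numeric sketch of the stratum-specific slope described on the two slides above, assuming a logistic model so that slopes are log odds ratios. The coefficient values and the function name are made up for illustration; they are not estimates from the course data.

```python
import math


def x_slope_in_stratum(beta_x, beta_xw, w):
    """Slope for X among subjects with W = w: beta_X + beta_XW * w."""
    return beta_x + beta_xw * w


# Hypothetical log odds ratio coefficients (not from any fitted model).
beta_x, beta_xw = 0.40, -0.10

for w in (0, 1, 2):
    slope = x_slope_in_stratum(beta_x, beta_xw, w)
    # exp(slope) is the odds ratio per 1 unit difference in X when W = w.
    print(w, round(slope, 2), round(math.exp(slope), 3))
```

The slope at W = 0 is just the main effect coefficient, and consecutive strata differ by the interaction coefficient. By symmetry, the same function gives the W slope in a stratum X = x if beta_W is passed in place of beta_X.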

89 Symmetry of Effect Modification
Note that if X modifies the association between Y and W, then W modifies the association between Y and X
Aside: Confounding need not be symmetric
W can confound the association between Y and X, but X not confound the association between Y and W
  W and X associated in the sample
  Y and X not associated after adjusting for W
  Y and W associated after adjusting for X

90 Inference for Effect Modification
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
No effect modification if $\beta_{XW} = 0$
Hence, inference about existence of effect modification tests that $\beta_{XW} = 0$
We can perform such inference using standard regression output for the corresponding slope parameter

91 Inference for Main Effect Slope
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
Interpretation of $\beta_X = 0$
  Same intercept in all strata defined by X (for the lines relating the response parameter to W)
  Equivalently, no association with X in the stratum W = 0
  Generally a very uninteresting question
We rarely make inference on main effect slopes by themselves

92 Inference About Effect of X
$g[\theta \mid X_i, W_i] = \beta_0 + \beta_X X_i + \beta_W W_i + \beta_{XW} X_i W_i$
Response parameter not associated with X if $\beta_X = 0$ AND $\beta_{XW} = 0$
We will need to construct special tests that both parameters are simultaneously 0
The t tests given in regression output consider only one slope parameter at a time
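One way to build such a simultaneous test is a 2 degree of freedom Wald test, the kind of test Stata's testparm reports after a regression. The sketch below computes it by hand; the estimates and covariance entries are hypothetical placeholders, not results from any data set, and the function name is an assumption for illustration.

```python
import math


def wald_test_2df(b1, b2, v11, v12, v22):
    """Wald chi-square for H0: both parameters are 0.

    b1, b2 are the two estimates; v11 and v22 are their variances and
    v12 their covariance. The statistic is b' V^{-1} b, with the 2x2
    inverse written out explicitly.
    """
    det = v11 * v22 - v12 * v12
    chi2 = (b1 * b1 * v22 - 2.0 * b1 * b2 * v12 + b2 * b2 * v11) / det
    # For a chi-square with 2 degrees of freedom, P(X > c) = exp(-c / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical estimates of beta_X and beta_XW and their covariance matrix.
chi2, p = wald_test_2df(b1=0.40, b2=-0.10, v11=0.04, v12=-0.01, v22=0.01)
print(round(chi2, 3), round(p, 4))
```

The covariance term matters: main effect and interaction estimates are usually correlated, so the joint statistic is not the sum of the two squared z statistics from the regression table.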

93 Example: SEP Normal Ranges
We want to find normal ranges for somatosensory evoked potential (SEP)
  Time that it takes a signal to reach your brain from your ankle
Method of analysis
  Not recommended: Prediction intervals assuming same distribution (Gaussian) within each group
  Recommended: 2.5th and 97.5th percentiles within groups
As a first step, we want to consider important predictors of nerve conduction times
  If any variables such as sex, age, height, race, etc. are important predictors of nerve conduction times, then it would make most sense to obtain normal ranges within such groups

94 SEP: Important Predictors
Scientifically, we might expect that height, age, and sex are related to the nerve conduction time
  Nerve length should matter, and height is a surrogate for nerve length
  Age might affect nerve conduction times: People slow down with age
  Sex: Men are SO fragile

95 SEP: Height - Age Interaction?
Prior to looking at the data, we can also consider the possibility that interactions between these variables might be important
Height - age interaction?
  Do we expect the difference in conduction times between 6 foot tall and 5 foot tall 20 year olds to be the same as the difference in conduction times between 6 foot tall and 5 foot tall 80 year olds?

96 SEP: Height - Age Interaction Rationale
We might suspect such an interaction due to the fact that height may not be as good a surrogate for nerve length in older people
  With age, some people tend to shrink due to osteoporosis and compression of intervertebral discs
  It is not clear that nerve length would be altered in such a process
Thus, in young people, differences in height probably are a better measure of nerve length than in old people
  Tall old people probably have been tall always
  Short old people will include some who were much taller when they were young

97 SEP: Height - Age - Sex Interaction?
We can also consider the possibility of three way interactions between height, age, and sex
  Osteoporosis affects women far more than men
  Hence, we might expect the height - age interaction to be greatest in women and not so important in men
A two way interaction between height and age that is different between men and women defines a three way interaction between height, age, and sex

98 Stratified Scatterplots
Scatterplots of p60 by height for each sex, lowess in age strata
[Figure: two panels, "Average Time to p60 Peak: Females" and "Average Time to p60 Peak: Males"; x-axis Height (in.); y-axis Time to p60 Peak; lowess curves labeled a, b, c, d by age stratum]

99 SEP: Definition of Model
Defining a regression model with interactions
We must create variables to model the three way interaction term
Furthermore, it is a VERY GOOD idea to include all main effects and lower order interactions in the model as well
  Main effects: the individual variables which contribute to the interaction
  Lower order terms: all interactions that involve some combination of the variables which contribute to the 3-way interaction

100 SEP: Modeling Interactions
Most often, we lack sufficient information to be able to guess what the true form of an interaction might be
The most popular approach is thus to consider multiplicative interactions
  Create a new variable by merely multiplying the two (or more) interacting predictors

101 SEP: Creating New Interaction Terms
Thus for this problem we could create variables
  HA = Height * Age
  HM = Height * Male
  AM = Age * Male
  HAM = Height * Age * Male
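The variable construction above can be sketched in a few lines. The subjects are hypothetical, and coding Male as 1 for men and 0 for women is an assumption; the function name is chosen for illustration only.

```python
def add_interactions(height, age, male):
    """Return the product terms for one subject.

    male is coded 1 for men and 0 for women (an assumed coding).
    """
    return {
        "HA": height * age,
        "HM": height * male,
        "AM": age * male,
        "HAM": height * age * male,
    }


# Hypothetical subjects: (height in inches, age in years, male indicator).
for subject in [(64, 30, 0), (70, 55, 1)]:
    print(subject, add_interactions(*subject))
```

With this coding every term involving Male is 0 for women, so the female stratum is described by the height, age, and HA terms alone.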

102 SEP: Interpretation of Parameters
In the presence of higher order terms (powers, interactions), interpretation of parameters is not easy
We can no longer use the change associated with a 1 unit difference in predictor holding other variables constant
  It is generally impossible to hold other variables constant when changing a covariate involved in an interaction
  (If not impossible, it is certainly uninteresting)

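103 SEP: Interpretation Using Lines in Strata
Interpretation of the model in terms of the SEP - height relationship within age-sex strata
$E[\mathrm{p60} \mid \mathrm{Ht}, \mathrm{Age}, \mathrm{Male}] = \beta_0 + \beta_H \mathrm{Ht} + \beta_A \mathrm{Age} + \beta_M \mathrm{Male} + \beta_{HA} \mathrm{HA} + \beta_{HM} \mathrm{HM} + \beta_{AM} \mathrm{AM} + \beta_{HAM} \mathrm{HAM}$
p60 - Height relationship for Age = a:
Sex   Intercept                                       Slope
F     $\beta_0 + \beta_A a$                           $\beta_H + \beta_{HA} a$
M     $(\beta_0 + \beta_M) + (\beta_A + \beta_{AM}) a$   $(\beta_H + \beta_{HM}) + (\beta_{HA} + \beta_{HAM}) a$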
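The stratum-specific lines can be checked numerically. This is a minimal sketch: the coefficient values are hypothetical and chosen only to show the arithmetic, Male is assumed to be coded 1 for men and 0 for women, and the function and key names are illustrative.

```python
def p60_height_line(beta, age, male):
    """Intercept and slope of the p60-height line in one age-sex stratum.

    beta maps term names ("0", "H", "A", "M", "HA", "HM", "AM", "HAM")
    to coefficients of the three-way interaction model.
    """
    intercept = beta["0"] + beta["A"] * age + male * (beta["M"] + beta["AM"] * age)
    slope = beta["H"] + beta["HA"] * age + male * (beta["HM"] + beta["HAM"] * age)
    return intercept, slope


# Hypothetical coefficients, for illustration only.
beta = {"0": 10.0, "H": 0.5, "A": 0.1, "M": 2.0,
        "HA": 0.002, "HM": -0.05, "AM": 0.01, "HAM": 0.001}

for male in (0, 1):
    for age in (20, 80):
        print(male, age, p60_height_line(beta, age, male))
```

Collecting terms this way makes the role of the three-way coefficient visible: it is the amount by which the change in height slope per year of age differs between men and women.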


Case-control studies

Case-control studies Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark b@bxc.dk http://bendixcarstensen.com Department of Biostatistics, University of Copenhagen, 8 November

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

4 Multicategory Logistic Regression

4 Multicategory Logistic Regression 4 Multicategory Logistic Regression 4.1 Baseline Model for nominal response Response variable Y has J > 2 categories, i = 1,, J π 1,..., π J are the probabilities that observations fall into the categories

More information

especially with continuous

especially with continuous Handling interactions in Stata, especially with continuous predictors Patrick Royston & Willi Sauerbrei UK Stata Users meeting, London, 13-14 September 2012 Interactions general concepts General idea of

More information

options description set confidence level; default is level(95) maximum number of iterations post estimation results

options description set confidence level; default is level(95) maximum number of iterations post estimation results Title nlcom Nonlinear combinations of estimators Syntax Nonlinear combination of estimators one expression nlcom [ name: ] exp [, options ] Nonlinear combinations of estimators more than one expression

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Problem Set 10: Panel Data

Problem Set 10: Panel Data Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Lecture 12: Interactions and Splines

Lecture 12: Interactions and Splines Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample

More information

Regression #8: Loose Ends

Regression #8: Loose Ends Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch

More information

Latent class analysis and finite mixture models with Stata

Latent class analysis and finite mixture models with Stata Latent class analysis and finite mixture models with Stata Isabel Canette Principal Mathematician and Statistician StataCorp LLC 2017 Stata Users Group Meeting Madrid, October 19th, 2017 Introduction Latent

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies. Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark http://staff.pubhealth.ku.dk/~bxc/ Department of Biostatistics, University of Copengen 11 November 2011

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Part IV Statistics in Epidemiology

Part IV Statistics in Epidemiology Part IV Statistics in Epidemiology There are many good statistical textbooks on the market, and we refer readers to some of these textbooks when they need statistical techniques to analyze data or to interpret

More information

ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi.

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi. Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 This handout steals heavily

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Instantaneous geometric rates via Generalized Linear Models

Instantaneous geometric rates via Generalized Linear Models The Stata Journal (yyyy) vv, Number ii, pp. 1 13 Instantaneous geometric rates via Generalized Linear Models Andrea Discacciati Karolinska Institutet Stockholm, Sweden andrea.discacciati@ki.se Matteo Bottai

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Problem 1. The files

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

S o c i o l o g y E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7,

S o c i o l o g y E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7, S o c i o l o g y 63993 E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7, 2 0 0 9 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain

More information

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs Logistic Regression, Part I: Problems with the Linear Probability Model (LPM) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information