Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation


Lecture 23 - Sta 111, Colin Rundel, June 17, 2014

Background: Regression so far...

At this point we have covered:

- Simple linear regression: relationship between a numerical response and a numerical or categorical predictor.
- Multiple regression: relationship between a numerical response and multiple numerical and/or categorical predictors.

What we haven't seen is what to do when the predictors are weird (nonlinear, complicated dependence structure, etc.) or when the response is weird (categorical, count data, etc.).

Truck prices

The scatterplot below shows the relationship between year and price of a random sample of 43 pickup trucks. Describe the relationship between these two variables.

[Figure: scatterplot of price (5000 to 20000) against year]

From: http://faculty.chicagobooth.edu/robert.gramacy/teaching.html

Remove unusual observations

Let's remove trucks older than 20 years, and only focus on trucks made in 1992 or later. Now what can you say about the relationship?

[Figure: scatterplot of price (5000 to 20000) against year, restricted to trucks made in 1992 or later]

Truck prices - linear model?

[Figure: price against year, with a residuals-against-year panel]

Model: $\text{price} = b_0 + b_1 \, \text{year}$

The linear model doesn't appear to be a good fit since the residuals have non-constant variance. In particular, residuals for newer cars (to the right) have a larger variance than the residuals for older cars (to the left).

Truck prices - log transform of the response variable

[Figure: log(price) against year, with a residuals-against-year panel]

Model: $\log(\text{price}) = b_0 + b_1 \, \text{year}$

We have applied a log transformation to the response variable. The relationship now seems linear, and the residuals have (more) constant variance.

Interpreting models with log transformation

                 Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)   -265.07       25.04   -10.59      0.00
    pu$year          0.14        0.01    10.94      0.00

$$\log(\text{price}) = -265.07 + 0.14 \, \text{year}$$

For each additional year the car is newer (for each year decrease in the car's age) we would expect the log price of the car to increase on average by 0.14 log dollars, which is not very useful...

Interpreting models with log transformation (cont.)

The slope coefficient for the log-transformed model is 0.14, meaning the log price difference between cars that are one year apart is predicted to be 0.14 log dollars.

$$\log(\text{price}_1) = -265.07 + 0.14 \, y$$
$$\log(\text{price}_2) = -265.07 + 0.14 \, (y + 1)$$
$$\log(\text{price}_2) - \log(\text{price}_1) = 0.14$$
$$\log\left(\frac{\text{price}_2}{\text{price}_1}\right) = 0.14$$
$$\frac{\text{price}_2}{\text{price}_1} = e^{0.14} = 1.15$$

For each additional year the car is newer (for each year decrease in the car's age) we would expect the price of the car to increase on average by a factor of 1.15.
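The back-transformation of the slope can be checked numerically. The following is a minimal sketch in R that uses only the rounded coefficients reported in the table; the names `b0`, `b1`, `pred_price` and the example years 2000 and 2001 are illustrative assumptions, not part of the lecture.

```r
# Rounded coefficients from the fitted model log(price) = b0 + b1 * year
b0 <- -265.07
b1 <- 0.14

# Multiplicative change in predicted price per additional model year
factor_per_year <- exp(b1)
round(factor_per_year, 2)  # 1.15

# The same factor falls out of the ratio of two back-transformed predictions
# (the years 2000 and 2001 are arbitrary illustrative choices)
pred_price <- function(year) exp(b0 + b1 * year)
ratio <- pred_price(2001) / pred_price(2000)
all.equal(ratio, factor_per_year)  # TRUE
```

Because the coefficients are heavily rounded, the individual predicted prices from this sketch are not meaningful; only their ratio is.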

Recap: dealing with non-constant variance

- Non-constant variance is one of the most common model violations; however, it is usually fixable by transforming the response (y) variable.
- The most common variance-stabilizing transform is the log transformation, $\log(y)$, especially useful when the response variable is (extremely) right skewed.
- When using a log transformation on the response variable the interpretation of the slope changes: for each unit increase in x, y is expected on average to decrease/increase by a factor of $e^{b_1}$.
- Another useful transformation is the square root, $\sqrt{y}$, especially useful when the response variable is counts.
- These transformations may also be useful when the relationship is non-linear, but in those cases a polynomial regression may also be needed.

Odds

Odds are another way of quantifying the probability of an event, commonly used in gambling (and logistic regression).

For some event E,

$$\text{odds}(E) = \frac{P(E)}{P(E^c)} = \frac{P(E)}{1 - P(E)}$$

Similarly, if we are told the odds of E are x to y then

$$\text{odds}(E) = \frac{x}{y} = \frac{x/(x+y)}{y/(x+y)}$$

which implies

$$P(E) = \frac{x}{x+y}, \qquad P(E^c) = \frac{y}{x+y}$$

Example - Donner Party

In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October. By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold.
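To tie the odds definition to these figures: 40 deaths among 87 members corresponds to odds of death of 40 to 47. The short R sketch below works through both directions of the conversion; the helper names `odds` and `prob_from_odds` are hypothetical and chosen only for illustration.

```r
# Odds of an event that has probability p
odds <- function(p) p / (1 - p)

# Probability of an event whose odds are "x to y"
prob_from_odds <- function(x, y) x / (x + y)

# Donner Party: 40 of the 87 members died
p_died <- 40 / 87
odds(p_died)            # 40/47, about 0.85

# Going the other way: odds of 40 to 47 recover the same probability
prob_from_odds(40, 47)  # 40/87, about 0.46
```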
Example - Donner Party - Data

          Age    Sex     Status
     1    23.00  Male    Died
     2    40.00  Female  Survived
     3    40.00  Male    Survived
     4    30.00  Male    Died
     5    28.00  Male    Died
     ...
    43    23.00  Male    Survived
    44    24.00  Male    Died
    45    25.00  Female  Survived

From Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed.).

Example - Donner Party - EDA

Status vs. Gender:

                Male  Female
    Died          20       5
    Survived      10      10

Status vs. Age:

[Figure: Age (20 to 60) for the Died and Survived groups]

Example - Donner Party - ???

It seems clear that both age and gender have an effect on someone's survival. How do we come up with a model that will let us explore this relationship?

Even if we set Died to 0 and Survived to 1, this isn't something we can transform our way out of - we need something more.

One way to think about the problem: we can treat Survived and Died as successes and failures arising from a binomial distribution where the probability of a success is given by a transformation of a linear model of the predictors.

Generalized linear models

It turns out that this is a very general way of addressing this type of problem in regression, and the resulting models are called generalized linear models (GLMs). Logistic regression is just one example of this type of model.

All generalized linear models have the following three characteristics:

1. A probability distribution describing the outcome variable
2. A linear model: $\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$
3. A link function that relates the linear model to the parameter of the outcome distribution: $g(p) = \eta$ or $p = g^{-1}(\eta)$

Logistic regression

Logistic regression is a GLM used to model a binary categorical variable using numerical and categorical predictors.

We assume a binomial distribution produced the outcome variable and we therefore want to model p, the probability of success for a given set of predictors.

To finish specifying the logistic model we just need to establish a reasonable link function that connects η to p. There are a variety of options but the most commonly used is the logit function.

Logit function:

$$\text{logit}(p) = \log\left(\frac{p}{1-p}\right), \quad \text{for } 0 \le p \le 1$$
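As a quick illustration of the logit link just defined, here is a minimal R sketch. The function name `logit` and the probability grid `p` are illustrative choices; `qlogis()` is base R's quantile function for the standard logistic distribution, which computes the same quantity.

```r
# Logit link: maps a probability in (0, 1) to the whole real line
logit <- function(p) log(p / (1 - p))

p <- c(0.01, 0.25, 0.5, 0.75, 0.99)
logit(p)  # negative below 0.5, zero at 0.5, positive above

# Base R's qlogis() computes the same function
all.equal(logit(p), qlogis(p))  # TRUE
```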

Properties of the Logit

The logit function takes a value between 0 and 1 and maps it to a value between $-\infty$ and $\infty$.

Inverse logit (logistic) function:

$$g^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{1 + \exp(-x)}$$

The inverse logit function takes a value between $-\infty$ and $\infty$ and maps it to a value between 0 and 1.

This formulation also has some use when it comes to interpreting the model, as the logit can be interpreted as the log odds of a success; more on this later.

The logistic regression model

The three GLM criteria give us:

$$y_i \sim \text{Binom}(p_i)$$
$$\eta_i = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_n x_{n,i}$$
$$\text{logit}(p_i) = \eta_i$$

From which we arrive at

$$p_i = \frac{\exp(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_n x_{n,i})}{1 + \exp(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_n x_{n,i})}$$

Example - Donner Party - Model

In R we fit a GLM in the same way as a linear model, except we use glm instead of lm and must also specify the type of GLM to fit using the family argument.

    summary(glm(Status ~ Age, data=donner, family=binomial))

    Call:
    glm(formula = Status ~ Age, family = binomial, data = donner)

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.81852    0.99937   1.820   0.0688 .
    Age         -0.06647    0.03222  -2.063   0.0391 *

        Null deviance: 61.827  on 44  degrees of freedom
    Residual deviance: 56.291  on 43  degrees of freedom
    AIC: 60.291

    Number of Fisher Scoring iterations: 4

Example - Donner Party - Prediction

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)    1.8185      0.9994     1.82    0.0688
    Age           -0.0665      0.0322    -2.06    0.0391

Model:

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$

Odds / probability of survival for a newborn (Age = 0):

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times 0$$
$$\frac{p}{1-p} = \exp(1.8185) = 6.16$$
$$p = 6.16 / 7.16 = 0.86$$

Example - Donner Party - Prediction (cont.)

Model:

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$

Odds / probability of survival for a 25 year old:

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times 25$$
$$\frac{p}{1-p} = \exp(0.156) = 1.17$$
$$p = 1.17 / 2.17 = 0.539$$

Odds / probability of survival for a 50 year old:

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times 50$$
$$\frac{p}{1-p} = \exp(-1.5065) = 0.222$$
$$p = 0.222 / 1.222 = 0.181$$

[Figure: Status (0.0 to 1.0) against Age (0 to 80)]

Example - Donner Party - Interpretation

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)    1.8185      0.9994     1.82    0.0688
    Age           -0.0665      0.0322    -2.06    0.0391

Simple interpretation is only possible in terms of log odds and log odds ratios for intercept and slope terms.

Intercept: The log odds of survival for a party member with an age of 0. From this we can calculate the odds or probability, but additional calculations are necessary.

Slope: For a unit increase in age (being 1 year older), how much will the log odds ratio change; not particularly intuitive. More often than not we care only about sign and relative magnitude.

Example - Donner Party - Interpretation - Slope

$$\log\left(\frac{p_1}{1-p_1}\right) = 1.8185 - 0.0665(x + 1) = 1.8185 - 0.0665x - 0.0665$$
$$\log\left(\frac{p_2}{1-p_2}\right) = 1.8185 - 0.0665x$$
$$\log\left(\frac{p_1}{1-p_1}\right) - \log\left(\frac{p_2}{1-p_2}\right) = -0.0665$$
$$\log\left(\frac{p_1}{1-p_1} \bigg/ \frac{p_2}{1-p_2}\right) = -0.0665$$
$$\frac{p_1}{1-p_1} \bigg/ \frac{p_2}{1-p_2} = \exp(-0.0665) = 0.94$$
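The predictions and the slope odds ratio above can be reproduced from the rounded coefficients alone. The sketch below is one way to do that in R, assuming nothing beyond the two reported estimates; the variable names are illustrative, and `plogis()` is base R's inverse logit.

```r
# Rounded coefficients from glm(Status ~ Age, family = binomial)
b0 <- 1.8185
b1 <- -0.0665

age  <- c(0, 25, 50)
eta  <- b0 + b1 * age  # log odds of survival
odds <- exp(eta)       # odds of survival
p    <- plogis(eta)    # probability of survival (inverse logit)

round(data.frame(age, log_odds = eta, odds, prob = p), 3)

# Odds ratio for a one-year increase in age
round(exp(b1), 2)  # 0.94
```

Working on the log-odds scale first and converting at the end mirrors the hand calculation: the linear predictor gives log odds, `exp()` gives odds, and `plogis()` gives the probability.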

Example - Donner Party - Age and Gender

    summary(glm(Status ~ Age + Sex, data=donner, family=binomial))

    Call:
    glm(formula = Status ~ Age + Sex, family = binomial, data = donner)

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.63312    1.11018   1.471   0.1413
    Age         -0.07820    0.03728  -2.097   0.0359 *
    SexFemale    1.59729    0.75547   2.114   0.0345 *
    ---
    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 61.827  on 44  degrees of freedom
    Residual deviance: 51.256  on 42  degrees of freedom
    AIC: 57.256

    Number of Fisher Scoring iterations: 4

Gender slope: When the other predictors are held constant, this is the log odds ratio between the given level (Female) and the reference level (Male).

Example - Donner Party - Gender Models

Just like MLR, we can plug in gender to arrive at two status vs. age models, for men and women respectively.

General model:

$$\log\left(\frac{p}{1-p}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times \text{Sex}$$

Male model:

$$\log\left(\frac{p}{1-p}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times 0 = 1.63312 - 0.07820 \times \text{Age}$$

Female model:

$$\log\left(\frac{p}{1-p}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times 1 = 3.23041 - 0.07820 \times \text{Age}$$

Example - Donner Party - Gender Models (cont.)

[Figure: Status (0.0 to 1.0) against Age (0 to 80), shown separately for males and females]

Hypothesis test for the whole model

Note that the summary output for glm(Status ~ Age + Sex, data=donner, family=binomial) shown above does not include any F-statistic; as a general rule there are no single whole-model hypothesis tests for GLMs.
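Returning to the male and female models for the Donner Party, the two curves can be evaluated directly from the reported coefficients. This is a rough sketch in R; the helper name `p_survive`, the 0/1 coding of the `female` argument, and the example age of 30 are illustrative assumptions rather than anything from the lecture.

```r
# Coefficients from glm(Status ~ Age + Sex, family = binomial)
b0       <- 1.63312
b_age    <- -0.07820
b_female <- 1.59729

# Survival probability; `female` is the 0/1 indicator for Sex == "Female"
p_survive <- function(age, female) {
  plogis(b0 + b_age * age + b_female * female)
}

pm <- p_survive(30, female = 0)  # male, age 30
pf <- p_survive(30, female = 1)  # female, age 30

# Odds ratio, female vs. male, at any fixed age
exp(b_female)  # about 4.94

# The ratio of the two fitted odds gives the same number
(pf / (1 - pf)) / (pm / (1 - pm))
```

The final line shows why the gender coefficient is read as a log odds ratio: the age term cancels when the two fitted odds are divided.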

Hypothesis tests for a coefficient

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)    1.6331      1.1102     1.47    0.1413
    Age           -0.0782      0.0373    -2.10    0.0359
    SexFemale      1.5973      0.7555     2.11    0.0345

We are, however, still able to perform inference on individual coefficients. The basic setup is exactly the same as what we've seen before, except we use a Z test.

Note: the only tricky bit, which is way beyond the scope of this course, is how the standard error is calculated.

Testing for the slope of Age

$$H_0: \beta_{age} = 0 \qquad H_A: \beta_{age} \ne 0$$

$$Z = \frac{\hat{\beta}_{age} - \beta_{age}}{SE_{age}} = \frac{-0.0782 - 0}{0.0373} = -2.10$$

$$\text{p-value} = P(|Z| > 2.10) = P(Z > 2.10) + P(Z < -2.10) = 2 \times 0.0178 \approx 0.036$$

(R reports 0.0359, computed from the unrounded z statistic.)

Confidence interval for age slope coefficient

Using the same coefficient table: remember, the interpretation for a slope is the change in log odds ratio per unit change in the predictor.

Log odds ratio:

$$CI = PE \pm CV \times SE = -0.0782 \pm 1.96 \times 0.0373 = (-0.1513, -0.0051)$$

Odds ratio:

$$\exp(CI) = (\exp(-0.1513), \exp(-0.0051)) = (0.8596, 0.9949)$$

Additional Example - Birdkeeping and Lung Cancer

A 1972-1981 health survey in The Hague, Netherlands, discovered an association between keeping pet birds and increased risk of lung cancer. To investigate birdkeeping as a risk factor, researchers conducted a case-control study of patients in 1985 at four hospitals in The Hague (population 450,000). They identified 49 cases of lung cancer among the patients who were registered with a general practice, who were age 65 or younger, and who had resided in the city since 1965. They also selected 98 controls from a population of residents having the same general age structure.

From Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed.).

Example - Birdkeeping and Lung Cancer - Data

           LC          FM      SS    BK      AG     YR     CD
      1    LungCancer  Male    Low   Bird    37.00  19.00  12.00
      2    LungCancer  Male    Low   Bird    41.00  22.00  15.00
      3    LungCancer  Male    High  NoBird  43.00  19.00  15.00
      ...
    147    NoCancer    Female  Low   NoBird  65.00   7.00   2.00

    LC  Whether subject has lung cancer
    FM  Sex of subject
    SS  Socioeconomic status
    BK  Indicator for birdkeeping
    AG  Age of subject (years)
    YR  Years of smoking prior to diagnosis or examination
    CD  Average rate of smoking (cigarettes per day)

Note: NoCancer is the reference response (0 or failure), LungCancer is the non-reference response (1 or success). This matters for interpretation.

Example - Birdkeeping and Lung Cancer - EDA

[Figure: EDA plot; legend: Bird / No Bird, Lung Cancer / No Lung Cancer]

Example - Birdkeeping and Lung Cancer - Model

    summary(glm(LC ~ FM + SS + BK + AG + YR + CD, data=bird, family=binomial))

    Call:
    glm(formula = LC ~ FM + SS + BK + AG + YR + CD, family = binomial, data = bird)

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -1.93736    1.80425  -1.074 0.282924
    FMFemale     0.56127    0.53116   1.057 0.290653
    SSHigh       0.10545    0.46885   0.225 0.822050
    BKBird       1.36259    0.41128   3.313 0.000923 ***
    AG          -0.03976    0.03548  -1.120 0.262503
    YR           0.07287    0.02649   2.751 0.005940 **
    CD           0.02602    0.02552   1.019 0.308055

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 187.14  on 146  degrees of freedom
    Residual deviance: 154.20  on 140  degrees of freedom
    AIC: 168.2

    Number of Fisher Scoring iterations: 5

Example - Birdkeeping and Lung Cancer - Interpretation

                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)   -1.9374      1.8043    -1.07    0.2829
    FMFemale       0.5613      0.5312     1.06    0.2907
    SSHigh         0.1054      0.4688     0.22    0.8221
    BKBird         1.3626      0.4113     3.31    0.0009
    AG            -0.0398      0.0355    -1.12    0.2625
    YR             0.0729      0.0265     2.75    0.0059
    CD             0.0260      0.0255     1.02    0.3081

Keeping all other predictors constant:

- The odds ratio of getting lung cancer for bird keepers vs. non-bird keepers is exp(1.3626) = 3.91.
- The odds ratio of getting lung cancer for an additional year of smoking is exp(0.0729) = 1.08.
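The two odds ratios can be checked with a small helper. As a hedged extension, the sketch below also attaches a Wald interval using the same PE ± CV × SE recipe applied earlier to the Donner Party age slope; the function name `wald_or` and its arguments are hypothetical, and the lecture itself does not report intervals for the birdkeeping model.

```r
# Odds ratio with a Wald interval: exp(PE +/- CV * SE)
wald_or <- function(est, se, cv = 1.96) {
  c(or    = exp(est),
    lower = exp(est - cv * se),
    upper = exp(est + cv * se))
}

# Birdkeeping (BKBird) and years of smoking (YR), from the rounded table
round(wald_or(1.3626, 0.4113), 2)
round(wald_or(0.0729, 0.0265), 2)

# Check against the Donner Party age slope worked out earlier
round(wald_or(-0.0782, 0.0373), 4)  # lower 0.8596, upper 0.9949
```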