Multiple Regression: Chapter 13. July 24, 2015


Multiple Regression (MR)
Response variable: Y (only one response variable, quantitative)
Predictor variables: X1, X2, X3, ..., Xp (p = # of predictors)
Note: the predictors can be quantitative, categorical, quadratic terms, or interaction terms.

Concentrate on:
- reading computer output
- interpreting the coefficient of each predictor
- deciding which terms to test, and in what order, so as to pick the simplest model that does a good job of predicting Y

The Basics of the MR Model:
Model: Y = α + β1 X1 + β2 X2 + ... + βp Xp + ε  (predictors: X1, X2, ..., Xp; # of predictors: p)
Assumptions: ε iid N(0, σ)
Parameters: coefficients β1, β2, ..., βp and constant α

Reading the computer output:

1. Fitted equation: ŷ = a + b1 X1 + b2 X2 + ... + bp Xp

2. ANOVA test:
H0: β1 = β2 = ... = βp = 0 (nothing good in the model)
Ha: at least one βi ≠ 0 (something good)
Test statistic: F = MSR/MSE (p-value from the F table with df_num = p, df_denom = n - p - 1)

ANOVA table for regression:
Source          df       SS     MS    F
Regression      p        SSReg  MSR   MSR/MSE
Error           n-p-1    SSE    MSE
Total           n-1      SST

3. t test for individual predictors:
H0: βi = 0 vs Ha: βi ≠ 0
Test statistic: t = (bi - 0) / SE(bi), with the p-value computed from the t table with df = n - p - 1 (the error df).
Interpretation: if the p-value is small, reject H0 and conclude that Xi is a GOOD predictor of Y (Xi provides significant information about Y) AFTER all other predictors in the model are accounted for.
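As a concrete illustration, here is a minimal sketch of fitting an MR model and reading off these same quantities in Python with statsmodels (an assumption for illustration; the output shown in these notes is from Minitab, and the data below are simulated):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                    # p quantitative predictors
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

X_const = sm.add_constant(X)                   # adds the intercept column (alpha)
fit = sm.OLS(y, X_const).fit()

print(fit.summary())                           # fitted equation, t tests, ANOVA F
print(fit.fvalue, fit.f_pvalue)                # overall F = MSR/MSE and its p-value
print(fit.tvalues, fit.pvalues)                # t = b_i / SE(b_i), df = n - p - 1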

Important Issues in Multiple Regression
- Don't just add predictors to the model: think!
- When the number of parameters reaches the number of observations (p + 1 = n), the model is (over)saturated and R² = 100%; it fits this particular dataset perfectly but is not useful for predicting in the larger population.
- Adjusted R² increases only if a newly added predictor is good, whereas R² goes up or stays the same even if the new predictors are bad.
- Remember to look at the p-value for each predictor.
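The adjustment penalizes model size: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A quick check of this standard formula against the simple-regression BAC output further below (R-Sq = 80.0%, n = 16, p = 1):

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
def adj_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adj_r2(0.800, n=16, p=1))   # 0.7857..., matching R-Sq(adj) = 78.6%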

Multicollinearity: when several predictors are correlated with each other, the ANOVA p-value may be small even though all of the individual t-test p-values are large. Correlated predictors give overlapping (redundant) information. Don't throw out all of the correlated predictors at once; remove them from the model slowly, one at a time. The sample size should be at least 5 to 20 times bigger than the number of predictors.
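One common numerical diagnostic for multicollinearity (not shown in the Minitab output in these notes) is the variance inflation factor. A self-contained sketch with statsmodels, using two deliberately near-duplicate predictors:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)      # nearly a copy of x1: collinear
X = sm.add_constant(np.column_stack([x1, x2]))
for i in range(1, X.shape[1]):
    print(f"VIF for x{i}:", variance_inflation_factor(X, i))
# Rules of thumb vary; VIFs above roughly 5-10 suggest redundant predictors.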

Example: The following dataset records Blood Alcohol Content (BAC) and the Number of Beers consumed (NOB), along with two more variables, Weight and Sex. We fit different regression models and compare the output.

BAC    NOB  Weight  Sex  M_1  F_1
0.100  5    132     f    0    1
0.030  2    128     f    0    1
0.190  9    110     f    0    1
0.120  8    192     m    1    0
0.040  3    172     m    1    0
0.095  7    250     f    0    1
0.070  3    125     f    0    1
0.060  5    175     m    1    0
0.020  3    175     f    0    1
0.050  5    275     m    1    0
0.070  4    130     f    0    1
0.100  6    168     m    1    0
0.085  5    128     f    0    1
0.090  7    246     m    1    0
0.010  1    164     m    1    0
0.050  4    175     m    1    0
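The Minitab fits below can be reproduced from this dataset; here is a sketch using pandas and statsmodels (the Python tooling is an assumption, not the course's software):

import pandas as pd
import statsmodels.formula.api as smf

bac = pd.DataFrame({
    "BAC":    [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
               0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050],
    "NOB":    [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4],
    "Weight": [132, 128, 110, 192, 172, 250, 125, 175, 175, 275,
               130, 168, 128, 246, 164, 175],
    "M_1":    [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1],
})
bac["F_1"] = 1 - bac["M_1"]

fit = smf.ols("BAC ~ NOB", data=bac).fit()
print(fit.params)     # approx: Intercept -0.0127, NOB 0.0180
print(fit.rsquared)   # approx 0.80, as in the output below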

Regression Analysis: BAC versus NOB

The regression equation is
BAC = -0.0127 + 0.0180 NOB

Predictor   Coef       SE Coef    T      P
Constant   -0.01270    0.01264   -1.00   0.332
NOB         0.017964   0.002402   7.48   0.000

S = 0.0204410   R-Sq = 80.0%   R-Sq(adj) = 78.6%

Analysis of Variance
Source          DF  SS        MS        F      P
Regression       1  0.023375  0.023375  55.94  0.000
Residual Error  14  0.005850  0.000418
Total           15  0.029225

Regression Analysis: BAC versus NOB, M_1

The regression equation is
BAC = -0.0035 + 0.0181 NOB - 0.0198 M_1

Predictor   Coef        SE Coef    T      P
Constant   -0.00348     0.01200   -0.29   0.777
NOB         0.018100    0.002135   8.48   0.000
M_1        -0.019763    0.009086  -2.18   0.049

S = 0.0181633   R-Sq = 85.3%   R-Sq(adj) = 83.1%

Analysis of Variance
Source          DF  SS        MS        F      P
Regression       2  0.024936  0.012468  37.79  0.000
Residual Error  13  0.004289  0.000330
Total           15  0.029225

Regression with Dummy Variables:
Dummy variable: a categorical variable coded as 0 or 1.
Example: Let X2 = Gender = 1 if male, 0 if female (the baseline group has zero for the dummy variable).

Model (no interaction): Y = α + β1 X1 + β2 X2 + ε
Note: this model gives two lines, one for females and one for males, with the same slope but different intercepts.
F (X2 = 0):  Y = α + β1 X1 + ε
M (X2 = 1):  Y = (α + β2) + β1 X1 + ε

Interpreting the coefficients:
α:  y-intercept for the baseline group (F)
β1: slope for both groups
β2: change in intercept for males compared to females
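Continuing the Python sketch above (using the bac DataFrame), the dummy-variable model is a one-liner with the formula interface:

fit2 = smf.ols("BAC ~ NOB + M_1", data=bac).fit()
print(fit2.params)
# approx: Intercept -0.0035, NOB 0.0181, M_1 -0.0198, matching the output above
# Two parallel lines: females  BAC-hat = -0.0035 + 0.0181 NOB
#                     males    BAC-hat = (-0.0035 - 0.0198) + 0.0181 NOB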

Regression Analysis: BAC versus NOB, Weight

The regression equation is
BAC = 0.0399 + 0.0200 NOB - 0.000363 Weight

Predictor   Coef         SE Coef      T      P
Constant    0.03986      0.01043      3.82   0.002
NOB         0.019976     0.001263    15.82   0.000
Weight     -0.00036282   0.00005668  -6.40   0.000

S = 0.0104104   R-Sq = 95.2%   R-Sq(adj) = 94.4%

Analysis of Variance
Source          DF  SS        MS        F       P
Regression       2  0.027816  0.013908  128.33  0.000
Residual Error  13  0.001409  0.000108
Total           15  0.029225

Regression Analysis: BAC versus NOB, Weight, M_1

The regression equation is
BAC = 0.0387 + 0.0199 NOB - 0.000344 Weight - 0.00324 M_1

Predictor   Coef         SE Coef      T      P
Constant    0.03871      0.01097      3.53   0.004
NOB         0.019896     0.001309    15.20   0.000
Weight     -0.00034440   0.00006842  -5.03   0.000
M_1        -0.003240     0.006286    -0.52   0.616

S = 0.0107174   R-Sq = 95.3%   R-Sq(adj) = 94.1%

Analysis of Variance
Source          DF  SS         MS         F      P
Regression       3  0.0278466  0.0092822  80.81  0.000
Residual Error  12  0.0013784  0.0001149
Total           15  0.0292250

Question: What if gender were coded the other way?

Regression Analysis: BAC versus NOB, Weight, F_1

The regression equation is
BAC = 0.0355 + 0.0199 NOB - 0.000344 Weight + 0.00324 F_1

Predictor   Coef         SE Coef      T      P
Constant    0.03547      0.01371      2.59   0.024
NOB         0.019896     0.001309    15.20   0.000
Weight     -0.00034440   0.00006842  -5.03   0.000
F_1         0.003240     0.006286     0.52   0.616

S = 0.0107174   R-Sq = 95.3%   R-Sq(adj) = 94.1%

Analysis of Variance
Source          DF  SS         MS         F      P
Regression       3  0.0278466  0.0092822  80.81  0.000
Residual Error  12  0.0013784  0.0001149
Total           15  0.0292250

Interaction model (with dummy): Y = α + β1 X1 + β2 X2 + β3 X1 X2 + ε
Note: this model gives two lines, one for females and one for males, with different slopes and different intercepts.
F (X2 = 0):  Y = α + β1 X1 + ε
M (X2 = 1):  Y = (α + β2) + (β1 + β3) X1 + ε

Interpreting the coefficients:
α:  y-intercept for the baseline group (F)
β1: slope for the baseline group (F)
β2: change in intercept for males compared to females
β3: change in slope for males compared to females

Regression Analysis: BAC versus NOB, Weight, M_1, Weight*M_1

The regression equation is
BAC = 0.0460 + 0.0198 NOB - 0.000390 Weight - 0.0215 M_1 + 0.000104 Weight*M_1

Predictor    Coef         SE Coef      T      P
Constant     0.04604      0.01467      3.14   0.009
NOB          0.019762     0.001343    14.71   0.000
Weight      -0.00038990   0.00009130  -4.27   0.001
M_1         -0.02148      0.02453     -0.88   0.400
Weight*M_1   0.0001045    0.0001357    0.77   0.457

S = 0.0109039   R-Sq = 95.5%   R-Sq(adj) = 93.9%

Analysis of Variance
Source          DF  SS         MS         F      P
Regression       4  0.0279172  0.0069793  58.70  0.000
Residual Error  11  0.0013078  0.0001189
Total           15  0.0292250
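In the Python sketch, the interaction term is written with a colon in the formula; a minimal continuation, again using the bac DataFrame built earlier:

fit3 = smf.ols("BAC ~ NOB + Weight + M_1 + Weight:M_1", data=bac).fit()
print(fit3.params)
print(fit3.pvalues["Weight:M_1"])
# The interaction p-value is about 0.46, as in the output above, so the
# separate-slopes term can be dropped in favor of the no-interaction model.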

What if we had 3 groups? Suppose we want to predict BAC from NOB and Race (white, black, hispanic). We need 2 dummy variables for 3 categories. Let
X2 = 1 if black, 0 otherwise
X3 = 1 if hispanic, 0 otherwise
Note: Race = white is the baseline group; it has zero for both dummy variables.

No-interaction model with 2 dummies: Y = α + β1 X1 + β2 X2 + β3 X3 + ε, which gives the following 3 equations:
X2 = 0, X3 = 0 (W):  Y = α + β1 X1 + ε
X2 = 1, X3 = 0 (B):  Y = (α + β2) + β1 X1 + ε
X2 = 0, X3 = 1 (H):  Y = (α + β3) + β1 X1 + ε

Interpreting the coefficients:
α:  intercept for the baseline group (W)
β1: slope for all 3 groups
β2: change in intercept for blacks compared to whites
β3: change in intercept for hispanics compared to whites

Interaction model: add interactions between the quantitative variable (X1) and the dummy variables (X2, X3):
Y = α + β1 X1 + β2 X2 + β3 X3 + β4 X1 X2 + β5 X1 X3 + ε
which gives the following 3 equations:
X2 = 0, X3 = 0 (W):  Y = α + β1 X1 + ε
X2 = 1, X3 = 0 (B):  Y = (α + β2) + (β1 + β4) X1 + ε
X2 = 0, X3 = 1 (H):  Y = (α + β3) + (β1 + β5) X1 + ε

Interpreting the coefficients:
α:  intercept for the baseline group (W)
β1: slope for W
β2: change in intercept for B compared to W
β3: change in intercept for H compared to W
β4: change in slope for B compared to W
β5: change in slope for H compared to W
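There is no race variable in the BAC data, so as a purely hypothetical sketch (the values below are made up for illustration), patsy's C() in a statsmodels formula generates the two dummies and the baseline automatically:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "Y":    [0.05, 0.08, 0.11, 0.04, 0.09, 0.13, 0.06, 0.10, 0.12],  # made up
    "X1":   [2, 4, 6, 2, 4, 6, 2, 4, 6],
    "Race": ["white", "white", "white", "black", "black", "black",
             "hispanic", "hispanic", "hispanic"],
})
# Treatment(...) sets the baseline group; "*" adds both the intercept shifts
# (dummies) and the slope shifts (interactions with X1)
m = smf.ols("Y ~ X1 * C(Race, Treatment(reference='white'))", data=df).fit()
print(m.params)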

In regression, if we have only one categorical predictor, then REGRESSION is equivalent to ONE-WAY ANOVA.

Revisit the one-way ANOVA example: compare average weight loss for three diets.

Data: weight loss under 3 diets
lowFAT  lowCAL  lowCARB
22      24      28
18      21      27
21      26      30
25      27      32

ANOVA results (output):

One-way ANOVA: lowfat, lowcal, lowcarb

Source  DF  SS      MS     F     P
Factor   2  122.17  61.08  9.05  0.007
Error    9   60.75   6.75
Total   11  182.92

S = 2.598   R-Sq = 66.79%   R-Sq(adj) = 59.41%

Individual 95% CIs for mean, based on pooled StDev:
Level    N  Mean    StDev
lowfat   4  21.500  2.887
lowcal   4  24.500  2.646
lowcarb  4  29.250  2.217

Pooled StDev = 2.598

Now, let's set up the problem as regression with dummy variables.
Y = weight loss (response)
Let X1 = 1 if lowcal, 0 otherwise; X2 = 1 if lowcarb, 0 otherwise.
Model: Y = α + β1 X1 + β2 X2 + ε

Interpreting the coefficients:
α:  intercept for the baseline group (lowfat)
β1: change in intercept for lowcal compared to lowfat
β2: change in intercept for lowcarb compared to lowfat

REGRESSION results (output):

Regression Analysis: Y versus x1, x2

The regression equation is
Y = 21.5 + 3.00 x1 + 7.75 x2

Predictor  Coef    SE Coef  T      P
Constant   21.500  1.299    16.55  0.000
x1          3.000  1.837     1.63  0.137
x2          7.750  1.837     4.22  0.002

S = 2.59808   R-Sq = 66.8%   R-Sq(adj) = 59.4%

Analysis of Variance
Source          DF  SS       MS      F     P
Regression       2  122.167  61.083  9.05  0.007
Residual Error   9   60.750   6.750
Total           11  182.917
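The agreement between the two outputs can be verified directly; a minimal sketch with the diet data, again assuming pandas and statsmodels:

import pandas as pd
import statsmodels.formula.api as smf

diet = pd.DataFrame({
    "loss": [22, 18, 21, 25, 24, 21, 26, 27, 28, 27, 30, 32],
    "x1":   [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],   # lowcal dummy
    "x2":   [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],   # lowcarb dummy
})
fit = smf.ols("loss ~ x1 + x2", data=diet).fit()
print(fit.params)                # 21.5, 3.00, 7.75, as in the output above
print(fit.fvalue, fit.f_pvalue)  # F = 9.05, p = 0.007: the one-way ANOVA test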

More about RESIDUALS:
A plot of RESIDUALS vs FITTED values exaggerates any pattern present in the data other than the linear trend.
How do we judge non-constant variance in the response from the residuals-vs-fitted plot? (example in class)
Recall: residual = y - ŷ, so the linear trend has been removed from the picture. Any pattern (or trend) still present in the residuals-vs-fitted plot suggests that linear regression was not enough, and we need to add quadratic (or other polynomial) terms to the equation (examples in class).
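A sketch of the plot itself, assuming matplotlib and any statsmodels OLS result (for example, the BAC ~ NOB fit from the Python sketch earlier):

import matplotlib.pyplot as plt

# fit is an existing statsmodels OLS result, e.g. smf.ols("BAC ~ NOB", data=bac).fit()
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("fitted values (y-hat)")
plt.ylabel("residuals (y - y-hat)")
plt.show()   # visible curvature here would suggest adding a quadratic term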

QUADRATIC REGRESSION
Model: Y = α + β1 X + β2 X² + ε; note that there are p = 2 predictors (X and X²).
Assumptions: ε iid N(0, σ)
Fitted equation (output): ŷ = a + b1 X + b2 X²
Interpreting the coefficients: only interpret the coefficient of the quadratic term. Is β2 significantly different from zero?
- If yes: keep the quadratic term, and look at the sign of b2 (it determines whether the curvature opens up or down).
- If no: throw X² out and do SLR.
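A sketch of fitting the quadratic model with the formula interface (the data here are simulated for illustration; I(...) tells patsy to square X literally):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
df = pd.DataFrame({"X": x,
                   "Y": 2 + 1.5 * x - 0.12 * x**2 + rng.normal(scale=0.5, size=40)})

quad = smf.ols("Y ~ X + I(X ** 2)", data=df).fit()
print(quad.params)    # a, b1, b2
print(quad.pvalues)   # t test for beta_2: keep the quadratic term if its p is small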

Example: Suppose we are interested in predicting the GPA of students in college (CGPA) using 16 different predictor variables. Data were collected from a random sample of 59 college students.
- What is the response variable in this problem?
- What are the values of n and p?
- What are H0 and Ha that you can test using the ANOVA table?
- What is your decision, based on the following ANOVA table? What is your conclusion?

Regression Analysis: CGPA versus Height, Gender, ...

The regression equation is
CGPA = 0.53 + 0.0194 Height + 0.047 Gender - 0.00163 Haircut - 0.042 Job
       + 0.0004 Studytime - 0.375 Smokecig + 0.0488 Dated + 0.546 HSGPA
       + 0.00315 HomeDist + 0.00069 BrowseInternet - 0.00128 WatchTV
       - 0.0117 Exercise + 0.0140 ReadNewsP + 0.039 Vegan
       - 0.0139 PoliticalDegree - 0.0801 PoliticalAff

Predictor         Coef        SE Coef     T      P
Constant          0.532       1.496       0.36   0.724
Height            0.01942     0.01637     1.19   0.242
Gender            0.0468      0.1429      0.33   0.745
Haircut          -0.001633    0.001697   -0.96   0.341
Job              -0.0418      0.1024     -0.41   0.685
Studytime         0.00043     0.01921     0.02   0.982
Smokecig         -0.3746      0.2249     -1.67   0.103
Dated             0.04881     0.07111     0.69   0.496
HSGPA             0.5457      0.1776      3.07   0.004
HomeDist          0.003147    0.003400    0.93   0.360
BrowseInternet    0.000689    0.001163    0.59   0.557
WatchTV          -0.0012840   0.0009710  -1.32   0.193
Exercise         -0.011657    0.005934   -1.96   0.056
ReadNewsP         0.01395     0.02272     0.61   0.543
Vegan             0.0392      0.1578      0.25   0.805
PoliticalDegree  -0.01390     0.03185    -0.44   0.665
PoliticalAff     -0.08006     0.07741    -1.03   0.307

S = 0.322198   R-Sq = 43.2%   R-Sq(adj) = 21.5%

Analysis of Variance
Source          DF  SS      MS      F     P
Regression      16  3.3135  0.2071  1.99  0.037
Residual Error  42  4.3601  0.1038
Total           58  7.6736

Best Subsets Regression: CGPA versus Height, Gender, ...
Response is CGPA. (Each row of the original output also marks with an X which of the 16 predictors enter that model; the vertically printed predictor-name headers did not survive transcription and are omitted here.)

Vars  R-Sq  R-Sq(adj)  Mallows Cp  S
 1    25.5  24.2         0.1       0.31667
 1    13.0  11.5         9.3       0.34217
 2    31.6  29.2        -2.4       0.30613
 2    29.4  26.9        -0.8       0.31109
 3    33.8  30.2        -2.1       0.30389
 3    33.7  30.0        -2.0       0.30423
 4    35.7  31.0        -1.5       0.30223
 4    35.3  30.5        -1.2       0.30320
 5    37.3  31.4        -0.6       0.30132
 5    37.0  31.1        -0.4       0.30198
 6    38.3  31.2         0.6       0.30163
 6    38.3  31.2         0.6       0.30164
 7    39.6  31.3         1.7       0.30150
 7    39.3  30.9         1.9       0.30231
 8    40.4  30.8         3.1       0.30249
 8    40.4  30.8         3.1       0.30256
 9    41.5  30.8         4.2       0.30266
 9    41.0  30.2         4.6       0.30395
10    41.9  29.8         6.0       0.30478
10    41.8  29.7         6.0       0.30492
11    42.2  28.7         7.7       0.30712
11    42.2  28.7         7.7       0.30715
12    42.6  27.6         9.4       0.30945
12    42.6  27.6         9.5       0.30954
13    42.9  26.4        11.2       0.31205
13    42.8  26.3        11.3       0.31229
14    43.1  25.0        13.1       0.31502
14    43.0  24.9        13.1       0.31526
15    43.2  23.4        15.0       0.31843
15    43.1  23.2        15.1       0.31866
16    43.2  21.5        17.0       0.32220

Regression Analysis: CGPA versus HSGPA, Exercise

The regression equation is
CGPA = 1.55 + 0.560 HSGPA - 0.0111 Exercise

Predictor  Coef       SE Coef   T      P
Constant   1.5489     0.5551    2.79   0.007
HSGPA      0.5599     0.1436    3.90   0.000
Exercise  -0.011138   0.004985  -2.23  0.029

S = 0.306126   R-Sq = 31.6%   R-Sq(adj) = 29.2%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       2  2.4256  1.2128  12.94  0.000
Residual Error  56  5.2479  0.0937
Total           58  7.6736

LOGISTIC REGRESSION
Y = a categorical (Yes/No) or binary (1 or 0) response.
Example: predict the probability that a person pays bills on time, based on past credit history, income, employment, age, etc.
Example: predict the probability that a person gets lung cancer, based on smoking, family history, asthma, age, gender, race, eating habits, exercise habits, etc.

Logistic Regression Model (with 1 predictor variable):
p = exp(α + βX) / (1 + exp(α + βX))

Example: Does a person have a travel credit card?
X = annual income (in thousands of euros), Y = 1 if yes, 0 if no.
(partial dataset)
income  y
12      0
13      0
14      1
14      0
14      0
14      1

Link Function: Logit

Response Information
Variable  Value  Count
y         1        31  (Event)
          0        69
          Total   100

Logistic Regression Table
Predictor  Coef       SE Coef    Z      P
Constant   -3.51795   0.710336   -4.95  0.000
income      0.105409  0.0261574   4.03  0.000

Interpretations: Annual income is a good predictor of the probability of having a travel credit card, and the probability of having a travel credit card increases with higher annual income (because of the positive sign of the coefficient).

Prediction Equation:
p̂ = exp(-3.52 + 0.105X) / (1 + exp(-3.52 + 0.105X)),  i.e. a = -3.52, b = 0.105
- Predict the probability that a person with annual income 12K (euros) has a travel credit card (answer: p̂ = 0.09).
- Predict the probability that a person with annual income 65K (euros) has a travel credit card (answer: p̂ = 0.97).
- The probability of having a travel credit card is 50% when X = -a/b = 3.52/0.105 = 33.524. (Why? Because p̂ = 0.5 exactly when exp(a + bX) = 1, i.e. when a + bX = 0.)
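A quick numeric check of these predictions (plain Python/numpy; values match the answers above up to rounding):

import numpy as np

a, b = -3.52, 0.105

def p_hat(x):
    return np.exp(a + b * x) / (1 + np.exp(a + b * x))

print(p_hat(12))    # approx 0.09
print(p_hat(65))    # approx 0.97
print(-a / b)       # approx 33.52: the income where p-hat crosses 0.50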

Multiple Logistic Regression:
Example: predict marijuana use (Y/N) from alcohol use (Y/N) and cigarette smoking (Y/N) for high-school seniors.
Data: 2276 HS seniors in a non-urban area outside Dayton, Ohio.

Marijuana  Cigarette  Alcohol  Frequency
1          1          1        911
1          0          1         44
1          1          0          3
1          0          0          2
0          1          1        538
0          0          1        456
0          1          0         43
0          0          0        279

Binary Logistic Regression: Marijuana versus Alcohol, Cigarette

Link Function: Logit

Response Information
Variable   Value  Count
Marijuana  1        960  (Event)
           0       1316
           Total   2276

Frequency: Frequency

Logistic Regression Table
Predictor  Coef      SE Coef    Z       P
Constant   -5.30904  0.475190  -11.17   0.000
Alcohol     2.98601  0.464671    6.43   0.000
Cigarette   2.84789  0.163839   17.38   0.000

Predict the probability of using marijuana if:

Alcohol use = Yes and Cigarette smoking = Yes:
p̂ = exp(-5.30904 + 2.98601 + 2.84789) / (1 + exp(-5.30904 + 2.98601 + 2.84789)) = 0.628

Alcohol use = No and Cigarette smoking = Yes:
p̂ = exp(-5.30904 + 2.84789) / (1 + exp(-5.30904 + 2.84789)) = 0.079
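The same arithmetic as a quick numeric check (plain Python/numpy; results match the hand computations above):

import numpy as np

def p_hat(eta):
    return np.exp(eta) / (1 + np.exp(eta))

print(p_hat(-5.30904 + 2.98601 + 2.84789))  # alcohol yes, cigarettes yes: 0.628
print(p_hat(-5.30904 + 2.84789))            # alcohol no,  cigarettes yes: 0.079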