1. BINARY LOGISTIC REGRESSION


The Model

We are modelling a two-valued variable Y. Model scheme: variable Y is the dependent variable; X, Z, W are independent variables (regressors). Typically the values of Y are coded 0 or 1. The model is constructed for the logarithm of the ratio P(Y = 1)/P(Y = 0) (the so-called logit function):

ln[P(Y = 1)/P(Y = 0)] = C + b1*X + b2*Z + b3*W.

The same model can also be written as

P(Y = 1) = e^z / (1 + e^z),   P(Y = 0) = 1 - P(Y = 1) = 1 / (1 + e^z),

where

z = C + b1*X + b2*Z + b3*W.

The values of the coefficients C, b1, b2, b3 are unknown. Applying the logit model means calculating estimates of C, b1, b2, b3 from the data. If b1 > 0, then larger values of X increase the probability P(Y = 1). If b1 < 0, then larger values of X decrease the probability P(Y = 1).

Odds ratio

The ratio P(Y = 1)/P(Y = 0) is called the odds. The odds ratio for X1 shows the change in the odds when X1 increases by one unit and all other variables are held constant.
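The model above can be sketched numerically. All coefficient and regressor values below are invented for illustration; in practice the estimates come from the data:

```python
import math

def logit_probability(c, coeffs, values):
    """P(Y = 1) in the logit model: z = C + b1*X + b2*Z + b3*W."""
    z = c + sum(b * x for b, x in zip(coeffs, values))
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical estimates C, b1, b2, b3 and regressor values X, Z, W.
p1 = logit_probability(-1.0, [0.8, -0.5, 0.3], [2.0, 1.0, 0.0])
p0 = 1 - p1  # P(Y = 0)
```

Since b1 = 0.8 > 0 in this made-up example, increasing X pushes P(Y = 1) up, in line with the sign rule above.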

e^{b1} is called the odds ratio:

new odds = e^{b1} * old odds,

where the odds are P(Y = 1)/P(Y = 0).

Forecasting

The forecast is made for P(Y = 1). First, we substitute the values of X, Z, W into

z = C + b1*X + b2*Z + b3*W.

Then we substitute z into

P(Y = 1) = e^z / (1 + e^z),   P(Y = 0) = 1 - P(Y = 1) = 1 / (1 + e^z).

Here e = 2.71828...; for a practical forecast it suffices to take e ≈ 2.7183.

Classification

Classification means deciding whether Y = 1 or Y = 0. Typically it is assumed that if P(Y = 1) > 0.5, then we choose Y = 1; otherwise we choose Y = 0. Note that the borderline probability P(Y = 1) = 0.5 can be treated in various ways; most logit models assume that Y = 1 or Y = 0 is then chosen arbitrarily (one can flip a coin). For classification it suffices to know the sign of the logit function

z = ln[P(Y = 1)/P(Y = 0)].

If z > 0, we choose Y = 1; if z < 0, we choose Y = 0; if z = 0, we flip a coin.
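The forecasting and classification steps above can be combined into one small routine (the coefficients below are again hypothetical):

```python
import math

def forecast_and_classify(c, coeffs, values):
    """Return (P(Y = 1), predicted class); threshold 0.5, i.e. the sign of z."""
    z = c + sum(b * x for b, x in zip(coeffs, values))
    p1 = math.exp(z) / (1 + math.exp(z))
    return p1, (1 if z > 0 else 0)  # z = 0 would call for a coin flip

p1, y_hat = forecast_and_classify(0.2, [0.5, -0.4], [3.0, 2.0])  # here z = 0.9 > 0
```

Because z > 0 is equivalent to P(Y = 1) > 0.5, classifying by the sign of z gives the same answer as comparing the forecast probability to 0.5.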

Data

1) The dependent variable Y is dichotomous, and there is enough data for both values of Y (at least 20% of all observations correspond to Y = 1, but no more than 80%). Otherwise logistic regression is inapplicable.
2) If the model contains many categorical regressors, then for each combination of their values there should be at least 5 observations in the data.
3) The regressors are not strongly correlated. Although, in general, multicollinearity is not a serious problem in logit models, some authors recommend dropping variables from the model if their standard errors SE > 5.

For a good model:
- The maximum likelihood chi-square test has p < 0.05.
- The Wald test has p < 0.05 for each regressor (all regressors are statistically significant).
- At least 50% of the cases with Y = 1 and at least 50% of the cases with Y = 0 are correctly classified.
- Cook's distance ≤ 1 and |DFBeta| ≤ 1 for all observations.
- The chosen pseudo-determination coefficient is ≥ 0.20.

Logistic regression with SPSS

1. Data

File ESS4EE_PT. We investigate males ... years of age. Variables: cntry (EE Estonia, PT Portugal), stfedu attitude toward the country's educational system (0 very bad, ..., 10 very good), happy happiness (0 very unhappy, ..., 10 very happy), freehms attitude toward sexual minorities (1 positive, ..., 5 negative), imsb3 (based on imsclbn, social benefits for immigrants: 0 no later than after a year, 1 after having worked and paid taxes for at least a year, 2 after becoming a citizen, or never). Proposed model: cntry = f(stfedu, happy, freehms, imsb3)
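Requirement 1) above, the 20%/80% balance of Y, is easy to verify before fitting. A minimal sketch:

```python
def balance_ok(y, low=0.20, high=0.80):
    """True if the share of Y = 1 observations lies between 20% and 80%."""
    share = sum(y) / len(y)
    return low <= share <= high

usable = balance_ok([0, 1, 1, 0, 1])          # 60% ones -> logistic regression applicable
too_skewed = balance_ok([0] * 95 + [1] * 5)   # only 5% ones -> inapplicable
```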

2. SPSS options

Analyze -> Regression -> Binary Logistic. Since imsb3 is nominal, we choose Categorical and move imsb3 to the right. In the Options menu check CI for exp(B). Then in the Save menu check Cook's and DfBeta(s).

3. Results

The table Dependent Variable Encoding shows which country corresponds to the model's Y = 1 (Portugal).

Dependent Variable Encoding
Original Value    Internal Value
EE Estonia        0
PT Portugal       1

We indicated that imsb3 is nominal. Therefore it is replaced by 2 dichotomous dummy variables taking values 0 and 1. In the table Categorical Variables Codings we see which combination of dummy variables corresponds to each initial value of imsb3.

Categorical Variables Codings
imsb3 Attitude towards social benefits for immigrants    Parameter coding
                                                          (1)    (2)
.00  positive attitude                                    1      0
1.00 more positive than negative                          0      1
2.00 negative attitude                                    0      0

The Classification Table (the second classification table in the output) shows that 76.0% of Estonians and 78.9% of the Portuguese were correctly classified. The percentage of correctly classified respondents is sufficient for an acceptable model.

Classification Table
                                  Predicted cntry Country
Observed                          EE Estonia    PT Portugal    Percentage Correct
Step 1  cntry   EE Estonia                                     76.0
                PT Portugal                                    78.9
        Overall Percentage

The maximum likelihood chi-square test is given in Omnibus Tests of Model Coefficients. All three rows are identical. The model fits the data statistically significantly: p = 0.000 < 0.05.

Omnibus Tests of Model Coefficients
                 Chi-square    df    Sig.
Step 1  Step
        Block
        Model

Model Summary contains two pseudo-determination coefficients. Both are much larger than 0.20 (Cox and Snell R² = 0.372, Nagelkerke R² = 0.495).

Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1                            .372                    .495

The table Variables in the Equation contains the estimates of the model coefficients, information on their statistical significance (significant are the variables with p < 0.05 in the Sig. column) and the odds ratios in Exp(B). We see that all regressors except the constant are statistically significant. As a rule, we ignore statistical insignificance of the constant.

Variables in the Equation
                    B        S.E.    Wald    df    Sig.    Exp(B)    95% C.I. for EXP(B) (Lower, Upper)
Step 1  freehms     -.760
        happy        .300
        stfedu      -.564
        imsb3
        imsb3(1)    1.661                                  5.263
        imsb3(2)    1.137
        Constant
a. Variable(s) entered on step 1: freehms, happy, stfedu, imsb3.

The logistic regression model (recall that Y = 1 corresponds to cntry = PT):

ln[P(cntry = PT)/P(cntry = EE)] = -0.76*freehms + 0.30*happy - 0.564*stfedu + 1.661*imsb3(1) + 1.137*imsb3(2).

Taking into account the coding of the dummy variables, instead of

1.661*imsb3(1) + 1.137*imsb3(2)

we can write

1.661 if imsb3 = 0,
1.137 if imsb3 = 1,
0     if imsb3 = 2.

Larger values of freehms and stfedu increase the probability that the respondent is from Estonia (the corresponding coefficients are negative); a larger value of happy increases the probability that the respondent is from Portugal. The odds ratio for imsb3 allows the following interpretation: in comparison with a negative attitude toward immigrants (imsb3 = 2), a positive attitude (imsb3 = 0) increases the odds with respect to Portugal by a factor of 5.263.

Cook's and DFBeta values appear in the data window. One can check that all the values indicate that there are no outliers in the data. Note that Descriptive Statistics can be used to check the maximum values of COO_1 and the DFB variables.
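The factor 5.263 quoted above is just Exp(B) for the imsb3(1) dummy, and can be reproduced directly from the coefficient (up to rounding of b, which the text gives as 1.661):

```python
import math

b_imsb3_1 = 1.661                 # coefficient of the imsb3(1) dummy from the text
odds_ratio = math.exp(b_imsb3_1)  # close to the 5.263 that SPSS reports in Exp(B)
```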

4. Interaction of regressors

Model: cntry = f(stfedu, happy, freehms, happy*freehms). In the Logistic Regression menu mark both variables happy and freehms and choose >a*b>. The Wald test shows that the interaction is not statistically significant.

Variables in the Equation
                           B    S.E.    Wald    df    Sig.    Exp(B)
Step 1  freehms
        stfedu
        happy
        freehms by happy
        Constant
a. Variable(s) entered on step 1: freehms, stfedu, happy, freehms * happy.

2. PROBIT REGRESSION

Model

We are modelling a two-valued variable. Probit regression can be used whenever logistic regression applies, and vice versa. Model scheme: variable Y is the dependent variable; X, Z, W are independent variables (regressors). Typically the values of Y are coded 0 or 1. The model is constructed for P(Y = 0):

P(Y = 0) = Φ(C + b1*X + b2*Z + b3*W).

Here Φ(·) is the distribution function of the standard normal random variable. An equivalent expression is

Φ⁻¹(P(Y = 0)) = C + b1*X + b2*Z + b3*W.

Here Φ⁻¹(·) is the inverse function, also known as the probit function. If b1 > 0, then as X grows, P(Y = 0) also grows. If b1 < 0, then as X grows, P(Y = 1) grows.

Data

a) Variable Y is dichotomous. The data for Y contain at least 20% of zeros and at least 20% of ones.
b) If the model contains many categorical variables, for each combination of categories the data should contain at least 5 observations.
c) No strong correlation between the regressors.
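Φ and its inverse are available in the Python standard library, so the probit model can be sketched without SPSS (the coefficients below are hypothetical):

```python
from statistics import NormalDist

std_normal = NormalDist()       # standard normal N(0, 1)
PHI = std_normal.cdf            # distribution function Φ
PROBIT = std_normal.inv_cdf     # inverse Φ⁻¹, the probit function

def probit_p0(c, coeffs, values):
    """P(Y = 0) = Φ(C + b1*X + b2*Z + b3*W) in the probit model."""
    z = c + sum(b * x for b, x in zip(coeffs, values))
    return PHI(z)

p0 = probit_p0(0.0, [0.5], [1.0])  # Φ(0.5)
z_back = PROBIT(p0)                # the probit function recovers z = 0.5
```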

Model fit

The model fits the data if:
- The maximum likelihood test has p < 0.05.
- The Wald test has p < 0.05 for all regressors.
- Many cases of both Y = 1 and Y = 0 are classified correctly.
- Cook's measure ≤ 1 for all observations.

Probit regression with SPSS

1. Data

File LAMS. Variables: K2 university, K33_2 studies were important for achieving my present position (1 absolutely unimportant, ..., 5 very important), K35_1 my present occupation corresponds to my bachelor studies (1 agree, 2 rather agree than disagree, 3 rather disagree than agree, 4 disagree), K36_1 I use the professional skills obtained during my studies (1 never, ..., 5 very frequently), K37_1 satisfaction with my profession (1 not at all, ..., 5 very much). With Recode we create a new two-valued variable Y (0 if the respondent rarely applies the professional skills obtained during studies, 1 if frequently). Model scheme: Y = f(K35_1, K33_2, K37_1).

2. SPSS options

We investigate Klaipėda University: Select Cases: K2 = 3. Analyze -> Generalized Linear Models -> Generalized Linear Models. Choose Type of Model and check Binary probit.

Open Response and put Y into Dependent Variable. Open Predictors and move K37_1 and K33_2 into Covariates (with some reservation we treat these variables as interval ones). Regressor K35_1 takes only 4 values and is therefore treated as a categorical variable. Move K35_1 into Factors.

Open the Model window and move all regressors to the right. Open Save and check Predicted category and Cook's distance.

3. Results

The model is constructed for P(Y = 0). In Categorical Variable Information we check that there is a sufficient number of respondents for each value of the categorical variables (Y included).

Categorical Variable Information (value labels translated from Lithuanian)
                                                  N      Percent
Dependent Variable  Y    .00                      100    31.1%
                         1.00                     222    68.9%
                         Total                    322    100.0%
Factor  K35_1 Correspondence of present job to the field of bachelor (integrated) studies
                         1 Definitely yes
                         2 Rather yes
                         3 Rather no
                         4 Definitely no
                         Total                    322    100.0%

In the Omnibus Test table we check that the p-value of the maximum likelihood test is sufficiently small: p = 0.000 < 0.05.

Omnibus Test
Likelihood Ratio Chi-Square    df    Sig.
Dependent Variable: Y. Model: (Intercept), K35_1, K37_1, K33_2

The Parameter Estimates table contains the parameter estimates and Wald tests for the significance of each regressor. We do not check the significance of the Intercept. The categorical variable K35_1 was replaced by 4 dummy variables, one of which is not statistically significant. However, because of one such insignificant result it is not rational to drop K35_1 from the model.

Parameter Estimates
Parameter      B    Std. Error    95% Wald Confidence Interval (Lower, Upper)    Wald Chi-Square    df    Sig.
(Intercept)
[K35_1=1]
[K35_1=2]
[K35_1=3]
[K35_1=4]      0 a
K37_1
K33_2
(Scale)        1 b
Dependent Variable: Y. Model: (Intercept), K35_1, K37_1, K33_2
a. Set to zero because this parameter is redundant.
b. Fixed at the displayed value.

We obtained four models, which differ only by the constant. They can be written in the following way:

P(Y = 0) = P(rarely applies knowledge in his work) = Φ(z),

z = 4.85 - 0.273*K37_1 - 0.78*K33_2 + { 1.57 if K35_1 = 1;  1.02 if K35_1 = 2;  0.26 if K35_1 = 3;  0 if K35_1 = 4 }.

The signs of the coefficients agree with the general logic of the model. The coefficient of K37_1 is negative: the larger the value of K37_1 (the happier the respondent is with his work), the less probable it is that the knowledge is rarely used. The other signs of coefficients can be explained similarly.

We treated probit regression as a special case of the generalized linear model. Therefore one can check the magnitude of the deviance in the table Goodness of Fit. We see that the normed deviance is close to unity (1.156), which demonstrates good fit of the model. Note that for probit regression the small p-value of the maximum likelihood test (found in Omnibus Test) is more important. If all of the model's characteristics except the deviance show good fit, we assume that the model is acceptable.

Goodness of Fit
                                          Value    df    Value/df
Deviance                                                 1.156
Scaled Deviance
Pearson Chi-Square
Scaled Pearson Chi-Square
Log Likelihood
Akaike's Information Criterion (AIC)
Finite Sample Corrected AIC (AICC)
Bayesian Information Criterion (BIC)
Consistent AIC (CAIC)

Checking for outliers, we choose Analyze -> Descriptive Statistics -> Descriptives, move the variable CooksDistance into Variable(s) and choose OK.

Descriptive Statistics
                               N      Minimum    Maximum    Mean    Std. Deviation
CooksDistance Cook's Distance  322               .039
Valid N (listwise)             322

The maximal Cook's distance value is 0.039 < 1. Therefore there are no outliers in our data.

To obtain the classification table we choose Analyze -> Descriptive Statistics -> Crosstabs. Move Y into Row(s) and PredictedValue into Column(s). Choose Cells and check Row. Then Continue and OK.

Y * PredictedValue (Predicted Category Value) Crosstabulation
                            PredictedValue
                            .00       1.00      Total
Y      .00   Count          66        34        100
             % within Y     66.0%     34.0%     100.0%
       1.00  Count          17        205       222
             % within Y     7.7%      92.3%     100.0%
Total        Count          83        239       322
             % within Y     25.8%     74.2%     100.0%

From the 100 respondents who rarely use the professional skills obtained during studies, 66 are correctly classified (66.0%). From the 222 respondents who frequently use the professional skills obtained during studies, 205 are correctly classified (92.3%). Recalling the table Categorical Variable Information and its percentages (31.1% and 68.9%, respectively), we see that the probit model forecasts much better than a random guess. Final conclusion: the probit regression model fits the data sufficiently well.

4. Forecasting

A single value can be forecast in the following way. Assume that the previous model is applied to a respondent for whom K33_2 = 4, K35_1 = 1, K37_1 = 4. We add an additional row to the data, writing 4 in column K33_2, 1 in column K35_1, 4 in column K37_1 and 1 in column filter_$. The remaining columns stay empty. We repeat the probit analysis but check Predicted value of mean of response in the Save window. A new column MeanPredicted appears in the data, containing the probabilities P(Y = 0). We get a probability of 0.175 for our respondent. Therefore it is likely that this respondent will apply the skills from his studies in his professional work.


3. POISSON REGRESSION

We are counting some quite rare events. Variable Y is called the dependent variable; variables X, Z, W are independent variables (regressors, predictors). We are modelling the behaviour of the mean value of Y:

μ = e^z,   z = C + b1*X + b2*Z + b3*W.

Estimates of the coefficients C, b1, b2, b3 are calculated from the data.

Data

Y has a Poisson distribution. We expect the mean of Y to be similar in magnitude to the variance of Y. The possible values of Y are 0, 1, 2, 3, ... Regressors can be interval or categorical random variables.

Main steps

1) Check whether Y has a Poisson distribution.
2) Check whether the normed deviance is close to 1.
3) Check whether the maximum likelihood test is statistically significant. If its p-value ≥ 0.05, the model is unacceptable.
4) Check whether all regressors are significant (Wald test p < 0.05). If not, drop them from the model. We pay no attention to the p-value of the model constant (intercept).

A good Poisson regression model:
- The normed deviance is close to 1.
- The maximum likelihood test has p < 0.05.
- For all regressors the Wald test has p < 0.05.

Poisson regression with SPSS

1. Data

File ESS4FR. Variables: agea respondent's age, hhmmb number of household members, imptrad important to keep traditions, eduyrs years of formal schooling, cldcrsv help for childcare (0 very bad, ..., 10 very good). We will model the number of household members other than the respondent by agea and cldcrsv. We investigate respondents for whom imptrad ≤ 2 and eduyrs ≤ 10. Use Select Cases -> If condition is satisfied -> If and write imptrad <= 2 & eduyrs <= 10. Then Continue -> OK. The dependent variable (we call it numbhh) can be created with the help of Transform -> Compute Variable.

2. Preliminary analysis

First we check whether numbhh is similar to a Poisson variable: Analyze -> Descriptive Statistics -> Frequencies.

Further, choose Statistics and check Mean and Variance. We see that the mean of numbhh (1.0036) is close to its variance (1.482). Thus numbhh satisfies one of the most important properties of a Poisson variable.

Statistics: numbhh
N Valid      281
N Missing    0
Mean         1.0036
Variance     1.482

It is also possible to check whether a random variable has a Poisson distribution with the help of the Kolmogorov-Smirnov test: Analyze -> Nonparametric Tests -> Legacy Dialogs -> 1-Sample K-S.
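The mean-versus-variance comparison can be done outside SPSS as well. The counts below are invented, and the 50% tolerance is an arbitrary rule of thumb for illustration, not a formal test:

```python
from statistics import mean, pvariance

def poisson_plausible(counts, tol=0.5):
    """Rough screen for Poisson data: mean and variance of similar magnitude."""
    m, v = mean(counts), pvariance(counts)
    return abs(v - m) <= tol * m

ok = poisson_plausible([0, 1, 1, 2, 0, 1, 3, 0, 2, 1])  # mean 1.1, variance 0.89
overdispersed = poisson_plausible([0] * 5 + [10] * 5)   # variance far above mean
```

When the variance clearly exceeds the mean, the negative binomial model of Section 4 is the usual alternative.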

We see that we may assume that numbhh has a Poisson distribution (p = 0.169), but it is not normal (p = 0.000).

One-Sample Kolmogorov-Smirnov Test: numbhh
N                               281
Normal Parameters               Mean, Std. Deviation
Most Extreme Differences        Absolute .302, Positive .302, Negative
Kolmogorov-Smirnov Z
Asymp. Sig. (2-tailed)          .000
a. Test distribution is Normal.

One-Sample Kolmogorov-Smirnov Test 2: numbhh
N                               281
Poisson Parameter               Mean
Most Extreme Differences        Absolute .066, Positive .066, Negative
Kolmogorov-Smirnov Z
Asymp. Sig. (2-tailed)          .169
a. Test distribution is Poisson.

3. SPSS Poisson regression options

Choose Analyze -> Generalized Linear Models -> Generalized Linear Models. Choose Type of Model and check Poisson loglinear.

Click on Response and move numbhh into Dependent Variable. Click on Predictors and move both regressors agea and cldcrsv into Covariates. (We have no categorical variables, which would be moved into Factors.) After choosing Model, both variables should be moved into Model. In Statistics additionally check Include exponential parameter estimates. Then -> OK.

4. Results

In the Goodness of Fit table we find the normed deviance. We see that it is close to 1 (0.919). Thus the Poisson regression model fits our data. It remains to decide which regressors are statistically significant.

Goodness of Fit
                                          Value    df    Value/df
Deviance                                                 .919
Scaled Deviance
Pearson Chi-Square
Scaled Pearson Chi-Square
Log Likelihood
Akaike's Information Criterion (AIC)
Finite Sample Corrected AIC (AICC)
Bayesian Information Criterion (BIC)
Consistent AIC (CAIC)

In the Omnibus Test table we find the p-value of the maximum likelihood statistic. Since p < 0.05, we conclude that at least one regressor is statistically significant.

Omnibus Test
Likelihood Ratio Chi-Square    df    Sig.

In the table Tests of Model Effects we see the Wald test p-values for all regressors. We do not check the p-value of the intercept. Both p < 0.05; therefore both regressors (agea and cldcrsv) are statistically significant and should remain in the model.

Tests of Model Effects (Type III)
Source         Wald Chi-Square    df    Sig.
(Intercept)
agea
cldcrsv

In the table Parameter Estimates the information about the Wald p-values is repeated. Moreover, the table contains the estimates of the model's coefficients (column B).

Parameter Estimates
Parameter      B       Std. Error    95% Wald C.I. (Lower, Upper)    Wald Chi-Square    df    Sig.    Exp(B)    95% Wald C.I. for Exp(B) (Lower, Upper)
(Intercept)    1.535
agea           -.035
cldcrsv        .099
(Scale)        1 a

We can see that the coefficient of agea is negative: -0.035 < 0. This means that as the respondent's age increases, the number of household members decreases. The mathematical expression of the model is

μ = exp(1.535 - 0.035*agea + 0.099*cldcrsv).

Here μ is the mean number of other household members. Forecasting means inserting given values of agea and cldcrsv into the above formula.

5. Categorical regressor

Categorical regressors are included via Generalized Linear Models -> Predictors -> Factors.

Do not forget to add ctzcntr into the Model window. Then, in the table Parameter Estimates,

Parameter      B       Std. Error    95% Wald C.I. (Lower, Upper)
(Intercept)    1.239
agea           -.036
[ctzcntr=1]    .352
[ctzcntr=2]    0 a
cldcrsv        .104

we get additional information about both ctzcntr categories. The model can then be written as

ln μ = 1.239 - 0.036*agea + 0.104*cldcrsv + { 0.352 if ctzcntr = 1;  0 if ctzcntr = 2 }.
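Forecasting with a fitted Poisson model is just an evaluation of the formula. A sketch for the model with the categorical regressor, using the point estimates quoted in the text; the respondent's values are invented, and the reading of ctzcntr = 1 as "citizen of the country" is an assumption about the ESS coding:

```python
import math

def predicted_mean(agea, cldcrsv, ctzcntr):
    """Mean number of other household members from the fitted Poisson model."""
    z = 1.239 - 0.036 * agea + 0.104 * cldcrsv
    if ctzcntr == 1:   # assumed: respondent is a citizen of the country
        z += 0.352     # [ctzcntr=2] is the reference category (coefficient 0)
    return math.exp(z)

mu = predicted_mean(agea=40, cldcrsv=5, ctzcntr=1)  # hypothetical respondent
```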

4. NEGATIVE BINOMIAL REGRESSION

We are counting some events. Variable Y is called the dependent variable; variables X, Z, W are independent variables (regressors, predictors). We are modelling the behaviour of the mean value of Y:

μ = e^z,   z = C + b1*X + b2*Z + b3*W.

Estimates of the coefficients C, b1, b2, b3 are calculated from the data. NB regression is an alternative to Poisson regression. The main difference is that the variance of Y is larger than the mean of Y.

Data

Y has a negative binomial distribution. We expect the mean of Y to be smaller than the variance of Y. The possible values of Y are 0, 1, 2, 3, ... Regressors can be interval or categorical random variables.

Main steps

1) Check whether the variance of Y is greater than the mean of Y. Otherwise NB regression is not applicable.
2) Check whether the normed deviance is close to 1.
3) Check whether the maximum likelihood test is statistically significant. If its p-value ≥ 0.05, the model is unacceptable.
4) Check whether all regressors are significant (Wald test p < 0.05). If not, drop them from the model. We pay no attention to the p-value of the model constant (intercept).

A good negative binomial regression model:
- The normed deviance is close to 1.
- The maximum likelihood test has p < 0.05.
- For all regressors the Wald test has p < 0.05.

Negative binomial regression with SPSS

1. Data

File ESS4SE. Variables: emplno respondent's number of employees, emplnof father's number of employees (1 no employees, 2 has 1-24 employees, 3 more than 25 employees), brwmny borrow money for living (1 very difficult, ..., 5 very easy), eduyrs years of formal schooling. We will model the dependence of emplno on emplnof, brwmny, eduyrs. Variable emplnof has only one observation greater than 26. Therefore, with Recode we create a new dichotomous variable emplnof2 (0 if no employees, 1 if at least one employee).

2. SPSS options for the negative binomial regression

Analyze -> Generalized Linear Models -> Generalized Linear Models. Click on Type of Model. Do not choose Negative binomial with log link.

Check Custom -> Distribution: Negative binomial, Link function: Log, Parameter: Estimate value. Click on Response and put emplno into Dependent Variable. In Predictors put both variables eduyrs and brwmny into Covariates. Put the categorical variable emplnof2 into Factors.

Choose Model and put all variables into the Model window.

3. Results

At the beginning of the output we see descriptive statistics. Observe that the standard deviation of emplno (and hence its variance) is greater than its mean.

Categorical Variable Information
                    N    Percent
Factor  emplnof2
        Total            100.0%

Continuous Variable Information
                                                                      N    Minimum    Maximum    Mean    Std. Deviation
Dependent Variable  emplno Number of employees respondent has/had
Covariate           eduyrs Years of full-time education completed
Covariate           brwmny Borrow money to make ends meet, difficult or easy

In the Goodness of Fit table we see that the normed deviance is 0.901, i.e. the overall fit of the model to the data is quite good.

Goodness of Fit
                    Value    df    Value/df
Deviance                           .901
Scaled Deviance

The Omnibus Test table contains the maximum likelihood statistic and its p-value. Since p < 0.05, we conclude that at least one regressor is statistically significant.

Omnibus Test
Likelihood Ratio Chi-Square    df    Sig.

Tests of Model Effects contains Wald tests for each regressor. All regressors are statistically significant (we do not check the p-value of the intercept).

Tests of Model Effects (Type III)
Source         Wald Chi-Square    df    Sig.
(Intercept)
emplnof2
eduyrs
brwmny

The Parameter Estimates table contains the parameter estimates.

Parameter Estimates
Parameter             B       Std. Error    95% Wald C.I. (Lower, Upper)    Wald Chi-Square    df    Sig.    Exp(B)
(Intercept)
[emplnof2=.00]        1.629
[emplnof2=1.00]       0 a
eduyrs                .286
brwmny                -.753
(Scale)
(Negative binomial)   1 b

Estimated model:

ln μ = 0.286*eduyrs - 0.753*brwmny + { 1.629 if emplnof2 = 0;  0 if emplnof2 = 1 }.

Here μ is the mean number of employees.

5. MULTINOMIAL LOGISTIC REGRESSION

Variable Y is the dependent variable; variables X, Z, W are called regressors. Multinomial logistic regression is a generalization of binary logistic regression to a dependent variable with more than two categories.


STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

Discriminant Analysis

Discriminant Analysis Discriminant Analysis V.Čekanavičius, G.Murauskas 1 Discriminant analysis one categorical variable depends on one or more normaly distributed variables. Can be used for forecasting. V.Čekanavičius, G.Murauskas

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Multiple Regression. Peerapat Wongchaiwat, Ph.D.

Multiple Regression. Peerapat Wongchaiwat, Ph.D. Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Retrieve and Open the Data

Retrieve and Open the Data Retrieve and Open the Data 1. To download the data, click on the link on the class website for the SPSS syntax file for lab 1. 2. Open the file that you downloaded. 3. In the SPSS Syntax Editor, click

More information

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi.

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi. Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 This handout steals heavily

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression 15.1 Predicting Quality of Life: Chapter 15 - Multiple Regression a. All other variables held constant, a difference of +1 degree in Temperature is associated with a difference of.01 in perceived Quality

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable

More information

Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s

Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s Chapter 866 Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s Introduction Logistic regression expresses the relationship between a binary response variable and one or

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Analysis of Covariance (ANCOVA) with Two Groups

Analysis of Covariance (ANCOVA) with Two Groups Chapter 226 Analysis of Covariance (ANCOVA) with Two Groups Introduction This procedure performs analysis of covariance (ANCOVA) for a grouping variable with 2 groups and one covariate variable. This procedure

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

4 Multicategory Logistic Regression

4 Multicategory Logistic Regression 4 Multicategory Logistic Regression 4.1 Baseline Model for nominal response Response variable Y has J > 2 categories, i = 1,, J π 1,..., π J are the probabilities that observations fall into the categories

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs Logistic Regression, Part I: Problems with the Linear Probability Model (LPM) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla. Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit

More information

Logistic Regression Analysis

Logistic Regression Analysis Logistic Regression Analysis Predicting whether an event will or will not occur, as well as identifying the variables useful in making the prediction, is important in most academic disciplines as well

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Statistics: A review. Why statistics?

Statistics: A review. Why statistics? Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Introduction to Logistic Regression

Introduction to Logistic Regression Introduction to Logistic Regression Problem & Data Overview Primary Research Questions: 1. What are the risk factors associated with CHD? Regression Questions: 1. What is Y? 2. What is X? Did player develop

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION Ernest S. Shtatland, Ken Kleinman, Emily M. Cain Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT In logistic regression,

More information

The Flight of the Space Shuttle Challenger

The Flight of the Space Shuttle Challenger The Flight of the Space Shuttle Challenger On January 28, 1986, the space shuttle Challenger took off on the 25 th flight in NASA s space shuttle program. Less than 2 minutes into the flight, the spacecraft

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials. The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Homework Solutions Applied Logistic Regression

Homework Solutions Applied Logistic Regression Homework Solutions Applied Logistic Regression WEEK 6 Exercise 1 From the ICU data, use as the outcome variable vital status (STA) and CPR prior to ICU admission (CPR) as a covariate. (a) Demonstrate that

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Rama Nada. -Ensherah Mokheemer. 1 P a g e

Rama Nada. -Ensherah Mokheemer. 1 P a g e - 9 - Rama Nada -Ensherah Mokheemer - 1 P a g e Quick revision: Remember from the last lecture that chi square is an example of nonparametric test, other examples include Kruskal Wallis, Mann Whitney and

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Consider fitting a model using ordinary least squares (OLS) regression:

Consider fitting a model using ordinary least squares (OLS) regression: Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

SPSS Output. ANOVA a b Residual Coefficients a Standardized Coefficients

SPSS Output. ANOVA a b Residual Coefficients a Standardized Coefficients SPSS Output Homework 1-1e ANOVA a Sum of Squares df Mean Square F Sig. 1 Regression 351.056 1 351.056 11.295.002 b Residual 932.412 30 31.080 Total 1283.469 31 a. Dependent Variable: Sexual Harassment

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates

Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates WI-ATSA June 2-3, 2016 Overview Brief description of logistic

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information