CHAPTER 1: BINARY LOGIT MODEL

Prof. Alan Wan

Table of contents

1. Introduction
   1.1 Dichotomous dependent variables
   1.2 Problems with OLS
- SAS codes and basic outputs
- Wald test for individual significance
- Likelihood-ratio, LM and Wald tests for overall significance
- Odds ratio estimates
- AIC, SC and Generalised R²
- Association of predicted probabilities and observed responses
- Hosmer-Lemeshow test statistic

Introduction

Motivation for the Logit model:
- Dichotomous dependent variables;
- Problems with Ordinary Least Squares (OLS) in the face of dichotomous dependent variables;
- Alternative estimation techniques.

Dichotomous dependent variables

Often variables in social sciences are dichotomous:
- employed vs. unemployed
- married vs. unmarried
- guilty vs. innocent
- voted vs. didn't vote

Dichotomous dependent variables

Social scientists frequently wish to estimate regression models with a dichotomous dependent variable. Most researchers are aware that something is wrong with OLS in the face of a dichotomous dependent variable, but they do not know what makes dichotomous variables problematic in regression, or which other methods are superior.

Dichotomous dependent variables

The focus of this chapter is on binary Logit models (or logistic regression models) for dichotomous dependent variables. Logits have many similarities to OLS, but there are also fundamental differences.

Problems with OLS

We examine why OLS regression runs into problems when the dependent variable is 0/1.

Example dataset: penalty.txt
- Comprises 147 penalty cases in the state of New Jersey;
- In all cases the defendant was convicted of first-degree murder with a recommendation by the prosecutor that a death sentence be imposed;
- A penalty trial is conducted to determine whether the defendant should receive a death penalty or life imprisonment.

Problems with OLS

The dataset comprises the following variables:
- DEATH: 1 for a death sentence, 0 for a life sentence
- BLACKD: 1 if the defendant was black, 0 otherwise
- WHITVIC: 1 if the victim was white, 0 otherwise
- SERIOUS: an average rating of the seriousness of the crime, evaluated by a panel of judges, ranging from the lowest value (least serious) to 15 (most serious)

The goal is to regress DEATH on BLACKD, WHITVIC and SERIOUS.

Problems with OLS

Note that DEATH, which has only two outcomes, follows a Bernoulli(p) distribution with p being the probability of a death sentence. Let Y = DEATH; then

Pr(Y = y) = p^y (1 − p)^{1−y}, y = 0, 1.

Recall that Bernoulli trials lead to the Binomial distribution: if we repeat the Bernoulli(p) trial n times and count the number of successes W, the distribution of W follows a Binomial B(n, p) distribution, i.e.,

Pr(W = w) = C(n, w) p^w (1 − p)^{n−w}, 0 ≤ w ≤ n.

So the Bernoulli distribution is a special case of the Binomial distribution with n = 1.

Problems with OLS

data penalty;
  infile 'd:\teaching\ms4225\penalty.txt';
  input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC REG;
  MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;

Problems with OLS

The REG Procedure, Model: MODEL1, Dependent Variable: DEATH.
[Output: the Analysis of Variance table (Model, Error, Corrected Total, with F Value and Pr > F), fit summaries (Root MSE, R-Square, Adj R-Sq, Dependent Mean, Coeff Var), and the Parameter Estimates table for Intercept, BLACKD, WHITVIC and SERIOUS; the numeric values were not preserved in this transcription.]

Problems with OLS

- The coefficient of SERIOUS is positive and very significant;
- Neither of the two racial variables is significantly different from zero;
- R² is low;
- The F-test indicates overall significance of the model;
- But... can we trust these results?

Problems with OLS

Note that if y is a 0/1 variable, then

E(y_i) = 1·Pr(y_i = 1) + 0·Pr(y_i = 0) = 1·p_i + 0·(1 − p_i) = p_i.

But based on linear regression, y_i = β_1 + β_2 X_i + ε_i. Hence

E(y_i) = E(β_1 + β_2 X_i + ε_i) = β_1 + β_2 X_i + E(ε_i) = β_1 + β_2 X_i.

Therefore, p_i = β_1 + β_2 X_i. This is commonly referred to as the linear probability model (LPM).

Problems with OLS

Accordingly, from the SAS results, a one-point increase in the SERIOUS scale is associated with an increase in the probability of a death sentence equal to the estimated SERIOUS coefficient; and the probability of a death sentence for blacks is 0.12 higher than for non-blacks, ceteris paribus.

But do these results make sense? The LPM p_i = β_1 + β_2 X_i is actually implausible: because p_i is postulated to be a linear function of X_i, it has no upper or lower bound. Accordingly, p_i (which is a probability) can be greater than 1 or smaller than 0!!
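
A minimal sketch (not part of the original notes) of how one might check this problem in SAS: refit the LPM with PROC REG, save the fitted values, and see whether any fall outside [0, 1]. The output dataset name lpm_pred and the variable name phat are arbitrary.

proc reg data=penalty;
   model DEATH = BLACKD WHITVIC SERIOUS;
   output out=lpm_pred p=phat;     /* save the LPM fitted "probabilities" */
run;

proc means data=lpm_pred min max n;
   var phat;                       /* a min < 0 or max > 1 exposes the problem */
run;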

Odds versus probability

- Odds of an event: the ratio of the expected number of times that an event will occur to the expected number of times it will not occur;
- For example, an odds of 4 means we expect 4 times as many occurrences as non-occurrences; an odds of 5/2 (or 5 to 2) means we expect 5 occurrences to 2 non-occurrences;
- Let p be the probability of an event occurring and o the corresponding odds; then o = p/(1 − p), or p = o/(1 + o).

Odds versus probability

Relationship between probability and odds:
- o < 1 corresponds to p < 0.5, and o > 1 corresponds to p > 0.5;
- 0 ≤ o < ∞, although 0 ≤ p ≤ 1.
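
A minimal sketch (not part of the original notes) illustrating the conversion between probability and odds; the dataset and variable names are arbitrary.

data odds_prob;
   input p;
   o      = p / (1 - p);    /* probability to odds */
   p_back = o / (1 + o);    /* odds back to probability */
datalines;
0.1
0.5
0.8
0.99
;
run;

proc print data=odds_prob; run;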

Odds versus probability

Death sentence by race of defendant for 147 penalty trials:

            blacks   non-blacks   total
death         28         22         50
life          45         52         97

- o_D = 50/97 = 0.52; o_{D|B} = 28/45 = 0.62; and o_{D|NB} = 22/52 = 0.42;
- Hence the ratio of blacks' odds of death to non-blacks' odds of death is 0.62/0.42 = 1.476;
- This means the odds of a death sentence for blacks are 47.6% higher than for non-blacks, or equivalently, the odds of a death sentence for non-blacks are about 0.68 times the corresponding odds for blacks.
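
A minimal sketch (not part of the original notes) reproducing the 2x2 table and its odds ratio with PROC FREQ; the RELRISK option requests odds-ratio and relative-risk estimates for a 2x2 table. The exact ratio (28·52)/(45·22) is about 1.47; the 1.476 above arises from rounding the two odds before dividing.

data race_death;
   input blackd death count;
datalines;
1 1 28
1 0 45
0 1 22
0 0 52
;
run;

proc freq data=race_death order=data;
   weight count;
   tables blackd*death / relrisk;   /* "Case-Control (Odds Ratio)" = (28*52)/(45*22) */
run;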

Logit model: basic elements

- The Logit model is based on the following cumulative distribution function of the logistic distribution: p_i = 1/(1 + e^{−(β_1 + β_2 X_i)});
- Let Z_i = β_1 + β_2 X_i; then p_i = 1/(1 + e^{−Z_i}) = F(β_1 + β_2 X_i) = F(Z_i);
- As Z_i ranges from −∞ to ∞, p_i ranges between 0 and 1;
- p_i is non-linearly related to Z_i.

Logit model: basic elements

[Figure: graph of the Logit model with β_1 = 0 and β_2 = 1, showing the S-shaped curve of p_i against X_i.]
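
A minimal sketch (not part of the original notes) that generates and plots this curve in SAS; the dataset and variable names are arbitrary.

data logit_curve;
   do x = -6 to 6 by 0.1;
      p = 1 / (1 + exp(-(0 + 1*x)));   /* logistic CDF with beta1 = 0, beta2 = 1 */
      output;
   end;
run;

proc sgplot data=logit_curve;
   series x=x y=p;                     /* S-shaped curve bounded between 0 and 1 */
run;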

Logit model: basic elements

- Note that e^{Z_i} = p_i/(1 − p_i), the odds of an event;
- So ln(p_i/(1 − p_i)) = Z_i = β_1 + β_2 X_i; in other words, the log of the odds is linear in X_i, although p_i and X_i have a non-linear relationship. This is different from the LPM.

Logit model: basic elements

For a linear model y_i = β_1 + β_2 X_i + ε_i, ∂y_i/∂X_i = β_2, a constant.

But for a Logit model, p_i = F(β_1 + β_2 X_i), so

∂p_i/∂X_i = ∂F(β_1 + β_2 X_i)/∂X_i = F′(β_1 + β_2 X_i)·β_2 = f(β_1 + β_2 X_i)·β_2,

where f(.) is the probability density function of the logistic distribution. As f(β_1 + β_2 X_i) is always positive, the sign of β_2 indicates the direction of the relationship between p_i and X_i.

Logit model: basic elements

Note that for the Logit model

f(β_1 + β_2 X_i) = e^{Z_i}/(1 + e^{Z_i})² = F(β_1 + β_2 X_i)(1 − F(β_1 + β_2 X_i)) = p_i(1 − p_i).

Therefore, ∂p_i/∂X_i = β_2·p_i(1 − p_i). In other words, a 1-unit change in X_i does not produce a constant effect on p_i.
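
A minimal sketch (not part of the original notes) of how the marginal effect β_2·p_i(1 − p_i) could be evaluated and averaged over the sample after fitting the model. The macro variable &bserious is a hypothetical placeholder and must be replaced by the estimated SERIOUS coefficient from the PROC LOGISTIC output.

%let bserious = 0.19;   /* hypothetical placeholder; use the actual estimate */

proc logistic data=penalty descending;
   model DEATH = BLACKD WHITVIC SERIOUS;
   output out=pred p=phat;                        /* fitted probabilities p_i */
run;

data margeff;
   set pred;
   me_serious = &bserious * phat * (1 - phat);    /* observation-level marginal effect */
run;

proc means data=margeff mean;
   var me_serious;                                /* average marginal effect of SERIOUS */
run;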

Maximum Likelihood estimation

- Note that y_i only takes on the values 0 and 1, so the observed odds y_i/(1 − y_i) is either zero or undefined, and OLS is not an appropriate method of estimation. Maximum likelihood (ML) estimation is usually the technique to adopt;
- ML principle: choose as estimates the parameter values which would maximise the probability of what we have already observed;
- Steps of ML estimation: first, construct the likelihood function by expressing the probability of observing the data as a function of the unknown parameters; second, find the values of the unknown parameters that make the value of this expression as large as possible.

Maximum Likelihood estimation

The likelihood function is given by

L = Pr(y_1, y_2, ..., y_n) = Pr(y_1)·Pr(y_2)···Pr(y_n) = ∏_{i=1}^n Pr(y_i),

assuming independent sampling. But by definition, Pr(y_i = 1) = p_i and Pr(y_i = 0) = 1 − p_i. Therefore,

Pr(y_i) = p_i^{y_i} (1 − p_i)^{1−y_i}.

Maximum Likelihood estimation

So,

L = ∏_{i=1}^n Pr(y_i) = ∏_{i=1}^n p_i^{y_i} (1 − p_i)^{1−y_i} = ∏_{i=1}^n (p_i/(1 − p_i))^{y_i} (1 − p_i).

It is usually easier to maximise the log of L than L itself. Taking logs of both sides yields

lnL = Σ_{i=1}^n [ y_i·log(p_i/(1 − p_i)) + log(1 − p_i) ] = Σ_{i=1}^n y_i·log(p_i/(1 − p_i)) + Σ_{i=1}^n log(1 − p_i).

Maximum Likelihood estimation

Substituting p_i = 1/(1 + e^{−(β_1 + β_2 X_i)}) into lnL leads to

lnL = β_1 Σ_{i=1}^n y_i + β_2 Σ_{i=1}^n X_i y_i − Σ_{i=1}^n log(1 + e^{β_1 + β_2 X_i}).

- There are no closed-form solutions for β_1 and β_2 when maximising lnL;
- Numerical optimisation is required - SAS uses Fisher's Scoring, which is similar in principle to the Newton-Raphson algorithm.
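
A minimal sketch (not part of the original notes) of maximising this log-likelihood directly with PROC NLMIXED, as a cross-check on PROC LOGISTIC; starting all parameters at zero mimics PROC LOGISTIC's default starting values.

proc nlmixed data=penalty;
   parms b0=0 b1=0 b2=0 b3=0;                      /* intercept and slopes */
   eta = b0 + b1*BLACKD + b2*WHITVIC + b3*SERIOUS;
   p   = 1 / (1 + exp(-eta));                      /* logistic CDF */
   model DEATH ~ binary(p);                        /* Bernoulli log-likelihood */
run;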

Maximum Likelihood estimation

Suppose θ is a univariate unknown parameter to be estimated. The Newton-Raphson algorithm derives estimates based on the formula

θ̂_new = θ̂_old − H^{−1}(θ̂_old)·U(θ̂_old),

where U(.) and H(.) are the first and second derivatives of the objective function with respect to θ. The algorithm stops when the estimates from successive iterations converge.

Consider a simple example, where g(θ) = −θ³ + 3θ² − 5. So U(θ) = −3θ(θ − 2) and H(θ) = −6(θ − 1); the actual maximum and minimum of g(θ) are located at θ = 2 and θ = 0 respectively.

Maximum Likelihood estimation

- Step 1: Choose an arbitrary initial starting value, say θ̂_initial = 1.5. Then U(1.5) = 2.25 and H(1.5) = −3. The new estimate of θ is therefore θ̂_new = 1.5 − 2.25/(−3) = 2.25;
- Step 2: θ̂_old = 2.25. Then U(2.25) = −1.6875 and H(2.25) = −7.5. The new estimate of θ is θ̂_new = 2.25 − (−1.6875)/(−7.5) = 2.025;
- Continue with Steps 3, 4 and so on until convergence;
- Caution: suppose we start with θ̂_initial = 0.5 instead. If the process is left unchecked, the algorithm will converge to the minimum located at θ = 0!!!
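
A minimal sketch (not part of the original notes) of these iterations coded as a SAS data step; the dataset and variable names are arbitrary.

data newton;
   theta = 1.5;                        /* initial value (try 0.5 to see the caution above) */
   do iter = 1 to 10;
      U = -3*theta*(theta - 2);        /* first derivative of g(theta)  */
      H = -6*(theta - 1);              /* second derivative of g(theta) */
      theta = theta - U/H;             /* Newton-Raphson update         */
      output;
   end;
run;

proc print data=newton; run;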

Maximum Likelihood estimation

- The only difference between Fisher's Scoring and the Newton-Raphson algorithm is that Fisher's Scoring uses E(H(.)) instead of H(.);
- Our current situation is more complicated in that the unknowns are multivariate. However, the optimisation principle remains the same;
- In practice, we need a set of initial values. PROC LOGISTIC in SAS starts with all coefficients equal to zero.

PROC LOGISTIC: basic elements

data PENALTY;
  infile 'd:\teaching\ms4225\penalty.txt';
  input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC LOGISTIC DATA=PENALTY DESCENDING;
  MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;

PROC LOGISTIC: basic elements

The LOGISTIC Procedure

Model Information: Data Set WORK.PENALTY; Response Variable DEATH; Number of Response Levels 2; Number of Observations 147; Model: binary logit; Optimization Technique: Fisher's scoring.

Response Profile: Ordered Value 1, DEATH = 1, Total Frequency 50; Ordered Value 2, DEATH = 0, Total Frequency 97. Probability modeled is DEATH=1.

Model Convergence Status: Convergence criterion (GCONV=1E-8) satisfied.

PROC LOGISTIC: basic elements

Model Fit Statistics: AIC, SC and -2 Log L, reported for the intercept-only model and for the model with covariates (numeric values not preserved in this transcription).

Testing Global Null Hypothesis: BETA=0: Likelihood Ratio, Score and Wald chi-square statistics, with their DF and Pr > ChiSq (numeric values not preserved in this transcription).

PROC LOGISTIC: basic elements

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates: parameter estimates, standard errors, Wald chi-squares and Pr > ChiSq for Intercept (Pr > ChiSq < .0001), BLACKD, WHITVIC and SERIOUS (remaining numeric values not preserved in this transcription).

Odds Ratio Estimates: point estimates and 95% Wald confidence limits for BLACKD, WHITVIC and SERIOUS.

Association of Predicted Probabilities and Observed Responses:
  Percent Concordant 67.2    Percent Discordant 32.3    Percent Tied 0.5    Pairs 4850
  (Somers' D, Gamma, Tau-a and c are also reported.)

Wald test for individual significance

Test of significance of individual coefficients: H_0: β_j = 0 vs. H_1: otherwise.

Instead of reporting the t-stats, PROC LOGISTIC reports the Wald χ²-stats for the significance of individual coefficients. The reason is that the t-stat is not t-distributed in a Logit model; instead, it has an asymptotic N(0, 1) distribution under the null of H_0: β_j = 0. The square of an N(0, 1) variable is a χ² variable with 1 df. The Wald χ²-stat is just the square of the usual t-stat.
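
A minimal sketch (not part of the original notes): beyond the Wald χ² statistics printed automatically for each coefficient, the TEST statement of PROC LOGISTIC performs Wald tests of individual or joint linear hypotheses; the labels before the colons are arbitrary.

proc logistic data=penalty descending;
   model DEATH = BLACKD WHITVIC SERIOUS;
   blackd_only: test BLACKD = 0;                  /* individual Wald test (1 df) */
   race_joint:  test BLACKD = 0, WHITVIC = 0;     /* joint Wald test (2 df)      */
run;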

Likelihood-ratio, LM and Wald tests for overall significance

Test of overall model significance: H_0: β_1 = β_2 = ... = β_k = 0 vs. H_1: otherwise.

1. Likelihood-ratio test: LR = 2[lnL(β̂_UR) − lnL(β̂_R)] ~ χ²_k
2. Score (Lagrange multiplier, LM) test: LM = [U(β̂_R)]′[−H^{−1}(β̂_R)][U(β̂_R)] ~ χ²_k
3. Wald test: W = β̂′_UR[−H(β̂_UR)]β̂_UR ~ χ²_k

Odds ratio estimates

- The odds ratio estimates are obtained by exponentiating the corresponding β estimates, i.e., e^{β̂_j};
- The (predicted) odds ratio for BLACKD indicates that the odds of a death sentence for black defendants are 81% higher than the odds for other defendants;
- Similarly, the (predicted) odds of death are about 29% higher when the victim is white, notwithstanding the coefficient being insignificant;
- A 1-unit increase in the SERIOUS scale is associated with a 21% increase in the predicted odds of a death sentence.

AIC, SC and Generalised R²

Model selection criteria:
1. Akaike's Information Criterion (AIC): AIC = −2[lnL − (k + 1)]
2. Schwarz Bayesian Criterion (SBC or SC): SC = −2·lnL + (k + 1)·ln(n)
3. Generalised R² = 1 − e^{−LR/n}, analogous to the conventional R² used in linear regression.
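
A minimal sketch (not part of the original notes) of these three calculations in a data step; the inputs lnl, lr, k and n are hypothetical placeholders to be replaced by the log-likelihood, the likelihood-ratio statistic, the number of regressors and the sample size from the PROC LOGISTIC output.

data fit_criteria;
   lnl = -80;  lr = 12;  k = 3;  n = 147;    /* hypothetical inputs */
   aic   = -2*(lnl - (k + 1));               /* Akaike's Information Criterion */
   sc    = -2*lnl + (k + 1)*log(n);          /* Schwarz (Bayesian) Criterion   */
   genr2 = 1 - exp(-lr/n);                   /* Generalised R-squared          */
   put aic= sc= genr2=;
run;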

Association of predicted probabilities and observed responses

- For the 147 observations in the sample, there are C(147, 2) = 10731 ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1's or both 0's on y. These we ignore, leaving 4850 pairs for which one case has a 1 and the other case has a 0;
- For each of these pairs, we ask the following question: based on the estimated model, does the case with a 1 have a higher predicted probability of attaining 1 than the case with a 0?
- If yes, we call the pair "concordant"; if no, we call the pair "discordant"; if the two cases have the same predicted value, we call it a "tie";
- Obviously, the more concordant pairs, the better the fit of the model.

Association of predicted probabilities and observed responses

Let C = number of concordant pairs, D = number of discordant pairs, T = number of ties, and N = total number of pairs before eliminating any. Then

Tau-a = (C − D)/N,  Somers' D (SD) = (C − D)/(C + D + T),  Gamma = (C − D)/(C + D),  and  c statistic = 0.5(1 + SD).

- All 4 measures vary between 0 and 1, with large values corresponding to stronger associations between the predicted and observed values;
- Rules of thumb for minimally acceptable levels of Tau-a, SD, Gamma and the c statistic are 0.1, 0.3, 0.3 and 0.65 respectively.
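
A minimal sketch (not part of the original notes) recovering the four measures from the PROC LOGISTIC output above: the pair counts are obtained from the reported percentages (67.2% concordant, 32.3% discordant, 0.5% tied) of the 4850 usable pairs, and N = 10731 is the total number of pairs before elimination.

data assoc;
   C = 0.672*4850;  D = 0.323*4850;  T = 0.005*4850;  N = 10731;
   tau_a    = (C - D)/N;
   somers_d = (C - D)/(C + D + T);
   gamma    = (C - D)/(C + D);
   c_stat   = 0.5*(1 + somers_d);
   put tau_a= somers_d= gamma= c_stat=;
run;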

Hosmer-Lemeshow goodness of fit test

- The Hosmer-Lemeshow (HL) test is a goodness-of-fit test which may be invoked by adding the LACKFIT option to the MODEL statement of PROC LOGISTIC;
- The HL statistic is calculated as follows. Based on the estimated model, predicted probabilities are generated for all observations. These are sorted by size, then grouped into approximately 10 intervals. Within each interval, the expected frequency is obtained by adding up the predicted probabilities. Expected frequencies are compared with the observed frequencies via the conventional Pearson χ² statistic. The df is the number of intervals minus 2.

Hosmer-Lemeshow goodness of fit test

HL = Σ_{j=1}^{2G} (O_j − E_j)²/E_j ~ χ²_{G−2},

where G is the number of intervals, and O_j and E_j are the observed and expected frequencies respectively. The LACKFIT output is as follows:

Partition for the Hosmer and Lemeshow Test: for each group, the total number of cases and the observed and expected frequencies of DEATH = 1 and DEATH = 0 (numeric values not preserved in this transcription).

Hosmer and Lemeshow Goodness-of-Fit Test: Chi-Square, DF and Pr > ChiSq (numeric values not preserved in this transcription).
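
A minimal sketch (not part of the original notes) of the SAS call that produces this output: the LACKFIT option on the MODEL statement requests the Hosmer-Lemeshow partition and test.

proc logistic data=penalty descending;
   model DEATH = BLACKD WHITVIC SERIOUS / lackfit;
run;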

Class exercises

1. Tutorial 1.
2. Table 12.4 of Ramanathan (1995), Introductory Econometrics, presents information on the acceptance or rejection to medical school for a sample of 60 applicants, along with a number of their characteristics. The variables are as follows:
   - ACCEPT = 1 if granted acceptance, 0 otherwise;
   - GPA = cumulative undergraduate grade point average;
   - BIO = score in the biology portion of the Medical College Admission Test (MCAT);
   - CHEM = score in the chemistry portion of the MCAT;

Class exercises

   - PHY = score in the physics portion of the MCAT;
   - RED = score in the reading portion of the MCAT;
   - PRB = score in the problem portion of the MCAT;
   - QNT = score in the quantitative portion of the MCAT;
   - AGE = age of the applicant;
   - GENDER = 1 for male, 0 for female.

Answer the following questions with the aid of the program and output medicalsas.txt and medicalout.txt uploaded on the course website:

Class exercises

1. Write down the estimated Logit model that regresses ACCEPT on all of the above explanatory variables.
2. Test for the overall significance of the model using the LR, LM and Wald tests. Do the three tests provide consistent results?
3. Test for the significance of the individual coefficients using the Wald test.
4. Predict the probability of success of an individual with the following characteristics: GPA=2.96, BIO=7, CHEM=7, PHY=8, RED=5, PRB=7, QNT=5, AGE=25, GENDER=0.
5. Calculate the Generalised R² for the above regression. How well does the model appear to fit the data?
6. AGE and GENDER represent personal characteristics. Test the hypothesis that they jointly have no impact on the probability of success.
