CHAPTER 1: BINARY LOGIT MODEL
Prof. Alan Wan
Table of contents

1. Introduction
   1.1 Dichotomous dependent variables
   1.2 Problems with OLS
- SAS codes and basic outputs
- Wald test for individual significance
- Likelihood-ratio, LM and Wald tests for overall significance
- Odds ratio estimates
- AIC, SC and Generalised R²
- Association of predicted probabilities and observed responses
- Hosmer-Lemeshow test statistic
Introduction

Motivation for Logit model:
- Dichotomous dependent variables;
- Problems with Ordinary Least Squares (OLS) in the face of dichotomous dependent variables;
- Alternative estimation techniques.
Dichotomous dependent variables

Often variables in social sciences are dichotomous:
- employed vs. unemployed
- married vs. unmarried
- guilty vs. innocent
- voted vs. didn't vote
Social scientists frequently wish to estimate regression models with a dichotomous dependent variable. Most researchers are aware that something is wrong with OLS in the face of a dichotomous dependent variable, but few know exactly what makes dichotomous variables problematic in regression, or which other methods are superior.
The focus of this chapter is on binary Logit models (or logistic regression models) for dichotomous dependent variables. Logits have many similarities to OLS, but there are also fundamental differences.
Problems with OLS

Examine why OLS regression runs into problems when the dependent variable is 0/1.

Example dataset: penalty.txt
- Comprises 147 penalty cases in the state of New Jersey;
- In all cases the defendant was convicted of first-degree murder with a recommendation by the prosecutor that a death sentence be imposed;
- A penalty trial is conducted to determine whether the defendant should receive a death penalty or life imprisonment.
The dataset comprises the following variables:
- DEATH: 1 for a death sentence, 0 for a life sentence
- BLACKD: 1 if the defendant was black, 0 otherwise
- WHITVIC: 1 if the victim was white, 0 otherwise
- SERIOUS: an average rating of the seriousness of the crime evaluated by a panel of judges, ranging from (least serious) to 15 (most serious)

The goal is to regress DEATH on BLACKD, WHITVIC and SERIOUS.
Note that DEATH, which has only two outcomes, follows a Bernoulli(p) distribution, with p being the probability of a death sentence. Let Y = DEATH; then

Pr(Y = y) = p^y (1 - p)^{1-y}, y = 0, 1.

Recall that Bernoulli trials lead to the Binomial distribution: if we repeat the Bernoulli(p) trial n times and count the number of successes W, then W follows a Binomial B(n, p) distribution, i.e.,

Pr(W = w) = C(n, w) p^w (1 - p)^{n-w}, 0 ≤ w ≤ n.

So the Bernoulli distribution is a special case of the Binomial distribution with n = 1.
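The Bernoulli/Binomial relationship can be checked numerically. This is a Python sketch (the chapter itself uses SAS), taking p as the sample proportion of death sentences in the penalty data:

```python
from math import comb

def bernoulli_pmf(y, p):
    """Pr(Y = y) = p^y * (1 - p)^(1 - y) for y in {0, 1}."""
    return p**y * (1 - p)**(1 - y)

def binomial_pmf(w, n, p):
    """Pr(W = w) = C(n, w) * p^w * (1 - p)^(n - w)."""
    return comb(n, w) * p**w * (1 - p)**(n - w)

# With p the sample proportion of death sentences (50 of 147),
# Bernoulli(p) coincides with Binomial(1, p) at both support points.
p = 50 / 147
for y in (0, 1):
    assert abs(bernoulli_pmf(y, p) - binomial_pmf(y, 1, p)) < 1e-12
```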
data penalty;
  infile 'd:\teaching\ms4225\penalty.txt';
  input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC REG;
  MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;
The REG Procedure, Model: MODEL1, Dependent Variable: DEATH. The output contains the Analysis of Variance table (Model, Error, Corrected Total), fit summaries (Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var), and parameter estimates with standard errors, t values and p-values for Intercept, BLACKD, WHITVIC and SERIOUS. (Numeric values omitted in the transcription.)
- The coefficient of SERIOUS is positive and very significant;
- Neither of the two racial variables is significantly different from zero;
- R² is low;
- The F-test indicates overall significance of the model;
- But... can we trust these results?
Note that if y is a 0/1 variable, then

E(y_i) = 1 × Pr(y_i = 1) + 0 × Pr(y_i = 0) = 1 × p_i + 0 × (1 - p_i) = p_i.

But based on linear regression, y_i = β_1 + β_2 X_i + ε_i. Hence

E(y_i) = E(β_1 + β_2 X_i + ε_i) = β_1 + β_2 X_i + E(ε_i) = β_1 + β_2 X_i.

Therefore p_i = β_1 + β_2 X_i. This is commonly referred to as the linear probability model (LPM).
Accordingly, from the SAS results, a one-point increase in the SERIOUS scale is associated with an increase in the probability of a death sentence; the probability of a death sentence for blacks is 0.12 higher than for non-blacks, ceteris paribus. But do these results make sense? The LPM p_i = β_1 + β_2 X_i is actually implausible, because p_i is postulated to be a linear function of X_i and thus has no upper or lower bound. Accordingly, p_i (which is a probability) can be greater than 1 or smaller than 0!
Odds versus probability

- Odds of an event: the ratio of the expected number of times that an event will occur to the expected number of times it will not occur;
- For example, an odds of 4 means we expect 4 times as many occurrences as non-occurrences; an odds of 5/2 (or 5 to 2) means we expect 5 occurrences to 2 non-occurrences;
- Let p be the probability of an event occurring and o the corresponding odds; then o = p/(1 - p), or p = o/(1 + o).
Relationship between probability and odds:
- o < 1 corresponds to p < 0.5, and o > 1 to p > 0.5;
- 0 ≤ o < ∞, although 0 ≤ p ≤ 1.
Death sentence by race of defendant for 147 penalty trials:

           blacks   non-blacks   total
  death      28         22         50
  life       45         52         97

- o_D = 50/97 = 0.52; o_D|B = 28/45 = 0.62; and o_D|NB = 22/52 = 0.42;
- Hence the ratio of blacks' odds of death to non-blacks' odds of death is 0.62/0.42 = 1.476;
- This means the odds of a death sentence for blacks are 47.6% higher than for non-blacks; equivalently, the odds of a death sentence for non-blacks are about 0.68 times the corresponding odds for blacks.
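The odds, the odds ratio and the odds-probability conversion above can be recomputed directly from the cell counts. A small Python sketch:

```python
# Counts implied by the odds quoted above: 28 death / 45 life for blacks,
# 22 death / 52 life for non-blacks (147 cases in total).
def odds(deaths, lifes):
    """Odds of a death sentence: expected occurrences over non-occurrences."""
    return deaths / lifes

o_overall = odds(28 + 22, 45 + 52)   # 50/97, about 0.52
o_black = odds(28, 45)               # about 0.62
o_nonblack = odds(22, 52)            # about 0.42
odds_ratio = o_black / o_nonblack    # about 1.47 (1.476 when computed from rounded odds)

# Converting between odds and probability: p = o/(1 + o) and o = p/(1 - p)
p_black = o_black / (1 + o_black)
assert abs(p_black - 28 / 73) < 1e-12   # 28 deaths out of 73 black defendants
```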
Logit model: basic elements

The Logit model is based on the following cumulative distribution function of the logistic distribution:

p_i = 1/(1 + e^{-(β_1 + β_2 X_i)}).

Let Z_i = β_1 + β_2 X_i; then p_i = 1/(1 + e^{-Z_i}) = F(β_1 + β_2 X_i) = F(Z_i).
- As Z_i ranges from -∞ to ∞, p_i ranges between 0 and 1;
- p_i is non-linearly related to Z_i.
Graph of the Logit with β_1 = 0 and β_2 = 1: [S-shaped logistic curve of p_i; figure omitted in transcription.]
Note that e^{Z_i} = p_i/(1 - p_i), the odds of the event. So ln(p_i/(1 - p_i)) = Z_i = β_1 + β_2 X_i; in other words, the log of the odds is linear in X_i, although p_i and X_i have a non-linear relationship. This is different from the LPM.
For a linear model y_i = β_1 + β_2 X_i + ε_i,

∂y_i/∂X_i = β_2, a constant.

But for a Logit model, p_i = F(β_1 + β_2 X_i), so

∂p_i/∂X_i = ∂F(β_1 + β_2 X_i)/∂X_i = F'(β_1 + β_2 X_i) β_2 = f(β_1 + β_2 X_i) β_2,

where f(.) is the probability density function of the logistic distribution. As f(β_1 + β_2 X_i) is always positive, the sign of β_2 indicates the direction of the relationship between p_i and X_i.
Note that for the Logit model

f(β_1 + β_2 X_i) = e^{-Z_i}/(1 + e^{-Z_i})² = F(β_1 + β_2 X_i)(1 - F(β_1 + β_2 X_i)) = p_i(1 - p_i).

Therefore ∂p_i/∂X_i = β_2 p_i(1 - p_i). In other words, a 1-unit change in X_i does not produce a constant effect on p_i.
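The varying marginal effect can be seen numerically. The following Python sketch uses hypothetical coefficients (β_1 = 0, β_2 = 1) purely for illustration:

```python
import math

def logistic_cdf(z):
    """F(z) = 1 / (1 + e^(-z)), the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_effect(b1, b2, x):
    """dp/dX = b2 * p * (1 - p), which varies with X rather than being constant."""
    p = logistic_cdf(b1 + b2 * x)
    return b2 * p * (1 - p)

# With b1 = 0, b2 = 1 (hypothetical values), the slope peaks at b2/4 = 0.25
# where p = 0.5 (here at x = 0), and flattens out in both tails.
assert abs(marginal_effect(0.0, 1.0, 0.0) - 0.25) < 1e-12
assert marginal_effect(0.0, 1.0, 4.0) < 0.02
```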
Maximum Likelihood estimation

Because y_i only takes on the values 0 and 1, the observed log-odds ln(y_i/(1 - y_i)) is undefined, so OLS is not an appropriate method of estimation. Maximum likelihood (ML) estimation is usually the technique to adopt.

- ML principle: choose as estimates the parameter values which would maximise the probability of what we have already observed;
- Steps of ML estimation: first, construct the likelihood function by expressing the probability of observing the data as a function of the unknown parameters; second, find the values of the unknown parameters that make the value of this expression as large as possible.
The likelihood function is given by

L = Pr(y_1, y_2, ..., y_n) = Pr(y_1) Pr(y_2) ... Pr(y_n) = ∏_{i=1}^{n} Pr(y_i),

assuming independent sampling. But by definition, Pr(y_i = 1) = p_i and Pr(y_i = 0) = 1 - p_i. Therefore

Pr(y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}.
So

L = ∏_{i=1}^{n} Pr(y_i) = ∏_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i} = ∏_{i=1}^{n} (p_i/(1 - p_i))^{y_i} (1 - p_i).

It is usually easier to maximise the log of L than L itself. Taking logs of both sides yields

lnL = ∑_{i=1}^{n} [y_i log(p_i/(1 - p_i)) + log(1 - p_i)] = ∑_{i=1}^{n} y_i log(p_i/(1 - p_i)) + ∑_{i=1}^{n} log(1 - p_i).
Substituting p_i = 1/(1 + e^{-(β_1 + β_2 X_i)}) into lnL leads to

lnL = β_1 ∑_{i=1}^{n} y_i + β_2 ∑_{i=1}^{n} X_i y_i - ∑_{i=1}^{n} log(1 + e^{β_1 + β_2 X_i}).

- There are no closed-form solutions for β_1 and β_2 when maximising lnL;
- Numerical optimisation is required - SAS uses Fisher's Scoring, which is similar in principle to the Newton-Raphson algorithm.
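The substituted log-likelihood can be verified against the direct Bernoulli form. A Python sketch on toy (hypothetical) data:

```python
import math

def logit_loglik(b1, b2, x, y):
    """lnL = b1*sum(y) + b2*sum(x*y) - sum(log(1 + exp(b1 + b2*x_i)))."""
    lin = b1 * sum(y) + b2 * sum(xi * yi for xi, yi in zip(x, y))
    return lin - sum(math.log(1.0 + math.exp(b1 + b2 * xi)) for xi in x)

def logit_loglik_direct(b1, b2, x, y):
    """Equivalent Bernoulli form: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b1 + b2 * xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Toy data: the two expressions agree for any coefficient values
x = [1.0, 2.0, 3.0, 4.0]
y = [0, 0, 1, 1]
assert abs(logit_loglik(0.5, -0.2, x, y) - logit_loglik_direct(0.5, -0.2, x, y)) < 1e-10
```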
Suppose θ is a univariate unknown parameter to be estimated. The Newton-Raphson algorithm derives estimates based on the formula

θ̂_new = θ̂_old - H^{-1}(θ̂_old) U(θ̂_old),

where H(.) and U(.) are the second and first derivatives of the objective function with respect to θ. The algorithm stops when the estimates from successive iterations converge.

Consider a simple example, where g(θ) = -θ³ + 3θ² - 5. So U(θ) = -3θ(θ - 2) and H(θ) = -6(θ - 1). The actual maximum and minimum of g(θ) are located at θ = 2 and θ = 0 respectively.

- Step 1: choose an arbitrary initial starting value, say θ̂_initial = 1.5. So U(1.5) = 2.25 and H(1.5) = -3. The new estimate of θ is therefore θ̂_new = 1.5 - 2.25/(-3) = 2.25;
- Step 2: θ̂_old = 2.25. So U(2.25) = -1.6875 and H(2.25) = -7.5. The new estimate of θ is θ̂_new = 2.25 - (-1.6875)/(-7.5) = 2.025;
- Continue with Steps 3, 4 and so on until convergence;
- Caution: suppose we start with θ̂_initial = 0.5. If the process is left unchecked, the algorithm will converge to the minimum located at θ = 0!
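The worked example above can be run as code. This Python sketch reproduces the iterates and the cautionary case:

```python
def newton_raphson(theta, U, H, tol=1e-10, max_iter=100):
    """Iterate theta_new = theta_old - U(theta_old)/H(theta_old) to convergence."""
    for _ in range(max_iter):
        step = U(theta) / H(theta)
        theta = theta - step
        if abs(step) < tol:
            return theta
    raise RuntimeError("no convergence")

# g(theta) = -theta^3 + 3*theta^2 - 5, so U = g' and H = g''
U = lambda t: -3.0 * t * (t - 2.0)
H = lambda t: -6.0 * (t - 1.0)

# Starting from 1.5 the iterates are 2.25, 2.025, ... -> the maximum at theta = 2
assert abs(newton_raphson(1.5, U, H) - 2.0) < 1e-8
# Starting from 0.5, the unchecked algorithm heads to the minimum at theta = 0
assert abs(newton_raphson(0.5, U, H)) < 1e-8
```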
- The only difference between Fisher's Scoring and the Newton-Raphson algorithm is that Fisher's Scoring uses E(H(.)) instead of H(.);
- Our current situation is more complicated in that the unknowns are multivariate. However, the optimisation principle remains the same;
- In practice, we need a set of initial values. PROC LOGISTIC in SAS starts with all coefficients equal to zero.
PROC LOGISTIC: basic elements

data PENALTY;
  infile 'd:\teaching\ms4225\penalty.txt';
  input DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC LOGISTIC DATA=PENALTY DESCENDING;
  MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;
The LOGISTIC Procedure

Model Information
  Data Set                   WORK.PENALTY
  Response Variable          DEATH
  Number of Response Levels  2
  Number of Observations     147
  Model                      binary logit
  Optimization Technique     Fisher's scoring

Response Profile: the probability modeled is DEATH=1. (Frequency counts omitted in transcription.)

Model Convergence Status: convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics: AIC, SC and -2 Log L are reported for the intercept-only model and for the model with covariates. Testing Global Null Hypothesis BETA=0: the Likelihood Ratio, Score and Wald chi-square statistics with their DF and p-values. (Numeric values omitted in transcription.)
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates: parameter estimates, standard errors, Wald chi-squares and p-values for Intercept (significant at <.0001), BLACKD, WHITVIC and SERIOUS, followed by Odds Ratio Estimates with 95% Wald confidence limits. (Numeric values omitted in transcription.)

Association of Predicted Probabilities and Observed Responses
  Percent Concordant  67.2    Somers' D
  Percent Discordant  32.3    Gamma
  Percent Tied         0.5    Tau-a
  Pairs               4850    c
Wald test for individual significance

Test of significance of individual coefficients: H_0: β_j = 0 vs. H_1: otherwise.

Instead of reporting t-stats, PROC LOGISTIC reports Wald χ²-stats for the significance of individual coefficients. The reason is that the t-stat is not t-distributed in a Logit model; instead, it has an asymptotic N(0, 1) distribution under the null H_0: β_j = 0. The square of a N(0, 1) variable is a χ² variable with 1 df, and the Wald χ²-stat is just the square of the usual t-stat.
Likelihood-ratio, LM and Wald tests for overall significance

Test of overall model significance: H_0: β_1 = β_2 = ... = β_k = 0 vs. H_1: otherwise.

1. Likelihood-ratio test: LR = 2[lnL(β̂^{(UR)}) - lnL(β̂^{(R)})] ~ χ²_k
2. Score (Lagrange-multiplier, LM) test: LM = [U(β̂^{(R)})]' [-H^{-1}(β̂^{(R)})] [U(β̂^{(R)})] ~ χ²_k
3. Wald test: W = β̂^{(UR)'} [-H(β̂^{(UR)})] β̂^{(UR)} ~ χ²_k
Odds ratio estimates

- The odds ratio estimates are obtained by exponentiating the corresponding β estimates, i.e., e^{β̂_j};
- The (predicted) odds ratio of 1.81 indicates that the odds of a death sentence for black defendants are 81% higher than the odds for other defendants;
- Similarly, the (predicted) odds of death are about 29% higher when the victim is white, notwithstanding the coefficient being insignificant;
- A 1-unit increase in the SERIOUS scale is associated with a 21% increase in the predicted odds of a death sentence.
AIC, SC and Generalised R²

Model selection criteria:
1. Akaike's Information Criterion (AIC): AIC = -2[lnL - (k + 1)]
2. Schwarz Bayesian Criterion (SBC or SC): SC = -2 lnL + (k + 1) ln(n)
3. Generalised R² = 1 - e^{-LR/n}, analogous to the conventional R² used in linear regression
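The three criteria are straightforward to compute from the maximised log-likelihoods. A Python sketch using hypothetical log-likelihood values of the kind PROC LOGISTIC reports (the assumed values -92 and -98 are illustrative, not from the penalty output):

```python
import math

def fit_stats(lnL, lnL0, k, n):
    """AIC = -2[lnL - (k+1)], SC = -2 lnL + (k+1) ln(n), gen. R^2 = 1 - exp(-LR/n).

    lnL is the maximised log-likelihood with covariates, lnL0 the
    intercept-only log-likelihood, k the number of slopes, n the sample size.
    """
    aic = -2.0 * (lnL - (k + 1))
    sc = -2.0 * lnL + (k + 1) * math.log(n)
    lr = 2.0 * (lnL - lnL0)          # overall likelihood-ratio statistic
    gen_r2 = 1.0 - math.exp(-lr / n)
    return aic, sc, gen_r2

# Hypothetical log-likelihoods with n = 147 and k = 3 slopes
aic, sc, gen_r2 = fit_stats(lnL=-92.0, lnL0=-98.0, k=3, n=147)
assert abs(aic - 192.0) < 1e-9
assert 0.0 < gen_r2 < 1.0
```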
Association of predicted probabilities and observed responses

For the 147 observations in the sample, there are C(147, 2) = 10731 ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1's or both 0's on y. These we ignore, leaving 4850 pairs for which one case has a 1 and the other case has a 0.

For each of these pairs, we ask the following question: based on the estimated model, does the case with a 1 have a higher predicted probability of attaining 1 than the case with a 0? If yes, we call the pair "concordant"; if no, we call the pair "discordant"; if the two cases have the same predicted values, we call it a "tie". Obviously, the more concordant pairs, the better the fit of the model.
Let C = number of concordant pairs, D = number of discordant pairs, T = number of ties, and N = total number of pairs before eliminating any. Then

Tau-a = (C - D)/N,  Somers' D (SD) = (C - D)/(C + D + T),  Gamma = (C - D)/(C + D),  and c-stat = 0.5 (1 + SD).

All four measures increase with the strength of the association between the predicted and observed values (Tau-a, SD and Gamma lie between -1 and 1; the c-stat lies between 0 and 1). Rules of thumb for minimally acceptable levels of Tau-a, SD, Gamma and c-stat are 0.1, 0.3, 0.3 and 0.65 respectively.
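These measures can be recovered from the percentages in the PROC LOGISTIC output above. The pair counts below are reconstructed (to the nearest integer) from the reported percentages, so they are an approximation rather than taken directly from the output:

```python
# Reconstructed pair counts: 67.2% concordant, 32.3% discordant, 0.5% tied
# of the 4850 usable pairs, with N = C(147, 2) = 10731 pairs in total.
C, D, T = 3259, 1567, 24
N = 10731

tau_a = (C - D) / N
somers_d = (C - D) / (C + D + T)
gamma = (C - D) / (C + D)
c_stat = 0.5 * (1.0 + somers_d)

# Somers' D comes out near 0.349 and the c-stat near 0.674, both above
# the 0.3 and 0.65 rules of thumb respectively.
assert somers_d > 0.3 and c_stat > 0.65
```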
Hosmer-Lemeshow goodness-of-fit test

The Hosmer-Lemeshow (HL) test is a goodness-of-fit test which may be invoked by adding the LACKFIT option to the MODEL statement under PROC LOGISTIC.

The HL statistic is calculated as follows. Based on the estimated model, predicted probabilities are generated for all observations. These are sorted by size, then grouped into approximately 10 intervals. Within each interval, the expected frequency is obtained by adding up the predicted probabilities. Expected frequencies are compared with the observed frequencies via the conventional Pearson χ² statistic. The df is the number of intervals minus 2.
HL = ∑_{j=1}^{2G} (O_j - E_j)²/E_j ~ χ²_{G-2},

where G is the number of intervals, and O and E are the observed and expected frequencies respectively. The LACKFIT output reports the partition for the Hosmer and Lemeshow test (observed and expected frequencies of DEATH = 1 and DEATH = 0 within each group, with group totals), followed by the goodness-of-fit chi-square, its DF and p-value. (Numeric values omitted in transcription.)
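The grouping-and-summing procedure described above can be sketched in Python. This is a simplified illustration of the computation (SAS's exact grouping of ties may differ), run here on synthetic data:

```python
def hosmer_lemeshow(p_hat, y, n_groups=10):
    """HL = sum over the 2G cells of (O - E)^2 / E, to be compared against
    a chi-square distribution on G - 2 degrees of freedom."""
    pairs = sorted(zip(p_hat, y))                 # sort by predicted probability
    size = len(pairs) / n_groups
    hl = 0.0
    for g in range(n_groups):
        chunk = pairs[round(g * size):round((g + 1) * size)]
        if not chunk:
            continue
        obs1 = sum(yi for _, yi in chunk)         # observed events in the interval
        exp1 = sum(pi for pi, _ in chunk)         # expected = sum of predicted probs
        obs0, exp0 = len(chunk) - obs1, len(chunk) - exp1
        hl += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return hl

# Synthetic illustration: ordered predictions on a small sample give a
# finite, non-negative statistic.
p_hat = [i / 21.0 for i in range(1, 21)]
y = [0] * 10 + [1] * 10
stat = hosmer_lemeshow(p_hat, y, n_groups=5)
assert stat >= 0.0
```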
Class exercises

1. Tutorial 1
2. Table 12.4 of Ramanathan (1995), Introductory Econometrics, presents information on the acceptance or rejection to medical school for a sample of 60 applicants, along with a number of their characteristics. The variables are as follows:
- ACCEPT = 1 if granted acceptance, 0 otherwise;
- GPA = cumulative undergraduate grade point average;
- BIO = score in the biology portion of the Medical College Admission Test (MCAT);
- CHEM = score in the chemistry portion of the MCAT;
- PHY = score in the physics portion of the MCAT;
- RED = score in the reading portion of the MCAT;
- PRB = score in the problem portion of the MCAT;
- QNT = score in the quantitative portion of the MCAT;
- AGE = age of the applicant;
- GENDER = 1 for male, 0 for female.

Answer the following questions with the aid of the program and output medicalsas.txt and medicalout.txt uploaded on the course website:
1. Write down the estimated Logit model that regresses ACCEPT on all of the above explanatory variables.
2. Test for the overall significance of the model using the LR, LM and Wald tests. Do the three tests provide consistent results?
3. Test for the significance of the individual coefficients using the Wald test.
4. Predict the probability of success of an individual with the following characteristics: GPA=2.96, BIO=7, CHEM=7, PHY=8, RED=5, PRB=7, QNT=5, AGE=25, GENDER=0.
5. Calculate the Generalised R² for the above regression. How well does the model appear to fit the data?
6. AGE and GENDER represent personal characteristics. Test the hypothesis that they jointly have no impact on the probability of success.
Modeling Machiavellianism Predicting Scores with Fewer Factors ABSTRACT RESULTS Prince Niccolo Machiavelli said things on the order of, The promise given was a necessity of the past: the word broken is
More informationChapter 5: Logistic Regression-I
: Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationChapter 6. Logistic Regression. 6.1 A linear model for the log odds
Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,
More informationReview of Multiple Regression
Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationSections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21
Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship
More informationLecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti
Lecture 2: Categorical Variable A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti 1 Categorical Variable Categorical variable is qualitative
More informationBasic Medical Statistics Course
Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable
More informationAge 55 (x = 1) Age < 55 (x = 0)
Logistic Regression with a Single Dichotomous Predictor EXAMPLE: Consider the data in the file CHDcsv Instead of examining the relationship between the continuous variable age and the presence or absence
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationPaper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD
Paper: ST-161 Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop Institute @ UMBC, Baltimore, MD ABSTRACT SAS has many tools that can be used for data analysis. From Freqs
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationExperimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.
Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationChapter 4: Constrained estimators and tests in the multiple linear regression model (Part III)
Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III) Florian Pelgrin HEC September-December 2010 Florian Pelgrin (HEC) Constrained estimators September-December
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More informationAdvanced Quantitative Methods: maximum likelihood
Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y
More informationIntroduction to the Logistic Regression Model
CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response
More informationApplied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid
Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationSections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal
More informationAnswers to Problem Set #4
Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2
More informationLongitudinal Modeling with Logistic Regression
Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationThe Logit Model: Estimation, Testing and Interpretation
The Logit Model: Estimation, Testing and Interpretation Herman J. Bierens October 25, 2008 1 Introduction to maximum likelihood estimation 1.1 The likelihood function Consider a random sample Y 1,...,
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationRegression with Qualitative Information. Part VI. Regression with Qualitative Information
Part VI Regression with Qualitative Information As of Oct 17, 2017 1 Regression with Qualitative Information Single Dummy Independent Variable Multiple Categories Ordinal Information Interaction Involving
More informationZERO INFLATED POISSON REGRESSION
STAT 6500 ZERO INFLATED POISSON REGRESSION FINAL PROJECT DEC 6 th, 2013 SUN JEON DEPARTMENT OF SOCIOLOGY UTAH STATE UNIVERSITY POISSON REGRESSION REVIEW INTRODUCING - ZERO-INFLATED POISSON REGRESSION SAS
More informationFinal Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)
Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the
More information(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )
Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative
More informationECON Introductory Econometrics. Lecture 11: Binary dependent variables
ECON4150 - Introductory Econometrics Lecture 11: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability
More informationSolutions for Examination Categorical Data Analysis, March 21, 2013
STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.
More informationQ30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only
Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1
More informationLecture 11 Multiple Linear Regression
Lecture 11 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 11-1 Topic Overview Review: Multiple Linear Regression (MLR) Computer Science Case Study 11-2 Multiple Regression
More informationTwo Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests
Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a
More informationGeneral Linear Model (Chapter 4)
General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationHomework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests
More informationChapter 2: Describing Contingency Tables - II
: Describing Contingency Tables - II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]
More informationThe Flight of the Space Shuttle Challenger
The Flight of the Space Shuttle Challenger On January 28, 1986, the space shuttle Challenger took off on the 25 th flight in NASA s space shuttle program. Less than 2 minutes into the flight, the spacecraft
More informationBinary Dependent Variables
Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome
More informationBiostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression
Biostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout 4a): χ 2 test of
More informationECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria
ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationGeneralized Models: Part 1
Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes
More informationChapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a
Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.
More information