- Virginia Snow
- 6 years ago
Probit Model

1. We apply the probit model to the Bank data. The dependent variable is deny, a dummy variable that equals one if a mortgage application is denied and zero if it is accepted. The key regressor is the debt-income ratio, diratio.

2. We care about the proportion for a categorical variable, but the average for a numerical one.

    . import excel "I:\420\420_bank.xls", sheet("sheet1") firstrow clear
    . label define denyl 0 "Accept" 1 "Deny"
    . label value deny denyl
    . tab deny

          deny |  Freq.   Percent   Cum.
        Accept |  2,[...]
          Deny |  [...]
         Total |  2,[...]

    . sum diratio

      Variable |  Obs   Mean   Std. Dev.   Min   Max
       diratio |  [...]

So about 12 percent of applications are denied. The average diratio is [...].

3. We can use deny as the independent variable. By doing so, the following regression effectively reports the two-sample t test for diratio:

    . reg diratio deny, nohe

       diratio |  Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
          deny |  [...]
         _cons |  [...]
The average diratio is [...] for the acceptance group (the base or reference group, deny = 0). The average diratio for the denial group is greater than that of the acceptance group by [...]. This positive difference is statistically significant with t-value = [...]. This regression shows that diratio is correlated with the probability of acceptance.

4. Now we switch, and use deny as the dependent variable:

    . reg deny diratio, r

               |          Robust
          deny |  Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
       diratio |  [...]
         _cons |  [...]

This regression reports the Linear Probability Model (LPM) with heteroskedasticity-robust standard errors.

$$y_i = \beta_0 + \beta_1 x_i + u_i \tag{1}$$
$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_i \tag{2}$$
$$\Pr(y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_i \tag{3}$$

where the second step assumes $E(u_i \mid x_i) = 0$, and the last step is due to the fact that for a Bernoulli variable $E(y) = \Pr(y = 1)$. It is called LPM because it assumes the probability is a linear function of x, see (3).

5. Exercise: why is the robust standard error necessary here?

6. Suppose diratio changes from 0.1 to 0.2. The probability of denial then increases by $\beta_1 \times 0.1$ = [...]. The change in denial probability is the same when diratio changes from 0.2 to 0.3, from 0.3 to 0.4, and so on. In short, LPM assumes that the marginal effect of x on $\Pr(y = 1)$ is constant:

$$\frac{d\Pr(y = 1)}{dx} = \beta_1$$

The constant marginal effect implies that the predicted probability $\Pr(y = 1)$ can be greater than 1 if x is sufficiently large. This is bad because a probability should be bounded between 0 and 1. For instance, below we find a greater-than-one probability:
    . dis "predicted denial probability when diratio=2 is " _b[_cons] + _b[diratio]*2
    predicted denial probability when diratio=2 is [...]

    . twoway (scatter deny diratio) (lfit deny diratio), ytitle(deny)

[Figure: scatter plot of deny against diratio with the fitted LPM line; legend: deny, Fitted values]

The graph above clearly shows that LPM may also produce a negative $\Pr(y = 1)$ when x is sufficiently small.

8. In short, LPM is flawed because it does not impose the 0-1 boundary on $\Pr(y_i = 1 \mid x_i)$. By comparison, probit and logit models impose that restriction by letting

$$\Pr(y_i = 1 \mid x_i) = \mathrm{cdf}(\beta_0 + \beta_1 x_i) \tag{4}$$

where $\mathrm{cdf}(z) = \Pr(Z \le z)$ denotes a nonlinear cumulative distribution function (cdf), and by definition $0 \le \mathrm{cdf} \le 1$. The probit model uses the cdf of the standard normal distribution, whereas the logit model uses the cdf of the logistic distribution.

9. Consider probit first. Let $\Phi(\cdot)$ be the cdf of the standard normal distribution. Then the density for the i-th observation is

$$f_i = p_i^{y_i} (1 - p_i)^{1 - y_i}, \quad (y_i = 0, 1) \tag{5}$$
$$p_i = \Phi(\beta_0 + \beta_1 x_i) \tag{6}$$

Assuming an iid sample, the joint density (likelihood) is

$$L = \prod_{i=1}^{n} f_i \tag{7}$$
After taking logs, we obtain the log likelihood as

$$\log(L) = \sum_i \log(f_i) = \sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \tag{8}$$

10. Notice that $p_i$ is no longer a constant. Instead, it is the conditional probability $\Pr(y_i = 1 \mid x_i)$, which varies with $x_i$. Because of this complication, there is no closed-form (analytical) solution for maximizing $\log(L)$. Nevertheless, a numerical method can solve the optimization problem through iterations:

    . probit de di

    Iteration 0: log likelihood = [...]
    Iteration 1: log likelihood = [...]
    Iteration 2: log likelihood = [...]
    Iteration 3: log likelihood = [...]

    Probit regression            Number of obs = 2380
                                 LR chi2(1)    = [...]
                                 Prob > chi2   = [...]
    Log likelihood = [...]       Pseudo R2     = [...]

          deny |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
       diratio |  [...]
         _cons |  [...]

At the end of the iterations, the log likelihood is maximized at [...]. We may consider the restricted regression in which diratio is excluded:

    . qui probit de
    . dis "log likelihood for restricted regression is " e(ll)
    log likelihood for restricted regression is [...]

    . dis "Likelihood Ratio (LR) test is " 2*([...] - ([...]))
    Likelihood Ratio (LR) test is [...]

    . dis "pvalue of LR test is " chi2tail(1, 80.59)
    pvalue of LR test is 2.778e-19
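The p-value in the last step can be cross-checked outside Stata. The following is a minimal Python sketch, not part of the original notes, that uses only the two values visible in the log (an LR statistic of 80.59 and one restriction). The function name `chi2_tail_1df` is a hypothetical helper. It relies on the fact that, for one degree of freedom, the chi-square upper tail equals erfc(sqrt(x/2)), so the standard library is enough.

```python
import math


def chi2_tail_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    A chi-square(1) variable is the square of a standard normal Z, so
    Pr(Z^2 > x) = Pr(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


lr_stat = 80.59  # LR statistic taken from the Stata log above
p_value = chi2_tail_1df(lr_stat)
print(f"p-value of LR test: {p_value:.3e}")
```

The printed value should agree, up to rounding, with the figure reported by chi2tail(1, 80.59) above.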
So the LR test rejects the null hypothesis that diratio does not matter for deny. You can think of the LR test as the maximum likelihood version of the F test. The Pseudo R2 is computed as

    . dis "pseudo R2 is " 1 - ([...])/([...])
    pseudo R2 is [...]

11. According to (6), for a given value of diratio, say 0.3, the denial probability is

    . dis "denial probability when diratio is 0.3 is " normprob([...] + [...]*0.3)
    denial probability when diratio is 0.3 is [...]

We can apply a loop to obtain the denial probability for a range of diratio:

    forvalues i = 1 (1) 9 {
        local di = `i'*0.1
        dis "denial probability when diratio is " `di' " is " normprob([...] + [...]*`di')
    }
    denial probability when diratio is .1 is [...]
    denial probability when diratio is .2 is [...]
    denial probability when diratio is .3 is [...]
    denial probability when diratio is .4 is [...]
    denial probability when diratio is .5 is [...]
    denial probability when diratio is .6 is [...]
    denial probability when diratio is .7 is [...]
    denial probability when diratio is .8 is [...]
    denial probability when diratio is .9 is [...]

Notice that the change in denial probability is NOT constant. Good news!

12. A non-constant marginal effect translates into the 0-1 boundary on the predicted probability.

    capture drop dis pde
    gen dis = .
    gen pde = .
    forvalues i = 1 (1) 20 {
        local di = `i'*0.1
        local pd = normprob([...] + [...]*`di')
        qui replace dis = `di' in `i'
        qui replace pde = `pd' in `i'
    }
    label variable dis "debt-income ratio"
    label variable pde "predicted denial probability"
    twoway (connect pde dis) in 1/20

[Figure: predicted denial probability (vertical axis) plotted against debt-income ratio (horizontal axis)]

We see that the predicted denial probability is restricted to be between 0 and 1. Good news! From that graph we know denial is almost sure when diratio exceeds [...].

13. The downside of the probit model is that $\beta_1$ becomes hard to interpret. Taking the derivative of (6) and applying the chain rule gives

$$\frac{d\Pr(y_i = 1 \mid x_i)}{dx_i} = \frac{d\Phi(\beta_0 + \beta_1 x_i)}{dx_i} = \phi(\beta_0 + \beta_1 x_i)\,\beta_1 \tag{9}$$

where $\phi$ is the probability density function (pdf) of the standard normal distribution, the derivative of $\Phi$. In short, $\beta_1$ multiplied by the factor $\phi(\beta_0 + \beta_1 x_i)$ gives the marginal effect; $\beta_1$ alone does not.

14. Exercise: (True or False) the sign of the marginal effect $d\Pr(y_i = 1 \mid x_i)/dx_i$ depends only on the sign of $\beta_1$.
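Equation (9) can be checked numerically. The sketch below is a Python illustration under stated assumptions, not the author's code: the coefficients `B0 = -2.0` and `B1 = 3.0` are hypothetical placeholders (they are NOT the estimates from the bank data), and the helper names are made up for this example. It compares the analytic marginal effect with a central finite difference of the predicted probability at several values of x.

```python
import math


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal pdf, the derivative of norm_cdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical probit coefficients, chosen only for illustration.
B0, B1 = -2.0, 3.0


def denial_prob(x):
    """Equation (6): p = Phi(b0 + b1 * x)."""
    return norm_cdf(B0 + B1 * x)


def marginal_effect(x):
    """Equation (9): dp/dx = phi(b0 + b1 * x) * b1."""
    return norm_pdf(B0 + B1 * x) * B1


def numeric_derivative(x, h=1e-6):
    """Central finite difference of the predicted probability."""
    return (denial_prob(x + h) - denial_prob(x - h)) / (2.0 * h)


for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"x={x:.1f}  p={denial_prob(x):.4f}  "
          f"analytic={marginal_effect(x):.4f}  numeric={numeric_derivative(x):.4f}")
```

The two derivative columns should agree closely, and the marginal effect changes with x even though `B1` is fixed, which is the point of item 13.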
15. Equation [17.14] of the textbook suggests replacing $x_i$ with its sample average:

    . qui sum diratio
    . sca factor1 = normalden([...] + [...]*r(mean))
    . dis "marginal effect of diratio on denial probability is " 2.97*factor1
    marginal effect of diratio on denial probability is [...]

Alternatively, we can use equation [17.17], which averages the factor over all observations:

    . gen gxb = normalden([...] + [...]*diratio)
    . qui sum gxb
    . dis "marginal effect of diratio on denial probability is " [...]*r(mean)
    marginal effect of diratio on denial probability is [...]

Newer versions of Stata offer a command called margins that obtains the marginal effect directly using [17.17]:

    . qui probit den dira
    . margins, dydx(dira)

    Average marginal effects          Number of obs = 2380
    Expression   : Pr(deny), predict()
    dy/dx w.r.t. : diratio

               |          Delta-method
               |  dy/dx   Std. Err.   z   P>|z|   [95% Conf. Interval]
       diratio |  [...]

So on average, the denial probability rises by [...] = ([...])(0.1) when diratio increases by 0.1. The effect of diratio on denial probability is significant with z-value [...].

17. (Optional) Finally, there is a set of Stata commands that allow you to specify the log likelihood explicitly and do the MLE:
    capture program drop lfprobit
    program lfprobit
        version 10.0
        args lnf xb
        local y "$ML_y1"
        quietly replace `lnf' = ln(normal(`xb')) if `y'==1
        quietly replace `lnf' = ln(1-normal(`xb')) if `y'==0
    end
    ml model lf lfprobit (deny = diratio)
    ml maximize

    initial:     log likelihood = [...]
    alternative: log likelihood = [...]
    rescale:     log likelihood = [...]
    Iteration 0: log likelihood = [...]
    Iteration 1: log likelihood = [...]
    Iteration 2: log likelihood = [...]
    Iteration 3: log likelihood = [...]
    Iteration 4: log likelihood = [...]

                                 Number of obs = 2380
                                 Wald chi2(1)  = [...]
    Log likelihood = [...]       Prob > chi2   = [...]

          deny |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
       diratio |  [...]
         _cons |  [...]

18. (Optional) We can derive equation (6) by using a latent variable, say badness (score), which is a linear function of the debt-income ratio. We assume a mortgage application is
denied if badness is positive:

$$\begin{aligned}
\Pr(y_i = 1) &= \Pr(\text{badness} > 0) \\
&= \Pr(\beta_0 + \beta_1 x_i + u_i > 0) \\
&= \Pr(u_i > -\beta_0 - \beta_1 x_i) \\
&= 1 - \Phi(-\beta_0 - \beta_1 x_i) \\
&= \Phi(\beta_0 + \beta_1 x_i)
\end{aligned}$$

where the second to last step assumes that $u_i$ follows the standard normal distribution.
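The latent-variable argument can be illustrated by simulation. The sketch below is a Python example under stated assumptions, not taken from the notes: the coefficients `B0`, `B1` and the ratio `x` are hypothetical values, and the random seed is fixed only for reproducibility. It draws standard normal errors, counts how often badness is positive, and compares that share with $\Phi(\beta_0 + \beta_1 x)$.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients and debt-income ratio, for illustration only.
B0, B1 = -2.0, 3.0
x = 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
n_draws = 200_000

# Latent badness score: b0 + b1*x + u, with u ~ N(0, 1); deny when it is positive.
denied = sum(1 for _ in range(n_draws) if B0 + B1 * x + rng.gauss(0.0, 1.0) > 0.0)
simulated = denied / n_draws
analytic = norm_cdf(B0 + B1 * x)

print(f"simulated share of denials: {simulated:.4f}")
print(f"Phi(b0 + b1*x):             {analytic:.4f}")
```

With this many draws the simulated share should be within sampling error of the analytic probability; the symmetry step $1 - \Phi(-z) = \Phi(z)$ is what makes the two expressions coincide.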
Logit Model and Odds Ratio

1. The only difference is that the logit model (or logistic regression) replaces (6) with

$$p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \tag{10}$$

where $\frac{e^z}{1 + e^z}$ is the cdf of the logistic distribution.

2. Exercise: verify that the cdf is between 0 and 1.

3. The intercept and slope coefficients of the logit model are NOT comparable to those of the probit model.

    . logit deny diratio

    Iteration 0: log likelihood = [...]
    Iteration 1: log likelihood = [...]
    Iteration 2: log likelihood = [...]
    Iteration 3: log likelihood = [...]
    Iteration 4: log likelihood = [...]

    Logistic regression          Number of obs = 2380
                                 LR chi2(1)    = [...]
                                 Prob > chi2   = [...]
    Log likelihood = [...]       Pseudo R2     = [...]

          deny |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
       diratio |  [...]
         _cons |  [...]

However, the marginal effect, which is what really matters, is similar:

    . margins, dydx(diratio)

    Average marginal effects          Number of obs = 2380
    Expression   : Pr(deny), predict()
    dy/dx w.r.t. : diratio
               |          Delta-method
               |  dy/dx   Std. Err.   z   P>|z|   [95% Conf. Interval]
       diratio |  [...]

So on average, the denial probability rises by [...] = ([...])(0.1) when diratio increases by 0.1. Notice that [...]. This average marginal effect is also close to that reported by LPM ([...]), which assumes the marginal effect is constant.

5. (Optional) The set of commands to do MLE explicitly for the logit model is

    capture program drop lflogit
    program lflogit
        version 10.0
        args lnf xb
        local y "$ML_y1"
        quietly replace `lnf' = ln(invlogit(`xb')) if `y'==1
        quietly replace `lnf' = ln(1-invlogit(`xb')) if `y'==0
    end
    ml model lf lflogit (deny = diratio)
    ml maximize

6. The logit model is very popular in industry because $\beta_1$ and $e^{\beta_1}$ have intuitive interpretations. Notice that (dropping $\beta_0$ for simplicity)

$$p_i = \frac{e^{\beta_1 x_i}}{1 + e^{\beta_1 x_i}} \tag{11}$$
$$1 - p_i = \frac{1}{1 + e^{\beta_1 x_i}} \tag{12}$$
$$\frac{p_i}{1 - p_i} = e^{\beta_1 x_i} \tag{13}$$
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_1 x_i \tag{14}$$

where $\frac{p_i}{1 - p_i}$ is called the odds of denial, and $\beta_1$ measures the effect of x on the log of the odds.
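Equations (11) to (14) are easy to verify numerically. The following Python sketch is an illustration only: the slope `B1 = 0.8` is a hypothetical value, the intercept is dropped as in the text, and the helper names are invented for this example. It shows that the odds equal $e^{\beta_1 x}$ and that each one-unit increase in x adds $\beta_1$ to the log odds.

```python
import math


def logistic_cdf(z):
    """Logistic cdf: e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


B1 = 0.8  # hypothetical slope; the intercept is dropped as in equations (11)-(14)


def odds_at(x):
    """Odds p / (1 - p) implied by the logit model at a given x."""
    p = logistic_cdf(B1 * x)
    return p / (1.0 - p)


for x in (0.0, 1.0, 2.0):
    p = logistic_cdf(B1 * x)
    print(f"x={x:.0f}  p={p:.4f}  odds={odds_at(x):.4f}  "
          f"exp(b1*x)={math.exp(B1 * x):.4f}  log odds={math.log(odds_at(x)):.4f}")
```

The odds column should match the exp(b1*x) column, and consecutive log-odds values should differ by `B1`.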
7. It is especially interesting if x is a dummy variable as well. For instance, we have a dummy variable called black:

    . logit deny black, nolog

    Logistic regression          Number of obs = 2380
                                 LR chi2(1)    = [...]
                                 Prob > chi2   = [...]
    Log likelihood = [...]       Pseudo R2     = [...]

          deny |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
         black |  [...]
         _cons |  [...]

So the log of the denial odds for black is greater than that for white by [...]. According to (13), the denial odds for (w)hite and (b)lack are

$$\frac{p_i^w}{1 - p_i^w} = e^{\beta_1 (0)} \tag{15}$$
$$\frac{p_i^b}{1 - p_i^b} = e^{\beta_1 (1)} \tag{16}$$
$$\Rightarrow \frac{p_i^b / (1 - p_i^b)}{p_i^w / (1 - p_i^w)} = e^{\beta_1} \tag{17}$$

Equation (17) indicates that the odds ratio is given by $e^{\beta_1}$.

    . dis "odds ratio is " exp([...])
    odds ratio is [...]

    . logit deny black, or nolog

          deny |  Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
         black |  [...]
         _cons |  [...]
So the option or reports the odds ratio directly.

9. (Optional) The standard error of the odds ratio comes from the Delta method. The idea is as follows. Apply the Mean Value Theorem to a nonlinear function $g(\hat\beta)$:

$$g(\hat\beta) = g(\beta) + g'(\bar\beta)(\hat\beta - \beta),$$

where $\bar\beta$ is between $\beta$ and $\hat\beta$. It follows that

$$\mathrm{var}(g(\hat\beta)) \approx [g'(\hat\beta)]^2 \, \mathrm{var}(\hat\beta)$$
$$\mathrm{sd}(g(\hat\beta)) \approx |g'(\hat\beta)| \, \mathrm{sd}(\hat\beta)$$

For this problem, $g(\hat\beta_1) = e^{\hat\beta_1}$ and $g'(\hat\beta_1) = e^{\hat\beta_1}$.

    . dis "standard error of exp(beta1) is " exp([...])*[...]
    standard error of exp(beta1) is [...]

10. Alternatively, the odds ratio can be obtained from a two-way table:

    . label define blackl 0 "white" 1 "black"
    . label value black blackl
    . tab black deny, row

    Key: frequency, row percentage

               |        deny
         black |  Accept   Deny   Total
         white |  1,[...]
         black |  [...]
         Total |  2,[...]

So the denial odds for white is [...] = [...]; the denial odds for black is [...] = [...]; finally, the odds ratio is [...] = [...], the same number reported by the command logit deny black, or.

11. The advantage of the logit model over the two-way table is that the former can control for other factors:

    . logit deny black diratio, or nolog

    Logistic regression          Number of obs = 2380
                                 LR chi2(2)    = [...]
                                 Prob > chi2   = [...]
    Log likelihood = [...]       Pseudo R2     = [...]

          deny |  Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
         black |  [...]
       diratio |  [...]
         _cons |  [...]

    . dis "t test of odds ratio = 1 is " ([...] - 1)/[...]
    t test of odds ratio = 1 is [...]

So after holding the debt-income ratio constant, the denial odds for black applicants are more than three times those for white applicants. With t-value = [...] we can easily reject the null hypothesis that the odds ratio equals 1, that is, that race does not matter for denial probability.
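The two-way-table calculation of the odds ratio, together with the Delta-method standard error from item 9, can be sketched in Python. The cell counts below are hypothetical (they are NOT the counts from the bank data, which are not recoverable here), and the variable names are invented for this example. The standard error of the log odds ratio uses the usual large-sample formula for a 2x2 table, sqrt(1/a + 1/b + 1/c + 1/d).

```python
import math

# Hypothetical 2x2 table of counts: rows are groups, columns are (accept, deny).
white_accept, white_deny = 90, 10
black_accept, black_deny = 30, 10

# Denial odds within each group: number denied divided by number accepted.
odds_white = white_deny / white_accept
odds_black = black_deny / black_accept
odds_ratio = odds_black / odds_white

# In a logit of deny on the dummy alone, beta1 equals the log odds ratio.
beta1 = math.log(odds_ratio)
se_beta1 = math.sqrt(1 / white_accept + 1 / white_deny
                     + 1 / black_accept + 1 / black_deny)

# Delta method with g(b) = exp(b), so g'(b) = exp(b): se(exp(b)) ~ exp(b) * se(b).
se_odds_ratio = math.exp(beta1) * se_beta1

# Test statistic for the null hypothesis that the odds ratio equals 1.
t_stat = (odds_ratio - 1.0) / se_odds_ratio

print(f"odds (white) = {odds_white:.4f}, odds (black) = {odds_black:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, se = {se_odds_ratio:.4f}, t = {t_stat:.4f}")
```

This mirrors the structure of the Stata steps (odds per group, their ratio, then exp(beta1) times the standard error of beta1) without reproducing the original numbers.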
Multinomial Logit Model (Optional)

1. In this case, the dependent variable is still categorical, but there are more than two outcomes. In statistical terms, the random variable follows a multinomial distribution.

2. We use data provided by Stata to illustrate the multinomial logit model.

    . webuse sysdsn3
    (Health insurance data)

    . list insure age in 1/[...]

         insure |  age
      Indemnity |  [...]
        Prepaid |  [...]
       Uninsure |  [...]
        Prepaid |  [...]

    . tab insure

         insure |  Freq.   Percent   Cum.
      Indemnity |  [...]
        Prepaid |  [...]
       Uninsure |  [...]
          Total |  [...]

The dependent variable is insure (insurance plan), a categorical variable that can take three values: Indemnity, Prepaid and Uninsure. We want to know how age affects people's decisions about insurance plans.

3. One option is to run two logit models separately: one compares prepaid to indemnity, and the other compares uninsure to indemnity. Toward that end, we generate three dummy variables, one for each category:

    . qui tab insure, gen(y)
    . tab y2

    insure==Prepaid |  Freq.   Percent   Cum.
                    |  [...]
              Total |  [...]

    . tab y3

    insure==Uninsure |  Freq.   Percent   Cum.
                     |  [...]
               Total |  [...]

Then we run two separate logistic regressions:

    . logit y2 age if y3==0, nolog

            y2 |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
           age |  [...]
         _cons |  [...]

    . logit y3 age if y2==0, nolog

            y3 |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
           age |  [...]
         _cons |  [...]

So when a person gets one year older, the log of odds (prepaid vs indemnity) goes down
by [...]; the log of odds (uninsure vs indemnity) goes down by [...]. Both changes are insignificant. You can add the option or to get the odds-ratio interpretation.

4. Option two is to use the command mlogit, which conducts a joint comparison:

    . mlogit insure age, nolog

    Multinomial logistic regression   Number of obs = 615
                                      LR chi2(2)    = 1.96
                                      Prob > chi2   = [...]
    Log likelihood = [...]            Pseudo R2     = [...]

        insure |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
     Indemnity |  (base outcome)
     Prepaid   |
           age |  [...]
         _cons |  [...]
     Uninsure  |
           age |  [...]
         _cons |  [...]

We see that the coefficient of age in this multinomial logistic regression is similar to that in the group-wise logistic regressions.

5. Let x denote the outcome-invariant regressor, e.g., age. Suppose outcome one is the base outcome. Let $\beta_1$ be the coefficient of x when outcome two is compared to outcome one, and $\beta_2$ be the coefficient of x when outcome three is compared to outcome one. The
multinomial logit model assumes that

$$\Pr(y = \text{outcome one}) = \frac{1}{1 + e^{\beta_1 x} + e^{\beta_2 x}} \tag{18}$$
$$\Pr(y = \text{outcome two}) = \frac{e^{\beta_1 x}}{1 + e^{\beta_1 x} + e^{\beta_2 x}} \tag{19}$$
$$\Pr(y = \text{outcome three}) = \frac{e^{\beta_2 x}}{1 + e^{\beta_1 x} + e^{\beta_2 x}} \tag{20}$$

Equivalently,

$$e^{\beta_1 x} = \frac{\Pr(y = \text{outcome two})}{\Pr(y = \text{outcome one})} \tag{21}$$
$$e^{\beta_2 x} = \frac{\Pr(y = \text{outcome three})}{\Pr(y = \text{outcome one})} \tag{22}$$

This model imposes a restriction called Independence of Irrelevant Alternatives (IIA). That is, the comparison of outcomes one and two has nothing to do with outcome three, since $\beta_2$ is absent in (21). IIA explains why we get very similar results from mlogit and from two separate logit regressions.

6. We can use a loop to figure out the probability of each outcome for a range of ages:

    local ag = 20
    while `ag' <= 80 {
        local xb1 = [...] + ([...])*`ag'
        local xb2 = [...] + ([...])*`ag'
        dis "age is " `ag'
        dis "probability of indemnity is " 1/(1+exp(`xb1')+exp(`xb2'))
        dis "probability of prepaid is " exp(`xb1')/(1+exp(`xb1')+exp(`xb2'))
        dis "probability of uninsured is " exp(`xb2')/(1+exp(`xb1')+exp(`xb2'))
        dis ""
        local ag = `ag' + 10
    }

    age is 20
    probability of indemnity is [...]
    probability of prepaid is [...]
    probability of uninsured is [...]
    ...
    age is 80
    probability of indemnity is [...]
    probability of prepaid is [...]
    probability of uninsured is [...]

7. The ML commands are

    qui tab insure, gen(pr)
    global y1 "pr1"
    global y2 "pr2"
    global y3 "pr3"
    capture program drop lfmlogit
    program lfmlogit
        version 10.0
        args lnf xb1 xb2
        tempvar p1 p2 p3
        quietly {
            gen double `p1' = 1/(1+exp(`xb1')+exp(`xb2'))
            gen double `p2' = exp(`xb1')/(1+exp(`xb1')+exp(`xb2'))
            gen double `p3' = exp(`xb2')/(1+exp(`xb1')+exp(`xb2'))
            replace `lnf' = $y1*ln(`p1') + $y2*ln(`p2') + $y3*ln(`p3')
        }
    end
    ml model lf lfmlogit (eq1: insure = age) (eq2: insure = age)
    ml maximize
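The probability formulas (18) to (20) and the IIA property in (21) can be illustrated with a short Python sketch. Everything numeric here is an assumption made for the example: the intercepts and slopes `A1`, `B1`, `A2`, `B2` are hypothetical values, not the mlogit estimates, and `mlogit_probs` is an invented helper name. The loop mirrors the age grid 20, 30, ..., 80 used in item 6.

```python
import math


def mlogit_probs(xb1, xb2):
    """Probabilities of outcomes one (base), two and three, as in (18)-(20)."""
    denom = 1.0 + math.exp(xb1) + math.exp(xb2)
    return 1.0 / denom, math.exp(xb1) / denom, math.exp(xb2) / denom


# Hypothetical coefficients (intercept, slope on age) for illustration only.
A1, B1 = 0.2, -0.010    # outcome two versus the base outcome
A2, B2 = -1.3, -0.005   # outcome three versus the base outcome

for age in range(20, 81, 10):
    xb1 = A1 + B1 * age
    xb2 = A2 + B2 * age
    p1, p2, p3 = mlogit_probs(xb1, xb2)
    # IIA: the ratio p2 / p1 equals exp(xb1) and does not involve xb2.
    print(f"age={age}  p1={p1:.4f}  p2={p2:.4f}  p3={p3:.4f}  "
          f"p2/p1={p2 / p1:.4f}  exp(xb1)={math.exp(xb1):.4f}")
```

The three probabilities sum to one at every age, and the last two columns coincide, which is the numerical counterpart of equation (21).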
More informationBinary Outcomes. Objectives. Demonstrate the limitations of the Linear Probability Model (LPM) for binary outcomes
Binary Outcomes Objectives Demonstrate the limitations of the Linear Probability Model (LPM) for binary outcomes Develop latent variable & transformational approach for binary outcomes Present several
More informationSociology Exam 2 Answer Key March 30, 2012
Sociology 63993 Exam 2 Answer Key March 30, 2012 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher has constructed scales
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationBinomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials
Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =
More informationECONOMETRICS HONOR S EXAM REVIEW SESSION
ECONOMETRICS HONOR S EXAM REVIEW SESSION Eunice Han ehan@fas.harvard.edu March 26 th, 2013 Harvard University Information 2 Exam: April 3 rd 3-6pm @ Emerson 105 Bring a calculator and extra pens. Notes
More informationMonday 7 th Febraury 2005
Monday 7 th Febraury 2 Analysis of Pigs data Data: Body weights of 48 pigs at 9 successive follow-up visits. This is an equally spaced data. It is always a good habit to reshape the data, so we can easily
More informationControl Function and Related Methods: Nonlinear Models
Control Function and Related Methods: Nonlinear Models Jeff Wooldridge Michigan State University Programme Evaluation for Policy Analysis Institute for Fiscal Studies June 2012 1. General Approach 2. Nonlinear
More informationECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests
ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one
More informationLab 07 Introduction to Econometrics
Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand
More informationProbit Estimation in gretl
Probit Estimation in gretl Quantitative Microeconomics R. Mora Department of Economics Universidad Carlos III de Madrid Outline Introduction 1 Introduction 2 3 The Probit Model and ML Estimation The Probit
More informationGeneralized linear models
Generalized linear models Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Generalized linear models Boston College, Spring 2016 1 / 1 Introduction
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution
More informationInterpreting coefficients for transformed variables
Interpreting coefficients for transformed variables! Recall that when both independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable
More informationECON 626: Applied Microeconomics. Lecture 11: Maximum Likelihood
ECON 626: Applied Microeconomics Lecture 11: Maximum Likelihood Professors: Pamela Jakiela and Owen Ozier Department of Economics University of Maryland, College Park Maximum Likelihood: Motivation So
More informationLecture#12. Instrumental variables regression Causal parameters III
Lecture#12 Instrumental variables regression Causal parameters III 1 Demand experiment, market data analysis & simultaneous causality 2 Simultaneous causality Your task is to estimate the demand function
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationWeek 2: Review of probability and statistics
Week 2: Review of probability and statistics Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED
More informationData-analysis and Retrieval Ordinal Classification
Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%
More informationLecture notes to Stock and Watson chapter 8
Lecture notes to Stock and Watson chapter 8 Nonlinear regression Tore Schweder September 29 TS () LN7 9/9 1 / 2 Example: TestScore Income relation, linear or nonlinear? TS () LN7 9/9 2 / 2 General problem
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationDealing With and Understanding Endogeneity
Dealing With and Understanding Endogeneity Enrique Pinzón StataCorp LP October 20, 2016 Barcelona (StataCorp LP) October 20, 2016 Barcelona 1 / 59 Importance of Endogeneity Endogeneity occurs when a variable,
More informationEconometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit
Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit R. G. Pierse 1 Introduction In lecture 5 of last semester s course, we looked at the reasons for including dichotomous variables
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationMaximum Likelihood Estimation
Chapter 9 Maximum Likelihood Estimatio 9.1 The Likelihood Fuctio The maximum likelihood estimator is the most widely used estimatio method. This chapter discusses the most importat cocepts behid maximum
More informationUnderstanding the multinomial-poisson transformation
The Stata Journal (2004) 4, Number 3, pp. 265 273 Understanding the multinomial-poisson transformation Paulo Guimarães Medical University of South Carolina Abstract. There is a known connection between
More informationLab 11 - Heteroskedasticity
Lab 11 - Heteroskedasticity Spring 2017 Contents 1 Introduction 2 2 Heteroskedasticity 2 3 Addressing heteroskedasticity in Stata 3 4 Testing for heteroskedasticity 4 5 A simple example 5 1 1 Introduction
More informationLogistic Regression Analysis
Logistic Regression Analysis Predicting whether an event will or will not occur, as well as identifying the variables useful in making the prediction, is important in most academic disciplines as well
More informationModels for Binary Outcomes
Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.
More informationOrdinal Independent Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised April 9, 2017
Ordinal Independent Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised April 9, 2017 References: Paper 248 2009, Learning When to Be Discrete: Continuous
More informationChapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression Outline 1. Hypothesis tests and confidence intervals for a single coefficie. Joint hypothesis tests on multiple coefficients 3.
More informationcrreg: A New command for Generalized Continuation Ratio Models
crreg: A New command for Generalized Continuation Ratio Models Shawn Bauldry Purdue University Jun Xu Ball State University Andrew Fullerton Oklahoma State University Stata Conference July 28, 2017 Bauldry
More informationEconometrics II Tutorial Problems No. 1
Econometrics II Tutorial Problems No. 1 Lennart Hoogerheide & Agnieszka Borowska 15.02.2017 1 Summary Binary Response Model: A model for a binary (or dummy, i.e. with two possible outcomes 0 and 1) dependent
More informationQuantitative Methods Final Exam (2017/1)
Quantitative Methods Final Exam (2017/1) 1. Please write down your name and student ID number. 2. Calculator is allowed during the exam, but DO NOT use a smartphone. 3. List your answers (together with
More informationLatent class analysis and finite mixture models with Stata
Latent class analysis and finite mixture models with Stata Isabel Canette Principal Mathematician and Statistician StataCorp LLC 2017 Stata Users Group Meeting Madrid, October 19th, 2017 Introduction Latent
More informationGeneral Linear Model (Chapter 4)
General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients
More informationAddition to PGLR Chap 6
Arizona State University From the SelectedWorks of Joseph M Hilbe August 27, 216 Addition to PGLR Chap 6 Joseph M Hilbe, Arizona State University Available at: https://works.bepress.com/joseph_hilbe/69/
More informationExercise 7.4 [16 points]
STATISTICS 226, Winter 1997, Homework 5 1 Exercise 7.4 [16 points] a. [3 points] (A: Age, G: Gestation, I: Infant Survival, S: Smoking.) Model G 2 d.f. (AGIS).008 0 0 (AGI, AIS, AGS, GIS).367 1 (AG, AI,
More informationChapter 10 Logistic Regression
Chapter 10 Logistic Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Logistic Regression Extends idea of linear regression to situation where outcome
More informationAssessing the Calibration of Dichotomous Outcome Models with the Calibration Belt
Assessing the Calibration of Dichotomous Outcome Models with the Calibration Belt Giovanni Nattino The Ohio Colleges of Medicine Government Resource Center The Ohio State University Stata Conference -
More informationStatistical Modelling with Stata: Binary Outcomes
Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls
More informationWarwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation
Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Michele Aquaro University of Warwick This version: July 21, 2016 1 / 31 Reading material Textbook: Introductory
More information