Metropolitan State University
ECON 497: Research and Forecasting
Lecture Notes 13
Dummy Dependent Variable Techniques
Studenmund Chapter 13

Basically, if you have a dummy dependent variable you will be estimating a probability. Probabilities are necessarily restricted to fall in the range [0,1] and this puts special conditions on the regression. Just doing a linear regression can result in estimated probabilities that are either negative or greater than 1 and are a bit nonsensical. As a result, there are other techniques for estimating these relationships that can generate better results.

The Linear Probability Model

The second or third best way to estimate models with dummy dependent variables is to simply estimate the model as you normally might:

$$D = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

For example, if you have a sample of the U.S. adult population and you're trying to determine the probability that a person is incarcerated, you might estimate the equation:

$$D = \beta_0 + \beta_1 \mathrm{AGE} + \beta_2 \mathrm{GENDER} + \varepsilon$$

where
D is a dummy variable taking the value 1 if a person is incarcerated and 0 if not
AGE is the person's age
GENDER is a dummy variable equal to 1 if the person is male and 0 otherwise

Imagine that the estimated coefficients are:

$$\hat{D} = 0.0043 - 0.0001 \cdot \mathrm{AGE} + 0.0052 \cdot \mathrm{GENDER}$$

Interpretation of the estimated coefficients is straightforward. If there are two women, one of whom is one year older than the other, the estimated probability that the older one will be incarcerated will be 0.0001 less than the estimated probability that the younger one will be.
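
To make the arithmetic concrete, here is a minimal Python sketch that plugs values into the estimated equation above. The coefficients are the imagined ones from these notes; the four people are hypothetical and chosen only for illustration.

```python
# Imagined coefficient estimates from the notes above.
B0, B_AGE, B_GENDER = 0.0043, -0.0001, 0.0052


def predicted_probability(age, male):
    """Linear probability model prediction; `male` is 1 for men, 0 for women."""
    return B0 + B_AGE * age + B_GENDER * male


# Hypothetical people, used only to illustrate the calculation.
people = [
    ("woman, age 30", 30, 0),
    ("woman, age 31", 31, 0),
    ("man, age 30", 30, 1),
    ("woman, age 60", 60, 0),
]
for label, age, male in people:
    print(f"{label}: {predicted_probability(age, male):+.4f}")

# One extra year of age, holding gender fixed.
age_effect = predicted_probability(31, 0) - predicted_probability(30, 0)
# Being male rather than female, holding age fixed.
gender_effect = predicted_probability(30, 1) - predicted_probability(30, 0)
print(f"age effect: {age_effect:+.4f}, gender effect: {gender_effect:+.4f}")
```

Note that the prediction for the 60-year-old woman is negative, which is exactly the kind of nonsensical probability that a linear model can produce.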

If there are a man and a woman of the same age, the predicted probability that the man will be incarcerated is 0.0052 greater than the predicted probability that the woman will be incarcerated.

Interestingly, a woman of age 43 will have a predicted probability of 0.0000 and women older than this will have negative predicted probabilities of incarceration.

Studenmund describes issues regarding the linear probability model and you should read this discussion. One thing I will point out is that the adjusted $R^2$ is not an accurate measure of overall fit in a linear probability model with dummies as dependent variables.

The Weighted Least Squares Approach

The most complicated point from Studenmund's discussion of the linear probability model is the discussion of weighted least squares. This technique is designed to get around the problem of heteroskedasticity (which we haven't really discussed yet) and can be summarized as follows:

1. Due to the structure of the linear probability model, the error terms are not identically distributed. Specifically, error terms will have greater variance when the actual probability is close to 0.5 and smaller variance when the actual probability is close to zero or one. Because the error terms are not all identically distributed (they have different variances), there is a problem with heteroskedasticity. Coefficient estimates, however, will be unbiased as long as the other classical assumptions are satisfied.
2. To address this problem, do the standard linear regression and then use the estimated coefficients to generate the predicted probabilities ($\hat{D}$) for each observation. Excel will do this for you if you ask it nicely.
3. Use these predicted probabilities ($\hat{D}$) to generate a new value, which is equal to the square root of $\hat{D}(1 - \hat{D})$. Call this $Z = [\hat{D}(1 - \hat{D})]^{1/2}$.
4. Divide the dependent and explanatory variables by this new variable ($Z$).
5. Now, redo the regression using the values of the dependent and explanatory variables that have been divided by $Z$.
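
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not Studenmund's own procedure: the data are synthetic, the helper names `solve` and `ols` are made up for this example, the constant term is treated as a regressor and divided by Z along with everything else, and the predicted probabilities are clipped into [0.01, 0.99] so that Z is always a real, nonzero number.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data (an assumption for illustration): true P(D = 1) = 0.2 + 0.06 * x.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(500)]
d = [1 if random.random() < 0.2 + 0.06 * xi else 0 for xi in x]

# Step 2: ordinary linear regression, then predicted probabilities D-hat.
rows = [[1.0, xi] for xi in x]  # the first column is the constant term
b_ols = ols(rows, d)
d_hat = [b_ols[0] + b_ols[1] * xi for xi in x]

# Step 3: Z = sqrt(D-hat * (1 - D-hat)). The clipping is an assumption added
# here so that Z stays real and nonzero even if D-hat leaves (0, 1).
clipped = [min(max(p, 0.01), 0.99) for p in d_hat]
z = [(p * (1 - p)) ** 0.5 for p in clipped]

# Step 4: divide the dependent variable and every regressor (constant included) by Z.
rows_w = [[v / zi for v in r] for r, zi in zip(rows, z)]
d_w = [di / zi for di, zi in zip(d, z)]

# Step 5: redo the regression on the transformed variables.
b_wls = ols(rows_w, d_w)
print("OLS coefficients:", b_ols)
print("WLS coefficients:", b_wls)
```

Because the constant is divided by Z too, the second regression has no ordinary intercept column: its first regressor is 1/Z, and the coefficient on it plays the role of the intercept.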
The standard errors and the t-statistics for the estimated coefficients will be different and more accurate.

Briefly, the idea behind this is that observations whose error terms have greater variance should have less influence than those whose error terms have smaller variance. The closer $\hat{D}$ is to 0.5, the larger the variance of the error term is likely to be, so the observations are weighted by $1/[\hat{D}(1 - \hat{D})]^{1/2} = [\hat{D}(1 - \hat{D})]^{-1/2}$.

The Binomial Logit Model

The proper way to estimate these models is by using the binomial logit model. To do this, the dependent variable needs to be transformed. The equation to be estimated is:

$$\ln\left(\frac{D}{1 - D}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

The dependent variable is the log of the odds ratio and is equal to infinity if D = 1 and is equal to negative infinity if D = 0. The predicted probability is equal to

$$\hat{D} = \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2)}}$$

The interpretation of the estimated coefficients is less straightforward here. Estimated coefficients show the effect of a change in an explanatory variable on the predicted log of the odds ratio, not on the probability itself. Basically, from a coefficient alone you can only tell whether an explanatory variable has a positive or negative impact on the probability, not how large that impact is, because the size of the change in probability depends on where you start.

An additional complication is that logit models cannot be estimated using OLS, so they can't really be done in Excel. This is something you need a real statistical analysis package to do.

Examples solicited from students.

Binomial Probit Model

This is a model based on some slightly different assumptions than the binomial logit model. In most cases the results from the two models are nearly identical. If you're estimating either a logit or a probit model, it's usually just one additional command to also estimate the other. You should do this, just for completeness and to check that your important results are robust to changes in the model used.
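
Since the logit cannot be estimated with OLS, statistical packages fit it by maximum likelihood. The following is a rough sketch of that idea for a binomial logit with one explanatory variable, using Newton-Raphson iterations written out by hand. The data are hypothetical, the fixed count of 25 iterations is an arbitrary choice for this example, and real work should rely on a statistical package rather than this code.

```python
import math


def logistic(v):
    """Predicted probability for a given value of b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical data: x is some explanatory variable, d is the 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = 0.0, 0.0
for _ in range(25):
    p = [logistic(b0 + b1 * xi) for xi in x]
    # Gradient of the log-likelihood with respect to (b0, b1).
    g0 = sum(di - pi for di, pi in zip(d, p))
    g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
    # Negative Hessian, built from the weights p * (1 - p).
    w = [pi * (1 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    # Newton step: solve the 2x2 system H * step = g by Cramer's rule.
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

p_hat = [logistic(b0 + b1 * xi) for xi in x]
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("predicted probabilities:", [round(pi, 3) for pi in p_hat])
```

Unlike the linear probability model, every predicted probability here lies strictly between 0 and 1. A handy sanity check for a logit with a constant term is that the predicted probabilities add up to the number of ones in the data.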

If a presenter is annoying you in some way as they discuss their binomial logit or binomial probit results, you can make yourself equally annoying by asking if they estimated the other model and if their results were robust to the change. This is kind of a cheap question and it really shouldn't disturb them too much because, if they've been even slightly responsible, they will have done both.

The real difference between the logit and probit models is that they have slightly different assumptions about the distribution of the underlying probabilities. The probit uses the cumulative distribution function of the normal distribution, while the logit uses the cumulative logistic distribution, which is what makes the log of the odds ratio linear in the explanatory variables. Here's Studenmund's take on all this:

"From a researcher's point of view, the biggest differences between the two models are that the probit is based on the cumulative normal distribution and that the probit estimation procedure uses more computer time than does the logit. As computer programs are improved, and as computer time continues to fall in price, this latter difference may eventually disappear. Since the probit is similar to the logit and is more expensive to run, why would you ever estimate one? The answer is that since the probit is based on the normal distribution, it's quite theoretically appealing (because many economic variables are normally distributed). With extremely large samples, this advantage falls away, since maximum likelihood procedures can be shown to be asymptotically normal under fairly general condition."

Multinomial Logit Model

If you have a qualitative dependent variable that can take multiple values, you may wish to estimate a multinomial logit model. This can be a bit tricky and uncooperative, and it can potentially require a lot of computing time, a compliant data set with lots of observations of each qualitative outcome and, most importantly, a big chunk of your life and your sanity, not necessarily in that order.
Basically, the results from a multinomial logit model tell you about the effect that a change in the value of a variable has on the relative probabilities of two of the possible outcomes. Doing this with some degree of reliability apparently requires a data set in which you have a couple hundred observations of each of the qualitative outcomes.

Example: Voting Choice

Imagine that you have voting records showing demographic information for a lot of people and who they voted for (Democrat, Republican, Libertarian) in the last election. You might use a multinomial logit model to identify the factors that have a significant impact on making someone vote Libertarian rather than Republican or Democrat.
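
To show what "relative probabilities of two of the possible outcomes" means in practice, here is a small Python sketch of how multinomial logit coefficients turn into predicted probabilities. Every number in it is hypothetical and not estimated from any data. It assumes a single explanatory variable (income in tens of thousands of dollars) and treats Democrat as the base category with coefficients fixed at zero, which is one common way to set the model up.

```python
import math

# Hypothetical (intercept, income coefficient) pairs for each outcome.
# Democrat is the base category, so its coefficients are fixed at zero.
coefs = {
    "Democrat": (0.0, 0.0),
    "Republican": (-0.5, 0.10),
    "Libertarian": (-2.0, 0.05),
}


def vote_probabilities(income):
    """Multinomial logit probabilities: exp(score), normalized to sum to 1."""
    scores = {party: a + b * income for party, (a, b) in coefs.items()}
    total = sum(math.exp(s) for s in scores.values())
    return {party: math.exp(s) / total for party, s in scores.items()}


for income in (3, 8):
    probs = vote_probabilities(income)
    # Log of the relative probability of Libertarian versus Republican.
    log_ratio = math.log(probs["Libertarian"] / probs["Republican"])
    rounded = {party: round(p, 3) for party, p in probs.items()}
    print(f"income={income}: {rounded}, log(L/R)={log_ratio:.3f}")
```

In this setup, the log of the Libertarian-to-Republican probability ratio moves by the difference between the two income coefficients (0.05 - 0.10 = -0.05) for each one-unit increase in income, no matter what the income level is. That is the sense in which the coefficients describe relative probabilities rather than the probabilities themselves.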

Example: Transportation Choice

Imagine that you get a hold of a transportation survey from the Puget Sound Regional Council and you want to model transportation choice of adults based on such things as income, number of children, commute distance, etc. You might use a multinomial logit model with each possible choice as one possible value of the dependent variable.