Limited Dependent Variables

1. What if the left-hand side variable is not a continuous thing spread from minus infinity to plus infinity? That is, given a model y = f(x, β, ε), where
   a. y is bounded below at zero, such as wages or height;
   b. Or, y is bounded above, such as top-coded income in BLS data;
   c. Or, y is discrete, taking on just the values 0 or 1, such as in yes/no responses to survey questions, or bankruptcy indicators for firms;
   d. Or, y is discrete, but ordinal and many-valued, such as data that have been grouped into ranges (often the case for income data), or responses to survey questions about the depth of agreement (strongly agree, somewhat agree, don't agree at all);
   e. Or, y is discrete and many-valued, but not ordinal, such as transit mode choices (did you take the bus, drive a car, bicycle or walk?).

2. Let us begin with the simplest case: y is a zero-one binary variable, reflecting the answer to a yes/no question, coded 1 for yes, 0 for no.
   a. Consider an OLS regression of y on x. If greater values of x are associated with higher probabilities of yes, then the OLS regression coefficient on x will reflect this as a positive coefficient. Indeed, the coefficient will have a mean value equal to the marginal impact of x on the probability that y is a yes. Given that, we call this the linear probability model. OLS regression for a linear probability model delivers consistent estimates of coefficients for an underlying model where
      i. P[y_i = 1 | x_i] = x_i β, i.e., y_i = x_i β + ε_i.
      ii. To see this, think about a model where P[y=1] = .5 for x = 1 and P[y=1] = .6 for x = 2. In this case, 50% of cases will have y = 1 and 50% will have y = 0 at x = 1, but 60% will have y = 1 and 40% will have y = 0 at x = 2. The regression wants to find the conditional expectation of y given x, and given the 0/1 coding of y, this conditional expectation is exactly the probability we are seeking.
   b. This linear probability model is unsatisfactory, though, for at least two reasons:
      i. The disturbances are not distributed nicely. In the example above, the disturbances are evenly split between the values [-½, ½] at x = 1, and take on the values [-.6, .4] 40% and 60% of the time, respectively, at x = 2.
      Although they are everywhere mean-zero, they are heteroskedastic, because their distribution is different at different values of x. (A different distribution with the same mean usually implies a different variance.)
      ii. The model is aesthetically unappealing, because we are estimating a model which we know could not generate the data. In the data, the probabilities P[y=1|x] are not linear in x (they must stay between 0 and 1), but we are estimating a model in which they are linear in x.
   c. The solutions to both these problems come with a single strategy: write down a model which could conceivably generate the data, and try to estimate its parameters. The cost of doing this is that we typically have to write out an explicit distribution for the disturbances (which we don't have to do when we use OLS and asymptotic variances and tests).
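The two-point example above is easy to check by simulation. Below is a minimal sketch, assuming the design from the text (P[y=1] = .5 at x = 1 and .6 at x = 2, so the true slope is .1); the variable names are illustrative, not from the notes:

```python
# Linear probability model by OLS on simulated 0/1 data.
# True model: P[y=1|x] = 0.4 + 0.1*x, with x in {1, 2}.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(1, 3, size=n)            # x is 1 or 2
p = 0.4 + 0.1 * x                         # true P[y=1|x]
y = (rng.random(n) < p).astype(float)     # 0/1 outcomes

X = np.column_stack([np.ones(n), x])      # intercept and x
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                               # roughly [0.4, 0.1]

# The disturbances are heteroskedastic: their variance is p(1-p),
# which differs across values of x (0.25 at x=1, 0.24 at x=2).
resid = y - X @ beta
print(resid[x == 1].var(), resid[x == 2].var())
```

The slope recovers the marginal effect of x on the probability, as claimed, but the residual variance changes with x, which is the heteroskedasticity problem discussed above.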
3. Binary choice models: Logits and Probits.
   a. Latent variable approach. Assume that there is some latent (unobserved) variable y* which determines the observed variable y, and which behaves nicely (usually, linearly):
   b. y*_i = x_i β + ε_i, ε_i ~ f(ε); y_i = 1 if y*_i ≥ 0, and y_i = 0 otherwise.
   c. Here, f is some nice distribution for the disturbance terms, perhaps normal, perhaps not. The disturbances of the linear probability model would likely not satisfy any a priori distribution; and since the distribution here is the same for each and every observation, there is no heteroskedasticity, whereas the linear probability model would certainly suffer from heteroskedasticity.

4. The method of maximum likelihood says: "find the parameters which maximise the probability of seeing the sample that you saw." Take the model y_i = x_i β + ε_i, ε_i ~ f(ε).
   a. For a general model, the probability of seeing a whole set of observations is really just the probability of seeing that set of disturbances. Consequently, maximum likelihood is a method-of-moments approach, because maximising the likelihood implies solving a first-order condition, and we solve the first-order condition by plugging in residuals for disturbances.
      i. For any i, P[y_i, x_i] = P[ε_i = y_i − x_i β] = f(y_i − x_i β), and since we know f, and the disturbances are independent, the probability of seeing the whole set of observations, known as the likelihood, is:
      ii. L = P[y_1, ..., y_N, x_1, ..., x_N; β] = ∏_{i=1}^{N} f(y_i − x_i β)
      iii. The method of maximum likelihood says: choose β by maximising L.
      iv. The likelihood above is expressed over disturbances, but once we choose a β, we are working with residuals. So, implementation of this maximisation involves substituting residuals for disturbances.
      v. Products are a drag to work with, so if you take the log, it turns into a sum:
      vi. ln L = ln P[y_1, ..., y_N, x_1, ..., x_N; β] = Σ_{i=1}^{N} ln f(y_i − x_i β)
      vii. If f were the normal density function, f(ε_i) = exp(−ε_i²/2σ²) / (σ√(2π)), then ln f(ε_i) = −ln σ − (1/2) ln 2π − ε_i²/2σ²,
      viii. so ln L = Σ_i [ −ln σ − (1/2) ln 2π − (y_i − x_i β)²/2σ² ],
      ix. and the first-order condition for maximisation is
      ∂ln L/∂β = Σ_i (y_i − x_i β) x_i / σ² = 0, aka X'e = 0. This expression is a moment condition, because the first-order condition requires that a particular weighted average of the disturbances equals zero. We solve it by substituting residuals for disturbances.
      x. So, if you know f, you can typically do ML. ML with normality on linear models yields the OLS estimator.

5. Consider the binary choice latent variable model as a maximum likelihood problem.
   a. y*_i = x_i β + ε_i, ε_i ~ f(ε); y_i = 1 if y*_i ≥ 0, and y_i = 0 otherwise.
   b. Define the cumulative distribution function F dual to f as F(v) = ∫_{−∞}^{v} f(u) du, so that F(v) gives the probability of the disturbance being less than v.
   c. What is the probability of seeing any particular y_i, x_i pair? For y_i = 1:
      P[1, x_i] = P[x_i β + ε_i ≥ 0] = P[ε_i ≥ −x_i β] = 1 − F(−x_i β),
   d. and, for y_i = 0:
      P[0, x_i] = P[x_i β + ε_i < 0] = P[ε_i < −x_i β] = F(−x_i β),
   e. which sum to one.
   f. ASSUME that f is symmetric, so that a lot of minuses disappear. If f is symmetric, then 1 − F(−x_i β) = F(x_i β), and F(−x_i β) = 1 − F(x_i β).
   g. The likelihood is thus given by
      L = ∏_{y_i=1} F(x_i β) · ∏_{y_i=0} (1 − F(x_i β)),
   h. and the log-likelihood is
   i. ln L = Σ_{i=1}^{N} [ y_i ln F(x_i β) + (1 − y_i) ln(1 − F(x_i β)) ].
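Before turning to the first-order conditions, note that this binary-choice log-likelihood is easy to evaluate numerically for any candidate β, and on simulated data it should peak near the true parameter. A minimal sketch, using the logistic cdf as a convenient symmetric F; the data-generating slope of 1.0 and all names are illustrative:

```python
# Evaluate ln L = sum_i [ y_i ln F(x_i b) + (1 - y_i) ln(1 - F(x_i b)) ]
# at several candidate values of b, with F the logistic cdf.
import numpy as np

def F(v):
    # logistic cdf, a symmetric distribution function
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = (rng.random(n) < F(1.0 * x)).astype(float)   # true slope = 1.0

def loglik(b):
    p = F(b * x)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# The sample log-likelihood should be highest near the true parameter:
for b in (0.5, 1.0, 1.5):
    print(b, loglik(b))
```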
   j. The first-order conditions are
      ∂ln L/∂β = Σ_i [ y_i f(x_i β)/F(x_i β) − (1 − y_i) f(x_i β)/(1 − F(x_i β)) ] x_i = 0.
      This is the moment condition for the binary discrete choice maximum likelihood problem. It is analogous to X'e = 0 in OLS.
   k. If the probabilities are linear in x, then F(x_i β) = x_i β and f(x_i β) = 1. In this
   l. case, the first-order condition on the likelihood function becomes
      ∂ln L/∂β = Σ_i [ y_i/(x_i β) − (1 − y_i)/(1 − x_i β) ] x_i = Σ_i (y_i − x_i β) x_i / [x_i β (1 − x_i β)] = 0.
   m. That is why we call the OLS approach to this model the linear probability model.
      i. It results in a linear optimisation problem which has a linear solution.

6. PROBIT model.
   a. Denote the standard normal density as φ(u) = exp(−u²/2)/√(2π), and denote the cumulative distribution function of the standard normal as Φ(v) = ∫_{−∞}^{v} φ(u) du.
   b. So, the probability of seeing a particular pair y_i, x_i is one or the other of these:
      P[1, x_i] = P[x_i β + ε_i ≥ 0] = P[ε_i ≥ −x_i β] = 1 − Φ(−x_i β/σ) = Φ(x_i β/σ)
   c. P[0, x_i] = P[x_i β + ε_i < 0] = P[ε_i < −x_i β] = Φ(−x_i β/σ) = 1 − Φ(x_i β/σ)
   d. These sum to one.
   e. The likelihood is thus given by
      L = ∏_{y_i=1} Φ(x_i β/σ) · ∏_{y_i=0} (1 − Φ(x_i β/σ)),
   f. and the log-likelihood is
   g. ln L = Σ_{i=1}^{N} [ y_i ln Φ(x_i β/σ) + (1 − y_i) ln(1 − Φ(x_i β/σ)) ].
   h. Here, β and σ are not identified separately, because scaling both parameters by any common scalar does not affect the likelihood.
   i. So, one must impose a restriction. Typically, we either impose that one element of β is 1, or that σ = 1. They are equally valid identifying restrictions, so it is up to convenience to choose. Let us choose σ = 1. The first-order conditions are then
   j. ∂ln L/∂β = Σ_i [ y_i φ(x_i β)/Φ(x_i β) − (1 − y_i) φ(x_i β)/(1 − Φ(x_i β)) ] x_i = 0.
   k. These first-order conditions are nonlinear in the parameters, and so the solution must be found by iteration.
   l. The difficult thing in finding a solution is that the parameters are inside the normal pdf, which has an explicit analytic form, and inside the normal cdf, which does not have a closed analytic form.
   m. The picture for this is the one that maps y-hat into y: y-hat maps onto a number line, and for that reason these are sometimes called index models. If the index y-hat = x_i β plus the disturbance is less than zero, then y is zero. The probit puts the normal pdf on ε, and the event ε_i < −x_i β corresponds to the latent variable being less than zero.

7. LOGIT model
   a. The logit model is a simple alternative to the probit model which has a similar distribution for the disturbance terms, but is much easier to solve.
   b. Consider the logistic cumulative distribution function, denoted with a capital lambda:
      F(v) = Λ(v) = exp(v) / (1 + exp(v)).
      The associated probability density function is given by
      f(v) = ∂Λ(v)/∂v = Λ(v)(1 − Λ(v)).
   c. If we substitute these into the first-order condition for the likelihood function,
      ∂ln L/∂β = Σ_i [ y_i Λ(x_i β)(1 − Λ(x_i β))/Λ(x_i β) − (1 − y_i) Λ(x_i β)(1 − Λ(x_i β))/(1 − Λ(x_i β)) ] x_i
               = Σ_i [ y_i (1 − Λ(x_i β)) − (1 − y_i) Λ(x_i β) ] x_i
               = Σ_i (y_i − Λ(x_i β)) x_i = 0.
   d. You can see how this might be easier to solve with a computer: no ratios, and everything in there can be expressed analytically.

8. Interpretation of Parameters
   a. The parameter β tells you the marginal effect of x on the index x β. F(x β) gives the expectation of y given x, but because F is nonlinear in its argument, β does not give the derivative of E[y|x] with respect to x. In particular,
   b. E[y_i | x_i] = 1 · P[y_i = 1 | x_i] + 0 · P[y_i = 0 | x_i] = F(x_i β), so ∂E[y_i | x_i]/∂x_i = β f(x_i β).
   c. So, the marginal effect depends on x_i. In a probit, the marginal effect is given by β φ(x_i β), and in a logit, the marginal effect is given by β Λ(x_i β)(1 − Λ(x_i β)) = β P_i (1 − P_i), where P_i is the probability that y_i = 1.

9. Multinomial choice with ordered choices
   a. Consider a latent variable model with ordered choices given by
   b. y*_i = x_i β + ε_i, ε_i ~ f(ε), with
      y_i = 0 if y*_i < 0
      y_i = 1 if 0 ≤ y*_i < μ_1
      y_i = 2 if μ_1 ≤ y*_i < μ_2
      ...
      y_i = J if μ_{J−1} ≤ y*_i.
   c. This is extremely similar to the models above, except that there is an additional unknown set of parameters: the thresholds μ_1, ..., μ_{J−1}.
   d. Assume that f is the normal probability density function. (You could assume any pdf, as long as its form is known.)
   e. Then, the probabilities of the various outcomes are given by:
      P[y_i = 0 | x_i] = Φ(−x_i β)
      P[y_i = 1 | x_i] = Φ(μ_1 − x_i β) − Φ(−x_i β)
      P[y_i = 2 | x_i] = Φ(μ_2 − x_i β) − Φ(μ_1 − x_i β)
      ...
      P[y_i = J | x_i] = 1 − Φ(μ_{J−1} − x_i β).
   f. Given that these are the probabilities, the marginal effects of x are given by differentiating these probabilities with respect to x.
   g. These probabilities can be crammed into a log-likelihood function, which can be maximised with respect to the parameters β, μ_1, ..., μ_{J−1}.

10. Multinomial choice with unordered choices.
   a. Probabilities of the various choices are modelled directly. Here, we assume that y_i could take on the values j = 0, ..., J, so there are going to be J free index functions which generate the probabilities of being observed in the J+1 possible categories. Logit distributions are easy to work with here.
   b. P[y_i = j | x_i] = exp(x_i β_j) / (1 + Σ_{k=1}^{J} exp(x_i β_k)), with β_0 normalised to zero. If J = 1, then this reverts to the binary logit. Given this structure, the probability that y_i and x_i are observed together is
      P[y_i | x_i] = ∏_{j=0}^{J} P[y_i = j | x_i]^{d_ij},
   c. with d_ij a dummy equal to 1 if y_i is j.
   d. The log-likelihood is given by
      ln L = Σ_{i=1}^{N} Σ_{j=0}^{J} d_ij ln P[y_i = j | x_i].
   e. And the first-order conditions are
   f. ∂ln L/∂β_j = Σ_{i=1}^{N} (d_ij − P[y_i = j | x_i]) x_i = 0, j = 1, ..., J. This simple derivative is a consequence of the use of the logit function.
   g. Chapters 21 and 22 in Greene are very good on limited dependent variables.
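As a closing illustration, the binary logit pieces above can be pulled together in a few lines: simulate latent-variable data, solve the moment condition Σ_i (y_i − Λ(x_i β)) x_i = 0 by Newton's method, and compute the average marginal effect β P (1 − P). This is a sketch under illustrative parameter values, not any particular textbook implementation:

```python
# Binary logit by maximum likelihood: Newton-Raphson on the score
# X'(y - p), with Hessian -X' diag(p(1-p)) X (exact for the logit).
import numpy as np

def Lam(v):                      # logistic cdf
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(2)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.random(n) < Lam(X @ beta_true)).astype(float)

b = np.zeros(2)
for _ in range(25):              # Newton iterations
    p = Lam(X @ b)
    score = X.T @ (y - p)                        # the moment condition
    hess = -(X * (p * (1 - p))[:, None]).T @ X   # Hessian of ln L
    b = b - np.linalg.solve(hess, score)

print(b)                         # close to beta_true

# Average marginal effect of the regressor: mean of b1 * P_i (1 - P_i)
p_hat = Lam(X @ b)
ame = b[1] * np.mean(p_hat * (1 - p_hat))
print(ame)
```

The iteration converges quickly because the logit log-likelihood is globally concave; the same loop with Φ and φ in place of Λ and Λ(1 − Λ) gives a probit estimator.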