Models with Discrete Dependent Variables

Based on J. Scott Long and Jeremy Freese (2006), Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition (College Station, TX: Stata Corporation)

Introduction

1.1 Course Objective

This class deals with regression models for categorical dependent variables (CDVs), i.e., variables that are binary, ordinal, nominal, or count.

Binary variables (二分类变量) have two categories indicating that an event has occurred or that some characteristic is present, e.g., Did a citizen vote in the last election? Do you consider yourself Chinese? Do you support independence?

Ordinal variables (有序多分类变量) have multiple categories that can be ranked, e.g., answers to survey questions that use "strongly agree, agree, disagree, and strongly disagree" or "often, occasionally, seldom, and never".

Nominal variables (无序多分类变量) have multiple categories that cannot be ranked, e.g., policy preferences (prefer unification, undetermined, prefer independence) or ethnic identities (Taiwanese, dual, Chinese).

Count variables (计次变量) indicate the number of times that some event has occurred, e.g., How many political demonstrations occurred last year? How many jobs did someone have?

1.2 Readings

J. Scott Long and Jeremy Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata. Second Edition. (College Station, TX: Stata Corporation)

J. Scott Long. 1997. Regression Models for Categorical and Limited Dependent Variables (Thousand Oaks, CA: Sage Publications)

J. Scott Long 著, 郑旭智等译. 2002. 类别与受限依变项的回归统计模式 (Chinese translation of Regression Models for Categorical and Limited Dependent Variables). 台北: 弘智文化事业有限公司

1.3 Before You Start: Installing SPost

To try the examples below, you must install the Stata post-estimation commands (SPost). While you are in Stata, type

    search spost, net

One of the links shown on the screen leads to the SPost package. After you click on that link, another link saying "click here to install" will appear. Click on it to install the SPost commands.

1.4 When dependent variables are categorical, the linear regression model (线性回归) is inappropriate because it would lead to biased, inefficient, and nonsensical answers.

Examples: Binary Dependent Variables

Did a citizen vote in the last election? Do you consider yourself Chinese?

Consider a variable y where:

    y value    probability
    0          1/4
    1          3/4

Then:

$$E(y) = [0 \times \Pr(y=0)] + [1 \times \Pr(y=1)] = [0 \times 1/4] + [1 \times 3/4] = 3/4$$

More generally, let y be a binary random variable with outcomes 0 and 1. Then:

$$E(y) = [0 \times \Pr(y=0)] + [1 \times \Pr(y=1)] = \Pr(y=1)$$

i.e., the unconditional expected value of a binary variable is the probability that y = 1 in the population. Thus, the conditional expectation is:

$$E(y \mid X) = [0 \times \Pr(y=0 \mid X)] + [1 \times \Pr(y=1 \mid X)] = \Pr(y=1 \mid X)$$

For example, for a single independent variable X (income):

$$y = \alpha + \beta X + \varepsilon, \qquad y = \begin{cases} 1 & \text{if a person voted} \\ 0 & \text{if she did not vote} \end{cases}$$

The expected outcome is:

$$E(y \mid X) = \Pr(y=1 \mid X) = \alpha + \beta X$$
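The expected-value calculation above can be checked with a few lines of code. This is a minimal Python sketch (not part of the original Stata materials); the helper name `expected_value` is hypothetical, and the probabilities are the ones given in the example.

```python
from fractions import Fraction

def expected_value(dist):
    """Expected value of a discrete variable given {outcome: probability}."""
    return sum(outcome * prob for outcome, prob in dist.items())

# Distribution from the example: Pr(y=0) = 1/4, Pr(y=1) = 3/4
y_dist = {0: Fraction(1, 4), 1: Fraction(3, 4)}

e_y = expected_value(y_dist)
print(e_y)                 # 3/4
print(e_y == y_dist[1])    # True: E(y) equals Pr(y=1) for a 0/1 variable
```

Exact fractions are used so that the result matches the hand calculation without rounding.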

The model is linear in the probability, hence the name linear probability model (LPM, 线性概率模式).

Estimation with Stata (a do-file):

    * include the version number
    version 11
    * if a log file is open, close it
    capture log close
    * don't pause when output scrolls off the page
    set more off
    * log results to file brm_binlfp2
    log using brm_binlfp2, replace
    * open data file binlfp2
    use binlfp2, clear
    * describe provides information about the dataset
    * (menu: Data / Describe data / Describe data in memory)
    describe
    * summarize provides summary statistics
    * (menu: Statistics / Summaries, tables, and tests / Summary and descriptive statistics / Summary statistics)
    summarize
    * regression with OLS
    * (menu: Statistics / Linear models and related / Linear regression)
    regress lfp k5 k618 age wc hc lwg inc

Output page

    . * open data file binlfp2
    . use binlfp2, clear
    (Data from 1976 PSID-T Mroz)

    . * describe provides information about the dataset
    . describe

    Contains data from binlfp2.dta
      obs:           753                          Data from 1976 PSID-T Mroz
     vars:             8                          30 Apr 2001 16:17
     size:        13,554 (99.9% of memory free)   (_dta has notes)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    lfp             byte   %9.0g       lfplbl     Paid Labor Force: 1=yes 0=no
    k5              byte   %9.0g                  # kids < 6
    k618            byte   %9.0g                  # kids 6-18
    age             byte   %9.0g                  Wife's age in years
    wc              byte   %9.0g       collbl     Wife College: 1=yes 0=no
    hc              byte   %9.0g       collbl     Husband College: 1=yes 0=no
    lwg             float  %9.0g                  Log of wife's estimated wages
    inc             float  %9.0g                  Family income excluding wife's
    -------------------------------------------------------------------------------
    Sorted by:  lfp

    . * summarize provides summary statistics
    . summarize

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             lfp |       753    .5683931    .4956295          0          1
              k5 |       753    .2377158     .523959          0          3
            k618 |       753    1.353254    1.319874          0          8
             age |       753    42.53785    8.072574         30         60
              wc |       753    .2815405    .4500494          0          1
    -------------+--------------------------------------------------------
              hc |       753    .3917663    .4884694          0          1
             lwg |       753    1.097115    .5875564  -2.054124   3.218876
             inc |       753    20.12897     11.6348  -.0290001         96

    . regress lfp k5 k618 age wc hc lwg inc

          Source |       SS       df       MS              Number of obs =     753
    -------------+------------------------------           F(  7,   745) =   18.83
           Model |  27.7657494     7  3.96653564           Prob > F      =  0.0000
        Residual |  156.962006   745  .210687257           R-squared     =  0.1503
    -------------+------------------------------           Adj R-squared =  0.1423
           Total |  184.727756   752  .245648611           Root MSE      =  .45901

    ------------------------------------------------------------------------------
             lfp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              k5 |   -.294836   .0359027    -8.21   0.000    -.3653185   -.2243534
            k618 |   -.011215   .0139627    -0.80   0.422     -.038626     .016196
             age |  -.0127411   .0025377    -5.02   0.000     -.017723   -.0077591
              wc |    .163679   .0458284     3.57   0.000     .0737109    .2536471
              hc |    .018951    .042533     0.45   0.656    -.0645477    .1024498
             lwg |   .1227402   .0301915     4.07   0.000     .0634697    .1820107
             inc |  -.0067603   .0015708    -4.30   0.000     -.009844   -.0036767
           _cons |   1.143548   .1270527     9.00   0.000      .894124    1.392972
    ------------------------------------------------------------------------------

The interpretation is straightforward: e.g., for every additional child under six, the predicted probability of a woman being employed decreases by .30, holding other variables constant.

Structural defects of the LPM:

Functional form: The effect of a variable is the same regardless of the values of the other variables. It is often substantively more reasonable that the effect of an independent variable has diminishing returns as the predicted probability approaches 0 or 1.

Nonsensical predictions (概率上下限的问题): A 35-year-old woman with four young children, who did not attend college and whose husband did not either, and who is average on the other variables, has a predicted probability of being employed of -.498. If a woman has four young children compared to no young children, her predicted probability of employment decreases by 1.18 (4 × (-.295)). Does this make substantive sense?

Heteroscedasticity: Because y is binary with $p = E(y \mid X) = X\beta$, its conditional variance depends on X:

$$\mathrm{Var}(y \mid X) = p(1-p) = X\beta\,(1 - X\beta)$$

    E(y|X) = p    Var(y|X) = p(1-p)
    0.1           0.09
    0.2           0.16
    0.3           0.21
    0.4           0.24
    0.5           0.25
    0.6           0.24
    0.7           0.21
    0.8           0.16
    0.9           0.09
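The variance table above follows directly from the formula Var(y|X) = p(1-p). A minimal Python sketch (not part of the original Stata materials; the function name `bernoulli_variance` is hypothetical) reproduces it and shows that the variance peaks at p = 0.5, which is why the LPM errors cannot have constant variance.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable whose probability of 1 is p."""
    return p * (1 - p)

# Probabilities 0.1, 0.2, ..., 0.9, as in the table
probs = [k / 10 for k in range(1, 10)]
table = [(p, round(bernoulli_variance(p), 2)) for p in probs]

for p, var in table:
    print(f"{p:.1f}  {var:.2f}")

# The variance is largest where p = 0.5
p_at_max = max(table, key=lambda row: row[1])[0]
print(p_at_max)
```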

    * predicted probability
    prvalue, x(age=35 k5=4 wc=0 hc=0) rest(mean)

Output page

    . prvalue, x(age=35 k5=4 wc=0 hc=0) rest(mean)

    regress: Predictions for lfp

      Predicted value of y:  -.4983298
      95% ci: (-.7566589,-.2400007)

                k5       k618        age         wc         hc        lwg        inc
    x=           4  1.3532537         35          0          0  1.0971148  20.128965

Models to Be Considered

    Type of variable    Models
    Binary              Logit (二分胜算对数模型) and Probit (二分概率模型)
    Ordinal             Ordered Logit (有序胜算对数模型) and Ordered Probit (有序概率单元模型)
    Nominal             Multinomial Logit (多项胜算对数模型)
    Count               Poisson (泊松模型) and Negative Binomial (负二项模型)
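Returning to the prvalue result for the LPM: the predicted value is just the fitted linear combination of the coefficients and the chosen values of the regressors. The following Python sketch (not part of the original Stata materials) reproduces it by hand, using the coefficients from the regress output and the x values listed by prvalue. The dictionary and function names are hypothetical.

```python
# OLS coefficients copied from the regress output
coef = {
    "k5": -0.294836, "k618": -0.011215, "age": -0.0127411,
    "wc": 0.163679, "hc": 0.018951, "lwg": 0.1227402,
    "inc": -0.0067603, "_cons": 1.143548,
}

# Values used by prvalue: age=35, k5=4, wc=0, hc=0, the rest at their means
x = {
    "k5": 4, "k618": 1.3532537, "age": 35,
    "wc": 0, "hc": 0, "lwg": 1.0971148, "inc": 20.128965,
}

def lpm_predict(coef, x):
    """Linear prediction: constant plus sum of coefficient * value."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())

p_hat = lpm_predict(coef, x)
print(round(p_hat, 3))      # -0.498, outside the [0, 1] range of a probability

# Change in predicted probability from 0 to 4 young children
effect_four_kids = 4 * coef["k5"]
print(round(effect_four_kids, 2))   # -1.18
```

Small differences in the last digits relative to the Stata output come from the rounded coefficients printed in the table.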

1.5 Maximum Likelihood Estimation (最大或然估计)

Assume that
s is the number of men in the sample,
N is the sample size,
π is the population probability of being male.

The probability of having s men in a sample of size N, with π being the population probability of being male, is:

$$\Pr(s \mid \pi, N) = \frac{N!}{s!\,(N-s)!}\,\pi^{s}(1-\pi)^{N-s}$$

E.g., the probability of having 3 men in a sample of 10 given π = 0.5:

$$\Pr(s=3 \mid \pi=.5, N=10) = \frac{10!}{3!\,(10-3)!}\,(.5)^{3}(1-.5)^{7} = 0.117$$

This is a typical problem in probability: given the values of the parameters π and N, what is the probability of a particular outcome s?

In statistics, the question is reversed: given the sample information s and N, what is π for the population?

The ML estimate is that value of the parameter that makes the observed data most likely.

The Likelihood Function (或然函数)

When N and π are held constant and s varies, we have a probability function. When π varies and N and s are held constant, we refer to it as a likelihood function: how likely is π given s and N?

$$L(\pi \mid s=3, N=10) = \frac{10!}{3!\,(10-3)!}\,\pi^{3}(1-\pi)^{7}$$

For this example, the likelihood equation simply follows the binomial formula.
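The binomial calculation above can be verified numerically. This is a minimal Python sketch (not part of the original Stata materials); `binom_prob` is a hypothetical helper name, and `math.comb` supplies the binomial coefficient N!/(s!(N-s)!).

```python
from math import comb

def binom_prob(s, n, pi):
    """Probability of s successes in n trials with success probability pi."""
    return comb(n, s) * pi**s * (1 - pi) ** (n - s)

# 3 men in a sample of 10 when the population probability is 0.5
p = binom_prob(3, 10, 0.5)
print(round(p, 3))   # 0.117 (exactly 120/1024 = 0.1171875)

# As a probability function in s (N and pi fixed), the values sum to 1
total = sum(binom_prob(s, 10, 0.5) for s in range(11))
print(round(total, 6))
```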

The ML estimate is the value of the parameter that maximizes the likelihood of observing the sample data.
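To make the idea of maximizing the likelihood concrete, the sketch below evaluates the binomial likelihood for s = 3 and N = 10 over a grid of candidate values of π and picks the largest. It is an illustration in Python rather than Stata; the grid search is a simple stand-in for the numerical optimizers that statistical software actually uses, and the names `likelihood`, `grid`, and `pi_hat` are hypothetical.

```python
from math import comb

def likelihood(pi, s=3, n=10):
    """Binomial likelihood of pi given s successes in n trials."""
    return comb(n, s) * pi**s * (1 - pi) ** (n - s)

# Candidate values 0.001, 0.002, ..., 0.999
grid = [k / 1000 for k in range(1, 1000)]

# The ML estimate is the candidate with the highest likelihood
pi_hat = max(grid, key=likelihood)
print(pi_hat)                          # 0.3

# The data are more likely under pi = 0.3 than under pi = 0.5
print(likelihood(0.3) > likelihood(0.5))   # True
```

For the binomial case the maximum can also be found analytically and equals the sample proportion s/N = 3/10, which agrees with the grid result.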