Regression with limited dependent variables
Professor Bernard Fingleton
Regression with limited dependent variables

- Whether a mortgage application is accepted or denied
- Decision to go on to higher education
- Whether or not foreign aid is given to a country
- Whether a job application is successful
- Whether or not a person is unemployed
- Whether a company expands or contracts
Regression with limited dependent variables

In each case the outcome is binary. We can treat the variable as a success (Y = 1) or failure (Y = 0). We are interested in explaining the variation across people, countries or companies etc. in the probability of success, p = Prob(Y = 1). Naturally we think of a regression model in which Y is the dependent variable.
Regression with limited dependent variables

But the dependent variable Y, and hence the errors, are not what is assumed in normal regression:
- a continuous range
- constant variance (homoscedastic)
With individual data, the Y values are 1 (success) and 0 (failure): the observed data for N individuals are discrete values 0,1,1,0,1,0, etc., not a continuum. The variance is not constant (heteroscedastic).
Bernoulli distribution

probability of a success (Y = 1) is p
probability of a failure (Y = 0) is 1 - p = q
E(Y) = p
var(Y) = p(1 - p)

As p_i varies for i = 1,...,N individuals, both mean and variance vary:
E(Y_i) = p_i
var(Y_i) = p_i(1 - p_i)

Regression explains variation in E(Y_i) = p_i as a function of some explanatory variables:
E(Y_i) = f(X_1i,...,X_Ki)
but the variance is not constant as E(Y_i) changes, whereas in OLS regression we assume only the mean varies as X varies and the variance remains constant.
The linear probability model

This is a linear regression model:
Y_i = b_0 + b_1 X_1i + ... + b_K X_Ki + e_i
Pr(Y = 1 | X_1,...,X_K) = b_0 + b_1 X_1 + ... + b_K X_K
b_1 is the change in the probability that Y = 1 associated with a unit change in X_1, holding constant X_2,...,X_K, etc. This can be estimated by OLS, but note that since var(Y_i) is not constant, we need to allow for heteroscedasticity in t and F tests and confidence intervals.
The linear probability model

1996 Presidential Election, 3,110 US Counties
binary Y with 0 = Dole, 1 = Clinton
The linear probability model

Ordinary Least-squares Estimates
R-squared     = 0.0013
Rbar-squared  = 0.0010
sigma^2       = 0.2494
Durbin-Watson = 0.0034
Nobs, Nvars   = 3110, 2
***************************************************************
Variable        Coefficient   t-statistic   t-probability
Constant        0.478917      21.962788     0.000000
prop-gradprof   0.751897      2.046930      0.040749

prop-gradprof = pop with grad/professional degrees as a proportion of educated (at least high school education)
The linear probability model

[Figure: scatter of the binary outcome (1 = Clinton, 0 = Dole) against prop-gradprof = pop with grad/professional degrees as a proportion of educated (at least high school education).]
The linear probability model

[Figure: Dole_Clinton_1 versus prop_gradprof, with least-squares fit Y = 0.479 + 0.752X.]
The linear probability model

Limitations
The predicted probability exceeds 1 as X becomes large:
Yhat = 0.479 + 0.752X
if X > 0.693 then Yhat > 1
X = 0 gives Yhat = 0.479
if X < 0 were possible, then X < -0.637 would give Yhat < 0
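The out-of-range predictions are easy to verify numerically. A minimal sketch using the OLS estimates from the slide above (the evaluation points 0.70 and -0.64 are illustrative, not from the data):

```python
b0, b1 = 0.479, 0.752  # OLS estimates from the election example

def yhat(x):
    """Fitted value from the linear probability model."""
    return b0 + b1 * x

# threshold beyond which the fitted "probability" exceeds 1
x_upper = (1 - b0) / b1
print(x_upper)        # about 0.693
print(yhat(0.70))     # above 1: not a valid probability
print(yhat(-0.64))    # below 0: also invalid
```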
Solving the problem

We adopt a nonlinear specification that forces the dependent proportion to always lie within the range 0 to 1. We use cumulative probability functions (cdfs) because they produce probabilities in the 0-1 range.
Probit: uses the standard normal cdf
Logit: uses the logistic cdf
Probit regression

Phi(z) = area to the left of z in the standard normal distribution
Phi(-1.96) = 0.025
Phi(0) = 0.5
Phi(1) = 0.84
Phi(3.0) = 0.999
We can put in any value of z from -infinity to +infinity, and the outcome is 0 < p = Phi(z) < 1.
Probit regression

Pr(Y = 1 | X_1, X_2) = Phi(b_0 + b_1 X_1 + b_2 X_2), e.g.
b_0 = -1.6, b_1 = 2, b_2 = 0.5
X_1 = 0.4, X_2 = 1
z = b_0 + b_1 X_1 + b_2 X_2 = -1.6 + 2 x 0.4 + 0.5 x 1 = -0.3
Pr(Y = 1 | X_1, X_2) = Phi(-0.3) = 0.38
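The quoted values of Phi can be checked without statistical tables, since the standard normal cdf can be written in terms of the error function. A sketch:

```python
import math

def phi(z):
    """Standard normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(-1.96), 3))  # 0.025
print(phi(0.0))              # 0.5
print(round(phi(1.0), 2))    # 0.84

# the slide's worked example
z = -1.6 + 2 * 0.4 + 0.5 * 1
print(z)                     # -0.3
print(round(phi(z), 2))      # 0.38
```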
Probit regression

Model 9: Probit estimates using the 3110 observations 1-3110
Dependent variable: Dole_Clinton_1

VARIABLE        COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const           -2.11372      0.215033   -9.830
prop_gradprof    9.35232      1.32143     7.077   3.72660
log_urban       15.9631       5.64690     2.827   6.36078
prop_highs       3.07148      0.310815    9.882   1.22389
Probit regression

[Figure: actual and fitted Dole_Clinton_1 versus prop_gradprof.]
Probit regression

Interpretation
The slope of the line is not constant. As the proportion of graduate professionals goes from 0.1 to 0.3, the probability of Y = 1 (Clinton) goes from 0.5 to 0.9. As the proportion goes from 0.3 to 0.5, the probability of Y = 1 (Clinton) goes from 0.9 to 0.99.
Probit regression

Estimation
The method is maximum likelihood (ML). The likelihood is the joint probability given specific parameter values. Maximum likelihood estimates are those parameter values that maximise the probability of drawing the data that are actually observed.
Probit regression

Pr(Y_i = 1) conditional on X_1i,...,X_Ki is p_i = Phi(b_0 + b_1 X_1i + ... + b_K X_Ki)
Pr(Y_i = 0) conditional on X_1i,...,X_Ki is 1 - p_i = 1 - Phi(b_0 + b_1 X_1i + ... + b_K X_Ki)
y_i is the value of Y observed for individual i
for the i'th individual, Pr(Y_i = y_i) is p_i^(y_i) (1 - p_i)^(1 - y_i)
for i = 1,...,n, the joint likelihood is
L = Prod_i Pr(Y_i = y_i) = Prod_i p_i^(y_i) (1 - p_i)^(1 - y_i)
  = Prod_i [Phi(b_0 + b_1 X_1i + ... + b_K X_Ki)]^(y_i) [1 - Phi(b_0 + b_1 X_1i + ... + b_K X_Ki)]^(1 - y_i)
the log likelihood is
ln L = Sum_i { y_i ln Phi(b_0 + b_1 X_1i + ... + b_K X_Ki) + (1 - y_i) ln[1 - Phi(b_0 + b_1 X_1i + ... + b_K X_Ki)] }
we obtain the values of b_0, b_1,...,b_K that give the maximum value of ln L
Hypothetical binary data

success   X
1         10
0          2
0          3
1          9
1          5
1          8
0          4
0          5
1         11
1         12
0          3
0          4
1         12
0          8
1         14
0          3
1         11
1          9
0          4
0          6
1          7
1          9
0          3
0          1
Iteration 0: log likelihood = -13.0598448595
Iteration 1: log likelihood = -6.50161713610
Iteration 2: log likelihood = -5.50794602456
Iteration 3: log likelihood = -5.29067548323
Iteration 4: log likelihood = -5.26889753239
Iteration 5: log likelihood = -5.26836878709
Iteration 6: log likelihood = -5.26836576121
Iteration 7: log likelihood = -5.26836575008
Convergence achieved after 8 iterations

Model 3: Probit estimates using the 24 observations 1-24
Dependent variable: success

VARIABLE   COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const      -4.00438      1.45771    -2.747
X           0.612845     0.218037    2.811   0.241462
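The probit fit can be reproduced by maximising the log-likelihood directly. A sketch using scipy's general-purpose optimiser on the 24 hypothetical observations (this is not the package that produced the output above, but it recovers essentially the same estimates; norm.logcdf is used for numerical stability, and 1 - Phi(z) = Phi(-z) handles the failures):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# the 24 hypothetical observations from the data slide
y = np.array([1,0,0,1,1,1,0,0,1,1,0,0,1,0,1,0,1,1,0,0,1,1,0,0])
x = np.array([10,2,3,9,5,8,4,5,11,12,3,4,12,8,14,3,11,9,4,6,7,9,3,1])

def neg_loglik(b):
    """Negative probit log-likelihood; logcdf(-z) = log(1 - Phi(z))."""
    z = b[0] + b[1] * x
    return -np.sum(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)       # roughly [-4.004, 0.613], matching the slide
print(-res.fun)    # log-likelihood about -5.268, matching the iteration log

# fitted probability at X = 6
print(norm.cdf(res.x[0] + res.x[1] * 6))  # about 0.37
```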
Model 3: Probit estimates using the 24 observations 1-24
Dependent variable: success

VARIABLE   COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const      -4.00438      1.45771    -2.747
X           0.612845     0.218037    2.811   0.241462

If X = 6, probit = Phi(-4.00438 + 0.612845 x 6) = Phi(-0.32731) = 0.37
[Figure: actual and fitted success versus X.]
Logit regression

Based on the logistic cdf. This looks very much like the cdf of the normal distribution, so the results are similar. The use of the logit is often a matter of convenience: it was easier to calculate before the advent of fast computers.
Logistic function

[Figure: the logistic function f(z) plotted against z, an s-shaped curve rising from 0 towards 1.]
Prob success = e^z / (1 + e^z) = 1 / (1 + e^-z)
Prob fail = 1 - e^z / (1 + e^z) = 1 / (1 + e^z)
odds ratio = Prob success / Prob fail = [e^z / (1 + e^z)] / [1 / (1 + e^z)] = e^z
log odds ratio = z
z = b_0 + b_1 X_1 + b_2 X_2 + ... + b_K X_K
Logistic function

p = b_0 + b_1 X plotted against X is a straight line, with p < 0 and p > 1 possible.
p = exp(b_0 + b_1 X) / [1 + exp(b_0 + b_1 X)] plotted against X gives the s-shaped logistic curve, so p > 1 and p < 0 are impossible.
Equivalently,
ln[p / (1 - p)] = b_0 + b_1 X
This is the equation of a straight line, so ln[p / (1 - p)] plotted against X is linear.
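A quick sketch illustrating both properties: the logistic transform always yields p strictly between 0 and 1, and the log-odds transform recovers the linear predictor z exactly:

```python
import math

def logistic(z):
    """Logistic cdf: p = e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

for z in [-5, -1, 0, 1, 5]:
    p = logistic(z)
    assert 0 < p < 1          # always a valid probability
    log_odds = math.log(p / (1 - p))
    print(z, round(p, 3), round(log_odds, 6))  # log-odds equals z
```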
Estimation - logit

X is fixed data, so we choose b_0, b_1, and hence
p = exp(b_0 + b_1 X) / [1 + exp(b_0 + b_1 X)]
so that the likelihood is maximized.
Logit regression

z_i = b_0 + b_1 X_1i + ... + b_K X_Ki
Pr(Y_i = 1) conditional on X_1i,...,X_Ki is p_i = [1 + exp(-z_i)]^-1
Pr(Y_i = 0) conditional on X_1i,...,X_Ki is 1 - p_i = 1 - [1 + exp(-z_i)]^-1
y_i is the value of Y observed for individual i
for the i'th individual, Pr(Y_i = y_i) is p_i^(y_i) (1 - p_i)^(1 - y_i)
for i = 1,...,n, the joint likelihood is
L = Prod_i Pr(Y_i = y_i) = Prod_i p_i^(y_i) (1 - p_i)^(1 - y_i)
  = Prod_i {[1 + exp(-z_i)]^-1}^(y_i) {1 - [1 + exp(-z_i)]^-1}^(1 - y_i)
the log likelihood is
ln L = Sum_i ( y_i ln{[1 + exp(-z_i)]^-1} + (1 - y_i) ln{1 - [1 + exp(-z_i)]^-1} )
we obtain the values of b_0, b_1,...,b_K that give the maximum value of ln L
Estimation

Maximum likelihood estimates of the parameters are obtained using an iterative algorithm.
Estimation

Iteration 0: log likelihood = -13.8269570846
Iteration 1: log likelihood = -6.97202524093
Iteration 2: log likelihood = -5.69432863365
Iteration 3: log likelihood = -5.43182376684
Iteration 4: log likelihood = -5.41189406278
Iteration 5: log likelihood = -5.41172246346
Iteration 6: log likelihood = -5.41172244817
Convergence achieved after 7 iterations

Model 1: Logit estimates using the 24 observations 1-24
Dependent variable: success

VARIABLE   COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const      -6.87842      2.74193    -2.509
X           1.05217      0.408089    2.578   0.258390
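As with the probit, the logit estimates can be reproduced by direct maximisation of the log-likelihood on the same 24 observations. A sketch (again with scipy rather than the package that produced the output above; np.logaddexp(0, -z) = log(1 + e^-z) keeps the computation numerically stable):

```python
import numpy as np
from scipy.optimize import minimize

# the 24 hypothetical observations from the data slide
y = np.array([1,0,0,1,1,1,0,0,1,1,0,0,1,0,1,0,1,1,0,0,1,1,0,0])
x = np.array([10,2,3,9,5,8,4,5,11,12,3,4,12,8,14,3,11,9,4,6,7,9,3,1])

def neg_loglik(b):
    """Negative logit log-likelihood.
    ln p = -log(1+e^-z);  ln(1-p) = -z - log(1+e^-z),
    so -ln L = sum[ log(1+e^-z) + (1-y) z ]."""
    z = b[0] + b[1] * x
    return np.sum(np.logaddexp(0, -z) + (1 - y) * z)

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)       # roughly [-6.878, 1.052], matching the slide
print(-res.fun)    # log-likelihood about -5.412

# fitted probability at X = 6
b0, b1 = res.x
print(1 / (1 + np.exp(-(b0 + b1 * 6))))  # about 0.362
```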
VARIABLE   COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const      -6.87842      2.74193    -2.509
X           1.05217      0.408089    2.578   0.258390

If X = 6, logit = -6.87842 + 1.05217 x 6 = -0.5654 = ln[p/(1-p)]
p = exp(-0.5654) / [1 + exp(-0.5654)] = 1 / [1 + exp(0.5654)] = 0.362299
[Figure: actual and fitted success versus X.]
Modelling proportions and percentages
The linear probability model

Consider the following individual data for Y and X:
Y = 0,0,1,0,0,1,0,1,1,1
X = 1,1,2,2,3,3,4,4,5,5
constant = 1,1,1,1,1,1,1,1,1,1
Yhat = -0.1 + 0.2X is the OLS estimate.
Notice that the X values for individuals 1 and 2 are identical, likewise 3 and 4, and so on. If we group the identical data, we have a set of proportions:
p = 0/2, 1/2, 1/2, 1/2, 2/2 = 0, 0.5, 0.5, 0.5, 1
X = 1,2,3,4,5
phat = -0.1 + 0.2X is the OLS estimate.
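The equivalence of the individual-level and grouped OLS fits can be checked directly with numpy's least-squares solver:

```python
import numpy as np

# individual data: binary Y, duplicated X values
Y = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=float)

A = np.column_stack([np.ones_like(X), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(b)   # [-0.1, 0.2]: Yhat = -0.1 + 0.2 X

# grouping individuals with identical X gives proportions
p = np.array([0/2, 1/2, 1/2, 1/2, 2/2])
Xg = np.array([1.0, 2, 3, 4, 5])
Ag = np.column_stack([np.ones_like(Xg), Xg])
bg, *_ = np.linalg.lstsq(Ag, p, rcond=None)
print(bg)  # same OLS estimate: phat = -0.1 + 0.2 X
```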
The linear probability model

When n individuals are identical in terms of the variables explaining their success/failure, we can group them together and explain the proportion of successes in n trials. This data format is often important with, say, developing-country data, where we know the proportion or % of the population in each country with some attribute, such as the % of the population with no schooling, and we wish to explain the cross-country variation in the %s by variables such as GDP per capita or investment in education, etc.
Regression with limited dependent variables

With individual data, the values are 1 (success) and 0 (failure) and p is the probability that Y = 1; the observed data for N individuals are discrete values 0,1,1,0,1,0, etc., not a continuum. With grouped individuals, the proportion p is equal to the number of successes Y in n trials (individuals). So the range of Y is from 0 to n: the possible Y values are discrete, 0,1,2,...,n, and confined to the range 0 to n. The proportions p are confined to the range 0 to 1.
Modelling proportions

Proportion (Y/n)   Continuous response
5/10 = 0.5         11.32
1/3 = 0.333        17.88
6/9 = 0.666         3.32
1/10 = 0.1         11.76
7/20 = 0.35         1.11
1/2 = 0.5           0.03
Binomial distribution

Y_i ~ B(n_i, p_i)
the number of successes in n_i trials, each independent, for i = 1,...,N
p_i is the probability of a success in each trial
E(Y_i) = n_i p_i
var(Y_i) = n_i p_i (1 - p_i)
the variance is not constant, but depends on n_i and p_i
Data: survey of start-up firms

Region              Output growth q   starts (n)   expanded (Y)   propn = Y/n
Cleveland, Durham   0.169211          13            8             0.61538
Cumbria             0.471863          34           34             1.00000
Northumberland      0.044343          10            0             0.00000
Humberside          0.274589          15            9             0.60000
N Yorks             0.277872          16           14             0.87500
The linear probability model

[Figure: regression plot of propn = e/s against gvagr, with fitted line Y = 0.296428 + 1.41711X, R-Sq = 48.9%.]
OLS regression with proportions

y = 0.296 + 1.42x

Predictor   Coef      StDev     T      P
Constant    0.29643   0.05366   5.52   0.000
x           1.4171    0.2559    5.54   0.000

S = 0.2576   R-Sq = 48.9%   R-Sq(adj) = 47.3%

Fitted values from y = 0.296 + 1.42x:
0.48310, -0.00576, 1.07892, 0.58634, 0.25346
Note the negative proportion and the proportion > 1.
Grouped data: survey of start-up firms

Region              Output growth q   starts (n)   expanded (Y)   propn = Y/n
Cleveland, Durham   0.169211          13            8             0.61538
Cumbria             0.471863          34           34             1.00000
Northumberland      0.044343          10            0             0.00000
Humberside          0.274589          15            9             0.60000
N Yorks             0.277872          16           14             0.87500
Proportions and counts

ln[p_i / (1 - p_i)] = b_0 + b_1 X_i
ln[phat_i / (1 - phat_i)] = bhat_0 + bhat_1 X_i
Yhat_i = E(Y_i) = n_i phat_i
n_i = size of sample i
Yhat_i = estimated expected number of successes in sample i
Binomial distribution

For region i:
Prob(Y = y) = [n! / (y!(n - y)!)] p^y (1 - p)^(n - y)
n = number of individuals
p = probability of a success
Y = number of successes in n individuals
Binomial distribution

For region i:
Prob(Y = y) = [n! / (y!(n - y)!)] p^y (1 - p)^(n - y)
Example: p = 0.5, n = 10
Prob(Y = 5) = [10! / (5!(10 - 5)!)] 0.5^5 (0.5)^5 = 0.2461
E(Y) = np = 5
var(Y) = np(1 - p) = 2.5
[Figure: two simulated histograms. Y ~ B(10, 0.5): E(Y) = np = 5, var(Y) = np(1-p) = 2.5, symmetric around 5. Y ~ B(10, 0.9): E(Y) = np = 9, var(Y) = np(1-p) = 0.9, concentrated near 9-10.]
Maximum likelihood - proportions

Assume the data observed are Y_1 = y_1 = 5 successes from 10 trials and Y_2 = y_2 = 9 successes from 10 trials.
What is the likelihood of these data given p_1 = 0.5, p_2 = 0.9?
Prob(Y_1 = 5) = [n_1! / (y_1!(n_1 - y_1)!)] p_1^(y_1) (1 - p_1)^(n_1 - y_1) = [10! / (5!(10 - 5)!)] 0.5^5 (0.5)^5 = 0.2461
Prob(Y_2 = 9) = [n_2! / (y_2!(n_2 - y_2)!)] p_2^(y_2) (1 - p_2)^(n_2 - y_2) = [10! / (9!(10 - 9)!)] 0.9^9 (0.1)^1 = 0.3874
likelihood of observing y_1 = 5, y_2 = 9 given p_1 = 0.5, p_2 = 0.9 is 0.2461 x 0.3874 = 0.095
However, the likelihood of observing y_1 = 5, y_2 = 9 given p_1 = 0.1, p_2 = 0.8 is 0.0015 x 0.2684 = 0.0004
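The two likelihood evaluations can be reproduced from the binomial pmf. A sketch:

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability: C(n, y) p^y (1-p)^(n-y)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# likelihood of y1 = 5/10 and y2 = 9/10 under two candidate (p1, p2) pairs
L_good = binom_pmf(5, 10, 0.5) * binom_pmf(9, 10, 0.9)
L_bad  = binom_pmf(5, 10, 0.1) * binom_pmf(9, 10, 0.8)
print(round(L_good, 4))  # 0.0953
print(round(L_bad, 4))   # 0.0004
# (0.5, 0.9) makes the observed data far more likely than (0.1, 0.8)
```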
Inference
Likelihood ratio / deviance

Y = 2 ln(L_u / L_r) ~ chi-squared with (k2 - k1) degrees of freedom
L_u = likelihood of the unrestricted model with k1 df
L_r = likelihood of the restricted model with k2 df
k2 > k1
Restrictions are placed on k2 - k1 parameters; typically they are set to zero.
Deviance

H0: b_i = 0, i = 1,...,(k2 - k1)
Y = 2 ln(L_u / L_r) ~ chi-squared with (k2 - k1) degrees of freedom
E(Y) = k2 - k1
Iteration 7: log likelihood = -5.26836575008
Convergence achieved after 8 iterations  [= ln Lu]

Model 3: Probit estimates using the 24 observations 1-24
Dependent variable: success

VARIABLE   COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const      -4.00438      1.45771    -2.747
X           0.612845     0.218037    2.811   0.241462

Model 4: Probit estimates using the 24 observations 1-24
Dependent variable: success

VARIABLE   COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const      0.000000      0.255832   -0.000

Log-likelihood = -16.6355  [= ln Lr]

Comparison of Model 3 and Model 4:
Null hypothesis: the regression parameter is zero for the variable X
2[ln Lu - ln Lr] = 2[-5.268 + 16.636] = 22.73
Test statistic: Chi-square(1) = 22.7343, with p-value = 1.86014e-006
NB 2 ln(Lu/Lr) = 2 x (-5.2684 - (-16.6355)) = 22.7343
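The likelihood-ratio statistic and its p-value can be recomputed from the two log-likelihoods (chi2.sf gives the upper-tail probability of the chi-squared distribution):

```python
from scipy.stats import chi2

lu, lr = -5.26837, -16.6355  # log-likelihoods: unrestricted, restricted
stat = 2 * (lu - lr)
p_value = chi2.sf(stat, df=1)  # one restriction: the slope on X

print(round(stat, 3))  # 22.734
print(p_value)         # about 1.9e-06: reject H0 that the X coefficient is zero
```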
Iteration 3: log likelihood = -2099.98151495
Convergence achieved after 4 iterations

Model 2: Logit estimates using the 3110 observations 1-3110
Dependent variable: Dole_Clinton_1

VARIABLE        COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const           -3.41038      0.351470   -9.703
log_urban       25.4951       9.10570     2.800   6.36359
prop_highs       4.96073      0.508346    9.759   1.23820
prop_gradprof   15.1026       2.16268     6.983   3.76961

Model 3: Logit estimates using the 3110 observations 1-3110
Dependent variable: Dole_Clinton_1

VARIABLE     COEFFICIENT   STDERROR   T STAT   SLOPE (at mean)
const        -0.972329     0.184314   -5.275
prop_highs    2.09692      0.360723    5.813   0.523414

Log-likelihood = -2136.12

Comparison of Model 2 and Model 3:
Null hypothesis: the regression parameters are zero for the variables log_urban, prop_gradprof
Test statistic: Chi-square(2) = 72.2793, with p-value = 2.01722e-016
DATA LAYOUT FOR LOGISTIC REGRESSION

REGION             URBAN/SUBURBAN/RURAL   SE/NOT SE   OUTPUT GROWTH   Y    n
Hants, IoW         suburban               SE           0.062916        9   11
Kent               suburban               SE           0.035541        4   10
Avon               suburban               not SE       0.133422        4   14
Cornwall, Devon    rural                  not SE       0.141939        5   12
Dorset, Somerset   rural                  not SE       0.145993       12   16
S Yorks            urban                  not SE      -0.150591        0   11
W Yorks            urban                  not SE       0.152066        7   15
Logistic Regression Table

Predictor    Coef      StDev    Z       P
Constant     -1.2132   0.3629   -3.34   0.001
gvagr         9.716    1.251     7.77   0.000
URBAN/SUBURBAN/RURAL
suburban     -0.8464   0.2957   -2.86   0.004
urban        -1.3013   0.4760   -2.73   0.006
SOUTH-EAST/NOT SOUTH-EAST
South-East    2.4411   0.3534    6.91   0.000

Log-Likelihood = -210.068
Testing variables

Model                    Log-likelihood   degrees of freedom
Prob = f(q)              -247.1           32
Prob = f(q, URBAN, SE)   -210.068         29

2 x difference = 74.064
74.064 > 7.81, the critical value equal to the upper 5% point of the chi-squared distribution with 3 degrees of freedom. Thus introducing URBAN/SUBURBAN/RURAL and SE/not SE causes a significant improvement in fit.
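The same comparison in code, checking the statistic against the 5% critical value of chi-squared with 3 degrees of freedom (chi2.ppf gives the quantile):

```python
from scipy.stats import chi2

# log-likelihoods of the restricted and unrestricted models
diff = 2 * (-210.068 - (-247.1))   # likelihood-ratio statistic
crit = chi2.ppf(0.95, df=3)        # upper 5% point, 3 restrictions

print(round(diff, 3))  # 74.064
print(round(crit, 2))  # 7.81
print(diff > crit)     # True: significant improvement in fit
```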
Interpretation

When the transformation gives a linear equation linking the dependent variable and the independent variables, we can interpret it in the normal way. The regression coefficient is the change in the dependent variable per unit change in the independent variable, controlling for the effect of the other variables. For a dummy variable or factor with levels, the regression coefficient is the change in the dependent variable associated with a shift from the baseline level of the factor.
Interpretation

ln[Phat / (1 - Phat)] changes
by 9.716 for a unit change in gvagr
by 2.441 as we move from not-SE to SE counties
by -1.3013 as we move from RURAL to URBAN
by -0.8464 as we move from RURAL to SUBURBAN
Interpretation

The odds of an event = ratio of Prob(event) to Prob(not event). The odds ratio is the ratio of two odds. With the logit link function, the exponentials of the parameter estimates are odds ratios: the parameter estimates themselves are logit differences, i.e. log odds ratios.
Interpretation

For example, a coefficient of zero would indicate that moving from a non-SE to an SE location produces no change in the logit. Since exp(0) = 1, this would mean the (estimated) odds = Prob(expand)/Prob(not expand) do not change, i.e. the odds ratio = 1. In reality, since exp(2.441) = 11.49, the odds ratio is 11.49: the odds of SE firms expanding are 11.49 times the odds of non-SE firms expanding.
Interpretation

param.       est.      s.e.     t ratio   p-value   odds ratio   lower c.i.   upper c.i.
Constant     -1.2132   0.3629   -3.34     0.001
gvagr         9.716    1.251     7.77     0.000     16584.51     1428.30      1.93E+05
RURAL/SUBURBAN/URBAN
suburban     -0.8464   0.2957   -2.86     0.004     0.43         0.24         0.77
urban        -1.3013   0.4760   -2.73     0.006     0.27         0.11         0.69
SE/not SE
SE            2.4411   0.3534    6.91     0.000     11.49        5.75         22.96

Note that the odds ratio has a 95% confidence interval: since 2.4411 + 1.96 x 0.3534 = 3.1338 and 2.4411 - 1.96 x 0.3534 = 1.7484, and exp(3.1338) = 22.96, exp(1.7484) = 5.75, the 95% c.i. for the odds ratio is 5.75 to 22.96.
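The confidence-interval arithmetic for the SE odds ratio can be checked directly:

```python
import math

coef, se = 2.4411, 0.3534  # SE/not-SE estimate and its standard error

lo = coef - 1.96 * se      # 1.7484, lower limit on the logit scale
hi = coef + 1.96 * se      # 3.1338, upper limit on the logit scale

print(round(math.exp(coef), 2))       # 11.49, the odds ratio
print(math.exp(lo), math.exp(hi))     # about 5.75 and 22.96, the 95% c.i.
```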