Section 6: Functional Form and Nonlinearities


This is a good place to remind ourselves of Assumption #0: that all observations follow the same model.

Levels of measurement and kinds of variables

There are (at least) three essential kinds of variables in econometrics.

- Interval (or cardinal) variables are the usual kind of variables that are continuous and where the numbers actually measure something.
  - Differences are meaningful: an income of $70,000 exceeds an income of $60,000 by the same amount as an income of $50,000 exceeds an income of $40,000.
  - With interval variables, we can talk meaningfully about continuous mathematical functions and partial derivatives.
- Ordinal variables take on several ordered values, but differences are not meaningful. It is a scale on which we know which direction different values are from one another, but not how far each adjacent pair is apart.
  - Example: highest academic action taken against a student. We know that dismissal is worse than denial of registration, denial is worse than probation, probation is worse than warning, and warning is worse than no action. We don't know how much worse each is than the adjacent action. Denial may be a bigger step from probation than probation is from warning.
  - Scale of actions: No action < Warning < Probation < Deny registration < Dismissal.
  - With ordinal variables, we can talk meaningfully about how a change in another variable would move this variable along its scale, but there is no easy single-number translation of that movement on the scale into the underlying ordinal levels of the variable, because we don't know a priori how far apart they are. (Econometrics allows us to estimate this if we use proper procedures.)
  - Are GPA and SAT scores interval or ordinal?
- Categorical variables are those that have several possible outcomes but where those outcomes cannot even be ranked ordinally.

  - For these variables, we just have to treat the outcomes as separate possibilities and cannot meaningfully put them on a scale at all.
  - Example: choosing to attend Reed vs. another school would be a two-outcome categorical variable. (Choosing Reed vs. L&C vs. another school would be a three-outcome variable.)
  - Sex, ethnicity, and many other variables are categorical.

Dummy (binary or indicator) independent variables

- Dummy variables are (yes, no) variables. We traditionally give the value 1 to yes and 0 to no.
- Dummy variables are used to model categorical variables as dependent or explanatory variables and ordinal variables as explanatory variables.
- When there are only two possible outcomes, a single dummy variable is sufficient (e.g., sex, ignoring the transgendered).
- When there are M > 2 outcomes, we need M − 1 dummy variables:
  - Region ∈ {Northeast, South, Midwest, West}.
  - Need dummies for Northeast, South, and Midwest. We don't need a dummy for West because we can tell those observations from the fact that they are zero for the other three.
  - If we include all four dummies, they will add up to 1, meaning perfect multicollinearity in a regression that also includes an intercept term.
- While dummy variables are often very useful in multiple regressions (with more than one regressor), they are limited in simple regression, where they have a special interpretation.
  - Suppose that D is a dummy variable for sex with D = 1 being male. Consider the model y_i = β1 + β2 D_i + e_i.
  - For females, D = 0 and the expected value of y is β1. For males, D = 1 and the expected value of y is β1 + β2. Thus, β2 is the difference between the expected y for males and females.
  - A test of the null hypothesis β2 = 0 would be a test of whether males and females have the same average y. This is equivalent to the t test for the equality of means and is a simple application of analysis of variance.
- When there are other variables present, a dummy variable shifts the intercept of the relationship between y and the other variables upward or downward depending on the value of the dummy. (The slope is assumed to be the same.)
- Consider the following example: ln w = β1 + β2 ED + e, used to estimate the effect of an additional year of education on the wage.
  - Wages may differ across sexes.
  - To allow the function to vary by a constant amount between males and females: ln w = β1 + β2 ED + β3 MALE + e.

  - For females: ln w = β1 + β2 ED.
  - For males: ln w = (β1 + β3) + β2 ED.
  - Thus, the intercept is different for males than for females, but the slope is the same, meaning that we have assumed that the effect of education on wages is the same for males and females.
- What if we include a FEMALE dummy as well? ln w = β1 + β2 ED + β3 MALE + β4 FEMALE + e.
  - This would add nothing of value to the regression because we already know the difference between males and females from β3.
  - MALE + FEMALE = 1 = x1 (the constant), so there is perfect multicollinearity: X′X is singular and the inverse does not exist. Statistical algorithms will either break down or (like Stata) delete one of the collinear variables.
- How would this work with the regional dummies?
  - To model differences in intercept, we include dummies for three of the four regions: ln w = β1 + β2 ED + β3 Northeast + β4 South + β5 Midwest + e.
  - Again, including all four results in collinearity.
  - The intercept term β1 is the intercept for the omitted category (West).
  - The intercept for the South is β1 + β4, so β4 measures whether the intercept is different for the South vs. the West.
  - Choose the omitted category to be the one against which you want to test the others; then the t test is easier.
  - To test whether region matters at all, do a joint F test of β3 = β4 = β5 = 0.
- Making and using dummy variables in Stata (a sketch with simulated data appears below). Suppose that we have a region variable and want to make dummies for the various regions:
  - Old-fashioned way: gen south = region == "South" (and similarly for the other regions).
  - Use the xi command to create a battery of dummies: xi i.region.
  - Easiest of all, use factor variables in your variable list: reg lwage i.region educ.
- If we have a dummy variable that is 1 for only a single observation (presumably in multiple regression), then the residual for that observation will be zero, and the coefficient of that dummy variable will have the value of the residual of that observation in an otherwise identical regression that excludes the dummy.
- Dummy dependent variables can be estimated by OLS using the linear probability model, but this is not the best way to estimate these models, so we won't go into any details.
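A minimal sketch with simulated data (all variable names here are hypothetical, not from the notes), showing that the coefficient on a single sex dummy reproduces the difference in group means, and that factor variables handle a categorical region variable without the dummy-variable trap:

    clear
    set obs 500
    set seed 12345
    gen male   = runiform() < 0.5
    gen region = ceil(4*runiform())      // 1 = Northeast, 2 = South, 3 = Midwest, 4 = West
    gen educ   = 12 + floor(9*runiform())
    gen lwage  = 1 + 0.08*educ + 0.15*male + 0.05*(region == 2) + rnormal(0, 0.4)

    regress lwage male                   // coefficient on male = difference of group means
    ttest lwage, by(male)                // the equivalent t test of equal means
    regress lwage i.region educ          // Stata omits one region category automatically
    testparm i.region                    // joint F test: does region matter at all?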

Interaction effects

- What if the effect of education differs for males and females? In this case, we need an interaction variable: ln w = β1 + β2 ED + β3 MALE + β4 ED×MALE + e.
  - Equation for females: ln w = β1 + β2 ED.
  - Equation for males: ln w = (β1 + β3) + (β2 + β4) ED.
  - Thus, β3 is the difference in the intercepts and β4 is the difference in the slopes.
- Running this regression is equivalent to running separate regressions for the male and female samples.
  - The female sample will have an intercept estimate of b1 and a slope estimate of b2.
  - The male sample will have an intercept estimate of b1 + b3 and a slope estimate of b2 + b4.
  - Running them together requires that the variance of the error term for males and females be the same.
  - But running them together allows testing the hypotheses β3 = 0 and β4 = 0, which are often of interest.
  - The joint test that both (all) coefficients are the same across the two subsamples is called a Chow test (a sketch appears at the end of this passage).
- Allowing the slope (education effect) to vary across regions would involve interaction terms between ED and each of the three regional dummies.
- We can also interact dummies with one another: ln w = β1 + β2 ED + β3 MALE + β4 South + β5 MALE×South + e.
  - For non-South females: ln w = β1 + β2 ED.
  - For South females: ln w = (β1 + β4) + β2 ED.
  - For non-South males: ln w = (β1 + β3) + β2 ED.
  - For South males: ln w = (β1 + β3 + β4 + β5) + β2 ED.
  - South effect for females = β4; South effect for males = β4 + β5.
  - Male effect for non-South = β3; male effect for South = β3 + β5.
  - Thus, β5 measures the difference in the male effect between South and non-South, or the difference in the South effect between males and females.
- We can also interact continuous variables: ln w = β1 + β2 ED + β3 AGE + β4 ED×AGE + e.
  - ∂ln w/∂ED = β2 + β4 AGE and ∂ln w/∂AGE = β3 + β4 ED.
  - Thus, β4 measures the effect of age on the value of an additional year of education, or equivalently the effect of education on the value of an additional year of age.
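A minimal sketch of the sex-education interaction model above (lwage, ed, and male are hypothetical stand-ins for whatever is in your data):

    gen male_ed = male*ed
    regress lwage ed male male_ed        // coefficients correspond to beta2, beta3, beta4 above
    test male male_ed                    // Chow-type test: same intercept and slope for both sexes?
    regress lwage ed if male == 0        // female sample: intercept b1, slope b2
    regress lwage ed if male == 1        // male sample: intercept b1 + b3, slope b2 + b4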

- Factor variables in Stata to create interactions:
  - reg lwage i.region#c.ed runs a regression of lwage on the interactions of the region dummies and the education variable (the c. means a continuous variable).
  - reg lwage i.region##c.ed includes both the interactions and the region dummies themselves and ed itself (this is usually what you want).
  - You can run a quadratic model in education as reg lwage c.ed##c.ed. The advantage of this form is that you can use margins and marginsplot to show and plot the relationship between lwage and ed, taking account of the nonlinearity.

Nonlinearity in variables vs. nonlinearity in parameters

- Solving for the OLS estimator required that we differentiate the LS or likelihood function with respect to the parameters.
- In a model that is linear in parameters, the LS objective function will be quadratic, so that the least-squares normal equations based on setting the first derivatives to zero are linear in the coefficient estimator. This means that we can use linear algebra to solve for the coefficient estimator.
- If the model is nonlinear in parameters, then the LS objective function will not be quadratic and the normal equations will not be linear in the parameters, so numerical search methods must be used for solution. This is called nonlinear LS and is much more computationally difficult and potentially problematic than the linear model. (Covered in S&W's appendix to Ch. 8.)
  - There are times when nonlinear LS is necessary, but we try to avoid it whenever possible.
- There are many models that are nonlinear in variables but linear in parameters. These models are easy to deal with: we can transform the variables and use linear OLS methods.
- If a model is nonlinear in its regressors (or has a nonlinear dependent variable), then the coefficient on the variable is no longer ∂y/∂x_j. Instead, we have to calculate ∂y/∂x_j as a function of the coefficients and the values of X. This will vary according to the functional form, so we'll talk about the partial effects for individual forms as we discuss them.
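A minimal sketch (with a hypothetical regressor x and outcome y) of a model that is nonlinear in the variable but linear in the parameters, so ordinary OLS still applies after transforming the variable; note that the partial effect is no longer a single coefficient:

    gen lnx = ln(x)
    regress y lnx                        // y = b1 + b2*ln(x) + e, linear in b1 and b2
    display _b[lnx]/10                   // dy/dx = b2/x, evaluated here at x = 10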

- The choice of functional form should be guided by theory, but theory rarely provides a unique specification. It is often necessary to try various functional forms to see which one seems to fit the best.
- Plotting actual and fitted values against each regressor can often be helpful in seeing nonlinearities.
- One way to explore nonlinearities (if you have a large enough sample) is to create a battery of dummy variables for different levels of a regressor. Looking at the pattern of coefficients for the different levels can tell you whether the relationship is approximately linear.
  - For example, we could examine math SAT score effects by looking at dummies for 500 ≤ SATM < 600, 600 ≤ SATM < 700, and SATM ≥ 700, leaving out the bottom category below 500.
  - This will give us four points on a general response function (with zero implicit for the omitted group, below 500). If the four points seem to lie on a straight line, then the linear specification is probably fine.
  - One may also see evidence of quadratic or cubic behavior, and can use more than four categories if you have enough data and want to be more discriminating.

Quadratic and higher-order polynomial models

- One easy way of incorporating curvature into a model is to introduce quadratic terms. (For the moment, we will assume only one regressor is nonlinear, so we'll ignore the others.)
  Y_i = β0 + β1 X_i + β2 X_i² + u_i
- Possible shapes for the relationship:
  - Upward sloping at an increasing rate (β1 > 0, β2 > 0).
  - Upward sloping at a decreasing rate, or downward sloping but flattening out (β1 > 0, β2 < 0).
    - Note that this curve always turns downward (upward) after a peak (trough) at X = −β1/(2β2), so it is critical to evaluate which part(s) of the curve the sample lies in. (Are most/all of the X values of interest less than or greater than −β1/(2β2)?)
    - This non-monotonicity may be good or bad depending on theory. If you want a universally monotonic but diminishing effect, using ln x may be a good alternative specification.
  - Downward sloping and getting steeper as X increases (β1 < 0, β2 < 0).
- Always include a graph of the response function so that your reader can understand the shape of the effect. The coefficients don't tell the story in a transparent way.
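A minimal sketch (hypothetical y and x) of estimating the quadratic and graphing the fitted response function, as recommended above; the grid of x values is arbitrary and should match your sample:

    regress y c.x##c.x                   // includes x and x-squared
    margins, at(x=(0(5)50))              // fitted values over a grid of x
    marginsplot                          // plot of the estimated response function
    display -_b[x]/(2*_b[c.x#c.x])       // turning point -b1/(2*b2); is it inside the sample range?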

- Partial effect: ∂Y/∂X = β1 + 2β2 X.
  - The sign of the partial effect will change at X = −β1/(2β2) if sgn(β1) ≠ sgn(β2), as discussed above.
- Estimating the standard error of the partial effect:
  - Conditional on X, var(β̂1 + 2β̂2 X) = var(β̂1) + 4X² var(β̂2) + 4X cov(β̂1, β̂2).
  - The estimated values of the variances and covariances can be obtained from the output of your regression package. (They are the diagonal and off-diagonal elements of the estimated covariance matrix of the coefficient vector. This is obtained by estat vce after a regression command. As usual, it will be the classical estimated covariance estimator unless you use the robust option in the regression.)
  - S&W point out two other ways of estimating the standard error of a linear combination of coefficients:
    - Do a test command that the partial effect is zero to get an F statistic; then an estimate of the standard error is the absolute value of the partial effect at that X divided by the square root of the F value.
    - Transform the model into one where the desired effect is directly estimated and get the standard error from the regression table.
- Relevant significance tests in the quadratic model:
  - Does X affect Y? This is a test of the joint null hypothesis H0: β1 = 0, β2 = 0. It is a standard F test.
  - Is the relationship quadratic rather than linear? This is a t test of H0: β2 = 0, given that β1 is assumed to be nonzero (the null hypothesis is the linear model).
    - This is an example of a nested specification test because the linear model is a special case of (nested within) the quadratic specification.
  - Note that the t test is preferred to comparing R² or adjusted R² values. The former will always be higher for the quadratic specification. The latter will be higher whenever the t value exceeds one, which is well below conventional critical values.
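A minimal sketch (hypothetical lwage and ed) of the partial effect in the quadratic model and its standard error; margins reports the same linear-combination standard error as the variance formula above:

    regress lwage c.ed##c.ed
    estat vce                            // variance-covariance matrix of the coefficient estimates
    margins, dydx(ed) at(ed=16)          // b1 + 2*b2*16 with its standard error
    test ed c.ed#c.ed                    // joint F test that ed does not affect lwage at all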

Higher-order polynomials

- Do cubic, quartic, etc. relationships ever occur in economic data?
  - Yes, but they can be hard to sell. Example of SAT scores and Reed GPA.
- The same procedures apply for estimated partial effects and tests.
- What to do if the 3rd-order term is significant and the 2nd-order term is not?
  - Don't leave out the 2nd-order term. Test both jointly to try to reject the linear model in favor of the cubic. If significant, retain both.

Nonlinear least squares

- For models that are nonlinear in the parameters, we must generally use nonlinear search methods to find the least-squares (or maximum-likelihood) estimates.
- Linearity in parameters depends crucially on the specification of the error term. The error term in the model must be additive.
- Consider the Box-Cox model B(y; λ_y) = β1 + β2 B(x; λ_x) + e, where B is the Box-Cox transformation. Recall that B(z; λ) = (z^λ − 1)/λ if λ ≠ 0, and B(z; λ) = ln z if λ = 0.
- Nonlinear estimation usually requires you to (at minimum) provide a formula for the deterministic part of the function.
  - To estimate the above model in Stata (without using the boxcox command) you could type
    nl ((y^lamy - 1)/lamy = beta1 + beta2*(x^lamx - 1)/lamx), initial(lamy 0.5 lamx 0.5 beta1 0 beta2 0)
- Nonlinear search algorithms can be very slow and unreliable. It is generally very helpful to provide starting values near the optimal parameter values.
- Nonlinear estimation is a directed search over the parameter space to find the best combination. It is generally guided by taking numerical derivatives of the objective (LS or likelihood) function with respect to the parameters, then following the direction of greatest improvement (the gradient).
  - Some nonlinear-optimization packages allow you to enter analytic (algebraic) partial derivatives of the model with respect to the parameters. This generally speeds up convergence.
- Some objective functions may have multiple local optima. Starting far from the global optimum can cause the algorithm to become trapped at a local optimum that is inferior to the global one. Good initial values can help avoid this problem.
  - To assure that your optimum is a global one, try starting from several different sets of initial values and see if you converge to the same optimum (see the sketch below).
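A minimal sketch of nonlinear least squares with nl (a hypothetical exponential model in made-up variables y and x), run from two different sets of starting values to check that both converge to the same optimum:

    nl (y = {b0=0} + {b1=1}*exp({b2=0.1}*x))
    nl (y = {b0=1} + {b1=0.5}*exp({b2=0.5}*x))   // different starting values; compare the two results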

- Some objective functions are badly behaved, having ridges (or valleys) where the objective function is very flat in one direction.
  - This is particularly true if multicollinearity is a problem. If two variables are highly and positively correlated, then increasing the coefficient of one by a lot and simultaneously decreasing the coefficient of the other will have very little effect on the predicted values and the residuals, hence on the objective function. This leads to a ridge in the likelihood function (a valley in the least-squares function) along a diagonal in the space of these two variables.
- Nonlinear estimation is not as computationally problematic as in the old days, but it is still subject to these numerical difficulties. Avoid it when possible by using specifications that are linear in the parameters. There will be times when we need to use it for maximum-likelihood estimators such as probit and logit, but these likelihood functions are often well-behaved.

Treatment effects

- Correlation does not imply causation, even if one event occurred before the other.
  - Hospital stay vs. health status example from the text.
- Selection bias occurs when the sample of people is not randomly chosen from the population.
  - Wage equation example: only working people have observed wages, but people with higher wage offers are more likely to work.
- Randomized controlled experiments are the gold standard of statistical procedures, but are not often available in economics. Unless we have the resources to create our own data, we must use natural experiments arising out of natural variation in observed datasets.
- Randomized experiments randomly place observations into the treatment group or the control group.
  - Medical treatments often use a double-blind technique, where neither the patient nor the doctor knows which group the patient is in, avoiding the Hawthorne effect.
- Problems with experiments:
  - Lack of randomization can lead to correlation between group selection and other variables.
    - We can control for this by including these variables in a regression.
  - Partial compliance: Did the treatment and control groups actually do what they were supposed to do? Did the job-training selectees actually attend training? Did the patient take the drug? Is this behavior correlated with e?

  - Attrition: Some drop out of both groups during the experiment. Were they random, or did people with high (or low) values of e drop out?
  - Hawthorne effect: Double-blind is not possible in many experiments.
  - Experimenter bias may result from incentives to make results look significant.

Difference estimator

- Let d_i = 1 for observations in the treatment group and 0 for those in the control group: y_i = β1 + β2 d_i + e_i.
  - β2 measures the treatment effect.
  - OLS regression will give b2 = ȳ1 − ȳ0, the difference of the means of the two groups.
- Do we need other regressors?
  - Not if selection is random, because there is no omitted-variable bias if the omitted regressors are uncorrelated with the variable of interest (d).
  - If selection is non-random, but all the variables that determine selection are observable and added to the regression, then our dummy-variable coefficient will still be unbiased because there are no omitted variables that are correlated with d.
  - If we allow people to select into the treatment and control groups, then there will be other characteristics (those that affect the choice) that will be correlated with d. If any of these variables are also correlated with y, then we have omitted-variable bias (sample-selection bias).
  - It may still be useful to include other regressors because they will lower the overall variance of the equation and reduce the amount of variation that d needs to explain.
  - Interaction terms between the treatment dummy and other regressors would allow us to see how the treatment effect might vary with subject characteristics (see the sketch below).
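A minimal sketch (hypothetical outcome y, treatment dummy d, and covariate x) of the difference estimator, then adding a control to reduce residual variance, then letting the treatment effect vary with the covariate:

    regress y d                          // coefficient on d = mean(y | d=1) - mean(y | d=0)
    regress y d x                        // adding a control lowers the error variance
    regress y c.x##i.d                   // interaction: treatment effect varies with x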

Differences-in-differences estimator

- When we only have natural experiments, we can sometimes do before-and-after comparisons between the control and treatment groups and get valid estimators under appropriate conditions.
- In order to do this we must have two observations (before and after) for each unit, and we must be able to assume that the before/after change is independent of any omitted variable correlated with treatment status.
- The differences-in-differences estimator uses the model y_it = β1 + β2 d_i + β3 t + δ (d_i × t) + e_it, where t = 0 for before and 1 for after, and d is the treatment dummy we used above.
  δ̂ = (ȳ_treatment,after − ȳ_control,after) − (ȳ_treatment,before − ȳ_control,before)
    = (ȳ_treatment,after − ȳ_treatment,before) − (ȳ_control,after − ȳ_control,before)
- You can also use other controls to reduce variance here.
- The differences-in-differences estimator is an example of using panel data, which vary both across time and across units. We will study methods for use with panel data later on.
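A minimal sketch of the differences-in-differences regression (hypothetical outcome y, treatment-group dummy d, and after-period dummy t in a two-period panel):

    regress y i.d##i.t                   // coefficient on 1.d#1.t is the DiD estimate (delta-hat)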