Problem Set 4, ECON 3033
(Due at the start of class, Wednesday, February 4, 04)
(Questions marked with a * are old test questions)
Bill Evans
Spring 08

1. Consider a multivariate regression model of the form $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$. Write the first order conditions for the optimization problem where one is interested in minimizing the sum of squared errors $SSE = \sum_i \hat{\varepsilon}_i^2$. Suppose in a sample of 5 observations, the following facts are presented about the model above.

x x x x x x y 40 80 0 0 0 0 x y 0 x y 60

Using the first order conditions (or normal equations) and these facts, provide the estimates for $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$. HINT: Solve for $\hat{\beta}_0$ first.

2. Download the data cps87.dta. Generate two new variables. The first is the natural log of weekly earnings. The second is age squared. Next, run a regression of the natural log of weekly earnings on age, age squared and years of education. We can write this model as

$\ln(\text{weekly earn}_i) = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{age}_i^2 + \beta_3 \text{educ}_i + \varepsilon_i$

Provide a mathematical expression that defines $\partial \ln(\text{weekly earn}) / \partial \text{age}$. Using the results from the regression, what is $\partial \ln(\text{weekly earn}) / \partial \text{age}$ at age? Age 35? Age 50?

3. Consider a multivariate regression model of the form $y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$. Suppose the $R^2$ from this model is $R_a^2$. True, False, or Uncertain and explain. The $R^2$ can never fall below $R_a^2$ when additional variables are added to the model? (Think of a special case where someone adds completely irrelevant variables to the model; what will happen to the $R^2$?)

4. On the class web page is a STATA data set called house_price.dta. It has data on 4 homes sold in 1998 in a small town in New England. The data set contains information on the sales price of the house (measured in thousands of dollars), the number of bedrooms, bathrooms, other rooms, square feet of living space and age of the home. Download the data and initially estimate a regression with house prices as the outcome of interest and four covariates: age in years, # of bedrooms, # of bathrooms, # of other rooms. Call this model 1.

a. Interpret the coefficient on age in years and # of bedrooms by providing a numeric example.

Now, estimate a second model and add to the original regression the square feet of living space. Call this model 2.
b. What happens to the coefficient on # of bathrooms, # of bedrooms and # of other rooms in this new model compared to the previous one? Why have the coefficients on these three variables changed so dramatically?

c. Interpret the coefficient on square feet of living space.

Now estimate a third model with the same dependent variable but include only two covariates: age in years and square feet.

d. Compare the $R^2$ from this model and that in Model #2. Provide an intuitive explanation for why the difference is so small.

5. On the class web page is a data set named senior_medical_exp.dta which has information on age, the number of chronic conditions and the total medical expenses for a sample of senior citizens aged 65 to 84. I select seniors for this example because all of them have health insurance through the Medicare program. The three variables in the data set are

Variable    Label
totalexp    total expenditures on medical care, 00
chronic     number of chronic conditions (0-5)
age         age in years

Load the data set into STATA, then construct two new variables: Regress totalexp on age and chronic. (reg totalexp age chronic)

a) Interpret the coefficient on age; provide a numeric example of the magnitude of the coefficient on this variable.

b) Interpret the coefficient on chronic; provide a numeric example of the magnitude of the coefficient on this variable.

c) Now regress totalexp on age (reg totalexp age). What has happened to the coefficient on age compared to the results in part a? Does this make sense? Why or why not?

d) Regress age on chronic. What is the coefficient on chronic? Does this make sense?

e) After this regression, output the residuals from the regression:

predict res_age, residual

Next, regress totalexp on res_age. How does this number compare to the estimates in a)?

6. *Return to problem 5 on problem set 3. A pharmaceutical company is investigating the cholesterol lowering benefits of a new drug. In a sample of subjects the company randomly assigns milligrams of active ingredients (label this as $x_1$) and the outcome of interest, labeled as $y$, is the change in cholesterol from the start until the end of the trial. Initially, the researchers estimate a model of the form $y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$.
However, a colleague mentions that as part of the experiment, they also collected detailed data on characteristics of survey participants that predict $y$ like their weight at the start of the trial, age, sex, ethnicity/race, plus other variables. The colleague asks whether one should include these covariates (label them as $x_2, x_3, \ldots, x_k$) in the basic regression?

a) By estimating a model of $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$, do you anticipate that the estimate on $\hat{\beta}_1$ will change?

b) In a multivariate model, the estimated variance of $\hat{\beta}_1$ is given as

$$\hat{V}(\hat{\beta}_1) = \frac{\hat{\sigma}_\varepsilon^2}{(1 - R_1^2) \sum_i (x_{1i} - \bar{x}_1)^2}$$
What is the likely consequence of adding these additional covariates ($x_2, x_3, \ldots, x_k$) to the estimated variance of $\hat{\beta}_1$? Explain your answer.

7. *On the next page are the results from two regression models: In model (1), I regress Y on X1, and note that the standard error on the coefficient on X1 is very small and the t-statistic on $\hat{\beta}_1$ is over 3. Note that in model (2), when I add X2 to the model, the standard error on $\hat{\beta}_1$ increases by a factor of 3 and the t-statistic on this parameter falls to .39. Using the information given, provide an intuitive explanation for why the standard error on $\hat{\beta}_1$ increases so much when X2 is added to the model. To get full credit, you must provide the proper equation.

8. *Many people get their health insurance through their job and because of high health insurance costs, many employers are considering offering free on-site exercise classes as a way of encouraging healthy behaviors and hopefully reducing medical care costs. The evidence for subsidized exercise classes comes primarily from research in the field of public health. In these models the authors collect data from an employer and estimate a regression of the form $y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$ where $y_i$ is annual spending on health care for employee $i$ and $x_{1i}$ is a dummy variable that equals 1 if the person uses the on-site health care services. Call this model (1). Let $\tilde{\beta}_1$ be the estimate for $\beta_1$ from model (1) and in this case, the author gets the expected result that $\tilde{\beta}_1 < 0$: people that use on-site exercise classes have lower health care spending. Model (1) has been criticized because it does not control for the fact that the least healthy employees are the ones the least likely to enroll in these classes. Consider a simple extension to the model where the author has detailed data on the health of employees prior to the exercise classes opening. Let $x_{2i}$ be a simple index that equals the number of chronic health conditions a person has (e.g., a person with high blood pressure, obesity, and diabetes has a count of three whereas a healthy person has a count of zero). Now consider estimating model (2) which is of the form $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$. If Model (2) is the true model, do you anticipate that $\tilde{\beta}_1$, the estimate from model (1), is biased up or down? Explain your answer and to get full credit, you must provide an appropriate equation.
Results for Question 7

Correlation between X1 and X2

. corr x1 x2
(obs=489)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.9994   1.0000

Model 1: Regression of Y on X1

. reg y x1

      Source |       SS       df       MS              Number of obs =     489
-------------+------------------------------           F(1, 487)     =   56.63
       Model |      .04473     1      .04473           Prob > F      =  0.0000
    Residual |  535.054756   487     .540634           R-squared     =   0.845
-------------+------------------------------           Adj R-squared =    0.84
       Total |   656.09899   488   .63705357           Root MSE      =  .46383

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .0765488      .0037     3.7    0.000      .07005      .08877
       _cons |   5.059357    .043554     6.6    0.000     4.97395     5.44763

Model 2: Regression of Y on X1 and X2

. reg y x1 x2

      Source |       SS       df       MS              Number of obs =     489
-------------+------------------------------           F(2, 486)     =     8.4
       Model |         .58     2  60.5579054           Prob > F      =  0.0000
    Residual |    534.9838   486     .598358           R-squared     =   0.846
-------------+------------------------------           Adj R-squared =   0.839
       Total |   656.09899   488   .63705357           Root MSE      =  .46389

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |     .30486   .0935397     .39    0.63     -.059375     .339098
          x2 |  -.0539557    .093559   -0.58    0.564     -.37338        .943
       _cons |   5.059738   .0435649     6.4    0.000      4.9743      5.4566
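The output for Question 7 comes from three Stata commands. As a minimal sketch, assuming a data set is already in memory with variables named y, x1 and x2 (names taken from the output; the data file itself is not supplied with this problem set), the do-file below shows one way such output could be produced and how the correlation between the regressors relates to the $R_1^2$ term in the variance expression of Problem 6(b). The scalar names r12, se_short and se_long are illustrative only.

```stata
* Sketch only: assumes variables y, x1 and x2 are already in memory.
* Run as a do-file (the // comments are not accepted at the interactive prompt).
corr x1 x2                        // correlation between the two regressors
scalar r12 = r(rho)               // correlate leaves the correlation in r(rho)
display "R-squared from regressing x1 on x2: " r12^2

reg y x1                          // model (1)
scalar se_short = _se[x1]         // standard error on x1 in model (1)

reg y x1 x2                       // model (2)
scalar se_long = _se[x1]          // standard error on x1 in model (2)
display "Ratio of standard errors on x1: " se_long/se_short

estat vif                         // variance inflation factors for model (2)
```

With only two regressors, the R-squared from regressing x1 on x2 equals the squared correlation between them, which is why the sketch squares r(rho) rather than running the auxiliary regression.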
9. *Research has shown that students attending higher quality colleges and universities tend to have higher wages after graduation than those attending less selective institutions. Using a nationally representative sample of college graduates aged 30-39, researchers regress the natural log of annual earnings ($y_i$) on the average SAT score from the college the respondent attended ($x_{1i}$) using the simple bivariate regression model $y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$. Call this model (1). Let $\tilde{\beta}_1$ be the estimate for $\beta_1$ from model (1) and in this case, the author gets the expected result that $\tilde{\beta}_1 > 0$: students that graduated from higher quality schools tend to have higher earnings. Someone criticizes model (1) because it does not control for differences in other characteristics of the students that are likely to be correlated with earnings. For example, the author does not have a measure of academic ability for the student like an SAT score which they argue should be included in the model. Suppose the author considers estimating model (2) which is of the form $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ where $x_{2i}$ is the student's own SAT score. If Model (2) is the true model, do you anticipate that $\tilde{\beta}_1$, the estimate from model (1), is biased up or down? Explain your answer and to get full credit, you must provide an appropriate equation.

10. A researcher regresses y on x1 and produces the results below. A colleague argues that the model should also include the covariates x2, x3, and x4, which the colleague argues are strong predictors of y. Below is a matrix that provides the correlation coefficients for the variables x1, x2, x3 and x4. Given these results, do you expect that adding x2, x3 and x4 to the model will change the results much? Assume your colleague is correct that x2, x3 and x4 are strong predictors of y.

Results for Problem 10

. reg y x1

      Source |       SS       df       MS              Number of obs =    3981
-------------+------------------------------           F(1, 3979)    =   795.9
       Model |   74.739778     1   74.739778           Prob > F      =  0.0000
    Residual |   874.36848  3979    .9745785           R-squared     =   0.666
-------------+------------------------------           Adj R-squared =   0.664
       Total |     049.086  3980    .6359504           Root MSE      =  .46877

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |     .07398     .00636     8.0    0.000     .0688386     .07959
       _cons |     5.0543   .0353055   44.60    0.000     5.03595      5.7436

. corr x1 x2 x3 x4
(obs=3981)

             |       x1       x2       x3       x4
-------------+------------------------------------
          x1 |   1.0000
          x2 |     0.08   1.0000
          x3 |    0.000    0.006   1.0000
          x4 |    0.005   0.0075     -0.0   1.0000
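The Stata steps that Problems 2 and 5 describe can be sketched as a short do-file. This is a sketch under stated assumptions, not the course's official solution code: the file names are as read from the problem text, the variable names weekly_earn and years_educ in cps87.dta are assumptions (run describe to check the actual names), and ln_weekly_earn and age2 are illustrative names for the two generated variables.

```stata
* Problem 2: generate the two new variables and run the regression.
* weekly_earn and years_educ are ASSUMED names; confirm them with describe.
use cps87.dta, clear
describe
gen ln_weekly_earn = ln(weekly_earn)   // natural log of weekly earnings
gen age2 = age^2                       // age squared
reg ln_weekly_earn age age2 years_educ

* Problem 5: the sequence of regressions in parts (a) through (e).
use senior_medical_exp.dta, clear
reg totalexp age chronic               // parts (a) and (b)
reg totalexp age                       // part (c)
reg age chronic                        // part (d)
predict res_age, residual              // residuals from the part (d) regression
reg totalexp res_age                   // part (e)
```

The predict line has to follow the regression of age on chronic directly, because predict uses the most recent estimation results.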