Handout #6    Title: FAE    Course: Econ 368    Instructor: Dr. I-Ming Chiu

Linear Regression Model (Reading: PE, Chapter 4)

So far we have focused mostly on the study of a single random variable, its corresponding theoretical distribution, and the sampling scheme. However, very often we are more interested in bivariate or even multivariate relationships between/among random variables. We'll begin with a bivariate case where X and Y are theoretically related, using a coin tossing example. We'll show that the conditional mean of Y on X can be formed using a deterministic linear function of X. After that we'll introduce the simple linear regression model, where Y can be linearly dependent on X empirically.

In the study of economics, we are often interested in whether one variable Y can be explained by another variable X. For example, is long-run inflation caused by the over-injection of money supply? Is households' spending governed by their disposable income? Is the employment status of a female worker dependent on the number of children she has? All of the above questions can be answered and examined using the simple linear regression model. Note that the causal relationship is established using economic theory; the empirical linear regression model is used to examine the validity of that theory. Among these three examples the only difference is that, in the third case, the Y variable is categorical. We'll study the third case later using the Probit/Logit model, an extension of the linear regression model. As you should have found out by now, the data types introduced in an earlier handout play an important role in deciding how we choose an appropriate model to study the data.

*Consider an experiment where a fair coin is tossed four times; the sample space is {0, 1}^4.
X ≡ # of heads obtained on the first three tosses, Y ≡ # of heads obtained on all four tosses.

Table 6.1 Joint Distribution
X\Y      0      1      2      3      4     f(x)
0      1/16   1/16    0      0      0     1/8
1       0     3/16   3/16    0      0     3/8
2       0      0     3/16   3/16    0     3/8
3       0      0      0     1/16   1/16   1/8
g(y)   1/16   1/4    3/8    1/4    1/16    1

Table 6.2 Simulation outcome based on tossing a coin four times and repeating it 100 times (I don't set the seed number, so your outcome will be different)
x\y      0      1      2      3      4
0      0.07   0.04    0      0      0
1       0      …      …      0      0
2       0      0      …      …      0
3       0      0      0     0.05   0.07
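The simulation behind Table 6.2 is easy to reproduce. The handout does not list the code for it, so the sketch below is only one possible way to generate such a table (the object names are my own):
-----
# Toss a fair coin 4 times, let X = # of heads on the first 3 tosses and
# Y = # of heads on all 4 tosses; repeat 100 times and tabulate the
# relative frequencies (no seed is set, so your outcome will differ).
n.rep = 100
tosses = matrix(rbinom(n.rep*4, size = 1, prob = 0.5), nrow = n.rep, ncol = 4)
X.sim = rowSums(tosses[, 1:3])
Y.sim = rowSums(tosses)
table(X.sim, Y.sim)/n.rep   # empirical counterpart of Table 6.1
-----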

What is the conditional mean function of Y given X based on the above joint distribution? Answer: It is E(Y|X). What does this symbol mean?

Table 6.3 Conditional Distribution
Y           0     1     2     3     4
g(y|X=0)   1/2   1/2    0     0     0
g(y|X=1)    0    1/2   1/2    0     0
g(y|X=2)    0     0    1/2   1/2    0
g(y|X=3)    0     0     0    1/2   1/2

E(Y|X=0) = 1/2,  E(Y|X=1) = 3/2,  E(Y|X=2) = 5/2,  E(Y|X=3) = 7/2

If we plot E(Y|X) against X in a scatter diagram, it looks like the following:

Figure 6.1 Conditional Mean Function (E(Y|X) plotted against X; the four points lie exactly on a straight line)

There is an exact (i.e., deterministic) linear relationship between E(Y|X) and X, so we can write the following equation:

E(Y|X) = β0 + β1*X

How do we find β0 and β1? Answer:

β1 = ΔE(Y|X)/ΔX = [E(Y|X=1) − E(Y|X=0)]/(1 − 0) = (3/2 − 1/2)/1 = 1
When X = 0, β0 = E(Y|X=0) = 1/2
⇒ E(Y|X) = 1/2 + 1*X

The slope and intercept of the above conditional mean function are known constants, given that we know how X and Y are related (i.e., we know their joint distribution function).
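If you want to verify these conditional means numerically, the short sketch below (not part of the original handout) computes E(Y|X) directly from the probabilities in Table 6.1:
-----
# Conditional means E(Y|X) from the joint distribution in Table 6.1.
joint = rbind(c(1, 1, 0, 0, 0),
              c(0, 3, 3, 0, 0),
              c(0, 0, 3, 3, 0),
              c(0, 0, 0, 1, 1))/16
y.values = 0:4
cond.mean = (joint %*% y.values)/rowSums(joint)   # E(Y|X = 0, 1, 2, 3)
cond.mean                                         # 0.5, 1.5, 2.5, 3.5 = 1/2 + X
-----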

Linear Regression Model: regression analysis is concerned with the study of the relationship between one variable, called the explained (or dependent) variable, and one or more other variables, called independent (or explanatory) variables.

Y = β0 + β1*X + ε      (1)

Equation (1) is termed the simple (one X) linear (linearity in X) regression model. Y is the dependent variable, X is the independent variable, and ε is the error term. In practice, it is unlikely that the relationship between X and Y is an exact straight line like the coin tossing example we studied earlier. Therefore, the error term is added to represent uncertainty and all other potential factors that may contribute to the variation of Y (i.e., the capture-all effect). We will make assumptions about the error term later for inference purposes.

Objectives of the linear regression model:
(1) To estimate the mean value of the dependent variable, given the value of the independent variable(s). In other words, we assume the conditional mean function is linear: E(Y|X) = β0 + β1*X.
(2) To test hypotheses about the nature of the dependence (i.e., β1). The size and magnitude of the beta(s) (if there is more than one independent variable) tell us how changes in the Xs affect Y. This is called the marginal effect.
(3) To predict, or forecast, the mean value of the dependent variable, given the value(s) of the independent variable(s).

For example, suppose we are interested in finding the relationship between wage and schooling in a small artificial market economy. If the population data are available, equation (1) is called the linear population regression function (PRF). However, very often it is too costly to get the population data. Therefore, a small sample is drawn from the population and our goal is to uncover the unknown parameters in the linear sample regression function (SRF). Let's use Y to denote the hourly wage and X the education background (measured in school years). The plot of Y against X is shown in the following scatter diagram (Fig. 6.2). As you may notice, the actual conditional mean function is not a straight line. However, a linear regression line seems an appropriate approximation for describing the relationship between wage and education. Note that we artificially generate a sample from a model (population) where people's wages are a linear function of schooling. The case we would like to analyze is a sample of 58 (why? I just like this number) individuals (see the EDW.txt data file). By using simulated data, we know all of the corresponding parameters in the population and the corresponding sampling scheme. There are advantages to using simulated data: first, we can see how the SRF differs from the PRF; secondly, we can visualize the consequences of assumption violations by changing the model assumptions one at a time (i.e., generate a new population but with a different model assumption); thirdly, we can examine the usefulness of model predictability. A small sketch of this idea is given below.
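The following sketch makes the idea of an artificial population concrete. It is not the handout's EDW.txt data; the sample size, coefficients, and error spread below are made up for illustration:
-----
# Draw a sample from a made-up linear wage population and compare the
# estimated sample regression line with the true population line,
# whose intercept and slope are set to 2 and 1.5 here.
set.seed(1)
school = sample(6:18, size = 150, replace = TRUE)        # years of schooling
wage   = 2 + 1.5*school + rnorm(150, mean = 0, sd = 4)   # population model (PRF)
srf = lm(wage ~ school)                                  # SRF estimated from this sample
coef(srf)                                                # compare with the true (2, 1.5)
plot(school, wage)
abline(srf, col = "red")                                 # estimated line (red)
abline(a = 2, b = 1.5, col = "blue")                     # true population line (blue)
-----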

Fig. 6.2 Wage vs. Schooling (scatter plot of hourly wage against school years, with the sample conditional mean in blue and the fitted regression line in red)

In the above diagram, the sample conditional mean of wage (the blue line) tends to increase with schooling years. It is not a straight line. However, it can be approximated using a linear equation (the red line) as follows:

Wage = β0 + β1*Schooling + ε,  where ε ~ N(0, σ²)

where ε is a deviation term that captures other factors that may affect wage. Usually we assume that it is normally distributed with mean zero and variance σ². This assumption is required for later statistical inference purposes.

Linear regression model: find an estimator that best describes the linear relationship between Wage (the dependent variable) and Schooling (the independent variable). In other words, we need a method to uncover the unknown parameters β0, β1, and σ². There are two approaches to do that: MLE (maximum likelihood estimation) and OLS. We'll adopt the OLS (ordinary least squares) estimator, and it has a nice BLUE property.¹ I'll explain what BLUE stands for in detail later.

OLS:
Y_i = β0 + β1*X_i + ε_i,  i = 1, …, n      (2)
Choose β̂0 and β̂1 to minimize Σ_i (Y_i − β̂0 − β̂1*X_i)²      (3)

¹ It stands for Best Linear Unbiased Estimator.

Using differential calculus, we can find²

β̂1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² = SXY/SXX      (4)
β̂0 = Ȳ − β̂1*X̄      (5)
σ̂² = Σ ê_i²/(n − 2), where ê_i is termed the residual, which equals Y_i − (β̂0 + β̂1*X_i).

e.g.
Time (X)    2   5   1   3   8   2   0   6   3   1
Score (Y)  65  69  64  75  90  75  49  77  74  58

Let's use R to compute β̂0, β̂1, and σ̂²:
-----
rm(list = ls())
X = c(2, 5, 1, 3, 8, 2, 0, 6, 3, 1)
Y = c(65, 69, 64, 75, 90, 75, 49, 77, 74, 58)
mean.y = mean(Y)
mean.x = mean(X)
beta1 = sum((X - mean.x)*(Y - mean.y))/sum((X - mean.x)^2)
beta1
beta0 = mean.y - beta1*mean.x
beta0
# The numerator below is called the residual sum of squares (RSS).
sigmasq = sum((Y - beta0 - beta1*X)^2)/(length(X) - 2)
sigmasq
-----

β̂1 = 3.8383 (score/hour): the marginal score increases by 3.8383 if the study time increases by one hour.
β̂0 = 57.7012 (score)
σ̂² = 322.11/(10 − 2) = 40.26      (6)

² We need to solve two equations simultaneously. Taking the first derivative of (3) with respect to β̂0 and β̂1, we obtain −2*Σ(Y_i − β̂0 − β̂1*X_i) = 0 and −2*ΣX_i*(Y_i − β̂0 − β̂1*X_i) = 0. These two equations imply that Σ ê_i = 0 and Σ X_i*ê_i = 0.
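Equation (3) can also be attacked numerically instead of with calculus. The sketch below (not in the original handout) minimizes the sum of squared deviations with R's general-purpose optimizer and lands on essentially the same values as the closed-form formulas above:
-----
# Minimize the criterion in (3) directly and compare with (4)-(5).
X = c(2, 5, 1, 3, 8, 2, 0, 6, 3, 1)
Y = c(65, 69, 64, 75, 90, 75, 49, 77, 74, 58)
ssr = function(b) sum((Y - b[1] - b[2]*X)^2)   # sum of squared deviations
optim(c(0, 0), ssr)$par                        # roughly (57.70, 3.84)
-----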

Fig. 6.3 Regression Line and Residuals (the fitted line for the study-time example, with each observation's residual labeled; the residuals range from −8.70 to 9.62)

Fig. 6.3 shows that the regression line we obtain yields the smallest aggregate squared deviations, Σ(Y_i − β̂0 − β̂1*X_i)². In other words, the total (−8.70)² + … + (1.59)² is the smallest possible, given the OLS estimates β̂0 and β̂1.

Alternatively, we can write equation (2) in matrix form as follows:

Y = X*β + ε      (7)

where Y is an n×1 matrix, X is an n×2 matrix, and β is a 2×1 vector. Let's ignore ε temporarily.

Y = [y_1, y_2, …, y_n]ᵀ,   X = [1 x_1; 1 x_2; …; 1 x_n],   β = [β0, β1]ᵀ

Applying matrix differentiation to equation (7), we can find³ that

³ Same as what we did in equation (3).

β̂ = [β̂0, β̂1]ᵀ = (XᵀX)⁻¹(XᵀY)      (8)⁴

What does Fig. 6.3 mean mathematically?

[y_1, y_2, …, y_n]ᵀ ≈ β̂0*[1, 1, …, 1]ᵀ + β̂1*[x_1, x_2, …, x_n]ᵀ

(We want to assign weights β̂0 and β̂1 to the 1 vector and the X vector so that the resulting vector Ŷ is closest, in distance, to the Y vector.)

[Figure: projection (blue arrow) of Y onto the (1, X) vector space; Ŷ = β̂0*1 + β̂1*X is the projection and ê is the residual vector.]

Small Sample Properties of the OLS Estimator

Estimated β̂0 and β̂1:
E(β̂0) = β0 and E(β̂1) = β1  ⇒  the OLS estimator is unbiased.      (9)
Var(β̂0) and Var(β̂1) are the smallest variances among all linear unbiased estimators  ⇒  the OLS estimator is efficient.      (10)

A simulation illustrating property (9) is sketched below.

⁴ The derivation of equation (8) is the same as the one I show in footnote 2. I will briefly explain matrix differentiation in our meeting. As you can see, it can be applied easily to the linear multiple (i.e., more than one regressor) regression model.
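Because we can simulate from a known population, properties like (9) are easy to see by brute force. The following Monte Carlo sketch is not from the handout; the true values (2 and 3), the sample size, and the number of replications are all made up for illustration:
-----
# Draw many samples from a known population model and check that the
# average of the OLS estimates is close to the true (beta0, beta1).
set.seed(123)
beta0.true = 2; beta1.true = 3; sigma = 1.5          # made-up true values
x = runif(30, 0, 10)                                 # one fixed set of regressors
est = replicate(5000, {
  y = beta0.true + beta1.true*x + rnorm(30, 0, sigma)
  coef(lm(y ~ x))                                    # OLS estimates for this sample
})
rowMeans(est)                                        # should be close to (2, 3)
-----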

Estimated Variances of β̂0 and β̂1

Var(β̂) = [Var(β̂0), Cov(β̂0, β̂1); Cov(β̂0, β̂1), Var(β̂1)]⁵ = σ̂²*(XᵀX)⁻¹ = σ̂²*[n, ΣX_i; ΣX_i, ΣX_i²]⁻¹ = [σ̂²/(n*ΣX_i² − (ΣX_i)²)]*[ΣX_i², −ΣX_i; −ΣX_i, n]      (11)⁶

-------------------------
SXX = Σ(X_i − X̄)² = ΣX_i² − 2*X̄*ΣX_i + n*X̄² = ΣX_i² − n*X̄² = ΣX_i² − (ΣX_i)²/n,  so  n*ΣX_i² − (ΣX_i)² = n*SXX.
-------------------------

Let's focus on the estimated variance of β̂0:

Var(β̂0) = σ̂²*ΣX_i²/(n*ΣX_i² − (ΣX_i)²) = σ̂²*ΣX_i²/(n*SXX) = σ̂²*(SXX + n*X̄²)/(n*SXX) = σ̂²*(1/n + X̄²/SXX)      (12)

Var(β̂1) = σ̂²*n/(n*ΣX_i² − (ΣX_i)²) = σ̂²/SXX      (13)

Cov(β̂0, β̂1) = −σ̂²*ΣX_i/(n*ΣX_i² − (ΣX_i)²) = −σ̂²*X̄/SXX      (14)

How do we find the estimate of the correlation coefficient between β̂0 and β̂1 (i.e., ρ(β̂0, β̂1))? (One way to compute it in R is sketched below.)

Hypothesis Testing on β0 and β1

Suppose we are interested in whether β1 equals a certain value, say β1*.
H0: β1 = β1*
HA: β1 ≠ β1*
Assume the error term, ε, is NIID⁷(0, σ²).

⁵ Var(β̂) = σ²*(XᵀX)⁻¹; σ² needs to be estimated since it is unknown.
⁶ This variance-covariance matrix is the most important estimator for statistical inference purposes.
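One way to answer the correlation question above in R, once the estimated variance-covariance matrix is available (Var_Beta is computed in Exercise 1 below), is sketched here; it is not part of the original handout:
-----
# rho(beta0-hat, beta1-hat) = Cov/sqrt(Var*Var); with the study-time
# numbers this is about -2.194/sqrt(10.827*0.708), i.e. about -0.79.
rho_hat = Var_Beta[1,2]/sqrt(Var_Beta[1,1]*Var_Beta[2,2])
rho_hat
cov2cor(Var_Beta)   # or convert the whole covariance matrix at once
-----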

β̂1 ~ N(β1, σ²/SXX)  ⇒  TS = (β̂1 − β1*)/se(β̂1) ~ t_{n−2}  (Why is the degree of freedom n − 2?)

Let's use the numerical example with the study-time data to examine whether β1 equals zero (does study time affect the exam score?). In other words, we want to test

H0: β1 = 0
HA: β1 ≠ 0

Var̂(β̂1) = σ̂²/SXX = 40.26/56.9 = 0.7076

TS = (3.8383 − 0)/√0.7076 = 4.56  (table value: t_{8} = 2.306 at the 5% level of significance)

The above TS indicates that we can reject the null hypothesis at the 5% (or even 1%) level of significance using a two-tail test. Alternatively, we can calculate the p-value to decide whether we can reject the null. A p-value is a measure of how much evidence we have against the null hypothesis. I'll show you how to calculate the p-value using R; a short preview is given below. The one-tail p-value for TS = 4.56 with 8 df is about 9.2×10⁻⁴ (so the two-tail p-value is about 0.0018).

What can we conclude in our numerical example? If a student studies one more hour, his or her expected exam score is significantly higher, by about 3.84 points.

The Analysis of Variance (Again, you should see the pattern for when an F test is needed)

Let's consider the conditional mean of Y:  Ŷ_i = β̂0 + β̂1*X_i
Y_i = Ŷ_i + ê_i
Y_i − Ȳ = (Ŷ_i − Ȳ) + ê_i
Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σ ê_i²  (the cross-product term vanishes because Σ ê_i = 0 and Σ X_i*ê_i = 0; see footnote 2)      (15)
TSS⁸ = ESS + RSS

⁷ Normally, Identically, Independently Distributed. The common assumptions about the error term are (a) E(ε_i|X) = 0, (b) E(ε_i²|X) = σ². Meaning, the error term has a mean equal to zero and it is homoscedastic (i.e., constant variance).
⁸ Denote this term by SYY.
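A preview of the p-value calculation promised above (the same computation appears again in Exercise 1 below):
-----
# Two-tail p-value for TS = 4.5629 with n - 2 = 8 degrees of freedom.
2*pt(-4.5629, df = 8)                     # about 0.0018
pt(4.5629, df = 8, lower.tail = FALSE)    # one-tail area, about 9.2e-04
-----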

ESS = Σ(Ŷ_i − Ȳ)² = Σ(β̂0 + β̂1*X_i − (β̂0 + β̂1*X̄))² = β̂1²*Σ(X_i − X̄)² = β̂1²*SXX

R² (coefficient of determination) = ESS/TSS = β̂1²*SXX/SYY = (SXY/SXX)²*SXX/SYY = (SXY)²/(SXX*SYY)      (16)

R = SXY/√(SXX*SYY)      (17)

Note: the formula in equation (17) is the same as the one used to calculate the sample correlation coefficient in Handout #4.

Table: ANOVA
---------------------------------------------------------------------------------------------------------
Source        df       SS       MS                 F              p-value
Regression    1        SSreg    SSreg/1⁹           MSreg/σ̂²
Residual      n − 2    RSS      σ̂² = RSS/(n − 2)
---------------------------------------------------------------------------------------------------------
Total         n − 1    SYY
---------------------------------------------------------------------------------------------------------

Notice: t² = F (see Handout #3). An R version of this table is sketched below.

The Residuals

The residuals can be used for diagnostic checks, which examine whether the model assumptions are violated. Knowing whether the assumptions are violated affects the hypothesis testing results. In the next handout we will show how to modify the OLS estimator when we detect multicollinearity, heteroscedasticity, and/or autocorrelation problems.

Predictions

Ŷ = β̂0 + β̂1*X̃, where X̃ is a chosen value (vector). For example, given a certain study time X̃, what should be the expected exam score (Ŷ)?

Standard error of prediction = σ̂*[1 + 1/n + (X̃ − X̄)²/SXX]^(1/2)
Standard error of prediction in matrix form = {σ̂²*[1 + X̃ᵀ(XᵀX)⁻¹X̃]}^(1/2)

⁹ SSreg/1 = MSreg.
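The same ANOVA quantities can be pulled from R's built-in regression functions. This sketch assumes the study-time data from Exercise 1 below have been read in and attached; it is not part of the original handout:
-----
# ANOVA table, R^2 and F for the study-time regression.
model = lm(Score ~ Time)
anova(model)                    # regression (Time) and residual rows: df, SS, MS, F, p-value
summary(model)$r.squared        # R^2 = ESS/TSS, about 0.722
summary(model)$fstatistic       # F statistic; note t^2 = F here (4.5629^2 is about 20.8)
-----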

Exercise 1 using R: The Study-Time example (the R code and outcome are presented together)

> data = read.table("study.txt", header = T)   # you need to create the data file by yourself
> attach(data)
> names(data)
[1] "Score" "Time"
> Y = Score
> X = cbind(1, Time)
> (Beta = solve(t(X)%*%X)%*%(t(X)%*%Y))
           [,1]
      57.701230
Time   3.838313
> (Y_hat = X%*%Beta)
          [,1]
 [1,] 65.37786
 [2,] 76.89279
 [3,] 61.53954
 [4,] 69.21617
 [5,] 88.40773
 [6,] 65.37786
 [7,] 57.70123
 [8,] 80.73111
 [9,] 69.21617
[10,] 61.53954
> (Table = cbind(Y, Y_hat))
       Y         
 [1,] 65 65.37786
 [2,] 69 76.89279
 [3,] 64 61.53954
 [4,] 75 69.21617
 [5,] 90 88.40773
 [6,] 75 65.37786
 [7,] 49 57.70123
 [8,] 77 80.73111
 [9,] 74 69.21617
[10,] 58 61.53954
> resid = Y - Y_hat
> plot(Time, Y)
> lines(Time, Y_hat)

(The graph appears below.)

> sigma_sq = sum(resid^2)/(length(Y) - 2)
> sigma_sq
[1] 40.26406

[Plot: Score (Y) against Time, with the fitted values from the regression joined by a line.]

> (Var_Beta = sigma_sq*(solve(t(X)%*%X)))
                      Time
      10.826716 -2.1936483
Time  -2.193648  0.7076285
> (TS_Beta0 = Beta[1,1]/sqrt(Var_Beta[1,1]))
[1] 17.53625
> (TS_Beta1 = Beta[2,1]/sqrt(Var_Beta[2,2]))
[1] 4.562866
> (t_value = qt(0.975, 8))
[1] 2.306004
> (p_value = 2*(pt(-TS_Beta1, 8)))
[1] 0.00184296
> mean_y = mean(Y)
> (TSS = sum((Y - mean_y)^2))
[1] 1160.4
> (ESS = sum((Y_hat - mean_y)^2))
[1] 838.2875
> (RSS = sum(resid^2))
[1] 322.1125
> (TSS = ESS + RSS)
[1] 1160.4
> (R_square = ESS/TSS)
[1] 0.7224125
> model = lm(Score ~ Time)   # use this to examine the matrix operation results above
> X_New = data.frame(Time = c(1, 8, 12))   # choose a sample of study times
> Pred = predict(model, X_New, interval = "prediction", level = 0.95)   # predict the Score and its CI (see the prediction formulas above)

DO the last three lines by yourself!

Exercise 2 using R: Capital Asset Pricing Model (CAPM)

In this exercise, we will scrape the data needed from the Yahoo Finance section. We will use R packages and examine their algorithms for how the data are retrieved from the web sites. The instructions will be given in our meeting. (A possible starting point is sketched below.)

Exercise 3 using R: Demand for Money

In this exercise, we will test the relationship between different measurements of money and real GDP, the interest rate, and the price level. We'll apply the OLS method using the differenced variables. The reason for differencing is to achieve stationarity for each variable. For those who will continue on to take Econometrics 3, Dr. Emara will demonstrate all the details on how to deal with time-series data.

Data: money.csv
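For Exercise 2, the handout does not name the package it will use; quantmod is one common choice, so the sketch below is only an assumption about how the data retrieval and the CAPM regression might look. The ticker symbols are examples, and the risk-free rate is ignored here for simplicity:
-----
# Pull prices from Yahoo Finance, build monthly returns, and estimate
# the CAPM beta as the slope of a simple linear regression.
library(quantmod)
stock  = getSymbols("AAPL",  src = "yahoo", auto.assign = FALSE)   # example ticker
market = getSymbols("^GSPC", src = "yahoo", auto.assign = FALSE)   # S&P 500 as the market proxy
rets = merge(monthlyReturn(Ad(stock)), monthlyReturn(Ad(market)))
colnames(rets) = c("r.stock", "r.market")
capm = lm(r.stock ~ r.market, data = rets)   # slope = estimated CAPM beta
summary(capm)
-----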