The Classical Linear Regression Model and Least Squares


Greene Chapters: 2, 3
R scripts: mod1_a, mod1_b, mod1_c, mod1_d

Regression Analysis

Much work in applied econometrics is based on regression analysis.

Definition: A regression is a stipulated relationship between the expectation of a dependent variable and a combination of explanatory data and unknown parameters. Generically:

    E(y | x, β) = f(x, β)   (1)

We can perform regression analysis in fully parametric and semi-parametric frameworks. We usually write the regression model as:

    y = E(y | x, β) + (y − E(y | x, β)) = E(y | x, β) + ε = f(x, β) + ε   (2)

where ε is an "error term" that includes everything we can't explain about y even knowing x and (hypothetically) the population parameter vector β.

What's in the error term?

The error term is also called the "disturbance term". It can include some or all of the following components:

1. Specification error: There are other factors than x that drive the observed variability in y, and/or the functional relationship between y and x is incorrect. The former is essentially unavoidable in most applications, and does not necessarily jeopardize our ability to estimate the effect of x on y. The latter is a bigger problem. It can produce inefficient and, in some cases, misleading estimates.

2. Measurement error (also known as "errors in variables"): Usually a minor problem if we mismeasure y, but potentially a serious issue if we mis-measure elements of x.

3. Human erratic behavior: The stipulated relationship between y and x is generally correct, but units of observation (usually people) slightly change their behavior from case to case, introducing deviations from the stipulated relationship. This is a truly random part of the error term, and in most cases does not jeopardize estimation results.

In a parametric framework, we would assign a density to ε (such as ε ~ n(0, σ²), where n stands for "normally distributed"). In that case, σ² becomes an additional parameter that has to be estimated. In a semi-parametric framework we would simply state that E(ε) = 0 to preserve the relationship in (1).

A special case of a regression model is a model that is linear in β, i.e. one that can be written as

    y = x'β + ε   (3)
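The linear model in (3) is easy to simulate: choose β, draw x and ε, and construct y. The course illustrates this with the R script mod1_a; the sketch below is a minimal NumPy analog (all names and parameter values are illustrative, not taken from the script).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
beta = np.array([2.0, 0.5])        # illustrative population parameters (intercept, slope)

x = rng.uniform(0, 10, size=n)     # one explanatory variable
eps = rng.normal(0, 1.0, size=n)   # error term: mean zero, variance sigma^2 = 1
y = beta[0] + beta[1] * x + eps    # y = x'beta + eps, with x' = (1, x)

# E(y | x) = beta_0 + beta_1 * x; the error is everything we can't explain.
# For large n the sample mean of y is close to beta_0 + beta_1 * mean(x).
print(y.mean(), beta[0] + beta[1] * x.mean())
```

Re-running with a different seed changes the draws but not the stipulated relationship, which is the point of the exercise.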

Such models can be estimated via a technique called "Least Squares" (LS). If certain assumptions hold, the model is called the "Classical Linear Regression Model" (CLRM), and estimation can proceed via "Ordinary Least Squares" (OLS), the topic of the next section.

R practice: Building a regression model for study time

R script mod1_a illustrates how to build a regression relationship with simulated data. The script also shows the gain in accuracy as the regression model chosen by the researcher approaches the true model.

Notation

Assume you have a sample of wages and explanatory variables for n individuals. For a single observation i (person, firm, household, etc.), the CLRM can be written as:

    y_i = β_1 x_i1 + β_2 x_i2 + … + β_k x_ik + ε_i = x_i'β + ε_i   (4)

The β-terms are unknown population parameters. In a regression context, they are often referred to as "coefficients". In practice, x_i1 is often simply the number "1", which then makes β_1 the intercept or "constant term" of the regression model. The remaining β-terms are "slope coefficients". By convention, I will index the β-terms from 1 through k. (In some texts, the index is from 0 to k, which implies that the total number of coefficients is k+1.)

We can stack these equations for all i = 1..n individuals:

    y_1 = β_1 x_11 + β_2 x_12 + … + β_k x_1k + ε_1
    y_2 = β_1 x_21 + β_2 x_22 + … + β_k x_2k + ε_2
    ⋮
    y_n = β_1 x_n1 + β_2 x_n2 + … + β_k x_nk + ε_n   (5)

Next, we note that the left hand side and the error terms can be compactly expressed as vectors y = [y_1 y_2 … y_n]' and ε = [ε_1 ε_2 … ε_n]'. For a given individual i, the right hand side (minus the error term) can be written as an inner product of vectors:

    β_1 x_i1 + β_2 x_i2 + … + β_k x_ik = x_i'β,
    where x_i = [x_i1 x_i2 … x_ik]' and β = [β_1 β_2 … β_k]'   (6)

We can now write the entire system of linear equations as:

    y = Xβ + ε,  where X stacks the row vectors x_i':

        | x_1' |   | x_11 x_12 ... x_1k |
    X = | x_2' | = | x_21 x_22 ... x_2k |
        | ...  |   | ...                |
        | x_n' |   | x_n1 x_n2 ... x_nk |   (7)

Assumptions

The CLRM rests on 4 key assumptions. An optional 5th assumption (normality of the error term) can be added in cases where a fully parametric approach is desired or needed.

Assumption 1: The stipulated relationship is linear in parameters (i.e. in β).

Note: This does not mean that the elements of the data matrix X also have to be linear. It's OK to have log-terms, squared terms, etc. in the X matrix. Certain nonlinear forms of y are also permissible, as long as they can be transformed to linearity. For example, the following common model is a permissible CLRM:

    y_i = exp(x_i'β + ε_i)  ⇔  ln(y_i) = x_i'β + ε_i   (8)

The log-transformation on both sides preserves linearity. If all of the x's are logged as well, the model is called "log-log" or "log-linear". If all of the x's are linear, the model is called "semi-log". Naturally, a mix of logged and linear regressors is also possible. As we will see below, logging either side changes the interpretation of marginal effects.

Assumption 2: The data matrix X has to be "full rank". X has dimensions n x k, n > k, so "full rank" implies that X is rank k. In words: all variables in X are linearly independent. If this wasn't the case, there would be an infinite number of solutions for the estimate of β. Example: see R script mod1_b.

Assumption 3: The expectation of all error terms, conditional on X, is zero, i.e.

    E(ε | X) = [E(ε_1 | X), E(ε_2 | X), …, E(ε_n | X)]' = 0   (9)

In words, this implies that no element of the data matrix contains any information about the expectation of any error term, so ε is a true vector of "unknown" or "unobservable" effects. If this assumption is violated, our estimated parameters will "pick up" undesired effects from "the stuff in the error term", thus producing misleading results. This is the infamous "omitted variable (OV)" problem, which comes in many sizes & shapes. We will talk about OVs at length later in this course. As discussed in Greene, ch. 2, Assumption 3 also implies the following relationships:
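Assumption 2 is easy to check numerically: if one column of X is a linear combination of other columns, X'X is singular and the normal equations have no unique solution. A small sketch with hypothetical data (not the data from script mod1_b):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_ok = np.column_stack([np.ones(n), x1, x2])       # full rank: rank 3 = k
X_bad = np.column_stack([np.ones(n), x1, 2 * x1])  # third column = 2 * second column

print(np.linalg.matrix_rank(X_ok))   # k = 3: Assumption 2 holds
print(np.linalg.matrix_rank(X_bad))  # 2 < k: Assumption 2 violated

# X'X is then singular, so its inverse does not exist in any usable sense:
print(np.linalg.cond(X_bad.T @ X_bad))  # enormous condition number
```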

    E[ε] = 0,  Cov[ε, X] = 0,  E[y | X] = Xβ   (10)

Assumption 3 is often mis-interpreted and can be confusing. Here is a closer look. First, we need to distinguish between the distribution of the error term and the distribution of the explanatory variables. In the CLRM, we assume that each observation (say a given individual i) is assigned her own error term, ε_i. In addition, error terms are independent from each other. Thus, we can focus on one error term at a time when examining the underpinnings of Assumption 3. In the CLRM, by Assumption 4, this error term follows a distribution with mean zero and variance σ², and all error terms ε_i, i = 1…n, follow the exact same distribution. Thus we can think of all error terms as realizations from the same distribution. However, we will later encounter models where each ε_i has its own distribution (i.e. a violation of Assumption 4), but Assumption 3 may still hold. Thus, for this discussion of Assumption 3, we will make no further assumption regarding the exact distribution of ε_i, and how it relates to the distribution of ε_j, j ≠ i, other than invoking independence. In other words, it's OK to think of each ε_i as following its own distribution.

Now consider some explanatory variable x_k (example: "SAT score"). Generally, we consider each explanatory variable as a true random variable in the statistical sense, following some underlying distribution, potentially jointly with other regressors. Individual i's realization of this variable is x_ik. The set of realizations of x_ik for the sample at hand is vector x_k. For cross-sectional regression, where each observation corresponds to a different individual (or "cross-sectional unit"), it makes sense to think of each x_ik, i = 1…n in x_k as a realization from the population distribution f(x_k). This distribution is shared by all individuals in the sample. In time series analysis this is not as clear-cut, since x_tk (example: water use in month t) could feasibly follow a different distribution than x_(t−1)k (water use in month t−1). Thus, as with the error term, we can assume without loss of generality that each x_ik, i = 1…n, k = 1…K, follows its own (unspecified) distribution within the context of this discussion.

So let's first focus on a single draw of the error term, ε_i, and a single draw of some regressor for person i, x_ik. For simplicity, let's also assume that there are no other explanatory variables in the model (so we don't have to worry about potential correlation of x_ik with x_il, l ≠ k). This means we can drop the "k" subscript for now. To talk about conditional expectations in a meaningful way, we must assume that ε_i and x_i (potentially) follow a joint distribution, say f(ε_i, x_i). This joint density can always be written as a product of a marginal and a conditional density, i.e.

    f(ε_i, x_i) = f(ε_i | x_i) f(x_i) = f(x_i | ε_i) f(ε_i)   (11)

The marginal density, marginal (unconditional) expectation, and conditional expectation of ε_i are given as follows:

    f(ε_i) = ∫ f(ε_i, x_i) dx_i
    E(ε_i) = ∫ ε_i f(ε_i) dε_i
    E(ε_i | x_i) = ∫ ε_i f(ε_i | x_i) dε_i   (12)

We can now derive an important relationship, called the "Law of Iterated Expectations" (see Greene p. 16 and p. 17):

    E(ε_i) = ∫ ε_i f(ε_i) dε_i = ∫ ε_i [ ∫ f(ε_i, x_i) dx_i ] dε_i = ∫ ε_i [ ∫ f(ε_i | x_i) f(x_i) dx_i ] dε_i
           = ∫ [ ∫ ε_i f(ε_i | x_i) dε_i ] f(x_i) dx_i = E_x[ E(ε_i | x_i) ]   (13)

This says that the unconditional expectation of ε_i can be interpreted as the expectation, over x, of the conditional expectation of ε_i.

Now back to Assumption 3: It implies that E(ε_i | x_i) = 0. By (13), this automatically implies that the unconditional expectation is zero as well, since E(ε_i) = E_x[E(ε_i | x_i)] = E_x[0] = 0.

Assumption 3 is often interpreted as "ε_i and x_i are uncorrelated", i.e. have a covariance of zero. Let's check if this is correct, i.e. if E(ε_i | x_i) = 0 ⇒ cov(ε_i, x_i) = 0:

    cov(ε_i, x_i) = E[(ε_i − E(ε_i))(x_i − E(x_i))]
                  = E(ε_i x_i) − E(ε_i) E(x_i)
                  = E_x[ x_i E(ε_i | x_i) ] − E_x[ E(ε_i | x_i) ] E(x_i)   (14)

where the last line applies the Law of Iterated Expectations to both terms.

Naturally, if E(ε_i | x_i) = E(ε_i) = 0, the last line in (14) will always equal zero, so indeed we have cov(ε_i, x_i) = 0. Note that the last line also corresponds to Greene's Theorem B.2 on p. 17: cov(ε_i, x_i) = cov(x_i, E(ε_i | x_i)).

Is the reverse true, i.e. does cov(ε_i, x_i) = 0 ⇒ E(ε_i | x_i) = 0 hold? It will, unless x_i = E(x_i), in which case

    cov(ε_i, x_i) = E_x[ x_i E(ε_i | x_i) ] − E(x_i) E_x[ E(ε_i | x_i) ] = 0   (15)

regardless of the value of E(ε_i | x_i). Naturally, this can only hold for the constant term in a given regression model.

From Assumption 3 it also follows that

    E(y_i | x_i) = E(x_i'β + ε_i | x_i) = x_i'β + E(ε_i | x_i) = x_i'β   (16)

i.e. we have indeed a true regression relationship. For this reason Assumption 3 is often referred to as the "regression" assumption.

Now let's go a step further and assume that there are two explanatory variables in the model, x_ik and x_il, and that these two regressors are potentially correlated with each other and with the error term. (For example, x_il could be "first year GPA", x_ik could be "SAT score", and both could, in theory, be correlated with the unobserved variable "ability", or "spunk".) Thus, we are now considering the tri-variate density f(ε_i, x_ik, x_il). This does not pose any additional complications. Assumption 3 simply requires that E(ε_i | x_ik, x_il) = 0. As before, this implies E(ε_i) = 0, since

    E(ε_i | x_ik, x_il) = ∫ ε_i f(ε_i | x_ik, x_il) dε_i = 0  ⇒  E(ε_i) = E_{x_k, x_l}[ E(ε_i | x_ik, x_il) ] = 0   (17)

By analogy to our derivation above, the covariance between ε_i and each of the two regressors must be zero as well.

Pushing this one last step further, we now extend Assumption 3 also to observations on regressors for other individuals in the sample. Consider a second individual, j, for whom we observe realizations for the same two regressors, x_jk and x_jl. This simply extends the exposition to a five-variate joint density, i.e. f(ε_i, x_ik, x_il, x_jk, x_jl), and Assumption 3 requires that

    E(ε_i | x_ik, x_il, x_jk, x_jl) = ∫ ε_i f(ε_i | x_ik, x_il, x_jk, x_jl) dε_i = 0

All other results follow by analogy. Thus, we can compactly express Assumption 3 for any generic CLRM as shown in (9).

R script mod1_c illustrates the implications of Assumption 3.

Assumption 4: Homoskedasticity (= equal variance) of error terms:

    V(ε_i) = σ²,  i = 1…n   (18)

So we assume all n error terms are "drawn" from the same distribution with mean 0 and variance σ². Error terms are uncorrelated ("non-autocorrelated"):

    Cov(ε_i, ε_j) = E(ε_i ε_j) − E(ε_i) E(ε_j) = E(ε_i ε_j) = 0,  i ≠ j   (19)

Violations of these assumptions are called "heteroskedasticity" and "autocorrelation", respectively. We'll address these issues later in this course. For the full model, Assumption 4 can be written as

                    | σ²  0  ...  0  |
    V(ε) = E(εε') = | 0   σ² ...  0  | = σ² I_n   (20)
                    | ...            |
                    | 0   0  ...  σ² |

Disturbances that satisfy this property are called "spherical". For the dependent variable, this implies:

    V(y | X) = V(Xβ + ε) = V(Xβ) + V(ε) + 2 Cov(Xβ, ε) = V(ε) = σ² I   (21)

Assumption 5: The error terms follow a normal distribution, i.e.

    ε_i ~ n(0, σ²)  or  ε ~ n(0, σ² I)   (22)

For the dependent variable, this implies:

    y ~ n(Xβ, σ² I)   (23)

This normality assumption is not necessary to derive parameter estimates for β, but it is needed to derive some exact statistical results for estimators, and to construct certain test statistics.
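Equation (20) says the error vector's covariance matrix is σ² I_n. A quick Monte Carlo check (illustrative only, not one of the course scripts): draw the n-vector ε many times and compare the sample covariance of the draws to σ² I.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 4, 2.0, 200_000
# Each row of eps is one independent draw of the n-vector of spherical errors.
eps = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

V = np.cov(eps, rowvar=False)  # sample covariance across the draws
print(np.round(V, 2))          # approximately sigma^2 * I: ~2 on the diagonal, ~0 off it
```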

The statistical properties of X

In some applications, the elements of the observed data X can be viewed as fixed (e.g. in experimental settings where all "data" are perfectly controlled), but in most cases X itself will follow some distribution in the underlying population. If so, we view the entire analysis as "conditional on X", which means we essentially "freeze" X at the observed values. We understand that a different X might produce different results, but we hope that these differences would vanish with (hypothetical) collection of more and more data. What's important is Assumption 3: whichever mechanism generates X is unrelated to the mechanism that generates the error terms.

Estimation

The population model at the observation level is given by (4). We seek an estimator for β with desirable statistical properties. We will denote this estimator generically as b, and the resulting estimate for E[y_i | x_i] as ŷ_i, i.e.

    Ê[y_i | x_i] = ŷ_i = x_i'b   (24)

The term ŷ_i is often called the "fitted value". The difference between the actually observed y_i and the estimated (or "predicted") ŷ_i is called the "residual" e_i, i.e.

    e_i = y_i − ŷ_i = y_i − x_i'b   (25)

In passing, we note that this implies the following equality:

    y_i = x_i'β + ε_i = x_i'b + e_i  and  y = Xβ + ε = Xb + e   (26)

To find b, a natural strategy might be to get ŷ_i as close to y_i as possible for all observations. By (25) this implies "making the residuals as small as possible". A more popular criterion is to minimize the sum of squared residuals, which penalizes larger residuals relatively more. See slide 2 of the PowerPoint slides "OLSregression" on our course web page, and also Greene Fig. 3.1.

Denoting a candidate for b as b0, our optimization goal is thus given as

    min_{b0} e0'e0 = (y − Xb0)'(y − Xb0) = y'y − b0'X'y − y'Xb0 + b0'X'Xb0
                   = y'y − 2 b0'X'y + b0'X'Xb0   (27)

We solve for b by deriving the First Order Conditions (FOC), using "matrix calculus" (see Matrix Algebra notes):

    ∂(e0'e0)/∂b0 = −2X'y + 2X'Xb0 = 0
    X'Xb = X'y   (28)

The second line is often referred to as the Least Squares "Normal Equation". If X is full rank, X'X will be symmetric and full rank as well, thus its inverse will exist, and we can solve (28) for b to obtain

    b = (X'X)⁻¹ X'y   (29)

As shown in Greene, this solution is indeed a minimum and unique, since the second derivative yields a positive definite matrix. The solution is called the Ordinary Least Squares (OLS) estimator. See script mod1_b for an example of OLS using wage data.

Orthogonality and Projection

In Euclidean space, "orthogonality" between vectors a and b (some b, not our OLS estimator) arises if the two vectors form a right angle. Mathematically, this implies that their inner product is zero, i.e. a'b = 0. Orthogonality is also possible between a matrix X and a conformable vector a. In that case, it implies that every column of X is orthogonal to a, and that X'a = 0.

In contrast, population orthogonality between two variables implies that the two variables are perfectly uncorrelated for the entire population, in the sense that the expectation of their inner product is zero (essentially our Assumption 3 for the CLRM for any regressor x and the error term). However, for any finite set of draws for these variables (i.e. for a given sample), orthogonality in the Euclidean sense is unlikely to hold (though we would expect the inner product to be relatively close to zero).

Perfect (Euclidean) orthogonality between explanatory variables is generally a very desirable property in regression analysis. In reality, it hardly ever occurs (the best we can hope for is "low correlation"), but in experimental settings researchers can "design" orthogonal explanatory variables. We will visit the topic of "orthogonality" many times in this course.

For the OLS model, we have orthogonality between the data matrix X and the residuals by construction. Starting with the normal equation, we have:

    X'Xb = X'y  ⇒  X'(y − Xb) = X'e = 0   (30)

Intuitively, this makes a lot of sense: the residuals should only contain what X couldn't explain about y.

If the first column of X is a vector of 1's (i.e. we are estimating a regression model with a constant term, the standard case), we gain a few more interesting insights from the normal equation and relationships that flow from it:

(i) The residuals sum to zero. Writing X = [ι c_2 … c_k], where ι is the column of ones and c_2 … c_k are the remaining columns:

    X'e = [ι'e, c_2'e, …, c_k'e]' = 0  ⇒  ι'e = Σ_i e_i = 0   (31)

(ii) The regression hyperplane passes through the data means:
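The closed form (29) and the orthogonality results (30)–(31) can be verified in a few lines. The sketch below uses simulated data (names and values are illustrative; the course demonstrates OLS on wage data in script mod1_b):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, -2.0, 0.5])     # illustrative population parameters
y = X @ beta + rng.normal(0, 1, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # b = (X'X)^{-1} X'y via the normal equations
e = y - X @ b                          # residuals

print(b)         # close to beta
print(X.T @ e)   # ~0: X'e = 0 by construction
print(e.sum())   # ~0: residuals sum to zero, since the first column of X is 1's
```

Note that `np.linalg.solve` on the normal equations is preferred to forming the inverse explicitly, for numerical stability.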

With X = [ι c_2 … c_k], the first row of the normal equations X'Xb = X'y reads

    [n, Σ_i c_i2, …, Σ_i c_ik] b = Σ_i y_i

Dividing by n gives

    [1, c̄_2, …, c̄_k] b = ȳ,  i.e.  x̄'b = ȳ,  where x̄ = [1, c̄_2, …, c̄_k]'   (32)

Next, let's derive two important matrices in regression analysis:

    e = y − Xb = y − X(X'X)⁻¹X'y = (I − X(X'X)⁻¹X') y = My   and
    ŷ = y − e = (I − M) y = X(X'X)⁻¹X'y = Py   (33)

In regression jargon, M is known as the "residual maker" and P is the "projection matrix". Note that both matrices are a function of X. So with each new X, we get a new M and P. Also, both matrices are symmetric and idempotent (idempotent means M*M = M). Intuitively, just remember that M turns y into "everything X couldn't explain", i.e. the residuals, and P turns y into "everything X can explain", i.e. the fitted values.

Some useful relationships that follow:

    MX = 0,  PX = X,  PM = MP = 0,  y = Py + My,
    e'e = e'y,  y'y = ŷ'ŷ + e'e   (34)

Partitioned Regression and Partial Regression

R script mod1_d

Question: What computations are involved in obtaining, in isolation, a subset of the coefficients of a multiple regression model (while still controlling for all other effects)? In the early days of regression analysis, the resulting strategy of "partitioned regression" was an important "shortcut" to reduce computation time. Today, we show this primarily to gain further insights into the mechanics of OLS.

First, assume we have data matrix X and corresponding coefficient vector β, but we are primarily interested in the effects of a subset of variables (columns) of X. Denote this subset as X_2 and the remaining variables as X_1. Conformably, also split the coefficient vector into parts β_2 and β_1. We can then write the CLRM as:

    y = Xβ + ε = X_1 β_1 + X_2 β_2 + ε   (35)
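The properties in (33)–(34) are mechanical facts about M and P for any full-rank X, so they can be confirmed numerically (a sketch with arbitrary simulated X and y):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection matrix: Py = fitted values
M = np.eye(n) - P                     # residual maker:    My = residuals

assert np.allclose(M, M.T) and np.allclose(P, P.T)      # both symmetric
assert np.allclose(M @ M, M) and np.allclose(P @ P, P)  # both idempotent
assert np.allclose(M @ X, 0) and np.allclose(P @ X, X)  # MX = 0, PX = X
assert np.allclose(P @ M, 0)                            # PM = MP = 0
assert np.allclose(P @ y + M @ y, y)                    # y = Py + My
print("all relationships in (33)-(34) confirmed")
```

(Forming the full n-by-n matrices is fine for a demo, but wasteful for large n; in practice one works with Xb and y − Xb directly.)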

Our objective is to find the OLS solution for b_2. The normal equations can now be written as

    | X_1'X_1  X_1'X_2 | | b_1 |   | X_1'y |
    | X_2'X_1  X_2'X_2 | | b_2 | = | X_2'y |   (36)

First, use the first set of equations to solve for b_1 in terms of X_1, X_2, y, and b_2:

    X_1'X_1 b_1 + X_1'X_2 b_2 = X_1'y  ⇒  b_1 = (X_1'X_1)⁻¹ X_1'(y − X_2 b_2)   (37)

Note that if all columns of X_1 are orthogonal to all columns of X_2 (a rare occurrence, to say the least...), X_1'X_2 = 0, and we have the usual OLS normal equations, and b_1 = (X_1'X_1)⁻¹ X_1'y. This implies that we could then estimate b_1 simply by regressing y on X_1. Intuitively, this further implies that there is no correlation between X_1 and X_2, so we don't have to "control" for the effects of X_2 when estimating the effects of X_1 on y. (Note: However, we would still lose efficiency and explanatory power for our model if X_2 has something to say about y.)

Combining (37) with the second set of normal equations from (36) yields the solution for b_2:

    X_2'X_1 b_1 + X_2'X_2 b_2 = X_2'y
    X_2'X_1 (X_1'X_1)⁻¹ X_1'(y − X_2 b_2) + X_2'X_2 b_2 = X_2'y
    b_2 = (X_2'M_1 X_2)⁻¹ X_2'M_1 y,  where M_1 = I − X_1(X_1'X_1)⁻¹X_1'   (38)

where M_1 is the residual maker matrix in a regression of anything on X_1. Note again that this reduces to the standard OLS formula if X_1'X_2 = 0. We can alternatively write the partitioned regression as

    b_2 = (X_2*'X_2*)⁻¹ X_2*'y*,  where X_2* = M_1 X_2,  y* = M_1 y   (39)

Thus, we can think of this as a two-step process: In step 1 we regress all columns of X_2 on X_1 and collect the residuals (i.e. get M_1 X_2). These residuals will then contain what's left of the information in X_2 after netting out the effect of X_1. We do the same for y by regressing it on X_1 and collecting residuals. In step 2 we then regress the purged-of-X_1-effects version of y on the purged-of-X_1-effects version of X_2 to get the pure effect of X_2 on y, i.e. b_2. This is the famous "Frisch-Waugh Theorem" (see Greene).

The process can easily be extended to derive the partitioned result for an individual coefficient. In that case, just think of X_2 as a single column (say "z"), and of b_2 as a scalar, say "c". We then get

    c = (z*'z*)⁻¹ z*'y*,  where z* = M_1 z,  y* = M_1 y   (40)

If you consider z to be a new variable to be added to X_1, you get the result in Corollary 3.2.1, p. 34.
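The two-step result (38)–(39) is easy to verify: the coefficient on X_2 from the full regression equals the coefficient from regressing M_1 y on M_1 X_2. A minimal sketch with simulated data (the course's R script mod1_d performs the same comparison):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])       # intercept + one regressor
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)    # deliberately correlated with X1
y = X1 @ np.array([1.0, 2.0]) + X2[:, 0] * (-1.5) + rng.normal(size=n)

# Full OLS on [X1 X2]
X = np.hstack([X1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partitioned: purge X1 from both X2 and y, then regress residuals on residuals
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
X2s, ys = M1 @ X2, M1 @ y
b2 = np.linalg.solve(X2s.T @ X2s, X2s.T @ ys)

print(b_full[-1], b2[0])  # identical up to floating-point error
```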
R script mod1_d illustrates the partitioned regression approach. Note that the estimated standard deviation for the error term, and thus the standard errors and t-values for the partitioned regression results, are a bit different from those obtained in the full model. This difference lies solely in the (n−k) correction in the denominator of the s² formula, since "k" differs between the full and partitioned model. Naturally, this difference diminishes with increasing sample size, as n will vastly outweigh k in either case. (You can verify this by dropping the "minus k" correction in the s² formula for both models; the s.e.'s and t-values will then be identical.)

Goodness of Fit and Analysis of Variance (ANOVA)

A special case of the residual-maker matrix M_1 is the (n by n) deviation-from-the-mean matrix M⁰:

    M⁰ = I − ι(ι'ι)⁻¹ι' = I − (1/n) ιι'   (41)

where I is the n by n identity matrix and ι is an n by 1 vector of ones. Intuitively, this matrix "regresses" any vector, say x, or matrix, say X, against a column of ones and captures the resulting residuals. These residuals will simply be the deviation of each observation in x or X from its respective column mean. Example:

    M⁰x = [x_1 − x̄, x_2 − x̄, …, x_n − x̄]'
    M⁰X = M⁰[c_1 c_2 … c_k] = [c_1 − ῑc̄_1, c_2 − ῑc̄_2, …, c_k − ῑc̄_k]   (42)

We can now compactly write our CLRM in "deviation from the mean" form as

    M⁰y = M⁰Xb + M⁰e = M⁰Xb + e   (43)

The second equality follows from the fact that the mean of the residual vector is zero, i.e. M⁰e = e (iff the model includes an intercept). Noting that it then follows that e'M⁰X = e'X = 0, we can derive the sum of squared differences of the elements of y from their mean, known as the "Total Sum of Squares" (SST), as

    y'M⁰y = (M⁰Xb + e)'(M⁰Xb + e) = b'X'M⁰Xb + e'M⁰Xb + b'X'M⁰e + e'e
          = b'X'M⁰Xb + e'e = ŷ'M⁰ŷ + e'e   (44)

If the regression includes an intercept (= constant term), we have mean(ŷ) = ȳ, so ŷ'M⁰ŷ = Σ_i (ŷ_i − ȳ)², and we can interpret (44) as Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares, or SST = SSR + SSE. Note: the last term is also known as the "sum of squared residuals", but since the resulting acronym would also be SSR, we'll stick with Greene's terminology of "Error Sum of Squares".

Intuition: Recall that the fundamental aim of regression analysis is to explain observed variation in y via observed variation in X. Equation (44) says that we can break down the variation in y into two components: a part that's explained by the variation of X (SSR), and a part that our regression cannot explain (SSE). It is thus natural to use the ratio SSR / SST as a Goodness-of-Fit measure for our regression model. This is the "Coefficient of Determination", better known as "R²".
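The decomposition (44) can be checked directly: with an intercept in the model, demeaned fitted variation plus residual variation reproduces the total variation of y (illustrative simulated data):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept included
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b
e = y - yhat

SST = np.sum((y - y.mean()) ** 2)        # y'M0y
SSR = np.sum((yhat - yhat.mean()) ** 2)  # yhat'M0yhat (mean of yhat equals mean of y here)
SSE = np.sum(e ** 2)                     # e'e

print(SST, SSR + SSE)  # equal up to floating-point error
```

Dropping the intercept column breaks the decomposition, which is exactly why R² is only meaningful for models with a constant term (see below).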

    R² = SSR / SST = (b'X'M⁰Xb) / (y'M⁰y) = 1 − (e'e) / (y'M⁰y)   (45)

This statistic lies between 0 and 1. A value of "0" implies that X has nothing to say about y. In a 2-dimensional model, this would be a flat regression line through the mean of y. A value of 1 implies a "perfect fit": y actually lies in the hyperplane described by the columns of X.

As discussed in more detail in Greene, Ch. 3, a problem with this fit measure is that it never declines as more explanatory variables, even meaningless ones, are added to the model. In other words, the conventional R² measure doesn't penalize for superfluous regressors, or, alternatively, doesn't reward parsimony. We therefore prefer the "Adjusted R²" given as

    R̄² = 1 − (e'e / (n − k)) / (y'M⁰y / (n − 1)) = 1 − ((n − 1)/(n − k)) (e'e) / (y'M⁰y)   (46)

Thus, as the number of regressors (including intercept) k increases relative to sample size n, the last term increases, and the adjusted fit deteriorates. The ordinary and adjusted R² are related as follows:

    R̄² = 1 − ((n − 1)/(n − k)) (1 − R²)   (47)

Note that the preceding derivations break down when the regression model does not include a constant term. For models without a constant term, the interpretation of R² becomes ambiguous, and this measure should not be used. Instead, you may want to consider the following alternative goodness-of-fit measures, the Akaike Information Criterion (AIC) and the Schwarz or Bayesian Information Criterion (BIC). Both work for models without a constant term and, for that matter, also for nonlinear regression models. They also reward parsimony. They are usually given in log-form as follows:

    log AIC = log(e'e / n) + 2K/n
    log BIC = log(e'e / n) + K log(n)/n   (48)

(see Greene ch. 5 for more details). Note that both statistics DECLINE as model fit improves. (So a smaller number implies a better fit.)

Some software packages produce a summary of squared deviations in a table accompanying the main regression results. This table is often referred to as the "Analysis of Variance" (ANOVA) table. In STATA for example, "Model" SS means SSR, "Residual" SS means SSE, and "Total" SS means SST. See script mod1_d for basic goodness-of-fit derivations.
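Continuing the previous sketch's setup, the fit measures (45)–(48) amount to a few lines of arithmetic on e'e and y'M⁰y (simulated data; variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(13)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
SST = np.sum((y - y.mean()) ** 2)  # y'M0y

R2 = 1 - (e @ e) / SST                           # eq. (45)
R2_adj = 1 - ((n - 1) / (n - k)) * (1 - R2)      # eq. (47)
log_aic = np.log(e @ e / n) + 2 * k / n          # eq. (48), K = k here
log_bic = np.log(e @ e / n) + k * np.log(n) / n

print(R2, R2_adj, log_aic, log_bic)
```

Adding a column of pure noise to X would raise R² slightly but (typically) lower R̄² and raise AIC/BIC, which is the penalization the text describes.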
Finally, if you wish to use R² to choose between two models, the following must hold:

1. The vector of dependent observations y must be identical for both models. (So you can't compare a model that is linear in y to one that uses, say, log y. Also, sample sizes must be identical.)

2. The models have to be linear-in-parameters. For nonlinear regression models use AIC, BIC, or related measures.

3. Both models must have a constant term, and the mean of the error term in the population model must be zero.


More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression Chapter - Multple Regresson Chapter - Multple Regresson Multple Regresson Model The equaton that descrbes how the dependent varable y s related to the ndependent varables x, x,... x p and an error term

More information

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10) I. Defnton and Problems Econ7 Appled Econometrcs Topc 9: Heteroskedastcty (Studenmund, Chapter ) We now relax another classcal assumpton. Ths s a problem that arses often wth cross sectons of ndvduals,

More information

Chapter 4: Regression With One Regressor

Chapter 4: Regression With One Regressor Chapter 4: Regresson Wth One Regressor Copyrght 2011 Pearson Addson-Wesley. All rghts reserved. 1-1 Outlne 1. Fttng a lne to data 2. The ordnary least squares (OLS) lne/regresson 3. Measures of ft 4. Populaton

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

β0 + β1xi. You are interested in estimating the unknown parameters β

β0 + β1xi. You are interested in estimating the unknown parameters β Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Analytcs The SLR Setup Sample Statstcs Ordnary Least Squares (OLS): FOCs and SOCs Back to OLS and Sample Statstcs Predctons (and Resduals) wth OLS

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

Lecture 3 Specification

Lecture 3 Specification Lecture 3 Specfcaton 1 OLS Estmaton - Assumptons CLM Assumptons (A1) DGP: y = X + s correctly specfed. (A) E[ X] = 0 (A3) Var[ X] = σ I T (A4) X has full column rank rank(x)=k-, where T k. Q: What happens

More information

However, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values

However, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values Fall 007 Soluton to Mdterm Examnaton STAT 7 Dr. Goel. [0 ponts] For the general lnear model = X + ε, wth uncorrelated errors havng mean zero and varance σ, suppose that the desgn matrx X s not necessarly

More information

Lecture 4 Hypothesis Testing

Lecture 4 Hypothesis Testing Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to

More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

Correlation and Regression

Correlation and Regression Correlaton and Regresson otes prepared by Pamela Peterson Drake Index Basc terms and concepts... Smple regresson...5 Multple Regresson...3 Regresson termnology...0 Regresson formulas... Basc terms and

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

18. SIMPLE LINEAR REGRESSION III

18. SIMPLE LINEAR REGRESSION III 8. SIMPLE LINEAR REGRESSION III US Domestc Beers: Calores vs. % Alcohol Ftted Values and Resduals To each observed x, there corresponds a y-value on the ftted lne, y ˆ ˆ = α + x. The are called ftted values.

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Lecture 2: Prelude to the big shrink

Lecture 2: Prelude to the big shrink Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120a-style regresson

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Chapter 7 Generalized and Weighted Least Squares Estimation. In this method, the deviation between the observed and expected values of

Chapter 7 Generalized and Weighted Least Squares Estimation. In this method, the deviation between the observed and expected values of Chapter 7 Generalzed and Weghted Least Squares Estmaton The usual lnear regresson model assumes that all the random error components are dentcally and ndependently dstrbuted wth constant varance. When

More information

Chapter 14 Simple Linear Regression

Chapter 14 Simple Linear Regression Chapter 4 Smple Lnear Regresson Chapter 4 - Smple Lnear Regresson Manageral decsons often are based on the relatonshp between two or more varables. Regresson analss can be used to develop an equaton showng

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 8. SIMPLE LINEAR REGRESSION III Ftted Values and Resduals US Domestc Beers: Calores vs. % Alcohol To each observed x, there corresponds a y-value on the ftted lne, y ˆ = βˆ + βˆ x. The are called ftted

More information

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and σ 2 e = variance of (Y X).

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and σ 2 e = variance of (Y X). 11.4.1 Estmaton of Multple Regresson Coeffcents In multple lnear regresson, we essentally solve n equatons for the p unnown parameters. hus n must e equal to or greater than p and n practce n should e

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION Smple Lnear Regresson and Correlaton Introducton Prevousl, our attenton has been focused on one varable whch we desgnated b x. Frequentl, t s desrable to learn somethng about the relatonshp between two

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

A Comparative Study for Estimation Parameters in Panel Data Model

A Comparative Study for Estimation Parameters in Panel Data Model A Comparatve Study for Estmaton Parameters n Panel Data Model Ahmed H. Youssef and Mohamed R. Abonazel hs paper examnes the panel data models when the regresson coeffcents are fxed random and mxed and

More information

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε Chapter 3 Secton 3.1 Model Assumptons: Multple Regresson Model Predcton Equaton Std. Devaton of Error Correlaton Matrx Smple Lnear Regresson: 1.) Lnearty.) Constant Varance 3.) Independent Errors 4.) Normalty

More information

Professor Chris Murray. Midterm Exam

Professor Chris Murray. Midterm Exam Econ 7 Econometrcs Sprng 4 Professor Chrs Murray McElhnney D cjmurray@uh.edu Mdterm Exam Wrte your answers on one sde of the blank whte paper that I have gven you.. Do not wrte your answers on ths exam.

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

7.1. Single classification analysis of variance (ANOVA) Why not use multiple 2-sample 2. When to use ANOVA

7.1. Single classification analysis of variance (ANOVA) Why not use multiple 2-sample 2. When to use ANOVA Sngle classfcaton analyss of varance (ANOVA) When to use ANOVA ANOVA models and parttonng sums of squares ANOVA: hypothess testng ANOVA: assumptons A non-parametrc alternatve: Kruskal-Walls ANOVA Power

More information

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson

More information

Introduction to Dummy Variable Regressors. 1. An Example of Dummy Variable Regressors

Introduction to Dummy Variable Regressors. 1. An Example of Dummy Variable Regressors ECONOMICS 5* -- Introducton to Dummy Varable Regressors ECON 5* -- Introducton to NOTE Introducton to Dummy Varable Regressors. An Example of Dummy Varable Regressors A model of North Amercan car prces

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Chat eld, C. and A.J.Collins, Introduction to multivariate analysis. Chapman & Hall, 1980

Chat eld, C. and A.J.Collins, Introduction to multivariate analysis. Chapman & Hall, 1980 MT07: Multvarate Statstcal Methods Mke Tso: emal mke.tso@manchester.ac.uk Webpage for notes: http://www.maths.manchester.ac.uk/~mkt/new_teachng.htm. Introducton to multvarate data. Books Chat eld, C. and

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 14 Multiple Regression Models

Statistics for Managers Using Microsoft Excel/SPSS Chapter 14 Multiple Regression Models Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 14 Multple Regresson Models 1999 Prentce-Hall, Inc. Chap. 14-1 Chapter Topcs The Multple Regresson Model Contrbuton of Indvdual Independent Varables

More information

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis Resource Allocaton and Decson Analss (ECON 800) Sprng 04 Foundatons of Regresson Analss Readng: Regresson Analss (ECON 800 Coursepak, Page 3) Defntons and Concepts: Regresson Analss statstcal technques

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

[ ] λ λ λ. Multicollinearity. multicollinearity Ragnar Frisch (1934) perfect exact. collinearity. multicollinearity. exact

[ ] λ λ λ. Multicollinearity. multicollinearity Ragnar Frisch (1934) perfect exact. collinearity. multicollinearity. exact Multcollnearty multcollnearty Ragnar Frsch (934 perfect exact collnearty multcollnearty K exact λ λ λ K K x+ x+ + x 0 0.. λ, λ, λk 0 0.. x perfect ntercorrelated λ λ λ x+ x+ + KxK + v 0 0.. v 3 y β + β

More information

Now we relax this assumption and allow that the error variance depends on the independent variables, i.e., heteroskedasticity

Now we relax this assumption and allow that the error variance depends on the independent variables, i.e., heteroskedasticity ECON 48 / WH Hong Heteroskedastcty. Consequences of Heteroskedastcty for OLS Assumpton MLR. 5: Homoskedastcty var ( u x ) = σ Now we relax ths assumpton and allow that the error varance depends on the

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction ECONOMICS 35* -- NOTE 7 ECON 35* -- NOTE 7 Interval Estmaton n the Classcal Normal Lnear Regresson Model Ths note outlnes the basc elements of nterval estmaton n the Classcal Normal Lnear Regresson Model

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

Chapter 14 Simple Linear Regression Page 1. Introduction to regression analysis 14-2

Chapter 14 Simple Linear Regression Page 1. Introduction to regression analysis 14-2 Chapter 4 Smple Lnear Regresson Page. Introducton to regresson analyss 4- The Regresson Equaton. Lnear Functons 4-4 3. Estmaton and nterpretaton of model parameters 4-6 4. Inference on the model parameters

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

[The following data appear in Wooldridge Q2.3.] The table below contains the ACT score and college GPA for eight college students.

[The following data appear in Wooldridge Q2.3.] The table below contains the ACT score and college GPA for eight college students. PPOL 59-3 Problem Set Exercses n Smple Regresson Due n class /8/7 In ths problem set, you are asked to compute varous statstcs by hand to gve you a better sense of the mechancs of the Pearson correlaton

More information

Systems of Equations (SUR, GMM, and 3SLS)

Systems of Equations (SUR, GMM, and 3SLS) Lecture otes on Advanced Econometrcs Takash Yamano Fall Semester 4 Lecture 4: Sstems of Equatons (SUR, MM, and 3SLS) Seemngl Unrelated Regresson (SUR) Model Consder a set of lnear equatons: $ + ɛ $ + ɛ

More information

Chapter 12 Analysis of Covariance

Chapter 12 Analysis of Covariance Chapter Analyss of Covarance Any scentfc experment s performed to know somethng that s unknown about a group of treatments and to test certan hypothess about the correspondng treatment effect When varablty

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Empirical Methods for Corporate Finance. Identification

Empirical Methods for Corporate Finance. Identification mprcal Methods for Corporate Fnance Identfcaton Causalt Ultmate goal of emprcal research n fnance s to establsh a causal relatonshp between varables.g. What s the mpact of tangblt on leverage?.g. What

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

LECTURE 9 CANONICAL CORRELATION ANALYSIS

LECTURE 9 CANONICAL CORRELATION ANALYSIS LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

β0 + β1xi. You are interested in estimating the unknown parameters β

β0 + β1xi. You are interested in estimating the unknown parameters β Revsed: v3 Ordnar Least Squares (OLS): Smple Lnear Regresson (SLR) Analtcs The SLR Setup Sample Statstcs Ordnar Least Squares (OLS): FOCs and SOCs Back to OLS and Sample Statstcs Predctons (and Resduals)

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

If we apply least squares to the transformed data we obtain. which yields the generalized least squares estimator of β, i.e.,

If we apply least squares to the transformed data we obtain. which yields the generalized least squares estimator of β, i.e., Econ 388 R. Butler 04 revsons lecture 6 WLS I. The Matrx Verson of Heteroskedastcty To llustrate ths n general, consder an error term wth varance-covarance matrx a n-by-n, nxn, matrx denoted as, nstead

More information

β0 + β1xi and want to estimate the unknown

β0 + β1xi and want to estimate the unknown SLR Models Estmaton Those OLS Estmates Estmators (e ante) v. estmates (e post) The Smple Lnear Regresson (SLR) Condtons -4 An Asde: The Populaton Regresson Functon B and B are Lnear Estmators (condtonal

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

Statistics Chapter 4

Statistics Chapter 4 Statstcs Chapter 4 "There are three knds of les: les, damned les, and statstcs." Benjamn Dsrael, 1895 (Brtsh statesman) Gaussan Dstrbuton, 4-1 If a measurement s repeated many tmes a statstcal treatment

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for

More information

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information