UNIT 11 MULTIPLE LINEAR REGRESSION
- Hannah Thompson
- 5 years ago
Transcription
Structure
11.1 Introduction
Objectives
11.2 Multiple Linear Regression Model
11.3 Estimation of Model Parameters
Use of Matrix Notation
Properties of Least Squares Estimates
11.4 Test of Significance in Multiple Regression
11.5 Coefficient of Determination ($R^2$) and Adjusted $R^2$
11.6 Regression with Dummy Variables
11.7 Summary
11.8 Solutions/Answers

11.1 INTRODUCTION
In previous units, we discussed the linear relationship between the dependent variable Y and an independent variable X. The coefficients a and b were unknown and, for the given data on Y and X, we obtained the least squares estimates of the parameters, i.e., $\hat{a}$ and $\hat{b}$. We also went through the inferential study to examine whether or not there exists a significant linear relationship between Y and X. We discussed the simple linear regression model and the estimation of model parameters, and determined standard errors.

In this unit, we discuss the multiple linear regression model along with the estimation of parameters in Secs. 11.2 and 11.3. In multiple linear regression, the basic concept is the same as that of simple regression. However, instead of one independent variable, there are several independent variables, say, $X_1, X_2, X_3, \ldots, X_p$. For example, the number of units sold by a car manufacturing company per year may depend not only on one independent variable such as price, but also on mileage per unit of fuel, appearance of the car, comfort level, durability, money spent on advertising, etc. Here we may like to identify the important independent variables, which contribute more to the variation in the dependent variable. For this purpose, a mathematical relationship between the dependent and independent variables is established and this relation is further used for prediction.

We also discuss the inferential study in multiple linear regression in Sec. 11.4. Since the model may involve several independent variables affecting the dependent variable, it may be of interest to estimate their importance by estimating the regression coefficients along with their standard errors.
The adequacy of model fit may be examined by the overall fit of the model with the help of the coefficient of determination ($R^2$). In this unit, we also discuss a method for calculating $R^2$ and adjusted $R^2$ in Sec. 11.5. The regression analysis with dummy variables is also discussed in Sec. 11.6.
In the next unit, we shall discuss how to calculate the extra sum of squares explained by the regressor variables on the response variable. We shall also discuss the methods of selection of important regressor variables, which play an important role in the selection of the best fitted models.

Objectives
After studying this unit, you should be able to:
- explain the concept of multiple linear regression;
- formulate a multiple linear regression model;
- estimate the regression coefficients and their standard errors;
- calculate the coefficient of determination ($R^2$) and adjusted $R^2$; and
- predict the dependent variable for given values of independent variables.

11.2 MULTIPLE LINEAR REGRESSION MODEL
In this section, we generalise the simple regression model considered in Unit 9. We assumed in Unit 9 that $(Y_1, X_1), (Y_2, X_2), \ldots, (Y_n, X_n)$ are n pairs of values. The equation of the simple linear regression model may be written as

$$Y = a + bX + e$$

where e represents the error term, which arises from the difference between the observed Y and the straight line $Y = a + bX$. To minimise the term e, we use the method of least squares. From the above equation, we may write a simple regression model as

$$Y_i = a + bX_i + e_i, \quad i = 1, 2, \ldots, n$$

for the sample data of n pairs given in terms of $(Y_i, X_i)$ $(i = 1, 2, \ldots, n)$.

In agriculture, the crop yield depends on more than one variable, such as fertility of the soil, amount of rainfall, amount of fertilisers, etc. A multiple regression model that might describe this relationship is

$$Y = B_0 + B_1X_1 + B_2X_2 + B_3X_3 + e$$

where Y denotes the yield, $X_1$ denotes the fertility of the soil, $X_2$ denotes the rainfall and $X_3$ denotes the amount of fertilisers used. This is called the multiple linear regression model with three independent/regressor variables. The term linear is used because the dependent/response variable Y is a linear function of the unknown parameters $B_0$, $B_1$, $B_2$ and $B_3$.

In general, the response variable may be related to p regressors or independent variables. Let Y be the dependent variable and $X_1, X_2, \ldots, X_p$ be p independent variables.
Then the multiple regression model can be written as:

$$Y = B_0 + B_1X_1 + B_2X_2 + \ldots + B_pX_p + e \qquad (1)$$

The parameters $B_0, B_1, \ldots, B_p$ are called the regression coefficients. The parameters $B_i$ $(i = 1, 2, \ldots, p)$ represent the expected change in the response variable Y per unit change in $X_i$ when the remaining regressor variables are treated as constant. For the sake of simplicity, we shall attach a dummy variable $X_0$ with the intercept $B_0$; $X_0$ takes the value 1 for all observations. Now the model in equation (1) can be written as:

$$Y = B_0X_0 + B_1X_1 + B_2X_2 + \ldots + B_pX_p + e \qquad (2)$$
The simple regression model considered in Unit 9 becomes a particular case of this model with $X_0 = 1$, $B_0 = a$, $B_1 = b$ and $B_i = 0$ $(i \geq 2)$. The interpretation of the coefficients $B_j$ $(j = 1, 2, \ldots, p)$ is that $B_j$ represents the amount of change in Y for a unit change in $X_j$, keeping the other independent variables $X_k$ $(k \neq j)$ fixed. These coefficients are known as partial regression coefficients, as the effect of one independent variable on the dependent variable is studied while the other variables are held fixed or constant. We use the term multiple linear regression for this model because two or more variables are included in the regression analysis and the parameters $B_0, B_1, \ldots, B_p$ appear in a linear form. Moreover, the effect of these variables can be studied jointly.

Here $X_i$ can be any continuous function such as $\log X$, $X^2$, $X^3$, etc. However, it is necessary that the equation is linear in the parameters. Let us consider a polynomial model

$$Y = B_0 + B_1X + B_2X^2 + \ldots + B_pX^p + e$$

If we let $X_1 = X$, $X_2 = X^2$, $X_3 = X^3$ and so on, the above model can be written in a linear form as given in equation (1). As in the case of simple linear regression (Unit 9), here too we make the assumption that e is normally and independently distributed with mean zero and constant variance $\sigma^2$.

11.3 ESTIMATION OF MODEL PARAMETERS
Recall that in Unit 9, we estimated the parameters a and b of a simple linear regression equation using the method of least squares. In this method, we minimise the total error term, so that the sum of the squares of the differences between the observed values and their expected values is minimum, i.e., the sum of squares of the error terms is minimum. We also use the method of least squares to estimate the regression coefficients given in equation (1).

Let the number of observations be n (> p). Let $y_i$ denote the i-th observed value and $x_{ij}$ denote the j-th observation of the regressor variable $X_i$. The data is represented as given in the table below:

Response Variable Y | Regressor Variables $X_1$, $X_2$, ..., $X_p$
$y_1$ | $x_{11}$, $x_{21}$, ..., $x_{p1}$
$y_2$ | $x_{12}$, $x_{22}$, ..., $x_{p2}$
$y_3$ | $x_{13}$, $x_{23}$, ..., $x_{p3}$
... | ...
$y_n$ | $x_{1n}$, $x_{2n}$, ..., $x_{pn}$

Then the multiple regression model for the i-th observation $Y_i$ can be written as:

$$Y_i = B_0 + B_1X_{1i} + B_2X_{2i} + \ldots + B_pX_{pi} + e_i, \quad i = 1, 2, \ldots, n$$

where $X_{1i}, X_{2i}, \ldots, X_{pi}$ are the corresponding values of the p independent variables, $B_0$ is the intercept, and $B_1, B_2, \ldots, B_p$ are the p regression coefficients corresponding to the independent variables $X_1, X_2, \ldots, X_p$, respectively.
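We now minimise $\sum e_i^2$, the sum of squares of errors in the model given in equation (2):

$$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - B_0X_{0i} - B_1X_{1i} - \ldots - B_pX_{pi}\right)^2 \qquad (3)$$

with respect to $B_0, B_1, \ldots, B_p$ to obtain their least squares estimates. For estimating the model parameters $B_0, B_1, B_2, \ldots, B_p$, we differentiate E with respect to $B_0, B_1, B_2, \ldots, B_p$, respectively, and equate the result to zero. If we differentiate E with respect to $B_j$, we obtain the j-th $(j = 0, 1, \ldots, p)$ normal equation as follows:

$$\frac{\partial E}{\partial B_j} = -2\sum_{i=1}^{n} \left(Y_i - B_0X_{0i} - B_1X_{1i} - \ldots - B_pX_{pi}\right) X_{ji} = 0, \quad j = 0, 1, 2, \ldots, p \qquad (4)$$

Simplifying equation (4), and noting that $X_{0i} = 1$, we obtain the least squares normal equations:

$$\begin{aligned}
nB_0 + B_1\sum X_{1i} + B_2\sum X_{2i} + \ldots + B_p\sum X_{pi} &= \sum Y_i \\
B_0\sum X_{1i} + B_1\sum X_{1i}^2 + B_2\sum X_{1i}X_{2i} + \ldots + B_p\sum X_{1i}X_{pi} &= \sum X_{1i}Y_i \\
B_0\sum X_{2i} + B_1\sum X_{1i}X_{2i} + B_2\sum X_{2i}^2 + \ldots + B_p\sum X_{2i}X_{pi} &= \sum X_{2i}Y_i \\
&\;\;\vdots \\
B_0\sum X_{pi} + B_1\sum X_{pi}X_{1i} + B_2\sum X_{pi}X_{2i} + \ldots + B_p\sum X_{pi}^2 &= \sum X_{pi}Y_i
\end{aligned} \qquad (5)$$

These are p + 1 normal equations and can be solved using the methods of solving simultaneous linear equations. The solutions of the above normal equations, called the least squares estimates, are $\hat{B}_0, \hat{B}_1, \hat{B}_2, \ldots, \hat{B}_p$, respectively.

For simplicity, we shall rewrite the model in equation (2) by centralising the independent variables $X_1, X_2, \ldots, X_p$, i.e., by taking differences from their means:

$$\begin{aligned}
Y &= B_0 + B_1\bar{X}_1 + \ldots + B_p\bar{X}_p + B_1(X_1 - \bar{X}_1) + \ldots + B_p(X_p - \bar{X}_p) + e \\
  &= B_0'X_0 + B_1(X_1 - \bar{X}_1) + \ldots + B_p(X_p - \bar{X}_p) + e
\end{aligned}$$

where $B_0' = B_0 + B_1\bar{X}_1 + \ldots + B_p\bar{X}_p$. Here $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p$ are the means of the p independent/regressor variables. With this, the normal equation becomes

$$\frac{\partial E}{\partial B_j} = -2\sum_{i=1}^{n} \left[Y_i - B_0'X_{0i} - B_1(X_{1i} - \bar{X}_1) - \ldots - B_p(X_{pi} - \bar{X}_p)\right](X_{ji} - \bar{X}_j) = 0 \qquad (6)$$

Note that $X_{0i} = 1$ for all i. The coefficients $B_1, B_2, \ldots, B_p$ remain the same, but the intercept changes from $B_0$ to $B_0'$. Once we have obtained the estimates of $B_0', B_1, B_2, \ldots, B_p$, we can obtain $\hat{B}_0$ from the following equation:

$$\hat{B}_0 = \hat{B}_0' - \hat{B}_1\bar{X}_1 - \ldots - \hat{B}_p\bar{X}_p \qquad (7)$$

Let us consider an application of these results.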
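The normal equations (5) and the centred form leading to equation (7) can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the six observations are hypothetical values chosen for this sketch, not data from this unit.

```python
import numpy as np

# Hypothetical data (not from this unit): 6 observations, p = 2 regressors.
x1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
y = np.array([5.1, 9.8, 10.9, 16.2, 17.1, 22.0])
n = len(y)

# Coefficient matrix and right-hand side of the normal equations (5), p = 2.
A = np.array([
    [n,        x1.sum(),        x2.sum()],
    [x1.sum(), (x1 * x1).sum(), (x1 * x2).sum()],
    [x2.sum(), (x1 * x2).sum(), (x2 * x2).sum()],
])
rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
b0, b1, b2 = np.linalg.solve(A, rhs)

# Centred form: solve for the slopes using deviations from the means,
# then recover the intercept from equation (7), where B0' is estimated by mean(Y).
d1, d2 = x1 - x1.mean(), x2 - x2.mean()
S = np.array([[(d1 * d1).sum(), (d1 * d2).sum()],
              [(d1 * d2).sum(), (d2 * d2).sum()]])
s_y = np.array([(d1 * y).sum(), (d2 * y).sum()])
c1, c2 = np.linalg.solve(S, s_y)
c0 = y.mean() - c1 * x1.mean() - c2 * x2.mean()

print("normal equations:", b0, b1, b2)
print("centred form:    ", c0, c1, c2)
```

Both routes should print the same three estimates up to rounding, which illustrates that centring changes only the intercept being estimated, not the slopes.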
Example 1: A statistical analyst is analysing the vending machine routes in the distribution system. He/she is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. The company manager responsible for the study has suggested that the two most important variables affecting the delivery time Y (in minutes) are (i) the number of cases ($X_1$) and (ii) the distance travelled (in m) by the route driver ($X_2$). The delivery time data collected by the statistical analyst is given below:

Time (Y) | No. of Cases ($X_1$) | Distance ($X_2$)

Check whether there is a linear relationship between Y (time) and the two independent variables $X_1$ (number of cases) and $X_2$ (distance). Calculate the values of the regression coefficients and fit the regression equation.

Solution: To find the values of the regression coefficients and fit the regression equation for the given data, we form the following table:

Time (Y) | No. of Cases ($X_1$) | Distance ($X_2$) | $Y^2$ | $X_1^2$ | $X_2^2$ | $X_1Y$ | $X_2Y$ | $X_1X_2$

Totals: $\sum Y_i =$ , $\sum X_{1i} =$ , $\sum X_{2i} = 35$, $\sum Y_i^2 = 38$, $\sum X_{1i}^2 = 95$, $\sum X_{2i}^2 = 5$, $\sum X_{1i}Y_i = 85$, $\sum X_{2i}Y_i = 68$, $\sum X_{1i}X_{2i} = 35$
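On putting the values from the above table in the normal equations (5) for p = 2, and noting that $X_{0i} = 1$, we get

$$\begin{aligned}
n\hat{B}_0 + \hat{B}_1\sum X_{1i} + \hat{B}_2\sum X_{2i} &= \sum Y_i \\
\hat{B}_0\sum X_{1i} + \hat{B}_1\sum X_{1i}^2 + \hat{B}_2\sum X_{1i}X_{2i} &= \sum X_{1i}Y_i \\
\hat{B}_0\sum X_{2i} + \hat{B}_1\sum X_{1i}X_{2i} + \hat{B}_2\sum X_{2i}^2 &= \sum X_{2i}Y_i
\end{aligned}$$

On putting the values calculated in the table in the above equations, we get

$n\hat{B}_0 + \hat{B}_1\sum X_{1i} + 35\,\hat{B}_2 = \sum Y_i$ (i)
$\hat{B}_0\sum X_{1i} + 95\,\hat{B}_1 + 35\,\hat{B}_2 = 85$ (ii)
$35\,\hat{B}_0 + 35\,\hat{B}_1 + 5\,\hat{B}_2 = 68$ (iii)

From equation (i), we have

$\hat{B}_0 = \bar{Y} - \hat{B}_1\bar{X}_1 - \hat{B}_2\bar{X}_2$ (iv)

On putting the value of $\hat{B}_0$ in equations (ii) and (iii) and simplifying, we get

$4\,\hat{B}_1 + 8\,\hat{B}_2 =$ (v)
$8\,\hat{B}_1 + 587\,\hat{B}_2 = 6$ (vi)

On solving equations (v) and (vi), we get $\hat{B}_1 = 1.30$ and $\hat{B}_2 = 0.1356$, and then $\hat{B}_0 = 1.8765$ from equation (iv). Hence, the fitted equation is

$$\hat{Y} = 1.8765 + 1.30\,X_1 + 0.1356\,X_2$$

So we can conclude that there is a linear relationship between Y (time in minutes) and the two independent variables $X_1$ (number of cases) and $X_2$ (distance). As the regression coefficients for both variables are positive, the delivery time increases with both. The numerical value of the regression coefficient $\hat{B}_1$ associated with $X_1$ is higher than the value of $\hat{B}_2$ associated with $X_2$. It shows that the number of cases affects the delivery time more than the distance travelled.

11.3.1 Use of Matrix Notation
When p is greater than 2, it is more convenient to write the normal equations in matrix form. The regression equations in matrix notation can be written as:

$$Y = XB + e \qquad (8)$$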
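where

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{p1} \\ 1 & X_{12} & \cdots & X_{p2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{pn} \end{pmatrix}, \quad
B = \begin{pmatrix} B_0 \\ B_1 \\ \vdots \\ B_p \end{pmatrix} \quad \text{and} \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

In general, Y is an $(n \times 1)$ vector of the observed values of the response variable Y, X is an $n \times (p+1)$ matrix of the values of the regressor variables, B is a $(p+1) \times 1$ vector of regression coefficients and e is an $(n \times 1)$ vector of random errors. In matrix notation, the (p+1) normal equations can be written as follows:

$$X'X\hat{B} = X'Y \qquad (9a)$$

Equation (9a) represents the normal least squares equations. For the sake of simplicity, we may write them as

$$\begin{pmatrix}
n & \sum X_{1i} & \cdots & \sum X_{pi} \\
\sum X_{1i} & \sum X_{1i}^2 & \cdots & \sum X_{1i}X_{pi} \\
\vdots & \vdots & & \vdots \\
\sum X_{pi} & \sum X_{pi}X_{1i} & \cdots & \sum X_{pi}^2
\end{pmatrix}
\begin{pmatrix} \hat{B}_0 \\ \hat{B}_1 \\ \vdots \\ \hat{B}_p \end{pmatrix}
=
\begin{pmatrix} \sum Y_i \\ \sum X_{1i}Y_i \\ \vdots \\ \sum X_{pi}Y_i \end{pmatrix} \qquad (9b)$$

To solve the normal least squares equations given in equation (9a), we multiply both sides by the inverse of $X'X$. Thus, the estimates of the regression coefficients are given by

$$\hat{B} = (X'X)^{-1}X'Y \qquad (10)$$

On putting the values of the estimates in equation (1), we get the fitted regression model corresponding to the observations of the regressor variables $X_1, X_2, \ldots, X_p$ as

$$\hat{Y} = \hat{B}_0 + \hat{B}_1X_1 + \hat{B}_2X_2 + \ldots + \hat{B}_pX_p \qquad (11)$$

The matrix representation of the fitted values corresponding to the observed values is similar to equation (9a) and is given as

$$\hat{Y} = X\hat{B} = X(X'X)^{-1}X'Y \qquad (12)$$

The difference between the observed value $y_i$ and the corresponding estimated value $\hat{y}_i$ is called the i-th residual $r_i$, i.e.,

$$r_i = y_i - \hat{y}_i, \quad i = 1, 2, 3, \ldots, n.$$

The residuals may be written in matrix notation as

$$r = Y - \hat{Y} = Y - X\hat{B} \qquad (13)$$

Note on the centred form: here we may also use the notation

$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix} X_{01} & X_{11} - \bar{X}_1 & \cdots & X_{p1} - \bar{X}_p \\ \vdots & \vdots & & \vdots \\ X_{0n} & X_{1n} - \bar{X}_1 & \cdots & X_{pn} - \bar{X}_p \end{pmatrix}$$

in which the $X_{0i}$'s are all unity and the other variables are centralised (deviations from the mean). The (p+1) normal equations can then be written as $X'X\hat{B} = X'Y$, where $\hat{B}' = (\hat{B}_0', \hat{B}_1, \ldots, \hat{B}_p)$. In case $X'X$ is non-singular, i.e., $X'X$ is of rank (p+1), the least squares estimates of B, denoted by $\hat{B}$, can be written as $\hat{B} = (X'X)^{-1}X'Y$.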
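Equations (10), (12) and (13) translate almost line by line into array code. The following is a minimal sketch, assuming NumPy; the data are hypothetical values chosen for illustration, and `np.linalg.solve` is used on the normal equations rather than forming the inverse explicitly, which gives the same estimates when $X'X$ is non-singular.

```python
import numpy as np

# Hypothetical data (not from this unit): n = 6 observations, p = 2 regressors.
x1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
y = np.array([5.1, 9.8, 10.9, 16.2, 17.1, 22.0])

# Design matrix X: a column of ones for X0, then the regressors.
X = np.column_stack([np.ones_like(x1), x1, x2])

XtX = X.T @ X                       # (p+1) x (p+1) matrix X'X of equation (9b)
XtY = X.T @ y                       # right-hand side X'Y
B_hat = np.linalg.solve(XtX, XtY)   # solves X'X B = X'Y, i.e. equation (10)

y_hat = X @ B_hat                   # fitted values, equation (12)
r = y - y_hat                       # residuals, equation (13)

print("B_hat:", B_hat)
print("residuals:", r)
```

A useful check on any such fit is that the residuals are orthogonal to every column of X, i.e., $X'r = 0$, which is just the normal equations restated.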
11.3.2 Properties of Least Squares Estimates
We now describe the statistical properties of least squares estimates. When the $X_j$ $(j = 1, 2, \ldots, p)$ are linearly related, $X'X$ is not invertible. In this case we cannot obtain unique estimates of B. We shall not consider this case any more. It is to be noted that $\hat{B}$ is an unbiased estimate of B because

$$E(\hat{B}) = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'E(XB + e) = (X'X)^{-1}(X'X)B = B$$

since $E(e) = 0$ and $(X'X)^{-1}(X'X) = I$. This shows that $\hat{B}$ is unbiased.

The variance of Y (which is actually the variance-covariance matrix, as Y is a vector) is given as

$$V(Y) = \sigma^2 I$$

where I is an identity matrix of order n. The variance-covariance matrix of $\hat{B}$ is given by

$$V(\hat{B}) = (X'X)^{-1}X'\,V(Y)\,X(X'X)^{-1} = (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} = \sigma^2(X'X)^{-1} \qquad (14)$$

where

$$X'X = \begin{pmatrix}
n & \sum X_{1i} & \cdots & \sum X_{pi} \\
\sum X_{1i} & \sum X_{1i}^2 & \cdots & \sum X_{1i}X_{pi} \\
\vdots & \vdots & & \vdots \\
\sum X_{pi} & \sum X_{pi}X_{1i} & \cdots & \sum X_{pi}^2
\end{pmatrix}$$

i.e., the (j, k)-th element of $X'X$ is $\sum_i X_{ji}X_{ki}$.

Here $V(\hat{B}) = \sigma^2(X'X)^{-1}$ is a $(p+1) \times (p+1)$ matrix; its diagonal elements give the variances of the coefficients and its off-diagonal elements give the covariances. If we use the notation

$$V(\hat{B}) = \sigma^2(X'X)^{-1} = (\sigma_{jk}), \quad j, k = 0, 1, \ldots, p$$

we can write

$$V(\hat{B}_j) = \sigma_{jj} \quad \text{and} \quad Cov(\hat{B}_j, \hat{B}_k) = \sigma_{jk} \qquad (15)$$

The standard error of $\hat{B}_j$ is given by

$$S.E.(\hat{B}_j) = \sqrt{\sigma_{jj}} \qquad (16)$$
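The unbiasedness result and equation (14) can be checked empirically by simulation. The sketch below assumes NumPy; the design matrix, the "true" coefficients `B_true`, the error standard deviation `sigma`, the seed and the number of replications are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is reproducible

# Hypothetical fixed design (not from this unit): n = 6, p = 2.
x1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
X = np.column_stack([np.ones_like(x1), x1, x2])

B_true = np.array([1.0, 2.0, 0.5])  # assumed true coefficients B0, B1, B2
sigma = 1.0                         # assumed error standard deviation
reps = 20000                        # number of simulated samples

# Each row of Y_sim is one simulated response vector Y = XB + e.
errors = rng.normal(0.0, sigma, size=(reps, X.shape[0]))
Y_sim = X @ B_true + errors

# Least squares estimates for every simulated sample at once.
XtX = X.T @ X
B_hats = np.linalg.solve(XtX, X.T @ Y_sim.T).T   # shape (reps, p + 1)

mean_B = B_hats.mean(axis=0)                     # should be close to B_true
emp_cov = np.cov(B_hats, rowvar=False)           # empirical V(B_hat)
theo_cov = sigma**2 * np.linalg.inv(XtX)         # equation (14)
se = np.sqrt(np.diag(theo_cov))                  # equation (16)

print("mean of estimates:", mean_B)
print("theoretical S.E.: ", se)
```

With this many replications, the average of the estimates should sit close to `B_true` and the empirical covariance matrix close to $\sigma^2(X'X)^{-1}$. In practice $\sigma^2$ is unknown and is replaced by its estimate.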
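The residual sum of squares $SS_{Res}$ is obtained by substituting the least squares estimates of $B_0, B_1, \ldots, B_p$ in equation (3):

$$SS_{Res} = \sum_{i=1}^{n} \left(Y_i - \hat{B}_0X_{0i} - \hat{B}_1X_{1i} - \ldots - \hat{B}_pX_{pi}\right)^2$$

This is the sum of squares not accounted for by the regression model. In matrix notation, this can be written as

$$SS_{Res} = Y'Y - Y'X\hat{B} = \sum Y_i^2 - \hat{B}_0(Y'X_0) - \hat{B}_1(Y'X_1) - \hat{B}_2(Y'X_2) - \ldots - \hat{B}_p(Y'X_p) \qquad (17)$$

Note that $X_1, X_2, \ldots, X_p$ are deviations from their respective means. As we have fitted (p + 1) parameters, the degree of freedom of the residual sum of squares is (n − p − 1). An unbiased estimate of $\sigma^2$ is obtained by dividing the residual sum of squares, i.e., $SS_{Res}$, by its degree of freedom (n − p − 1). Thus

$$\hat{\sigma}^2 = SS_{Res}/(n - p - 1) \qquad (18)$$

If we are interested in predicting the mean value of Y for a given set of independent variables $X_1, \ldots, X_p$, then we use the fitted model. The predicted mean value of Y for given $X_1, \ldots, X_p$ is given by

$$\hat{Y} = \hat{B}_0 + \hat{B}_1X_1 + \ldots + \hat{B}_pX_p$$

Let us explain the matrix method with the help of an example.

Example 2: Using the data of Example 1, find the estimates of the regression coefficients and $SS_{Res}$ by using the matrix method. Also predict the expected time Y at $X_1 = 7$, $X_2 =$ .

Solution: Using the matrix notation we have from the data:

Y′ = [ , , , 5, 5, , , 5, 3, 5, , ]

and

$$X'X = \begin{pmatrix} n & \sum X_{1i} & \sum X_{2i} \\ \sum X_{1i} & \sum X_{1i}^2 & \sum X_{1i}X_{2i} \\ \sum X_{2i} & \sum X_{1i}X_{2i} & \sum X_{2i}^2 \end{pmatrix}, \quad
X'Y = \begin{pmatrix} \sum Y_i \\ \sum X_{1i}Y_i \\ \sum X_{2i}Y_i \end{pmatrix}$$

whose entries are the sums computed in the table of Example 1. Then

$$\hat{B} = \begin{pmatrix} \hat{B}_0 \\ \hat{B}_1 \\ \hat{B}_2 \end{pmatrix} = (X'X)^{-1}X'Y$$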
Hence the fitted equation is

$$\hat{Y} = 1.8765 + 1.30\,X_1 + 0.1356\,X_2$$

Now, we calculate the value of the residual sum of squares to obtain an estimate of $\sigma^2$ as follows:

$$SS_{Res} = Y'Y - Y'X\hat{B} = \sum Y_i^2 - \hat{B}_0\sum Y_i - \hat{B}_1\sum X_{1i}Y_i - \hat{B}_2\sum X_{2i}Y_i = 97.5$$

Therefore, on putting the value of $SS_{Res}$ in equation (18), we get

$$\hat{\sigma}^2 = 97.5/(12 - 3) = 10.87$$

Using the above results and putting the given values of $X_1$ and $X_2$ in the fitted equation for multiple regression, we get the predicted time $\hat{Y}$ from

$$\hat{Y} = 1.8765 + 1.30\,X_1 + 0.1356\,X_2$$

As far as the interpretation of the coefficients is concerned, there is an increase of 1.30 minutes in time for one unit increase in $X_1$. Similarly, for one unit increase in $X_2$ there is an increase of 0.1356 minutes in time.

You may like to pause here and solve the following exercises to check your understanding.

E1) In a study of firms, the dependent variable was the total delivery time (Y) and the independent variables were the distance covered ($X_1$) and the packaging time ($X_2$). The delivery time data collected by the statistical analyst is given below:

Time (Y) | Distance ($X_1$) | Packaging Time ($X_2$)

$\sum Y_i = 66$, $\sum X_{1i} = 747$, $\sum X_{2i} = 6$

Estimate the parameters $B_0$, $B_1$ and $B_2$ by solving the normal equations and find the estimated multiple linear regression equation.

E2) Use the matrix method to estimate the parameters from the data given in E1).
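The three steps used in Example 2, namely equation (17) for the residual sum of squares, equation (18) for the estimate of $\sigma^2$ and the prediction from the fitted model, can be sketched as follows. This assumes NumPy; the data and the prediction point `x_new` are hypothetical values, not those of Example 1.

```python
import numpy as np

# Hypothetical data (not from this unit): n = 6 observations, p = 2 regressors.
x1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
y = np.array([5.1, 9.8, 10.9, 16.2, 17.1, 22.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, k = X.shape                      # k = p + 1 fitted parameters

B_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equation (17): SS_Res = Y'Y - Y'X B_hat.
ss_res = y @ y - y @ X @ B_hat

# Equation (18): unbiased estimate of sigma^2 with (n - p - 1) d.f.
sigma2_hat = ss_res / (n - k)

# Predicted mean response at a hypothetical point X1 = 6, X2 = 3
# (the leading 1.0 is the dummy variable X0 for the intercept).
x_new = np.array([1.0, 6.0, 3.0])
y_pred = x_new @ B_hat

print("SS_Res:", ss_res)
print("sigma^2 estimate:", sigma2_hat)
print("prediction:", y_pred)
```

The shortcut formula in equation (17) agrees with summing the squared residuals directly; the direct sum is often the numerically safer route when $Y'Y$ is very large compared with $SS_{Res}$.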
11.4 TEST OF SIGNIFICANCE IN MULTIPLE REGRESSION
So far you have learnt how to estimate the parameters and fit the multiple regression model. You may now like to test the adequacy of the fitted model and examine whether or not the independent variables contribute significantly in explaining the variability in Y. For this purpose, we use the test of significance of equality of variances of the regressor variables. If there is a linear relationship between the response variable Y and any of the independent variables $X_1, X_2, \ldots, X_p$, we use the test of significance of regression. The test of significance of regression is a test to determine the linear relationship between the response variable and the regressor variables and is often used to examine the adequacy of the model.

In order to test whether or not the contribution of the independent variables $X_1, \ldots, X_p$ is significant, we test whether $B_1, B_2, \ldots, B_p$ are all zero in the model or at least one of them is not zero. This hypothesis can be written as:

$H_0: B_1 = B_2 = \ldots = B_p = 0$
$H_1$: At least one of the regression coefficients is not zero

It can be tested by considering the following F-ratio:

$$F = \frac{SS_{Reg}/p}{SS_{Res}/(n - p - 1)} \qquad (19)$$

In this test, the total sum of squares $SS_T$ is partitioned into a sum of squares due to the contribution of the regressor variables ($SS_{Reg}$) and a residual sum of squares ($SS_{Res}$). From equation (17), the residual sum of squares ($SS_{Res}$) is:

$$SS_{Res} = \sum Y_i^2 - \hat{B}_0(Y'X_0) - \hat{B}_1(Y'X_1) - \ldots - \hat{B}_p(Y'X_p)$$

or

$$SS_{Res} = Y'Y - Y'X\hat{B} \qquad (20)$$

If $B_1, B_2, \ldots, B_p$ are all zeros, i.e., the independent variables do not contribute to the variability in Y, then the total sum of squares, denoted by $SS_T$, is given as:

$$SS_T = \sum (Y_i - \bar{Y})^2 = Y'Y - n\bar{Y}^2 \qquad (21)$$

This is the total variability present in Y around the mean $\bar{Y}$. We can rewrite equation (20) as

$$SS_{Res} = \left(\sum Y_i^2 - n\bar{Y}^2\right) - \left(\sum_{j=0}^{p} \hat{B}_j\,Y'X_j - n\bar{Y}^2\right)$$

that is,

$$SS_{Res} = SS_T - SS_{Reg}$$

Hence, the difference $SS_T - SS_{Res}$ gives the contribution of the independent variables $X_1, X_2, \ldots, X_p$ in explaining the variability in Y, i.e.,
$$SS_T - SS_{Res} = \hat{B}_0(Y'X_0) + \hat{B}_1(Y'X_1) + \ldots + \hat{B}_p(Y'X_p) - n\bar{Y}^2$$

so that

$$SS_{Reg} = Y'X\hat{B} - n\bar{Y}^2 \quad \text{or} \quad SS_{Reg} = \sum_{j=0}^{p} \hat{B}_j\,Y'X_j - n\bar{Y}^2 \qquad (22)$$

We now summarise these results in the following ANOVA table:

ANOVA TABLE
Sources of Variation | Degree of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares | Variance Ratio
Independent Variables ($X_1, X_2, \ldots, X_p$) | p | $SS_{Reg} = \sum_{j=0}^{p} \hat{B}_j\,Y'X_j - n\bar{Y}^2$ | $SS_{Reg}/p$ | $F = \dfrac{SS_{Reg}/p}{SS_{Res}/(n-p-1)}$
Residuals ($SS_{Res}$) | n − p − 1 | $SS_{Res} = Y'Y - \sum_{j=0}^{p} \hat{B}_j\,Y'X_j$ | $SS_{Res}/(n-p-1)$ |
Total | n − 1 | $Y'Y - n\bar{Y}^2$ | |

Under the null hypothesis, i.e., when $B_1 = B_2 = \ldots = B_p = 0$, F is distributed as Fisher's F-distribution with p and (n − p − 1) degrees of freedom, i.e.,

$$F \sim F_{p,\,(n-p-1)} \qquad (23)$$

If the calculated F is less than the tabulated $F_{p,\,(n-p-1)}$ at α level of significance, then we conclude that the contribution of $X_1, X_2, \ldots, X_p$ to the variability in Y is not significant. Thus, they have no contribution in prediction.

It may be of further interest to examine whether any one coefficient (say $B_j$) corresponding to the independent variable $X_j$ is different from zero, after accounting for the other variables $X_k$ (all $k \neq j$). This can be tested by considering the statistic t:

$$t = \frac{\hat{B}_j}{S.E.(\hat{B}_j)} \qquad (24)$$

where $S.E.(\hat{B}_j)$ uses the estimated value of $\hat{\sigma}^2$ given in equation (18). Under the null hypothesis, i.e., $B_j = 0$, the proposed statistic t follows the Student's t-distribution with (n − p − 1) d.f. Thus, if

$$|t| \leq t_{\alpha/2,\; n-p-1} \qquad (25)$$

we accept $H_0$. Otherwise, we reject it. If $B_j$ is significantly different from zero, it contributes significantly to the variability in Y after taking into account the contribution of the other variables. If $B_j$ is not significantly different from zero, its contribution is not significant after accounting for the other variables in the model.

Example 4: Using the data of Example 1 and the results of Example 2, construct the ANOVA table, apply a relevant test of hypothesis and interpret the results.
Solution: As per the data given in Example 1 and the results of Example 2, we have the values of $SS_{Res}$ and $\sum_{j=0}^{p} \hat{B}_j\,Y'X_j$. Using these values, we construct the ANOVA table as follows:

ANOVA TABLE
Sources of Variation | Degree of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares | Variance Ratio
Independent Variables ($X_1, X_2$) | 2 | $SS_{Reg} = \sum_{j=0}^{p} \hat{B}_j\,Y'X_j - n\bar{Y}^2$ | $SS_{Reg}/p$ | $F = \dfrac{SS_{Reg}/p}{SS_{Res}/(n-p-1)} = 17.10$
Residuals ($SS_{Res}$) | 9 | $SS_{Res} = Y'Y - \sum_{j=0}^{p} \hat{B}_j\,Y'X_j$ | $SS_{Res}/(n-p-1) = 10.87$ |
Total | 11 | $Y'Y - n\bar{Y}^2$ | |

We have obtained the Variance Ratio F = 17.10, whereas the tabulated value of $F_{2,\,9}$ at α = 0.05 is 4.26. Hence, we reject $H_0$ and conclude that $X_1$ and $X_2$ contribute significantly to the variability.

It may be of further interest to examine whether the coefficient $B_j$ corresponding to the independent variable $X_j$ is different from zero, after accounting for the other variables $X_k$ (all $k \neq j$). This can be tested by considering the statistic t:

$$t = \frac{\hat{B}_j}{S.E.(\hat{B}_j)}$$

From the result of Example 2, we also have $\hat{B}_1 = 1.30$ and $\hat{B}_2 = 0.1356$. The variance-covariance matrix is

$$\hat{V}(\hat{B}) = \hat{\sigma}^2 (X'X)^{-1}$$

Using equation (15), we obtain $V(\hat{B}_0) = 7.711$, $V(\hat{B}_1) = 0.1024$ and $V(\hat{B}_2) = 0.0024$ and therefore,
$S.E.(\hat{B}_0) = 2.7769$, $S.E.(\hat{B}_1) = 0.3199$ and $S.E.(\hat{B}_2) = 0.0494$.

Therefore, the statistic t is given as:

$$t_0 = \frac{\hat{B}_0}{S.E.(\hat{B}_0)} = \frac{1.8765}{2.7769} = 0.6758$$

$$t_1 = \frac{\hat{B}_1}{S.E.(\hat{B}_1)} = \frac{1.30}{0.3199} = 4.064$$

$$t_2 = \frac{\hat{B}_2}{S.E.(\hat{B}_2)} = \frac{0.1356}{0.0494} = 2.7444$$

But the tabulated value of the t-statistic for α = 0.05 is $t_{0.025,\,9} = 2.262$. Since both $t_1$ and $t_2$ exceed it, both variables contribute significantly to the variability in Y.

You may now like to solve the following exercise.

E3) Make the ANOVA table, calculate the standard errors of the estimates and test their significance using the data in E1). Interpret the results.

11.5 COEFFICIENT OF DETERMINATION ($R^2$) AND ADJUSTED $R^2$
We define the coefficient of determination, $R^2$, in the same way as for simple regression. It gives a measure of adequacy of model fit. We define $R^2$ as follows:

$$R^2 = \frac{\text{Variability accounted for by independent variables}}{\text{Total variability around the mean}} = \frac{\sum_{j=0}^{p} \hat{B}_j\,Y'X_j - n\bar{Y}^2}{Y'Y - n\bar{Y}^2} \qquad (26)$$

Its value always lies between 0 and 1. When the fit is good, $R^2 \approx 1$. Otherwise, $R^2 \approx 0$. The value of $R^2$ never decreases as p increases; the increase may be negligible, but $R^2$ never decreases. When we compare two models with different values of p, the model with the larger p is preferable only if the $R^2$ corresponding to it is significantly larger than the $R^2$ with the smaller p. A model with a smaller p and a large $R^2$ is preferable, as it is a simpler model. Hence, you should choose a model with a small p if its $R^2$ is not much smaller than the $R^2$ for a model with a larger p. For this, we define an adjusted $R^2$, viz., $R^2_{Adj}$, which penalises $R^2$ when p increases but $R^2$ does not increase significantly. We know that
$$R^2 = \frac{SS_{Reg}}{SS_T} = \frac{SS_T - SS_{Res}}{SS_T} = 1 - \frac{SS_{Res}}{SS_T} \qquad (27)$$

Then we define $R^2_{Adj}$ as

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n - p - 1)}{SS_T/(n - 1)} = 1 - \frac{(n - 1)(1 - R^2)}{(n - p - 1)} \qquad (28)$$

Here, we have divided the numerator and the denominator by their degrees of freedom. $SS_{Res}/(n - p - 1)$ may increase with an increase in p when there is no appreciable decrease in $SS_{Res}$. Hence,

$$R^2_{Adj} = 1 - \frac{(n - 1)(1 - R^2)}{(n - p - 1)} \qquad (29)$$

may decrease when a term is added. Therefore, we should stop including terms in the model if $R^2_{Adj}$ starts decreasing. We prefer a model with a larger $R^2_{Adj}$ and smaller p to a model with a smaller $R^2_{Adj}$ but larger p.

Example 5: Using the data of Example 1 and the results of Examples 2 and 3, calculate $R^2$ and $R^2_{Adj}$ and interpret the results.

Solution: Using the data of Example 1 and the results of Examples 2 and 3, and on putting the values in equation (27), we get

$$R^2 = \frac{SS_{Reg}}{SS_T} = 0.7917$$

Therefore, the adjusted $R^2$ is obtained as follows:

$$R^2_{Adj} = 1 - \frac{(12 - 1)(1 - 0.7917)}{(12 - 2 - 1)} = 0.7454$$

From the coefficient of determination, $R^2$, we see that 79% of the variability in Y is explained by $X_1$ and $X_2$. This is quite a good fit. The adjusted $R^2$ is 0.7454, which is quite large. Hence we conclude that both $X_1$ and $X_2$ contribute adequately to the model fit.

You may now like to calculate $R^2$ and adjusted $R^2$ yourself. Try the following exercise.

E4) Calculate $R^2$ and adjusted $R^2$ and comment on the goodness of fit of the model, for the data given in E1).

11.6 REGRESSION WITH DUMMY VARIABLES
In previous sections, we have dealt with multiple linear regression when the independent/regressor variables are quantitative. The quantitative variables such as height, distance, temperature, time, income, pressure, etc. have a well
defined scale of measurement. However, sometimes the independent variables include qualitative variables such as sex (male/female), region (north, south, east, west, etc.) or religion (Hindu, Muslim, Christian, etc.). Such variables, called categorical variables, cannot be measured and hence no quantitative number can be assigned to them. We define dummy variables to account for the effect that the qualitative variables may have on the response variable. Dummy variables are also known as indicator variables. Suppose k is the number of levels a categorical variable takes. Then we define (k − 1) dummy variables. For example, if we have two categories, male or female, in the data, then k = 2 and we define one dummy variable.

Suppose that a statistical analyst is analysing the vending machine's efficiency in the distribution of a product. She/he is interested in relating the time required to service the consumer with the distance travelled by the product in the vending machine for machines of two types, A and B. The second regressor variable, machine type, is qualitative and has two levels: Type A and Type B. A dummy variable allows us to code the types of machines used. Therefore, we define a dummy variable $X_2$, which takes on the values 0 and 1 to identify the types of machines as follows:

$$X_2 = \begin{cases} 0, & \text{if distribution is done by machine A} \\ 1, & \text{if distribution is done by machine B} \end{cases}$$

The variable $X_2$ is called an indicator variable because it is used to indicate the presence or absence of Machine A or B. For such situations, we have a multiple linear regression model given by

$$Y = B_0 + B_1X_1 + B_2X_2 + e \qquad (30)$$

To determine the regression coefficients in this model, we first consider machine type A, for which $X_2$ takes the value 0. Then the regression model is given by:

$$Y = B_0 + B_1X_1 + B_2(0) + e$$
$$Y = B_0 + B_1X_1 + e \qquad (31)$$

The relationship between the response variable Y and the regressor variable $X_1$, i.e., the distance travelled by the product in the machine, is a straight line with intercept $B_0$ and slope $B_1$. For a machine of type B, we have $X_2 = 1$.
Then the regression model becomes

$$Y = B_0 + B_1X_1 + B_2(1) + e$$
$$Y = (B_0 + B_2) + B_1X_1 + e \qquad (32)$$

which shows that the relationship between Y and $X_1$ is also a straight line with slope $B_1$ but intercept $(B_0 + B_2)$.

Note that these models are linear with the same slope $B_1$ but different intercepts. Hence, these two models describe two parallel regression lines, i.e., two lines with a common slope and different intercepts. The vertical distance between these two lines is the difference in the intercepts, i.e., $B_2$. The two parallel regression lines formed by the above models given in equations (31) and (32) are shown in Fig. 11.1.
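The parallel-lines interpretation of equations (31) and (32) can be illustrated with a small numerical sketch. It assumes NumPy; the distances, machine labels and the noise-free response are hypothetical values constructed so that the intercept, slope and machine effect are known in advance.

```python
import numpy as np

# Hypothetical data (not from this unit): distance X1 and machine type.
distance = np.array([10.0, 20.0, 30.0, 15.0, 25.0, 35.0])
machine = np.array(["A", "A", "A", "B", "B", "B"])

# Dummy variable X2 as defined in the text: 0 for machine A, 1 for machine B.
x2 = (machine == "B").astype(float)

# Noise-free hypothetical response generated as Y = 3 + 0.5*X1 + 2*X2,
# so the least squares fit should recover these values exactly.
y = 3.0 + 0.5 * distance + 2.0 * x2

# Fit the model of equation (30) via the normal equations.
X = np.column_stack([np.ones_like(distance), distance, x2])
b0, b1, b2 = np.linalg.solve(X.T @ X, X.T @ y)

intercept_A = b0         # equation (31): line for machine A
intercept_B = b0 + b2    # equation (32): line for machine B, same slope b1

print("slope:", b1)
print("intercepts:", intercept_A, intercept_B)
```

The two fitted lines share the slope `b1`, and the gap between their intercepts is exactly the dummy coefficient `b2`.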
Fig. 11.1

For three machine types A, B and C, two dummy variables $X_2$ and $X_3$ are used. The model becomes

$$Y = B_0 + B_1X_1 + B_2X_2 + B_3X_3 + e \qquad (33)$$

The levels of the dummy variables would be:

$X_2 = 0$, $X_3 = 0$ for Machine Type A
$X_2 = 1$, $X_3 = 0$ for Machine Type B
$X_2 = 0$, $X_3 = 1$ for Machine Type C

In general, a categorical variable with k categories is represented by (k − 1) dummy variables. Let us try to understand regression analysis using dummy variables with the help of an example.

Example 6: A statistical analyst is analysing the performance of washing machines in the distribution system. He/she is interested in predicting the amount of time required by the driver to service washing machines of two types: i) Type A and ii) Type B. The data on the required time collected by the statistical analyst is given below:

Time (Y) | Distance ($X_1$) | Machine Type ($X_2$)
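Check whether there is a linear relationship between Y (time) and the two independent variables $X_1$ (distance) and $X_2$ (machine type). Calculate the values of the coefficients and fit the regression equation.

Solution: Since two types of washing machines, A and B, have been used, k = 2. Here we have to define one dummy variable $X_2$, which takes two values:

$X_2 = 0$ if the observation is from machine A
$X_2 = 1$ if the observation is from machine B

We form the following table from the given data to fit the regression equation:

Time (Y) | Distance ($X_1$) | Machine Type ($X_2$) | $Y^2$ | $X_1^2$ | $X_2^2$ | $X_1Y$ | $X_2Y$ | $X_1X_2$

Totals: $\sum Y_i =$ , $\sum X_{1i} = 35$, $\sum X_{2i} = 5$, $\sum Y_i^2 = 38$, $\sum X_{1i}^2 = 5$, $\sum X_{2i}^2 = 5$, $\sum X_{1i}Y_i = 68$, $\sum X_{2i}Y_i = 8$, $\sum X_{1i}X_{2i} =$

The normal equations (5) for p = 2 and $X_{0i} = 1$ are:

$$\begin{aligned}
n\hat{B}_0 + \hat{B}_1\sum X_{1i} + \hat{B}_2\sum X_{2i} &= \sum Y_i \\
\hat{B}_0\sum X_{1i} + \hat{B}_1\sum X_{1i}^2 + \hat{B}_2\sum X_{1i}X_{2i} &= \sum X_{1i}Y_i \\
\hat{B}_0\sum X_{2i} + \hat{B}_1\sum X_{1i}X_{2i} + \hat{B}_2\sum X_{2i}^2 &= \sum X_{2i}Y_i
\end{aligned}$$

On putting the values of the sums calculated in the above table, we get

$n\hat{B}_0 + 35\,\hat{B}_1 + 5\,\hat{B}_2 = \sum Y_i$ (i)
$35\,\hat{B}_0 + 5\,\hat{B}_1 + \hat{B}_2\sum X_{1i}X_{2i} = 68$ (ii)
$5\,\hat{B}_0 + \hat{B}_1\sum X_{1i}X_{2i} + 5\,\hat{B}_2 = 8$ (iii)

From equation (iii), we have

$\hat{B}_2 = 16 - 44\,\hat{B}_1 - \hat{B}_0$ (iv)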
On putting the value of $\hat{B}_2$ in equations (i) and (ii) and simplifying, we get

$78\,\hat{B}_1 + 7\,\hat{B}_0 = 8$ (v)
$3\,\hat{B}_1 + 3\,\hat{B}_0 =$ (vi)

On solving equations (v) and (vi), we get

$\hat{B}_1 = .3498$, $\hat{B}_2 = -.38$ and $\hat{B}_0 = 16 - 44\,\hat{B}_1 - \hat{B}_2 = .646$

Hence, the fitted regression equation is

$\hat{Y} = .646 + .3498\,X_1 - .38\,X_2$ (vii)

We conclude that there is a linear relationship between Y (time in seconds) and the two independent variables $X_1$ (distance) and $X_2$ (type of machine). Since the regression coefficient for the variable $X_2$ is negative, the time is lower for machine type B than for machine type A at the same distance. The numerical value of the regression coefficient associated with $X_2$ is higher than that of the other regressor variable. It shows that the distance travelled (in m) affects the delivery time less than the type of machine.

To determine the regression coefficients in this model for each type of machine, we first consider machine A, for which $X_2$ takes the value 0. We put the values of the regression coefficients in equation (31). Then the regression model becomes

$\hat{Y} = .646 + .3498\,X_1$ (viii)

For machine B, we put the values of the regression coefficients and $X_2 = 1$. Then the regression model becomes

$\hat{Y} = .66 + .3498\,X_1$ (ix)

Note that, as discussed in Sec. 11.6, these estimated regression lines have the same slope, i.e., .3498, but different intercepts, i.e., .646 and .66.

You may now like to solve the following problem to check your understanding:

E5) Using the data given in the following table, find the regression coefficients and obtain the estimated regression equations for the model given in equations (30), (31) and (32):

Time (hour) Y | Distance (feet) $X_1$ | Machine Type $X_2$
8 | 61 | A
4 | 95 | A
7 | 72 | A
4 | 84 | A
3 | 98 | A
4 | 53 | B
3 | 68 | B
  | 54 | B
  | 89 | B
9 | 73 | B
Check whether there is a linear relationship between Y (time) and the two independent variables $X_1$ (distance) and $X_2$ (machine type). Calculate the values of the coefficients and fit the regression equation.

We now summarise the concepts that we have discussed in this unit.

11.7 SUMMARY
1. The basic concept of multiple linear regression is the same as that of simple regression. However, instead of one independent variable, there are several independent variables, say, $X_1, X_2, X_3, \ldots, X_p$.

2. A multiple regression model is given by $Y = B_0 + B_1X_1 + B_2X_2 + \ldots + B_pX_p + e$, where Y is the dependent variable and $X_1, X_2, \ldots, X_p$ are p independent variables. This is called the multiple linear regression model with p independent/regressor variables. The term linear is used because the dependent/response variable Y is a linear function of the unknown parameters $B_0, B_1, B_2, \ldots, B_p$.

3. The simple regression model considered in Unit 9 becomes a particular case of this model with $X_0 = 1$, $B_0 = a$, $B_1 = b$ and $B_i = 0$ $(i \geq 2)$. The interpretation of the coefficients $B_j$ $(j = 1, 2, \ldots, p)$ is that $B_j$ represents the amount of change in Y for a unit change in $X_j$, keeping the other independent variables $X_k$ $(k \neq j)$ fixed. These coefficients are known as partial regression coefficients, as the effect of one independent variable on the dependent variable is studied while the other variables are held fixed or constant. We use the term multiple linear regression for this model because several variables are included in the regression and the parameters $B_0, B_1, \ldots, B_p$ appear in a linear form.

4. We estimate the parameters of a multiple linear regression equation using the method of least squares. In this method, we minimise the total error term, so that the sum of the squares of the differences between the observed values $Y_i$ and their expected values is minimum, i.e., the sum of squares of the error terms is minimum. When p is greater than 2, it is more convenient to write the normal equations in matrix form.
The regression equations in matrix notation can be written as $Y = XB + e$, where Y is an $(n \times 1)$ vector of the observed values of the response variable Y, X is an $n \times (p + 1)$ matrix of the values of the regressor variables, B is a $(p + 1) \times 1$ vector of regression coefficients and e is an $(n \times 1)$ vector of random errors. In matrix notation, the (p + 1) normal equations can be written as $X'X\hat{B} = X'Y$.

5. The variance-covariance matrix of $\hat{B}$ is given by $V(\hat{B}) = \sigma^2(X'X)^{-1}$, where $V(\hat{B})$ is a $(p + 1) \times (p + 1)$ matrix; its diagonal elements give the variances of the coefficients and its off-diagonal elements give the covariances. If we use the notation

$$V(\hat{B}) = \sigma^2(X'X)^{-1} = (\sigma_{jk}), \quad j, k = 0, 1, \ldots, p.$$
we can write $V(\hat{B}_j) = \sigma_{jj}$ and $Cov(\hat{B}_j, \hat{B}_k) = \sigma_{jk}$. The standard error of $\hat{B}_j$ is given by $S.E.(\hat{B}_j) = \sqrt{\sigma_{jj}}$.

6. If there is a linear relationship between the response variable Y and any of the independent variables $X_1, X_2, \ldots, X_p$, we use the test of significance of regression. The test of significance of regression is a test to determine the linear relationship between the response variable and the regressor variables and is often used to examine the adequacy of the model.

7. The coefficient of determination, $R^2$, and the adjusted $R^2$ are measures of goodness of fit of the multiple regression model. The value of $R^2$ never decreases as p increases; the increase may be negligible, but $R^2$ never decreases. When we compare two models with different values of p, the model with the larger p is preferable only if the $R^2$ corresponding to it is significantly larger than the $R^2$ with the smaller p. A model with a smaller p and a large $R^2$ is preferable, as it is a simpler model. Hence, one should choose a model with a small p if its $R^2$ is not much smaller than the $R^2$ for a model with a larger p.

8. We define dummy variables to account for the effect that qualitative variables may have on the response variable. Dummy variables are also known as indicator variables. Suppose k represents the number of levels a categorical variable takes; then we define (k − 1) dummy variables. For example, if we have two categories, male or female, in the data, then k = 2 and we define one dummy variable.

11.8 SOLUTIONS/ANSWERS
E1) We do the following calculations for the given data:

Time (Y) | Distance ($X_1$) | Vending Time ($X_2$) | $Y^2$ | $X_1^2$ | $X_2^2$ | $X_1Y$ | $X_2Y$ | $X_1X_2$

Totals: $\sum Y_i = 66$, $\sum X_{1i} = 747$, $\sum X_{2i} = 6$, $\sum Y_i^2 = 98$, $\sum X_{1i}^2 = 5889$, $\sum X_{2i}^2 = 75$, $\sum X_{1i}Y_i = 9$, $\sum X_{2i}Y_i = 4535$, $\sum X_{1i}X_{2i} = 86$
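From the above table, putting the values of $\sum Y_i$, $\sum X_{1i}$, $\sum X_{2i}$, $\sum X_{1i}^2$, $\sum X_{2i}^2$, $\sum X_{1i}Y_i$, $\sum X_{2i}Y_i$ and $\sum X_{1i}X_{2i}$ in the normal equations, we get
$10\hat{B}_0 + 747\hat{B}_1 + 6\hat{B}_2 = 66$ (i)
$747\hat{B}_0 + 5889\hat{B}_1 + 86\hat{B}_2 = 9$ (ii)
$6\hat{B}_0 + 86\hat{B}_1 + 75\hat{B}_2 = 4535$ (iii)
Solving these equations, we get the estimates $\hat{B}_0$, $\hat{B}_1 = .79$ and $\hat{B}_2 = .6$.
The fitted regression equation is obtained by putting these estimates in $Y = \hat{B}_0 + \hat{B}_1 X_1 + \hat{B}_2 X_2$.

E2) Using the matrix notation, we have from the data:
$Y' = [8, 4, 7, 4, 3, 4, 3, , , 9]$
From the data we form $X'X$, $X'Y$ and $(X'X)^{-1}$, and obtain
$\hat{B} = (X'X)^{-1}X'Y$, with $\hat{B}_1 = .79$ and $\hat{B}_2 = .6$.
Hence, the fitted equation is the same as the one obtained in E1).
We now calculate the value of the residual sum of squares to obtain an estimate of $\sigma^2$ as follows:
$SS_{res} = Y'Y - \hat{B}'X'Y = \sum Y_i^2 - [(6.7569)\sum Y_i + (.79)\sum X_{1i}Y_i + (.6)\sum X_{2i}Y_i] = 4.45$
Therefore, on putting the value of $SS_{res}$ in equation (8), with $n - p - 1 = 10 - 3$, we get
$\hat{\sigma}^2 = SS_{res}/(n - p - 1) = 6.6$

E3) Using the data of E1) and the results of E2), we get the estimates $\hat{B}_0$, $\hat{B}_1 = .79$ and $\hat{B}_2 = .6$.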
As per these data and results, we have $SS_{res} = 4.45$ together with the value of $\sum_{j=0}^{p} \hat{B}_j Y'X_j$. Using these values we construct the ANOVA table as follows:

ANOVA TABLE
Source of Variation | Degrees of Freedom (d.f.) | Sum of Squares (S.S.) | Mean Sum of Squares | Variance Ratio
Independent Variables ($X_1$, $X_2$) | $p = 2$ | $SS_{Reg} = \sum_{j=0}^{p} \hat{B}_j Y'X_j - n\bar{Y}^2 = 9.58$ | $SS_{Reg}/p$ | $F = \dfrac{SS_{Reg}/p}{SS_{res}/(n - p - 1)} = 9.74$
Residuals | $n - p - 1 = 7$ | $SS_{res} = Y'Y - \sum_{j=0}^{p} \hat{B}_j Y'X_j = 4.45$ | $SS_{res}/(n - p - 1) = 6.6$ |
Total | $n - 1 = 9$ | $Y'Y - n\bar{Y}^2 = 5.4$ | |

The calculated value of the variance ratio, F = 9.74, exceeds the tabulated value of F for (2, 7) degrees of freedom at α = 0.05. Hence, we reject $H_0$ and conclude that $X_1$ and $X_2$ contribute significantly in explaining the variability.

It may be of further interest to examine whether the coefficient $B_j$, corresponding to the independent variable $X_j$, is different from zero after accounting for the other variables $X_k$ (all $k \neq j$). This can be tested by considering the statistic
$t_j = \hat{B}_j / S.E.(\hat{B}_j)$
From the result of E2), we have $\hat{B}_1 = .79$ and $\hat{B}_2 = .6$. The variance-covariance matrix is
$V(\hat{B}) = \hat{\sigma}^2 (X'X)^{-1}$
Using equation (5), we obtain $V(\hat{B}_0)$, $V(\hat{B}_1) = .4$ and $V(\hat{B}_2) = .3$, and therefore
$S.E.(\hat{B}_0) = \sqrt{V(\hat{B}_0)} = 7.56$
$S.E.(\hat{B}_1) = \sqrt{V(\hat{B}_1)} = .64$
$S.E.(\hat{B}_2) = \sqrt{V(\hat{B}_2)} = .489$
Therefore, the values of the statistic $t_j = \hat{B}_j / S.E.(\hat{B}_j)$ are:
$t_0 = 3.65$, $t_1 = .74$, $t_2 = .94$
The tabulated value of the t-statistic for α = 0.05 with 7 degrees of freedom is 2.37. Comparing each $t_j$ with this value, only one of the two variables contributes significantly in explaining the variability in Y; the other does not.
As far as the interpretation of the coefficients is concerned, there is an increase of .6 seconds in time for one unit change in cases ($X_2$). Similarly, for one unit increase in $X_1$, there is a .79 seconds decrease in time.

E4) Using the data of E1) and the results of E2) and E3), we get
$R^2 = $ Sum of Squares due to $X_1$, $X_2$ / Total Sum of Squares $= .79$,
where the sum of squares due to $X_1$, $X_2$ is 9.58 and the total sum of squares is 5.4. The adjusted value is
$R^2_{Adj} = 1 - \dfrac{(n - 1)(1 - R^2)}{(n - p - 1)}$
$R^2$ indicates that only 7% of the variability in Y is explained by $X_1$ and $X_2$.

E5) Two types of washing machines, A and B, have been used. Hence, $k = 2$. Here we have to define one dummy variable $X_2$, which takes two values: one value (0 or 1) if the observation is from machine type A and the other if the observation is from machine type B.
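From the given data, we form a table with columns Time ($Y$), Distance ($X_1$), $X_2$, $Y^2$, $X_1^2$, $X_2^2$, $X_1Y$, $X_2Y$ and $X_1X_2$ to find the sums needed to fit the regression equation. The column totals are:
$\sum Y_i = 66$, $\sum X_{1i} = 747$, $\sum X_{2i} = 5$, $\sum Y_i^2 = 98$, $\sum X_{1i}^2 = 5889$, $\sum X_{2i}^2 = 5$, $\sum X_{1i}Y_i = 9$, $\sum X_{2i}Y_i = 9$, $\sum X_{1i}X_{2i} = 337$
From the above table, putting the values in the normal equations (5) for $p = 2$, we get
$10\hat{B}_0 + 747\hat{B}_1 + 5\hat{B}_2 = 66$ (i)
$747\hat{B}_0 + 5889\hat{B}_1 + 337\hat{B}_2 = 9$ (ii)
$5\hat{B}_0 + 337\hat{B}_1 + 5\hat{B}_2 = 9$ (iii)
From equation (iii), we express one of the estimates in terms of the other two; call this equation (iv). On putting this value in equations (i) and (ii) and simplifying, we get equation (v), with coefficients 365 and 5 and right-hand side 7, and equation (vi), with coefficients 396 and 5 and right-hand side 735.
On solving equations (v) and (vi), we get $\hat{B}_1 = -.4$ and $\hat{B}_2 = .344$; the remaining estimate $\hat{B}_0$ follows by putting these values in equation (iv).
Hence the fitted equation for the model given in equation (7) is
$Y = \hat{B}_0 - .4X_1 + .344X_2$ (vii)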
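The normal equations (i) to (iii) can also be set up from the column sums and solved numerically instead of by elimination. The following is a minimal sketch in Python with NumPy. The data are hypothetical (they are not the data of this exercise), and the variable names and the 0/1 coding of the dummy variable are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical data (NOT the data of E5): response y, one continuous
# regressor x1 and a 0/1 dummy variable x2 for two machine types.
y = np.array([16.0, 12.0, 18.0, 11.0, 20.0, 14.0, 22.0, 15.0, 24.0, 17.0])
x1 = np.array([5.0, 4.0, 6.0, 3.0, 7.0, 4.0, 8.0, 5.0, 9.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

n = len(y)

# Left-hand side of the three normal equations, built from the column sums.
A = np.array([
    [n,        x1.sum(),        x2.sum()],
    [x1.sum(), (x1 * x1).sum(), (x1 * x2).sum()],
    [x2.sum(), (x1 * x2).sum(), (x2 * x2).sum()],
])

# Right-hand side: sum of Y, sum of X1*Y and sum of X2*Y.
rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

# Solve the 3 x 3 linear system for the estimates of B0, B1 and B2.
b0, b1, b2 = np.linalg.solve(A, rhs)
print(f"fitted equation: Y = {b0:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
```

The matrix `A` built here is the same as $X'X$ and `rhs` is the same as $X'Y$, so this is the matrix form of the normal equations written out sum by sum.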
Now we can conclude that there is a linear relationship between Y (time) and the two independent variables $X_1$ (distance) and $X_2$ (type of machine). As the regression coefficient for the variable $X_1$ is negative, the fitted time decreases as $X_1$ increases. To interpret the regression coefficients in this model, we first consider machine A and put its value of $X_2$, together with the values of the regression coefficients, in equation (8). The regression model then becomes an equation in $X_1$ alone, equation (viii). For machine B, we put the values of the regression coefficients and its value of $X_2$. Then the regression model becomes
$Y = 3.6 - .4X_1$ (ix)
Note that, as discussed in Sec. 11.5, these estimated regression lines have the same slope, i.e., $-.4$, but different intercepts.
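The matrix calculations used in E2) to E4), namely $\hat{B} = (X'X)^{-1}X'Y$, $SS_{res} = Y'Y - \hat{B}'X'Y$, $V(\hat{B}) = \hat{\sigma}^2 (X'X)^{-1}$, the variance ratio F, the t statistics, $R^2$ and the adjusted $R^2$, can be sketched in Python with NumPy as follows. The function name `ols_summary` and the data are hypothetical and serve only as an illustration; they are not the data of the exercises.

```python
import numpy as np


def ols_summary(X, y):
    """Least squares fit with an intercept, using the unit's matrix formulas.

    X : (n, p) array of regressor values (without the column of ones).
    y : (n,) array of responses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])   # n x (p + 1) matrix X
    XtX = Xd.T @ Xd                         # X'X
    XtY = Xd.T @ y                          # X'Y
    b = np.linalg.solve(XtX, XtY)           # normal equations X'X B = X'Y

    ss_res = y @ y - b @ XtY                # Y'Y - B'X'Y
    ss_reg = b @ XtY - n * y.mean() ** 2    # B'X'Y - n * Ybar^2
    ss_tot = y @ y - n * y.mean() ** 2      # Y'Y - n * Ybar^2
    sigma2 = ss_res / (n - p - 1)           # estimate of sigma^2

    cov_b = sigma2 * np.linalg.inv(XtX)     # V(B) = sigma^2 (X'X)^{-1}
    se = np.sqrt(np.diag(cov_b))            # standard errors of the estimates
    r2 = ss_reg / ss_tot
    return {
        "coef": b,
        "se": se,
        "t": b / se,
        "F": (ss_reg / p) / sigma2,
        "R2": r2,
        "adj_R2": 1 - (n - 1) * (1 - r2) / (n - p - 1),
    }


# Hypothetical data with n = 10 observations and p = 2 regressors.
x1 = np.array([2.0, 3.0, 5.0, 4.0, 7.0, 8.0, 6.0, 9.0, 10.0, 12.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0])
y = np.array([7.0, 10.0, 12.0, 14.0, 17.0, 21.0, 22.0, 24.0, 28.0, 33.0])

res = ols_summary(np.column_stack([x1, x2]), y)
for name, value in res.items():
    print(name, np.round(value, 3))
```

The calculated F and t values would then be compared with the tabulated values for (p, n - p - 1) and n - p - 1 degrees of freedom respectively, as in E3).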