The Standard Linear Model: Hypothesis Testing


Department of Mathematics
Ma 3/103  KC Border
Introduction to Probability and Statistics  Winter 2017

Lecture 25: The Standard Linear Model: Hypothesis Testing

Relevant textbook passages: Larsen–Marx [4]: Section 11.4, Chapter 12; Theil [6]: Chapters ….

25.1 The OLS estimator

Let there be $N$ observations on $K$ regressors $X$ (including any constant term), and a response $y$. The matrix $X$ is $N \times K$, and assume it has rank $K$. (A constant term is a column of all ones.) The model is
$$ y = X\beta + \varepsilon, \qquad \text{where } E\varepsilon = 0 \text{ and } \operatorname{Var}\varepsilon = E(\varepsilon\varepsilon') = \sigma^2 I. $$
The OLS estimator $\hat\beta_{OLS}$ of the coefficients $\beta$ is given by
$$ \hat\beta_{OLS} = (X'X)^{-1}X'y. \tag{1} $$
The coefficient on the constant term is called the intercept. Set $\hat y = X\hat\beta_{OLS}$ and $e = y - \hat y = y - X\hat\beta_{OLS}$. The vector $e$ of residuals satisfies $X'e = 0$. That is, $e$ is orthogonal to each column vector of regressors. If there is a constant term in the regression, then the sum of residuals $\mathbf{1}'e = 0$, so $\bar e$, the mean residual, is also zero.
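As a quick numerical check of (1) and the orthogonality of the residuals, here is a minimal sketch in Python, using NumPy and simulated data (the data and variable names are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # constant term first
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=N)

# OLS estimator (1): beta_hat = (X'X)^{-1} X'y  (solve() beats an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e = y - y_hat

print(beta_hat)    # close to beta
print(X.T @ e)     # ~0: residuals orthogonal to every column of X
print(e.mean())    # ~0: mean residual is zero, since a constant term is present
```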

[Figure 25.1: Geometry of OLS. The response $y$ is projected onto the plane spanned by the regressors $x_1$ and $x_2$; $\hat y = \hat\beta_1 x_1 + \hat\beta_2 x_2$, and the residual $e = y - \hat y$ is orthogonal to that plane.]

25.2 More complicated hypotheses

Sometimes we entertain more complicated null hypotheses on our coefficients. For instance, economists are often interested in whether estimated shares of income add up to one. This material is not in Larsen and Marx [4], but can be found, for instance, in Theil [6, pp. …].

The general linear hypothesis of $q$ linear constraints on the vector $\beta$ can be written simply as
$$ H_0\colon a = A\beta, $$
where $A$ is a $q \times K$ matrix and $a$ is a $q \times 1$ column vector. We assume that $A$ has rank $q$. For instance, the hypothesis that $\beta_1 + \dots + \beta_K = 1$ uses $a = [1]$ and $A = [1, \dots, 1]$.

The next result may be found in [6, Theorem 3.9, p. 144]. It may seem impossibly opaque now, but I will explain it in Section 25.4.

25.2.1 Theorem. Under the null hypothesis, the test statistic
$$ F = \frac{1}{q s^2}\,(a - A\hat\beta_{OLS})' \left[ A(X'X)^{-1}A' \right]^{-1} (a - A\hat\beta_{OLS}) $$
has a Snedecor $F$-distribution with $(q, N-K)$ degrees of freedom.

Tests based on these statistics are called F-tests.

Many software packages, including R, compute for you something called the F-statistic for the regression. If you have a constant term, it is usually the first one, $X_1$, in our terminology. The F-statistic for the regression tests the null hypothesis that all the coefficients on the nonconstant terms are zero,
$$ H_0\colon \beta_2 = \beta_3 = \dots = \beta_K = 0. $$
The F-statistic for this test has $(K-1, N-K)$ degrees of freedom, and it only makes sense to perform this test if one of the regressors is a constant term. This is a hypothesis that you probably would like to reject, so a small p-value (or large F-statistic) is a good thing. The reason this is important is that if the columns of $X$ are nearly collinear, each individual $\hat\beta_{k,OLS}$ is hard to estimate (that is, it will have a large standard error), so each one may turn out to be statistically insignificant, but together they could truly determine $y$.

25.3 Lagrange Multiplier Theorem

Before I can explain why the test of a general linear restriction is based on an F-test, or the ratio of independent $\chi^2$ random variables, we need to characterize how to impose restrictions on a minimization problem.
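Here is a minimal sketch of the statistic in Theorem 25.2.1 for the income-shares hypothesis $\beta_1 + \dots + \beta_K = 1$, again on simulated data; the use of SciPy for the F tail probability is my own choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, K = 200, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([0.1, 0.3, 0.3, 0.3])    # coefficients sum to 1, so the null is true
y = X @ beta + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (N - K)

# H0: beta_1 + ... + beta_K = 1, i.e. a = A beta with A = [1,...,1], a = [1]
A = np.ones((1, K))
a = np.array([1.0])
q = A.shape[0]

d = a - A @ b
F = (d @ np.linalg.solve(A @ XtX_inv @ A.T, d)) / (q * s2)
p = stats.f.sf(F, q, N - K)
print(F, p)   # under H0, F ~ F(q, N-K); p should typically be large here
```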

25.3.1 Lagrange Multiplier Theorem. Let $X$ be a subset of $\mathbf{R}^n$, and assume that the functions $f, g_1, \dots, g_m \colon X \to \mathbf{R}$ are continuous. Let $x^*$ be an interior constrained local minimizer of $f$ subject to $g_i(x) = 0$, $i = 1, \dots, m$. Suppose $f, g_1, \dots, g_m$ are differentiable at $x^*$, and that $g_1'(x^*), \dots, g_m'(x^*)$ are linearly independent. Then there exist real numbers $\lambda_1^*, \dots, \lambda_m^*$ such that
$$ f'(x^*) - \sum_{i=1}^m \lambda_i^* g_i'(x^*) = 0. \tag{2} $$

The function
$$ L(x, \lambda) = f(x) - \sum_{i=1}^m \lambda_i g_i(x) $$
is called the Lagrangean, and (2) is the set of first order conditions for a constrained extremum. It is also the set of FOCs for the minimization of the Lagrangean. When $f$ is quadratic and each $g_i$ is linear, it turns out that the constrained minimizer of $f$ is an unconstrained minimizer of the Lagrangean $L(x, \lambda^*)$. For a discussion and proof of the Lagrange Multiplier Theorem, see my online notes [1].

25.4 Restricted regression

In Section 25.2, I asserted how to test a linear restriction on the coefficients. Here is how to derive that. In the linear model
$$ y = X\beta + \varepsilon, \tag{3} $$
to test the linear restriction on $\beta$
$$ H_0\colon a = A\beta, $$
where $A$ is $q \times K$ and has full rank, we first estimate (3) by minimizing the sum of squared residuals without restrictions. This gives us our usual estimates, but let me label them:
$$ \hat b_u = (X'X)^{-1}X'y, \qquad \hat y_u = X\hat b_u, \qquad e_u = y - \hat y_u. $$
(This material is not in Larsen–Marx [4].)

Next we impose the restriction $a = Ab$ and minimize the sum of squared residuals (with respect to $b$) to get the restricted estimates $\hat b_r$, $\hat y_r$, and residuals $e_r$. To minimize the sum of squared residuals subject to the constraints, instead of projecting $y$ onto the space $\{Xb : b \in \mathbf{R}^K\}$ spanned by the columns of $X$, we have to project $y$ onto $\{Xb : b \in \mathbf{R}^K \text{ and } Ab = a\}$. The point $\hat y_r$ is still a linear combination of the columns of $X$, so the unrestricted residual vector $e_u$ is orthogonal to $\hat y_u - \hat y_r$, and the Pythagorean Theorem tells us that
$$ e_r'e_r - e_u'e_u = (\hat y_u - \hat y_r)'(\hat y_u - \hat y_r). $$
See Figure 25.2.

The first question is: What is the formula for $\hat b_r$? To answer this, note that the Lagrangean for this minimization is
$$ (y - Xb)'(y - Xb) - \lambda'(Ab - a), $$
where $\lambda$ is a $q$-vector of Lagrange multipliers. The Lagrange Multiplier Theorem states that the first order condition for a constrained minimum is that the vector of partial derivatives of the Lagrangean with respect to $b$ be zero:
$$ -2X'y + 2X'X\hat b_r - A'\lambda^* = 0. \tag{4} $$

[Figure 25.2: Restricted regression with the restriction $\beta_1 = 1$. The response $y$ is projected onto the flat $\{Xb : Ab = a\}$; the points $\hat y_r$, $\hat y_u$, and $y$ form a right triangle with the segment from $\hat y_r$ to $y$ as hypotenuse.]

Premultiply (4) by $A(X'X)^{-1}$:
$$ -2\underbrace{A(X'X)^{-1}X'y}_{=A\hat b_u} + 2\underbrace{A(X'X)^{-1}(X'X)\hat b_r}_{=A\hat b_r = a} - A(X'X)^{-1}A'\lambda^* = 0, $$
or
$$ A(X'X)^{-1}A'\lambda^* = 2(a - A\hat b_u). \tag{5} $$
I now claim that $A(X'X)^{-1}A'$ is a $q \times q$ matrix of full rank,¹ so it has an inverse. This lets us solve (5) for $\lambda^*$:
$$ \lambda^* = 2\left[A(X'X)^{-1}A'\right]^{-1}\left[a - A\hat b_u\right]. $$
Substitute this into (4) to get
$$ -X'y + X'X\hat b_r - A'\left[A(X'X)^{-1}A'\right]^{-1}\left[a - A\hat b_u\right] = 0. $$
Premultiply this by $(X'X)^{-1}$ to get
$$ -\underbrace{(X'X)^{-1}X'y}_{=\hat b_u} + \underbrace{(X'X)^{-1}X'X\hat b_r}_{=\hat b_r} - (X'X)^{-1}A'\left[A(X'X)^{-1}A'\right]^{-1}\left[a - A\hat b_u\right] = 0, $$
so the formula for the restricted estimator is:
$$ \hat b_r = \hat b_u + (X'X)^{-1}A'\left[A(X'X)^{-1}A'\right]^{-1}\left[a - A\hat b_u\right]. \tag{6} $$

The next question is: What is the sum of squared residuals? Well,
$$ e_r'e_r = (y - X\hat b_r)'(y - X\hat b_r), $$
and we can use (6) to write
$$ y - X\hat b_r = \underbrace{y - X\hat b_u}_{=e_u} - X(X'X)^{-1}A'\left[A(X'X)^{-1}A'\right]^{-1}\left[a - A\hat b_u\right] = e_u - XC, $$
where $C = (X'X)^{-1}A'\left[A(X'X)^{-1}A'\right]^{-1}\left[a - A\hat b_u\right]$. This means we can write
$$ e_r'e_r = (e_u - XC)'(e_u - XC) = e_u'e_u - 2e_u'XC + (XC)'(XC). $$
But since the unrestricted residual vector $e_u$ is orthogonal to the columns of $X$, we have $e_u'X = 0$. Now let us write out (recalling that the transpose of a product is the product of the transposes in reverse order, and observing that $[A(X'X)^{-1}A']^{-1}$ and $(X'X)^{-1}$ are symmetric):
$$ (XC)'(XC) = [a - A\hat b_u]'[A(X'X)^{-1}A']^{-1}A(X'X)^{-1}X'X(X'X)^{-1}A'[A(X'X)^{-1}A']^{-1}[a - A\hat b_u] = [a - A\hat b_u]'[A(X'X)^{-1}A']^{-1}[a - A\hat b_u]. $$
Thus
$$ e_r'e_r = e_u'e_u + [a - A\hat b_u]'[A(X'X)^{-1}A']^{-1}[a - A\hat b_u]. \tag{7} $$

¹ To see that the $q \times q$ matrix $A(X'X)^{-1}A'$ has full rank, suppose $A(X'X)^{-1}A'b = 0$ for some $q$-vector $b$. We want to show that this implies $b = 0$. Premultiplying by $b'$ gives $b'A(X'X)^{-1}A'b = 0$, and since $(X'X)^{-1}$ is positive definite this forces $A'b = 0$. Since $A$ has rank $q$, this implies $b = 0$. q.e.d.
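A minimal numerical sketch, with simulated data, that computes the restricted estimator by formula (6) and checks identity (7):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 150, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.6, 0.7]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b_u = XtX_inv @ X.T @ y
e_u = y - X @ b_u

# restriction: beta_2 + beta_3 = 1
A = np.array([[0.0, 1.0, 1.0]])
a = np.array([1.0])

# restricted estimator, equation (6)
M = np.linalg.inv(A @ XtX_inv @ A.T)
b_r = b_u + XtX_inv @ A.T @ M @ (a - A @ b_u)
e_r = y - X @ b_r

print(A @ b_r)       # = a: the restriction holds exactly
# identity (7): e_r'e_r = e_u'e_u + quadratic form in (a - A b_u)
lhs = e_r @ e_r
rhs = e_u @ e_u + (a - A @ b_u) @ M @ (a - A @ b_u)
print(lhs, rhs)      # equal up to rounding
```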

A test of $H_0$ amounts to a test of the null hypothesis
$$ H_0\colon \frac{e_r'e_r}{e_u'e_u} = 1 $$
versus the alternative hypothesis
$$ H_1\colon \frac{e_r'e_r}{e_u'e_u} = \frac{e_u'e_u + [a - A\hat b_u]'[A(X'X)^{-1}A']^{-1}[a - A\hat b_u]}{e_u'e_u} > 1. $$
Now, recalling that $s^2 = e_u'e_u/(N-K)$ is the unbiased estimate of the variance, we can rewrite the ratio as
$$ 1 + \frac{(a - A\hat\beta_{OLS})'[A(X'X)^{-1}A']^{-1}(a - A\hat\beta_{OLS})}{(N-K)s^2}. $$
So form the test statistic
$$ F = \frac{[a - A\hat b_u]'[A(X'X)^{-1}A']^{-1}[a - A\hat b_u]/q}{e_u'e_u/(N-K)}. $$
Using a little tedious manipulation, you can show that if $a = A\beta$, that is, if the null hypothesis is true, then the numerator (before dividing by $q$) is distributed $\sigma^2\chi^2(q)$. The denominator (before dividing by $N-K$) is distributed $\sigma^2\chi^2(N-K)$, and you can show that it is independent of the numerator. Thus, under the null hypothesis, the test statistic has an $F$-distribution with $(q, N-K)$ degrees of freedom. The null hypothesis should be rejected if $F \geqslant F_{1-\alpha, q, N-K}$. For the missing details see, e.g., Theil [6], Section 1.8, pages 42–45, and the discussion on pages ….

25.5 The t-test vs. the F-test

We have now seen two ways to test the null hypothesis $H_0\colon \beta_j = 0$ against the alternative $H_1\colon \beta_j \neq 0$. One way is to use the t-test from Section 24.9, based on the test statistic
$$ t = \frac{\hat\beta_{j,OLS}}{s\sqrt{(X'X)^{-1}_{jj}}}. $$
The other is to treat it as the linear restriction $e_j'\beta = 0$ and use the F-statistic, which for this case ($A = e_j'$, $a = 0$, $q = 1$) becomes
$$ F = \frac{\hat\beta_{j,OLS}\left[(X'X)^{-1}_{jj}\right]^{-1}\hat\beta_{j,OLS}}{s^2}. $$
Note that $F = t^2$, so either test (with the same significance level $\alpha$) leads to the same decision.
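A quick simulated check that $F = t^2$ (my own sketch; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, j = 80, 3, 1                       # test H0: beta_j = 0 for the second coefficient
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.4, -0.2]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (N - K)

t = b[j] / np.sqrt(s2 * XtX_inv[j, j])   # t-statistic
F = b[j] ** 2 / (s2 * XtX_inv[j, j])     # F-statistic for the restriction e_j' beta = 0
print(t ** 2, F)                         # identical: F = t^2
```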

[Figure 25.3: A joint test vs. two separate tests. The ellipse is the acceptance region of the joint F-test; the vertical and horizontal bands are the acceptance regions of the two separate t-tests. Point A lies outside the ellipse but inside both bands; point B lies inside the ellipse but outside the bands.]

25.6 Two t-tests vs. a single F-test

While the t-test and the F-test for a single $\hat\beta_{j,OLS}$ agree, there can be disagreement between a joint test and two separate tests. For instance, consider the joint null hypothesis
$$ H_0\colon (\beta_1, \beta_2) = (0, 0). $$
This will be rejected at the $\alpha = 0.05$ level if the vector $(\hat\beta_1, \hat\beta_2)$ lies outside the ellipse shown in Figure 25.3. The point A is such a point, but B is not. Now consider the two hypotheses $H_0\colon \beta_1 = 0$ and $H_0\colon \beta_2 = 0$. The first will be rejected if $(\hat\beta_1, \hat\beta_2)$ lies outside the vertical lines, and the second will be rejected if $(\hat\beta_1, \hat\beta_2)$ lies outside the horizontal lines. Thus A fails the joint test but passes both separate tests, and B passes the joint test but fails each of the separate tests. This need not always happen, but it might. You as a researcher have to decide what is the appropriate test. (By the way, the figure is drawn assuming that $(\hat\beta_1, \hat\beta_2)$ is bivariate normal with mean zero and correlation 0.8 between the components. The correlation of 0.8 is important in making the ellipse the right size and shape for the example. The fact that I use the multivariate normal instead of a multivariate t is secondary.)

25.7 Coefficient of Multiple Correlation

Given the standard linear model, we have
$$ y = X\beta + \varepsilon = X\hat\beta_{OLS} + e = \hat y + e, $$
so
$$ y'y = \hat\beta_{OLS}'X'X\hat\beta_{OLS} + e'e + 2\underbrace{\hat\beta_{OLS}'X'e}_{=0 \text{ as } X'e = 0} = \hat y'\hat y + e'e. \tag{8} $$
The coefficient of multiple correlation $R$ is a measure of the fraction of $y'y$ explained by the regressors. Specifically,
$$ 1 - R^2 = \frac{e'e}{y'y}, \quad \text{or} \quad R^2 = \frac{\hat\beta_{OLS}'X'X\hat\beta_{OLS}}{y'y} = \frac{\hat y'\hat y}{y'y}. $$
One of the consequences of (8) is that $y'y \geqslant e'e$, so it is always the case that $0 \leqslant R^2 \leqslant 1$.

The geometric interpretation of $R$ is that it is the cosine of the angle between $y$ and $\hat y$. (Since $e$ is orthogonal to $\hat y = X\hat\beta_{OLS}$, the points $0$, $\hat y$, and $y$ form a right triangle in $\mathbf{R}^N$ with the segment $0y$ as the hypotenuse. Thus $R$ is the ratio $\lVert\hat y\rVert / \lVert y\rVert$, which is the cosine of the angle between them. An $R$ near one corresponds to an angle near zero.) The quantity $R^2$ is usually referred to simply as "R squared." Incidentally, almost no one talks about the value $R$ for a regression. Instead they talk about $R^2$, and most software reports $R^2$.

[Figure: the angle $\varphi$ between $y$ and $\hat y = \hat\beta_1 x_1 + \hat\beta_2 x_2$; $R = \cos\varphi$.]
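A minimal sketch, on simulated data, of $R^2$ and its cosine interpretation (the adjusted version computed at the end is defined in the next passage):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

R2 = (y_hat @ y_hat) / (y @ y)           # R^2 = y_hat'y_hat / y'y
print(R2, 1 - (e @ e) / (y @ y))         # the two expressions agree, by (8)

# geometric interpretation: R is the cosine of the angle between y and y_hat
cos_phi = (y @ y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))
print(np.sqrt(R2), cos_phi)              # equal

# adjusted R^2, penalizing the number of regressors
R2_adj = 1 - (N - 1) / (N - K) * (1 - R2)
print(R2_adj)
```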

Note that increasing the number of right-hand side variates can only decrease the sum of squared residuals, so it is desirable to penalize the measure of fit. One way to do this is with the adjusted $R^2$, denoted $\bar R^2$, defined by
$$ 1 - \bar R^2 = \frac{e'e/(N-K)}{y'y/(N-1)} = \frac{N-1}{N-K}(1 - R^2), $$
or
$$ \bar R^2 = \frac{1-K}{N-K} + \frac{N-1}{N-K}R^2. $$
It is possible for the adjusted $R^2$ to be negative.

You might ask, why do we divide $e'e$ by $y'y$ instead of by $(y - \bar y\mathbf{1})'(y - \bar y\mathbf{1})$? Theil [6, p. 164n] claims that it is for notational simplicity. But if the regression model includes a constant term (one of the columns of $X$ is a vector of ones), then the mean residual is zero, so $e'e/(N-K)$ is the unbiased estimate of $\sigma^2$. Moreover, the regression plane passes through the sample mean: $\bar y = \bar x'\hat\beta_{OLS}$, where $\bar x$ is the vector of sample means. So if we drop the constant term and express all variates as deviations from their sample means, we still get the same estimate of $\beta$. But the adjusted coefficient of multiple correlation is now a ratio of unbiased estimated variances [6, Section 4.4, pp. …], and is sometimes called the coefficient of determination [4, p. 579], denoted $r^2$. Thus it makes more sense to say that the coefficient of determination is the fraction of the sample variance of the $y$'s that is explained by the $X$'s.

What value is a good $R^2$? A natural question is, "Is this a good $R^2$?" Unfortunately there is no simple answer to that question. Sometimes you might hope to get a low value for the $R^2$ in a regression. For instance, in HW 6 you were given Problem 7.4.9 in Larsen–Marx [4, pp. …], which concerned the age at which scientists did their best work. You were asked, "Plot the age versus date for these twelve discoveries. (Put the date on the abscissa.) Does the variability of the $y_i$'s appear to be random with respect to time?" In HW 8 you are asked to regress the age on the date. In each instance, you are hoping that there is no significant time trend, so that the data from different centuries are comparable. In this case, you would hope for an insignificant coefficient and a low $R^2$. Other times you may hope to get a high $R^2$. For instance, if you are testing Hubble's Law using the data from [4, p. 543], you might be disappointed if your $R^2$ is less than 0.95. On the other hand, if you are regressing economic growth rates on literacy rates, an $R^2$ of 0.30 could be a significant find.

25.8 Prediction intervals in the linear model

We often want to predict the value of $Y$ for values of $X_1, \dots, X_K$ that we have not observed. (Recall the Road & Track example.) Let $x^* = (x_1^*, \dots, x_K^*)$ be a vector of values for the independent variables. Our model predicts
$$ y^* = x^{*\prime}\beta + \varepsilon^*, $$
so we form the prediction
$$ \hat y^* = x^{*\prime}\hat\beta_{OLS}. $$
What is the confidence interval for $y^*$? Now
$$ \hat y^* - y^* = x^{*\prime}\hat\beta_{OLS} - x^{*\prime}\beta - \varepsilon^* = x^{*\prime}(\hat\beta_{OLS} - \beta) - \varepsilon^*. $$
Therefore
$$ \operatorname{Var}(\hat y^* - y^*) = \operatorname{Var}\left(x^{*\prime}(\hat\beta_{OLS} - \beta) - \varepsilon^*\right) = \operatorname{Var}\left(x^{*\prime}(\hat\beta_{OLS} - \beta)\right) + \operatorname{Var}(\varepsilon^*) = \sigma^2\left(x^{*\prime}(X'X)^{-1}x^* + 1\right). $$
Therefore
$$ \frac{x^{*\prime}\hat\beta_{OLS} - y^*}{\sqrt{\sigma^2\left(x^{*\prime}(X'X)^{-1}x^* + 1\right)}} \sim N(0, 1). \tag{9} $$
Recall that $s^2 = e'e/(N-K)$, and that $(N-K)s^2/\sigma^2 \sim \chi^2(N-K)$, and note that this is independent of the ratio (9), because $s^2$ and $\hat\beta_{OLS}$ are independent and $\varepsilon^*$ is independent of the $\varepsilon$ from the sample. Therefore the ratio
$$ \frac{x^{*\prime}\hat\beta_{OLS} - y^*}{\sqrt{\sigma^2\left(x^{*\prime}(X'X)^{-1}x^* + 1\right)}} \Bigg/ \sqrt{\frac{(N-K)s^2}{\sigma^2(N-K)}} = \frac{x^{*\prime}\hat\beta_{OLS} - y^*}{s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1}} \sim t(N-K). $$
Thus a $(1-\alpha)$ confidence interval for $y^*$ is
$$ \left[\hat y^* - t_{\alpha/2, N-K}\, s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1},\ \ \hat y^* + t_{\alpha/2, N-K}\, s\sqrt{x^{*\prime}(X'X)^{-1}x^* + 1}\right]. $$
See Johnston [2, pp. …].
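A minimal sketch of the prediction interval just derived, on simulated data, using SciPy's t quantile:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N, K, alpha = 50, 2, 0.05
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
y = X @ np.array([2.0, 0.8]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s = np.sqrt(e @ e / (N - K))

x_star = np.array([1.0, 5.0])            # new observation: constant term plus x = 5
y_hat_star = x_star @ b
half = stats.t.ppf(1 - alpha / 2, N - K) * s * np.sqrt(x_star @ XtX_inv @ x_star + 1)
print(y_hat_star - half, y_hat_star + half)   # 95% prediction interval for y*
```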

25.9 Measurement error

When the variables $X_1, \dots, X_K$ are measured with error, the previous results need to be modified. It is no longer true that $\hat\beta_{OLS}$ is consistent and unbiased. Suppose that the observed data $\tilde X$ is actually $\tilde X = X + V$, where $V$ is a matrix of measurement errors. We estimate the model
$$ y = \tilde X\beta + u, \tag{10} $$
when in fact the correct model is
$$ y = X\beta + \varepsilon, $$
or
$$ y = (\tilde X - V)\beta + \varepsilon = \tilde X\beta + (\varepsilon - V\beta). $$
The OLS estimate derived from (10) is
$$ \hat\beta = \beta + (\tilde X'\tilde X)^{-1}\tilde X'(\varepsilon - V\beta). $$
The expectation is
$$ E\hat\beta = \beta - E\left[(\tilde X'\tilde X)^{-1}\tilde X'V\right]\beta, $$
which in general differs from $\beta$. One way to correct for this problem is to find a vector of random variables $Z$, called instrumental variables, which is uncorrelated with the errors $V$ and with $\varepsilon$; then we can use these to estimate $X$, and then use the estimated $X$'s to estimate $\beta$. See, e.g., Johnston [2, pp. …].
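The following simulation is my own illustration (the numbers and the particular instrument are assumptions, not from the text); it shows the bias toward zero caused by measurement error and how an instrumental variable recovers the slope:

```python
import numpy as np

rng = np.random.default_rng(6)
N, beta = 100_000, 2.0
x = rng.normal(size=N)                    # true regressor
v = rng.normal(scale=1.0, size=N)         # measurement error
x_obs = x + v                             # what we actually observe
y = beta * x + rng.normal(size=N)

# OLS slope using the mismeasured regressor (no constant term, for simplicity)
b_naive = (x_obs @ y) / (x_obs @ x_obs)
print(b_naive)   # ~ beta * Var(x)/(Var(x)+Var(v)) = 1.0, badly biased toward zero

# instrumental variable: z correlated with x but not with v or the equation error
z = x + rng.normal(size=N)                # a second, independently mismeasured reading
b_iv = (z @ y) / (z @ x_obs)
print(b_iv)      # ~ beta = 2.0
```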

Relevant textbook passage: Larsen–Marx [4]: Chapter 12.

25.10 ANOVA is Multiple Regression in disguise

According to Wikipedia, the law of the instrument was formulated by Abraham Kaplan [3, p. 28]: "Give a small boy a hammer, and he will find that everything he encounters needs pounding." This was reformulated by Abraham Maslow [5, p. 15]: "it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." Perhaps that is why I think of ANalysis Of VAriance, or ANOVA for short, as a special case of multiple linear regression (which it is). However, the language of those who use ANOVA tends to be different from that of those who use linear regression, and part of what we will do today is learn that language.

25.11 Some Terminology

According to Larsen–Marx [4, pp. …], the word factor is used to denote any treatment or therapy applied to the subjects being measured, or any relevant feature (age, sex, ethnicity, etc.) characteristic of those subjects. Different versions, extents, or aspects of a factor are referred to as levels. Sometimes subjects or environments share certain characteristics that affect the way levels of a factor respond, yet those characteristics are of no intrinsic interest to the experimenter. Any such set of conditions or subjects is called a block. I would probably call a factor an explanatory variable, and the level its value. Blocks would correspond to additional variables. The thing actually being measured is called the response, or what I would call the dependent, or left-hand side, variable. Analysis of variance is designed to address data from a completely randomized single-factor design with two or more levels.

25.12 The model

In terms of model equations (which is the most transparent way to look at models), we have
$$ Y_{ij} = \mu_j + \varepsilon_{ij}, \qquad (i = 1, \dots, n_j;\ j = 1, \dots, k), \tag{11} $$
where there are $k$ different factor levels, we have $n_j$ observations of the response $Y$ at level $j$, and $Y_{ij}$ is the $i$-th measurement of the response at level $j$. Let $n = n_1 + \dots + n_k$ be the total number of observations. The random variables $\varepsilon_{ij}$ are assumed to be independent, with common mean zero and common variance $\sigma^2$. Thus $\mu_j$ is just the expected value of the response at level $j$. The object of the analysis is to estimate the vector $\mu = (\mu_1, \dots, \mu_k)$ and test hypotheses regarding it. The most common hypothesis is
$$ H_0\colon \mu_1 = \dots = \mu_k, $$
against the alternative $H_1$: not all the $\mu_j$'s are equal.

To see why this framework is subsumed by regression, stack the vectors of observations $y$ and errors $\varepsilon_{ij}$, let $X_j$ be a dummy variable or indicator for the $j$-th level, and create an $n \times k$ matrix $X$, rewriting the system (11) as
$$
\begin{bmatrix} y_{11} \\ \vdots \\ y_{n_1 1} \\ y_{12} \\ \vdots \\ y_{n_2 2} \\ \vdots \\ y_{1k} \\ \vdots \\ y_{n_k k} \end{bmatrix}
=
X \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \vdots \\ \varepsilon_{n_1 1} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{n_2 2} \\ \vdots \\ \varepsilon_{1k} \\ \vdots \\ \varepsilon_{n_k k} \end{bmatrix},
$$
where column $j$ of $X$ has ones in the rows corresponding to the observations at level $j$ and zeros elsewhere. This is a regression model with no constant term. If we were to add a constant term, it would be the sum of the other columns, so $X'X$ would be singular. As it is, $X'X$ is the diagonal matrix with the $n_j$ on the diagonal (and therefore nonsingular),
$$ X'X = \operatorname{diag}(n_1, n_2, \dots, n_k), $$
and
$$ X'y = \begin{bmatrix} \sum_{i=1}^{n_1} y_{i1} \\ \sum_{i=1}^{n_2} y_{i2} \\ \vdots \\ \sum_{i=1}^{n_k} y_{ik} \end{bmatrix}, $$
so it is easy to see that the OLS estimated coefficients are the sample means for each level:
$$ (X'X)^{-1}X'y = \begin{bmatrix} \frac{1}{n_1}\sum_{i=1}^{n_1} y_{i1} \\ \vdots \\ \frac{1}{n_k}\sum_{i=1}^{n_k} y_{ik} \end{bmatrix} = \begin{bmatrix} \bar y_1 \\ \vdots \\ \bar y_k \end{bmatrix}. $$
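A minimal sketch, on simulated data, of the dummy-variable regression: $X'X$ is diagonal and the OLS coefficients are the level means.

```python
import numpy as np

rng = np.random.default_rng(7)
mu = [5.0, 6.0, 6.5]                      # true level means, k = 3
n_j = [10, 12, 8]

# stack the observations and build the dummy (indicator) matrix X
y = np.concatenate([m + rng.normal(size=n) for m, n in zip(mu, n_j)])
X = np.zeros((sum(n_j), len(mu)))
row = 0
for j, n in enumerate(n_j):
    X[row:row + n, j] = 1.0               # column j indicates membership in level j
    row += n

print(X.T @ X)                            # diag(n_1, ..., n_k)
mu_hat = np.linalg.solve(X.T @ X, X.T @ y)
group_means = [g.mean() for g in np.split(y, np.cumsum(n_j)[:-1])]
print(mu_hat, group_means)                # identical: OLS coefficients are the level means
```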

In Lecture 25 we saw how to use an F-statistic to test linear restrictions on the coefficient vector $(\hat\mu_1, \dots, \hat\mu_k)$, so we could stop here. One reason for not stopping here is that the F-statistic is more transparent and intuitive in the traditional analysis. The other reason is that ANOVA analysis is usually presented in a particular format, so you should learn to understand that. Historically, the traditional analysis may have developed because it is less computationally intensive than OLS. (The general matrix inversion to obtain $(X'X)^{-1}$ is replaced by $k$ division operations.)

25.13 ANOVA terminology

Time for some more terminology:

$y_{ij}$ is the response of the $i$-th observation at level $j$.

$T_j = \sum_{i=1}^{n_j} y_{ij}$ is the response total at level $j$.

$\bar Y_j = T_j / n_j$ is the sample mean at level $j$.

$T = \sum_{j=1}^{k}\sum_{i=1}^{n_j} y_{ij} = \sum_{j=1}^{k} T_j$ is the sample overall total response.

$\bar Y = \frac{1}{n}\sum_{j=1}^{k}\sum_{i=1}^{n_j} y_{ij} = \frac{1}{n}\sum_{j=1}^{k} T_j$ is the sample overall average response.

$\bar\mu = \sum_{j=1}^{k} \frac{n_j}{n}\mu_j$ is the overall average of the (unobserved) $\mu_j$'s.

The treatment sum of squares SSTR is defined to be
$$ \mathrm{SSTR} = \sum_{j=1}^{k} n_j(\bar Y_j - \bar Y)^2. $$
Now the MLE of $\mu_j$ is $\bar Y_j$, so the MLE of $\bar\mu$ is $\bar Y$. It is not hard to show that (see Larsen–Marx [4, Theorem 12.2.1, pp. …])
$$ E(\mathrm{SSTR}) = (k-1)\sigma^2 + \sum_{j=1}^{k} n_j(\mu_j - \bar\mu)^2. \tag{12} $$
Now $\sigma^2$ is not generally known, so we must estimate it. Start by defining
$$ s_j^2 = \frac{\sum_{i=1}^{n_j}(Y_{ij} - \bar Y_j)^2}{n_j - 1}, $$
and aggregating,
$$ \mathrm{SSE} = \sum_{j=1}^{k}(n_j - 1)s_j^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar y_j)^2, \tag{13} $$
which is called the error sum of squares. The important fact about these is:

Theorem (Larsen–Marx [4, Theorem 12.2.3, p. 600]) $\mathrm{SSE}/\sigma^2 \sim \chi^2(n-k)$, and SSE and SSTR are stochastically independent.

There is one last sum of squares of interest. If we ignore the levels (treatments), the variance about $\bar\mu$ can be estimated from the total sum of squares SSTOT:
$$ \mathrm{SSTOT} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar y)^2. $$
Tedious arithmetic (Larsen–Marx [4, Theorem 12.2.4, pp. …]) shows that
$$ \mathrm{SSTOT} = \mathrm{SSTR} + \mathrm{SSE}. $$
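A quick simulated check of the decomposition SSTOT = SSTR + SSE (my own sketch; the group sizes and means are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
groups = [rng.normal(5.0 + j, 1.0, size=10 + 2 * j) for j in range(3)]  # k = 3 levels
y_all = np.concatenate(groups)
y_bar = y_all.mean()

SSTR = sum(len(g) * (g.mean() - y_bar) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
SSTOT = ((y_all - y_bar) ** 2).sum()
print(SSTOT, SSTR + SSE)        # equal up to rounding: SSTOT = SSTR + SSE
```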

25.14 Testing equality of means

Note that if the null hypothesis
$$ H_0\colon \mu_1 = \dots = \mu_k $$
is true, then
$$ E\,\mathrm{SSTR} = (k-1)\sigma^2 \quad \text{(by (12))} \qquad \text{and} \qquad E\,\mathrm{SSE} = (n-k)\sigma^2 \quad \text{(by (13))}. $$
Therefore we should expect
$$ F = \frac{\mathrm{SSTR}/(k-1)}{\mathrm{SSE}/(n-k)} $$
to be close to one; otherwise it will be significantly greater than one. So the appropriate test is a one-sided F-test: reject the null hypothesis $H_0$ if $F \geqslant F_{1-\alpha, k-1, n-k}$.

25.15 ANOVA tables

The traditional way to present ANOVA data is in the form of a table like this:

Source     | df  | SS    | MS         | F                        | P
Treatment  | k-1 | SSTR  | SSTR/(k-1) | [SSTR/(k-1)]/[SSE/(n-k)] | P(F_{k-1,n-k} >= F)
Error      | n-k | SSE   | SSE/(n-k)  |                          |
Total      | n-1 | SSTOT |            |                          |

Two more terms: the mean square for treatments is
$$ \mathrm{MSTR} = \frac{\mathrm{SSTR}}{k-1}, $$
and the mean square for errors is
$$ \mathrm{MSE} = \frac{\mathrm{SSE}}{n-k}. $$

25.16 Contrasts

A linear combination of the form
$$ C = w'\mu, \qquad \text{where } w'\mathbf{1} = 0, $$
is called a contrast. A typical contrast uses a vector of the form
$$ w = (0, \dots, 0, \underset{j}{1}, 0, \dots, 0, \underset{j'}{-1}, 0, \dots, 0), $$
so
$$ C = w'\mu = \mu_j - \mu_{j'}. $$
Then the hypothesis $H_0\colon C = 0$ amounts to $H_0\colon \mu_j = \mu_{j'}$. This is probably why it is called a contrast.

To test the hypothesis that $C = 0$, we weight the sample means: define
$$ \hat C = \sum_{j=1}^{k} w_j\bar y_j. $$
Then
$$ E\hat C = C \qquad \text{and} \qquad \operatorname{Var}\hat C = \sigma^2\sum_{j=1}^{k}\frac{w_j^2}{n_j}. $$
Define
$$ \mathrm{SS}_C = \frac{\hat C^2}{\sum_{j=1}^{k} w_j^2/n_j}. $$
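To tie the pieces together, here is a minimal sketch, on simulated data, that computes the ANOVA table entries, checks them against SciPy's f_oneway, and computes the contrast statistic $\mathrm{SS}_C$ just defined (its F-test is the theorem stated next):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
groups = [rng.normal(m, 1.0, size=n) for m, n in [(5.0, 10), (5.5, 12), (7.0, 9)]]
k = len(groups)
n = sum(len(g) for g in groups)
y_bar = np.concatenate(groups).mean()

SSTR = sum(len(g) * (g.mean() - y_bar) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
MSTR, MSE = SSTR / (k - 1), SSE / (n - k)
F = MSTR / MSE
p = stats.f.sf(F, k - 1, n - k)
print(F, p)
print(stats.f_oneway(*groups))            # same F and p from SciPy's one-way ANOVA

# contrast H0: mu_1 = mu_3, i.e. w = (1, 0, -1)
w = np.array([1.0, 0.0, -1.0])
C_hat = sum(wj * g.mean() for wj, g in zip(w, groups))
SS_C = C_hat ** 2 / sum(wj ** 2 / len(g) for wj, g in zip(w, groups))
F_C = SS_C / MSE                          # ~ F(1, n-k) under H0, by the theorem below
print(F_C, stats.f.sf(F_C, 1, n - k))
```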

Theorem (Larsen–Marx [4, Theorem 12.4.1, p. 614]) Under $H_0\colon w'\mu = 0$, the test statistic
$$ F = \frac{\mathrm{SS}_C}{\mathrm{SSE}/(n-k)} $$
has an $F$-distribution with $(1, n-k)$ degrees of freedom. The null hypothesis should be rejected if $F \geqslant F_{1-\alpha, 1, n-k}$.

Bibliography

[1] K. C. Border. Maximization with more than one variable. Online lecture notes.
[2] J. Johnston. 1972. Econometric methods, 2nd ed. New York: McGraw-Hill.
[3] A. Kaplan. 1964. The conduct of inquiry: Methodology for behavioral science. San Francisco: Chandler Publishing Co.
[4] R. J. Larsen and M. L. Marx. 2012. An introduction to mathematical statistics and its applications, 5th ed. Boston: Prentice Hall.
[5] A. H. Maslow. 1966. The psychology of science: A reconnaissance. New York: Harper & Row.
[6] H. Theil. 1971. Principles of econometrics. New York: Wiley.
