2.7 Estimation with Linear Restriction
Proof (Method 1: first show that $a \in \mathcal{C}(W^T)$, which implies that a function estimable under the old model is also estimable under the new model; second, show that $E[a^T\hat\beta_G] = a^T\beta$ and that $a^T\hat\beta_G$ is a linear function of $Z$.)

First notice that $a^T\hat\beta_G$ is a linear function of $Y$. Rewrite the model as $Z = W\beta + \epsilon^*$, where $Z = \Sigma^{-1/2}Y$, $W = \Sigma^{-1/2}X$, $E[\epsilon^*] = 0$, and $\mathrm{Var}[\epsilon^*] = I_n$. It can be shown that $a^T\beta$ is also estimable under the new model; if this is true, $a^T\hat\beta_G$ is the BLUE for $a^T\beta$.

$a^T\beta$ is estimable $\iff a \in \mathcal{C}(X^T)$. Claim: $\mathcal{C}(X^T) = \mathcal{C}(X^T\Sigma^{-1/2}) = \mathcal{C}(W^T)$. Based on the claim, $a \in \mathcal{C}(W^T)$, which implies that $a^T\beta$ is estimable under $Z = W\beta + \epsilon^*$. Because $\hat\beta_G$ is the OLSE of $\beta$ under $Z = W\beta + \epsilon^*$, $a^T\hat\beta_G$ is the BLUE of $a^T\beta$ under $Z = W\beta + \epsilon^*$, which is equivalent to the original model.

Proof of the claim. For any $y \in \mathcal{C}(X^T\Sigma^{-1/2})$, there exists $c \in \mathbb{R}^n$ such that $y = X^T\Sigma^{-1/2}c = X^T(\Sigma^{-1/2}c)$, which implies $y \in \mathcal{C}(X^T)$. This proves $\mathcal{C}(X^T\Sigma^{-1/2}) \subseteq \mathcal{C}(X^T)$. Conversely, for any $y \in \mathcal{C}(X^T)$, there exists $c \in \mathbb{R}^n$ such that $y = X^Tc = X^T\Sigma^{-1/2}\Sigma^{1/2}c = (X^T\Sigma^{-1/2})(\Sigma^{1/2}c)$, which implies $\mathcal{C}(X^T) \subseteq \mathcal{C}(X^T\Sigma^{-1/2})$. Therefore $\mathcal{C}(X^T\Sigma^{-1/2}) = \mathcal{C}(X^T)$.

Proof (Method 2)

1. Since $\Sigma$ is positive definite, its square root exists and is nonsingular. Let $Z = \Sigma^{-1/2}Y$ and $W = \Sigma^{-1/2}X$. Then $Z = \Sigma^{-1/2}Y = \Sigma^{-1/2}X\beta + \Sigma^{-1/2}\epsilon$, and in class we showed that
$$\hat\beta_G = (X^T\Sigma^{-1}X)^-X^T\Sigma^{-1}Y = (W^TW)^-W^TZ.$$
Since $a^T\beta$ is estimable, there exists a $c$ such that $a = X^Tc = X^T\Sigma^{-1/2}\Sigma^{1/2}c = W^T\Sigma^{1/2}c \in \mathcal{C}(W^T)$, which implies that $a^T\beta$ is also estimable under the new model.

2. Writing $a = W^T\tilde c$ with $\tilde c = \Sigma^{1/2}c$,
$$a^T\hat\beta_G = a^T(W^TW)^-W^TZ = \tilde c^TW(W^TW)^-W^TZ = \tilde c^TP_WZ.$$
Since $P_W$ is invariant to the choice of $(W^TW)^-$, $a^T\hat\beta_G$ is also invariant to the choice of generalized inverse. Moreover,
$$E[a^T\hat\beta_G] = E[a^T(W^TW)^-W^TZ] = a^T(W^TW)^-W^TW\beta = a^T\beta,$$
since estimability of $a^T\beta$ implies $a^T(W^TW)^-W^TW = a^T$ (Thm 4.9 (2)).

3. Any linear estimator $d^TZ$ in the new model is a linear estimator in $Y$, since $d^TZ = (d^T\Sigma^{-1/2})Y$; conversely, any $b^TY$ equals $d^TZ$ with $d = \Sigma^{1/2}b$. Clearly $\hat\beta_G$ is the OLSE for $Z = W\beta + \epsilon^*$ with $\epsilon^* \sim (0, I)$. Thus $a^T\hat\beta_G$ is the BLUE.
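The equivalence above is easy to check numerically. Below is a minimal sketch (simulated data; the variable names are ours, not from the notes): it computes $\hat\beta_G$ directly from the GLS formula and again by OLS on the whitened model $Z = \Sigma^{-1/2}Y$, $W = \Sigma^{-1/2}X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.standard_normal((n, p))

# A positive definite error covariance matrix.
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.multivariate_normal(np.zeros(n), Sigma)

# GLS directly: beta_G = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y.
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)

# Whitening: Sigma^{-1/2} from the eigendecomposition, then plain OLS on (W, Z).
w, V = np.linalg.eigh(Sigma)
Sig_inv_half = V @ np.diag(w ** -0.5) @ V.T
W, Z = Sig_inv_half @ X, Sig_inv_half @ Y
beta_ols = np.linalg.lstsq(W, Z, rcond=None)[0]

print(np.allclose(beta_gls, beta_ols))  # True: the two routes agree
```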
2.7 Estimation with Linear Restriction

Design Matrix of Full Rank

Let $Y = X\beta + \epsilon$, where $X$ has full rank. Suppose we want to minimize $(Y - X\beta)^T(Y - X\beta)$ subject to the linear restriction $A\beta = c$, where $A$ is a known $q \times p$ matrix of rank $q$ and $c$ is a known $q \times 1$ vector. We will use Lagrange multipliers to find the least squares estimate subject to the linear restriction. Let
$$f(\beta, \lambda) = (Y - X\beta)^T(Y - X\beta) + \lambda^T(A\beta - c).$$
Taking the derivative with respect to $\beta$ and setting it to zero gives
$$-2X^TY + 2X^TX\beta + A^T\lambda = 0.$$
Let $\hat\beta_H$ denote the estimate subject to the linear restriction. Then
$$\hat\beta_H = (X^TX)^{-1}X^TY - \tfrac{1}{2}(X^TX)^{-1}A^T\hat\lambda_H = \hat\beta - \tfrac{1}{2}(X^TX)^{-1}A^T\hat\lambda_H.$$
The linear restriction gives
$$c = A\hat\beta_H = A\hat\beta - \tfrac{1}{2}A(X^TX)^{-1}A^T\hat\lambda_H,$$
which yields
$$-\tfrac{1}{2}\hat\lambda_H = [A(X^TX)^{-1}A^T]^{-1}(c - A\hat\beta).$$
Substituting into the formula for $\hat\beta_H$, we have
$$\hat\beta_H = \hat\beta + (X^TX)^{-1}A^T[A(X^TX)^{-1}A^T]^{-1}(c - A\hat\beta).$$
Note: when $X$ does not have full rank, not all linear restrictions have a solution.

Proposition 2.14 The estimate given above does minimize $\epsilon^T\epsilon$ subject to $A\beta = c$.

Proof First,
$$\epsilon^T\epsilon = \|Y - X\beta\|^2 = \|Y - \hat Y + \hat Y - X\beta\|^2 = \|Y - \hat Y\|^2 + \|\hat Y - X\beta\|^2 + 2(Y - \hat Y)^T(\hat Y - X\beta)$$
$$= \|Y - \hat Y\|^2 + \|\hat Y - X\beta\|^2 + 2Y^T(I - P_X)^TX(\hat\beta - \beta) = \|Y - \hat Y\|^2 + \|X(\hat\beta - \beta)\|^2,$$
since $(I - P_X)X = 0$. Second,
$$\|X(\hat\beta - \beta)\|^2 = (\hat\beta - \beta)^TX^TX(\hat\beta - \beta) = (\hat\beta - \hat\beta_H + \hat\beta_H - \beta)^TX^TX(\hat\beta - \hat\beta_H + \hat\beta_H - \beta) = \|X(\hat\beta - \hat\beta_H)\|^2 + \|X(\hat\beta_H - \beta)\|^2.$$
The last step is true because
$$2(\hat\beta - \hat\beta_H)^TX^TX(\hat\beta_H - \beta) = \hat\lambda_H^TA(X^TX)^{-1}X^TX(\hat\beta_H - \beta) = \hat\lambda_H^TA(\hat\beta_H - \beta) = \hat\lambda_H^T(c - c) = 0,$$
using $\hat\beta - \hat\beta_H = \tfrac{1}{2}(X^TX)^{-1}A^T\hat\lambda_H$ and $A\hat\beta_H = A\beta = c$. Thus,
$$\epsilon^T\epsilon = \|Y - \hat Y\|^2 + \|X(\hat\beta - \hat\beta_H)\|^2 + \|X(\hat\beta_H - \beta)\|^2.$$
Only the last term depends on the feasible $\beta$, so $\epsilon^T\epsilon$ is minimized when $\beta = \hat\beta_H$.
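The closed form is easy to verify numerically. Here is a minimal sketch (made-up data; the function name `restricted_lse` is ours): it checks that $A\hat\beta_H = c$ holds exactly and that $\hat\beta_H$ has no worse RSS than another feasible point.

```python
import numpy as np

def restricted_lse(X, Y, A, c):
    """beta_H = beta_hat + (X'X)^{-1} A' [A (X'X)^{-1} A']^{-1} (c - A beta_hat)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    K = A @ XtX_inv @ A.T
    return beta + XtX_inv @ A.T @ np.linalg.solve(K, c - A @ beta)

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
Y = rng.standard_normal(30)
A = np.array([[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])  # q = 2 restrictions
c = np.array([0.0, 1.0])

bH = restricted_lse(X, Y, A, c)
print(np.allclose(A @ bH, c))  # True: the restriction holds exactly

# beta_H minimizes ||Y - X b||^2 among feasible b: perturb within the null space of A.
rss = lambda b: np.sum((Y - X @ b) ** 2)
null_dir = np.array([1.0, 1.0, 1.0, -1.0])  # A @ null_dir = 0, so still feasible
print(rss(bH) <= rss(bH + 0.1 * null_dir))  # True
```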
Here is another proof.

Proof
$$\|Y - X\beta\|^2 = \|Y - X\hat\beta + X\hat\beta - X\hat\beta_H + X\hat\beta_H - X\beta\|^2$$
$$= \|Y - X\hat\beta\|^2 + \|X\hat\beta - X\hat\beta_H\|^2 + \|X\hat\beta_H - X\beta\|^2 + 2[\text{cross terms}]$$
$$= \|Y - X\hat\beta\|^2 + \|X\hat\beta - X\hat\beta_H\|^2 + \|X\hat\beta_H - X\beta\|^2.$$
The last step is true because
$$Y - X\hat\beta = (I - P_X)Y, \qquad X\hat\beta - X\hat\beta_H = X(\hat\beta - \hat\beta_H) = \tfrac{1}{2}X(X^TX)^{-1}A^T\hat\lambda_H, \qquad X\hat\beta_H - X\beta = X(\hat\beta_H - \beta),$$
so all three cross terms vanish:
$$(Y - X\hat\beta)^T(X\hat\beta - X\hat\beta_H) = Y^T(I - P_X)X(\hat\beta - \hat\beta_H) = 0,$$
$$(Y - X\hat\beta)^T(X\hat\beta_H - X\beta) = Y^T(I - P_X)X(\hat\beta_H - \beta) = 0,$$
$$(X\hat\beta - X\hat\beta_H)^T(X\hat\beta_H - X\beta) = (\hat\beta - \hat\beta_H)^TX^TX(\hat\beta_H - \beta) = \tfrac{1}{2}\hat\lambda_H^TA(X^TX)^{-1}X^TX(\hat\beta_H - \beta) = \tfrac{1}{2}\hat\lambda_H^T(A\hat\beta_H - A\beta) = \tfrac{1}{2}\hat\lambda_H^T(c - c) = 0.$$

An interesting observation (Exercise 3.g.4 on page 62):
$$\|Y - \hat Y_H\|^2 - \|Y - \hat Y\|^2 = \sigma^2\hat\gamma_H^T(\mathrm{Var}[\hat\gamma_H])^{-1}\hat\gamma_H.$$

Example Consider the linear model
$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \; j = 1, 2,$$
where $\epsilon \sim (0, I\sigma^2)$. Consider the linear restriction $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$. We can rewrite the restriction as $A\beta = (1\ {-1})\beta = 0$, so $A = (1\ {-1})$ and $\hat\beta = (\bar Y_{1\cdot}, \bar Y_{2\cdot})^T$. With $(X^TX)^{-1} = \mathrm{diag}(1/2, 1/2)$,
$$\hat\beta_H = \hat\beta + \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\left[(1\ {-1})\begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right]^{-1}\left(0 - (1\ {-1})\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}\right) = \hat\beta - \begin{pmatrix} 1/2 \\ -1/2 \end{pmatrix}(\hat\beta_1 - \hat\beta_2) = \begin{pmatrix} \tfrac{\bar Y_{1\cdot} + \bar Y_{2\cdot}}{2} \\ \tfrac{\bar Y_{1\cdot} + \bar Y_{2\cdot}}{2} \end{pmatrix}.$$

Example Consider the model $Y_i = \mu_i + \epsilon_i$, $i = 1, 2, 3, 4$, where $\epsilon \sim (0, I_4\sigma^2)$. We are interested in the restriction $\mu_1 + \mu_2 + \mu_3 + \mu_4 = 0$. Clearly $X = I_4$, $\hat\beta = (y_1, y_2, y_3, y_4)^T$, $A = (1, 1, 1, 1)$, and $c = 0$. Thus the restricted estimate is
$$\hat\beta_H = \hat\beta + (I_4)^{-1}A^T(AA^T)^{-1}(0 - A\hat\beta) = \hat\beta - A^T\,\tfrac{1}{4}\textstyle\sum_i y_i = (y_1 - \bar y,\ y_2 - \bar y,\ y_3 - \bar y,\ y_4 - \bar y)^T.$$
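The second example is easy to reproduce numerically. A short sketch (the data values below are arbitrary, chosen only for illustration):

```python
import numpy as np

y = np.array([2.0, -1.0, 4.0, 3.0])      # responses; X = I_4, so beta_hat = y
A = np.ones((1, 4))                       # restriction: mu1 + mu2 + mu3 + mu4 = 0
c = np.array([0.0])

# beta_H = beta_hat + A' (A A')^{-1} (c - A beta_hat), since X'X = I_4.
beta_H = y + A.T @ np.linalg.solve(A @ A.T, c - A @ y)
print(np.allclose(beta_H, y - y.mean()))  # True: centering, as derived above
```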
Design Matrix of Less Than Full Rank

Take a subset of $X$, $X_1$, and a subset of $Z$, $Z_1$.

2.8 Adding/Deleting Covariates/Cases

The technique introduced here is closely related to the sweep algorithm for linear model fitting. See Chapter 11 for more about the sweep algorithm.

2.8.1 Adding Covariates

Suppose we have fit the model $Y = X\beta + \epsilon$ and obtained the OLSE $\hat\beta$. Now we have a new set of variables $Z$ and we want to add it to the model. In other words, we want to obtain the OLSE of $\beta$ and $\gamma$ under the following model:
$$Y = X\beta + Z\gamma + \epsilon = (X\ Z)\begin{pmatrix}\beta \\ \gamma\end{pmatrix} + \epsilon = W\delta + \epsilon.$$
Here we assume that (1) the columns of $X$ and the columns of $Z$ are linearly independent; (2) both $X$ and $Z$ are full rank; (3) $\epsilon \sim (0, \sigma^2I)$.

One can refit the model and estimate $\beta$ and $\gamma$ using the new model. However, this is not efficient. A more efficient way is to use what we have learned from the old model to obtain the OLSE for the new model.

Lemma 2.15 Let $R = I - P_X$. Suppose both $X$ and $Z$ are full rank ($\mathrm{rank}(X) = p$, $\mathrm{rank}(Z) = t$) and the columns of $X$ and the columns of $Z$ are linearly independent. Then (1) $Z^TRZ$ is p.d.; (2) $\mathrm{rank}(Z^TRZ) = \mathrm{rank}(Z) = t$.

Before proving the lemma, let's review a property of p.d. matrices. We learned that if $A_{n \times p}$ has rank $p$, then $A^TA > 0$. Proof: $x^TA^TAx = y^Ty \ge 0$, with equality iff $Ax = 0$ iff $x = 0$; the last "iff" holds because $\mathrm{rank}(A) = p$, so the columns of $A$ are linearly independent.

Here is the proof of the lemma:

Proof Suppose the columns of $RZ$ are not linearly independent. Then there exists $a \ne 0$ such that $RZa = 0$, i.e., $Za = P_XZa = Xb$ for some $b$. Because the columns of $X$ and the columns of $Z$ are linearly independent, $Za - Xb = 0$ forces $a = b = 0$. The contradiction implies that the columns of $RZ$ are linearly independent and $\mathrm{rank}(RZ) = t$. In addition, since $R$ is symmetric and idempotent, $Z^TRZ = (RZ)^T(RZ)$. Therefore $(RZ)^T(RZ) = Z^TRZ$ is p.d. and $\mathrm{rank}(Z^TRZ) = \mathrm{rank}(RZ) = t$.

Theorem 2.16 Let $R_G = I - P_W$, $L = (X^TX)^{-1}X^TZ$, $M = (Z^TRZ)^{-1}$, and let $\hat\delta_G$ be the OLSE of $\delta$. Then

1. $\hat\gamma_G = (Z^TRZ)^{-1}Z^TRY = [(RZ)^T(RZ)]^{-1}(RZ)^TRY = MZ^TRY$
2. $\hat\beta_G = (X^TX)^{-1}X^T(Y - Z\hat\gamma_G) = \hat\beta - L\hat\gamma_G$

3. $\mathrm{RSS}_{new} = (Y - \hat Y_G)^T(Y - \hat Y_G) = (Y - Z\hat\gamma_G)^TR(Y - Z\hat\gamma_G) = Y^TRY - \hat\gamma_G^TZ^TRY$

4. $\mathrm{Var}(\hat\delta_G) = \sigma^2\begin{pmatrix}(X^TX)^{-1} + LML^T & -LM \\ -ML^T & M\end{pmatrix}$

Proof

1. First orthogonalize $Z$ by writing it as $Z = P_XZ + RZ$. Thus the new model is
$$Y = X\beta + (P_XZ + RZ)\gamma + \epsilon = X\alpha + RZ\gamma + \epsilon = (X\ RZ)\begin{pmatrix}\alpha \\ \gamma\end{pmatrix} + \epsilon = V\lambda + \epsilon,$$
where $\alpha = \beta + (X^TX)^{-1}X^TZ\gamma = \beta + L\gamma$. Note that $\mathcal{C}(X) \perp \mathcal{C}(RZ)$ (this is true because for any $a$ and $b$ we have $a^TX^TRZb = 0$, since $X^TR = X^T - X^TP_X = 0$); thus the columns of $X$ and $RZ$ are linearly independent and $V$ is full rank. The OLSE of $\lambda$ is
$$\hat\lambda = (V^TV)^{-1}V^TY = \begin{pmatrix}X^TX & X^TRZ \\ Z^TRX & Z^TRZ\end{pmatrix}^{-1}\begin{pmatrix}X^TY \\ Z^TRY\end{pmatrix} = \begin{pmatrix}X^TX & 0 \\ 0 & Z^TRZ\end{pmatrix}^{-1}\begin{pmatrix}X^TY \\ Z^TRY\end{pmatrix} = \begin{pmatrix}(X^TX)^{-1}X^TY \\ (Z^TRZ)^{-1}Z^TRY\end{pmatrix} = \begin{pmatrix}\hat\alpha_G \\ \hat\gamma_G\end{pmatrix}.$$
Thus $\hat\gamma_G = (Z^TRZ)^{-1}Z^TRY$. Because $R$ is a projection matrix, we can rewrite $\hat\gamma_G$ as
$$\hat\gamma_G = (Z^TRZ)^{-1}Z^TRY = [(RZ)^T(RZ)]^{-1}(RZ)^T(RY).$$
Note that if $Z$ were the only set of covariates in the model, we would solve $Z^TZ\gamma = Z^TY$. With $X$ already in the model, we replace $Z$ with $RZ$ and replace $Y$ with $RY$.

2. Since $\hat\alpha_G = \hat\beta_G + L\hat\gamma_G$, we have
$$\hat\beta_G = (X^TX)^{-1}X^TY - L\hat\gamma_G = \hat\beta - L\hat\gamma_G.$$
This relates the OLSE under the old model to that under the new model.

3. Here we want to show the relationship between the RSS of the old model and that of the new model. Note that
$$Y - \hat Y_{new} = Y - X\hat\beta_G - Z\hat\gamma_G = Y - X\hat\beta + XL\hat\gamma_G - Z\hat\gamma_G = (I - P_X)Y + (X(X^TX)^{-1}X^T - I)Z\hat\gamma_G = RY - RZ\hat\gamma_G.$$
Thus
$$\mathrm{RSS}_{new} = (RY - RZ\hat\gamma_G)^T(RY - RZ\hat\gamma_G) = (Y - Z\hat\gamma_G)^TR(Y - Z\hat\gamma_G).$$
Also note that $RZ\hat\gamma_G = RZ(Z^TRZ)^{-1}Z^TRY = P_{RZ}Y$. Hence,
$$\mathrm{RSS}_{new} = Y^TRY - 2Y^T(RZ\hat\gamma_G) + \hat\gamma_G^TZ^TRZ\hat\gamma_G = Y^TRY - 2Y^TP_{RZ}Y + Y^TP_{RZ}Y = \mathrm{RSS}_{old} - Y^TP_{RZ}Y.$$
This result indicates that adding new variables never increases the RSS: it decreases it by $Y^TP_{RZ}Y \ge 0$.
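The theorem translates directly into an update rule. Here is a minimal sketch (simulated data; variable names are ours) checking that the update reproduces a full refit and that the RSS drops by exactly $Y^TP_{RZ}Y$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, t = 50, 3, 2
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, t))
Y = rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
R = np.eye(n) - X @ XtX_inv @ X.T        # R = I - P_X
beta_old = XtX_inv @ X.T @ Y

# Update formulas from Theorem 2.16.
gamma_G = np.linalg.solve(Z.T @ R @ Z, Z.T @ R @ Y)
L = XtX_inv @ X.T @ Z
beta_G = beta_old - L @ gamma_G

# Compare with a full refit on W = (X Z).
W = np.hstack([X, Z])
delta = np.linalg.lstsq(W, Y, rcond=None)[0]
print(np.allclose(np.concatenate([beta_G, gamma_G]), delta))  # True

# RSS_new = RSS_old - Y' P_RZ Y.
RZ = R @ Z
P_RZ = RZ @ np.linalg.solve(Z.T @ R @ Z, RZ.T)
rss_old = Y @ R @ Y
rss_new = np.sum((Y - W @ delta) ** 2)
print(np.allclose(rss_new, rss_old - Y @ P_RZ @ Y))  # True
```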
4. Note that $\hat\gamma_G = (Z^TRZ)^{-1}Z^TRY = MZ^TRY$. Thus
$$\mathrm{Var}(\hat\gamma_G) = \mathrm{Var}(MZ^TRY) = MZ^TRZM\sigma^2 = \sigma^2M,$$
$$\mathrm{Cov}(\hat\beta, \hat\gamma_G) = \mathrm{Cov}((X^TX)^{-1}X^TY,\ MZ^TRY) = \sigma^2(X^TX)^{-1}X^TRZM = 0 \quad (\text{since } X^TR = 0),$$
$$\mathrm{Cov}(\hat\beta_G, \hat\gamma_G) = \mathrm{Cov}(\hat\beta - L\hat\gamma_G,\ \hat\gamma_G) = \mathrm{Cov}(\hat\beta, \hat\gamma_G) - L\,\mathrm{Var}(\hat\gamma_G) = -\sigma^2LM,$$
$$\mathrm{Var}(\hat\beta_G) = \mathrm{Var}(\hat\beta) - \mathrm{Cov}(\hat\beta, \hat\gamma_G)L^T - L\,\mathrm{Cov}(\hat\gamma_G, \hat\beta) + L\,\mathrm{Var}(\hat\gamma_G)L^T = \sigma^2(X^TX)^{-1} + \sigma^2LML^T = \sigma^2[(X^TX)^{-1} + LML^T].$$

These results imply that if we add one single covariate $z$ to the model, then
$$\hat\beta_{new} = \begin{pmatrix}\hat\beta_{old} - l\,m\,z^TRY \\ m\,z^TRY\end{pmatrix},$$
where $l = (X^TX)^{-1}X^Tz$ and $m = (z^TRz)^{-1}$. This is a homework assignment.

We can also use A.9.1 to prove the results. A.9.1 states that if all inverses exist, then
$$\begin{pmatrix}A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix}^{-1} = \begin{pmatrix}A^{11} & A^{12} \\ A^{21} & A^{22}\end{pmatrix} = \begin{pmatrix}A_{11}^{-1} + B_{12}B_{22}^{-1}B_{21} & -B_{12}B_{22}^{-1} \\ -B_{22}^{-1}B_{21} & B_{22}^{-1}\end{pmatrix},$$
where $B_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12}$, $B_{12} = A_{11}^{-1}A_{12}$, and $B_{21} = A_{21}A_{11}^{-1}$.

Now let $A_{11} = X^TX$, $A_{12} = X^TZ$, $A_{21} = Z^TX$, $A_{22} = Z^TZ$, with $L = (X^TX)^{-1}X^TZ$ and $M = (Z^TRZ)^{-1}$ as before; note $B_{22} = Z^TZ - Z^TP_XZ = Z^TRZ = M^{-1}$. We can show that
$$A^{11} = (X^TX)^{-1} + LML^T, \quad A^{12} = -LM, \quad A^{21} = -ML^T, \quad A^{22} = M.$$
Because
$$\begin{pmatrix}\hat\beta_G \\ \hat\gamma_G\end{pmatrix} = \begin{pmatrix}A^{11} & A^{12} \\ A^{21} & A^{22}\end{pmatrix}\begin{pmatrix}X^TY \\ Z^TY\end{pmatrix},$$
we have $\hat\beta_G = \hat\beta - L\hat\gamma_G$.

Example of Seber (page 57). Let the columns of $X$ be denoted by $x_{(j)}$, $j = 0, 1, \dots, p-1$, so that
$$E[Y] = x_{(0)}\beta_0 + x_{(1)}\beta_1 + x_{(2)}\beta_2 + \dots + x_{(p-1)}\beta_{p-1}.$$
Suppose now that we want to introduce the explanatory variable $x_{(p)}$ into the model. Thus $Z^TRZ = x_{(p)}^TRx_{(p)}$ is a scalar. Hence,
$$\hat\beta_{p,G} = \hat\gamma_G = (Z^TRZ)^{-1}Z^TRY = \frac{x_{(p)}^TRY}{x_{(p)}^TRx_{(p)}},$$
$$(\hat\beta_{0,G}, \dots, \hat\beta_{p-1,G})^T = \hat\beta - (X^TX)^{-1}X^Tx_{(p)}\hat\beta_{p,G},$$
$$Y^TR_GY = Y^TRY - \hat\beta_{p,G}\,x_{(p)}^TRY.$$
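Seber's single-covariate case is worth checking on its own, since everything reduces to scalars. A sketch with simulated data (names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.standard_normal((n, p))
x_new = rng.standard_normal(n)            # the new column x_(p)
Y = rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
R = np.eye(n) - X @ XtX_inv @ X.T
beta_old = XtX_inv @ X.T @ Y

# Scalar update: beta_p = x'RY / x'Rx, then adjust the old coefficients.
beta_p = (x_new @ R @ Y) / (x_new @ R @ x_new)
beta_rest = beta_old - XtX_inv @ X.T @ x_new * beta_p

full = np.linalg.lstsq(np.column_stack([X, x_new]), Y, rcond=None)[0]
print(np.allclose(np.append(beta_rest, beta_p), full))  # True
```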
2.8.2 Removing Covariates

This is a homework problem. Assume that $X_1$ and $X_2$ are full rank and the columns of the two matrices are linearly independent. Suppose we have obtained the OLSEs of $\beta_1$ and $\beta_2$, denoted by $\hat\beta_1$ and $\hat\beta_2$ respectively, in the following full model:
$$Y = X_1\beta_1 + X_2\beta_2 + \epsilon.$$
Now consider removing $X_2$ from the model and fitting the following new model:
$$Y = X_1\beta_1 + \epsilon,$$
and denote the corresponding OLSE by $\tilde\beta_1$. Then
$$\tilde\beta_1 = \hat\beta_1 + (X_1^TX_1)^{-1}X_1^TX_2\hat\beta_2.$$

Question: When does adding/removing covariates not affect the OLSE of the covariates already in the model? Think about perpendicular covariates and (balanced) analysis of variance. This happens when the columns of $X$ and the columns of $Z$ are orthogonal.

Adding Subjects

Suppose we have fit a linear model using $n$ subjects with the design matrix $X$. Now data from $k$ new subjects are available; we denote the design matrix for them by $X_1$ and the responses by $Y_1$. How can we update our model efficiently? Let $\hat\beta_{old}$ denote the OLSE based on the first $n$ subjects, i.e., $\hat\beta_{old} = (X^TX)^{-1}X^TY$, and let $\hat\beta_{new}$ denote the OLSE using all available data. Then we have
$$\hat\beta_{new} = (X^TX + X_1^TX_1)^{-1}(X^TY + X_1^TY_1).$$
Recall the Sherman–Morrison–Woodbury formula (A.9.3):
$$(A + UBV)^{-1} = A^{-1} - A^{-1}UB(B + BVA^{-1}UB)^{-1}BVA^{-1}.$$
Here $A = X^TX$, $U = X_1^T$, $B = I_k$, and $V = X_1$. We have
$$(X^TX + X_1^TX_1)^{-1} = (X^TX)^{-1} - (X^TX)^{-1}X_1^T(I + X_1(X^TX)^{-1}X_1^T)^{-1}X_1(X^TX)^{-1}.$$
Thus,
$$\hat\beta_{new} = [(X^TX)^{-1} - (X^TX)^{-1}X_1^T(I + X_1(X^TX)^{-1}X_1^T)^{-1}X_1(X^TX)^{-1}](X^TY + X_1^TY_1)$$
$$= \hat\beta_{old} + (X^TX)^{-1}X_1^T(I + X_1(X^TX)^{-1}X_1^T)^{-1}[-X_1(X^TX)^{-1}X^TY - X_1(X^TX)^{-1}X_1^TY_1 + (I + X_1(X^TX)^{-1}X_1^T)Y_1]$$
$$= \hat\beta_{old} + (X^TX)^{-1}X_1^T(I + X_1(X^TX)^{-1}X_1^T)^{-1}[Y_1 - X_1\hat\beta_{old}].$$

When adding one subject We add a single row to $X$; let $x_1^T$ denote the row and $y_1$ the corresponding response. We have
$$\hat\beta_{new} = \hat\beta_{old} + \frac{(X^TX)^{-1}x_1}{1 + x_1^T(X^TX)^{-1}x_1}[y_1 - x_1^T\hat\beta_{old}].$$
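A sketch of the block update for $k$ new subjects (simulated data; names are ours), checked against a refit on the stacked data:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, p = 30, 5, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)
X1 = rng.standard_normal((k, p))
Y1 = rng.standard_normal(k)

XtX_inv = np.linalg.inv(X.T @ X)
beta_old = XtX_inv @ X.T @ Y

# Block update: beta_new = beta_old + (X'X)^{-1} X1' (I_k + X1 (X'X)^{-1} X1')^{-1} (Y1 - X1 beta_old).
G = np.eye(k) + X1 @ XtX_inv @ X1.T
beta_new = beta_old + XtX_inv @ X1.T @ np.linalg.solve(G, Y1 - X1 @ beta_old)

# Refit on the stacked data for comparison.
beta_refit = np.linalg.lstsq(np.vstack([X, X1]), np.concatenate([Y, Y1]), rcond=None)[0]
print(np.allclose(beta_new, beta_refit))  # True
```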
Note, this can also be proved using A.9.4b. Let $C = (X^TX)^{-1}x_1$ and $D = (1 + x_1^T(X^TX)^{-1}x_1)^{-1} = (1 + x_1^TC)^{-1}$. By A.9.4,
$$(X^TX + x_1x_1^T)^{-1} = (X^TX)^{-1} - CDC^T.$$
Thus,
$$\hat\beta_{new} = (X^TX + x_1x_1^T)^{-1}(X^TY + x_1y_1) = \hat\beta_{old} - CDC^T(X^TY + x_1y_1) + Cy_1$$
$$= \hat\beta_{old} + CD[D^{-1}y_1 - C^TX^TY - C^Tx_1y_1] = \hat\beta_{old} + CD[y_1 - C^TX^TY] = \hat\beta_{old} + CD[y_1 - x_1^T\hat\beta_{old}]$$
$$= \hat\beta_{old} + \frac{(X^TX)^{-1}x_1}{1 + x_1^T(X^TX)^{-1}x_1}[y_1 - x_1^T\hat\beta_{old}],$$
using $D^{-1}y_1 - C^Tx_1y_1 = (1 + x_1^TC)y_1 - x_1^TCy_1 = y_1$.

Removing Subjects

We single out the situation where we want to remove the $i$-th subject. Use A.9.4, i.e.,
$$(A - uv^T)^{-1} = A^{-1} + \frac{A^{-1}uv^TA^{-1}}{1 - v^TA^{-1}u}.$$
Let $X_{(i)}$ denote the design matrix with the $i$-th observation removed. Note that $X_{(i)}^TX_{(i)} = X^TX - x_ix_i^T$. We have
$$(X_{(i)}^TX_{(i)})^{-1} = (X^TX - x_ix_i^T)^{-1} = (X^TX)^{-1} + \frac{(X^TX)^{-1}x_ix_i^T(X^TX)^{-1}}{1 - x_i^T(X^TX)^{-1}x_i} = (X^TX)^{-1} + \frac{(X^TX)^{-1}x_ix_i^T(X^TX)^{-1}}{1 - h_{ii}}.$$
The last equation is true because we define $h_{ii}$ as the $(i, i)$ element of $X(X^TX)^{-1}X^T$, so $h_{ii} = x_i^T(X^TX)^{-1}x_i$. Hence,
$$\hat\beta_{(i)} = [X_{(i)}^TX_{(i)}]^{-1}(X^TY - x_iy_i) = \left[(X^TX)^{-1} + \frac{(X^TX)^{-1}x_ix_i^T(X^TX)^{-1}}{1 - h_{ii}}\right](X^TY - x_iy_i)$$
$$= \hat\beta - \frac{(X^TX)^{-1}x_i}{1 - h_{ii}}[y_i(1 - h_{ii}) - x_i^T\hat\beta + h_{ii}y_i] = \hat\beta - \frac{(X^TX)^{-1}x_i\,e_i}{1 - h_{ii}},$$
where $e_i = y_i - x_i^T\hat\beta$ is the $i$-th residual.

Note, this result is very useful in diagnostics, as it shows the influence of a single data point on the OLSE. Since $\sum_i h_{ii} = \mathrm{trace}(H) = p$, the average of the $h_{ii}$ is $p/n$. If $h_{ii}$ is close to 1, the $i$-th observation has a large influence on parameter estimation. The difference $\hat\beta - \hat\beta_{(i)}$ (called DFBETA) and its standardized version are often used to flag outliers. We will discuss this later. (A numerical sketch of this leave-one-out update is given at the end of this section.)

3 Hypothesis Testing

3.1 Introduction

Discuss all three tests: Wald, LRT, and Score? In this section we assume $Y = X\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^2I_n)$ and $X$ is full rank, i.e., $\mathrm{rank}(X) = p$.
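As promised above, here is a numerical sketch of the leave-one-out update $\hat\beta_{(i)} = \hat\beta - (X^TX)^{-1}x_i\,e_i/(1 - h_{ii})$ (simulated data; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 25, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y
H = X @ XtX_inv @ X.T                      # hat matrix; trace(H) = p
e = Y - X @ beta                           # residuals

i = 7                                      # drop the i-th subject
beta_i = beta - XtX_inv @ X[i] * (e[i] / (1 - H[i, i]))

# Compare with refitting without row i (DFBETA = beta - beta_i).
beta_refit = np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(Y, i), rcond=None)[0]
print(np.allclose(beta_i, beta_refit))     # True
print(np.isclose(np.trace(H), p))          # True: average leverage is p/n
```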