Math 583 Exam 3 is on Wednesday, Dec. 3. You are allowed 10 sheets of notes and a calculator. CHECK FORMULAS!

85) If the MLR (multiple linear regression) model contains a constant, then SSTO = SSE + SSR, where the total sum of squares (corrected for the mean) SSTO = Σ_{i=1}^n (Y_i − Ȳ)², the regression sum of squares SSR = Σ_{i=1}^n (Ŷ_i − Ȳ)², and the error (or residual) sum of squares SSE = Σ_{i=1}^n (Y_i − Ŷ_i)² = Σ_{i=1}^n e_i².

86) If the MLR model contains a constant, then the coefficient of multiple determination R² = [corr(Y_i, Ŷ_i)]² = SSR/SSTO = 1 − SSE/SSTO.

87) MLR output for the Anova F test of H_0: β_1 = ... = β_{p−1} = 0 has the form shown below, where under Source, Model often replaces Regression and Error often replaces Residual. Here MS = SS/df.

Source      df    SS    MS    F             p-value
Regression  p-1   SSR   MSR   Fo=MSR/MSE    for Ho:
Residual    n-p   SSE   MSE                 β_1 = ... = β_{p-1} = 0

88) If the MLR model has a constant β_0, then F_o = MSR/MSE = [R²/(p − 1)] / [(1 − R²)/(n − p)].

89) R² does not decrease and usually increases as predictors are added to the linear model. Want n ≥ 10p.

90) If θ̂ ≈ AN_p(θ, Σ/n), then a^T θ̂ ≈ AN_1(a^T θ, a^T Σ a/n), and a large sample 100(1 − δ)% confidence interval (CI) for a^T θ is a^T θ̂ ± t_{d_n, 1−δ/2} SE(a^T θ̂) = a^T θ̂ ± t_{d_n, 1−δ/2} sqrt(a^T Σ̂ a/n), where d_n → ∞ as n → ∞.

91) For the full rank OLS model, β̂ ≈ AN_p(β, MSE (X^T X)^{−1}). So a^T β̂ ≈ AN_1(a^T β, MSE a^T (X^T X)^{−1} a), and a large sample 100(1 − δ)% confidence interval (CI) for a^T β is a^T β̂ ± t_{n−p, 1−δ/2} sqrt(MSE a^T (X^T X)^{−1} a), where P(T ≤ t_{n−p, 1−δ}) = 1 − δ if T ~ t_{n−p}.

92) A large sample 100(1 − δ)% CI for β_i uses a^T = (0, ..., 0, 1, 0, ..., 0) where the 1 is in the (i + 1)th position, for i = 0, ..., p − 1. Then a^T (X^T X)^{−1} a is the (i + 1)th diagonal element d_ii of (X^T X)^{−1}. So the CI is β̂_i ± t_{n−p, 1−δ/2} sqrt(MSE d_ii) = β̂_i ± t_{n−p, 1−δ/2} SE(β̂_i).

93) Suppose there are k 100(1 − δ_S)% CIs where 1 − δ_S is the confidence for a single confidence interval. Suppose we want the overall familywise confidence 1 − δ_T (the probability, before gathering the data and making the k CIs, that all k CIs contain their θ_i). Hence δ_T = P(at least one of the k CIs does not contain its θ_i), before gathering data. Then δ_T is called the familywise error rate. Can get procedures where 1 − δ_T ≥ 1 − δ.

94) Assume ɛ ~ N_n(0, σ² I). i) The Bonferroni t intervals use δ_S = δ/k. Then 1 − δ_T ≥ 1 − δ. ii) Scheffé's CIs for a^T β have the form a^T β̂ ± sqrt(p F_{p, n−p, 1−δ}) sqrt(MSE a^T (X^T X)^{−1} a), and have 1 − δ_T ≥ 1 − δ.

95) Scheffé's CIs are longer than the corresponding Bonferroni CIs. Scheffé's CIs allow data snooping: you can decide on the a^T β to use after getting the data. Usually you need to decide on the a^T β to use before gathering data for valid inference.
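The decomposition in 85) and the CI of 92) are easy to check numerically. Below is a minimal R sketch (not from the notes; the simulated data and variable names are made up) that verifies SSTO = SSE + SSR and reproduces the t interval for a slope from an lm() fit.

# Minimal sketch: check SSTO = SSE + SSR (item 85) and the CI of item 92.
set.seed(1)
n <- 100; p <- 3
x <- matrix(rnorm(n * (p - 1)), n, p - 1)
y <- 1 + x %*% c(2, -1) + rnorm(n)
fit <- lm(y ~ x)                               # MLR model with a constant
ssto <- sum((y - mean(y))^2)
sse  <- sum(resid(fit)^2)
ssr  <- sum((fitted(fit) - mean(y))^2)
c(SSTO = ssto, SSE.plus.SSR = sse + ssr)       # equal up to rounding
mse <- sse / (n - p)
dii <- summary(fit)$cov.unscaled[2, 2]         # diagonal element of (X^T X)^{-1}
delta <- 0.05
coef(fit)[2] + c(-1, 1) * qt(1 - delta / 2, n - p) * sqrt(mse * dii)
confint(fit, level = 1 - delta)[2, ]           # lm's built-in CI agrees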

96) If the normality in 94) does not hold, then the large sample familywise confidence 1 − δ_{T,n} → 1 − δ_T ≥ 1 − δ as n → ∞ for a large class of zero mean iid error distributions.

97) A large sample 100(1 − δ)% confidence region for θ is a set C_n such that P(θ ∈ C_n) → 1 − δ as n → ∞. For the full rank OLS model, a large sample 100(1 − δ)% confidence region for β is

C_n = {β : (β − β̂)^T X^T X (β − β̂)/MSE ≤ p F_{p, n−p, 1−δ}} = {β : D²_β(β̂, (X^T X)^{−1} MSE) ≤ p F_{p, n−p, 1−δ}},

a hyperellipsoid for β centered at β̂.

98) For the full rank OLS model Y = x^T β + ɛ, a large sample 100(1 − δ)% CI for E(Y) = E(Y|x) = x^T β is x^T β̂ ± t_{n−p, 1−δ/2} sqrt(MSE x^T (X^T X)^{−1} x) = Ŷ ± t_{n−p, 1−δ/2} sqrt(MSE h_x). Want h_x ≤ max(h_1, ..., h_n), where h_i = x_i^T (X^T X)^{−1} x_i, to avoid extrapolation.

99) Consider predicting a future test value Y_f given x_f and training data (x_1, Y_1), ..., (x_n, Y_n) where Y_i = x_i^T β + ɛ_i and Y_f = x_f^T β + ɛ_f. For the full rank OLS model, ɛ_1, ..., ɛ_n, ɛ_f are iid and Y_f is independent of Y_1, ..., Y_n. Hence Y_f is independent of Ŷ_f = x_f^T β̂ since β̂ is computed using the training data. Then E(Ŷ_f − Y_f) = 0 and V(Ŷ_f − Y_f) = σ²(1 + h_f) where h_f = x_f^T (X^T X)^{−1} x_f is the leverage of x_f. Want h_f ≤ max(h_1, ..., h_n) to avoid extrapolation.

100) A large sample 100(1 − δ)% prediction interval (PI) has the form (L̂_n, Û_n) where P(L̂_n < Y_f < Û_n) → 1 − δ as the sample size n → ∞. If the highest density region is an interval, then a PI is asymptotically optimal if it has the shortest asymptotic length that gives the desired asymptotic coverage.

101) The length of a large sample CI goes to 0 while the length of a good PI goes to U − L as n → ∞, where P(Y_f ∈ (L, U) | x_f) = 1 − δ.

102) Know: Let Z_1, ..., Z_n be random variables, let Z_(1), ..., Z_(n) be the order statistics, and let c be a positive integer. Compute Z_(c) − Z_(1), Z_(c+1) − Z_(2), ..., Z_(n) − Z_(n−c+1). Let shorth(c) = (Z_(d), Z_(d+c−1)) = (ξ̃_{δ1}, ξ̃_{1−δ2}) correspond to the interval with the smallest distance.

103) Let k_n = ⌈n(1 − δ)⌉ where ⌈x⌉ is the smallest integer ≥ x, e.g., ⌈7.7⌉ = 8. Let a_n = sqrt[(n/(n − p))(1 + h_f)]. Apply the shorth(c = k_n) estimator to the residuals e_1, ..., e_n: shorth(c) = (e_(d), e_(d+c−1)) = (ξ̃_{δ1}, ξ̃_{1−δ2}). Then a large sample 100(1 − δ)% PI for Y_f is (Ŷ_f + a_n ξ̃_{δ1}, Ŷ_f + a_n ξ̃_{1−δ2}). For the full rank OLS model, this PI is asymptotically optimal if the x_i are bounded in probability and the iid ɛ_i come from a large class of zero mean unimodal distributions.

104) For the full rank OLS model, the 100(1 − δ)% classical PI for Y_f is Ŷ_f ± t_{n−p, 1−δ/2} sqrt(MSE (1 + h_f)) where P(T ≤ t_{n−p, δ}) = δ if T has a t distribution with n − p degrees of freedom. Asymptotically, this PI estimates (E(Y_f|x_f) − σ z_{1−δ/2}, E(Y_f|x_f) + σ z_{1−δ/2}), the interval between two quantiles of a N(E(Y_f|x_f), σ²) distribution where P(Z ≤ z_α) = α if Z ~ N(0, 1). This PI may not perform well if the ɛ_i are not iid N(0, σ²) since the normal quantiles are not the correct quantiles for other error distributions.
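A rough R sketch of the shorth PI of 103) follows. It is not from the notes: the helper name shorth_pi is made up, and the correction factor a_n is taken as sqrt((n/(n − p))(1 + h_f)), matching the reconstruction above.

# Sketch of the shorth(c) prediction interval of item 103 (hypothetical helper).
shorth_pi <- function(fit, xf, delta = 0.05) {
  e <- resid(fit); n <- length(e); p <- length(coef(fit))
  cc <- ceiling(n * (1 - delta))                   # k_n
  es <- sort(e)
  widths <- es[cc:n] - es[1:(n - cc + 1)]          # Z_(c)-Z_(1), ..., Z_(n)-Z_(n-c+1)
  d <- which.min(widths)
  xi <- c(es[d], es[d + cc - 1])                   # (xi_delta1, xi_1-delta2)
  X <- model.matrix(fit)
  hf <- drop(t(xf) %*% solve(crossprod(X), xf))    # leverage of x_f (xf includes the 1)
  an <- sqrt((n / (n - p)) * (1 + hf))             # assumed form of a_n
  drop(crossprod(xf, coef(fit))) + an * xi         # (lower, upper) PI for Y_f
}
# Example: shorth_pi(lm(y ~ x), xf = c(1, 0.5, -0.2)) for the fit from the sketch above.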

105) One of the best ways to check the linear model is to make a residual plot of Ŷ versus e and a response plot of Ŷ versus Y, with the identity line that has unit slope and zero intercept added as a visual aid. For multiple linear regression (MLR), assume the zero mean errors are iid from a unimodal distribution that is not highly skewed. If the iid constant variance MLR model is useful, then i) the plotted points in the response plot should scatter about the identity line with no other pattern, and ii) the plotted points in the residual plot should scatter about the e = 0 line with no other pattern. If either i) or ii) is violated, then the iid constant variance MLR model is not sustained. In other words, if the plotted points in the residual plot show some type of dependency, e.g. increasing variance or a curved pattern, then the MLR model may be inadequate.

106) Omitting important predictors, known as underfitting, can be a serious problem. Then for the multiple linear regression model, β̂ tends to be a biased estimator of β, and leaving out important predictors could destroy the linearity of the model and could result in a model that has a nonconstant variance function. Let

x = (x_1^T, x_{p−1})^T and β = (β_1^T, β_{p−1})^T,

and assume that Y = x^T β + ɛ is a good OLS model. Hence E(Y|x) = β^T x = β_1^T x_1 + β_{p−1} x_{p−1} and V(Y|x) = σ². If x_{p−1} is omitted from the model, then E(Y|x_1) = β_1^T x_1 + β_{p−1} E(x_{p−1}|x_1) and V(Y|x_1) = σ² + β²_{p−1} V(x_{p−1}|x_1). Note that linearity is destroyed if E(x_{p−1}|x_1) is nonlinear, and the model has a nonconstant variance function if V(x_{p−1}|x_1) is not constant and so depends on x_1. On the other hand, if E(x_{p−1}|x_1) = θ^T x_1 and V(x_{p−1}|x_1) = τ², then E(Y|x_1) = η^T x_1 is linear and V(Y|x_1) = σ² + β²_{p−1} τ² = γ² is constant, where η = β_1 + β_{p−1} θ.

107) Suppose x_1 ≡ 1 and (Y, x_2, ..., x_{p−1})^T = (Y, w^T)^T ~ N_p((µ_Y, µ_w^T)^T, Σ) where Σ has blocks σ²_Y, Σ_{Y,w}, Σ_{w,Y} and Σ_w. Then Y | x_{i1}, ..., x_{ik} follows a linear model with constant variance: Y_i = β_{0k} + β_{1k} x_{i1} + ... + β_{kk} x_{ik} + ɛ_{ik} where V(ɛ_{ik}) = σ²_k. Models with lower σ²_k are better.

108) Can also get linear models with underfitting if the columns of X are orthogonal: predictors can be omitted without changing the β̂_i of the predictors that are in the model.

109) Having too many predictors, known as overfitting, is much less serious than omitting important predictors. The β̂_i for unneeded x_i tend to have β̂_i → 0 in probability. Suppose Y = X_1 β_1 + ɛ is the appropriate OLS model where X_1 is an n × k matrix, X = (X_1 X_2) is n × p, and β = (β_1^T, β_2^T)^T = (β_1^T, 0^T)^T since β_2 = 0. Consider overfitting by fitting the OLS model using X instead of X_1. Then large sample inference is correct using X, but not as precise as the model that omits predictors with β_i = 0 (the model that uses X_1). For the overfitted model, R² is too high, and CIs for β_i are longer using X than using X_1 for i = 1, ..., k. Also want n ≥ 10p for the overfitted model and n ≥ 10k for the model using X_1.
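A short R sketch of the two plots in 105) follows; it is illustrative only, and the helper name is made up.

# Sketch: response and residual plots of item 105 for an lm() fit.
plot_mlr_checks <- function(fit) {
  op <- par(mfrow = c(1, 2)); on.exit(par(op))
  yhat <- fitted(fit); y <- yhat + resid(fit)
  plot(yhat, y, xlab = "YHAT", ylab = "Y", main = "Response Plot")
  abline(0, 1)                        # identity line: unit slope, zero intercept
  plot(yhat, resid(fit), xlab = "YHAT", ylab = "RES", main = "Residual Plot")
  abline(h = 0)                       # e = 0 line
}
# plot_mlr_checks(lm(y ~ x))          # points should scatter about the added lines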

110) If Y = Xβ + ɛ with E(ɛ) = 0 but Cov(ɛ) = σ² V instead of σ² I, then under regularity conditions β̂ → β in probability, but typically i) Cov(β̂) ≠ σ² (X^T X)^{−1} and ii) E(MSE) ≠ σ². GLS can be used if V is known. A sandwich estimator can also be used to get a consistent estimator of Cov(β̂).

111) Outliers can often be found using the response and residual plots. The OLS fitted values (so the identity line) will often go right through a cluster of gross outliers. Look for a gap separating the outliers from the bulk of the data. Fit OLS to the bulk of the data, producing OLS estimator b. Then make the response and residual plots for all of the data using Ŷ = x^T b. If the identity line still goes through the far away cluster, then it may be a cluster of good leverage cases; otherwise the cases are likely outliers. Robust estimators attempt to automatically fit the bulk of the data well.

112) For variable selection, the model Y = x^T β + ɛ that uses all of the predictors is called the full model. A model Y = x_I^T β_I + ɛ that only uses a subset x_I of the predictors is called a submodel. The full model is always a submodel.

113) Let I_min correspond to the submodel with the smallest C_p. Find the submodel I_I with the fewest number of predictors such that C_p(I_I) ≤ C_p(I_min) + 1. Then I_I is the initial submodel that should be examined. It is possible that I_I = I_min or that I_I is the full model. Models I with fewer predictors than I_I such that C_p(I) ≤ C_p(I_min) + 4 are interesting and should also be examined. Models I with k predictors, including a constant, and with fewer predictors than I_I such that C_p(I_min) + 4 < C_p(I) ≤ min(2k, p) should be checked. Be able to find model I_I from computer output.

114) Forward selection. Step 1) k = 1: Start with a constant w_1 = x_1. Step 2) k = 2: Compute C_p for all models with k = 2 containing a constant and a single predictor x_i. Keep the predictor w_2 = x_j, say, that minimizes C_p. Step 3) k = 3: Fit all models with k = 3 that contain w_1 and w_2. Keep the predictor w_3 that minimizes C_p. ... Step j) k = j: Fit all models with k = j that contain w_1, w_2, ..., w_{j−1}. Keep the predictor w_j that minimizes C_p. ... Step p) Fit the full model.

Backward elimination: All models contain a constant = u_1. Step 0) k = p: Start with the full model that contains x_1, ..., x_p. We will also say that the full model contains u_1, ..., u_p where u_1 = x_1 but u_i need not equal x_i for i > 1. Step 1) k = p − 1: Fit each model with k = p − 1 predictors including a constant. Delete the predictor u_p, say, that corresponds to the model with the smallest C_p. Keep u_1, ..., u_{p−1}. Step 2) k = p − 2: Fit each model with p − 2 predictors including a constant. Delete the predictor u_{p−1} corresponding to the smallest C_p. Keep u_1, ..., u_{p−2}. ... Step j) k = p − j: Fit each model with p − j predictors including a constant. Delete the predictor u_{p−j+1} corresponding to the smallest C_p. Keep u_1, ..., u_{p−j}. ... Step p − 2) k = 2: The current model contains u_1, u_2 and u_3. Fit the model u_1, u_2 and the model u_1, u_3. Assume that model u_1, u_2 minimizes C_p. Then delete u_3 and keep u_1 and u_2.

115) Can do all subsets variable selection for up to about 30 predictors. Criteria other than C_p, such as MSE(I) or the adjusted R², R²_A(I) = 1 − [1 − R²(I)](n − 1)/(n − k) where model I contains k predictors, including a constant, can be used. C_p needs a good full model with n ≥ 10p or 5p.
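The search in 114) is easy to prototype with ordinary lm() fits. The sketch below is not from the notes: the helper name forward_cp and the simulated data are made up, and C_p(I) is computed as SSE(I)/MSE(full) + 2k − n.

# Sketch: forward selection by Cp (item 114) using lm(); assumes the columns of
# the predictor matrix have syntactically valid names.
forward_cp <- function(y, x) {
  dat <- data.frame(y = y, x)
  n <- length(y)
  full <- lm(y ~ ., data = dat)
  mse_full <- sum(resid(full)^2) / (n - length(coef(full)))
  cp <- function(f) sum(resid(f)^2) / mse_full + 2 * length(coef(f)) - n
  chosen <- character(0); remaining <- colnames(x); out <- NULL
  while (length(remaining) > 0) {
    cps <- sapply(remaining, function(v)
      cp(lm(reformulate(c(chosen, v), response = "y"), data = dat)))
    best <- names(which.min(cps))
    chosen <- c(chosen, best); remaining <- setdiff(remaining, best)
    out <- rbind(out, data.frame(model = paste(c("1", chosen), collapse = "+"),
                                 Cp = min(cps)))
  }
  out
}
# set.seed(1); x <- matrix(rnorm(300), 100, 3); colnames(x) <- c("x2","x3","x4")
# forward_cp(1 + 2*x[,1] + rnorm(100), x)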

116) Collinearity occurs when at least one column of X = [v_0, v_1, ..., v_{p−1}] is highly correlated with a linear combination of the other columns. Regress v_j on v_0, v_1, ..., v_{j−1}, v_{j+1}, ..., v_{p−1}. Let R²_j be the coefficient of determination (squared multiple correlation coefficient) from the regression. The variance inflation factor is VIF_j = 1/(1 − R²_j). Let d_jj be given in 92). Then SE(β̂_j) = sqrt(MSE d_jj) = sqrt(MSE) sqrt(VIF_j)/[SD(v_j) sqrt(n − 1)] where SD(v_j) is the sample standard deviation of the n elements of v_j. Collinearity does not affect prediction much, provided that the software does not fail because X^T X is nearly singular.

117) The ith residual e_i = ɛ_i + N_i where N_i = x_i^T(β − β̂) ≈ AN_1(0, MSE h_i) → 0 in probability if max(h_1, ..., h_n) → 0 as n → ∞.

118) In experimental design models or design of experiments (DOE), the entries of X are coded, often as −1, 0 or 1. Often X is not a full rank matrix.

119) Some DOE models have one Y_i per x_i and lots of x_i's. Then the response and residual plots are used like those for MLR.

120) Some DOE models have n_i Y_i's per x_i, and only a few distinct values of x_i. Then the response and residual plots no longer look like those for MLR.

121) A dot plot of Z_1, ..., Z_m consists of an axis and m points each corresponding to the value of Z_i.

122) Let f_Z(z) be the pdf of Z. Then the family of pdfs f_Y(y) = f_Z(y − µ) indexed by the location parameter µ, −∞ < µ < ∞, is the location family for the random variable Y = µ + Z with standard pdf f_Z(y).

123) A one way fixed effects ANOVA model has a single qualitative predictor variable W with p categories a_1, ..., a_p. There are p different distributions for Y, one for each category a_i. The distribution of Y | (W = a_i) has pdf f_Z(y − µ_i) where the location family has second moments. Hence all p distributions come from the same location family with different location parameter µ_i and the same variance σ². The one way fixed effects normal ANOVA model is the special case where Y | (W = a_i) ~ N(µ_i, σ²).

The response plot is a plot of Ŷ versus Y. For the one way Anova model, the response plot is a plot of Ŷ_ij = µ̂_i versus Y_ij. Often the identity line with unit slope and zero intercept is added as a visual aid. Vertical deviations from the identity line are the residuals e_ij = Y_ij − Ŷ_ij = Y_ij − µ̂_i. The plot will consist of p dot plots that scatter about the identity line with similar shape and spread if the fixed effects one way ANOVA model is appropriate. The ith dot plot is a dot plot of Y_{i,1}, ..., Y_{i,n_i}. Assume that each n_i ≥ 10. If the response plot looks like the residual plot, then a horizontal line fits the p dot plots about as well as the identity line, and there is not much difference in the µ_i. If the identity line is clearly superior to any horizontal line, then at least some of the means differ.

The residual plot is a plot of Ŷ versus e where the residual e = Y − Ŷ. The plot will consist of p dot plots that scatter about the e = 0 line with similar shape and spread if the fixed effects one way ANOVA model is appropriate. The ith dot plot is a dot plot of e_{i,1}, ..., e_{i,n_i}. Assume that each n_i ≥ 10. Under the assumption that the Y_ij are from the same location scale family with different parameters µ_i, each of the p dot plots should have roughly the same shape and spread. This assumption is easier to judge with the residual plot than with the response plot.
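Item 116) above reduces VIF_j to 1/(1 − R²_j) from p − 1 auxiliary regressions, so it is a few lines of R. A minimal sketch (made-up helper name; nontrivial predictor columns only):

# Sketch: variance inflation factors of item 116 from auxiliary OLS fits.
vif_ols <- function(x) {             # x: n x (p-1) matrix of nontrivial predictors
  sapply(seq_len(ncol(x)), function(j) {
    r2j <- summary(lm(x[, j] ~ x[, -j]))$r.squared   # regress v_j on the others
    1 / (1 - r2j)
  })
}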

124) Rule of thumb: Let R_i be the range of the ith dot plot = max(Y_{i1}, ..., Y_{i,n_i}) − min(Y_{i1}, ..., Y_{i,n_i}). If the n_i ≈ n/p and if max(R_1, ..., R_p) ≤ 2 min(R_1, ..., R_p), then the one way ANOVA F test results will be approximately correct if the response and residual plots suggest that the remaining one way ANOVA model assumptions are reasonable.

125) Let Y_{i0} = Σ_{j=1}^{n_i} Y_ij and let µ̂_i = Ȳ_{i0} = Y_{i0}/n_i = (1/n_i) Σ_{j=1}^{n_i} Y_ij. Hence the dot notation means sum over the subscript corresponding to the 0, e.g. j. Similarly, Y_{00} = Σ_{i=1}^p Σ_{j=1}^{n_i} Y_ij is the sum of all of the Y_ij. Be able to find µ̂_i from data.

126) The cell means model for the fixed effects one way Anova is Y_ij = µ_i + ɛ_ij where Y_ij is the value of the response variable for the jth trial of the ith factor level, for i = 1, ..., p and j = 1, ..., n_i. The µ_i are the unknown means and E(Y_ij) = µ_i. The ɛ_ij are iid from the location family with pdf f_Z(z), zero mean and unknown variance σ² = V(Y_ij) = V(ɛ_ij). For the normal cell means model, the ɛ_ij are iid N(0, σ²). The estimator µ̂_i = Ȳ_{i0} = Σ_{j=1}^{n_i} Y_ij/n_i = Ŷ_ij. The ith residual is e_ij = Y_ij − Ȳ_{i0}, and Ȳ_{00} is the sample mean of all of the Y_ij and n = Σ_{i=1}^p n_i. The total sum of squares SSTO = Σ_{i=1}^p Σ_{j=1}^{n_i} (Y_ij − Ȳ_{00})², the treatment sum of squares SSTR = Σ_{i=1}^p n_i (Ȳ_{i0} − Ȳ_{00})², and the error sum of squares SSE = RSS = Σ_{i=1}^p Σ_{j=1}^{n_i} (Y_ij − Ȳ_{i0})². The MSE is an estimator of σ². The Anova table is the same as that for multiple linear regression, except that SSTR replaces the regression sum of squares and that SSTO, SSTR and SSE have n − 1, p − 1 and n − p degrees of freedom.

Summary Analysis of Variance Table
Source     df    SS     MS     F             p-value
Treatment  p-1   SSTR   MSTR   Fo=MSTR/MSE   for Ho:
Error      n-p   SSE    MSE                  µ_1 = ... = µ_p

127) Shown above is a one way ANOVA table given in symbols. Sometimes Treatment is replaced by Between treatments, Between Groups, Model, Factor or Groups. Sometimes Error is replaced by Residual, or Within Groups. Sometimes p-value is replaced by P, Pr(> F) or PR > F. SSE is often replaced by RSS = residual sum of squares.
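A small R check of 125)-127) on simulated data (the group sizes and means are made up); the hand computation of SSTR, SSE and F_o agrees with aov().

# Sketch: cell means estimates and the one way Anova F test (items 125-127).
set.seed(2)
y <- c(rnorm(15, 10), rnorm(12, 12), rnorm(18, 10))
grp <- factor(rep(1:3, times = c(15, 12, 18)))
mu_hat <- tapply(y, grp, mean)                     # muhat_i = Ybar_i0
n <- length(y); p <- nlevels(grp)
sstr <- sum(tabulate(grp) * (mu_hat - mean(y))^2)  # treatment sum of squares
sse  <- sum((y - mu_hat[grp])^2)                   # error sum of squares
Fo   <- (sstr / (p - 1)) / (sse / (n - p))
c(Fo = Fo, pvalue = pf(Fo, p - 1, n - p, lower.tail = FALSE))
summary(aov(y ~ grp))                              # same Anova table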

128) In matrix form, the cell means model is the linear model without an intercept (although 1 ∈ C(X)), where µ = β = (µ_1, ..., µ_p)^T and Y = Xµ + ɛ:

(Y_11, ..., Y_{1,n_1}, Y_21, ..., Y_{2,n_2}, ..., Y_{p,1}, ..., Y_{p,n_p})^T = X (µ_1, µ_2, ..., µ_p)^T + (ɛ_11, ..., ɛ_{1,n_1}, ɛ_21, ..., ɛ_{2,n_2}, ..., ɛ_{p,1}, ..., ɛ_{p,n_p})^T,

where the ith column of X is an indicator vector with 1's in the n_i rows corresponding to the ith treatment and 0's elsewhere.

129) For the cell means model, X^T X = diag(n_1, ..., n_p), (X^T X)^{−1} = diag(1/n_1, ..., 1/n_p), and X^T Y = (Y_{10}, ..., Y_{p0})^T. So β̂ = µ̂ = (X^T X)^{−1} X^T Y = (Ȳ_{10}, ..., Ȳ_{p0})^T. Then Ŷ = X(X^T X)^{−1} X^T Y = X µ̂, and Ŷ_ij = Ȳ_{i0}. Hence the ijth residual e_ij = Y_ij − Ŷ_ij = Y_ij − Ȳ_{i0} for i = 1, ..., p and j = 1, ..., n_i.

130) In the response plot, the dot plot for the jth treatment crosses the identity line at Ȳ_{j0}.

131) The one way Anova F test has hypotheses H_0: µ_1 = ... = µ_p and H_A: not H_0 (not all of the p population means are equal). The one way Anova table for this test is given in 127) above. Let RSS = SSE. The test statistic F = MSTR/MSE = {[RSS(H) − RSS]/(p − 1)}/MSE ~ F_{p−1, n−p} if the ɛ_ij are iid N(0, σ²). If H_0 is true, then Y_ij = µ + ɛ_ij and µ̂ = Ȳ_{00}. Hence RSS(H) = SSTO = Σ_{i=1}^p Σ_{j=1}^{n_i} (Y_ij − Ȳ_{00})². Since SSTO = SSE + SSTR, the quantity SSTR = RSS(H) − RSS, and MSTR = SSTR/(p − 1).

132) The one way Anova F test is a large sample test if the ɛ_ij are iid with mean 0 and variance σ². Then the Y_ij come from the same location family with the same variance σ_i² = σ² and different mean µ_i for i = 1, ..., p. Thus the p treatments (groups, populations) have the same variance σ_i² = σ². The V(ɛ_ij) ≡ σ² assumption (which implies that σ_i² = σ² for i = 1, ..., p) is a much stronger assumption for the one way Anova model than for MLR, but the test has some resistance to the assumption that σ_i² = σ² by 124).

133) Other design matrices X can be used for the full model. One design matrix adds a column of ones to the cell means design matrix. This model is no longer a full rank model.

134) A full rank one way Anova model with an intercept adds a constant but deletes the last column of the X for the cell means model. Then Y = Xβ + ɛ where Y and ɛ are as in the cell means model, and β = (β_0, β_1, ..., β_{p−1})^T = (µ_p, µ_1 − µ_p, µ_2 − µ_p, ..., µ_{p−1} − µ_p)^T. So β_0 = µ_p and β_i = µ_i − µ_p for i = 1, ..., p − 1. It can be shown that the OLS estimators are β̂_0 = Ȳ_{p0} = µ̂_p, and β̂_i = Ȳ_{i0} − Ȳ_{p0} = µ̂_i − µ̂_p for i = 1, ..., p − 1. (The cell means model has β̂_i = µ̂_i = Ȳ_{i0}.) In matrix form the model is

Y = Xβ + ɛ, where Y = (Y_11, ..., Y_{1,n_1}, Y_21, ..., Y_{2,n_2}, ..., Y_{p,1}, ..., Y_{p,n_p})^T and ɛ = (ɛ_11, ..., ɛ_{1,n_1}, ɛ_21, ..., ɛ_{2,n_2}, ..., ɛ_{p,1}, ..., ɛ_{p,n_p})^T are as in the cell means model, β = (β_0, β_1, ..., β_{p−1})^T, the first column of X is 1, and for i = 1, ..., p − 1 the (i + 1)th column of X is the indicator for the ith treatment group. Then X^T Y = (Y_{00}, Y_{10}, Y_{20}, ..., Y_{p−1,0})^T and

X^T X =
[ n        n_1   n_2   ...  n_{p-1} ]
[ n_1      n_1   0     ...  0       ]
[ n_2      0     n_2   ...  0       ]
[ ...                               ]
[ n_{p-1}  0     0     ...  n_{p-1} ].

Hence

(X^T X)^{−1} =
[ 1/n_p          −(1/n_p) 1^T                                 ]
[ −(1/n_p) 1     (1/n_p) 1 1^T + diag(1/n_1, ..., 1/n_{p-1})  ],

where 1 is the (p − 1) × 1 vector of ones. This model is interesting since the one way Anova F test of H_0: µ_1 = ... = µ_p versus H_A: not H_0 corresponds to the MLR Anova F test of H_0: β_1 = ... = β_{p−1} = 0 versus H_A: not H_0.

135) A contrast is θ = Σ_{i=1}^p c_i µ_i where Σ_{i=1}^p c_i = 0. The estimated contrast is θ̂ = Σ_{i=1}^p c_i Ȳ_{i0}. Then SE(θ̂) = sqrt(MSE Σ_{i=1}^p c_i²/n_i), and a 100(1 − δ)% CI for θ is θ̂ ± t_{n−p, 1−δ/2} SE(θ̂). CIs for one way Anova are less robust to the assumption that σ_i² ≡ σ² than the one way Anova F test.
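Item 135)'s contrast CI is a one-line computation once MSE and the group means are in hand. A sketch on made-up data (the contrast coefficients and simulated groups are illustrative only):

# Sketch: 100(1 - delta)% CI for a contrast (item 135), here theta = mu_1 - (mu_2 + mu_3)/2.
set.seed(3)
y <- c(rnorm(15, 10), rnorm(12, 12), rnorm(18, 11))
grp <- factor(rep(1:3, times = c(15, 12, 18)))
ci_contrast <- function(y, grp, cvec, delta = 0.05) {
  ni <- tabulate(grp); p <- nlevels(grp); n <- length(y)
  mu_hat <- tapply(y, grp, mean)
  mse <- sum((y - mu_hat[grp])^2) / (n - p)
  theta_hat <- sum(cvec * mu_hat)
  se <- sqrt(mse * sum(cvec^2 / ni))
  theta_hat + c(-1, 1) * qt(1 - delta / 2, n - p) * se
}
ci_contrast(y, grp, c(1, -0.5, -0.5))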

136) Two important families of contrasts are the family of all possible contrasts and the family of pairwise differences θ_ij = µ_i − µ_j where i ≠ j. The Scheffé multiple comparisons procedure has a δ_F for the family of all possible contrasts while the Tukey multiple comparisons procedure has a δ_F for the family of all (p choose 2) pairwise contrasts.

137) The multivariate linear regression model y_i = B^T x_i + ɛ_i for i = 1, ..., n has m ≥ 2 response variables Y_1, ..., Y_m and p predictor variables x_1, x_2, ..., x_p where x_1 ≡ 1 is the trivial predictor. The ith case is (x_i^T, y_i^T) = (1, x_i2, ..., x_ip, Y_i1, ..., Y_im) where the 1 could be omitted. The model is written in matrix form as Z = XB + E where the matrices are defined below. The model has E(ɛ_k) = 0 and Cov(ɛ_k) = Σ_ɛ = (σ_ij) for k = 1, ..., n. Also E(e_i) = 0 while Cov(e_i, e_j) = σ_ij I_n for i, j = 1, ..., m, where I_n is the n × n identity matrix and e_i is defined below. Then the p × m coefficient matrix B = [β_1 β_2 ... β_m] and the m × m covariance matrix Σ_ɛ are to be estimated, and E(Z) = XB while E(Y_ij) = x_i^T β_j. The ɛ_i are assumed to be iid. The data matrix W = [X Y] except usually the first column 1 of X is omitted.

The n × m matrix Z = (Y_{ij}) = [Y_1 Y_2 ... Y_m] has jth column Y_j = (Y_{1,j}, ..., Y_{n,j})^T and ith row y_i^T. The n × p design matrix of predictor variables X = (x_{ij}) = [v_1 v_2 ... v_p] has jth column v_j and ith row x_i^T, where v_1 = 1. The p × m matrix B = (β_{ij}) = [β_1 β_2 ... β_m]. The n × m matrix E = (ɛ_{ij}) = [e_1 e_2 ... e_m] has jth column e_j and ith row ɛ_i^T. Considering the ith row of Z, X and E shows that y_i^T = x_i^T B + ɛ_i^T.

138) We have changed notation for multiple linear regression, using e and ɛ for errors and ɛ̂, r, and ɛ̂_ij for residuals. For the multiple linear regression model, m = 1 and

Y_i = x_{i,1} β_1 + x_{i,2} β_2 + ... + x_{i,p} β_p + e_i = x_i^T β + e_i    (1)

for i = 1, ..., n. In matrix notation, these n equations become Y = Xβ + e, where Y is an n × 1 vector of response variables, X is an n × p matrix of predictors, β is a p × 1 vector of unknown coefficients, and e is an n × 1 vector of unknown errors.

139) Each response variable in a multivariate linear regression model follows a multiple linear regression model Y_j = Xβ_j + e_j for j = 1, ..., m where it is assumed that E(e_j) = 0 and Cov(e_j) = σ_jj I_n. Hence the errors corresponding to the jth response are uncorrelated with variance σ_j² = σ_jj. Notice that the same design matrix X of predictors is used for each of the m models, but the jth response variable vector Y_j, coefficient vector β_j and error vector e_j change and thus depend on j. Now consider the ith case (x_i^T, y_i^T), which corresponds to the ith row of Z and the ith row of X. Then

Y_i1 = β_11 x_i1 + ... + β_p1 x_ip + ɛ_i1 = x_i^T β_1 + ɛ_i1
Y_i2 = β_12 x_i1 + ... + β_p2 x_ip + ɛ_i2 = x_i^T β_2 + ɛ_i2
...
Y_im = β_1m x_i1 + ... + β_pm x_ip + ɛ_im = x_i^T β_m + ɛ_im

or, suppressing the conditioning on x_i, we have y_i = µ_{x_i} + ɛ_i = E(y_i) + ɛ_i where E(y_i) = µ_{x_i} = B^T x_i = (x_i^T β_1, x_i^T β_2, ..., x_i^T β_m)^T.

140) The least squares estimators are B̂ = (X^T X)^{−1} X^T Z = [β̂_1 β̂_2 ... β̂_m]. The predicted values or fitted values are Ẑ = X B̂ = [Ŷ_1 Ŷ_2 ... Ŷ_m] = (Ŷ_{ij}). The residuals are Ê = Z − Ẑ = Z − X B̂ = [r_1 r_2 ... r_m] = (ɛ̂_{ij}) with rows ɛ̂_1^T, ..., ɛ̂_n^T. These quantities can be found from the m multiple linear regressions of Y_j on the predictors: β̂_j = (X^T X)^{−1} X^T Y_j, Ŷ_j = X β̂_j and r_j = Y_j − Ŷ_j for j = 1, ..., m. Hence ɛ̂_{i,j} = Y_{i,j} − Ŷ_{i,j} where Ŷ_j = (Ŷ_{1,j}, ..., Ŷ_{n,j})^T. Finally,

Σ̂_{ɛ,d} = (Z − Ẑ)^T (Z − Ẑ)/(n − d) = (Z − X B̂)^T (Z − X B̂)/(n − d) = Ê^T Ê/(n − d) = (1/(n − d)) Σ_{i=1}^n ɛ̂_i ɛ̂_i^T.

The choices d = 0 and d = p are common. If d = 1, then Σ̂_{ɛ,d=1} = S_r, the sample covariance matrix of the residual vectors ɛ̂_i, since the sample mean of the ɛ̂_i is 0. Let Σ̂_ɛ = Σ̂_{ɛ,p} be the unbiased estimator of Σ_ɛ. Also,

Σ̂_{ɛ,d} = (n − d)^{−1} Z^T [I − X(X^T X)^{−1} X^T]Z, and Ê = [I − X(X^T X)^{−1} X^T]Z.
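A quick R sketch of 140) on simulated data (the dimensions and coefficient values are made up): the matrix formula for B̂ matches a per-response lm() fit, and the d = 1 covariance estimator matches the sample covariance of the residual vectors.

# Sketch: least squares estimators for multivariate linear regression (item 140).
set.seed(4)
n <- 50; p <- 3; m <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
B <- matrix(c(1, 2, -1, 0, 1, 3), nrow = p, ncol = m)   # true p x m coefficient matrix
Z <- X %*% B + matrix(rnorm(n * m), n, m)
Bhat <- solve(crossprod(X), crossprod(X, Z))            # (X^T X)^{-1} X^T Z
Ehat <- Z - X %*% Bhat                                  # residual matrix
cbind(Bhat[, 1], coef(lm(Z[, 1] ~ X[, -1])))            # same as the m = 1 fit for Y_1
SigmaHat <- crossprod(Ehat) / (n - p)                   # unbiased estimator (d = p)
Sr <- crossprod(Ehat) / (n - 1)                         # d = 1: sample cov of residuals
all.equal(Sr, cov(Ehat), check.attributes = FALSE)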

141) Theorem: Suppose X has full rank p < n and the covariance structure of 137) holds. Then E(B̂) = B so E(β̂_j) = β_j, and Cov(β̂_j, β̂_k) = σ_jk (X^T X)^{−1} for j, k = 1, ..., m. Also Ê and B̂ are uncorrelated, E(Ê) = 0, and

E(Σ̂_ɛ) = E(Ê^T Ê/(n − p)) = Σ_ɛ.

Also, Σ̂_ɛ is a √n consistent estimator of Σ_ɛ under mild regularity conditions.

142) A response plot for the jth response variable is a plot of the fitted values Ŷ_ij versus the response Y_ij. The identity line with slope one and zero intercept is added to the plot as a visual aid. A residual plot corresponding to the jth response variable is a plot of Ŷ_ij versus r_ij where i = 1, ..., n. Make the m response and residual plots for any multivariate linear regression. For each response variable Y_j, the response and residual plots behave just as they do for MLR.

143) Let the observed multivariate data w_i for i = 1, ..., n be collected in an n × p matrix W with n rows w_1^T, ..., w_n^T. Let the p × 1 column vector T = T(W) be a multivariate location estimator, and let the p × p symmetric positive definite matrix C = C(W) be a dispersion estimator such as the sample covariance matrix. The ith squared Mahalanobis distance is

D_i² = D_i²(T, C) = D²_{w_i}(T, C) = (w_i − T)^T C^{−1} (w_i − T)

for each point w_i. Notice that D_i² is a random variable (scalar valued). Notice that the population squared Mahalanobis distance is D²_w(µ, Σ) = (w − µ)^T Σ^{−1} (w − µ) and that the term Σ^{−1/2}(w − µ) is the p-dimensional analog of the Z-score used to transform a univariate N(µ, σ²) random variable into a N(0, 1) random variable. Hence the sample Mahalanobis distance D_i = sqrt(D_i²) is an analog of the absolute value |Z_i| of

the sample Z-score Z_i = (W_i − W̄)/σ̂. Also notice that the Euclidean distance of w_i from the estimate of center T is D_i(T, I_p) where I_p is the p × p identity matrix.

144) The classical Mahalanobis distance corresponds to the sample mean and sample covariance matrix

T = w̄ = (1/n) Σ_{i=1}^n w_i and C = S = (1/(n − 1)) Σ_{i=1}^n (w_i − w̄)(w_i − w̄)^T,

and will be denoted by MD_i. When T and C are estimators other than the sample mean and covariance matrix, D_i = sqrt(D_i²) will sometimes be denoted by RD_i. Then the DD plot is a plot of the classical Mahalanobis distances MD_i versus the robust Mahalanobis distances RD_i. The identity line is added as a visual aid. If n is large and the w_i are iid N_p(µ, Σ), then the data will cluster tightly about the identity line. Tight clustering about a line through the origin that is not the identity line suggests that the w_i are iid from an elliptically contoured distribution that is not multivariate normal.

145) A large sample (1 − δ)100% prediction region is a set A_n such that P(y_f ∈ A_n) → 1 − δ as n → ∞, and is asymptotically optimal if the volume of the region converges in probability to the volume of the population minimum volume covering region.

146) For multivariate linear regression, the classical large sample 100(1 − δ)% prediction region for a future value y_f given x_f and past data (x_1, y_1), ..., (x_n, y_n) is {y : D²_y(ŷ_f, Σ̂_ɛ) ≤ χ²_{m, 1−δ}}, and it does not work well unless the ɛ_i are iid N_m(0, Σ_ɛ).

147) Let q_n = min(1 − δ + 0.05, 1 − δ + m/n) for δ > 0.1, and q_n = min(1 − δ/2, 1 − δ + 10δm/n) otherwise. If q_n < 1 − δ + 0.001, set q_n = 1 − δ. Let ẑ_i = ŷ_f + ɛ̂_i for i = 1, ..., n. The large sample nonparametric prediction region that works for a large class of error vector distributions is {y : D²_y(ŷ_f, S_r) ≤ D²_(U_n)} where D_(U_n) is the q_n-th sample quantile of the D_i = D_{ẑ_i}(ŷ_f, S_r) = D_{ɛ̂_i}(0, S_r). In the DD plot, the cases to the left of the vertical line MD = D_(U_n) correspond to y_i that are in their nonparametric prediction region when x_f = x_i. Hence 100 q_n% of the training data are in their prediction region, and 100 q_n% → 100(1 − δ)% as n → ∞.

148) Consider testing a linear hypothesis H_0: LB = 0 versus H_1: LB ≠ 0 where L is a full rank r × p matrix. Let H = B̂^T L^T [L(X^T X)^{−1} L^T]^{−1} L B̂. The error or residual sum of squares and cross products matrix is

W_e = (Z − Ẑ)^T (Z − Ẑ) = Z^T Z − Z^T X B̂ = Z^T [I_n − X(X^T X)^{−1} X^T]Z.

Note that W_e = Ê^T Ê and W_e/(n − p) = Σ̂_ɛ.

149) Let λ_1 ≥ λ_2 ≥ ... ≥ λ_m be the ordered eigenvalues of W_e^{−1} H. Then there are four commonly used test statistics. Roy's maximum root statistic is λ_max(L) = λ_1. The Wilks' Λ statistic is Λ(L) = |(H + W_e)^{−1} W_e| = |W_e^{−1} H + I|^{−1} = Π_{i=1}^m (1 + λ_i)^{−1}. Pillai's trace statistic is V(L) = tr[(H + W_e)^{−1} H] = Σ_{i=1}^m λ_i/(1 + λ_i). The Hotelling-Lawley trace statistic is U(L) = tr[W_e^{−1} H] = Σ_{i=1}^m λ_i.
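Items 143)-144) are easy to picture in R. The sketch below (illustrative only) uses the classical estimator for MD_i and MASS::cov.rob with the MCD option as one possible robust choice of (T, C) for RD_i; any robust location and dispersion estimator could be substituted.

# Sketch: classical vs. robust Mahalanobis distances and the DD plot (item 144).
library(MASS)
set.seed(5)
w <- matrix(rnorm(200 * 3), 200, 3)                  # iid N_3(0, I) data
md <- sqrt(mahalanobis(w, colMeans(w), cov(w)))      # classical MD_i
rob <- cov.rob(w, method = "mcd")                    # one robust (T, C) choice
rd <- sqrt(mahalanobis(w, rob$center, rob$cov))      # robust RD_i
plot(md, rd, xlab = "MD", ylab = "RD", main = "DD Plot")
abline(0, 1)                                         # identity line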

150) Let matrix A = [a_1 a_2 ... a_p]. Then the vec operator stacks the columns of A on top of one another, so

vec(A) = (a_1^T, a_2^T, ..., a_p^T)^T.

Let A = (a_ij) be an m × n matrix and B a p × q matrix. Then the Kronecker product of A and B is the mp × nq matrix

A ⊗ B =
[ a_11 B  a_12 B  ...  a_1n B ]
[ a_21 B  a_22 B  ...  a_2n B ]
[ ...                         ]
[ a_m1 B  a_m2 B  ...  a_mn B ].

An important fact is that if A and B are nonsingular square matrices, then [A ⊗ B]^{−1} = A^{−1} ⊗ B^{−1}. The following assumption is important.

151) Theorem: The Hotelling-Lawley trace statistic

U(L) = [1/(n − p)] [vec(L B̂)]^T [Σ̂_ɛ^{−1} ⊗ (L(X^T X)^{−1} L^T)^{−1}] [vec(L B̂)].

152) Assumption D1: Let h_i be the ith diagonal element of X(X^T X)^{−1} X^T. Assume max(h_1, ..., h_n) → 0 as n → ∞, assume that the zero mean iid error vectors have finite fourth moments, and assume that (1/n) X^T X → W^{−1} in probability.

153) Multivariate Least Squares Central Limit Theorem (MLS CLT): For the least squares estimator, if assumption D1 holds, then Σ̂_ɛ is a √n consistent estimator of Σ_ɛ, and √n vec(B̂ − B) →_D N_pm(0, Σ_ɛ ⊗ W).

154) Theorem: If assumption D1 holds and if H_0 is true, then (n − p)U(L) →_D χ²_{rm}.
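The two expressions for U(L) in 149) and 151) can be checked numerically. The R sketch below (simulated data and dimensions are made up) computes U(L) both from tr(W_e^{-1} H) and from the vec/Kronecker form, for L = [0 I_{p-1}].

# Sketch: Hotelling-Lawley trace statistic U(L) two ways (items 149 and 151).
set.seed(6)
n <- 60; p <- 3; m <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
Z <- X %*% matrix(c(1, 2, 0, 0, 1, 3), p, m) + matrix(rnorm(n * m), n, m)
Bhat <- solve(crossprod(X), crossprod(X, Z))
We <- crossprod(Z - X %*% Bhat)                          # error SSCP matrix
SigmaHat <- We / (n - p)                                 # unbiased Sigma_eps estimator
L <- cbind(0, diag(p - 1)); r <- nrow(L)
M <- L %*% solve(crossprod(X)) %*% t(L)                  # L (X^T X)^{-1} L^T
H <- t(L %*% Bhat) %*% solve(M, L %*% Bhat)              # hypothesis SSCP matrix
U1 <- sum(diag(solve(We, H)))                            # tr(We^{-1} H), item 149
v <- as.vector(L %*% Bhat)                               # vec(L Bhat)
K <- solve(SigmaHat) %x% solve(M)                        # Kronecker product, item 150
U2 <- drop(t(v) %*% K %*% v) / (n - p)                   # item 151
c(U1 = U1, U2 = U2)                                      # identical; (n-p)*U approx chi^2_rm under H0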

155) Consider testing a linear hypothesis H_0: LB = 0 versus H_1: LB ≠ 0 where L is a full rank r × p matrix. For now assume the error distribution is multivariate normal N_m(0, Σ_ɛ). Then

vec(B̂ − B) = ((β̂_1 − β_1)^T, (β̂_2 − β_2)^T, ..., (β̂_m − β_m)^T)^T ~ N_pm(0, Σ_ɛ ⊗ (X^T X)^{−1})

where C = Σ_ɛ ⊗ (X^T X)^{−1} is the pm × pm matrix with (j, k)th block σ_jk (X^T X)^{−1}:

C =
[ σ_11 (X^T X)^{-1}  σ_12 (X^T X)^{-1}  ...  σ_1m (X^T X)^{-1} ]
[ σ_21 (X^T X)^{-1}  σ_22 (X^T X)^{-1}  ...  σ_2m (X^T X)^{-1} ]
[ ...                                                          ]
[ σ_m1 (X^T X)^{-1}  σ_m2 (X^T X)^{-1}  ...  σ_mm (X^T X)^{-1} ].

Now let A be an rm × pm block diagonal matrix: A = diag(L, ..., L). Then

A vec(B̂ − B) = vec(L(B̂ − B)) = ((L(β̂_1 − β_1))^T, (L(β̂_2 − β_2))^T, ..., (L(β̂_m − β_m))^T)^T ~ N_rm(0, Σ_ɛ ⊗ L(X^T X)^{−1} L^T)

where D = Σ_ɛ ⊗ L(X^T X)^{−1} L^T = A C A^T is the rm × rm matrix with (j, k)th block σ_jk L(X^T X)^{−1} L^T.

Under H_0, vec(LB) = A vec(B) = 0, and

vec(L B̂) = ((L β̂_1)^T, (L β̂_2)^T, ..., (L β̂_m)^T)^T ~ N_rm(0, Σ_ɛ ⊗ L(X^T X)^{−1} L^T).

Hence under H_0,

[vec(L B̂)]^T [Σ_ɛ^{−1} ⊗ (L(X^T X)^{−1} L^T)^{−1}] [vec(L B̂)] ~ χ²_{rm}, and

T = [vec(L B̂)]^T [Σ̂_ɛ^{−1} ⊗ (L(X^T X)^{−1} L^T)^{−1}] [vec(L B̂)] →_D χ²_{rm}.    (2)

A large sample level α test will reject H_0 if pval < α where

pval = P(T/(rm) < F_{rm, n−mp}).    (3)

Since least squares estimators are asymptotically normal, if the ɛ_i are iid for a large class of distributions,

√n vec(B̂ − B) = √n ((β̂_1 − β_1)^T, (β̂_2 − β_2)^T, ..., (β̂_m − β_m)^T)^T →_D N_pm(0, Σ_ɛ ⊗ W)

where (X^T X)/n → W^{−1}.

Then under H_0,

√n vec(L B̂) = √n ((L β̂_1)^T, (L β̂_2)^T, ..., (L β̂_m)^T)^T →_D N_rm(0, Σ_ɛ ⊗ LWL^T),

and

n [vec(L B̂)]^T [Σ_ɛ^{−1} ⊗ (LWL^T)^{−1}] [vec(L B̂)] →_D χ²_{rm}.

Hence (2) holds, and (3) gives a large sample level δ test if the least squares estimators are asymptotically normal.

156) Theorems 151 and 154 are useful for relating multivariate tests with the partial F test for multiple linear regression that tests whether a reduced model that omits some of the predictors can be used instead of the full model that uses all p predictors. The partial F test statistic is

F_R = { [SSE(R) − SSE(F)] / (df_R − df_F) } / MSE(F)

where the residual sums of squares SSE(F) and SSE(R) and degrees of freedom df_F and df_R are for the full and reduced model, while the mean square error MSE(F) is for the full model. Let the null hypothesis for the partial F test be H_0: Lβ = 0 where L sets the coefficients of the predictors in the full model but not in the reduced model to 0. Seber and Lee (2003, p. 100) show that

F_R = [Lβ̂]^T (L(X^T X)^{−1} L^T)^{−1} [Lβ̂] / (r σ̂²)

is distributed as F_{r, n−p} if H_0 is true and the errors are iid N(0, σ²). Note that for multiple linear regression with m = 1, F_R = (n − p)U(L)/r since Σ̂_ɛ^{−1} = 1/σ̂². Hence the scaled Hotelling Lawley test statistic is the partial F test statistic extended to m > 1 response variables by Theorem 151.
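For m = 1, item 156)'s F_R is the usual partial F test, which R computes with anova() on nested fits. A small sketch on made-up data:

# Sketch: partial F test of item 156 for m = 1, by hand and via anova().
set.seed(7)
n <- 80
x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y <- 1 + 2 * x2 + rnorm(n)                      # x3, x4 are unneeded predictors
full <- lm(y ~ x2 + x3 + x4); red <- lm(y ~ x2)
sseF <- sum(resid(full)^2); sseR <- sum(resid(red)^2)
dfF <- full$df.residual; dfR <- red$df.residual
FR <- ((sseR - sseF) / (dfR - dfF)) / (sseF / dfF)
c(FR = FR, pval = pf(FR, dfR - dfF, dfF, lower.tail = FALSE))
anova(red, full)                                # same F statistic and p-value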

157) By Theorem 154, for example, r F_R →_D χ²_r for a large class of nonnormal error distributions. If Z_n ~ F_{k, d_n}, then Z_n →_D χ²_k/k as d_n → ∞. Hence using the F_{r, n−p} approximation gives a large sample test with correct asymptotic level, and the partial F test is robust to nonnormality. Similarly, using an F_{rm, n−pm} approximation for the following test statistics gives large sample tests with correct asymptotic level and similar power for large n. The large sample test will have correct asymptotic level as long as the denominator degrees of freedom d_n → ∞ as n → ∞, and d_n = n − pm reduces to the partial F test if m = 1 and U(L) is used. Then the three test statistics are

[n − p − 0.5(m − r + 3)]/(rm) [−log(Λ(L))],   [(n − p)/(rm)] V(L),   and   [(n − p)/(rm)] U(L).

It can be shown that V(L) ≤ −log(Λ(L)) ≤ U(L). Hence the Hotelling Lawley test will have the most power and Pillai's test will have the least power.

158) Under regularity conditions,

−[n − p − 0.5(m − r + 3)] log(Λ(L)) →_D χ²_{rm},   (n − p)V(L) →_D χ²_{rm},   and   (n − p)U(L) →_D χ²_{rm}.

These statistics are robust against nonnormality. For the Wilks' Lambda test, pval = P( {−[n − p − 0.5(m − r + 3)]/(rm)} log(Λ(L)) < F_{rm, n−rm} ). For the Pillai's trace test, pval = P( [(n − p)/(rm)] V(L) < F_{rm, n−rm} ). For the Hotelling Lawley trace test, pval = P( [(n − p)/(rm)] U(L) < F_{rm, n−rm} ). The above three tests are large sample tests, P(reject H_0 | H_0 is true) → α as n → ∞, under regularity conditions.

159) The 4 step MANOVA F test of hypotheses uses L = [0 I_{p−1}]:
i) State the hypotheses H_0: the nontrivial predictors are not needed in the mreg model, H_1: at least one of the nontrivial predictors is needed.
ii) Find the test statistic F_0 from output.
iii) Find the pval from output.
iv) If pval < α, reject H_0. If pval ≥ α, fail to reject H_0. If H_0 is rejected, conclude that there is a mreg relationship between the response variables Y_1, ..., Y_m and the predictors X_2, ..., X_p. If you fail to reject H_0, conclude that there is not a mreg relationship between Y_1, ..., Y_m and the predictors X_2, ..., X_p. (Get the variable names from the story problem.)

160) The 4 step F_j test of hypotheses uses L_j = [0, ..., 0, 1, 0, ..., 0] where the 1 is in the jth position. Let b_j^T be the jth row of B. The hypotheses are equivalent to H_0: b_j^T = 0, H_1: b_j^T ≠ 0. This test is a test for whether x_j is needed in the model.
i) State the hypotheses H_0: x_j is not needed in the model, H_1: x_j is needed in the model.
ii) Find the test statistic F_j from output.
iii) Find pval from output.
iv) If pval < α, reject H_0. If pval ≥ α, fail to reject H_0. Give a nontechnical sentence restating your conclusion in terms of the story problem. If H_0 is rejected, then conclude that x_j is needed in the mreg model for Y_1, ..., Y_m. If you fail to reject H_0, then conclude that x_j is not needed in the mreg model for Y_1, ..., Y_m given that the other predictors are in the model.

The statistic

F_j = (1/d_j) B̂_j^T Σ̂_ɛ^{−1} B̂_j = (1/d_j) (β̂_{j1}, β̂_{j2}, ..., β̂_{jm}) Σ̂_ɛ^{−1} (β̂_{j1}, β̂_{j2}, ..., β̂_{jm})^T

where B̂_j^T is the jth row of B̂ and d_j = (X^T X)^{−1}_{jj}, the jth diagonal entry of (X^T X)^{−1}.

The statistic F_j could be used for forward selection and backward elimination in variable selection.

161) The 4 step MANOVA partial F test of hypotheses has a full model using all of the variables and a reduced model where r of the variables are deleted. The ith row of L has a 1 in the position corresponding to the ith variable to be deleted. Omitting the jth variable corresponds to the F_j test, while omitting variables X_2, ..., X_p corresponds to the MANOVA F test.
i) State the hypotheses H_0: the reduced model is good, H_1: use the full model.
ii) Find the test statistic F_R from output.
iii) Find the pval from output.
iv) If pval < α, reject H_0 and conclude that the full model should be used. If pval ≥ α, fail to reject H_0 and conclude that the reduced model is good.

162) The 4 step MANOVA F test should reject H_0 if the response and residual plots look good, n is large enough, and at least one response plot does not look like the corresponding residual plot. A response plot for Y_j will look like a residual plot if the identity line appears almost horizontal, hence the range of Ŷ_j is small.

The lregpack function mltreg produces the m response and residual plots, gives B̂, Σ̂_ɛ, the MANOVA partial F test statistic and pval corresponding to the reduced model that leaves out the variables given by indices (so x_2 and x_4 in the output below, with F = 0.77 and pval = 0.614), F_j and the pval for the F_j test for variables 1, 2, ..., p (where p = 4 in the output below, so F_2 = 1.51 with pval = 0.284), and F_0 and pval for the MANOVA F test (in the output below F_0 = 3.15 and pval = 0.06). Right click Stop on the plots m times to advance the plots and to get the cursor back on the command line in R. The command out <- mltreg(x,y,indices=c(2)) would produce a MANOVA partial F test corresponding to the F_2 test, while the command out <- mltreg(x,y,indices=c(2,3,4)) would produce a MANOVA partial F test corresponding to the MANOVA F test for a data set with p = 4 predictor variables. The Hotelling Lawley trace statistic is used in the tests.

out <- mltreg(x,y,indices=c(2,4))
$Bhat
     [,1] [,2] [,3]
[1,]
[2,]
[3,]
[4,]
$Covhat
     [,1] [,2] [,3]
[1,]
[2,]
[3,]
$partial
     partialF Pval

[1,]
$Ftable
     Fj pvals
[1,]
[2,]
[3,]
[4,]
$MANOVA
     MANOVAF pval
[1,]

#Output for Example 12.2
y<-marry[,c(2,3)]; x<-marry[,-c(2,3)]; mltreg(x,y,indices=c(3,4))
$partial
     partialF Pval
[1,]
$Ftable
     Fj pvals
[1,]
[2,]
[3,]
[4,]
$MANOVA
     MANOVAF pval
[1,]         e-16

Example 12.2. The above output is for the Hebbler (1847) data from the 1843 Prussia census. Sometimes if the wife or husband was not at the household, then s/he would not be counted. Y_1 = number of married civilian men in the district, Y_2 = number of women married to civilians in the district, x_2 = population of the district in 1843, x_3 = number of married military men in the district, x_4 = number of women married to military men in the district. The reduced model deletes x_3 and x_4.
a) Do the MANOVA F test.
b) Do the F_2 test.
c) Do the F_4 test.
d) Do an appropriate 4 step test for the reduced model that deletes x_3 and x_4.
e) The output for the reduced model that deletes x_1 and x_2 is shown below. Do an appropriate 4 step test.

$partial
     partialF Pval
[1,]

Solution:
a) i) H_0: the nontrivial predictors are not needed in the mreg model, H_1: at least one of the nontrivial predictors is needed. ii) F_0 = . iii) pval = 0. iv) Reject H_0, the nontrivial predictors are needed in the mreg model.
b) i) H_0: x_2 is not needed in the model, H_1: x_2 is needed. ii) F_2 = . iii) pval = 0. iv) Reject H_0, population of the district is needed in the model.
c) i) H_0: x_4 is not needed in the model, H_1: x_4 is needed. ii) F_4 = 0.065. iii) pval = 0.937. iv) Fail to reject H_0, number of women married to military men is not needed in the model given that the other predictors are in the model.
d) i) H_0: the reduced model is good, H_1: use the full model. ii) F_R = 0.200. iii) pval = 0.935. iv) Fail to reject H_0, so the reduced model is good.
e) i) H_0: the reduced model is good, H_1: use the full model. ii) F_R = 5.696. iii) pval = 0.00. iv) Reject H_0, so use the full model.

163) Recall the population OLS coefficients and the second way to compute β̂ from 74) and 75). Similar results will hold for multivariate linear regression. Let y = (Y_1, ..., Y_m)^T, let w = (x_2, ..., x_p)^T, and let β̂_j = (α̂_j, η̂_j^T)^T where α̂_j = Ȳ_j − η̂_j^T w̄ and η̂_j = Σ̂_w^{−1} Σ̂_{w,Y_j}. Let Σ̂_{wy} = [1/(n − 1)] Σ_{i=1}^n (w_i − w̄)(y_i − ȳ)^T, which has jth column Σ̂_{w,Y_j} for j = 1, ..., m. Let u = (y^T, w^T)^T,

E(u) = µ_u = (E(y)^T, E(w)^T)^T = (µ_y^T, µ_w^T)^T, and Cov(u) = Σ_u = [ Σ_yy  Σ_yw ; Σ_wy  Σ_ww ].

Let the vector of constants be α^T = (α_1, ..., α_m) and the matrix of slope vectors B_S = [η_1 η_2 ... η_m]. Then the population least squares coefficient matrix is

B = [ α^T ; B_S ]

(the first row of B is α^T and the lower (p − 1) × m block is B_S), where α = µ_y − B_S^T µ_w and B_S = Σ_w^{−1} Σ_wy with Σ_w = Σ_ww.

If the u_i are iid with nonsingular covariance matrix Cov(u), the least squares estimator is

B̂ = [ α̂^T ; B̂_S ]

where α̂ = ȳ − B̂_S^T w̄ and B̂_S = Σ̂_w^{−1} Σ̂_wy. The least squares multivariate linear regression estimator can be calculated by computing the classical estimator (ū, S_u) = (ū, Σ̂_u) of multivariate location and dispersion on the u_i, and then plugging the results into the formulas for α̂ and B̂_S.
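Item 163)'s plug-in route to B̂ is easy to verify in R. The sketch below (simulated data and made-up coefficients) builds B̂_S and α̂ from the classical covariance and means of u = (y^T, w^T)^T and checks them against (X^T X)^{-1} X^T Z.

# Sketch: the "second way" to compute Bhat (item 163) vs. the matrix formula.
set.seed(8)
n <- 100
w <- matrix(rnorm(n * 2), n, 2)                          # nontrivial predictors x_2, x_3
y <- cbind(1 + w %*% c(2, -1), 3 + w %*% c(0, 1)) + matrix(rnorm(n * 2), n, 2)
Sw  <- cov(w); Swy <- cov(w, y)                          # Sigmahat_w and Sigmahat_wy
BS  <- solve(Sw, Swy)                                    # slope matrix Bhat_S
alpha <- colMeans(y) - t(BS) %*% colMeans(w)             # intercept vector alphahat
rbind(t(alpha), BS)                                      # plug-in Bhat
Z <- y; X <- cbind(1, w)
solve(crossprod(X), crossprod(X, Z))                     # least squares Bhat: identical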


More information

Diagnostics and Remedial Measures: An Overview

Diagnostics and Remedial Measures: An Overview Diagnostics and Remedial Measures: An Overview Residuals Model diagnostics Graphical techniques Hypothesis testing Remedial measures Transformation Later: more about all this for multiple regression W.

More information

Research Methods II MICHAEL BERNSTEIN CS 376

Research Methods II MICHAEL BERNSTEIN CS 376 Research Methods II MICHAEL BERNSTEIN CS 376 Goal Understand and use statistical techniques common to HCI research 2 Last time How to plan an evaluation What is a statistical test? Chi-square t-test Paired

More information

Group comparison test for independent samples

Group comparison test for independent samples Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

Lecture 5: Hypothesis tests for more than one sample

Lecture 5: Hypothesis tests for more than one sample 1/23 Lecture 5: Hypothesis tests for more than one sample Måns Thulin Department of Mathematics, Uppsala University thulin@math.uu.se Multivariate Methods 8/4 2011 2/23 Outline Paired comparisons Repeated

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

INFERENCE AFTER VARIABLE SELECTION. Lasanthi C. R. Pelawa Watagoda. M.S., Southern Illinois University Carbondale, 2013

INFERENCE AFTER VARIABLE SELECTION. Lasanthi C. R. Pelawa Watagoda. M.S., Southern Illinois University Carbondale, 2013 INFERENCE AFTER VARIABLE SELECTION by Lasanthi C. R. Pelawa Watagoda M.S., Southern Illinois University Carbondale, 2013 A Dissertation Submitted in Partial Fulfillment of the Requirements for the Doctor

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013 18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

PLOTS FOR THE DESIGN AND ANALYSIS OF EXPERIMENTS. Jenna Christina Haenggi

PLOTS FOR THE DESIGN AND ANALYSIS OF EXPERIMENTS. Jenna Christina Haenggi PLOTS FOR THE DESIGN AND ANALYSIS OF EXPERIMENTS by Jenna Christina Haenggi Bachelor of Science in Mathematics, Southern Illinois University, 2007 A Research Paper Submitted in Partial Fulfillment of the

More information

1 The classical linear regression model

1 The classical linear regression model THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2012, Mr Ruey S Tsay Lecture 4: Multivariate Linear Regression Linear regression analysis is one of the most widely used

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

The Statistical Property of Ordinary Least Squares

The Statistical Property of Ordinary Least Squares The Statistical Property of Ordinary Least Squares The linear equation, on which we apply the OLS is y t = X t β + u t Then, as we have derived, the OLS estimator is ˆβ = [ X T X] 1 X T y Then, substituting

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 17 for Applied Multivariate Analysis Outline Multivariate Analysis of Variance 1 Multivariate Analysis of Variance The hypotheses:

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) II Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 1 Compare Means from More Than Two

More information

Heteroskedasticity. Part VII. Heteroskedasticity

Heteroskedasticity. Part VII. Heteroskedasticity Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

MULTIVARIATE ANALYSIS OF VARIANCE

MULTIVARIATE ANALYSIS OF VARIANCE MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Chapter 12: Multiple Linear Regression

Chapter 12: Multiple Linear Regression Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

The Standard Linear Model: Hypothesis Testing

The Standard Linear Model: Hypothesis Testing Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017 Lecture 25: The Standard Linear Model: Hypothesis Testing Relevant textbook passages: Larsen Marx [4]:

More information

Multiple comparisons - subsequent inferences for two-way ANOVA

Multiple comparisons - subsequent inferences for two-way ANOVA 1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Comparisons of Several Multivariate Populations Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS

More information

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3.

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3. Simfit Tutorials and worked examples for simulation, curve fitting, statistical analysis, and plotting. http://www.simfit.org.uk MANOVA examples From the main SimFIT menu choose [Statistcs], [Multivariate],

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

LECTURE 5 HYPOTHESIS TESTING

LECTURE 5 HYPOTHESIS TESTING October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.

More information

Prediction Intervals For Lasso and Relaxed Lasso Using D Variables

Prediction Intervals For Lasso and Relaxed Lasso Using D Variables Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School 2017 Prediction Intervals For Lasso and Relaxed Lasso Using D Variables Craig J. Bartelsmeyer Southern Illinois University

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information