THE UNIVERSITY OF CHICAGO
Booth School of Business
Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay

Lecture 4: Multivariate Linear Regression

Linear regression analysis is one of the most widely used statistical methods. We shall start with multiple linear regression (MLR) analysis before studying the multivariate linear regression model. Our discussion of multiple linear regression uses matrix algebra.

1 The classical linear regression model

The multiple linear regression model consists of a single response variable with multiple predictors and assumes the form
\[
Y_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \cdots + \beta_r Z_{ir} + \epsilon_i, \quad i = 1,\dots,n, \tag{1}
\]
where the error terms $\{\epsilon_i\}$ satisfy
1. $E(\epsilon_i) = 0$ for all $i$;
2. $\mathrm{Var}(\epsilon_i) = \sigma^2$, a constant; and
3. $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ if $i \neq j$.

In matrix notation, Eq. (1) becomes
\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & Z_{11} & Z_{12} & \cdots & Z_{1r} \\
1 & Z_{21} & Z_{22} & \cdots & Z_{2r} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & Z_{n1} & Z_{n2} & \cdots & Z_{nr}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix},
\]
or
\[
Y_{n\times 1} = Z_{n\times(r+1)}\,\beta_{(r+1)\times 1} + \epsilon_{n\times 1},
\]
and the assumptions become
1. $E(\epsilon) = 0$; and
2. $\mathrm{Cov}(\epsilon) = E(\epsilon\epsilon') = \sigma^2 I_n$.

The matrix $Z$ is referred to as the design matrix.
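As a concrete illustration of the matrix form above, here is a small simulated sketch in Python with numpy (the sizes, coefficient values, and seed are my own choices, not from the lecture; the lecture's own computing examples use R):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2                                   # n observations, r predictors

# Design matrix Z: a column of ones followed by the r predictor columns.
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
beta = np.array([1.0, 2.0, -0.5])              # (beta_0, beta_1, beta_2)'
eps = rng.normal(scale=0.3, size=n)            # iid errors: mean 0, common variance
Y = Z @ beta + eps                             # Y = Z beta + eps, i.e. Eq. (1) row by row
```

The matrix product `Z @ beta` reproduces every scalar equation $Y_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \epsilon_i$ at once.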
2 Least squares estimation

The least squares estimate (LSE) of $\beta$ is obtained by minimizing the sum of squared errors,
\[
S(\beta) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 Z_{i1} - \cdots - \beta_r Z_{ir})^2 = (Y - Z\beta)'(Y - Z\beta).
\]
Before deriving the LSE, we first define $\tilde\beta = (Z'Z)^{-1}Z'Y$ and $H = Z(Z'Z)^{-1}Z'$, assuming that the inverse exists. Then it is easy to see that $H$ is symmetric and $H^2 = H$. The latter property says that $H$ is an idempotent matrix. Define
\[
\hat\epsilon = Y - Z\tilde\beta = Y - Z(Z'Z)^{-1}Z'Y = [I - H]Y.
\]
Then, using $Z'H = Z'$, we have
\[
Z'\hat\epsilon = Z'[I - H]Y = [Z' - Z'H]Y = [Z' - Z']Y = 0.
\]
In other words, $\hat\epsilon$ is orthogonal to the column space of the design matrix $Z$.

Result 7.1. Assume that $Z$ is of full rank $r+1$, where $r+1 \le n$. The LSE of $\beta$ is
\[
\hat\beta = (Z'Z)^{-1}Z'Y.
\]

Proof. Since
\[
Y - Z\beta = Y - Z\tilde\beta + Z\tilde\beta - Z\beta = Y - Z\tilde\beta + Z(\tilde\beta - \beta),
\]
we have
\begin{align*}
S(\beta) &= (Y - Z\beta)'(Y - Z\beta) \\
&= (Y - Z\tilde\beta)'(Y - Z\tilde\beta) + (\tilde\beta - \beta)'Z'Z(\tilde\beta - \beta) + 2(Y - Z\tilde\beta)'Z(\tilde\beta - \beta) \\
&= (Y - Z\tilde\beta)'(Y - Z\tilde\beta) + (\tilde\beta - \beta)'Z'Z(\tilde\beta - \beta),
\end{align*}
because $(Y - Z\tilde\beta)'Z = \hat\epsilon'Z = 0$. The first term of $S(\beta)$ does not depend on $\beta$, and the second term is the squared length of $Z(\tilde\beta - \beta)$. Because $Z$ has full rank, $Z(\tilde\beta - \beta) \neq 0$ if $\tilde\beta \neq \beta$, so the minimum of $S(\beta)$ is unique and occurs at $\beta = \tilde\beta$. Consequently, the LSE of $\beta$ is $\hat\beta = \tilde\beta = (Z'Z)^{-1}Z'Y$. QED

The deviations $\hat\epsilon_i = Y_i - \hat\beta_0 - \hat\beta_1 Z_{i1} - \cdots - \hat\beta_r Z_{ir}$, $i = 1,\dots,n$, are the residuals of the MLR model. Let $\hat Y = Z\hat\beta = HY$ denote the fitted values of $Y$. The matrix $H$ is called the hat matrix. The LS residuals then become
\[
\hat\epsilon = Y - \hat Y = [I - Z(Z'Z)^{-1}Z']Y = [I - H]Y.
\]
Note that $I - H$ is also a symmetric and idempotent matrix.
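The defining properties of $\hat\beta$, $H$, and the residuals are easy to verify numerically. A Python sketch on simulated data (my own sizes and seed, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 3
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
Y = Z @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(scale=0.2, size=n)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)   # LSE: (Z'Z)^{-1} Z'Y
H = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # hat matrix H = Z (Z'Z)^{-1} Z'
resid = Y - Z @ beta_hat                       # eps_hat = (I - H) Y

symmetric  = np.allclose(H, H.T)               # H' = H
idempotent = np.allclose(H @ H, H)             # H^2 = H
orthogonal = np.allclose(Z.T @ resid, 0.0)     # Z' eps_hat = 0
```

Note the use of `np.linalg.solve` rather than forming $(Z'Z)^{-1}$ explicitly, which is the standard numerically stable choice.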
Our prior discussion shows that the residuals satisfy (a) $Z'\hat\epsilon = 0$ and (b) $\hat Y'\hat\epsilon = 0$. Also, the residual sum of squares (RSS) is
\[
\mathrm{RSS} = \sum_{i=1}^n \hat\epsilon_i^2 = \hat\epsilon'\hat\epsilon = Y'[I - H]Y.
\]

Sum of squares decomposition: Since $\hat Y'\hat\epsilon = 0$, we have
\[
Y'Y = (\hat Y + \hat\epsilon)'(\hat Y + \hat\epsilon) = \hat Y'\hat Y + \hat\epsilon'\hat\epsilon.
\]
Since the first column of $Z$ is $\mathbf{1}$, the result $Z'\hat\epsilon = 0$ includes $0 = \mathbf{1}'\hat\epsilon$. Consequently, $\bar Y = \bar{\hat Y}$. Subtracting $n\bar Y^2 = n(\bar{\hat Y})^2$ from the prior decomposition, we have
\[
Y'Y - n\bar Y^2 = \hat Y'\hat Y - n(\bar{\hat Y})^2 + \hat\epsilon'\hat\epsilon,
\]
or
\[
\sum_{i=1}^n (Y_i - \bar Y)^2 = \sum_{i=1}^n (\hat Y_i - \bar Y)^2 + \sum_{i=1}^n \hat\epsilon_i^2.
\]
The quantity
\[
R^2 = 1 - \frac{\sum_{i=1}^n \hat\epsilon_i^2}{\sum_{i=1}^n (Y_i - \bar Y)^2}
\]
is the coefficient of determination. It is the proportion of the total variation in the $Y_i$'s explained by the predictors $Z_1,\dots,Z_r$.

Sampling properties:
1. The LSE $\hat\beta$ satisfies $E(\hat\beta) = \beta$ and $\mathrm{Cov}(\hat\beta) = \sigma^2(Z'Z)^{-1}$.
2. The residuals satisfy $E(\hat\epsilon) = 0$ and $\mathrm{Cov}(\hat\epsilon) = \sigma^2[I - H]$.
3. $E(\hat\epsilon'\hat\epsilon) = (n-r-1)\sigma^2$, so that, defining
\[
s^2 = \frac{\hat\epsilon'\hat\epsilon}{n-r-1} = \frac{Y'[I - H]Y}{n-r-1},
\]
we have $E(s^2) = \sigma^2$.
4. $\hat\beta$ and $\hat\epsilon$ are uncorrelated.

Proof. First,
\[
E(\hat\beta) = E[(Z'Z)^{-1}Z'Y] = (Z'Z)^{-1}Z'E(Y) = (Z'Z)^{-1}Z'[Z\beta + E(\epsilon)] = \beta.
\]
Also,
\[
\mathrm{Cov}(\hat\beta) = (Z'Z)^{-1}Z'\,\mathrm{Cov}(Y)\,Z(Z'Z)^{-1} = (Z'Z)^{-1}Z'(\sigma^2 I)Z(Z'Z)^{-1} = \sigma^2(Z'Z)^{-1}.
\]
Next, $E(\hat\epsilon) = [I - H]E(Y) = [I - H]Z\beta = [Z - Z]\beta = 0$. Also, $\mathrm{Cov}(\hat\epsilon) = [I - H]\,\mathrm{Cov}(Y)\,[I - H]' = \sigma^2[I - H]$, because $I - H$ is symmetric and idempotent.
Next, using $\mathrm{tr}(H) = \mathrm{tr}(Z(Z'Z)^{-1}Z') = \mathrm{tr}[(Z'Z)^{-1}Z'Z] = \mathrm{tr}(I_{r+1}) = r+1$, we have
\[
E(\hat\epsilon'\hat\epsilon) = E[\mathrm{tr}(\hat\epsilon'\hat\epsilon)] = E[\mathrm{tr}(\hat\epsilon\hat\epsilon')] = \mathrm{tr}[\mathrm{Cov}(\hat\epsilon)] = \mathrm{tr}[\sigma^2(I - H)] = \sigma^2[\mathrm{tr}(I) - \mathrm{tr}(H)] = \sigma^2(n - r - 1).
\]
Finally,
\begin{align*}
\mathrm{Cov}(\hat\beta, \hat\epsilon) &= \mathrm{Cov}[(Z'Z)^{-1}Z'Y,\ (I - H)Y] = (Z'Z)^{-1}Z'\,\mathrm{Cov}(Y, Y)\,(I - H)' \\
&= (Z'Z)^{-1}Z'(\sigma^2 I)(I - H) = \sigma^2(Z'Z)^{-1}Z'(I - H) = \sigma^2(Z'Z)^{-1}(Z' - Z') = 0,
\end{align*}
so that $\hat\beta$ and $\hat\epsilon$ are uncorrelated.

Gauss least squares theorem. Let $Y = Z\beta + \epsilon$, where $E(\epsilon) = 0$, $\mathrm{Cov}(\epsilon) = \sigma^2 I$, and $\mathrm{Rank}(Z) = r+1$. For any $c$, the estimator $c'\hat\beta$ of $c'\beta$ has the smallest possible variance among all linear estimators of the form $a'Y$ that are unbiased for $c'\beta$.

Proof. For any fixed $c$, let $a'Y$ be an unbiased estimator of $c'\beta$. Then $E(a'Y) = c'\beta$, whatever the value of $\beta$. But $E(a'Y) = E[a'(Z\beta + \epsilon)] = a'Z\beta$. Consequently, $c'\beta = a'Z\beta$, or equivalently, $(c' - a'Z)\beta = 0$ for all $\beta$. In particular, choosing $\beta = (c' - a'Z)'$, we obtain $c' = a'Z$ for any unbiased estimator.

Next, $c'\hat\beta = c'(Z'Z)^{-1}Z'Y = (a^*)'Y$, where $a^* = Z(Z'Z)^{-1}c$. Since $E(\hat\beta) = \beta$, $c'\hat\beta = (a^*)'Y$ is an unbiased estimator of $c'\beta$. The result of the last paragraph says that $c' = (a^*)'Z$. Finally, for any $a$ satisfying the unbiasedness requirement $c' = a'Z$, we have
\begin{align*}
\mathrm{Var}(a'Y) &= \mathrm{Var}(a'Z\beta + a'\epsilon) = \mathrm{Var}(a'\epsilon) = \sigma^2 a'a \\
&= \sigma^2 (a - a^* + a^*)'(a - a^* + a^*) = \sigma^2[(a - a^*)'(a - a^*) + (a^*)'a^*],
\end{align*}
where the cross term vanishes because
\[
(a - a^*)'a^* = (a - a^*)'Z(Z'Z)^{-1}c = (a'Z - (a^*)'Z)(Z'Z)^{-1}c = (c' - c')(Z'Z)^{-1}c = 0.
\]
Because $a^*$ is fixed and $(a - a^*)'(a - a^*)$ is positive unless $a = a^*$, $\mathrm{Var}(a'Y)$ is minimized by the choice $a = a^*$, and we have $(a^*)'Y = c'(Z'Z)^{-1}Z'Y = c'\hat\beta$. QED

The result says that the LSE $c'\hat\beta$ is the best linear unbiased estimator (BLUE) of $c'\beta$.

3 Inference

For the multiple linear regression model in (1), we further assume that $\epsilon \sim N_n(0, \sigma^2 I)$.

Result 7.4. $\hat\beta$ is also the maximum likelihood estimate of $\beta$. In addition, $\hat\beta \sim N_{r+1}[\beta, \sigma^2(Z'Z)^{-1}]$ and is independent of the residuals $\hat\epsilon = Y - Z\hat\beta$. Let $\hat\sigma^2$ be the maximum likelihood estimate of $\sigma^2$. Then
\[
n\hat\sigma^2 = \hat\epsilon'\hat\epsilon \sim \sigma^2\chi^2_{n-r-1}.
\]

Proof. This follows from our earlier discussion of the MLE for a random sample from a multivariate normal distribution. QED
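The sum-of-squares decomposition, $R^2$, $s^2$, and the trace identity $\mathrm{tr}(H) = r+1$ used above can all be confirmed on simulated data. An illustrative Python sketch (my own data and seed, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 40, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
Y = Z @ np.array([0.5, 1.5, -1.0]) + rng.normal(size=n)   # true sigma^2 = 1

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # hat matrix
Yhat = H @ Y                                   # fitted values
resid = Y - Yhat                               # residuals

ss_tot = np.sum((Y - Y.mean())**2)             # total sum of squares
ss_reg = np.sum((Yhat - Y.mean())**2)          # regression sum of squares
ss_res = np.sum(resid**2)                      # residual sum of squares
R2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
s2 = ss_res / (n - r - 1)                      # unbiased estimate of sigma^2
trH = np.trace(H)                              # should equal r + 1
```

Because the design matrix contains the column of ones, `Yhat.mean()` matches `Y.mean()`, which is exactly the $\bar Y = \bar{\hat Y}$ step in the decomposition.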
Note that the MLE $\hat\sigma^2$ of $\sigma^2$ is $\hat\epsilon'\hat\epsilon/n$, which is different from the LSE $s^2$.

Result 7.5. For the Gaussian MLR model, a $100(1-\alpha)$ percent confidence region for $\beta$ is given by
\[
(\beta - \hat\beta)'Z'Z(\beta - \hat\beta) \le (r+1)\,s^2\,F_{r+1,n-r-1}(\alpha),
\]
where $s^2 = \hat\epsilon'\hat\epsilon/(n-r-1)$ is the LSE of $\sigma^2$. Also, simultaneous $100(1-\alpha)$ percent confidence intervals for the $\beta_i$ are
\[
\hat\beta_i \pm \sqrt{\widehat{\mathrm{Var}}(\hat\beta_i)}\sqrt{(r+1)F_{r+1,n-r-1}(\alpha)}, \quad i = 0,1,\dots,r,
\]
where $\widehat{\mathrm{Var}}(\hat\beta_i)$ is the diagonal element of $s^2(Z'Z)^{-1}$ corresponding to $\hat\beta_i$.

Proof. Consider the vector $V = (Z'Z)^{1/2}(\hat\beta - \beta)$, which is normally distributed with mean zero and covariance matrix $\sigma^2 I$. Consequently, $V'V \sim \sigma^2\chi^2_{r+1}$. In addition, $(n-r-1)s^2 = \hat\epsilon'\hat\epsilon$ is distributed as $\sigma^2\chi^2_{n-r-1}$ and is independent of $V$. The result then follows. QED

Remark: The R command for MLR is lm, which stands for "linear model."

Likelihood ratio tests for the regression parameters: Consider
\[
H_o: \beta_{q+1} = \beta_{q+2} = \cdots = \beta_r = 0 \quad \text{vs} \quad H_a: \beta_i \neq 0 \text{ for some } q+1 \le i \le r.
\]
Under $H_o$, the model is
\[
Y = Z_1\beta_{(1)} + \epsilon. \tag{2}
\]
Under $H_a$, the model is
\[
Y = Z_1\beta_{(1)} + Z_2\beta_{(2)} + \epsilon, \tag{3}
\]
where
\[
Z = [Z_1, Z_2], \qquad \beta = \begin{bmatrix}\beta_{(1)} \\ \beta_{(2)}\end{bmatrix}.
\]

Result 7.6. Let $Z$ have full rank $r+1$ and $\epsilon \sim N_n(0, \sigma^2 I)$. The likelihood ratio test of the null hypothesis $H_o: \beta_{(2)} = 0$ rejects for large values of
\[
\frac{[SS_{res}(Z_1) - SS_{res}(Z)]/(r-q)}{s^2} \sim F_{r-q,\,n-r-1},
\]
where $SS_{res}(Z_1)$ and $SS_{res}(Z)$ are the residual sums of squares of the models in (2) and (3), respectively.

Proof: Under the model in (3), the maximized likelihood function is
\[
L(\hat\beta, \hat\sigma^2) = \frac{1}{(2\pi)^{n/2}\hat\sigma^n}e^{-n/2},
\]
where $\hat\beta = (Z'Z)^{-1}Z'Y$ and $\hat\sigma^2 = (Y - Z\hat\beta)'(Y - Z\hat\beta)/n$. On the other hand, under the submodel in (2), the maximized likelihood function is
\[
L(\hat\beta_{(1)}, \hat\sigma_1^2) = \frac{1}{(2\pi)^{n/2}\hat\sigma_1^n}e^{-n/2},
\]
where $\hat\beta_{(1)} = (Z_1'Z_1)^{-1}Z_1'Y$ and $\hat\sigma_1^2 = (Y - Z_1\hat\beta_{(1)})'(Y - Z_1\hat\beta_{(1)})/n$. Thus, the likelihood ratio is
\[
\frac{L(\hat\beta_{(1)}, \hat\sigma_1^2)}{L(\hat\beta, \hat\sigma^2)} = \left(\frac{\hat\sigma^2}{\hat\sigma_1^2}\right)^{n/2} = \left(1 + \frac{\hat\sigma_1^2 - \hat\sigma^2}{\hat\sigma^2}\right)^{-n/2},
\]
which gives rise to the test statistic
\[
\frac{n(\hat\sigma_1^2 - \hat\sigma^2)/(r-q)}{n\hat\sigma^2/(n-r-1)} = \frac{[SS_{res}(Z_1) - SS_{res}(Z)]/(r-q)}{s^2} \sim F_{r-q,\,n-r-1}.
\]
This completes the proof.

Alternatively, one can construct a matrix $C$ such that the null hypothesis becomes $H_o: C\beta = 0$. In this way, $C\hat\beta \sim N_{r-q}(C\beta, \sigma^2 C(Z'Z)^{-1}C')$, which can be used to perform the test.

4 Inferences from the fitted model

Consider a specific point of interest, say $z_o = (1, z_{01},\dots,z_{0r})'$, in the design-matrix space. The model says $E(Y_o \mid z_o) = z_o'\beta$, and the LSE of this expectation is $z_o'\hat\beta$. In addition, $\mathrm{Var}(z_o'\hat\beta) = \sigma^2 z_o'(Z'Z)^{-1}z_o$. Consequently, under the normality assumption, a $100(1-\alpha)\%$ confidence interval for $z_o'\beta$ is
\[
z_o'\hat\beta \pm t_{n-r-1}(\alpha/2)\sqrt{z_o'(Z'Z)^{-1}z_o\; s^2}.
\]

Forecasting: The point prediction of $Y$ at $z_o$ is $z_o'\hat\beta$, which is an unbiased estimator. Since $Y_o = z_o'\beta + \epsilon_o$, the variance of the forecast error is $\sigma^2(1 + z_o'(Z'Z)^{-1}z_o)$, where we use the property that $\hat\beta$ and $\epsilon_o$ are uncorrelated. Therefore, a $100(1-\alpha)\%$ prediction interval for $Y_o$ is
\[
z_o'\hat\beta \pm t_{n-r-1}(\alpha/2)\sqrt{s^2(1 + z_o'(Z'Z)^{-1}z_o)}.
\]

5 Model checking

Studentized residuals: From $\hat\epsilon = [I - H]Y$, we have $\mathrm{Cov}(\hat\epsilon) = \sigma^2[I - H]$. In particular, $\mathrm{Var}(\hat\epsilon_j) = \sigma^2(1 - h_{jj})$ for $j = 1,\dots,n$. The studentized residuals are
\[
\hat\epsilon_j^* = \frac{\hat\epsilon_j}{\sqrt{s^2(1 - h_{jj})}}, \quad j = 1,\dots,n.
\]
If the fitted regression model is adequate, we expect the studentized residuals to look like independent draws from an $N(0,1)$ distribution.
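Studentized residuals as just defined are straightforward to compute once the hat matrix is available. A small simulated sketch in Python (sizes and seed are my own, for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 40, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
Y = Z @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
resid = Y - H @ Y                              # ordinary residuals (I - H) Y
s2 = resid @ resid / (n - r - 1)               # LSE of sigma^2
h = np.diag(H)                                 # leverages h_jj

stud = resid / np.sqrt(s2 * (1.0 - h))         # studentized residuals
# Under an adequate model these behave roughly like iid N(0, 1) draws,
# which is what the residual plots in the next subsection examine.
```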
5.1 Residual plots

The residuals $\hat\epsilon_j$ or studentized residuals $\hat\epsilon_j^*$ are used to obtain various residual plots for model checking:

1. Plot $\hat\epsilon_j$ against the fitted value $\hat y_j = z_j'\hat\beta$, where $z_j'$ is the $j$th row of the design matrix $Z$. This plot can be used to check (a) the validity of the linearity assumption and (b) the constant variance of $\epsilon_j$.
2. Plot $\hat\epsilon_j$ against an individual explanatory variable, e.g. $Z_1$. This is less common when the number of regressors is large.
3. QQ-plot of $\hat\epsilon_j^*$ to check the normality assumption and possible outliers.
4. Time plot of $\hat\epsilon_j$ to check for serial correlations. This is often accompanied by the Durbin-Watson statistic
\[
DW = \frac{\sum_{j=2}^n(\hat\epsilon_j - \hat\epsilon_{j-1})^2}{\sum_{j=1}^n \hat\epsilon_j^2} \approx 2(1 - \hat\rho_1),
\]
where $\hat\rho_1$ is the lag-1 autocorrelation of the residuals, defined as
\[
\hat\rho_1 = \frac{\sum_{j=2}^n \hat\epsilon_j\hat\epsilon_{j-1}}{\sum_{j=1}^n \hat\epsilon_j^2}.
\]
The range of the DW statistic is $[0,4]$, with 2 as the ideal value. A DW statistic greater than 2 indicates negative correlation between the residuals. In practice, when the data have time or spatial characteristics, one should also check higher lags of autocorrelation of the residuals.

5.2 High leverage points and influential observations

From $\hat Y = HY$, we have
\[
\hat y_j = \sum_{i=1}^n h_{ji} y_i = h_{jj} y_j + \sum_{i\neq j} h_{ji} y_i.
\]
In addition, it can be shown that $0 < h_{jj} < 1$ for all $j$. In fact, $\sum_{j=1}^n h_{jj} = \mathrm{tr}(H) = r+1$, under the assumption that $Z$ is of full rank $r+1$. Thus, if $h_{jj}$ is large relative to the other $h_{ji}$ (in magnitude), then $y_j$ is a major contributor to the fitted value $\hat y_j$. Consequently, $h_{jj}$ is called the leverage of the $j$th observation. A large $h_{jj}$ tends to pull the regression fit toward the $j$th data point.

The leverage $h_{jj}$ has another interpretation: it measures the distance of $z_j$ from the center of the explanatory variables, where $z_j'$ is the $j$th row of the design matrix $Z$. For instance, consider the simple linear regression $y_j = \beta_0 + \beta_1 z_j + \epsilon_j$. It can be shown that
\[
h_{jj} = \frac{1}{n} + \frac{(z_j - \bar z)^2}{\sum_{i=1}^n (z_i - \bar z)^2}.
\]
Consequently, if the $j$th data point of the explanatory variables is far away from the center, then it has high leverage and pulls the model fit toward itself. It is, therefore, useful in linear regression analysis to check for high leverage points.

Influential observations of a linear regression model are defined as those points that significantly affect the inferences drawn from the data. Methods for assessing influence are often derived from the change in the LSE $\hat\beta$ when observations are removed from the data. The best-known statistic for assessing influential observations is Cook's distance. The Cook's distance for the $i$th observation is defined as
\[
D_i = \frac{(\hat\beta_{(i)} - \hat\beta)'(Z'Z)(\hat\beta_{(i)} - \hat\beta)}{(r+1)s^2}, \tag{4}
\]
where $\hat\beta_{(i)}$ is the LSE of $\beta$ with the $i$th data point removed, and $s^2 = \hat\epsilon'\hat\epsilon/(n-r-1)$ is the LSE of $\sigma^2$. See Cook (1977, Technometrics). It is the squared distance between $\hat\beta$ and $\hat\beta_{(i)}$ relative to the fixed geometry of $Z'Z$. A large $D_i$ indicates that the $i$th data point is influential, because removing it from the data leads to a substantial change in the parameter estimates. It can be shown that
\[
D_i = \frac{1}{r+1}\cdot\frac{\hat\epsilon_i^2}{s^2(1-h_{ii})}\cdot\frac{h_{ii}}{1-h_{ii}} = \frac{1}{r+1}\,(\hat\epsilon_i^*)^2\,\frac{h_{ii}}{1-h_{ii}},
\]
where $\hat\epsilon_i^*$ is the studentized residual. This expression leads to several interpretations of Cook's distance. For instance, $h_{ii}/(1-h_{ii})$ is a monotonic function of the leverage $h_{ii}$, and $\hat\epsilon_i^*$ is large for an outlying observation. Thus, $D_i$ is the product of a random deviation and a leverage measure. In addition, $h_{ii}/(1-h_{ii}) = \mathrm{Var}(\hat Y_i)/\mathrm{Var}(\hat\epsilon_i)$. It can also be shown that $z_i'(Z_{(i)}'Z_{(i)})^{-1}z_i = h_{ii}/(1-h_{ii})$. Finally,
\[
\frac{h_{ii}}{1-h_{ii}} = \frac{\sum_{j=1}^n \mathrm{Var}(z_j'\hat\beta_{(i)}) - \sum_{j=1}^n \mathrm{Var}(z_j'\hat\beta)}{\sigma^2}.
\]
Thus, $h_{ii}/(1-h_{ii})$ is proportional to the total change in the variance of prediction at $z_1,\dots,z_n$ when $z_i$ is deleted.

6 Variable selection

For the MLR model considered, we have $r$ predictors. In theory, there are $2^r - 1$ possible linear regression models available. Because the $r$ predictors may be highly correlated, this can lead to the problem
of multicollinearity, which, in turn, results in an unstable or overfitted model. An important question, then, is how to select the best MLR model among those candidate models. This is the variable selection problem in MLR analysis. The problem of variable selection has been widely studied in the literature; it is an ongoing project for many statisticians and researchers. Current research investigates variable
selection when both $r$ and $n$ (the sample size) go to infinity; see LASSO, boosting, and various Bayesian methods in the literature. In this lecture, we focus on the traditional approach to variable selection; that is, $r$ is fixed, but $n$ may increase. The methods we discuss include (a) stepwise regression, (b) Mallows' $C_p$ criterion, (c) information criteria such as AIC and BIC, and (d) stochastic search variable selection by George and McCulloch (1993, JASA, Variable selection via Gibbs sampling).

6.1 Stepwise regression

This is an iterative procedure alternating between forward selection and backward elimination. The procedure requires two prespecified thresholds, which I shall denote by $F_{in}$ and $F_{out}$. Typically, $F_{in}$ and $F_{out}$ are squares of some critical values of a Student-$t$ distribution with $n-1$ degrees of freedom. In some cases, one may use $F_{in} = F_{out} = 4$. To describe the procedure, we divide the $r$ regressors into two subsets, say $S_{in}$ and $S_{out}$, where $S_{in}$ contains all the regressors already selected and $S_{out}$ denotes the remaining regressors. Either set may be empty. Suppose that $S_{in}$ has $p$ variables and denote the corresponding MLR model as
\[
Y_i = \beta_0 + \beta_1 Z_{i1} + \cdots + \beta_p Z_{ip} + e_i, \quad i = 1,\dots,n. \tag{5}
\]

Forward-selection step: The goal of forward selection is to expand the regression by adding a regressor to the model. Let $\hat e_i$ be the residuals of the MLR in (5).

1. Find the variable, say $Z_{p+1}$, in $S_{out}$ that has the maximum correlation with $\hat e_i$.
2. Perform the MLR
\[
Y_i = \beta_0 + \beta_1 Z_{i1} + \cdots + \beta_p Z_{ip} + \beta_{p+1} Z_{i(p+1)} + e_i, \quad i = 1,\dots,n.
\]
3. Let $t_{p+1}$ be the $t$-ratio of the LSE $\hat\beta_{p+1}$ in the expanded regression. Compare $t_{p+1}^2$ with $F_{in}$. If $t_{p+1}^2 \ge F_{in}$, move $Z_{p+1}$ from $S_{out}$ to $S_{in}$ and go to the backward-elimination step. If $t_{p+1}^2 < F_{in}$, stop the procedure.

Backward-elimination step: The goal of backward elimination is to simplify the existing model by removing one variable from $S_{in}$. Again, consider the existing model in (5).

1. Find the regressor in $S_{in}$ that has the smallest
$t$-ratio, in absolute value, in the MLR (5). Denote the regressor by $Z_j^*$ and the corresponding $t$-ratio by $t_j$.
2. Compare $t_j^2$ with $F_{out}$. If $t_j^2 < F_{out}$, move $Z_j^*$ from $S_{in}$ to $S_{out}$ and go to the forward-selection step. On the other hand, if $t_j^2 \ge F_{out}$, do not move $Z_j^*$ and go to the forward-selection step.

In practice, one starts with $S_{in}$ being the empty set and $S_{out}$ containing all regressors. One can also start with the backward-elimination step and $S_{in}$ containing all regressors. The stepwise procedure stops after a certain number of iterations. Obviously, the result depends on the choices of $F_{in}$ and $F_{out}$.
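The alternation above can be sketched in code. The following Python implementation (simulated data and seed are my own; the thresholds $F_{in} = F_{out} = 4$ are taken from the text) runs forward and backward steps until a forward candidate fails the threshold:

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 100, 4
Z = rng.normal(size=(n, r))
# In the simulated truth only predictors 0 and 2 matter.
Y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 2] + rng.normal(size=n)
F_in = F_out = 4.0

def t_ratios(cols):
    """Fit Y on an intercept plus Z[:, cols]; return the t-ratios of all LSEs."""
    X = np.column_stack([np.ones(n)] + [Z[:, j] for j in cols])
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ b
    s2 = e @ e / (n - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b / se

s_in, s_out = [], list(range(r))
while s_out:
    # Forward step: candidate most correlated with the current residuals.
    X = np.column_stack([np.ones(n)] + [Z[:, j] for j in s_in])
    e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
    cors = [abs(np.corrcoef(Z[:, j], e)[0, 1]) for j in s_out]
    cand = s_out[int(np.argmax(cors))]
    t = t_ratios(s_in + [cand])[-1]
    if t**2 < F_in:
        break                                   # candidate not significant: stop
    s_out.remove(cand); s_in.append(cand)
    # Backward step: drop the weakest variable if its squared t-ratio < F_out.
    ts = t_ratios(s_in)[1:]
    k = int(np.argmin(np.abs(ts)))
    if ts[k]**2 < F_out:
        s_out.append(s_in.pop(k))
```

With these strong simulated signals, the procedure should settle on the two active predictors and reject the noise variables.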
6.2 Mallows' $C_p$

Mallows (1973, Technometrics, Vol. 15) proposes the following criterion for model selection:
\[
C_p = \frac{SSE_p}{s_F^2} - n + 2(p+1),
\]
where $p$ denotes the number of variables in the regression
\[
Y_i = \beta_0 + \beta_1 Z_{i1}^* + \cdots + \beta_p Z_{ip}^* + \epsilon_i, \quad i = 1,\dots,n, \tag{6}
\]
where $\{Z_1^*,\dots,Z_p^*\}$ is a subset of the available regressors $\{Z_1,\dots,Z_r\}$, $SSE_p = \sum_{i=1}^n (Y_i - \hat Y_{pi})^2$ with $\hat Y_{pi}$ denoting the fitted value of the MLR in (6), and $s_F^2$ is the residual mean squared error of the MLR model in (1), i.e., the residual mean squared error when ALL predictors are used. Here we use the subscript $F$ to denote the FULL model.

Suppose that the subset model in (6) is the true model. Then $s_F^2$ converges to $\sigma^2$ (the variance of $\epsilon_i$) as $n$ increases, and
\[
E(C_p) \approx \frac{E(SSE_p)}{\sigma^2} - n + 2(p+1) = \frac{(n-p-1)\sigma^2}{\sigma^2} - n + 2(p+1) = p+1.
\]
Therefore, in practice, one selects a model such that $C_p$ is less than, but close to, $p+1$.

Remark: In the discussion of $C_p$, I use a constant term in the MLR. If no constant term is used, or the constant term is treated as a predictor, then $C_p$ is defined as
\[
C_p = \frac{SSE_p}{s_F^2} - n + 2p.
\]

6.3 Information criteria

Assume that $\epsilon_i$ is normally distributed. For a given subset of regressors, say $S_{in} = \{Z_1^*,\dots,Z_p^*\}$, one estimates the model parameters of MLR (6) by the maximum likelihood method. As shown before, the estimate of the coefficient vector $\beta = (\beta_0,\beta_1,\dots,\beta_p)'$ is the same as the LSE. However, the MLE of the variance of $\epsilon_i$ is
\[
\hat\sigma^2 = \frac{SSE}{n},
\]
where $SSE$ denotes the sum of squared errors. The AIC criterion for the fitted MLR model is then
\[
AIC(p) = \ln(\hat\sigma^2) + \frac{2p}{n}. \tag{7}
\]
The first term of the AIC is a goodness-of-fit statistic measuring the fidelity of the model to the data, and the second term is a penalty function. From the definition, AIC uses a penalty of 2 for each parameter used. Note that the constant term is assumed to be included in all models, so it is not counted. In practice, one entertains a set of possible MLR models for $Y_i$ and selects the model that has the smallest AIC value. If one
changes the penalty term from $2p/n$ to $p\ln(n)/n$, the criterion becomes the BIC:
\[
BIC(p) = \ln(\hat\sigma^2) + \frac{p\ln(n)}{n}.
\]
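Both criteria are one-liners once the residual sums of squares are available. A simulated Python sketch (my own data, seed, and subset labels); a useful sanity check is that $C_p$ of the full model equals $r+1$ exactly, since then $SSE_F/s_F^2 = n-r-1$:

```python
import numpy as np

rng = np.random.default_rng(8)
n, r = 60, 4
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
Y = Z[:, :3] @ np.array([1.0, 1.5, -1.0]) + rng.normal(size=n)  # only cols 1, 2 matter

def sse(cols):
    """SSE_p for the model with an intercept and the given predictor columns."""
    X = Z[:, [0] + cols]
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

s2_F = sse([1, 2, 3, 4]) / (n - r - 1)         # residual mean square, FULL model

def Cp(cols):
    p = len(cols)
    return sse(cols) / s2_F - n + 2 * (p + 1)  # Mallows' C_p

def AIC(cols):
    p = len(cols)
    return np.log(sse(cols) / n) + 2 * p / n   # Eq. (7)

cp_full = Cp([1, 2, 3, 4])                     # equals r + 1 by construction
```

Dropping a strongly relevant predictor inflates $SSE_p$ and hence $C_p$ far above $p+1$, which is how the criterion flags underfitting.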
Example. Consider the silver-zinc battery data set of the textbook. The dependent variable is the logarithm of the cycles to failure of the battery. The five predictors are (1) charge rate ($Z_1$), (2) discharge rate ($Z_2$), (3) depth of discharge ($Z_3$), (4) temperature ($Z_4$), and (5) end-of-charge voltage ($Z_5$). We shall use the leaps package and the command step in R to carry out the computation. Since $r = 5$ is small, we have $2^5 - 1 = 31$ possible regressions and may entertain all of them in model selection. When $r$ is large, a search procedure becomes useful in reducing the number of MLR models entertained. The leaps package can be used to consider all possible regressions.

> setwd("C:/users/rst/teaching/ama")
> da=read.table("T7-5.dat",header=T)
> da
   z1 z2 z3 z4 z5 y
   (data rows lost in source)
> y=log(da$y)
> x=as.matrix(da[,1:5])
> x
   z1 z2 z3 z4 z5
   (rows 1-7: values lost in source)
   (rows 8-20: values lost in source)
> library(leaps)
> help(leaps)
> leaps(x,y,nbest=3)
$which
      1     2     3     4     5
1 FALSE FALSE FALSE  TRUE FALSE
1 FALSE  TRUE FALSE FALSE FALSE
1  TRUE FALSE FALSE FALSE FALSE
2 FALSE  TRUE FALSE  TRUE FALSE
2 FALSE FALSE FALSE  TRUE  TRUE
2 FALSE FALSE  TRUE  TRUE FALSE
3 FALSE  TRUE FALSE  TRUE  TRUE
3  TRUE  TRUE FALSE  TRUE FALSE
3 FALSE  TRUE  TRUE  TRUE FALSE
4  TRUE  TRUE FALSE  TRUE  TRUE
4 FALSE  TRUE  TRUE  TRUE  TRUE
4  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2"           "3"           "4"
[6] "5"

$size
[1] (values lost in source)

$Cp
[1] (values lost in source)

> leaps(x,y,nbest=1)
$which
      1     2     3     4     5
1 FALSE FALSE FALSE  TRUE FALSE
2 FALSE  TRUE FALSE  TRUE FALSE
3 FALSE  TRUE FALSE  TRUE  TRUE
4  TRUE  TRUE FALSE  TRUE  TRUE
5  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2"           "3"           "4"
[6] "5"

$size
[1] (values lost in source)

$Cp
[1] (values lost in source)

> x1=data.frame(x)
> nn=lm(y~.,data=x1)
> step(nn)
Start:  AIC=7.58
y ~ z1 + z2 + z3 + z4 + z5

       Df Sum of Sq RSS AIC
       (table entries lost in source)

Step:  AIC=6.18
y ~ z1 + z2 + z4 + z5

       Df Sum of Sq RSS AIC
       (table entries lost in source)

Step:  AIC=4.8
y ~ z2 + z4 + z5
       Df Sum of Sq RSS AIC
       (table entries lost in source)

Call:
lm(formula = y ~ z2 + z4 + z5, data = x1)

Coefficients:
(Intercept)          z2          z4          z5
   (values lost in source)

> m1=step(nn,trace=F)
> m1

Call:
lm(formula = y ~ z2 + z4 + z5, data = x1)

Coefficients:
(Intercept)          z2          z4          z5
   (values lost in source)

> m2=lm(y~z2+z4+z5,data=x1)
> summary(m2)

Call:
lm(formula = y ~ z2 + z4 + z5, data = x1)

Residuals:
    Min      1Q  Median      3Q     Max
   (values lost in source)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    (values lost in source)
z2             (values lost in source)
z4             (values lost in source)  ***
z5             (values lost in source)

Residual standard error: 1.032 on 16 degrees of freedom
Multiple R-squared: 0.6421, Adjusted R-squared: 0.575
F-statistic: 9.568 on 3 and 16 DF, p-value: (value lost in source)
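The all-subsets search that leaps performs is easy to mimic for small $r$. A Python sketch on simulated data (data, seed, and the "true" subset are my own; the Cp definition is the one from Section 6.2):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(12)
n, r = 50, 5
Z = rng.normal(size=(n, r))
# Simulated truth uses only predictors 1 and 3 (0-indexed).
Y = 1.0 + 2.0 * Z[:, 1] - 1.0 * Z[:, 3] + rng.normal(size=n)

def sse(cols):
    """Residual sum of squares for an intercept plus the chosen columns."""
    X = np.column_stack([np.ones(n)] + [Z[:, j] for j in cols])
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

s2_F = sse(list(range(r))) / (n - r - 1)       # full-model residual mean square

best, total = None, 0
for p in range(1, r + 1):
    for cols in combinations(range(r), p):     # all 2^r - 1 = 31 nonempty subsets
        total += 1
        cp = sse(list(cols)) / s2_F - n + 2 * (p + 1)
        if best is None or cp < best[0]:
            best = (cp, cols)
```

For $r = 5$ the loop visits exactly the 31 regressions mentioned in the example; the $C_p$-minimizing subset should contain both active predictors.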
6.4 Stochastic search variable selection

This approach uses a hierarchical model and Gibbs sampling to perform variable selection. Readers are referred to George and McCulloch (1993, JASA) for details. Given the observed response $Y$ and the design matrix $Z$, the approach assumes that
\[
Y \mid \beta, \sigma^2 \sim N(Z\beta, \sigma^2 I),
\]
where $I$ is the $n\times n$ identity matrix. The key concept of this hierarchical approach is that each component of $\beta = (\beta_0,\dots,\beta_r)'$ follows a mixture of two normal distributions with mean zero but different variances. Specifically,
\[
\beta_i \sim (1-\gamma_i)\,N(0, \tau_i^2) + \gamma_i\,N(0, c_i^2\tau_i^2), \quad i = 0,\dots,r, \tag{8}
\]
where $c_i$ is a sufficiently large positive real number and $\gamma_i$ is a Bernoulli random variable such that
\[
P(\gamma_i = 1) = 1 - P(\gamma_i = 0) = p_i. \tag{9}
\]
The idea is as follows. First, one sets $\tau_i$ close to zero so that, if $\gamma_i = 0$, then $\beta_i$ would probably be so small that it could safely be estimated by 0. Second, one sets $c_i$ large (in practice $c_i > 1$ always) so that, if $\gamma_i = 1$, then a non-zero estimate of $\beta_i$ should probably be included in the final model. See George and McCulloch (1993) for a discussion of selecting $c_i$ and $\tau_i$. Based on this setup, $p_i$ can be regarded as the prior probability that $\beta_i$ requires a non-zero estimate, i.e., that the predictor $Z_i$ should be selected.

From (8), $\beta \mid \gamma \sim N(0, D_\gamma R D_\gamma)$, where $\gamma = (\gamma_0,\gamma_1,\dots,\gamma_r)'$, $R$ is a prior correlation matrix, and $D_\gamma = \mathrm{diag}\{a_0\tau_0, a_1\tau_1,\dots,a_r\tau_r\}$ with $a_i = 1$ if $\gamma_i = 0$ and $a_i = c_i$ if $\gamma_i = 1$. Finally, the prior distribution of $\sigma^2$ given $\gamma$ is inverse gamma, $IG(\nu_\gamma/2, \nu_\gamma\lambda_\gamma/2)$; this says that $\nu_\gamma\lambda_\gamma/\sigma^2$ is distributed as $\chi^2_{\nu_\gamma}$.

Under the framework described above, variable selection is governed by the posterior distribution of $\gamma$ given the data, denoted $f(\gamma \mid Y)$. Markov chain Monte Carlo (MCMC) methods with Gibbs sampling can be used to obtain this posterior distribution.

What happens when $r$ is really large and $n$ is not large?
For instance, $r = 3000$ with a far smaller $n$.

1. Boosting: Bühlmann, P. (2006). Boosting for high-dimensional linear models. Annals of Statistics, 34. Bühlmann, P. and Yu, B. (2003). Boosting with the L2-loss: regression and classification. Journal of the American Statistical Association, 98.

L2 boosting: For simplicity, assume that the data are mean-corrected so that there is no constant term in the linear regression. Boosting is an iterative procedure, as follows. Let $m = 0$ and define $\hat Y^{(0)} = 0$.
(a) Construct the residuals of the $m$th iteration as $R^{(m)} = Y - \hat Y^{(m)}$.
(b) For each $j = 1,\dots,r$, fit the simple linear regression $\hat R_i^{(m)} = \beta_j Z_{ij} + \epsilon_{ij}$ and compute the resulting residual sum of squares.
(c) Let $j_m$ be the index of the explanatory variable that has the smallest residual sum of squares, i.e., the maximum $R^2$ among the $r$ simple linear regressions. Compute the fitted value of this simple linear regression, $\hat Y^{(j_m)} = \hat\beta_{j_m} Z_{j_m}$.
(d) Update the fit as $\hat Y^{(m+1)} = \hat Y^{(m)} + v\hat Y^{(j_m)}$, where $v \in (0,1]$ is a tuning parameter.
(e) Advance $m$ by 1 and go to Step (a).

The iteration is stopped by some information criterion such as AIC. A smaller $v$ requires more iterations, but limited experience shows that it works better; for example, $v = 0.05$ has been used.

2. LASSO: Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58. The LASSO estimate of $\beta$, denoted by $\hat\beta^l$, is obtained as
\[
\hat\beta^l = \arg\min_\beta \sum_{i=1}^n\Big(Y_i - \beta_0 - \sum_{j=1}^r Z_{ij}\beta_j\Big)^2 \quad\text{subject to}\quad \sum_{j=1}^r |\beta_j| \le s,
\]
where $s$ is a tuning parameter. In practice, $s$ can be chosen by cross-validation; typically, 10-fold cross-validation is used.

7 Omitted variables

In linear regression analysis, if some relevant predictors are omitted, then the coefficient estimates may be biased and the residuals may exhibit serial dependence. Suppose that the true model is
\[
Y = Z_1\beta_{(1)} + Z_2\beta_{(2)} + \epsilon,
\]
where $Z_1$ is $n\times(q+1)$ with $q+1 < r+1$. Suppose that the investigator unknowingly fits the model
\[
Y = Z_1\beta_{(1)} + \epsilon_{(1)}.
\]
The LSE of $\beta_{(1)}$ is $\hat\beta_{(1)} = (Z_1'Z_1)^{-1}Z_1'Y$. In this case,
\[
E(\hat\beta_{(1)}) = (Z_1'Z_1)^{-1}Z_1'E(Y) = (Z_1'Z_1)^{-1}Z_1'\big(Z_1\beta_{(1)} + Z_2\beta_{(2)} + E(\epsilon)\big) = \beta_{(1)} + (Z_1'Z_1)^{-1}Z_1'Z_2\beta_{(2)}.
\]
Consequently, $\hat\beta_{(1)}$ is a biased estimate of $\beta_{(1)}$ unless $Z_1'Z_2 = 0$.

Appendix

In this appendix, we consider some useful matrix properties for the linear regression model.

Property 1. Assume that the inverse matrices exist. Then
\[
\begin{bmatrix} A & B \\ B' & D \end{bmatrix}^{-1}
=
\begin{bmatrix}
A^{-1} + FE^{-1}F' & -FE^{-1} \\
-E^{-1}F' & E^{-1}
\end{bmatrix},
\]
where $E = D - B'A^{-1}B$ and $F = A^{-1}B$.

Property 2. Assume that the design matrix $Z_{n\times(r+1)}$ is of full rank $r+1$. Then $H = Z(Z'Z)^{-1}Z'$ and $I - H$ are idempotent matrices.

Property 3. Partition the design matrix as $Z = [Z_1, Z_2]$, where $Z_1$ is an $n\times q$ matrix of full rank $q$. Define the hat matrix of the design matrix $Z_1$ as $H_1 = Z_1(Z_1'Z_1)^{-1}Z_1'$. Regress $Z_2$ on $Z_1$; the residuals of this regression are $W_2 = [I - H_1]Z_2$. Now, treating $W_2$ as a design matrix, construct the associated hat matrix
\[
T = W_2(W_2'W_2)^{-1}W_2'.
\]
Then we have
\[
H = H_1 + T.
\]

Proof: From the partition, we have
\[
Z'Z = \begin{bmatrix} Z_1'Z_1 & Z_1'Z_2 \\ Z_2'Z_1 & Z_2'Z_2 \end{bmatrix}.
\]
To obtain $(Z'Z)^{-1}$, we apply Property 1 with $A = Z_1'Z_1$ and use the fact that $I - H_1$ is an idempotent matrix. In particular,
\[
E = D - B'A^{-1}B = Z_2'Z_2 - (Z_2'Z_1)(Z_1'Z_1)^{-1}(Z_1'Z_2) = Z_2'[I - Z_1(Z_1'Z_1)^{-1}Z_1']Z_2 = Z_2'(I - H_1)Z_2 = W_2'W_2,
\]
\[
F = (Z_1'Z_1)^{-1}Z_1'Z_2,
\]
and
\[
FE^{-1} = (Z_1'Z_1)^{-1}Z_1'Z_2\,[Z_2'(I - H_1)Z_2]^{-1}.
\]
Note also that
\[
Z_1(A^{-1} + FE^{-1}F')Z_1' = H_1 + H_1Z_2E^{-1}Z_2'H_1.
\]
Finally, expanding $H = [Z_1, Z_2](Z'Z)^{-1}[Z_1, Z_2]'$ with the block inverse and using $Z_1F = H_1Z_2$,
\begin{align*}
H &= Z_1(A^{-1} + FE^{-1}F')Z_1' - Z_2E^{-1}F'Z_1' - Z_1FE^{-1}Z_2' + Z_2E^{-1}Z_2' \\
&= H_1 + H_1Z_2E^{-1}Z_2'H_1 - Z_2E^{-1}Z_2'H_1 - H_1Z_2E^{-1}Z_2' + Z_2E^{-1}Z_2' \\
&= H_1 - (I - H_1)Z_2E^{-1}Z_2'H_1 + (I - H_1)Z_2E^{-1}Z_2' \\
&= H_1 + (I - H_1)Z_2E^{-1}Z_2'(I - H_1) \\
&= H_1 + W_2(W_2'W_2)^{-1}W_2'.
\end{align*}
This property says that we can project onto the subspace spanned by $Z_1$ and then onto the subspace of $Z_2$ that is orthogonal to $Z_1$.

Property 4. Let $Z_1 = \mathbf{1}$ be the $n$-dimensional vector of ones. Applying Property 3, we have
\[
H = \mathbf{1}\mathbf{1}'/n + \tilde H,
\]
where $\tilde H$ is the hat matrix of the centered design matrix $\tilde Z_{n\times r}$ given by
\[
\tilde Z = \begin{bmatrix}
Z_{11} - \bar Z_1 & Z_{12} - \bar Z_2 & \cdots & Z_{1r} - \bar Z_r \\
Z_{21} - \bar Z_1 & Z_{22} - \bar Z_2 & \cdots & Z_{2r} - \bar Z_r \\
\vdots & \vdots & & \vdots \\
Z_{n1} - \bar Z_1 & Z_{n2} - \bar Z_2 & \cdots & Z_{nr} - \bar Z_r
\end{bmatrix},
\]
where $\bar Z_i = \sum_{j=1}^n Z_{ji}/n$ is the mean of the $i$th column of $Z$.

Property 5. Let $z_i'$ be the $i$th row of the design matrix $Z$ and $Z_{(i)}$ be the matrix $Z$ with $z_i'$ removed. Then
\[
(Z_{(i)}'Z_{(i)})^{-1} = (Z'Z - z_iz_i')^{-1} = (Z'Z)^{-1} + \frac{(Z'Z)^{-1}z_iz_i'(Z'Z)^{-1}}{1 - z_i'(Z'Z)^{-1}z_i}.
\]

Proof: Let $A = Z'Z$, $a = z_i'$, and $b = -z_i'$. The result then follows from the identity
\[
(A + a'b)^{-1} = A^{-1} - A^{-1}a'(I + bA^{-1}a')^{-1}bA^{-1},
\]
where $A$ is a $k\times k$ symmetric and invertible matrix and $a$ and $b$ are $q\times k$ matrices of rank $q$.

Note that $z_i'\hat\beta = \hat Y_i$ is the fitted value of the $i$th observation, so the $i$th residual is $\hat\epsilon_i = Y_i - z_i'\hat\beta$. Also, by definition, $h_{ii} = z_i'(Z'Z)^{-1}z_i$.
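Property 5 is a rank-one downdate of $(Z'Z)^{-1}$ and can be checked numerically by deleting a row and inverting directly. A Python sketch (simulated design, my own sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 12
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
i = 4                                           # observation to delete
zi = Z[i]                                       # z_i' is the i-th row of Z
Z_del = np.delete(Z, i, axis=0)                 # Z_(i)

A_inv = np.linalg.inv(Z.T @ Z)                  # (Z'Z)^{-1}
h_ii = zi @ A_inv @ zi                          # leverage h_ii = z_i'(Z'Z)^{-1} z_i

lhs = np.linalg.inv(Z_del.T @ Z_del)            # (Z_(i)'Z_(i))^{-1}, computed directly
rhs = A_inv + np.outer(A_inv @ zi, zi @ A_inv) / (1.0 - h_ii)   # Property 5
```

The agreement of `lhs` and `rhs` is exactly the Sherman-Morrison-style identity in the proof.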
Property 6.
\[
\hat\beta_{(i)} - \hat\beta = -\frac{(Z'Z)^{-1}z_i\,\hat\epsilon_i}{1 - h_{ii}}.
\]

Proof. Note that $Z'Y = Z_{(i)}'Y_{(i)} + z_iY_i$, where $Y_i$ is the $i$th element of $Y$ and $Y_{(i)}$ is the vector $Y$ with $Y_i$ removed. Using Property 5, we have
\begin{align*}
\hat\beta_{(i)} &= (Z_{(i)}'Z_{(i)})^{-1}Z_{(i)}'Y_{(i)} \\
&= \left[(Z'Z)^{-1} + \frac{(Z'Z)^{-1}z_iz_i'(Z'Z)^{-1}}{1 - h_{ii}}\right](Z'Y - z_iY_i) \\
&= \hat\beta + \frac{(Z'Z)^{-1}z_iz_i'\hat\beta}{1 - h_{ii}} - (Z'Z)^{-1}z_iY_i - \frac{(Z'Z)^{-1}z_i h_{ii}Y_i}{1 - h_{ii}} \\
&= \hat\beta + \frac{(Z'Z)^{-1}z_i\hat Y_i}{1 - h_{ii}} - \frac{(Z'Z)^{-1}z_iY_i}{1 - h_{ii}} \\
&= \hat\beta - \frac{(Z'Z)^{-1}z_i\hat\epsilon_i}{1 - h_{ii}}.
\end{align*}
Consequently,
\[
\hat\beta_{(i)} - \hat\beta = -\frac{(Z'Z)^{-1}z_i\,\hat\epsilon_i}{1 - h_{ii}}.
\]

Property 7. Cook's distance can be written as
\[
D_i = \frac{1}{r+1}\cdot\frac{\hat\epsilon_i^2\,h_{ii}}{s^2(1 - h_{ii})^2}.
\]
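Properties 6 and 7 can be confirmed by deleting one observation, refitting, and comparing against the closed forms. An illustrative Python sketch (simulated data, my own choices):

```python
import numpy as np

rng = np.random.default_rng(10)
n, r = 15, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
Y = Z @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

A_inv = np.linalg.inv(Z.T @ Z)
beta_hat = A_inv @ (Z.T @ Y)
resid = Y - Z @ beta_hat
s2 = resid @ resid / (n - r - 1)
h = np.sum((Z @ A_inv) * Z, axis=1)            # leverages h_ii = z_i'(Z'Z)^{-1} z_i

i = 3                                           # observation to delete
Zd, Yd = np.delete(Z, i, axis=0), np.delete(Y, i)
beta_del = np.linalg.solve(Zd.T @ Zd, Zd.T @ Yd)           # beta_hat_(i), refit directly

diff_formula = -A_inv @ Z[i] * resid[i] / (1.0 - h[i])     # Property 6
d = beta_del - beta_hat
D_def = d @ (Z.T @ Z) @ d / ((r + 1) * s2)                 # Cook's distance, Eq. (4)
D_short = resid[i]**2 * h[i] / ((r + 1) * s2 * (1.0 - h[i])**2)   # Property 7
```

Both routes to $D_i$ agree, which is the point of the appendix: the full leave-one-out refit is never needed in practice.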
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationThe Multiple Regression Model Estimation
Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationRegularization: Ridge Regression and the LASSO
Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression
More information18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013
18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are
More informationLeverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.
Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response
More informationLecture 4 Multiple linear regression
Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters
More informationUNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75
More information3 Multiple Linear Regression
3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is
More information4.1 Order Specification
THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2009, Mr Ruey S Tsay Lecture 7: Structural Specification of VARMA Models continued 41 Order Specification Turn to data
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationBIOS 2083 Linear Models c Abdus S. Wahed
Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationMultiple Linear Regression
Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).
More informationMLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project
MLR Model Checking Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en
More informationMath 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University
Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions
More informationGeneral Linear Model: Statistical Inference
Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More information10. Time series regression and forecasting
10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the
More information2.2 Classical Regression in the Time Series Context
48 2 Time Series Regression and Exploratory Data Analysis context, and therefore we include some material on transformations and other techniques useful in exploratory data analysis. 2.2 Classical Regression
More informationChapter 3. Linear Models for Regression
Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationAuto correlation 2. Note: In general we can have AR(p) errors which implies p lagged terms in the error structure, i.e.,
1 Motivation Auto correlation 2 Autocorrelation occurs when what happens today has an impact on what happens tomorrow, and perhaps further into the future This is a phenomena mainly found in time-series
More informationMLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project
MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More information1 Introduction 1. 2 The Multiple Regression Model 1
Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the
More informationChapter 3: Multiple Regression. August 14, 2018
Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More informationHow the mean changes depends on the other variable. Plots can show what s happening...
Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How
More informationSpecification Errors, Measurement Errors, Confounding
Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model
More informationLeast Squares Estimation
Least Squares Estimation Using the least squares estimator for β we can obtain predicted values and compute residuals: Ŷ = Z ˆβ = Z(Z Z) 1 Z Y ˆɛ = Y Ŷ = Y Z(Z Z) 1 Z Y = [I Z(Z Z) 1 Z ]Y. The usual decomposition
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationStatistics 910, #5 1. Regression Methods
Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationIn the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)
RNy, econ460 autumn 04 Lecture note Orthogonalization and re-parameterization 5..3 and 7.. in HN Orthogonalization of variables, for example X i and X means that variables that are correlated are made
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationEmpirical Economic Research, Part II
Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction
More information1 Least Squares Estimation - multiple regression.
Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationMultivariate Regression Analysis
Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationNo other aids are allowed. For example you are not allowed to have any other textbook or past exams.
UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationReview of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley
Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate
More informationChapter 12: Multiple Linear Regression
Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as
More information9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures
FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models
More informationDiagnostics for Linear Models With Functional Responses
Diagnostics for Linear Models With Functional Responses Qing Shen Edmunds.com Inc. 2401 Colorado Ave., Suite 250 Santa Monica, CA 90404 (shenqing26@hotmail.com) Hongquan Xu Department of Statistics University
More informationReview: Second Half of Course Stat 704: Data Analysis I, Fall 2014
Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2014 1 / 13 Chapter 8: Polynomials & Interactions
More informationAMS-207: Bayesian Statistics
Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model
More informationLecture 4: Regression Analysis
Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationMultiple Linear Regression
Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationFöreläsning /31
1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationMIT Spring 2015
Regression Analysis MIT 18.472 Dr. Kempthorne Spring 2015 1 Outline Regression Analysis 1 Regression Analysis 2 Multiple Linear Regression: Setup Data Set n cases i = 1, 2,..., n 1 Response (dependent)
More informationModel Selection Procedures
Model Selection Procedures Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Model Selection Procedures Consider a regression setting with K potential predictor variables and you wish to explore
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R
More informationLinear Regression Models
Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction
More informationVector Auto-Regressive Models
Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions
More information