
THE UNIVERSITY OF CHICAGO
Booth School of Business
Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay

Lecture 4: Multivariate Linear Regression

Linear regression analysis is one of the most widely used statistical methods. We shall start with the multiple linear regression (MLR) analysis before studying the multivariate linear regression model. Our discussion of the multiple linear regression uses matrix algebra.

1 The classical linear regression model

The multiple linear regression model consists of a single response variable with multiple predictors and assumes the form
\[
Y_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \cdots + \beta_r Z_{ir} + \epsilon_i, \quad i = 1, \ldots, n, \tag{1}
\]
where the error terms $\{\epsilon_i\}$ satisfy

1. $E(\epsilon_i) = 0$ for all $i$;
2. $\mathrm{Var}(\epsilon_i) = \sigma^2$, a constant; and
3. $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ if $i \neq j$.

In matrix notation, Eq. (1) becomes
\[
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & Z_{11} & Z_{12} & \cdots & Z_{1r} \\
1 & Z_{21} & Z_{22} & \cdots & Z_{2r} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & Z_{n1} & Z_{n2} & \cdots & Z_{nr}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix},
\]
or
\[
Y_{n \times 1} = Z_{n \times (r+1)} \, \beta_{(r+1) \times 1} + \epsilon_{n \times 1},
\]
and the assumptions become

1. $E(\epsilon) = 0$; and
2. $\mathrm{Cov}(\epsilon) = E(\epsilon \epsilon') = \sigma^2 I_{n \times n}$.

The matrix $Z$ is referred to as the design matrix.
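As a small illustration of model (1), the following R sketch simulates a data set with hypothetical values ($n = 50$, $r = 3$ predictors, $\sigma = 2$, and an arbitrary coefficient vector) and fits it with lm; the same simulated example is reused in later sketches.

# A minimal sketch of model (1) with simulated data (all values are hypothetical)
set.seed(1)
n <- 50; r <- 3
Z <- cbind(1, matrix(rnorm(n * r), n, r))   # design matrix: column of ones plus r predictors
beta <- c(1, 2, 0, -1)                      # assumed true coefficients (beta_0, ..., beta_r)
y <- drop(Z %*% beta + rnorm(n, sd = 2))    # model (1) with sigma = 2
fit <- lm(y ~ Z[, -1])                      # lm() adds the intercept itself
coef(fit)                                   # estimates should be close to beta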

2 Least squares estimation

The least squares estimate (LSE) of $\beta$ is obtained by minimizing the sum of squares of errors,
\[
S(\beta) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 Z_{i1} - \cdots - \beta_r Z_{ir})^2 = (Y - Z\beta)'(Y - Z\beta).
\]
Before deriving the LSE, we first define $\hat{\beta} = (Z'Z)^{-1}Z'Y$ and $H = Z(Z'Z)^{-1}Z'$, assuming that the inverse exists. Then, it is easy to see that $H$ is symmetric and $H^2 = H$. The latter property says that $H$ is an idempotent matrix. Define
\[
\hat{\epsilon} = Y - Z\hat{\beta} = Y - Z(Z'Z)^{-1}Z'Y = [I - H]Y.
\]
Then, using $Z'H = Z'$, we have
\[
Z'\hat{\epsilon} = Z'[I - H]Y = [Z' - Z'H]Y = [Z' - Z']Y = 0.
\]
In other words, $\hat{\epsilon}$ is orthogonal to the column space of the design matrix $Z$.

Result 7.1. Assume that $Z$ is of full rank $r+1$, where $r + 1 \leq n$. The LSE of $\beta$ is
\[
\hat{\beta} = (Z'Z)^{-1}Z'Y.
\]
Proof. Since
\[
Y - Z\beta = Y - Z\hat{\beta} + Z\hat{\beta} - Z\beta = Y - Z\hat{\beta} + Z(\hat{\beta} - \beta),
\]
we have
\begin{align*}
S(\beta) &= (Y - Z\beta)'(Y - Z\beta) \\
&= (Y - Z\hat{\beta})'(Y - Z\hat{\beta}) + (\hat{\beta} - \beta)'Z'Z(\hat{\beta} - \beta) + 2(Y - Z\hat{\beta})'Z(\hat{\beta} - \beta) \\
&= (Y - Z\hat{\beta})'(Y - Z\hat{\beta}) + (\hat{\beta} - \beta)'Z'Z(\hat{\beta} - \beta),
\end{align*}
because $(Y - Z\hat{\beta})'Z = \hat{\epsilon}'Z = 0$. The first term of $S(\beta)$ does not depend on $\beta$ and the second term is the squared length of $Z(\hat{\beta} - \beta)$. Because $Z$ has full rank, $Z(\hat{\beta} - \beta) \neq 0$ if $\beta \neq \hat{\beta}$, so the minimum of $S(\beta)$ is unique and occurs at $\beta = \hat{\beta}$. Consequently, the LSE of $\beta$ is $\hat{\beta} = (Z'Z)^{-1}Z'Y$. QED

The deviations
\[
\hat{\epsilon}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 Z_{i1} - \cdots - \hat{\beta}_r Z_{ir}, \quad i = 1, \ldots, n,
\]
are the residuals of the MLR model. Let $\hat{Y} = Z\hat{\beta} = HY$ denote the fitted values of $Y$. The matrix $H$ is called the hat matrix. The LS residuals then become
\[
\hat{\epsilon} = Y - \hat{Y} = [I - Z(Z'Z)^{-1}Z']Y = [I - H]Y.
\]
Note that $I - H$ is also a symmetric and idempotent matrix.
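The least squares quantities above can be computed directly with matrix algebra and compared with lm(). A sketch, continuing the simulated example from Section 1:

# LSE, hat matrix, and residuals "by hand" (continuing the simulated example)
ZtZ.inv  <- solve(crossprod(Z))              # (Z'Z)^{-1}
beta.hat <- ZtZ.inv %*% crossprod(Z, y)      # beta-hat = (Z'Z)^{-1} Z'y
H        <- Z %*% ZtZ.inv %*% t(Z)           # hat matrix
e.hat    <- drop(y - H %*% y)                # residuals (I - H)y
max(abs(H %*% H - H))                        # H is idempotent (zero up to rounding)
max(abs(crossprod(Z, e.hat)))                # Z' e-hat = 0: residuals orthogonal to columns of Z
cbind(by.hand = drop(beta.hat), via.lm = coef(fit))   # the two computations agree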

Our prior discussion shows that the residuals satisfy (a) $Z'\hat{\epsilon} = 0$ and (b) $\hat{Y}'\hat{\epsilon} = 0$. Also, the residual sum of squares (RSS) is
\[
\mathrm{RSS} = \sum_{i=1}^n \hat{\epsilon}_i^2 = \hat{\epsilon}'\hat{\epsilon} = Y'[I - H]Y.
\]

Sum of squares decomposition: Since $\hat{Y}'\hat{\epsilon} = 0$, we have
\[
Y'Y = (\hat{Y} + \hat{\epsilon})'(\hat{Y} + \hat{\epsilon}) = \hat{Y}'\hat{Y} + \hat{\epsilon}'\hat{\epsilon}.
\]
Since the first column of $Z$ is $\mathbf{1}$, the result $Z'\hat{\epsilon} = 0$ includes $0 = \mathbf{1}'\hat{\epsilon}$. Consequently, we have $\bar{Y} = \bar{\hat{Y}}$. Subtracting $n\bar{Y}^2 = n(\bar{\hat{Y}})^2$ from the prior decomposition, we have
\[
Y'Y - n\bar{Y}^2 = \hat{Y}'\hat{Y} - n(\bar{\hat{Y}})^2 + \hat{\epsilon}'\hat{\epsilon},
\]
or
\[
\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n \hat{\epsilon}_i^2.
\]
The quantity
\[
R^2 = 1 - \frac{\sum_{i=1}^n \hat{\epsilon}_i^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2}
\]
is the coefficient of determination. It is the proportion of the total variation in the $Y_i$'s explained by the predictors $Z_1, \ldots, Z_r$.

Sampling properties:

1. The LSE $\hat{\beta}$ satisfies $E(\hat{\beta}) = \beta$ and $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (Z'Z)^{-1}$.
2. The residuals satisfy $E(\hat{\epsilon}) = 0$ and $\mathrm{Cov}(\hat{\epsilon}) = \sigma^2 [I - H]$.
3. $E(\hat{\epsilon}'\hat{\epsilon}) = (n - r - 1)\sigma^2$, so that, defining
\[
s^2 = \frac{\hat{\epsilon}'\hat{\epsilon}}{n - r - 1} = \frac{Y'[I - H]Y}{n - r - 1},
\]
we have $E(s^2) = \sigma^2$.
4. $\hat{\beta}$ and $\hat{\epsilon}$ are uncorrelated.

Proof. First,
\[
E(\hat{\beta}) = E[(Z'Z)^{-1}Z'Y] = (Z'Z)^{-1}Z'E(Y) = (Z'Z)^{-1}Z'[Z\beta + E(\epsilon)] = \beta.
\]
Also,
\[
\mathrm{Cov}(\hat{\beta}) = (Z'Z)^{-1}Z'\mathrm{Cov}(Y)Z(Z'Z)^{-1} = (Z'Z)^{-1}Z'(\sigma^2 I)Z(Z'Z)^{-1} = \sigma^2 (Z'Z)^{-1}.
\]
Next, $E(\hat{\epsilon}) = [I - H]E(Y) = [I - H]Z\beta = [Z - Z]\beta = 0$. Also, $\mathrm{Cov}(\hat{\epsilon}) = [I - H]\mathrm{Cov}(Y)[I - H]' = \sigma^2 [I - H]$, because $I - H$ is symmetric and idempotent.

4 Next, using tr(h) = tr(z(z Z) 1 Z ) = tr[(z Z) 1 Z Z] = tr(i r+1 ) = r + 1, we have E( ɛ ɛ) = E[tr( ɛ ɛ)] = E[tr( ɛ ɛ )] = tr[cov( ɛ)] = tr[σ 2 (I H)] = σ 2 [tr(i) tr(h)] = σ 2 (n r 1) Finally, Cov( β, ɛ) = Cov[(Z Z) 1 Z Y, (I H)Y ] = (Z Z) 1 Z Cov(Y, Y )(I H) = (Z Z) 1 Z (σ 2 I)(I H) = σ 2 (Z Z) 1 Z (I H) = σ 2 (Z Z) 1 (Z Z ) = 0 so that β and ɛ are uncorrelated Gauss least squares theorem Let Y = Zβ + ɛ, where E(ɛ) = 0, Cov(ɛ) = σ 2 I, and Rank(Z) = r + 1 For any c, the estimator c β of c β has the smallest possible variance among all linear estiamtors of the form a Y that are unbiased for c β Proof For any fixed c, let a Y be an unibased estimator of c β Then, E(a Y ) = c β, whatever the value of β But E(a Y ) = E[a (Zβ + ɛ)] = a Zβ Consequently, c β = a Zβ or equivalently, (c a Z)β = 0 for all β In particular, choosing β = (c a Z), we obtain c = a Z for any unbiased estimator Next, c β = c (Z Z) 1 Z Y a Y, where a = Z(Z Z) 1 c Since E( β) = β, so c ˆβ = a Y is an unbiased estimator of c β The result of the last paragraph says that c = (a ) Z Finally, for any a satisfying the unbiased requirement c = a Z, we have Var(a Y ) = Var(a Zβ + a ɛ) = Var(a ɛ) = σ 2 a a = σ 2 (a a + a ) (a a + a ) = σ 2 [(a a ) (a a ) + (a ) a ], where (a a ) a = (a a ) Z(Z Z) 1 c = (a Z (a ) Z)(Z Z) 1 c = (c c )(Z Z) 1 c = 0 Because a is fixed, and (a a ) (a a ) is positive unless a = a, Var(a Y ) is minimized by the choice of a = a and we have, (a ) Y = Ec (Z Z) 1 Z Y = c ˆβ QED The result says that the LSE c ˆβ is the best linear unbiased estiamtor (BLUE) of c β 3 Inference For the multiple linear regression model in (1), we further assume that ɛ N n (0, σ 2 I) Result 74 β is also the maximum likelihood estimate of β In addition, β N r+1 [β, σ 2 (Z Z) 1 ] and is independent of the residuals ɛ = Y Z β Let σ 2 be the maximum likelihood estimate of σ 2 Then, n σ 2 = ɛ ɛ σ 2 χ 2 n r 1 Proof Follows what we discussed before for the MLE of multivariate model random sample QED 4

Note that the MLE $\hat{\sigma}^2$ of $\sigma^2$ is $\hat{\epsilon}'\hat{\epsilon}/n$, which is different from the LSE $s^2$.

Result 7.5. For the Gaussian MLR model, a $100(1-\alpha)$ percent confidence region for $\beta$ is given by
\[
(\hat{\beta} - \beta)'Z'Z(\hat{\beta} - \beta) \leq (r+1)\, s^2\, F_{r+1, n-r-1}(\alpha),
\]
where $s^2 = \hat{\epsilon}'\hat{\epsilon}/(n-r-1)$ is the LSE of $\sigma^2$. Also, simultaneous $100(1-\alpha)$ percent confidence intervals for the $\beta_i$ are
\[
\hat{\beta}_i \pm \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_i)} \sqrt{(r+1) F_{r+1, n-r-1}(\alpha)}, \quad i = 0, 1, \ldots, r,
\]
where $\widehat{\mathrm{Var}}(\hat{\beta}_i)$ is the diagonal element of $s^2 (Z'Z)^{-1}$ corresponding to $\hat{\beta}_i$.

Proof. Consider the vector $V = (Z'Z)^{1/2}(\hat{\beta} - \beta)$, which is normally distributed with mean zero and covariance matrix $\sigma^2 I$. Consequently, $V'V \sim \sigma^2 \chi^2_{r+1}$. In addition, $(n-r-1)s^2 = \hat{\epsilon}'\hat{\epsilon}$ is distributed as $\sigma^2 \chi^2_{n-r-1}$ and is independent of $V$. The result then follows. QED

Remark: The R command for MLR is lm, which stands for linear model.

Likelihood ratio tests for the regression parameters: Consider
\[
H_0: \beta_{q+1} = \beta_{q+2} = \cdots = \beta_r = 0 \quad \text{vs} \quad H_a: \beta_i \neq 0 \ \text{for some } q+1 \leq i \leq r.
\]
Under $H_0$, the model is
\[
Y = Z_1 \beta_{(1)} + \epsilon. \tag{2}
\]
Under $H_a$, the model is
\[
Y = Z_1 \beta_{(1)} + Z_2 \beta_{(2)} + \epsilon, \tag{3}
\]
where
\[
Z = [Z_1, Z_2], \qquad \beta = \begin{bmatrix} \beta_{(1)} \\ \beta_{(2)} \end{bmatrix}.
\]

Result 7.6. Let $Z$ have full rank $r+1$ and $\epsilon \sim N_n(0, \sigma^2 I)$. The likelihood ratio test for the null hypothesis $H_0: \beta_{(2)} = 0$ is based on
\[
\frac{[SS_{res}(Z_1) - SS_{res}(Z)]/(r-q)}{s^2} \sim F_{r-q,\, n-r-1},
\]
where $SS_{res}(Z_1)$ and $SS_{res}(Z)$ are the residual sums of squares of the models in (2) and (3), respectively.

Proof: Under the model in (3), the maximized likelihood function is
\[
L(\hat{\beta}, \hat{\sigma}^2) = \frac{1}{(2\pi)^{n/2}\hat{\sigma}^n} e^{-n/2},
\]

where $\hat{\beta} = (Z'Z)^{-1}Z'Y$ and $\hat{\sigma}^2 = (Y - Z\hat{\beta})'(Y - Z\hat{\beta})/n$. On the other hand, under the submodel in (2), the maximized likelihood function is
\[
L(\hat{\beta}_{(1)}, \hat{\sigma}_1^2) = \frac{1}{(2\pi)^{n/2}\hat{\sigma}_1^n} e^{-n/2},
\]
where $\hat{\beta}_{(1)} = (Z_1'Z_1)^{-1}Z_1'Y$ and $\hat{\sigma}_1^2 = (Y - Z_1\hat{\beta}_{(1)})'(Y - Z_1\hat{\beta}_{(1)})/n$. Thus, the likelihood ratio is
\[
\frac{L(\hat{\beta}_{(1)}, \hat{\sigma}_1^2)}{L(\hat{\beta}, \hat{\sigma}^2)} = \left( \frac{\hat{\sigma}_1^2}{\hat{\sigma}^2} \right)^{-n/2} = \left( 1 + \frac{\hat{\sigma}_1^2 - \hat{\sigma}^2}{\hat{\sigma}^2} \right)^{-n/2},
\]
which gives rise to the test statistic
\[
\frac{n(\hat{\sigma}_1^2 - \hat{\sigma}^2)/(r-q)}{n\hat{\sigma}^2/(n-r-1)} = \frac{[SS_{res}(Z_1) - SS_{res}(Z)]/(r-q)}{s^2} \sim F_{r-q,\, n-r-1}.
\]
This completes the proof.

Alternatively, one can construct a matrix $C$ such that the null hypothesis becomes $H_0: C\beta = 0$. In this way, $C\hat{\beta} \sim N_{r-q}(C\beta, \sigma^2 C(Z'Z)^{-1}C')$, which can be used to perform the test.

4 Inferences from the fitted model

Consider a specific point of interest, say $z_0 = (1, z_{01}, \ldots, z_{0r})'$, in the design-matrix space. Then, the model says $E(Y_0 \mid z_0) = z_0'\beta$, and the LSE of this expectation is $z_0'\hat{\beta}$. In addition, $\mathrm{Var}(z_0'\hat{\beta}) = \sigma^2 z_0'(Z'Z)^{-1}z_0$. Consequently, under the normality assumption, a $100(1-\alpha)\%$ confidence interval for $z_0'\beta$ is
\[
z_0'\hat{\beta} \pm t_{n-r-1}(\alpha/2) \sqrt{z_0'(Z'Z)^{-1}z_0 \, s^2}.
\]

Forecasting: The point prediction of $Y$ at $z_0$ is $z_0'\hat{\beta}$, which is an unbiased estimator. Since $Y_0 = z_0'\beta + \epsilon_0$, the variance of the forecast error is $\sigma^2 (1 + z_0'(Z'Z)^{-1}z_0)$, where we use the property that $\hat{\beta}$ and $\epsilon_0$ are uncorrelated. Therefore, a $100(1-\alpha)\%$ prediction interval for $Y_0$ is
\[
z_0'\hat{\beta} \pm t_{n-r-1}(\alpha/2) \sqrt{s^2 (1 + z_0'(Z'Z)^{-1}z_0)}.
\]

5 Model checking

Studentized residuals: From $\hat{\epsilon} = [I - H]Y$, we have $\mathrm{Cov}(\hat{\epsilon}) = \sigma^2 [I - H]$. In particular, $\mathrm{Var}(\hat{\epsilon}_j) = \sigma^2 (1 - h_{jj})$, for $j = 1, \ldots, n$. The studentized residuals are
\[
\hat{\epsilon}_j^* = \frac{\hat{\epsilon}_j}{\sqrt{s^2 (1 - h_{jj})}}, \quad j = 1, \ldots, n.
\]
If the fitted regression model is adequate, we expect the studentized residuals to look like independent draws from an $N(0, 1)$ distribution.
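The nested-model F test of Result 7.6, the prediction interval of Section 4, and the studentized residuals just defined are all available through standard R functions (anova, predict, rstandard). A sketch, with the simulated data placed in a data frame and a hypothetical new point z0:

# Nested F test, prediction interval, and studentized residuals (continuing the example)
dat <- data.frame(y = y, Z[, -1])
names(dat)[-1] <- c("z1", "z2", "z3")
fit2 <- lm(y ~ z1 + z2 + z3, data = dat)          # full model (3)
fit1 <- lm(y ~ z1, data = dat)                    # submodel (2), dropping z2 and z3
anova(fit1, fit2)                                 # F statistic of Result 7.6

z0 <- data.frame(z1 = 0.5, z2 = -1, z3 = 0.2)     # hypothetical new point
predict(fit2, newdata = z0, interval = "confidence")   # CI for z0' beta
predict(fit2, newdata = z0, interval = "prediction")   # prediction interval for Y0

r.stu <- rstandard(fit2)                          # studentized residuals
max(abs(r.stu - residuals(fit2) / sqrt(summary(fit2)$sigma^2 * (1 - hatvalues(fit2)))))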

5.1 Residual plots

The residuals $\hat{\epsilon}_j$ or studentized residuals $\hat{\epsilon}_j^*$ are used to obtain various residual plots for model checking:

1. Plot $\hat{\epsilon}_j^*$ against the fitted value $\hat{y}_j = z_j'\hat{\beta}$, where $z_j'$ is the $j$th row of the design matrix $Z$. This plot can be used to check (a) the validity of the linear model assumption and (b) the constant variance of $\epsilon_j$.
2. Plot $\hat{\epsilon}_j^*$ against an individual explanatory variable, e.g. $Z_1$. This is less common when the number of regressors is large.
3. QQ-plot of $\hat{\epsilon}_j^*$ to check the normality assumption and possible outliers.
4. Time plot of $\hat{\epsilon}_j^*$ to check for serial correlations. This is often accompanied by the Durbin-Watson statistic
\[
DW = \frac{\sum_{j=2}^n (\hat{\epsilon}_j - \hat{\epsilon}_{j-1})^2}{\sum_{j=1}^n \hat{\epsilon}_j^2} \approx 2(1 - \hat{\rho}_1),
\]
where $\hat{\rho}_1$ is the lag-1 autocorrelation function of the residuals defined as
\[
\hat{\rho}_1 = \frac{\sum_{j=2}^n \hat{\epsilon}_j \hat{\epsilon}_{j-1}}{\sum_{j=1}^n \hat{\epsilon}_j^2}.
\]
The range of the DW statistic is $[0, 4]$ with 2 as the ideal value. A DW statistic greater than 2 indicates negative correlation between the residuals. In practice, when the data have time or spatial characteristics, one should also check higher lags of autocorrelations of the residuals.

5.2 High leverage points and influential observations

From $\hat{Y} = HY$, we have
\[
\hat{y}_j = \sum_{i=1}^n h_{ji} y_i = h_{jj} y_j + \sum_{i \neq j} h_{ji} y_i.
\]
In addition, it can be shown that $0 < h_{jj} < 1$ for all $j$. In fact, $\sum_{j=1}^n h_{jj} = \mathrm{tr}(H) = r + 1$, under the assumption that $Z$ is of full rank $r+1$. Thus, if $h_{jj}$ is large relative to the other $h_{ji}$ (in magnitude), then $y_j$ will be a major contributor to the fitted value $\hat{y}_j$. Consequently, $h_{jj}$ is called the leverage of the $j$th observation. A large $h_{jj}$ tends to pull the regression line toward the $j$th data point.

The leverage $h_{jj}$ has another interpretation. It measures the distance of $z_j$ to the center of the explanatory variables, where $z_j'$ is the $j$th row of the design matrix $Z$. For instance, consider the simple linear regression $y_j = \beta_0 + \beta_1 z_j + \epsilon_j$. It can be shown that
\[
h_{jj} = \frac{1}{n} + \frac{(z_j - \bar{z})^2}{\sum_{i=1}^n (z_i - \bar{z})^2}.
\]

Consequently, if the $j$th data point of the explanatory variables is far away from the center, then it has a high leverage and pulls the model fit toward itself. It is, therefore, useful in linear regression analysis to check the high leverage points.

Influential observations of a linear regression model are defined as those points that significantly affect the inferences drawn from the data. Methods for assessing the influence are often derived from the change in the LSE $\hat{\beta}$ when observations are removed from the data. The best-known statistic for assessing influential observations is Cook's distance. The Cook's distance for the $i$th observation is defined as
\[
D_i = \frac{(\hat{\beta}_{(i)} - \hat{\beta})'(Z'Z)(\hat{\beta}_{(i)} - \hat{\beta})}{(r+1)s^2}, \tag{4}
\]
where $\hat{\beta}_{(i)}$ is the LSE of $\beta$ with the $i$th data point removed, and $s^2 = \hat{\epsilon}'\hat{\epsilon}/(n-r-1)$ is the LSE of $\sigma^2$. See Cook (1977, Technometrics). It is the squared distance between $\hat{\beta}$ and $\hat{\beta}_{(i)}$ relative to the fixed geometry of $Z'Z$. A large $D_i$ indicates that the $i$th data point is influential, because removing it from the data leads to a substantial change in the parameter estimates. It can be shown that
\[
D_i = \frac{1}{r+1} \, \frac{\hat{\epsilon}_i^2 \, h_{ii}}{s^2 (1 - h_{ii})^2} = \frac{1}{r+1} \, \frac{h_{ii}}{1 - h_{ii}} \, (\hat{\epsilon}_i^*)^2,
\]
where $\hat{\epsilon}_i^*$ is the studentized residual. This expression leads to several interpretations of Cook's distance. For instance, $h_{ii}/(1 - h_{ii})$ is a monotonic function of the leverage $h_{ii}$, and $\hat{\epsilon}_i^*$ is large for an outlying observation. Thus, $D_i$ is the product of a random deviation and a leverage measure. In addition, $h_{ii}/(1 - h_{ii}) = \mathrm{Var}(\hat{Y}_i)/\mathrm{Var}(\hat{\epsilon}_i)$. It can also be shown that $z_i'(Z_{(i)}'Z_{(i)})^{-1}z_i = h_{ii}/(1 - h_{ii})$. Finally,
\[
\frac{h_{ii}}{1 - h_{ii}} = \frac{\sum_{j=1}^n \mathrm{Var}(z_j'\hat{\beta}_{(i)}) - \sum_{j=1}^n \mathrm{Var}(z_j'\hat{\beta})}{\sigma^2}.
\]
Thus, $h_{ii}/(1 - h_{ii})$ is proportional to the total change in the variance of prediction at $z_1, \ldots, z_n$ when $z_i$ is deleted.
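Leverages and Cook's distances are returned by hatvalues() and cooks.distance(); the following sketch, still with the simulated example, also checks the closed-form expression for $D_i$ given above.

# Leverage and Cook's distance (continuing the simulated example)
h <- hatvalues(fit2)                        # leverages h_ii
sum(h)                                      # equals r + 1 = 4
D <- cooks.distance(fit2)                   # Cook's distance, Eq. (4)
# closed form: D_i = rstandard_i^2 * h_ii / ((1 - h_ii) * (r + 1))
max(abs(D - rstandard(fit2)^2 * h / ((1 - h) * length(coef(fit2)))))
which(D > 4 / nrow(dat))                    # a common rough cutoff for flagging influential points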

6 Variable selection

For the MLR model considered, we have $r$ predictors. In theory, there are $2^r - 1$ possible linear regression models available. Because the $r$ predictors may be highly correlated, this can lead to the problem of multi-collinearity, which, in turn, results in an unstable or overfitted model. An important question then is to select the best MLR model among those candidate models. This is the variable selection problem in MLR analysis.

The problem of variable selection has been widely studied in the literature. It is an ongoing project for many statisticians and researchers. The current research investigates variable selection when both $r$ and $n$ (the sample size) go to infinity. See LASSO, boosting, and some Bayesian methods in the literature.

In this lecture, we focus on the traditional approach of variable selection. That is, $r$ is fixed, but $n$ may increase. The methods we discuss include (a) stepwise regression, (b) Mallows' $C_p$ criterion, (c) information criteria such as AIC and BIC, and (d) stochastic search variable selection by George and McCulloch (1993, JASA, Variable selection via Gibbs sampling).

6.1 Stepwise regression

It is an iterative procedure alternating between forward selection and backward elimination. The procedure requires two prespecified thresholds, which I shall denote by $F_{in}$ and $F_{out}$. Typically, $F_{in}$ and $F_{out}$ are squares of some critical values of a Student-$t$ distribution with $n-1$ degrees of freedom. In some cases, one may use $F_{in} = F_{out} = 4$. To describe the procedure, we divide the $r$ regressors into two subsets, say $S_{in}$ and $S_{out}$, where $S_{in}$ contains all the regressors already selected and $S_{out}$ denotes the remaining regressors. $S_{in}$ or $S_{out}$ may be the empty set. Suppose that $S_{in}$ has $p$ variables and denote the corresponding MLR model as
\[
Y_i = \beta_0 + \beta_1 Z_{i1} + \cdots + \beta_p Z_{ip} + e_i, \quad i = 1, \ldots, n. \tag{5}
\]

Forward-selection step: The goal of forward selection is to expand the regression by adding a regressor to the model. Let $\hat{e}_i$ be the residuals of the MLR in (5).

1. Find the variable, say $Z_{p+1}$, in $S_{out}$ that has the maximum correlation with $\hat{e}_i$.
2. Perform the MLR
\[
Y_i = \beta_0 + \beta_1 Z_{i1} + \cdots + \beta_p Z_{ip} + \beta_{p+1} Z_{i(p+1)} + e_i, \quad i = 1, \ldots, n.
\]
3. Let $t_{p+1}$ be the t-ratio of the LSE $\hat{\beta}_{p+1}$ of the prior regression. Compare $t_{p+1}^2$ with $F_{in}$. If $t_{p+1}^2 \geq F_{in}$, then move $Z_{p+1}$ from $S_{out}$ to $S_{in}$ and go to the backward-elimination step. If $t_{p+1}^2 < F_{in}$, stop the procedure.

Backward-elimination step: The goal of backward elimination is to simplify the existing model by removing one variable from $S_{in}$. Again, consider the existing model in (5).

1. Find the regressor in $S_{in}$ that has the smallest t-ratio, in absolute value, in the MLR in (5). Denote the regressor by $Z_j$ and the corresponding t-ratio by $t_j$. Compare $t_j^2$ with $F_{out}$. If $t_j^2 < F_{out}$, move $Z_j$ from $S_{in}$ to $S_{out}$ and go to the forward-selection step. On the other hand, if $t_j^2 \geq F_{out}$, do not move $Z_j$ and go to the forward-selection step.

In practice, one starts with $S_{in}$ being the empty set and $S_{out}$ containing all regressors. One can also start with the backward-elimination step with $S_{in}$ containing all regressors. The stepwise procedure will stop after a certain number of iterations. Obviously, the result depends on the choices of $F_{in}$ and $F_{out}$.
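A small sketch of one forward-selection iteration as described above, with $F_{in} = 4$ and with the simulated variables standing in for $S_{in}$ and $S_{out}$ (both sets are hypothetical here):

# One forward-selection step (sketch; F.in = 4 as in the text)
F.in  <- 4
S.in  <- c("z1")                                        # suppose z1 has already been selected
S.out <- c("z2", "z3")
res   <- residuals(lm(y ~ ., data = dat[, c("y", S.in)]))
cand  <- S.out[which.max(abs(cor(dat[S.out], res)))]    # regressor most correlated with residuals
fit.new <- lm(y ~ ., data = dat[, c("y", S.in, cand)])
t.cand  <- summary(fit.new)$coefficients[cand, "t value"]
if (t.cand^2 >= F.in) S.in <- c(S.in, cand)             # accept and move to backward elimination
S.in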

6.2 Mallows' C_p

Mallows (1973, Technometrics, Vol. 15) proposes the following criterion for model selection:
\[
C_p = \frac{SSE_p}{s_F^2} - n + 2(p+1),
\]
where $p$ denotes the number of variables in the regression
\[
Y_i = \beta_0 + \beta_1 Z_{i1}^* + \cdots + \beta_p Z_{ip}^* + \epsilon_i, \quad i = 1, \ldots, n, \tag{6}
\]
where $\{Z_1^*, \ldots, Z_p^*\}$ is a subset of the available regressors $\{Z_1, \ldots, Z_r\}$, $SSE_p = \sum_{i=1}^n (Y_i - \hat{Y}_{pi})^2$ with $\hat{Y}_{pi}$ denoting the fitted value of the MLR in (6), and $s_F^2$ is the residual mean squared error of the MLR model in (1), i.e., the residual mean squared error when ALL predictors are used. Here we use the subscript $F$ to denote the FULL model.

Suppose that the subset model in (6) is the true model. Then, $s_F^2$ converges to $\sigma^2$ (the variance of $\epsilon_i$) as $n$ increases, and
\[
E(C_p) \approx \frac{E(SSE_p)}{\sigma^2} - n + 2(p+1) = \frac{(n-p-1)\sigma^2}{\sigma^2} - n + 2(p+1) = p + 1.
\]
Therefore, in practice, one selects a model such that $C_p$ is less than, but close to, $p + 1$.

Remark: In the discussion of $C_p$, I use a constant term in the MLR. If no constant term is used or the constant term is treated as a predictor, then $C_p$ is defined as
\[
C_p = \frac{SSE_p}{s_F^2} - n + 2p.
\]

6.3 Information criteria

Assume that $\epsilon_i$ is normally distributed. For a given subset of regressors, say $S_{in} = \{Z_1^*, \ldots, Z_p^*\}$, one estimates the model parameters of the MLR in (6) by the maximum likelihood method. As shown before, the estimate of the coefficient vector $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$ is the same as the LSE. However, the MLE of the variance of $\epsilon_i$ is
\[
\tilde{\sigma}^2 = \frac{SSE}{n},
\]
where $SSE$ denotes the sum of squared errors. The AIC criterion for the fitted MLR model is then
\[
AIC(p) = \ln(\tilde{\sigma}^2) + \frac{2p}{n}. \tag{7}
\]
The first term of the AIC is a goodness-of-fit statistic measuring the fidelity of the model to the data, and the second term is a penalty function. From the definition, AIC uses a penalty of 2 for each parameter used. Note that the constant term is assumed to be included in all models so that it is not counted. In practice, one entertains a set of possible MLR models for $Y_i$ and selects the model that has the smallest AIC value. If one changes the penalty term to $p\ln(n)/n$, then the criterion becomes the BIC.
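The $C_p$ of a candidate subset and the AIC of Eq. (7) are easy to compute by hand; the sketch below does so for one hypothetical subset of the simulated example. (R's built-in AIC() is on the $-2\log$-likelihood scale, so its values differ by constants but rank models the same way.)

# Mallows' C_p and AIC/BIC for a candidate subset (continuing the example)
full <- lm(y ~ z1 + z2 + z3, data = dat)         # full model; s2F is its residual mean squared error
s2F  <- summary(full)$sigma^2
sub  <- lm(y ~ z1 + z3, data = dat)              # a candidate subset with p = 2 regressors
p    <- length(coef(sub)) - 1
n.obs <- nrow(dat)
Cp   <- sum(residuals(sub)^2) / s2F - n.obs + 2 * (p + 1)
Cp                                               # compare with p + 1 = 3
aic  <- log(sum(residuals(sub)^2) / n.obs) + 2 * p / n.obs        # Eq. (7)
bic  <- log(sum(residuals(sub)^2) / n.obs) + p * log(n.obs) / n.obs
c(AIC.eq7 = aic, BIC = bic)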

Example. Consider the silver-zinc battery data set of the textbook. The dependent variable is the logarithm of the cycles to failure of the battery. The five predictors are (1) charge rate ($Z_1$), (2) discharge rate ($Z_2$), (3) depth of discharge ($Z_3$), (4) temperature ($Z_4$), and (5) end of charge voltage ($Z_5$). We shall use the leaps package and the command step in R to carry out the computation. Since $r = 5$ is small, we have $2^5 - 1 = 31$ possible regressions and may entertain all of them in model selection. When $r$ is large, some procedure will become useful in reducing the number of MLR models entertained. The leaps package can be used to consider all possible regressions.

> setwd("c:/users/rst/teaching/ama")
> da=read.table("t7-5.dat",header=T)
> da
   z1 z2 z3 z4 z5  y
   ... (listing of the 20 observations)
> y=log(da$y)
> x=as.matrix(da[,1:5])
> x
      z1 z2 z3 z4 z5
 [1,] ... (20 rows of predictor values)

> library(leaps)
> help(leaps)
>
> leaps(x,y,nbest=3)
$which
      1     2     3     4     5
1 FALSE FALSE FALSE  TRUE FALSE
1 FALSE  TRUE FALSE FALSE FALSE
1  TRUE FALSE FALSE FALSE FALSE
2 FALSE  TRUE FALSE  TRUE FALSE
2 FALSE FALSE FALSE  TRUE  TRUE
2 FALSE FALSE  TRUE  TRUE FALSE
3 FALSE  TRUE FALSE  TRUE  TRUE
3  TRUE  TRUE FALSE  TRUE FALSE
3 FALSE  TRUE  TRUE  TRUE FALSE
4  TRUE  TRUE FALSE  TRUE  TRUE
4 FALSE  TRUE  TRUE  TRUE  TRUE
4  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2"           "3"           "4"
[6] "5"

$size
[1] ...

$Cp
[1] ...

>
> leaps(x,y,nbest=1)

$which
      1     2     3     4     5
1 FALSE FALSE FALSE  TRUE FALSE
2 FALSE  TRUE FALSE  TRUE FALSE
3 FALSE  TRUE FALSE  TRUE  TRUE
4  TRUE  TRUE FALSE  TRUE  TRUE
5  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2"           "3"           "4"
[6] "5"

$size
[1] ...

$Cp
[1] ...

> x1=data.frame(x)
> nn=lm(y~.,data=x1)
> step(nn)
Start:  AIC=7.58
y ~ z1 + z2 + z3 + z4 + z5

       Df Sum of Sq RSS AIC
       ... (one row per candidate deletion; z3 is dropped)

Step:  AIC=6.18
y ~ z1 + z2 + z4 + z5

       Df Sum of Sq RSS AIC
       ... (one row per candidate deletion; z1 is dropped)

Step:  AIC=4.8
y ~ z2 + z4 + z5

       Df Sum of Sq RSS AIC
<none>                  ...
- z2                    ...
- z4                    ...
- z5                    ...

Call:
lm(formula = y ~ z2 + z4 + z5, data = x1)

Coefficients:
(Intercept)           z2           z4           z5
        ...          ...          ...          ...

>
> m1=step(nn,trace=F)
> m1

Call:
lm(formula = y ~ z2 + z4 + z5, data = x1)

Coefficients:
(Intercept)           z2           z4           z5
        ...          ...          ...          ...

> m2=lm(y~z2+z4+z5,data=x1)
> summary(m2)

Call:
lm(formula = y ~ z2 + z4 + z5, data = x1)

Residuals:
    Min      1Q  Median      3Q     Max
    ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...      ...
z2               ...        ...     ...      ...
z4               ...        ...     ...      ...  ***
z5               ...        ...     ...      ...

Residual standard error: 1.032 on 16 degrees of freedom
Multiple R-squared: 0.6421,     Adjusted R-squared: 0.575
F-statistic: 9.568 on 3 and 16 DF,  p-value: ...

6.4 Stochastic search variable selection

This approach uses a hierarchical model and Gibbs sampling to perform variable selection. Readers are referred to George and McCulloch (1993, JASA) for details. Given the observed response $Y$ and the design matrix $Z$, the approach assumes that
\[
Y \mid \beta, \sigma^2 \sim N(Z\beta, \sigma^2 I),
\]
where $I$ is the $n \times n$ identity matrix. The key concept of this hierarchical approach is that each component of $\beta = (\beta_0, \ldots, \beta_r)'$ follows a mixture of two normal distributions with mean zero, but different variances. Specifically,
\[
\beta_i \sim (1 - \gamma_i) N(0, \tau_i^2) + \gamma_i N(0, c_i \tau_i^2), \quad i = 0, \ldots, r, \tag{8}
\]
where $c_i$ is a sufficiently large positive real number, and $\gamma_i$ is a Bernoulli variable such that
\[
P(\gamma_i = 1) = 1 - P(\gamma_i = 0) = p_i. \tag{9}
\]
The idea is as follows: First, one sets $\tau_i$ close to zero so that if $\gamma_i = 0$, then $\beta_i$ would probably be so small that it could be safely estimated by 0. Second, one sets $c_i$ large (in practice $c_i > 1$ always) so that if $\gamma_i = 1$, then a non-zero estimate of $\beta_i$ should probably be included in the final model. See George and McCulloch (1993) for discussion of selecting $c_i$ and $\tau_i$. Based on this setup, $p_i$ can be regarded as the prior probability that $\beta_i$ will require a non-zero estimate, i.e., that the predictor $Z_i$ should be selected.

From (8), $\beta \mid \gamma \sim N(0, D_\gamma R D_\gamma)$, where $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_r)'$, $R$ is a prior correlation matrix, and $D_\gamma = \mathrm{diag}\{a_0 \tau_0, a_1 \tau_1, \ldots, a_r \tau_r\}$, where $a_i = 1$ if $\gamma_i = 0$ and $a_i = c_i$ if $\gamma_i = 1$. Finally, the prior distribution of $\sigma^2$ given $\gamma$ is inverse gamma, $IG(\nu_\gamma/2, \nu_\gamma \lambda_\gamma/2)$. This says that $\nu_\gamma \lambda_\gamma / \sigma^2$ is distributed as $\chi^2_{\nu_\gamma}$.

Under the framework described earlier, the variable selection is governed by the posterior distribution of $\gamma$ given the data. We denote it as $f(\gamma \mid Y)$. Markov chain Monte Carlo (MCMC) methods with Gibbs sampling can be used to obtain this posterior distribution.

What happens when $r$ is really large and $n$ is not large? For instance, $r = 3000$ with a much smaller $n$.

1. Boosting: Bühlmann, P. (2006). Boosting for high-dimensional linear models. Annals of Statistics, 34. Bühlmann and Yu (2003). Boosting with the L2-loss: regression and classification. Journal of the American Statistical Association, 98.

L2 Boosting: For simplicity, assume that the data are mean-corrected so that there is no constant term in the linear regression. Boosting is an iterated procedure as follows. Let $m = 0$ and define $\hat{Y}^{(0)} = 0$.

(a) Construct the residuals of the $m$th iteration as $R^{(m)} = Y - \hat{Y}^{(m)}$.
(b) Fit the $r$ simple linear regressions $R_i^{(m)} = \beta_j Z_{ij} + \epsilon_{ij}$, $j = 1, \ldots, r$, and compute the resulting sums of squares of residuals.
(c) Let $j_m$ be the explanatory variable that has the smallest sum of squares of residuals, i.e., the maximum $R^2$ among the $r$ simple linear regressions. Compute the fitted value of this simple linear regression, $\hat{Y}^{(j_m)} = \hat{\beta}_{j_m} Z_{j_m}$.
(d) Update the fit as $\hat{Y}^{(m+1)} = \hat{Y}^{(m)} + v \hat{Y}^{(j_m)}$, where $v \in (0, 1]$ is a tuning parameter.
(e) Advance $m$ by 1 and go to Step (a).

The iteration is stopped by some information criterion such as AIC. A smaller $v$ requires a longer iteration, but limited experience shows that it works better. For example, $v = 0.05$ has been used.

2. LASSO: Tibshirani (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58.

The LASSO estimate of $\beta$, denoted by $\hat{\beta}_l$, is obtained by
\[
\hat{\beta}_l = \mathop{\mathrm{argmin}}_{\beta} \sum_{i=1}^n \left( Y_i - \beta_0 - \sum_{j=1}^r Z_{ij} \beta_j \right)^2 \quad \text{subject to} \quad \sum_{j=1}^r |\beta_j| \leq s,
\]
where $s$ is a tuning parameter. In practice, $s$ can be chosen by cross-validation. Typically, a 10-fold cross-validation is used.
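The L2 boosting loop (a)-(e) takes only a few lines; the sketch below runs it on the simulated example with $v = 0.05$ and a fixed number of iterations (an information criterion would normally decide when to stop), and then notes how a LASSO fit could be obtained with the glmnet package (assumed installed), which chooses the tuning parameter by 10-fold cross-validation.

# Componentwise L2 boosting, steps (a)-(e), on mean-corrected data (sketch)
Zc <- scale(as.matrix(dat[, c("z1", "z2", "z3")]), center = TRUE, scale = FALSE)
yc <- dat$y - mean(dat$y)
v  <- 0.05                                    # tuning parameter, as suggested in the text
fit.boost  <- rep(0, nrow(Zc))                # Y-hat^(0) = 0
coef.boost <- rep(0, ncol(Zc))
for (m in 1:200) {                            # fixed number of iterations for simplicity
  res <- yc - fit.boost                                  # (a) current residuals
  b   <- drop(crossprod(Zc, res)) / colSums(Zc^2)        # (b) slopes of the r simple regressions
  sse <- colSums((res - sweep(Zc, 2, b, "*"))^2)         #     their residual sums of squares
  j   <- which.min(sse)                                  # (c) best single regressor
  fit.boost     <- fit.boost + v * b[j] * Zc[, j]        # (d) update the fit
  coef.boost[j] <- coef.boost[j] + v * b[j]
}
coef.boost                                    # accumulated coefficients after 200 steps

# LASSO with the glmnet package (assumed available); cv.glmnet does 10-fold CV by default
# library(glmnet)
# cv <- cv.glmnet(as.matrix(dat[, c("z1", "z2", "z3")]), dat$y, nfolds = 10)
# coef(cv, s = "lambda.min")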

7 Omitted variables

In linear regression analysis, if some relevant predictors are omitted, then the coefficient estimates may contain bias and the residuals may have some serial dependence. Suppose that the true model is
\[
Y = Z_1 \beta_{(1)} + Z_2 \beta_{(2)} + \epsilon,
\]
where the column dimension of $Z_1$ is $q + 1 < r + 1$. Suppose that the investigator unknowingly fits the model
\[
Y = Z_1 \beta_{(1)} + \epsilon.
\]
The LSE of $\beta_{(1)}$ is $\hat{\beta}_{(1)} = (Z_1'Z_1)^{-1}Z_1'Y$. In this case,
\[
E(\hat{\beta}_{(1)}) = (Z_1'Z_1)^{-1}Z_1'E(Y) = (Z_1'Z_1)^{-1}Z_1'(Z_1 \beta_{(1)} + Z_2 \beta_{(2)} + E(\epsilon)) = \beta_{(1)} + (Z_1'Z_1)^{-1}Z_1'Z_2 \beta_{(2)}.
\]
Consequently, $\hat{\beta}_{(1)}$ is a biased estimate of $\beta_{(1)}$ unless $Z_1'Z_2 = 0$.

Appendix

In this appendix, we consider some useful matrix properties for the linear regression model.

Property 1. Assume that the inverse matrices exist. Then
\[
\begin{bmatrix} A & B \\ B' & D \end{bmatrix}^{-1}
=
\begin{bmatrix} A^{-1} + F E^{-1} F' & -F E^{-1} \\ -E^{-1} F' & E^{-1} \end{bmatrix},
\]
where $E = D - B'A^{-1}B$ and $F = A^{-1}B$.

Property 2. Assume that the design matrix $Z_{n \times (r+1)}$ is of full rank $r+1$. The matrices $H = Z(Z'Z)^{-1}Z'$ and $I - H$ are idempotent.

Property 3. Partition the design matrix as $Z = [Z_1, Z_2]$, where $Z_1$ is an $n \times q$ matrix of full rank $q$. Define the hat matrix of the design matrix $Z_1$ as $H_1 = Z_1(Z_1'Z_1)^{-1}Z_1'$. Regress $Z_2$ on $Z_1$. The residuals of such a linear regression are $W_2 = [I - H_1]Z_2$. Now, consider $W_2$ as a design matrix and construct the associated hat matrix $T$ as
\[
T = W_2(W_2'W_2)^{-1}W_2'.
\]
Then, we have
\[
H = H_1 + T.
\]
Proof: From the partition, we have
\[
Z'Z = \begin{bmatrix} Z_1'Z_1 & Z_1'Z_2 \\ Z_2'Z_1 & Z_2'Z_2 \end{bmatrix}.
\]
To obtain $(Z'Z)^{-1}$, we apply Property 1 with $A = Z_1'Z_1$ and use the property that $I - H_1$ is an idempotent matrix. In particular,
\[
E = D - B'A^{-1}B = Z_2'Z_2 - (Z_2'Z_1)(Z_1'Z_1)^{-1}(Z_1'Z_2) = Z_2'[I - Z_1(Z_1'Z_1)^{-1}Z_1']Z_2 = Z_2'(I - H_1)Z_2 = W_2'W_2,
\]
\[
F = (Z_1'Z_1)^{-1}Z_1'Z_2,
\]

and
\[
F E^{-1} = (Z_1'Z_1)^{-1}Z_1'Z_2 [Z_2'(I - H_1)Z_2]^{-1}.
\]
Then,
\[
Z_1(A^{-1} + F E^{-1} F')Z_1' = H_1 + H_1 Z_2 E^{-1} Z_2' H_1.
\]
Finally,
\begin{align*}
H &= Z_1(A^{-1} + F E^{-1} F')Z_1' - Z_2 E^{-1} F' Z_1' - Z_1 F E^{-1} Z_2' + Z_2 E^{-1} Z_2' \\
&= H_1 + H_1 Z_2 E^{-1} Z_2' H_1 - Z_2 E^{-1} Z_2' H_1 - H_1 Z_2 E^{-1} Z_2' + Z_2 E^{-1} Z_2' \\
&= H_1 - (I - H_1) Z_2 E^{-1} Z_2' H_1 + (I - H_1) Z_2 E^{-1} Z_2' \\
&= H_1 + (I - H_1) Z_2 E^{-1} Z_2' (I - H_1) \\
&= H_1 + W_2 (W_2'W_2)^{-1} W_2'.
\end{align*}
This property says that we can project on the subspace of $Z_1$, then project on the subspace of $Z_2$ that is orthogonal to $Z_1$.

Property 4. Let $Z_1 = \mathbf{1}$ be the $n$-dimensional vector of ones. Applying Property 3, we have
\[
H = \mathbf{1}\mathbf{1}'/n + \tilde{H},
\]
where $\tilde{H}$ is the hat matrix of the centered design matrix $\tilde{Z}_{n \times r}$ given by
\[
\tilde{Z} =
\begin{bmatrix}
Z_{11} - \bar{Z}_1 & Z_{12} - \bar{Z}_2 & \cdots & Z_{1r} - \bar{Z}_r \\
Z_{21} - \bar{Z}_1 & Z_{22} - \bar{Z}_2 & \cdots & Z_{2r} - \bar{Z}_r \\
\vdots & \vdots & & \vdots \\
Z_{n1} - \bar{Z}_1 & Z_{n2} - \bar{Z}_2 & \cdots & Z_{nr} - \bar{Z}_r
\end{bmatrix},
\]
where $\bar{Z}_i = \sum_{j=1}^n Z_{ji}/n$ is the mean of the $i$th column of $Z$.

Property 5. Let $z_i'$ be the $i$th row of the design matrix $Z$ and $Z_{(i)}$ be the matrix $Z$ with $z_i'$ removed. Then,
\[
(Z_{(i)}'Z_{(i)})^{-1} = (Z'Z - z_i z_i')^{-1} = (Z'Z)^{-1} + \frac{(Z'Z)^{-1} z_i z_i' (Z'Z)^{-1}}{1 - z_i'(Z'Z)^{-1} z_i}.
\]
Proof: Let $A = Z'Z$, $a = z_i'$, and $b = -z_i'$. The result then follows from the identity
\[
(A + a'b)^{-1} = A^{-1} - A^{-1} a' (I + b A^{-1} a')^{-1} b A^{-1},
\]
where $A$ is a $k \times k$ symmetric and invertible matrix and $a$ and $b$ are $q \times k$ matrices of rank $q$.

Note that $z_i'\hat{\beta} = \hat{Y}_i$ is the fitted value of the $i$th observation, so the $i$th residual is $\hat{\epsilon}_i = Y_i - z_i'\hat{\beta}$. Also, from the definition, $h_{ii} = z_i'(Z'Z)^{-1}z_i$.
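Properties 3 and 5 are easy to verify numerically with the simulated design matrix from the earlier sketches:

# Numerical check of Properties 3 and 5 (uses Z, H, and ZtZ.inv from earlier sketches)
Z1 <- Z[, 1:2]; Z2 <- Z[, 3:4]
H1 <- Z1 %*% solve(crossprod(Z1)) %*% t(Z1)
W2 <- (diag(n) - H1) %*% Z2
T.mat <- W2 %*% solve(crossprod(W2)) %*% t(W2)
max(abs(H - (H1 + T.mat)))                     # Property 3: H = H1 + T

i   <- 7                                       # drop an arbitrary row
Zi  <- Z[-i, , drop = FALSE]
lhs <- solve(crossprod(Zi))
rhs <- ZtZ.inv + (ZtZ.inv %*% Z[i, ] %*% t(Z[i, ]) %*% ZtZ.inv) /
       (1 - drop(t(Z[i, ]) %*% ZtZ.inv %*% Z[i, ]))
max(abs(lhs - rhs))                            # Property 5: rank-one downdate of (Z'Z)^{-1}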

Property 6.
\[
\hat{\beta}_{(i)} - \hat{\beta} = -\frac{(Z'Z)^{-1} z_i \hat{\epsilon}_i}{1 - h_{ii}}.
\]
Proof. Note that $Z'Y = Z_{(i)}'Y_{(i)} + z_i Y_i$, where $Y_i$ is the $i$th element of $Y$ and $Y_{(i)}$ is the vector $Y$ with $Y_i$ removed. Using Property 5, we have
\begin{align*}
\hat{\beta}_{(i)} &= (Z_{(i)}'Z_{(i)})^{-1} Z_{(i)}'Y_{(i)} \\
&= \left[ (Z'Z)^{-1} + \frac{(Z'Z)^{-1} z_i z_i' (Z'Z)^{-1}}{1 - h_{ii}} \right] (Z'Y - z_i Y_i) \\
&= \hat{\beta} + \frac{(Z'Z)^{-1} z_i z_i' \hat{\beta}}{1 - h_{ii}} - (Z'Z)^{-1} z_i Y_i - \frac{(Z'Z)^{-1} z_i h_{ii} Y_i}{1 - h_{ii}} \\
&= \hat{\beta} + \frac{(Z'Z)^{-1} z_i \hat{Y}_i}{1 - h_{ii}} - \frac{(Z'Z)^{-1} z_i Y_i}{1 - h_{ii}} \\
&= \hat{\beta} - \frac{(Z'Z)^{-1} z_i \hat{\epsilon}_i}{1 - h_{ii}}.
\end{align*}
Consequently,
\[
\hat{\beta}_{(i)} - \hat{\beta} = -\frac{(Z'Z)^{-1} z_i \hat{\epsilon}_i}{1 - h_{ii}}.
\]

Property 7. Cook's distance can be written as
\[
D_i = \frac{1}{r+1} \, \frac{\hat{\epsilon}_i^2 \, h_{ii}}{s^2 (1 - h_{ii})^2}.
\]
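A numerical check of Properties 6 and 7 with the fit from the earlier sketches (refitting without the $i$th case and comparing with R's cooks.distance):

# Numerical check of Properties 6 and 7 (continuing the simulated example)
h <- hatvalues(fit2); e <- residuals(fit2)
i <- 3                                                     # an arbitrary case to delete
fit.del <- lm(y ~ z1 + z2 + z3, data = dat[-i, ])
delta   <- coef(fit.del) - coef(fit2)                      # beta-hat_(i) - beta-hat
max(abs(delta + drop(ZtZ.inv %*% Z[i, ]) * e[i] / (1 - h[i])))        # Property 6
s2.fit <- summary(fit2)$sigma^2
D.i <- e[i]^2 * h[i] / (length(coef(fit2)) * s2.fit * (1 - h[i])^2)   # Property 7
c(by.formula = D.i, via.R = unname(cooks.distance(fit2)[i]))          # the two agree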
