Chapter 13 Variable Selection and Model Building


The complete regression analysis depends on the explanatory variables present in the model. It is understood in regression analysis that only correct and important explanatory variables appear in the model. In practice, after ensuring the correct functional form of the model, the analyst usually has a pool of explanatory variables which possibly influence the process or experiment. Generally, not all such candidate variables are used in the regression modelling; rather, a subset of explanatory variables is chosen from this pool. How to determine an appropriate subset of explanatory variables to be used in the regression is called the problem of variable selection.

While choosing a subset of explanatory variables, there are two possible options:
1. In order to make the model as realistic as possible, the analyst may include as many explanatory variables as possible.
2. In order to make the model as simple as possible, the analyst may include only a small number of explanatory variables.

Both approaches have their own consequences. In fact, model building and subset selection have contradictory objectives. When a large number of variables is included in the model, all these factors can influence the prediction of the study variable y. On the other hand, when a small number of variables is included, the predictive variance of ŷ decreases. Also, collecting observations on more variables involves more cost, time, labour etc. A compromise between these consequences is struck to select the best regression equation.

The problem of variable selection is addressed assuming that the functional form of each explanatory variable, e.g., x, 1/x, log x etc., is known and no outliers or influential observations are present in the data. Various statistical tools like residual analysis, identification of influential or high-leverage observations, model adequacy etc. are linked to variable selection. In fact, all these processes should be handled simultaneously.
Usually, these steps are employed iteratively. In the first step, a strategy for variable selection is adopted and the model is fitted with the selected variables. The fitted model is then checked for the functional form, outliers, influential observations etc. Based on the outcome, the model is re-examined and the

Regression Analysis | Chapter 13 Variable Selection and Model Building | Shalabh, IIT Kanpur

selection of variables is reviewed again. Several iterations may be required before the final adequate model is decided.

There can be two types of incorrect model specification:
1. Omission/exclusion of relevant variables.
2. Inclusion of irrelevant variables.

Now we discuss the statistical consequences arising from both situations.

1. Exclusion of relevant variables:
In order to keep the model simple, the analyst may delete some explanatory variables which may be important from the point of view of theoretical considerations. There can be several reasons behind such a decision; e.g., it may be hard to quantify variables like taste, intelligence etc. Sometimes it may be difficult to take correct observations on variables like income.

Let there be k candidate explanatory variables, out of which suppose r variables are included and (k − r) variables are to be deleted from the model. So partition X and β as

X = [X1  X2],  β = (β1', β2')'

where X1 is n × r, X2 is n × (k − r), β1 is r × 1 and β2 is (k − r) × 1. The model y = Xβ + ε, E(ε) = 0, V(ε) = σ²I can then be expressed as

y = X1β1 + X2β2 + ε

which is called the full model or true model.

After dropping the (k − r) explanatory variables from the model, the new model is

y = X1β1 + δ

which is called the misspecified model or false model. Applying OLS to the false model, the OLSE of β1 is

b1 = (X1'X1)^{-1} X1'y.

The estimation error is obtained as follows:

b1 = (X1'X1)^{-1} X1'(X1β1 + X2β2 + ε)
   = β1 + (X1'X1)^{-1} X1'X2 β2 + (X1'X1)^{-1} X1'ε

so

b1 − β1 = θ + (X1'X1)^{-1} X1'ε,  where θ = (X1'X1)^{-1} X1'X2 β2.

Thus

E(b1 − β1) = θ + (X1'X1)^{-1} X1' E(ε) = θ

which is a linear function of β2, i.e., of the coefficients of the excluded variables. So b1 is biased, in general. The bias vanishes if X1'X2 = 0, i.e., X1 and X2 are orthogonal or uncorrelated.

The mean squared error matrix of b1 is

MSE(b1) = E[(b1 − β1)(b1 − β1)']
        = E[θθ' + θε'X1(X1'X1)^{-1} + (X1'X1)^{-1}X1'εθ' + (X1'X1)^{-1}X1'εε'X1(X1'X1)^{-1}]
        = θθ' + σ²(X1'X1)^{-1}.

So the efficiency generally declines. Note that the second term is the conventional form of the MSE.

The residual sum of squares gives

s² = SSres/(n − r) = e'e/(n − r)

where e = y − X1b1 = H1y with H1 = I − X1(X1'X1)^{-1}X1'. Since H1X1 = 0,

H1y = H1(X1β1 + X2β2 + ε) = H1(X2β2 + ε)

and therefore

y'H1y = (X2β2 + ε)' H1 (X2β2 + ε)
      = β2'X2'H1X2β2 + β2'X2'H1ε + ε'H1X2β2 + ε'H1ε.

Thus

E(s²) = [E(β2'X2'H1X2β2) + E(ε'H1ε)] / (n − r)
      = [β2'X2'H1X2β2 + (n − r)σ²] / (n − r)
      = σ² + β2'X2'H1X2β2 / (n − r).

So s² is a biased estimator of σ² and provides an overestimate of σ². Note that even if X1'X2 = 0, s² still gives an overestimate of σ². So statistical inferences based on it will be faulty: the t-tests and confidence regions will be invalid in this case.

If the response is to be predicted at x' = (x1', x2'), then using the full model the predicted value is

ŷ = x'b = x'(X'X)^{-1}X'y

with E(ŷ) = x'β and Var(ŷ) = σ²[1 + x'(X'X)^{-1}x].

When the subset model is used, the predictor is ŷ1 = x1'b1, and then

E(ŷ1) = x1'(X1'X1)^{-1}X1' E(y)
      = x1'(X1'X1)^{-1}X1'(X1β1 + X2β2)
      = x1'β1 + x1'(X1'X1)^{-1}X1'X2β2
      = x1'β1 + x1'θ.

Thus ŷ1 is a biased predictor of y. It is unbiased when X1'X2 = 0. The MSE of the predictor is

MSE(ŷ1) = σ²[1 + x1'(X1'X1)^{-1}x1] + (x1'θ − x2'β2)².

Also

MSE(ŷ1) ≤ Var(ŷ)

provided V(b2) − β2β2' is positive semidefinite.
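The bias θ = (X1'X1)^{-1}X1'X2β2 derived above can be checked numerically. A minimal sketch in Python/NumPy (the sample size, coefficients and correlation structure are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Correlated regressors, so X1'X2 != 0 and the bias does not vanish.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
beta1, beta2 = 2.0, 1.5
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

X1 = x1.reshape(-1, 1)
X2 = x2.reshape(-1, 1)

# OLS on the misspecified model y = X1*b1 + delta (x2 omitted)
b1 = np.linalg.lstsq(X1, y, rcond=None)[0][0]

# Theoretical bias: theta = (X1'X1)^{-1} X1'X2 * beta2
theta = (np.linalg.inv(X1.T @ X1) @ X1.T @ X2).item() * beta2

# b1 concentrates around beta1 + theta, not around beta1
print(b1, beta1 + theta)
```

With n large, the misspecified estimate sits near beta1 + theta, matching the expectation E(b1) = β1 + θ from the derivation.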

2. Inclusion of irrelevant variables
Sometimes, due to enthusiasm and to make the model more realistic, the analyst may include some explanatory variables that are not very relevant to the model. Such variables may contribute very little to the explanatory power of the model. This tends to reduce the degrees of freedom (n − k), and consequently the validity of inferences drawn may be questionable. For example, the value of the coefficient of determination will increase, indicating that the model is getting better, which may not really be true.

Let the true model be

y = Xβ + ε,  E(ε) = 0,  V(ε) = σ²I

which comprises k explanatory variables. Suppose now r additional explanatory variables are added to the model and the resulting model becomes

y = Xβ + Zγ + δ

where Z is an n × r matrix of n observations on each of the r additional explanatory variables, γ is the r × 1 vector of regression coefficients associated with Z, and δ is the disturbance term. This model is termed the false model.

Applying OLS to the false model gives the normal equations

X'X bF + X'Z cF = X'y   (1)
Z'X bF + Z'Z cF = Z'y   (2)

where bF and cF are the OLSEs of β and γ respectively. Premultiplying equation (2) by X'Z(Z'Z)^{-1}, we get

X'Z(Z'Z)^{-1}Z'X bF + X'Z cF = X'Z(Z'Z)^{-1}Z'y.   (3)

Subtracting equation (3) from (1), we get

[X'X − X'Z(Z'Z)^{-1}Z'X] bF = X'y − X'Z(Z'Z)^{-1}Z'y
X'(I − Z(Z'Z)^{-1}Z')X bF = X'(I − Z(Z'Z)^{-1}Z')y
bF = (X'HZ X)^{-1} X'HZ y,  where HZ = I − Z(Z'Z)^{-1}Z'.

The estimation error of bF is

bF − β = (X'HZX)^{-1}X'HZ y − β
       = (X'HZX)^{-1}X'HZ (Xβ + ε) − β
       = (X'HZX)^{-1}X'HZ ε.

Thus

E(bF − β) = (X'HZX)^{-1}X'HZ E(ε) = 0

so bF is unbiased even when some irrelevant variables are added to the model. The covariance matrix is

V(bF) = E[(bF − β)(bF − β)']
      = (X'HZX)^{-1}X'HZ E(εε') HZX(X'HZX)^{-1}
      = σ²(X'HZX)^{-1}X'HZ I HZ X(X'HZX)^{-1}
      = σ²(X'HZX)^{-1}

using the fact that HZ is symmetric and idempotent.

If OLS is applied to the true model, then

bT = (X'X)^{-1}X'y

with E(bT) = β and V(bT) = σ²(X'X)^{-1}.

To compare bF and bT, we use the following result: if A and B are two positive definite matrices, then A − B is at least positive semidefinite if B^{-1} − A^{-1} is also at least positive semidefinite. Let

A = (X'HZX)^{-1},  B = (X'X)^{-1}.

Then

B^{-1} − A^{-1} = X'X − X'HZX = X'Z(Z'Z)^{-1}Z'X

which is at least positive semidefinite. This implies that the efficiency declines unless X'Z = 0. If X'Z = 0, i.e., X and Z are orthogonal, then both estimators are equally efficient.
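The efficiency loss σ²(X'HZX)^{-1} versus σ²(X'X)^{-1} can be illustrated directly from the two covariance formulas. A small sketch with made-up data (one relevant regressor x, one irrelevant but correlated variable z):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)   # irrelevant variable, correlated with x
X = x.reshape(-1, 1)
Z = z.reshape(-1, 1)
sigma2 = 1.0

# Variance of the OLSE in the true model: sigma^2 (X'X)^{-1}
var_true = sigma2 * np.linalg.inv(X.T @ X)[0, 0]

# Variance after adding the irrelevant Z: sigma^2 (X' HZ X)^{-1}
HZ = np.eye(n) - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
var_false = sigma2 * np.linalg.inv(X.T @ HZ @ X)[0, 0]

# Since X'Z != 0 here, efficiency declines: var_false > var_true
print(var_true, var_false)
```

If z were generated orthogonally to x, the two variances would essentially coincide, in line with the X'Z = 0 case above.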

The residual sum of squares under the false model is SSres = eF'eF where

eF = y − X bF − Z cF

with

bF = (X'HZX)^{-1}X'HZ y
cF = (Z'Z)^{-1}Z'(y − X bF).

Substituting cF,

eF = (y − XbF) − Z(Z'Z)^{-1}Z'(y − XbF)
   = HZ(y − XbF)
   = HZ y − HZ X(X'HZX)^{-1}X'HZ y
   = [HZ − HZX(X'HZX)^{-1}X'HZ] y
   = H* y

where H* = HZ − HZX(X'HZX)^{-1}X'HZ is symmetric and idempotent. Moreover H*X = HZX − HZX(X'HZX)^{-1}(X'HZX) = 0, so under the true model y = Xβ + ε,

SSres = eF'eF = y'H*y = ε'H*ε

and

E(SSres) = σ² tr(H*) = σ² [tr(HZ) − tr((X'HZX)^{-1}X'HZ²X)] = σ² [(n − r) − k] = σ²(n − k − r).

So

E[SSres/(n − k − r)] = σ²

and SSres/(n − k − r) is an unbiased estimator of σ².

A comparison of the exclusion and inclusion of variables is as follows:

                                         Exclusion type           Inclusion type
  Estimation of coefficients             Biased                   Unbiased
  Efficiency                             Generally declines       Declines
  Estimation of disturbance variance     Overestimated            Unbiased
  Conventional tests of hypothesis       Invalid and faulty       Valid though erroneous
  and confidence regions                 inferences

Evaluation of subset regression models
After the selection of subsets of candidate variables for the model, a question arises: how to judge which subset yields the better regression model. Various criteria have been proposed in the literature to evaluate and compare subset regression models.

1. Coefficient of determination
The coefficient of determination R²_p is the square of the multiple correlation coefficient between the study variable y and the set of explanatory variables X1, X2, ..., Xp. Note that Xi1 = 1 for all i = 1, 2, ..., n, which simply indicates the presence of the intercept term in the model, without which the coefficient of determination cannot be used. So essentially there will be a subset of (p − 1) explanatory variables and one intercept term in this notation. The coefficient of determination based on such a p-term model is

R²_p = SSreg(p)/SST = 1 − SSres(p)/SST

where SSreg(p) and SSres(p) are the sums of squares due to regression and residuals, respectively, in a subset model based on p terms. Since there are k explanatory variables available and we select only (p − 1) of them, there are "k choose (p − 1)" possible choices of subsets. Each such choice produces one subset model. Moreover, R²_p has a tendency to increase as p increases.
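An all-subsets scan of R²_p is easy to sketch. The following toy example (illustrative data; only the first two regressors actually matter) enumerates every subset, computes R² for each, and illustrates that R² never decreases when a subset is enlarged:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def r_squared(cols):
    # Design matrix: intercept plus the chosen explanatory variables.
    Xp = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b = np.linalg.lstsq(Xp, y, rcond=None)[0]
    ss_res = np.sum((y - Xp @ b) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_t

results = {cols: r_squared(cols)
           for size in range(1, k + 1)
           for cols in combinations(range(k), size)}

# Nested models: enlarging the subset can only raise R^2
print(results[(0,)], results[(0, 1)], results[(0, 1, 2)])
```

This monotonicity is exactly why R²_p alone cannot pick the subset size and the stopping devices below are needed.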

So proceed as follows. Choose an appropriate value of p, fit the model and obtain R²_p. Add one variable, fit the model and again obtain R²_{p+1}. Obviously R²_{p+1} > R²_p. If the increment R²_{p+1} − R²_p is small, then stop and choose that value of p for the subset regression. If the increment is substantial, keep adding variables up to a point where an additional variable does not produce a large change in the value of R², i.e., the increment in R² becomes small.

To find such a value of p, create a plot of R²_p versus p; the curve typically rises steeply at first and then flattens. Choose the value of p corresponding to the point where the knee of the curve is clearly seen. Such a choice of p may not be unique among different analysts; some experience and judgement of the analyst will be helpful in finding an appropriate and satisfactory value of p.

To choose a satisfactory value analytically, one solution is a test which identifies a model whose R² does not significantly differ from the R² based on all the explanatory variables. Let

R²_0 = 1 − (1 − R²_{k+1})(1 + d_{α,n,k}),  d_{α,n,k} = k F_α(k, n − k − 1)/(n − k − 1) > 0

where R²_{k+1} is the value of R² based on all (k + 1) terms, i.e., all k explanatory variables and an intercept. A subset with R²_p > R²_0 is called an R²-adequate(α) subset.

2. Adjusted coefficient of determination
The adjusted coefficient of determination has certain advantages over the usual coefficient of determination. The adjusted coefficient of determination based on a p-term model is

R²adj(p) = 1 − [(n − 1)/(n − p)] (1 − R²_p).

An advantage of R²adj(p) is that it does not necessarily increase as p increases. If r more explanatory variables are added to a p-term model, then

R²adj(p + r) > R²adj(p)

if and only if the partial F-statistic for testing the significance of the r additional explanatory variables exceeds 1. So subset selection based on R²adj(p) can be made along the same lines as with R²_p. In general, the value of p corresponding to the maximum value of R²adj(p) is chosen for the subset model.

3. Residual mean square
A model is said to have a better fit if its residuals are small. This is reflected in the residual sum of squares SSres: a model with smaller SSres is preferable. Based on this, the residual mean square for a p-term subset regression model is defined as

MSres(p) = SSres(p)/(n − p).

So MSres(p) can be used as a criterion for model selection like SSres(p). SSres(p) decreases with an increase in p. So as p increases, MSres(p) initially decreases, then stabilizes, and finally may increase when the reduction in SSres(p) is not sufficient to compensate for the loss of one degree of freedom in the factor (n − p). When MSres(p) is plotted versus p, the curve typically decreases, flattens, and then turns upward.
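The two criteria just defined are tied together by an exact identity, R²adj(p) = 1 − MSres(p)/[SST/(n − 1)], which can be verified numerically. A small check on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
# Design: intercept plus two regressors (p = 3 terms)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

p = X.shape[1]
b = np.linalg.lstsq(X, y, rcond=None)[0]
ss_res = np.sum((y - X @ b) ** 2)
ss_t = np.sum((y - y.mean()) ** 2)

r2 = 1 - ss_res / ss_t
r2_adj = 1 - (n - 1) / (n - p) * (1 - r2)
ms_res = ss_res / (n - p)

# Identity: R^2_adj(p) = 1 - MS_res(p) / (SS_T / (n - 1))
print(r2_adj, 1 - ms_res / (ss_t / (n - 1)))
```

Because SST/(n − 1) is fixed for a given data set, minimizing MSres(p) over subsets is the same as maximizing R²adj(p), as shown algebraically below.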

So plot MSres(p) versus p and then:
- choose p corresponding to the minimum value of MSres(p), or
- choose p such that MSres(p) is approximately equal to MSres based on the full model, or
- choose p near the point where the smallest value of MSres(p) turns upward.

The minimum value of MSres(p) corresponds to the maximum value of R²adj(p), since

R²adj(p) = 1 − [(n − 1)/(n − p)] (1 − R²_p)
         = 1 − [(n − 1)/(n − p)] SSres(p)/SST
         = 1 − MSres(p) / [SST/(n − 1)].

Thus the two criteria, viz., minimum MSres(p) and maximum R²adj(p), are equivalent.

4. Mallows' Cp statistic
Mallows' Cp criterion is based on the mean squared error of a fitted value. Consider the model y = Xβ + ε with partitioned X = (X1, X2), where X1 is an n × p matrix and X2 is an n × q matrix, so that

y = X1β1 + X2β2 + ε,  E(ε) = 0,  V(ε) = σ²I

where β = (β1', β2')'. Consider the reduced model

y = X1β1 + δ,  E(δ) = 0,  V(δ) = σ²I

and predict y based on the subset model as ŷ = X1 b1, where b1 = (X1'X1)^{-1}X1'y.

The prediction of y can also be seen as estimation of E(y) = Xβ, so the expected squared error loss of ŷ is

Γp = E[(X1b1 − Xβ)'(X1b1 − Xβ)].

The subset model can be considered an appropriate model if Γp is small. Let P1 = X1(X1'X1)^{-1}X1' and H1 = I − P1, so that ŷ = P1y. Then

Γp = β'X'H1Xβ + σ² tr(P1) = β'X'H1Xβ + pσ²

since the cross terms vanish (E(ε) = 0) and E(ε'P1ε) = σ² tr(P1) = pσ². Also

E(y'H1y) = E[(Xβ + ε)'H1(Xβ + ε)]
         = β'X'H1Xβ + σ² tr(H1)
         = β'X'H1Xβ + (n − p)σ²

so that

β'X'H1Xβ = E(y'H1y) − (n − p)σ²

and therefore

Γp = E(y'H1y) − (n − 2p)σ².

Note that Γp depends on β and σ², which are unknown, so Γp cannot be used in practice. A solution to this problem is to replace β and σ² by their respective estimators, which gives

Γ̂p = SSres(p) − (n − 2p)σ̂²

where SSres(p) = y'H1y is the residual sum of squares based on the subset model.

A scaled version of Γ̂p is

Cp = SSres(p)/σ̂² − (n − 2p)

which is Mallows' Cp statistic for the subset model y = X1β1 + δ. Usually

b = (X'X)^{-1}X'y  and  σ̂² = (y − Xb)'(y − Xb)/(n − p − q)

are used to estimate β and σ² respectively, both based on the full model. When different subset models are considered, the models with smallest Cp are considered better than those with higher Cp; so lower Cp is preferable.

If the subset model has negligible bias (in the case of the full model with estimator b, the bias is zero), then

E[SSres(p)] = (n − p)σ²

and

E(Cp | Bias = 0) = (n − p)σ²/σ² − (n − 2p) = p.

The plot of Cp versus p for each regression equation can therefore be compared against the straight line Cp = p, which passes through the origin with unit slope. Points with small bias lie near this line, while points with substantial bias lie above it.
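The behaviour E(Cp) ≈ p for an unbiased subset, and Cp far above p for a biased one, can be demonstrated with a short sketch (illustrative data; only the intercept and the first regressor truly matter):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
# Only the intercept and the first regressor carry signal in this toy setup.
y = X_full @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

def ss_res(Xm):
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    return np.sum((y - Xm @ b) ** 2)

k1 = X_full.shape[1]
sigma2_hat = ss_res(X_full) / (n - k1)   # sigma^2 estimate from the full model

def mallows_cp(cols):
    Xm = X_full[:, cols]
    p = len(cols)
    return ss_res(Xm) / sigma2_hat - (n - 2 * p)

cp_good = mallows_cp([0, 1])   # true submodel: Cp should be near p = 2
cp_bad = mallows_cp([0, 2])    # omits the important regressor: Cp blows up
print(cp_good, cp_bad)
```

The unbiased submodel lands close to the Cp = p line, while the biased one sits far above it, mirroring the plot described in the text.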

A biased model whose point falls below the line Cp = p can nonetheless represent a model with lower total error; it may be preferable to accept some bias in the regression equation in order to reduce the average prediction error.

Note that an unbiased estimator of σ² is used in Cp, which is based on the assumption that the full model has negligible bias. If the full model contains non-significant explanatory variables with zero regression coefficients, then the same estimator of σ² will overestimate σ², and then Cp will take smaller values. So the working of Cp depends on a good choice of estimator of σ².

5. Akaike's information criterion (AIC)
The Akaike information criterion statistic is given as

AIC_p = n ln[SSres(p)/n] + 2p

where SSres(p) = y'[I − X1(X1'X1)^{-1}X1']y is based on the subset model y = X1β1 + δ derived from the full model y = X1β1 + X2β2 + ε = Xβ + ε.

The AIC is defined as AIC = −2(maximized log-likelihood) + 2(number of parameters). In the linear regression model with ε ~ N(0, σ²I), the likelihood function is

L(y; β, σ²) = (2πσ²)^{-n/2} exp[−(y − Xβ)'(y − Xβ)/(2σ²)]

and the log-likelihood is

ln L(y; β, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − Xβ)'(y − Xβ)/(2σ²).

The log-likelihood is maximized at

β̃ = (X'X)^{-1}X'y,  σ̃² = [(n − p)/n] σ̂²

where β̃ is the maximum likelihood estimate of β, which is the same as the OLSE; σ̃² is the maximum likelihood estimate of σ²; and σ̂² is the OLS-based unbiased estimator of σ².

So

AIC = −2 ln L(y; β̃, σ̃²) + 2p
    = n ln(SSres/n) + 2p + n[ln(2π) + 1]

where SSres = y'[I − X(X'X)^{-1}X']y. The term n[ln(2π) + 1] remains the same for all the models under comparison when the same observations y are used, so it is irrelevant for AIC comparisons.

6. Bayesian information criterion (BIC)
Similar to AIC, the Bayesian information criterion is based on maximizing the posterior probability of the model given the observations y. In the case of the linear regression model, it is defined as

BIC_p = n ln[SSres(p)/n] + p ln n.

A model with a smaller value of BIC is preferable.

7. PRESS statistic
Since the residuals and the residual sum of squares act as criteria for subset model selection, the PRESS residuals and the prediction sum of squares can similarly be used as a basis for subset model selection. The usual residuals and the PRESS residuals have their own characteristics which are used in regression modelling. The PRESS statistic based on a subset model with p explanatory variables is given by

PRESS(p) = Σ_{i=1}^{n} [y_i − ŷ_{(i)}]² = Σ_{i=1}^{n} [e_i/(1 − h_ii)]²

where ŷ_{(i)} is the predicted value of the i-th observation from the model fitted without the i-th observation, and h_ii is the i-th diagonal element of H = X(X'X)^{-1}X'. This criterion is used along the same lines as SSres(p): a subset regression model with a smaller value of PRESS(p) is preferable.
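The shortcut e_i/(1 − h_ii) in the PRESS formula avoids refitting the model n times; the identity can be cross-checked by actually deleting one observation and refitting. A minimal sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
h = np.diag(H)

# PRESS via the leverage shortcut: sum of (e_i / (1 - h_ii))^2
press = np.sum((e / (1 - h)) ** 2)

# Cross-check the first PRESS residual by refitting without observation 0
mask = np.arange(n) != 0
b0 = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
loo_err = y[0] - X[0] @ b0
print(loo_err, e[0] / (1 - h[0]))
```

The two quantities agree up to rounding, so the full PRESS(p) can be computed from a single fit of the subset model.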

Partial F-statistic
The partial F-statistic is used to test a hypothesis about a subvector of the regression coefficients. Consider the model

y = Xβ + ε,  X: n × p,  β: p × 1

where p = k + 1, which includes an intercept term and k explanatory variables. Suppose a subset of r < k explanatory variables is to be examined for whether it contributes significantly to the regression model. So partition

X = [X1  X2],  β = (β1', β2')'

where X1 and X2 are matrices of order n × (p − r) and n × r respectively, and β1 and β2 are vectors of order (p − r) × 1 and r × 1 respectively. The objective is to test the null hypothesis

H0: β2 = 0  against  H1: β2 ≠ 0.

Then

y = Xβ + ε = X1β1 + X2β2 + ε

is the full model, and application of least squares gives the OLSE of β as

b = (X'X)^{-1}X'y.

The corresponding sum of squares due to regression with p degrees of freedom is

SSreg = b'X'y

and the sum of squares due to residuals with (n − p) degrees of freedom is

SSres = y'y − b'X'y,

so that

MSres = (y'y − b'X'y)/(n − p)

is the mean square due to residuals.

The contribution of the explanatory variables in β2 to the regression can be found by considering the full model under H0: β2 = 0. Assume that H0 is true; the full model then becomes

y = X1β1 + δ,  E(δ) = 0,  V(δ) = σ²I

which is the reduced model. Application of least squares to the reduced model yields the OLSE of β1 as

b1 = (X1'X1)^{-1}X1'y

and the corresponding sum of squares due to regression with (p − r) degrees of freedom is

SSreg(β1) = b1'X1'y.

The regression sum of squares due to β2, given that β1 is already in the model, can be found as

SSreg(β2 | β1) = SSreg(β) − SSreg(β1)

where SSreg(β) and SSreg(β1) are the sums of squares due to regression with all the explanatory variables in the model and with only the explanatory variables corresponding to β1 in the model, respectively. The term SSreg(β2 | β1) is called the extra sum of squares due to β2 and has p − (p − r) = r degrees of freedom. It is independent of MSres and is a measure of the regression sum of squares that results from adding the explanatory variables X_{k−r+1}, ..., X_k to a model that already contains X1, X2, ..., X_{k−r}.

The null hypothesis H0: β2 = 0 can be tested using the statistic

F0 = [SSreg(β2 | β1)/r] / MSres

which follows the F-distribution with r and (n − p) degrees of freedom under H0. The decision rule is to reject H0 whenever F0 > F_α(r, n − p). This is known as the partial F-test. It measures the contribution of the explanatory variables in X2 given that the other explanatory variables in X1 are already in the model.

Computational techniques for variable selection
In order to select a subset model, several techniques based on computational procedures and algorithms are available. They are essentially based on two ideas: evaluate all possible subsets of explanatory variables, or select the explanatory variables stepwise.
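Before turning to these procedures, the partial F-test described above can be checked numerically via the extra-sum-of-squares decomposition. A sketch with made-up data, testing H0: β3 = 0 (here r = 1, and β3 = 1.5 is genuinely nonzero, so F0 should be large):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 120
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
# x3 genuinely contributes; we test H0: beta3 = 0 with the partial F-test.
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x3 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2, x3])
X_red = X_full[:, :3]          # reduced model drops x3 (r = 1)
p = X_full.shape[1]

def fit_ss_res(Xm):
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    return np.sum((y - Xm @ b) ** 2)

ss_res_full = fit_ss_res(X_full)
ss_res_red = fit_ss_res(X_red)

r = 1
ms_res = ss_res_full / (n - p)
# Extra sum of squares due to x3, given intercept, x1 and x2:
ss_extra = ss_res_red - ss_res_full
F0 = (ss_extra / r) / ms_res
print(F0)
```

With β3 = 1.5 and n = 120, F0 lands far above any usual F_α(1, n − p) cutoff, so H0 is rejected.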

1. Use of all possible explanatory variables
This methodology is based on the following steps:
- Fit a model with one explanatory variable.
- Fit a model with two explanatory variables.
- Fit a model with three explanatory variables, and so on.
- Choose a suitable criterion for model selection and evaluate each fitted regression equation with the selection criterion.

The total number of models to be fitted rises sharply with an increase in k. So such models are evaluated using a model selection criterion with the help of an efficient computational algorithm on computers.

2. Stepwise regression techniques
This methodology chooses the explanatory variables for the subset model in steps, either adding one variable at a time or deleting one variable at a time. Based on this, there are three procedures:
- forward selection,
- backward elimination, and
- stepwise regression.
These procedures are basically computer-intensive procedures and are executed using software.

Forward selection procedure:
This methodology assumes that there is no explanatory variable in the model except an intercept term. It adds variables one by one and tests the fitted model at each step using a suitable criterion. It has the following steps. Consider only the intercept term and insert one variable at a time. Calculate the simple correlations of the x_i (i = 1, 2, ..., k) with y, and choose the x_i which has the largest correlation with y. Suppose x1 is that variable. Since, for a single regressor, the F-statistic for testing the significance of regression is an increasing function of the squared correlation, x1 will produce the largest value of F0 in testing the significance of regression.

Choose a prespecified value F_IN (F-to-enter). If F0 > F_IN, then accept x1, and x1 enters the model. Adjust for the effect of x1 on y and recompute the correlations of the remaining x_j with y, i.e., obtain the partial correlations as follows:
- Fit the regression ŷ = β̂0 + β̂1 x1 and obtain the residuals.
- Fit the regression of each remaining candidate variable on x1, i.e., x̂_j = α̂_{0j} + α̂_{1j} x1, j = 2, 3, ..., k, and obtain the residuals.
- Find the simple correlation between the two sets of residuals.
- This gives the partial correlations.

Choose the variable with the second-largest correlation with y, i.e., the variable with the highest partial correlation with y. Suppose this variable is x2. Then the largest partial F-statistic is

F = SSreg(x2 | x1) / MSres(x1, x2).

If F > F_IN, then x2 enters the model.

These steps are repeated. At each step, the partial correlations are computed, and the explanatory variable with the highest partial correlation with y is the candidate to be added to the model. Equivalently, the partial F-statistics are calculated, the largest F-statistic given the other explanatory variables already in the model is chosen, and the corresponding explanatory variable is added to the model if this partial F-statistic exceeds F_IN. Such selection continues until, at a particular step, either the partial F-statistic does not exceed F_IN or the last explanatory variable is added to the model.

Note: The SAS software chooses F_IN by specifying a type I error rate α, so that the explanatory variable with the highest partial correlation coefficient with y is added to the model if its partial F-statistic exceeds F_α(1, n − p).
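The forward selection loop above can be sketched in a few lines. This is an illustrative implementation, not a production routine (the data, the F_IN value and the variable names are made up; real software such as SAS adds refinements around tie-breaking and significance levels):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 150, 4
X = rng.normal(size=(n, k))
# Only variables 0 and 2 carry signal in this toy example.
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

F_IN = 4.0                      # a common F-to-enter choice
selected = []                   # indices of entered variables
remaining = list(range(k))

def ss_res(cols):
    # Residual sum of squares for intercept + chosen variables.
    Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    return np.sum((y - Xm @ b) ** 2)

while remaining:
    base = ss_res(selected)
    # Partial F for each candidate, given the variables already in the model
    scores = {}
    for j in remaining:
        trial = selected + [j]
        ms = ss_res(trial) / (n - len(trial) - 1)
        scores[j] = (base - ss_res(trial)) / ms
    best = max(scores, key=scores.get)
    if scores[best] < F_IN:
        break                   # no candidate clears F-to-enter: stop
    selected.append(best)
    remaining.remove(best)

print(sorted(selected))
```

The two truly relevant variables enter with very large partial F values; whether a pure-noise variable also slips in depends on the F_IN threshold and the sample, which is exactly the multiple-testing caveat raised in the stopping-rule comments below.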

Backward elimination procedure:
This methodology is the opposite of the forward selection procedure. The forward selection procedure starts with no explanatory variable in the model and keeps adding one variable at a time until a suitable model is obtained. The backward elimination methodology begins with all the explanatory variables and keeps deleting one variable at a time until a suitable model is obtained. It is based on the following steps:
- Consider all k explanatory variables and fit the model.
- Compute the partial F-statistic for each explanatory variable as if it were the last variable to enter the model.
- Choose a preselected value F_OUT (F-to-remove).
- Compare the smallest of the partial F-statistics with F_OUT. If it is less than F_OUT, then remove the corresponding explanatory variable from the model. The model will now have (k − 1) explanatory variables.
- Fit the model with these (k − 1) explanatory variables, compute the partial F-statistics for the new model and compare the smallest of them with F_OUT. If it is less than F_OUT, then remove the corresponding variable from the model.
- Repeat this procedure. Stop when the smallest partial F-statistic exceeds F_OUT.

Stepwise regression procedure:
A combination of the forward selection and backward elimination procedures is stepwise regression. It is a modification of the forward selection procedure and has the following steps:
- Whenever a new variable is added, reassess all the explanatory variables entered into the model at previous steps via their partial F-statistics.
- An explanatory variable that was added at an earlier step may now have become insignificant due to its relationship with the explanatory variables currently present in the model.

If the partial F-statistic for an explanatory variable is smaller than F_OUT, then this variable is deleted from the model.

Stepwise regression needs two cut-off values, F_IN and F_OUT. Sometimes F_IN = F_OUT or F_IN > F_OUT is considered. The choice F_IN > F_OUT makes it relatively more difficult to add an explanatory variable than to delete one.

General comments:
1. None of the methods among forward selection, backward elimination and stepwise regression guarantees the best subset model.
2. The order in which the explanatory variables enter or leave the model does not indicate the order of importance of the explanatory variables.
3. In forward selection, no explanatory variable can be removed once it has entered the model. Similarly, in backward elimination, no explanatory variable can be added back once it has been removed from the model.
4. The procedures may all lead to different models.
5. Different model selection criteria may give different subset models.

Comments about stopping rules:
The choice of F_IN and/or F_OUT provides the stopping rules for the algorithms. Some computer software allows the analyst to specify these values directly, while some algorithms require type I error rates to generate F_IN and/or F_OUT. Sometimes, taking α as the level of significance can be misleading, because several correlated partial F-variables are considered at each step and the maximum among them is examined. Some analysts prefer small values of F_IN and F_OUT, whereas others prefer extreme values. A popular choice is F_IN = F_OUT = 4, which corresponds roughly to the 5% level of significance of the F-distribution.


More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analysis of Variance and Design of Exeriment-I MODULE II LECTURE -4 GENERAL LINEAR HPOTHESIS AND ANALSIS OF VARIANCE Dr. Shalabh Deartment of Mathematics and Statistics Indian Institute of Technology Kanur

More information

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit Chater 5 Statistical Inference 69 CHAPTER 5 STATISTICAL INFERENCE.0 Hyothesis Testing.0 Decision Errors 3.0 How a Hyothesis is Tested 4.0 Test for Goodness of Fit 5.0 Inferences about Two Means It ain't

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2

STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2 STA 25: Statistics Notes 7. Bayesian Aroach to Statistics Book chaters: 7.2 1 From calibrating a rocedure to quantifying uncertainty We saw that the central idea of classical testing is to rovide a rigorous

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

Spectral Analysis by Stationary Time Series Modeling

Spectral Analysis by Stationary Time Series Modeling Chater 6 Sectral Analysis by Stationary Time Series Modeling Choosing a arametric model among all the existing models is by itself a difficult roblem. Generally, this is a riori information about the signal

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

8 STOCHASTIC PROCESSES

8 STOCHASTIC PROCESSES 8 STOCHASTIC PROCESSES The word stochastic is derived from the Greek στoχαστικoς, meaning to aim at a target. Stochastic rocesses involve state which changes in a random way. A Markov rocess is a articular

More information

STK4900/ Lecture 7. Program

STK4900/ Lecture 7. Program STK4900/9900 - Lecture 7 Program 1. Logistic regression with one redictor 2. Maximum likelihood estimation 3. Logistic regression with several redictors 4. Deviance and likelihood ratio tests 5. A comment

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

Performance of lag length selection criteria in three different situations

Performance of lag length selection criteria in three different situations MPRA Munich Personal RePEc Archive Performance of lag length selection criteria in three different situations Zahid Asghar and Irum Abid Quaid-i-Azam University, Islamabad Aril 2007 Online at htts://mra.ub.uni-muenchen.de/40042/

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

7.2 Inference for comparing means of two populations where the samples are independent

7.2 Inference for comparing means of two populations where the samples are independent Objectives 7.2 Inference for comaring means of two oulations where the samles are indeendent Two-samle t significance test (we give three examles) Two-samle t confidence interval htt://onlinestatbook.com/2/tests_of_means/difference_means.ht

More information

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection

More information

ute measures of uncertainty called standard errors for these b j estimates and the resulting forecasts if certain conditions are satis- ed. Note the e

ute measures of uncertainty called standard errors for these b j estimates and the resulting forecasts if certain conditions are satis- ed. Note the e Regression with Time Series Errors David A. Dickey, North Carolina State University Abstract: The basic assumtions of regression are reviewed. Grahical and statistical methods for checking the assumtions

More information

16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes

16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes Infinite Series 6. Introduction We extend the concet of a finite series, met in section, to the situation in which the number of terms increase without bound. We define what is meant by an infinite series

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

Estimating Time-Series Models

Estimating Time-Series Models Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary

More information

2. Sample representativeness. That means some type of probability/random sampling.

2. Sample representativeness. That means some type of probability/random sampling. 1 Neuendorf Cluster Analysis Assumes: 1. Actually, any level of measurement (nominal, ordinal, interval/ratio) is accetable for certain tyes of clustering. The tyical methods, though, require metric (I/R)

More information

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI ** Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R

More information

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform

Statistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform Statistics II Logistic Regression Çağrı Çöltekin Exam date & time: June 21, 10:00 13:00 (The same day/time lanned at the beginning of the semester) University of Groningen, Det of Information Science May

More information

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution

Outline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution Outline for today Maximum likelihood estimation Rasmus Waageetersen Deartment of Mathematics Aalborg University Denmark October 30, 2007 the multivariate normal distribution linear and linear mixed models

More information

HENSEL S LEMMA KEITH CONRAD

HENSEL S LEMMA KEITH CONRAD HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes

16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes Infinite Series 6.2 Introduction We extend the concet of a finite series, met in Section 6., to the situation in which the number of terms increase without bound. We define what is meant by an infinite

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

arxiv: v3 [physics.data-an] 23 May 2011

arxiv: v3 [physics.data-an] 23 May 2011 Date: October, 8 arxiv:.7v [hysics.data-an] May -values for Model Evaluation F. Beaujean, A. Caldwell, D. Kollár, K. Kröninger Max-Planck-Institut für Physik, München, Germany CERN, Geneva, Switzerland

More information

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

More information

0.6 Factoring 73. As always, the reader is encouraged to multiply out (3

0.6 Factoring 73. As always, the reader is encouraged to multiply out (3 0.6 Factoring 7 5. The G.C.F. of the terms in 81 16t is just 1 so there is nothing of substance to factor out from both terms. With just a difference of two terms, we are limited to fitting this olynomial

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Finite Mixture EFA in Mplus

Finite Mixture EFA in Mplus Finite Mixture EFA in Mlus November 16, 2007 In this document we describe the Mixture EFA model estimated in Mlus. Four tyes of deendent variables are ossible in this model: normally distributed, ordered

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

Hypothesis Test-Confidence Interval connection

Hypothesis Test-Confidence Interval connection Hyothesis Test-Confidence Interval connection Hyothesis tests for mean Tell whether observed data are consistent with μ = μ. More secifically An hyothesis test with significance level α will reject the

More information

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling

An Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling Journal of Modern Alied Statistical Methods Volume 15 Issue Article 14 11-1-016 An Imroved Generalized Estimation Procedure of Current Poulation Mean in Two-Occasion Successive Samling G. N. Singh Indian

More information

Chapter 3. GMM: Selected Topics

Chapter 3. GMM: Selected Topics Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points. Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal Econ 379: Business and Economics Statistics Instructor: Yogesh Ual Email: yual@ysu.edu Chater 9, Part A: Hyothesis Tests Develoing Null and Alternative Hyotheses Tye I and Tye II Errors Poulation Mean:

More information

8.7 Associated and Non-associated Flow Rules

8.7 Associated and Non-associated Flow Rules 8.7 Associated and Non-associated Flow Rules Recall the Levy-Mises flow rule, Eqn. 8.4., d ds (8.7.) The lastic multilier can be determined from the hardening rule. Given the hardening rule one can more

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression

Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Resonse) Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout

More information

ECE 534 Information Theory - Midterm 2

ECE 534 Information Theory - Midterm 2 ECE 534 Information Theory - Midterm Nov.4, 009. 3:30-4:45 in LH03. You will be given the full class time: 75 minutes. Use it wisely! Many of the roblems have short answers; try to find shortcuts. You

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

ECON 4130 Supplementary Exercises 1-4

ECON 4130 Supplementary Exercises 1-4 HG Set. 0 ECON 430 Sulementary Exercises - 4 Exercise Quantiles (ercentiles). Let X be a continuous random variable (rv.) with df f( x ) and cdf F( x ). For 0< < we define -th quantile (or 00-th ercentile),

More information

Principles of Computed Tomography (CT)

Principles of Computed Tomography (CT) Page 298 Princiles of Comuted Tomograhy (CT) The theoretical foundation of CT dates back to Johann Radon, a mathematician from Vienna who derived a method in 1907 for rojecting a 2-D object along arallel

More information

Estimation of Separable Representations in Psychophysical Experiments

Estimation of Separable Representations in Psychophysical Experiments Estimation of Searable Reresentations in Psychohysical Exeriments Michele Bernasconi (mbernasconi@eco.uninsubria.it) Christine Choirat (cchoirat@eco.uninsubria.it) Raffaello Seri (rseri@eco.uninsubria.it)

More information

Lecture 3 Consistency of Extremum Estimators 1

Lecture 3 Consistency of Extremum Estimators 1 Lecture 3 Consistency of Extremum Estimators 1 This lecture shows how one can obtain consistency of extremum estimators. It also shows how one can find the robability limit of extremum estimators in cases

More information

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi

LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi LOGISTIC REGRESSION VINAANAND KANDALA M.Sc. (Agricultural Statistics), Roll No. 444 I.A.S.R.I, Library Avenue, New Delhi- Chairerson: Dr. Ranjana Agarwal Abstract: Logistic regression is widely used when

More information

On Using FASTEM2 for the Special Sensor Microwave Imager (SSM/I) March 15, Godelieve Deblonde Meteorological Service of Canada

On Using FASTEM2 for the Special Sensor Microwave Imager (SSM/I) March 15, Godelieve Deblonde Meteorological Service of Canada On Using FASTEM2 for the Secial Sensor Microwave Imager (SSM/I) March 15, 2001 Godelieve Deblonde Meteorological Service of Canada 1 1. Introduction Fastem2 is a fast model (multile-linear regression model)

More information

DO NOT TURN OVER UNTIL TOLD TO BEGIN

DO NOT TURN OVER UNTIL TOLD TO BEGIN ame HEMIS o. For Internal Stdents of Royal Holloway DO OT TUR OVER UTIL TOLD TO BEGI EC5040 : ECOOMETRICS Mid-Term Examination o. Time Allowed: hor Answer All 4 qestions STATISTICAL TABLES ARE PROVIDED

More information

Estimating function analysis for a class of Tweedie regression models

Estimating function analysis for a class of Tweedie regression models Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal

More information

Lecture: Condorcet s Theorem

Lecture: Condorcet s Theorem Social Networs and Social Choice Lecture Date: August 3, 00 Lecture: Condorcet s Theorem Lecturer: Elchanan Mossel Scribes: J. Neeman, N. Truong, and S. Troxler Condorcet s theorem, the most basic jury

More information

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material Robustness of classifiers to uniform l and Gaussian noise Sulementary material Jean-Yves Franceschi Ecole Normale Suérieure de Lyon LIP UMR 5668 Omar Fawzi Ecole Normale Suérieure de Lyon LIP UMR 5668

More information

Adaptive estimation with change detection for streaming data

Adaptive estimation with change detection for streaming data Adative estimation with change detection for streaming data A thesis resented for the degree of Doctor of Philosohy of the University of London and the Diloma of Imerial College by Dean Adam Bodenham Deartment

More information

An Estimate For Heilbronn s Exponential Sum

An Estimate For Heilbronn s Exponential Sum An Estimate For Heilbronn s Exonential Sum D.R. Heath-Brown Magdalen College, Oxford For Heini Halberstam, on his retirement Let be a rime, and set e(x) = ex(2πix). Heilbronn s exonential sum is defined

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS #A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,

More information

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables

Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables Imroved Bounds on Bell Numbers and on Moments of Sums of Random Variables Daniel Berend Tamir Tassa Abstract We rovide bounds for moments of sums of sequences of indeendent random variables. Concentrating

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

Matematické Metody v Ekonometrii 7.

Matematické Metody v Ekonometrii 7. Matematické Metody v Ekonometrii 7. Multicollinearity Blanka Šedivá KMA zimní semestr 2016/2017 Blanka Šedivá (KMA) Matematické Metody v Ekonometrii 7. zimní semestr 2016/2017 1 / 15 One of the assumptions

More information

Session 5: Review of Classical Astrodynamics

Session 5: Review of Classical Astrodynamics Session 5: Review of Classical Astrodynamics In revious lectures we described in detail the rocess to find the otimal secific imulse for a articular situation. Among the mission requirements that serve

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory

More information

Real Analysis 1 Fall Homework 3. a n.

Real Analysis 1 Fall Homework 3. a n. eal Analysis Fall 06 Homework 3. Let and consider the measure sace N, P, µ, where µ is counting measure. That is, if N, then µ equals the number of elements in if is finite; µ = otherwise. One usually

More information

FAST AND EFFICIENT SIDE INFORMATION GENERATION IN DISTRIBUTED VIDEO CODING BY USING DENSE MOTION REPRESENTATIONS

FAST AND EFFICIENT SIDE INFORMATION GENERATION IN DISTRIBUTED VIDEO CODING BY USING DENSE MOTION REPRESENTATIONS 18th Euroean Signal Processing Conference (EUSIPCO-2010) Aalborg, Denmark, August 23-27, 2010 FAST AND EFFICIENT SIDE INFORMATION GENERATION IN DISTRIBUTED VIDEO CODING BY USING DENSE MOTION REPRESENTATIONS

More information

Plotting the Wilson distribution

Plotting the Wilson distribution , Survey of English Usage, University College London Setember 018 1 1. Introduction We have discussed the Wilson score interval at length elsewhere (Wallis 013a, b). Given an observed Binomial roortion

More information

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment

More information

FE FORMULATIONS FOR PLASTICITY

FE FORMULATIONS FOR PLASTICITY G These slides are designed based on the book: Finite Elements in Plasticity Theory and Practice, D.R.J. Owen and E. Hinton, 1970, Pineridge Press Ltd., Swansea, UK. 1 Course Content: A INTRODUCTION AND

More information

Machine Learning: Homework 4

Machine Learning: Homework 4 10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,

More information

Section 0.10: Complex Numbers from Precalculus Prerequisites a.k.a. Chapter 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative

Section 0.10: Complex Numbers from Precalculus Prerequisites a.k.a. Chapter 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative Section 0.0: Comlex Numbers from Precalculus Prerequisites a.k.a. Chater 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license.

More information

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands A Secial Case Solution to the Persective -Point Problem William J. Wolfe California State University Channel Islands william.wolfe@csuci.edu Abstract In this aer we address a secial case of the ersective

More information