LINEAR REGRESSION ANALYSIS

MODULE XIII

Lecture - 38

Variable Selection and Model Building

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Evaluation of subset regression models

After selecting subsets of candidate variables for the model, a question arises: how to judge which subset yields a better regression model. Various criteria have been proposed in the literature to evaluate and compare subset regression models.

1. Coefficient of determination

The coefficient of determination is the square of the multiple correlation coefficient between the study variable $y$ and the set of explanatory variables $X_1, X_2, \ldots, X_p$, denoted as $R_p^2$. Note that $X_{i1} = 1$ for all $i = 1, 2, \ldots, n$, which simply indicates the need for an intercept term in the model, without which the coefficient of determination cannot be used. So essentially there are a subset of $(p-1)$ explanatory variables and one intercept term in the notation $R_p^2$.

The coefficient of determination based on such variables is
$$R_p^2 = \frac{SS_{reg}(p)}{SS_T} = 1 - \frac{SS_{res}(p)}{SS_T}$$
where $SS_{reg}(p)$ and $SS_{res}(p)$ are the sums of squares due to regression and residuals, respectively, in a subset model based on $(p-1)$ explanatory variables.

Since there are $k$ explanatory variables available and we select only $(p-1)$ of them, there are $\binom{k}{p-1}$ possible choices of subsets. Each such choice produces one subset model. Moreover, the coefficient of determination has a tendency to increase as $p$ increases.
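As a concrete illustration, a minimal numpy sketch of this computation might look as follows; the helper name r_squared and the convention that the subset design matrix X_sub carries the intercept column of ones are my own assumptions, not part of the lecture notes.

```python
import numpy as np

def r_squared(y, X_sub):
    """R^2_p for a subset model; X_sub must include the intercept column of ones."""
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    ss_res = np.sum((y - X_sub @ beta_hat) ** 2)   # SS_res(p)
    ss_t = np.sum((y - y.mean()) ** 2)             # SS_T, corrected total sum of squares
    return 1.0 - ss_res / ss_t
```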

So proceed as follows (a mechanical sketch of this rule follows the list):
- Choose any appropriate value of $p$, fit the model and obtain $R_p^2$.
- Add one variable, fit the model and again obtain $R_{p+1}^2$. Obviously $R_{p+1}^2 \ge R_p^2$.
- If $R_{p+1}^2 - R_p^2$ is small, then stop and choose that value of $p$ for the subset regression.
- If $R_{p+1}^2 - R_p^2$ is large, then keep adding variables up to the point where an additional variable does not produce a large change in the value of $R^2$, i.e., the increment in $R^2$ becomes small.

To find such a value of $p$, create a plot of $R_p^2$ versus $p$.

[Figure: $R_p^2$ plotted against $p$; the curve rises steeply and then flattens.]

Choose the value of $p$ corresponding to the point where the knee of the curve is clearly seen. Such a choice of $p$ may not be unique among different analysts; some experience and judgment of the analyst will be helpful in finding an appropriate and satisfactory value of $p$.
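In code, the stopping rule might be sketched as below; `order` is an assumed ranking of the design-matrix columns (intercept first) and `tol` is a made-up smallness threshold, and the sketch reuses the r_squared helper from above.

```python
def choose_p_by_increment(y, X, order, tol=0.01):
    """Walk nested models following `order` (column indices, intercept first)
    and stop once the gain R^2_{p+1} - R^2_p drops below `tol`."""
    r2_path = [r_squared(y, X[:, order[:p]]) for p in range(1, len(order) + 1)]
    for p in range(1, len(r2_path)):
        if r2_path[p] - r2_path[p - 1] < tol:
            return p, r2_path[:p]   # keep p columns: intercept + (p-1) regressors
    return len(order), r2_path
```

Plotting r2_path against p then exposes the knee described above.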

To choose a satisfactory value of $p$ analytically, one solution is a test which identifies models whose $R_p^2$ does not differ significantly from the $R^2$ based on all the explanatory variables. Let
$$R_0^2 = 1 - (1 - R_{k+1}^2)(1 + d_{\alpha,n,k}), \qquad d_{\alpha,n,k} = \frac{k\,F_\alpha(k,\, n-k-1)}{n-k-1},$$
where $R_{k+1}^2$ is the value of $R^2$ based on all $(k+1)$ terms (intercept plus $k$ explanatory variables). A subset with $R_p^2 > R_0^2$ is called an $R^2$-adequate($\alpha$) subset.
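The cutoff can be computed from an F quantile; this is a sketch under the reconstruction of $d_{\alpha,n,k}$ given above, with scipy's stats.f.ppf supplying the quantile (the helper name and its argument names are mine).

```python
from scipy import stats

def r2_adequacy_cutoff(r2_full, n, k, alpha=0.05):
    """R^2_0 cutoff: subsets with R^2_p above it are R^2-adequate(alpha).
    r2_full is R^2_{k+1} from the model with the intercept and all k regressors."""
    f_crit = stats.f.ppf(1.0 - alpha, k, n - k - 1)   # upper-alpha point of F(k, n-k-1)
    d = k * f_crit / (n - k - 1)
    return 1.0 - (1.0 - r2_full) * (1.0 + d)
```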

2. Adjusted coefficient of determination

The adjusted coefficient of determination has certain advantages over the usual coefficient of determination. The adjusted coefficient of determination based on a $p$-term model is
$$\bar{R}_p^2 = 1 - \frac{n-1}{n-p}\left(1 - R_p^2\right).$$
An advantage of $\bar{R}_p^2$ is that it does not necessarily increase as $p$ increases. If $r$ more explanatory variables are added to a $p$-term model, then $\bar{R}_{p+r}^2 > \bar{R}_p^2$ if and only if the partial $F$-statistic for testing the significance of the $r$ additional explanatory variables exceeds 1. So subset selection based on $\bar{R}_p^2$ can be made along the same lines as with $R_p^2$. In general, the value of $p$ corresponding to the maximum value of $\bar{R}_p^2$ is chosen for the subset model.

3. Residual mean square

A model is said to have a better fit if its residuals are small. This is reflected in the sum of squares due to residuals $SS_{res}$: a model with smaller $SS_{res}$ is preferable. Based on this, the residual mean square for a subset regression model with $p$ terms is defined as
$$MS_{res}(p) = \frac{SS_{res}(p)}{n-p}.$$
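Both quantities are short computations given a fitted subset model; in this sketch $p$ counts all fitted coefficients including the intercept, matching the $(n-p)$ divisor above (the helper names are mine).

```python
import numpy as np

def adjusted_r2(r2_p, n, p):
    """Adjusted R^2 for a p-term model."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2_p)

def ms_res(y, X_sub):
    """Residual mean square MS_res(p) = SS_res(p) / (n - p)."""
    n, p = X_sub.shape
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    return np.sum((y - X_sub @ beta_hat) ** 2) / (n - p)
```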

So $MS_{res}(p)$ can be used as a criterion for model selection like $SS_{res}$. The $SS_{res}(p)$ decreases with an increase in $p$. As $p$ increases, $MS_{res}(p)$ initially decreases, then stabilizes, and finally may increase when the reduction in $SS_{res}(p)$ is no longer sufficient to compensate for the loss of one degree of freedom in the divisor $(n-p)$.

[Figure: $MS_{res}(p)$ plotted against $p$; the curve falls, flattens, and then turns upward.]

So plot $MS_{res}(p)$ versus $p$ and then:
- choose $p$ corresponding to the minimum value of $MS_{res}(p)$,
- choose $p$ such that $MS_{res}(p)$ is approximately equal to $MS_{res}$ based on the full model, or
- choose $p$ near the point where the smallest value of $MS_{res}(p)$ turns upward.

Such a minimum value of $MS_{res}(p)$ will produce a $\bar{R}_p^2$ with maximum value, since
$$\bar{R}_p^2 = 1 - \frac{n-1}{n-p}\left(1 - R_p^2\right) = 1 - \frac{n-1}{n-p}\cdot\frac{SS_{res}(p)}{SS_T} = 1 - \frac{MS_{res}(p)}{SS_T/(n-1)}.$$
Thus the two criteria, viz. minimum $MS_{res}(p)$ and maximum $\bar{R}_p^2$, are equivalent.
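This equivalence can be checked numerically on simulated data; everything below (the seed, the data-generating coefficients, the nested-model sweep) is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

ss_t = np.sum((y - y.mean()) ** 2)
results = []
for p in range(1, k + 2):                 # nested models built from the first p columns
    X_sub = X[:, :p]
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    ss_res = np.sum((y - X_sub @ beta_hat) ** 2)
    ms = ss_res / (n - p)                 # MS_res(p)
    r2_adj = 1.0 - (n - 1) / (n - p) * (ss_res / ss_t)
    results.append((p, ms, r2_adj))

# The p minimizing MS_res(p) coincides with the p maximizing adjusted R^2.
assert min(results, key=lambda t: t[1])[0] == max(results, key=lambda t: t[2])[0]
```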

4. Mallows' $C_p$ statistic

Mallows' $C_p$ criterion is based on the mean squared error of a fitted value. Consider the model $y = X\beta + \varepsilon$ with partitioned $X = (X_1, X_2)$, where $X_1$ is an $n \times p$ matrix and $X_2$ is an $n \times q$ matrix, so that
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad E(\varepsilon) = 0, \quad V(\varepsilon) = \sigma^2 I,$$
where $\beta = (\beta_1', \beta_2')'$.

Consider the reduced model $y = X_1\beta_1 + \delta$, $E(\delta) = 0$, $V(\delta) = \sigma^2 I$, and predict $y$ based on the subset model as $\hat{y}_p = X_1\hat{\beta}_1$, where $\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'y$.

The prediction of $y$ can also be seen as the estimation of $E(y) = X\beta$, so the expected weighted squared error loss of $\hat{y}_p$ is given by
$$\Gamma_p = E\left(X_1\hat{\beta}_1 - X\beta\right)'\left(X_1\hat{\beta}_1 - X\beta\right).$$
So the subset model can be considered an appropriate model if $\Gamma_p$ is small. Since $X_1\hat{\beta}_1 = H_1 y$ with $H_1 = X_1(X_1'X_1)^{-1}X_1'$,
$$\Gamma_p = E(y'H_1y) - 2\beta'X'H_1X\beta + \beta'X'X\beta,$$
and
$$E(y'H_1y) = E\left[(X\beta + \varepsilon)'H_1(X\beta + \varepsilon)\right] = \beta'X'H_1X\beta + E(\beta'X'H_1\varepsilon) + E(\varepsilon'H_1X\beta) + E(\varepsilon'H_1\varepsilon) = \beta'X'H_1X\beta + 0 + 0 + \sigma^2\operatorname{tr}H_1 = p\sigma^2 + \beta'X'H_1X\beta.$$

Thus
$$\Gamma_p = p\sigma^2 + \beta'X'H_1X\beta - 2\beta'X'H_1X\beta + \beta'X'X\beta = p\sigma^2 + \beta'X'X\beta - \beta'X'H_1X\beta = p\sigma^2 + \beta'X'(I - H_1)X\beta = p\sigma^2 + \beta'X'\bar{H}_1X\beta,$$
where $\bar{H}_1 = I - X_1(X_1'X_1)^{-1}X_1'$.

Since
$$E(y'\bar{H}_1y) = E\left[(X\beta + \varepsilon)'\bar{H}_1(X\beta + \varepsilon)\right] = \beta'X'\bar{H}_1X\beta + \sigma^2\operatorname{tr}\bar{H}_1 = \beta'X'\bar{H}_1X\beta + (n-p)\sigma^2,$$
we have
$$\beta'X'\bar{H}_1X\beta = E(y'\bar{H}_1y) - (n-p)\sigma^2.$$
Thus
$$\Gamma_p = (2p - n)\sigma^2 + E(y'\bar{H}_1y).$$
Note that $\Gamma_p$ depends on $\beta$ and $\sigma^2$, which are unknown, so it cannot be used in practice. A solution to this problem is to replace $\beta$ and $\sigma^2$ by their respective estimators, which gives
$$\hat{\Gamma}_p = (2p - n)\hat{\sigma}^2 + SS_{res}(p),$$
where $SS_{res}(p) = y'\bar{H}_1y$ is the residual sum of squares based on the subset model.

A rescaled version of $\hat{\Gamma}_p$ is
$$C_p = (2p - n) + \frac{SS_{res}(p)}{\hat{\sigma}^2},$$
which is Mallows' $C_p$ statistic for the subset model. Usually
$$b = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = \frac{(y - Xb)'(y - Xb)}{n - p - q}$$
are used to estimate $\beta$ and $\sigma^2$, respectively; both are based on the full model $y = X\beta + \varepsilon$.

When different subset models are considered, the models with the smallest $C_p$ are considered better than those with higher $C_p$; so a lower $C_p$ is preferable. If the subset model has negligible bias (in the case of $b$, the bias is zero), then $E[SS_{res}(p)] = (n-p)\sigma^2$ and
$$E(C_p \mid \text{Bias} = 0) = (2p - n) + \frac{(n-p)\sigma^2}{\sigma^2} = p.$$
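A direct transcription of this formula into numpy might look as follows; sigma2_hat is assumed to be the residual mean square of the full model, and the helper name is mine.

```python
import numpy as np

def mallows_cp(y, X_sub, sigma2_hat):
    """Mallows' C_p = SS_res(p)/sigma2_hat - (n - 2p) for a subset model."""
    n, p = X_sub.shape
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    ss_res = np.sum((y - X_sub @ beta_hat) ** 2)
    return ss_res / sigma2_hat - (n - 2 * p)
```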

The plot of $C_p$ versus $p$ for each regression equation shows the reference line $C_p = p$, a straight line passing through the origin, and looks as follows:

[Figure: $C_p$ plotted against $p$ with the reference line $C_p = p$; points A, B, C mark candidate subset models.]

Points with small bias will be near the line, while points with significant bias will lie above it. For example, point A has little bias, so it is close to the line, whereas points B and C have substantial bias, so they lie above the line. Moreover, point C lies above point A yet represents a model with a lower total error. It may therefore be preferable to accept some bias in the regression equation in order to reduce the average prediction error.

Note that an unbiased estimator of $\sigma^2$ is used in $C_p$, which rests on the assumption that the full model has negligible bias. If the full model contains non-significant explanatory variables with zero regression coefficients, then the same estimator will overestimate $\sigma^2$, and $C_p$ will take smaller values. So the performance of $C_p$ depends on a good choice of the estimator of $\sigma^2$.
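To reproduce such a plot, one can enumerate all subsets, compute $(p, C_p)$ pairs, and compare them against the reference line $C_p = p$; the simulated data and subset sweep below are illustrative only.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, k = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X[:, :4] @ np.array([0.5, 1.5, -2.0, 1.0]) + rng.normal(size=n)

# sigma^2 estimated from the full model (intercept + all k regressors)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = np.sum((y - X @ beta_full) ** 2) / (n - k - 1)

for r in range(1, k + 1):
    for subset in combinations(range(1, k + 1), r):
        cols = [0] + list(subset)          # intercept plus the chosen regressors
        X_sub = X[:, cols]
        p = X_sub.shape[1]
        beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        cp = np.sum((y - X_sub @ beta_hat) ** 2) / sigma2_hat - (n - 2 * p)
        print(subset, p, round(float(cp), 2))  # subsets with little bias give C_p close to p
```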