Chapter 13 Variable Selection and Model Building
The complete regression analysis depends on the explanatory variables present in the model. It is understood in regression analysis that only correct and important explanatory variables appear in the model. In practice, after ensuring the correct functional form of the model, the analyst usually has a pool of explanatory variables which possibly influence the process or experiment. Generally, not all such candidate variables are used in the regression modelling; instead, a subset of explanatory variables is chosen from this pool. How to determine such an appropriate subset of explanatory variables to be used in the regression is called the problem of variable selection.

While choosing a subset of explanatory variables, there are two possible options:
1. In order to make the model as realistic as possible, the analyst may include as many explanatory variables as possible.
2. In order to make the model as simple as possible, one may include only a small number of explanatory variables.

Both approaches have their own consequences. In fact, model building and subset selection have contradictory objectives. When a large number of variables is included in the model, then all of these factors can influence the prediction of the study variable $y$. On the other hand, when a small number of variables is included, the predictive variance of $\hat y$ decreases. Also, collecting observations on more variables involves more cost, time, labour, etc. A compromise between these consequences is struck to select the best regression equation.

The problem of variable selection is addressed assuming that the functional form of each explanatory variable, e.g., $1/x$, $\log x$, etc., is known and that no outliers or influential observations are present in the data. Various statistical tools like residual analysis, identification of influential or high leverage observations, model adequacy, etc., are linked to variable selection. In fact, all these processes should be solved simultaneously.
Usually, these steps are employed iteratively. In the first step, a strategy for variable selection is adopted and the model is fitted with the selected variables. The fitted model is then checked for the functional form, outliers, influential observations, etc. Based on the outcome, the model is re-examined and

Regression Analysis | Chapter 13: Variable Selection and Model Building | Shalabh, IIT Kanpur
the selection of variables is reviewed again. Several iterations may be required before the final adequate model is decided.

There can be two types of incorrect model specification:
1. Omission/exclusion of relevant variables.
2. Inclusion of irrelevant variables.

Now we discuss the statistical consequences arising from both situations.

1. Exclusion of relevant variables

In order to keep the model simple, the analyst may delete some of the explanatory variables which may be of importance from the point of view of theoretical considerations. There can be several reasons behind such a decision; e.g., it may be hard to quantify variables like taste, intelligence, etc. Sometimes it may be difficult to take correct observations on variables like income, etc.

Let there be $k$ candidate explanatory variables, out of which suppose $r$ variables are included and $(k-r)$ variables are to be deleted from the model. So partition $X$ and $\beta$ as
$$X = \begin{pmatrix} X_1 & X_2 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$$
where $X_1$ is $n \times r$, $X_2$ is $n \times (k-r)$, and $\beta_1$, $\beta_2$ are of orders $r$ and $(k-r)$ respectively.

The model $y = X\beta + \varepsilon$, $E(\varepsilon) = 0$, $V(\varepsilon) = \sigma^2 I$ can be expressed as
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
which is called the full model or true model.

After dropping the $(k-r)$ explanatory variables, the new model is
$$y = X_1\beta_1 + \delta$$
which is called the misspecified model or false model.

Applying OLS to the false model, the OLSE of $\beta_1$ is
$$b_1 = (X_1'X_1)^{-1}X_1'y.$$
The estimation error is obtained as follows:
$$b_1 = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon$$
$$b_1 - \beta_1 = \theta + (X_1'X_1)^{-1}X_1'\varepsilon$$
where $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$. Thus
$$E(b_1 - \beta_1) = \theta + (X_1'X_1)^{-1}X_1'E(\varepsilon) = \theta$$
which is a linear function of $\beta_2$, i.e., of the coefficients of the excluded variables. So $b_1$ is biased, in general. The bias vanishes if $X_1'X_2 = 0$, i.e., if $X_1$ and $X_2$ are orthogonal or uncorrelated.

The mean squared error matrix of $b_1$ is
$$\begin{aligned}
MSE(b_1) &= E(b_1 - \beta_1)(b_1 - \beta_1)' \\
&= E\left[\theta\theta' + \theta\varepsilon'X_1(X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'\varepsilon\theta' + (X_1'X_1)^{-1}X_1'\varepsilon\varepsilon'X_1(X_1'X_1)^{-1}\right] \\
&= \theta\theta' + \sigma^2(X_1'X_1)^{-1}X_1'IX_1(X_1'X_1)^{-1} \\
&= \theta\theta' + \sigma^2(X_1'X_1)^{-1}.
\end{aligned}$$
So efficiency generally declines. Note that the second term is the conventional form of the MSE.

The residual sum of squares is $SS_{res} = e'e$, and
$$s^2 = \frac{SS_{res}}{n-r} = \frac{e'e}{n-r}$$
where $e = y - X_1b_1 = \bar H_1 y$ with $\bar H_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Thus
$$\bar H_1 y = \bar H_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = 0 + \bar H_1(X_2\beta_2 + \varepsilon) = \bar H_1(X_2\beta_2 + \varepsilon)$$
$$y'\bar H_1 y = (X_2\beta_2 + \varepsilon)'\bar H_1(X_2\beta_2 + \varepsilon) = \beta_2'X_2'\bar H_1 X_2\beta_2 + \beta_2'X_2'\bar H_1\varepsilon + \varepsilon'\bar H_1 X_2\beta_2 + \varepsilon'\bar H_1\varepsilon.$$
Thus
$$E(s^2) = \frac{1}{n-r}\left[\beta_2'X_2'\bar H_1 X_2\beta_2 + E(\varepsilon'\bar H_1\varepsilon)\right] = \frac{1}{n-r}\left[\beta_2'X_2'\bar H_1 X_2\beta_2 + (n-r)\sigma^2\right] = \sigma^2 + \frac{\beta_2'X_2'\bar H_1 X_2\beta_2}{n-r}.$$
So $s^2$ is a biased estimator of $\sigma^2$, and $s^2$ provides an overestimate of $\sigma^2$. Note that even if $X_1'X_2 = 0$, $s^2$ still gives an overestimate of $\sigma^2$. So the statistical inferences based on it will be faulty; the $t$-test and confidence regions will be invalid in this case.

If the response is to be predicted at $x' = (x_1', x_2')$, then using the full model, the predicted value is
$$\hat y = x'b = x'(X'X)^{-1}X'y$$
with
$$E(\hat y) = x'\beta, \qquad Var(\hat y) = \sigma^2\left[1 + x'(X'X)^{-1}x\right].$$
When the subset model is used, the predictor is $\hat y_1 = x_1'b_1$, and then
$$\begin{aligned}
E(\hat y_1) &= x_1'(X_1'X_1)^{-1}X_1'E(y) \\
&= x_1'(X_1'X_1)^{-1}X_1'E(X_1\beta_1 + X_2\beta_2 + \varepsilon) \\
&= x_1'(X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2) \\
&= x_1'\beta_1 + x_1'(X_1'X_1)^{-1}X_1'X_2\beta_2 \\
&= x_1'\beta_1 + x_1'\theta.
\end{aligned}$$
Thus $\hat y_1$ is a biased predictor of $y$. It is unbiased when $X_1'X_2 = 0$. The MSE of the predictor is
$$MSE(\hat y_1) = \sigma^2\left[1 + x_1'(X_1'X_1)^{-1}x_1\right] + \left(x_1'\theta - x_2'\beta_2\right)^2.$$
Also,
$$MSE(\hat y_1) \le Var(\hat y)$$
provided $V(\hat\beta_2) - \beta_2\beta_2'$ is positive semidefinite.
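The two consequences derived above, bias in $b_1$ and overestimation of $\sigma^2$, can be checked numerically. The following is a small simulation sketch (not part of the notes); the coefficient values, the sample size, and the 0.7 cross-dependence between the included and omitted variables are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 500, 1.0
beta1, beta2 = 2.0, 3.0          # coefficients of the included (x1) and omitted (x2) variables

reps = 2000
b1_draws, s2_draws = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)     # x2 correlated with x1, so X1'X2 != 0
    y = beta1 * x1 + beta2 * x2 + rng.normal(scale=sigma, size=n)
    # OLS on the misspecified model that drops x2: b1 = (X1'X1)^{-1} X1'y
    b1 = (x1 @ y) / (x1 @ x1)
    e = y - b1 * x1
    s2 = (e @ e) / (n - 1)                 # residual mean square with r = 1
    b1_draws.append(b1)
    s2_draws.append(s2)

bias = np.mean(b1_draws) - beta1
print(round(bias, 1))                      # close to theta = 0.7 * beta2 = 2.1, not 0
print(np.mean(s2_draws) > sigma**2)        # s^2 overestimates sigma^2 -> True
```

The empirical bias matches $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$, and the residual mean square is far above the true $\sigma^2 = 1$ because the omitted signal is absorbed into the residuals.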
2. Inclusion of irrelevant variables

Sometimes, due to enthusiasm and to make the model more realistic, the analyst may include some explanatory variables that are not very relevant to the model. Such variables may contribute very little to the explanatory power of the model. This tends to reduce the degrees of freedom $(n-k)$, and consequently the validity of the inference drawn may be questionable. For example, the value of the coefficient of determination will increase, indicating that the model is getting better, which may not really be true.

Let the true model be
$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2 I$$
which comprises $k$ explanatory variables. Suppose now $r$ additional explanatory variables are added to the model, and the resulting model becomes
$$y = X\beta + Z\gamma + \delta$$
where $Z$ is an $n \times r$ matrix of $n$ observations on each of the $r$ additional explanatory variables, $\gamma$ is the $r \times 1$ vector of regression coefficients associated with $Z$, and $\delta$ is the disturbance term. This model is termed the false model.

Applying OLS to the false model, we get the normal equations
$$\begin{pmatrix} X'X & X'Z \\ Z'X & Z'Z \end{pmatrix}\begin{pmatrix} b_F \\ c_F \end{pmatrix} = \begin{pmatrix} X'y \\ Z'y \end{pmatrix}$$
or
$$X'Xb_F + X'Zc_F = X'y \qquad (1)$$
$$Z'Xb_F + Z'Zc_F = Z'y \qquad (2)$$
where $b_F$ and $c_F$ are the OLSEs of $\beta$ and $\gamma$ respectively.

Premultiplying equation (2) by $X'Z(Z'Z)^{-1}$, we get
$$X'Z(Z'Z)^{-1}Z'Xb_F + X'Zc_F = X'Z(Z'Z)^{-1}Z'y. \qquad (3)$$
Subtracting equation (3) from equation (1), we get
$$\left[X'X - X'Z(Z'Z)^{-1}Z'X\right]b_F = X'y - X'Z(Z'Z)^{-1}Z'y$$
$$X'\left[I - Z(Z'Z)^{-1}Z'\right]Xb_F = X'\left[I - Z(Z'Z)^{-1}Z'\right]y$$
$$b_F = (X'H_ZX)^{-1}X'H_Zy$$
where $H_Z = I - Z(Z'Z)^{-1}Z'$.
The estimation error of $b_F$ is
$$\begin{aligned}
b_F - \beta &= (X'H_ZX)^{-1}X'H_Zy - \beta \\
&= (X'H_ZX)^{-1}X'H_Z(X\beta + \varepsilon) - \beta \\
&= (X'H_ZX)^{-1}X'H_Z\varepsilon.
\end{aligned}$$
Thus
$$E(b_F - \beta) = (X'H_ZX)^{-1}X'H_ZE(\varepsilon) = 0,$$
so $b_F$ is unbiased even when some irrelevant variables are added to the model.

The covariance matrix is
$$\begin{aligned}
V(b_F) &= E(b_F - \beta)(b_F - \beta)' \\
&= E\left[(X'H_ZX)^{-1}X'H_Z\varepsilon\varepsilon'H_ZX(X'H_ZX)^{-1}\right] \\
&= \sigma^2(X'H_ZX)^{-1}X'H_ZIH_ZX(X'H_ZX)^{-1} \\
&= \sigma^2(X'H_ZX)^{-1}.
\end{aligned}$$
If OLS is applied to the true model, then
$$b_T = (X'X)^{-1}X'y$$
with $E(b_T) = \beta$ and $V(b_T) = \sigma^2(X'X)^{-1}$.

To compare $b_F$ and $b_T$, we use the following result.

Result: If $A$ and $B$ are two positive definite matrices, then $A^{-1} - B^{-1}$ is at least positive semidefinite if $B - A$ is also at least positive semidefinite.

Let
$$A = X'H_ZX, \qquad B = X'X$$
$$B - A = X'X - X'H_ZX = X'X - X'X + X'Z(Z'Z)^{-1}Z'X = X'Z(Z'Z)^{-1}Z'X$$
which is at least a positive semidefinite matrix. This implies that the efficiency declines unless $X'Z = 0$. If $X'Z = 0$, i.e., $X$ and $Z$ are orthogonal, then both are equally efficient.
The residual sum of squares under the false model is
$$SS_{res} = e'e$$
where
$$e = y - Xb_F - Zc_F, \qquad b_F = (X'H_ZX)^{-1}X'H_Zy,$$
$$c_F = (Z'Z)^{-1}Z'(y - Xb_F) = (Z'Z)^{-1}Z'\left[I - X(X'H_ZX)^{-1}X'H_Z\right]y.$$
So
$$\begin{aligned}
e &= y - X(X'H_ZX)^{-1}X'H_Zy - Z(Z'Z)^{-1}Z'\left[I - X(X'H_ZX)^{-1}X'H_Z\right]y \\
&= \left[I - Z(Z'Z)^{-1}Z'\right]y - \left[I - Z(Z'Z)^{-1}Z'\right]X(X'H_ZX)^{-1}X'H_Zy \\
&= H_Zy - H_ZX(X'H_ZX)^{-1}X'H_Zy \\
&= H_Z^{*}y
\end{aligned}$$
where $H_Z^{*} = H_Z - H_ZX(X'H_ZX)^{-1}X'H_Z$, and $H_Z = I - Z(Z'Z)^{-1}Z'$ is symmetric and idempotent; it follows that $H_Z^{*}$ is also idempotent. Thus
$$SS_{res} = e'e = y'H_Z^{*}y$$
and, since $H_Z^{*}X = H_ZX - H_ZX(X'H_ZX)^{-1}X'H_ZX = 0$ and
$$\mathrm{tr}(H_Z^{*}) = \mathrm{tr}(H_Z) - \mathrm{tr}\left[(X'H_ZX)^{-1}X'H_ZH_ZX\right] = (n - r) - k,$$
we get
$$E(SS_{res}) = \sigma^2\,\mathrm{tr}(H_Z^{*}) = \sigma^2(n - k - r).$$
So
$$E\left(\frac{SS_{res}}{n - k - r}\right) = \sigma^2,$$
i.e., $SS_{res}/(n-k-r)$ is an unbiased estimator of $\sigma^2$.
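The conclusions of this section, $b_F$ stays unbiased but its variance is inflated whenever $X'Z \neq 0$, can be illustrated with a simulation sketch (not part of the notes); the true coefficients, sample size, and the 0.8 dependence of the irrelevant variable on $x_1$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 1.0
beta = np.array([1.5, -2.0])             # the true model has only x1 and x2

reps = 3000
bT, bF = [], []
for _ in range(reps):
    X = rng.normal(size=(n, 2))
    Z = 0.8 * X[:, [0]] + rng.normal(size=(n, 1))  # irrelevant variable, correlated with x1
    y = X @ beta + rng.normal(scale=sigma, size=n)
    bT.append(np.linalg.lstsq(X, y, rcond=None)[0])                      # true model
    bF.append(np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)[0][:2])  # false model

bT, bF = np.array(bT), np.array(bF)
print(np.allclose(bF.mean(axis=0), beta, atol=0.02))  # both estimators unbiased -> True
print(bF.var(axis=0)[0] > bT.var(axis=0)[0])          # variance of b_F inflated for x1 -> True
```

The variance inflation appears only in the coefficient of $x_1$, the column correlated with $Z$; the coefficient of $x_2$ is essentially unaffected, which is the $X'Z = 0$ case of the result above.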
A comparison of exclusion and inclusion of variables is as follows:

                                        Exclusion type              Inclusion type
  Estimation of coefficients            Biased                      Unbiased
  Efficiency                            Generally declines          Declines
  Estimation of disturbance term        Over-estimate               Unbiased
  Conventional tests of hypothesis      Invalid and faulty          Valid, though erroneous
  and confidence regions                inferences

Evaluation of subset regression models

A question arises after the selection of subsets of candidate variables for the model: how to judge which subset yields a better regression model. Various criteria have been proposed in the literature to evaluate and compare the subset regression models.

1. Coefficient of determination

The coefficient of determination is the square of the multiple correlation coefficient between the study variable $y$ and a set of explanatory variables $X_1, X_2, \ldots, X_{p-1}$, denoted as $R_p^2$. Note that $X_{i1} = 1$ for all $i = 1, 2, \ldots, n$, which simply indicates the need for an intercept term in the model, without which the coefficient of determination cannot be used. So essentially, there will be a subset of $(p-1)$ explanatory variables and one intercept term in the notation. The coefficient of determination based on such variables is
$$R_p^2 = \frac{SS_{reg}(p)}{SS_T} = 1 - \frac{SS_{res}(p)}{SS_T}$$
where $SS_{reg}(p)$ and $SS_{res}(p)$ are the sums of squares due to regression and residuals, respectively, in a subset model based on $(p-1)$ explanatory variables. Since there are $k$ explanatory variables available and we select only $(p-1)$ of them, there are $\binom{k}{p-1}$ possible choices of subsets. Each such choice produces one subset model. Moreover, the coefficient of determination has a tendency to increase with an increase in $p$.
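The $R_p^2$ criterion over all subsets can be sketched as follows (not part of the notes; the data-generating coefficients are arbitrary illustrative choices):

```python
import numpy as np
from itertools import combinations

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X must include the intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ b) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)
n, k = 100, 4
Xfull = rng.normal(size=(n, k))
y = 3 * Xfull[:, 0] - 2 * Xfull[:, 1] + rng.normal(size=n)  # only x1 and x2 matter

intercept = np.ones((n, 1))
results = {}
for p_minus_1 in range(1, k + 1):            # all C(k, p-1) subsets for each size
    for cols in combinations(range(k), p_minus_1):
        X = np.hstack([intercept, Xfull[:, cols]])
        results[cols] = r_squared(X, y)

# R^2 never decreases when a subset is enlarged
print(results[(0,)] <= results[(0, 1)] <= results[(0, 1, 2)] <= results[(0, 1, 2, 3)])  # True
```

The printed chain illustrates exactly the tendency described above: along any nested sequence of subsets, $R_p^2$ is non-decreasing, so $R_p^2$ alone cannot decide when to stop adding variables.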
So proceed as follows. Choose an appropriate value of $p$, fit the model, and obtain $R_p^2$. Add one variable, fit the model, and again obtain $R_{p+1}^2$. Obviously $R_{p+1}^2 > R_p^2$. If $R_{p+1}^2 - R_p^2$ is small, then stop and choose that value of $p$ for the subset regression. If $R_{p+1}^2 - R_p^2$ is high, then keep on adding variables up to the point where an additional variable does not produce a large change in the value of $R_p^2$, or the increment in $R_p^2$ becomes small.

To find such a value of $p$, create a plot of $R_p^2$ versus $p$; the curve rises and then flattens. Choose the value of $p$ corresponding to a value of $R_p^2$ where the knee of the curve is clearly seen. Such a choice of $p$ may not be unique among different analysts. Some experience and judgment of the analyst will be helpful in finding an appropriate and satisfactory value of $p$.

To choose a satisfactory value analytically, one solution is a test which can identify the models whose $R_p^2$ does not significantly differ from the $R^2$ based on all the explanatory variables. Let
$$R_0^2 = 1 - \left(1 - R_{k+1}^2\right)\left(1 + d_{\alpha,n,k}\right), \qquad d_{\alpha,n,k} = \frac{k\,F_{\alpha}(k, n-k-1)}{n-k-1} > 0,$$
where $R_{k+1}^2$ is the value of $R^2$ based on all $(k+1)$ terms, i.e., the $k$ explanatory variables and the intercept. A subset with $R_p^2 > R_0^2$ is called an $R^2$-adequate$(\alpha)$ subset.
2. Adjusted coefficient of determination

The adjusted coefficient of determination has certain advantages over the usual coefficient of determination. The adjusted coefficient of determination based on a $p$-term model is
$$R_{adj}^2(p) = 1 - \frac{n-1}{n-p}\left(1 - R_p^2\right).$$
An advantage of $R_{adj}^2(p)$ is that it does not necessarily increase as $p$ increases. If there are $r$ more explanatory variables which are added to a $p$-term model, then
$$R_{adj}^2(p+r) > R_{adj}^2(p)$$
if and only if the partial $F$ statistic for testing the significance of the $r$ additional explanatory variables exceeds 1. So subset selection based on $R_{adj}^2(p)$ can be made along the same lines as with $R_p^2$. In general, the value of $p$ corresponding to the maximum value of $R_{adj}^2(p)$ is chosen for the subset model.

3. Residual mean square

A model is said to have a better fit if the residuals are small. This is reflected in the residual sum of squares $SS_{res}$; a model with smaller $SS_{res}$ is preferable. Based on this, the residual mean square for a $p$-variable subset regression model is defined as
$$MS_{res}(p) = \frac{SS_{res}(p)}{n-p}.$$
So $MS_{res}(p)$ can be used as a criterion for model selection like $SS_{res}$. The $SS_{res}(p)$ decreases with an increase in $p$. So as $p$ increases, $MS_{res}(p)$ initially decreases, then stabilizes, and finally may increase if the reduction in $SS_{res}(p)$ is not sufficient to compensate for the loss of one degree of freedom in the factor $(n-p)$. When $MS_{res}(p)$ is plotted versus $p$, the curve therefore typically falls, flattens, and then turns upward.
So plot $MS_{res}(p)$ versus $p$ and:
- choose $p$ corresponding to the minimum value of $MS_{res}(p)$, or
- choose $p$ corresponding to which $MS_{res}(p)$ is approximately equal to the $MS_{res}$ based on the full model, or
- choose $p$ near the point where the smallest value of $MS_{res}(p)$ turns upward.

Such a minimum value of $MS_{res}(p)$ will produce an $R_{adj}^2(p)$ with maximum value, since
$$\begin{aligned}
R_{adj}^2(p) &= 1 - \frac{n-1}{n-p}\left(1 - R_p^2\right) \\
&= 1 - \frac{n-1}{n-p}\cdot\frac{SS_{res}(p)}{SS_T} \\
&= 1 - \frac{(n-1)\,MS_{res}(p)}{SS_T}.
\end{aligned}$$
Thus the two criteria, viz., minimum $MS_{res}(p)$ and maximum $R_{adj}^2(p)$, are equivalent.

4. Mallows's $C_p$ statistic

Mallows's $C_p$ criterion is based on the mean squared error of a fitted value. Consider the model
$$y = X\beta + \varepsilon$$
with partitioned $X = (X_1, X_2)$, where $X_1$ is an $n \times p$ matrix and $X_2$ is an $n \times q$ matrix, so that
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2 I$$
where $\beta = (\beta_1', \beta_2')'$.

Consider the reduced model
$$y = X_1\beta_1 + \delta, \qquad E(\delta) = 0, \qquad V(\delta) = \sigma^2 I$$
and predict $y$ based on the subset model as $\hat y = X_1\hat\beta_1$, where $\hat\beta_1 = (X_1'X_1)^{-1}X_1'y$.
The prediction of $y$ can also be seen as the estimation of $E(y) = X\beta$, so the expected squared error loss of $\hat y$ is given by
$$\Gamma_p = E\left[(X_1\hat\beta_1 - X\beta)'(X_1\hat\beta_1 - X\beta)\right].$$
So the subset model can be considered an appropriate model if $\Gamma_p$ is small. Since $H_1 = X_1(X_1'X_1)^{-1}X_1'$ gives $X_1\hat\beta_1 = H_1y$, we have
$$\Gamma_p = E(y'H_1y) - 2\beta'X'H_1X\beta + \beta'X'X\beta$$
where
$$\begin{aligned}
E(y'H_1y) &= E\left[(X\beta + \varepsilon)'H_1(X\beta + \varepsilon)\right] \\
&= E\left[\beta'X'H_1X\beta + \beta'X'H_1\varepsilon + \varepsilon'H_1X\beta + \varepsilon'H_1\varepsilon\right] \\
&= \beta'X'H_1X\beta + \sigma^2\,\mathrm{tr}(H_1) \\
&= \beta'X'H_1X\beta + \sigma^2 p.
\end{aligned}$$
Thus
$$\begin{aligned}
\Gamma_p &= \sigma^2 p + \beta'X'X\beta - \beta'X'H_1X\beta \\
&= \sigma^2 p + \beta'X'(I - H_1)X\beta \\
&= \sigma^2 p + \beta'X'\bar H_1X\beta
\end{aligned}$$
where $\bar H_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Since
$$E(y'\bar H_1y) = \beta'X'\bar H_1X\beta + \sigma^2(n - p),$$
we have $\beta'X'\bar H_1X\beta = E(y'\bar H_1y) - (n - p)\sigma^2$, and thus
$$\frac{\Gamma_p}{\sigma^2} = p - (n - p) + \frac{E(y'\bar H_1y)}{\sigma^2}.$$
Note that $\Gamma_p$ depends on $\beta$ and $\sigma^2$, which are unknown, so $\Gamma_p$ cannot be used in practice. A solution to this problem is to replace the unknown quantities by their respective estimators, which gives
$$\hat\Gamma_p = \hat\sigma^2(2p - n) + SS_{res}(p)$$
where $SS_{res}(p) = y'\bar H_1y$ is the residual sum of squares based on the subset model.
A scaled version of $\hat\Gamma_p$ is
$$C_p = \frac{SS_{res}(p)}{\hat\sigma^2} - (n - 2p)$$
which is the Mallows's $C_p$ statistic for the subset model $y = X_1\beta_1 + \delta$. Usually
$$b = (X'X)^{-1}X'y, \qquad \hat\sigma^2 = \frac{(y - Xb)'(y - Xb)}{n - p - q}$$
are used to estimate $\beta$ and $\sigma^2$ respectively, and these are based on the full model.

When different subset models are considered, then the models with smaller $C_p$ are considered to be better than those models with higher $C_p$. So a lower $C_p$ is preferable.

If the subset model has negligible bias (in the case of $b$, the bias is zero), then
$$E\left[SS_{res}(p)\right] = (n - p)\sigma^2$$
and
$$E\left(C_p \mid \text{Bias} = 0\right) = \frac{(n - p)\sigma^2}{\sigma^2} - (n - 2p) = p.$$
So plot $C_p$ versus $p$ for each regression equation; the reference line $C_p = p$ is a straight line passing through the origin. Those points which have small bias will be near the line, and those points with significant bias will lie above the line. For example, a point A with little bias lies close to the line, whereas points B and C with substantial bias lie above the line.
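The $C_p$ statistic above can be sketched as follows (not part of the notes; the data-generating coefficients are arbitrary illustrative choices). A useful sanity check is that, by construction, the full model always has $C_p = p$ exactly when $\hat\sigma^2$ is the full-model residual mean square:

```python
import numpy as np

def cp_mallows(X1, y, sigma2_hat):
    """Mallows's C_p = SS_res(p)/sigma2_hat - (n - 2p) for a subset design X1 (with intercept)."""
    n, p = X1.shape
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    ss_res = np.sum((y - X1 @ b) ** 2)
    return ss_res / sigma2_hat - (n - 2 * p)

rng = np.random.default_rng(4)
n, k = 120, 5
Xfull = rng.normal(size=(n, k))
y = 1.0 + 2.0 * Xfull[:, 0] - 1.5 * Xfull[:, 1] + rng.normal(size=n)  # only x1, x2 relevant

ones = np.ones((n, 1))
Xf = np.hstack([ones, Xfull])
bf, *_ = np.linalg.lstsq(Xf, y, rcond=None)
sigma2_hat = np.sum((y - Xf @ bf) ** 2) / (n - k - 1)   # full-model estimate of sigma^2

print(round(cp_mallows(Xf, y, sigma2_hat), 6))           # 6.0 = p for the full model
cp_good = cp_mallows(np.hstack([ones, Xfull[:, :2]]), y, sigma2_hat)  # true subset
cp_bad = cp_mallows(np.hstack([ones, Xfull[:, 2:]]), y, sigma2_hat)   # omits both signals
print(cp_good < cp_bad)                                  # biased subset lies far above -> True
```

The unbiased subset lands near the $C_p = p$ line, while the subset that omits the relevant variables has a very large $C_p$, matching the points-above-the-line picture described in the text.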
Moreover, a point nearer the line represents a model with lower total error. It may be preferable to accept some bias in the regression equation in order to reduce the average prediction error.

Note that an unbiased estimator of $\sigma^2$ is used in $C_p$, which is based on the assumption that the full model has negligible bias. In case the full model contains non-significant explanatory variables with zero regression coefficients, the same estimator of $\sigma^2$ will overestimate $\sigma^2$, and then $C_p$ will have smaller values. So the working of $C_p$ depends on a good choice of the estimator of $\sigma^2$.

5. Akaike's information criterion (AIC)

The Akaike's information criterion statistic is given as
$$AIC_p = n\ln\left(\frac{SS_{res}(p)}{n}\right) + 2p$$
where $SS_{res}(p) = y'\bar H_1y = y'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]y$ is based on the subset model $y = X_1\beta_1 + \delta$ derived from the full model $y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$.

In general, AIC is defined as
$$AIC = -2(\text{maximized log likelihood}) + 2(\text{number of parameters}).$$
In the linear regression model with $\varepsilon \sim N(0, \sigma^2 I)$, the likelihood function is
$$L(y; \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}\right]$$
and the log likelihood is
$$\ln L(y; \beta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}.$$
The log-likelihood is maximized at
$$\tilde\beta = (X'X)^{-1}X'y, \qquad \tilde\sigma^2 = \frac{n-p}{n}\hat\sigma^2$$
where $\tilde\beta$ is the maximum likelihood estimate of $\beta$, which is the same as the OLSE; $\tilde\sigma^2$ is the maximum likelihood estimate of $\sigma^2$; and $\hat\sigma^2$ is the OLS-based estimate of $\sigma^2$.
So
$$AIC = -2\ln L(y; \tilde\beta, \tilde\sigma^2) + 2p = n\ln\left(\frac{SS_{res}}{n}\right) + 2p + n\left[\ln(2\pi) + 1\right]$$
where $SS_{res} = y'\left[I - X(X'X)^{-1}X'\right]y$. The term $n[\ln(2\pi) + 1]$ remains the same for all models under comparison if the same observations $y$ are compared, so it is irrelevant for AIC.

6. Bayesian information criterion (BIC)

Similar to AIC, the Bayesian information criterion is based on maximizing the posterior distribution of the model given the observations $y$. In the case of the linear regression model, it is defined as
$$BIC_p = n\ln\left(\frac{SS_{res}(p)}{n}\right) + p\ln n.$$
A model with a smaller value of BIC is preferable.

7. PRESS statistic

Since the residuals and the residual sum of squares act as a criterion for subset model selection, the PRESS residuals and the prediction sum of squares can similarly be used as a basis for subset model selection. The usual residuals and the PRESS residuals have their own characteristics which are used in regression modelling. The PRESS statistic based on a subset model with $p$ explanatory variables is given by
$$PRESS_p = \sum_{i=1}^{n}\left(y_i - \hat y_{(i)}\right)^2 = \sum_{i=1}^{n}\left(\frac{e_i}{1 - h_{ii}}\right)^2$$
where $\hat y_{(i)}$ is the fitted value of $y_i$ from the model estimated without the $i$th observation, and $h_{ii}$ is the $i$th diagonal element of $H = X_1(X_1'X_1)^{-1}X_1'$. This criterion is used along similar lines as in the case of $SS_{res}(p)$. A subset regression model with a smaller value of $PRESS_p$ is preferable.
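The equality between the leave-one-out definition of PRESS and the hat-matrix shortcut $\sum_i (e_i/(1-h_{ii}))^2$ can be verified directly. The following is a sketch on simulated data (not part of the notes); the design and coefficients are arbitrary illustrative choices:

```python
import numpy as np

def press(X, y):
    """PRESS via the shortcut sum((e_i / (1 - h_ii))^2), with H = X (X'X)^{-1} X'."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return np.sum((e / (1 - np.diag(H))) ** 2)

def press_loo(X, y):
    """PRESS computed the slow way, refitting with each observation left out."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        total += (y[i] - X[i] @ b) ** 2
    return total

rng = np.random.default_rng(5)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
print(np.isclose(press(X, y), press_loo(X, y)))  # True: shortcut equals explicit leave-one-out
```

The shortcut costs one fit instead of $n$ fits, which is why PRESS is practical as a subset-selection criterion even when many candidate subsets are scanned.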
Partial F statistic

The partial $F$ statistic is used to test a hypothesis about a subvector of the regression coefficients. Consider the model
$$y = X\beta + \varepsilon$$
where $y$ is $n \times 1$, $X$ is $n \times p$, $\beta$ is $p \times 1$, and $p = k + 1$, so the model includes an intercept term and $k$ explanatory variables. Suppose a subset of $r < k$ explanatory variables is to be examined for whether it contributes significantly to the regression model. So partition
$$X = \begin{pmatrix} X_1 & X_2 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$$
where $X_1$ and $X_2$ are matrices of orders $n \times (p-r)$ and $n \times r$ respectively, and $\beta_1$ and $\beta_2$ are vectors of orders $(p-r)$ and $r$ respectively. The objective is to test the null hypothesis
$$H_0: \beta_2 = 0 \quad \text{against} \quad H_1: \beta_2 \neq 0.$$
Then
$$y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
is the full model, and the application of least squares gives the OLSE of $\beta$ as
$$b = (X'X)^{-1}X'y.$$
The corresponding sum of squares due to regression with $p$ degrees of freedom is
$$SS_{reg} = b'X'y,$$
the sum of squares due to residuals with $(n-p)$ degrees of freedom is
$$SS_{res} = y'y - b'X'y,$$
and
$$MS_{res} = \frac{y'y - b'X'y}{n-p}$$
is the mean square due to residuals.

The contribution of the explanatory variables in $\beta_2$ to the regression can be found by considering the full model under $H_0: \beta_2 = 0$. Assume that $H_0: \beta_2 = 0$ is true; then the full model becomes
$$y = X_1\beta_1 + \delta, \qquad E(\delta) = 0, \qquad V(\delta) = \sigma^2 I$$
which is the reduced model. Application of least squares to the reduced model yields the OLSE of $\beta_1$ as
$$b_1 = (X_1'X_1)^{-1}X_1'y$$
and the corresponding sum of squares due to regression with $(p-r)$ degrees of freedom is
$$SS_{reg}(\beta_1) = b_1'X_1'y.$$
The sum of squares of regression due to $\beta_2$, given that $\beta_1$ is already in the model, can be found by
$$SS_{reg}(\beta_2 \mid \beta_1) = SS_{reg}(\beta) - SS_{reg}(\beta_1)$$
where $SS_{reg}(\beta)$ and $SS_{reg}(\beta_1)$ are the sums of squares due to regression with all the explanatory variables corresponding to $\beta$ in the model, and with only the explanatory variables corresponding to $\beta_1$ in the model, respectively. The term $SS_{reg}(\beta_2 \mid \beta_1)$ is called the extra sum of squares due to $\beta_2$ and has $p - (p-r) = r$ degrees of freedom. It is independent of $MS_{res}$ and is a measure of the regression sum of squares that results from adding the explanatory variables $X_{k-r+1}, \ldots, X_k$ to a model already containing $X_1, X_2, \ldots, X_{k-r}$.

The null hypothesis $H_0: \beta_2 = 0$ can be tested using the statistic
$$F_0 = \frac{SS_{reg}(\beta_2 \mid \beta_1)/r}{MS_{res}}$$
which follows an $F$ distribution with $r$ and $(n-p)$ degrees of freedom under $H_0$. The decision rule is to reject $H_0$ whenever $F_0 > F_\alpha(r, n-p)$. This is known as the partial $F$ test. It measures the contribution of the explanatory variables in $X_2$ given that the other explanatory variables in $X_1$ are already in the model.

Computational techniques for variable selection

In order to select a subset model, several techniques based on computational procedures and algorithms are available. They are essentially based on two ideas: use all possible explanatory variables, or select the explanatory variables stepwise.
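Before turning to these procedures, the partial $F$ test above can be sketched as follows (not part of the notes; the data and coefficients are arbitrary illustrative choices):

```python
import numpy as np

def partial_F(X1, X2, y):
    """Partial F for H0: beta2 = 0 in y = X1 b1 + X2 b2 + e (X1 carries the intercept)."""
    X = np.hstack([X1, X2])
    n, p = X.shape
    r = X2.shape[1]

    def ss_res(M):
        b, *_ = np.linalg.lstsq(M, y, rcond=None)
        return np.sum((y - M @ b) ** 2)

    extra_ss = ss_res(X1) - ss_res(X)      # SS_reg(beta2 | beta1), r degrees of freedom
    ms_res = ss_res(X) / (n - p)           # full-model residual mean square
    return (extra_ss / r) / ms_res

rng = np.random.default_rng(6)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2_relevant = rng.normal(size=(n, 2))
X2_noise = rng.normal(size=(n, 2))
y = X1 @ np.array([1.0, 2.0]) + X2_relevant @ np.array([3.0, -2.0]) + rng.normal(size=n)

print(partial_F(X1, X2_relevant, y) > partial_F(X1, X2_noise, y))  # True
```

The relevant pair produces a very large partial $F$, while the pure-noise pair gives a value near 1, which is exactly the behaviour the decision rule $F_0 > F_\alpha(r, n-p)$ exploits.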
1. Use all possible explanatory variables

This methodology is based on the following steps:
- Fit a model with one explanatory variable.
- Fit a model with two explanatory variables.
- Fit a model with three explanatory variables, and so on.
Choose a suitable criterion for model selection and evaluate each of the fitted regression equations with the selection criterion. The total number of models to be fitted rises sharply with an increase in $k$. So such models can only be evaluated using a model selection criterion with the help of an efficient computational algorithm on computers.

2. Stepwise regression techniques

This methodology is based on choosing the explanatory variables in the subset model in steps, which can mean either adding one variable at a time or deleting one variable at a time. Based on this, there are three procedures:
- forward selection,
- backward elimination, and
- stepwise regression.
These procedures are basically computer-intensive procedures and are executed using software.

Forward selection procedure

This methodology assumes that there is no explanatory variable in the model except an intercept term. It adds variables one by one and tests the fitted model at each step using some suitable criterion. It has the following steps.

Consider only the intercept term and insert one variable at a time. Calculate the simple correlations of the $x_i$'s $(i = 1, 2, \ldots, k)$ with $y$. Choose the $x_i$ which has the largest correlation with $y$. Suppose $x_1$ is the variable which has the highest correlation with $y$. Since the $F$ statistic for testing the significance of regression is
$$F_0 = \frac{n-k-1}{k}\cdot\frac{R^2}{1-R^2},$$
$x_1$ will produce the largest value of $F_0$ in testing the significance of regression.
Choose a prespecified value of $F$, say $F_{IN}$ ($F$-to-enter). If $F_0 > F_{IN}$, then accept $x_1$, and so $x_1$ enters the model.

Adjust the effect of $x_1$ on $y$ and re-compute the correlations of the remaining $x_i$'s with $y$, i.e., obtain the partial correlations as follows:
- Fit the regression $\hat y = \hat\beta_0 + \hat\beta_1 x_1$ and obtain the residuals.
- Fit the regressions of the other candidate explanatory variables on $x_1$, i.e., $\hat x_j = \hat\alpha_{0j} + \hat\alpha_{1j}x_1$, $j = 2, 3, \ldots, k$, and obtain the residuals.
- Find the simple correlation between the two sets of residuals.
- This gives the partial correlations.

Choose the $x_i$ with the second largest correlation with $y$, i.e., the variable with the highest value of partial correlation with $y$. Suppose this variable is $x_2$. Then the largest partial $F$ statistic is
$$F = \frac{SS_{reg}(x_2 \mid x_1)}{MS_{res}(x_1, x_2)}.$$
If $F > F_{IN}$, then $x_2$ enters the model.

These steps are repeated. At each step, the partial correlations are computed, and the explanatory variable corresponding to the highest partial correlation with $y$ is chosen to be added to the model. Equivalently, the partial $F$-statistics are calculated, and the largest $F$ statistic, given the other explanatory variables already in the model, is chosen. The corresponding explanatory variable is added to the model if its partial $F$-statistic exceeds $F_{IN}$. Continue with such selection until either, at a particular step, the partial $F$ statistic does not exceed $F_{IN}$, or the last explanatory variable is added to the model.

Note: The SAS software chooses $F_{IN}$ by choosing a type I error rate $\alpha$, so that the explanatory variable with the highest partial correlation coefficient with $y$ is added to the model if its partial $F$ statistic exceeds $F_\alpha(1, n-p)$.
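The forward selection steps above can be sketched as follows (not part of the notes; the data, the number of candidates, and $F_{IN} = 4$ are illustrative choices). The candidate with the largest partial $F$ is added at each step while that $F$ exceeds $F_{IN}$:

```python
import numpy as np

def forward_selection(Xcand, y, F_in=4.0):
    """Forward selection sketch: start from the intercept; at each step add the candidate
    with the largest partial F (given the current model) while that F exceeds F_in."""
    n, k = Xcand.shape
    selected = []

    def ss_res(cols):
        M = np.hstack([np.ones((n, 1)), Xcand[:, cols]]) if cols else np.ones((n, 1))
        b, *_ = np.linalg.lstsq(M, y, rcond=None)
        return np.sum((y - M @ b) ** 2)

    while len(selected) < k:
        base = ss_res(selected)
        best_F, best_j = -np.inf, None
        for j in range(k):
            if j in selected:
                continue
            trial = selected + [j]
            p = len(trial) + 1                       # parameters including the intercept
            F = (base - ss_res(trial)) / (ss_res(trial) / (n - p))
            if F > best_F:
                best_F, best_j = F, j
        if best_F <= F_in:
            break
        selected.append(best_j)
    return sorted(selected)

rng = np.random.default_rng(7)
n = 150
Xcand = rng.normal(size=(n, 5))
y = 4 * Xcand[:, 1] - 3 * Xcand[:, 3] + rng.normal(size=n)
# columns 1 and 3 carry the signal and are always selected;
# an occasional noise column can also pass F_in by chance
print(forward_selection(Xcand, y))
```

Note the stepwise nature: once a variable is in, this procedure never reconsiders it, which is one of the limitations listed in the general comments below.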
Backward elimination procedure

This methodology is contrary to the forward selection procedure. The forward selection procedure starts with no explanatory variable in the model and keeps on adding one variable at a time until a suitable model is obtained. The backward elimination methodology begins with all the explanatory variables and keeps on deleting one variable at a time until a suitable model is obtained. It is based on the following steps:
- Consider all $k$ explanatory variables and fit the model.
- Compute the partial $F$ statistic for each explanatory variable as if it were the last variable to enter the model.
- Choose a preselected value $F_{OUT}$ ($F$-to-remove).
- Compare the smallest of the partial $F$ statistics with $F_{OUT}$. If it is less than $F_{OUT}$, then remove the corresponding explanatory variable from the model. The model will now have $(k-1)$ explanatory variables.
- Fit the model with these $(k-1)$ explanatory variables, compute the partial $F$ statistics for the new model, and compare the smallest of them with $F_{OUT}$. If it is less than $F_{OUT}$, then remove the corresponding variable from the model.
- Repeat this procedure. Stop when the smallest partial $F$ statistic exceeds $F_{OUT}$.

Stepwise regression procedure

A combination of the forward selection and backward elimination procedures is stepwise regression. It is a modification of the forward selection procedure and has the following steps.

Consider all the explanatory variables entered into the model at the previous step. Add a new variable and reassess the variables already in the model via their partial $F$ statistics. An explanatory variable that was added at an earlier step may now become insignificant due to its relationship with the explanatory variables currently present in the model.
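The backward elimination steps above can be sketched as follows (not part of the notes; the data and $F_{OUT} = 4$ are illustrative choices). Each variable's partial $F$ is computed as if it were the last to enter, and the weakest variable is dropped while its $F$ is below $F_{OUT}$:

```python
import numpy as np

def backward_elimination(Xcand, y, F_out=4.0):
    """Backward elimination sketch: start with all candidates; repeatedly drop the variable
    with the smallest partial F (as if it were the last to enter) while that F < F_out."""
    n, k = Xcand.shape
    selected = list(range(k))

    def ss_res(cols):
        M = np.hstack([np.ones((n, 1)), Xcand[:, cols]]) if cols else np.ones((n, 1))
        b, *_ = np.linalg.lstsq(M, y, rcond=None)
        return np.sum((y - M @ b) ** 2)

    while selected:
        p = len(selected) + 1
        full = ss_res(selected)
        ms_res = full / (n - p)
        # partial F for each variable, treating it as the last one to enter
        Fs = {j: (ss_res([c for c in selected if c != j]) - full) / ms_res for j in selected}
        j_min = min(Fs, key=Fs.get)
        if Fs[j_min] >= F_out:
            break
        selected.remove(j_min)
    return sorted(selected)

rng = np.random.default_rng(8)
n = 150
Xcand = rng.normal(size=(n, 5))
y = 4 * Xcand[:, 1] - 3 * Xcand[:, 3] + rng.normal(size=n)
# columns 1 and 3 carry the signal and always survive;
# an occasional noise column may also survive by chance
print(backward_elimination(Xcand, y))
```

Note the mirror-image limitation: once a variable is removed, this procedure never reconsiders it.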
If the partial $F$-statistic for an explanatory variable is smaller than $F_{OUT}$, then this variable is deleted from the model.

Stepwise regression needs two cut-off values, $F_{IN}$ and $F_{OUT}$. Sometimes $F_{IN} = F_{OUT}$ or $F_{IN} > F_{OUT}$ are considered. The choice $F_{IN} > F_{OUT}$ makes it relatively more difficult to add an explanatory variable than to delete one.

General comments:
1. None of the methods among forward selection, backward elimination, or stepwise regression guarantees the best subset model.
2. The order in which the explanatory variables enter or leave the model does not indicate the order of importance of the explanatory variables.
3. In forward selection, no explanatory variable can be removed once it has entered the model. Similarly, in backward elimination, no explanatory variable can be added back once it has been removed from the model.
4. All the procedures may lead to different models.
5. Different model selection criteria may give different subset models.

Comments about stopping rules:
The choice of $F_{IN}$ and/or $F_{OUT}$ provides the stopping rules for the algorithms. Some computer software allows the analyst to specify these values directly. Some algorithms require type I errors to generate $F_{IN}$ and/or $F_{OUT}$. Sometimes, taking $\alpha$ as the level of significance can be misleading, because several correlated partial $F$ variables are considered at each step and the maximum among them is examined. Some analysts prefer small values of $F_{IN}$ and $F_{OUT}$, whereas some prefer extreme values. A popular choice is $F_{IN} = F_{OUT} = 4$, which roughly corresponds to the 5% level of significance of the $F$ distribution.
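The full stepwise procedure, combining the forward add step (against $F_{IN}$) with a re-check of already-entered variables (against $F_{OUT}$), can be sketched as follows (not part of the notes; the popular cut-offs $F_{IN} = F_{OUT} = 4$ and the data are illustrative choices, and the iteration count is bounded to guard against oscillation):

```python
import numpy as np

def stepwise(Xcand, y, F_in=4.0, F_out=4.0):
    """Stepwise regression sketch: add the best candidate if its partial F exceeds F_in,
    then drop any selected variable whose partial F has fallen below F_out."""
    n, k = Xcand.shape
    selected = []

    def ss_res(cols):
        M = np.hstack([np.ones((n, 1)), Xcand[:, cols]]) if cols else np.ones((n, 1))
        b, *_ = np.linalg.lstsq(M, y, rcond=None)
        return np.sum((y - M @ b) ** 2)

    def partial_F(cols, j):
        without = [c for c in cols if c != j]
        p = len(cols) + 1
        return (ss_res(without) - ss_res(cols)) / (ss_res(cols) / (n - p))

    for _ in range(2 * k + 2):                # bounded, in case of add/drop oscillation
        changed = False
        remaining = [j for j in range(k) if j not in selected]
        if remaining:                         # forward step
            best = max(remaining, key=lambda j: partial_F(selected + [j], j))
            if partial_F(selected + [best], best) > F_in:
                selected.append(best)
                changed = True
        for j in list(selected):              # backward re-check of entered variables
            if partial_F(selected, j) < F_out:
                selected.remove(j)
                changed = True
        if not changed:
            break
    return sorted(selected)

rng = np.random.default_rng(9)
n = 150
Xcand = rng.normal(size=(n, 4))
Xcand[:, 2] = 0.9 * Xcand[:, 0] + 0.1 * rng.normal(size=n)  # x3 nearly duplicates x1
y = 3 * Xcand[:, 0] + 2 * Xcand[:, 1] + rng.normal(size=n)
print(stepwise(Xcand, y))
```

With the near-duplicate column in the pool, a variable entered early can lose its partial $F$ once its proxy enters, and the backward re-check removes it; this is exactly the behaviour that distinguishes stepwise regression from pure forward selection.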
More informationCHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules
CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is
More informationHotelling s Two- Sample T 2
Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test
More informationSpectral Analysis by Stationary Time Series Modeling
Chater 6 Sectral Analysis by Stationary Time Series Modeling Choosing a arametric model among all the existing models is by itself a difficult roblem. Generally, this is a riori information about the signal
More informationarxiv: v1 [physics.data-an] 26 Oct 2012
Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch
More information8 STOCHASTIC PROCESSES
8 STOCHASTIC PROCESSES The word stochastic is derived from the Greek στoχαστικoς, meaning to aim at a target. Stochastic rocesses involve state which changes in a random way. A Markov rocess is a articular
More informationSTK4900/ Lecture 7. Program
STK4900/9900 - Lecture 7 Program 1. Logistic regression with one redictor 2. Maximum likelihood estimation 3. Logistic regression with several redictors 4. Deviance and likelihood ratio tests 5. A comment
More informationOn split sample and randomized confidence intervals for binomial proportions
On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have
More informationPerformance of lag length selection criteria in three different situations
MPRA Munich Personal RePEc Archive Performance of lag length selection criteria in three different situations Zahid Asghar and Irum Abid Quaid-i-Azam University, Islamabad Aril 2007 Online at htts://mra.ub.uni-muenchen.de/40042/
More informationMorten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014
Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get
More informationFeedback-error control
Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationCombining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)
Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment
More informationStatics and dynamics: some elementary concepts
1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and
More information7.2 Inference for comparing means of two populations where the samples are independent
Objectives 7.2 Inference for comaring means of two oulations where the samles are indeendent Two-samle t significance test (we give three examles) Two-samle t confidence interval htt://onlinestatbook.com/2/tests_of_means/difference_means.ht
More informationChapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population
Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection
More informationute measures of uncertainty called standard errors for these b j estimates and the resulting forecasts if certain conditions are satis- ed. Note the e
Regression with Time Series Errors David A. Dickey, North Carolina State University Abstract: The basic assumtions of regression are reviewed. Grahical and statistical methods for checking the assumtions
More information16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes
Infinite Series 6. Introduction We extend the concet of a finite series, met in section, to the situation in which the number of terms increase without bound. We define what is meant by an infinite series
More informationAI*IA 2003 Fusion of Multiple Pattern Classifiers PART III
AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers
More informationEstimating Time-Series Models
Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary
More information2. Sample representativeness. That means some type of probability/random sampling.
1 Neuendorf Cluster Analysis Assumes: 1. Actually, any level of measurement (nominal, ordinal, interval/ratio) is accetable for certain tyes of clustering. The tyical methods, though, require metric (I/R)
More informationResearch Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **
Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R
More informationStatistics II Logistic Regression. So far... Two-way repeated measures ANOVA: an example. RM-ANOVA example: the data after log transform
Statistics II Logistic Regression Çağrı Çöltekin Exam date & time: June 21, 10:00 13:00 (The same day/time lanned at the beginning of the semester) University of Groningen, Det of Information Science May
More informationOutline for today. Maximum likelihood estimation. Computation with multivariate normal distributions. Multivariate normal distribution
Outline for today Maximum likelihood estimation Rasmus Waageetersen Deartment of Mathematics Aalborg University Denmark October 30, 2007 the multivariate normal distribution linear and linear mixed models
More informationHENSEL S LEMMA KEITH CONRAD
HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar
More informationEstimation of the large covariance matrix with two-step monotone missing data
Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo
More information16.2. Infinite Series. Introduction. Prerequisites. Learning Outcomes
Infinite Series 6.2 Introduction We extend the concet of a finite series, met in Section 6., to the situation in which the number of terms increase without bound. We define what is meant by an infinite
More informationMODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL
Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management
More informationarxiv: v3 [physics.data-an] 23 May 2011
Date: October, 8 arxiv:.7v [hysics.data-an] May -values for Model Evaluation F. Beaujean, A. Caldwell, D. Kollár, K. Kröninger Max-Planck-Institut für Physik, München, Germany CERN, Geneva, Switzerland
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More information0.6 Factoring 73. As always, the reader is encouraged to multiply out (3
0.6 Factoring 7 5. The G.C.F. of the terms in 81 16t is just 1 so there is nothing of substance to factor out from both terms. With just a difference of two terms, we are limited to fitting this olynomial
More informationTowards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK
Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)
More informationFinite Mixture EFA in Mplus
Finite Mixture EFA in Mlus November 16, 2007 In this document we describe the Mixture EFA model estimated in Mlus. Four tyes of deendent variables are ossible in this model: normally distributed, ordered
More informationDistributed Rule-Based Inference in the Presence of Redundant Information
istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced
More informationMATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK
Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment
More informationState Estimation with ARMarkov Models
Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,
More informationHypothesis Test-Confidence Interval connection
Hyothesis Test-Confidence Interval connection Hyothesis tests for mean Tell whether observed data are consistent with μ = μ. More secifically An hyothesis test with significance level α will reject the
More informationAn Improved Generalized Estimation Procedure of Current Population Mean in Two-Occasion Successive Sampling
Journal of Modern Alied Statistical Methods Volume 15 Issue Article 14 11-1-016 An Imroved Generalized Estimation Procedure of Current Poulation Mean in Two-Occasion Successive Samling G. N. Singh Indian
More informationChapter 3. GMM: Selected Topics
Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2
More informationSystem Reliability Estimation and Confidence Regions from Subsystem and Full System Tests
009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model
More informationProbability Estimates for Multi-class Classification by Pairwise Coupling
Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics
More informationSolved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.
Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationElementary Analysis in Q p
Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some
More informationEcon 3790: Business and Economics Statistics. Instructor: Yogesh Uppal
Econ 379: Business and Economics Statistics Instructor: Yogesh Ual Email: yual@ysu.edu Chater 9, Part A: Hyothesis Tests Develoing Null and Alternative Hyotheses Tye I and Tye II Errors Poulation Mean:
More information8.7 Associated and Non-associated Flow Rules
8.7 Associated and Non-associated Flow Rules Recall the Levy-Mises flow rule, Eqn. 8.4., d ds (8.7.) The lastic multilier can be determined from the hardening rule. Given the hardening rule one can more
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationBiostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Response) Logistic Regression
Biostat Methods STAT 5500/6500 Handout #12: Methods and Issues in (Binary Resonse) Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout
More informationECE 534 Information Theory - Midterm 2
ECE 534 Information Theory - Midterm Nov.4, 009. 3:30-4:45 in LH03. You will be given the full class time: 75 minutes. Use it wisely! Many of the roblems have short answers; try to find shortcuts. You
More informationLecture 6. 2 Recurrence/transience, harmonic functions and martingales
Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.
More informationECON 4130 Supplementary Exercises 1-4
HG Set. 0 ECON 430 Sulementary Exercises - 4 Exercise Quantiles (ercentiles). Let X be a continuous random variable (rv.) with df f( x ) and cdf F( x ). For 0< < we define -th quantile (or 00-th ercentile),
More informationPrinciples of Computed Tomography (CT)
Page 298 Princiles of Comuted Tomograhy (CT) The theoretical foundation of CT dates back to Johann Radon, a mathematician from Vienna who derived a method in 1907 for rojecting a 2-D object along arallel
More informationEstimation of Separable Representations in Psychophysical Experiments
Estimation of Searable Reresentations in Psychohysical Exeriments Michele Bernasconi (mbernasconi@eco.uninsubria.it) Christine Choirat (cchoirat@eco.uninsubria.it) Raffaello Seri (rseri@eco.uninsubria.it)
More informationLecture 3 Consistency of Extremum Estimators 1
Lecture 3 Consistency of Extremum Estimators 1 This lecture shows how one can obtain consistency of extremum estimators. It also shows how one can find the robability limit of extremum estimators in cases
More informationLOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi
LOGISTIC REGRESSION VINAANAND KANDALA M.Sc. (Agricultural Statistics), Roll No. 444 I.A.S.R.I, Library Avenue, New Delhi- Chairerson: Dr. Ranjana Agarwal Abstract: Logistic regression is widely used when
More informationOn Using FASTEM2 for the Special Sensor Microwave Imager (SSM/I) March 15, Godelieve Deblonde Meteorological Service of Canada
On Using FASTEM2 for the Secial Sensor Microwave Imager (SSM/I) March 15, 2001 Godelieve Deblonde Meteorological Service of Canada 1 1. Introduction Fastem2 is a fast model (multile-linear regression model)
More informationDO NOT TURN OVER UNTIL TOLD TO BEGIN
ame HEMIS o. For Internal Stdents of Royal Holloway DO OT TUR OVER UTIL TOLD TO BEGI EC5040 : ECOOMETRICS Mid-Term Examination o. Time Allowed: hor Answer All 4 qestions STATISTICAL TABLES ARE PROVIDED
More informationEstimating function analysis for a class of Tweedie regression models
Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal
More informationLecture: Condorcet s Theorem
Social Networs and Social Choice Lecture Date: August 3, 00 Lecture: Condorcet s Theorem Lecturer: Elchanan Mossel Scribes: J. Neeman, N. Truong, and S. Troxler Condorcet s theorem, the most basic jury
More informationRobustness of classifiers to uniform l p and Gaussian noise Supplementary material
Robustness of classifiers to uniform l and Gaussian noise Sulementary material Jean-Yves Franceschi Ecole Normale Suérieure de Lyon LIP UMR 5668 Omar Fawzi Ecole Normale Suérieure de Lyon LIP UMR 5668
More informationAdaptive estimation with change detection for streaming data
Adative estimation with change detection for streaming data A thesis resented for the degree of Doctor of Philosohy of the University of London and the Diloma of Imerial College by Dean Adam Bodenham Deartment
More informationAn Estimate For Heilbronn s Exponential Sum
An Estimate For Heilbronn s Exonential Sum D.R. Heath-Brown Magdalen College, Oxford For Heini Halberstam, on his retirement Let be a rime, and set e(x) = ex(2πix). Heilbronn s exonential sum is defined
More informationUncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning
TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment
More informationON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS
#A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,
More informationImproved Bounds on Bell Numbers and on Moments of Sums of Random Variables
Imroved Bounds on Bell Numbers and on Moments of Sums of Random Variables Daniel Berend Tamir Tassa Abstract We rovide bounds for moments of sums of sequences of indeendent random variables. Concentrating
More informationDETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS
Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM
More informationMatematické Metody v Ekonometrii 7.
Matematické Metody v Ekonometrii 7. Multicollinearity Blanka Šedivá KMA zimní semestr 2016/2017 Blanka Šedivá (KMA) Matematické Metody v Ekonometrii 7. zimní semestr 2016/2017 1 / 15 One of the assumptions
More informationSession 5: Review of Classical Astrodynamics
Session 5: Review of Classical Astrodynamics In revious lectures we described in detail the rocess to find the otimal secific imulse for a articular situation. Among the mission requirements that serve
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More informationBayesian Spatially Varying Coefficient Models in the Presence of Collinearity
Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory
More informationReal Analysis 1 Fall Homework 3. a n.
eal Analysis Fall 06 Homework 3. Let and consider the measure sace N, P, µ, where µ is counting measure. That is, if N, then µ equals the number of elements in if is finite; µ = otherwise. One usually
More informationFAST AND EFFICIENT SIDE INFORMATION GENERATION IN DISTRIBUTED VIDEO CODING BY USING DENSE MOTION REPRESENTATIONS
18th Euroean Signal Processing Conference (EUSIPCO-2010) Aalborg, Denmark, August 23-27, 2010 FAST AND EFFICIENT SIDE INFORMATION GENERATION IN DISTRIBUTED VIDEO CODING BY USING DENSE MOTION REPRESENTATIONS
More informationPlotting the Wilson distribution
, Survey of English Usage, University College London Setember 018 1 1. Introduction We have discussed the Wilson score interval at length elsewhere (Wallis 013a, b). Given an observed Binomial roortion
More informationLower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data
Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment
More informationFE FORMULATIONS FOR PLASTICITY
G These slides are designed based on the book: Finite Elements in Plasticity Theory and Practice, D.R.J. Owen and E. Hinton, 1970, Pineridge Press Ltd., Swansea, UK. 1 Course Content: A INTRODUCTION AND
More informationMachine Learning: Homework 4
10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,
More informationSection 0.10: Complex Numbers from Precalculus Prerequisites a.k.a. Chapter 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative
Section 0.0: Comlex Numbers from Precalculus Prerequisites a.k.a. Chater 0 by Carl Stitz, PhD, and Jeff Zeager, PhD, is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license.
More informationA Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands
A Secial Case Solution to the Persective -Point Problem William J. Wolfe California State University Channel Islands william.wolfe@csuci.edu Abstract In this aer we address a secial case of the ersective
More information