Chapter 11 : State SAT scores for 1982 Data Listing
EXST3201 Chapter 12a Geaghan Fall 2005

Chapter 12: Variable selection

An example: State SAT scores

In 1982 there was concern that average Scholastic Aptitude Test (SAT) scores varied greatly between states. Researchers decided to study the scores to determine what factors caused the differences between the states.

********************************************************;
*** State average SAT scores ***;
********************************************************;

dm'log;clear;output;clear';
options nodate nocenter nonumber ps=512 ls=99 nolabel;
ODS HTML style=minimal rs=none
    body='c:\geaghan\current\exst3201\fall2005\sas\statesat01.html' ;
NOTE: Writing HTML Body file: C:\Geaghan\Current\EXST3201\Fall2005\SAS\StateSAT01.html

Title1 '';
filename input1 'C:\Geaghan\Current\EXST3201\Datasets\ASCII\case1201.csv';

data StateSAT; length state $ 15;
   infile input1 missover DSD dlm="," firstobs=2;
   input STATE $ SAT TAKERS INCOME YEARS PUBLIC EXPEND RANK;
   label SAT    = 'Average SAT score (verbal+quantitative)'
         Takers = 'percent of total eligible students taking the test'
         Income = 'Median income of test takers'
         years  = 'Ave of yrs of formal study in social sci, nat sci & humanities'
         public = 'percentage of takers attending public secondary schools'
         expend = 'total state expenditure on secondary schools'
         rank   = 'median percentile ranking of test takers in their secondary school classes';
datalines;
NOTE: The infile INPUT1 is:
      File Name=C:\Geaghan\Current\EXST3201\Datasets\ASCII\case1201.csv,
      RECFM=V,LRECL=256
NOTE: 50 records were read from the infile INPUT1.
      The minimum record length was 64.
      The maximum record length was 99.
NOTE: The data set WORK.STATESAT has 50 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.02 seconds
run;

PROC PRINT DATA=StateSAT; TITLE2 'Data Listing'; RUN;
NOTE: There were 50 observations read from the data set WORK.STATESAT.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.26 seconds
      cpu time 0.04 seconds

Data Listing
[Columns: Obs, state, SAT, TAKERS, INCOME, YEARS, PUBLIC, EXPEND, RANK. Numeric values were lost in transcription; the 50 states are listed in observation order.]
Obs 1-18: Iowa, SouthDakota, NorthDakota, Kansas, Nebraska, Montana, Minnesota, Utah, Wyoming, Wisconsin, Oklahoma, Arkansas, Tennessee, NewMexico, Idaho, Mississippi, Kentucky, Colorado
Obs 19-50: Washington, Arizona, Illinois, Louisiana, Missouri, Michigan, WestVirginia, Alabama, Ohio, NewHampshire, Alaska, Nevada, Oregon, Vermont, California, Delaware, Connecticut, NewYork, Maine, Florida, Maryland, Virginia, Massachusetts, Pennsylvania, RhodeIsland, NewJersey, Texas, Indiana, Hawaii, NorthCarolina, Georgia, SouthCarolina

options ps=45 ls=99 nolabel;
PROC REG DATA=StateSAT; Title2 'Fit of SAT with REG';
   MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / VIF partial;
RUN;
NOTE: The PROCEDURE REG printed pages 2-9.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.28 seconds
      cpu time 0.07 seconds

Fit of SAT with REG
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
Number of Observations Read 50
Number of Observations Used 50
[Values lost in transcription are omitted.]
Analysis of Variance: Model Pr > F <.0001.
Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|, Variance Inflation): Intercept, TAKERS, INCOME, YEARS, PUBLIC, EXPEND, RANK.
Fit of SAT with REG
The REG Procedure
Model: MODEL1
[Partial regression residual plots of SAT against Intercept, TAKERS, INCOME, YEARS, PUBLIC and EXPEND; plots not preserved.]
[Partial regression residual plot of SAT against RANK; plot not preserved.]

First, fit and examine the full model. Note the VIF values and the numerous non-significant variables. We want to find a subset of significant variables that describes the variation in SAT scores. The approach we use will also tend to reduce the multicollinearity problem (though not necessarily).

The text suggests 3 specific issues related to this process of variable selection.

1) Adjusting for a large set of explanatory variables. There may be a specific hypothesis to be tested, and the variable of interest should be included in the analysis (though not necessarily initially), but variance can be reduced by including other explanatory variables to refine the analysis.

2) Fishing for an explanation. Often there is no clear variable that must be included, nor any interest in testing a particular variable. The objective is to determine which variables may be correlated with the response and may ultimately be useful explanatory variables.

3) Prediction. The objective may not be interpretation, but rather prediction. The variable selection procedure can be used to find useful independent variables for future predictions.

Variable selection techniques

Stepwise regression: we know that we should not delete batches of variables from a regression just because they are not significant, because we do not know what the effect of adjusting them for each other has been (except when order-dependent TYPE I SS have been used, as in a pure polynomial). Generally, when we wish to examine the effect of adding and deleting variables, this should be done one variable at a time. There is an algorithm to do this automatically, called stepwise regression.

Consider p - 1 candidate variables from which we want to select the best set. One possibility is to fit and examine all possible combinations of the variables. This all-possible-regressions approach is used, but there are other, more automated procedures.
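A quick count suggests why the more automated procedures are attractive. Each candidate variable is either in or out of a model, so k candidates give 2^k possible regressions when the intercept-only model is counted. For the six candidates in the SAT example (TAKERS, INCOME, YEARS, PUBLIC, EXPEND, RANK), the count works out as:

```latex
\sum_{j=0}^{6} \binom{6}{j} \;=\; 1 + 6 + 15 + 20 + 15 + 6 + 1 \;=\; 2^{6} \;=\; 64
```

Leaving out the intercept-only model gives 63 regressions. The count doubles with every additional candidate variable.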
1) FORWARD stepwise regression: Start by fitting all simple linear regressions between Y_i and each of the p - 1 different X variables. These would usually be evaluated with the F statistic (TYPE III) to determine which variable is the best predictor. So, out of the p - 1 variables, the one with the best F statistic is the one that fits best.

Now that the BEST variable has been selected, we want to determine whether there is a two-variable model that is also useful. There are p - 2 variables remaining, so we try our best variable together with each of the p - 2 remaining variables to determine whether there is a second variable that is also significant. If there is a second best variable, it is included with the first, giving a two-factor model as the best model.

At this point there are p - 3 variables remaining, so we try our best two factors together with each of the p - 3 remaining variables to determine whether there is a third best variable that is significant. If there is, it is included with the first two, giving a three-factor model as the best model.

This process is repeated until no variable can be found that will enter the model at some predetermined criterion. With FORWARD stepwise regression, once a variable is entered into the model, it stays in the model. There are no provisions for removing variables once entered.

Sometimes no single variable will enter and be significant, but some group of 2 or 3 variables will, between them, lower the MSE sufficiently for all to be significant. FORWARD will not catch this. Although the F statistic is the usual criterion, other criteria can be used, such as R^2.

2) BACKWARD elimination. This procedure starts with a fit of the full model (p - 1 variables). Examine the F statistic (TYPE II, fully adjusted for all other variables in the model).
We first find the least significant of the p - 1 variables, and eliminate this variable only. THIS IS DONE ONE VARIABLE AT A TIME. Now refit the model without the eliminated variable (p - 2 variables). Again find the least significant variable and eliminate it. Repeat until all variables in the model are significant at some predetermined level of significance.

Sometimes a variable added to a model by FORWARD selection is no longer significant at some later point, after being adjusted for other variables in the model. Since FORWARD selection cannot remove a variable once it has entered, backward elimination can avoid this problem.

3) STEPWISE selection (FORWARD selection with a backward glance). This selection proceeds as does FORWARD, but after each addition all variables in the model are examined to ensure that they remain significant. If not significant, they are removed from the model.

Selection criteria

$R^2_p$, $\mathrm{Adj}R^2_p$ and $MSE_p$ can be used to graphically compare and evaluate models. The subscript p refers to the number of parameters in the model.

Mallows' Cp criterion

Use of this statistic presumes no bias in the full-model MSE, so the full model should be carefully chosen to have little or no multicollinearity. The Cp statistic will be approximately equal to p if there is no bias in the regression model.

$$C_p = p + (n - p)\,\frac{\hat{\sigma}^2 - \hat{\sigma}^2_{full}}{\hat{\sigma}^2_{full}}$$

Here $\hat{\sigma}^2$ is the MSE of the p-parameter model and $\hat{\sigma}^2_{full}$ is the MSE of the full model.
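The Cp statistic can be checked by hand from two mean squares. The DATA step below is a minimal sketch that uses Cp = p + (n - p)(MSE_p - MSE_full)/MSE_full. Only n = 50 comes from this example. The two MSE values are hypothetical placeholders, not results from the SAT data, and should be replaced with values from your own PROC REG output.

```sas
/* Sketch only: mse_p and mse_full are hypothetical illustrative values. */
data _null_;
   n        = 50;    /* number of observations (50 states)                  */
   p        = 4;     /* parameters in the candidate model, incl. intercept  */
   mse_p    = 700;   /* hypothetical MSE of the candidate model             */
   mse_full = 680;   /* hypothetical MSE of the full model                  */

   /* Mallows' Cp: p plus a penalty for excess error relative to the full model */
   cp = p + (n - p) * (mse_p - mse_full) / mse_full;

   put 'Mallows Cp = ' cp 8.3 '  (compare with p = ' p ')';
run;
```

If the candidate model's MSE equals the full-model MSE, the second term vanishes and Cp equals p, which is the "approximately equal to p" benchmark described above.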
SAS application

Selection options: FORWARD, BACKWARD, STEPWISE, NONE (full model). These carry out the selection techniques discussed above. Additional options for working with these are:
SLENTRY = alpha value; significance level for entry (stepwise and forward)
SLSTAY = alpha value; significance level for remaining in the model (backward and stepwise)
INCLUDE = n; automatically puts the first n variables of the list into the regression permanently (they will not be removed)

RSQUARE, ADJRSQ, CP: These are not sequential model development techniques. They are more like an all-possible-regressions analysis; they simply calculate the R^2, Adj R^2 and C_p statistics for all models to find the best ones. Options for controlling these techniques are:
START = n
STOP = n (will stop all methods)
BEST = n

options ps=512 ls=111 nolabel;
PROC REG DATA=StateSAT; Title2 'Forward selection stepwise regression';
   MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / selection=forward;
RUN;
NOTE: The PROCEDURE REG printed page 10.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.17 seconds
      cpu time 0.10 seconds

Forward selection stepwise regression
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
Number of Observations Read 50
Number of Observations Used 50
[Values lost in transcription are omitted.]

Forward Selection: Step 1
Variable RANK Entered. Model Pr > F <.0001.
Variables in model: Intercept, RANK (Pr > F <.0001).
Bounds on condition number: 1, 1
[Values lost in transcription are omitted.]

Forward Selection: Step 2
Variable YEARS Entered. Model Pr > F <.0001.
Variables in model: Intercept, YEARS (Pr > F <.0001), RANK (Pr > F <.0001).
Bounds on condition number: 1.005 (second bound not preserved)

Forward Selection: Step 3
Variable EXPEND Entered. Model Pr > F <.0001.
Variables in model: Intercept, YEARS (Pr > F <.0001), EXPEND, RANK (Pr > F <.0001).

Forward Selection: Step 4
Variable PUBLIC Entered. Model Pr > F <.0001.
Variables in model: Intercept, YEARS, PUBLIC, EXPEND, RANK (Pr > F <.0001).
Forward Selection: Step 5
Variable TAKERS Entered. Model Pr > F <.0001.
Variables in model: Intercept, TAKERS, YEARS, PUBLIC, EXPEND, RANK.

No other variable met the significance level for entry into the model.

Summary of Forward Selection
[Columns: Step, Variable Entered, Number Vars In, Partial R-Square, Model R-Square, C(p), F Value, Pr > F. Values lost in transcription are omitted.]
Step 1: RANK entered (Pr > F <.0001)
Step 2: YEARS entered (Pr > F <.0001)
Step 3: EXPEND entered
Step 4: PUBLIC entered
Step 5: TAKERS entered

PROC REG DATA=StateSAT; Title2 'Backward elimination stepwise regression';
   MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / selection=backward;
RUN;
NOTE: The PROCEDURE REG printed page 11.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.13 seconds
      cpu time 0.05 seconds

Backward elimination stepwise regression
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
Number of Observations Read 50
Number of Observations Used 50
[Values lost in transcription are omitted.]

Backward Elimination: Step 0
All Variables Entered. Model Pr > F <.0001.
Variables in model: Intercept, TAKERS, INCOME, YEARS, PUBLIC, EXPEND, RANK.

Backward Elimination: Step 1
Variable INCOME Removed. Model Pr > F <.0001.
Variables in model: Intercept, TAKERS, YEARS, PUBLIC, EXPEND, RANK.

Backward Elimination: Step 2
Variable TAKERS Removed. Model Pr > F <.0001.
Variables in model: Intercept, YEARS, PUBLIC, EXPEND, RANK (Pr > F <.0001).
Backward Elimination: Step 3
Variable PUBLIC Removed. Model Pr > F <.0001.
Variables in model: Intercept, YEARS (Pr > F <.0001), EXPEND, RANK (Pr > F <.0001).

All variables left in the model are significant at the chosen level (value not preserved).

Summary of Backward Elimination
[Columns: Step, Variable Removed, Number Vars In, Partial R-Square, Model R-Square, C(p), F Value, Pr > F. Values lost in transcription are omitted.]
Step 1: INCOME removed
Step 2: TAKERS removed
Step 3: PUBLIC removed

PROC REG DATA=StateSAT; Title2 'Stepwise regression';
   MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK /
         selection=stepwise slentry=0.20 slstay=0.05;
RUN;
NOTE: The PROCEDURE REG printed page 12.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.12 seconds
      cpu time 0.04 seconds

Stepwise regression
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
Number of Observations Read 50
Number of Observations Used 50
[Values lost in transcription are omitted.]

Stepwise Selection: Step 1
Variable RANK Entered. Model Pr > F <.0001.
Variables in model: Intercept, RANK (Pr > F <.0001).
Bounds on condition number: 1, 1

Stepwise Selection: Step 2
Variable YEARS Entered. Model Pr > F <.0001.
Variables in model: Intercept, YEARS (Pr > F <.0001), RANK (Pr > F <.0001).
Bounds on condition number: 1.005 (second bound not preserved)

Stepwise Selection: Step 3
Variable EXPEND Entered. Model Pr > F <.0001.
Variables in model: Intercept, YEARS (Pr > F <.0001), EXPEND, RANK (Pr > F <.0001).
Stepwise Selection: Step 4
Variable PUBLIC Entered. Model Pr > F <.0001.
Variables in model: Intercept, YEARS, PUBLIC, EXPEND, RANK (Pr > F <.0001).

Stepwise Selection: Step 5
Variable PUBLIC Removed. Model Pr > F <.0001.
Variables in model: Intercept, YEARS (Pr > F <.0001), EXPEND, RANK (Pr > F <.0001).

All variables left in the model are significant at the chosen level (value not preserved).
The stepwise method terminated because the next variable to be entered was just removed.

Summary of Stepwise Selection
[Columns: Step, Variable Entered, Variable Removed, Number Vars In, Partial R-Square, Model R-Square, C(p), F Value, Pr > F. Values lost in transcription are omitted.]
Step 1: RANK entered (Pr > F <.0001)
Step 2: YEARS entered (Pr > F <.0001)
Step 3: EXPEND entered
Step 4: PUBLIC entered
Step 5: PUBLIC removed

PROC REG DATA=StateSAT; Title2 'Reduced model with diagnostics';
   MODEL SAT = YEARS EXPEND RANK / VIF;
   output out=next1 r=resid p=yhat lclm=lclm uclm=uclm lcl=lcli ucl=ucli
          student=student rstudent=rstudent cookd=cookd h=leverage dffits=dffits;
RUN;
NOTE: The data set WORK.NEXT1 has 50 observations and 19 variables.
NOTE: The PROCEDURE REG printed page 13.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.13 seconds
      cpu time 0.06 seconds
Reduced model with diagnostics
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
Number of Observations Read 50
Number of Observations Used 50
[Values lost in transcription are omitted.]
Analysis of Variance: Model Pr > F <.0001. Root MSE, R-Square, Dependent Mean, Adj R-Sq and Coeff Var were also reported.
Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|, Variance Inflation): Intercept, YEARS (Pr > |t| <.0001), EXPEND, RANK (Pr > |t| <.0001).

proc print data=next1;
   var state SAT yhat resid student rstudent cookd leverage dffits;
RUN;
NOTE: There were 50 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PRINT printed page 14.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.07 seconds
      cpu time 0.03 seconds

options ps=60 ls=132;
proc plot data=next1; TITLE2 'Various plot with state variable';
   plot resid * yhat = state / vref=0;
RUN; options ps=40 ls=132;
NOTE: There were 50 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PLOT printed page 15.
NOTE: PROCEDURE PLOT used (Total process time):
      real time 0.06 seconds
      cpu time 0.00 seconds

proc plot data=next1; TITLE2 'Various plot with state variable';
   plot student * state / vref=2;
   plot rstudent * state / vref=2;
   plot leverage * state / vref=0.16; /* 2p/n = 2*4/50 = 0.16 */
   plot cookd * state / vref=1;
   plot dffits * state / vref=1;
RUN; OPTIONS PS=256 ls=132;
NOTE: There were 50 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PLOT printed pages (page numbers not preserved).
NOTE: PROCEDURE PLOT used (Total process time):
      real time 0.06 seconds
      cpu time 0.00 seconds
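The reference lines in the PROC PLOT steps act as visual cutoffs for the diagnostics. A complementary approach, sketched below, is to list only the observations that exceed at least one cutoff. The sketch assumes WORK.NEXT1 already exists as created by the OUTPUT statement of the reduced-model PROC REG step. The data set name flagged and the use of absolute values for student, rstudent and dffits are assumptions added here for illustration; the cutoff values themselves are the VREF= values from the plots.

```sas
/* Sketch: requires WORK.NEXT1 from the reduced-model PROC REG step.      */
/* "flagged" is a hypothetical data set name introduced for illustration. */
data flagged;
   set next1;
   /* Cutoffs mirror the VREF= values used in the PROC PLOT steps:        */
   /*   studentized residuals at 2, leverage at 2p/n = 2*4/50 = 0.16,     */
   /*   Cook's D at 1, DFFITS at 1. ABS() is an added assumption so that  */
   /*   large negative values are flagged as well.                        */
   if abs(student)  > 2    or
      abs(rstudent) > 2    or
      leverage      > 0.16 or
      cookd         > 1    or
      abs(dffits)   > 1;      /* subsetting IF keeps only flagged rows */
run;

proc print data=flagged;
   title2 'Observations exceeding at least one diagnostic cutoff';
   var state SAT yhat resid student rstudent cookd leverage dffits;
run;
```

A listing like this makes it easier to name the states behind the extreme points, which is hard to do from character plots alone.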
Reduced model with diagnostics
[Listing of NEXT1: columns Obs, state, SAT, yhat, resid, student, rstudent, cookd, leverage, dffits for the 50 states, in the same order as the data listing; numeric values not preserved.]

Various plot with state variable
Plot of resid*yhat. Symbol is value of state.
[Character plot not preserved; vertical axis resid, horizontal axis yhat.]
NOTE: 2 obs hidden.
Various plot with state variable
Plot of student*state. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot not preserved; vertical axis student, horizontal axis state.]

Plot of rstudent*state. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot not preserved; vertical axis rstudent, horizontal axis state.]
Various plot with state variable
Plot of leverage*state. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot not preserved; vertical axis leverage, horizontal axis state.]

Plot of cookd*state. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot not preserved; vertical axis cookd, horizontal axis state.]
Various plot with state variable
Plot of dffits*state. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot not preserved; vertical axis dffits, horizontal axis state.]

PROC UNIVARIATE DATA=NEXT1 NORMAL PLOT; VAR resid; RUN;
NOTE: The PROCEDURE UNIVARIATE printed page 21.
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time 0.09 seconds
      cpu time 0.04 seconds
Various plot with state variable
The UNIVARIATE Procedure
Variable: resid
[Values lost in transcription are omitted.]
Moments: N 50, Sum Weights 50, Mean 0, Sum Observations 0. Std Deviation, Variance, Skewness, Kurtosis, Uncorrected SS, Corrected SS, Coeff Variation and Std Error Mean were also reported.
Basic Statistical Measures: Mean, Median, Mode, Std Deviation, Variance, Range, Interquartile Range.
Tests for Location: Mu0=0: Student's t, t = 0; Sign, M = 1; Signed Rank, S = 57.5.
Tests for Normality: Shapiro-Wilk (W), Kolmogorov-Smirnov (D), Cramer-von Mises (W-Sq), Anderson-Darling (A-Sq).
Quantiles (Definition 5) and Extreme Observations were also listed.
[Stem-and-leaf plot, boxplot and normal probability plot of resid; plots not preserved.]

Selection options: FORWARD, BACKWARD, STEPWISE, NONE (full model). These carry out the selection techniques discussed. Additional options for working with these are:
SLENTRY = alpha value; significance level for entry (stepwise and forward)
SLSTAY = alpha value; significance level for remaining in the model (backward and stepwise)
INCLUDE = n; automatically puts the first n variables of the list into the regression (they are not removed)

RSQUARE, ADJRSQ, CP: These are like all possible regressions. They simply calculate the R^2, Adj R^2 and C_p statistics for all models to find the best ones. Other criteria include Akaike's information criterion (AIC), Schwarz's Bayesian information criterion (SBC), SSE, MSE and others. Additional options:
START = n
STOP = n (will stop all methods)
BEST = n
E.g., with START=3 STOP=6 BEST=5 we could determine the best 5 models at each level, starting with 3 factors and ending with 6 factors.

PROC REG DATA=StateSAT; Title2 'RSquare procedure';
   MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK /
         selection=rsquare start=1 stop=4 best=5 cp adjrsq aic sbc mse;
RUN;
NOTE: The PROCEDURE REG printed page 22.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.12 seconds
      cpu time 0.06 seconds
RSquare procedure
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
R-Square Selection Method
Number of Observations Read 50
Number of Observations Used 50
[Columns: Number in Model, R-Square, Adjusted R-Square, C(p), AIC, BIC, MSE, SBC, SSE, Variables in Model. Numeric values were lost in transcription; the models are listed in printed order.]
2 variables: YEARS RANK; EXPEND RANK; TAKERS YEARS; INCOME RANK; PUBLIC RANK; TAKERS RANK
3 variables: YEARS EXPEND RANK; INCOME YEARS RANK; TAKERS YEARS RANK; YEARS PUBLIC RANK; PUBLIC EXPEND RANK; TAKERS YEARS EXPEND
4 variables: YEARS PUBLIC EXPEND RANK; TAKERS YEARS EXPEND RANK; INCOME YEARS EXPEND RANK; INCOME YEARS PUBLIC RANK; TAKERS INCOME YEARS RANK; TAKERS YEARS PUBLIC RANK
5 variables: TAKERS YEARS PUBLIC EXPEND RANK; INCOME YEARS PUBLIC EXPEND RANK; TAKERS INCOME YEARS EXPEND RANK; TAKERS INCOME YEARS PUBLIC RANK; TAKERS INCOME PUBLIC EXPEND RANK; TAKERS INCOME YEARS PUBLIC EXPEND
proc print data=parm1;
   * var ADJRSQ AIC BIC CP MSE RSQUARE SBC SSE;
run;
NOTE: There were 24 observations read from the data set WORK.PARM1.
NOTE: The PROCEDURE PRINT printed page 23.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.08 seconds
      cpu time 0.03 seconds

RSquare procedure
[Listing of PARM1, 24 rows, each with _MODEL_ = MODEL1, _TYPE_ = PARMS, _DEPVAR_ = SAT. Columns: Obs, _MODEL_, _TYPE_, _DEPVAR_, _RMSE_, Intercept, TAKERS, INCOME, YEARS, PUBLIC, EXPEND, RANK, SAT, _IN_, _P_, _EDF_, _SSE_, _MSE_, _RSQ_, _ADJRSQ_, _CP_, _AIC_, _BIC_, _SBC_. Numeric values not preserved.]

PROC REG DATA=StateSAT; Title2 'Cp criterion';
   MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK /
         selection=cp start=2 stop=5 best=10;
RUN;
ods html close;
quit;
NOTE: The PROCEDURE REG printed page 24.
NOTE: PROCEDURE REG used (Total process time):
      real time 0.13 seconds
      cpu time 0.03 seconds
Cp criterion
The REG Procedure
Model: MODEL1
Dependent Variable: SAT
C(p) Selection Method
Number of Observations Read 50
Number of Observations Used 50
[Columns: Number in Model, C(p), R-Square, Variables in Model. Numeric values were lost in transcription; the 10 models are listed in printed order.]
1. YEARS PUBLIC EXPEND RANK (4 variables)
2. YEARS EXPEND RANK (3 variables)
3. TAKERS YEARS EXPEND RANK (4 variables)
4. INCOME YEARS EXPEND RANK (4 variables)
5. TAKERS YEARS PUBLIC EXPEND RANK (5 variables)
6. INCOME YEARS PUBLIC EXPEND RANK (5 variables)
7. TAKERS INCOME YEARS EXPEND RANK (5 variables)
8. INCOME YEARS RANK (3 variables)
9. INCOME YEARS PUBLIC RANK (4 variables)
10. TAKERS INCOME YEARS RANK (4 variables)
More informationMultivariate Statistics
Multivariate Statistics Chapter 3: Principal Component Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical
More informationStandard Indicator That s the Latitude! Students will use latitude and longitude to locate places in Indiana and other parts of the world.
Standard Indicator 4.3.1 That s the Latitude! Purpose Students will use latitude and longitude to locate places in Indiana and other parts of the world. Materials For the teacher: graph paper, globe showing
More informationSAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c
Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression
More informationAbortion Facilities Target College Students
Target College Students By Kristan Hawkins Executive Director, Students for Life America Ashleigh Weaver Researcher Abstract In the Fall 2011, Life Dynamics released a study entitled, Racial Targeting
More informationMulticollinearity Exercise
Multicollinearity Exercise Use the attached SAS output to answer the questions. [OPTIONAL: Copy the SAS program below into the SAS editor window and run it.] You do not need to submit any output, so there
More informationIntercity Bus Stop Analysis
by Karalyn Clouser, Research Associate and David Kack, Director of the Small Urban and Rural Livability Center Western Transportation Institute College of Engineering Montana State University Report prepared
More informationChapter. Organizing and Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.
Chapter 2 Organizing and Summarizing Data Section 2.1 Organizing Qualitative Data Objectives 1. Organize Qualitative Data in Tables 2. Construct Bar Graphs 3. Construct Pie Charts When data is collected
More informationMultivariate Analysis
Multivariate Analysis Chapter 5: Cluster analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2015/2016 Master in Business Administration and
More information, District of Columbia
State Capitals These are the State Seals of each state. Fill in the blank with the name of each states capital city. (Hint: You may find it helpful to do the word search first to refresh your memory.),
More informationHourly Precipitation Data Documentation (text and csv version) February 2016
I. Description Hourly Precipitation Data Documentation (text and csv version) February 2016 Hourly Precipitation Data (labeled Precipitation Hourly in Climate Data Online system) is a database that gives
More information10. Alternative case influence statistics
10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the
More informationssh tap sas913, sas
B. Kedem, STAT 430 SAS Examples SAS8 ===================== ssh xyz@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Multiple Regression ====================== 0. Show
More informationQF (Build 1010) Widget Publishing, Inc Page: 1 Batch: 98 Test Mode VAC Publisher's Statement 03/15/16, 10:20:02 Circulation by Issue
QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation by Issue Qualified Non-Paid Circulation Qualified Paid Circulation Individual Assoc. Total Assoc. Total Total Requester Group Qualified
More informationPrintable Activity book
Printable Activity book 16 Pages of Activities Printable Activity Book Print it Take it Keep them busy Print them out Laminate them or Put them in page protectors Put them in a binder Bring along a dry
More informationCorrection to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements
JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 109,, doi:10.1029/2004jd005099, 2004 Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements Cristina L. Archer
More informationDescriptions of post-hoc tests
Experimental Statistics II Page 81 Descriptions of post-hoc tests Post-hoc or Post-ANOVA tests! Once you have found out some treatment(s) are different, how do you determine which one(s) are different?