Chapter 11 : State SAT scores for 1982 Data Listing

Size: px
Start display at page:

Download "Chapter 11 : State SAT scores for 1982 Data Listing"

Transcription

1 EXST3201 Chapter 12a Geaghan Fall 2005: Page 1 Chapter 12 : Variable selection An example: State SAT scores In 1982 there was concern for scores of the Scholastic Aptitude Test (SAT) scores that varied greatly between states. Researchers decided to study the scores to determine what factors caused the differences between the states. 1 ********************************************************; 2 *** State average SAT scores ***; 3 ********************************************************; 4 5 dm'log;clear;output;clear'; 6 options nodate nocenter nonumber ps=512 ls=99 nolabel; 7 ODS HTML style=minimal rs=none 7! body='c:\geaghan\current\exst3201\fall2005\sas\statesat01.html' ; NOTE: Writing HTML Body file: C:\Geaghan\Current\EXST3201\Fall2005\SAS\StateSAT01.html 8 9 Title1 ''; 10 filename input1 'C:\Geaghan\Current\EXST3201\Datasets\ASCII\case1201.csv'; data StateSAT; length state $ 15; 13 infile input1 missover DSD dlm="," firstobs=2; 14 input STATE $ SAT TAKERS INCOME YEARS PUBLIC EXPEND RANK; 15 label SAT = 'Average SAT score (verbal+quantitative)' 16 Takers = 'percent of total eligible students taking the test' 17 Income = 'Median income of test takers' 18 years = 'Ave of yrs of formal study in social sci, nat sci & humanities' 19 public = 'percentage of takers attending public secondary schools' 20 expend = 'total state expenditure on secondary schools' 21 rank = 'median percentile ranking of test takers in their secondary school classes'; 22 datalines; NOTE: The infile INPUT1 is: File Name=C:\Geaghan\Current\EXST3201\Datasets\ASCII\case1201.csv, RECFM=V,LRECL=256 NOTE: 50 records were read from the infile INPUT1. The minimum record length was 64. The maximum record length was 99. NOTE: The data set WORK.STATESAT has 50 observations and 8 variables. NOTE: DATA statement used (Total process time): 0.01 seconds 0.02 seconds 23 run; PROC PRINT DATA=StateSAT; TITLE2 'Data Listing'; RUN; NOTE: There were 50 observations read from the data set WORK.STATESAT. NOTE: The PROCEDURE PRINT printed page 1. NOTE: PROCEDURE PRINT used (Total process time): 0.26 seconds 0.04 seconds Data Listing Obs state SAT TAKERS INCOME YEARS PUBLIC EXPEND RANK 1 Iowa SouthDakota NorthDakota Kansas Nebraska Montana Minnesota Utah Wyoming Wisconsin Oklahoma Arkansas Tennessee NewMexico Idaho Mississippi Kentucky Colorado

2 EXST3201 Chapter 12a Geaghan Fall 2005: Page 2 19 Washington Arizona Illinois Louisiana Missouri Michigan WestVirginia Alabama Ohio NewHampshire Alaska Nevada Oregon Vermont California Delaware Connecticut NewYork Maine Florida Maryland Virginia Massachusetts Pennsylvania RhodeIsland NewJersey Texas Indiana Hawaii NorthCarolina Georgia SouthCarolina options ps=45 ls=99 nolabel; 28 PROC REG DATA=StateSAT; Title2 'Fit of SAT with REG'; 29 MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / VIF partial; 30 RUN; NOTE: The PROCEDURE REG printed pages 2-9. NOTE: PROCEDURE REG used (Total process time): 0.28 seconds 0.07 seconds Fit of SAT with REG The REG Procedure Model: MODEL1 Dependent Variable: SAT Number of Observations Read 50 Number of Observations Used 50 Model <.0001 Error Parameter Estimates Variance Variable DF Estimate Error t Value Pr > t Inflation Intercept TAKERS INCOME YEARS PUBLIC EXPEND RANK

3 EXST3201 Chapter 12a Geaghan Fall 2005: Page 3 Fit of SAT with REG The REG Procedure Model: MODEL1 Partial Regression Residual Plot SAT Intercept SAT TAKERS SAT INCOME

4 EXST3201 Chapter 12a Geaghan Fall 2005: Page 4 Fit of SAT with REG The REG Procedure Model: MODEL1 Partial Regression Residual Plot SAT YEARS SAT PUBLIC SAT EXPEND

5 EXST3201 Chapter 12a Geaghan Fall 2005: Page 5 Fit of SAT with REG The REG Procedure Model: MODEL1 Partial Regression Residual Plot SAT RANK First, fit and examine the model. Note the VIF and numerous non-significant variables. We want to find a subset of significant variables that describe the variation in SAT scores. The approach we use will also tend to reduce the multicolinearity problem (though not necessarily). The text suggests 3 specific issues related to this process of variable selection. 1) Adjusting for a large set of explanatory variables. There may be a specific hypothesis to be tested, and that variable should be included in the analysis (though not necessarily initially), but variance can be reduced by including other explanatory variables to refine the analysis. 2) Fishing for an explanation. Often there is not clear variable that must be included or any interest in testing a particular variable. The objective is to determine which variable may be correlated and may ultimately be useful explanatory varibles. 3) Prediction. The objective may not be interpretation, but rather prediction. The variable selection procedure can be used to find useful independent variables for future predictions. Variable selection techniques. Stepwise Regression : we know that we should not delete batches of variable from regressions when they are not significant, because we do not know what the effect of adjusting for each other has been (except when order dependent TYPE I SS has been used, as in a pure polynomial). Generally, when we wish to examine the effect of adding and deleting variables, this should be done one variable at a time. There is an algorithm to do this automatically called stepwise regression. Consider p 1 candidate variables where we want to select the best set. One possibility is to fit and examine all possible combinations of the variables. This all possible regressions approach is used, but there are other more automated procedures.

6 EXST3201 Chapter 12a Geaghan Fall 2005: Page 6 1) FORWARD stepwise regression: Start by fitting all simple linear regressions between Y i and the p 1 different X i variables. These would usually be evaluated with the F statistic (TYPE III) to determine which variable is the best predictor. So, out of the p 1 variables, the one with the best F statistic is the one that fits best. Now, that the BEST variable has been selected we want to determine if there is a second variable model which is also useful. There are p 2 variables remaining, so we try our best variable together with each of the p 2 remaining variable to determine if there is a second variable that is also significant. If there is a second best variable then it is included with the first, giving a two factor model as the best model. At this point there are p 3 variables remaining, so we try our best two factors together with each of the p 3 remaining variable to determine if there is a third best variable that is significant. If there is a third best variable then it is included with the first two, giving a three factor model as the best model. This process is repeated until no variable can be found that will enter the model at some predetermined criteria. With Forward stepwise regression, once a variable is entered into the model, it stays in the model. There are no provisions for removing variables once entered. This process is called FORWARD stepwise regression. Sometimes, no single variable will enter and be significant, but some group of 2 or 3, between them, will lower the MSE sufficiently for all to be significant. FORWARD will not catch this. Although the F statistic is the usual criteria, other criteria can be used, such as the R 2. 2) BACKWARDS elimination. This procedure starts with a fit the full model (p 1 variables). Examine the F statistic (TYPE II, fully adjusted for all other variables in the model). We first find the least significant of the p 1 variables, and eliminate this variable only. THIS IS DONE ONE VARIABLE AT A TIME. Now, refit the full model, minus the eliminated variable (p 2 variables). Again find the least significant and eliminate this variable. Repeat until all variables in the model are significant at some predetermine level of significance. Sometimes, a variable added to a model by FORWARD selection is not significant at some later point, after being adjusted for other variables in the model. Since no variable can be removed once entered, backwards elimination can avoid this problem. 3) STEPWISE selection (FORWARD selection with a backward glance). This selection proceeds as does FORWARD, but after each addition all variables in the model are examined to insure that they remain significant. If not significant, they are removed from the model. Selection criteria 2 R p, 2 AdjR p and MSE p can be used to graphically compare and evaluate models. The subscript p refers to the number of parameters in the model Mallow's Cp criterion Use of this statistic presumes no bias in the full model MSE, so the full model should be carefully chosen to have little or no multicolinearity. The Cp statistics will be approximately equal to p if there is no bias in the regression model ˆ ˆ = Cp p ( n p full ) σ σ 2 ˆ σ full

7 EXST3201 Chapter 12a Geaghan Fall 2005: Page 7 SAS application Selection options: FORWARD, BACKWARD, STEPWISE, NONE (full model). These do the selection techniques discussed above. Additional options for working with these are; SLENTRY = alpha value; level of entry for stepwise and forward SLSTAY = alpha value; level of remaining for backward and stepwise INCLUDE = n; automatically puts the first n variables of the list into the regression permanently (they will not be removed) RSQUARE, ADJRSQ, CP : These are not sequential model development techniques. These are more like all possible regressions analysis, but they simply calculate the R 2, AdjR 2 and the C p statistic for all models to find the best ones. Options for controlling these techniques are: START = n STOP = n (will stop all methods) BEST = n 32 options ps=512 ls=111 nolabel; 33 PROC REG DATA=StateSAT; Title2 'Forward selection stepwise regression'; 34 MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / selection=forward; 35 RUN; NOTE: The PROCEDURE REG printed page 10. NOTE: PROCEDURE REG used (Total process time): 0.17 seconds 0.10 seconds Forward selection stepwise regression The REG Procedure Model: MODEL1 Dependent Variable: SAT Number of Observations Read 50 Number of Observations Used 50 Forward Selection: Step 1 Variable RANK Entered: R-Square = and C(p) = Model <.0001 Error Intercept RANK <.0001 Bounds on condition number: 1, 1 -

8 EXST3201 Chapter 12a Geaghan Fall 2005: Page 8 Forward Selection: Step 2 Variable YEARS Entered: R-Square = and C(p) = Model <.0001 Error Intercept YEARS <.0001 RANK <.0001 Bounds on condition number: 1.005, Forward Selection: Step 3 Variable EXPEND Entered: R-Square = and C(p) = Model <.0001 Error Intercept YEARS <.0001 EXPEND RANK <.0001 Bounds on condition number: , Forward Selection: Step 4 Variable PUBLIC Entered: R-Square = and C(p) = Model <.0001 Error Intercept YEARS PUBLIC EXPEND RANK <.0001 Bounds on condition number: ,

9 EXST3201 Chapter 12a Geaghan Fall 2005: Page 9 Forward Selection: Step 5 Variable TAKERS Entered: R-Square = and C(p) = Model <.0001 Error Intercept TAKERS YEARS PUBLIC EXPEND RANK Bounds on condition number: , No other variable met the significance level for entry into the model. Summary of Forward Selection Variable Number Partial Model Step Entered Vars In R-Square R-Square C(p) F Value Pr > F 1 RANK < YEARS < EXPEND PUBLIC TAKERS PROC REG DATA=StateSAT; Title2 'Backward elimination stepwise regression'; 37 MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / selection=backward; 38 RUN; NOTE: The PROCEDURE REG printed page 11. NOTE: PROCEDURE REG used (Total process time): 0.13 seconds 0.05 seconds Backward elimination stepwise regression The REG Procedure Model: MODEL1 Dependent Variable: SAT Number of Observations Read 50 Number of Observations Used 50

10 EXST3201 Chapter 12a Geaghan Fall 2005: Page 10 Backward Elimination: Step 0 All Variables Entered: R-Square = and C(p) = Model <.0001 Error Intercept TAKERS INCOME YEARS PUBLIC EXPEND RANK Bounds on condition number: , Backward Elimination: Step 1 Variable INCOME Removed: R-Square = and C(p) = Model <.0001 Error Intercept TAKERS YEARS PUBLIC EXPEND RANK Bounds on condition number: , Backward Elimination: Step 2 Variable TAKERS Removed: R-Square = and C(p) = Model <.0001 Error Intercept YEARS PUBLIC EXPEND RANK <.0001

11 EXST3201 Chapter 12a Geaghan Fall 2005: Page 11 Bounds on condition number: , Backward Elimination: Step 3 Variable PUBLIC Removed: R-Square = and C(p) = Model <.0001 Error Intercept YEARS <.0001 EXPEND RANK <.0001 Bounds on condition number: , All variables left in the model are significant at the level. Summary of Backward Elimination Variable Number Partial Model Step Removed Vars In R-Square R-Square C(p) F Value Pr > F 1 INCOME TAKERS PUBLIC PROC REG DATA=StateSAT; Title2 'Stepwise regression'; 40 MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / 41 selection=stepwise slentry=0.20 slstay=0.05; 42 RUN; NOTE: The PROCEDURE REG printed page 12. NOTE: PROCEDURE REG used (Total process time): 0.12 seconds 0.04 seconds Stepwise regression The REG Procedure Model: MODEL1 Dependent Variable: SAT Number of Observations Read 50 Number of Observations Used 50

12 EXST3201 Chapter 12a Geaghan Fall 2005: Page 12 Stepwise Selection: Step 1 Variable RANK Entered: R-Square = and C(p) = Model <.0001 Error Intercept RANK <.0001 Bounds on condition number: 1, 1 Stepwise Selection: Step 2 Variable YEARS Entered: R-Square = and C(p) = Model <.0001 Error Intercept YEARS <.0001 RANK <.0001 Bounds on condition number: 1.005, Stepwise Selection: Step 3 Variable EXPEND Entered: R-Square = and C(p) = Model <.0001 Error Intercept YEARS <.0001 EXPEND RANK <.0001 Bounds on condition number: ,

13 EXST3201 Chapter 12a Geaghan Fall 2005: Page 13 Stepwise Selection: Step 4 Variable PUBLIC Entered: R-Square = and C(p) = Model <.0001 Error Intercept YEARS PUBLIC EXPEND RANK <.0001 Bounds on condition number: , Stepwise Selection: Step 5 Variable PUBLIC Removed: R-Square = and C(p) = Model <.0001 Error Intercept YEARS <.0001 EXPEND RANK <.0001 Bounds on condition number: , All variables left in the model are significant at the level. The stepwise method terminated because the next variable to be entered was just removed. Summary of Stepwise Selection Variable Variable Number Partial Model Step Entered Removed Vars In R-Square R-Square C(p) F Value Pr > F 1 RANK < YEARS < EXPEND PUBLIC PUBLIC PROC REG DATA=StateSAT; Title2 'Reduced model with diagnostics'; 44 MODEL SAT = YEARS EXPEND RANK / VIF; 45 output out=next1 r=resid p=yhat lclm=lclm uclm=uclm lcl=lcli ucl=ucli 46 student=student rstudent=rstudent cookd=cookd h=leverage dffits=dffits; 47 RUN; NOTE: The data set WORK.NEXT1 has 50 observations and 19 variables. NOTE: The PROCEDURE REG printed page 13. NOTE: PROCEDURE REG used (Total process time): 0.13 seconds 0.06 seconds

14 EXST3201 Chapter 12a Geaghan Fall 2005: Page 14 Reduced model with diagnostics The REG Procedure Model: MODEL1 Dependent Variable: SAT Number of Observations Read 50 Number of Observations Used 50 Model <.0001 Error Root MSE R-Square Dependent Adj R-Sq Coeff Var Parameter Estimates Variance Variable DF Estimate Error t Value Pr > t Inflation Intercept YEARS < EXPEND RANK < proc print data=next1; 50 var state SAT yhat resid student rstudent cookd leverage dffits; 51 RUN; NOTE: There were 50 observations read from the data set WORK.NEXT1. NOTE: The PROCEDURE PRINT printed page 14. NOTE: PROCEDURE PRINT used (Total process time): 0.07 seconds 0.03 seconds 52 options ps=60 ls=132; 53 proc plot data=next1; TITLE2 'Various plot with state variable'; 54 plot resid * yhat = state / vref=0; 55 RUN; 55! options ps=40 ls=132; NOTE: There were 50 observations read from the data set WORK.NEXT1. NOTE: The PROCEDURE PLOT printed page 15. NOTE: PROCEDURE PLOT used (Total process time): 0.06 seconds 0.00 seconds 56 proc plot data=next1; TITLE2 'Various plot with state variable'; 57 plot student * state / vref=2; 58 plot rstudent * state / vref=2; 59 plot leverage * state / vref=0.16; /* 2p/n = 2*4/50 = 0.16 */ 60 plot cookd * state / vref=1; 61 plot dffits * state / vref=1; 62 RUN; 62! OPTIONS PS=256 ls=132; NOTE: There were 50 observations read from the data set WORK.NEXT1. NOTE: The PROCEDURE PLOT printed pages NOTE: PROCEDURE PLOT used (Total process time): 0.06 seconds 0.00 seconds

15 EXST3201 Chapter 12a Geaghan Fall 2005: Page 15 Reduced model with diagnostics Obs state SAT yhat resid student rstudent cookd leverage dffits 1 Iowa SouthDakota NorthDakota Kansas Nebraska Montana Minnesota Utah Wyoming Wisconsin Oklahoma Arkansas Tennessee NewMexico Idaho Mississippi Kentucky Colorado Washington Arizona Illinois Louisiana Missouri Michigan WestVirginia Alabama Ohio NewHampshire Alaska Nevada Oregon Vermont California Delaware Connecticut NewYork Maine Florida Maryland Virginia

16 EXST3201 Chapter 12a Geaghan Fall 2005: Page Massachusetts Pennsylvania RhodeIsland NewJersey Texas Indiana Hawaii NorthCarolina Georgia SouthCarolina Various plot with state variable Plot of resid*yhat. Symbol is value of state. resid 60 + I T 40 + N S I C K I 20 + V M N N U M M O O H N M D M N A N I R O M W W T V C A W A N C L K G P N A W M S yhat NOTE: 2 obs hidden.

17 EXST3201 Chapter 12a Geaghan Fall 2005: Page 17 Various plot with state variable Plot of student*state. Legend: A = 1 obs, B = 2 obs, etc. student A A A A A A A A A A A A A A A A A A A A A 0 + A A A A A A A A A A A A A A A A A A A A A A A A -2 + A A A A A A A A A C C C D F G H I I I I K K L M M M M M M M M N N N N N N N N O O O P R S S T T U V V W W W W l l r r a o o e l e a d l n o a e o a a a i i i i o e e e e e e o o h k r e h o o e e t e i a e i y a a i k l l n l o o w a l d w n n u i r s c n s s n b v w w w w r r i l e n o u u n x a r r s s s o b s z a i o n a r r a h i i a s t i n y s h n s s t r a H J M Y t t o a g n d t t n a h m g h t c m a k o n f r e w i g i o n a a u s e l a i e i o a a d a e e o h h h o s e h h e s o i i V o i m a n s o a c a d i i o n s c i a c g s s u n s a m r x r C D o n y I C D s n n n i n n a a a r d t r a a i a k a n h a o s r a k p s i k a a m l s a a s t i g r s g s n o i e s y n d u n t i i a s e c r k a v l r k e a t g i i c a s a p h y o o o a a o o e o i n a u e p i l t n n l t n n t t i r i a i d i a i t e n a n a s a a Plot of rstudent*state. Legend: A = 1 obs, B = 2 obs, etc. rstudent A A A A A A A A A A A A A A A A A A A A A 0 + A A A A A A A A A A A A A A A A A A A A A A A A -2 + A A A A A A A A A C C C D F G H I I I I K K L M M M M M M M M N N N N N N N N O O O P R S S T T U V V W W W W l l r r a o o e l e a d l n o a e o a a a i i i i o e e e e e e o o h k r e h o o e e t e i a e i y a a i k l l n l o o w a l d w n n u i r s c n s s n b v w w w w r r i l e n o u u n x a r r s s s o b s z a i o n a r r a h i i a s t i n y s h n s s t r a H J M Y t t o a g n d t t n a h m g h t c m a k o n f r e w i g i o n a a u s e l a i e i o a a d a e e o h h h o s e h h e s o i i V o i m a n s o a c a d i i o n s c i a c g s s u n s a m r x r C D o n y I C D s n n n i n n a a a r d t r a a i a k a n h a o s r a k p s i k a a m l s a a s t i g r s g s n o i e s y n d u n t i i a s e c r k a v l r k e a t g i i c a s a p h y o o o a a o o e o i n a u e p i l t n n l t n n t t i r i a i d i a i t e n a n a s a a state state

18 EXST3201 Chapter 12a Geaghan Fall 2005: Page 18 Various plot with state variable Plot of leverage*state. Legend: A = 1 obs, B = 2 obs, etc. leverage A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A C C C D F G H I I I I K K L M M M M M M M M N N N N N N N N O O O P R S S T T U V V W W W W l l r r a o o e l e a d l n o a e o a a a i i i i o e e e e e e o o h k r e h o o e e t e i a e i y a a i k l l n l o o w a l d w n n u i r s c n s s n b v w w w w r r i l e n o u u n x a r r s s s o b s z a i o n a r r a h i i a s t i n y s h n s s t r a H J M Y t t o a g n d t t n a h m g h t c m a k o n f r e w i g i o n a a u s e l a i e i o a a d a e e o h h h o s e h h e s o i i V o i m a n s o a c a d i i o n s c i a c g s s u n s a m r x r C D o n y I C D s n n n i n n a a a r d t r a a i a k a n h a o s r a k p s i k a a m l s a a s t i g r s g s n o i e s y n d u n t i i a s e c r k a v l r k e a t g i i c a s a p h y o o o a a o o e o i n a u e p i l t n n l t n n t t i r i a i d i a i t e n a n a s a a state Plot of cookd*state. Legend: A = 1 obs, B = 2 obs, etc. 2 + A cookd A A A A 0 + A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A C C C D F G H I I I I K K L M M M M M M M M N N N N N N N N O O O P R S S T T U V V W W W W l l r r a o o e l e a d l n o a e o a a a i i i i o e e e e e e o o h k r e h o o e e t e i a e i y a a i k l l n l o o w a l d w n n u i r s c n s s n b v w w w w r r i l e n o u u n x a r r s s s o b s z a i o n a r r a h i i a s t i n y s h n s s t r a H J M Y t t o a g n d t t n a h m g h t c m a k o n f r e w i g i o n a a u s e l a i e i o a a d a e e o h h h o s e h h e s o i i V o i m a n s o a c a d i i o n s c i a c g s s u n s a m r x r C D o n y I C D s n n n i n n a a a r d t r a a i a k a n h a o s r a k p s i k a a m l s a a s t i g r s g s n o i e s y n d u n t i i a s e c r k a v l r k e a t g i i c a s a p h y o o o a a o o e o i n a u e p i l t n n l t n n t t i r i a i d i a i t e n a n a s a a state

19 EXST3201 Chapter 12a Geaghan Fall 2005: Page 19 Various plot with state variable Plot of dffits*state. Legend: A = 1 obs, B = 2 obs, etc. dffits A A A A A A A A A 0 + A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A -2 + A A A A A C C C D F G H I I I I K K L M M M M M M M M N N N N N N N N O O O P R S S T T U V V W W W W l l r r a o o e l e a d l n o a e o a a a i i i i o e e e e e e o o h k r e h o o e e t e i a e i y a a i k l l n l o o w a l d w n n u i r s c n s s n b v w w w w r r i l e n o u u n x a r r s s s o b s z a i o n a r r a h i i a s t i n y s h n s s t r a H J M Y t t o a g n d t t n a h m g h t c m a k o n f r e w i g i o n a a u s e l a i e i o a a d a e e o h h h o s e h h e s o i i V o i m a n s o a c a d i i o n s c i a c g s s u n s a m r x r C D o n y I C D s n n n i n n a a a r d t r a a i a k a n h a o s r a k p s i k a a m l s a a s t i g r s g s n o i e s y n d u n t i i a s e c r k a v l r k e a t g i i c a s a p h y o o o a a o o e o i n a u e p i l t n n l t n n t t i r i a i d i a i t e n a n a s a a state 63 PROC UNIVARIATE DATA=NEXT1 NORMAL PLOT; VAR resid; RUN; NOTE: The PROCEDURE UNIVARIATE printed page 21. NOTE: PROCEDURE UNIVARIATE used (Total process time): 0.09 seconds 0.04 seconds 64

20 EXST3201 Chapter 12a Geaghan Fall 2005: Page 20 Various plot with state variable The UNIVARIATE Procedure Variable: resid Moments N 50 Sum Weights 50 0 Sum Observations 0 Std Deviation Variance Skewness Kurtosis Uncorrected SS Corrected SS Coeff Variation. Std Error Basic Statistical Measures Location Variability Std Deviation Median Variance Mode. Range Interquartile Range Tests for Location: Mu0=0 Test -Statistic p Value Student's t t 0 Pr > t Sign M 1 Pr >= M Signed Rank S 57.5 Pr >= S Tests for Normality Test --Statistic p Value Shapiro-Wilk W Pr < W Kolmogorov-Smirnov D Pr > D < Cramer-von Mises W-Sq Pr > W-Sq Anderson-Darling A-Sq Pr > A-Sq Quantiles (Definition 5) Quantile Estimate 100% Max % % % % Q % Median % Q % % % % Min Extreme Observations Lowest Highest----- Value Obs Value Obs

21 EXST3201 Chapter 12a Geaghan Fall 2005: Page 21 Stem Leaf Boxplot Normal Probability Plot *+ * * * ** *** **** ** *** *--+--* *** **** **** ** * ** * * * * * Multiply Stem.Leaf by 10** Selection options: FORWARD, BACKWARD, STEPWISE, NONE (full model) These do the selection techniques discussed. Additional options working with these are; SLENTRY = alpha value, level of entry for stepwise and forward SLSTAY = alpha value, level of remaining for backward and stepwise INCLUDE = n, automatically puts the first n variables of the list into the regression (do not remove) RSQUARE, ADJRSQ, CP : These are like all possible regressions. These will simply calculate the R 2, AdjR 2 and C p statistic for all models to find the best ones. Other criteria include Akaike's information criterion (AIC), Schwartz s Bayesian information criterion (SBC), SSE, MSE and others. Additional options START = n STOP = n (will stop all methods) BEST = n eg. with these options we could determine the best 5 models at each level, starting with 3 factors, ending with 6 factors 65 PROC REG DATA=StateSAT; Title2 'RSquare procedure'; 66 MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / 67 selection=rsquare start=1 stop=4 best=5 cp adjrsq aic sbc mse; 68 RUN; 69 NOTE: The PROCEDURE REG printed page 22. NOTE: PROCEDURE REG used (Total process time): 0.12 seconds 0.06 seconds

22 EXST3201 Chapter 12a Geaghan Fall 2005: Page 22 RSquare procedure The REG Procedure Model: MODEL1 Dependent Variable: SAT R-Square Selection Method Number of Observations Read 50 Number of Observations Used 50 Number in Adjusted Model R-Square R-Square C(p) AIC BIC MSE SBC SSE Variables in Model YEARS RANK EXPEND RANK TAKERS YEARS INCOME RANK PUBLIC RANK TAKERS RANK YEARS EXPEND RANK INCOME YEARS RANK TAKERS YEARS RANK YEARS PUBLIC RANK PUBLIC EXPEND RANK TAKERS YEARS EXPEND YEARS PUBLIC EXPEND RANK TAKERS YEARS EXPEND RANK INCOME YEARS EXPEND RANK INCOME YEARS PUBLIC RANK TAKERS INCOME YEARS RANK TAKERS YEARS PUBLIC RANK TAKERS YEARS PUBLIC EXPEND RANK INCOME YEARS PUBLIC EXPEND RANK TAKERS INCOME YEARS EXPEND RANK TAKERS INCOME YEARS PUBLIC RANK TAKERS INCOME PUBLIC EXPEND RANK TAKERS INCOME YEARS PUBLIC EXPEND

23 EXST3201 Chapter 12a Geaghan Fall 2005: Page proc print data=parm1; 72 * var ADJRSQ AIC BIC CP MSE RSQUARE SBC SSE; 73 run; NOTE: There were 24 observations read from the data set WORK.PARM1. NOTE: The PROCEDURE PRINT printed page 23. NOTE: PROCEDURE PRINT used (Total process time): 0.08 seconds 0.03 seconds RSquare procedure Obs _MODEL TYPE DEPVAR RMSE_ Intercept TAKERS INCOME YEARS PUBLIC EXPEND RANK 1 MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT MODEL1 PARMS SAT Obs SAT _IN P EDF SSE MSE RSQ ADJRSQ CP AIC BIC SBC_ PROC REG DATA=StateSAT; Title2 'Cp criterion'; 76 MODEL SAT = TAKERS INCOME YEARS PUBLIC EXPEND RANK / 77 selection=cp start=2 stop=5 best=10; 78 RUN; ods html close; 81 quit; NOTE: The PROCEDURE REG printed page 24. NOTE: PROCEDURE REG used (Total process time): 0.13 seconds 0.03 seconds

24 EXST3201 Chapter 12a Geaghan Fall 2005: Page 24 Cp criterion The REG Procedure Model: MODEL1 Dependent Variable: SAT C(p) Selection Method Number of Observations Read 50 Number of Observations Used 50 Number in Model C(p) R-Square Variables in Model YEARS PUBLIC EXPEND RANK YEARS EXPEND RANK TAKERS YEARS EXPEND RANK INCOME YEARS EXPEND RANK TAKERS YEARS PUBLIC EXPEND RANK INCOME YEARS PUBLIC EXPEND RANK TAKERS INCOME YEARS EXPEND RANK INCOME YEARS RANK INCOME YEARS PUBLIC RANK TAKERS INCOME YEARS RANK

Chapter 8 (More on Assumptions for the Simple Linear Regression)

Chapter 8 (More on Assumptions for the Simple Linear Regression) EXST3201 Chapter 8b Geaghan Fall 2005: Page 1 Chapter 8 (More on Assumptions for the Simple Linear Regression) Your textbook considers the following assumptions: Linearity This is not something I usually

More information

This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed.

This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed. EXST3201 Chapter 13c Geaghan Fall 2005: Page 1 Linear Models Y ij = µ + βi + τ j + βτij + εijk This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed.

More information

The program for the following sections follows.

The program for the following sections follows. Homework 6 nswer sheet Page 31 The program for the following sections follows. dm'log;clear;output;clear'; *************************************************************; *** EXST734 Homework Example 1

More information

EXST7015: Estimating tree weights from other morphometric variables Raw data print

EXST7015: Estimating tree weights from other morphometric variables Raw data print Simple Linear Regression SAS example Page 1 1 ********************************************; 2 *** Data from Freund & Wilson (1993) ***; 3 *** TABLE 8.24 : ESTIMATING TREE WEIGHTS ***; 4 ********************************************;

More information

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter.

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. 1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. T F T F T F a) Variance estimates should always be positive, but covariance estimates can be either positive

More information

Odor attraction CRD Page 1

Odor attraction CRD Page 1 Odor attraction CRD Page 1 dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR=" ---- + ---+= -/\*"; ODS LISTING; *** Table 23.2 ********************************************;

More information

dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR=" = -/\<>*"; ODS LISTING;

dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR= = -/\<>*; ODS LISTING; dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR=" ---- + ---+= -/\*"; ODS LISTING; *** Table 23.2 ********************************************; *** Moore, David

More information

Handout 1: Predicting GPA from SAT

Handout 1: Predicting GPA from SAT Handout 1: Predicting GPA from SAT appsrv01.srv.cquest.utoronto.ca> appsrv01.srv.cquest.utoronto.ca> ls Desktop grades.data grades.sas oldstuff sasuser.800 appsrv01.srv.cquest.utoronto.ca> cat grades.data

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

New Educators Campaign Weekly Report

New Educators Campaign Weekly Report Campaign Weekly Report Conversations and 9/24/2017 Leader Forms Emails Collected Text Opt-ins Digital Journey 14,661 5,289 4,458 7,124 317 13,699 1,871 2,124 Pro 13,924 5,175 4,345 6,726 294 13,086 1,767

More information

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Topic 18: Model Selection and Diagnostics

Topic 18: Model Selection and Diagnostics Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 6: Cluster Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering

More information

Model Selection Procedures

Model Selection Procedures Model Selection Procedures Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Model Selection Procedures Consider a regression setting with K potential predictor variables and you wish to explore

More information

Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984.

Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 984. y ˆ = a + b x + b 2 x 2K + b n x n where n is the number of variables Example: In an earlier bivariate

More information

Chapter 5 Linear least squares regression

Chapter 5 Linear least squares regression Chapter 5 Linear least squares regression Consider first simple linear regression. For now, the model of interest is only the model of the conditional expectations; the distribution of the residuals is

More information

Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black

Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black Jakarta International School 6 th Grade Formative Assessment Graphing and Statistics -Black Name: Date: Score : 42 Data collection, presentation and application Frequency tables. (Answer question 1 on

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 4: Factor analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering Pedro

More information

LINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved.

LINEAR REGRESSION. Copyright 2013, SAS Institute Inc. All rights reserved. LINEAR REGRESSION LINEAR REGRESSION REGRESSION AND OTHER MODELS Type of Response Type of Predictors Categorical Continuous Continuous and Categorical Continuous Analysis of Variance (ANOVA) Ordinary Least

More information

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee May 2018

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee May 2018 Cooperative Program Allocation Budget Receipts May 2018 Cooperative Program Allocation Budget Current Current $ Change % Change Month Month from from Contribution Sources 2017-2018 2016-2017 Prior Year

More information

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2017

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2017 Cooperative Program Allocation Budget Receipts October 2017 Cooperative Program Allocation Budget Current Current $ Change % Change Month Month from from Contribution Sources 2017-2018 2016-2017 Prior

More information

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2018

Cooperative Program Allocation Budget Receipts Southern Baptist Convention Executive Committee October 2018 Cooperative Program Allocation Budget Receipts October 2018 Cooperative Program Allocation Budget Current Current $ Change % Change Month Month from from Contribution Sources 2018-2019 2017-2018 Prior

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 3: Principal Component Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical

More information

Standard Indicator That s the Latitude! Students will use latitude and longitude to locate places in Indiana and other parts of the world.

Standard Indicator That s the Latitude! Students will use latitude and longitude to locate places in Indiana and other parts of the world. Standard Indicator 4.3.1 That s the Latitude! Purpose Students will use latitude and longitude to locate places in Indiana and other parts of the world. Materials For the teacher: graph paper, globe showing

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

Abortion Facilities Target College Students

Abortion Facilities Target College Students Target College Students By Kristan Hawkins Executive Director, Students for Life America Ashleigh Weaver Researcher Abstract In the Fall 2011, Life Dynamics released a study entitled, Racial Targeting

More information

Multicollinearity Exercise

Multicollinearity Exercise Multicollinearity Exercise Use the attached SAS output to answer the questions. [OPTIONAL: Copy the SAS program below into the SAS editor window and run it.] You do not need to submit any output, so there

More information

Intercity Bus Stop Analysis

Intercity Bus Stop Analysis by Karalyn Clouser, Research Associate and David Kack, Director of the Small Urban and Rural Livability Center Western Transportation Institute College of Engineering Montana State University Report prepared

More information

Chapter. Organizing and Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Chapter. Organizing and Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc. Chapter 2 Organizing and Summarizing Data Section 2.1 Organizing Qualitative Data Objectives 1. Organize Qualitative Data in Tables 2. Construct Bar Graphs 3. Construct Pie Charts When data is collected

More information

Multivariate Analysis

Multivariate Analysis Multivariate Analysis Chapter 5: Cluster analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2015/2016 Master in Business Administration and

More information

, District of Columbia

, District of Columbia State Capitals These are the State Seals of each state. Fill in the blank with the name of each states capital city. (Hint: You may find it helpful to do the word search first to refresh your memory.),

More information

Hourly Precipitation Data Documentation (text and csv version) February 2016

Hourly Precipitation Data Documentation (text and csv version) February 2016 I. Description Hourly Precipitation Data Documentation (text and csv version) February 2016 Hourly Precipitation Data (labeled Precipitation Hourly in Climate Data Online system) is a database that gives

More information

10. Alternative case influence statistics

10. Alternative case influence statistics 10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the

More information

ssh tap sas913, sas

ssh tap sas913, sas B. Kedem, STAT 430 SAS Examples SAS8 ===================== ssh xyz@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Multiple Regression ====================== 0. Show

More information

QF (Build 1010) Widget Publishing, Inc Page: 1 Batch: 98 Test Mode VAC Publisher's Statement 03/15/16, 10:20:02 Circulation by Issue

QF (Build 1010) Widget Publishing, Inc Page: 1 Batch: 98 Test Mode VAC Publisher's Statement 03/15/16, 10:20:02 Circulation by Issue QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation by Issue Qualified Non-Paid Circulation Qualified Paid Circulation Individual Assoc. Total Assoc. Total Total Requester Group Qualified

More information

Printable Activity book

Printable Activity book Printable Activity book 16 Pages of Activities Printable Activity Book Print it Take it Keep them busy Print them out Laminate them or Put them in page protectors Put them in a binder Bring along a dry

More information

Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements

Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 109,, doi:10.1029/2004jd005099, 2004 Correction to Spatial and temporal distributions of U.S. winds and wind power at 80 m derived from measurements Cristina L. Archer

More information

Descriptions of post-hoc tests

Descriptions of post-hoc tests Experimental Statistics II Page 81 Descriptions of post-hoc tests Post-hoc or Post-ANOVA tests! Once you have found out some treatment(s) are different, how do you determine which one(s) are different?

More information

Challenge 1: Learning About the Physical Geography of Canada and the United States

Challenge 1: Learning About the Physical Geography of Canada and the United States 60ºN S T U D E N T H A N D O U T Challenge 1: Learning About the Physical Geography of Canada and the United States 170ºE 10ºW 180º 20ºW 60ºN 30ºW 1 40ºW 160ºW 50ºW 150ºW 60ºW 140ºW N W S E 0 500 1,000

More information

RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON

RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON RELATIONSHIPS BETWEEN THE AMERICAN BROWN BEAR POPULATION AND THE BIGFOOT PHENOMENON ETHAN A. BLIGHT Blight Investigations, Gainesville, FL ABSTRACT Misidentification of the American brown bear (Ursus arctos,

More information

JAN/FEB MAR/APR MAY/JUN

JAN/FEB MAR/APR MAY/JUN QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation Breakdown by Issue Qualified Non-Paid Qualified Paid Previous This Previous This Total Total issue Removals Additions issue issue Removals

More information

EXST 7015 Fall 2014 Lab 08: Polynomial Regression

EXST 7015 Fall 2014 Lab 08: Polynomial Regression EXST 7015 Fall 2014 Lab 08: Polynomial Regression OBJECTIVES Polynomial regression is a statistical modeling technique to fit the curvilinear data that either shows a maximum or a minimum in the curve,

More information

This is a Split-plot Design with a fixed single factor treatment arrangement in the main plot and a 2 by 3 factorial subplot.

This is a Split-plot Design with a fixed single factor treatment arrangement in the main plot and a 2 by 3 factorial subplot. EXST3201 Chapter 13c Geaghan Fall 2005: Page 1 Linear Models Y ij = µ + τ1 i + γij + τ2k + ττ 1 2ik + εijkl This is a Split-plot Design with a fixed single factor treatment arrangement in the main plot

More information

Lecture notes on Regression & SAS example demonstration

Lecture notes on Regression & SAS example demonstration Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

Club Convergence and Clustering of U.S. State-Level CO 2 Emissions

Club Convergence and Clustering of U.S. State-Level CO 2 Emissions Methodological Club Convergence and Clustering of U.S. State-Level CO 2 Emissions J. Wesley Burnett Division of Resource Management West Virginia University Wednesday, August 31, 2013 Outline Motivation

More information

Additional VEX Worlds 2019 Spot Allocations

Additional VEX Worlds 2019 Spot Allocations Overview VEX Worlds 2019 Spot s Qualifying spots for the VEX Robotics World Championship are calculated twice per year. On the following table, the number in the column is based on the number of teams

More information

Osteopathic Medical Colleges

Osteopathic Medical Colleges Osteopathic Medical Colleges Matriculants by U.S. States and Territories Entering Class 0 Prepared by the Research Department American Association of Colleges of Osteopathic Medicine Copyright 0, AAM All

More information

A. Geography Students know the location of places, geographic features, and patterns of the environment.

A. Geography Students know the location of places, geographic features, and patterns of the environment. Learning Targets Elementary Social Studies Grade 5 2014-2015 A. Geography Students know the location of places, geographic features, and patterns of the environment. A.5.1. A.5.2. A.5.3. A.5.4. Label North

More information

What Lies Beneath: A Sub- National Look at Okun s Law for the United States.

What Lies Beneath: A Sub- National Look at Okun s Law for the United States. What Lies Beneath: A Sub- National Look at Okun s Law for the United States. Nathalie Gonzalez Prieto International Monetary Fund Global Labor Markets Workshop Paris, September 1-2, 2016 What the paper

More information

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook

More information

Online Appendix: Can Easing Concealed Carry Deter Crime?

Online Appendix: Can Easing Concealed Carry Deter Crime? Online Appendix: Can Easing Concealed Carry Deter Crime? David Fortunato University of California, Merced dfortunato@ucmerced.edu Regulations included in institutional context measure As noted in the main

More information

SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008

SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008 SUPPLEMENTAL NUTRITION ASSISTANCE PROGRAM QUALITY CONTROL ANNUAL REPORT FISCAL YEAR 2008 U.S. DEPARTMENT OF AGRICULTURE FOOD AND NUTRITION SERVICE PROGRAM ACCOUNTABILITY AND ADMINISTRATION DIVISION QUALITY

More information

Week 7.1--IES 612-STA STA doc

Week 7.1--IES 612-STA STA doc Week 7.1--IES 612-STA 4-573-STA 4-576.doc IES 612/STA 4-576 Winter 2009 ANOVA MODELS model adequacy aka RESIDUAL ANALYSIS Numeric data samples from t populations obtained Assume Y ij ~ independent N(μ

More information

Preview: Making a Mental Map of the Region

Preview: Making a Mental Map of the Region Preview: Making a Mental Map of the Region Draw an outline map of Canada and the United States on the next page or on a separate sheet of paper. Add a compass rose to your map, showing where north, south,

More information

Heteroskedasticity. (In practice this means the spread of observations around any given value of X will not now be constant)

Heteroskedasticity. (In practice this means the spread of observations around any given value of X will not now be constant) Heteroskedasticity Occurs when the Gauss Markov assumption that the residual variance is constant across all observations in the data set so that E(u 2 i /X i ) σ 2 i (In practice this means the spread

More information

Comparison of a Population Means

Comparison of a Population Means Analysis of Variance Interested in comparing Several treatments Several levels of one treatment Comparison of a Population Means Could do numerous two-sample t-tests but... ANOVA provides method of joint

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

Alpine Funds 2017 Tax Guide

Alpine Funds 2017 Tax Guide Alpine s 2017 Guide Alpine Dynamic Dividend ADVDX 1/30/17 1/31/17 1/31/17 0.020000000 0.019248130 0.000000000 0.00000000 0.019248130 0.013842273 0.000000000 0.000000000 0.000751870 0.000000000 0.00 0.00

More information

Summary of Natural Hazard Statistics for 2008 in the United States

Summary of Natural Hazard Statistics for 2008 in the United States Summary of Natural Hazard Statistics for 2008 in the United States This National Weather Service (NWS) report summarizes fatalities, injuries and damages caused by severe weather in 2008. The NWS Office

More information

2005 Mortgage Broker Regulation Matrix

2005 Mortgage Broker Regulation Matrix 2005 Mortgage Broker Regulation Matrix Notes on individual states follow the table REG EXEMPTIONS LIC-EDU LIC-EXP LIC-EXAM LIC-CONT-EDU NET WORTH BOND MAN-LIC MAN-EDU MAN-EXP MAN-EXAM Alabama 1 0 2 0 0

More information

Alpine Funds 2016 Tax Guide

Alpine Funds 2016 Tax Guide Alpine s 2016 Guide Alpine Dynamic Dividend ADVDX 01/28/2016 01/29/2016 01/29/2016 0.020000000 0.017621842 0.000000000 0.00000000 0.017621842 0.013359130 0.000000000 0.000000000 0.002378158 0.000000000

More information

Heteroskedasticity. Occurs when the Gauss Markov assumption that the residual variance is constant across all observations in the data set

Heteroskedasticity. Occurs when the Gauss Markov assumption that the residual variance is constant across all observations in the data set Heteroskedasticity Occurs when the Gauss Markov assumption that the residual variance is constant across all observations in the data set Heteroskedasticity Occurs when the Gauss Markov assumption that

More information

Office of Special Education Projects State Contacts List - Part B and Part C

Office of Special Education Projects State Contacts List - Part B and Part C Office of Special Education Projects State Contacts List - Part B and Part C Source: http://www.ed.gov/policy/speced/guid/idea/monitor/state-contactlist.html Alabama Customer Specialist: Jill Harris 202-245-7372

More information

Statistics 262: Intermediate Biostatistics Model selection

Statistics 262: Intermediate Biostatistics Model selection Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.

More information

Failure Time of System due to the Hot Electron Effect

Failure Time of System due to the Hot Electron Effect of System due to the Hot Electron Effect 1 * exresist; 2 option ls=120 ps=75 nocenter nodate; 3 title of System due to the Hot Electron Effect ; 4 * TIME = failure time (hours) of a system due to drift

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574)

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574) LABORATORY REPORT If you have any questions concerning this report, please do not hesitate to call us at (800) 332-4345 or (574) 233-4777. This report may not be reproduced, except in full, without written

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574)

LABORATORY REPORT. If you have any questions concerning this report, please do not hesitate to call us at (800) or (574) LABORATORY REPORT If you have any questions concerning this report, please do not hesitate to call us at (800) 332-4345 or (574) 233-4777. This report may not be reproduced, except in full, without written

More information

Assignment 9 Answer Keys

Assignment 9 Answer Keys Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67

More information

Outline. Topic 13 - Model Selection. Predicting Survival - Page 350. Survival Time as a Response. Variable Selection R 2 C p Adjusted R 2 PRESS

Outline. Topic 13 - Model Selection. Predicting Survival - Page 350. Survival Time as a Response. Variable Selection R 2 C p Adjusted R 2 PRESS Topic 13 - Model Selection - Fall 2013 Variable Selection R 2 C p Adjusted R 2 PRESS Outline Automatic Search Procedures Topic 13 2 Predicting Survival - Page 350 Surgical unit wants to predict survival

More information

North American Geography. Lesson 2: My Country tis of Thee

North American Geography. Lesson 2: My Country tis of Thee North American Geography Lesson 2: My Country tis of Thee Unit Overview: As students work through the activities in this unit they will be introduced to the United States in general, different regions

More information

Crop Progress. Corn Mature Selected States [These 18 States planted 92% of the 2017 corn acreage]

Crop Progress. Corn Mature Selected States [These 18 States planted 92% of the 2017 corn acreage] Crop Progress ISSN: 00 Released October, 0, by the National Agricultural Statistics Service (NASS), Agricultural Statistics Board, United s Department of Agriculture (USDA). Corn Mature Selected s [These

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3.1 through 3.3 Fall, 2013 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

Infant Mortality: Cross Section study of the United State, with Emphasis on Education

Infant Mortality: Cross Section study of the United State, with Emphasis on Education Illinois State University ISU ReD: Research and edata Stevenson Center for Community and Economic Development Arts and Sciences Fall 12-15-2014 Infant Mortality: Cross Section study of the United State,

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

a. YOU MAY USE ONE 8.5 X11 TWO-SIDED CHEAT SHEET AND YOUR TEXTBOOK (OR COPY THEREOF).

a. YOU MAY USE ONE 8.5 X11 TWO-SIDED CHEAT SHEET AND YOUR TEXTBOOK (OR COPY THEREOF). STAT3503 Test 2 NOTE: a. YOU MAY USE ONE 8.5 X11 TWO-SIDED CHEAT SHEET AND YOUR TEXTBOOK (OR COPY THEREOF). b. YOU MAY USE ANY ELECTRONIC CALCULATOR. c. FOR FULL MARKS YOU MUST SHOW THE FORMULA YOU USE

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases

Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases Multivariate Classification Methods: The Prevalence of Sexually Transmitted Diseases Summer Undergraduate Mathematical Sciences Research Institute (SUMSRI) Lindsay Kellam, Queens College kellaml@queens.edu

More information

3rd Quartile. 1st Quartile) Minimum

3rd Quartile. 1st Quartile) Minimum EXST7034 - Regression Techniques Page 1 Regression diagnostics dependent variable Y3 There are a number of graphic representations which will help with problem detection and which can be used to obtain

More information

BlackRock Core Bond Trust (BHK) BlackRock Enhanced International Dividend Trust (BGY) 2 BlackRock Defined Opportunity Credit Trust (BHL) 3

BlackRock Core Bond Trust (BHK) BlackRock Enhanced International Dividend Trust (BGY) 2 BlackRock Defined Opportunity Credit Trust (BHL) 3 MUNICIPAL FUNDS Arizona (MZA) California Municipal Income Trust (BFZ) California Municipal 08 Term Trust (BJZ) California Quality (MCA) California Quality (MUC) California (MYC) Florida Municipal 00 Term

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional

Grand Total Baccalaureate Post-Baccalaureate Masters Doctorate Professional Post-Professional s by Location of Permanent Home Address and Degree Level Louisiana Acadia 19 13 0 3 0 3 0 0 0 Allen 5 5 0 0 0 0 0 0 0 Ascension 307 269 2 28 1 6 0 1 0 Assumption 14 12 0 1 0 1 0 0 0 Avoyelles 6 4 0 1 0

More information

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis

Lecture 5: Ecological distance metrics; Principal Coordinates Analysis. Univariate testing vs. community analysis Lecture 5: Ecological distance metrics; Principal Coordinates Analysis Univariate testing vs. community analysis Univariate testing deals with hypotheses concerning individual taxa Is this taxon differentially

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

The Steps to Follow in a Multiple Regression Analysis

The Steps to Follow in a Multiple Regression Analysis ABSTRACT The Steps to Follow in a Multiple Regression Analysis Theresa Hoang Diem Ngo, Warner Bros. Home Video, Burbank, CA A multiple regression analysis is the most powerful tool that is widely used,

More information

How the mean changes depends on the other variable. Plots can show what s happening...

How the mean changes depends on the other variable. Plots can show what s happening... Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How

More information

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3 Page 1 Tensile Strength Experiment Investigate the tensile strength of a new synthetic fiber. The factor is the weight percent

More information

Meteorology 110. Lab 1. Geography and Map Skills

Meteorology 110. Lab 1. Geography and Map Skills Meteorology 110 Name Lab 1 Geography and Map Skills 1. Geography Weather involves maps. There s no getting around it. You must know where places are so when they are mentioned in the course it won t be

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Topic 19: Remedies Outline Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Regression Diagnostics Summary Check normality of the residuals

More information

Outline. Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining

Outline. Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining Outline Administrivia and Introduction Course Structure Syllabus Introduction to Data Mining Dimensionality Reduction Introduction Principal Components Analysis Singular Value Decomposition Multidimensional

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Linear Model Selection and Regularization Recall the linear model Y = 0 + 1 X 1 + + p X p +. In the lectures

More information

JAN/FEB MAR/APR MAY/JUN

JAN/FEB MAR/APR MAY/JUN QF 1.100 (Build 1010) Widget Publishing, Inc Page: 1 Circulation Breakdown by Issue Analyzed Nonpaid and Verified Paid Previous This Previous This Total Total issue Removals Additions issue issue Removals

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information