Decision 411: Class 8


1 Decision 411: Class 8
One more way to model seasonality
Advanced regression (power tools):
Stepwise and all-possible regressions
1-way ANOVA
Multifactor ANOVA
General Linear Models (GLM)
Out-of-sample validation of regression models
Logistic regression

2 One more way to model seasonality with regression Suppose a time series has an underlying stable trend and stable seasonal pattern (either additive or multiplicative), with effects of other independent variables added on, so the effects of the independent variables are not seasonal. Suppose that you also have an externally supplied seasonal index. Then it may be appropriate to use the seasonal index and/or the seasonal index multiplied by the time index as separate regressors to capture the seasonal part of the overall pattern.

3 Details Let SINDEX denote a seasonal index variable and let TIME denote a time index variable. Then by including TIME, SINDEX, and SINDEX*TIME as potential regressors, you can model a range of patterns with stable trend and seasonality. Depending on the amount of trend and the degree to which seasonal swings get larger as the level of the series rises, perhaps not all of these terms would be significant.

4 Depending on the estimated coefficients, you could fit any of these patterns:
Only SINDEX is significant: seasonal pattern with no trend (no real difference between additive and multiplicative)
Only TIME and SINDEX are significant: additive seasonal pattern with trend
SINDEX*TIME is significant: multiplicative seasonal pattern with trend
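As a rough illustration of the idea (synthetic data; the variable names TIME and SINDEX follow the slides, and the fitting is plain least squares rather than Statgraphics), here is how the SINDEX*TIME term picks up a multiplicative pattern:

```python
import numpy as np

# Hypothetical illustration: fit TIME, SINDEX, and SINDEX*TIME as regressors
# to a series with a stable trend and a multiplicative seasonal pattern.
rng = np.random.default_rng(0)
n = 48                                    # 4 years of monthly data
TIME = np.arange(1, n + 1, dtype=float)
SINDEX = np.tile([0.8, 0.9, 1.1, 1.2] * 3, 4)[:n]   # externally supplied index
# Multiplicative pattern: seasonal swings grow with the level of the series
y = SINDEX * (100 + 2.0 * TIME) + rng.normal(0, 1.0, n)

# Design matrix: constant, TIME, SINDEX, SINDEX*TIME
X = np.column_stack([np.ones(n), TIME, SINDEX, SINDEX * TIME])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
```

Here the true pattern is purely multiplicative, so the SINDEX and SINDEX*TIME coefficients carry the fit while the constant and TIME terms come out near zero.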

5 Stepwise regression Automatic stepwise variable selection is a standard feature in multiple regression (a right-mouse-button analysis option in Statgraphics).

6 Backward stepwise regression Automates the common process of sequentially removing the variable with the smallest t-stat, if that t-stat is less than a specified threshold. The F-to-remove parameter is the square of the minimum t-stat needed to remain in the model, given the other variables still present. Can be used to fine-tune the selection of variables, but should not be used to go fishing for significant variables in a large pool.
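A minimal sketch of the backward-elimination logic described above (an illustration of the idea, not Statgraphics' actual implementation; the data and variable names are made up):

```python
import numpy as np

def backward_stepwise(X, y, names, f_to_remove=4.0):
    """Repeatedly drop the regressor whose squared t-stat (its F-to-remove)
    is smallest, refitting after each removal, until all remaining
    regressors pass the threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        Xk = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        mse = resid @ resid / (len(y) - Xk.shape[1])
        cov = mse * np.linalg.inv(Xk.T @ Xk)
        t2 = beta[1:] ** 2 / np.diag(cov)[1:]     # squared t-stats
        worst = int(np.argmin(t2))
        if t2[worst] >= f_to_remove:
            break                                 # all remaining terms pass
        del keep[worst]
    return [names[j] for j in keep]

# Toy data: y depends on x0 and x2 only; x1 and x3 are pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(0, 0.5, 80)
selected = backward_stepwise(X, y, ["x0", "x1", "x2", "x3"])
```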

7 Forward stepwise Automates the process of sequentially adding the variable that would have the highest t-stat if it were the next variable entered. F-to-enter is the square of the minimum t-stat needed to enter, given the other variables already present. Can be used (with care!) to go fishing for significant variables in a large pool. It's a potentially powerful data exploration tool, because it does something that would be hard to do by hand.

8 Example: enrollment revisited Dependent variable: ROLL. Potential regressors (8): lag(roll,1), lag(roll,2), HSGRAD, lag(hsgrad,1), UNEMP, lag(unemp,1), INCOME, lag(income,1). Previously we had considered models whose equations involved up to 2 lags of ROLL and up to 1 lag of HSGRAD and UNEMP. INCOME is an additional predictive variable that was not considered before. Note that YOU are responsible for anticipating all transformations that may be useful.

9 Here is the "all likely suspects" model: many variables are not significant.

10 With F-to-enter and F-to-remove set at 4.0, both forward and backward stepwise regression lead to a 2-variable model.

11 Details of the steps in the backward stepwise regression: you can see here by how much the MSE changes as variables are removed. MSE actually improves (i.e., gets smaller) when some of the least significant variables are removed. In this case the smallest MSE was actually reached at step 2, although the MSEs after the later steps are the same for all practical purposes. The INCOME variables are removed first, followed by the HSGRAD variables, leaving UNEMP as the only exogenous variable after step 4. MSE goes up somewhat in steps 5 & 6, although the variables removed at those points are technically not significant at the 0.05 level.

12 When F-to-enter and F-to-remove are lowered to 3.0, which corresponds to a permissible t-stat as low as 1.73 in magnitude, forward stepwise still leads to the 2-variable model, while backward stepwise leads to this 4-variable model in which one variable (lag(roll,2)) has a t-stat of -1.86.

13 Caveats Stepwise regression (or any other automatic model selection method) is not a substitute for logical thinking and graphical data exploration. There is a danger of overfitting from fishing in too large a pool of potential variables and finding spurious regressors. Resist the urge to lower F-to-enter or F-to-remove below 3.0 to find more significant variables. Ideally you should hold out a significant sample of data while selecting variables, for later out-of-sample validation of the model. Validation is not honest if you peeked at the hold-out data while trying to identify significant variables.

14 All possible regressions Automatic stepwise selection (forward or backward) is efficient, but not guaranteed to find the best model that can be constructed from a given set of potential regressors. It is computationally feasible to test all possible regressions that can be constructed with k out of m potential regressors. Beware: danger of getting obsessed with rankings & forgetting about logic & intuition!

15 All possible regressions, continued In Statgraphics, all-possible-regressions is the Regression Model Selection procedure. Analysis options allow you to set the maximum number of variables (default is 5). Outputs include rankings of models by adjusted R-squared and the Mallows Cp stat. Pane options for these reports allow you to limit the number of best models shown for a given number of variables (default is 5).

16 What is the Mallows Cp stat?

Cp = p + (n - p) * [ MSE(subset of size p) / MSE(all variables) - 1 ]

where p = # of coefficients in the subset model, including the constant. Ideally, Cp should be small and close to p. Note that Cp = p for the all-variable model, so Cp < p if the subset model has lower MSE than the all-variable model. Ideally you should approach or even beat the all-variable MSE with fewer variables. Ranking by Cp penalizes more heavily for model complexity than ranking by adjusted R-squared.
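The formula translates directly into code (a hypothetical helper, not part of any package):

```python
def mallows_cp(mse_subset, mse_full, n, p):
    """Mallows' Cp for a subset model with p coefficients (including the
    constant), per the slide: Cp = p + (n - p) * (MSE_subset/MSE_full - 1)."""
    return p + (n - p) * (mse_subset / mse_full - 1.0)

# The all-variable model always gives Cp = p exactly:
cp_full = mallows_cp(2.0, 2.0, 27, 9)       # -> 9
# A subset that beats the full-model MSE gives Cp < p:
cp_sub = mallows_cp(1.8, 2.0, 27, 5)        # 5 + 22*(-0.1) = 2.8
```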

17 Example: enrollment revisited Dependent variable: ROLL. Potential regressors (8): lag(roll,1), lag(roll,2), HSGRAD, lag(hsgrad,1), UNEMP, lag(unemp,1), INCOME, lag(income,1). Number of possible models = 2^8 = 256.

18 Lining up the suspects: Since there are only 8 potential regressors, it is feasible to ask for reports on all possible models with up to 8 regressors (just for purposes of illustration!). 256 models are fitted to only 27 data points. Overkill? The default maximum number of variables is 5, which is usually plenty! In most applications, I would not recommend raising it.
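For illustration, a brute-force version of all-possible-regressions on synthetic data, ranked by adjusted R-squared (real implementations avoid refitting every model from scratch; the names and data here are made up):

```python
import numpy as np
from itertools import combinations

def all_possible_regressions(X, y, names, max_vars=5):
    """Fit every subset of up to max_vars regressors and rank by
    adjusted R-squared (brute-force sketch of the idea)."""
    n, m = X.shape
    sst = np.sum((y - y.mean()) ** 2)
    results = []
    for k in range(1, max_vars + 1):
        for cols in combinations(range(m), k):
            Xk = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
            sse = np.sum((y - Xk @ beta) ** 2)
            adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
            results.append((adj_r2, [names[j] for j in cols]))
    return sorted(results, reverse=True)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 2.0 * X[:, 1] + rng.normal(0, 0.3, 40)    # only x1 matters
ranked = all_possible_regressions(X, y, ["x0", "x1", "x2", "x3"], max_vars=2)
best_adj_r2, best_vars = ranked[0]
```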

19 How does it work so fast? It is actually unnecessary for the computer to run a complete set of calculations from scratch for each possible regression. Once the correlation matrix has been computed from the original variables, a simple sequence of calculations on the correlation matrix can determine the R-squareds and MSEs of all possible models. The big problem is the length of the reports!

20 Ranking by R-squared Note the hair-splitting differences in adjusted R-squared among models at the top of the rankings. Easy to get lost here! Since the dependent variable is non-stationary, all of the good models have adjusted R-squared very close to 100%. Some 7-variable models, and even the 8-variable model, show up near the top of the rankings (yikes!). As usual, it's better to focus on MSE to decide whether additional variables are worth their weight in model complexity: MSE goes down as adjusted R-squared goes up.

21 Plot of R-squared vs. # coefficients The plot of adjusted R-squared vs. # of coefficients (including the constant) shows that most of the variance is explained by the first regressor added, which happens to be lag(roll,1). This is what should be expected with a nonstationary (strongly trended) dependent variable.

22 Ranking by Cp Ranking by Cp favors models with fewer coefficients and discriminates more finely among the models at the top. The best model includes lag(roll,1), lag(roll,2), unemp, and lag(unemp,1).

23 Plot of Cp vs. # coefficients The Cp plot shows that Cp is minimized at 5 coefficients (i.e., 4 regressors + constant). The 4-coefficient model also yields Cp close to p. Note: the Y-axis scale had to be adjusted to show only small values of Cp.

24 Details of the best-Cp model It's necessary to run a manual regression to see the coefficients. Note that the coefficients of unemp and lag(unemp,1) are roughly equal and opposite. Would a difference be just as good?

25 Restarting from a different set of transformed variables Let's re-run the all-possible-regressions with the lagged regressors replaced by differences. This allows the selection algorithm to choose the difference alone or the difference together with the unlagged variable, which would be logically equivalent to including the lags separately.

26 New ranking by Cp Here the two clearly-best models both include lag(roll,1), lag(roll,2), and diff(unemp), and the top model also includes diff(hsgrad). Thus, collapsing separate lags into a difference may allow a model with fewer coefficients to fit as well, or allow a model with the same number of coefficients to fit better.

27 New plot of Cp vs. p The two highest-ranking models now have Cp much less than p.

28 Conclusions All-possible-regressions makes sure you don't overlook the model with the lowest possible error stats for a given number of regressors... but staring at rankings can distract you from thinking about other issues, such as which model makes the most sense. It won't find a good model by magic: you still have to choose the set of potential regressors and consider transformations of the variables. (Ditto for stepwise!) You are NOT REQUIRED to choose the model that is #1 in the rankings (on whatever measure).

29 Caution: do not overdifference In several of our examples of regressions of nonstationary time series, it has turned out that a differencing transformation was useful. However, beware of using differencing when it is not really needed! Differencing adds complexity to a model, and sometimes it may even create artificial correlation patterns and increase the variance to be explained. Differencing is most appropriate when the original variables either look like random walks (e.g., stock prices) or else are very smooth (e.g., ROLL), with variances dramatically reduced by differencing.

30 Analysis of Variance (ANOVA) ANOVA is multiple regression with (only) categorical independent variables. In a one-way ANOVA, a dummy variable is created for all but one level of the independent variable. The model then estimates the mean of the dependent variable for each level of the independent variable. A pooled estimate of the error standard deviation is used to compute standard errors of the means. This is how one-way ANOVA differs from separate calculations of the means, in which standard errors are based on separate standard deviations.

31 ANOVA in practice Analysis of variance is typically used to analyze data from designed experiments in marketing research, pharmaceutical research, crop science, quality, etc. Interest often centers on nonlinear effects and/or interactions among effects of independent variables: Does relative effectiveness of different ad formats vary with market or demographics? Which combinations and dosages of drugs work best? Which combinations and quantities of crop treatments maximize yield and/or quality? ANOVA is also appropriate for natural experiments with categorical variables, if error variances can be assumed to be the same for all categories.

32 Example: cardata Let's start by doing a one-way ANOVA of mpg vs. origin (continent). Origin codes 1, 2, 3 refer to America, Europe, and Japan, respectively. This model will test for differences in average (mean) mpg among the 3 origins. Typical ANOVA output: ANOVA table, means table, box-and-whisker plot.

33 [Annotated ANOVA table: the sum of squared deviations between group means (predictions) and the grand mean is the explained variance; the sum of squared errors (deviations from group means) is the unexplained variance; together they make up the sum of squared deviations from the grand mean.] The ANOVA table shows the decomposition of the sum of squared deviations from the grand mean and the corresponding variances (but no R-squared!). The F-ratio (30.20) is the ratio of the explained variance ( ) to the unexplained variance ( ). The variables in the model are jointly significant if this ratio is significantly greater than 1, which means they are doing more than what would happen if you just dummied out some data points.

34 Decomposition of the sum of squares: SS(total) = SS(between) + SS(within). [Diagram: total variation = deviations from the grand mean; between-groups variation (prediction) = deviations of the A and B group means from the grand mean; within-groups variation (error) = deviations from the group means.]
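A quick numeric check of this decomposition and of the F-ratio from the previous slide (made-up mpg-style data, three groups of five):

```python
import numpy as np

# Verify SS(total) = SS(between) + SS(within) on toy data.
groups = {
    "America": np.array([18.0, 20.0, 22.0, 19.0, 21.0]),
    "Europe":  np.array([27.0, 30.0, 28.0, 29.0, 26.0]),
    "Japan":   np.array([30.0, 32.0, 31.0, 33.0, 29.0]),
}
all_y = np.concatenate(list(groups.values()))
grand_mean = all_y.mean()

ss_total = np.sum((all_y - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups.values())

# F-ratio: explained variance over unexplained variance
k, n = len(groups), len(all_y)
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
```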

35 Means table The table of means shows the estimated mean of the dependent variable for each level of the independent variable: in this case, mean mpg's for cars from each continent. These are just ordinary means. However, the standard errors of the means are based on a pooled estimate of the standard deviation of the errors.

36 The box-and-whisker plot provides a nice visual comparison of the means, interquartile ranges, and extreme values. [Box-and-Whisker Plot of mpg by origin: the box spans the interquartile range (25%-tile to 75%-tile), the whiskers show the minimum and maximum if not "outside", and outside points lie beyond the box by more than 1.5x the interquartile range.] Here we see that the American cars have significantly lower mean mpg than European or Japanese cars, although the highest-mpg American car is in the upper quartile of the European and Japanese ranges. Also, although the European and Japanese cars have similar mean mpg's, the Japanese cars have a tighter distribution of mpg's except for two outliers, one high and one low.

37 For comparison, here's the same model fitted by using multiple regression with dummy variables for the first two origin codes. Note that the ANOVA table shows the same F-ratio, etc. The CONSTANT in this model is the mean for origin=3, and the coefficients of the dummy variables origin=1 and origin=2 are the differences in means for the other two levels. The standard error of the regression ( ) is the square root of the mean square for error ( ), and R-squared is the model SS ( ) divided by the total SS ( ). Thus, ANOVA is nothing really new; it's just a repackaging of regression output for the special case when the independent variables are dummies for levels of a categorical variable.
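The equivalence can be verified in a few lines (synthetic data; plain least squares stands in for the regression procedure):

```python
import numpy as np

# Regressing y on dummies for the first two origin codes reproduces the
# group means: CONSTANT = mean of origin 3, and each dummy coefficient is
# that group's difference from the origin-3 mean. Data are made up.
origin = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
mpg = np.array([20.0, 22.0, 21.0, 28.0, 27.0, 29.0, 31.0, 30.0, 32.0])

d1 = (origin == 1).astype(float)   # dummy for origin = 1
d2 = (origin == 2).astype(float)   # dummy for origin = 2
X = np.column_stack([np.ones(len(mpg)), d1, d2])
const, b1, b2 = np.linalg.lstsq(X, mpg, rcond=None)[0]

mean1, mean2, mean3 = (mpg[origin == k].mean() for k in (1, 2, 3))
```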

38 Multifactor ANOVA Multifactor ANOVA is regression with dummy variables for levels of two or more categorical independent variables. When there are two or more variables, you can estimate not only main effects, but also interactions among levels of two different variables. One of the questions of interest is whether the interactions are significant.

39 Multi-factor ANOVA: possible patterns in data [Bar charts: "A main effect only"; "B main effect only". These bar charts show hypothetical mean responses for 2 levels of factor A and 3 levels of factor B.]

40 Interactions between factors? [Bar charts: "Both A and B main effects, without interaction"; "Both A and B main effects, with interaction".]

41 Two factors: mpg vs. origin & year Here is a 2-factor ANOVA with no interactions: only main effects have been estimated. The ANOVA table now shows separate F-ratios for each of the two input variables, reflecting the joint significance of their respective dummy variables. (Both are significant here, but origin is more significant.)

42 Main effects (mean mpg) for origin & year The means table now shows means of the dependent variable for each level of both variables: these are the main effects. (The means by origin are slightly different from those of the one-way ANOVA, since the coefficients of the origin dummies are now being estimated simultaneously with those for year.)

43 [Two plots: "Means and 95.0 Percent LSD Intervals" of mpg by origin and by year. Here Europe and Japan have the same high average mpg.] The LSD (least significant difference) intervals are constructed in such a way that if two means are the same, their intervals will overlap 95.0% of the time. Any pair of intervals that do not overlap vertically corresponds to a pair of means which have a statistically significant difference.

44 Same model fitted by multiple regression: the differences among coefficients for different levels of the same variable are the same as the differences among means in the ANOVA output. These coefficients can be computed from the ones in the multifactor ANOVA output, but not vice versa: the multiple regression output does not show the grand mean.

45 Estimating interaction effects If the order of interactions is set to 2, additional dummy variables will be added for all possible combinations of a level of one variable and a level of the other variable.

46 Are interaction effects significant? The F-ratio for the variance explained by the interaction terms is not significant. Hence there is no significant interaction between origin and year. This means that variations of average mpg across years are essentially the same for each origin, and correspondingly, variations of average mpg across origins are essentially the same for each year.

47 Here are the details of the estimated interactions, as well as the main effects.

48 Categorical + quantitative? Suppose that you want to include a quantitative independent variable along with dummies for categorical variables. Example: suppose you want to include weight as an additional regressor to control for differences in average weights of cars from different countries of origin. This brings us to...

49 General linear models (GLM) GLM is a combination of multifactor ANOVA and ordinary multiple regression. You can specify both categorical and quantitative independent variables. You can also estimate interactions and nested effects.

50 Input variables for GLM

51 Effects and interactions to be estimated After the variables have been specified on the data input panel, this panel is used to specify interactions and/or nesting of effects (if any). To begin with, we will just look for main effects

52 Here's the ANOVA report. Note that weight has a very significant F-ratio. (For a quantitative variable, the F-ratio is simply the square of its t-stat. Here the F-ratio of 201 corresponds to a t-stat of around 14.)

53 Regression with out-of-sample validation! At the bottom of the Analysis Summary report are the usual regression statistics, including separate error stats for a validation period. If you use the Select box to hold out data in this (or any Advanced Regression) procedure, the de-selected points are used as the hold-out sample. Hence if you use the GLM procedure to fit a multiple regression model, you can perform out-of-sample validation!

54 Holding out a random sample An additional column is added to the data worksheet, with the name "random120". The Generate Data option is used with the expression RANDOM(120) to fill the column with 1's in 120 random places, 0's elsewhere. When this variable is used as the Select criterion, only the randomly chosen rows with 1's will be fitted.
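The same trick outside Statgraphics might look like this (a hypothetical helper sketching the RANDOM(120) idea with numpy):

```python
import numpy as np

def random_select(n_rows, n_fit, seed=None):
    """Return a 0/1 column with 1's in n_fit randomly chosen places.
    Used as a Select criterion, only those rows are fitted and the
    remaining rows form the hold-out sample."""
    rng = np.random.default_rng(seed)
    col = np.zeros(n_rows, dtype=int)
    col[rng.choice(n_rows, size=n_fit, replace=False)] = 1
    return col

select = random_select(155, 120, seed=3)   # 155 rows assumed for illustration
```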

55 Refitting the GLM model with a random hold-out sample Here the new random120 variable is used as the Select criterion. In this case it is appropriate for the hold-out sample to be determined randomly, because the variables are not time series and the rows are sorted according to the values of some of the independent variables. Hence holding out the last k values would not necessarily yield a representative sample.

56 Validation results Of the 120 rows that were randomly selected for fitting, only 119 had non-missing values for all independent variables. Note that MSE is actually smaller in the validation period (perhaps there was less variance in the hold-out sample), while MAPE is slightly larger. So, the model appears to be valid, i.e., not overfitted.

57 Back to the original model with no hold-out: here's the Model Coefficients report. It also includes Variance Inflation Factors to test for multicollinearity (VIF > 10 is "bad"). The dummy variables actually have values of +1/-1/0 instead of +1/0, although taken together they are equivalent to the usual dummy variables.

58 What is multicollinearity? Multicollinearity refers to a situation in which the independent variables are strongly linearly related to each other. When multicollinearity exists, the estimated coefficients may not represent the true effects of the variables, and standard errors will be inflated (variables may all appear to be insignificant despite a high R-squared). In the most extreme case, where one independent variable is an exact linear function of the others, the regression will fail to produce any results at all (you will get an error condition).

59 What are Variance Inflation Factors? The VIF for the k-th regressor is

VIF_k = 1 / (1 - R²_k)

where R²_k is the R-squared obtained by regressing the k-th regressor on all the other regressors. Thus, VIF_k = 1 when R²_k = 0. Severe multicollinearity is indicated if VIF_k > 10, which means R²_k > 90%.
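The definition translates directly into code (a sketch; the regress-on-the-others step is done with plain least squares, and the collinear data are synthetic):

```python
import numpy as np

def vif(X):
    """Variance inflation factors per the slide's definition:
    VIF_k = 1/(1 - R²_k), where R²_k comes from regressing the k-th
    column of X on all the other columns (plus a constant)."""
    n, m = X.shape
    out = []
    for k in range(m):
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
        resid = X[:, k] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, k] - X[:, k].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
z = rng.normal(size=(100, 2))
# Third column is nearly a linear function of the first two -> severe VIF
X = np.column_stack([z, z[:, 0] + z[:, 1] + rng.normal(0, 0.05, 100)])
factors = vif(X)
```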

60 GLM example, continued [Plots: "Means and 95.0 Percent LSD Intervals" of mpg by origin and by year.] When we control for weight, the pattern of main effects is different: European cars get higher mileage for a given weight. Japanese cars evidently get high mileage by being lighter than American or European cars on average. Also, there has been a general upward trend in mpg, except for a drop in 1981.

61 [Residual plot: Studentized residual vs. predicted mpg.] Another nice feature of the GLM procedure: both autocorrelation and probability plots are pane options for the residual plot.

62 Probability plot in GLM [Normal probability plot for mpg: percentage vs. Studentized residual.] Here's the (vertical) probability plot. The slight S-shaped pattern indicates that the tails of the residual distribution are a bit fatter than normal, but nothing much to worry about in this case. (There is no simple transformation of the data that will make the distribution look any better; apparently a few cars are just exceptional.)

63 Here's the same model fitted in the Multiple Regression procedure instead. Note that the regression stats and the coefficient of weight are identical. The differences in coefficients between levels of the same factor are also identical.

64 GLM with interaction effects Hit the Cross button to insert the interaction operator (*). The GLM procedure can also be used to test for interaction ("cross") effects, exactly as in the ANOVA procedure, as well as to use nested experimental designs. Here this feature is used to look for interactions between the categorical factors while controlling for the effect of a quantitative factor (weight).

65 The F-ratio for the variance explained by the interaction between origin and year is larger when controlling for weight, but still not technically significant (F=1.76, P=0.089).

66 Summary of GLM features GLM is an "all-everything" procedure for fitting models with categorical and/or quantitative factors, with or without interaction effects. It can perform out-of-sample validation. Variance Inflation Factors (VIFs) are a test for multicollinearity (>10 is "bad"). It also includes a few more built-in plots (residual autocorrelation & probability plot).

67 Logistic regression Logistic regression is regression with a binary (0-1) dependent variable, e.g., an indicator variable for the occurrence of some event or condition. Applications: predicting probabilities of events or fractions of individuals who will respond to a given promotion or medical treatment, etc. The probabilistic prediction equation has this form:

Ŷt = exp(β0 + β1X1t + β2X2t + ...) / (1 + exp(β0 + β1X1t + β2X2t + ...))

In this case Ŷt is the predicted probability that Yt = 1.

68 Predictions expressed in terms of odds The predicted probability can equivalently be expressed in the form of odds in favor of Yt = 1:

Ŷt / (1 - Ŷt) = exp(β0 + β1X1t + β2X2t + ...) = exp(β0) · exp(β1)^X1t · exp(β2)^X2t · ...

Here exp(β0) is a constant odds factor, and exp(βi) is the odds factor for the i-th variable, raised to the power Xit. The predicted total odds is a product rather than a sum of contributions of the independent variables. If Xit increases by one unit, the predicted odds in favor of Yt = 1 increase by the factor exp(βi), other things being equal.

69 Predictions expressed in terms of log-odds The predicted probability can also be equivalently expressed in log-odds form:

log( Ŷt / (1 - Ŷt) ) = β0 + β1X1t + β2X2t + ...

Thus, logistic regression uses a linear regression equation to predict the log odds in favor of Yt = 1. However, you can't estimate the model by regressing log(Yt/(1 - Yt)) on the X's. (Can't take the log of zero!) In practice, the betas are estimated by a procedure that is similar to minimizing a weighted sum of the squared prediction errors Σ(Yt - Ŷt)².
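A compact sketch of one such iterative estimation procedure (Newton's method, i.e. iteratively reweighted least squares, on synthetic data; this illustrates the idea, not Statgraphics' exact algorithm), together with the three equivalent prediction forms from the last few slides:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Minimal logistic regression by Newton's method (IRLS).
    X should include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        # Newton step: beta += (X'WX)^-1 X'(y - p)
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return beta

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))     # true betas: -0.5, 1.5
y = (rng.random(n) < p_true).astype(float)

X = np.column_stack([np.ones(n), x])
beta_hat = fit_logistic(X, y)

# The three equivalent prediction forms from the slides:
logit = X @ beta_hat                      # log odds in favor of Y=1
odds = np.exp(logit)                      # odds in favor of Y=1
prob = odds / (1 + odds)                  # predicted probability
```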

70 Logistic example: predicting magazine subscription responses by age and sex The dependent variable can either be a binary (0-1) variable (shown here) or it can be a vector of proportions or probabilities, together with a vector of sample sizes.

71 Coefficients, standard errors, and R-squared are interpreted in the same manner as in multiple regression. The odds ratio of an independent variable is just EXP(beta).

72 Differences in predicted subscription responses for male and female subjects

73 This plot shows a summary of the prediction capability of the fitted model. First, the model is used to predict the response using the information in each row of the data file. If the predicted value is larger than the cutoff, the response is predicted to be TRUE. If the predicted value is less than or equal to the cutoff, the response is predicted to be FALSE. The table shows the percent of the observed data correctly predicted at various cutoff values. For example, using a cutoff equal to 0.56, 75.0% of all TRUE responses were correctly predicted, while 95.0% of all FALSE responses were correctly predicted, for a total of 85.0%. Using the cutoff value which maximizes the total percentage correct may provide a good value to use for predicting additional individuals.
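The cutoff table is easy to reproduce for a toy example (hypothetical predicted probabilities and observed responses, not the magazine data):

```python
import numpy as np

def cutoff_table(prob, y, cutoffs):
    """Percent of TRUE and FALSE responses correctly predicted at each
    cutoff, plus the total percent correct (sketch of the slide's table)."""
    rows = []
    for c in cutoffs:
        pred = prob > c
        pct_true = 100.0 * np.mean(pred[y == 1])     # TRUEs predicted TRUE
        pct_false = 100.0 * np.mean(~pred[y == 0])   # FALSEs predicted FALSE
        total = 100.0 * np.mean(pred == (y == 1))
        rows.append((c, pct_true, pct_false, total))
    return rows

prob = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.1])
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])
table = cutoff_table(prob, y, [0.25, 0.5, 0.75])
```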

74 Other advanced regression procedures Comparison of regression lines: fit several simple regressions to the same X & Y variables, splitting the data on levels of another variable. Nonlinear regression: estimate a model such as Y = 1/(a + b*X^c). Similar to Solver in Excel.

75 Resources You can find out more about these and other procedures via the Statgraphics help system, StatAdvisor, and user manuals (in pdf files in your Statgraphics directory). There are also many good on-line sources: the Statsoft on-line textbook and David Garson's on-line textbook (links are available on the Decision 411 course home page).


Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

Decision 411: Class 4

Decision 411: Class 4 Decision 411: Class 4 Non-seasonal averaging & smoothing models Simple moving average (SMA) model Simple exponential smoothing (SES) model Linear exponential smoothing (LES) model Combining seasonal adjustment

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics

More information

Decision 411: Class 9. HW#3 issues

Decision 411: Class 9. HW#3 issues Decision 411: Class 9 Presentation/discussion of HW#3 Introduction to ARIMA models Rules for fitting nonseasonal models Differencing and stationarity Reading the tea leaves : : ACF and PACF plots Unit

More information

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance

More information

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module

More information

Decision 411: Class 4

Decision 411: Class 4 Decision 411: Class 4 Non-seasonal averaging & smoothing models Simple moving average (SMA) model Simple exponential smoothing (SES) model Linear exponential smoothing (LES) model Combining seasonal adjustment

More information

8. Example: Predicting University of New Mexico Enrollment

8. Example: Predicting University of New Mexico Enrollment 8. Example: Predicting University of New Mexico Enrollment year (1=1961) 6 7 8 9 10 6000 10000 14000 0 5 10 15 20 25 30 6 7 8 9 10 unem (unemployment rate) hgrad (highschool graduates) 10000 14000 18000

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

More information

What If There Are More Than. Two Factor Levels?

What If There Are More Than. Two Factor Levels? What If There Are More Than Chapter 3 Two Factor Levels? Comparing more that two factor levels the analysis of variance ANOVA decomposition of total variability Statistical testing & analysis Checking

More information

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns Lecture Week Multiple Linear Regression Predict y from (possibly) many predictors x Including extra derived variables Model Criticism Study the importance of columns Draw on Scientific framework Experiment;

More information

LECTURE 15: SIMPLE LINEAR REGRESSION I

LECTURE 15: SIMPLE LINEAR REGRESSION I David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).

More information

The Steps to Follow in a Multiple Regression Analysis

The Steps to Follow in a Multiple Regression Analysis ABSTRACT The Steps to Follow in a Multiple Regression Analysis Theresa Hoang Diem Ngo, Warner Bros. Home Video, Burbank, CA A multiple regression analysis is the most powerful tool that is widely used,

More information

Performance of fourth-grade students on an agility test

Performance of fourth-grade students on an agility test Starter Ch. 5 2005 #1a CW Ch. 4: Regression L1 L2 87 88 84 86 83 73 81 67 78 83 65 80 50 78 78? 93? 86? Create a scatterplot Find the equation of the regression line Predict the scores Chapter 5: Understanding

More information

STAT 212 Business Statistics II 1

STAT 212 Business Statistics II 1 STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Multiple Regression. Peerapat Wongchaiwat, Ph.D.

Multiple Regression. Peerapat Wongchaiwat, Ph.D. Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Chapter 1 Review of Equations and Inequalities

Chapter 1 Review of Equations and Inequalities Chapter 1 Review of Equations and Inequalities Part I Review of Basic Equations Recall that an equation is an expression with an equal sign in the middle. Also recall that, if a question asks you to solve

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

176 Index. G Gradient, 4, 17, 22, 24, 42, 44, 45, 51, 52, 55, 56

176 Index. G Gradient, 4, 17, 22, 24, 42, 44, 45, 51, 52, 55, 56 References Aljandali, A. (2014). Exchange rate forecasting: Regional applications to ASEAN, CACM, MERCOSUR and SADC countries. Unpublished PhD thesis, London Metropolitan University, London. Aljandali,

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure). 1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

More information

Week 8 Hour 1: More on polynomial fits. The AIC

Week 8 Hour 1: More on polynomial fits. The AIC Week 8 Hour 1: More on polynomial fits. The AIC Hour 2: Dummy Variables Hour 3: Interactions Stat 302 Notes. Week 8, Hour 3, Page 1 / 36 Interactions. So far we have extended simple regression in the following

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Multiple Regression: Chapter 13. July 24, 2015

Multiple Regression: Chapter 13. July 24, 2015 Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)

More information

Hypothesis T e T sting w ith with O ne O One-Way - ANOV ANO A V Statistics Arlo Clark Foos -

Hypothesis T e T sting w ith with O ne O One-Way - ANOV ANO A V Statistics Arlo Clark Foos - Hypothesis Testing with One-Way ANOVA Statistics Arlo Clark-Foos Conceptual Refresher 1. Standardized z distribution of scores and of means can be represented as percentile rankings. 2. t distribution

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Regression in R. Seth Margolis GradQuant May 31,

Regression in R. Seth Margolis GradQuant May 31, Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

Regression: Ordinary Least Squares

Regression: Ordinary Least Squares Regression: Ordinary Least Squares Mark Hendricks Autumn 2017 FINM Intro: Regression Outline Regression OLS Mathematics Linear Projection Hendricks, Autumn 2017 FINM Intro: Regression: Lecture 2/32 Regression

More information

How To: Analyze a Split-Plot Design Using STATGRAPHICS Centurion

How To: Analyze a Split-Plot Design Using STATGRAPHICS Centurion How To: Analyze a SplitPlot Design Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus August 13, 2005 Introduction When performing an experiment involving several factors, it is best to randomize the

More information

Topic 18: Model Selection and Diagnostics

Topic 18: Model Selection and Diagnostics Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Topic 1. Definitions

Topic 1. Definitions S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

More information

Using Microsoft Excel

Using Microsoft Excel Using Microsoft Excel Objective: Students will gain familiarity with using Excel to record data, display data properly, use built-in formulae to do calculations, and plot and fit data with linear functions.

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and can be printed and given to the

More information

Sociology 593 Exam 1 Answer Key February 17, 1995

Sociology 593 Exam 1 Answer Key February 17, 1995 Sociology 593 Exam 1 Answer Key February 17, 1995 I. True-False. (5 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When

More information

Stochastic Processes

Stochastic Processes qmc082.tex. Version of 30 September 2010. Lecture Notes on Quantum Mechanics No. 8 R. B. Griffiths References: Stochastic Processes CQT = R. B. Griffiths, Consistent Quantum Theory (Cambridge, 2002) DeGroot

More information

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation? Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

Chapter 13. Multiple Regression and Model Building

Chapter 13. Multiple Regression and Model Building Chapter 13 Multiple Regression and Model Building Multiple Regression Models The General Multiple Regression Model y x x x 0 1 1 2 2... k k y is the dependent variable x, x,..., x 1 2 k the model are the

More information

Modeling Machiavellianism Predicting Scores with Fewer Factors

Modeling Machiavellianism Predicting Scores with Fewer Factors Modeling Machiavellianism Predicting Scores with Fewer Factors ABSTRACT RESULTS Prince Niccolo Machiavelli said things on the order of, The promise given was a necessity of the past: the word broken is

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 3 Numerical Descriptive Measures 3-1 Learning Objectives In this chapter, you learn: To describe the properties of central tendency, variation,

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

PBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression.

PBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression. PBAF 528 Week 8 What are some problems with our model? Regression models are used to represent relationships between a dependent variable and one or more predictors. In order to make inference from the

More information

The General Linear Model Ivo Dinov

The General Linear Model Ivo Dinov Stats 33 Statistical Methods for Biomedical Data The General Linear Model Ivo Dinov dinov@stat.ucla.edu http://www.stat.ucla.edu/~dinov Slide 1 Problems with t-tests and correlations 1) How do we evaluate

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore What is Multiple Linear Regression Several independent variables may influence the change in response variable we are trying to study. When several independent variables are included in the equation, the

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C = Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =

More information

ISQS 5349 Spring 2013 Final Exam

ISQS 5349 Spring 2013 Final Exam ISQS 5349 Spring 2013 Final Exam Name: General Instructions: Closed books, notes, no electronic devices. Points (out of 200) are in parentheses. Put written answers on separate paper; multiple choices

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

FORECASTING PROCEDURES FOR SUCCESS

FORECASTING PROCEDURES FOR SUCCESS FORECASTING PROCEDURES FOR SUCCESS Suzy V. Landram, University of Northern Colorado Frank G. Landram, West Texas A&M University Chris Furner, West Texas A&M University ABSTRACT This study brings an awareness

More information

Ch 7: Dummy (binary, indicator) variables

Ch 7: Dummy (binary, indicator) variables Ch 7: Dummy (binary, indicator) variables :Examples Dummy variable are used to indicate the presence or absence of a characteristic. For example, define female i 1 if obs i is female 0 otherwise or male

More information

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878 Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each

More information

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach

More information

2 Prediction and Analysis of Variance

2 Prediction and Analysis of Variance 2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering

More information

PLS205 Lab 6 February 13, Laboratory Topic 9

PLS205 Lab 6 February 13, Laboratory Topic 9 PLS205 Lab 6 February 13, 2014 Laboratory Topic 9 A word about factorials Specifying interactions among factorial effects in SAS The relationship between factors and treatment Interpreting results of an

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

DOE Wizard Screening Designs

DOE Wizard Screening Designs DOE Wizard Screening Designs Revised: 10/10/2017 Summary... 1 Example... 2 Design Creation... 3 Design Properties... 13 Saving the Design File... 16 Analyzing the Results... 17 Statistical Model... 18

More information

MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:

More information

MULTIPLE LINEAR REGRESSION IN MINITAB

MULTIPLE LINEAR REGRESSION IN MINITAB MULTIPLE LINEAR REGRESSION IN MINITAB This document shows a complicated Minitab multiple regression. It includes descriptions of the Minitab commands, and the Minitab output is heavily annotated. Comments

More information

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus July 28, 2005 Introduction When fitting statistical models, it is usually assumed that the error variance is the

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information