Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns
- Hannah Long
1 Lecture Week: Multiple Linear Regression. Predict y from (possibly) many predictors x, including extra derived variables. Model criticism: study the importance of columns, drawing on the scientific framework. Experiment; find the simplest and best predictor. Look for important rows: diagnostics, outliers and influence. 12/02/201 1
2 Interpreting the coefficients (MLR). Experiment with models: dropping/adding variables impacts the coefficients of the others. There are no issues to discuss unless there is more than one predictor: co-variation in at least 3 dimensions. Do more x-variables mean better models? More coefficients mean bigger R², smaller S and smaller SumSq; the simplest coefficients have value 0. Look for large T-values! Simple models are the best science.
3 Multiple Linear Regression. Predict y from (possibly) many predictors x, including extra derived variables. Experiment with models: dropping/adding vars, noting changes in R² and in the fitted coeffs. Check diagnostics (VIF). Find the simplest and best predictor; understand how y interacts with the predictors x.
4 How important is predictor x_k? The fitted coeff b_k = the average increase/decrease in y when x_k increases by one unit and all other predictors are unchanged. Big numerical value? Big T-ratio? Small p?
5 How important is predictor x_k? What if an important predictor is not available? What if some X vars are highly inter-correlated? What are the implications for interpreting b_k, and for changes in b_k when other variables are added/dropped?
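A small sketch of the second question, with invented data (not from the lecture): the fitted coefficient of x1 can change substantially once a correlated x2 enters the model.

```python
# Invented illustrative data; all names and numbers here are assumptions,
# not the lecture's. Compares b1 from y ~ x1 with b1 from y ~ x1 + x2.

def slr_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def mlr_b1(x1, x2, y):
    """Coefficient of x1 from the two-predictor normal equations."""
    n = len(y)
    def c(u, v):
        mu, mv = sum(u) / n, sum(v) / n
        return sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s11, s22, s12 = c(x1, x1), c(x2, x2), c(x1, x2)
    s1y, s2y = c(x1, y), c(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]        # correlated with x1
y  = [5.0, 7.2, 10.1, 11.8, 14.2, 16.1]

b1_alone = slr_slope(x1, y)
b1_with_x2 = mlr_b1(x1, x2, y)
print(b1_alone, b1_with_x2)  # noticeably different values
```

Dropping or adding the correlated x2 moves b1 by more than a full unit here, which is exactly the interpretation hazard the slide raises.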
6 Trees: a simple case. Linear model regressing Vol on Height, Diam and Ht·Diam²; simple theory available.
7 Diameter and Height important The regression equation is Volume = Diameter Height Predictor Coef SE Coef T P Constant Diameter Height S = R-Sq = 94.8% R-Sq(adj) = 94.4% 12/02/201 7
8 Diameter and Height not important The regression equation is Volume = Diameter Height Ht*Diam^2 Predictor Coef SE Coef T P Constant Diameter Height Ht*Diam^ S = R-Sq = 97.8% R-Sq(adj) = 97.% 12/02/201 8
9 Strategies with correlated predictors. Regression is a device to think about the relative importance of the X vars. Correlation important? Proceed with care; you need regression theory! Transformations, derived variables, modified models; use VIF to predict. Correlation relatively unimportant? Semi-automatic options, incl. Best Subsets/Stepwise.
10 Outline. Examples, mostly in more than two dims. Theory for correlated x-vars: sums of squares; R², multiple correlation and simple correlation; changes in R² (partial R²); the changes depend on the ORDER of the x-vars. MTB: neither the coefficients nor their T or P values are always a measure of importance.
11 Technical Material MLR as a sequence of SLR Partial R 2 Correlated predictors Variance inflation Multi-collinearity Use of Intercept term with indicator variables 12/02/201 11
12 Extreme case: Tree Vol with x1 = x2 = Ht. Regress Vol on x1: Vol = 1.43 x1 = 1.43 x1 + 0 x2. Regress Vol on x2: Vol = 1.43 x2. Regress Vol on both: Vol = (b) x1 + (1.43 - b) x2 for any arbitrary value of b!! An infinity of identical solutions, all equally good for predicting. MINITAB notes this and takes action. Extra tech material online.
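The "infinity of identical solutions" shows up numerically as a singular system of normal equations. A sketch with invented numbers (not the tree data):

```python
# Invented data; when x2 is an exact copy of x1 (the Ht = Ht case), the
# centred cross-product matrix [[S11, S12], [S12, S22]] has determinant 0,
# so no unique coefficient pair exists: families like (b, 1.43 - b) all
# give the same fitted values.

x1 = [4.0, 5.0, 6.0, 7.0]
x2 = x1[:]                      # exact copy: x1 = x2
y  = [10.0, 12.1, 13.9, 16.0]

n = len(y)
def c(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11, s22, s12 = c(x1, x1), c(x2, x2), c(x1, x2)
det = s11 * s22 - s12 * s12
print(det)  # 0.0: singular, the normal equations cannot be solved uniquely
```

This is the condition under which the slide's SE formula divides by zero and Minitab removes one of the offending variables.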
13 Common case: x1 ≈ x2. An infinity of nearly identical solutions: many pairs of coeffs are almost equivalent. More generally, at least one x is nearly perfectly predictable from the other x vars, and many sets of coeffs are almost equivalent. The user notes this and takes action.
14 M.Stuart PEMax: Too many predictors? Obj: Relate Respiratory Muscle Strength (PEMax) To other measures of lung function in patients suffering from cystic fibrosis, adjusting for sex and body size. 12/02/201 14
15 Controlling for external variation. Observational data are often unbalanced, e.g. in age or gender. Ideally data collection is designed: equal numbers M/F, similar age distribution in each group. Regression is often used to control for such variation.
16 PEMax: the variables.
PEmax: Maximal static expiratory pressure, a measure of expiratory muscle strength.
FEV1: Forced expiratory volume in 1 second.
RV: Residual volume (after 1 second).
FRC: Functional residual capacity.
TLC: Total lung capacity.
Sex: 0 = Male, 1 = Female.
Height: cms. Weight: kg.
BMP: Body mass (percent of median of normal cases).
Data columns: Sub, Age, Sex, Ht, Wt, BMP, FEV1, RV, FRC, TLC, PEmax.
17 PEmax: too many vars? All coeffs small. The regression equation is PEmax = FEV1 RV FRC TLC Sex Height Weight BMP. Predictor Coef SE Coef T P: Constant, FEV1, RV, FRC, TLC, Sex, Height, Weight, BMP. S = R-Sq = 63.1% R-Sq(adj) = 44.6%. No variables important? The issue here: too many correlated variables. The challenge here: poor theoretical guidance. Return to this later.
18 Some simple cases Scientific Framework Networks 12/02/201 18
19 Direct and indirect importance. Two candidate causal diagrams for the scientific framework: Ht and Diam acting on Tree Vol directly as well as through Ht·Diam², OR Ht and Diam acting on Tree Vol only through Ht·Diam². Theory: a scientific framework / causal model.
20 Math Marks. Marks of 88 students in maths exams. How to predict the Stat mark from the others? Correlation matrix R for the variables Mech, Vect, Alg, Anal, Stat (with means and std devs). What can we learn from the coeffs in the best predictor?
21 Math Marks: guidance from theory Mechanics Analysis Algebra Vectors Statistics 12/02/201 21
22 Predicting Statistics Performance The regression equation is Stat = Anal Alg Vect Mech Predictor Coef SE Coef T P Constant Anal Alg Vect Mech Mechanics Vectors Algebra Analysis Statistics S = R-Sq = 47.9% R-Sq(adj) = 4.4% 12/02/201 22
23 Alternative Predictions Stat = Anal Alg Predictor Coef SE Coef T P Constant Anal Alg S = R-Sq = 47.9% Stat = Anal Vect Predictor Coef SE Coef T P Constant Anal Vect Mechanics Vectors Algebra Analysis Statistics S = R-Sq = 39.% 12/02/201 23
24 Theory 12/02/201 24
25 Ex a. Uncorrelated x-variables. Artificial data: x1, x2, e, y; SS(total) 1.87; Corr(x1, x2) and Corr(x, y) shown. Data-generating model: Y = β0 + β1 x1 + β2 x2 + ε, ε ~ N(0, σ²); β1 = 1; β2 = 1. The regression equation is ybal = x1bal x2bal. Predictor Coef SE Coef T P: Constant, x1bal, x2bal. S = R-Sq = 93.9%. SS Total.
26 Ex b. Correlated x-variables. Artificial data: x1, x2, e, y; SS(total) 40.1. Scatterplot of x1unbal vs x2unbal. Corr(x1, x2) and Corr(x, y) shown. Data-generating model: Y = β0 + β1 x1 + β2 x2 + ε, ε ~ N(0, σ²); β1 = 1; β2 = 1. The regression equation is y = x1 x2. Predictor Coef SE Coef T P: Constant, x1, x2. S = R-Sq = 98.4%. Coeffs smaller; SE(Coeffs) larger.
27 MLR as successive SLR: the additional info in x2 not in x1.
1. Regress y on x1; store residuals RESy.x1.
1a. Regress x2 on x1; store residuals RESx2.x1.
Thus RESy.x1 and RESx2.x1 represent those aspects of y and x2 that DO NOT depend on x1.
2. Regress RESy.x1 on RESx2.x1.
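The two-step recipe above can be checked numerically: the slope from step 2 equals the coefficient of x2 in the full two-predictor regression. A pure-Python sketch with invented data (none of the names or numbers come from the lecture's Minitab session):

```python
# Residual-on-residual regression reproduces the MLR coefficient of x2.
# All data below are made up for illustration.

def slr_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def slr_resid(x, y):
    """Residuals of y after a simple linear regression on x."""
    n = len(x)
    b = slr_slope(x, y)
    a = sum(y) / n - b * sum(x) / n
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def mlr_two(x1, x2, y):
    """Coefficients (b1, b2) from the two-predictor normal equations."""
    n = len(y)
    def c(u, v):
        mu, mv = sum(u) / n, sum(v) / n
        return sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s11, s22, s12 = c(x1, x1), c(x2, x2), c(x1, x2)
    s1y, s2y = c(x1, y), c(x2, y)
    det = s11 * s22 - s12 ** 2
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

b1, b2 = mlr_two(x1, x2, y)
b2_via_resid = slr_slope(slr_resid(x1, x2), slr_resid(x1, y))
print(b2, b2_via_resid)  # the two routes agree
```

This equivalence holds exactly for least squares with an intercept, which is why the slide can treat MLR as a sequence of SLRs.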
28 MLR stepwise by SLR: the residuals are the same; the models are identical. Fitted line plots (unbalanced case): step 1, y on x1 (R² = 97.2%, R-Sq(adj) 96.8%); 1a, x2 on x1 (R² = 97.2%, R-Sq(adj) 96.8%); step 2, RESy.x1 on RESx2.x1 (R² = 43.6%).
29 Reduction in SSQ: partial R². The regression equation is y = x1 x2. Predictor Coef SE Coef T P: Constant, X1, X2. S = R-Sq = 98.4%. Analysis of Variance: Regression, Residual Error, Total; Seq SS for X1 and X2 (small rounding error: 43.6%). Of the Total SS, X1 explains most; X2 explains a further 0.48. Of the part unexplained by X1, X2 explains 43.6%: the partial R² satisfies 1 - R²(y·x1,x2) = [1 - R²(y·x1)] [1 - R²(y·x2|x1)]. Note that MINITAB 17 uses a different layout for the SS than that shown here. Here using R² as in (0,1); ie %age R²/100.
30 Is Order Important? 12/02/201 30
31 Is order important? The regression equation is y = x1 x2, or equivalently y = x2 x1. No: the coefficients are not impacted by the ordering; both fits give S = R-Sq = 98.4%. Analysis of Variance and Seq SS: first use x1 and then x2, or first use x2 and then x1. Yes: the partial R² (Seq SS) IS impacted by the ordering. Note that MINITAB 17 uses a different layout for the SS than that shown here.
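The order effect on the sequential sums of squares can be reproduced with a small sketch (invented correlated data, not the slide's): each order gives a different Seq SS for the second variable in, but the same total regression SS.

```python
# Invented data for illustration. Seq SS for x2 after x1 is
# SSR(x1, x2) - SSR(x1); swapping the order changes the split but not
# the total regression SS.

def centred(u):
    m = sum(u) / len(u)
    return [a - m for a in u]

def ssr_one(x, y):
    """Regression SS from y on a single predictor."""
    xc, yc = centred(x), centred(y)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    return sxy ** 2 / sxx

def ssr_two(x1, x2, y):
    """Regression SS from y on both predictors (normal equations)."""
    c1, c2, yc = centred(x1), centred(x2), centred(y)
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, yc))
    s2y = sum(a * b for a, b in zip(c2, yc))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1 * s1y + b2 * s2y      # SSR = b'S_xy for the centred fit

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]   # highly correlated with x1
y  = [2.3, 4.1, 5.8, 8.2, 9.9, 12.4]

total = ssr_two(x1, x2, y)
ssr1 = ssr_one(x1, y)
seq_x2_after_x1 = total - ssr1
seq_x1_after_x2 = total - ssr_one(x2, y)
print(seq_x2_after_x1, seq_x1_after_x2)  # different: order matters
```

With uncorrelated predictors the two splits would coincide, which is the point of the next slide.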
32 Is order important? Uncorrelated predictors. No: the coefficients are not impacted by the ordering, and the partial R² (Seq SS) is also not impacted by the ordering when the predictors are uncorrelated. Note that MINITAB 17 uses a different layout for the SS than that shown here.
33 Is order important? For prediction: No, if the predictor variables are uncorrelated; No, even if correlated. For the coeffs, SE(Coeff), T-ratios and p-values, i.e. for teasing out aspects of relative importance: Yes.
34 Is correlation in the predictors important? For prediction: No, even if correlated, provided n is large. For the coeffs, SE(Coeff), T-ratios and p-values: Yes. Seek the simplest model you can get away with, but no simpler; seek out and drop redundant variables.
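As a numeric illustration of "seek and drop redundant variables", a sketch with invented data: when x2 is nearly a copy of x1, adding it barely improves R², so it is a candidate for dropping.

```python
# Invented data; compares R^2 from y ~ x1 with R^2 from y ~ x1 + x2
# when x2 carries almost no information beyond x1.

def centred(u):
    m = sum(u) / len(u)
    return [a - m for a in u]

def r2_one(x, y):
    xc, yc = centred(x), centred(y)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    sst = sum(b * b for b in yc)
    return (sxy ** 2 / sxx) / sst

def r2_two(x1, x2, y):
    c1, c2, yc = centred(x1), centred(x2), centred(y)
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, yc))
    s2y = sum(a * b for a, b in zip(c2, yc))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sst = sum(b * b for b in yc)
    return (b1 * s1y + b2 * s2y) / sst

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.01, 1.98, 3.02, 3.99, 5.02, 5.99]   # almost a copy of x1
y  = [2.05, 3.90, 6.10, 8.00, 9.95, 12.10]

gain = r2_two(x1, x2, y) - r2_one(x1, y)
print(gain)  # tiny: x2 is redundant given x1
```

Prediction loses almost nothing when the redundant variable goes, while the coefficient SEs improve.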
35 Variance Inflation Factors 12/02/201 3
36 Variance Inflation Factor. SE(b_j)² = s² / [(n - 1) s²_{x_j} (1 - R²_j)], where s²_{x_j} is the variance of the x_j values and R²_j is the proportion of the variance of x_j explained when x_j is regressed on all the other predictors. SE(b_j) is large when s_{x_j} is small or R²_j is large. Implications: if you have control over the study design, spread out the predictors and arrange for them to be uncorrelated; else coeffs can be individually small while SEs are large. If coeff interpretation is important, be careful with too many derived variables. Here using R² as in (0,1); ie %age R²/100.
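The inflation term 1/(1 - R²_j) is the VIF itself. A sketch with invented data (with just two predictors, R²_j is simply the squared correlation between them):

```python
# Invented data; VIF_j = 1 / (1 - R_j^2). With two predictors,
# R_j^2 = corr(x1, x2)^2 for both of them.

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def vif_two(x1, x2):
    r2 = corr(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]   # nearly a copy of x1
v = vif_two(x1, x2)
print(v)  # large: the SEs of both coefficients are badly inflated
```

A VIF of v means SE(b_j) is sqrt(v) times what it would be with uncorrelated predictors.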
37 Trees Regression Analysis: Vol versus Ht x Diam^2, Diam, Ht The regression equation is Vol = Ht x Diam^ Diam Ht Predictor Coef SE Coef T P VIF Constant Ht x Diam^ Diam Ht S = R-Sq = 97.8% Note that MINITAB 17 uses a different layout for the SS than that shown here Source DF Seq SS Ht x Diam^ Diam Ht /02/201 37
39 Trees. The regression equation is Vol = Ht x Diam^2. Predictor Coef SE Coef T P VIF: Constant, Ht x Diam^2 (cf. Ht x Diam^2 in the larger model). S = R-Sq = 97.8%. Guidelines: VIF = 1: not correlated; 1 < VIF < 5: moderately correlated; VIF > 5 to 10: highly correlated. VIF values greater than 10 may indicate multicollinearity is unduly influencing your regression results. In this case, you may want to reduce multicollinearity by removing unimportant predictors from your model.
40 Example: Trees. Theory: Ht and Diam act on Tree Vol through Ht·Diam². Seq SS (Source DF Seq SS) under alternative orderings: Diameter, Height, Ht·Diam²; or Ht·Diam², Height, Diameter; or Ht·Diam², Diameter, Height. Note that MINITAB 17 uses a different layout for the SS than that shown here.
41 PE Max revisited. The regression equation is PEmax = Age Sex Height Weight BMP FEV RV FRC TLC. Predictor Coef SE Coef T P VIF: Constant, Age, Sex, Height, Weight, BMP, FEV, RV, FRC, TLC. Which to drop? What's the objective? S = R-Sq = 63.8%.
42 PE Max revisited. The regression equation is PEmax = FRC. Fitted line plot of PEmax on FRC: S, R-Sq 17.4%, R-Sq(adj) 13.8%. Predictor Coef SE Coef T P: Constant, FRC. S = R-Sq = 17.4%. Scatterplot of PEmax vs FRC, marked by Sex.
43 PE Max revisited The regression equation is PEmax = Sex FRC Predictor Coef SE Coef T P Constant Sex FRC Source DF Seq SS Sex FRC S = R-Sq = 22.1% R-Sq(adj) = 1.0% Analysis of Variance Source DF SS MS F P Regression Residual Error Total /02/201 43
44 The regression equation is PE Max revisited PEmax = Sex Age FRC Predictor Coef SE Coef T P Constant Sex Age FRC Source DF Seq SS Sex Age FRC S = R-Sq = 41.2% Analysis of Variance Source DF SS MS F P Regression Residual Error Total /02/201 44
45 PE Max revisited. Corr(Ht, Wt) = 0.921, so the %age of variation in Ht explained by Wt = 100 (0.921)² = 84.8%. The regression equation is PEmax = Height Weight FRC. Predictor Coef SE Coef T P VIF: Constant, Height, Weight, FRC.
46 PE Max revisited The regression equation is PEmax = Height FRC Predictor Coef SE Coef T P VIF Constant Height FRC The regression equation is PEmax = Height Weight FRC Predictor Coef SE Coef T P VIF Constant Height Weight FRC /02/201 46
47 Interpreting the coefficients. Do more x-variables mean better models? Bigger R² and smaller S, versus fewer coeffs. The key issue: correlated and/or missing x-variables. Theory: coefficients only indirectly reflect correlation; high correlation does not imply a big coeff, and a low coeff does not imply low correlation.
48 Review R-squared: R as a correlation coefficient. One x-var: S²_{y·x} = S²_y (1 - r²), where r = Corr(x, y). If ŷ = b1 x1 + b2 x2 + … + bk xk, then S²_{y·ŷ} = S²_y (1 - R²), where R = Corr(y, ŷ) and ŷ is that linear combination of x1, x2, …, xk which best predicts y.
49 Review R-squared. SSTotal = S²_y: the var of y about its mean. S²_{y·x1,x2}: the var of y about the best linear regression predictor, i.e. the var of the residuals about their mean, 0. S²_{y·x1,x2} = S²_y (1 - R²_{y·x1,x2}), and 1 - R²_{y·x1,x2} = (1 - R²_{y·x1})(1 - R²_{y·x2|x1}). One x-var: S²_{y·x} = S²_y (1 - r²), r = Corr(x, y). Here using R² as in (0,1); ie %age R²/100.
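The one-predictor identity, R² = 1 - SSE/SSTotal = r², can be verified directly. A sketch with invented data:

```python
# Invented data; fits y = a0 + b*x by least squares and checks that
# 1 - SSE/SSTotal equals the squared correlation between x and y.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = (sum((a - mx) * (c - my) for a, c in zip(x, y)) /
     sum((a - mx) ** 2 for a in x))
a0 = my - b * mx

sse = sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - my) ** 2 for yi in y)
r2 = 1 - sse / sst

sx = sum((a - mx) ** 2 for a in x) ** 0.5
sy = sst ** 0.5
r = sum((a - mx) * (c - my) for a, c in zip(x, y)) / (sx * sy)
print(r2, r * r)  # equal, up to floating-point rounding
```

With several predictors the same identity holds with r replaced by R = Corr(y, ŷ), as on the previous slide.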
50 Review Coefficients. When there is one predictor x: y = a + bx; b is simply related to r = Corr(y, x), and R² = r². NB the symmetry: r = Corr(x, y) = Corr(y, x). When x is instead regressed on y, x = a′ + b′y, b′ is simply related to the same r, and hence to b.
51 Review Coefficients. Coefficients are not impacted by order. When there are multiple predictors x1, x2, …, xk: y = β0 + β1 x1 + … + βk xk, and βi is NOT proportional to ri = Corr(y, xi). In fact βi reflects the correlation between xi and the best predictor of xi using the other x vars AND y.
52 Strategies for correlated x-vars Redundancy in an extreme case If two or more vars contain exactly one piece of information, use only one of them Partial redundancy If two or more vars contain much the same information for the purposes in hand, use one (possibly composite) variable. More generally, can the important info in K variables be reduced to a few (possibly composite) variables? 12/02/201 2
53 Other Strategies Best Subsets and Stepwise Regression Select Best in a predictive sense Modern methods Very large data sets n and/or p Computationally intensive Data Mining literature/software Penalise models with many variables Note that many models nearly as good 12/02/201 3
54 Challenges with coeffs To be able to interpret coefficients, ideally Choose x variables that are complementary and measure quite different aspects of the system Organise the data such that it does not inadvertently give the impression that these are correlated, despite their selection In other words, design an experiment 12/02/201 4
55 Technicality 12/02/201
56 Extreme case: exact multi-collinearity. x2 is perfectly correlated with x1, or x_k is perfectly predicted by the others. Then R²_j = 1 (the % of the var of x_j when regressed on all the other preds), and SE(coeff) = ∞: it can't be computed from SE(b_j)² = s² / [(n - 1) s²_{x_j} (1 - R²_j)]. There is no single best set of parameters. MINITAB refuses to proceed. Often an error: the same var entered twice! Always arises with full sets of indicators.
57 Exact multi-collinearity. Many pairs of coeffs give the same predicted values; there is no unique solution. Table: slope 1.86, intercept 1.36; predictions from x1 alone, and from both x1 and x2 with alternative X1 coeff choices.
59 Indicator Vars: Exact multi-collinearity Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4 * Q4 is highly correlated with other X variables * Q4 has been removed from the equation. The regression equation is Comps = Time since Q Q2-78 Q3 12/02/201 9
60 Indicator Vars: Exact multi-collinearity Model has no intercept Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4 The regression equation is Comps = 986 Time since Q Q Q3-942 Q4 12/02/201 60
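Why a full set of quarterly indicators collides with the intercept can be seen in a couple of lines (invented quarter labels, not the Comps data):

```python
# Invented quarter labels; with an intercept in the model, Q1+Q2+Q3+Q4
# equals the constant column (all 1s), so the five columns are exactly
# collinear. This is why one indicator (or the intercept) must go.

quarters = [1, 2, 3, 4, 1, 2, 3, 4]
Q = [[1 if q == k else 0 for q in quarters] for k in (1, 2, 3, 4)]

col_sum = [q1 + q2 + q3 + q4 for q1, q2, q3, q4 in zip(*Q)]
print(col_sum)  # every entry is 1: matches the intercept column exactly
```

Dropping Q4 (Minitab's choice) or dropping the intercept are the two ways out; the two models give identical predictions, as the comparison on the next slide shows.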
61 Indicator Vars: Exact multi-collinearity Models 1 Comps = 986 Time since Q Q Q3-942 Q4 2 Comps = Time since Q Q2-78 Q3 Time since Indicator vars Predictions 1978 Comps Q1 Q2 Q3 Q4 Model 1 Model /02/201 61
LEARNING WITH MINITAB Chapter 12 SESSION FIVE: DESIGNING AN EXPERIMENT Laura M Williams, RN, CLNC, MSN MOREHEAD STATE UNIVERSITY IET603: STATISTICAL QUALITY ASSURANCE IN SCIENCE AND TECHNOLOGY DR. AHMAD
More informationPractice Questions for Exam 1
Practice Questions for Exam 1 1. A used car lot evaluates their cars on a number of features as they arrive in the lot in order to determine their worth. Among the features looked at are miles per gallon
More informationAnnouncements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)
Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationAnalysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.
Analysis of Covariance In some experiments, the experimental units (subjects) are nonhomogeneous or there is variation in the experimental conditions that are not due to the treatments. For example, a
More informationSTAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM
STAT212_E3 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICS & STATISTICS Term 171 Page 1 of 9 STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 2017 @ 6:00 PM Name: ID #:
More informationPh.D. Preliminary Examination Statistics June 2, 2014
Ph.D. Preliminary Examination Statistics June, 04 NOTES:. The exam is worth 00 points.. Partial credit may be given for partial answers if possible.. There are 5 pages in this exam paper. I have neither
More information23. Inference for regression
23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence
More informationCh 13 & 14 - Regression Analysis
Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more
More informationStat 529 (Winter 2011) A simple linear regression (SLR) case study. Mammals brain weights and body weights
Stat 529 (Winter 2011) A simple linear regression (SLR) case study Reading: Sections 8.1 8.4, 8.6, 8.7 Mammals brain weights and body weights Questions of interest Scatterplots of the data Log transforming
More informationData Set 8: Laysan Finch Beak Widths
Data Set 8: Finch Beak Widths Statistical Setting This handout describes an analysis of covariance (ANCOVA) involving one categorical independent variable (with only two levels) and one quantitative covariate.
More informationSTAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis
STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationy n 1 ( x i x )( y y i n 1 i y 2
STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered
More informationACOVA and Interactions
Chapter 15 ACOVA and Interactions Analysis of covariance (ACOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA
More informationA Re-Introduction to General Linear Models (GLM)
A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing
More informationAP Statistics Bivariate Data Analysis Test Review. Multiple-Choice
Name Period AP Statistics Bivariate Data Analysis Test Review Multiple-Choice 1. The correlation coefficient measures: (a) Whether there is a relationship between two variables (b) The strength of the
More informationPeriod: Date: Lesson 3B: Properties of Dilations and Equations of lines
Name: Period: Date: : Properties of Dilations and Equations of lines Learning Targets I can identify the properties of dilation mentioned as followed: dilation takes a line not passing through the center
More informationFinal Exam Bus 320 Spring 2000 Russell
Name Final Exam Bus 320 Spring 2000 Russell Do not turn over this page until you are told to do so. You will have 3 hours minutes to complete this exam. The exam has a total of 100 points and is divided
More information10. Alternative case influence statistics
10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the
More informationChapter 7 Student Lecture Notes 7-1
Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model
More informationWeek 8 Hour 1: More on polynomial fits. The AIC
Week 8 Hour 1: More on polynomial fits. The AIC Hour 2: Dummy Variables Hour 3: Interactions Stat 302 Notes. Week 8, Hour 3, Page 1 / 36 Interactions. So far we have extended simple regression in the following
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
What is Multiple Linear Regression Several independent variables may influence the change in response variable we are trying to study. When several independent variables are included in the equation, the
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More information12.12 MODEL BUILDING, AND THE EFFECTS OF MULTICOLLINEARITY (OPTIONAL)
12.12 Model Building, and the Effects of Multicollinearity (Optional) 1 Although Excel and MegaStat are emphasized in Business Statistics in Practice, Second Canadian Edition, some examples in the additional
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationTopic 18: Model Selection and Diagnostics
Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationMODELING. Simple Linear Regression. Want More Stats??? Crickets and Temperature. Crickets and Temperature 4/16/2015. Linear Model
STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6 Least squares line Interpreting coefficients Cautions Want More Stats??? If you have enjoyed learning how to analyze data, and want to
More informationBasic Business Statistics, 10/e
Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:
More informationPooled Regression and Dummy Variables in CO$TAT
Pooled Regression and Dummy Variables in CO$TAT Jeff McDowell September 19, 2012 PRT 141 Outline Define Dummy Variable CER Example Using Linear Regression The Pattern CER Example Using Log-Linear Regression
More informationSTAT-UB.0103 Exam APRIL.11 SQUARE Version Solutions
STAT-UB.0103 Exam 01.APIL.11 SQUAE Version Solutions S1. Jason Harter is a professional fund raiser for charities. He s currently working with the Pets--Luv animal shelter. The operating account for Pets--Luv
More informationappstats8.notebook October 11, 2016
Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus
More informationStat 231 Final Exam. Consider first only the measurements made on housing number 1.
December 16, 1997 Stat 231 Final Exam Professor Vardeman 1. The first page of printout attached to this exam summarizes some data (collected by a student group) on the diameters of holes bored in certain
More informationPath Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis
Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate
More information