Lab 7 Multiple Regression and F Tests Of a Subset Of Predictors


Preliminary Information:

[1] Last week someone wanted to change the y-axis labeling on a plot() of TukeyHSD output. The labels printed vertically, and some of the pair listings were missing. We can include the argument las=1 in the plot command (for the confidence intervals of the group pairings of the Tukey procedure) to change the orientation of the y-axis labeling from vertical to horizontal.

[2] Last week someone needed Markdown to automatically resize the cluster of 4 graphs produced by the simple.lm() plot command of the UsingR package. Markdown does not reproduce the 4-plot cluster automatically, and I have been unsuccessful at finding a way to make it do so. For now, I suggest producing the confidence-interval and prediction-interval plots the long way, as shown in the lab, which Markdown reproduces successfully. Otherwise, use the 4-plot production of the simple.lm() routine and copy/paste an expanded view of the graphs from RStudio into the Word document (after Markdown completes the Word document).

Introduction: Up until now we have been looking at linear models of 2 quantitative variables (or of transformed quantitative variables, chosen so that a linear model fits better). Given a reasonably strong relationship between our response (y) and explanatory variable (x), we could predict y for a given value of x, within our range of x, to some stated accuracy. The generic form of this linear model was presented in 2 ways:

ŷ = β0 + β1x, or the more precise y = β0 + β1x + ϵ

Remember that we first made a scatter plot, with orthogonal x and y axes (usually in the first quadrant), and from the fit we produced a residual plot, which gave visual evidence (or not) that a linear fit was appropriate and that the error was relatively symmetric across the range of x values.
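As a quick refresher, the simple-regression workflow above can be sketched in a few lines of R. This is a minimal sketch with simulated data; the numbers and variable names are invented for illustration, not taken from the lab's data files.

```r
# A minimal simple-regression sketch with simulated data (invented values).
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(length(x), sd = 1)   # true intercept 3, slope 2

model <- lm(y ~ x)
summary(model)          # estimates, standard errors, R-squared

plot(x, y, main = "Scatter Plot with Fitted Line")
abline(model)           # superimpose the least-squares line

plot(fitted(model), resid(model), main = "Residual Plot",
     xlab = "predicted response", ylab = "residuals")
abline(h = 0, lty = 2)
```

The residual plot should show roughly symmetric scatter about the dashed zero line when a linear fit is appropriate.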
We also generated a set of summary statistics on the model, using the summary() command, along with various graphics (qqnorm() plots, histograms, box plots, etc.) in our linear regression investigations. The next logical step is to ask whether any of a number of explanatory variables (x1, ..., xn) influence the response variable (y). The possible formulas for this linear modeling are:

ŷ = β0 + β1x1 + ... + βnxn and y = β0 + β1x1 + ... + βnxn + ϵ,

where all of the various x's are candidates for being influential explanatory variables. We can branch from that to the next level of questions, asking whether any powers of these x's (other than the first power) would influence the model, or whether any interaction terms (i.e., xi * xj) would influence our model. In theory, we could also construct an n-space scatter plot, with more than 2 orthogonal axes (again, usually in the first quadrant), where a straight line would model the points through the n-space. See below for a picture of what this linear model would look like in 3-space.

Example: Let us use our patients.txt data file, containing height (inches), weight (lbs), and catheter length (cath, in mm) of various young people, to see if catheter length (y) can be predicted by height and weight (x1, x2) of the patients. Look at the code used below.

# Lab 7 Multiple Regression Introduction
# ======================================
patients <- read.delim("c:/users/michael/desktop/lab 7/patients.txt")
data1 <- patients
data1
height <- data1[,1] ; weight <- data1[,2] ; cath <- data1[,3]
model1 <- lm(cath ~ height + weight)
model1
summary(model1)
plot(fitted(model1), resid(model1), main="Residual Plot",
     xlab="predicted response", ylab="residuals")
abline(h=0, lty=2)
var1 <- predict(model1)
var2 <- fitted(model1)

The residual plot and output are shown below. It doesn't look very promising that a linear fit is appropriate for our resulting model, since our error doesn't look very uniform across the predicted values. Our model1, as described by the lm() command above, is

ĉath = β0 + β1·height + β2·weight,

with the numeric coefficient estimates shown in the summary() output. Also notice that our var1 vector equals our var2 vector, showing that predict(model1) = fitted(model1).
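Since patients.txt is not included here, the predict()-equals-fitted() point can be checked with a hedged stand-in: simulated height, weight, and catheter data with invented ranges and coefficients.

```r
# Hedged stand-in for patients.txt (simulated; ranges and coefficients
# are invented for illustration).
set.seed(2)
height <- runif(12, 35, 65)     # inches
weight <- runif(12, 30, 130)    # lbs
cath   <- 20 + 0.2 * height + 0.1 * weight + rnorm(12, sd = 2)   # mm

model1 <- lm(cath ~ height + weight)
summary(model1)

# For a plain lm() fit with no newdata, predict() returns exactly fitted():
var1 <- predict(model1)
var2 <- fitted(model1)
isTRUE(all.equal(as.numeric(var1), as.numeric(var2)))   # TRUE
```

predict() only differs from fitted() when you supply a newdata argument or request confidence/prediction intervals.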

The linear model of best fit, according to our lm() output, is

ĉath = β0 + β1·height + β2·weight,

with the estimated coefficients shown in the output above.

Homework [1]: Some archaeologists theorize that ancient Egyptians interbred with several different immigrant populations over thousands of years. To see if there is any indication of changes in body structure that might have resulted, they took several measurements (MB, BH, BL, and NH) of 30 skulls of male Egyptians dated from several eras (4000 BCE, 3300 BCE, 1850 BCE, 200 BCE, and 150 AD) (A. Thomson and R. Randall-MacIver, Ancient Races of the Thebaid, Oxford University Press, 1905). Generate a multiple regression model of the results of this study in pyramids.csv, and produce a summary of the model parameters, a residual plot, and a short description (using proper sentence structure) of your conclusions on the effect of BH, BL, and NH (the explanatory variables) on MB (the response).

Model Predictors, Important and Unimportant Ones: We want to review with you some procedures for manipulating general regression models; specifically, how to check for statistical significance when you remove some possible predictors from your complete linear model, and how to look for predictor interactions in models. As I understand it, these are special cases of other general regression procedures, where both quantitative and categorical variables are involved.

F Test of a Subset of Predictors: The picture below gives the formulas used to determine how important predictors are to a model.

We are testing the significance of the hypothesis (under Ho) that one or more of the β's are 0 and therefore of no importance to the prediction power of our model. In our F test of this hypothesis we use the model SS of the complete model, the model SS of the reduced model, and the residual SS of the complete model. Our degrees of freedom involve n (the number of data points), k (the total number of predictors in the complete model), and g (the number of predictors remaining, i.e., not hypothesized to be 0). If, for example, our computed F is much greater than F.01(df1, df2), which is the 99th percentile of the F distribution, then we have statistical significance at the α = 0.01 level and we reject Ho. In essence, if we have a statistically significant result, we know that at least one of the proposed predictors we threw out as not impacting our model is, indeed, needed for the model.

Example: Let us use the example of bass catch data, which is located in basscatch.csv. A state fisheries commission wants to estimate the number of bass caught in a given lake in a season in order to restock the lake with an appropriate number of young fish. The commission could get a fairly accurate assessment of the seasonal catch by extensive netting sweeps of the lake before and after a season, but this technique is much too expensive to be done routinely. Therefore, the commission samples a number of lakes (the observational units) and records the seasonal catch (thousands of bass per sq. mi. of lake area), the number of lake-area residences (per sq. mi. of lake area), the size of the lake (in sq. mi.), whether the lake has public access (0 if not, 1 if so), and a structure index (structure means weed beds, sunken trees, drop-offs, and other living places for bass). Part of the data set is shown below. I first read in the data file with the commands shown below.
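The subset F statistic described above can be written out and checked against R's built-in model comparison, which performs the same test. This sketch uses simulated data (all names and coefficients invented); anova(reduced, full) reports the identical statistic.

```r
# Partial (subset) F test, computed by hand and via anova(); simulated data.
set.seed(3)
n  <- 20
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y  <- 1 + 2 * x1 + 3 * x3 + rnorm(n)     # x3 matters, x4 does not

full    <- lm(y ~ x1 + x2 + x3 + x4)     # k = 4 predictors
reduced <- lm(y ~ x1 + x2)               # g = 2 predictors kept
k <- 4; g <- 2

ss_full    <- sum((fitted(full)    - mean(y))^2)   # model SS, complete
ss_reduced <- sum((fitted(reduced) - mean(y))^2)   # model SS, reduced
sse_full   <- sum(resid(full)^2)                   # residual SS, complete

Fvalue <- ((ss_full - ss_reduced) / (k - g)) / (sse_full / (n - (k + 1)))

anova(reduced, full)$F[2]   # same statistic, computed by R directly
```

The numerator df is k − g and the denominator df is n − (k + 1), matching the quantile call qf(.99, k − g, n − (k + 1)) used later in the lab.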

# Lab 7 Sec 12.5 and 12.7
# ========================
basscatch <- read.csv("c:/users/michael/desktop/lab 7/basscatch.csv")
data1 <- basscatch
data1
catch <- data1[,1] ; residence <- data1[,2]
size <- data1[,3] ; access <- data1[,4]
structure <- data1[,5]

We next might want to look at some scatter plots of 2 variables at a time, to see if there are any obvious candidates we might want to drop from our model and test for significance.

pairs(data1)

This command produces the lattice-type plot below. Our complete model is then:

catch = β0 + β1·residence + β2·size + β3·access + β4·structure + ϵ

Now, after a bit of investigating, let us hypothesize that we can drop access and structure from our model, leaving the reduced model:

catch = β0 + β1·residence + β2·size + ϵ

Now, using the letters from our formula, we have n = 20, k = 4, and g = 2. The code below computes the needed F quantities from the output of our lm() and anova() commands on the 2 models. Note in the code that I call the full model modelfull and the reduced model modelpart.

modelfull <- lm(catch ~ residence + size + access + structure)
modelpart <- lm(catch ~ residence + size)
summary(modelfull)
anova(modelfull)
summary(modelpart)
anova(modelpart)

Results of the full model are shown below. The results from the reduced model follow.

From this output we can get the numbers we need to perform the partial F test (the sums of squares are read off the anova() output above):

sscomplete <- ...        # model SS of the complete model, from anova(modelfull)
ssreduced <- ...         # model SS of the reduced model, from anova(modelpart)
ssresid.complete <- ...  # residual SS of the complete model
k <- 4 ; g <- 2 ; n <- 20
Fvalue <- ((sscomplete - ssreduced)/(k-g))/(ssresid.complete/(n-(k+1)))
qf(.99, 2, 15)
Fvalue

Results are shown below. With our computed F value far above the allowable cutoff, we see that at least one of the variables we removed is influential in the model, has high predictive value, and should not have been removed. Below is the residual plot of the full model, along with the code.

plot(predict(modelfull), resid(modelfull), main="Full Model Residual Plot")
abline(h=0, lty=2)
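The comparison against the qf() cutoff can equivalently be stated as a p-value. A small sketch, using a hypothetical F value (invented for illustration) with the same df as above:

```r
# Critical-value decision vs. p-value decision (hypothetical F value).
Fvalue <- 10.2
df1 <- 2 ; df2 <- 15
Fcrit <- qf(.99, df1, df2)                        # 99th percentile of F(2,15)
pval  <- pf(Fvalue, df1, df2, lower.tail = FALSE) # upper-tail area beyond Fvalue
Fvalue > Fcrit   # TRUE: reject Ho at alpha = 0.01
pval < 0.01      # the same decision, stated as a p-value
```

The two decisions always agree, since pf() and qf() are inverses along the same F distribution.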

The plot gives some evidence that a linear model might be appropriate here.

Homework [2]: Using the basscatch.csv data set, pick 1 or 2 variables whose coefficients you hypothesize might be 0, and use this subset F test procedure to gather evidence on your hypothesis.

Interactions: We want to apply our hypothesis approach shown above to a specific application where we want to compare slopes of different regression lines. Statistically, if slopes are different enough, we probably have an influential predictor in an interaction term of our overall regression model. Below are pictures of various interaction possibilities among factors of a regression model, where we have blood pressure in adults affected by 3 levels of dosage (10 mg, 20 mg, 30 mg; factor A) and 2 administration times for the dosage (once/day, twice/day; factor B). The table shows various possibilities of interaction and significance of factors A and B, with the ANOVA result shown in the right column (2-way ANOVA results in Minitab output format) and a statement of the factor effect in the left column. Note that these results were produced by simulation, rather than by an actual study. In the graphs of the table's middle column, the lines represent the levels of the times-per-day factor (B), and the x-axis represents the levels of the dosage factor (A). The factor-effect descriptions are:

- A and B are both significant in the model, with no interaction present.
- Blood pressure changes across dosage levels whether you take the drug once or twice daily, but the lines are so close together that taking the drug once or twice daily makes no difference. So factor A (dosage) is significant, and factor B (times/day) is not.
- Factor B is significant but A is not. The lines are flat across dosage levels, indicating dosage has no effect on blood pressure; however, the 2 lines are spread apart, indicating that times/day does have a significant effect on blood pressure.
- The lines are flat and close together, so there is no interaction and neither factor A nor factor B is significant.
- Factors A and B interact, because the lines cross. Taking the drug twice per day, a low dose gives low blood pressure, and as the dose increases, so does blood pressure. The opposite relationship between dosage level and blood pressure occurs when taking the drug once per day.

Example: We will use the data set ratanxiety.csv. We have 2 different drug products (A and B) administered to 2 groups of rats in this experiment, and within each group different doses of the drug (5 mg, 10 mg, 20 mg) are administered. The anxiety level of each rat is then measured, according to some rat anxiety scale. A partial picture of the data set is shown below. The x2 variable is a categorical predictor which takes on the value 1 if drug B is used and 0 if drug A is used.
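As an aside, line plots like those in the table's middle column can be drawn with base R's interaction.plot(). This is a hedged sketch with simulated blood-pressure data and invented effect sizes (constructed with no interaction, so the lines come out roughly parallel).

```r
# Simulated dose (A) x times-per-day (B) design with no true interaction.
set.seed(4)
dose  <- factor(rep(c("10mg", "20mg", "30mg"), each = 20))    # factor A
times <- factor(rep(c("once", "twice"), times = 30))          # factor B
bp <- 120 + 5 * as.numeric(dose) - 8 * (times == "twice") + rnorm(60, sd = 3)

interaction.plot(dose, times, bp, xlab = "dosage",
                 ylab = "mean blood pressure", trace.label = "times/day")

# A 2-way ANOVA with an interaction term checks what the plot suggests:
summary(aov(bp ~ dose * times))
```

Crossing or strongly non-parallel lines in such a plot are the visual signature of an interaction, which is what the F test below examines formally.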

We want to perform our F test to see if the interaction term can be deleted from the full model without losing significant prediction capability. Specifically, our full model is:

anxiety = β0 + β1·dose + β2·x2 + β3·(dose·x2) + ϵ

Our reduced model, with the interaction term deleted, is:

anxiety = β0 + β1·dose + β2·x2 + ϵ

Our hypothesis test, then, is Ho: β3 = 0 vs. Ha: β3 ≠ 0. We will compute the F test according to the formula used in our previous example. Before we do this test, let us first read in the data and plot anxiety vs. dose for each of the drugs, superimposing the regression line of each drug on the plot. The code used is shown below.

# 2nd problem presentation
# ========================
ratanxiety <- read.csv("c:/users/michael/desktop/lab 7/ratanxiety.csv")
data3 <- ratanxiety
product <- data3[,1] ; dose <- data3[,2]
anxiety <- data3[,3] ; x2 <- data3[,4]
modela <- lm(anxiety[1:30] ~ dose[1:30])
anova(modela)
modelb <- lm(anxiety[31:60] ~ dose[31:60])
anova(modelb)
total.model <- lm(anxiety ~ dose + x2 + I(dose*x2))
summary(total.model)
anova(total.model)
model.reduced <- lm(anxiety ~ dose + x2)
summary(model.reduced)
anova(model.reduced)
plot(dose, anxiety, pch=as.character(product), main="Slopes of Product A and B")
abline(modela)
abline(modelb, lty=2)
text(12, 25, labels="solid line is A, dashed is B")

Graphical output is below.

Model output is below.

I read in the data first, then label each column as product, dose, anxiety, and x2, respectively. I next construct 4 models, which I will need later, using the lm() command: modela is the model of the A-drug effects only, modelb is of the B-drug effects only, total.model is the total model, and model.reduced is the model with the β3 term removed. I also produce the anova() information for these models, since I will need it later. Finally, I produce the plot of anxiety on dose, distinguishing the A and B points and superimposing the lines of best fit using the abline() command. This plot gives us some evidence that our F test will end up significant at the .05 level. I will leave it up to you to find and add up the various sum-of-squares values to verify the computed F value of about 22. We compute our allowable F value using the command qf(.95, 1, 56), which gives 4.01. So, since our computed F of 22 is so much greater than the allowable F of about 4, at the α = 0.05 level we conclude that β3 is not 0, and the interaction term has powerful prediction value in our model (which we suspected already from our previous scatter plot).
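Because only a single coefficient (β3) is being tested here, the subset F test is equivalent to squaring the t value that summary() reports for the interaction term. A sketch with a simulated stand-in for ratanxiety.csv (coefficients invented):

```r
# Simulated rat-anxiety design: F for the single-term subset test = t^2.
set.seed(5)
dose <- rep(c(5, 10, 20), times = 20)
x2   <- rep(c(0, 1), each = 30)     # 0 = drug A, 1 = drug B
anxiety <- 10 + 0.4 * dose + 2 * x2 + 0.3 * dose * x2 + rnorm(60)

total.model   <- lm(anxiety ~ dose + x2 + I(dose * x2))
model.reduced <- lm(anxiety ~ dose + x2)

Fstat <- anova(model.reduced, total.model)$F[2]
tstat <- summary(total.model)$coefficients["I(dose * x2)", "t value"]
c(Fstat, tstat^2)    # the two agree up to rounding
qf(.95, 1, 56)       # the alpha = 0.05 cutoff, about 4.01
```

This is why, with df1 = 1, the F test and the coefficient t test in the summary() output always reach the same conclusion.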

Homework [3]: The data set crops.csv contains mercury-poisoning results from an agricultural experiment involving 3 kinds of crops (corn, wheat, and barley) planted in mercury-tainted soil (called sludge). There are 6 levels of soil contamination, and the mercury content contained in the plants (the response variable) is measured. Compare corn with wheat to see if β3 (the interaction term) is significant. You may want to construct a scatter plot with the 2 abline() fits superimposed, to anticipate how the hypothesis test will come out. Use the α = 0.05 level. You may want to add an x2 column of 0's and 1's, as I did in my demonstration. Remember that you will only be using rows 1 through 60 (i.e., [1:60]) for this test.

Homework [4]: Repeat Homework [3], comparing corn and barley, for β3 significance at the 0.05 level. You may want to construct a new data set for this, with a new x2 column of 0's and 1's, where you have deleted the wheat rows.

Homework [5]: Repeat Homework [3], comparing wheat and barley. Again, you may want to construct a new data set with the corn rows deleted and a new set of x2 values.

You will have 2 weeks to complete this lab assignment.


More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variable In this lecture: We shall look at two quantitative variables.

More information

Chapter 8. Linear Regression /71

Chapter 8. Linear Regression /71 Chapter 8 Linear Regression 1 /71 Homework p192 1, 2, 3, 5, 7, 13, 15, 21, 27, 28, 29, 32, 35, 37 2 /71 3 /71 Objectives Determine Least Squares Regression Line (LSRL) describing the association of two

More information

Data files & analysis PrsnLee.out Ch9.xls

Data files & analysis PrsnLee.out Ch9.xls Model Based Statistics in Biology. Part III. The General Linear Model. Chapter 9.2 Regression. Explanatory Variable Fixed into Classes ReCap. Part I (Chapters 1,2,3,4) ReCap Part II (Ch 5, 6, 7) ReCap

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section: Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 You have until 10:20am to complete this exam. Please remember to put your name,

More information

MODULE 11 BIVARIATE EDA - QUANTITATIVE

MODULE 11 BIVARIATE EDA - QUANTITATIVE MODULE 11 BIVARIATE EDA - QUANTITATIVE Contents 11.1 Response and Explanatory................................... 78 11.2 Summaries............................................ 78 11.3 Items to Describe........................................

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Mean Comparisons PLANNED F TESTS

Mean Comparisons PLANNED F TESTS Mean Comparisons F-tests provide information on significance of treatment effects, but no information on what the treatment effects are. Comparisons of treatment means provide information on what the treatment

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

Topic 4: Orthogonal Contrasts

Topic 4: Orthogonal Contrasts Topic 4: Orthogonal Contrasts ANOVA is a useful and powerful tool to compare several treatment means. In comparing t treatments, the null hypothesis tested is that the t true means are all equal (H 0 :

More information

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

y n 1 ( x i x )( y y i n 1 i y 2

y n 1 ( x i x )( y y i n 1 i y 2 STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing

More information

Factorial designs. Experiments

Factorial designs. Experiments Chapter 5: Factorial designs Petter Mostad mostad@chalmers.se Experiments Actively making changes and observing the result, to find causal relationships. Many types of experimental plans Measuring response

More information

Correlation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?

Correlation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up? Comment: notes are adapted from BIOL 214/312. I. Correlation. Correlation A) Correlation is used when we want to examine the relationship of two continuous variables. We are not interested in prediction.

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Foundations of Correlation and Regression

Foundations of Correlation and Regression BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations

More information

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv). Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,

More information

Assignment #7. Chapter 12: 18, 24 Chapter 13: 28. Due next Friday Nov. 20 th by 2pm in your TA s homework box

Assignment #7. Chapter 12: 18, 24 Chapter 13: 28. Due next Friday Nov. 20 th by 2pm in your TA s homework box Assignment #7 Chapter 12: 18, 24 Chapter 13: 28 Due next Friday Nov. 20 th by 2pm in your TA s homework box Lab Report Posted on web-site Dates Rough draft due to TAs homework box on Monday Nov. 16 th

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Chapter 10 Regression Analysis

Chapter 10 Regression Analysis Chapter 10 Regression Analysis Goal: To become familiar with how to use Excel 2007/2010 for Correlation and Regression. Instructions: You will be using CORREL, FORECAST and Regression. CORREL and FORECAST

More information

Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals

Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,

More information

INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS

INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS GEORGE W. COBB Mount Holyoke College INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS Springer CONTENTS To the Instructor Sample Exam Questions To the Student Acknowledgments xv xxi xxvii xxix 1. INTRODUCTION

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Multiple Regression Examples

Multiple Regression Examples Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +

More information

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.

More information

ANOVA (Analysis of Variance) output RLS 11/20/2016

ANOVA (Analysis of Variance) output RLS 11/20/2016 ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.

More information

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may

More information

Clinton Community School District K-8 Mathematics Scope and Sequence

Clinton Community School District K-8 Mathematics Scope and Sequence 6_RP_1 6_RP_2 6_RP_3 Domain: Ratios and Proportional Relationships Grade 6 Understand the concept of a ratio and use ratio language to describe a ratio relationship between two quantities. Understand the

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Statistical Concepts. Constructing a Trend Plot

Statistical Concepts. Constructing a Trend Plot Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Presents. The Common Core State Standards Checklist Grades 6-8

Presents. The Common Core State Standards Checklist Grades 6-8 Presents The Common Core State Standards Checklist Grades 6-8 Sixth Grade Common Core State Standards Sixth Grade: Ratios and Proportional Relationships Understand ratio concepts and use ratio reasoning

More information

1) A residual plot: A)

1) A residual plot: A) 1) A residual plot: A) B) C) D) E) displays residuals of the response variable versus the independent variable. displays residuals of the independent variable versus the response variable. displays residuals

More information

Q1: What is the interpretation of the number 4.1? A: There were 4.1 million visits to ER by people 85 and older, Q2: What percent of people 65-74

Q1: What is the interpretation of the number 4.1? A: There were 4.1 million visits to ER by people 85 and older, Q2: What percent of people 65-74 Lecture 4 This week lab:exam 1! Review lectures, practice labs 1 to 4 and homework 1 to 5!!!!! Need help? See me during my office hrs, or goto open lab or GS 211. Bring your picture ID and simple calculator.(note

More information

Bivariate Data Summary

Bivariate Data Summary Bivariate Data Summary Bivariate data data that examines the relationship between two variables What individuals to the data describe? What are the variables and how are they measured Are the variables

More information

A discussion on multiple regression models

A discussion on multiple regression models A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

More information

The Multiple Regression Model

The Multiple Regression Model Multiple Regression The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & or more independent variables (X i ) Multiple Regression Model with k Independent Variables:

More information