Lab 7: Multiple Regression and F Tests of a Subset of Predictors
- Oscar Montgomery
- 5 years ago
Preliminary Information:

[1] Last week someone wanted to change the y-axis labeling on the plot() of a TukeyHSD result. The labels printed vertically, and some of the pair listings were missing. We can include the argument las=1 in the plot command (for the confidence intervals of the group pairings of the Tukey procedure) to change the orientation of the y-axis labeling from vertical to horizontal.

[2] Last week someone needed Markdown to automatically resize the 4-graph plot which results from the simple.lm() plot command of the UsingR package. Markdown doesn't automatically reproduce the 4-plot cluster. I have been unsuccessful at finding out how to get Markdown to do that, so for now I suggest doing the plots of confidence intervals and prediction intervals the long way, as shown in the lab, which Markdown can reproduce successfully. Otherwise, use the 4-plot output of the simple.lm() routine and then copy/paste an expanded view of the graphs from RStudio into the Word document (after Markdown completes the Word document).

Introduction:

Up until now we have been looking at linear models of 2 quantitative variables (or transformed quantitative variables, chosen so that a linear model fits better), so that, given some sort of strong relationship between our response (y) and explanatory variable (x), we could predict y for a given value of x, within our range of x, to some degree of accuracy. The generic model for this linear fit was presented in 2 forms:

ŷ = β0 + β1·x, or the more precise y = β0 + β1·x + ε

Remember that we first made a scatter plot, with orthogonal axes x and y (usually in the first quadrant), and from that we produced a residual plot which would give visual evidence (or not) that a linear fit is appropriate, and that the error is relatively symmetrical across the range of x values.
We also generated a set of summary statistics on the model, using the summary() command, along with various graphics (qqnorm() plots, histograms, box plots, etc.) in our linear regression investigations. The next logical step is to ask whether any of a number of explanatory variables (x1, ..., xn) influence the response variable (y). The possible formulas for this linear modeling are:

ŷ = β0 + β1·x1 + ... + βn·xn and y = β0 + β1·x1 + ... + βn·xn + ε,

where all of the various x's are candidates for being influential explanatory variables. From there we can branch to the next level of questions: do any powers of these x's (other than the first power) influence the model, and do any interaction terms (i.e., xi·xj) influence the model? In theory, we could also construct an n-space scatter plot, with more
than 2 orthogonal axes (again, usually in the first quadrant), where a straight line would model the points through the n-space. See below for a picture of what this linear model would look like in 3-space.

Example: Let us use our patients.txt data file, containing height (inches), weight (lbs), and catheter length (cath, in mm) of various young people, to see if catheter size (y) can be predicted by the height and weight (x1, x2) of the patients. Look at the code used below.

# Lab 7 Multiple Regression Introduction
# ======================================
patients <- read.delim("c:/users/michael/desktop/lab 7/patients.txt")
data1 <- patients
data1
height <- data1[,1] ; weight <- data1[,2] ; cath <- data1[,3]
model1 <- lm(cath ~ height + weight)
model1
summary(model1)
plot(fitted(model1), resid(model1), main="residual plot",
     xlab="predicted response", ylab="residuals")
abline(h=0, lty=2)
var1 <- predict(model1)
var2 <- fitted(model1)

The residual plot and output are shown below. It doesn't look very promising that a linear fit is appropriate for our resulting model, since our error doesn't look very uniform across the predicted values. Our model1, as described by the lm() command above, is

ĉath = β0 + β1·height + β2·weight

(the fitted coefficients appear in the summary() output). Also notice that our var1 vector equals our var2 vector, showing that predict(model1) = fitted(model1).
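The two points above, the lm() formula syntax for several predictors and the predict() = fitted() equivalence, can be checked directly. The sketch below uses simulated stand-in data (patients.txt itself is not reproduced here), so the numbers are illustrative only:

```r
# Sketch on simulated stand-in data for patients.txt: a two-predictor
# fit, plus the predict()/fitted() equivalence noted in the text.
set.seed(2)
height <- rnorm(12, mean = 40, sd = 5)
weight <- rnorm(12, mean = 60, sd = 10)
cath   <- 10 + 0.5*height + 0.2*weight + rnorm(12)

model1 <- lm(cath ~ height + weight)
coef(model1)   # b0, b1 (height), b2 (weight)

# With no newdata argument, predict() returns exactly the fitted values:
same_vec <- all.equal(unname(predict(model1)), unname(fitted(model1)))
same_vec  # TRUE

# newdata= is what gives a genuine prediction at a fresh (height, weight):
predict(model1, newdata = data.frame(height = 42, weight = 55))

# I() would let us add powers or interaction terms later, e.g.:
model2 <- lm(cath ~ height + weight + I(height*weight))
```

The newdata= argument is the practical difference between predict() and fitted(): fitted() only ever returns values at the observed x's.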
The linear model of best fit, according to our lm() output, is

ĉath = β0 + β1·height + β2·weight,

with the estimated coefficients read from the output above.

Homework [1]: Some archaeologists theorize that ancient Egyptians interbred with several different immigrant populations over thousands of years. To see if there is any indication of changes in body structure that might have resulted, they took several measurements (MB, BH, BL, and NH) of 30 skulls of male Egyptians dated from several eras (4000 BCE, 3300 BCE, 1850 BCE, 200 BCE, and 150 AD) (A. Thomson and R. Randall-Maciver, Ancient Races of the Thebaid, Oxford University Press, 1905). Generate a multiple regression model of the results of this study in pyramids.csv, and produce a summary of the model parameters, a residual plot, and a short description (using proper sentence structure) of your conclusions on the effect of BH, BL, and NH (the explanatory variables) on MB (the response).

Model Predictors: Important and Unimportant Ones: We want to review some procedures for manipulating general regression models, specifically how to check for statistical significance when you remove some possible predictors from your complete linear model, and how to look for predictor interactions in models. As I understand it, these are special cases of other general regression procedures, where both quantitative and categorical variables are involved.

F-Test of a Subset of Predictors: The picture below gives the formulas used to determine how important predictors are to a model.
We are testing the significance of assuming (under Ho) that one or more of the β's are 0 and therefore of no importance to the predictive power of our model. In our F test of this hypothesis we use the SS of the complete model, the SS of the reduced model, and the residual SS of the complete model. Our degrees of freedom involve n (the number of data points), k (the total number of predictors in the model), and g (the remaining number of predictors not hypothesized to be 0). Written out, the test statistic is

F = [(SScomplete - SSreduced)/(k - g)] / [SSresid.complete/(n - (k + 1))],

with k - g and n - (k + 1) degrees of freedom. If, for example, our computed F is much greater than F.01(df1, df2), which is the 99th quantile of the F distribution, then we have statistical significance at the α = 0.01 level and we reject Ho. In essence, if we have statistically significant results, we know that at least one of the proposed predictors we threw out as not impacting our model is, indeed, needed for the model.

Example: Let us use the example of bass catch data, which is located in basscatch.csv. A state fisheries commission wants to estimate the number of bass caught in a given lake in a season in order to restock the lake with an appropriate number of young fish. The commission could get a fairly accurate assessment of the seasonal catch by extensive netting sweeps of the lake before and after a season, but this technique is much too expensive to be done routinely. Therefore, the commission samples a number of lakes (the observational units) and records the seasonal catch (thousands of bass per sq.mi. of lake area), the number of lakeside residences (per sq.mi. of lake area), the size of the lake (in sq.mi.), whether the lake has public access (0 if not, 1 if so), and a structure index (structure means weed beds, sunken trees, drop-offs, and other living places for bass). Part of the data set is shown below. I first read in the data file with the commands shown below.
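The critical value F.01(df1, df2) mentioned above comes straight from qf(), and pf() goes the other way. A quick sketch, using the degrees of freedom from the bass catch example that follows (n = 20, k = 4, g = 2):

```r
# The cutoff F_.01(df1, df2) is the 0.99 quantile of the F distribution,
# and pf() converts an observed F statistic into a p-value.
n <- 20; k <- 4; g <- 2
df1 <- k - g        # numerator df: number of predictors being tested
df2 <- n - (k + 1)  # denominator df: residual df of the complete model
crit <- qf(0.99, df1, df2)   # critical value at alpha = 0.01 (about 6.36)
crit
1 - pf(10, df1, df2)         # p-value if the observed F were 10
```

Rejecting Ho when the computed F exceeds crit is the same decision as rejecting when the p-value from pf() falls below 0.01.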
# Lab 7 Sec 12.5 and 12.7
# ========================
basscatch <- read.csv("c:/users/michael/desktop/lab 7/basscatch.csv")
data1 <- basscatch
data1
catch <- data1[,1] ; residence <- data1[,2]
size <- data1[,3] ; access <- data1[,4]
structure <- data1[,5]

We next might want to look at scatter plots of 2 variables at a time, to see if there are any obvious candidates we might want to throw out of our model and test for significance.

pairs(data1)

This command produces the lattice-type plot below. Our complete model is then:

catch = β0 + β1·residence + β2·size + β3·access + β4·structure + ε

Now, after a bit of investigating, let us hypothesize that we can throw out access and structure from our model, leaving the reduced model:

catch = β0 + β1·residence + β2·size + ε
Now, using the letters from our formula, we have n = 20, k = 4, and g = 2. The code below computes the needed F quantile from the output of our lm() and anova() commands for the 2 models. Note in the code that I call the full model modelfull and the reduced model modelpart.

modelfull <- lm(catch ~ residence + size + access + structure)
modelpart <- lm(catch ~ residence + size)
summary(modelfull)
anova(modelfull)
summary(modelpart)
anova(modelpart)

Results of the full model are shown below. The results from the reduced model are shown below that.
From this output we can get the numbers we need to perform the partial F test.

sscomplete <- ...        # model SS of the complete model, from anova(modelfull)
ssreduced <- ...         # model SS of the reduced model, from anova(modelpart)
ssresid.complete <- ...  # residual SS of the complete model, from anova(modelfull)
k <- 4 ; g <- 2 ; n <- 20
Fvalue <- ((sscomplete - ssreduced)/(k-g))/(ssresid.complete/(n-(k+1)))
qf(.99, 2, 15)
Fvalue

The three SS values are read off the anova() outputs above; note that the denominator degrees of freedom are n - (k + 1) = 15, matching qf(.99, 2, 15). Results are shown below. With an F value so far above the critical value, we see that at least one of the variables we removed is influential in the model, has high predictive value, and should not have been removed. Below is the residual plot of the full model, along with the code.

plot(predict(modelfull), resid(modelfull), main="Full Model Residual Plot")
abline(h=0, lty=2)
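For what it's worth, R can run this whole partial F test in one call: anova(reduced, full). The sketch below uses simulated data (basscatch.csv itself is not reproduced here) and checks that the one-call F agrees with the hand formula. Note that the numerator SScomplete - SSreduced (difference of model SS) is the same number as RSSreduced - RSScomplete (difference of residual SS), because the total SS is common to both fits:

```r
# Sketch on simulated data: anova(reduced, full) performs the partial
# F test directly, and its F statistic matches the hand computation.
set.seed(4)
n <- 20
residence <- runif(n, 50, 500)
size      <- runif(n, 1, 10)
access    <- rbinom(n, 1, 0.5)
structure <- runif(n, 0, 100)
catch <- 2 + 0.01*residence + 0.5*size + 1.5*access + 0.03*structure + rnorm(n)

modelfull <- lm(catch ~ residence + size + access + structure)
modelpart <- lm(catch ~ residence + size)

comparison <- anova(modelpart, modelfull)  # partial F test in one call
comparison

# Hand computation: SScomplete - SSreduced (model SS) equals
# RSSreduced - RSScomplete, since total SS is the same either way.
k <- 4; g <- 2
ssresid.reduced  <- sum(resid(modelpart)^2)
ssresid.complete <- sum(resid(modelfull)^2)
Fvalue <- ((ssresid.reduced - ssresid.complete)/(k - g)) /
          (ssresid.complete/(n - (k + 1)))
Fvalue
```

The F column of the anova() comparison table and Fvalue are the same number, and the table also reports the p-value directly, so no separate qf() lookup is needed.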
The plot gives some evidence that a linear model might be appropriate here.

Homework [2]: Using the basscatch.csv data set, pick 1 or 2 variables whose coefficients you hypothesize might be 0 and use this subset F test procedure to gather evidence on your hypothesis.

Interactions: We want to apply the hypothesis approach shown above to a specific application where we compare slopes of different regression lines. Statistically, if slopes are different enough, we probably have an influential predictor in an interaction term in our overall regression model. Below are pictures of various interaction possibilities among factors of a regression model, where blood pressure in adults is affected by 3 levels of dosage (10mg, 20mg, 30mg; factor A) and 2 administration times for the dosage (once/day, twice/day; factor B). The table shows various possibilities of interaction and significance of factors A and B, with the ANOVA result shown in the right column (in Minitab output format) and a statement of the factor effect in the left column. Note that these results came from a simulation rather than from an actual study. In the graphs of the middle column, the lines represent the levels of the times-per-day factor (B), and the x-axis represents the levels of the dosage factor (A). The factor-effect descriptions, row by row:

- A and B are both significant in the model, with no interaction present.
- Blood pressure changes across dosage levels whether the drug is taken once or twice daily, but the lines are so close together that whether you take the drug once or twice daily makes no difference. So factor A (dosage) is significant, and factor B (times/day) is not significant.
- Factor B is significant but A is not. The lines are flat across dosage levels, indicating dosage has no effect on blood pressure. However, the 2 lines are spread apart, indicating that times/day does have a significant effect on blood pressure.
- The lines are flat and close together, so there is no interaction and both factors A and B are not significant.
- Factors A and B interact because the lines cross. Taking the drug twice per day at a low dose yields low blood pressure, and as the dose increases, so does blood pressure; the opposite pattern occurs for once per day.

Example: We will use the data set ratanxiety.csv. We have 2 different drug products (A and B), administered to 2 groups of rats in this experiment, and within each group different doses of the drug (5mg, 10mg, 20mg) are administered. The anxiety level of each rat is then measured, according to some rat anxiety scale. A partial picture of the data set is shown below. The x2 column is a categorical predictor which takes on the value 1 if drug B is used and 0 if drug A is used.
We want to perform our F test to see if the interaction term can be deleted from the full model without losing significant prediction capability. Specifically, our full model is:

anxiety = β0 + β1·dose + β2·x2 + β3·(dose·x2) + ε

Our reduced model, with the interaction term deleted, is:

anxiety = β0 + β1·dose + β2·x2 + ε

Our hypothesis test, then, is Ho: β3 = 0 vs Ha: β3 ≠ 0. We will compute the F test according to the formula used in our previous example, where now n = 60, k = 3, and g = 2, so the degrees of freedom are k - g = 1 and n - (k + 1) = 56. Before we do this test, let us first read in the data and plot ANXIETY vs DOSE for each of the drugs, superimposing the regression line of each drug on the plot. The code used is shown below.

# 2nd problem presentation
# ========================
ratanxiety <- read.csv("c:/users/michael/desktop/lab 7/ratanxiety.csv")
data3 <- ratanxiety
product <- data3[,1] ; dose <- data3[,2]
anxiety <- data3[,3] ; x2 <- data3[,4]
modela <- lm(anxiety[1:30] ~ dose[1:30])
anova(modela)
modelb <- lm(anxiety[31:60] ~ dose[31:60])
anova(modelb)
total.model <- lm(anxiety ~ dose + x2 + I(dose*x2))
summary(total.model)
anova(total.model)
model.reduced <- lm(anxiety ~ dose + x2)
summary(model.reduced)
anova(model.reduced)
plot(dose, anxiety, pch=as.character(product),
     main="Slopes of Product A and B")
abline(modela)
abline(modelb, lty=2)
text(12, 25, labels="solid line is A, dashed is B")

Graphical output is below.
Model output is below.
I read in the data as data3 first, then label each column as product, dose, anxiety, and x2, respectively. I next construct 4 models which I will need later, using the lm() command: modela is the model of the A drug effects only, modelb is of the B drug effects only, total.model is the total model, and model.reduced is the model with the β3 term removed. I also produced the anova() information for these models, since I will need it later. Finally, I produced the plot of ANXIETY on DOSE, distinguishing the A and B points and superimposing the lines of best fit using the abline() command. We have some evidence from this plot that our F test will end up significant at the .05 level. I will leave it up to you to find and add up the various sums of squares to verify the computed F value. We compute our allowed F value using the command qf(.95, 1, 56), which gives approximately 4.01. So, since our computed F of about 22 is so much greater than the allowed F of about 4, at the α = 0.05 level we conclude that β3 is not 0, and we have powerful predictive capability in the interaction term of our model (which we suspected already from our scatter plot).
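As an aside, the hand-built x2 column isn't strictly necessary: coding the drug as a factor and writing dose * product fits the same model. A sketch on simulated data (ratanxiety.csv itself is not reproduced here):

```r
# Sketch on simulated data: dose * product expands to
# dose + product + dose:product, which is the same model as the lab's
# dose + x2 + I(dose*x2) with x2 the hand-built 0/1 column.
set.seed(3)
dose    <- rep(c(5, 10, 20), times = 20)
product <- factor(rep(c("A", "B"), each = 30))
x2      <- as.numeric(product == "B")
anxiety <- 20 + 0.4*dose - 0.8*dose*x2 + rnorm(60)

by.hand   <- lm(anxiety ~ dose + x2 + I(dose*x2))
by.factor <- lm(anxiety ~ dose * product)

same_fit <- all.equal(unname(fitted(by.hand)), unname(fitted(by.factor)))
same_fit  # TRUE: identical fits, just different coefficient labels
summary(by.factor)$coefficients["dose:productB", ]  # the interaction row
```

The t test on the dose:productB row of the summary is equivalent to the partial F test of Ho: β3 = 0 here, since only one term is being tested.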
Homework [3]: The data set crops.csv contains mercury poisoning results of an agricultural experiment involving 3 kinds of crops (corn, wheat, and barley) planted in mercury-tainted soil (called sludge). There are 6 levels of soil contamination, and the mercury content of the plants (the response variable) is measured. Compare corn with wheat to see if β3 (the interaction term) is significant. Possibly construct a scatter plot with the 2 abline() lines superimposed, to see what you will end up with on the hypothesis test. Use the α = 0.05 level. You may want to add an x2 column of 0's and 1's as I did in my demonstration. Remember you will only be using rows 1 through 60 (i.e., [1:60]) for this test.

Homework [4]: Repeat Homework [3] for comparing corn and barley, testing β3 significance at the 0.05 level. You may want to construct a new data set for this, with a new x2 column of 0's and 1's, where you have deleted the wheat rows.

Homework [5]: Repeat Homework [3] for comparing wheat and barley. Again you may want to construct a new data set with the corn rows deleted and a new set of x2 values.

You will have 2 weeks to complete this lab assignment.
ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing
More informationTopic 4: Orthogonal Contrasts
Topic 4: Orthogonal Contrasts ANOVA is a useful and powerful tool to compare several treatment means. In comparing t treatments, the null hypothesis tested is that the t true means are all equal (H 0 :
More informationAnnouncements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)
Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationy n 1 ( x i x )( y y i n 1 i y 2
STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered
More informationLINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises
LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on
More informationChapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression
Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing
More informationFactorial designs. Experiments
Chapter 5: Factorial designs Petter Mostad mostad@chalmers.se Experiments Actively making changes and observing the result, to find causal relationships. Many types of experimental plans Measuring response
More informationCorrelation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?
Comment: notes are adapted from BIOL 214/312. I. Correlation. Correlation A) Correlation is used when we want to examine the relationship of two continuous variables. We are not interested in prediction.
More informationBasic Business Statistics, 10/e
Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline
More informationFoundations of Correlation and Regression
BWH - Biostatistics Intermediate Biostatistics for Medical Researchers Robert Goldman Professor of Statistics Simmons College Foundations of Correlation and Regression Tuesday, March 7, 2017 March 7 Foundations
More informationRegression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).
Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,
More informationAssignment #7. Chapter 12: 18, 24 Chapter 13: 28. Due next Friday Nov. 20 th by 2pm in your TA s homework box
Assignment #7 Chapter 12: 18, 24 Chapter 13: 28 Due next Friday Nov. 20 th by 2pm in your TA s homework box Lab Report Posted on web-site Dates Rough draft due to TAs homework box on Monday Nov. 16 th
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationChapter 10 Regression Analysis
Chapter 10 Regression Analysis Goal: To become familiar with how to use Excel 2007/2010 for Correlation and Regression. Instructions: You will be using CORREL, FORECAST and Regression. CORREL and FORECAST
More informationAssumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals
Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,
More informationINTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS
GEORGE W. COBB Mount Holyoke College INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS Springer CONTENTS To the Instructor Sample Exam Questions To the Student Acknowledgments xv xxi xxvii xxix 1. INTRODUCTION
More informationModel Building Chap 5 p251
Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4
More informationMultiple Regression Examples
Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +
More informationLecture 4 Scatterplots, Association, and Correlation
Lecture 4 Scatterplots, Association, and Correlation Previously, we looked at Single variables on their own One or more categorical variables In this lecture: We shall look at two quantitative variables.
More informationANOVA (Analysis of Variance) output RLS 11/20/2016
ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.
More informationAP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation
Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may
More informationClinton Community School District K-8 Mathematics Scope and Sequence
6_RP_1 6_RP_2 6_RP_3 Domain: Ratios and Proportional Relationships Grade 6 Understand the concept of a ratio and use ratio language to describe a ratio relationship between two quantities. Understand the
More informationInference for the Regression Coefficient
Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates
More informationANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College
1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment
More informationStatistical Concepts. Constructing a Trend Plot
Module 1: Review of Basic Statistical Concepts 1.2 Plotting Data, Measures of Central Tendency and Dispersion, and Correlation Constructing a Trend Plot A trend plot graphs the data against a variable
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationPresents. The Common Core State Standards Checklist Grades 6-8
Presents The Common Core State Standards Checklist Grades 6-8 Sixth Grade Common Core State Standards Sixth Grade: Ratios and Proportional Relationships Understand ratio concepts and use ratio reasoning
More information1) A residual plot: A)
1) A residual plot: A) B) C) D) E) displays residuals of the response variable versus the independent variable. displays residuals of the independent variable versus the response variable. displays residuals
More informationQ1: What is the interpretation of the number 4.1? A: There were 4.1 million visits to ER by people 85 and older, Q2: What percent of people 65-74
Lecture 4 This week lab:exam 1! Review lectures, practice labs 1 to 4 and homework 1 to 5!!!!! Need help? See me during my office hrs, or goto open lab or GS 211. Bring your picture ID and simple calculator.(note
More informationBivariate Data Summary
Bivariate Data Summary Bivariate data data that examines the relationship between two variables What individuals to the data describe? What are the variables and how are they measured Are the variables
More informationA discussion on multiple regression models
A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value
More informationThe Multiple Regression Model
Multiple Regression The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & or more independent variables (X i ) Multiple Regression Model with k Independent Variables:
More information