STAT 458 Lab 4 Linear Regression Analysis

Scatter Plots: When investigating the relationship between two quantitative variables, one of the first steps might be to construct a scatter plot of the response variable (the y variable, on the y axis) vs the explanatory variable (the x variable, on the x axis). Usually the plot lives in the first quadrant of the x-y graph, and the points (x, y) represent (explanatory, response) pairs.

Do birds chirp more or less in cold climates? Let us investigate whether there is any relationship between bird chirping frequency and ambient temperature. Importing chirpstemp.csv into RStudio and plotting it, we initially see that chirp frequency goes up as temperature goes up, as a general trend, and that a straight-line model might be a fair predictor of the response (chirps) from the ambient temperature. The code used to generate the plot is shown below.

# scatter plot
chirpstemp <- read.csv("C:/Users/Michael O'Lear/Desktop/chirpstemp.csv")
plot(chirpstemp$temp, chirpstemp$chirps, main = "Bird Chirps vs Temp (in deg F)")

Linear Model: If it appears that there might be a linear relationship between the variables, we can set up a model:

y = β₀ + β₁x + ε

where y is the value of the response variable, x is the value of the explanatory variable, β₀ is the linear model intercept, β₁ is the linear model slope, and ε is the error, or residual (either plus or minus), between the predicted model response (y-hat) and the actual value of y. We sometimes express the linear model equation (or the "line of best fit") in the following way.

ŷ = β₀ + β₁x

where the y-hat stands for the estimate of y from our model. With this notation we see that the error, called the residual, is e = y - ŷ. It is important that you always remember to take the actual y minus the estimated y, since the sign of e indicates model over-prediction (a negative e) or under-prediction (a positive e).

Since it looks like we might have a linear relationship in our bird-chirp data, we can use R to find the parameter estimates of that relationship (i.e., b₀ and b₁). The lm() command creates the model, along with useful statistics and values, as shown in the code below.

# linear model
model1 <- lm(chirps ~ temp, data = chirpstemp)
model1

From the lm() output we see that the fitted linear model is

chirps = -131.232 + 3.809 * temp (°F)

We can predict chirps from ambient temperature, as long as we stay within the range of the temperature data and do not commit the mistake of extrapolation. We will talk more about extrapolation later. For example, for a temperature of 70 °F, we estimate a predicted number of chirps of -131.232 + 3.809 * 70 ≈ 135.4 chirps/min.

We can superimpose the line of best fit onto the original scatter plot using the abline() command, now that we have a model to refer to. The linear model seems to fit rather well across the range of temperatures.
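A minimal sketch of those two steps in R (assuming the model1 fit above and the scatter plot still open; the predicted value should match the hand calculation):

# add the fitted line to the open scatter plot, then predict chirps at 70 deg F
abline(model1)
predict(model1, newdata = data.frame(temp = 70))   # about 135.4 chirps/min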

A final warning about using the plot() command and the lm() command close together: the argument order for plot() is plot(x, y), whereas the lm() formula is lm(y ~ x). R users sometimes write both the same way, resulting in bad information.

Using Formulas in lm() and in other commands: Notice that there is a sort of formula notation inside of the lm() command. Sometimes we have to specify formulas for linear models, as well as for other transformed models. A brief table showing how R recognizes formulas is shown below.

Equation                      R formula
ŷ = β₀ + β₁x                  y ~ x
ŷ = β₀ + β₁ ln(x)             y ~ log(x)
ln(ŷ) = β₀ + β₁x              log(y) ~ x
ln(ŷ) = β₀ + β₁ ln(x)         log(y) ~ log(x)
ŷ = β₀ + β₁x² + β₂x           y ~ I(x^2) + x
ŷ = β₀ + β₁√x                 y ~ sqrt(x)

We will look at more R formulas later when we talk about multiple regression analyses.
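As a rough illustration of this notation, here is a sketch that reuses the chirps data from above to fit two of the transformed models in the table; the model names logmodel and quadmodel are arbitrary and not part of the lab's original code.

# sketch: fitting two of the transformed models from the table
logmodel  <- lm(log(chirps) ~ temp, data = chirpstemp)         # ln(y-hat) = b0 + b1*x
quadmodel <- lm(chirps ~ I(temp^2) + temp, data = chirpstemp)  # y-hat = b0 + b1*x^2 + b2*x
summary(logmodel)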

What lm() generates: When you execute lm() and create a model, as we did when we created model1 above, many useful statistics are generated in the background by R. The following is a list of some of the more useful commands.

summary(model1): gives useful information such as b₀, b₁, r², the F-statistic, and p-values
resid(model1): lists the residual for every point (used in the residual plot and in other plots)
fitted(model1): lists the model prediction for every point (used in the residual plot, the plot() command, and others)
predict(model1): computes predictions of y from x, according to the model
anova(model1): used later in the course
deviance(model1): computes the RSS (used later in the course)
AIC(model1): used later in the course
model1, coef(model1): returns the coefficients of the model
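For instance, a short sketch of a few of these commands applied to model1 (output not shown here):

# sketch: pulling useful pieces out of model1
summary(model1)       # coefficients, r^2, F-statistic, p-values
coef(model1)          # just the intercept and slope
head(resid(model1))   # first few residuals
head(fitted(model1))  # first few fitted (predicted) values
deviance(model1)      # residual sum of squares (RSS)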

Residual plot: No determination of linearity is thorough until a residual plot is presented. This is a scatter plot of the residuals vs the predicted (or sometimes the x) values, showing how appropriate a linear model might be for the data points. Briefly, if we:

- see no pattern in the dots,
- see the points roughly uniformly distributed above and below the x axis, and
- see no marked change in variance above and below the x axis as we go from left to right on the plot,

then we can make the case that a linear fit is an appropriate model for our study. If we violate any of those bulleted points, we must proceed with our linear regression with caution and treat our results with suspicion. Below is the residual plot for our chirp data, where we used resid(model1) and fitted(model1) as the y and x axis values, respectively.

# residual plot
plot(fitted(model1), resid(model1), main = "Residuals of Chirps Model")
abline(h = 0)

Notice that we plotted a horizontal line at y = 0 to mark the x axis of this plot. We see that the residual plot seems to show that a linear fit is appropriate for this data.
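As a side note (a sketch only, not part of the lab's code listing), the residuals can be checked against their definition, and R can draw the same diagnostic plot itself:

# sanity check: residuals are actual y minus fitted y
all.equal(resid(model1), chirpstemp$chirps - fitted(model1), check.attributes = FALSE)
# built-in residuals-vs-fitted diagnostic plot
plot(model1, which = 1)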

Other Information: Finally, now that we believe a linear model fits reasonably well, we would want to know the degree of fit, i.e., the correlation coefficient r. We can get that from the summary information, which shows that the r² value is 0.9567, giving us r = √0.9567 = 0.9781, or we can use the cor() command shown below.

# other information
sqrt(0.9567)
cor(chirpstemp$temp, chirpstemp$chirps)

Homework [1]: Import the data file patients.csv. We want to see if there is a linear relationship between the height and weight (units unknown) of the patients used in this study and, if so, how strong that relationship is, with height as the explanatory variable. Produce a scatter plot, a residual plot, and model statistics answering these questions, and include a 50-100 word statement of conclusions for this study. Find the estimated weight of a patient who has a height of 51.

Scatter Plot Matrix: Sometimes we have a large number of quantitative variables in a study, and we want to see if there are any relationships between any two of them. R has a nice function, called pairs(), which makes multiple scatter plots of pairs of variables. The data set statgrades.csv is a standardized set of grades for a statistics class and includes the midterm, final, homework, and final class grade for each student. The command shown below was used to produce the pairs plot.

pairs(statgrades)

Notice that all grades have been converted from their raw scores to standardized N(0,1) z scores, so that differing difficulty levels do not weight or bias the cumulative scores; each score is instead expressed relative to the mean of its own test. This is why all scores seem to run from about -3 to 3 or less.
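A short companion sketch (assuming statgrades has already been read in with read.csv() and contains only numeric grade columns): the same pairwise relationships can be summarized numerically with a correlation matrix.

# sketch: correlation matrix for the same pairs of variables
round(cor(statgrades), 3)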

Homework [2]: Pick one or two pairs that appear to have a possible linear relationship. Make individual scatter plots, summary statistics, residual plots, etc., and come to a conclusion on the question: is there a linear relationship and, if so, how strong is it? Pick what you think should be the explanatory and response variables. Include a 50-word justification of your results.

Homework [3]: Import the data set bloodpres.csv. We are interested in any linear relationship between age and systolic pressure. As before, investigate the linearity and strength of the relationship and whether age can predict systolic pressure.