Regression and Models with Multiple Factors. Ch. 17, 18


Scatter Plot [figure: mass (y-axis) plotted against snout-vent length (x-axis)]

Linear Regression [figure: the same scatter plot with the fitted least-squares line]

Least-squares The method of least squares chooses the line that minimizes the sum of squared deviations along the Y-axis: Y = bX + a. The slope b is the average change in Y per unit change in X, and is usually of greater interest than the intercept a. The slope equals the covariance between X and Y divided by the variance of X, so a positive slope implies a positive covariance and a negative slope implies a negative covariance.
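The covariance formula above can be checked directly against lm(). This is a minimal sketch using simulated numbers (the course's newt data set is not bundled with the transcript):

```r
# Verify that the least-squares slope equals cov(X, Y) / var(X),
# and that the line passes through the point of means.
set.seed(1)
x <- runif(50, 65, 85)                    # stand-in "snout-vent lengths"
y <- -31 + 0.7 * x + rnorm(50, sd = 2.5)  # stand-in "mass" with noise

b <- cov(x, y) / var(x)        # slope from the covariance formula
a <- mean(y) - b * mean(x)     # intercept: line goes through (mean x, mean y)

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(a, b))   # both routes give the same line
```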

Linear Regression Uses Depict the relationship between two variables in an eye-catching fashion. Test the null hypothesis of no association between the two variables; the test is whether or not the slope is zero. Predict the average value of variable Y for a group of individuals with a given value of variable X. Note that variation around the line can make it very difficult to predict a value for a given individual with much confidence.
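The distinction between predicting a group average and predicting one individual shows up directly in R's predict() intervals. A sketch with simulated data (not the course data set):

```r
# Predicting the mean Y at a given X vs. predicting a single new individual.
set.seed(2)
svl  <- runif(48, 65, 85)
mass <- -31 + 0.7 * svl + rnorm(48, sd = 2.5)
fit  <- lm(mass ~ svl)

newdat <- data.frame(svl = 75)
ci <- predict(fit, newdat, interval = "confidence")  # mean mass at svl = 75
pi <- predict(fit, newdat, interval = "prediction")  # one new individual

# The individual's interval is always wider than the interval for the mean
(pi[, "upr"] - pi[, "lwr"]) > (ci[, "upr"] - ci[, "lwr"])
```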

What are Residuals?

What are Residuals? Chimps have larger testes than predicted for their body weight; gorillas have smaller testes than predicted for their body weight. So chimps have a positive residual and gorillas have a negative residual.

What are Residuals? In general, a residual is an individual's departure from the value predicted by the model. In this case the model is simple (the linear regression), but residuals also exist for more complex models. For a model that fits better, the residuals will be smaller on average. Residuals can be of interest in their own right, because they represent values that have been corrected for relationships that might otherwise obscure a pattern (e.g., the body weight-testes mass relationship).
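The "corrected values" idea can be sketched in a few lines of R. The numbers here are invented for illustration; they are not the primate data from the slides:

```r
# Residuals as size-corrected values: regress the trait on body mass,
# then treat the residuals as the body-size-corrected trait.
set.seed(3)
body  <- runif(30, 5, 120)                 # invented body masses
trait <- 0.3 * body + rnorm(30, sd = 3)    # trait tracks body size
fit   <- lm(trait ~ body)

res <- residuals(fit)
mean(res)        # residuals average to (numerically) zero
cor(res, body)   # and are uncorrelated with the predictor they corrected for
```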

Strong Inference for Observational Studies Noticing a pattern in the data and reporting it is a post hoc analysis; this is not hypothesis testing, and the results, while potentially important, must be interpreted cautiously. What can be done? Based on the post hoc observational study, construct a new hypothesis for a novel group or system that has not yet been studied. For example, given the primate data, a reasonable prediction is that residual testes mass in deer will be associated with mating system. Then collect a new observational data set from the new group to test the hypothesis.

Phylogenetic Pseudoreplication http://www.cell.com/trends/ecologyevolution/fulltext/s0169-5347(10)00146-1

Performing Linear Regression in R Run the regression:
> reg_newt <- lm(mass ~ svl, data=males)
> summary(reg_newt)
> confint(reg_newt)  # get the CI for the slope
Add the line to a plot:
> plot(mass ~ svl, data=males)
> abline(reg_newt)
See http://whitlockschluter.zoology.ubc.ca/r-code/rcode17 for code to draw the line's 95% CI on the graph.

Interpreting the Output
Call:
lm(formula = mass ~ svl, data = males)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5081 -1.7670 -0.5128  1.6322  6.4942

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -31.48430    7.43453  -4.235 0.000108 ***
svl           0.68766    0.09806   7.013 8.71e-09 ***   <- slope
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.514 on 46 degrees of freedom
Multiple R-squared: 0.5167, Adjusted R-squared: 0.5062
F-statistic: 49.18 on 1 and 46 DF,  p-value: 8.712e-09   <- p-value for slope
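Rather than reading numbers off the printout, the pieces of the summary table can be extracted programmatically. A sketch with simulated data standing in for lm(mass ~ svl, data = males):

```r
# Pull the slope, its standard error, and its p-value out of summary().
set.seed(4)
svl  <- runif(48, 65, 85)
mass <- -31 + 0.7 * svl + rnorm(48, sd = 2.5)
fit  <- lm(mass ~ svl)

coefs <- summary(fit)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
slope <- coefs["svl", "Estimate"]
se    <- coefs["svl", "Std. Error"]
pval  <- coefs["svl", "Pr(>|t|)"]

# With one predictor, the slope's t-test matches the overall F-test
c(slope = slope, se = se, p = pval)
```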

Least Squares Regression with 95% CI [figure: the scatter plot with the fitted line and a dashed 95% confidence band]

Code for That
males <- read.csv("malenewtsampledata.csv")
plot(males$mass ~ males$svl, pch=19,
     xlab=list("Snout-Vent Length", cex=1.5),
     ylab=list("Mass", cex=1.5))
box(lwd=3)
axis(1, lwd=3)
axis(2, lwd=3)
reg_newt <- lm(mass ~ svl, data=males)
abline(reg_newt, lwd=3, col="maroon")
summary(reg_newt)

# Add confidence limits for the regression line
xpt <- seq(min(males$svl), max(males$svl), length.out=100)
ypt <- data.frame(predict(reg_newt, newdata=data.frame(svl=xpt),
                          interval="confidence"))
lines(ypt$lwr ~ xpt, lwd=2, lty=2)
lines(ypt$upr ~ xpt, lwd=2, lty=2)

Assumptions of Linear Regression The true relationship must be linear. At each value of X, the distribution of Y is normal (i.e., the residuals are normal). The variance in Y is independent of the value of X. Note that there are no assumptions about the distribution of X.
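These assumptions can be checked from the residuals of a fitted model. A minimal sketch, again on simulated data (substitute your own fitted lm object):

```r
# Quick assumption checks on a fitted regression.
set.seed(5)
x <- runif(48, 65, 85)
y <- -31 + 0.7 * x + rnorm(48, sd = 2.5)
fit <- lm(y ~ x)

r <- residuals(fit)
sw <- shapiro.test(r)         # formal test of residual normality
plot(fitted(fit), r)          # look for curvature (non-linearity) or a
abline(h = 0, lty = 2)        #   fan shape (unequal variance) around zero
sw$p.value                    # small p-value would flag non-normal residuals
```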

Common Problems Outliers: regression is extremely sensitive to outliers, and the line will be drawn toward them, especially along the x-axis; consider performing the regression with and without the outliers. Non-linearity: best noticed by visually inspecting the plot and the fitted line; try a transformation to get linearity (often a log transformation). Non-normality of residuals: can be detected from a residual plot and possibly solved with a transformation. Unequal variance: usually visible from a scatterplot or a residual plot.
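The log-transformation fix can be sketched with invented data where Y grows multiplicatively with X, producing both curvature and a variance that grows with X:

```r
# A log transformation often linearizes a curved relationship and
# stabilizes the variance.
set.seed(6)
x <- runif(60, 1, 10)
y <- exp(0.5 * x + rnorm(60, sd = 0.3))  # curved, spread grows with x

fit_raw <- lm(y ~ x)        # linear fit to the curved raw data
fit_log <- lm(log(y) ~ x)   # linear fit after the transformation

# The log-scale fit captures the relationship much better
c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared)
```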

Residual Plot [figure: residuals(reg_newt) plotted against svl] Code: plot(residuals(reg_newt) ~ svl, data=males, pch=19)

Examples of Problems

Outlier Example

Multiple Explanatory Variables The reason ANOVA is so widely used is that it provides a framework to test the effects of multiple factors simultaneously. ANOVA also makes it possible to detect interactions among the factors. ANOVA is a special case of a general linear model.

General Linear Models GLMs handle categorical factors and continuous factors in a single modeling framework. ANOVA is a special case with only categorical explanatory variables; linear regression is a special case with only continuous explanatory variables.
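In R, a single lm() call handles this mix. A sketch with invented data in which the response depends on a continuous length and a categorical sex:

```r
# One lm() combining a continuous and a categorical predictor.
set.seed(7)
len <- runif(60, 65, 85)                  # continuous factor
sex <- factor(rep(c("F", "M"), 30))       # categorical factor
mass <- -31 + 0.7 * len + ifelse(sex == "M", 2, 0) + rnorm(60, sd = 2)

fit <- lm(mass ~ len + sex)  # a regression slope plus an ANOVA-style shift
coef(fit)                    # (Intercept), len, sexM
```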

Two-factor ANOVA Testing: the mean of serum-starved versus normal culture; the mean of wild-type versus WTC911; and the interaction (i.e., whether wild-type responds differently than WTC911 to the change in culturing conditions, serum-starved versus normal culture).
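The three tests fall out of one model fit. A sketch with invented data in the spirit of the genotype-by-culture example (the real expression values are not in the transcript):

```r
# Two-factor ANOVA with an interaction term.
set.seed(8)
genotype <- factor(rep(c("WT", "WTC911"), each = 20))
culture  <- factor(rep(c("normal", "starved"), times = 20))
# Invented effect: only WT responds to starvation -> a true interaction
resp <- 10 + ifelse(genotype == "WT" & culture == "starved", 3, 0) +
        rnorm(40, sd = 1)

fit <- aov(resp ~ genotype * culture)
summary(fit)  # rows: genotype, culture, genotype:culture, Residuals
```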

ANOVA with Blocking The analysis is similar to two-factor ANOVA (at least it was in 1995), except that you typically do not test for an interaction term, and you don't care whether the block is significant. Remember: you don't care about the block; you simply want to reduce the variance introduced by it.
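In R this is just an additive model with the block as a factor. A sketch on invented data with large block-to-block differences that would otherwise inflate the error term:

```r
# Randomized block analysis: treatment is of interest; block only
# soaks up nuisance variance (no interaction term is fitted).
set.seed(9)
block     <- factor(rep(1:6, each = 3))
treatment <- factor(rep(c("A", "B", "C"), times = 6))
y <- rnorm(18, sd = 1) +
     rep(rnorm(6, sd = 3), each = 3) +        # big block effects
     ifelse(treatment == "B", 2, 0)           # the effect we care about

fit <- aov(y ~ treatment + block)  # additive: no treatment:block term
summary(fit)                       # read only the treatment row
```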

Analysis of Covariance Used to test for a difference in means while correcting for a variable that is correlated with the response variable. The slopes must not differ between the two groups; in other words, the mean comparison is only valid if the interaction term is not significant. ANCOVA is also used to compare the slopes of two regression lines: if the interaction term is significant, the conclusion is that the slopes differ.
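The two-step logic (check the interaction, then compare adjusted means) can be sketched with invented data:

```r
# ANCOVA: first test for equal slopes, then compare covariate-adjusted means.
set.seed(10)
x   <- runif(40, 0, 10)                   # the covariate
grp <- factor(rep(c("ctl", "trt"), 20))
y   <- 2 + 1.5 * x + ifelse(grp == "trt", 4, 0) + rnorm(40, sd = 1)

full <- lm(y ~ x * grp)                   # separate slopes per group
anova(full)["x:grp", "Pr(>F)"]            # large p -> slopes can be assumed equal

add <- lm(y ~ x + grp)                    # common slope: the valid mean comparison
summary(add)$coefficients["grptrt", ]     # adjusted group difference and its test
```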

ANCOVA Example

ANCOVA Example

Summary Statistical models can be quite complex, with potentially many factors and interaction terms. The model is specified by something that looks like an equation: Y = μ + A + B + A*B. General linear models allow you to combine categorical and continuous factors in a single model. Your sample size will limit the complexity of the model, and for complex models you will need procedures to choose the best model (i.e., the model that best explains your observations).
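Two common model-choice procedures are nested F-tests and AIC comparisons. A sketch on invented data where only factor A has a real effect:

```r
# Comparing candidate models: anova() for nested models, AIC() for any set.
set.seed(11)
A <- factor(rep(c("a1", "a2"), each = 20))
B <- factor(rep(c("b1", "b2"), times = 20))
y <- 5 + ifelse(A == "a2", 2, 0) + rnorm(40, sd = 1)  # only A matters

m1 <- lm(y ~ A)
m2 <- lm(y ~ A + B)
m3 <- lm(y ~ A * B)

anova(m1, m3)    # F-test: does the richer model explain significantly more?
AIC(m1, m2, m3)  # lower AIC = better trade-off between fit and complexity
```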