Chapter 12: Linear regression II


Timothy Hanson
Department of Statistics, University of South Carolina
Stat 205: Elementary Statistics for the Biological and Life Sciences

12.4 The regression model

We assume the underlying model with Greek letters (as usual):

    y = β0 + β1 x + ε

For each subject i we see x_i and y_i = β0 + β1 x_i + ε_i.

β0 is the population intercept.
β1 is the population slope.
ε_i is the ith error; we assume these are N(0, σ_ε).

We don't know any of β0, β1, or σ_ε.
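To make the roles of β0, β1, and σ_ε concrete, here is a minimal R sketch (not from the slides; the parameter values are made up for illustration) that simulates data from this model:

# Simulate y = beta0 + beta1*x + eps with eps ~ N(0, sigma_eps)
# beta0, beta1, sigma_eps below are illustrative values only
set.seed(205)
n <- 50
beta0 <- 10; beta1 <- 2; sigma_eps <- 3
x   <- runif(n, 0, 10)                      # covariate values
eps <- rnorm(n, mean = 0, sd = sigma_eps)   # the errors epsilon_i
y   <- beta0 + beta1 * x + eps              # observed responses
plot(x, y)                                  # scatter of simulated data
abline(beta0, beta1)                        # the true line beta0 + beta1*x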

Visualizing the model

µ_{y|x} = β0 + β1 x is the mean response for everyone with covariate x. σ_ε is constant: the spread about the line doesn't change with x.

Example 12.4.4. Pretend we know that the mean weight µ_{y|x} given height x is µ_{y|x} = −145 + 4.25x and σ_ε = 20.

Weight vs. height

[Figure: weight vs. height, showing the mean line µ_{y|x} = −145 + 4.25x with constant spread σ_ε = 20 about the line.]
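A picture like this one is easy to sketch in R with the Example 12.4.4 values; the height range of 60 to 76 inches below is an assumption for illustration:

# Mean weight line mu_{y|x} = -145 + 4.25x with constant spread sigma_eps = 20
x  <- 60:76                      # heights in inches (assumed range)
mu <- -145 + 4.25 * x            # mean weight at each height
plot(x, mu, type = "l", xlab = "height (in)", ylab = "weight (lb)")
lines(x, mu + 2 * 20, lty = 2)   # about 95% of weights fall between
lines(x, mu - 2 * 20, lty = 2)   #   these dashed bands (+/- 2 sigma_eps)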

Estimating β0, β1, and σ_ε

b0 estimates β0. b1 estimates β1. s_e estimates σ_ε.

Example 12.4.5. For the snake data, b0 = −301 estimates β0, b1 = 7.19 estimates β1, and s_e = 12.5 estimates σ_ε. We estimate the mean weight ŷ of snakes with length x as

    ŷ = −301 + 7.19x
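In R all three estimates come out of a single lm() fit. A sketch with the snake lengths and weights in vectors named length_cm and weight_g (placeholder names; the data themselves are in the textbook):

fit <- lm(weight_g ~ length_cm)   # least-squares fit to the snake data
coef(fit)                         # b0 (intercept) and b1 (slope)
sigma(fit)                        # s_e, the residual standard deviation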

Example 12.4.6: Arsenic in rice

If we believe the data follow a line, we can estimate the mean for any x we want. b0 = 197.17 estimates β0, b1 = −2.51 estimates β1, and s_e = 37.30 estimates σ_ε. For a straw silicon concentration of x = 33 g/kg we estimate a mean arsenic level of

    ŷ = 197.17 − 2.51(33) = 114.35 µg/kg,

with s_e = 37.30 µg/kg.

Arsenic in rice at x = 33 g/kg

[Figure: fitted line ŷ = 197.17 − 2.51x with the estimate 114.35 = 197.17 − 2.51(33) marked at x = 33.]
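Given a fitted simple regression, predict() returns ŷ at any x without hand arithmetic. A sketch for the arsenic example, assuming the variables are named arsenic and silicon:

# fit <- lm(arsenic ~ silicon)    # fitted to the rice data
predict(fit, newdata = data.frame(silicon = 33))
# returns 197.17 - 2.51*33, i.e. about 114.35 micrograms/kg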

Often people want a 95% confidence interval for β1 and want to test H0: β1 = 0. If we reject H0: β1 = 0, then y is significantly linearly associated with x. This is the same as testing H0: ρ = 0. A 95% confidence interval for β1 gives us a range for how the mean changes when x is increased by one unit. Everything comes from

    (b1 − β1) / SE_{b1} ~ t_{n−2}, where SE_{b1} = s_e / (s_x √(n − 1)).

R automatically gives a P-value for testing H0: β1 = 0; you need to ask R for the 95% confidence interval for β1.
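The SE_{b1} formula is easy to verify against R's output for any fit; a self-contained sketch with simulated data:

set.seed(1)
x <- runif(20); y <- 3 + 2 * x + rnorm(20)   # any (x, y) data will do
fit <- lm(y ~ x)
s_e <- sigma(fit)                            # residual standard deviation
s_e / (sd(x) * sqrt(length(x) - 1))          # SE_b1 from the formula above
summary(fit)$coefficients[2, 2]              # R's reported SE for b1: same number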

R code

> amph=c(0,0,0,0,0,0,0,0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0)
> cons=c(112.6,102.1,90.2,81.5,105.6,93.0,106.6,108.3,73.3,84.8,67.3,55.3,
+ 80.7,90.0,75.5,77.1,38.5,81.3,57.1,62.3,51.5,48.3,42.7,57.9)
> fit=lm(cons~amph)
> summary(fit)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   99.331      3.680   26.99  < 2e-16 ***
amph          -9.007      1.140   -7.90 7.27e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> confint(fit)
                 2.5 %     97.5 %
(Intercept)   91.69979 106.962710
amph        -11.37202  -6.642979

The P-value for testing H0: β1 = 0 vs. HA: β1 ≠ 0 is 0.0000000727, so we reject at the 5% level. We are 95% confident that true mean consumption is reduced by 6.6 to 11.4 g/kg for every 1 mg/kg increase in amphetamine dose.
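As a check, the confint() interval can be reproduced from the printed coefficients as b1 ± t(0.975, n−2) × SE(b1), with n = 24 observations here:

b1 <- -9.007; se_b1 <- 1.140; n <- 24
b1 + c(-1, 1) * qt(0.975, df = n - 2) * se_b1
# -11.37  -6.64, matching R's confint(fit)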

Often we are interested in more than one predictor; say we have two, x1 and x2. The model is easily extended to

    y = β0 + β1 x1 + β2 x2 + ε

Example: Dwayne Portrait Studio is doing a sales analysis based on data from n = 21 cities.

y = sales (thousands of dollars) for a city
x1 = number of people 16 years or younger (thousands)
x2 = per capita disposable income (thousands of dollars)

The data

  x1     x2      y        x1     x2      y
 68.5   16.7   174.4     45.2   16.8   164.4
 91.3   18.2   244.2     47.8   16.3   154.6
 46.9   17.3   181.6     66.1   18.2   207.5
 49.5   15.9   152.8     52.0   17.2   163.2
 48.9   16.6   145.4     38.4   16.0   137.2
 87.9   18.3   241.9     72.8   17.1   191.1
 88.4   17.4   232.0     42.9   15.8   145.3
 52.5   17.8   161.1     85.7   18.4   209.7
 41.3   16.5   146.4     51.7   16.3   144.0
 89.6   18.1   232.6     82.7   19.1   224.1
 52.3   16.0   166.5

R code for multiple regression

> under16=c(68.5,45.2,91.3,47.8,46.9,66.1,49.5,52.0,48.9,38.4,87.9,72.8,88.4,42.9,52.5,
+ 85.7,41.3,51.7,89.6,82.7,52.3)
> income=c(16.7,16.8,18.2,16.3,17.3,18.2,15.9,17.2,16.6,16.0,18.3,17.1,17.4,15.8,17.8,
+ 18.4,16.5,16.3,18.1,19.1,16.0)
> sales=c(174.4,164.4,244.2,154.6,181.6,207.5,152.8,163.2,145.4,137.2,241.9,191.1,232.0,
+ 145.3,161.1,209.7,146.4,144.0,232.6,224.1,166.5)
> fit=lm(sales~under16+income)
> summary(fit)

Call:
lm(formula = sales ~ under16 + income)

Residuals:
     Min       1Q   Median       3Q      Max
-18.4239  -6.2161   0.7449   9.4356  20.2151

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -68.8571    60.0170  -1.147   0.2663
under16       1.4546     0.2118   6.868    2e-06 ***
income        9.3655     4.0640   2.305   0.0333 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.01 on 18 degrees of freedom
Multiple R-squared: 0.9167, Adjusted R-squared: 0.9075
F-statistic: 99.1 on 2 and 18 DF,  p-value: 1.921e-10
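With the fitted object in hand, predictions for a new city take one line; the covariate values below (65 thousand people under 16, income of 17.6 thousand dollars) are made up for illustration:

predict(fit, newdata = data.frame(under16 = 65, income = 17.6))
# -68.857 + 1.455*65 + 9.366*17.6, about 190.5 (thousand dollars of sales)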

Interpretation

The fitted regression surface is

    sales = −68.857 + 1.455 (under16) + 9.366 (income).

For every unit increase (1,000 people) in the under-16 population, average sales go up by 1.455 thousand dollars, i.e., $1,455. For every unit increase ($1,000) in per capita disposable income, average sales go up by 9.366 thousand dollars, i.e., $9,366. 91.67% of the variability in sales is explained by the under-16 population and disposable income. σ_ε is estimated to be s_e = 11.01.
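The 91.67% figure is the Multiple R-squared from the output above; as a quick sanity check it can also be computed as the squared correlation between the observed and fitted sales:

cor(sales, fitted(fit))^2   # 0.9167, agreeing with Multiple R-squared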

Regression homework

12.2.5, 12.2.7, 12.3.1, 12.3.3, 12.3.5, 12.3.7, 12.3.8. Use R for all problems; i.e., don't do anything by hand.
12.4.3, 12.4.6, 12.4.8, 12.4.9, 12.5.1, 12.5.3, 12.5.5, 12.5.9(a). Use R for all problems; don't do anything by hand.