Stat 412/512 REVIEW OF SIMPLE LINEAR REGRESSION. Jan 7 2015. Charlotte Wickham. stat512.cwick.co.nz

Announcements. TAs: Katie (2pm lab), Ben (5pm lab), Joe (noon & 1pm labs). TA office hours in Kidder M111: Katie Tues 2-3pm, Ben Thur noon-1pm, Joe Mon 3-4pm & Thur 9-10am. Reminder, my ST512 office hours: Tuesday 3-5pm, Cordley 3003; Friday 11-noon, Kidder 76. First homework posted!

Review. Simple linear regression: model for the mean; interpreting intercept and slope; assumptions and residuals; R output. Types of statistical inference.

The response variable, Y, is the measurement we are interested in explaining or predicting. The explanatory variable, X, is the measurement we want to use to explain or predict the response.

The simple linear regression model: μ{Y | X} = β0 + β1 X. The mean response as a function of the explanatory variable is a straight line. The model describes the relationship between the response and the explanatory variable with two parameters, the intercept β0 and the slope β1.

Intercept and Slope. The intercept gives the mean response when the explanatory variable is zero. The slope gives the change in the mean response for a one-unit change in the explanatory variable.
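For concreteness, here is how such a model might be fit in R. This is a minimal sketch using R's built-in cars data (stopping distance in ft vs. speed in mph), the same data behind the exercise below; lm, coef, and confint are standard R functions.

fit <- lm(dist ~ speed, data = cars)  # fit the mean model mu{dist | speed} = b0 + b1*speed
coef(fit)     # least squares estimates: intercept about -17.6, slope about 3.9
confint(fit)  # 95% confidence intervals for both parameters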

Your turn. A simple linear regression of stopping distance (ft) on speed (mph), μ{dist | speed} = β0 + β1 speed, fit to 50 cars, gave the following estimates:

Parameter       Estimate   95% confidence interval
Intercept, β0     -17.6    (-31.2, -4.0)
Slope, β1           3.9    (3.1, 4.8)

On your own, write down a two-sentence summary interpreting the slope estimate.

Your turn. It is estimated that for every one mile per hour increase in speed, the mean stopping distance increases by 3.9 feet (95% CI: 3.1 to 4.8 feet). With 95% confidence, for every one mile per hour increase in speed, the mean stopping distance increases by between 3.1 and 4.8 feet.

Assumptions
1. The means of the subpopulations fall on a straight-line function of the explanatory variable: μ{Y | X} = β0 + β1 X.
2. The subpopulations all have the same standard deviation, σ: Var{Y | X} = σ².
3. At each value of the explanatory variable, the response has a Normal distribution.
4. Observations are independent.
There are no assumptions on the explanatory variable!

An alternative view of the assumptions: if the linear effect of the explanatory variable is removed from the response, then what's left should 1. be Normally distributed, 2. have mean zero, 3. have standard deviation σ, and 4. be independent. That is, the quantities Y − (β0 + β1 X) are independent Normal with mean zero and standard deviation σ.
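As an illustration, data satisfying all four assumptions can be simulated in R. This is a sketch with arbitrary made-up parameter values (intercept 2, slope 3, σ = 1.5):

set.seed(1)                                      # for reproducibility
x <- runif(50, 0, 10)                            # explanatory values: no assumptions on these
y <- 2 + 3 * x + rnorm(50, mean = 0, sd = 1.5)   # straight-line mean plus independent Normal
                                                 # errors with constant standard deviation
sim_fit <- lm(y ~ x)                             # estimates should land near 2 and 3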

Your turn. Someone suggests that we should have done the regression of log(dist) against log(speed), because the relationship probably isn't additive: e.g., the difference in stopping distance is probably bigger between 60 and 70 mph than it is between 10 and 20 mph. How would you decide whether their suggestion is a good one?

Your turn. [Scatterplots with fitted lines were shown here: dist ~ speed and log(dist) ~ log(speed).]
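One concrete way to decide, sketched below: fit both models and compare residual plots; the model whose residuals look patternless with roughly constant spread is the better description. (The name fit2 matches the model summarized on the next slide.)

fit1 <- lm(dist ~ speed, data = cars)
fit2 <- lm(log(dist) ~ log(speed), data = cars)
par(mfrow = c(1, 2))                             # two plots side by side
plot(fitted(fit1), resid(fit1), main = "dist ~ speed"); abline(h = 0)
plot(fitted(fit2), resid(fit2), main = "log(dist) ~ log(speed)"); abline(h = 0)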

Inference in simple linear regression by least squares t-ratios of the estimates of the slope, intercept, mean response, and predicted response all have a Student's t-distribution with n - 2 degrees of freedom. Competing (nested) models can be compared using an extra Sum of Squares F-test.
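For example, the intercept-only model is nested inside the simple linear regression model, so the two can be compared with an extra sum of squares F-test in R (a sketch on the cars data):

reduced <- lm(dist ~ 1, data = cars)    # constant-mean model
full    <- lm(dist ~ speed, data = cars)
anova(reduced, full)                    # extra sum of squares F-test for the slope

With a single extra parameter, this F-statistic is the square of the slope's t-ratio.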

Your turn
> summary(fit2)

Call:
lm(formula = log(dist) ~ log(speed), data = cars)

Residuals:
     Min       1Q   Median       3Q      Max
-1.00215 -0.24578 -0.02898  0.20717  0.88289

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7297     0.3758  -1.941   0.0581 .
log(speed)    1.6024     0.1395  11.484 2.26e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4053 on 48 degrees of freedom
Multiple R-squared: 0.7331, Adjusted R-squared: 0.7276
F-statistic: 131.9 on 1 and 48 DF,  p-value: 2.259e-15

1. What test is the p-value on the log(speed) line for?
2. How is the t-statistic on the log(speed) line calculated?
3. What test is the p-value on the F-statistic line for?
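For question 2: the t-statistic is the estimate divided by its standard error, and it is compared to a t-distribution with n − 2 = 48 degrees of freedom. A sketch of the arithmetic using the numbers in the output above:

t_stat <- 1.6024 / 0.1395   # about 11.49 (the output's 11.484 uses unrounded inputs)
p_val  <- 2 * pt(abs(t_stat), df = 48, lower.tail = FALSE)   # two-sided p-value for H0: slope = 0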

[Scatterplots of several datasets were shown here.] Which dataset has the largest σ (the subpopulation response SD)? Which dataset has the largest sX (the SD of the explanatory variable)?

A larger sample (larger n) does not decrease σ; we simply get a more precise estimate of it. A larger sample does decrease the standard errors of the slope and intercept estimates. σ is often outside the control of the researcher, but sometimes it can be decreased by improving the measurement of the response, e.g. using a digital thermometer rather than a color-change magnet to measure temperature. The standard errors of the slope and intercept also depend on how you choose X (if you can): normally it's a balance between low standard error and the ability to check assumptions, and it will depend on your questions of interest.
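These facts are visible in the slope's standard error formula, SE(β̂1) = σ̂ / sqrt(Σ(Xi − X̄)²): a larger n or more spread-out X values make the denominator larger. A sketch verifying this against R's output on the cars data:

fit <- lm(dist ~ speed, data = cars)
sigma_hat <- summary(fit)$sigma     # residual standard error, the estimate of sigma
se_slope  <- sigma_hat / sqrt(sum((cars$speed - mean(cars$speed))^2))
se_slope    # matches coef(summary(fit))["speed", "Std. Error"]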

The fitted value describes the estimated mean for an observation. For the i-th observation the fitted value is fittedᵢ = μ̂{Y | Xᵢ} = β̂0 + β̂1Xᵢ. The residual is the difference between the observed response and its fitted value: residualᵢ = Yᵢ − fittedᵢ = Yᵢ − (β̂0 + β̂1Xᵢ). The appropriateness of the linear regression model is checked by looking at plots of the residuals. Review in lab and homework.
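A sketch computing fitted values and residuals by hand in R and checking them against the built-in accessors:

fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
fitted_by_hand <- b[1] + b[2] * cars$speed    # b0 + b1 * Xi
resid_by_hand  <- cars$dist - fitted_by_hand  # Yi - fitted_i
all.equal(unname(fitted(fit)), unname(fitted_by_hand))  # TRUE
all.equal(unname(resid(fit)),  unname(resid_by_hand))   # TRUE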

Population and Causal Inference Experimental unit: the object to which the treatment is applied. Sampling unit: the object that is selected from a population. Observational unit: the object that measurements are taken on. Population inference (statistical inference beyond the units in the study to a wider population) is justified if the sampling units in the study are a random sample from the population. Causal inference (statistical inference that the treatment caused the differences observed in the study) is justified if the experimental units in the study were randomly assigned to treatments.

In simple (and multiple) linear regression, the treatments are specific values of the explanatory variables. Experimental units must be randomly assigned to the values of the explanatory variables for causal inference to be valid. To infer a cause-effect relationship, the researcher must decide the levels of the explanatory variable they are going to measure and then randomly assign their subjects to those levels. If the subjects are not randomly assigned to predetermined levels, or the explanatory variables are not under the control of the researcher (i.e. they simply observe them), causal inference is not justified. We can still talk about a relationship or association between the variables, but we must avoid causal language (e.g. avoid saying "X increases Y").

... all models are wrong, but some are useful. George E. P. Box Regression models, like all models, are never going to be a perfect description of reality. We will always keep two things in mind: * we want our models to be good approximate descriptions of reality (i.e. not too wrong) * we want our models to be useful in answering our scientific questions of interest

General approach for hypothesis testing and estimation 1. Fit a regression model that is designed to answer your questions. 2. Check the validity of the model If it's OK, proceed. If it isn't OK, refine the model. 3. Answer your questions of interest using the model and make appropriate inferences.