Lecture 19: Inference for SLR & Transformations

Size: px
Start display at page:

Download "Lecture 19: Inference for SLR & Transformations"

Transcription

1 Lecture 19: Inference for SLR & Transformations Statistics 101 Mine Çetinkaya-Rundel April 3, 2012

2 Announcements Announcements HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner will be announced in class. Prize: +1 (out of 100) point on the final. istics.net/ stat/ correlations Group: sta101 Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

3 Recap Online quiz 7 - commonly missed questions Question 1: Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

4 Recap Review question Which of the following is false? In SLR, (a) residuals should be nearly normally distributed with mean at 0 (b) residuals should have non-constant variance (c) residuals vs. x plot should show a random scatter around 0 (d) the relationship between x and y should be linear, and outliers should be handled with caution Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

5 1 Inference for linear regression Understanding regression output from software HT for the slope CI for the slope An alternative statistic 2 Transformations Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, 2012

6 Major league baseball Yesterday in lab you worked with 2009 MLB data. What was the best predictor of runs? Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

7 Major league baseball Yesterday in lab you worked with 2009 MLB data. What was the best predictor of runs? Runs vs. On base plus slugging runs ob_slg Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

8 Major league baseball R 2 for the regression line for predicting runs from on-base plus slugging is 91.31%. Which of the below is the correct interpretation of this value? 91.31% of (a) runs can be accurately predicted by on-base plus slugging. (b) variability in predictions of runs is explained by on-base plus slugging. (c) variability in predictions of on-base plus slugging is explained by runs. (d) variability in runs is explained by on-base plus slugging. (e) variability in on-base plus slugging is explained by runs. Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

9 Understanding regression output from software Major league baseball (regression output) m = lm(runs ob_slg, data = mlb) summary(m) Call: lm(formula = runs ob_slg, data = mlb) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e-10 *** ob_slg < 2e-16 *** --- Residual standard error: on 28 degrees of freedom Multiple R-squared: , Adjusted R-squared: 0.91 F-statistic: on 1 and 28 DF, p-value: < 2.2e-16 Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

10 HT for the slope Testing for the slope Clicker question Assuming that the 2009 season is representative of all MLB seasons, we would like to test if these data provide convincing evidence that the slope of the regression line for predicting runs from on-base plus slugging is different than 0. What are the appropriate hypotheses? (a) H 0 : b 0 = 0; H A : b 0 0 (b) H 0 : β 1 = 0; H A : β 1 0 (c) H 0 : b 1 = 0; H A : b 1 0 (d) H 0 : β 0 = 0; H A : β 0 0 Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

11 HT for the slope Testing for the slope (cont.) Estimate Std. Error t value Pr(> t ) (Intercept) ob slg We always use a t-test in inference for regression Remember: Test statistic, T = point estimate null value SE Point estimate = b 1 is the observed slope, and is given in the regression output SE b1 is the standard error associated with the slope, and can be calculated as (yi ŷ i ) SE b1 = 2 /(n 2) (xi x i ) 2 is also given in the regression output (and it s silly to try to calculate it by hand, just know that it s doable and why the formula works the way it does) Degrees of freedom associated with the slope is df = n 2, where n is the sample size Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

12 HT for the slope Testing for the slope (cont.) Estimate Std. Error t value Pr(> t ) (Intercept) ob slg T = = df = 30 2 = 28 p value = P( T > 17.15) < 0.01 Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

13 HT for the slope % College graduate vs. % Hispanic in LA What can you say about the relationship between of % college graduate and % Hispanic in a sample of 100 zip code areas in LA? Education: College graduate 1.0 Race/Ethnicity: Hispanic Freeways No data 0.0 Freeways No data 0.0 Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

14 HT for the slope % College educated vs. % Hispanic in LA - another look What can you say about the relationship between of % college graduate and % Hispanic in a sample of 100 zip code areas in LA? 100% % College graduate 75% 50% 25% 0% 0% 25% 50% 75% 100% % Hispanic Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

15 HT for the slope % College educated vs. % Hispanic in LA - linear model Clicker question Which of the below is the best interpretation of the slope? Estimate Std. Error t value Pr(> t ) (Intercept) %Hispanic (a) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 75% decrease in % of college grads. (b) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 0.75% decrease in % of college grads. (c) An additional 1% of Hispanic residents decreases the % of college graduates in a zip code area in LA by 0.75%. (d) In zip code areas with no Hispanic residents, % of college graduates is expected to be 75%. Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

16 HT for the slope % College educated vs. % Hispanic in LA - linear model Do these data provide convincing evidence that there is a statistically significant relationship between % Hispanic and % college graduates in zip code areas in LA? Estimate Std. Error t value Pr(> t ) (Intercept) hispanic Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

17 HT for the slope Violent crime rate vs. unemployment Relationship between violent crime rate (annual number of violent crimes per 100,000 population) and unemployment rate (% of work eligible population not working) in 51 US States (including DC): violent_crime_rate DC unemployed Note: The data are from the 2003 Statistical Abstract of the US. A 2012 version is available online, if looking for data on states for your project, it s a good resource. Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

18 HT for the slope Violent crime rate vs. unemployment Clicker question Which of the below is the correct set of hypotheses and the p-value for testing if the slope of the relationship between violent crime rate and unemployment is positive? Estimate Std. Error t value Pr(> t ) (Intercept) unemployed (a) H 0 :b 1 = 0 H A :b 1 0 p value = (b) H 0 :β 1 = 0 H A :β 1 > 0 p value = /2 = (c) H 0 :β 1 = 0 H A :β 1 0 p value = /2 = (d) H 0 :b 1 = 0 H A :b 1 > 0 p value = /2 = (e) H 0 :β 1 = 0 H A :β 1 0 p value = Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

19 CI for the slope Confidence interval for the slope Clicker question Remember that a confidence interval is calculated as point estimate±me and the degrees of freedom associated with the slope in a simple linear regression is n 2. Which of the below is the correct 95% confidence interval for the slope parameter? Note that the model is based on observations from 51 states. (a) ± Estimate Std. Error t value Pr(> t ) (Intercept) unemployed (b) ± (c) ± (d) ± Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

20 CI for the slope Recap Inference for the slope for a SLR model (only one explanatory variable): Hypothesis test: Confidence interval: T = b 1 null value SE b1 df = n 2 b 1 ± t df =n 2 SE b 1 The null value is often 0 since we are usually checking for any relationship between the explanatory and the response variable The regression output gives b 1, SE b1, and two-tailed p-value for the t-test for the slope where the null value is 0 We rarely do inference on the intercept, so we ll be focusing on the estimates and inference for the slope Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

21 CI for the slope Caution Always be aware of the type of data you re working with: random sample, non-random sample, or population Statistical inference, and the resulting p-values, are meaningless when you already have population data If you have a sample that is non-random (biased), the results will be unreliable The ultimate goal is to have independent observations and you know how to check for those by now Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

22 An alternative statistic ANOVA We considered the t-test as a way to evaluate the strength of evidence for a hypothesis test evaluating the relationship between x and y However, we could focus on R 2 proportion of variability in the response variable (y) explained by the explanatory variable (x) A large R 2 suggests a linear relationship between x and y exists A small R 2 suggests the evidence provided by the data may not be convincing Considering the amount of explained variability is called analysis of variance (ANOVA) In SLR, where there is only one explanatory variable (and hence one slope parameter) t-test and the ANOVA yield the same result In multiple linear regression, they provide different pieces of information Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

23 Transformations 1 Inference for linear regression 2 Transformations Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, 2012

24 Transformations Truck prices The scatterplot below shows the relationship between year and price of a random sample of 43 pickup trucks. Describe the relationship between these two variables. price year From: faculty.chicagobooth.edu/ robert.gramacy/ teaching.html Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

25 Transformations Remove unusual observations Let s remove trucks older than 20 years, and only focus on trucks made in 1992 or later. Now what can you say about the relationship? price year Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

26 Transformations Truck prices - linear model? price residuals year Model: price = b 0 + b 1 year The linear model doesn t appear to be a good fit since the residuals have non-constant variance Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

27 Transformations Truck prices - log transform of the response variable log(price) residuals year Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28 Model: log(price) = b 0 + b 1 year We applied a log transformation to the response variable. The relationship now seems linear, and the residuals no longer have non-constant variance.

28 Transformations Interpreting models with log transformation Estimate Std. Error t value Pr(> t ) (Intercept) pu$year Model: log(price) = year For each additional year the car is newer (for each year decrease in car s age) we would expect the log price of the car to increase on average by 0.14 log dollars. which is not very useful... Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

29 Transformations Working with logs Subtraction and logs: log(a) log(b) = log( a b ) Natural logarithm: e log(x) = x We can these identities to undo the log transformation Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

30 Transformations Interpreting models with log transformation (cont.) The slope coefficient for the log transformed model is 0.14, meaning the log price difference between cars that are one year apart is predicted to be 0.14 log dollars. log(price at year x + 1) log(price at year x) = 0.14 ( ) price at year x + 1 log = 0.14 price at year x e log( price at year x + 1 price at year x price at year x + 1 price at year x ) = e 0.14 = 1.15 For each additional year the car is newer (for each year decrease in car s age) we would expect the price of the car to increase on average by a factor of Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

31 Transformations Recap: dealing with non-constant variance Non-constant variance is one of the most common model violations, however it is usually fixable by transforming the response (y) variable The most common variance stabilizing transform is the log transformation: log(y) When using a log transformation on the response variable the interpretation of the slope changes: For each unit increase in x, y is expected on average to decrease/increase by a factor of e b 1. Another useful transformation is the square root: y These transformations may also be useful when the relationship is non-linear, but in those cases a polynomial regression may also be needed (this is beyond the scope of this course, but you re welcomed to try it for your project, and I d be happy to provide further guidance) Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

32 Transformations R code # load data pu_allyrs = read.csv(" pickups.csv") # drop trucks older than 20 yrs old pu = subset(pu_allyrs, pu_allyrs$year >= 1992) # linear model plot(pu$price pu$year) m1 = lm(pu$price pu$year) abline(m1) plot(m1$residuals pu$year) # model with log transformation plot(log(pu$price ) pu$year) m2 = lm(log(pu$price ) pu$year) abline(m2) plot(m2$residuals pu$year) # model summary and interpretation of the slope coefficient summary(m2) exp(0.14) Statistics 101 (Mine Çetinkaya-Rundel) L19: Inference for SLR & Transformations April 3, / 28

Announcements. Unit 7: Multiple linear regression Lecture 3: Confidence and prediction intervals + Transformations. Uncertainty of predictions

Announcements. Unit 7: Multiple linear regression Lecture 3: Confidence and prediction intervals + Transformations. Uncertainty of predictions Housekeeping Announcements Unit 7: Multiple linear regression Lecture 3: Confidence and prediction intervals + Statistics 101 Mine Çetinkaya-Rundel November 25, 2014 Poster presentation location: Section

More information

Modeling kid s test scores (revisited) Lecture 20 - Model Selection. Model output. Backward-elimination

Modeling kid s test scores (revisited) Lecture 20 - Model Selection. Model output. Backward-elimination Modeling kid s test scores (revisited) Lecture 20 - Model Selection Sta102 / BME102 Colin Rundel November 17, 2014 Predicting cognitive test scores of three- and four-year-old children using characteristics

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Announcements. Unit 6: Simple Linear Regression Lecture : Introduction to SLR. Poverty vs. HS graduate rate. Modeling numerical variables

Announcements. Unit 6: Simple Linear Regression Lecture : Introduction to SLR. Poverty vs. HS graduate rate. Modeling numerical variables Announcements Announcements Unit : Simple Linear Regression Lecture : Introduction to SLR Statistics 1 Mine Çetinkaya-Rundel April 2, 2013 Statistics 1 (Mine Çetinkaya-Rundel) U - L1: Introduction to SLR

More information

Announcements. Lecture 18: Simple Linear Regression. Poverty vs. HS graduate rate

Announcements. Lecture 18: Simple Linear Regression. Poverty vs. HS graduate rate Announcements Announcements Lecture : Simple Linear Regression Statistics 1 Mine Çetinkaya-Rundel March 29, 2 Midterm 2 - same regrade request policy: On a separate sheet write up your request, describing

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Announcements. Lecture 10: Relationship between Measurement Variables. Poverty vs. HS graduate rate. Response vs. explanatory

Announcements. Lecture 10: Relationship between Measurement Variables. Poverty vs. HS graduate rate. Response vs. explanatory Announcements Announcements Lecture : Relationship between Measurement Variables Statistics Colin Rundel February, 20 In class Quiz #2 at the end of class Midterm #1 on Friday, in class review Wednesday

More information

Lecture 20: Multiple linear regression

Lecture 20: Multiple linear regression Lecture 20: Multiple linear regression Statistics 101 Mine Çetinkaya-Rundel April 5, 2012 Announcements Announcements Project proposals due Sunday midnight: Respsonse variable: numeric Explanatory variables:

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

STA 101 Final Review

STA 101 Final Review STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2 3pm today (Old Chem

More information

Chi-square tests. Unit 6: Simple Linear Regression Lecture 1: Introduction to SLR. Statistics 101. Poverty vs. HS graduate rate

Chi-square tests. Unit 6: Simple Linear Regression Lecture 1: Introduction to SLR. Statistics 101. Poverty vs. HS graduate rate Review and Comments Chi-square tests Unit : Simple Linear Regression Lecture 1: Introduction to SLR Statistics 1 Monika Jingchen Hu June, 20 Chi-square test of GOF k χ 2 (O E) 2 = E i=1 where k = total

More information

2. Outliers and inference for regression

2. Outliers and inference for regression Unit6: Introductiontolinearregression 2. Outliers and inference for regression Sta 101 - Spring 2016 Duke University, Department of Statistical Science Dr. Çetinkaya-Rundel Slides posted at http://bit.ly/sta101_s16

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Lecture 16 - Correlation and Regression

Lecture 16 - Correlation and Regression Lecture 16 - Correlation and Regression Statistics 102 Colin Rundel April 1, 2013 Modeling numerical variables Modeling numerical variables So far we have worked with single numerical and categorical variables,

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT Nov 20 2015 Charlotte Wickham stat511.cwick.co.nz Quiz #4 This weekend, don t forget. Usual format Assumptions Display 7.5 p. 180 The ideal normal, simple

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Unit 7: Multiple linear regression 1. Introduction to multiple linear regression

Unit 7: Multiple linear regression 1. Introduction to multiple linear regression Announcements Unit 7: Multiple linear regression 1. Introduction to multiple linear regression Sta 101 - Fall 2017 Duke University, Department of Statistical Science Work on your project! Due date- Sunday

More information

Regression and Models with Multiple Factors. Ch. 17, 18

Regression and Models with Multiple Factors. Ch. 17, 18 Regression and Models with Multiple Factors Ch. 17, 18 Mass 15 20 25 Scatter Plot 70 75 80 Snout-Vent Length Mass 15 20 25 Linear Regression 70 75 80 Snout-Vent Length Least-squares The method of least

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Multiple linear regression

Multiple linear regression Multiple linear regression Course MF 930: Introduction to statistics June 0 Tron Anders Moger Department of biostatistics, IMB University of Oslo Aims for this lecture: Continue where we left off. Repeat

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression OI CHAPTER 7 Important Concepts Correlation (r or R) and Coefficient of determination (R 2 ) Interpreting y-intercept and slope coefficients Inference (hypothesis testing and confidence

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

Hypotheses. Poll. (a) H 0 : µ 6th = µ 13th H A : µ 6th µ 13th (b) H 0 : p 6th = p 13th H A : p 6th p 13th (c) H 0 : µ diff = 0

Hypotheses. Poll. (a) H 0 : µ 6th = µ 13th H A : µ 6th µ 13th (b) H 0 : p 6th = p 13th H A : p 6th p 13th (c) H 0 : µ diff = 0 Friday the 13 th Unit 4: Inference for numerical variables Lecture 3: t-distribution Statistics 14 Mine Çetinkaya-Rundel June 5, 213 Between 199-1992 researchers in the UK collected data on traffic flow,

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Project proposal feedback. Unit 4: Inference for numerical variables Lecture 3: t-distribution. Statistics 104

Project proposal feedback. Unit 4: Inference for numerical variables Lecture 3: t-distribution. Statistics 104 Announcements Project proposal feedback Unit 4: Inference for numerical variables Lecture 3: t-distribution Statistics 104 Mine Çetinkaya-Rundel October 22, 2013 Cases: units, not values Data prep: If

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Lecture 18 Miscellaneous Topics in Multiple Regression

Lecture 18 Miscellaneous Topics in Multiple Regression Lecture 18 Miscellaneous Topics in Multiple Regression STAT 512 Spring 2011 Background Reading KNNL: 8.1-8.5,10.1, 11, 12 18-1 Topic Overview Polynomial Models (8.1) Interaction Models (8.2) Qualitative

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Review for Final Exam Stat 205: Statistics for the Life Sciences

Review for Final Exam Stat 205: Statistics for the Life Sciences Review for Final Exam Stat 205: Statistics for the Life Sciences Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 205: Statistics for the Life Sciences 1 / 20 Overview of Final Exam

More information

Model Specification and Data Problems. Part VIII

Model Specification and Data Problems. Part VIII Part VIII Model Specification and Data Problems As of Oct 24, 2017 1 Model Specification and Data Problems RESET test Non-nested alternatives Outliers A functional form misspecification generally means

More information

A discussion on multiple regression models

A discussion on multiple regression models A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3 Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester

More information

Inference in Regression Analysis

Inference in Regression Analysis ECNS 561 Inference Inference in Regression Analysis Up to this point 1.) OLS is unbiased 2.) OLS is BLUE (best linear unbiased estimator i.e., the variance is smallest among linear unbiased estimators)

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Announcements. Final Review: Units 1-7

Announcements. Final Review: Units 1-7 Announcements Announcements Final : Units 1-7 Statistics 104 Mine Çetinkaya-Rundel June 24, 2013 Final on Wed: cheat sheet (one sheet, front and back) and calculator Must have webcam + audio on at all

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Stat 412/512 REVIEW OF SIMPLE LINEAR REGRESSION. Jan Charlotte Wickham. stat512.cwick.co.nz

Stat 412/512 REVIEW OF SIMPLE LINEAR REGRESSION. Jan Charlotte Wickham. stat512.cwick.co.nz Stat 412/512 REVIEW OF SIMPLE LINEAR REGRESSION Jan 7 2015 Charlotte Wickham stat512.cwick.co.nz Announcements TA's Katie 2pm lab Ben 5pm lab Joe noon & 1pm lab TA office hours Kidder M111 Katie Tues 2-3pm

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Explained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

ST Correlation and Regression

ST Correlation and Regression Chapter 5 ST 370 - Correlation and Regression Readings: Chapter 11.1-11.4, 11.7.2-11.8, Chapter 12.1-12.2 Recap: So far we ve learned: Why we want a random sample and how to achieve it (Sampling Scheme)

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters Objectives 10.1 Simple linear regression Statistical model for linear regression Estimating the regression parameters Confidence interval for regression parameters Significance test for the slope Confidence

More information

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or.

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or. Chapter Simple Linear Regression : comparing means across groups : presenting relationships among numeric variables. Probabilistic Model : The model hypothesizes an relationship between the variables.

More information

11 Correlation and Regression

11 Correlation and Regression Chapter 11 Correlation and Regression August 21, 2017 1 11 Correlation and Regression When comparing two variables, sometimes one variable (the explanatory variable) can be used to help predict the value

More information

Second Midterm Exam Name: Solutions March 19, 2014

Second Midterm Exam Name: Solutions March 19, 2014 Math 3080 1. Treibergs σιι Second Midterm Exam Name: Solutions March 19, 2014 (1. The article Withdrawl Strength of Threaded Nails, in Journal of Structural Engineering, 2001, describes an experiment to

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Lecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran

Lecture 2 Linear Regression: A Model for the Mean. Sharyn O Halloran Lecture 2 Linear Regression: A Model for the Mean Sharyn O Halloran Closer Look at: Linear Regression Model Least squares procedure Inferential tools Confidence and Prediction Intervals Assumptions Robustness

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections 2.1 2.3 by Iain Pardoe 2.1 Probability model for and 2 Simple linear regression model for and....................................

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

Topic 10 - Linear Regression

Topic 10 - Linear Regression Topic 10 - Linear Regression Least squares principle Hypothesis tests/confidence intervals/prediction intervals for regression 1 Linear Regression How much should you pay for a house? Would you consider

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing

More information

Linear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables.

Linear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables. Lab 13. Linear Regression www.nmt.edu/~olegm/382labs/lab13r.pdf Note: the things you will read or type on the computer are in the Typewriter Font. All the files mentioned can be found at www.nmt.edu/~olegm/382labs/

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Annoucements. MT2 - Review. one variable. two variables

Annoucements. MT2 - Review. one variable. two variables Housekeeping Annoucements MT2 - Review Statistics 101 Dr. Çetinkaya-Rundel November 4, 2014 Peer evals for projects by Thursday - Qualtrics email will come later this evening Additional MT review session

More information

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression Chapter 12 12-1 North Seattle Community College BUS21 Business Statistics Chapter 12 Learning Objectives In this chapter, you learn:! How to use regression analysis to predict the value of a dependent

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

The Big Picture. Model Modifications. Example (cont.) Bacteria Count Example

The Big Picture. Model Modifications. Example (cont.) Bacteria Count Example The Big Picture Remedies after Model Diagnostics The Big Picture Model Modifications Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison February 6, 2007 Residual plots

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Math 2311 Written Homework 6 (Sections )

Math 2311 Written Homework 6 (Sections ) Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

Simple Linear Regression for the MPG Data

Simple Linear Regression for the MPG Data Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory

More information

Nonlinear Regression Functions

Nonlinear Regression Functions Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.

More information