Chapter 16: Understanding Relationships - Numerical Data

These notes reflect material from our text, Statistics: Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015.

Linear models

For two quantitative variables, it is often convenient to distinguish between an explanatory (predictor) variable and a response (predicted) variable, denoted x and y, respectively. The means, µ_x and µ_y, the standard deviations, σ_x and σ_y, and the correlation coefficient, ρ, describe a population. Regressing y on x yields a linear model, y = β0 + β1 x, describing the population. An association between the variables x and y is characterized by its direction (positive or negative), its form (linear or non-linear), and its strength (which for linear relationships is measured by the correlation).

The sample means, x̄ and ȳ, the sample standard deviations, s_x and s_y, and the sample correlation coefficient, r, describe a sample taken from the population. Point estimates for β0 and β1 are determined from the sample and are denoted b0 and b1. The linear model for the sample takes the form ŷ = b0 + b1 x. The residual, e_i = y_i − ŷ_i, measures the distance between the actual value, y_i, and the predicted value, ŷ_i, corresponding to a particular x_i. Regression analysis uses properties of a linear model constructed from a sample to infer properties of the linear relationship in the corresponding population.

Least squares line

Conditions for least squares: (1) a nearly linear relationship, (2) nearly normal residuals, (3) with nearly constant variability.

Formulas for the regression coefficients: b1 = r (s_y / s_x), b0 = ȳ − b1 x̄.

Use the least squares line to predict y from x: ŷ = b0 + b1 x.

The center of mass of the sample lies on the least squares line: ȳ = b0 + b1 x̄.

The squared correlation, r², gives the proportion of the variance of the response variable explained by the explanatory variable.
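As a quick check of these formulas, here is a minimal sketch using a small made-up sample (not the athlete data) that compares the hand-computed coefficients to the ones R's lm produces:

```r
# tiny made-up sample, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

b1 <- cor(x, y) * sd(y) / sd(x)   # b1 = r * s_y / s_x
b0 <- mean(y) - b1 * mean(x)      # b0 = y-bar - b1 * x-bar

fit <- lm(y ~ x)
all.equal(as.numeric(coef(fit)), c(b0, b1))   # TRUE

# the center of mass lies on the line: y-bar = b0 + b1 * x-bar
all.equal(mean(y), b0 + b1 * mean(x))         # TRUE
```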
Two quantitative variables

We illustrate simple regression with one of the examples explored by Agresti and Franklin in chapter 12: a data set describing 57 female high school athletes and their performances in several athletic activities. Read in the data set, select two athletic activities, and generate a scatterplot. We use x and y to describe these activities, rather than more descriptive names, to suggest that this type of analysis is widely applicable.

Spring 2016 Page 1 of 13

```r
athletes <- read.csv("high_school_female_athletes.csv", header=TRUE)
head(athletes)
str(athletes)
summary(athletes)

x <- athletes$brtf..60.        # number of 60 lb bench presses
y <- athletes$x1rm.bench..lbs. # maximum bench press

plot(x, y, pch=19, col="darkred",
     xlab="number of 60 lb bench presses",
     ylab="maximum bench press (lbs)",
     main="Female High School Athletes")
```

[Figure: scatterplot, Female High School Athletes; maximum bench press (lbs) against number of 60 lb bench presses.]

Is there a suggestion of a linear relationship here? Use R's lm procedure to calculate a linear model for this data.

```r
plot(x, y, pch=19, col="darkred",
     xlab="number of 60 lb bench presses",
     ylab="maximum bench press (lbs)",
     main="Female High School Athletes")
athletes.lm <- lm(y ~ x)
abline(athletes.lm, col="orange")
```

[Figure: scatterplot with fitted line, Female High School Athletes (lm).]

A linear relationship in this context is described by an equation of the form ŷ = a + bx, where the coefficients a and b are part of the linear model. Create a function which calculates ŷ given x and use it to calculate a point along the regression line. The second student in the data set had an x value of 12. What value of y would this linear model predict for the second student?

```r
coefficients(athletes.lm)
# (Intercept)           x

predict.y.hat <- function(x){
  a <- coefficients(athletes.lm)[1]
  b <- coefficients(athletes.lm)[2]
  y.hat <- as.numeric(a + b * x)
  return(y.hat)
}
predict.y.hat(12)
```

We can use R's function predict to do the same calculation.

```r
# use predict
new.data <- data.frame(x=12)
predict(athletes.lm, new.data)
```

R's predict can calculate the predictions for every x in the data set.

```r
# calculate y.hat for each student
y.hat <- predict(athletes.lm, data.frame(x, y))
head(data.frame(x, y, y.hat))
#    x   y   y.hat
```

A residual is the difference between an actual y and the predicted ŷ. Verify that the second student's residual is e = y − ŷ.

Testing for association

Do the data plausibly cluster around this least squares line? Just how much evidence is there of a linear relationship in these data? We will test the null hypothesis that there is no linear relationship against the alternative that there is one. If the regression line is horizontal, then knowing something about x gives no usable information about y, so there would be no association between these two variables. The key idea, therefore, is to determine whether the slope of the actual (population) regression line could plausibly be 0 or, equivalently, whether the correlation between the two variables is 0. We organize the discussion as a two-sided hypothesis test. Some key statistics are contained in the summary of the linear model for the associated sample.

H0: β = 0
Ha: β ≠ 0

```r
# are the two variables associated?
summary(athletes.lm)
# Call:
# lm(formula = y ~ x)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)                              < 2e-16 ***
# x                                           e-14 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error:        on 55 degrees of freedom
# Multiple R-squared:       , Adjusted R-squared:
# F-statistic:        on 1 and 55 DF,  p-value: 6.481e-14
```

The value of the slope b in the sample linear model ŷ = a + bx is the Estimate in the row labeled x. Its standard error is the next number to the right in that row, under the heading Std. Error. Use b and its SE to calculate the test statistic and then determine its p-value.

```r
# HT
# H_0 : beta == 0
# H_a : beta != 0
b  <- coef(summary(athletes.lm))["x", "Estimate"]
se <- coef(summary(athletes.lm))["x", "Std. Error"]
t  <- (b - 0) / se
n  <- length(x)
p.value <- 2 * (1 - pt(abs(t), df=n-2))
# 6.481e-14
```

The p-value is very small, so we reject the null hypothesis in favor of the alternative and conclude that the two quantitative variables are associated.

A confidence interval centered on the statistic b provides a range of plausible values for the slope β of the (population) regression line.

```r
alpha <- 0.05
t.star <- qt(1 - alpha/2, df=n-2)
ci <- b + t.star * se * c(-1, 1); ci
```

We are 95% confident that the resulting confidence interval contains the population parameter β. Note that this interval does not contain the value 0, so once again we conclude that these two quantitative variables are associated.

The F statistic reported in the summary of the simple linear regression model is an alternate test statistic for the hypothesis H0: β = 0, and in fact it equals the square of the t statistic that we have used for the same purpose. The p-value obtained from the F statistic is exactly the same as the p-value obtained from the t statistic. F distributions will play a larger role in multiple linear regression.

Strength of the association

When working with categorical variables, we used the chi-square test to determine whether the variables were associated, and then turned to measures of association, such as differences of proportions and relative risk, to gauge the strength of the association. For quantitative variables, the correlation measures the strength of the association. The correlation is a number between -1 and 1. Values near 1 and -1 reflect the strongest (positive and negative, respectively) associations.
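The same computations can be checked end to end on simulated data (synthetic x and y, not the athlete data): the hand-built t statistic and confidence interval agree with what summary and confint report.

```r
set.seed(1)
x <- 1:30
y <- 5 + 0.8 * x + rnorm(30, sd = 2)   # simulated sample with a true slope of 0.8
fit <- lm(y ~ x)

co <- coef(summary(fit))
b  <- co["x", "Estimate"]
se <- co["x", "Std. Error"]
t  <- (b - 0) / se                     # matches the t value in the summary
n  <- length(x)
p.value <- 2 * (1 - pt(abs(t), df = n - 2))   # matches Pr(>|t|) in the summary

t.star <- qt(0.975, df = n - 2)
ci <- b + t.star * se * c(-1, 1)       # matches confint(fit)["x", ]
```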
A correlation of 0 means that the two variables have no linear association.

```r
# correlation
cor(x, y)
```

Correlation matrix

R's function cor can also return a matrix of correlations. Let's add two more athletic activities to the mix, a leg press and a 40 yard dash. Which activities are most strongly associated? Which have the weakest association? Can you imagine why? What is the interpretation of the negative numbers in this matrix?

```r
# matrix of correlations
# x : bench press
# y : max bench press
# add two more exercises
z <- athletes$lp.rtf        # leg press
w <- athletes$x40.yd..sec.  # 40 yd run
corr.matrix <- cor(data.frame(x, y, z, w))
corr.matrix
#    x   y   z   w
```

Interpret this visualization of the correlation matrix.

```r
library(corrplot)
corrplot(corr.matrix, method="circle")
```

[Figure: corrplot of the correlation matrix for x, y, z, w.]
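The negative entries arise because the 40 yard dash is measured in seconds, where smaller is better. A sketch with simulated data (the `strength` and `run.time` variables here are made up) shows how a "smaller is better" measurement correlates negatively with a strength measure:

```r
set.seed(5)
strength <- rnorm(30, mean = 50, sd = 10)
# stronger athletes run faster, so larger strength goes with smaller time
run.time <- 20 - 0.1 * strength + rnorm(30, sd = 0.5)
cor(strength, run.time)   # negative
```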

Regression toward the Mean

The equation of the regression line is ŷ = b0 + b1 x, where b0 = ȳ − b1 x̄ and b1 = r s_y / s_x, so we can rewrite it as

ŷ − ȳ = b1 (x − x̄) = r (s_y / s_x) (x − x̄).

Now choose x one standard deviation to the right of x̄, so x − x̄ = s_x. The corresponding predicted value ŷ is given by ŷ − ȳ = r s_y, so the predicted value ŷ lies r times one standard deviation s_y above ȳ, and of course |r| ≤ 1. Therefore, if x moves one standard deviation to the right of its mean, x = x̄ + s_x, then the predicted ŷ moves only r s_y above its mean, ŷ = ȳ + r s_y. Sons of tall fathers are likely shorter than their dads. Sons of short fathers are likely taller than their dads. This was first noticed by the famous pioneer of statistics, Francis Galton (1822-1911), and it is called regression toward the mean.

[Figure: Regression toward the Mean. The regression line ŷ = a + bx is flatter than the line y = x; moving s_x to the right of (x̄, ȳ) raises the prediction by only r s_y.]
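This algebra is easy to confirm numerically. A sketch with simulated father/son-style data (the means and spreads are assumed values) checks that the prediction at x̄ + s_x sits exactly r · s_y above ȳ:

```r
set.seed(2)
x <- rnorm(100, mean = 70, sd = 3)      # "fathers' heights" (simulated)
y <- 35 + 0.5 * x + rnorm(100, sd = 2)  # "sons' heights" (simulated)
fit <- lm(y ~ x)
r <- cor(x, y)

# prediction one standard deviation above the mean of x
y.hat <- predict(fit, data.frame(x = mean(x) + sd(x)))

# equals y-bar + r * s_y: only a fraction r of one sd above y-bar
all.equal(as.numeric(y.hat), mean(y) + r * sd(y))   # TRUE
```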

Standardized residuals

How do data vary around the regression line? Residuals tell the story, but standardized residuals are more informative, in the same way that a z-score tells how many standard deviations a result lies from a reference value.

```r
standardized.residuals <- rstandard(athletes.lm)
hist(standardized.residuals, col="orangered")
```

[Figure: histogram of standardized.residuals.]

MSE and RSE

A basic assumption of simple linear regression is that for each fixed x, the y values are normally distributed with mean ŷ and standard deviation σ. A single value σ describes the spread of the normal distributions about their means for every one of the x's. The value of σ can be estimated from the data. The mean square error, MSE, estimates the common variance of those normal distributions, and the square root of the MSE, known as the residual standard error, RSE, is the very important estimate of σ. The RSE and related statistics appear in the output of R's procedure aov (analysis of variance). The MSE is the residual sum of squares, Residual SS, divided by its degrees of freedom, n − 2, and the RSE is the square root of the MSE.

```r
aov(athletes.lm)
# Call:
#    aov(formula = athletes.lm)
#
# Terms:
#                  x Residuals
# Sum of Squares
# Deg. of Freedom  1        55
#
# Residual standard error:
# Estimated effects may be unbalanced

residual.ss <- sum(resid(athletes.lm)^2)
df <- 55
mse <- residual.ss / df
rse <- sqrt(mse)
```

Prediction

Two types of prediction are important in this context. Given x, we would like to predict plausible values for µ_y (the population mean response) with a confidence interval, CI, and we would like to predict y values for individuals sharing that value of x with a prediction interval, PI. The PI will be wider than the associated CI because the PI encompasses a great deal of individual variation, while the CI is a confidence interval for a (much more constrained) mean. In the following approximate formulas (Agresti and Franklin, 3e, p. 611), the RSE plays the role of σ, so these formulas resemble previous confidence intervals for means and values.
```r
# approximate CI for the population mu_y
ci <- y.hat + t.star * rse / sqrt(n) * c(-1, 1)
# approximate PI for individual y values
pi <- y.hat + t.star * rse * c(-1, 1)
```

Here t* is calculated with an R command such as `t.star <- qt(0.975, df = n - 2)`, and the residual standard error, RSE, is obtained from the summary of the linear model or by calling aov on the linear model: summary(athletes.lm) or aov(athletes.lm).
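On simulated data (synthetic x and y, not the athlete data), the residual-sum-of-squares computation above can be checked against the sigma value that summary reports:

```r
set.seed(3)
x <- 1:40
y <- 2 + 0.5 * x + rnorm(40)   # simulated sample
fit <- lm(y ~ x)

residual.ss <- sum(resid(fit)^2)   # Residual SS
n <- length(x)
mse <- residual.ss / (n - 2)       # mean square error
rse <- sqrt(mse)                   # residual standard error, the estimate of sigma

all.equal(rse, summary(fit)$sigma)   # TRUE
```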

Confidence and Prediction Intervals Using Predict

For more accurate confidence and prediction intervals, use R's predict.

```r
# confidence and prediction intervals using predict
?predict

# 95% CI for mu_y given x == 12
new.data <- data.frame(x=12)
predict(athletes.lm, new.data, interval="confidence")
#   fit  lwr  upr

# 95% PI for y given x == 12
predict(athletes.lm, new.data, interval="prediction")
#   fit  lwr  upr
```

Using predict to calculate confidence and prediction intervals for a whole range of x values produces confidence and prediction bands. Notice that the confidence band is narrowest near (x̄, ȳ) = (10.98, 79.91).

[Figure: Female High School Athletes, confidence and prediction bands; maximum bench press (lbs) against number of 60 lb bench presses.]
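A sketch with simulated data (the sample here is made up) confirms that the prediction interval returned by predict is wider than the corresponding confidence interval at the same x:

```r
set.seed(4)
x <- rnorm(50, mean = 10, sd = 2)
y <- 30 + 4 * x + rnorm(50, sd = 5)   # simulated sample
fit <- lm(y ~ x)

new.data <- data.frame(x = 12)
ci <- predict(fit, new.data, interval = "confidence")
pi <- predict(fit, new.data, interval = "prediction")

# same fitted value, but the PI is wider than the CI
(pi[, "upr"] - pi[, "lwr"]) > (ci[, "upr"] - ci[, "lwr"])   # TRUE
```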

Outline for Presenting a Hypothesis Test

Agresti and Franklin suggest a five-step outline for presenting hypothesis tests such as the ones we use in this chapter. Here is a sketch of the approach they recommend.

Assumptions. We assume randomization, normal conditional distributions for y given x with a linear trend for the means of those distributions, and a common standard deviation for all of them.

Hypotheses. The null hypothesis is that the variables are independent, and the alternative hypothesis is that they are dependent (associated). H0: β = 0; Ha: β ≠ 0.

Test Statistic. The slope b of the sample regression line and its standard error, SE, are found in the Coefficients section of the summary of the linear model: t = b/SE.

p-value. The p-value is calculated with an R command such as `p.value <- 2 * (1 - pt(abs(t), df = n - 2))`.

Conclusion in Context. Is there sufficient evidence to reject H0 or not? What does this mean in the context of this particular investigation?

Outline for Presenting a Confidence Interval

Confidence Interval. A 95% confidence interval for the population parameter β is given by b ± t* · SE, where b and SE are as in the associated hypothesis test and t* is calculated with an R command such as `t.star <- qt(0.975, df = n - 2)`.

Conclusion in Context. The confidence interval provides a range of plausible values for the population parameter β. State clearly what this means in the context of the present study.

Analyzing Association

Associations involve explanatory variables and response variables. Order them like this: explanatory → response.

categorical → categorical (Peck, chapter 15)
- r × c contingency table, test for independence
- 1 × c contingency table, goodness of fit
- Test for independence or goodness of fit with a χ² test statistic

quantitative → quantitative (Peck, chapters 4, 16)
- Linear model for the population: µ_y = β0 + β1 x1 + β2 x2 + ...
- Linear model describing the sample: ŷ = b0 + b1 x1 + b2 x2 + ...
- Test the relevance of the model with an F test statistic: H0: all βi's are 0
- Estimate the parameters βi with t statistics and confidence intervals

(quantitative and categorical) → quantitative
- Subsume this case into the previous one with indicator variables

categorical → quantitative (Peck, chapter 17)
- The categorical variable divides the quantitative measurements into groups, and the question becomes one of comparing the mean responses of the groups
- Test that all of the means are the same with an F test (ANOVA): H0: β1 = ... = βg
- Find which means differ with t tests and confidence intervals for βi − βj
- Control the significance level for multiple comparisons with Tukey's HSD

quantitative → categorical (Peck, chapter 4)
- Use quantitative variables to predict a categorical variable with logistic regression

Exercises

We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework, write them up carefully and clearly, and hand them in at the beginning of class next Friday.

Homework 16a, regression. Exercises from Chapter 16: 16.2 (house price), 16.3 (house price), 16.9 (cancer), (marketing), (R&D).

Homework 16b, regression. Exercises from Chapter 16: (money), (grasslands), (shrimp), (skulls), (turtles).

More information

Lecture 17. Ingo Ruczinski. October 26, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 17. Ingo Ruczinski. October 26, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 17 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University October 26, 2015 1 2 3 4 5 1 Paired difference hypothesis tests 2 Independent group differences

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information. STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

STAT22200 Spring 2014 Chapter 5

STAT22200 Spring 2014 Chapter 5 STAT22200 Spring 2014 Chapter 5 Yibi Huang April 29, 2014 Chapter 5 Multiple Comparisons Chapter 5-1 Chapter 5 Multiple Comparisons Note the t-tests and C.I. s are constructed assuming we only do one test,

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

THE PEARSON CORRELATION COEFFICIENT

THE PEARSON CORRELATION COEFFICIENT CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 Work all problems. 60 points needed to pass at the Masters level, 75 to pass at the PhD

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

SIMPLE REGRESSION ANALYSIS. Business Statistics

SIMPLE REGRESSION ANALYSIS. Business Statistics SIMPLE REGRESSION ANALYSIS Business Statistics CONTENTS Ordinary least squares (recap for some) Statistical formulation of the regression model Assessing the regression model Testing the regression coefficients

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of

More information

Simple linear regression

Simple linear regression Simple linear regression Business Statistics 41000 Fall 2015 1 Topics 1. conditional distributions, squared error, means and variances 2. linear prediction 3. signal + noise and R 2 goodness of fit 4.

More information

Regression Analysis: Exploring relationships between variables. Stat 251

Regression Analysis: Exploring relationships between variables. Stat 251 Regression Analysis: Exploring relationships between variables Stat 251 Introduction Objective of regression analysis is to explore the relationship between two (or more) variables so that information

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Chapter 8: Correlation & Regression

Chapter 8: Correlation & Regression Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept

Interactions. Interactions. Lectures 1 & 2. Linear Relationships. y = a + bx. Slope. Intercept Interactions Lectures 1 & Regression Sometimes two variables appear related: > smoking and lung cancers > height and weight > years of education and income > engine size and gas mileage > GMAT scores and

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,

More information

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1 9.1 Scatter Plots and Linear Correlation Answers 1. A high school psychologist wants to conduct a survey to answer the question: Is there a relationship between a student s athletic ability and his/her

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

23. Inference for regression

23. Inference for regression 23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence

More information

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE Course Title: Probability and Statistics (MATH 80) Recommended Textbook(s): Number & Type of Questions: Probability and Statistics for Engineers

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

movies Name:

movies Name: movies Name: 217-4-14 Contents movies.................................................... 1 USRevenue ~ Budget + Opening + Theaters + Opinion..................... 6 USRevenue ~ Opening + Opinion..................................

More information