Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Similar documents
Chapter 9 - Correlation and Regression

INFERENCE FOR REGRESSION

Mrs. Poyner/Mr. Page Chapter 3 page 1

Linear Regression Communication, skills, and understanding Calculator Use

Practical Biostatistics

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice

Chapter 2: Looking at Data Relationships (Part 3)

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Review of Multiple Regression

Test 3A AP Statistics Name:

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

Examining Relationships. Chapter 3

UNIT 12 ~ More About Regression

REVIEW 8/2/2017 陈芳华东师大英语系

Scatterplots and Correlation

Chapter 3: Examining Relationships

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Analysis of Bivariate Data

9. Linear Regression and Correlation

IT 403 Practice Problems (2-2) Answers

Quantitative Bivariate Data

Simple Linear Regression

Practice Questions for Exam 1

ECON 497 Midterm Spring

Section 3.3. How Can We Predict the Outcome of a Variable? Agresti/Franklin Statistics, 1of 18

Chapter 12 : Linear Correlation and Linear Regression

IF YOU HAVE DATA VALUES:

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or.

Correlation and Linear Regression

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:

( ), which of the coefficients would end

Business Statistics. Lecture 10: Correlation and Linear Regression

Chapter 3: Describing Relationships

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

1 A Review of Correlation and Regression

Example: Can an increase in non-exercise activity (e.g. fidgeting) help people gain less weight?

Unit 6 - Introduction to linear regression

AP Statistics - Chapter 2A Extra Practice

1 Correlation and Inference from Regression

7.0 Lesson Plan. Regression. Residuals

Correlation and Regression

Study Guide AP Statistics

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Stat 101 Exam 1 Important Formulas and Concepts 1

SMAM 319 Exam1 Name. a B.The equation of a line is 3x + y =6. The slope is a. -3 b.3 c.6 d.1/3 e.-1/3

Regression. Marc H. Mehlman University of New Haven

Intro to Linear Regression

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

Information Sources. Class webpage (also linked to my.ucdavis page for the class):

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

AP Statistics Unit 2 (Chapters 7-10) Warm-Ups: Part 1

AP Statistics L I N E A R R E G R E S S I O N C H A P 7

Chapter 5 Friday, May 21st

Example: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV)

Inferences for Regression

Chapter 7. Linear Regression (Pt. 1) 7.1 Introduction. 7.2 The Least-Squares Regression Line

This document contains 3 sets of practice problems.

Ch 13 & 14 - Regression Analysis

Mathematics for Economics MA course

1. Use Scenario 3-1. In this study, the response variable is

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data?

AMS 7 Correlation and Regression Lecture 8

bivariate correlation bivariate regression multiple regression

Pre-Calculus Multiple Choice Questions - Chapter S8

Chapter 5 Least Squares Regression

1) A residual plot: A)

AP STATISTICS Name: Period: Review Unit IV Scatterplots & Regressions

Unit 6 - Simple linear regression

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

Inference for Regression Inference about the Regression Model and Using the Regression Line

CHAPTER 5 LINEAR REGRESSION AND CORRELATION

What is the easiest way to lose points when making a scatterplot?

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

Business Statistics. Lecture 9: Simple Regression

Chapter 7 Linear Regression

Sampling, Frequency Distributions, and Graphs (12.1)

Chapter 9. Correlation and Regression

Regression Equation. April 25, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

Regression Equation. November 28, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

Chapter 3 Assignment

3.2: Least Squares Regressions

Correlation and simple linear regression S5

Math 2311 Written Homework 6 (Sections )

Chapter 10 Correlation and Regression

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

7. Do not estimate values for y using x-values outside the limits of the data given. This is called extrapolation and is not reliable.

Review of Statistics 101

Intro to Linear Regression

Chapter 3: Examining Relationships Review Sheet

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Correlation Analysis

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Midterm 2 - Solutions

Chapter 4: Regression Models

Assoc.Prof.Dr. Wolfgang Feilmayr Multivariate Methods in Regional Science: Regression and Correlation Analysis REGRESSION ANALYSIS

The following formulas related to this topic are provided on the formula sheet:

SPSS LAB FILE 1

CHAPTER 4 DESCRIPTIVE MEASURES IN REGRESSION AND CORRELATION

Transcription:

Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There a Relationship? 7 1

Definition Data for a single variable is called univariate data, for two variables, bivariate data, and for more than two, it is called multivariate data. Methods for studying relationships: Graphical Scatterplots Line plots 3-D plots Models Linear regression Correlations Frequency tables Chapter 7: Is There a Relationship? 7 2

Two Quantitative Variables Definition The response variable,alsocalledthe dependent variable, is the variable we want to predict, and is usually denoted by y. Definition The explanatory variable,alsocalledthe independent variable, is the variable that attempts to explain the response, and is denoted by x. Let s do it! 7.1 Response Height of son Explanatory Height of the father, height of the mother, age Weight Chapter 7: Is There a Relationship? 7 3

Scatterplots Example The following table gives the partial list of midterm score x and the final score y for 25 students in a particular class. Student x y 1 39 62 2 44 69 3 32 68 4 40 86 5 45 88.5 6 46 88.5 7 33 76 8 39 66.5 9 32.5 75 10 21 38 : : : Chapter 7: Is There a Relationship? 7 4

Scatterplot of Final vs Midterm Score 120 100 80 Final Score (y) 60 40 20 0 0 5 10 15 20 25 30 35 40 45 50 Midterm Score (x) Chapter 7: Is There a Relationship? 7 5

Trends Chapter 7: Is There a Relationship? 7 6

Let s do it! 7.3 What kind of relationship would you expect in the following situations: 1. age (in years) of a car, and its price. 2. number of calories consumed per day and weight. 3. height and IQ of a person. Chapter 7: Is There a Relationship? 7 7

Let s do it! 7.4 For each of the two descriptions given, Identify the two variables that vary and decide which should be the independent variable and which should be the dependent variable. Sketch a graph that you think best represents the relationship between the two variables Write a sentence that explains the shape and behavior shown in your graph. 1. The size of a persons vocabulary over his or her lifetime. 2. The distance from the ceiling to the tip of the minute hand of a clock hung on the wall. Chapter 7: Is There a Relationship? 7 8

Simple Linear Regression Objective: To find the best fit to the data. Scatterplot of Final vs Midterm Score 120 100 80 Final Score (y) 60 40 yˆ = 7.5214 + 1. 7543 x 20 0 0 5 10 15 20 25 30 35 40 45 50 Midterm Score (x) Equation of a line: ŷ = a + bx where, b = slope; the change in y for a unit change in x a = y intercept; the value of y when x = 0 Chapter 7: Is There a Relationship? 7 9

Scatterplot of Final vs Midterm Score 120 100 80 Final Score (y) 60 y ˆ = Predicted y yˆ = 7.5214 + 1. 7543x 40 difference = residual 20 y = Observed y 0 0 5 10 15 20 25 30 35 40 45 50 Midterm Score (x) Chapter 7: Is There a Relationship? 7 10

Definition A residual is the difference between the observed response y and the predicted response y (determined using the regression line). Thus, for each pair of observations x i, y i, the i th residual is e i = y i y i = y i a + bx Definition The least squares regression line, given by y = a + bx, is the line that minimizes (makes as small as possible) the sum of squared vertical deviations (i.e., the square of the residuals) of the observed points from the line. This is often stated as regress yonx. Chapter 7: Is There a Relationship? 7 11

Let s do it! 7.6 The growth of children from early childhood through adolescence generally follows a linear pattern. Data on the heights of female Americans during childhood, from four to nine years old, were compiled and the least squares regression line was obtained as y = 32 + 2. 4x where y is the predicted height in inches, and x is age in years. Interpret the value of the estimated slope b = 2.4. Would interpretation of the value of the estimated y-intercept, a = 32, make sense here? What would you predict the height to be for a female American at 8 years old? What would you predict the height to be for a female American at 25 years old? How does the quality of this answer compare to the previous question? Chapter 7: Is There a Relationship? 7 12

Calculating the Least Squares Regression Line Find a solution to where, The solution is min y i y i 2 i i.e., min i e i 2 y i = a + bx, e i = y i y i b = x i x y i y x i x 2 a = y b x = n x iy i x i y i n x i 2 x i 2, Chapter 7: Is There a Relationship? 7 13

Student x i y i x i 2 x i y i 1 39 62 1521 2418 2 44 69 3 32 68 4 40 86 5 45 88.5 Total i x i = i y i = i x i 2 = i x i y i = i x i = i y i = i x i 2 = i x i y i = b = n x i y i x i y i n x i 2 x i 2 = a = y b x = Chapter 7: Is There a Relationship? 7 14

Scatterplot of Final vs Midterm Score 100 90 ŷ = 32.247+1.0613x 80 70 Final Score (y) 60 50 40 30 20 10 0 30 32 34 36 38 40 42 44 46 Midterm Score (x) Residual Plot Residuals 20 0-20 0 10 20 30 40 50 X Variable Chapter 7: Is There a Relationship? 7 15

Statistically Significant? Consider the results of regressing the midterm scores x against the final score y for the earlier example with 25 students. Model 1 Model Summary Std. Error Adjusted of the R R Square R Square Estimate.691 a.478.455 14.0211 a. Predictors: (Constant), X Model 1 (Constant) X a. Dependent Variable: Y Unstandardized Coefficients Coefficients a Standardi zed Coefficien ts 95% Confidence Interval for B Lower Upper B Std. Error Beta t Sig. Bound Bound 7.521 14.235.528.602-21.926 36.969 1.754.383.691 4.586.000.963 2.546 Chapter 7: Is There a Relationship? 7 16

Let s do it! 7.9 There are many factors that affect the selling price of a home. The total dwelling size and the assessed value are just two factors. Data were gathered on homes in a Milwaukee, Wisconsin, neighborhood. A scatterplot revealed a linear relationship between the total dwelling size of a home in 100 square feet and its selling price in dollars. The following is the regression output for the least squares regression of selling price on total dwelling size. Chapter 7: Is There a Relationship? 7 17

1. How many homes were included in this study? 2. Obtain the least squares regression line for predicting selling price from the size of the home. 3. Is there evidence of a significant linear relationship between price and size? 4. The total dwelling size for another home in this neighborhood is 1620 square feet. Use the least squares line to estimate the selling price of this home. Chapter 7: Is There a Relationship? 7 18

Residual Analysis Student Number Midterm Final Predicted Y Residuals 1 39 62 75.94-13.94 2 44 69 84.71-15.71 3 32 68 63.66 4.34 4 40 86 77.70 8.30 5 45 88.5 86.47 2.03 6 46 88.5 88.22 0.28 7 33 76 65.41 10.59 8 39 66.5 75.94-9.44 9 32.5 75 64.54 10.46 10 21 38 44.36-6.36 11 30 71 60.15 10.85 12 39 88 75.94 12.06 13 44 96.5 84.71 11.79 14 28.5 71.5 57.52 13.98 15 38 96 74.19 21.81 16 43 82.5 82.96-0.46 17 42 85 81.20 3.80 18 25.5 28 52.26-24.26 19 47 95 89.98 5.02 20 36 39 70.68-31.68 21 31.5 58 62.78-4.78 22 32 49 63.66-14.66 23 42 62 81.20-19.20 24 21 59 44.36 14.64 25 41 90 79.45 10.55 Residual Plot Residuals 50.00 0.00-50.00 0 10 20 30 40 50 X Chapter 7: Is There a Relationship? 7 19

Residual Analysis (continued) Steps to take 1. Plot the residuals e i versus the predicted values y i. If you see NO systematic patterns then the regression model appears appropriate for the data. Systematic patterns indicate that the model may not be appropriate for the data. 2. Plot the residuals e i versus the x i. Again, if you see systematic patterns, the linear regression model may not be appropriate for the data. Chapter 7: Is There a Relationship? 7 20

Influential Points and Outliers Definition An outlier in regression is an observation with a residual that is unusually large (positive or negative) as compared to the other residuals. Definition An influential point in regression is an observation that has a great deal of influence in determining the regression equation. Removing such a point would markedly change the position of the regression line. Observations that are somewhat extreme for the value of x are often influential. Chapter 7: Is There a Relationship? 7 21

Chapter 7: Is There a Relationship? 7 22

Chapter 7: Is There a Relationship? 7 23

Correlation Definition Correlation measures the strength of linear relationship between two variables. It is usually denoted by r. Properties Range 1 r 1. Sign The sign indicates direction of association Negative 1, 0 or positive 0,1. Magnitude The magnitude indicates the strength of the relationship. A r =±1 indicates a straight line, while r = 0 indicates no linear relationship. Units None. Chapter 7: Is There a Relationship? 7 24

Let s do it! 7.13 Match the following correlation values to the graphs. r 0 1.0 1.0 0. 6 0. 2 0. 8 0. 1 Chapter 7: Is There a Relationship? 7 25

Two Qualitative Variables Relationships between two qualitative variables are usually described using frequency or contingency tables. Example Consider the following table: Questions: Nutritional Status Academic Performance Poor Adequate Excellent Total Below Average 70 95 35 200 Average 130 450 30 610 Above Average 90 30 70 190 Total 290 575 135 1000 Chapter 7: Is There a Relationship? 7 26

Marginal and Conditional Distributions Definition The marginal distribution is found by computing the percentage of each row or column total based on the grand total. Definition The conditional distribution of the row variable given the column variable is found by expressing the entries in the original table as percentages of the column total. Similarly, the conditional distribution of the column variable given the row variable is found by expressing the entries in the original table as percentages of the row total. Chapter 7: Is There a Relationship? 7 27

Is There a Significant Relationship? Consider the above example. Crosstabs Nutritional Status Count 1 2 3 1 2 3 70 95 35 200 130 450 30 610 90 30 70 190 290 575 135 1000 Tests Source DF -LogLikelihood RSquare (U) Model 4 121.54736 0.1295 Error 994 817.39991 C Total 998 938.94727 Total Count 1000 Test Likelihood Ratio Pearson ChiSquare 243.095 238.405 Prob>ChiSq <.0001 <.0001 Kappa Std Err 0.275106 0.024136 Kappa measures the degree of agreement. Chapter 7: Is There a Relationship? 7 28