UNIT 12 ~ More About Regression

***SECTION 15.1*** The Regression Model

When a scatterplot shows a linear relationship between an explanatory variable x and a response variable y, we can use the least-squares regression line fitted to the data to predict y for a given value of x. Now we want to do significance tests and confidence intervals in this setting.

Example 1: Crying and IQ
Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children's IQ at age three years using the Stanford-Binet IQ test. The following table contains data on 38 infants. Let us analyze the data. (Be sure to use the DATA ANALYSIS TOOLBOX on pages 93-94!)

Crying   IQ | Crying   IQ | Crying   IQ | Crying   IQ
    10   87 |     20   90 |     17   94 |     12   94
    12   97 |     16  100 |     19  103 |     12  103
     9  103 |     23  103 |     13  104 |     14  106
    16  106 |     27  108 |     18  109 |     10  109
    18  109 |     15  112 |     18  112 |     23  113
    15  114 |     21  114 |     16  118 |      9  119
    12  119 |     12  120 |     19  120 |     16  124
    20  132 |     15  133 |     22  135 |     31  135
    16  136 |     17  141 |     30  155 |     22  157
    33  159 |     13  162

Data: Who? What? Why? When, where, how, and by whom?
Graphs:
Unit 12 ~ Pg. 1
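As a first pass through the toolbox, here is a minimal sketch of the numerical summaries for Example 1. It uses only Python's standard library. The list names `crying` and `iq` are our own, and the values are copied pair by pair from the table in row order.

```python
import statistics

# Crying intensity (peaks in the most active 20 s) and IQ at age three,
# listed pair by pair in the same order as the table.
crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
          23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109, 109,
      112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135, 136,
      141, 155, 157, 159, 162]


def correlation(x, y):
    """Pearson correlation: r = sum(z_x * z_y) / (n - 1)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (len(x) - 1)


x_bar, s_x = statistics.mean(crying), statistics.stdev(crying)
y_bar, s_y = statistics.mean(iq), statistics.stdev(iq)
r = correlation(crying, iq)

print(f"n = {len(crying)} infants")
print(f"Crying: mean = {x_bar:.2f}, sd = {s_x:.2f}")
print(f"IQ:     mean = {y_bar:.2f}, sd = {s_y:.2f}")
print(f"Correlation r = {r:.3f}")
```

The correlation should come out near 0.45, a moderate positive association. It is only a numerical summary, so the scatterplot still has to be examined before a model is chosen.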

Numerical summaries:
Model:
Interpretation:

Conditions for the Regression Model
The slope b and intercept a of the least-squares line are statistics. That is, we calculate them from the data. In Example 1, these statistics would take somewhat different values if we repeated the study with different infants. When we perform formal inference, we think of a and b as estimates of unknown parameters.

Conditions for Regression Inference (L.I.N.E.R.)
We have n observations of an explanatory variable x and a response variable y. Our goal is to study or predict the behavior of y for given values of x.
Linear: The actual relationship between x and y is linear. For any fixed value of x, the mean response falls on the population (true) regression line $\mu_y = \alpha + \beta x$. The slope β and intercept α are usually unknown parameters.
Independent: Individual observations are independent of each other.
Normal: For any fixed value of x, the response y varies according to a Normal distribution.
Equal variance: The standard deviation of y (call it σ) is the same for all values of x. The common standard deviation σ is usually an unknown parameter.
Random: The data come from a well-designed random sample or randomized experiment.
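A small simulation makes the conditions concrete. It shows the data-generating process that the model assumes. The "true" values `ALPHA`, `BETA` and `SIGMA` are hypothetical numbers chosen only for illustration and are not estimates from any study. At each fixed x, the simulated responses should average close to $\alpha + \beta x$ and have a standard deviation close to σ, and that standard deviation should be the same for every x.

```python
import random
import statistics

# Hypothetical true parameters, chosen only for illustration.
ALPHA = 90.0   # true intercept (alpha)
BETA = 1.5     # true slope (beta)
SIGMA = 17.5   # common standard deviation of y about the true line (sigma)


def simulate_responses(x, n_draws, rng):
    """Draw n_draws independent responses at a fixed x under the model
    y = alpha + beta * x + error, with error ~ Normal(0, sigma)."""
    mu_y = ALPHA + BETA * x  # the mean response lies on the true regression line
    return [rng.gauss(mu_y, SIGMA) for _ in range(n_draws)]


rng = random.Random(2024)  # fixed seed so the run is reproducible
for x in (10, 20, 30):
    ys = simulate_responses(x, 10_000, rng)
    print(f"x = {x:2d}: mu_y = {ALPHA + BETA * x:6.1f}, "
          f"sample mean = {statistics.mean(ys):6.1f}, "
          f"sample sd = {statistics.stdev(ys):5.2f}")
```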

The heart of this model is that there is an "on the average" straight-line relationship between y and x. The true regression line $\mu_y = \alpha + \beta x$ says that the mean response moves along a straight line as the explanatory variable changes. We cannot observe the true regression line. The values of y that we do observe vary about their means according to a Normal distribution.

This figure shows the regression model in picture form. The line in the figure is the true regression line. The mean of the response y moves along this line as the explanatory variable x takes different values. The Normal curves show how y will vary when x is held fixed at different values. All of the curves have the same σ, so the variability of y is the same for all values of x.

*YOU SHOULD CHECK THE CONDITIONS FOR INFERENCE WHEN YOU DO INFERENCE ABOUT REGRESSION!

Checking the Regression Conditions (L.I.N.E.R.)
You can fit a least-squares line to any set of explanatory-response data when both variables are quantitative. Before we do inference, however, we must check these conditions one by one.
Linear: Examine the scatterplot to check that the overall pattern is roughly linear. Look for curved patterns in the residual plot. Check that the residuals center on the residual = 0 line at each x-value in the residual plot.
Independent: Look at how the data were produced. Random sampling and random assignment help ensure the independence of individual observations. If sampling is done without replacement, remember to check that the population is at least 10 times as large as the sample (10% condition).
Normal: Make a stemplot, histogram, boxplot, or Normal probability plot of the residuals and check for clear skewness or other major departures from Normality.
Equal variance: Look at the scatter of the residuals above and below the residual = 0 line in the residual plot. The amount of scatter should be roughly the same from the smallest to the largest x-value.
Random: See if the data were produced by random sampling or a randomized experiment.
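The residual-based checks above can be sketched in code for Example 1. This assumes no plotting library, so it is only a rough text stand-in for the residual plot and histogram. Splitting the data at the median crying count is our own simple way to compare the residual spread for small and large x. It is not a formal test.

```python
import statistics

crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
          23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109, 109,
      112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135, 136,
      141, 155, 157, 159, 162]


def least_squares(x, y):
    """Return intercept a and slope b of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = least_squares(crying, iq)
residuals = [yi - (a + b * xi) for xi, yi in zip(crying, iq)]

# Linear / Equal variance: residuals should center on 0, with similar
# spread for small and large x (split at the median crying count).
cut = statistics.median(crying)
low = [e for xi, e in zip(crying, residuals) if xi <= cut]
high = [e for xi, e in zip(crying, residuals) if xi > cut]
print(f"mean residual = {statistics.mean(residuals):.2e}")
print(f"residual sd for x <= {cut}: {statistics.stdev(low):.1f}")
print(f"residual sd for x >  {cut}: {statistics.stdev(high):.1f}")

# Normal: a crude text histogram of the residuals (look for clear skewness).
for lo in range(-40, 60, 10):
    count = sum(lo <= e < lo + 10 for e in residuals)
    print(f"[{lo:4d}, {lo + 10:4d}) {'*' * count}")
```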

Example 2: Crying and IQ
Let us check the conditions for Example 1.

Estimating Parameters
The first step in inference is to estimate the unknown parameters α, β, and σ. When the regression model describes our data and we calculate the least-squares line $\hat{y} = a + bx$, the slope b of the least-squares line is an unbiased estimator of the true slope β, and the intercept a of the least-squares line is an unbiased estimator of the true intercept α.

Example 3: Crying and IQ
Let us find and interpret the slope and intercept for Example 1.
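For Example 3, the least-squares estimates can be computed with the familiar formulas $b = r\,s_y/s_x$ and $a = \bar{y} - b\bar{x}$. Below is a minimal standard-library sketch. The data from Example 1 are re-entered so that the block runs on its own, and the function name `least_squares` is our own.

```python
import statistics

crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
          23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109, 109,
      112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135, 136,
      141, 155, 157, 159, 162]


def least_squares(x, y):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(crying, iq)
print(f"IQ-hat = {a:.2f} + {b:.4f} * crying")
```

The slope should come out near 1.49. Predicted IQ therefore rises by about 1.5 points for each additional crying peak. The intercept is the predicted IQ for zero peaks, which lies far below the smallest observed crying count (9), so it has no practical interpretation here.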

The remaining parameter is σ, which describes the variability of the response y about the true regression line. The residuals estimate how much y varies about the true line. Recall that the residuals are the vertical deviations of the data points from the least-squares line:
residual = observed y - predicted y = $y - \hat{y}$
There are n residuals, one for each data point. Because σ is the standard deviation of responses about the true regression line, we estimate it by a sample standard deviation of the residuals. We saw this error measure before in Chapter 3 (Pg. 218). We call this sample standard deviation a standard error to emphasize that it is estimated from data. The residuals from a least-squares line always have mean zero, which simplifies their standard error.

Standard Error about the Least-Squares Line
The standard error about the line is
$s = \sqrt{\dfrac{\sum \text{residual}^2}{n-2}} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n-2}}$
Use s to estimate the unknown σ in the regression model.

Example 4: Crying and IQ
Let us calculate the standard error for Example 1.
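For Example 4, here is a minimal sketch of the standard-error formula. It fits the least-squares line, forms the n residuals $y - \hat{y}$, and divides their sum of squares by n - 2. The data from Example 1 are re-entered so that the block stands alone, and the function name is our own.

```python
import math
import statistics

crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
          23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109, 109,
      112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135, 136,
      141, 155, 157, 159, 162]


def standard_error_about_line(x, y):
    """s = sqrt(sum of squared residuals / (n - 2)) for the least-squares line."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed y - predicted y
    return math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))


s = standard_error_about_line(crying, iq)
print(f"s = {s:.2f} IQ points")
```

With these data, s should be about 17.5 IQ points.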

Confidence Intervals for the Regression Slope
The slope β of the true regression line is usually the most important parameter in a regression problem. The slope is the rate of change of the mean response as the explanatory variable increases. We often want to estimate β. The slope b of the least-squares line is an unbiased estimator of β. A confidence interval is more useful than b alone because it shows how accurate the estimate b is likely to be. The confidence interval for β has the familiar form
estimate ± $t^* \cdot SE_{\text{estimate}}$
Because b is our estimate, the confidence interval becomes $b \pm t^* SE_b$.

Confidence Interval for Regression Slope
A level C confidence interval for the slope β of the true regression line is
$b \pm t^* SE_b$
In this expression, the standard error of the least-squares slope b is
$SE_b = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}}$
and $t^*$ is the critical value for the t(n - 2) density curve with area C between $-t^*$ and $t^*$.

Example 5: Crying and IQ
Let us examine regression output for Example 1.
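Statistical software reports $SE_b$ directly, but the interval can also be computed from scratch. Python's standard library has no t distribution. This sketch therefore uses a hand-rolled one: it integrates the t density with Simpson's rule and finds $t^*$ by bisection. The helper names (`t_pdf`, `t_area_0_to`, `t_critical`, `slope_ci`) are hypothetical, and in practice a statistics package or a t table would supply $t^*$.

```python
import math
import statistics

crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
          23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109, 109,
      112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135, 136,
      141, 155, 157, 159, 162]


def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_area_0_to(t, df, steps=2000):
    """Area under the t(df) density between 0 and t >= 0 (Simpson's rule, even steps)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return area * h / 3


def t_critical(C, df):
    """t* such that the t(df) curve has area C between -t* and t* (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * t_area_0_to(mid, df) < C:
            lo = mid  # central area too small: t* is larger
        else:
            hi = mid
    return (lo + hi) / 2


def slope_ci(x, y, C=0.95):
    """Level C confidence interval b +/- t* SE_b for the true slope beta."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b = s / math.sqrt(sxx)
    t_star = t_critical(C, n - 2)
    return b, se_b, t_star, (b - t_star * se_b, b + t_star * se_b)


b, se_b, t_star, (low, high) = slope_ci(crying, iq)
print(f"b = {b:.4f}, SE_b = {se_b:.4f}, t* = {t_star:.3f} (df = {len(crying) - 2})")
print(f"95% CI for beta: ({low:.3f}, {high:.3f})")
```

For these data the 95% interval should be roughly 0.51 to 2.48 IQ points per crying peak. The interval excludes 0.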

Testing the Hypothesis of No Linear Relationship
The most common hypothesis about the slope is $H_0: \beta = 0$. A regression line with slope 0 is horizontal. That is, the mean of y does not change when x changes. So this $H_0$ says that there is no true linear relationship between x and y. In other words, $H_0$ says there is no correlation between x and y in the population from which we drew our data.
* TRICK: You can use the test for zero slope to test the hypothesis of zero correlation between any two quantitative variables.
* NOTE: Testing for correlation makes sense only if the observations are a random sample. In regression settings, this is often not the case, because researchers may fix in advance the values of x they want to study.

Significance Tests for Regression Slope
To test the hypothesis $H_0: \beta = 0$, compute the t statistic
$t = \dfrac{b}{SE_b}$
In terms of a random variable T having the t(n - 2) distribution, the P-value for a test of $H_0$ against
$H_a: \beta > 0$ is $P(T \ge t)$
$H_a: \beta < 0$ is $P(T \le t)$
$H_a: \beta \ne 0$ is $2P(T \ge |t|)$
This test is also a test of the hypothesis that the correlation is 0 in the population.
* Regression output from statistical software usually gives t and its two-sided P-value. So, for a one-sided test, be sure to divide the P-value in the output by 2.

Example 6: Crying and IQ
Let us test the regression slope for Example 1.
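Here is a minimal sketch of the slope t test for Example 6. It computes $t = b/SE_b$ and the P-value for each of the three alternatives listed above. The standard library has no t distribution, so tail areas come from a hand-rolled Simpson's-rule integration of the t density. That choice and the names `t_upper_tail` and `slope_t_test` are assumptions of this sketch and not a library API.

```python
import math
import statistics

crying = [10, 20, 17, 12, 12, 16, 19, 12, 9, 23, 13, 14, 16, 27, 18, 10, 18, 15, 18,
          23, 15, 21, 16, 9, 12, 12, 19, 16, 20, 15, 22, 31, 16, 17, 30, 22, 33, 13]
iq = [87, 90, 94, 94, 97, 100, 103, 103, 103, 103, 104, 106, 106, 108, 109, 109, 109,
      112, 112, 113, 114, 114, 118, 119, 119, 120, 120, 124, 132, 133, 135, 135, 136,
      141, 155, 157, 159, 162]


def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def slope_t_test(x, y, alternative="two-sided"):
    """t test of H0: beta = 0. alternative is "greater" (Ha: beta > 0),
    "less" (Ha: beta < 0), or "two-sided" (Ha: beta != 0)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    t = b / (s / math.sqrt(sxx))
    tail = t_upper_tail(t, n - 2)          # P(T >= |t|)
    if alternative == "greater":
        p = tail if t >= 0 else 1 - tail   # P(T >= t)
    elif alternative == "less":
        p = tail if t <= 0 else 1 - tail   # P(T <= t)
    elif alternative == "two-sided":
        p = 2 * tail                       # 2 P(T >= |t|)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, n - 2, p


t, df, p = slope_t_test(crying, iq)
_, _, p_greater = slope_t_test(crying, iq, alternative="greater")
print(f"t = {t:.2f}, df = {df}, two-sided P = {p:.4f}, one-sided (beta > 0) P = {p_greater:.4f}")
```

With b > 0, the one-sided P-value is half the two-sided one. This matches the "divide by 2" rule for software output.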

Example 7: Beer and blood alcohol
Let us look at how well the number of beers a student drinks predicts his or her blood alcohol content (BAC). Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their BAC. We will perform a linear regression t test. Here are the data:

Student:  1     2     3     4     5     6      7     8
Beers:    5     2     9     8     3     7      3     5
BAC:      0.10  0.03  0.19  0.12  0.04  0.095  0.07  0.06

Student:  9     10    11    12    13     14    15    16
Beers:    3     5     4     6     5      7     1     4
BAC:      0.02  0.05  0.07  0.10  0.085  0.09  0.01  0.05

STEP 1: State
STEP 2: Plan
STEP 3: Do
STEP 4: Conclude
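The computations in the "Do" step for Example 7 can be sketched as follows. The code fits the least-squares line of BAC on beers, then computes $SE_b$, $t = b/SE_b$ with n - 2 = 14 degrees of freedom, and a two-sided P-value. The tail area uses a hand-rolled Simpson's-rule approximation of the t distribution, which is an assumption here because the standard library has none. The State, Plan and Conclude steps, including the condition checks, still have to be written out in words.

```python
import math
import statistics

beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


n = len(beers)
x_bar, y_bar = statistics.mean(beers), statistics.mean(bac)
sxx = sum((x - x_bar) ** 2 for x in beers)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / sxx
a = y_bar - b * x_bar
s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(beers, bac)) / (n - 2))
se_b = s / math.sqrt(sxx)
t = b / se_b
p_two_sided = 2 * t_upper_tail(t, n - 2)

print(f"BAC-hat = {a:.4f} + {b:.4f} * beers")
print(f"s = {s:.4f}, SE_b = {se_b:.5f}")
print(f"t = {t:.2f} with df = {n - 2}, two-sided P-value = {p_two_sided:.2e}")
```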

***SECTION 4.1*** Transforming to Achieve Linearity

Linear Transformations:
Note: Linear transformations cannot straighten a curved relationship between two variables.

Non-Linear Transformations:
Nonlinear relationships between two quantitative variables can sometimes be changed into linear relationships by transforming one or both of the variables.
GOAL: Transform the data into a linear pattern. Once our data are linear, we can fit a regression model to help make future predictions.

Types of Transformations:
(1) The exponential model $y = ab^x$ becomes linear when we plot log y against x.
(2) The power model $y = ax^p$ becomes linear when we plot log y against log x.

Transforming Non-Linear Bivariate Data:
The table below lists the mean distance from the sun (in astronomical units, or AU) and the period (time to orbit) for the nine planets of the solar system.

Planet     Distance (AU)   Period (years)
Mercury    0.386           0.241
Venus      0.720           0.615
Earth      1.00            1.00
Mars       1.52            1.88
Jupiter    5.19            11.9
Saturn     9.53            29.46
Uranus     19.2            83.8
Neptune    30.0            164
Pluto      39.5            248
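Here is a minimal sketch of both halves of the planet analysis. First it fits a line to the raw data and records the sign of each residual. Then it tries the power model by regressing log(period) on log(distance). The base-10 logarithm and the helper `fit_line` are our own choices, and any base gives the same linearity. The residual-sign string is only a text stand-in for a residual plot.

```python
import math
import statistics

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter",
           "Saturn", "Uranus", "Neptune", "Pluto"]
distance = [0.386, 0.720, 1.00, 1.52, 5.19, 9.53, 19.2, 30.0, 39.5]  # AU
period = [0.241, 0.615, 1.00, 1.88, 11.9, 29.46, 83.8, 164, 248]     # years


def fit_line(x, y):
    """Least-squares intercept a, slope b, and correlation r."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / math.sqrt(sxx * syy)


# 1) Line on the raw data: r is close to 1, but the residual signs,
#    listed planet by planet, show whether the pattern is curved.
a_raw, b_raw, r_raw = fit_line(distance, period)
raw_signs = "".join("+" if p - (a_raw + b_raw * d) > 0 else "-"
                    for d, p in zip(distance, period))

# 2) Power model y = a * x^p: regress log(period) on log(distance).
log_d = [math.log10(d) for d in distance]
log_p = [math.log10(p) for p in period]
a_log, b_log, r_log = fit_line(log_d, log_p)

print(f"raw fit: r = {r_raw:.4f}, residual signs {' '.join(planets)}: {raw_signs}")
print(f"log-log: log(period) = {a_log:.4f} + {b_log:.4f} log(distance), r = {r_log:.6f}")
```

The raw fit has r near 0.99, yet its residuals run positive, then negative, then positive again (the curved pattern). The log-log fit is nearly perfect, with a slope close to 1.5.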

Confirm that linear regression for these data yields evidence that the points lie fairly close to a line. Explain how you know this.
Provide evidence to confirm that a line is not an appropriate model for these data, and explain your evidence.
Develop a more appropriate model for these data, and provide supporting evidence for your choice.
HINT: Instead of trying to find a (non-linear) function to fit the curve, we transform the data to make them linear and then fit a line to them.
Use your model to predict the period of an asteroid located 4.0 AU from the sun.
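Here is one way to carry out the final prediction, assuming the power model chosen above (log period against log distance). The sketch predicts on the log scale and then back-transforms with $10^{\hat{y}}$ to get years. The function name `predict_period` is hypothetical.

```python
import math
import statistics

distance = [0.386, 0.720, 1.00, 1.52, 5.19, 9.53, 19.2, 30.0, 39.5]  # AU
period = [0.241, 0.615, 1.00, 1.88, 11.9, 29.46, 83.8, 164, 248]     # years


def fit_line(x, y):
    """Least-squares intercept a and slope b."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


# Fit the power model on the log scale: log10(period) = a + b * log10(distance).
a, b = fit_line([math.log10(d) for d in distance],
                [math.log10(p) for p in period])


def predict_period(distance_au):
    """Predict log10(period), then undo the log to return years."""
    return 10 ** (a + b * math.log10(distance_au))


print(f"Predicted period at 4.0 AU: {predict_period(4.0):.2f} years")
```

Because the log-log slope is close to 1.5, period is roughly proportional to distance raised to the 1.5 power, so the prediction comes out near 8 years. The value 4.0 AU lies inside the range of the data (0.386 to 39.5 AU), so this is not an extrapolation.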