Regression I - the least squares line


The difference between correlation and regression. Correlation describes the relationship between two variables, where neither variable is independent or used to predict. In correlation it would be irrelevant if we changed the axes on our graph. This is NOT true for regression.

In regression, we often have a variable that is independent and another that depends on this variable. For example, temperature (independent) and activity level (dependent) in cold-blooded animals. Activity level obviously depends on temperature (and NOT vice-versa). Often we also use this independent variable to predict the dependent variable. We're also interested in testing for significance, though in this case we look to see if a line going through the data points is significant.

The basic idea in regression is to model the relationship between our variables. Let's assume there is a relationship between our variables. This relationship is then modeled using an equation. In the simplest case (as in our class), this is the equation for a line. This raises the question: is the line significant? This is one place where statistics come in.

Fitting the line. There are actually many different ways of estimating the line, but we'll only learn the most common method. We'll use what is called the least squares line.

[Figure: scatterplot of cob weight (y-axis) vs. plant density (x-axis) with the least squares line. Annotations: (1) rotate the line through (x̄, ȳ); (2) minimize the squared distances (blue lines).]

We draw a line through the point (x̄, ȳ), and then rotate it until the vertical distances (which we'll actually call residuals or errors instead of distances) are at a minimum:

Σ dᵢ = minimum

Except that (for mathematical reasons we don't want to go into) we actually use the sum of the squared distances:

Σ dᵢ² = minimum

To do this, we use calculus (e.g., differentiate, set to 0, make sure we have a minimum). Fortunately, we don't actually need to do the calculus (it's been done for us). All we really need to know are the answers, which consist of the slope and intercept for our least squares line:
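To make the rotate-and-minimize idea concrete, here is a minimal sketch (the data here are hypothetical, not from the notes): we force the line through (x̄, ȳ), treat the sum of squared vertical distances as a function of the slope, and check that the least squares slope beats nearby slopes.

```python
# Sketch of the least squares idea: hold the line at (xbar, ybar) and
# "rotate" it by varying the slope m; the sum of squared vertical
# distances is smallest at the least squares slope.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.1, 5.8, 8.3, 9.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

def sum_sq_dist(m):
    """Sum of squared vertical distances for the line through
    (xbar, ybar) with slope m."""
    return sum((yi - (ybar + m * (xi - xbar))) ** 2 for xi, yi in zip(x, y))

# Least squares slope from the formula in these notes
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)

# Any other slope gives a larger sum of squared distances
print(sum_sq_dist(b1) < sum_sq_dist(b1 + 0.2))  # True
print(sum_sq_dist(b1) < sum_sq_dist(b1 - 0.2))  # True
```

Because Σ dᵢ² is a quadratic function of the slope, it has a single minimum, which is exactly what the calculus (differentiate, set to 0) finds.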

b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

b₀ = ȳ − b₁x̄

where b₁ is the slope and b₀ the intercept. Notice that we could rewrite the equation for the slope in terms of the Sum of Cross Products (SS_cp) and the Sum of Squares for x (SS_x):

b₁ = SS_cp / SS_x

So we finally get the equation for our least squares line:

Ŷ = b₀ + b₁X

where Ŷ is the predicted value of Y for a particular X.

A comment on a different approach (that is actually the same). Recently, for some reason, many textbooks have presented a different approach to calculating b₁:

b₁ = r (s_y / s_x)

where r is the correlation coefficient. The two approaches do give the same answer for b₁:

b₁ = r (s_y / s_x)
   = [ Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² Σ(yᵢ − ȳ)² ) ] × [ √( Σ(yᵢ − ȳ)² / (n−1) ) / √( Σ(xᵢ − x̄)² / (n−1) ) ]
   = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

We will stick with the original approach. In those instances where you don't have to calculate everything yourself, you'll be given SS_cp, SS_x, and SS_y to help you (instead of r, s_x, and s_y).
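A quick sketch (again with hypothetical data, not from the notes) confirming that b₁ = SS_cp/SS_x and b₁ = r·(s_y/s_x) really do give the same answer:

```python
import math

# Hypothetical data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares and cross products, as defined in the notes
SS_x  = sum((xi - xbar) ** 2 for xi in x)
SS_y  = sum((yi - ybar) ** 2 for yi in y)
SS_cp = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Approach 1 (the one we use): b1 = SS_cp / SS_x
b1 = SS_cp / SS_x
b0 = ybar - b1 * xbar

# Approach 2 (many textbooks): b1 = r * (s_y / s_x)
r   = SS_cp / math.sqrt(SS_x * SS_y)
s_x = math.sqrt(SS_x / (n - 1))
s_y = math.sqrt(SS_y / (n - 1))
b1_alt = r * (s_y / s_x)

print(b1, b1_alt)  # identical up to floating-point rounding
```

The (n − 1) factors cancel, which is why the derivation above collapses to SS_cp/SS_x.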

Let's summarize. We calculate b₀ and b₁ and then arrange everything into the equation for a line. Notice also that (as usual) we calculated the estimates:

b₁ estimates β₁
b₀ estimates β₀

As usual, β₁ and β₀ are unknown, and we estimate them with b₁ and b₀.

So we know how to calculate the equation of a line. There is, however, a similar equation that is also used a lot in regression:

yᵢ = b₀ + b₁xᵢ + eᵢ

What's the difference? Notice that in this case the equation does not give us Ŷ. Instead, it gives us the actual values for Y (the actual yᵢ's). It does this because the last term (eᵢ) is an error term. It adds in the difference between the actual yᵢ and the estimated value (i.e., ŷᵢ). It is used quite a bit; often we want to calculate the value of the eᵢ's, and we can use this equation to do that.

Notice (as mentioned) that we use the term error here. The eᵢ's are also known as the residuals (eᵢ = rᵢ). Which abbreviation we use depends a little on what we're doing, but for us they're mostly interchangeable.

Let's illustrate things, before we get lost, with an example based on Wikipedia that discusses soil respiration. Soil respiration is the amount of CO₂ that is released from the soil by microorganisms and other soil inhabitants. We want to know what the effect of temperature is on soil respiration. CO₂ levels are measured as CO₂ exchange rate (NCER), measured in µmol/m². First, you should figure out what you expect to happen: if temperature increases, what should happen to the CO₂ levels being released from the soil (more on formulating hypotheses in the next set of notes)?

We have the following results, loosely based on a real study:

Temperature   17    18    19    20    21    22    23    24
NCER          0.31  0.27  0.45  0.43  0.51  0.55  0.65  0.63

We know how to calculate the following:

SS_x = 42    SS_y = 0.1334    SS_cp = 2.26    x̄ = 20.5    ȳ = 0.475
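All of these quantities can be checked directly from the data table; a minimal sketch:

```python
# Soil respiration data from the table above
temp = [17, 18, 19, 20, 21, 22, 23, 24]
ncer = [0.31, 0.27, 0.45, 0.43, 0.51, 0.55, 0.65, 0.63]

n = len(temp)
xbar = sum(temp) / n   # mean temperature
ybar = sum(ncer) / n   # mean NCER

SS_x  = sum((x - xbar) ** 2 for x in temp)
SS_y  = sum((y - ybar) ** 2 for y in ncer)
SS_cp = sum((x - xbar) * (y - ybar) for x, y in zip(temp, ncer))

print(xbar, ybar)                          # 20.5 0.475
print(SS_x)                                # 42.0
print(round(SS_y, 4), round(SS_cp, 4))     # 0.1334 2.26
```

These match the values given above (up to floating-point rounding).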

But to make sure, let's do the calculation for SS_cp (our Sum of Cross Products) one more time:

i = 1: (17 − 20.5)(0.31 − 0.475) = 0.5775
i = 2: (18 − 20.5)(0.27 − 0.475) = 0.5125
i = 3: (19 − 20.5)(0.45 − 0.475) = 0.0375
i = 4: (20 − 20.5)(0.43 − 0.475) = 0.0225
i = 5: (21 − 20.5)(0.51 − 0.475) = 0.0175
i = 6: (22 − 20.5)(0.55 − 0.475) = 0.1125
i = 7: (23 − 20.5)(0.65 − 0.475) = 0.4375
i = 8: (24 − 20.5)(0.63 − 0.475) = 0.5425
Sum = 2.26

So now that we know how to get SS_cp we can proceed to calculate our slope:

b₁ = 2.26 / 42 = 0.05381

And now we use this to calculate our intercept:

b₀ = 0.475 − 0.05381 × 20.5 = −0.62809

From the slope and intercept we get the equation for our least squares line:

Ŷ = −0.62809 + 0.05381X
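Putting the pieces together in a short sketch: compute b₁ and b₀ for the soil data, then use yᵢ = b₀ + b₁xᵢ + eᵢ to recover the residuals (which sum to zero for a least squares fit):

```python
# Soil respiration data from the table above
temp = [17, 18, 19, 20, 21, 22, 23, 24]
ncer = [0.31, 0.27, 0.45, 0.43, 0.51, 0.55, 0.65, 0.63]

n = len(temp)
xbar = sum(temp) / n
ybar = sum(ncer) / n

SS_x  = sum((x - xbar) ** 2 for x in temp)
SS_cp = sum((x - xbar) * (y - ybar) for x, y in zip(temp, ncer))

b1 = SS_cp / SS_x       # slope, about 0.05381
b0 = ybar - b1 * xbar   # intercept, about -0.6281 (negative!)

# Residuals: e_i = y_i - (b0 + b1 * x_i)
resid = [y - (b0 + b1 * x) for x, y in zip(temp, ncer)]

print(round(b1, 5), round(b0, 4))       # 0.05381 -0.6281
print(abs(round(sum(resid), 10)))       # 0.0 (residuals sum to zero)
```

Note that the intercept comes out negative; a negative Ŷ at 0 degrees is not physically meaningful, which is a reminder not to predict outside the range of the data.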

And finally we can see what it all looks like:

[Figure: "Soil respiration" - scatterplot of NCER (y-axis, 0.35 to 0.65) vs. Temperature (x-axis, 17 to 24), with the fitted least squares line Ŷ = −0.62809 + 0.05381X drawn through the points. The line has slope 0.05381 and would intercept 0 degrees at −0.62809.]