Linear Regression Analysis. Analysis of paired data and using a given value of one variable to predict the value of the other



Ex: The chirp rate of crickets is strongly correlated with the outside temperature. On 8 different days a statistician went outside and measured two quantities: the number of chirps a cricket made in a minute (x) and the outside temperature (y). The data is in the table below.

x: Chirps in 1 minute          88    118   110   86    120   103   96    90
y: Outside temperature (°F)    69.7  93.3  84.3  76.3  88.6  82.6  71.6  79.6

a) Find the linear correlation coefficient r
b) Find the equation of the least-squares regression line
c) Predict the outside temperature if you hear a cricket chirping 140 times per minute
d) Find a 95% prediction interval for the outside temperature if a cricket is chirping 140 times per minute

Predictor variable: x = number of chirps in 1 minute
Response variable: y = outside temperature

[Scatter plot: "Crickets: Temp vs. chirps"; horizontal axis: # of chirps (x); vertical axis: Temperature (y)]

a) r = 0.8693
b) $\hat{y} = 0.5236x + 27.674$

c) If a cricket is chirping at 140 chirps per minute, the outside temperature is about
$$\hat{y} = 0.5236(140) + 27.674 \approx 101\ ^\circ\mathrm{F}$$

d) For a 95% prediction interval, the margin of error is E = 15.9 °F and the prediction interval is
$$\hat{y} - E < y < \hat{y} + E \quad\text{or}\quad 85.1\ ^\circ\mathrm{F} < y < 116.9\ ^\circ\mathrm{F}$$
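Parts (a) through (c) of the cricket example can be checked with a few lines of code. This is a minimal sketch in plain Python, assuming the sum-based formulas summarized later in these slides; the variable names are illustrative and not part of the original material.

```python
from math import sqrt

# Cricket data from the example: chirps per minute (x), temperature in °F (y).
chirps = [88, 118, 110, 86, 120, 103, 96, 90]
temps = [69.7, 93.3, 84.3, 76.3, 88.6, 82.6, 71.6, 79.6]

n = len(chirps)
sum_x = sum(chirps)
sum_y = sum(temps)
sum_xx = sum(x * x for x in chirps)
sum_yy = sum(y * y for y in temps)
sum_xy = sum(x * y for x, y in zip(chirps, temps))

numerator = n * sum_xy - sum_x * sum_y
denom_x = n * sum_xx - sum_x ** 2
denom_y = n * sum_yy - sum_y ** 2

r = numerator / (sqrt(denom_x) * sqrt(denom_y))      # part (a)
b1 = numerator / denom_x                             # slope, part (b)
b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom_x     # intercept, part (b)
y_hat = b1 * 140 + b0                                # part (c): 140 chirps per minute

print(f"r = {r:.4f}, y-hat = {b1:.4f}x + {b0:.3f}, prediction = {y_hat:.0f} F")
```

The printed values should agree with the slide answers after rounding.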

The Linear Correlation Coefficient r

r is a number that can be calculated from paired data that tells you how close to a straight line the graph of the data is.
- r is always between -1 and 1
- When r is 1 or -1, the data form a perfect straight line
- When r is close to 1 or -1, the graph of the data looks almost like a straight line, but with a little scatter
- The further away r is from -1 and 1 (i.e. the closer to 0), the less the data looks like a straight line. It can look scattered all over the place, or can form a neat shape (but not a line)
- If r is positive, the data trends upward. That means: when x increases, y increases, and the slope of the line is positive
- If r is negative, the data trends downward. That means: when x increases, y decreases, and the slope of the line is negative

[Figure: two scatter plots. First plot: r is positive (when x ↑, y ↑), r = 1. Second plot: r is positive (when x ↑, y ↑), r is close to 1 (actual r = .9698)]

[Figure: two scatter plots. First plot: r is negative (when x ↑, y ↓), r = -1. Second plot: r is negative (when x ↑, y ↓), r is close to -1 (actual r = -.775)]

[Figure: two scatter plots. First plot: r is close to 0 (when x ↑, y ???), actual r = -.11. Second plot: r is close to 0 (when x ↑, y ???), actual r = .1183]

The formula for r is:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

(more on this later)
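To see the formula in action, the sketch below wraps it in a small function and applies it to two made-up data sets whose points lie exactly on a line. The function name `correlation` and the sample points are hypothetical and only serve to illustrate the r = 1 and r = -1 cases described above.

```python
from math import sqrt

def correlation(xs, ys):
    """Linear correlation coefficient r computed from the five sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator

# Hypothetical points lying exactly on a line (not data from the slides).
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # upward line: r should be 1
r_down = correlation(xs, [10 - 3 * x for x in xs])  # downward line: r should be -1
print(r_up, r_down)
```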

The Least-Squares Regression Line

Goal: We want to find a line that is as close as possible to all the data points.
Every line has a total error, and we are looking for the line with the smallest total error.

Ex: Data

x    1    3    4
y    3    13   9

Let's see which line is closer to this data. Our choices in this example are $\hat{y} = -2x + 13$ or $\hat{y} = 2x + 2$.

Ex (continued): For the line $\hat{y} = -2x + 13$, what is the total error?

$E_1 + E_2 + E_3$? No, because we don't want negative numbers to cancel out positive ones.

$$E_1 = y - \hat{y} = [3] - [-2(1) + 13] = -8$$
$$E_2 = y - \hat{y} = [13] - [-2(3) + 13] = 6$$
$$E_3 = y - \hat{y} = [9] - [-2(4) + 13] = 4$$

$|E_1| + |E_2| + |E_3|$? No, because absolute values are hard to deal with in math.

So we'll use $E_1^2 + E_2^2 + E_3^2$.

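Ex (continued): For the line $\hat{y} = -2x + 13$:

$$E_1 = [3] - [-2(1) + 13] = -8, \quad E_2 = [13] - [-2(3) + 13] = 6, \quad E_3 = [9] - [-2(4) + 13] = 4$$
$$\text{Total error} = E_1^2 + E_2^2 + E_3^2 = (-8)^2 + (6)^2 + (4)^2 = 116$$

For the line $\hat{y} = 2x + 2$:

$$E_1 = [3] - [2(1) + 2] = -1, \quad E_2 = [13] - [2(3) + 2] = 5, \quad E_3 = [9] - [2(4) + 2] = -1$$
$$\text{Total error} = E_1^2 + E_2^2 + E_3^2 = (-1)^2 + (5)^2 + (-1)^2 = 27$$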

Ex (continued):
Total error for $\hat{y} = -2x + 13$: 116
Total error for $\hat{y} = 2x + 2$: 27
So the line $\hat{y} = 2x + 2$ is closer to the data than the line $\hat{y} = -2x + 13$.
Is there an even closer line?
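The comparison of the two candidate lines can be reproduced with a short script. This is only a sketch; the helper name `total_squared_error` is made up for illustration.

```python
# The three data points from the example.
xs = [1, 3, 4]
ys = [3, 13, 9]

def total_squared_error(slope, intercept):
    """Sum of squared errors (y - y_hat)^2 for the line y_hat = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

error_line_1 = total_squared_error(-2, 13)  # line y_hat = -2x + 13
error_line_2 = total_squared_error(2, 2)    # line y_hat = 2x + 2
print(error_line_1, error_line_2)
```

Squaring each error keeps positive and negative errors from cancelling, which is why the second line comes out ahead even though it misses one point by 5.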

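Using calculus, it can be shown that given some data, there is a line that is closer to the data than any other line. This line is called the Least-Squares Regression Line (LSRL).

Formula for the Least-Squares Regression Line:

$$\hat{y} = b_1 x + b_0$$
$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}$$

(more on this later)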
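Applying these formulas to the three-point example answers the question of whether an even closer line exists. The sketch below is plain Python with illustrative variable names; it computes the LSRL and its total squared error so it can be compared with the two candidate lines tried earlier.

```python
# The three data points from the small example.
xs = [1, 3, 4]
ys = [3, 13, 9]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

denominator = n * sum_xx - sum_x ** 2
b1 = (n * sum_xy - sum_x * sum_y) / denominator        # slope
b0 = (sum_y * sum_xx - sum_x * sum_xy) / denominator   # intercept

# Total squared error of the least-squares line.
lsrl_error = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))
print(f"y-hat = {b1:.4f}x + {b0:.4f}, total squared error = {lsrl_error:.3f}")
```

The resulting error is smaller than the error of either candidate line, as the least-squares property requires.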

Prediction Intervals

The Least-Squares Regression Line can be used to predict the value of y given the value of x. The predicted value of y is called $\hat{y}$.

The problem: $\hat{y}$ is a point estimate because it is a one-number guess of the value of y. Instead, we want an interval estimate (a confidence interval).

When finding confidence intervals for problems involving paired data, they are called Prediction Intervals.

As usual, to find a prediction interval, we need the margin of error formula E, and
$$\hat{y} - E < y < \hat{y} + E$$

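Summary of Formulas

Correlation Coefficient r:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

LSRL:
$$\hat{y} = b_1 x + b_0$$
$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}$$

Prediction Interval (use t distribution):
$$s_e = \sqrt{\frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2}} \qquad df = n - 2$$
$$E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$
$$\hat{y} - E < y < \hat{y} + E$$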
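The prediction-interval formulas can be collected into one function. The sketch below is an illustration, not the slides' own procedure: the function name and signature are made up, and the critical value 2.447 is assumed here as the t-table value for df = 6 at 95% confidence, which the slides do not state. It is applied to the cricket data using the rounded coefficients from that example.

```python
from math import sqrt

def prediction_interval(xs, ys, b0, b1, x0, t_crit):
    """Prediction interval for y at x = x0, given the LSRL coefficients."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    x_bar = sum_x / n

    # Standard error of estimate, then margin of error E.
    s_e = sqrt((sum_yy - b0 * sum_y - b1 * sum_xy) / (n - 2))
    spread = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_xx - sum_x ** 2)
    margin = t_crit * s_e * sqrt(spread)

    y_hat = b1 * x0 + b0
    return y_hat - margin, y_hat + margin

chirps = [88, 118, 110, 86, 120, 103, 96, 90]
temps = [69.7, 93.3, 84.3, 76.3, 88.6, 82.6, 71.6, 79.6]

# t_crit = 2.447 is an assumed table value (df = 8 - 2 = 6, 95% confidence).
low, high = prediction_interval(chirps, temps, b0=27.674, b1=0.5236,
                                x0=140, t_crit=2.447)
print(f"{low:.1f} F < y < {high:.1f} F")
```

Passing the coefficients in as arguments, rather than recomputing them inside the function, makes it easy to see how much the interval depends on how b0 and b1 were rounded.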

A Detailed Linear Regression Problem

Ex: In order to study global warming, the data below was taken over different years (CO2 levels in parts per million and temperature in °C).

x: CO2 (ppm)           314   317   320   326   331   339   346   354   361   369
y: Temperature (°C)    13.9  14.0  13.9  14.1  14.0  14.3  14.1  14.5  14.5  14.4

a) Find the linear correlation coefficient r
b) Find the equation of the least-squares regression line
c) Predict the temperature of the Earth if the CO2 levels reach 400 parts per million
d) Construct a 90% prediction interval for the temperature of the Earth if the CO2 levels reach 400 parts per million

Before we do any of the parts of this problem, we need to find $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, $\sum xy$, and $\bar{x}$ because these appear in many of the formulas.

$$\sum x = 314 + 317 + \dots + 369 = 3377$$
$$\sum x^2 = 314^2 + 317^2 + \dots + 369^2 = 1143757$$
$$\sum y = 13.9 + 14 + \dots + 14.4 = 141.7$$
$$\sum y^2 = 13.9^2 + 14^2 + \dots + 14.4^2 = 2008.39$$
$$\sum xy = (314)(13.9) + (317)(14) + \dots + (369)(14.4) = 47888.6$$
$$\bar{x} = \frac{3377}{10} = 337.7$$
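These sums are tedious by hand but quick to verify. A minimal sketch in plain Python, with illustrative variable names:

```python
# CO2 level in ppm (x) and temperature in °C (y) from the example.
co2 = [314, 317, 320, 326, 331, 339, 346, 354, 361, 369]
temp = [13.9, 14.0, 13.9, 14.1, 14.0, 14.3, 14.1, 14.5, 14.5, 14.4]

n = len(co2)
sum_x = sum(co2)
sum_xx = sum(x * x for x in co2)
sum_y = sum(temp)
sum_yy = sum(y * y for y in temp)
sum_xy = sum(x * y for x, y in zip(co2, temp))
x_bar = sum_x / n

print(sum_x, sum_xx, round(sum_y, 1), round(sum_yy, 2), round(sum_xy, 1), x_bar)
```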

Before we calculate r, here is a graph of the data.

[Scatter plot: "Temp vs. CO2 Levels"; horizontal axis: CO2 Level (ppm); vertical axis: Temperature (Celsius)]

What do you think r is?
r is positive (when x ↑, y ↑)
r is close to 1

a) To calculate r, plug into the formula for r:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{(10)(47888.6) - (3377)(141.7)}{\sqrt{(10)(1143757) - (3377)^2}\,\sqrt{10(2008.39) - (141.7)^2}}$$
$$r = 0.892$$

b) To find the equation of the regression line, plug into the equations for $b_1$ and $b_0$:

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(10)(47888.6) - (3377)(141.7)}{(10)(1143757) - (3377)^2} = 0.0109$$
$$b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(141.7)(1143757) - (3377)(47888.6)}{(10)(1143757) - (3377)^2} = 10.483$$

so $\hat{y} = b_1 x + b_0 = 0.0109x + 10.483$
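The plug-in arithmetic for parts (a) and (b) can be checked directly from the five sums. The sketch below takes the sums as given constants; the variable names are illustrative.

```python
from math import sqrt

# Sums for the CO2 / temperature data (n = 10 data pairs).
n = 10
sum_x, sum_xx = 3377, 1143757
sum_y, sum_yy = 141.7, 2008.39
sum_xy = 47888.6

numerator = n * sum_xy - sum_x * sum_y
denom_x = n * sum_xx - sum_x ** 2
denom_y = n * sum_yy - sum_y ** 2

r = numerator / (sqrt(denom_x) * sqrt(denom_y))     # part (a)
b1 = numerator / denom_x                            # part (b): slope
b0 = (sum_y * sum_xx - sum_x * sum_xy) / denom_x    # part (b): intercept

print(f"r = {r:.3f}, b1 = {b1:.4f}, b0 = {b0:.3f}")
```

Note that the numerator of r and the numerator of the slope are the same quantity, so it only needs to be computed once.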

a) r = 0.892
b) $\hat{y} = 0.0109x + 10.483$

Here's what's going on:

[Scatter plot: "Temp vs. CO2 Levels"; horizontal axis: CO2 Level (ppm); vertical axis: Temperature (Celsius)]

c) The whole point of the regression line is to make predictions. To make the prediction, just plug in the given value of x to predict the y value.

When the Earth's CO2 level reaches 400 ppm (this value is called $x_0$), the best one-number (point estimate) prediction for y is
$$\hat{y} = 0.0109(400) + 10.483 = 14.843\ ^\circ\mathrm{C}$$

d) As usual, to find a prediction interval, we need to find E. To find E we need to find $t_{\alpha/2}$ and $s_e$, and plug everything into the formula for E.

$$\alpha = 1 - \text{conf. level} = 1 - 0.90 = 0.10 \qquad \alpha/2 = 0.10/2 = 0.05$$
$$df = n - 2 = 10 - 2 = 8$$
$$t_{\alpha/2} = 1.860 \text{ (from table)}$$

d) $t_{\alpha/2} = 1.860$, $s_e = ?$, $E = ?$

$$s_e = \sqrt{\frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2}} = \sqrt{\frac{2008.39 - (10.483)(141.7) - (0.0109)(47888.6)}{10 - 2}} = 0.347$$
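One practical caution about this step: the numerator of $s_e$ subtracts numbers that are nearly equal (about 2008 on each side), so the result can be sensitive to how far $b_0$ and $b_1$ were rounded beforehand. The sketch below is an illustrative check rather than part of the original solution. It evaluates the same formula twice, once with the rounded coefficients used above and once with coefficients kept at full precision; the helper name `standard_error` is made up.

```python
from math import sqrt

# Sums for the CO2 / temperature data (n = 10 data pairs).
n = 10
sum_x, sum_xx = 3377, 1143757
sum_y, sum_yy = 141.7, 2008.39
sum_xy = 47888.6

def standard_error(b0, b1):
    """Standard error of estimate from the shortcut formula."""
    return sqrt((sum_yy - b0 * sum_y - b1 * sum_xy) / (n - 2))

# Coefficients rounded as in the worked solution.
s_e_rounded = standard_error(10.483, 0.0109)

# Coefficients kept at full floating-point precision.
denominator = n * sum_xx - sum_x ** 2
b1_full = (n * sum_xy - sum_x * sum_y) / denominator
b0_full = (sum_y * sum_xx - sum_x * sum_xy) / denominator
s_e_full = standard_error(b0_full, b1_full)

print(f"rounded coefficients: {s_e_rounded:.3f}, full precision: {s_e_full:.3f}")
```

With rounded coefficients the formula reproduces the 0.347 shown above, while unrounded coefficients give a value near 0.113. When using this shortcut formula by hand, it is therefore worth carrying extra decimal places in $b_0$ and $b_1$.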

d) $t_{\alpha/2} = 1.860$, $s_e = 0.347$, $E = ?$

$$E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} = (1.860)(0.347)\sqrt{1 + \frac{1}{10} + \frac{10(400 - 337.7)^2}{10(1143757) - (3377)^2}} = 0.970$$

d) $t_{\alpha/2} = 1.860$, $s_e = 0.347$, $E = 0.970$, and the 90% prediction interval is

$$\hat{y} - E < y < \hat{y} + E$$
$$14.843 - 0.970 < y < 14.843 + 0.970$$
$$13.873\ ^\circ\mathrm{C} < y < 15.813\ ^\circ\mathrm{C}$$
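The last two steps of part (d) amount to one formula evaluation and one addition and subtraction. The sketch below plugs in the values as they appear in the worked solution (t = 1.860, s_e = 0.347, y-hat = 14.843); the variable names are illustrative.

```python
from math import sqrt

# Values taken from the worked solution.
n = 10
sum_x, sum_xx = 3377, 1143757
x_bar = sum_x / n
x0 = 400          # CO2 level at which we predict (ppm)
t_crit = 1.860    # t table value for df = 8, 90% confidence
s_e = 0.347       # standard error of estimate from the previous step
y_hat = 14.843    # point prediction from part (c)

spread = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_xx - sum_x ** 2)
margin = t_crit * s_e * sqrt(spread)

low, high = y_hat - margin, y_hat + margin
print(f"E = {margin:.3f}; {low:.3f} C < y < {high:.3f} C")
```

The term under the square root grows with the distance between $x_0$ and $\bar{x}$, which is why predicting at 400 ppm, outside the observed range of 314 to 369 ppm, widens the interval.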

Summary of answers:
a) r = 0.892
b) $\hat{y} = 0.0109x + 10.483$
c) $\hat{y} = 14.843\ ^\circ\mathrm{C}$
d) $13.873\ ^\circ\mathrm{C} < y < 15.813\ ^\circ\mathrm{C}$

[Scatter plot: "Temp vs. CO2 Levels"; horizontal axis: CO2 Level (ppm); vertical axis: Temperature (Celsius)]

Homework

Sec. 4.1: #'s 1-43 odd
Sec. 4.: #'s 1-31 odd
Sec. 14.: #'s 1-17 all

Thanks everyone, it's been a fun semester