Simple Regression (CS 700). Acknowledgement: these slides are based on presentations created and copyrighted by Prof. Daniel Menasce (GMU).


Basics

Purpose of regression analysis: predict the value of a dependent or response variable from the values of at least one explanatory or independent variable (also called predictors or factors).

Purpose of correlation analysis: measure the strength of the correlation between two variables.

[Scatter plot: a linear relationship between x and y]

[Scatter plots: no relationship between x and y; a negative curvilinear relationship]

Simple Linear Regression

Residual error: for each observation, the error is the difference between the observed value of y and the estimated y on the fitted line.

[Plot: observed points, fitted line, and the error between observed and estimated y]

Selecting the best line: minimize the sum of the squares of the errors (the least squares criterion).

Linear Regression

The fitted line is

$\hat{Y}_i = b_0 + b_1 X_i$

where $\hat{Y}_i$ is the predicted value of Y for observation i and $Y_i$ is the value of observation i. The coefficients $b_0$ and $b_1$ are chosen to minimize

$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$

subject to $\sum_{i=1}^{n} e_i = 0$.

Method of Least Squares

$b_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$

$b_0 = \bar{Y} - b_1 \bar{X}$
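The least-squares formulas above translate directly into code. A minimal sketch, assuming made-up (x, y) values rather than the slide's I/O measurements:

```python
# Minimal least-squares fit of y = b0 + b1*x, following the slope and
# intercept formulas on the slide. The data below are invented for
# illustration (roughly y = 2x), not the slide's example.
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # b1 = (sum(XiYi) - n*Xbar*Ybar) / (sum(Xi^2) - n*Xbar^2)
    b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / \
         (sum(xi * xi for xi in x) - n * xbar * xbar)
    b0 = ybar - b1 * xbar    # the fitted line passes through (Xbar, Ybar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
```

With this data the fit recovers a slope close to 2 and an intercept close to 0, as expected.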

Linear Regression Example

[Table: ten observations of number of I/Os (x = 1, ..., 10, so $\bar{X}$ = 5.5 and $\sum X_i^2$ = 385) and measured CPU time (y), with the estimate from the fitted line, the error, and the squared error for each observation]

[Plot: CPU time (sec) versus number of I/Os with the fitted regression line, R² = 0.9877]

Allocation of Variation

With no regression model, the mean would be used as the predicted value. The SSE in that case is the sum of squares total:

$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

The sum of squares explained by the regression is

$SSR = SST - SSE$

and SSE is the variation not explained by the regression.

The coefficient of determination ($R^2$) is the fraction of the variation explained by the regression:

$R^2 = \dfrac{SSR}{SST} = \dfrac{SST - SSE}{SST}$

The closer $R^2$ is to one, the better the regression model.
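The allocation-of-variation bookkeeping is easy to check in code. A minimal sketch, assuming the same made-up five-point data set as before (not the slide's I/O example) and its fitted coefficients b0 = 0.05, b1 = 1.99:

```python
# R^2 from the allocation of variation: SST = SSE + SSR.
# Data and coefficients are invented for illustration.
def r_squared(x, y, b0, b1):
    n = len(y)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)                        # total
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ssr = sst - sse                                                # explained
    return ssr / sst

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(x, y, 0.05, 1.99)   # close to 1: the line explains most variation
```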

[Table: the same ten-observation I/O example with SSY, SS0, SST, SSR, and R² = 0.987654 computed from the errors]

$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = SSY - SS0$

$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = SST - SSE$

$R^2 = \dfrac{SSR}{SST}$ is the coefficient of determination; the higher the value of $R^2$, the better the regression.

Standard Deviation of Errors

Variance of errors: divide the sum of squared errors (SSE) by the number of degrees of freedom, n − 2, since two regression parameters need to be computed first:

$s_e^2 = \dfrac{SSE}{n-2}$

This is the mean squared error (MSE).

Degrees of freedom of the various sums of squares:

SST: n − 1 (need to compute $\bar{Y}$)
SSY: n (does not depend on any other parameter)
SS0: 1
SSE: n − 2 (need to compute two regression parameters)
SSR: 1 (= SST − SSE)

Degrees of freedom add just as the sums of squares do.

Confidence Intervals for Regression Parameters

$b_0$ and $b_1$ were computed from a sample, so they are just estimates of the true parameters $\beta_0$ and $\beta_1$ of the true model. Their standard deviations are

$s_{b_0} = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$

$s_{b_1} = \dfrac{s_e}{\sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$

A (1 − α)·100% confidence interval for $b_0$ and $b_1$:

$b_0 \pm t_{[1-\alpha/2;\, n-2]} \, s_{b_0}$

$b_1 \pm t_{[1-\alpha/2;\, n-2]} \, s_{b_1}$

Confidence Interval Example

[Table: the ten-observation I/O example with $s_e$, $s_{b_0}$, $s_{b_1}$, and the 95% confidence bounds (α = 0.05) for $b_0$ and $b_1$]
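A sketch of these interval formulas in code, again on the made-up five-point data set; the t quantile is hardcoded (t[0.975; 3] ≈ 3.182 for n − 2 = 3 degrees of freedom at a 95% confidence level) to avoid a table lookup:

```python
import math

# 95% confidence intervals for b0 and b1, following the slide's formulas.
# Data are invented; t is the 97.5th percentile of the t distribution
# with n - 2 = 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum(xi * xi for xi in x) - n * xbar ** 2
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))                    # std deviation of errors
sb1 = se / math.sqrt(sxx)                        # std error of the slope
sb0 = se * math.sqrt(1 / n + xbar ** 2 / sxx)    # std error of the intercept
t = 3.182                                        # t[0.975; 3]
ci_b0 = (b0 - t * sb0, b0 + t * sb0)
ci_b1 = (b1 - t * sb1, b1 + t * sb1)
```

Here the slope interval excludes zero (the slope is significant at the 95% level) while the intercept interval includes zero.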

Confidence Interval for the Predicted Value

The standard deviation of the mean of a future sample of m observations at $X = X_p$ is

$s_{\hat{y}_{mp}} = s_e \left[ \dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(X_p - \bar{X})^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} \right]^{1/2}$

As the future sample size m increases, the standard deviation of the predicted value decreases.

A (1 − α)·100% confidence interval for the predicted value for a future sample of size m at $X_p$:

$\hat{y}_p \pm t_{[1-\alpha/2;\, n-2]} \, s_{\hat{y}_{mp}}$
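The two behaviors of this formula, the 1/m term shrinking with larger future samples and the $(X_p - \bar{X})^2$ term widening the interval away from the center of the data, can be sketched directly (same made-up five-point data set):

```python
import math

# Standard deviation of the mean of m future observations at X = x_p,
# following the slide's formula. Data are invented for illustration.
def s_pred(x, y, x_p, m):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * xbar ** 2
    b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / sxx
    b0 = ybar - b1 * xbar
    se = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                       for xi, yi in zip(x, y)) / (n - 2))
    # 1/m shrinks as the future sample grows; (x_p - xbar)^2 widens
    # the interval as x_p moves away from the center of the data.
    return se * math.sqrt(1 / m + 1 / n + (x_p - xbar) ** 2 / sxx)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
```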

Linear Regression Assumptions

There is a linear relationship between the response (y) and the predictor (x).
The predictor (x) is non-stochastic and is measured without any error.
Errors are statistically independent.
Errors are normally distributed with zero mean and a constant standard deviation.

Checking the linear-relationship assumption:

[Scatter plots: a linear relationship; a piecewise-linear relationship; a possible outlier; a non-linear relationship]

Checking that errors are statistically independent:

[Residual vs. predicted response plots: no trend (assumption holds); visible trends (assumption violated)]

Checking that errors are normally distributed:

[Normal quantile-quantile plots of residuals: a straight line indicates normally distributed errors; curvature indicates non-normally distributed errors]
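A numeric companion to these visual tests, on made-up data: by the least-squares normal equations the residuals always sum to zero and are orthogonal to the predictor, so any trend seen in a residual plot reflects an assumption violation rather than the fitting procedure itself.

```python
# Least-squares residuals for invented data; the normal equations force
# sum(e_i) = 0 and sum(e_i * x_i) = 0 regardless of the data.
def residuals(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / \
         (sum(xi * xi for xi in x) - n * xbar ** 2)
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

e = residuals([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```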

Checking that errors have a constant standard deviation:

[Residual vs. predicted response plots: no trend in spread (assumption holds); increasing spread (assumption violated)]

Other Regression Models

Multiple Linear Regression

Used to predict the value of the response variable as a function of k predictor variables $x_1, \dots, x_k$:

$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$

This is similar to simple linear regression, and MS Excel can be used to do multiple linear regression.

[Table: seven observations of CPU time ($y_i$), I/O time ($x_{1i}$), and memory requirement ($x_{2i}$)]

We want to find: CPUTime = b0 + b1 * I/OTime + b2 * MemoryRequirement
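A minimal sketch of such a fit, assuming invented (x1, x2, y) values since the slide's table did not survive transcription; it solves the normal equations (XᵀX)b = Xᵀy directly:

```python
# Multiple linear regression with two predictors via the normal equations.
# Data are invented: y lies exactly on the plane y = 1 + 2*x1 + 3*x2,
# so the fit should recover b = [1, 2, 3].
def solve(a, v):
    """Gauss-Jordan elimination for a small linear system a @ b = v."""
    n = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by least squares."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

x1 = [1, 2, 3, 4, 5, 6, 7]
x2 = [2, 1, 4, 3, 6, 5, 8]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = multiple_regression(x1, x2, y)
```

In practice a linear-algebra library would replace the hand-rolled solver; the point here is that multiple regression is still just least squares with more columns in the design matrix.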

[Excel SUMMARY OUTPUT for the regression: Multiple R = 0.987, R Square = 0.974, Adjusted R Square = 0.964, 7 observations; coefficients, standard errors, t statistics, and 95% and 90% confidence bounds for the intercept (b0) and the two predictors (b1, b2)]

Curvilinear Regression

Approach: draw a scatter plot. If it does not look linear, try non-linear models that can be transformed into linear ones:

Non-linear: y = a + b/x        Linear: y = a + b(1/x)
Non-linear: y = 1/(a + bx)     Linear: (1/y) = a + bx
Non-linear: y = x/(a + bx)     Linear: (x/y) = a + bx
Non-linear: y = a b^x          Linear: ln y = ln a + x ln b
Non-linear: y = a + b x^n      Linear: y = a + b(x^n)
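The y = a·b^x row of the table can be sketched in code: take logs, fit an ordinary least-squares line, and transform back. The data below are invented to satisfy y = 2·3^x exactly, so the fit should recover a = 2 and b = 3.

```python
import math

# Fit y = a * b**x by regressing ln y on x: ln y = ln a + x ln b.
def fit_exponential(x, y):
    n = len(x)
    ly = [math.log(yi) for yi in y]        # transform to the linear model
    xbar = sum(x) / n
    lbar = sum(ly) / n
    slope = (sum(xi * li for xi, li in zip(x, ly)) - n * xbar * lbar) / \
            (sum(xi * xi for xi in x) - n * xbar ** 2)
    intercept = lbar - slope * xbar
    return math.exp(intercept), math.exp(slope)   # back-transform to (a, b)

a, b = fit_exponential([0, 1, 2, 3], [2, 6, 18, 54])   # y = 2 * 3**x
```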