REVIEW OF SIMPLE LINEAR REGRESSION SIMPLE LINEAR REGRESSION


In linear regression, we consider the frequency distribution of one variable (Y) at each of several levels of a second variable (X).

Y is known as the dependent variable: the variable for which you collect data.

X is known as the independent variable: the variable for the treatment.

Determining the Regression Equation

One goal of regression is to draw the best line through the data points. The best line usually is obtained using means instead of individual observations.

Example: Effect of hours of mixing on temperature of wood pulp

Hours of mixing (X)    Temperature of wood pulp (Y)       XY
         2                         21                      42
         4                         27                     108
         6                         29                     174
         8                         64                     512
        10                         86                     860
        12                         92                   1,104

ΣX = 42    ΣY = 319    ΣXY = 2,800    ΣX² = 364    ΣY² = 21,967    n = 6
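You can check the column totals above with a few lines of Python. This is a minimal sketch that uses only the standard library. The names `hours` and `temps` are illustrative and do not come from the original notes.

```python
# Wood pulp example: hours of mixing (X) and temperature (Y)
hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]

n = len(hours)
sum_x = sum(hours)                                 # ΣX
sum_y = sum(temps)                                 # ΣY
sum_xy = sum(x * y for x, y in zip(hours, temps))  # ΣXY
sum_x2 = sum(x ** 2 for x in hours)                # ΣX²
sum_y2 = sum(y ** 2 for y in temps)                # ΣY²

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 6 42 319 2800 364 21967
```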

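[Figure: Effect of hours of mixing on temperature of wood pulp. x-axis: Hours of mixing; y-axis: Temperature (°F)]

The equation for any straight line can be written as:

$$\hat{Y} = b_0 + b_1X$$

where $b_0$ = Y intercept, and $b_1$ = regression coefficient = slope of the line.

The linear model can be written as:

$$Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$$

where $\varepsilon_i$ is the random error. Its sample counterpart is the residual, $e_i = Y_i - \hat{Y}_i$.

With the data provided, our first goal is to determine the regression equation.

Step 1. Solve for $b_1$:

$$b_1 = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{SSCP}{SS_X}$$

where $SSCP$ is the corrected sum of cross products and $SS_X$ is the corrected sum of squares of X.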

For the data in this example:

$$b_1 = \frac{2{,}800 - \frac{(42)(319)}{6}}{364 - \frac{42^2}{6}} = \frac{2{,}800 - 2{,}233}{364 - 294} = \frac{567}{70} = 8.1$$

The number calculated for $b_1$, the regression coefficient, indicates that for each unit increase in X (i.e., hours of mixing), Y (i.e., wood pulp temperature) will increase 8.1 units (i.e., degrees). The regression coefficient can be a positive or negative number.

To complete the regression equation, we need to calculate $b_0$:

$$b_0 = \bar{Y} - b_1\bar{X} = \frac{319}{6} - 8.1\left(\frac{42}{6}\right) = 53.167 - 56.7 = -3.533$$

Therefore, the regression equation is:

$$\hat{Y}_i = -3.533 + 8.1X_i$$

[Figure: Temperature versus hours of mixing. x-axis: Hours of mixing; y-axis: Temperature]
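The sketch below carries out the two steps above in Python: the slope as $SSCP/SS_X$, then the intercept from the means. It is illustrative and not code from the course. It uses the computational formulas exactly as written in the text.

```python
hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)

# Computational formulas for the corrected sums
ss_x = sum(x ** 2 for x in hours) - sum(hours) ** 2 / n                         # SS_X
sscp = sum(x * y for x, y in zip(hours, temps)) - sum(hours) * sum(temps) / n  # SSCP

b1 = sscp / ss_x                            # slope (regression coefficient)
b0 = sum(temps) / n - b1 * sum(hours) / n   # intercept: Ybar - b1 * Xbar

print(f"SS_X = {ss_x}, SSCP = {sscp}")   # SS_X = 70.0, SSCP = 567.0
print(f"Y-hat = {b0:.3f} + {b1:.1f}X")   # Y-hat = -3.533 + 8.1X
```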

Assumptions of Regression

1. There is a linear relationship between X and Y.
2. The values of X are known constants and presumably are measured without error.
3. For each value of X, Y is independent and normally distributed: $Y \sim N(\mu_{Y|X}, \sigma^2_{Y|X})$.
4. The sum of deviations from the regression line equals zero: $\sum (Y_i - \hat{Y}_i) = 0$.
5. The sum of squares for error is a minimum: $\sum (Y_i - \hat{Y}_i)^2$ is minimized.

[Figure: Effect of hours of mixing on temperature of wood pulp. x-axis: Hours of mixing; y-axis: Temperature]

If you square the deviations and sum across all observations, you obtain the definition formulas for the following sums of squares:

$$\sum (\hat{Y}_i - \bar{Y})^2 = \text{Sum of Squares Due to Regression}$$

$$\sum (Y_i - \hat{Y}_i)^2 = \text{Sum of Squares Due to Deviation from Regression (Residual)}$$

$$\sum (Y_i - \bar{Y})^2 = \text{Sum of Squares Total}$$
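Properties 4 and 5, and the partition of the total sum of squares, can be checked numerically. The Python sketch below fits the line with the definition formulas. It then confirms that the residuals sum to zero and that SS due to regression plus residual SS equals total SS. It uses only the standard library, and the variable names are illustrative.

```python
hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(temps) / n

# Least-squares fit using the definition formulas
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in hours]
resid = [y - y_hat for y, y_hat in zip(temps, fitted)]

ss_reg = sum((y_hat - y_bar) ** 2 for y_hat in fitted)  # due to regression
ss_res = sum(e ** 2 for e in resid)                     # deviation from regression
ss_tot = sum((y - y_bar) ** 2 for y in temps)           # total

print(f"sum of residuals: {sum(resid):.6f}")
print(f"{ss_reg:.3f} + {ss_res:.3f} = {ss_tot:.3f}")
```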

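Testing the Hypothesis that a Linear Relationship between X and Y Exists

The hypotheses to test that a linear relationship between X and Y exists are:

$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$

These hypotheses can be tested using three different methods:
1. F-test
2. t-test
3. Confidence interval

Method 1. F-test

The ANOVA to test $H_0: \beta_1 = 0$ can be done using the following sources of variation, degrees of freedom, and sums of squares:

SOV                  df       Sum of Squares
Due to regression    1        SSCP²/SS_X
Residual             n - 2    Determined by subtraction
Total                n - 1    SS_Y

where

$$\frac{SSCP^2}{SS_X} = \frac{\left[\sum XY - \frac{(\sum X)(\sum Y)}{n}\right]^2}{\sum X^2 - \frac{(\sum X)^2}{n}}, \qquad SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

Using data from the example: ΣX = 42, ΣY = 319, ΣXY = 2,800, ΣX² = 364, ΣY² = 21,967, n = 6.

Step 1. Calculate Total SS:

$$\text{Total SS} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 21{,}967 - \frac{319^2}{6} = 5{,}006.833$$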

Step 2. Calculate SS Due to Regression:

$$\text{SS Due to Regression} = \frac{\left[2{,}800 - \frac{(42)(319)}{6}\right]^2}{364 - \frac{42^2}{6}} = \frac{567^2}{70} = \frac{321{,}489}{70} = 4{,}592.7$$

Step 3. Calculate Residual SS:

Residual SS = SS Deviation from Regression = Total SS - SS Due to Regression = 5,006.833 - 4,592.7 = 414.133

Step 4. Complete the ANOVA:

SOV                  df    SS          MS          F
Due to Regression    1     4,592.700   4,592.700   44.36**
Residual             4       414.133     103.533
Total                5     5,006.833

F = Due to Regression MS / Residual MS. The residual mean square, $s^2_{Y|X}$, is a statistic that estimates the parameter $\sigma^2_{Y|X}$, read as "the variance of Y given X."

Step 5. The F-test on the Due to Regression SOV is significant (** denotes significance at the 0.01 level). Therefore we reject $H_0: \beta_1 = 0$ at the 99% level of confidence and can conclude that there is a linear relationship between X and Y.

Coefficient of Determination: $r^2$

From the ANOVA table, the coefficient of determination can be calculated using the formula:

$$r^2 = \frac{\text{SS Due to Regression}}{\text{SS Total}}$$

This value is never negative and ranges from 0 to 1.0. As $r^2$ approaches 1.0, the association between X and Y improves. $r^2 \times 100$ is the percentage of the variation in Y that can be explained by having X in the model. For our example:

$$r^2 = \frac{4{,}592.7}{5{,}006.833} = 0.917$$

We can conclude that 91.7% (i.e., 0.917 × 100) of the variation in wood pulp temperature can be explained by hours of mixing.
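The five steps of the F-test, plus $r^2$, can be reproduced in a short Python sketch. The sketch assumes the same data as the example and uses only the standard library. The table layout is only an illustration of the ANOVA in Step 4.

```python
hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)

ss_x = sum(x ** 2 for x in hours) - sum(hours) ** 2 / n
ss_y = sum(y ** 2 for y in temps) - sum(temps) ** 2 / n           # Total SS
sscp = sum(x * y for x, y in zip(hours, temps)) - sum(hours) * sum(temps) / n

ss_reg = sscp ** 2 / ss_x      # SS due to regression (1 df)
ss_res = ss_y - ss_reg         # residual SS by subtraction (n - 2 df)
ms_reg = ss_reg / 1
ms_res = ss_res / (n - 2)      # residual MS, s^2_{Y|X}
f_value = ms_reg / ms_res
r2 = ss_reg / ss_y             # coefficient of determination

print(f"{'SOV':<18}{'df':>3}{'SS':>11}{'MS':>11}{'F':>8}")
print(f"{'Due to regression':<18}{1:>3}{ss_reg:>11.3f}{ms_reg:>11.3f}{f_value:>8.2f}")
print(f"{'Residual':<18}{n - 2:>3}{ss_res:>11.3f}{ms_res:>11.3f}")
print(f"{'Total':<18}{n - 1:>3}{ss_y:>11.3f}")
print(f"r^2 = {r2:.3f}")
```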

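Method 2. t-test

The formula for the t-test to test the hypothesis $H_0: \beta_1 = 0$ is:

$$t = \frac{b_1}{s_b}$$

where $b_1$ = the regression coefficient, and

$$s_b = \sqrt{\frac{s^2_{Y|X}}{SS_X}}$$

For our example:

Step 1. Calculate $s^2_b$.

Remember that

$$s^2_{Y|X} = \text{Residual MS} = \frac{SS_Y - SSCP^2/SS_X}{n-2}$$

We know from previous parts of this example that $SS_Y$ = 5,006.833, $SSCP$ = 567, and $SS_X$ = 70. Therefore:

$$s^2_b = \frac{s^2_{Y|X}}{SS_X} = \frac{\left[5{,}006.833 - \frac{567^2}{70}\right]/(6-2)}{70} = \frac{103.533}{70} = 1.479$$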

Step 2. Calculate the t statistic:

$$t = \frac{b_1}{s_b} = \frac{8.1}{\sqrt{1.479}} = \frac{8.1}{1.216} = 6.66$$

Step 3. Look up the table t value:

$t_{\alpha/2,(n-2)\,df} = t_{0.05/2,\,4\,df} = 2.776$

Step 4. Draw conclusions.

The table t value (2.776) is less than the calculated t value (6.66), so we reject $H_0: \beta_1 = 0$ at the 95% level of confidence. Thus, we can conclude that there is a linear relationship between hours of mixing and wood pulp temperature at the 95% level of confidence.

Method 3. Confidence Interval

The hypothesis $H_0: \beta_1 = 0$ can be tested using the confidence interval:

$$CI = b_1 \pm t_{\alpha/2,(n-2)\,df}\,(s_b)$$

For this example:

$$CI = 8.1 \pm 2.776\sqrt{1.479} = 8.1 \pm 3.376 \quad\Rightarrow\quad 4.724 \le \beta_1 \le 11.476$$

We reject $H_0: \beta_1 = 0$ at the 95% level of confidence because the CI does not include 0.
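Methods 2 and 3 both rest on $s_b$, so they fit in one small Python sketch. The critical value `T_TABLE = 2.776` is hard-coded from the t-table value quoted in the text. This is an assumption made to avoid a SciPy dependency, not a computed quantile.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)
T_TABLE = 2.776   # t(0.05/2, 4 df), tabled value quoted in the text

ss_x = sum(x ** 2 for x in hours) - sum(hours) ** 2 / n
ss_y = sum(y ** 2 for y in temps) - sum(temps) ** 2 / n
sscp = sum(x * y for x, y in zip(hours, temps)) - sum(hours) * sum(temps) / n

b1 = sscp / ss_x
ms_res = (ss_y - sscp ** 2 / ss_x) / (n - 2)   # s^2_{Y|X}
s_b = math.sqrt(ms_res / ss_x)                 # standard error of b1

t_calc = b1 / s_b                                      # Method 2: t-test of H0: beta1 = 0
lower, upper = b1 - T_TABLE * s_b, b1 + T_TABLE * s_b  # Method 3: 95% CI for beta1

print(f"s_b^2 = {s_b ** 2:.3f}, t = {t_calc:.2f}")
print(f"{lower:.3f} <= beta1 <= {upper:.3f}")
print("reject H0" if abs(t_calc) > T_TABLE else "do not reject H0")
```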

Predicting Y Given X

Regression analysis also can be used to predict a value for Y given X. Using the example, we can predict the temperature of one batch of wood pulp after mixing X hours. In this case, we predict an individual outcome of Y|X drawn from the distribution of Y. This estimate is distinct from estimating the mean or average of a distribution of Y. The value of an individual Y at a given X will take on the form of the confidence interval (often called a prediction interval):

$$CI = \hat{Y} \pm t_{\alpha/2,(n-2)\,df}\,(s_{Y|X_0})$$

where

$$s^2_{Y|X_0} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X}\right]$$

Remember that $s^2_{Y|X}$ is the Residual Mean Square.

Example: We wish to determine the temperature of one batch of wood pulp after mixing two hours (i.e., $Y|X=2$).

Step 1. Using the regression equation $\hat{Y} = -3.533 + 8.1X$, solve for $\hat{Y}$ when X = 2:

$$\hat{Y} = -3.533 + 8.1(2) = 12.667$$

Step 2. Solve for $s^2_{Y|X=2}$:

$$s^2_{Y|X=2} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X}\right] = 103.533\left[1 + \frac{1}{6} + \frac{(2-7)^2}{70}\right] = 157.765$$

Step 3. Calculate the confidence interval:

$$CI = \hat{Y} \pm t_{\alpha/2,(n-2)\,df}\,(s_{Y|X=2}) = 12.667 \pm 2.776\sqrt{157.765} = 12.667 \pm 34.868$$

Therefore: LCI = -22.201 and UCI = 47.535.

Note: This CI is not used to test a hypothesis.

This CI states that if we mix the wood pulp for two hours, we would expect the temperature to fall between -22.201 and 47.535 degrees 95% of the time. We would expect the temperature to fall outside of this range 5% of the time due to random chance.

Example: We wish to determine the temperature of one batch of wood pulp after mixing seven hours (i.e., $Y|X=7$).

Step 1. Using the regression equation $\hat{Y} = -3.533 + 8.1X$, solve for $\hat{Y}$ when X = 7:

$$\hat{Y} = -3.533 + 8.1(7) = 53.167$$

Step 2. Solve for $s^2_{Y|X=7}$:

$$s^2_{Y|X=7} = 103.533\left[1 + \frac{1}{6} + \frac{(7-7)^2}{70}\right] = 120.789$$
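The three steps for an individual prediction can be wrapped in one function. The Python sketch below applies it at X = 2 and also at X = 7, the case worked in the next step. `individual_interval` is a hypothetical helper name. `T_TABLE = 2.776` is the tabled t value from the text. The results agree with the hand calculations to rounding.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)
T_TABLE = 2.776   # t(0.05/2, 4 df), tabled value quoted in the text

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
ms_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, temps)) / (n - 2)


def individual_interval(x0):
    """95% interval for ONE new observation of Y at X = x0 (hypothetical helper)."""
    y_hat = b0 + b1 * x0
    var = ms_res * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)   # s^2_{Y|X0}
    half = T_TABLE * math.sqrt(var)
    return y_hat, var, y_hat - half, y_hat + half


for x0 in (2, 7):
    y_hat, var, lo, hi = individual_interval(x0)
    print(f"X = {x0}: Y-hat = {y_hat:.3f}, s^2 = {var:.3f}, LCI = {lo:.3f}, UCI = {hi:.3f}")
```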

Step 3. Calculate the confidence interval:

$$CI = 53.167 \pm 2.776\sqrt{120.789} = 53.167 \pm 30.509$$

Therefore: LCI = 22.658 and UCI = 83.676.

Note: For X = 7 (i.e., at the mean of X), the variance $s^2_{Y|X_0}$ is at a minimum.

This CI states that if we mix the wood pulp for seven hours, we would expect the temperature to fall between 22.658 and 83.676 degrees 95% of the time. We would expect the temperature to fall outside of this range 5% of the time due to random chance.

Predicting the Mean of Y Given X

Regression analysis also can be used to estimate the mean of Y at a given X. Using the example, we can predict the average temperature of wood pulp after mixing X hours. In this case, we estimate the mean of the distribution of Y at that X, rather than an individual outcome drawn from it. The mean of Y at a given X will take on the form of the confidence interval:

$$CI = \hat{Y} \pm t_{\alpha/2,(n-2)\,df}\,(s_{\bar{Y}|X_0})$$

where

$$s^2_{\bar{Y}|X_0} = s^2_{Y|X}\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X}\right]$$

Example: We wish to determine the average temperature of the wood pulp after mixing two hours (i.e., $\bar{Y}|X=2$).

Step 1. Using the regression equation $\hat{Y} = -3.533 + 8.1X$, solve for $\hat{Y}$ when X = 2:

$$\hat{Y} = -3.533 + 8.1(2) = 12.667$$

Step 2. Solve for $s^2_{\bar{Y}|X=2}$:

$$s^2_{\bar{Y}|X=2} = 103.533\left[\frac{1}{6} + \frac{(2-7)^2}{70}\right] = 54.232$$

Step 3. Calculate the confidence interval:

$$CI = 12.667 \pm 2.776\sqrt{54.232} = 12.667 \pm 20.443$$

Therefore: LCI = -7.776 and UCI = 33.110.

Note: This CI is not used to test a hypothesis.

This CI states that if we mix the wood pulp for two hours any number of times, we would expect the average temperature to fall between -7.776 and 33.110 degrees 95% of the time. We would expect the average temperature to fall outside of this range 5% of the time due to random chance.
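For the mean of Y, the only change from the individual case is dropping the leading 1 inside the brackets. The Python sketch below shows this for X = 2. `mean_interval` is a hypothetical helper name. `T_TABLE` is the tabled t value from the text.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)
T_TABLE = 2.776   # t(0.05/2, 4 df), tabled value quoted in the text

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
ms_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, temps)) / (n - 2)


def mean_interval(x0):
    """95% confidence interval for the MEAN of Y at X = x0 (hypothetical helper)."""
    y_hat = b0 + b1 * x0
    var = ms_res * (1 / n + (x0 - x_bar) ** 2 / ss_x)   # s^2_{Ybar|X0}: no leading "1 +"
    half = T_TABLE * math.sqrt(var)
    return y_hat, var, y_hat - half, y_hat + half


y_hat, var, lo, hi = mean_interval(2)
print(f"X = 2: Y-hat = {y_hat:.3f}, s^2 = {var:.3f}, LCI = {lo:.3f}, UCI = {hi:.3f}")
```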

Example: We wish to determine the average temperature of wood pulp after mixing seven hours.

Step 1. Using the regression equation $\hat{Y} = -3.533 + 8.1X$, solve for $\hat{Y}$ when X = 7:

$$\hat{Y} = -3.533 + 8.1(7) = 53.167$$

Step 2. Solve for $s^2_{\bar{Y}|X=7}$:

$$s^2_{\bar{Y}|X=7} = 103.533\left[\frac{1}{6} + \frac{(7-7)^2}{70}\right] = 17.256$$

Step 3. Calculate the confidence interval:

$$CI = 53.167 \pm 2.776\sqrt{17.256} = 53.167 \pm 11.531$$

Therefore: LCI = 41.635 and UCI = 64.698.

Note: For X = 7 (i.e., at the mean of X), the variance $s^2_{\bar{Y}|X_0}$ is at a minimum.
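The note about the minimum at the mean of X follows from the $(X_0 - \bar{X})^2$ term, which vanishes when $X_0 = \bar{X}$. The Python sketch below computes the interval at X = 7. It then scans a grid of X values (an illustrative choice, 2 to 12 in steps of 0.5) to confirm where the variance is smallest.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)
T_TABLE = 2.776   # t(0.05/2, 4 df), tabled value quoted in the text

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
ms_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, temps)) / (n - 2)


def mean_variance(x0):
    """s^2_{Ybar|X0}: variance for estimating the mean of Y at X = x0."""
    return ms_res * (1 / n + (x0 - x_bar) ** 2 / ss_x)


y_hat7 = b0 + b1 * 7
half7 = T_TABLE * math.sqrt(mean_variance(7))
print(f"X = 7: LCI = {y_hat7 - half7:.3f}, UCI = {y_hat7 + half7:.3f}")

# The (X0 - Xbar)^2 term is zero at X0 = Xbar, so the variance is smallest there
grid = [x / 2 for x in range(4, 25)]   # X0 = 2.0, 2.5, ..., 12.0
x_min = min(grid, key=mean_variance)
print(f"variance is smallest at X0 = {x_min}")
```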

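Comparing $s^2_{Y|X_0}$ and $s^2_{\bar{Y}|X_0}$

$s^2_{Y|X_0}$ is always greater than $s^2_{\bar{Y}|X_0}$. Comparing the formulas:

$$s^2_{Y|X_0} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0-\bar{X})^2}{SS_X}\right] \qquad\text{and}\qquad s^2_{\bar{Y}|X_0} = s^2_{Y|X}\left[\frac{1}{n} + \frac{(X_0-\bar{X})^2}{SS_X}\right]$$

Notice that in the formula for $s^2_{Y|X_0}$ you add one, while in the formula for $s^2_{\bar{Y}|X_0}$ you do not.

Comparison of $s^2_{Y|X_0}$ and $s^2_{\bar{Y}|X_0}$:

$X_0$    $s^2_{Y|X_0}$    $s^2_{\bar{Y}|X_0}$
2        157.765          54.232
7        120.789          17.256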
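Because the two formulas differ only by the added 1, the individual variance always exceeds the mean variance by exactly $s^2_{Y|X}$ (the Residual MS). The short Python sketch below rebuilds the comparison table and prints that difference.

```python
hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
ms_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, temps)) / (n - 2)

table = {}
print(f"{'X0':>3}{'individual':>12}{'mean':>10}{'difference':>12}")
for x0 in (2, 7):
    shared = 1 / n + (x0 - x_bar) ** 2 / ss_x
    var_ind = ms_res * (1 + shared)   # s^2_{Y|X0}
    var_mean = ms_res * shared        # s^2_{Ybar|X0}
    table[x0] = (var_ind, var_mean)
    print(f"{x0:>3}{var_ind:>12.3f}{var_mean:>10.3f}{var_ind - var_mean:>12.3f}")
```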

We can draw the two confidence intervals as confidence belts about the regression line.

[Figure: Confidence belts for the effect of hours of mixing on temperature of wood pulp. Lines: LL and UL for an individual Y; LL and UL for Y bar. x-axis: Hours of mixing; y-axis: Temperature]

Notice that:
1. The confidence belts are symmetrical about the regression line.
2. The confidence belts are narrowest at the mean of X.
3. The confidence belts for the distribution based on means are narrower than those for the distribution based on an individual observation.
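The points the figure plotted can be tabulated directly. The Python sketch below computes lower and upper limits for both belts at several X values. The grid of X values is an illustrative choice. It then checks the three observations listed above: the belts are symmetric, the mean belt sits inside the individual belt, and the belts are narrowest at X = 7.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)
T_TABLE = 2.776   # t(0.05/2, 4 df), tabled value quoted in the text

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
ms_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, temps)) / (n - 2)


def belts(x0):
    """Return Y-hat, (LL, UL) for an individual Y, and (LL, UL) for the mean of Y."""
    y_hat = b0 + b1 * x0
    shared = 1 / n + (x0 - x_bar) ** 2 / ss_x
    h_ind = T_TABLE * math.sqrt(ms_res * (1 + shared))   # half-width, individual Y
    h_mean = T_TABLE * math.sqrt(ms_res * shared)        # half-width, mean of Y
    return y_hat, (y_hat - h_ind, y_hat + h_ind), (y_hat - h_mean, y_hat + h_mean)


rows = {}
print(f"{'X':>3}{'LL-Ind':>10}{'UL-Ind':>10}{'LL-Mean':>10}{'UL-Mean':>10}")
for x0 in (2, 4, 6, 7, 8, 10, 12):
    y_hat, ind, mean = belts(x0)
    rows[x0] = (y_hat, ind, mean)
    print(f"{x0:>3}{ind[0]:>10.2f}{ind[1]:>10.2f}{mean[0]:>10.2f}{mean[1]:>10.2f}")
```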

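Determining if Two Independent Regression Coefficients are Different

It may be desirable to test the homogeneity of two b's to determine if they are estimates of the same β. This can be done using a t-test to test the hypotheses:

$H_0: \beta = \beta'$
$H_A: \beta \neq \beta'$

where

$$t = \frac{b - b'}{\sqrt{s^2_p\left(\frac{1}{SS_X} + \frac{1}{SS_{X'}}\right)}}, \qquad s^2_p = \frac{(n-2)\,\text{Residual MS} + (n'-2)\,\text{Residual MS}'}{(n-2) + (n'-2)}$$

When n = n', as in this example, $s^2_p$ is simply the average of the two residual mean squares. The table t-value has (n - 2) + (n' - 2) df.

Example:

X     Y     Y'
1     11    21
2     16    29
3     14    34
4     18    45
5     23    58

ΣX = 15     ΣY = 82       ΣY' = 187
ΣX² = 55    ΣY² = 1,426   ΣY'² = 7,827

Step 1. Determine the regression coefficient for each Y.

For Y, ΣXY = 272. Thus b = [272 - (15 × 82)/5] / [55 - 15²/5] = 26/10 = 2.6

For Y', ΣXY' = 651. Thus b' = [651 - (15 × 187)/5] / [55 - 15²/5] = 90/10 = 9.0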

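Step 2. Calculate Residual MS for each Y.

Remember:

$$\text{Residual MS} = \frac{SS_Y - SSCP^2/SS_X}{n-2}$$

For Y:

$$\text{Residual MS} = \frac{\left[1{,}426 - \frac{82^2}{5}\right] - \frac{26^2}{10}}{5-2} = \frac{81.2 - 67.6}{3} = 4.533$$

For Y':

$$\text{Residual MS}' = \frac{\left[7{,}827 - \frac{187^2}{5}\right] - \frac{90^2}{10}}{5-2} = \frac{833.2 - 810}{3} = 7.733$$

Step 3. Solve for t.

$$s^2_p = \frac{3(4.533) + 3(7.733)}{6} = 6.133$$

$$t = \frac{2.6 - 9.0}{\sqrt{6.133\left(\frac{1}{10} + \frac{1}{10}\right)}} = \frac{-6.4}{\sqrt{1.227}} = -5.78$$

Step 4. Look up the table t-value with (n - 2) + (n' - 2) = 6 df:

$t_{0.05/2,\,6\,df} = 2.447$

Step 5. Draw conclusions.

The absolute value of the calculated t-value (-5.78) is greater than the tabular t-value (2.447). We can therefore conclude at the 95% level of confidence that the two regression coefficients are not estimating the same β.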
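The whole comparison can be run end to end in Python. This sketch pools the two residual mean squares, weighting each by its (n - 2) degrees of freedom. `slope_and_residual_ms` is a hypothetical helper name. `T_TABLE = 2.447` is the tabled t value for 6 df used in the text.

```python
import math

x = [1, 2, 3, 4, 5]
y = [11, 16, 14, 18, 23]         # Y
y_prime = [21, 29, 34, 45, 58]   # Y'
T_TABLE = 2.447                  # t(0.05/2, 6 df), tabled value quoted in the text


def slope_and_residual_ms(xs, ys):
    """Return (b, Residual MS, SS_X, residual df) from the computational formulas."""
    n = len(xs)
    ss_x = sum(v * v for v in xs) - sum(xs) ** 2 / n
    ss_y = sum(v * v for v in ys) - sum(ys) ** 2 / n
    sscp = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sscp / ss_x, (ss_y - sscp ** 2 / ss_x) / (n - 2), ss_x, n - 2


b, ms, ss_x, df = slope_and_residual_ms(x, y)
b_p, ms_p, ss_x_p, df_p = slope_and_residual_ms(x, y_prime)

# Pool the two residual mean squares, weighting each by its degrees of freedom
pooled_ms = (df * ms + df_p * ms_p) / (df + df_p)
t_calc = (b - b_p) / math.sqrt(pooled_ms * (1 / ss_x + 1 / ss_x_p))

print(f"b = {b:.1f}, b' = {b_p:.1f}, Residual MS = {ms:.3f} and {ms_p:.3f}")
print(f"t = {t_calc:.2f} with {df + df_p} df")
print("slopes differ" if abs(t_calc) > T_TABLE else "no evidence that the slopes differ")
```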

Summary: Some Uses of Regression

1. Determine if there is a linear relationship between an independent and a dependent variable.
2. Predict values of Y at a given X. Predictions are most accurate near the mean of X. You should avoid predicting values of Y outside the range of the independent variable values that were used.
3. Adjust Y to a common base by removing the effect of the independent variable (Analysis of Covariance).
4. ANOVA (CRD, RCBD, and LS) can be done using regression.
5. Compare the homogeneity of two regression coefficients.

SAS Commands

    options pageno=1;
    data reg;
      input x y;
      datalines;
    2 21
    4 27
    6 29
    8 64
    10 86
    12 92
    ;
    proc reg;
      /* cli = 95% prediction limits for individual values; clm = 95% confidence limits for the mean */
      model y=x/cli clm;
      title 'SAS Output for Linear Regression Example in Class';
    run;
    quit;

SAS Output for Linear Regression Example in Class

The REG Procedure
Model: MODEL1
Dependent Variable: y

Number of Observations Read    6
Number of Observations Used    6

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       4592.70000    4592.70000     44.36   0.0026
Error              4        414.13333     103.53333
Corrected Total    5       5006.83333

Root MSE          10.17513    R-Square    0.9173
Dependent Mean    53.16667    Adj R-Sq    0.8966
Coeff Var         19.13818

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -3.53333          9.47253     -0.37     0.7281
x            1              8.10000          1.21616      6.66     0.0026
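The summary statistics in this listing follow from quantities already computed by hand. The Python sketch below reproduces them. It assumes SAS's usual definitions for a model with an intercept: Root MSE = √(Residual MS), Adj R-Sq = 1 - (1 - R²)(n - 1)/(n - p), and Coeff Var = 100 × Root MSE / mean of y. It is a check, not SAS's own code.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n, p = len(hours), 2   # p = number of parameters (intercept and slope)

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, temps))
ss_tot = sum((y - y_bar) ** 2 for y in temps)
ms_res = ss_res / (n - p)

root_mse = math.sqrt(ms_res)
r_square = 1 - ss_res / ss_tot
adj_r_sq = 1 - (1 - r_square) * (n - 1) / (n - p)
coeff_var = 100 * root_mse / y_bar
se_b0 = math.sqrt(ms_res * (1 / n + x_bar ** 2 / ss_x))   # standard error of intercept
se_b1 = math.sqrt(ms_res / ss_x)                           # standard error of slope

print(f"Root MSE {root_mse:.5f}   R-Square {r_square:.4f}   Adj R-Sq {adj_r_sq:.4f}")
print(f"Coeff Var {coeff_var:.5f}")
print(f"Intercept {b0:.5f} (SE {se_b0:.5f}, t {b0 / se_b0:.2f})")
print(f"x {b1:.5f} (SE {se_b1:.5f}, t {b1 / se_b1:.2f})")
```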

Output Statistics

      Dependent   Predicted      Std Error
Obs    Variable       Value   Mean Predict       95% CL Mean          95% CL Predict      Residual
  1     21.0000     12.6667         7.3642    -7.7797    33.1130   -22.2068    47.5401     8.3333
  2     27.0000     28.8667         5.5287    13.5164    44.2169    -3.2850    61.0184    -1.8667
  3     29.0000     45.0667         4.3283    33.0492    57.0841    14.3662    75.7672   -16.0667
  4     64.0000     61.2667         4.3283    49.2492    73.2841    30.5662    91.9672     2.7333
  5     86.0000     77.4667         5.5287    62.1164    92.8169    45.3150   109.6184     8.5333
  6     92.0000     93.6667         7.3642    73.2203   114.1130    58.7932   128.5401    -1.6667

Sum of Residuals                         0
Sum of Squared Residuals         414.13333
Predicted Residual SS (PRESS)    868.05699
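Two columns of this listing depend on each observation's leverage, $h_{ii} = 1/n + (X_i - \bar{X})^2/SS_X$. The Std Error Mean Predict column is $\sqrt{s^2_{Y|X}\,h_{ii}}$. PRESS is the sum of squared "deleted" residuals $e_i/(1 - h_{ii})$. The Python sketch below reproduces both as a cross-check. The variable names are illustrative.

```python
import math

hours = [2, 4, 6, 8, 10, 12]
temps = [21, 27, 29, 64, 86, 92]
n = len(hours)

x_bar, y_bar = sum(hours) / n, sum(temps) / n
ss_x = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, temps)) / ss_x
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in zip(hours, temps)]
ms_res = sum(e * e for e in resid) / (n - 2)

leverage = [1 / n + (x - x_bar) ** 2 / ss_x for x in hours]   # hat-matrix diagonal h_ii
se_mean_pred = [math.sqrt(ms_res * h) for h in leverage]      # "Std Error Mean Predict"
press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, leverage))

for obs, (e, se) in enumerate(zip(resid, se_mean_pred), start=1):
    print(f"{obs}  residual {e:9.4f}  std error mean predict {se:.4f}")
print(f"Sum of squared residuals {sum(e * e for e in resid):.5f}")
print(f"PRESS {press:.5f}")
```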