Simple Regression. Acknowledgement. These slides are based on presentations created and copyrighted by Prof. Daniel Menasce (GMU), CS 700.


Basics. Purpose of regression analysis: predict the value of a dependent or response variable from the values of at least one explanatory or independent variable (also called predictors or factors). Purpose of correlation analysis: measure the strength of the correlation between two variables.
(Figure: Linear Relationship, scatter plot of y vs. x.)

(Figures: No Relationship; Negative Curvilinear.)

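Simple Linear Regression: Residual Error. (Figure: observed y, estimated y, and the error between them, plotted against x.)
Simple Linear Regression: Selecting the best line. Minimize the sum of the squares of the errors: the least-squares criterion. (Figure: errors between observed and estimated y, plotted against x.)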

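Linear Regression. $\hat{Y}_i = b_0 + b_1 X_i$, where $\hat{Y}_i$ is the predicted value of Y for observation i and $Y_i$ is the observed value for observation i. $b_0$ and $b_1$ are chosen to minimize
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$
subject to $\sum_{i=1}^{n} e_i = 0$.
Method of Least Squares:
$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$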
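A minimal Python sketch of the least-squares formulas above. The function name `fit_simple_regression` and the toy data (x = 1..5, y = 2, 4, 5, 4, 5) are hypothetical illustrations, not the CPU-time measurements from the slides.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) from the closed-form formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b1 = (sum X_i*Y_i - n*Xbar*Ybar) / (sum X_i^2 - n*Xbar^2)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # b0 = Ybar - b1*Xbar
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data, used only to illustrate the formulas.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_regression(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 4), round(b1, 4))      # 2.2 0.6
print(abs(sum(residuals)) < 1e-9)      # True: the errors sum to zero
```

The last line checks the constraint $\sum e_i = 0$, which the least-squares solution satisfies automatically when an intercept is fitted.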

Linear Regression Example. Data: number of I/Os (x = 1, 2, ..., 10) and measured CPU time in seconds (y). For each observation, the worksheet lists x, y, the estimate $\hat{y} = b_0 + b_1 x$, the error $y - \hat{y}$, and the squared error. It also lists the summary values $\bar{X} = 5.5$, $\sum x^2 = 385$, $\bar{Y}$, $\sum xy$, $b_0$, and $b_1$.
(Figure: CPU time (sec) vs. number of I/Os, with the fitted line CPU time = $b_0 + b_1 \times$ No. I/Os; $R^2 \approx 0.9877$.)

Allocation of Variation. With no regression model, the mean is used as the predicted value, and the SSE is then
$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ (sum of squares total).
$SSR = SST - SSE$ is the sum of squares explained by the regression. SSE is the variation not explained by the regression.
Allocation of Variation. Coefficient of determination ($R^2$): the fraction of the variation explained by the regression.
$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$
The closer $R^2$ is to one, the better the regression model.
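The following sketch splits the total variation as described above. The function name and the toy data with the fitted line $\hat{y} = 2.2 + 0.6x$ are hypothetical illustrations.

```python
def allocation_of_variation(x, y, b0, b1):
    """Split total variation SST into explained (SSR) and unexplained (SSE) parts."""
    n = len(y)
    y_bar = sum(y) / n
    y_hat = [b0 + b1 * xi for xi in x]                      # regression predictions
    sst = sum((yi - y_bar) ** 2 for yi in y)                # errors of the mean-only model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # not explained by regression
    ssr = sst - sse                                         # explained by regression
    return sst, sse, ssr, ssr / sst                         # last item is R^2


sst, sse, ssr, r2 = allocation_of_variation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
print(round(sst, 6), round(sse, 6), round(ssr, 6), round(r2, 6))  # 6.0 2.4 3.6 0.6
```

Here the regression explains 60% of the variation around the mean.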

Example (continued). The worksheet adds the values SSY, SST, SSR, and $R^2$. For the CPU time data, $R^2 \approx 0.9877$.
$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = SSY - SS0$, where $SSY = \sum_{i=1}^{n} Y_i^2$ and $SS0 = n\bar{Y}^2$
$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = SST - SSE$
$R^2 = \frac{SSR}{SST}$ is the coefficient of determination. The higher the value of $R^2$, the better the regression.
Standard Deviation of Errors. To obtain the variance of the errors, divide the sum of squared errors (SSE) by the number of degrees of freedom ($n - 2$, since two regression parameters need to be computed first):
$s_e^2 = \frac{SSE}{n - 2}$ = mean squared error (MSE)
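A small sketch of the decomposition $SST = SSY - SS0$ and of $s_e = \sqrt{SSE/(n-2)}$. The function name and toy data (fitted line $\hat{y} = 2.2 + 0.6x$) are hypothetical.

```python
import math


def error_std_dev(x, y, b0, b1):
    """Return SSY, SS0, SST, MSE and s_e for a fitted simple regression."""
    n = len(y)
    y_bar = sum(y) / n
    ssy = sum(yi ** 2 for yi in y)      # SSY = sum of Y_i^2
    ss0 = n * y_bar ** 2                # SS0 = n * Ybar^2
    sst = ssy - ss0                     # SST = SSY - SS0
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                 # n - 2 degrees of freedom: b0 and b1 were estimated
    return ssy, ss0, sst, mse, math.sqrt(mse)


ssy, ss0, sst, mse, s_e = error_std_dev([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
print(ssy, ss0, sst, round(mse, 6), round(s_e, 4))  # 86 80.0 6.0 0.8 0.8944
```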

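Degrees of freedom of the various sums of squares:
SST: $n - 1$ (need to compute $\bar{Y}$)
SSY: $n$ (does not depend on any other parameter)
SS0: 1
SSE: $n - 2$ (need to compute two regression parameters)
SSR = SST − SSE: 1
Degrees of freedom add as the sums of squares do: $SSY = SS0 + SSR + SSE$ and $n = 1 + 1 + (n - 2)$.
Confidence Interval for Regression Parameters. $b_0$ and $b_1$ were computed from a sample, so they are only estimates of the true parameters $\beta_0$ and $\beta_1$ of the true model. Standard deviations for $b_0$ and $b_1$:
$s_{b_0} = s_e \left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} \right]^{1/2}$
$s_{b_1} = \frac{s_e}{\left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]^{1/2}}$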
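A direct transcription of the two standard-deviation formulas above. The function name is hypothetical. The value $s_e = \sqrt{0.8}$ comes from the same made-up toy data (x = 1..5, y = 2, 4, 5, 4, 5), not from the slides.

```python
import math


def parameter_std_devs(x, s_e):
    """Standard deviations s_b0 and s_b1 of the least-squares estimates."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2   # sum X_i^2 - n * Xbar^2
    s_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
    s_b1 = s_e / math.sqrt(sxx)
    return s_b0, s_b1


s_b0, s_b1 = parameter_std_devs([1, 2, 3, 4, 5], math.sqrt(0.8))
print(round(s_b0, 4), round(s_b1, 4))  # 0.9381 0.2828
```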

Confidence Interval for Regression Parameters. $100(1 - \alpha)\%$ confidence intervals for $b_0$ and $b_1$:
$b_0 \pm t_{[1 - \alpha/2;\, n - 2]}\, s_{b_0}$
$b_1 \pm t_{[1 - \alpha/2;\, n - 2]}\, s_{b_1}$
Confidence Interval Example. For the CPU time data ($n = 10$, $\bar{X} = 5.5$, $\sum x^2 = 385$), the worksheet computes SSE, $s_e$, $s_{b_0}$, $s_{b_1}$, and the lower and upper bounds for $b_0$ and $b_1$ at a 95% confidence level ($\alpha = 0.05$, so $t_{[0.975;\, 8]} \approx 2.306$). It also shows SST, SSR, and $R^2 \approx 0.9877$.
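An end-to-end sketch that fits the line, computes $s_e$, $s_{b_0}$, and $s_{b_1}$, and forms the intervals. To keep it standard-library only, the caller passes the quantile $t_{[1-\alpha/2;\, n-2]}$ taken from a t table. This design choice is an assumption of the sketch. The function name and toy data are hypothetical. With n = 5 at 95%, $t_{[0.975;\, 3]} \approx 3.182$.

```python
import math


def regression_parameter_cis(x, y, t_quantile):
    """100(1 - alpha)% confidence intervals for b0 and b1.

    t_quantile is t[1 - alpha/2; n - 2], looked up by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))                      # n - 2 degrees of freedom
    s_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
    s_b1 = s_e / math.sqrt(sxx)
    ci_b0 = (b0 - t_quantile * s_b0, b0 + t_quantile * s_b0)
    ci_b1 = (b1 - t_quantile * s_b1, b1 + t_quantile * s_b1)
    return ci_b0, ci_b1


# Toy data, 95% level: t[0.975; 3] is about 3.182.
ci_b0, ci_b1 = regression_parameter_cis([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
```

For this tiny sample, both intervals contain zero. At the 95% level, neither parameter is significantly different from zero, even though $R^2 = 0.6$.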

Confidence Interval for the Predicted Value. The standard deviation of the mean of a future sample of m observations at $X = X_p$ is
$s_{\hat{y}_{mp}} = s_e \left[ \frac{1}{m} + \frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} \right]^{1/2}$
As the future sample size (m) increases, the standard deviation of the predicted value decreases.
Confidence Interval for the Predicted Value. $100(1 - \alpha)\%$ confidence interval for the predicted value of a future sample of size m at $X_p$:
$\hat{y}_p \pm t_{[1 - \alpha/2;\, n - 2]}\, s_{\hat{y}_{mp}}$
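A sketch of the prediction interval that also shows the effect of m. As above, the t quantile is supplied by the caller, and the function name and toy data are hypothetical.

```python
import math


def predicted_mean_ci(x, y, x_p, m, t_quantile):
    """Predicted mean at x_p for a future sample of size m, its std dev, and CI."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat_p = b0 + b1 * x_p
    # 1/m shrinks as the future sample grows; the other two terms do not.
    s_pred = s_e * math.sqrt(1 / m + 1 / n + (x_p - x_bar) ** 2 / sxx)
    ci = (y_hat_p - t_quantile * s_pred, y_hat_p + t_quantile * s_pred)
    return y_hat_p, s_pred, ci


x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
_, s_single, _ = predicted_mean_ci(x, y, 3, 1, 3.182)     # one future observation
y_hat, s_mean10, ci = predicted_mean_ci(x, y, 3, 10, 3.182)  # mean of 10 observations
print(round(s_single, 4), round(s_mean10, 4))  # 0.9798 0.4899
```

Even as m grows without bound, the standard deviation does not reach zero. It approaches the uncertainty of the fitted line itself, $s_e [1/n + (X_p - \bar{X})^2 / (\sum X_i^2 - n\bar{X}^2)]^{1/2}$.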

Linear Regression Assumptions. Linear relationship between the response (y) and the predictor (x). The predictor (x) is non-stochastic and is measured without any error. Errors are statistically independent. Errors are normally distributed with zero mean and a constant standard deviation.
Linear Regression Assumptions: linear relationship between the response (y) and the predictor (x). (Figure panels, y vs. x: linear; piecewise-linear; possible outlier; non-linear.)

Linear Regression Assumptions: errors are statistically independent. (Figure panels, residual vs. predicted response: no trend; trend; trend.)
Linear Regression Assumptions: errors are normally distributed. (Figure panels, residual quantile vs. normal quantile: normally distributed errors; non-normally distributed errors.)
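The normal quantile-quantile plot described above can be produced from (normal quantile, residual quantile) pairs. This sketch uses the plotting positions $(i - 0.5)/n$, which is one common choice and an assumption here. It also uses `statistics.NormalDist` from the Python standard library (3.8+). The function name and the toy residuals (from the fitted line $\hat{y} = 2.2 + 0.6x$ on made-up data) are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Pairs (normal quantile, residual quantile) for a normal Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)          # sample quantiles of the residuals
    std_normal = NormalDist()            # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), r)
            for i, r in enumerate(ordered, start=1)]


points = normal_qq_points([-0.8, 0.6, 1.0, -0.6, -0.2])
for q, r in points:
    print(f"{q:+.4f}  {r:+.2f}")
```

If the errors are normal, the points fall roughly on a straight line. Plotting the same residuals against the predicted responses checks the independence and constant-spread assumptions.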

Linear Regression Assumptions: errors have a constant standard deviation. (Figure panels, residual vs. predicted response: no trend in spread; increasing spread.)
Other Regression Models

Multiple Linear Regression. Used to predict the value of the response variable as a function of k predictor variables $x_1, \ldots, x_k$:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
It is similar to simple linear regression. MS Excel can be used to do multiple linear regression.
Example data:
CPU Time ($y_i$) | I/O Time ($x_{1i}$) | Memory Requirement ($x_{2i}$)
2 | 14 | 70
5 | 16 | 75
7 | 27 | 144
9 | 42 | 190
10 | 39 | 210
13 | 50 | 235
20 | 83 | 400
Want to find: CPUTime = $b_0$ + $b_1$ * I/OTime + $b_2$ * MemoryRequirement
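Instead of MS Excel, the coefficients can be computed from the normal equations $(X^T X)\, b = X^T y$, where X has a leading column of ones. The sketch below solves them with Gaussian elimination in plain Python. The helper names are hypothetical. The data are the seven (CPU time, I/O time, memory requirement) observations of this example. Given their magnitudes, the normal equations are well enough conditioned for double precision, but a QR-based solver is usually preferred for larger or nearly collinear problems.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(predictors, y):
    """Least-squares [b0, b1, ..., bk]; predictors is a list of columns."""
    columns = [[1.0] * len(y)] + [list(col) for col in predictors]
    xtx = [[sum(ci * cj for ci, cj in zip(col_i, col_j)) for col_j in columns]
           for col_i in columns]
    xty = [sum(ci * yi for ci, yi in zip(col, y)) for col in columns]
    return solve_linear_system(xtx, xty)


cpu_time = [2, 5, 7, 9, 10, 13, 20]
io_time = [14, 16, 27, 42, 39, 50, 83]
memory = [70, 75, 144, 190, 210, 235, 400]
b0, b1, b2 = fit_multiple_regression([io_time, memory], cpu_time)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The printed coefficients are approximately $b_0 \approx -0.1614$, $b_1 \approx 0.1182$, $b_2 \approx 0.0265$.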

SUMMARY OUTPUT (MS Excel). Regression statistics: Multiple R ≈ 0.987, R Square ≈ 0.974, Observations = 7. Coefficients: Intercept $b_0 \approx -0.1614$, X Variable 1 $b_1 \approx 0.1182$, X Variable 2 $b_2 \approx 0.0265$. The output also lists each coefficient's standard error, t Stat, and Lower/Upper 95% and 90% bounds. Every one of these intervals has a negative lower bound and a positive upper bound, so it includes zero.
Curvilinear Regression. Approach: plot a scatter plot. If it does not look linear, try non-linear models that can be transformed into linear ones:
Non-linear → Linear
$y = a + b/x$ → $y = a + b(1/x)$
$y = 1/(a + bx)$ → $(1/y) = a + bx$
$y = x/(a + bx)$ → $(x/y) = a + bx$
$y = a b^x$ → $\ln y = \ln a + x \ln b$
$y = a + b x^n$ → $y = a + b(x^n)$
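As one instance of the table above, the model $y = a b^x$ can be fitted by regressing $\ln y$ on x and transforming the intercept and slope back. The function name and the exact toy data ($y = 2 \cdot 3^x$) are hypothetical. The sketch assumes every $y > 0$.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * b**x by simple linear regression of ln(y) on x (requires y > 0)."""
    n = len(x)
    ly = [math.log(yi) for yi in y]
    x_bar, ly_bar = sum(x) / n, sum(ly) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, ly))
    slope = sxy / sxx                     # estimate of ln b
    intercept = ly_bar - slope * x_bar    # estimate of ln a
    return math.exp(intercept), math.exp(slope)


a, b = fit_exponential([0, 1, 2, 3], [2, 6, 18, 54])   # exactly y = 2 * 3**x
print(round(a, 6), round(b, 6))  # 2.0 3.0
```

Least squares on the transformed scale minimizes errors in $\ln y$, not in y. The fitted a and b can therefore differ from a direct non-linear fit when the data are noisy.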
