Regression Analysis

  Simple Regression
  Multivariate Regression
  Stepwise Regression
  Replication and Prediction Error

Regression Analysis

In general, we "fit" a model by minimizing a metric that represents the error:

$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

The sum of squares gives closed-form solutions and minimum variance for linear models.

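The Simplest Regression Model

Line through the origin:

$y_u = \beta x_u + \varepsilon_u, \quad u = 1, 2, \ldots, n, \quad \varepsilon_u \sim N(0, \sigma_R^2)$

$\min S = \min \sum_{u=1}^{n} (y_u - \beta x_u)^2$

$\hat{y} = b x, \qquad \eta_u = \beta x_u$

  $s^2$: estimate of $\sigma_R^2$
  $b$: estimate of $\beta$
  $\hat{y}$: estimate of $\eta_u$, the true value of the model.

[Figure: data scattered about the fitted line $\hat{y} = bx$ through the origin; axes x and y.]

Using the Normal Equation

$\min \sum (y - \hat{y})^2$

[Figure: observation vector $y$ and model vector $\hat{y} = bx$ (1 d.f.) in the sample space with axes $y_1$, $y_2$.]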

Using the Normal Equation (cont)

Choose b so that the residual vector is perpendicular to the model vector:

$\sum (y - \hat{y})\, x = 0$
$\sum (y - bx)\, x = 0$
$b = \frac{\sum xy}{\sum x^2}$  (est. of $\beta$)
$s^2 = \frac{S_R}{n-1}$  (est. of $\sigma_R^2$)
$V(b) = \frac{s^2}{\sum x^2}$
67% conf: $b \pm \sqrt{s^2 / \sum x^2}$
Signif. test: $t = \frac{b - \beta^*}{\sqrt{s^2 / \sum x^2}} \sim t_{n-1}$

Etch time vs removed material: y = bx

[Figure: Removed (nm) vs Etch Time (sec) x 10^3.]

Data File: regression
Dependent Variable: Removed (nm)

Variable Name     Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Etch Time (sec)   5.0098e-1     1.6199e-2            3.0927e+1     1.33e-8
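The through-origin formulas above (b, s^2, V(b) and the t statistic) can be sketched in a few lines of Python. This is a minimal illustration, not the software that produced the etch table: the function name `fit_through_origin` and the data are hypothetical, and only the standard library is used.

```python
import math

def fit_through_origin(x, y, beta_star=0.0):
    """Least-squares fit of y = b*x using the normal equation sum((y - b*x)*x) = 0."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy / sxx                                            # estimate of beta
    s_r = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))    # residual sum of squares S_R
    s2 = s_r / (n - 1)                                       # estimate of sigma_R^2 (n-1 d.f.)
    se_b = math.sqrt(s2 / sxx)                               # square root of V(b)
    t = (b - beta_star) / se_b                               # compare with t_{n-1}
    return b, s2, se_b, t

# Hypothetical data, NOT the etch data from the slides.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b, s2, se_b, t = fit_through_origin(x, y)
print(b, s2, se_b, t)
```

The interval `b - se_b` to `b + se_b` corresponds to the one-standard-error ("67% conf") limits quoted on the slide.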

Model Validation through ANOVA

The idea is to decompose the sum of squares into orthogonal components.

Assuming that there is no dependence: $H_0: \beta^* = 0$

$\sum y_u^2 = \sum \hat{y}_u^2 + \sum (y_u - \hat{y}_u)^2$

  d.f.:  n (total) = p (model) + (n - p) (residual)

Model Validation through ANOVA (cont)

Assuming a specific model: $H_0: \beta^* = b$

$\sum (y_u - \beta^* x_u)^2 = \sum (\hat{y}_u - \beta^* x_u)^2 + \sum (y_u - \hat{y}_u)^2$

  d.f.:  n (total) = p (model) + (n - p) (residual)

The ANOVA table will answer the question: Is there a relationship between x and y?
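As a rough sketch of the decomposition for the no-dependence hypothesis, the code below splits the total sum of squares of a through-origin fit into model and residual parts and forms the F ratio from the mean squares. The helper name `anova_through_origin` and the data are hypothetical, chosen only for illustration.

```python
def anova_through_origin(x, y):
    """Split sum(y^2) into model and residual parts for the fit y = b*x."""
    n, p = len(x), 1                       # one fitted parameter
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    fitted = [b * xi for xi in x]
    ss_total = sum(yi * yi for yi in y)                         # n d.f.
    ss_model = sum(f * f for f in fitted)                       # p d.f.
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # n - p d.f.
    f_ratio = (ss_model / p) / (ss_resid / (n - p))
    return ss_total, ss_model, ss_resid, f_ratio

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
ss_total, ss_model, ss_resid, f_ratio = anova_through_origin(x, y)
print(ss_total, ss_model + ss_resid, f_ratio)
```

The two sums printed first agree because the residual vector is perpendicular to the model vector, which is exactly what the normal equation enforces.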

ANOVA table and Residual Plot

Data File: regression

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio     Prob>F
Model    1.8293e+5        1                 1.8293e+5      1.9801e+2   2.17e-6
Error    6.4669e+3        7                 9.2385e+2
Total    1.8939e+5        8

Coefficient of Determination   9.6585e-1
Coefficient of Correlation     9.8278e-1
Standard Error of Estimate     3.0395e+1
Durbin-Watson Statistic        2.9730e+0

[Figure: Residuals vs Etch Time (sec) x 10^3.]

A More Complex Regression Equation

actual:     $\eta_i = \alpha + \beta (x_i - \bar{x})$
estimated:  $\hat{y}_i = a + b (x_i - \bar{x})$

$y_i \sim N(\eta_i, \sigma^2)$

Minimize $R = \sum (y_i - \hat{y}_i)^2$ to estimate $\alpha$ and $\beta$:

$a = \bar{y}$
$b = \frac{\sum (x_i - \bar{x})\, y_i}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Are a and b good estimators of $\alpha$ and $\beta$?

$E[a] = \alpha$
$E[b] = \frac{\sum (x_i - \bar{x})\, E[y_i]}{\sum (x_i - \bar{x})^2} = \beta$
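The centered estimates a and b can be computed directly from their definitions. The sketch below uses hypothetical data and a hypothetical helper name (`fit_centered`), and checks that the two expressions for b give the same value.

```python
def fit_centered(x, y):
    """Fit y = a + b*(x - xbar) by least squares; returns a, both forms of b, and xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a = ybar                                                      # a = mean of y
    b = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    # Equivalent form: subtracting ybar changes nothing because sum(x - xbar) = 0.
    b_alt = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return a, b, b_alt, xbar

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 5.9, 8.2, 9.8]
a, b, b_alt, xbar = fit_centered(x, y)
print(a, b, b_alt)
```

Note that a is the fitted value at x = xbar, so it differs from the "Constant" (the value at x = 0) that regression software usually reports; the two are related by Constant = a - b*xbar.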

Variance Estimation:

Note that all variability comes from y!

$V[a] = V\left[ \frac{\sum y_i}{n} \right] = \frac{1}{n^2} \sum V[y_i] = \frac{\sigma^2}{n}$

$V[b] = V\left[ \frac{\sum (x_i - \bar{x})\, y_i}{\sum (x_i - \bar{x})^2} \right] = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$

Minimum variance thanks to least squares!

LTO thickness vs deposition time: y = a + bx

[Figure: LTO thick A x 10^3 vs Dep time x 10^3.]

Data File: regression
Dependent Variable: LTO thick A

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        6.0352e+1     5.6058e+1            1.0766e+0     2.98e-1
Dep time        9.7456e-1     2.5155e-2            3.8743e+1     3.02e-17
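To make the variance formulas concrete, the sketch below estimates sigma^2 from the residuals and turns V[a] and V[b] into standard errors. It assumes sigma^2 is estimated with n - 2 degrees of freedom (two fitted parameters); the function name and the data are hypothetical.

```python
import math

def centered_fit_with_errors(x, y):
    """Fit y = a + b*(x - xbar) and return standard errors of a and b."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a = sum(y) / n
    b = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    resid_ss = sum((yi - (a + b * (xi - xbar))) ** 2 for xi, yi in zip(x, y))
    s2 = resid_ss / (n - 2)          # assumption: n - 2 residual d.f.
    se_a = math.sqrt(s2 / n)         # from V[a] = sigma^2 / n
    se_b = math.sqrt(s2 / sxx)       # from V[b] = sigma^2 / sum((x - xbar)^2)
    return a, b, s2, se_a, se_b

# Hypothetical data, NOT the LTO data from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 5.9, 8.2, 9.8]
a, b, s2, se_a, se_b = centered_fit_with_errors(x, y)
print(b / se_b)   # t statistic for the slope
```

The same ratio appears in the table above: the t Statistic for "Dep time" is the Coefficient divided by its Std. Err. Estimate, 9.7456e-1 / 2.5155e-2, which is about 38.74.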

ANOVA table and Residual Plot

Data File: regression

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio     Prob>F
Model    4.7725e+6        1                 4.7725e+6      1.5010e+3   3.02e-17
Error    5.0872e+4        16                3.1795e+3
Total    4.8233e+6        17

Coefficient of Determination   9.8945e-1
Coefficient of Correlation     9.9471e-1
Standard Error of Estimate     5.6387e+1
Durbin-Watson Statistic        2.3417e+0

[Figure: Residuals vs Dep time x 10^3.]

ANOVA Representation

[Figure: a data point $(x_i, y_i)$ shown against the estimated line $\hat{y}_i = a + b(x_i - \bar{x})$ and the true line $\eta_i = \alpha + \beta(x_i - \bar{x})$, with the distances $(y_i - \hat{y}_i)$, $(\hat{y}_i - \eta_i)$, $(y_i - \eta_i)$, $(a - \alpha)$, $b(x_i - \bar{x})$ and $\beta(x_i - \bar{x})$ marked.]

Note differences between "true" and "estimated" model.

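ANOVA Representation (cont)

$(y_i - \eta_i) = (a - \alpha) + (b - \beta)(x_i - \bar{x}) + (y_i - \hat{y}_i)$

$\sum (y_i - \eta_i)^2 = n (a - \alpha)^2 + (b - \beta)^2 \sum (x_i - \bar{x})^2 + \sum (y_i - \hat{y}_i)^2$

  d.f.:          n                 =  1                 +  1                 +  (n - 2)
  distribution:  $\sigma^2 \chi^2(n)$     $\sigma^2 \chi^2(1)$     $\sigma^2 \chi^2(1)$     $\sigma^2 \chi^2(n-2)$

In this way, the significance of the model can be analyzed in detail.

Confidence Limits of an Estimate

$\hat{y}_0 = \bar{y} + b (x_0 - \bar{x})$

$V(\hat{y}_0) = V(\bar{y}) + (x_0 - \bar{x})^2 V(b)$

$V(\hat{y}_0) = \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] s^2$

prediction interval: $\hat{y}_0 \pm t_{\alpha/2} \sqrt{V(\hat{y}_0)}$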
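The variance of the estimate at a new point x0 can be sketched as follows. The data and s^2 are hypothetical, and the critical t value is passed in as a plain number taken from a t table (3.182 is the two-sided 95% value for 3 degrees of freedom), since the standard library has no t distribution.

```python
import math

def prediction_variance(x, s2, x0):
    """V(y0_hat) = (1/n + (x0 - xbar)^2 / sum((x - xbar)^2)) * s^2."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return (1.0 / n + (x0 - xbar) ** 2 / sxx) * s2

def prediction_interval(y0_hat, var_y0, t_crit):
    """Interval y0_hat +/- t_crit * sqrt(V(y0_hat))."""
    half_width = t_crit * math.sqrt(var_y0)
    return y0_hat - half_width, y0_hat + half_width

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical design points
s2 = 1.0                        # hypothetical residual variance estimate
t_crit = 3.182                  # table value, n - 2 = 3 d.f., two-sided 95%

v_center = prediction_variance(x, s2, 3.0)   # at xbar
v_edge = prediction_variance(x, s2, 5.0)     # at the edge of the data
low, high = prediction_interval(10.0, v_edge, t_crit)   # 10.0 is a made-up y0_hat
print(v_center, v_edge, low, high)
```

The variance is smallest at x0 = xbar and grows quadratically with distance from it, which is why the confidence band widens toward the ends of the data.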

Confidence Interval of Prediction (all points)

[Figure: LTO Thickness vs Dep time, both axes 1000 to 3000, with the confidence interval of prediction from all points; Leverage also shown.]

Confidence Interval of Prediction (half the points)

[Figure: LTO Thickness vs Dep time, both axes 1000 to 3000, with the confidence interval of prediction from half the points; Leverage also shown.]

Confidence Interval of Prediction (1/4 of points)

[Figure: LTO Thickness vs Dep time, both axes 1000 to 3000, with the confidence interval of prediction from 1/4 of the points; Leverage also shown.]

Prediction Error vs Experimental Error

[Figure: y vs x showing the Estimated Model and the True model, with the Experimental Error and the Prediction error marked.]

Experimental Error does not depend on location or sample size.

Prediction Error depends on location and gets smaller as sample size increases.

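Multivariate Regression

$\eta = \beta_1 x_1 + \beta_2 x_2$

[Figure: vectors $y$, $\hat{y}$ and the residual R in the plane spanned by $x_1$ and $x_2$, with components $\beta_1 x_1$ and $\beta_2 x_2$.]

The Residual is $\perp$ to $\hat{y}$, $x_1$, $x_2$.

Coefficient Estimation:

$\sum (y - \hat{y})\, x_1 = 0$
$\sum (y - \hat{y})\, x_2 = 0$

$\sum y x_1 - b_1 \sum x_1^2 - b_2 \sum x_1 x_2 = 0$
$\sum y x_2 - b_2 \sum x_2^2 - b_1 \sum x_1 x_2 = 0$

Variance Estimation:

$s^2 = \frac{S_R}{n - p}$

$V(b_1) = \frac{1}{1 - \rho^2} \cdot \frac{s^2}{\sum x_1^2}$

$V(b_2) = \frac{1}{1 - \rho^2} \cdot \frac{s^2}{\sum x_2^2}$

$\rho = \frac{-\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}}$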
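The two normal equations form a 2x2 linear system in b1 and b2, which can be solved by Cramer's rule. The sketch below does that and then applies the variance formulas with rho; the function name and the data are hypothetical.

```python
import math

def fit_two_regressors(x1, x2, y):
    """Solve the two normal equations for y = b1*x1 + b2*x2 (no intercept)."""
    n, p = len(y), 2
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    sy1 = sum(w * u for w, u in zip(y, x1))
    sy2 = sum(w * v for w, v in zip(y, x2))
    det = s11 * s22 - s12 * s12
    b1 = (sy1 * s22 - sy2 * s12) / det          # Cramer's rule
    b2 = (sy2 * s11 - sy1 * s12) / det
    resid = [w - b1 * u - b2 * v for u, v, w in zip(x1, x2, y)]
    s_sq = sum(r * r for r in resid) / (n - p)  # s^2 = S_R / (n - p)
    rho = -s12 / math.sqrt(s11 * s22)
    var_b1 = s_sq / (s11 * (1.0 - rho ** 2))
    var_b2 = s_sq / (s22 * (1.0 - rho ** 2))
    return b1, b2, s_sq, rho, var_b1, var_b2, resid

# Hypothetical data.
x1 = [1.0, 0.0, 1.0, 2.0]
x2 = [0.0, 1.0, 1.0, 1.0]
y = [2.1, 2.9, 5.2, 6.8]
b1, b2, s_sq, rho, var_b1, var_b2, resid = fit_two_regressors(x1, x2, y)
print(b1, b2, rho, var_b1, var_b2)
```

When x1 and x2 are orthogonal (sum of x1*x2 equal to zero), rho is zero and each variance reduces to the single-regressor form s^2 / sum(x^2); correlation between the regressors inflates both variances by 1 / (1 - rho^2).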

Thickness vs time, temp: y = a + b1 x1 + b2 x2

Data File: regression
Dependent Variable: tox nm

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        -7.0363e+2    7.1769e+1            -9.8041e+0    1.10e-8
temp            7.1429e-1     6.9976e-2            1.0208e+1     7.49e-9
time min        8.6874e-1     3.8905e-2            2.2330e+1     3.72e-9

ANOVA table and Correlation of Estimates

Data File: regression

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio     Prob>F
Model    2.5828e+4        2                 1.2914e+4      3.0141e+2   1.45e-14
Error    7.7121e+2        18                4.2845e+1
Total    2.6599e+4        20

Coefficient of Determination   9.7101e-1
Coefficient of Correlation     9.8540e-1
Standard Error of Estimate     6.5456e+0
Durbin-Watson Statistic        8.6171e-1

Data File: regression

           Tox     Temp    Time
tox nm     1.000   0.410   0.896
temp       0.410   1.000   0.000
time min   0.896   0.000   1.000

Multiple Regression in General

$y = X b + e, \qquad X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_n \,]$

minimize $\|Xb - y\|^2 = \|e\|^2 = (y - Xb)^T (y - Xb)$

or, $\min \; -e^T X b + e^T y$

which is equiv. to:

$(y - Xb)^T X b = 0$
$X^T X b = X^T y$
$b = (X^T X)^{-1} X^T y$
$V(b) = (X^T X)^{-1} \sigma^2$

Joint Confidence Region for $x_1$, $x_2$

$S = S_R \left[ 1 + \frac{p}{n - p} F_\alpha(p, n - p) \right]$

$(\beta_1 - b_1)^2 \sum x_1^2 + 2 (\beta_1 - b_1)(\beta_2 - b_2) \sum x_1 x_2 + (\beta_2 - b_2)^2 \sum x_2^2 = S - S_R$
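The normal equations X^T X b = X^T y can be solved without forming the inverse explicitly. The sketch below builds X^T X and X^T y and solves the system by Gaussian elimination in plain Python; the helper names are hypothetical, and in practice a linear-algebra library would normally be used instead. The example data are generated from an exact quadratic, so the recovered coefficients are known in advance.

```python
def solve_linear(a, rhs):
    """Solve a*z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def least_squares(X, y):
    """Return b solving the normal equations X^T X b = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)

# Hypothetical noise-free data from y = 1 + 2x + 0.5x^2; columns of X are 1, x, x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x, x * x] for x in xs]
y = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
b = least_squares(X, y)
print(b)
```

Because the model is linear in its coefficients, the same routine handles a polynomial such as y = a + bx + cx^2 simply by adding an x^2 column to X.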

What if a linear model is not enough?

[Figure: dep rate (100 to 300) vs inlet temp (600 to 650).]

Data File: regression
Dependent Variable: dep rate

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        -1.8502e+3    4.6425e+1            -3.9853e+1    3.72e-9
inlet temp      3.2426e+0     7.4592e-2            4.3471e+1     3.72e-9

ANOVA table and Residual Plot

Data File: regression

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio     Prob>F
Model    3.6490e+4        1                 3.6490e+4      1.8897e+3   0.00e+0
Error    4.0550e+2        21                1.9309e+1
Total    3.6895e+4        22

Coefficient of Determination   9.8901e-1
Coefficient of Correlation     9.9449e-1
Standard Error of Estimate     4.3942e+0
Durbin-Watson Statistic        1.5516e+0

[Figure: Residuals (-20 to 20) vs inlet temp (600 to 650).]

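Multiple Regression with Replication

With $n_i$ replicate runs $y_{iv}$ at each of k settings $x_i$ (n runs in total) and group means $\bar{y}_{i\cdot}$:

$S_E = \frac{1}{2} \sum_i (y_{i1} - y_{i2})^2$  (duplicate runs)
$S_{LF} = S_R - S_E$

$\sum_i \sum_{v=1}^{n_i} (y_{iv} - \eta_i)^2 = n (a - \alpha)^2 + (b - \beta)^2 \sum_i n_i (x_i - \bar{x})^2 + \sum_i n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_i \sum_{v=1}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2$

  d.f.:  n = 1 + 1 + (k - 2) + (n - k)

$\sum_i \sum_{v=1}^{n_i} (y_{iv} - \bar{y})^2 = \sum_i \sum_{v=1}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2 + \sum_i n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_i n_i (\hat{y}_i - \bar{y})^2$

Pure Error vs. Lack of Fit Example

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   17   401.01171        23.5889       21.0360   0.0047
Pure Error    4    4.48543          1.1214
Total Error   21   405.49714

Parameter Estimates
Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    -1850.159   46.4247     -39.85    0.0000
inlet temp   3.242592    0.07459     43.47     0.0000

Effect Test
Source       Nparm   DF   Sum of Squares   F Ratio    Prob > F
inlet temp   1       1    36489.550        999.9999   0.0000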
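The split of the residual sum of squares into pure error and lack of fit can be sketched as below for a straight-line fit to replicated data. The function name, the dictionary layout (setting mapped to its list of replicate responses) and the data are all hypothetical; the degrees of freedom follow the usual convention of n - k for pure error and k - 2 for lack of fit, with k distinct settings.

```python
def lack_of_fit(groups):
    """groups: {x_setting: [replicate y values]}. Straight-line fit, then S_R = S_LF + S_E."""
    xs = [x for x, ys in groups.items() for _ in ys]
    ys_all = [y for ys in groups.values() for y in ys]
    n, k = len(xs), len(groups)
    xbar = sum(xs) / n
    ybar = sum(ys_all) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * y for x, y in zip(xs, ys_all)) / sxx

    def predict(x):
        return ybar + b * (x - xbar)

    s_r = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys_all))     # residual SS
    s_e = sum((y - sum(ys) / len(ys)) ** 2                           # pure error SS
              for ys in groups.values() for y in ys)
    s_lf = s_r - s_e                                                 # lack-of-fit SS
    df_lf, df_pe = k - 2, n - k
    f_ratio = (s_lf / df_lf) / (s_e / df_pe)
    return s_r, s_e, s_lf, df_lf, df_pe, f_ratio

# Hypothetical duplicate runs at three settings, with visible curvature.
groups = {1.0: [1.0, 1.2], 2.0: [2.9, 3.1], 3.0: [7.0, 7.2]}
s_r, s_e, s_lf, df_lf, df_pe, f_ratio = lack_of_fit(groups)
print(s_e, s_lf, f_ratio)
```

A large F ratio means the lack-of-fit mean square is much bigger than the replicate-to-replicate noise, which is the situation in the table above (23.5889 / 1.1214, about 21.04) and the motivation for trying a higher-order model.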

Dep. rate vs temperature: y = a + bx + cx^2

[Figure: dep rate (100 to 300) vs inlet temp (600 to 650).]

Data File: regression
Dependent Variable: dep rate

Variable Name   Coefficient   Std. Err. Estimate   t Statistic   Prob > t
Constant        8.3391e+3     1.7899e+3            4.6589e+0     1.35e-4
inlet temp      -2.9445e+1    5.7415e+0            -5.1284e+0    4.43e-5
inlet temp ^2   2.6205e-2     4.6028e-3            5.6933e+0     1.19e-5

Pure Error vs. Lack of Fit Example (cont)

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit   16   150.24382        9.39024       8.3740    0.0264
Pure Error    4    4.48543          1.12136
Total Error   20   154.72925

Parameter Estimates
Term           Estimate     Std Error   t Ratio   Prob>|t|
Intercept      8339.0507    1789.92     4.66      0.0002
inlet temp^1   -29.44466    5.74154     -5.13     0.0001
inlet temp^2   0.0262051    0.0046      5.69      0.0000

Effect Test
Source               Nparm   DF   Sum of Squares   F Ratio    Prob > F
Poly(inlet temp,2)   2       2    36740.318        999.9999   0.0000

ANOVA table and Residual Plot

Data File: regression

Source   Sum of Squares   Deg. of Freedom   Mean Squares   F-Ratio     Prob>F
Model    3.6740e+4        2                 1.8370e+4      2.3745e+3   0.00e+0
Error    1.5473e+2        20                7.7365e+0
Total    3.6895e+4        22

Coefficient of Determination   9.9581e-1
Coefficient of Correlation     9.9790e-1
Standard Error of Estimate     2.7814e+0
Durbin-Watson Statistic        2.6878e+0

[Figure: Residuals (-6 to 6) vs inlet temp (600 to 650).]

Use regression line to predict LTO thickness...

[Figure, two panels with axes from 1000 to 4000. First panel: LTO Thick A with 90% Limit Low and 90% Limit High vs Dep Time Sec; fitted line y = 60.352 + 0.97456 x, R^2 = 0.989. Second panel: plotted against LTO Thick A; fitted line y = -38.440 + 1.0153 x, R^2 = 0.989.]

Response Surface Methodology

Objectives:
  get a feel of I/O relationships
  find setting(s) that satisfy multiple constraints
  find settings that lead to optimum performance

Observations:
  Function is nearly linear away from the peak
  Function is nearly quadratic at the peak

Building the planar model

A factorial experiment with center points is enough to build and confirm a planar model.

  b1, b2, b12 = -0.65 +/- 0.75
  b11 + b22 = 1/4 p + 1/3 c = -0.50 +/- 1.15

Quadratic Model and Confirmation Run

Close to the peak, a quadratic model can be built and confirmed by an expanded two-phase experiment.

Response Surface Methodology

RSM consists of creating models that lead to visual images of a response. The models are usually linear or quadratic in nature. Either expanded factorial experiments or regression analysis can be used.

All empirical models have a random prediction error. In RSM, the average variance of the model is:

$\bar{V}(\hat{y}) = \frac{1}{n} \sum_{i=1}^{n} V(\hat{y}_i) = \frac{p \sigma^2}{n}$

where p is the number of model parameters and n is the number of experiments.
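The average-variance result can be checked numerically for the straight-line model (p = 2), using the expression for the variance of a fitted value at each design point, sigma^2 * (1/n + (x_i - xbar)^2 / sum((x - xbar)^2)). The design points and sigma^2 below are hypothetical, and the function name is illustrative only.

```python
def average_prediction_variance(x, sigma2):
    """Average of V(y_hat_i) over the design points for the model y = a + b*(x - xbar)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    variances = [sigma2 * (1.0 / n + (xi - xbar) ** 2 / sxx) for xi in x]
    return sum(variances) / n

x = [1.0, 2.0, 4.0, 7.0, 11.0]   # hypothetical, unevenly spaced design
sigma2 = 2.5                     # hypothetical error variance
avg = average_prediction_variance(x, sigma2)
p, n = 2, len(x)
print(avg, p * sigma2 / n)
```

The two printed values agree regardless of how the design points are spaced: the 1/n terms sum to 1 and the (x_i - xbar)^2 / sum terms sum to 1, giving p = 2 in total.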

Response Surface Exploration

"Popular" RSM

  Use single-stage Box-B or Box-W designs
  Use computer (simulated) experiments
  Rely on "goodness of fit" measures
  Automate model structure generation

Problems?