Regression Analysis. EE290H F05


Regression Analysis: Simple Regression, Multivariate Regression, Stepwise Regression, Replication and Prediction Error

In general, we "fit" a model by minimizing a metric that represents the error:

$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The sum of squares gives closed-form solutions and minimum variance for linear models.

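The Simplest Regression Model

Line through the origin, $\hat{y} = bx$:

$$y_u = \beta x_u + \varepsilon_u, \qquad u = 1, 2, \ldots, n, \qquad \varepsilon_u \sim N(0, \sigma_R^2)$$

$$\min S_R = \min \sum_{u=1}^{n} (y_u - \beta x_u)^2$$

Here $\eta_u = \beta x_u$ is the true value of the model, $b$ is the estimate of $\beta$, $\hat{y}$ is the estimate of $\eta_u$, and $s^2$ is the estimate of $\sigma_R^2$.

Using the Normal Equation

$$\min \sum (y - \hat{y})^2$$

[Figure: the observation vector $y$ in the $(y_1, y_2)$ plane and the model vector $\hat{y} = bx$ (1 d.f.).]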

Using the Normal Equation (cont)

Choose $b$ so that the residual vector is perpendicular to the model vector:

$$\sum (y - \hat{y})\,x = 0 \quad\Rightarrow\quad \sum (y - bx)\,x = 0 \quad\Rightarrow\quad b = \frac{\sum xy}{\sum x^2} \quad (\text{est. of } \beta)$$

$$s^2 = \frac{S_R}{n-1} \quad (\text{est. of } \sigma_R^2), \qquad V(b) = \frac{s^2}{\sum x^2}$$

67% confidence interval:

$$b \pm \sqrt{\frac{s^2}{\sum x^2}}$$

Significance test:

$$t = \frac{b - \beta^*}{\sqrt{s^2 / \sum x^2}} \sim t_{n-1}$$

Etch time vs removed material: y = bx

[Figure: Removed (nm) versus Etch Time (sec) x 10^3.]

Data File: regression. Dependent Variable: Removed (nm)

| Variable Name | Coefficient | Std. Err. Estimate | t Statistic | Prob > t |
|---|---|---|---|---|
| Etch Time (sec) | 5.0098e-1 | 1.6199e-2 | 3.0927e+1 | 1.33e-8 |
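The estimates for the line through the origin can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `fit_through_origin` and the four data points are hypothetical and are not the etch data from the slides.

```python
import math

def fit_through_origin(x, y):
    """Least-squares fit of y = b*x (no intercept), following the normal equation."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy / sxx                                   # b = sum(xy) / sum(x^2)
    s_r = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    s2 = s_r / (n - 1)                              # estimate of sigma_R^2 (n-1 d.f.)
    se_b = math.sqrt(s2 / sxx)                      # sqrt(V(b))
    return b, s2, se_b

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 1.1, 1.4, 2.1]

b, s2, se_b = fit_through_origin(x, y)
t_stat = (b - 0.0) / se_b          # significance test against beta* = 0
print(b, s2, se_b, t_stat)
```

The residual vector is orthogonal to `x` by construction, which is exactly the condition the normal equation imposes.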

Model Validation through ANOVA

The idea is to decompose the sum of squares into orthogonal components.

Assuming that there is no dependence, $H_0: \beta^* = 0$:

$$\sum y_u^2 = \sum \hat{y}_u^2 + \sum (y_u - \hat{y}_u)^2$$

The total, model and residual terms carry $n$, $p$ and $n-p$ degrees of freedom respectively.

Model Validation through ANOVA (cont)

Assuming a specific model, $H_0: \beta^* = b$:

$$\sum (y_u - \beta^* x_u)^2 = \sum (\hat{y}_u - \beta^* x_u)^2 + \sum (y_u - \hat{y}_u)^2$$

Again the total, model and residual terms carry $n$, $p$ and $n-p$ degrees of freedom.

The ANOVA table will answer the question: is there a relationship between x and y?
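The decomposition under $H_0: \beta^* = 0$ can be checked numerically. The sketch below assumes the line-through-the-origin model with $p = 1$; the function name `anova_through_origin` and the data are hypothetical.

```python
def anova_through_origin(x, y):
    """Split sum(y^2) into model and residual parts for the fit y = b*x."""
    n, p = len(x), 1
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ss_total = sum(yi * yi for yi in y)                         # n d.f.
    ss_model = sum((b * xi) ** 2 for xi in x)                   # p d.f.
    ss_resid = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))  # n - p d.f.
    f_ratio = (ss_model / p) / (ss_resid / (n - p))
    return ss_total, ss_model, ss_resid, f_ratio

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 1.1, 1.4, 2.1]

ss_total, ss_model, ss_resid, f_ratio = anova_through_origin(x, y)
print(ss_total, ss_model, ss_resid, f_ratio)
```

A large F-ratio relative to the $F(p, n-p)$ reference distribution indicates that the model term explains far more variation than the residual term.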

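ANOVA table and Residual Plot

Data File: regression

| Source | Sum of Squares | Deg. of Freedom | Mean Squares | F-Ratio | Prob>F |
|---|---|---|---|---|---|
| Model | 1.8293e+5 | 1 | 1.8293e+5 | 1.9801e+2 | 2.17e-6 |
| Error | 6.4669e+3 | 7 | 9.2385e+2 | | |
| Total | 1.8939e+5 | 8 | | | |

| Statistic | Value |
|---|---|
| Coefficient of Determination | 9.6585e-1 |
| Coefficient of Correlation | 9.8278e-1 |
| Standard Error of Estimate | 3.0395e+1 |
| Durbin-Watson Statistic | 2.9730e+0 |

[Figure: Residuals versus Etch Time (sec) x 10^3.]

A More Complex Regression Equation

Actual: $\eta_i = \alpha + \beta (x_i - \bar{x})$

Estimated: $\hat{y}_i = a + b (x_i - \bar{x})$

$$y_i \sim N(\eta_i, \sigma^2)$$

Minimize $R = \sum (y_i - \hat{y}_i)^2$ to estimate $\alpha$ and $\beta$:

$$a = \bar{y}, \qquad b = \frac{\sum (x_i - \bar{x})\, y_i}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Are $a$ and $b$ good estimators of $\alpha$ and $\beta$?

$$E[a] = \alpha, \qquad E[b] = \frac{\sum (x_i - \bar{x})\, E[y_i]}{\sum (x_i - \bar{x})^2} = \beta$$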

Variance Estimation

Note that all variability comes from $y$!

$$V[a] = V\!\left[\frac{1}{n}\sum y_i\right] = \frac{1}{n^2} \sum V[y_i] = \frac{\sigma^2}{n}$$

$$V[b] = V\!\left[\frac{\sum (x_i - \bar{x})\, y_i}{\sum (x_i - \bar{x})^2}\right] = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$

Minimum variance thanks to least squares!

LTO thickness vs deposition time: y = a + bx

[Figure: LTO thick A x 10^3 versus Dep time x 10^3.]

Data File: regression. Dependent Variable: LTO thick A

| Variable Name | Coefficient | Std. Err. Estimate | t Statistic | Prob > t |
|---|---|---|---|---|
| Constant | 6.0352e+1 | 5.6058e+1 | 1.0766e+0 | 2.98e-1 |
| Dep time | 9.7456e-1 | 2.5155e-2 | 3.8743e+1 | 3.02e-17 |
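The centered model and its variance estimates can be sketched as follows. This is an illustrative sketch, not the analysis behind the LTO table: the function name `fit_centered` and the five data points are hypothetical, and $\sigma^2$ is replaced by its estimate $s^2$ with $n-2$ degrees of freedom.

```python
def fit_centered(x, y):
    """Fit y = a + b*(x - x_bar) and estimate V[a] and V[b]."""
    n = len(x)
    x_bar = sum(x) / n
    a = sum(y) / n                                   # a = y_bar
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - (a + b * (xi - x_bar)) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)         # estimate of sigma^2
    var_a = s2 / n                                   # V[a] = sigma^2 / n
    var_b = s2 / sxx                                 # V[b] = sigma^2 / sum (x - x_bar)^2
    return a, b, s2, var_a, var_b

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b, s2, var_a, var_b = fit_centered(x, y)
print(a, b, s2, var_a, var_b)
```

Centering on $\bar{x}$ makes the two estimates uncorrelated, which is why their variances can be written separately.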

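Anova table and Residual Plot

Data File: regression

| Source | Sum of Squares | Deg. of Freedom | Mean Squares | F-Ratio | Prob>F |
|---|---|---|---|---|---|
| Model | 4.7725e+6 | 1 | 4.7725e+6 | 1.5010e+3 | 3.02e-17 |
| Error | 5.0872e+4 | 16 | 3.1795e+3 | | |
| Total | 4.8233e+6 | 17 | | | |

| Statistic | Value |
|---|---|
| Coefficient of Determination | 9.8945e-1 |
| Coefficient of Correlation | 9.9471e-1 |
| Standard Error of Estimate | 5.6387e+1 |
| Durbin-Watson Statistic | 2.3417e+0 |

[Figure: Residuals versus Dep time x 10^3.]

ANOVA Representation

[Figure: a data point $(x_i, y_i)$ shown against the estimated line $\hat{y}_i = a + b(x_i - \bar{x})$ and the true line $\eta_i = \alpha + \beta(x_i - \bar{x})$, with the segments $(y_i - \hat{y}_i)$, $(y_i - \eta_i)$, $(\hat{y}_i - \eta_i)$, $(a - \alpha)$, $b(x_i - \bar{x})$ and $\beta(x_i - \bar{x})$ marked.]

Note the differences between the "true" and the "estimated" model.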

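ANOVA Representation (cont)

$$(y_i - \eta_i) = (a - \alpha) + (b - \beta)(x_i - \bar{x}) + (y_i - \hat{y}_i)$$

$$\sum (y_i - \eta_i)^2 = n(a - \alpha)^2 + (b - \beta)^2 \sum (x_i - \bar{x})^2 + \sum (y_i - \hat{y}_i)^2$$

| Term | Deg. of Freedom | Distribution |
|---|---|---|
| $\sum (y_i - \eta_i)^2$ | $n$ | $\sigma^2 \chi^2(n)$ |
| $n(a - \alpha)^2$ | 1 | $\sigma^2 \chi^2(1)$ |
| $(b - \beta)^2 \sum (x_i - \bar{x})^2$ | 1 | $\sigma^2 \chi^2(1)$ |
| $\sum (y_i - \hat{y}_i)^2$ | $n - 2$ | $\sigma^2 \chi^2(n-2)$ |

In this way, the significance of the model can be analyzed in detail.

Confidence Limits of an Estimate

$$\hat{y}_0 = \bar{y} + b(x_0 - \bar{x})$$

$$V(\hat{y}_0) = V(\bar{y}) + (x_0 - \bar{x})^2\, V(b)$$

$$V(\hat{y}_0) = \left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] s^2$$

Prediction interval:

$$\hat{y}_0 \pm t_{\alpha/2} \sqrt{V(\hat{y}_0)}$$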
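The limits above can be computed directly from the formula for $V(\hat{y}_0)$. The sketch below is an illustration under stated assumptions: the function name `prediction_limits` and the data are hypothetical, and the critical value `t_crit` is passed in by the caller (3.182 is the tabulated two-sided 95% value for 3 degrees of freedom).

```python
import math

def prediction_limits(x, y, x0, t_crit):
    """Return (low, estimate, high) for y0 at x0, using V(y0) from the slide."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - (y_bar + b * (xi - x_bar)) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    y0 = y_bar + b * (x0 - x_bar)
    var_y0 = (1.0 / n + (x0 - x_bar) ** 2 / sxx) * s2
    half = t_crit * math.sqrt(var_y0)
    return y0 - half, y0, y0 + half

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
T_CRIT = 3.182   # two-sided 95% t value for n - 2 = 3 d.f.

center = prediction_limits(x, y, 3.0, T_CRIT)   # at x_bar: narrowest band
edge = prediction_limits(x, y, 5.0, T_CRIT)     # at the edge: wider band
print(center, edge)
```

The band is narrowest at $x_0 = \bar{x}$ and widens symmetrically as $x_0$ moves away from the center of the data.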

Confidence Interval of Prediction (all points)

[Figure: LTO Thickness versus Dep time, both axes 1000 to 3000, with leverage.]

Confidence Interval of Prediction (half the points)

[Figure: LTO Thickness versus Dep time, both axes 1000 to 3000, with leverage.]

Confidence Interval of Prediction (1/4 of points)

[Figure: LTO Thickness versus Dep time, both axes 1000 to 3000, with leverage.]

Prediction Error vs Experimental Error

[Figure: y versus x showing the true model, the estimated model, the experimental error and the prediction error.]

- Experimental error does not depend on location or sample size.
- Prediction error depends on location and gets smaller as sample size increases.
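The effect of location and sample size on prediction error can be seen from the factor that multiplies $s^2$ in $V(\hat{y}_0)$. The sketch below is illustrative: `variance_factor` and the four-point design are hypothetical, and replicating the whole design stands in for "more points".

```python
def variance_factor(x, x0):
    """Return V(y0) / s^2 = 1/n + (x0 - x_bar)^2 / sum (x - x_bar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return 1.0 / n + (x0 - x_bar) ** 2 / sxx

# Hypothetical design points; the full design is repeated 1, 2 and 4 times.
design = [1.0, 2.0, 3.0, 4.0]
for reps in (1, 2, 4):
    x = design * reps
    print(len(x), variance_factor(x, 2.5), variance_factor(x, 4.0))
```

Doubling the number of runs halves the factor at every location, while the experimental error variance itself is unchanged.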

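Multivariate Regression

$$\eta = \beta_1 x_1 + \beta_2 x_2$$

[Figure: the vectors $y$, $\hat{y}$, $\beta_1 x_1$, $\beta_2 x_2$ and the residual $R$.]

The residual is perpendicular to $\hat{y}$, $x_1$, $x_2$.

Coefficient Estimation:

$$\sum (y - \hat{y})\,x_1 = 0, \qquad \sum (y - \hat{y})\,x_2 = 0$$

$$\sum y x_1 - b_1 \sum x_1^2 - b_2 \sum x_1 x_2 = 0$$

$$\sum y x_2 - b_2 \sum x_2^2 - b_1 \sum x_1 x_2 = 0$$

Variance Estimation:

$$s^2 = \frac{S_R}{n - p}$$

$$V(b_1) = \frac{1}{1 - \rho^2}\,\frac{s^2}{\sum x_1^2}, \qquad V(b_2) = \frac{1}{1 - \rho^2}\,\frac{s^2}{\sum x_2^2}$$

$$\rho = \frac{-\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}}$$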
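The two normal equations form a 2x2 linear system, so they can be solved in closed form. The sketch below assumes the no-intercept model $\eta = \beta_1 x_1 + \beta_2 x_2$ from the slide; the function name `fit_two_regressors` and the data are hypothetical.

```python
import math

def fit_two_regressors(x1, x2, y):
    """Solve the two normal equations and estimate V(b1), V(b2)."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det          # Cramer's rule
    b2 = (s2y * s11 - s1y * s12) / det
    n, p = len(y), 2
    s_r = sum((w - b1 * u - b2 * v) ** 2 for u, v, w in zip(x1, x2, y))
    s2 = s_r / (n - p)
    rho = -s12 / math.sqrt(s11 * s22)           # correlation of the estimates
    var_b1 = s2 / ((1 - rho ** 2) * s11)
    var_b2 = s2 / ((1 - rho ** 2) * s22)
    return b1, b2, rho, var_b1, var_b2

# Hypothetical data, chosen only for illustration.
x1 = [1.0, 0.0, 1.0, 2.0]
x2 = [0.0, 1.0, 1.0, 1.0]
y = [2.1, 2.9, 5.2, 6.9]

b1, b2, rho, var_b1, var_b2 = fit_two_regressors(x1, x2, y)
print(b1, b2, rho, var_b1, var_b2)
```

The factor $1/(1-\rho^2)$ shows how correlated regressors inflate the variance of both estimates; for orthogonal regressors it reduces to 1.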

Thickness vs time, temp: y = a + b1 x1 + b2 x2

Data File: regression. Dependent Variable: tox nm

| Variable Name | Coefficient | Std. Err. Estimate | t Statistic | Prob > t |
|---|---|---|---|---|
| Constant | -7.0363e+2 | 7.1769e+1 | -9.8041e+0 | 1.10e-8 |
| temp | 7.1429e-1 | 6.9976e-2 | 1.0208e+1 | 7.49e-9 |
| time min | 8.6874e-1 | 3.8905e-2 | 2.2330e+1 | 3.72e-9 |

Anova table and Correlation of Estimates

Data File: regression

| Source | Sum of Squares | Deg. of Freedom | Mean Squares | F-Ratio | Prob>F |
|---|---|---|---|---|---|
| Model | 2.5828e+4 | 2 | 1.2914e+4 | 3.0141e+2 | 1.45e-14 |
| Error | 7.7121e+2 | 18 | 4.2845e+1 | | |
| Total | 2.6599e+4 | 20 | | | |

| Statistic | Value |
|---|---|
| Coefficient of Determination | 9.7101e-1 |
| Coefficient of Correlation | 9.8540e-1 |
| Standard Error of Estimate | 6.5456e+0 |
| Durbin-Watson Statistic | 8.6171e-1 |

| | Tox | Temp | Time |
|---|---|---|---|
| tox nm | 1.000 | 0.410 | 0.896 |
| temp | 0.410 | 1.000 | 0.000 |
| time min | 0.896 | 0.000 | 1.000 |
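The summary statistics in the ANOVA output follow from the sums of squares and degrees of freedom alone. As a cross-check, the sketch below recomputes them from the values printed in the table above; the variable names are chosen here for illustration.

```python
import math

# Values copied from the ANOVA table for the thickness vs time, temp fit.
ss_model, ss_error, ss_total = 2.5828e4, 7.7121e2, 2.6599e4
df_model, df_error = 2, 18

ms_model = ss_model / df_model            # mean square for the model
ms_error = ss_error / df_error            # mean square for the error
f_ratio = ms_model / ms_error             # F-ratio
r_squared = ss_model / ss_total           # coefficient of determination
r = math.sqrt(r_squared)                  # coefficient of correlation
std_error = math.sqrt(ms_error)           # standard error of estimate

print(ms_model, ms_error, f_ratio, r_squared, r, std_error)
```

The recomputed values agree with the printed ones to the number of digits shown in the table.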

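Multiple Regression in General

$$\begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} b = y + e, \qquad \text{i.e.} \quad Xb = y + e$$

Minimize

$$\|Xb - y\|^2 = \|e\|^2 = (y - Xb)^T (y - Xb)$$

or, equivalently, minimize $e^T X b - e^T y$, which leads to:

$$(y - Xb)^T X b = 0$$

$$X^T X b = X^T y$$

$$b = (X^T X)^{-1} X^T y$$

$$V(b) = (X^T X)^{-1} \sigma^2$$

Joint Confidence Region for the coefficients of x1 and x2

$$S = S_R \left[ 1 + \frac{p}{n - p} F_\alpha(p, n - p) \right]$$

$$(\beta_1 - b_1)^2 \sum x_1^2 + 2 (\beta_1 - b_1)(\beta_2 - b_2) \sum x_1 x_2 + (\beta_2 - b_2)^2 \sum x_2^2 = S - S_R$$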
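The matrix form of the normal equations can be solved without any external library. The sketch below is a minimal illustration: the helper names `transpose`, `matmul`, `solve` and `least_squares` are hypothetical, the linear system is solved by Gaussian elimination rather than by forming $(X^T X)^{-1}$ explicitly, and the data are made up so that the exact coefficients are known.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def solve(a, rhs):
    """Solve a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def least_squares(X, y):
    """Return b satisfying the normal equations X^T X b = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(p * q for p, q in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 - x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1.0, 3.0, 0.0, 2.0, 4.0]

b = least_squares(X, y)
print(b)
```

Solving the system directly is usually preferable to inverting $X^T X$; the inverse is only needed when the covariance $V(b)$ itself is of interest.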

What if a linear model is not enough?

[Figure: dep rate (100 to 300) versus inlet temp (600 to 650).]

Data File: regression. Dependent Variable: dep rate

| Variable Name | Coefficient | Std. Err. Estimate | t Statistic | Prob > t |
|---|---|---|---|---|
| Constant | -1.8502e+3 | 4.6425e+1 | -3.9853e+1 | 3.72e-9 |
| inlet temp | 3.2426e+0 | 7.4592e-2 | 4.3471e+1 | 3.72e-9 |

ANOVA table and Residual Plot

Data File: regression

| Source | Sum of Squares | Deg. of Freedom | Mean Squares | F-Ratio | Prob>F |
|---|---|---|---|---|---|
| Model | 3.6490e+4 | 1 | 3.6490e+4 | 1.8897e+3 | 0.00e+0 |
| Error | 4.0550e+2 | 21 | 1.9309e+1 | | |
| Total | 3.6895e+4 | 22 | | | |

| Statistic | Value |
|---|---|
| Coefficient of Determination | 9.8901e-1 |
| Coefficient of Correlation | 9.9449e-1 |
| Standard Error of Estimate | 4.3942e+0 |
| Durbin-Watson Statistic | 1.5516e+0 |

[Figure: Residuals (-20 to 20) versus inlet temp (600 to 650).]

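Multiple Regression with Replication

For duplicate runs at each setting:

$$S_E = \sum_i \tfrac{1}{2} (y_{i1} - y_{i2})^2, \qquad S_{LF} = S_R - S_E$$

With $i = 1, \ldots, k$ indexing the distinct settings, $v = 1, \ldots, n_i$ the replicates at setting $i$, $n = \sum n_i$, and $\bar{y}_{i\cdot}$ the mean of the replicates at setting $i$:

$$\sum_{i=1}^{k} \sum_{v=1}^{n_i} (y_{iv} - \eta_i)^2 = n(a - \alpha)^2 + (b - \beta)^2 \sum_{i=1}^{k} n_i (x_i - \bar{x})^2 + \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_{i=1}^{k} \sum_{v=1}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2$$

The degrees of freedom are $n = 1 + 1 + (k - 2) + (n - k)$.

$$\sum_{i=1}^{k} \sum_{v=1}^{n_i} (y_{iv} - \bar{y})^2 = \sum_{i=1}^{k} \sum_{v=1}^{n_i} (y_{iv} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \hat{y}_i)^2 + \sum_{i=1}^{k} n_i (\hat{y}_i - \bar{y})^2$$

Pure Error vs. Lack of Fit Example

Lack Of Fit

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Lack Of Fit | 17 | 401.01171 | 23.5889 | 21.0360 | 0.0047 |
| Pure Error | 4 | 4.48543 | 1.1214 | | |
| Total Error | 21 | 405.49714 | | | |

Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob > t |
|---|---|---|---|---|
| Intercept | -1850.159 | 46.4247 | -39.85 | 0.0000 |
| inlet temp | 3.242592 | 0.07459 | 43.47 | 0.0000 |

Effect Test

| Source | Nparm | DF | Sum of Squares | F Ratio | Prob > F |
|---|---|---|---|---|---|
| inlet temp | 1 | 1 | 36489.550 | 999.9999 | 0.0000 |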
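The split of the residual sum of squares into pure error and lack of fit can be sketched as follows. This is an illustration for a straight-line fit with replicated settings; the function name `lack_of_fit` and the duplicated data are hypothetical and are not the deposition-rate data from the slides.

```python
def lack_of_fit(groups):
    """groups maps each x setting to the list of replicated y values."""
    xs = [x for x, ys in groups.items() for _ in ys]
    ys_all = [y for ys in groups.values() for y in ys]
    n, k = len(ys_all), len(groups)
    x_bar = sum(xs) / n
    y_bar = sum(ys_all) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * y for x, y in zip(xs, ys_all)) / sxx
    # Residual sum of squares of the straight-line fit (n - 2 d.f.)
    s_r = sum((y - y_bar - b * (x - x_bar)) ** 2 for x, y in zip(xs, ys_all))
    # Pure error: scatter of replicates around their own mean (n - k d.f.)
    s_e = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        s_e += sum((y - m) ** 2 for y in ys)
    s_lf = s_r - s_e                                  # lack of fit (k - 2 d.f.)
    f_ratio = (s_lf / (k - 2)) / (s_e / (n - k))
    return s_r, s_e, s_lf, f_ratio

# Hypothetical duplicated runs at three settings.
groups = {1.0: [1.0, 1.2], 2.0: [2.9, 3.1], 3.0: [3.0, 3.2]}

s_r, s_e, s_lf, f_ratio = lack_of_fit(groups)
print(s_r, s_e, s_lf, f_ratio)
```

A large ratio of the lack-of-fit mean square to the pure-error mean square suggests that the model form, not the measurement noise, accounts for most of the residual.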

Dep. rate vs temperature: y = a + bx + cx^2

[Figure: dep rate (100 to 300) versus inlet temp (600 to 650).]

Data File: regression. Dependent Variable: dep rate

| Variable Name | Coefficient | Std. Err. Estimate | t Statistic | Prob > t |
|---|---|---|---|---|
| Constant | 8.3391e+3 | 1.7899e+3 | 4.6589e+0 | 1.35e-4 |
| inlet temp | -2.9445e+1 | 5.7415e+0 | -5.1284e+0 | 4.43e-5 |
| inlet temp ^2 | 2.6205e-2 | 4.6028e-3 | 5.6933e+0 | 1.19e-5 |

Pure Error vs. Lack of Fit Example (cont)

Lack Of Fit

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Lack Of Fit | 16 | 150.24382 | 9.39024 | 8.3740 | 0.0264 |
| Pure Error | 4 | 4.48543 | 1.12136 | | |
| Total Error | 20 | 154.72925 | | | |

Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob > t |
|---|---|---|---|---|
| Intercept | 8339.0507 | 1789.92 | 4.66 | 0.0002 |
| inlet temp^1 | -29.44466 | 5.74154 | -5.13 | 0.0001 |
| inlet temp^2 | 0.0262051 | 0.0046 | 5.69 | 0.0000 |

Effect Test

| Source | Nparm | DF | Sum of Squares | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Poly(inlet temp,2) | 2 | 2 | 36740.318 | 999.9999 | 0.0000 |

ANOVA table and Residual Plot

Data File: regression

| Source | Sum of Squares | Deg. of Freedom | Mean Squares | F-Ratio | Prob>F |
|---|---|---|---|---|---|
| Model | 3.6740e+4 | 2 | 1.8370e+4 | 2.3745e+3 | 0.00e+0 |
| Error | 1.5473e+2 | 20 | 7.7365e+0 | | |
| Total | 3.6895e+4 | 22 | | | |

| Statistic | Value |
|---|---|
| Coefficient of Determination | 9.9581e-1 |
| Coefficient of Correlation | 9.9790e-1 |
| Standard Error of Estimate | 2.7814e+0 |
| Durbin-Watson Statistic | 2.6878e+0 |

[Figure: Residuals (-6 to 6) versus inlet temp (600 to 650).]

Use regression line to predict LTO thickness...

- y = 60.352 + 0.97456 x, R^2 = 0.989
- y = -38.440 + 1.0153 x, R^2 = 0.989

[Figure: two panels relating LTO Thick A and Dep Time Sec (axes 1000 to 4000), with 90% low and high limits.]

Response Surface Methodology

Objectives:

- get a feel of I/O relationships
- find setting(s) that satisfy multiple constraints
- find settings that lead to optimum performance

Observations:

- Function is nearly linear away from the peak
- Function is nearly quadratic at the peak

Building the planar model

A factorial experiment with center points is enough to build and confirm a planar model.

b1, b2, b12 = -0.65 +/- 0.75

b11 + b22 = 1/4p + 1/3c = -0.50 +/- 1.15

Quadratic Model and Confirmation Run

Close to the peak, a quadratic model can be built and confirmed by an expanded two-phase experiment.

Response Surface Methodology

RSM consists of creating models that lead to visual images of a response. The models are usually linear or quadratic in nature. Either expanded factorial experiments or regression analysis can be used.

All empirical models have a random prediction error. In RSM, the average variance of the model is:

$$V(\hat{y}) = \frac{1}{n} \sum_{i=1}^{n} V(\hat{y}_i) = \frac{p \sigma^2}{n}$$

where $p$ is the number of model parameters and $n$ is the number of experiments.
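The average-variance result can be checked for the straight-line model, where $p = 2$. The sketch below assumes the standard expression $V(\hat{y}_i) = \sigma^2 \left[ 1/n + (x_i - \bar{x})^2 / \sum (x_j - \bar{x})^2 \right]$; the function name `leverages`, the design points and the value of `sigma2` are hypothetical.

```python
def leverages(x):
    """Return V(y_hat_i) / sigma^2 at each design point of a straight-line fit."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical design points and error variance.
x = [1.0, 2.0, 4.0, 7.0, 9.0]
sigma2 = 1.0

h = leverages(x)
avg_var = sigma2 * sum(h) / len(x)    # average of V(y_hat_i) over the design
print(sum(h), avg_var)
```

The factors sum to the number of parameters regardless of where the design points sit, so the average variance is $p\sigma^2/n$ even though individual points differ.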

Response Surface Exploration

[Figure: response surface exploration.]

"Popular" RSM

- Use single-stage Box-B or Box-W designs
- Use computer (simulated) experiments
- Rely on "goodness of fit" measures
- Automate model structure generation

Problems?