Advanced Engineering Statistics - Section 5 - Jay Liu Dept. Chemical Engineering PKNU
2 Least squares regression. What we will cover. Reference: Box, G.E.P. (1966), Use and abuse of regression, Technometrics, 8(4).
3 [FYI] Least squares vs. interpolation. Given data points $(x_1, y_1)$ and $(x_2, y_2)$, there are two choices when we want to know the value of y at $x = (x_1 + x_2)/2$: least squares, or interpolation? Interpolation is recommended when the data are subject to negligible experimental error (or noise), e.g., when using steam tables. Otherwise, least squares is recommended.
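The choice above can be sketched numerically. A small Python/NumPy comparison (the data here are made up, not from the slides): at the midpoint between two data points, interpolation reproduces the neighboring points exactly, while the least-squares line smooths over the noise.

```python
import numpy as np

# Two ways to estimate y midway between two data points: interpolation
# (passes through the points; right when measurement noise is negligible)
# versus a least-squares line (smooths over the noise).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])        # illustrative noisy, roughly linear data

x_mid = (x[1] + x[2]) / 2.0

y_interp = np.interp(x_mid, x, y)         # linear interpolation
a1, a0 = np.polyfit(x, y, 1)              # least-squares straight line
y_lsq = a0 + a1 * x_mid                   # least-squares prediction
```

For noise-free tabulated data (like steam tables) the two estimates coincide; with noise they differ slightly, and the least-squares value averages over all points.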
4 Least squares - usage examples. Quantify the relationship between 2 variables (or 2 sets of variables). Manager: How does yield from the lactic acid batch fermentation relate to the purity of sucrose? Engineer: The yield can be predicted from sucrose purity with an error of plus/minus 8%. Manager: And how about the relationship between yield and glucose purity? Engineer: Over the range of our historical data, there is no discernible relationship.
5 Least squares - usage examples. Two general applications. Predictive modeling: usually when an exact model form is unknown; modeling data trends in order to predict future y values. Simulation: usually when parameters in the model are unknown; getting parameter values in the known model form (e.g., calculating activation energy from reaction data). Terminology: y: response variable, output variable, dependent variable; x: input variable, regressor variable, independent variable.
6 Review: covariance. Consider measurements from a gas cylinder: temperature (K) and pressure (kPa). The ideal gas law applies under moderate conditions: $pV = nRT$. Fixed volume, $V = 0.02\ \mathrm{m^3} = 20$ L. Moles of gas, $n = 14.1$ mol of chlorine gas (1 kg of gas). Gas constant, $R = 8.314$ J/(mol·K). Simplify the ideal gas law to $p = b_1 T$, where $b_1 \equiv nR/V$.
7 Review: covariance (cont.)
8 Review: covariance (cont.) Formal definition: $\mathrm{cov}(x, y) = \mathcal{E}\{(x - \bar{x})(y - \bar{y})\}$, where $\mathcal{E}(z)$ denotes the expected value of z. 1. Calculate the deviation variables $T - \bar{T}$ and $p - \bar{p}$; subtracting off the mean centers each vector at zero. 2. Multiply the centered values: $(T - \bar{T})(p - \bar{p})$. 3. Calculate the expected value (mean) of the products. Covariance has units, here [K·kPa]. c.f. the covariance between temperature and humidity is 202 [K·%]. Covariance of a variable with itself is the variance: $\mathrm{cov}(x, x) = V(x) = \mathcal{E}\{(x - \bar{x})(x - \bar{x})\}$.
9 Review: correlation. Q: Which one (pressure or humidity) has the stronger relationship with temperature? Covariance depends on units: e.g., a different covariance for grams vs. kilograms. Correlation removes the scaling effect: $\mathrm{corr}(x, y) = \dfrac{\mathcal{E}\{(x - \bar{x})(y - \bar{y})\}}{\sigma_x \sigma_y} = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$. Dividing by the units of x and y gives a dimensionless result: $-1 \le \mathrm{corr}(x, y) \le 1$. Gas cylinder example: corr(temperature, pressure) = ?; corr(temperature, humidity) = ?
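The covariance and correlation definitions above can be checked numerically. A minimal NumPy sketch, using made-up temperature/pressure readings in place of the slide's gas-cylinder data:

```python
import numpy as np

# Hypothetical gas-cylinder measurements (illustrative values,
# not the slide's actual data).
T = np.array([273.0, 285.0, 297.0, 309.0, 321.0, 333.0])        # K
p = np.array([1600.0, 1670.0, 1730.0, 1810.0, 1880.0, 1950.0])  # kPa

# Covariance: mean product of the centered (deviation) variables.
# Units are [K*kPa].
cov_Tp = np.mean((T - T.mean()) * (p - p.mean()))

# Correlation: covariance divided by both standard deviations,
# giving a dimensionless number between -1 and 1.
corr_Tp = cov_Tp / (T.std() * p.std())
```

Since p is (nearly) proportional to T here, the correlation comes out very close to 1.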
10 Review: correlation (cont.) Want to find a relationship y = f(x) other than the above?
11 Review: correlation (cont.) Remember!?
12 Least squares? Least squares regression? Regression is the act of choosing the best values for the unknown parameters in a model on the basis of a set of measured data. Linear regression is the special case where the model is linear in the parameters. A straight line has the form $y = a_0 + a_1 x\ (+\ e)$. There are many possible ways to define the best fit, but the most commonly used measure of goodness is the sum of squared residuals. Least sum of squares of errors: least squares, in short. Important: the error is in y, not in x.
13 [FYI] Why minimize the sum of squares? The least squares model: has the lowest possible variance for $a_0$ and $a_1$ when certain assumptions are met (more later); is computationally tractable by hand; makes it easy to prove various mathematical properties; is intuitive: it penalizes deviations quadratically. Other forms: multiple solutions, unstable, high-variance solutions, and mathematical proofs are difficult.
14 Least squares (regression). It is the basis for: DOE (Design of Experiments); latent variable methods. We consider only 2 (sets of) variables, x and y (or x's and y): simple least squares, multiple least squares, generalized least squares.
15 Simple least squares. Wind tunnel example: how can we find the best line that describes the following data? Data from wind tunnel experiments: drag force (F) at various wind velocities.
16 Wind tunnel example (cont.) From the plot, a straight line seems adequate: $y = a_0 + a_1 x$. At a data point $(x_i, y_i)$, the error between the line and the point is $e_i = y_i - \hat{y}_i = y_i - a_0 - a_1 x_i$. Earlier, least squares meant the least sum of squares of errors. For all data points, the sum of squares of errors is $S_r = \sum_i e_i^2 = \sum_i (y_i - a_0 - a_1 x_i)^2$. We need to find the model parameters $a_0$ and $a_1$ that minimize $S_r$: least squares.
17 Wind tunnel example (cont.) How do we find the model parameters? Take a look at $S_r$: it is a parabolic function w.r.t. $a_0$ and $a_1$, and the coefficients of $a_0^2$ and $a_1^2$ are positive, so $S_r$ attains its minimum where $\partial S_r/\partial a_0 = 0$ and $\partial S_r/\partial a_1 = 0$. Rearranging and solving for $a_0$ and $a_1$: $a_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$, $a_0 = \bar{y} - a_1 \bar{x}$.
18 Wind tunnel example (cont.) Calculations.
19 Wind tunnel example (cont.) Calculations. This is called simple least squares.
20 Wind tunnel example (cont.) Results. Is this OK with you?
21 General modeling procedure. 1. Define the modeling objective. 2. Variable selection: identify the response variables (i.e., y variables) and the regressor variables (i.e., x variables) that are to be considered. 3. Design of experiment: design an experiment and use it to generate the data that will be used to fit the model. 4. Define the model: choose an appropriate form for the model. 5. Fit the model: estimate values for the parameters in the model. 6. Does the model fit? If yes, use the model; if no, return to the earlier steps, guided by statistical tools plus prior knowledge.
22 Simple least squares. Summary. Model form: $y = a_0 + a_1 x + e$. The estimates minimize $S_r = \sum_i (y_i - a_0 - a_1 x_i)^2$, i.e., they satisfy $\partial S_r/\partial a_0 = 0$ and $\partial S_r/\partial a_1 = 0$; rearranging and solving gives $a_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$ and $a_0 = \bar{y} - a_1 \bar{x}$.
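The summary formulas can be sketched in code. This hypothetical example uses illustrative wind-tunnel-style data (velocity vs. drag force, not the slide's actual numbers) and cross-checks the closed-form estimates against np.polyfit:

```python
import numpy as np

# Illustrative wind-tunnel-style data: velocity x (m/s), drag force y (N).
x = np.array([10., 20., 30., 40., 50., 60., 70., 80.])
y = np.array([25., 70., 380., 550., 610., 1220., 830., 1450.])

n = len(x)
# Closed-form least-squares estimates for y = a0 + a1*x:
#   a1 = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
#   a0 = mean(y) - a1*mean(x)
a1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
a0 = y.mean() - a1 * x.mean()

# Cross-check against NumPy's degree-1 polynomial fit.
a1_np, a0_np = np.polyfit(x, y, 1)
```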
23 Simple least squares (cont.) Properties: the residuals sum to zero, i.e., $\sum_i (y_i - \hat{y}_i) = 0$.
24 Simple least squares (cont.) Questions: what if the model we want to find is non-linear? Ex. the activation energy in the rate constant $k = k_0 e^{-E/RT}$. Linearize!
25 Linearization. We want to model non-linear relationships between the independent (x) and dependent (y) variables. 1. Make a simple linear model through a suitable transformation: $y = f(x) + e \rightarrow y = a_0 + a_1 x + e$. 2. Use the previous results (simple least squares). Caution: a nonlinear transformation also changes the P.D.F. of the variables (and of the errors). We will discuss this in model assessment.
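As a sketch of the linearization idea, the Arrhenius model from the previous slide can be fitted by regressing ln(k) on 1/T. The parameter values below are illustrative assumptions, and the data are synthetic and noise-free, so the recovered parameters should match the true ones:

```python
import numpy as np

# Linearizing the Arrhenius model k = k0 * exp(-E/(R*T)):
# taking logs gives ln(k) = ln(k0) - (E/R)*(1/T), linear in 1/T.
R = 8.314                                 # J/(mol K)
E_true, k0_true = 50_000.0, 1.0e7         # illustrative "true" parameters

T = np.array([300., 320., 340., 360., 380.])   # K
k = k0_true * np.exp(-E_true / (R * T))        # noise-free synthetic data

# Fit the straight line ln(k) = a0 + a1*(1/T).
a1, a0 = np.polyfit(1.0 / T, np.log(k), 1)

E_est = -a1 * R            # recover the activation energy from the slope
k0_est = np.exp(a0)        # recover the pre-exponential factor from the intercept
```

With noisy data the log transform would also reweight the errors, which is exactly the caution raised above.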
26 Linearization (cont.)
27 Polynomial regression. For the quadratic form $y = a_0 + a_1 x + a_2 x^2 + e$, the sum of squares is $S_r = \sum_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$. Again, $S_r$ has a parabolic shape w.r.t. $a_0$, $a_1$, and $a_2$, with positive coefficients on $a_0^2$, $a_1^2$, and $a_2^2$. Setting the partial derivatives to zero: $\partial S_r/\partial a_0 = -2\sum_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$; $\partial S_r/\partial a_1 = -2\sum_i x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$; $\partial S_r/\partial a_2 = -2\sum_i x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$.
28 Polynomial regression (cont.) Rearranging the previous equations gives $\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$; these can be solved easily (three unknowns and three equations). For general polynomials $y = a_0 + a_1 x + \dots + a_m x^m$, as in the two cases above ($y = a_0 + a_1 x$ and $y = a_0 + a_1 x + a_2 x^2$), $\partial S_r/\partial a_0 = \partial S_r/\partial a_1 = \dots = \partial S_r/\partial a_m = 0$: we need to solve (m+1) linear algebraic equations for (m+1) parameters.
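The 3x3 normal equations above can be assembled and solved directly. A small sketch with made-up data, cross-checked against np.polyfit (which solves the same least-squares problem):

```python
import numpy as np

# Quadratic fit y = a0 + a1*x + a2*x^2 by solving the 3x3 normal equations.
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])   # illustrative data

# Left-hand matrix of sums and right-hand vector, exactly as on the slide.
A = np.array([[len(x),       x.sum(),      (x**2).sum()],
              [x.sum(),      (x**2).sum(), (x**3).sum()],
              [(x**2).sum(), (x**3).sum(), (x**4).sum()]])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a0, a1, a2 = np.linalg.solve(A, b)

# np.polyfit solves the same problem (coefficients in reversed order).
a2_np, a1_np, a0_np = np.polyfit(x, y, 2)
```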
29 Multiple least squares. Consider the case with more than two independent variables $x_1, x_2, \dots, x_m$: a regression plane $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m + e$. For the 2-D case, $y = a_0 + a_1 x_1 + a_2 x_2$. Again, $S_r = \sum_i (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i})^2$ has a parabolic shape w.r.t. $a_0$, $a_1$, $a_2$: $\partial S_r/\partial a_0 = -2\sum_i (y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0$; $\partial S_r/\partial a_1 = -2\sum_i x_{1,i}(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0$; $\partial S_r/\partial a_2 = -2\sum_i x_{2,i}(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}) = 0$.
30 Multiple least squares (cont.) Rearranging and solving for $a_0$, $a_1$, and $a_2$ gives the parameter estimates. For an m-dimensional plane, $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m + e$. Same as with general polynomials, $\partial S_r/\partial a_0 = \partial S_r/\partial a_1 = \dots = \partial S_r/\partial a_m = 0$: we need to solve (m+1) linear algebraic equations for (m+1) parameters.
31 General least squares. The following form includes all cases (simple least squares, polynomial regression, multiple regression): $y = a_0 z_0 + a_1 z_1 + \dots + a_m z_m + e$, where the $z_j$ are basis functions. Ex.: simple and multiple least squares ($z_0 = 1$, $z_j = x_j$); polynomial regression ($z_j = x^j$). Same as before, $\partial S_r/\partial a_0 = \partial S_r/\partial a_1 = \dots = \partial S_r/\partial a_m = 0$: we need to solve (m+1) linear algebraic equations for (m+1) parameters.
32 Quantification of errors. $S_t = \sum_i (y_i - \bar{y})^2$: the total sum of squares around the mean for the response variable, y. $S_r = \sum_i e_i^2 = \sum_i (y_i - a_0 z_{0,i} - a_1 z_{1,i} - \dots - a_m z_{m,i})^2$: the sum of squares of the residuals around the regression line.
33 Quantification of errors (cont.) $s_y = \sqrt{\dfrac{S_t}{n-1}} = \sqrt{\dfrac{\sum_i (y_i - \bar{y})^2}{n-1}}$: the standard deviation of y. $s_{y/x} = \sqrt{\dfrac{S_r}{n-(m+1)}}$: the standard error of predicted y ($S_E$). Together they quantify the appropriateness of the regression.
34 Quantification of errors (cont.) Coefficient of determination: $R^2 = \dfrac{S_t - S_r}{S_t}$, the amount of variability in the data explained by the regression model. $R^2 = 1$ when $S_r = 0$: a perfect fit (the regression curve passes through the data points). $R^2 = 0$ when $S_r = S_t$: as bad as doing nothing. It is evident from the figures that a parabola is adequate; the $R^2$ of (b) is higher than that of (a).
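The error measures $S_t$, $S_r$, $s_{y/x}$, and $R^2$ can be computed in a few lines. The data here are illustrative (a noisy, roughly linear set), not from the slides:

```python
import numpy as np

# Error measures for a fitted line: total sum of squares St, residual
# sum of squares Sr, standard error of the estimate s_{y/x}, and R^2.
x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])   # illustrative data

a1, a0 = np.polyfit(x, y, 1)
y_hat = a0 + a1 * x

St = np.sum((y - y.mean())**2)        # spread of y around its mean
Sr = np.sum((y - y_hat)**2)           # spread of y around the regression line

m = 1                                 # one regressor term
s_yx = np.sqrt(Sr / (len(y) - (m + 1)))   # standard error of the estimate
R2 = (St - Sr) / St                   # fraction of variability explained
```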
35 Quantification of errors (cont.) Warning! $R^2 \approx 1$ does not guarantee that the model is adequate, nor that the model will predict new data well. It is possible to force $R^2$ to be one by adding as many terms as there are observations. $S_r$ can be big when the variance of the random error is large. (The usual assumption on the error is that it is random, i.e., unpredictable.) Practice using Excel: (1) the wind tunnel example with higher-order polynomials; (2) simple regression with increasing random noise.
36 Confidence intervals - coefficients. Coefficients in the regression model have confidence intervals. Why? They are also statistics, like $\bar{x}$ and s. That is, they are numerical quantities calculated from a sample (not the entire population); they are estimated values of the parameters. A confidence interval has the form: statistic $\pm$ (a value that depends on the P.D.F. of the statistic and the confidence level $\alpha$) $\times$ (the standard error of the statistic), e.g., $\bar{x} \pm z_{\alpha/2}\,\sigma_x/\sqrt{n}$ or $\bar{x} \pm t_{n,\alpha/2}\,s_x/\sqrt{n}$. The standard error of a statistic is the standard deviation of the sampling distribution of that statistic.
37 Confidence intervals - coefficients (cont.) Matrix representation of GLS: $y = Za + e$, with $y$ ($n \times 1$), $Z$ ($n \times (m+1)$), $a$ ($(m+1) \times 1$), and $e$ ($n \times 1$); m+1 is the number of coefficients, n the number of data points.
38 Confidence intervals - coefficients (cont.) Example: fitting a quadratic polynomial $y = a_0 + a_1 x + a_2 x^2 + e$ to five data points $(x_i, y_i)$ gives $y = Za + e$: five equations in three unknowns $(a_0, a_1, a_2)$. Can you solve this problem?
39 Confidence intervals - coefficients (cont.) Solutions: with $y = Za + e$, the sum of squares of errors is $S_r = \sum_i e_i^2 = e^T e = (y - Za)^T (y - Za)$. Minimizing gives $Z^T Z a = Z^T y$, called the normal equations ("$Ax = b$"). 1. Use LU decomposition or other methods to solve the linear algebraic equations. 2. Matrix inversion: $a = (Z^T Z)^{-1} Z^T y$, computationally not efficient, but statistically useful.
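The normal equations and the matrix solution can be sketched as follows; np.linalg.lstsq solves the same minimization by a more stable QR/SVD route, so the two answers should agree. The five data points are made up:

```python
import numpy as np

# General least squares in matrix form: y = Z a + e.  The normal equations
# Z'Z a = Z'y give a = (Z'Z)^{-1} Z'y.
x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.2, 2.9, 7.1, 12.8, 21.1])        # illustrative five points

# Columns of Z are the basis functions z0 = 1, z1 = x, z2 = x^2.
Z = np.column_stack([np.ones_like(x), x, x**2])

a_normal = np.linalg.solve(Z.T @ Z, Z.T @ y)     # normal equations
a_lstsq, *_ = np.linalg.lstsq(Z, y, rcond=None)  # stable least-squares solver
```

Forming $(Z^T Z)^{-1}$ explicitly is exactly the "statistically useful" route: its diagonal feeds the coefficient confidence intervals on the next slide.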
40 Confidence intervals - coefficients (cont.) Matrix inversion approach: $a = (Z^T Z)^{-1} Z^T y$. Denote $Z_{ii}^{-1}$ as the i-th diagonal element of $(Z^T Z)^{-1}$. The confidence interval of the estimated coefficients is $a_i \pm t_{n-(m+1),\alpha/2}\; s_{y/x} \sqrt{Z_{ii}^{-1}}$, where $t_{n-(m+1),\alpha/2}$ is the Student t statistic and $s_{y/x} = \sqrt{S_r/(n-(m+1))}$ is the standard error of the estimate.
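A sketch of the confidence-interval formula above, using SciPy for the Student t critical value; the data are illustrative:

```python
import numpy as np
from scipy import stats

# 95% confidence intervals for regression coefficients:
#   a_i +/- t_{n-(m+1), alpha/2} * s_{y/x} * sqrt([(Z'Z)^{-1}]_{ii})
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9, 6.1, 6.8, 8.2])   # illustrative data

Z = np.column_stack([np.ones_like(x), x])     # model y = a0 + a1*x
n, p = Z.shape                                 # p = m + 1 parameters

a = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ a
s_yx = np.sqrt(resid @ resid / (n - p))        # standard error of the estimate

ZtZ_inv = np.linalg.inv(Z.T @ Z)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p)   # two-sided, alpha = 0.05
half_width = t_crit * s_yx * np.sqrt(np.diag(ZtZ_inv))

ci = np.column_stack([a - half_width, a + half_width])  # row i: CI for a_i
```

For this nearly linear data the slope interval excludes zero, i.e., the slope is statistically significant.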
41 Confidence intervals - coefficients (cont.) For a linear model: C.I. for $a_1$ (slope): $a_1 \pm t_{n-(m+1),\alpha/2}\, \dfrac{s_{y/x}}{\sqrt{\sum_i (x_i - \bar{x})^2}}$. C.I. for $a_0$ (intercept): $a_0 \pm t_{n-(m+1),\alpha/2}\, s_{y/x} \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$. What if a confidence interval contains zero?
42 Confidence intervals - prediction. C.I. for predicted y, $\hat{y}_0$: $\hat{y}_0 \pm t_{n-(m+1),\alpha/2}\, s_{y/x} \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$.
43 Model assessment. When we do not know the model form, we have to assess the model after fitting it and before using it. To assess the model and make inferences about the parameters and predictions, we have to employ statistics and make some assumptions about the nature of the disturbance. Tools for model assessment: $s_{y/x}$, $R^2$ (quantitative; do not rely on these alone); residual plots (qualitative); normal probability chart (qualitative or quantitative); test for lack of fit (quantitative; used when the dataset includes replicates, based on analysis of variance (ANOVA)).
44 Model assessment - assumptions. What are the most desirable errors in regression? Assumptions on the error: it is unpredictable; it is additive: $y = a_0 + a_1 x + e$, i.e., $y - (a_0 + a_1 x) = e$; the variance of the error is constant and is not related to the values of the response or of the regressor variables; there is no error associated with the values of the regressor variables; the error is a random variable with Gaussian distribution $N(0, \sigma^2)$ ($\sigma^2$ usually unknown).
45 Model assessment - residual plots. Recall the assumptions on the error: it is not related to the values of the response or regressor variables. These assumptions will not be valid if the model is wrong, and the following residual plots will reveal this: residuals vs. regressor variables; residuals vs. fitted values ($\hat{y}_i$); residuals vs. lurking variables (e.g., time or order). These plots will show patterns when a model is inadequate.
46 Model assessment - residual plots (cont.) Examples: residuals plotted against x.
47 Model assessment - residual plots (cont.) Examples of residual plots.
48 Model assessment - normal probability plot. Recall the assumption that the error is a random variable with Gaussian distribution $N(0, \sigma^2)$ ($\sigma^2$ usually unknown). Then the errors will fall onto a straight line (y = x) in a normal probability plot (especially useful when the number of data points is large). Alternatively, a normality test can be used.
49 Model assessment - ANOVA (test for lack of fit). The variance breakdown: $S_t = \sum_i (y_i - \bar{y})^2$, $S_{reg} = \sum_i (\hat{y}_i - \bar{y})^2$, $S_r = \sum_i (y_i - \hat{y}_i)^2$, and $S_t = S_{reg} + S_r$. Really? Prove it to yourself.
50 Model assessment - ANOVA (test for lack of fit). The variance breakdown $S_t = S_{reg} + S_r$: the ratio $S_{reg}/S_r$ follows an F distribution when corrected with the degrees of freedom. If the regression is not meaningful, the ratio is small and $S_t \approx S_r$.
51 Model assessment - ANOVA (test for lack of fit). [Figure: the fitted line $y = a_0 + a_1 x$ and the breakdown $S_t = S_{reg} + S_r$.]
52 Model assessment - ANOVA (test for lack of fit). ANOVA table:
Source of var.  | Sum of squares | Degrees of freedom | Mean square             | F0
Regression      | $S_{reg}$      | p                  | $MS_{reg} = S_{reg}/p$  | $MS_{reg}/MS_E$
Residual error  | $S_r$          | n - p              | $MS_E = S_r/(n-p)$      |
Total           | $S_t$          | n - 1              |                         |
Compare $F_0$ to the critical value $F_{p,\,n-p;\,\alpha}$. What we are doing is a test of hypothesis: $H_0$: $\beta_1 = \dots = \beta_p = 0$; $H_1$: at least one parameter is not equal to zero.
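The ANOVA table above can be reproduced numerically. A sketch with made-up, nearly linear data; the F test should strongly reject H0 here:

```python
import numpy as np
from scipy import stats

# ANOVA for regression significance: F0 = MS_reg / MS_E is compared with
# the critical value F_{p, n-p-1; alpha} (p slope terms beyond the intercept).
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2, 17.9, 20.1])

a1, a0 = np.polyfit(x, y, 1)
y_hat = a0 + a1 * x
n, p = len(y), 1                        # one regression (slope) term

St = np.sum((y - y.mean())**2)          # total
S_reg = np.sum((y_hat - y.mean())**2)   # explained by the regression
Sr = np.sum((y - y_hat)**2)             # residual; St = S_reg + Sr

MS_reg = S_reg / p
MS_E = Sr / (n - p - 1)
F0 = MS_reg / MS_E
p_value = 1.0 - stats.f.cdf(F0, p, n - p - 1)
```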
53 [FYI] Meaning of a p-value in a hypothesis test. A measure of how much evidence we have against the null hypothesis. The null hypothesis ($H_0$) represents the hypothesis of no change or no effect. Much research involves making a hypothesis and then collecting data to test it; researchers measure the consistency of the data with the null hypothesis. A small p-value is evidence against the null hypothesis, while a large p-value means little or no evidence against it. Traditionally, researchers will reject a null hypothesis if the p-value is less than 0.05 ($\alpha$ = 0.05). The p-value can be read as the probability of being wrong when you reject the null hypothesis.
54 Integer variables in the model. Integer variables 0 and 1 can represent qualitative variables. Example: raw material from Spain, India, or Vietnam: $y = a_0 + a_1 x_1 + \dots + a_k x_k + r_1 d_1 + r_2 d_2 + r_3 d_3$, with $d_1 = 1, d_2 = 0, d_3 = 0$ for Spain; $d_1 = 0, d_2 = 1, d_3 = 0$ for India; $d_1 = 0, d_2 = 0, d_3 = 1$ for Vietnam. Often called indicator variables for this reason.
55 Integer variables in the model. Example: we want to predict yield when two different impellers are used; yield = f(temperature, impeller type). Either build two different models (one for axial, one for radial), or build one model using an indicator variable: $y = a_0 + a_1 T + r d_i$, with $d_i = 0$ for axial and $d_i = 1$ for radial.
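The single-model-with-indicator approach can be sketched as follows. All numbers are made up; the fitted coefficient r estimates the yield offset of the radial impeller at the same temperature (about +5 in this synthetic data):

```python
import numpy as np

# Indicator (dummy) variable for impeller type: d = 0 for axial, 1 for
# radial, fitting the single model  yield = a0 + a1*T + r*d.
T = np.array([300., 310., 320., 330., 300., 310., 320., 330.])
d = np.array([0.,   0.,   0.,   0.,   1.,   1.,   1.,   1.])
yield_ = np.array([70.2, 72.1, 74.0, 75.9, 75.1, 77.0, 79.2, 80.8])

Z = np.column_stack([np.ones_like(T), T, d])
a0, a1, r = np.linalg.lstsq(Z, yield_, rcond=None)[0]
```

This shares one temperature slope between the two impeller types; fitting two separate models would instead allow the slopes to differ.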
56 Leverage effect. Unusual observations influence the model parameters and our interpretation. Outliers have an over-proportional effect on the resulting regression curves. To avoid the leverage effect: remove outliers before regression (but do not delete them without investigation); use a different $S_r$ (no longer least squares).
57 Causal relation and correlation. Causal relation: a cause-and-effect relation; has physical/chemical/engineering meaning; x and y are not interchangeable; a direction exists. Correlation: a (linear) relationship between two variables; no physical/chemical/engineering meaning is implied (e.g., the average height of men in their 20s vs. year); x and y are interchangeable.
58 Advanced topics. Testing of least-squares models.
59 Advanced topics - testing of least-squares models (cont.)
60 Advanced topics - correlated x's. The MLR solution: $y = Za + e$, $S_r = e^T e = (y - Za)^T (y - Za)$, $\partial S_r/\partial a_i = 0 \Rightarrow Z^T Z a = Z^T y$, so $a = (Z^T Z)^{-1} Z^T y$. When two or more x's are correlated, $Z^T Z$ becomes nearly singular, i.e., ill-conditioned.
61 Advanced topics - correlated x's. Compare $Z^T Z$ and $(Z^T Z)^{-1}$ for high vs. no correlation between $x_1$ and $x_2$. What if very small (measurement) noises are added to the x's? What will happen to your MLR model?
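The effect of correlated x's on $Z^T Z$ can be demonstrated with condition numbers. A sketch on synthetic data, where x2 is either a near-copy of x1 or an unrelated variable:

```python
import numpy as np

# When x1 and x2 are highly correlated, Z'Z is nearly singular and its
# condition number explodes, so the least-squares solution becomes very
# sensitive to tiny measurement noise.
rng = np.random.default_rng(0)

x1 = np.linspace(0, 1, 50)
x2_corr = x1 + 1e-6 * rng.standard_normal(50)    # almost identical to x1
x2_indep = rng.standard_normal(50)               # unrelated to x1

Z_corr = np.column_stack([np.ones_like(x1), x1, x2_corr])
Z_indep = np.column_stack([np.ones_like(x1), x1, x2_indep])

cond_corr = np.linalg.cond(Z_corr.T @ Z_corr)
cond_indep = np.linalg.cond(Z_indep.T @ Z_indep)
```

A huge condition number means small perturbations in the data produce large swings in the estimated coefficients, which is exactly the instability discussed on the next slide.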
62 Advanced topics - correlated x's. If there is high correlation among the x's: unstable solutions for a; predictions are uncertain as well. Geometrically speaking: high correlation between $x_1$ and $x_2$ [figure].
63 Advanced topics - correlated x's. Remedies? Use selected x variables: stepwise regression. Use ridge regression. Use multivariate methods (not covered in this lecture).
64 Advanced topics. What we want to know: How do we select the form of the model? Which variables should be included? Should we include transformations of the regressor variables? What we want: to build the best regression model; to include as many regressor variables as necessary to adequately describe the behaviour of y, while at the same time keeping the model as simple as possible. Stepwise regression: start off by choosing an equation having the single best x variable, then attempt to build up with subsequent additions of x's, one at a time, as long as these additions are worthwhile.
65 Advanced topics - stepwise regression. Procedure: 1. Add an x variable to the model (the variable that is most highly correlated with y). 2. Check whether or not this has significantly improved the model. One way is to see whether or not the confidence interval for the parameter includes zero (of course, you can use a hypothesis test). If the new term or terms are not significant, remove them from the model. 3. Find one of the remaining x variables that is highly correlated with the residuals and repeat the procedure.
66 Advanced topics - Ridge regression (Hoerl, 1962; Hoerl & Kennard, 1970a,b). A modified regression method specifically for ill-conditioned datasets that allows all variables to be kept in the model. This is possible by adding additional information to the problem to remove the ill-conditioning. The objective function to minimize: $(y - Xb)^T (y - Xb) + k\, b^T b$. Least-squares estimates have no bias but large variance, while ridge-regression estimates have small bias and small variance.
67 Advanced topics - ridge regression. Procedure: 1. Mean-center and scale all x's to unit variance: $f_i = \dfrac{x_i - \bar{x}_i}{s_{x_i}}$. 2. Rewrite the model as $y - \bar{y} = a_1 s_{x_1} \dfrac{x_1 - \bar{x}_1}{s_{x_1}} + \dots + a_p s_{x_p} \dfrac{x_p - \bar{x}_p}{s_{x_p}} = b_1 f_1 + \dots + b_p f_p$, i.e., $Y = Fb + \varepsilon$.
68 Advanced topics - ridge regression. 3. The objective function for the optimization is to minimize $J = (Y - Fb)^T (Y - Fb) + k\, b^T b$; therefore $b^* = (F^T F + kI)^{-1} F^T Y$. 4. Solve the optimization problem in step 3 for several values of k between 0 and 1 and choose the value of k at which the estimates of b seem to stabilize. Otherwise, choose k by validation on new data.
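Steps 1-3 can be sketched as follows on synthetic, nearly collinear data. Since k > 0 shrinks the coefficient vector, the ridge estimate should always have a smaller norm than the ordinary least-squares estimate (k = 0):

```python
import numpy as np

# Ridge regression on centered, unit-variance regressors:
#   b* = (F'F + k I)^{-1} F' y
# Larger k shrinks b toward zero, trading a little bias for a large
# reduction in variance when F'F is ill-conditioned.
rng = np.random.default_rng(1)
n = 40
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)          # nearly collinear with x1
y = 2.0 * x1 + 2.0 * x2 + 0.1 * rng.standard_normal(n)

# Step 1: mean-center and scale the x's to unit variance; center y too.
F = np.column_stack([(x1 - x1.mean()) / x1.std(),
                     (x2 - x2.mean()) / x2.std()])
yc = y - y.mean()

def ridge(F, y, k):
    p = F.shape[1]
    return np.linalg.solve(F.T @ F + k * np.eye(p), F.T @ y)

b_ols = ridge(F, yc, 0.0)       # k = 0 recovers ordinary least squares
b_ridge = ridge(F, yc, 1.0)     # k = 1 gives a shrunken, stabler estimate
```

Step 4 would repeat the last line over a grid of k values and pick the one where the coefficients stabilize, or validate on fresh data.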
69 Advanced topics - Non-linear regression. General form of a non-linear regression model: $y = f(x, a)$. In a linear model, $f(x, a) = x^T a$. In a non-linear model, f() could have any form, e.g., $y = b_1 (1 - e^{-b_2 x})$. Remember that a nonlinear transformation also changes the P.D.F. of the variables (and errors)? What does this mean?
70 Advanced topics - non-linear regression. The approach is exactly the same as for linear models; we use the same objective function: $S_r = \varepsilon^T \varepsilon = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - f(x_i, a))^2$. All we need is to minimize $S_r$ over a. But how?
71 Advanced topics - non-linear regression. The big difference between linear and nonlinear regression is that, in general, the optimization problem for a nonlinear model does not have an exact analytical solution. Therefore, we have to use a numerical optimization algorithm such as: Gauss-Newton; steepest descent; conjugate gradients; any other optimization algorithm.
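A sketch of nonlinear least squares using scipy.optimize.curve_fit (a Levenberg-Marquardt-style solver); the saturating-growth model and the data points are illustrative, not from the slides:

```python
import numpy as np
from scipy.optimize import curve_fit

# Nonlinear regression needs an expectation function, data, and starting
# guesses; curve_fit then minimizes Sr = sum (y_i - f(x_i, a))^2 numerically.
def f(x, b1, b2):
    # Illustrative saturating-growth model y = b1 * (1 - exp(-b2 * x)).
    return b1 * (1.0 - np.exp(-b2 * x))

x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])     # illustrative data

b_opt, b_cov = curve_fit(f, x, y, p0=[1.0, 1.0])  # p0: starting guesses
y_hat = f(x, *b_opt)
Sr = np.sum((y - y_hat)**2)
```

Poor starting guesses can land the solver in a local minimum or prevent convergence, which is exactly the list of pitfalls on the following slides.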
72 Advanced topics - non-linear regression. When using an optimization algorithm to solve nonlinear regression problems, one needs to be able to specify: 1. an expectation function (i.e., the form of the model); 2. data; 3. starting guesses for a; 4. stopping criteria; 5. possibly other tuning parameters associated with the optimization algorithm.
73 Advanced topics - non-linear regression. Problems with numerical optimization: failure to converge; finding only a local minimum and not the global minimum; requires good starting guesses for the parameters; can be sensitive to the choice of convergence criteria and other tuning parameters of the algorithm; sometimes requires specification of the derivatives of the model with respect to the parameters.
More informationThe regression model with one fixed regressor cont d
The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationSection 3: Simple Linear Regression
Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationLinear Regression In God we trust, all others bring data. William Edwards Deming
Linear Regression ddebarr@uw.edu 2017-01-19 In God we trust, all others bring data. William Edwards Deming Course Outline 1. Introduction to Statistical Learning 2. Linear Regression 3. Classification
More informationSimple Linear Regression Estimation and Properties
Simple Linear Regression Estimation and Properties Outline Review of the Reading Estimate parameters using OLS Other features of OLS Numerical Properties of OLS Assumptions of OLS Goodness of Fit Checking
More informationLecture 10: F -Tests, ANOVA and R 2
Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally
More informationCurve Fitting. Objectives
Curve Fitting Objectives Understanding the difference between regression and interpolation. Knowing how to fit curve of discrete with least-squares regression. Knowing how to compute and understand the
More informationA6523 Modeling, Inference, and Mining Jim Cordes, Cornell University
A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 19 Modeling Topics plan: Modeling (linear/non- linear least squares) Bayesian inference Bayesian approaches to spectral esbmabon;
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More informationNonlinear Regression Functions
Nonlinear Regression Functions (SW Chapter 8) Outline 1. Nonlinear regression functions general comments 2. Nonlinear functions of one variable 3. Nonlinear functions of two variables: interactions 4.
More informationp(z)
Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationChapter 13. Multiple Regression and Model Building
Chapter 13 Multiple Regression and Model Building Multiple Regression Models The General Multiple Regression Model y x x x 0 1 1 2 2... k k y is the dependent variable x, x,..., x 1 2 k the model are the
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationLecture 3: Multiple Regression
Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u
More informationThis model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationSimple Linear Regression for the MPG Data
Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory
More informationMathematics for Economics MA course
Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationThe Simple Linear Regression Model
The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate
More informationCorrelation and Regression Theory 1) Multivariate Statistics
Correlation and Regression Theory 1) Multivariate Statistics What is a multivariate data set? How to statistically analyze this data set? Is there any kind of relationship between different variables in
More informationChapter 7 Linear Regression
Chapter 7 Linear Regression 1 7.1 Least Squares: The Line of Best Fit 2 The Linear Model Fat and Protein at Burger King The correlation is 0.76. This indicates a strong linear fit, but what line? The line
More informationLectures 5 & 6: Hypothesis Testing
Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across
More information405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati
405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati Prof. M. El-Sakka Dept of Economics Kuwait University In this chapter we take a critical
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More informationMultiple Linear Regression
Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach
More informationRegression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear
Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear relationship between: - one independent variable X and -
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationGov 2000: 9. Regression with Two Independent Variables
Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationInference with Simple Regression
1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems
More informationBiostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras
Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics
More informationLecture 4: Heteroskedasticity
Lecture 4: Heteroskedasticity Econometric Methods Warsaw School of Economics (4) Heteroskedasticity 1 / 24 Outline 1 What is heteroskedasticity? 2 Testing for heteroskedasticity White Goldfeld-Quandt Breusch-Pagan
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationreview session gov 2000 gov 2000 () review session 1 / 38
review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review
More informationCorrelation and Linear Regression
Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means
More informationChapter 14. Linear least squares
Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given
More informationEstadística II Chapter 5. Regression analysis (second part)
Estadística II Chapter 5. Regression analysis (second part) Chapter 5. Regression analysis (second part) Contents Diagnostic: Residual analysis The ANOVA (ANalysis Of VAriance) decomposition Nonlinear
More informationBiostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich
Biostatistics Correlation and linear regression Burkhardt Seifert & Alois Tschopp Biostatistics Unit University of Zurich Master of Science in Medical Biology 1 Correlation and linear regression Analysis
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationREGRESSION ANALYSIS AND INDICATOR VARIABLES
REGRESSION ANALYSIS AND INDICATOR VARIABLES Thesis Submitted in partial fulfillment of the requirements for the award of degree of Masters of Science in Mathematics and Computing Submitted by Sweety Arora
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationEconometrics. 4) Statistical inference
30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More informationMultiplication of Polynomials
Summary 391 Chapter 5 SUMMARY Section 5.1 A polynomial in x is defined by a finite sum of terms of the form ax n, where a is a real number and n is a whole number. a is the coefficient of the term. n is
More informationChapter 12 - Lecture 2 Inferences about regression coefficient
Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous
More informationSMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning
SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance
More informationINTRODUCTION TO BASIC LINEAR REGRESSION MODEL
INTRODUCTION TO BASIC LINEAR REGRESSION MODEL 13 September 2011 Yogyakarta, Indonesia Cosimo Beverelli (World Trade Organization) 1 LINEAR REGRESSION MODEL In general, regression models estimate the effect
More information